The data was published by Divvy in 2014 as part of their Data Visualization Challenge.

A community of users then combined that data with other openly available data and released it publicly.

We used a cleaned dataset published on Steven Vance's GitHub page and weather data provided by Weather Underground, and we added sunrise and sunset times for each day.

Once we had gathered all the data we needed, we started to explore it and clean it up.

The bike trip data was first loaded into a MySQL database and then checked.

Because of the amount of data (more than 750,000 trips), we created smaller tables to speed up database queries and reduce the transfer time of the information over the internet. These lightweight tables are our key to quasi-real-time visualization.
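As an illustration of the lightweight-table idea, here is a minimal sketch that pre-aggregates trips into one row per day and hour. It uses Python with an in-memory SQLite database as a stand-in for MySQL. The table names (trips, trips_per_hour), the column names and the sample rows are all hypothetical and are not taken from the actual schema.

```python
import sqlite3

# SQLite (in memory) stands in for the MySQL database described in the text.
# Table names, column names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips ("
    "trip_id INTEGER PRIMARY KEY, start_time TEXT, from_station_id INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        (1, "2013-07-01 08:05:00", 5),
        (2, "2013-07-01 08:40:00", 5),
        (3, "2013-07-01 09:10:00", 7),
    ],
)

# Pre-aggregate once: one row per (day, hour) instead of one row per trip.
conn.execute(
    """
    CREATE TABLE trips_per_hour AS
    SELECT date(start_time) AS day,
           CAST(strftime('%H', start_time) AS INTEGER) AS hour,
           COUNT(*) AS trip_count
    FROM trips
    GROUP BY date(start_time), strftime('%H', start_time)
    """
)

# The visualization would then read the small table, not the full trip table.
hourly_rows = conn.execute(
    "SELECT day, hour, trip_count FROM trips_per_hour ORDER BY day, hour"
).fetchall()
```

The summary table has at most 24 rows per day however many trips were recorded, so it is cheaper both to query and to send over the network.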

To speed up the database even further, we set up table views and multiple secondary indexes, and removed the dirty data.
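The three steps (removing dirty rows, adding secondary indexes, defining a view) might look roughly like the sketch below. It again uses Python with in-memory SQLite in place of MySQL. The schema, the index and view names, and the rule for what counts as "dirty" (a missing or non-positive duration) are assumptions made for illustration; the text does not state the actual cleaning criteria.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips ("
    "trip_id INTEGER PRIMARY KEY, start_time TEXT, "
    "from_station_id INTEGER, duration_s INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        (1, "2013-07-01 08:05:00", 5, 600),
        (2, "2013-07-01 08:40:00", 5, -30),   # dirty: negative duration
        (3, "2013-07-01 09:10:00", 7, 900),
        (4, "2013-07-02 09:10:00", 7, None),  # dirty: missing duration
    ],
)

# 1. Remove dirty rows (assumed rule: missing or non-positive duration).
conn.execute("DELETE FROM trips WHERE duration_s IS NULL OR duration_s <= 0")

# 2. Secondary indexes on the columns that queries filter or group by.
conn.execute("CREATE INDEX idx_trips_start_time ON trips (start_time)")
conn.execute("CREATE INDEX idx_trips_from_station ON trips (from_station_id)")

# 3. A view that gives a frequently used aggregate a stable name.
conn.execute(
    """
    CREATE VIEW station_departures AS
    SELECT from_station_id, COUNT(*) AS departures
    FROM trips
    GROUP BY from_station_id
    """
)

departures = conn.execute(
    "SELECT from_station_id, departures FROM station_departures "
    "ORDER BY from_station_id"
).fetchall()
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('trips')"))
```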

Some attributes were redundant, such as age and year of birth. Each station had two IDs, and each trip had two start times. We removed all the redundant attributes, which made the database faster.
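One way to confirm that an attribute is really redundant before dropping it is to check that it can be recomputed from another column. The sketch below does this for age and year of birth. The field names, the sample rows and the reference year are hypothetical.

```python
# Hypothetical rows: 'age' can be recomputed from 'birth_year',
# so storing both is redundant.
REFERENCE_YEAR = 2013  # assumed year in which the ages were computed

trips = [
    {"trip_id": 1, "birth_year": 1985, "age": 28},
    {"trip_id": 2, "birth_year": 1990, "age": 23},
]


def is_derivable(rows, reference_year):
    """Return True if every age equals reference_year - birth_year."""
    return all(row["age"] == reference_year - row["birth_year"] for row in rows)


def drop_columns(rows, columns):
    """Return copies of the rows without the given column names."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


age_is_redundant = is_derivable(trips, REFERENCE_YEAR)
slim_trips = drop_columns(trips, {"age"}) if age_is_redundant else trips
```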

We used the PHP scripting language to query the database and generate JSON data to feed our JavaScript application.
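The query-to-JSON step can be sketched as follows. This is not the PHP code used in the project: it is an equivalent pattern in Python, with in-memory SQLite standing in for MySQL. The function name trips_per_hour_json, the table layout and the sample rows are hypothetical.

```python
import json
import sqlite3

# SQLite (in memory) stands in for MySQL, and Python for the PHP script.
# Table layout and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be converted to dicts
conn.execute(
    "CREATE TABLE trips_per_hour (day TEXT, hour INTEGER, trip_count INTEGER)"
)
conn.executemany(
    "INSERT INTO trips_per_hour VALUES (?, ?, ?)",
    [("2013-07-01", 8, 2), ("2013-07-01", 9, 1), ("2013-07-02", 8, 4)],
)


def trips_per_hour_json(connection, day):
    """Query one day's hourly counts and serialize them as a JSON array."""
    rows = connection.execute(
        "SELECT hour, trip_count FROM trips_per_hour "
        "WHERE day = ? ORDER BY hour",
        (day,),  # parameterized to avoid SQL injection
    ).fetchall()
    return json.dumps([dict(row) for row in rows])


payload = trips_per_hour_json(conn, "2013-07-01")
```

The resulting string is a JSON array of objects with hour and trip_count keys, which a JavaScript client can parse directly with JSON.parse.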