Currently, I'm aggregating data from GitHub, since there's no way of getting those values live (to my knowledge). I'm staying within the API rate limit, so I'm limited in how much data I can retrieve per week. Yes, it would be awesome if I could do it for more cities.
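In case it helps, here is a minimal sketch of the kind of polling this implies, using the rate-limit headers the GitHub REST API returns. The endpoint, the `requests` usage, and the retry-once logic are just one way to do it, not a description of my actual setup:

```python
import time
import requests

API = "https://api.github.com/repos/{owner}/{name}"

def fetch_repo(owner, name, token=None):
    """Fetch one repository's metadata, pausing when the hourly quota runs out."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"

    resp = requests.get(API.format(owner=owner, name=name), headers=headers)

    # GitHub reports the remaining quota and its reset time in response headers.
    remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
    reset_at = int(resp.headers.get("X-RateLimit-Reset", str(int(time.time()))))
    if resp.status_code == 403 and remaining == 0:
        time.sleep(max(0, reset_at - time.time()) + 1)  # wait out the rate limit
        resp = requests.get(API.format(owner=owner, name=name), headers=headers)

    resp.raise_for_status()
    return resp.json()

# e.g. stars for a repo (illustrative):
# print(fetch_repo("torvalds", "linux")["stargazers_count"])
```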
This is great. Is it open source? I would love to know more about how it works. Where do you get the source data? What is required in order to add more cities?
They've said that they would rather do a few cities very well than do a bunch of cities poorly. Also, one of the biggest obstacles is finding, getting permission to use, cleaning up, and publishing different datasets. This is much more a social problem than a technological one, although it might improve once they have good reference implementations in their initial cities. Right now they have different datasets for each city because they take whatever data they can get.
The good news is that the journalism grant that funded them stipulated that they release their code at the end of the project, so when that happens, each city can build its own EveryBlock if it has the will.
Would love it if this could be built into a website that could analyze the distortions in any city automatically. There are probably a lot of issues in getting and normalizing the data, though.
There is OpenStreetMap (or Wikipedia, for that matter). A large enough army of volunteers could produce or tag a dataset that rivals Google's data, but it would be a lot of work.
This looks awesome. Really pretty, and definitely a service I would use if I lived in one of the available cities, especially if you can maintain the quality of the recommendations.
How did you get the seed data (i.e. the initial set of recommendations)?
I took the list of four-thousand-odd cities from the UN statistics report on cities over 100k and then ran it through the batch geocoding site http://www.findlatitudeandlongitude.com/ ... apart from that, it was just old-fashioned Excel to do the calculations.
How I built this: I pulled the population data from the UN statistics report on cities over 100k and then batch geocoded all of the cities (a fair bit of data cleansing was also required). I then used an Excel macro to calculate the distance between every city pair (using Vincenty's formula) and uploaded the results to a database. Happy to share more if anyone is curious.
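For the curious, here's a rough Python translation of that step: Vincenty's inverse formula on the WGS-84 ellipsoid, applied to every pair of geocoded cities. The original work was done in an Excel macro, so the function and the pairwise loop below are an illustrative sketch rather than the actual code, and the sample city coordinates are just placeholders:

```python
import math
from itertools import combinations

def vincenty_km(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Distance in km between two points via Vincenty's inverse formula (WGS-84)."""
    a = 6378137.0              # semi-major axis, metres
    f = 1 / 298.257223563      # flattening
    b = (1 - f) * a

    L = math.radians(lon2 - lon1)
    U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sinU1, cosU1, sinU2, cosU2 = math.sin(U1), math.cos(U1), math.sin(U2), math.cos(U2)

    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cosU2 * sin_lam, cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        if sin_sigma == 0:
            return 0.0                     # coincident points
        cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        cos_2sigma_m = cos_sigma - 2 * sinU1 * sinU2 / cos2_alpha if cos2_alpha else 0.0
        C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - C) * f * sin_alpha * (
            sigma + C * sin_sigma * (cos_2sigma_m + C * cos_sigma * (-1 + 2 * cos_2sigma_m ** 2)))
        if abs(lam - lam_prev) < tol:
            break

    u2 = cos2_alpha * (a ** 2 - b ** 2) / b ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    delta_sigma = B * sin_sigma * (cos_2sigma_m + B / 4 * (
        cos_sigma * (-1 + 2 * cos_2sigma_m ** 2) -
        B / 6 * cos_2sigma_m * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sigma_m ** 2)))
    return b * A * (sigma - delta_sigma) / 1000.0

# cities: {name: (lat, lon)} after geocoding -- contents are illustrative.
cities = {"London": (51.5074, -0.1278), "Paris": (48.8566, 2.3522)}
pairs = [(x, y, vincenty_km(*cities[x], *cities[y])) for x, y in combinations(cities, 2)]
print(pairs)   # rows ready to load into a database
```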
On the front end, it's easy: add a couple of lines to a configuration file, make a couple of images, and it's done.
The difficult part is that the search is not optimized to be RAM-efficient; each city takes between two and ten gigabytes of RAM. You also need several hundred thousand tile images, which are commercially available but not free.
If you've got a hefty server and the images, then it takes about a computer-day to compute the features and create the search index.
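For a sense of where that RAM goes, here's a toy brute-force index over per-tile feature vectors. The tile count, descriptor length, and distance metric are all assumptions for illustration, since the post doesn't say what the features actually are:

```python
import numpy as np

# Illustrative sizes only: the real feature dimensionality and tile count aren't given.
N_TILES = 300_000      # tile images for one city (assumed)
FEATURE_DIM = 2048     # descriptor length per tile (assumed)

# Hypothetical feature matrix kept entirely in RAM for brute-force search.
features = np.random.rand(N_TILES, FEATURE_DIM).astype(np.float32)
print(f"index size: {features.nbytes / 1e9:.1f} GB")   # ~2.5 GB at these sizes

def nearest_tiles(query, k=10):
    """Return indices of the k tiles whose descriptors are closest to the query."""
    dists = np.linalg.norm(features - query, axis=1)
    return np.argsort(dists)[:k]

# Example query with one synthetic descriptor.
print(nearest_tiles(np.random.rand(FEATURE_DIM).astype(np.float32)))
```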
I seed with existing data sets and then let crowdsourced edits from site users add to the data.