
It’s incredibly challenging. Imagine trying to collect 250-500 data points for 1,000-2,000 cities — somewhere between 250,000 and a million individual values — and doing it consistently.

I seed the site with existing datasets and then rely on crowdsourced edits from site users to add to the data.




This is cool. Did you automatically extract the data, or did you have to search out a dataset for each different city?

Hi, great work. Just curious, how did you collect the data? I tried searching for some cities in Europe, but it doesn't work.

Currently I'm aggregating data from the GitHub API, since there's no way of getting those values live (to my knowledge). I'm staying within the API's rate limit, so I'm limited in how much data I can retrieve per week. Yes, it would be awesome if I could do it for more cities.
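Not what's running in production, but a minimal sketch of what that kind of aggregation might look like, assuming the per-city figures come from GitHub's user-search endpoint and that the job backs off when the rate limit is exhausted (the metric and the query are assumptions; the endpoint and headers are GitHub's public REST API):

    import time
    import requests

    # Hypothetical example: count GitHub users who list a given city in their
    # profile location, while respecting the search API's rate limit.
    API_URL = "https://api.github.com/search/users"

    def users_in_city(city, token=None):
        headers = {"Accept": "application/vnd.github+json"}
        if token:
            headers["Authorization"] = f"Bearer {token}"
        resp = requests.get(API_URL, params={"q": f'location:"{city}"'}, headers=headers)

        # If the rate-limit window is exhausted, sleep until it resets and retry.
        if resp.status_code == 403 and resp.headers.get("X-RateLimit-Remaining") == "0":
            reset_at = int(resp.headers["X-RateLimit-Reset"])
            time.sleep(max(reset_at - time.time(), 0) + 1)
            return users_in_city(city, token)

        resp.raise_for_status()
        return resp.json()["total_count"]

    if __name__ == "__main__":
        for city in ["Berlin", "Lagos", "Osaka"]:
            print(city, users_in_city(city))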

This is great. Is it open source? I would love to know more about how it works. Where do you get the source data? What is required in order to add more cities?

Thanks for sharing!


An arbitrary selection, plus GitHub cities with sufficient data. I have a lot more cities that I'm not showing because there isn't enough data.

They've said that they would rather do a few cities very well than do a bunch of cities poorly. Also, one of the biggest obstacles is finding, getting permission to use, cleaning up, and publishing different datasets. This is much more of a social problem than a technological one, although it might improve once they have good reference implementations in their initial cities. Right now they have different datasets for each city because they take what data they can get.

The good news is that the journalism grant that funded them stipulated that they have to release their code at the end of the project, so when that comes, each city can make their own Everyblock if they have the will.


How many cities are doing something like this, and have this much data available? Is it the norm?

How did you find all the open data, and are there cities you couldn't do because of that?

The rendering is really nice!


Would love it if this could be built into a website that could analyze the distortions of any city automatically. Probably a lot of issues getting and normalizing the data, though.

No, sorry. I've scraped Crunchbase and built my own analytics tool on top of the data for my own analyses, which is why I'm able to pull the city counts.

There is OpenStreetMap (or Wikipedia, for that matter). A large enough army of volunteers could produce or tag a dataset that rivals Google's data, but it would be a lot of work.

I like it. Where are you getting the data from? Could it be expanded to include other cities outside the US?

This looks awesome. Really pretty, and definitely a service I would use if I lived in one of the available cities, especially if you can maintain the quality of the recommendations.

How did you get the seed data (i.e. the initial set of recommendations)?


Thanks for doing this.

How did you pick the cities? Is it based on need or data availability?


I took the list of four-thousand-odd cities from the UN statistics report of cities over 100k and then ran it through the batch geocoding site http://www.findlatitudeandlongitude.com/. Apart from that, it was just old-fashioned Excel to calculate the distances between city pairs.
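For anyone who wants to script that geocoding step rather than use the website, a rough sketch with geopy's Nominatim geocoder might look like the following (the library choice, file names, and one-column input format are assumptions, not what was actually used):

    import csv
    import time
    from geopy.geocoders import Nominatim

    # Illustrative alternative to a batch-geocoding website: look up each city
    # with Nominatim and write lat/lon to a CSV.
    geolocator = Nominatim(user_agent="city-distance-sketch")

    with open("un_cities_over_100k.csv") as src, open("geocoded.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["city", "lat", "lon"])
        for (city,) in csv.reader(src):
            location = geolocator.geocode(city)
            if location:
                writer.writerow([city, location.latitude, location.longitude])
            time.sleep(1)  # Nominatim's usage policy asks for at most one request per second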

By rolling it out to different segments of your data, like a single city or category.
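As a minimal illustration of gating a change by segment (the city list, record shape, and both pipeline functions here are hypothetical):

    # Sketch of rolling a change out to one segment of the data at a time.
    ROLLOUT_CITIES = {"Austin", "Portland"}   # start with a small, easy-to-watch segment

    def process_v1(record):
        return {**record, "pipeline": "v1"}   # stand-in for the existing behavior

    def process_v2(record):
        return {**record, "pipeline": "v2"}   # stand-in for the new behavior under test

    def process(record):
        # Only records from the rollout cities go through the new code path.
        if record.get("city") in ROLLOUT_CITIES:
            return process_v2(record)
        return process_v1(record)

    if __name__ == "__main__":
        for r in [{"city": "Austin"}, {"city": "Boston"}]:
            print(process(r))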

I've just spent 30 minutes examining the cities on your site.

Great work!


How I built this: I pulled the population data from the UN statistics report on cities over 100k and then batch geocoded all of the cities (there was also a fair bit of data cleansing required). I then used a macro in Excel to calculate the distance between every city pair (using Vincenty's formula) and uploaded the results to a database. Happy to share more if anyone is curious.
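For the curious, here is a standalone Python sketch of that pairwise-distance step using Vincenty's inverse formula on the WGS-84 ellipsoid; the original used an Excel macro, and the sample coordinates below are illustrative, not the real UN input:

    import math
    from itertools import combinations

    def vincenty_distance(lat1, lon1, lat2, lon2, tol=1e-12, max_iter=200):
        """Distance in metres between two points given in decimal degrees."""
        a, f = 6378137.0, 1 / 298.257223563   # WGS-84 semi-major axis and flattening
        b = (1 - f) * a

        L = math.radians(lon2 - lon1)
        U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
        U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
        sinU1, cosU1 = math.sin(U1), math.cos(U1)
        sinU2, cosU2 = math.sin(U2), math.cos(U2)

        lam = L
        for _ in range(max_iter):
            sin_lam, cos_lam = math.sin(lam), math.cos(lam)
            sin_sigma = math.hypot(cosU2 * sin_lam,
                                   cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
            if sin_sigma == 0:
                return 0.0                    # coincident points
            cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
            sigma = math.atan2(sin_sigma, cos_sigma)
            sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
            cos2_alpha = 1 - sin_alpha ** 2
            cos_2sigma_m = (cos_sigma - 2 * sinU1 * sinU2 / cos2_alpha) if cos2_alpha else 0.0
            C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
            lam_prev = lam
            lam = L + (1 - C) * f * sin_alpha * (
                sigma + C * sin_sigma * (
                    cos_2sigma_m + C * cos_sigma * (-1 + 2 * cos_2sigma_m ** 2)))
            if abs(lam - lam_prev) < tol:
                break

        u2 = cos2_alpha * (a ** 2 - b ** 2) / b ** 2
        A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
        B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
        delta_sigma = B * sin_sigma * (
            cos_2sigma_m + B / 4 * (
                cos_sigma * (-1 + 2 * cos_2sigma_m ** 2)
                - B / 6 * cos_2sigma_m * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sigma_m ** 2)))
        return b * A * (sigma - delta_sigma)

    # Illustrative city list; the real input was the geocoded UN list.
    cities = {"London": (51.5074, -0.1278),
              "Paris": (48.8566, 2.3522),
              "New York": (40.7128, -74.0060)}

    for (c1, p1), (c2, p2) in combinations(cities.items(), 2):
        print(f"{c1} to {c2}: {vincenty_distance(*p1, *p2) / 1000:.0f} km")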

On the front end, it's easy: add a couple of lines to a configuration file, make a couple of images, and it's done.

The difficult part is that the search is not optimized to be RAM-efficient; each city takes between two and ten gigabytes of RAM. You also need to have several hundred thousand tile images, which are commercially available, but not free.

If you've got a hefty server and the images, then it takes about a computer-day to compute the features and create the search index.

