Currently, I'm aggregating data from GitHub, since there's no way of getting those values live (to my knowledge). I'm staying within the API rate limit, so I'm limited in how much data I can retrieve per week. Yes, it would be awesome if I could do it for more cities.
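In case it helps, here is a minimal sketch of the kind of polling this implies, using the rate-limit headers the GitHub REST API returns. The endpoint, the `requests` usage, and the retry-once logic are just one way to do it, not a description of my actual setup:

```python
import time
import requests

API = "https://api.github.com/repos/{owner}/{name}"

def fetch_repo(owner, name, token=None):
    """Fetch one repository's metadata, pausing when the hourly quota runs out."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"

    resp = requests.get(API.format(owner=owner, name=name), headers=headers)

    # GitHub reports the remaining quota and its reset time in response headers.
    remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
    reset_at = int(resp.headers.get("X-RateLimit-Reset", str(int(time.time()))))
    if resp.status_code == 403 and remaining == 0:
        time.sleep(max(0, reset_at - time.time()) + 1)  # wait out the rate limit
        resp = requests.get(API.format(owner=owner, name=name), headers=headers)

    resp.raise_for_status()
    return resp.json()

# e.g. stars for a repo (illustrative):
# print(fetch_repo("torvalds", "linux")["stargazers_count"])
```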
This is great. Is it open source? I would love to know more about how it works. Where do you get the source data? What is required in order to add more cities?
They've said that they would rather do a few cities very well than do a bunch of cities poorly. Also, one of the biggest obstacles is finding, getting permission to use, cleaning up, and publishing different datasets. This is much more a social problem than a technological one, although it might improve once they have good reference implementations in their initial cities. Right now they have different datasets for each city because they take whatever data they can get.
The good news is that the journalism grant that funded them stipulated that they release their code at the end of the project, so when that happens, each city can build its own EveryBlock if it has the will.
Would love it if this could be built into a website that could analyze the distortions in any city automatically. There are probably a lot of issues in getting and normalizing the data, though.
There is OpenStreetMap (or Wikipedia, for that matter). A large enough army of volunteers could produce or tag a dataset that rivals Google's data, but it would be a lot of work.
This looks awesome. Really pretty, and definitely a service I would use if I lived in one of the available cities, especially if you can maintain the quality of the recommendations.
How did you get the seed data (i.e. the initial set of recommendations)?
I took the list of four-thousand-odd cities from the UN statistics report on cities over 100k and then ran it through the batch geocoding site http://www.findlatitudeandlongitude.com/ ... apart from that, it was just old-fashioned Excel to do the calculations.
How I built this: I pulled the population data from the UN statistics report on cities over 100k and then batch geocoded all of the cities (a fair bit of data cleansing was also required). I then used an Excel macro to calculate the distance between every city pair (using Vincenty's formula) and uploaded the results to a database. Happy to share more if anyone is curious.
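For the curious, here's a rough Python translation of that step: Vincenty's inverse formula on the WGS-84 ellipsoid, applied to every pair of geocoded cities. The original work was done in an Excel macro, so the function and the pairwise loop below are an illustrative sketch rather than the actual code, and the sample city coordinates are just placeholders:

```python
import math
from itertools import combinations

def vincenty_km(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Distance in km between two points via Vincenty's inverse formula (WGS-84)."""
    a = 6378137.0              # semi-major axis, metres
    f = 1 / 298.257223563      # flattening
    b = (1 - f) * a

    L = math.radians(lon2 - lon1)
    U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sinU1, cosU1, sinU2, cosU2 = math.sin(U1), math.cos(U1), math.sin(U2), math.cos(U2)

    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cosU2 * sin_lam, cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        if sin_sigma == 0:
            return 0.0                     # coincident points
        cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        cos_2sigma_m = cos_sigma - 2 * sinU1 * sinU2 / cos2_alpha if cos2_alpha else 0.0
        C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - C) * f * sin_alpha * (
            sigma + C * sin_sigma * (cos_2sigma_m + C * cos_sigma * (-1 + 2 * cos_2sigma_m ** 2)))
        if abs(lam - lam_prev) < tol:
            break

    u2 = cos2_alpha * (a ** 2 - b ** 2) / b ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    delta_sigma = B * sin_sigma * (cos_2sigma_m + B / 4 * (
        cos_sigma * (-1 + 2 * cos_2sigma_m ** 2) -
        B / 6 * cos_2sigma_m * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sigma_m ** 2)))
    return b * A * (sigma - delta_sigma) / 1000.0

# cities: {name: (lat, lon)} after geocoding -- contents are illustrative.
cities = {"London": (51.5074, -0.1278), "Paris": (48.8566, 2.3522)}
pairs = [(x, y, vincenty_km(*cities[x], *cities[y])) for x, y in combinations(cities, 2)]
print(pairs)   # rows ready to load into a database
```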
On the front end, it's easy: add a couple of lines to a configuration file, make a couple of images, and it's done.
The difficult part is that the search is not optimized to be RAM-efficient; each city takes between two and ten gigabytes of RAM. You also need several hundred thousand tile images, which are commercially available but not free.
If you've got a hefty server and the images, then it takes about a computer-day to compute the features and create the search index.
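For a sense of where that RAM goes, here's a toy brute-force index over per-tile feature vectors. The tile count, descriptor length, and distance metric are all assumptions for illustration, since the post doesn't say what the features actually are:

```python
import numpy as np

# Illustrative sizes only: the real feature dimensionality and tile count aren't given.
N_TILES = 300_000      # tile images for one city (assumed)
FEATURE_DIM = 2048     # descriptor length per tile (assumed)

# Hypothetical feature matrix kept entirely in RAM for brute-force search.
features = np.random.rand(N_TILES, FEATURE_DIM).astype(np.float32)
print(f"index size: {features.nbytes / 1e9:.1f} GB")   # ~2.5 GB at these sizes

def nearest_tiles(query, k=10):
    """Return indices of the k tiles whose descriptors are closest to the query."""
    dists = np.linalg.norm(features - query, axis=1)
    return np.argsort(dists)[:k]

# Example query with one synthetic descriptor.
print(nearest_tiles(np.random.rand(FEATURE_DIM).astype(np.float32)))
```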
I seed with existing data sets and then let crowdsourced edits from site users add to the data.