Gisgraphy – open-source geocoder based on open data (www.gisgraphy.com)
130.0 points by pabs3 | karma 43824 | avg karma 6.39 | 2021-04-26 03:23:29+00:00 | 32 comments




This looks severely out of date. Even with the best data sources, this kind of data unfortunately goes stale very quickly.

I recently moved from France to Belgium (Wallonie), and it's a nightmare when it comes to addresses… Even Google has trouble finding my home address, or it's off by a few hundred meters.

I can't stress enough that this should be one of the jobs of the state, either doing it or making it happen (e.g. in France we have https://fr.wikipedia.org/wiki/Base_adresse_nationale which, retrospectively, is probably why I never had any problem with addresses in France).


France is just superior to every other country in regards to address data access, and anything geography really.

Among the worst offenders you have the UK, where Royal Mail charges you for postcode data [0], and China, where private mapping is simply illegal [1] and you have to go through some state-approved cash register (e.g. NavInfo).

[0] https://www.poweredbypaf.com/licence-our-products/licence-ag...

[1] https://en.wikipedia.org/wiki/Restrictions_on_geographic_dat...


> France is just superior to every other country in regards to address data access, and anything geography really.

That is a very bold statement. Do you have a source for that?


The simple fact that France provides lots of APIs and open data, including the BAN (Base adresse nationale) API cited above, is a first reference, and it inspires confidence from a developer's perspective.

https://geo.api.gouv.fr/adresse https://www.data.gouv.fr/fr

But you're right, other countries may be doing just as well, in their own languages. It's hard to be aware of all the initiatives.


It's bold, and in many ways it's just my opinion, but after analyzing the ease of data access in most developed nations, I found the French data sources to be the only ones where I didn't feel I was missing some basic data in a significant way. The cadastre is lacking postcode-wise, but it can easily be complemented with the data here [2].

A sibling comment already mentioned BAN, but you can also check the cadastre [0], which is the main source of polygon data for France on OpenStreetMap. Under the general subdomain [1] you have access to all sorts of stuff; geography-wise, there are plenty of geotagged aerial photographs, both historical and current. You also have shapefile data for towns, regions, postal codes, you name it.

No shenanigans, all official, verified, and regularly updated.

[0] https://cadastre.data.gouv.fr/

[1] https://data.gouv.fr/

[2] https://datanova.legroupe.laposte.fr/explore/?sort=modified


You can also visualize the prices of all residential transactions in a given area. For example, this house was sold twice, in 2015 and in 2018, at €350,000, with 105 m² of built area and 150 m² in total.

I guess that would be groundbreaking in most countries.

https://app.dvf.etalab.gouv.fr


New Zealand has very high quality data. Every single address is in an open database that can be queried or downloaded. It also includes building outlines, roads, land parcel shapes, aerial photos, elevation data and more. OSM for NZ was originally copied from official data, and even before OSM the community used the open data to make high quality map files for GPS devices that were superior to the paid maps.

It would be nice if property prices were included in the central government data. That data is free and publicly available, but only from each municipality or from the many websites that republish it along with analytics.

https://data.linz.govt.nz/


Honestly, I hadn't looked into NZ; I focused mostly on North America, Europe and China. It looks pretty good, at the same level of coverage as France. That's a wonderful surprise! I'll keep that in mind if I ever go back to this subject.

I don't know who's best in this area today, but anecdotally I would've looked to Denmark as the leader in this area in the past.

Yeah, agree. There's a government agency taking care of this, and all kinds of GIS data are available to everybody: https://www.geodata-info.dk/srv/eng/catalog.search#/home

> In the worst offenders, you have UK, where Royal Mail charges you for postcode data

Actually, the UK is kind of a funny situation. A number of years ago the Ordnance Survey began publishing postcode data under a free (CC) licence. Royal Mail wasn't too happy, but Royal Mail's APIs were a bit simpler and more direct (the OS service was SPARQL-only), so I guess they didn't feel threatened enough to change their pricing. I worked for a company that had an account with Royal Mail at the time, and I was tasked with parsing OS data.

That was many years ago, and revisiting it now, the OS offerings have changed. They appear to have relicensed and split their offerings between a free db[0] without full addresses and a commercial db[1] with full addresses.

Either way, Royal Mail may have the established customer base but I wouldn't look to them exclusively to see who's "leading the way" in this space in the UK.

[0] https://ordnancesurvey.co.uk/business-government/products/op...

[1] https://ordnancesurvey.co.uk/business-government/products/ad...


In Flanders (for non-Belgians: the northern Dutch-speaking part of Belgium) there is CRAB ("Centraal Referentieadressenbestand" / "Central Reference Address File") which contains the lat./long. for every address in Flanders.

It can be accessed as a map layer at http://geopunt.be/ or as open data at https://overheid.vlaanderen.be/informatie-vlaanderen/product...


After moving to a newly-finished building, I realized most companies didn't have up-to-date databases and some forms prevented me from correcting the "mis-corrected" addresses. Missed a few deliveries during the first few months/years.

On the other hand, when I lived in a country where such databases were notoriously unreliable, this kind of automated processing didn't exist, so I never ran into that problem.

Sometimes, having something work "way too well" can lead people to believe it's perfect, and enforce it too strictly.


I'm surprised I hadn't seen this project before.

I had been dabbling with the idea of a highly accurate geocoding service based on open data, but gave up because data quality is all over the place, even for countries with excellent open data access (France being the gold standard in my book). I found that to get accurate geocoding I needed an ETL stack that pulled data from 2-5 sources per country, with country-specific code for basically every country, and that's just to store usable data. Even limiting it to the major economic powers felt too daunting: nobody wants a partial-coverage geocoder, and I don't want to be a one-man data janitor, which is what the whole thing felt like. And large swaths of countries are still totally uncoded for all sorts of reasons.

A quick search turns up some of the problems I ran into during my quest, plus some new ones.

1. Wrong postal codes for certain addresses, e.g. Place de Gaulle, Antibes, shows 06160 but it's 06600. It's probably taken from the cadastre or the nearest OSM tags, but it's wrong: different parts of the town have one or the other postal code. For France, the postal code relations/polygons from OSM are accurate; just don't take postcode info from anywhere else, really, unless for some reason your point falls within France but outside every postal code polygon.

2. Missing addresses. For France, I did some analysis on open data to find the most common street names, and found ~15k streets/squares/etc. named "de l'Église", because basically every town has at least one church and names a street or square after it. I'm not sure whether it's a problem with the data or with the parsing, but Gisgraphy only shows a few dozen.

3. While OpenAddresses is doing an amazing job, I only found its coordinate data to be accurate, and not always. Most of the fields in their own JSON are empty (postcode, district, region, and sometimes even city), but Gisgraphy takes them at face value, e.g. the first result for ????? ????????? is just that: it only shows the street name. The coordinates are also broken: 0.140642,34.6784995 falls either in Kenya or in Algeria (coordinate swapping is also rampant in many sources, which is another topic). That is the official data from Kharkiv, Ukraine, which probably geocodes locations in a different system than WGS84 for, you know, reasons. OA tends to focus on easy-to-process data, and I can't blame them, since the state of the whole geocoding world is an absolute clusterfuck.

4. Related to 3, OpenAddresses only has (had?) two towns from all of Ukraine (Kharkiv and Dnipro), with really terrible data. But OSM data is also fed into their model, so data overlaps, sometimes in different scripts or languages. So the street above, "????? ?????????", which is in Russian btw, appears again as "?????’???? ??????", which is Ukrainian. The whole l10n/i18n side of geocoding is another tough problem to solve, because letters addressed to Derevyanka Vulitsa will arrive as well. Transliteration data is spotty at best, but for many languages transliteration can be done in software, again depending on many factors.
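For the first point above, the rule "take the postcode from the OSM postal-code polygon that contains the point, and from nowhere else" boils down to a point-in-polygon lookup. This is a minimal sketch, assuming the polygons have already been extracted from OSM as simple rings of (lon, lat) vertices keyed by postcode; the function names are hypothetical, and real boundaries with holes or multipolygons would call for a proper GIS library.

```python
from typing import Dict, List, Optional, Tuple

Ring = List[Tuple[float, float]]  # polygon ring of (lon, lat) vertices, not explicitly closed


def point_in_polygon(lon: float, lat: float, ring: Ring) -> bool:
    """Even-odd ray casting: count edges crossed by a ray going east from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def postcode_for(lon: float, lat: float, polygons: Dict[str, Ring]) -> Optional[str]:
    """Return the postcode whose polygon contains the point, or None if none does."""
    for postcode, ring in polygons.items():
        if point_in_polygon(lon, lat, ring):
            return postcode
    return None
```

Returning None, instead of falling back to the nearest tag, keeps the "outside every polygon" case explicit so it can be handled separately.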
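The street-name analysis in the second point can be approximated by counting, for each name, how many communes have a street with that name. The sketch below assumes a semicolon-delimited, BAN-style CSV with `code_insee` (commune code) and `nom_voie` (street name) columns; treat those column names as an assumption to check against the actual export.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple


def most_common_street_names(csv_text: str, top: int = 10) -> List[Tuple[str, int]]:
    """Rank street names by the number of distinct communes that have them.

    Assumes ';'-delimited rows with 'code_insee' and 'nom_voie' columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    # One entry per (commune, street) pair, so house numbers don't inflate the counts
    pairs = {(row["code_insee"], row["nom_voie"].strip()) for row in reader}
    counts = Counter(name for _, name in pairs)
    return counts.most_common(top)
```

Grouping "Rue de l'Église" with "Place de l'Église", as in the ~15k figure above, would additionally require stripping the street type before counting.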
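For the third point, a rough bounding-box test at least separates swapped pairs from coordinates that are wrong in some other way, such as being expressed in a reference system other than WGS84. The box below is a deliberately loose, hypothetical approximation of Ukraine, not an official extent.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def classify_pair(first: float, second: float, expected: BBox) -> str:
    """Guess how a (first, second) coordinate pair relates to the region it should fall in."""
    if expected.contains(first, second):
        return "lat,lon"   # already in the expected order
    if expected.contains(second, first):
        return "lon,lat"   # only plausible once swapped
    return "outside"       # neither order fits: other datum/projection, or bad data


# Hypothetical, very rough box around Ukraine, for illustration only
UKRAINE_APPROX = BBox(min_lat=44.0, max_lat=52.5, min_lon=22.0, max_lon=40.5)
```

The Kharkiv pair quoted above comes out as "outside" in either order, which points to a different coordinate system rather than a simple swap.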
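On the transliteration issue in the fourth point: software transliteration is mostly a lookup table plus a few positional rules. This sketch loosely follows Ukraine's official national romanization table (word-initial є, ї, й, ю, я get a leading "y"; the soft sign and apostrophes are dropped), but it omits the special "зг" → "zgh" rule and only capitalizes the first Latin letter of a digraph, so treat it as an illustration rather than a compliant implementation.

```python
# Simplified Ukrainian -> Latin table, loosely based on the national romanization system
UK_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "h", "ґ": "g", "д": "d", "е": "e",
    "є": "ie", "ж": "zh", "з": "z", "и": "y", "і": "i", "ї": "i", "й": "i",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ь": "", "ю": "iu", "я": "ia",
}
# Different spellings used at the start of a word
WORD_INITIAL = {"є": "ye", "ї": "yi", "й": "y", "ю": "yu", "я": "ya"}
# ASCII apostrophe, right single quote and modifier-letter apostrophe are all dropped
APOSTROPHES = {"'", "\u2019", "\u02bc"}


def transliterate_uk(text: str) -> str:
    out = []
    at_word_start = True
    for ch in text:
        if ch in APOSTROPHES:
            continue  # dropped; we are still inside the same word
        low = ch.lower()
        if low in UK_TO_LATIN:
            if at_word_start and low in WORD_INITIAL:
                latin = WORD_INITIAL[low]
            else:
                latin = UK_TO_LATIN[low]
            if ch.isupper() and latin:
                latin = latin[0].upper() + latin[1:]
            out.append(latin)
            at_word_start = False
        else:
            out.append(ch)  # spaces, digits, Latin letters pass through unchanged
            at_word_start = not ch.isalpha()
    return "".join(out)
```

For example, it turns "Дерев’янка вулиця" into "Derevianka vulytsia", while the "Derevyanka Vulitsa" above follows a different, informal romanization; that is why matching has to tolerate several Latin spellings of the same street.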

That said, I'd love to work on a project like this as part of a team.


The guys at OpenCage Geocoder[0] are doing a great job using only open data. But they are a team with over a decade of experience parsing and deduplicating address data from dozens of countries.

That said, their jobs page doesn't have much for now, but you may want to keep an eye on it.

Disclaimer: the founder of the company and I are acquaintances, but my assessment is only based on the quality of their service. I've been using it in production for reverse geocoding for a few years now.

[0] https://opencagedata.com/


Hi,

Ed from OpenCage here, thanks for the kind words! It's true we don't have any open positions right now. But anyone who is into geo stuff in general, and geocoding specifically, can dive into OpenStreetMap and the open source libraries we (and many others) rely on, and contribute to them. Most notably Nominatim https://nominatim.org

Here's a podcast interview I did last summer with Sarah Hoffmann, the lead maintainer. https://thegeomob.com/podcast/episode-35


If you're ever hiring folks to work on open source geo stuff, please post them on FOSSjobs and the other aggregators linked from the wiki:

https://www.fossjobs.net/ https://github.com/fossjobs/fossjobs/wiki/resources


Hi! Shame you're not hiring.

While your API service is stellar, perhaps the best one built on open data, data quality is unfortunately always the limit, and one can only extract so much from it. I noticed a lot of sanitization when running some queries, but it didn't take long to find hiccups, mainly because I know the kinds of warts open geo data has.

But from my quick tests, there are two issues.

1. Spain. Like, the whole of it. OSM Spain is lacking a lot of house-number information. Even Madrid (city) alone is missing a lot, and some reasonably large towns are basically unnumbered, e.g. 40.309452, -3.730451: the whole of Getafe (180k people) lacks numbering.

All that information is available in the catastro, but names are often shortened, missing prepositions or lacking accents ("Calle de la Pasión" becomes "CL PASION" in the catastro), and it's a horrible mess overall, with no 100% foolproof way to cross-correlate the data. Still, here I don't see any cross-correlation happening at all.

2. Searching for "Place de Gaulle", because it's a solid no-strange-characters way to obtain an endless supply of points within France, shows a mysterious result at rank 10: 47.63341, -83.04979, in the middle of nowhere, ON, Canada, with no info whatsoever. Why would that rank so high, ahead of thousands of French counterparts? It doesn't appear in Nominatim either, nor in any of the datasets I've worked with, so I'm not sure where it comes from. Now I am curious: what is that?
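The catastro naming problem in the first point is usually attacked by normalizing both sides to a common key before matching. A minimal sketch, assuming that stripping accents, dropping a few articles and prepositions, and abbreviating the leading street type is enough for simple cases; apart from "Calle" → "CL", which comes from the example above, the abbreviation table and stopword list are hypothetical and incomplete.

```python
import re
import unicodedata

# Hypothetical, incomplete street-type abbreviations; only CALLE -> CL comes from the example
STREET_TYPES = {"CALLE": "CL", "AVENIDA": "AV", "PLAZA": "PZ", "PASEO": "PS"}
# Articles and prepositions that the abbreviated form tends to drop (assumed list)
STOPWORDS = {"DE", "DEL", "LA", "LAS", "EL", "LOS"}


def normalize_street(name: str) -> str:
    """Build a matching key: uppercase, no accents, no stopwords, abbreviated street type."""
    # NFKD splits "Ó" into "O" + a combining accent, which is then filtered out
    decomposed = unicodedata.normalize("NFKD", name.upper())
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.findall(r"[A-Z0-9]+", plain)
    key = []
    for i, tok in enumerate(tokens):
        if i == 0:
            tok = STREET_TYPES.get(tok, tok)  # abbreviate only the leading street type
        elif tok in STOPWORDS:
            continue
        key.append(tok)
    return " ".join(key)
```

Dropping words like "de" and "la" is also what makes such keys lossy: two different streets can collapse to the same key, which is why there is no foolproof way to cross-correlate the two sources.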


Hi,

thanks for the kind words.

You are right that a geocoder is only as good as the data available to it. Happily OSM is great for many use cases and getting better literally every day.

Whether it is good enough now for your use case will depend ... on your use case. Not everyone needs comprehensive house numbering in Getafe. Until the local OSM community decides to add those numbers, we do the best we can for the use cases where open data is a viable option today. As an aside, I am not sure the catastro qualifies as "open" data (even if it may be public), and even if it does, as you correctly note, someone familiar with all the local abbreviations and common usages will need to help with adding it. Local knowledge is key.

re: "Place de Gaulle", off the top of my head I couldn't say; I would have to take a detailed look. It's complicated, which is what makes geo fun.


Catastro doesn't have a clear license, but the spirit is certainly open[0]:

"It's worth noting the mass download service for cadastral information, available since 2011, which makes this information freely available to companies and individuals, including the possibility of reusing it."

Translation mine.

I'd love to hear about the origins of that mysterious Ontario spot!

[0] http://www.catastro.minhap.gob.es/esp/usos_utilidades.asp


Without having looked in detail, I would guess this is a situation where the lack of a license just causes confusion: now it's unclear what is allowed. Ideally they would be explicit about it. Anyway, if it is allowed, using that data is a decision for the local OSM community. If you live there or have a local connection, please get involved, either with that or with mapping generally. It's good fun. Here's a tutorial on how to add house numbers to OSM; really, it is pretty simple:

https://opencagedata.com/tutorials/adding-an-address-to-open...

re: Ontario, I will eventually have a look, but the list of projects is long and priority goes to bugs reported by customers.


I've had a pet project that involved a lot of geo things. I've tried heaps of ways to do it, but unfortunately nothing comes close to Google, especially for non-US addresses.

Are paid options any better? Like the one from ArcGIS? Or is Google still more accurate?


Depends on the scope of your project, but for address data the best I found is Loqate[0], which is IMO very expensive but also very good. I'd only target their API if I'm making/saving money in the process, which kinda limits its use for toy projects. Google's accuracy varies wildly, and in most regions I find OSM Nominatim better.

[0] https://www.loqate.com/


> in most regions I find OSM Nominatim better.

What regions? I work in GIS and have tried Nominatim in several projects, and it always ends up being so error prone we have to abandon it.


I briefly checked Nominatim against Google head to head a few months ago, and except for messes like Spain, most cities of Western Europe and North America have similar accuracy, with no clear winner. Google results that are wrong are very wrong, whereas Nominatim has weirder results more often.

But that's for cities; in anything rural or mountainous, Nominatim takes the lead. Google seems not to care about addresses that aren't residential or businesses. Arguably most geocoding uses are shipping things and business info, so most commercial efforts go that way.

Then some areas are just abandoned by Google. When I lived in Ukraine I tested some addresses, and about 1 in 10 requests didn't work or were just wrong, leaving aside the fact that every reverse geocoding request within Kiev city that I threw at it returned 01001 as the postal code. The only reliable information was business data, mostly because businesses themselves update it. The transliteration was software-based and often wrong. Russia and Central Asia are also a big mess, though big cities tend to have OK data. My feeling is that Google dropped the ball against Yandex and just doesn't care anymore; it's not their turf.

That might have changed for Ukraine since then, as there has been a push to drive away all Russian tech; maybe someone can shed some light on the subject.
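A head-to-head comparison like the one described above needs an error measure. A minimal sketch, assuming you have reference coordinates for each test address and each geocoder's answer (or None when the request didn't work); the 1 km threshold for "wrong" and the function names are arbitrary assumptions. Reporting both how often a geocoder is off and how far off its worst answer is separates "rarely wrong, but very wrong" from "weird more often".

```python
import math
import statistics
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lat, lon) in WGS84 degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; precise enough for error estimates


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def error_profile(reference: List[Point], answers: List[Optional[Point]],
                  wrong_km: float = 1.0) -> Dict[str, float]:
    """Summarize one geocoder's answers against reference coordinates."""
    errors = [haversine_km(ref, ans) for ref, ans in zip(reference, answers) if ans is not None]
    return {
        "no_result": sum(ans is None for ans in answers),  # requests that didn't work
        "wrong": sum(e > wrong_km for e in errors),         # how often it is off by > wrong_km
        "median_km": statistics.median(errors) if errors else math.nan,
        "worst_km": max(errors, default=math.nan),          # how wrong the worst answer is
    }
```

Running error_profile once per geocoder on the same reference list gives directly comparable numbers.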


I've found the here.com (https://developer.here.com/documentation/geocoder/dev_guide/...) geocoder to be close to the quality of Google's, with a nicer pricing model: a good amount of free geocodes before you start paying.

If this works it will be really useful. You know how much Google charges for reverse geocoding these days!

Literally an hour ago I was talking to my former professor and supervisor about a reverse geocoding solution that targets local consumers!

There is also Pelias which is a very good open source geocoder: https://pelias.io
