Show HN: Free database of geographic place names and geospatial data (github.com/delight-im)
108 points by marco1 on Sept 25, 2015 | 52 comments



"RESOURCES.md" ( https://github.com/delight-im/FreeGeoDB/blob/master/RESOURCE... ) includes OpenStreetMap, which is licenced under the Open Database Licence (ODbL). You cannot relicence it under the Apache licence. I've raised an issue ( https://github.com/delight-im/FreeGeoDB/issues/1 )


That's just a miscommunication on our part, sorry! "RESOURCES.md" only contains helpful links and generic resources for working with map/geo data. We thought this could be helpful.

The actual source, along with the complete build process, is described in "SOURCE.md". Sorry for the possible confusion!

Not sure if we should remove "RESOURCES.md" altogether or just rename it to make clear what these links actually are.


It seems like OpenStreetMap is not actually the source for the data - https://github.com/delight-im/FreeGeoDB/issues/1#issuecommen...


This is true! Thank you!


What if you make extracts or reprojections, or create a new dataset from ODbL-licensed resources (or from resources under several licences, depending on the number of datasets you combine)? Can you relicense it then?


This is known as a "Derivative Database", and so ODbL still applies. It's a classic share-alike/copyleft licence, with the wrinkle that it has to use copyright, database rights, and contract law, because database protection varies across jurisdictions. See http://opendatacommons.org/licenses/odbl/ .


> It's a classic share-alike/copyleft licence

A big difference from a CC-SA licence is that it is possible to make a produced work from OSM data, and all you have to do is attribute OSM; there is no share-alike requirement. In this way, the OSM ODbL is less restrictive than a standard share-alike licence.

The main example of that is making a map image. You can make a map image from 100% OSM data, and that image doesn't have to be share-alike.

If you create a database, as is the case with FreeGeoDB (and perhaps if you use OSM to geocode another database), then share-alike applies.


I don't understand. If I produce a map image with all the details I'm interested in, publish it, and then use OpenCV to extract the data from that image into a database, would I be free to license the resulting database as I wish?


I don't know. Ask a lawyer or judge. You can read the licence http://opendatacommons.org/licenses/odbl/1.0/ and look at the definitions of "Produced work" (a map) or "Derivative Database".

In practice, no one has really done that, nor is anyone likely to. Either you'd do something silly like making the map an SVG with all the data encoded as textual attributes, so that your "computer vision algorithm" is basically grep (in which case it would probably be seen as a Derivative Database), or you'd do real CV on a real image, which is very hard and gives bad results. It's sufficiently hard that no one's worried about it.
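
As an illustration of the "grep" point, a hypothetical sketch in Python (the markup and attribute names are made up, nothing OSM-specific): if the rendered SVG still carries the underlying data as textual attributes, pulling it back out is trivial.

    import re

    # Hypothetical SVG where the "map" is really just the database in disguise
    svg = '<circle cx="8.5417" cy="47.3769" data-name="Zürich"/>'

    # Extracting the data back out is a single regex, i.e. essentially grep
    for cx, cy, name in re.findall(
            r'cx="([^"]+)" cy="([^"]+)" data-name="([^"]+)"', svg):
        print(name, float(cx), float(cy))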

If you really don't like the OSM licence, you are free to go to another map data provider, pay them what they charge and agree to whatever they want, and get something else. If you want OSM, agree to OSM's terms.


Interesting question! You've got the replies already. Just wanted to clarify that this issue does not affect FreeGeoDB.


No. That's a derived database, and would be under ODbL.


The first three Norwegian cities I looked up were given the wrong name: Plesund (should be Ålesund), Bodi (Bodø) and Tdnsberg (Tønsberg). So I think it is safe to say that there are encoding issues in the data set.


There's definitely something wrong.

Looking at the SQL file, "Zürich" is listed as "Zdrich", Munich is only shown as "Munich" without the German name "München" anywhere, and Cologne is listed as "Cologne" with the incorrect "Koln" (instead of "Köln") as the alternative name.

Doesn't look like a reliable data source.


French names have the same problem, with "La Réunion" being shown as "La Rcunion" and "Rhône-Alpes" as "RhAne-Alpes".

It's strange; I can't really understand how these accents could get mangled into these particular single letters.


Almost looks like OCR errors, especially Rhône -> RhAne.


I don't think OCR errors are the reason. It's too consistent for that.

I think I have seen such character set corruption before, but I can't remember exactly how you end up with these particular corruptions.
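
One way to hunt for the culprit is to brute-force encode/decode pairs and check whether any of them reproduces the corruption. A minimal sketch (the codec list is just a guess):

    # Try every encode/decode combination and report any that turns the
    # correct name into the corrupted one observed in the data set.
    CANDIDATES = ["utf-8", "latin-1", "cp1252", "cp437", "cp850", "mac-roman"]

    def find_mangling(correct, corrupted):
        for enc in CANDIDATES:
            for dec in CANDIDATES:
                try:
                    if correct.encode(enc).decode(dec, errors="replace") == corrupted:
                        print(f"{enc} bytes read as {dec}")
                except UnicodeEncodeError:
                    pass  # character not representable in this codec

    find_mangling("Zürich", "Zdrich")

If nothing matches, a plain codec mix-up is probably not the whole story, which would fit the odd single-letter substitutions.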


They're not OCR errors, unfortunately. Sorry for these problems! We have to investigate!


It's 2015 and we still have UTF-8 problems... sigh


Or this is just a sloppy translation/compilation from better sources.


Sorry for the inaccuracies!

We'd love to fix all those issues -- maybe with some help from the community :)

The goal is definitely to turn this into a more and more reliable data source every day.


This is definitely a major problem that has to be fixed! Thanks for pointing this out again.

The complete build process is described in "SOURCE.md". That's where we got these encoding issues from -- right from the source material. Maybe we didn't open or parse the source material correctly. But we were not able to obtain the data without the encoding errors.
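
One thing we still want to try is forcing an explicit encoding when reading the source attribute tables. A sketch, assuming the pyshp library (v2+) and that the DBF files are Windows-1252; the file and field names are illustrative:

    import shapefile  # pip install pyshp

    # Natural Earth attribute tables are DBF files; reading them with an
    # explicit encoding may avoid the mangled umlauts and accents.
    sf = shapefile.Reader("ne_50m_populated_places", encoding="cp1252")
    for record in sf.records()[:5]:
        print(record["NAME"])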

We'd love to fix all these issues :)


Instead of a Markdown file, you should consider creating a set of scripts that fetch, cache and process the upstream data into the desired formats. As someone interested in using this data, I really need to be able to easily run the entire build process myself.
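
Something as small as this would already help -- a hypothetical sketch in Python (the URL and paths are illustrative, not the project's actual pipeline):

    import os
    import urllib.request
    import zipfile

    URL = "http://naciscdn.org/naturalearth/50m/cultural/ne_50m_populated_places.zip"
    CACHE = "cache/ne_50m_populated_places.zip"

    def fetch(url, dest):
        """Download the upstream archive once and cache it locally."""
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        if not os.path.exists(dest):
            urllib.request.urlretrieve(url, dest)
        return dest

    with zipfile.ZipFile(fetch(URL, CACHE)) as z:
        z.extractall("build/")  # the CSV/JSON/SQL conversion steps would follow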


Of course, you're right, thanks! This has been noted as one of the required next steps.


So this is basically Natural Earth repacked as JSON, CSV and SQL?

It's a nice start. I like it. Would it be possible to add the program/instructions you used for converting the data from the original shapefiles?

I would love to see an additional SQL version for PostGIS and the like, so that I can use its large number of spatial functions to work with this data.


It seems to be the Natural Earth dataset -- I think the 1:50m resolution, but with a slightly reduced set of properties (columns). I haven't tried it, but from the looks of it, the SQL files should work with PostGIS as they are.


From what I've seen, the data type of the columns is always varchar and not geometry.

Not a big issue; it would merely be more convenient the other way ;)
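
The conversion is straightforward, though. A sketch, assuming psycopg2 and PostGIS, with an illustrative "places" table whose "geometry" varchar column holds WKT points:

    import psycopg2

    conn = psycopg2.connect("dbname=freegeodb")
    with conn, conn.cursor() as cur:
        # Add a real geometry column, parse the WKT into it, and index it
        cur.execute("ALTER TABLE places ADD COLUMN geom geometry(Point, 4326);")
        cur.execute("UPDATE places SET geom = ST_GeomFromText(geometry, 4326);")
        cur.execute("CREATE INDEX places_geom_idx ON places USING GIST (geom);")
    conn.close()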


We should definitely fix this in the future! Thank you :)


You're right! Please see the "SOURCE.md" file for all necessary information. Sorry for not linking this file in a prominent place!


Actually, according to the resource file it isn't Natural Earth after all. The column names almost fit the NE dataset though...


The SOURCE.md file only lists Natural Earth.


Looks like they include instructions on how to build in SOURCE.md (https://github.com/delight-im/FreeGeoDB/blob/master/SOURCE.m...) - would be great to have that linked in the README though.


Thanks! We'll have to add a link to the README, you're right!


Exactly! Thanks. Although to some it may be "just Natural Earth", we thought it could be useful. JSON, CSV and SQL should be more helpful than the original shapefiles when building (web and mobile) applications.

We'd love to extend the data to make it ready for PostGIS etc. That will be useful!


Unfortunately, while it is in JSON, it is not GeoJSON, nor anything that appears to be standard.

At least the points are in WKT, so they should be convertible into something useful.
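
Converting the points is indeed easy. A sketch, assuming the shapely library; the row layout here is illustrative:

    import json
    from shapely import wkt
    from shapely.geometry import mapping

    rows = [{"name": "Zürich", "geometry": "POINT(8.5417 47.3769)"}]

    # Each WKT point becomes the geometry of a standard GeoJSON Feature
    features = [{
        "type": "Feature",
        "properties": {"name": row["name"]},
        "geometry": mapping(wkt.loads(row["geometry"])),
    } for row in rows]

    print(json.dumps({"type": "FeatureCollection", "features": features}))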


Thank you! We'd love to hear and discuss what the best way is to represent and store the data. We're open to switching between GeoJSON, WKT, etc.


Apache seems a strange license for data. Why not CC0, like Wikidata, for instance?


Thank you!

We know that Apache, MIT, etc. are usually for code and the Creative Commons licenses are for content (e.g. writing, images).

We thought that the data sets (CSV, JSON, SQL) fall somewhere in the intersection of code and content, so the Apache license would be okay.

Is there anything specifically wrong with the Apache license for this type of project? We couldn't find any tangible downsides but we'd love to hear about any pros and cons.


Data sets are simply that - data - content.

Your code would be the scripts you write to process it, and your content would be the data, in my opinion.


Congrats, nice job! Really interesting and helpful. I'll try to use it. Nowadays I am using Sollo Atlas (which is used by http://www.findmyninja.io):

http://atlas.sollo.io/atlas/api

It's possible to easily navigate through resources (place data), and a helpful feature is its support for synonyms.


This would have been useful if the railroads and roads actually had names attached to them.


We didn't get those from our sources, but we'd love to add them!


Data like this has been available for free (and updated by the community!) for years. See http://www.geonames.org/

Numerous websites use the GeoNames dataset for their work. (I also participated in this madness: http://www.wemakemaps.com/)


I've not seen http://www.wemakemaps.com/ before; looks like a cool site. But what's with consistently referring to OpenStreetMap as "OpenMap"? And how about adding proper attribution to the OSM maps?


There are free airport lists that are much more complete. Also, two columns (lon, lat) for points would be better, as it's much easier to transform "lon, lat" into "POINT(lon lat)" than the other way around.
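
To illustrate, a minimal sketch: the lon/lat-to-WKT direction is plain string formatting, while the reverse needs parsing.

    import re

    lon, lat = 8.5417, 47.3769
    wkt = f"POINT({lon} {lat})"  # the easy direction: string formatting

    # The other direction needs a parser (here a regex good enough for points)
    m = re.match(r"POINT\(([-\d.]+) ([-\d.]+)\)", wkt)
    lon2, lat2 = map(float, m.groups())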

And, yeah, although I usually don't care much about licenses, this one makes me feel uneasy.


Thank you!

What makes you feel uneasy about the license? We'd love to fix this!

Regarding the airport lists, we're definitely open to merging in more complete and accurate data.


Looks interesting, how would you compare it to GeoNames?


Admittedly, it's similar! But we wanted to include more and different data, and just offer an alternative in general.

Three things we definitely wanted were (1) complete boundaries, (2) easy programmatic access and (3) efficient collaboration.
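
As a taste of what "easy programmatic access" could mean in practice -- a hypothetical snippet; the file and column names are illustrative, not the repository's actual layout:

    import csv

    with open("data/cities.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            print(row["name"], row["latitude"], row["longitude"])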


Cool! Do you have any plans to track historical changes of geospatial areas? What reference frame are you using for labeling areas (e.g. what America considers countries vs. what China considers)?


Thanks! Tracking historical changes is a great idea, but doesn't this require perfect data from day one? Otherwise, how would you differentiate between historical changes and mere factual/technical corrections? Thus, right now, there are no plans to track this. But we're open to ideas and contributions on how to do this.

Regarding the reference frame, that's definitely an issue. Right now, it follows the source's guidelines (see "SOURCE.md"), which means "boundaries of sovereign states according to de facto status. We show who actually controls the situation on the ground. For instance, we show China and Taiwan as two separate states. But we show Palestine as part of Israel."


How will this be kept up-to-date?


Yeah, a quick perusal (VERY quick) revealed no provenance, maintenance, or currency information. There's a lot more to data than a snapshot of some tables.


Thanks for reminding us of the importance of efficient updates!

As outlined in "SOURCE.md", one part of the update process should be to continuously compare against the latest Natural Earth data.
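
That comparison could start out as simple as diffing two snapshots keyed by a stable ID column -- a hypothetical sketch; the file names and the "ne_id" key are illustrative:

    import csv

    def load(path, key="ne_id"):
        with open(path, newline="", encoding="utf-8") as f:
            return {row[key]: row for row in csv.DictReader(f)}

    old = load("cache/places_previous.csv")
    new = load("cache/places_latest.csv")

    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    changed = [k for k in old.keys() & new.keys() if old[k] != new[k]]
    print(len(added), "added,", len(removed), "removed,", len(changed), "changed")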

Apart from that, we're totally open to contributions and ideas from the community!



