Show HN: Free database of geographic place names and geospatial data (github.com/delight-im)
108 points by marco1 on Sept 25, 2015 | 52 comments



"RESOURCES.md" ( https://github.com/delight-im/FreeGeoDB/blob/master/RESOURCE... ) includes OpenStreetMap, which is licenced under the Open Database Licence (ODbL). You cannot relicence it under the Apache licence. I've raised an issue ( https://github.com/delight-im/FreeGeoDB/issues/1 )


That's just a miscommunication on our part, sorry! "RESOURCES.md" only contains helpful links and generic resources for working with map/geo data. We thought this could be helpful.

The actual source, along with the complete build process, is described in "SOURCE.md". Sorry for the possible confusion!

Not sure if we should remove "RESOURCES.md" altogether or just rename it to make clear what these links actually are.


It seems like OpenStreetMap is not actually the source for the data - https://github.com/delight-im/FreeGeoDB/issues/1#issuecommen...


This is true! Thank you!


What if you make extracts or reprojections, or create a new dataset from ODbL-licensed resources (or from resources under several licences, depending on the number of datasets you combine)? Can you relicense it then?


This is known as a "Derivative Database", and so ODbL still applies. It's a classic share-alike/copyleft licence, with the wrinkle that it has to use copyright, database rights, and contract law, because database protection varies across jurisdictions. See http://opendatacommons.org/licenses/odbl/ .


> It's a classic share-alike/copyleft licence

A big difference from a CC-SA licence is that it is possible to make a produced work from OSM data, and all you have to do is attribute OSM; there is no share-alike requirement. In this way, the OSM ODbL is less restrictive than a standard share-alike licence.

The main example of that is making a map image. You can make a map image from 100% OSM data, and that image doesn't have to be share-alike.

If you create a database, as is the case with FreeGeoDB (and perhaps if you use OSM to geocode another database), then share-alike applies.


I don't understand. If I produce a map image with all the details I'm interested in, publish it, and then use OpenCV to extract the data from that image into a database, would I be free to license the resulting database as I wish?


I don't know. Ask a lawyer or judge. You can read the licence http://opendatacommons.org/licenses/odbl/1.0/ and look at the definitions of "Produced work" (a map) or "Derivative Database".

In practice, no one has really done that, nor is anyone likely to. Either you'd do something silly like making the map an SVG with all the data encoded as textual attributes, so that your "computer vision algorithm" is basically grep (in which case it would probably be seen as a Derivative Database), or you'd do real CV on a real image, which is very hard and gives bad results. It's sufficiently hard that no one's worried about it.
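
As an illustration of the "grep" point, a hypothetical sketch in Python (the markup and attribute names are made up, nothing OSM-specific): if the rendered SVG still carries the underlying data as textual attributes, pulling it back out is trivial.

    import re

    # Hypothetical SVG where the "map" is really just the database in disguise
    svg = '<circle cx="8.5417" cy="47.3769" data-name="Zürich"/>'

    # Extracting the data back out is a single regex, i.e. essentially grep
    for cx, cy, name in re.findall(
            r'cx="([^"]+)" cy="([^"]+)" data-name="([^"]+)"', svg):
        print(name, float(cx), float(cy))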

If you really don't like the OSM licence, you are free to go to another map data provider, pay them what they charge and agree to whatever they want, and get something else. If you want OSM, agree to OSM's terms.


Interesting question! You've got the replies already. Just wanted to clarify that this issue does not affect FreeGeoDB.


No. That's a derived database, and would be under ODbL.


The first three Norwegian cities I looked up were given the wrong name: Plesund (should be Ålesund), Bodi (Bodø) and Tdnsberg (Tønsberg). So I think it is safe to say that there are encoding issues in the data set.


There's definitely something wrong.

Looking at the SQL file, "Zürich" is listed as "Zdrich", Munich is only shown as "Munich" without the German name "München" anywhere, and Cologne is listed as "Cologne" with the incorrect "Koln" (instead of "Köln") as the alternative name.

Doesn't look like a reliable data source.


French names have the same problem, with "La Réunion" being shown as "La Rcunion" and "Rhône-Alpes" as "RhAne-Alpes".

It's strange; I can't really understand how these accents could get mangled into these particular single letters.


Almost looks like OCR errors, especially Rhône -> RhAne.


I don't think OCR errors are the reason. It's too consistent for that.

I think I have seen such character set corruption before, but I can't remember exactly how you end up with these particular corruptions.
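
One way to hunt for the culprit is to brute-force encode/decode pairs and check whether any of them reproduces the corruption. A minimal sketch (the codec list is just a guess):

    # Try every encode/decode combination and report any that turns the
    # correct name into the corrupted one observed in the data set.
    CANDIDATES = ["utf-8", "latin-1", "cp1252", "cp437", "cp850", "mac-roman"]

    def find_mangling(correct, corrupted):
        for enc in CANDIDATES:
            for dec in CANDIDATES:
                try:
                    if correct.encode(enc).decode(dec, errors="replace") == corrupted:
                        print(f"{enc} bytes read as {dec}")
                except UnicodeEncodeError:
                    pass  # character not representable in this codec

    find_mangling("Zürich", "Zdrich")

If nothing matches, a plain codec mix-up is probably not the whole story, which would fit the odd single-letter substitutions.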


They're not OCR errors, unfortunately. Sorry for these problems! We have to investigate!


It's 2015 and we still have UTF-8 problems... sigh


Or this is just a sloppy translation/compilation from better sources.


Sorry for the inaccuracies!

We'd love to fix all those issues -- maybe with some help from the community :)

The goal is definitely to turn this into a more and more reliable data source every day.


This is definitely a major problem that has to be fixed! Thanks for pointing this out again.

The complete build process is described in "SOURCE.md". That's where we got these encoding issues from -- right from the source material. Maybe we didn't open or parse the source material correctly. But we were not able to obtain the data without the encoding errors.
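
One thing we still want to try is forcing an explicit encoding when reading the source attribute tables. A sketch, assuming the pyshp library (v2+) and that the DBF files are Windows-1252; the file and field names are illustrative:

    import shapefile  # pip install pyshp

    # Natural Earth attribute tables are DBF files; reading them with an
    # explicit encoding may avoid the mangled umlauts and accents.
    sf = shapefile.Reader("ne_50m_populated_places", encoding="cp1252")
    for record in sf.records()[:5]:
        print(record["NAME"])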

We'd love to fix all these issues :)


Instead of a Markdown file, you should consider creating a set of scripts that fetch, cache and process the upstream data into the desired formats. As someone interested in using this data, I really need to be able to easily run the entire build process myself.
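
Something as small as this would already help -- a hypothetical sketch in Python (the URL and paths are illustrative, not the project's actual pipeline):

    import os
    import urllib.request
    import zipfile

    URL = "http://naciscdn.org/naturalearth/50m/cultural/ne_50m_populated_places.zip"
    CACHE = "cache/ne_50m_populated_places.zip"

    def fetch(url, dest):
        """Download the upstream archive once and cache it locally."""
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        if not os.path.exists(dest):
            urllib.request.urlretrieve(url, dest)
        return dest

    with zipfile.ZipFile(fetch(URL, CACHE)) as z:
        z.extractall("build/")  # the CSV/JSON/SQL conversion steps would follow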


Of course, you're right, thanks! This has been noted as one of the required next steps.


So this is basically Natural Earth repacked as JSON, CSV and SQL?

It's a nice start. I like it. Would it be possible to add the program/instructions you used for converting the data from the original shapefiles?

I would love to see an additional SQL version for PostGIS and the like, so that I can use its large number of spatial functions to work with this data.


It seems to be the Natural Earth dataset -- I think the 1:50m resolution, but with a slightly reduced set of properties (columns). I haven't tried it, but from the looks of it, the SQL files should work with PostGIS as they are.


From what I've seen, the data type of the columns is always varchar and not geometry.

Not a big issue; it would merely be more convenient the other way ;)
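
The conversion is straightforward, though. A sketch, assuming psycopg2 and PostGIS, with an illustrative "places" table whose "geometry" varchar column holds WKT points:

    import psycopg2

    conn = psycopg2.connect("dbname=freegeodb")
    with conn, conn.cursor() as cur:
        # Add a real geometry column, parse the WKT into it, and index it
        cur.execute("ALTER TABLE places ADD COLUMN geom geometry(Point, 4326);")
        cur.execute("UPDATE places SET geom = ST_GeomFromText(geometry, 4326);")
        cur.execute("CREATE INDEX places_geom_idx ON places USING GIST (geom);")
    conn.close()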


We should definitely fix this in the future! Thank you :)


You're right! Please see the "SOURCE.md" file for all necessary information. Sorry for not linking this file in a prominent place!


Actually, according to the resource file it isn't Natural Earth after all. The column names almost fit the NE dataset though...


The SOURCE.md file only lists Natural Earth.


Looks like they include instructions on how to build in SOURCE.md (https://github.com/delight-im/FreeGeoDB/blob/master/SOURCE.m...) - would be great to have that linked in the README though.


Thanks! We'll have to add a link to the README, you're right!


Exactly! Thanks. Although to some it may be "just Natural Earth", we thought it could be useful. JSON, CSV and SQL should be more helpful than the original shapefiles when building (web and mobile) applications.

We'd love to extend the data to make it ready for PostGIS etc. That will be useful!


Unfortunately, while it is in JSON, it is not GeoJSON, nor anything that appears to be standard.

At least the points are in WKT, so they should be convertible into something useful.
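
Converting the points is indeed easy. A sketch, assuming the shapely library; the row layout here is illustrative:

    import json
    from shapely import wkt
    from shapely.geometry import mapping

    rows = [{"name": "Zürich", "geometry": "POINT(8.5417 47.3769)"}]

    # Each WKT point becomes the geometry of a standard GeoJSON Feature
    features = [{
        "type": "Feature",
        "properties": {"name": row["name"]},
        "geometry": mapping(wkt.loads(row["geometry"])),
    } for row in rows]

    print(json.dumps({"type": "FeatureCollection", "features": features}))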


Thank you! We'd love to hear and discuss what the best way is to represent and store the data. We're open to switching between GeoJSON, WKT, etc.


Apache seems a strange license for data. Why not CC0, like Wikidata, for instance?


Thank you!

We know that Apache, MIT, etc. are usually for code and the Creative Commons licenses are for content (e.g. writing, images).

We thought that the data sets (CSV, JSON, SQL) fall somewhere in the intersection of code and content, so the Apache license would be okay.

Is there anything specifically wrong with the Apache license for this type of project? We couldn't find any tangible downsides but we'd love to hear about any pros and cons.


Data sets are simply that - data - content.

Your code would be the scripts you write to process it, and your content would be the data, in my opinion.


Congrats, nice job! Really interesting and helpful. I'll try to use it. Nowadays I am using Sollo Atlas (which is used by http://www.findmyninja.io):

http://atlas.sollo.io/atlas/api

It's possible to easily navigate through resources (place data), and a helpful feature is its support for synonyms.


This would have been useful if the railroads and roads actually had names attached to them.


We didn't get those from our sources, but we'd love to add them!


Data like this has been available for free (and updated by the community!) for years. See http://www.geonames.org/

Numerous websites use the GeoNames dataset for their work. (I also participated in this madness: http://www.wemakemaps.com/)


I've not seen http://www.wemakemaps.com/ before; looks like a cool site. But what's with consistently referring to OpenStreetMap as "OpenMap"? And how about adding proper attribution to the OSM maps?


There are free airport lists that are much more complete. Also, two columns (lon, lat) for points would be better, as it's much easier to transform "lon, lat" into "POINT(lon lat)" than the other way around.
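
To illustrate, a minimal sketch: the lon/lat-to-WKT direction is plain string formatting, while the reverse needs parsing.

    import re

    lon, lat = 8.5417, 47.3769
    wkt = f"POINT({lon} {lat})"  # the easy direction: string formatting

    # The other direction needs a parser (here a regex good enough for points)
    m = re.match(r"POINT\(([-\d.]+) ([-\d.]+)\)", wkt)
    lon2, lat2 = map(float, m.groups())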

And, yeah, although I usually don't care much about licenses, this one makes me feel uneasy.


Thank you!

What makes you feel uneasy about the license? We'd love to fix this!

Regarding the airport lists, we're definitely open to merging in more complete and accurate data.


Looks interesting, how would you compare it to GeoNames?


Admittedly, it's similar! But we wanted to include more and different data, and just offer an alternative in general.

Three things we definitely wanted were (1) complete boundaries, (2) easy programmatic access and (3) efficient collaboration.
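
As a taste of what "easy programmatic access" could mean in practice -- a hypothetical snippet; the file and column names are illustrative, not the repository's actual layout:

    import csv

    with open("data/cities.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            print(row["name"], row["latitude"], row["longitude"])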


Cool! Do you have any plans to track historical changes of geospatial areas? What reference frame are you using for labeling areas (e.g. what America considers countries vs. what China considers)?


Thanks! Tracking historical changes is a great idea, but doesn't this require perfect data from day one? Otherwise, how would you differentiate between historical changes and mere factual/technical corrections? Thus, right now, there are no plans to track this. But we're open to ideas and contributions on how to do this.

Regarding the reference frame, that's definitely an issue. Right now, it follows the source's guidelines (see "SOURCE.md"), which means "boundaries of sovereign states according to de facto status. We show who actually controls the situation on the ground. For instance, we show China and Taiwan as two separate states. But we show Palestine as part of Israel."


How will this be kept up-to-date?


Yeah, a quick perusal (VERY quick) revealed no provenance, maintenance, or currency information. There's a lot more to data than a snapshot of some tables.


Thanks for reminding us of the importance of efficient updates!

As outlined in "SOURCE.md", one part of the update process should be to continuously compare against the latest Natural Earth data.
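
That comparison could start out as simple as diffing two snapshots keyed by a stable ID column -- a hypothetical sketch; the file names and the "ne_id" key are illustrative:

    import csv

    def load(path, key="ne_id"):
        with open(path, newline="", encoding="utf-8") as f:
            return {row[key]: row for row in csv.DictReader(f)}

    old = load("cache/places_previous.csv")
    new = load("cache/places_latest.csv")

    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    changed = [k for k in old.keys() & new.keys() if old[k] != new[k]]
    print(len(added), "added,", len(removed), "removed,", len(changed), "changed")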

Apart from that, we're totally open to contributions and ideas from the community!



