Getting Started on Geospatial Analysis with Python, GeoJSON and GeoPandas (twilio.com)
248 points by shakes on Sept 18, 2017 | 41 comments



Using geojson.io to visualise the GeoJSON seems like the wrong tool for the job; just use folium, it works perfectly in Jupyter Notebooks: https://github.com/python-visualization/folium


Thank you, just what I was looking for.


I would also be sure to be aware of GDAL and OGR. Both are quite useful when you're doing anything GIS/Remote Sensing.

http://www.gdal.org/


If you're working with python it's also worth looking at rasterio and fiona which provide some nice wrappers around a subset of gdal and ogr.


Shapely too!


Why isn't GRASS GIS more widely used in geospatial analysis (or is it just my impression that it isn't)? I tried several tools, but GRASS usually proves to be the fastest and most flexible one.


Because GRASS doesn't play nice with others. Calling GRASS from other apps isn't as easy as one might like, and GRASS has a very GRASS-specific way it wants to do things. Which is a shame, because most of GRASS's algorithms are really best in class.


I agree with you. That being said, it's a go-to tool for me when I'm cleaning road networks and for the subsequent network analysis. And that's a hugely underwhelming use-case, I'm touching a fraction of what GRASS does.

Very helpful community as well.


If someone made a friendly Python GRASS wrapper that let you use GRASS functions in a pythonic manner with numpy/rasterio/fiona/shapely data, I think GRASS usage would increase massively.


Yes, this is it for me too. Whenever I have a huge vector dataset I need to clean up, I use GRASS simply because it is topological and can re-interpret, fix, and manipulate vectors efficiently and validly, as a single layer entity. Even though PostGIS added topology, I have not been able to figure out how to do the same manipulations, or achieve the same speed.


A GUI from the 90s, documentation often indecipherable if you are not a subject expert, awkward UX.

I like using it via QGIS, but that's as close as I can make myself go to it. It saddens me because it seems very powerful and featureful.


The trend in GIS these days is away from writing code and towards configurable apps, only dropping into code where necessary. Why not just use QGIS instead? Far simpler.


Perhaps our experiences are different, I've noticed the opposite. There will always be a strong role for GIS applications, but I rarely see geospatial problem solving these days that doesn't require some amount of code.

I still do 50%+ of my work in QGIS and find the embedded python interpreter to be essential. There are very few projects where I don't open it up, or otherwise have organized/cleaned the data beforehand (often with python, I'm a one trick pony).

In 2017 so far only one project has not required some coding, and that was a print map for a small transit agency. All the data could be easily hand-digitized.


The issue is the same as with code everywhere: unless you start from a position of discipline you can easily end up in trouble - code that was intended to be throwaway ends up in production, code gets shared around and modified hence duplication and difficulty in maintenance.. I could go on. Suffice to say in my experience the majority of organisations that have a GIS requirement of any significance usually seek to control code tightly, and would rather spend money on a COTS solution that can be operated by a relatively cheap GIS analyst, rather than hiring in more expensive dev skills - the dev work can be performed by dedicated devs or contracted out as necessary.

Similarly the demand for apps with a geospatial element has exploded and the last thing a GIS manager wants to do is to have heaps of custom code written for an app that may only be used for a month, for example. It's in this situation that wizard driven app creation is valuable, with perhaps minimal code for special requirements.

I don't like it because you see deskilling of GIS analysts at one end, and deskilling of GIS devs at the other - but this seems to be an emerging trend. I'll certainly acknowledge though that the profusion of free tools and data are shaking things up, so I'd be happy to see it continue. ARC/INFO was command line, lest we forget.


By the way, I've found myself clicking too much on stuff with QGIS, and often having to repeat myself later if the input data changes.

Is there a way to save the clicks as macros, or perhaps at least get an idea of the underlying commands behind the clicks (load vector, update extents, change colors, intersect geometries, etc.)?

I know python so I would love to have a CLI to QGIS, but can't find anything on this.


Read up on the "graphical modeler" for QGIS and see if that's what you're looking for.


Absolutely. The longer I can avoid opening a GUI the better.


For the case presented, using QGIS would be a far more obvious solution. However, another trend in GIS is much larger scale analysis. Someone will do some analysis in QGIS by hand for a small area, then someone will go "wait a minute. We have the necessary data for the whole city/county/country. Why not run the same analysis over all that data?" Then knowing how to reproduce the whole analysis in code becomes vital.


Ish - in an enterprise of any reasonable size you may likely choose to divide your work by employee skills, hence perhaps most of your GIS analysts need not get their hands dirty with code (nor want to).


I've tried teaching QGIS. As a long-time developer/OSS user, I greatly appreciate it as a community-driven piece of software. But I also recognize that such a paradigm comes with tradeoffs, including interfaces that feel welded on ad-hoc. I can get through it because I have decent experience with GIS concepts (through coding), but I understand why others can greatly struggle. Even setting it up (on OSX) can be a non-novice challenge.

I've moved to teaching CartoDB, because it has many of the features and it is based off of PostgreSQL and PostGIS. I already teach SQL so PostgreSQL is straightforward and doing GIS with SQL is a natural evolution. Carto is commercial and has its own opaqueness, so I might go back to QGIS.

But for folks who already know Python, I think being able to do GIS with code is hugely advantageous, without being overly cumbersome. R with ggplot2 is actually quite easy and graceful.


I think the 'tradeoffs of the paradigm' are largely inherent in the concept of GIS software that's useful for a ton of different use cases. Digging into just symbology functions showcases this - it's a broad toolset that fits most needs, but I still run into micro-cases where I can't quite get something done (I still struggle with rendering polylines which have ends that touch other polylines where colors/width don't match - ie, an issue with rendering priority. The next comment here is probably going to be pointing me towards a perfect solution).

I assume the issues you find teaching QGIS are also found in teaching Arc, but my last experience with Arc was version 9.x so I'm a little out of the loop.


I think symbology is still not properly solved - even Esri's latest tools (which as far as I'm aware are generally considered to be the best available as regards symbology) are often used only to generate vectors to be imported into something like Adobe Illustrator.


What does esri do that modern qgis does not?


Basically you have increasingly sophisticated symbology and labelling options depending on your requirements. At the top end you have functionality that effectively allows you to recreate a national mapping agency within ArcGIS Desktop (see Production Mapping for more details). If you are familiar with Ordnance Survey mapping this is the kind of quality I am talking about. It's a serious engineering task to put together the toolset and workflows to get anywhere near this. Nothing open source comes close (awaiting OGC true believers..).

That is to say, it's still not perfect, but as far as I'm aware it's near the best COTS option available.


As others have said as well, I have seen the opposite of this. The real advancements have all been in coding up custom solutions to work with larger data in new ways. Things like ArcGIS / QGIS have just been used for visualization after the fact.


And yet Esri are still by far the largest GIS company in the world, so they must be doing something right.


Great tutorial, and thank goodness it's in 3.x. About a year ago it was difficult to find many 3.x examples of geospatial analysis. It felt like there was always some dependency that was not quite 3.x compatible, so it made sense that a lot of folks would just stick to 2.x just to get started.


You might also be interested in GeoNotebook, a Jupyter Notebook extension for geospatial analysis

https://github.com/OpenGeoscience/geonotebook


Another possibility for interactively manipulating maps is the ipyleaflet Jupyter widget library: https://github.com/ellisonbg/ipyleaflet. See examples of notebooks at https://github.com/ellisonbg/ipyleaflet/tree/master/examples

(disclosure: I have helped with this library)


If you are interested in GIS, make sure to check out rasterio and Fiona to go with Shapely. All great tools for GIS in python.


Rasterio ftw! Way easier to use than gdal's Python bindings.

Fun fact: Geopandas uses Fiona and shapely under the hood.


A friend just gave a presentation on these exact tools at PyCon JP 2017 a few days ago, I half expected to see Halfdan's face there: https://www.youtube.com/watch?v=Yd5oEIBFQ_E


Where is a good place to start to take a set of lat/long data (e.g., bunch of ride/walk traces) and plot a neat looking map? It seems to be a little harder than hello world, but not worth a full blown GIS stack.
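
One low-effort route, in the spirit of the linked tutorial, is to turn the traces into GeoJSON and load the result into a viewer such as geojson.io. A minimal sketch using only the standard library; the `traces_to_geojson` helper and the sample coordinates are made up for illustration. The one thing to watch is that GeoJSON positions are ordered longitude first, then latitude.

```python
import json


def traces_to_geojson(traces):
    """Convert {name: [(lat, lon), ...]} into a GeoJSON FeatureCollection dict."""
    features = []
    for name, points in traces.items():
        features.append({
            "type": "Feature",
            "properties": {"name": name},
            "geometry": {
                "type": "LineString",
                # GeoJSON positions are [longitude, latitude], so swap the order.
                "coordinates": [[lon, lat] for lat, lon in points],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample data: one short walk given as (lat, lon) pairs.
sample_traces = {
    "morning walk": [
        (40.7484, -73.9857),
        (40.7527, -73.9772),
        (40.7580, -73.9855),
    ],
}

collection = traces_to_geojson(sample_traces)
geojson_text = json.dumps(collection, indent=2)
```

Writing `geojson_text` to a `.geojson` file gives something most map viewers can open directly, which may be enough before reaching for a full GIS stack.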


<self promotion, but relevant> I just launched a product similar to this but a bit more powerful and aimed at a less technical audience (most of my customers are Excel users). There is a python client that does geocoding, census data lookup, and driving distance/time if you are looking for something more than the open source options provide. If you're interested, email me or check out https://cairngeographics.com/


For JavaScript folks, there's Turfjs from Mapbox.

http://turfjs.org/


Would it be possible with something like this to locate all the pools in a given area and find the street addresses?


Sure, but getting the necessary data would be far from trivial.

How would you locate pools? Using aerial photographs is an option, but you'd either have to do it manually, which would be very time consuming, or use image recognition, which would be very error prone. Depending on where you live, people might have to apply for a building permit to put in a pool on their property, in which case the local municipality should have a database of which houses have pools, but there is no guarantee that it is up to date, and getting hold of that database is far from easy.

Connecting points to street addresses is also a bit hit and miss. Things like Google's geocoding API work OK for buildings, but tend to be quite hit and miss for points outside of buildings. Generally it will give you the address of the closest building rather than the address the plot of land actually belongs to. So if you want to be correct you have to get a map with actual property lines and who owns what property.
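
To make the property-line idea concrete, here is a rough sketch of assigning a point to the parcel that contains it, using a plain ray-casting point-in-polygon test. Everything here is an assumption for illustration: the `parcels` dictionary, its addresses, and the coordinates are invented, the geometry is treated as planar, and points lying exactly on a boundary are not handled carefully. With real data, a library such as Shapely would be the more sensible choice.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon.

    polygon is a list of (x, y) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_parcel(x, y, parcels):
    """Return the key of the first parcel containing (x, y), or None."""
    for parcel_id, ring in parcels.items():
        if point_in_polygon(x, y, ring):
            return parcel_id
    return None


# Hypothetical parcels: two adjacent square lots keyed by street address.
parcels = {
    "12 Example St": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "14 Example St": [(10, 0), (20, 0), (20, 10), (10, 10)],
}

# A pool located in the back corner of the second lot.
pool_address = find_parcel(13.0, 8.0, parcels)
```

The lookup returns the lot the point actually sits in, regardless of which building happens to be nearest.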

So the ease of doing something like that is entirely dependent on what data you have access to and how accurate you have to be. Basically the hard part of any GIS project is always data gathering/cleaning/pre-processing and never the actual analysis.


Speaking of pools, the Greek government, in its efforts to collect more tax, used to search for pools using Google Earth

http://www.spiegel.de/international/europe/finding-swimming-...


That's a very broad statement and not one I can agree with - some of the processes I've encountered have been very complex - in fact in certain situations the dynamics of a problem can be so complex that they defy conventional analysis. The field of Spatial Decision Support Systems arose in order to address these types of poorly structured spatial questions. Plenty of literature out there, not so trendy these days though.


I'll admit I was being slightly facetious, and open data is making everything much easier these days. But geospatial analysis is mostly just math with a bit of programming, while data collection often involves trying to deal with county and state level employees operating with a very high power Someone Else's Problem field, and then trying to explain that scanned photocopies of old maps are not quite what I was expecting when they said that they had all their maps 'digitized' :)

But I'm the sort of person that much prefers dealing with math to dealing with people.


The linked post is just a very basic tutorial. Your idea is quite complicated geospatial analysis depending on the source data and desired correctness.



