Offtopic, but while I may have nerdy GISers gathering here I'll shoot my shot.
Does anyone know a common/standard way of displaying vector information (points in this case) without that information being scrapeable? Is that even possible? I have a project where I want to sell some georeferenced data and be able to show said data (without all attributes, but showing all points) to potential customers on a map, but IIRC I've never failed to scrape vector web map data, so I'm not sure it's even possible. I could imprint them in the tile rasters but then they wouldn't be interactive. Thanks
(if offtopic ain't permitted on HN just delete this, sorry)
Edit: So many insightful answers, thank you so much for the pointers HN, love y'all... and sorry OP for piggybacking.
Nope, not that I've seen -- if the browser gets the data, anyone can get the data. You can make it more difficult, and the defenses I've seen vary based on the type of data. Here are a few:
- Limit the geography of the sample
- Use raster tiles at far zooms and switch to vector at close zooms for interactivity. Combine this with a limit on the number of tiles an unauthenticated user can consume to make mass downloading more difficult.
- Have data that changes frequently enough that a one-time scrape decreases in value pretty quickly
- Only share the real value at scale with authenticated customers. The real value might be the geometries or the attributes or the combo pack.
- Trust that most serious customers will prefer to pay to work with you rather than abscond with the data. Those that are willing to put work into scraping it probably won't pay anyways
Really, the last point is the key. You want to have a data product where most consumers want to create a legitimate business relationship with you. My opinion is if your potential customers don't fit in this bucket, there likely are deeper concerns, and if most do, you'll be fine! Aka sweat getting the first customer rather than blocking the first scraper.
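For what it's worth, the "raster far out, vector up close, plus a tile cap" idea from the list above can be sketched in a few lines of Python. The zoom threshold, the quota, and the function names are all assumptions made for illustration; a real setup would sit in front of an actual tile server and keep its counters somewhere more durable than a dict.

```python
from collections import defaultdict

VECTOR_MIN_ZOOM = 14    # assumed threshold: vector tiles only at close zooms
ANON_TILE_QUOTA = 200   # assumed cap on tiles per unauthenticated client

# In-memory counter keyed by client id (hypothetical; e.g. a session or IP)
_tiles_served = defaultdict(int)


def tile_format(zoom: int) -> str:
    """Raster (points baked in) when zoomed out, vector when zoomed in."""
    return "vector" if zoom >= VECTOR_MIN_ZOOM else "raster"


def serve_tile(client_id: str, zoom: int, authenticated: bool = False) -> str:
    """Return which kind of tile to serve, or 'denied' once the quota is spent."""
    if not authenticated:
        if _tiles_served[client_id] >= ANON_TILE_QUOTA:
            return "denied"
        _tiles_served[client_id] += 1
    return tile_format(zoom)
```

Authenticated customers skip the quota entirely here, which matches the "real value only for authenticated customers" point.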
Thank you for the suggestions, especially that last one - I may be "paranoid" considering the target audience are professionals, I really should focus on the product rather than the fence (besides some basic defense so the points aren't in plain JSON).
If you really want to go hog wild, you could use a system where tokens with a short expiry are used to authenticate requests even when users aren't logged in. You'd combine that with rate limits + IP-level bans for when active or expired tokens are overused. I would say that's total overkill for 99% of services though.
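A rough sketch of the short-expiry token part, using only the Python standard library. The secret, the 60-second lifetime, and the `client_id:expires:signature` token layout are assumptions for illustration, not a recommendation of a specific scheme; rate limits and IP bans would be layered on top.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-me"  # hypothetical server-side secret
TTL_SECONDS = 60        # assumed token lifetime


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(client_id: str, now: Optional[float] = None) -> str:
    """Issue a signed token that expires TTL_SECONDS from now."""
    current = time.time() if now is None else now
    payload = f"{client_id}:{int(current + TTL_SECONDS)}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, now: Optional[float] = None) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    try:
        client_id, expires, sig = token.rsplit(":", 2)
    except ValueError:
        return False  # not in the expected three-part layout
    expected = _sign(f"{client_id}:{expires}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return current < int(expires)
```

The map page would fetch a fresh token every so often and attach it to each tile or attribute request; requests carrying an expired or tampered token get rejected and counted toward a ban.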
You could bake the points into the tiles and then have a web service that answers requests on mouse clicks i.e. when a user clicks one of the points, you send the coordinates to the server and the server determines if there's a point (or multiple) nearby and sends back the related attributes.
You'd also add some rate limiting on the server-side so that someone couldn't easily request from your server all attributes for point 0,0 then 0,1 then 0,2 etc.
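To make the click-lookup idea concrete, here is a minimal Python sketch of the server side. The sample points, the 50 m search radius, and the function names are made up for illustration; a real service would query a spatial index (PostGIS or similar) rather than loop over a list, and would sit behind the rate limiting described above.

```python
import math

# Hypothetical dataset kept server-side only: (id, lon, lat, attributes)
POINTS = [
    (1, -3.7038, 40.4168, {"name": "A"}),
    (2, 2.1734, 41.3851, {"name": "B"}),
]


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat pairs."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def points_near_click(lon, lat, radius_m=50.0):
    """Return attributes of every point within radius_m of the clicked location."""
    hits = []
    for pid, plon, plat, attrs in POINTS:
        if haversine_m(lon, lat, plon, plat) <= radius_m:
            hits.append({"id": pid, **attrs})
    return hits
```

The response deliberately leaves out the exact coordinates, so the client only learns that something is near where it clicked. In practice the radius would probably scale with the zoom level so that the click tolerance stays roughly constant in screen pixels.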
If someone was very determined to break this, they could make a screenshot and manually triangulate each point (depending how many there are -- if you're hiding 10 points, don't bother. If it's 1k or more, that'd be harder to do manually) or even use computer vision/pixel color thresholding to extract points (say, red pixels). Same thing for the attributes, they could always use different IP addresses to break any IP-based rate limit.
In response to that, you could force users to authenticate (and use reCAPTCHA during signup) to minimize the IP-rotation problem.
Love this idea, great way to have a raster be interactive without keeping point coordinates client-side. I do assume that someone very motivated and knowledgeable will always find a way to scrape it, I just have to make it hard enough that it's not worth the hassle over just buying the (reasonably priced) product. Thank you
Google Maps has the same issue. Their trick is to pass a ton of the data as protobuf, then decrypt it in WASM and load it into WebGL to render and interact with.
Whilst all a massive pain, you can still scrape it with raw obj dumps from the GPU. So it's always a cat & mouse game.
Haha one of the vesseltracking sites used .swf for the longest time to be able to not allow you to grab the points corresponding to the ships. You could probably prerender images but that's a bad solution. Good question, I'm sure someone else will have a better idea of the latest state-of-the-art for this.
Thanks for your input! I'll be trying to implement hardly-scrapeable data as a learning opportunity (as I'm working towards coding + GIS career), but settling for a subset of data + demonstrative video would be a satisfactory compromise if the former turns out to be unachievable or overkill.
You could try to "encrypt" data and use Mapbox expressions/frontend transform to decrypt.
Point coordinates would be randomly shifted, and in a separate request you would send a Wasm/JS module that repositions the features on the map. The Wasm module could call a Mapbox expression to reposition the points.
This would make it very hard to scrape. If someone scrapes the vector data without reverse engineering the decryption module, they will get incorrect data. Just make sure that the Wasm module is obfuscated.
Not sure if a Mapbox expression can perform a change of coordinates, but there might be other ways to transform vector data on the frontend.
The only risk is that if someone reverse engineers your Wasm/JS module, they will still get the data as of that point in time. So this approach works well if the data changes over time. I think this should stop most people.
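A small sketch of the shift/unshift idea, written in Python just to show the logic (in the scheme described, the `unshift` half would live in the obfuscated Wasm/JS module). The key, the maximum shift, and the use of an HMAC of the feature id to derive a repeatable offset are all assumptions for illustration. This is obfuscation rather than real encryption.

```python
import hashlib
import hmac

KEY = b"demo-key"     # hypothetical; would be buried in the obfuscated client module
MAX_SHIFT_DEG = 0.05  # assumed maximum displacement per axis, in degrees


def _offset(feature_id):
    """Derive a repeatable pseudo-random (dlon, dlat) offset from the feature id."""
    digest = hmac.new(KEY, str(feature_id).encode(), hashlib.sha256).digest()
    # Map two 4-byte chunks of the digest onto [-MAX_SHIFT_DEG, +MAX_SHIFT_DEG]
    u = int.from_bytes(digest[0:4], "big") / 0xFFFFFFFF
    v = int.from_bytes(digest[4:8], "big") / 0xFFFFFFFF
    return (2 * u - 1) * MAX_SHIFT_DEG, (2 * v - 1) * MAX_SHIFT_DEG


def shift(feature_id, lon, lat):
    """Server side: displace the point before it goes into the vector data."""
    dlon, dlat = _offset(feature_id)
    return lon + dlon, lat + dlat


def unshift(feature_id, lon, lat):
    """Client side: move the point back to its true position before rendering."""
    dlon, dlat = _offset(feature_id)
    return lon - dlon, lat - dlat
```

Anyone who grabs only the tiles gets points that are off by up to `MAX_SHIFT_DEG` in each axis; rotating `KEY` along with the data limits how long a reverse-engineered module stays useful.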
Realtime server-side rendering that updates with something like JPEG-based live streaming? For example, VNC can be pretty snappy because it only sends what's changed, and doesn't use video codecs, mostly libturbojpeg. If the mapping is mostly plain colors then I'd imagine it would be responsive enough. Maybe could get even more creative if all the map colors fit into 8-bit PNGs, purely hypothetical.
Sounds a bit heavy without a tool designed for maps or rasters, but this is similar in nature to other tips I got here using GIS-tools (namely the GeoServer one). Thank you!
You could try OmniSci, it’s a database, rendering engine, and interactive analytics frontend (or any combination of the above) and can easily query and render millions to tens of billions of points interactively while allowing for things like tooltips on the data. See omnisci.com/demos for some live examples.
You could serve them in vector tiles, which are served protobuf-encoded. It's still fairly easily scrapeable (get the URL via the browser's Network tab, run through vt2geojson) but would probably deter the casual scraper.
Thanks, I'll look into that. Deterring the casual scraper would basically be the goal: make them work enough that it's not worth the hassle compared to the price of legitimate access, since a motivated and technical person with a lot of time will always manage to extract data that is shown client-side.
You could quantize the coordinates, making them still look great for viewing on a screen, but insufficiently accurate for whatever the scrapers want to use them for.
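As a quick illustration of quantizing, here is a Python sketch that snaps coordinates to a grid. The 0.001 degree step (roughly 111 m in latitude) is just an assumed value; the right step depends on the zoom levels you show and on how accurate the paid product needs to be by comparison.

```python
def quantize(lon, lat, step_deg=0.001):
    """Snap a coordinate pair to the nearest multiple of step_deg."""
    qlon = round(round(lon / step_deg) * step_deg, 6)
    qlat = round(round(lat / step_deg) * step_deg, 6)
    return qlon, qlat
```

Every point moves by at most half a step per axis, so the preview still looks right at low and medium zooms while the scraped positions are only good to about a city block.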
Interesting article. It's been a few years since I've had a play with this kind of stuff; what's the best option available for someone to host custom styled vector maps and display them in a performant way? preferably on an open source stack. I haven't used MapBox before as I was worried about costs and being locked in. I'm interested in experimenting with map interaction/UX.
deck.gl is another open source rendering option with a TileLayer, TerrainLayer, and MVTLayer.
Other libraries mentioned have better text label and styling support out of the box compared to deck, so typically people do interleaved WebGL rendering with deck.gl and other basemap libraries to get a beautiful base and a super performant deck overlay.
Tile hosting is still typically a paid service from someone, though COGs (Cloud Optimized GeoTIFFs) on S3 are a self-hosting option.
I primarily work on libraries adjacent to deck.gl, happy to answer questions.
Both ArcGIS Platform and Mapbox make this quite easy. Not fully open, but you get what you pay for. ArcGIS at least makes data import and export easy, and comes with a generous free tier
You're not wrong. But not all maps are tile based. A layer can come from a tile server, yes, or local cached data served with the web page, or geojson, or bitmaps, or algorithmic shapes, or geo databases, etc.
OL is much more than a rendering library, it understands different geospatial input formats (raster and vector, tiled or not) and dynamically reprojects them to the same map projection, and lets you overlay UI elements (points of interest, polygons, lines, animations, etc.) or blend layers together (backgrounds, base maps, hill shades, etc.)
That much is definitely true and it's indeed interesting to see the many approaches to maps!
Now, personally, I'd still go for using a tile server in most cases in apps where the map functionality isn't the main part of the app, because of how simple that approach is, as well as how friendly to battery life those maps can be.
Of course, if maps are front and center to your app, then things probably change somewhat!
Though I do believe that OP was also particularly asking about how to host their own solution, presumably on a VPS or another server somewhere. In that regard, it's interesting to see the options out there, in part because there didn't use to be that many - sure, PostGIS is perfect for processing and storing geospatial data, yet when it comes to the display of maps, that niche of the industry is a little newer!
Exactly. The best stack to use would ideally depend on your particular use case (and constraints, of course).
Tiles make sense if you have a large area of interest (a country, a continent, the world). But in many cases your users would be better served by one of the big providers instead (Google, Mapbox, Here.com, Bing, OpenStreetMap) because of their superior infrastructure. If I got a dime for every time some local government GIS site self-hosted on ESRI bloatware slowed to a crawl or just plain broke... (though, to be fair, the ESRI web viewers are getting rapidly better. there's just a lot of legacy crap from old versions still in the wild)
For smaller areas of interest (say, a city map, a campus map, an indoor building map) tiles are not only excessively complicated but also produce a worse UX because they don't allow for seamless zooming & panning. If your geometry can be easily described in a few hundred lines, the whole thing can just be served as a few kB geojson attached to the page and rendered dynamically by openlayers or leaflet. With a modern phone or computer, that rendering happens way quicker than it would take to load map tiles over the phone, and once it's all loaded the rest of the interactions are instant. You can even make it a progressive web app for offline use if you wanted to (but does anyone actually do that?)
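To give a sense of how small such a payload can be, here is a Python sketch that builds a GeoJSON FeatureCollection from a handful of points of interest. The POI names and coordinates are placeholders, and the helper name is made up; the resulting string is what you would embed in the page or serve as a static file for OpenLayers or Leaflet to render.

```python
import json


def to_feature_collection(pois):
    """Build a GeoJSON FeatureCollection from (name, lon, lat) tuples."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"name": name},
            }
            for name, lon, lat in pois
        ],
    }


# Placeholder points of interest (not real locations)
pois = [("Entrance", 0.001, 0.002), ("Cafe", 0.0015, 0.0018)]

# Compact separators keep the payload small
geojson_text = json.dumps(to_feature_collection(pois), separators=(",", ":"))
```

Note that GeoJSON orders coordinates as longitude, then latitude.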
Tiles were a 90s/early 2000s solution to the infrastructure at the time, where servers were much more powerful than clients and Javascript was still in its infancy. These days a lot can be done on the clientside and arguably should be, except in cases where the data to be transmitted would be huge due to complex geometries or intricate graphics (e.g. for clients without a gfx card, rendering 3D textured maps on the fly a la Google Earth may be difficult... in that case prerendering them into tiles on a server makes sense)
In any case, real-world maps typically combine approaches. They could use a tiled basemap as a background, overlay first-party bitmaps onto them (like a pretty map ripped from the print brochure), then add geojson POIs on top of them.
This is one we built on openlayers using hybrid data sources, including a tiled background from OSM: https://map.fieldmuseum.org/
> This is one we built on openlayers using hybrid data sources, including a tiled background from OSM: https://map.fieldmuseum.org/
That's a really cool map and makes me hopeful for there eventually being more indoors maps! Also, the design itself seems simple, approachable and the illustrations make it more humane!
That said, the performance leaves something to be desired:
- tested it on a desktop with Firefox (93) on a Ryzen 1200 and RX 570, got anywhere between 15-30 fps, usable but not smooth
- tested it on a mobile Android budget phone (Ulefone Armor X7 Pro) with a Mediatek chip, got somewhere between 5-10 fps
- seems like it's related to rendering the building, because panning away so it's offscreen improves the performance
- this appears to be confirmed by also zooming out so less details are shown, in which case the performance improves
That's probably not an issue that actually needs addressing (newer hardware would perform better, and the hardware mentioned performs passably). However, there's definitely something to be said about how things will probably get better in this regard in the coming years (Mapbox alone has made plentiful advances) and how mixing different data sources or technologies may incur certain performance penalties until then, which may or may not actually be a concern, depending on the target audience.
Now that WebGL2 is (almost) universal, any sense of which techniques make more sense now? Still no geometry shaders afaict, but maybe there is something else in there. We currently tessellate ahead of time - even streaming in from GPUs on our server - but that's not how we'd do it with say raw OpenGL. Maybe there's something else close enough now?
(If you're into that kind of thing, plz email build@graphistry :D )