Show HN: BeaconDB – An Alternative to Mozilla Location Services (beacondb.net)
237 points by joelkoen 5 months ago | 50 comments



> ethically sourced: opt-in only data collection

Good on them, but how does this work? If my neighbour scans my WiFi network and uploads it to BeaconDB, I didn’t exactly opt in, did I? The privacy policy mentions you can add ‘_optout’ to the WiFi name, so it’s more opt-out than opt-in?


This line refers to opting in to using your device to collect this data. Apple and Google are taking advantage of their global user coverage by using their users' devices to collect this data without their consent.

Your WiFi network is broadcasting its presence 10 times a second in all directions. It is well known that you should not put sensitive information in your network SSID, for example, as anybody nearby can pick that up. Hence, you can opt out here instead.
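A minimal sketch of how a collecting app could honour an SSID-based opt-out before uploading anything. The suffix list and the `should_collect` helper are assumptions for illustration; the exact matching rules beaconDB applies are set by its privacy policy, not by this snippet.

```python
from typing import Optional

# Assumption: opting out means the SSID *ends with* one of these suffixes.
# "_optout" is the suffix mentioned in this thread; "_nomap" is Google's
# older convention and is included only as an illustration.
OPT_OUT_SUFFIXES = ("_optout", "_nomap")


def should_collect(ssid: Optional[str]) -> bool:
    """Return True if a scanned network may be uploaded (hypothetical helper)."""
    if not ssid:
        # A hidden (empty) SSID cannot carry an opt-out suffix.
        return True
    return not ssid.endswith(OPT_OUT_SUFFIXES)


scan = ["HomeNet", "Cafe_optout", "Office_nomap", ""]
print([s for s in scan if should_collect(s)])  # ['HomeNet', '']
```

Because the check runs on the contributor's device, an opted-out network never leaves the phone in the first place.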


While most users probably don't realize that they contribute to WiFi crowdsourcing, AFAIR using location services is opt-in on iOS. So "without their consent" doesn't seem true. The info popup also explicitly mentions the WiFi location crowdsourcing.


Sure, but any opted-in iOS user walking past other people's WiFi is crowdsourcing those networks without the network operators' consent.

Unless they only contribute networks that the device has authenticated with.


The person collecting the data opted in to doing it, heh. As far as the data collectors are concerned, your wifi is out in the public.


> If my neighbour scans my WiFi network and uploads it to BeaconDB I didn’t exactly opt-in, did I?

To clarify: all phones doing geolocation are already uploading your AP's MAC address to remote location services, but BeaconDB will *not* publish this information in cleartext.

Any data dump will contain only non-reversible cryptographically hashed data or aggregated data.


A MAC address is only 48 bits and some of the bits are restricted. It is well within the range of brute force to reverse all of the hashes.
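To illustrate why an unsalted hash doesn't hide a MAC address, here is a scaled-down sketch. The hashing scheme is a hypothetical stand-in, not beaconDB's actual one, and the demo assumes four octets are known so it finishes instantly. An attacker who knows only the 24-bit vendor prefix (OUI) faces 2^24 (about 16.7 million) candidates with the same loop, which is still a small search.

```python
import hashlib
from typing import Optional


def mac_hash(mac: bytes) -> str:
    # Hypothetical scheme: plain, unsalted SHA-256 of the 6 raw MAC bytes.
    return hashlib.sha256(mac).hexdigest()


def is_universal_unicast(mac: bytes) -> bool:
    # First octet: bit 0 is the I/G (group/multicast) bit and bit 1 is the
    # U/L (locally administered) bit. Vendor-assigned unicast MACs clear both,
    # which is one reason the effective search space is below 2**48.
    return mac[0] & 0b11 == 0


def brute_force(target: str, known_prefix: bytes) -> Optional[bytes]:
    """Try every value of the unknown trailing octets until the hash matches."""
    free = 6 - len(known_prefix)
    for n in range(256 ** free):
        candidate = known_prefix + n.to_bytes(free, "big")
        if mac_hash(candidate) == target:
            return candidate
    return None


secret = bytes.fromhex("001122334455")  # arbitrary example address
assert is_universal_unicast(secret)
leaked = mac_hash(secret)
# Toy run: the first 4 octets are assumed known, leaving 2**16 candidates.
# Knowing only the 3-octet vendor prefix means the same loop over 2**24.
recovered = brute_force(leaked, secret[:4])
print(recovered.hex(":"))  # 00:11:22:33:44:55
```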


You can truncate the hash to cause collisions, meaning that one MAC address does not map to one location. This requires the client to be aware of multiple physically nearby MACs in order to get a location, as it then needs to estimate which "possible" locations are most likely.

This is a really interesting problem, and I've loved thinking about it recently. If you're keen on it too I'm happy to discuss further, feel free to reach out.
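A toy sketch of the truncated-hash idea above. Everything here is an assumption for illustration: the 4-bit truncation (deliberately tiny so collisions actually occur in a 33-AP dataset), the 200 m agreement radius, and the simple voting rule. A real design would choose the truncation length so that each bucket holds many unrelated APs.

```python
import hashlib
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

LatLon = Tuple[float, float]
TRUNC_BITS = 4  # toy value so collisions show up in a tiny dataset


def bucket(mac: str) -> int:
    """Keep only the top TRUNC_BITS bits of SHA-256(mac)."""
    digest = hashlib.sha256(mac.encode()).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - TRUNC_BITS)


def haversine_m(a: LatLon, b: LatLon) -> float:
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def build_index(aps: Dict[str, LatLon]) -> Dict[int, List[LatLon]]:
    """Published data: bucket -> locations of every AP that hashes into it."""
    index: Dict[int, List[LatLon]] = defaultdict(list)
    for mac, loc in aps.items():
        index[bucket(mac)].append(loc)
    return index


def locate(index: Dict[int, List[LatLon]], observed: Sequence[str],
           radius_m: float = 200.0) -> Tuple[Optional[LatLon], int]:
    """Pick the candidate location that the most observed buckets agree on."""
    candidate_sets = [index.get(bucket(mac), []) for mac in observed]
    best, best_votes = None, 0
    for candidates in candidate_sets:
        for c in candidates:
            votes = sum(any(haversine_m(c, o) <= radius_m for o in others)
                        for others in candidate_sets)
            if votes > best_votes:
                best, best_votes = c, votes
    return best, best_votes


home = (-33.8688, 151.2093)
aps = {f"aa:bb:cc:00:00:0{i}": (home[0] + i * 0.0003, home[1]) for i in range(3)}
aps.update({f"02:00:00:00:01:{i:02x}": (float(i), float(i)) for i in range(30)})
index = build_index(aps)
print(locate(index, [f"aa:bb:cc:00:00:0{i}" for i in range(3)]))  # ((-33.8688, 151.2093), 3)
```

Each bucket on its own is ambiguous, but only the true area is supported by all three observed networks, which is the "client needs several nearby MACs" property described above.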


To put that into perspective, 48 bits is 256T, which is roughly the number of bits in a 32TB hard drive.
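The arithmetic behind that comparison, taking the drive's capacity in decimal terabytes as vendors quote it:

```latex
2^{48} = 2^{8}\cdot 2^{40} = 256\ \text{Ti} \approx 2.81\times 10^{14}
\qquad
32\ \text{TB}\times 8\ \tfrac{\text{bits}}{\text{byte}} = 2.56\times 10^{14}\ \text{bits}
```

The two figures agree to within about 10%.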


> and some of the bits are restricted


Absolutely right, great point. That's why I only use Windows addresses now. Can't break those with brute force!


You can opt to hide your SSID and use 5GHz WiFi, which doesn't reach very far and is attenuated by walls, so it's basically useless as a geolocation beacon.
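A rough check on the range part of that claim, using the free-space path-loss formula (distance d in metres, frequency f in Hz) with nominal band frequencies. It assumes the same transmit power and antennas on both bands and does not model wall losses, which are also higher at 5 GHz:

```latex
\mathrm{FSPL}(d, f) = 20\log_{10} d + 20\log_{10} f + 20\log_{10}\frac{4\pi}{c}
\Delta\mathrm{FSPL} = 20\log_{10}\frac{5\ \text{GHz}}{2.4\ \text{GHz}} \approx 6.4\ \text{dB}
\quad\Longrightarrow\quad
\frac{d_{5\,\text{GHz}}}{d_{2.4\,\text{GHz}}} = \frac{2.4}{5} \approx 0.48
```

So, in free space alone, the same link budget reaches roughly half as far at 5 GHz.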


Last time I looked into something like this for GrapheneOS it wasn't possible to provide a custom location service.

It would be awesome to have this on GrapheneOS, so I'd be very happy if someone knows a way to do this without using microG (I use the sandboxed GMS).


The author doesn't seem to have an open source mobile app or anything that would allow them to source the data from devices themselves. I'm curious where the data was collected from, esp. if it was opt-in (at the collecting device)


I haven't built any apps for contributing to beaconDB as of yet. The website links to NeoStumbler and TowerCollector, which are Android apps that can be used to collect this data.


> TowerCollector

The developer might be open to adding other services since MLS is being retired: https://github.com/zamojski/TowerCollector/issues/223

Doesn't hurt to contact them/make suggestions on this issue.


Just commented on that issue, thanks!


Thanks, based on the copy I thought it was recently opened to contribution, and the original dataset had come from somewhere else.


I am curious what would cause such a distributed user base to contribute to this though?


Distributed referring to the community not yet recognising one specific software as "the go to"? Or distributed physically?


Physically! Like how so many users from all over the place decided to contribute to this


It is rather surprising how many people have started contributing already. I believe that people want to support alternatives to big tech so they aren't completely reliant on these providers, and beaconDB is currently the only database not owned by big tech. Not 100% sure that answers your question :)


Gotcha, I guess I was asking whether people specifically opted in to contributing to beaconDB, sounds like that's the case


Wasn't the main issue with MLS that they got patent trolled/sued by Skyhook? Does anyone know the patents involved and how beaconDB is avoiding the issues?


Reading the MLS retirement issue[1], it seems that multiple established organizations (e Foundation, GrapheneOS) are also interested in providing an alternative service. Does this mean that we're now in a situation where multiple open source location service providers are competing, or is this the only publicly accessible project in this space for now?

This project is cool and all, but seems to just be a one person effort with not a lot of engagement on GitHub[2]. Are you in talks with other people with similar goals to expand and collaborate on the project? Having the backing of an existing developer community could really bring this to the next level.

[1] https://github.com/mozilla/ichnaea/issues/2065

[2] https://github.com/beacondb/beacondb

Edit: the actual project seems to be on Codeberg[3], where there is a bit more engagement from people other than the primary dev.

[3] https://codeberg.org/beacondb


beaconDB is currently the only publicly accessible project, but I am in discussions about working together with various other projects and organisations.

The project was originally on GitHub, but it has now moved to Codeberg.


How is this different from WiGLE?


WiGLE is very expensive to use.


For what it's worth, /e/OS now uses its own location service, but I don't know what, if anything, restricts access to it.


Is there a reason the API doesn't return the locations of the access points so the clients can calculate their positions by themselves?


This is planned to help clients cache data locally, which would improve the privacy of the client and reduce server load. I would like to implement this over the next few days.

I have not yet found any clients that have implemented making use of such data, please let me know if you have found one or are developing one.
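One simple way a client could use returned AP locations offline is a signal-weighted centroid. This sketch assumes the server has returned (lat, lon) for each AP and the client has cached them; the linear-power weighting is one heuristic among several, not something beaconDB specifies, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    lat: float       # AP latitude from the local cache (assumed server-provided)
    lon: float       # AP longitude from the local cache
    rssi_dbm: float  # signal strength measured by this scan, e.g. -45


def weighted_centroid(obs: List[Observation]) -> Tuple[float, float]:
    """Estimate position as the average AP location weighted by received power.

    Averaging raw lat/lon is only reasonable over short distances and away
    from the antimeridian, which holds for APs seen in a single scan.
    """
    if not obs:
        raise ValueError("need at least one observed AP")
    # dBm -> milliwatts, so a 10 dB stronger AP gets 10x the weight.
    weights = [10 ** (o.rssi_dbm / 10) for o in obs]
    total = sum(weights)
    lat = sum(w * o.lat for w, o in zip(weights, obs)) / total
    lon = sum(w * o.lon for w, o in zip(weights, obs)) / total
    return lat, lon


cache = [Observation(52.5200, 13.4050, -40), Observation(52.5200, 13.4160, -50)]
print(weighted_centroid(cache))
```

Here the -40 dBm AP gets ten times the weight of the -50 dBm one, so the estimate lands one-eleventh of the way from the stronger AP towards the weaker one (longitude 13.406). Since no scan data leaves the device, this is where the privacy gain mentioned above comes from.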


Ah, okay.

I was just thinking if there were any technical constraints preventing this.

Because you mention Ichnaea API compatibility, and I didn't know if that spec even allows that.
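For reference, the Ichnaea geolocate request that MLS used (and that beaconDB is described as compatible with) sends scanned networks as `wifiAccessPoints` and gets back a `location` (`lat`, `lng`) plus an `accuracy` radius in metres. The standard reply carries no per-AP coordinates, so returning them would be an addition on top of that format. A minimal standard-library sketch; the endpoint URL is an assumption and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Tuple

# Assumption: beaconDB's Ichnaea-compatible geolocate endpoint; check the
# project's site for the current URL and any usage requirements.
GEOLOCATE_URL = "https://api.beacondb.net/v1/geolocate"


def build_request(wifi: Iterable[Tuple[str, int]],
                  url: str = GEOLOCATE_URL) -> urllib.request.Request:
    """Build an Ichnaea-style geolocate POST from (mac, rssi_dbm) pairs."""
    body = {
        "wifiAccessPoints": [
            {"macAddress": mac, "signalStrength": rssi} for mac, rssi in wifi
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: bytes) -> Tuple[float, float, float]:
    """Pull (lat, lng, accuracy in metres) out of an Ichnaea-style reply."""
    doc = json.loads(raw)
    return doc["location"]["lat"], doc["location"]["lng"], doc["accuracy"]


req = build_request([("01:23:45:67:89:ab", -51), ("01:23:45:67:89:cd", -68)])
# Sending it needs network access:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(parse_response(resp.read()))
```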


Hope GrapheneOS adds support for this soon, as currently their non-Google GPS Provider is basically hopeless unless you are outside.


This is such a cool project. Always glad to see problem solvers filling the void left by MLS. (Unrelated, but the design looks great!)


Thank you, this means a lot!


Curious if the last data dump from MLS can still be downloaded anywhere? I can't seem to find it online. I'm working on a project that locates the connected tower based on mcc, mnc, cid, etc. Currently only sourcing data from opencellid and combain, this would be a great addition!
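For the tower-lookup use case, a minimal sketch of keying cell records by (radio, MCC, MNC, area code, cell ID). It assumes the CSV header used by OpenCelliD exports (`radio,mcc,net,area,cell,...,lon,lat,range,...`); check the header of whichever dump you actually load, since another source may name or order columns differently. The sample row is invented.

```python
import csv
import io
from typing import Dict, Optional, TextIO, Tuple

# (radio, MCC, MNC, LAC/TAC, cell ID) identifies one cell.
CellKey = Tuple[str, int, int, int, int]
CellInfo = Tuple[float, float, int]  # (lat, lon, range in metres)


def load_cells(fh: TextIO) -> Dict[CellKey, CellInfo]:
    """Index a CSV with OpenCelliD-style columns (an assumed layout)."""
    cells: Dict[CellKey, CellInfo] = {}
    for row in csv.DictReader(fh):
        key = (row["radio"].upper(), int(row["mcc"]), int(row["net"]),
               int(row["area"]), int(row["cell"]))
        cells[key] = (float(row["lat"]), float(row["lon"]), int(row["range"]))
    return cells


def lookup(cells: Dict[CellKey, CellInfo], radio: str, mcc: int, mnc: int,
           area: int, cid: int) -> Optional[CellInfo]:
    return cells.get((radio.upper(), mcc, mnc, area, cid))


# Made-up sample row for illustration only (MCC 001 is a test code), not real tower data.
SAMPLE = """radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal
LTE,001,01,1,1,0,151.2093,-33.8688,1000,1,1,0,0,0
"""
cells = load_cells(io.StringIO(SAMPLE))
print(lookup(cells, "lte", 1, 1, 1, 1))  # (-33.8688, 151.2093, 1000)
```

Loading a second source into the same dict simply lets later rows overwrite earlier ones; combining OpenCelliD, Combain, and other data properly would need an explicit rule for conflicting records.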



It would be nice to see some cooperation with geoclue2, as they have now disabled WiFi-based location guessing because MLS is shutting down.

https://gitlab.freedesktop.org/geoclue/geoclue


Really nice! Hopefully more software switches to this. I'm 100% gonna contribute.


Is this only offered as an API? E.g. you can't dump it and analyze locally?


> data dumps are currently not available as I'm still researching the measures I need to take to protect the privacy of both contributors and AP owners.

Ah


Yes, I really want to be able to release data dumps, as this opens up a lot of great opportunities. I'm also worried that people may have lost trust in a service like MLS now that it has shut down and abandoned all of the data contributors had collected.

At the moment, there simply isn't enough data to anonymise contributions.
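To make the anonymisation problem concrete: one generic approach (k-anonymity-style thresholding, named here as an illustration rather than as beaconDB's plan) is to release aggregates only for areas that at least k distinct contributors have covered. With a young dataset most areas fall below the threshold, so there is little left to publish. `K_MIN`, `CELL_DEG`, and the helper names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

K_MIN = 5        # hypothetical: distinct contributors required per area
CELL_DEG = 0.01  # hypothetical grid size (0.01 degrees, about 1.1 km north-south)

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float) -> Cell:
    return math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG)


def publishable_counts(observations: Iterable[Tuple[str, float, float]],
                       k: int = K_MIN) -> Dict[Cell, int]:
    """Return observation counts only for grid cells seen by >= k contributors."""
    contributors: Dict[Cell, Set[str]] = defaultdict(set)
    counts: Dict[Cell, int] = defaultdict(int)
    for who, lat, lon in observations:
        cell = grid_cell(lat, lon)
        contributors[cell].add(who)
        counts[cell] += 1
    return {c: n for c, n in counts.items() if len(contributors[c]) >= k}


# Six people in one area, one person elsewhere: only the first area survives.
obs = [(f"user{i}", -33.8688, 151.2093) for i in range(6)]
obs.append(("user0", 48.8566, 2.3522))
print(publishable_counts(obs))
```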


Apple probably has one of the largest databases. Their API is unauthenticated and not rate limited either. Can be used for both APs and cell towers: https://github.com/acheong08/apple-corelocation-experiments


Soon: their API is authenticated and rate limited


That’ll break a lot of older devices. Unlikely.


It's a matter of time; they are waiting until these devices reach EOL.


As nobody has yet mentioned it, there is also WiGLE [1] which has tracked over a billion unique networks.

[1] https://wigle.net/


I was just going to ask: whatever happened to WiGLE, and why build a clone of it rather than add to it?


WiGLE severely rate-limits its APIs and doesn’t even allow normal people to pay for more access. They refuse to provide a data dump since they sell it to enterprises. No academic access either.

People literally spend their time mapping APs and they don’t even get anything in return.


The couple of times I did a lookup it was woefully outdated as well.



