
> And how does this apply to logs? It's common for web servers by default to record urls visited and IP addresses. Would that be legal without explicit consent?

I'm not certain of the details, but in e.g. Norway, where the law is already much in line with the GDPR, IP addresses are considered personal data and are regulated as such. However, there are provisions for storing such information for auditing purposes (security, technical and financial/invoicing), though that would likely require explicit consent. ("I accept" / "I will not use this service": more of an informational confirmation, like signs stating that a shop is under camera surveillance, where you could choose not to shop there. Even if: good luck finding a shop w/o surveillance cameras.)

In general "personal data" is anything that can significantly help identify a single individual (time/place, ip, phone number/date, address, full name, email, etc).

Ed: regarding logging IPs for auditing: there's usually a fixed time limit. Afaik it's longest for financial purposes (and the data storage directive explicitly states/stated a minimum duration, effectively having been lobbied into a tool for going after amateur copyright infringement by forcing ISPs to keep records longer than before). But from a data protection viewpoint, you can't get consent for "we log for auditing" and then keep the data indefinitely. More like 6 or 12 months.




Can one collect data without "personal data" by anonymizing those specific attributes (ip, phone number)?

Also, say, I want to collect data about the performance of the web application. Can that be collected, or does it require explicit consent as well?


Yes. If the data cannot be tracked to one person, it is not regulated. So anonymous tracking cookies are still OK, but attaching it to an IP/name/phone number etc. is not.


I don’t believe that’s true. “Anonymous” tracking cookies are still unique to the individual, and as such can easily contribute to identifying them, so they are included as personal data.


Browser signature + IP address is all you need to uniquely correlate a user from system A to system B in the vast majority of cases
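To make that concrete, here is a rough sketch of such a join. The record layout and the field names (`ip`, `user_agent`, `account`, `order`) are made up for illustration, and a real browser signature would carry far more than a user-agent string.

```python
def correlate(system_a, system_b):
    """Pair up records from two logs that share the same (ip, user_agent)."""
    # Index system A by the quasi-identifier pair.
    index = {(r["ip"], r["user_agent"]): r for r in system_a}
    matches = []
    for record in system_b:
        key = (record["ip"], record["user_agent"])
        if key in index:
            matches.append((index[key], record))
    return matches


# Hypothetical logs; neither system stores a name next to the other's data.
system_a = [
    {"ip": "192.0.2.10", "user_agent": "Firefox/58.0 Linux", "account": "alice"},
    {"ip": "192.0.2.11", "user_agent": "Chrome/64.0 Windows", "account": "bob"},
]
system_b = [
    {"ip": "192.0.2.10", "user_agent": "Firefox/58.0 Linux", "order": 1001},
    {"ip": "192.0.2.99", "user_agent": "Safari/11.0 macOS", "order": 1002},
]

matches = correlate(system_a, system_b)
for a, b in matches:
    print(a["account"], "->", b["order"])
```

No name or cookie is shared between the two logs, yet the one overlapping pair is enough to link the account to the order.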


Only if you can/do meaningfully anonymize the data. E.g., knowing the subnets of most/all Norwegian ISPs, it's trivial to recover IPs that are simply hashed (probably even with a salt). Similarly, Norwegian phone numbers are only eight digits, so any kind of deterministic mapping is likely to be too trivial to reverse to actually amount to anonymization.

Also remember that one of the goals is to avoid illicit linking, so being able to verify that ip n.n.n.n is the same as slow_hash(salt+other-ip) won't fly as "not storing".
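A minimal sketch of why a salted hash doesn't help when the input space is small. It assumes the salt is known to whoever holds the logs (it has to be, or new entries couldn't be hashed), and it uses SHA-256 and a documentation /24 range as stand-ins; the function names are made up for the example.

```python
import hashlib
import ipaddress
from typing import Optional


def pseudonymize(ip: str, salt: bytes) -> str:
    """Deterministic salted hash of an IP address."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


def recover_ip(digest: str, salt: bytes, subnet: str) -> Optional[str]:
    """Hash every address in a known subnet until one matches the digest."""
    for candidate in ipaddress.ip_network(subnet):
        if pseudonymize(str(candidate), salt) == digest:
            return str(candidate)
    return None


salt = b"example-salt"  # hypothetical salt stored alongside the logs
stored = pseudonymize("192.0.2.77", salt)

# Enumerating the 256 candidates is enough to get the address back.
print(recover_ip(stored, salt, "192.0.2.0/24"))
```

The same loop over all eight-digit phone numbers is only 10^8 hashes. A slow hash raises the cost per guess, but the mapping stays deterministic, so the linking problem remains.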

In general, anonymizing data in sparse populations is tricky, and a "small" population can be quite large. Just imagine building a bitfield of variables like: sex and age above/below 50 (2 bits), rough location (easily 6 bits), browser (2 bits), mobile? (1 bit). That's already 11 bits, and so on. See also the NYC taxi dataset, e.g. (not the article I had in mind, but it seems to cover similar points):

https://research.neustar.biz/2014/09/15/riding-with-the-star...
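A back-of-the-envelope version of the bitfield argument. The field widths are the ones listed in the comment; the population of 5 million is an assumed round figure for a country the size of Norway, and real attributes are nowhere near uniformly distributed, so treat the average as optimistic.

```python
import math

# Field widths in bits, as listed in the comment.
fields = {
    "sex": 1,
    "age above/below 50": 1,
    "rough location": 6,
    "browser": 2,
    "mobile": 1,
}

total_bits = sum(fields.values())
buckets = 2 ** total_bits

population = 5_000_000  # assumption: round figure, not an official statistic
average_bucket_size = population / buckets

# Bits needed to give every person in the population a distinct value.
bits_to_single_out = math.ceil(math.log2(population))

print(f"{total_bits} bits -> {buckets} buckets")
print(f"about {average_bucket_size:.0f} people per bucket on average")
print(f"{bits_to_single_out} bits would be enough to single out one person")
```

Every extra yes/no attribute halves the average bucket, so it doesn't take many more harmless-looking columns before a record points at one person.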



