The post is from March 18th. Did Chris end up hosting the data somewhere?
Edit: Talked to Chris on Facebook. He is in the process of uploading it to someone who has volunteered to host and I have offered to host on Summon.com
If you can host/mirror this data please reply here.
You’re not only required to provide a large enough hard drive, but it must be “brand new, still in the box and unopened”, presumably for security reasons. This requirement is a bit silly in my opinion, and probably prevents a lot of would-be FOILers from getting this data,
That's a completely sensible stipulation. It's imperfect like all security, but it's a legitimate reason rather than just a barrier.
If it's for security, it would be theater. How hard would it be for someone to open the package, put their spyware on it and repackage the drive? Even easier are OEM drives that come in unsealed boxes.
I'm guessing it's more to make sure the drive is in working order and clean of any data. This would limit liability for accidentally deleted data and broken drives. If the drive doesn't work, it's still under warranty. It also saves time on trying to make an old drive work.
It also stops Joe Shmoe from coming in with the same drive he's been using for a few years already but has no clue is filled with all sorts of malware and viruses.
Seems like they could close that gap easily enough by buying the drive themselves, and charging you the cost. Perhaps there are requirements of the law that prohibit this but allow restrictions on user-provided hardware.
A "new" drive in a box certainly wouldn't prevent your scenario from happening (somebody faking a new drive), but is that really the person's concern?
I'm thinking the biggest concern would be plugging in someone's drive who didn't know they had malware. In that case, the requirement works pretty well.
Wow, I'm impressed with the responsiveness. It could be better, but honestly, if the other municipalities in New York state were up to at least this standard, I'd be very happy. In my experience, however, FOIL requests are often delayed, "forgotten", or ridiculously stored (on reams of paper, in ancient data formats, etc).
Since I've lived in NY, I've seen plenty of cool visualizations and stories...about where pickups happen, time of day, volume, etc., and I've periodically asked around, where does the data come from? Obviously I didn't ask well enough...because if all it took was an old-fashioned public request (and a brand new hard drive)...wow.
The trip data is interesting enough...but the fare data is really mind blowing. Every time I get out of a cab, I wonder, "should I have tipped that much?" The (crowd-based) answer is apparently not that hard to find...
1. People aren't tipping
2. Cab drivers aren't reporting their tips.
Being a native New Yorker, legality aside, I'd bet most cash tips go unreported. Reporting a tip makes that tip taxable, so drivers have a very strong incentive to bury it if they think they can avoid trouble or suspicion.
Then you can filter the data to show only CC transactions. There might be some additional variance of CC vs cash tipping, but I think the overall trend will still be there in just the CC transactions.
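For anyone who wants to try the card-only filter, here is a minimal sketch using just the standard library. The column names (`payment_type`, `fare_amount`, `tip_amount`), the `CRD` code for card payments, and the sample rows are all assumptions made for illustration; check the headers in the actual files before relying on any of them.

```python
import csv
import io
from statistics import median

def card_tip_rates(rows, card_codes=("CRD",)):
    """Return tip/fare ratios for card-paid trips only.

    Column names and the "CRD" code are assumed, not taken from the real data.
    """
    rates = []
    for row in rows:
        # Skip anything that was not paid by card (e.g. cash trips).
        if row["payment_type"].strip().upper() not in card_codes:
            continue
        fare = float(row["fare_amount"])
        if fare <= 0:
            continue  # avoid dividing by zero on bad or voided records
        rates.append(float(row["tip_amount"]) / fare)
    return rates

# Made-up sample rows, only to show the shape of the input.
SAMPLE = """payment_type,fare_amount,tip_amount
CRD,10.00,2.00
CSH,12.50,0.00
CRD,20.00,3.00
CSH,8.00,0.00
"""

rates = card_tip_rates(csv.DictReader(io.StringIO(SAMPLE)))
print(f"card trips: {len(rates)}, median tip rate: {median(rates):.1%}")
```

The median is used rather than the mean so that a handful of very large tips doesn't dominate the result; swap in whatever summary you prefer.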
Am I the only one that sees some significant privacy issues with exact pickup/drop off times and location being released? It seems like singling out a single passenger's data (e.g. to/from home) would not be that difficult.
Not many people take taxis from home to work every day, or even weekly. If you are the type, then likely, you are someone who lives in Manhattan (getting to work via cab in a borough is a tenuous situation) and in a dense enough area where you are one of dozens/hundreds of people who could conceivably be dropped off at your home spot (think of the density of high-rises).
The question is more inspired by someone I know who lives in Manhattan who has a psycho ex. This data would answer the question (if he were tech savvy enough to mine it) "Where does her new boyfriend live?" which is rather frightening IMO.
OK...but how would this psycho-ex track the new boyfriend down?
Presumably, the ex knows where the girlfriend lives...and I guess, he also knows what the new boyfriend looks like? So he watches the apartment until the BF leaves by taxi. The ex then notes the taxi's time of pickup. And then...
The ex waits a full month before calling up the TLC, buying a new hard drive, transferring a couple of GB, and then doing the data analysis to find that particular taxi that made a pickup within the vicinity of the girlfriend's apartment, and finding where that taxi made a dropoff?
And then the ex goes to those coordinates and...then what? Barges into one of the high-rises and knocks on every door until he finds the new boyfriend?
I think that if the psycho-ex were to act like a psycho, he probably will not do it through this kind of data analysis.
Very cool article! I'm torn on the "bring your own hard drive" issue. In one way it is very anachronistic given today's cloud technology but the flip side is that the OP was dealt zero procedural roadblocks along the way. Nobody at the city said "No." and they seemed helpful at every step. I'd tally that as a Win in today's bureaucratic and overly secretive world.
I find myself wanting to make a FOI request to my city. I have seen tricked out Parking Enforcement cars trolling the streets this year. They have license plate reading cameras mounted along the car's perimeter. I want to know if that information is stored, for how long, and who has access to it. Have any law enforcement agencies queried the database?
I would appreciate all the pointers I can get for proceeding with a FOI request. So far, I have been using MuckRock as my primary source of tutorial.
Is he allowed to post the data online? If so maybe we just need to collectively do the FOIL requests and upload the data to a community managed site where it can be made available to anyone.
S3 can serve its contents through BitTorrent as well, so you only really need to serve up 1 copy of the file (realistically, it will end up being a bit more than that if we are dealing with good internet citizens).
Anybody got cool ideas for how to visualize such a dataset? I have a similar set of data I collected, but haven't gotten beyond the "trips per day, length, etc" basics. I feel there is something beyond the most basic visualisation, on a more meta level, but am not sure what.
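One step past "trips per day" that might be worth trying is a weekday-by-hour grid, which tends to show commute and nightlife patterns at a glance once it is drawn as a heatmap. A rough sketch, assuming pickup timestamps arrive as strings in `YYYY-MM-DD HH:MM:SS` form (an assumption; the function name and sample values are made up):

```python
from collections import Counter
from datetime import datetime

def weekday_hour_counts(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count trips per (weekday, hour) cell; Monday is weekday 0."""
    counts = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, fmt)
        counts[(t.weekday(), t.hour)] += 1
    return counts

# Made-up pickup times, only to show the input format.
sample = ["2013-03-01 08:15:00", "2013-03-01 08:45:10", "2013-03-02 23:05:30"]
counts = weekday_hour_counts(sample)

for (weekday, hour), n in sorted(counts.items()):
    print(weekday, hour, n)
```

The resulting 7x24 table can be handed to any plotting tool; the counting is the only part shown here.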
Just a thought. You could do some interesting reporting on this data. What about finding all trips to a particular address, e.g. a politician's house? Or finding all taxis exiting a known crime scene?
Abortion clinics, drug clinics, psychiatrists, cancer specialists...
Now Manhattan may be dense enough it might not leak too much personal information but the same granularity of location data in a suburban or rural area may be very intrusive.
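To make the granularity point concrete, here is a hedged sketch of one way a publisher could coarsen locations before release: round coordinates to a grid and drop any cell with fewer than `k` trips. The function names, the two-decimal grid (roughly a kilometre in latitude) and the threshold of 3 are illustrative assumptions, and this kind of aggregation reduces exposure but is not a guarantee of anonymity.

```python
from collections import Counter

def coarsen(lat, lon, decimals=2):
    """Snap a coordinate to a coarse grid by rounding."""
    return (round(lat, decimals), round(lon, decimals))

def safe_cells(points, decimals=2, k=3):
    """Return trip counts per grid cell, keeping only cells with at least k trips."""
    counts = Counter(coarsen(lat, lon, decimals) for lat, lon in points)
    return {cell: n for cell, n in counts.items() if n >= k}

# Made-up drop-off points: three close together in a dense area, one isolated.
points = [
    (40.7581, -73.9855), (40.7583, -73.9851), (40.7579, -73.9858),
    (41.2034, -73.7271),
]
print(safe_cells(points))
```

In this toy input the isolated drop-off is the one that gets suppressed, which is the suburban or rural case described above.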