U.S. National Park Service API (nps.gov)
410 points by jeffennell 9 months ago | 191 comments



It's a super cool API, but last I chatted with the team who maintains this (~2020?) it was already understaffed and they only had capacity for bug fixes. The roadmap hasn't been updated since 2017 so YMMV.


Disclaimer at bottom of page:

All information provided as part of the NPS API roadmap is provided for informational purposes only, is general in nature, and is not intended to and should not be relied upon or construed as a binding commitment. Please do not rely on this information in making decisions, as the development, release, and timing of any products, features, or functionality are subject to change.

Maintenance mode since 2017 is fine. I'm planning to write the interface with jQuery UI.


The alerts endpoint is nice, though most of the rest (visitor centers, lesson plans) is static data and doesn't need to be an API; it could just be a static JSON file.

The campgrounds endpoint is also only static data. To be useful as an API, it should show campsite availability and provide endpoints to book campsites.

I was hoping to have API access to live bear collar tracking or some such. I'd love to create a system that auto-alerts me if a bear's motion path is intersecting my hike.



Do you have contact info for that team? The NPS app is one of the best I've ever used and I'd love to know who built it.


Booz Allen


Ol’ Boozy at it again!


this isn't out of 18F -- the really cool tech team at the White House(?)? what happened to all that digital govt funding?


Well, the president who launched all of that left office in 2017, and we had a new president with a bias against national parks… meaning the funding NPS did receive had to be spent on critical work in the parks themselves.


The President does not decide funding for the NPS, Congress does.


Seems like you're just nitpicking here. Congress holds the purse strings, but the President and Congress influence each other, and in the case of the 2016 U.S. election, the president had a lot of influence on Congress.


A more complete version of your rebuttal could have included the fact that the President typically submits his budget proposal for the year to Congress as an opening bid:

> The executive budget process consists of three main phases: development of the President’s budget proposal, submission and justification of the President’s budget proposal, and execution of enacted annual appropriations and other budgetary legislation.

https://crsreports.congress.gov/product/pdf/R/R47092

I just think it's important to remind people whenever possible that the United States President isn't a king, isn't all-powerful, and isn't even a Prime Minister. The role is deliberately separated from the legislative branch, and as such cannot introduce legislation of any form.


That report leaves out the unofficial step 1.5: "Pronouncement by the opposition in Congress that the President's budget is Dead On Arrival".

It's not quite true. They don't want to do all of that work, so they do generally follow the executive branch's requests. But Congress has the ability to cut anything they want -- and to add in items that the department doesn't want. (e.g. https://publicintegrity.org/national-security/congress-funds...)

Thus, the annual ritual of the dead-on-arrival pronouncement. It's especially mandatory when the President and Speaker are of opposing parties.


However, to state that about a President diminishes the reality of what happens on the ground, from party-allegiance politics to complete back-room deals. It's about as likely that a President won't sign off on something because it was introduced by the opposing party as that there's any real conflict with 'values', etc.


Pretending there's no cross-talk or influence across those boundaries is intellectually dishonest.


have you met Congress? they can barely agree on anything, so getting more funding for National Parks is an uphill battle.


It’s not like Congress needs to separately pass funding for the NPS. In 2023, the NPS was given a 6% budget increase, even!


Let's say the NPS wanted to give everyone raises in 2023 to offset inflation from 2022. I believe that would require a 6.5% increase, right? Personally, I believe the people there deserve more than just keeping pace with inflation, though.


Whoa, no: salaries aren't 100% of their budget. I'd guess the required budget increase would be substantially lower than 6.5%, even.


There are two: the USDS, which works under the office of the President, and 18F. Both are subject to the whims of the current president and Congress. They can easily be seen as a cost center and, while doing great work, are still unable to cover the entire spectrum of federal agency requirements. I am not as familiar with 18F, but if the workflow is similar to the USDS's, they're more like sophisticated consultants who help get government work done.


I’m curious how many HNers would like to add their skills and join the USDS or 18F, considering you can make hundreds of thousands more in private practice.


There are a decent number of people here in academia, which pays the same/worse.

Money is important, obviously, but it's not always the only thing that motivates people.


"at a senior level at FAANG-level companies" is the last bit of that sentence you left off.

The median software engineer salary in the US is something on the order of $120k/yr. The GS pay scale tops out at just under $160k before locality is taken into account and that's only a small single-digit percentage increase at best, but nobody is getting hired as a GS15 to do development work.

You will make more in the private sector for sure but "hundreds of thousands" is going to be the minority by far.


Agreed - I'm a state-level government employee with 2 YOE and I'm making $100k, projected to make $120k over the next three years and project managers top out at ~$240k. While that's nowhere near a senior-level FAANG engineer, the job security is nice and there's a guaranteed pension if I'm patient enough.


One thing I don't understand is why, at the federal level, there are plenty of employees in the $300k salary range, and several employees (single or low double digits in absolute numbers) make more than POTUS, but for some reason 18F pops everyone into the GS pay schedule and, it seems, puts them at level 12 or 13. So clearly there's no law preventing employees from being paid something roughly comparable to what they could get in the private sector, so it's hard to understand why the 18F folks aren't getting $150-200k/yr.

Edit: After a bit more research, there's a separate executive pay schedule (not that different from the private sector after all), but it's not super clear to this outsider what determines which schedule you get on, other than probably just defaulting to GS.


I know several people who are GS-15s and in IT - it's not uncommon at all.

Also 18F has special hiring authorities with 2 year appointments that are not restricted to the GS pay scales for the very issues you bring up.


> The 2024 salary cap for all GS employees is $191,900 per year. You cannot be offered more than this under any circumstance.

> https://join.tts.gsa.gov/compensation-and-benefits/

That is the page where you find jobs for 18F; nothing there indicates anyone at 18F is not on the GS scale.

It's important to note that that is GS-15, step 10, with the maximum locality adjustment (which I think is just Alaska, but California is not far behind). GS-15 step 1 is going to be $123k before any locality adjustment, so still very likely under $150k/yr unless you're in one of the highest-cost areas. Alaska is 32%; most midwest states are 17%.


That sounds about right tbh... funding is not great for these kinds of things :(


do you still have the contact? I have a proof of concept I'd like to share with them.


Good to know! Thanks for sharing.


A healthy diet of steady underfunding leads to humble, stable systems. We should start worrying when a team is temporarily overfunded, building out complexity in excess of their long term maintenance capability.


Is it healthy? At one point the team was funded enough to create the API, and now it can only maintain it. What if it was never funded enough to create the API in the first place? I realize there's a risk of building complexity, but don't mistake "healthiness" for "inability".


This is a really interesting point and I see a lot of truth in it. But the language is imprecise: underfunding would lead to a decaying system.


Emphasis on healthy.


I copied this data into BigQuery -- it's much more accessible this way for Jupyter notebooks and SQL queries.

nps-public-data.nps_public_data
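
For example, from a notebook (a minimal sketch; the parks table name matches the dataset above, but the column names are my guess at how the API fields map over, and queries bill against your own GCP project):

  from google.cloud import bigquery  # pip install google-cloud-bigquery

  client = bigquery.Client()  # uses your default GCP credentials/project
  sql = """
      SELECT fullName, states
      FROM `nps-public-data.nps_public_data.parks`
      LIMIT 10
  """
  for row in client.query(sql).result():
      print(row.fullName, row.states)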



I wrote up some docs; please add your positive & negative feedback by filing GitHub issues here: https://github.com/tonymet/nps-public-data/blob/main/README....


Thank you for creating this. How do you access/find it from within BigQuery Studio?



Couldn't get it to show up in search but the link works, thanks again.


Is it not showing up in the search bar? You can search by name. Let me see how to generate a link.


Love that this exists. I found that they also have a GitHub [1] with a public data archive. Any existing projects using this API in an interesting way?

[1] https://github.com/nationalparkservice


The GitHub is mostly outdated repos, although Symbol Library is still active. The developer API is great though.


I migrated this data to BigQuery. Test it out and see if it works well:

https://console.cloud.google.com/bigquery?hl=en&project=nps-...

If you like it, I’ll update the batch job to keep it fresh.


The list of national parks makes it seem like there are only 61 when there are actually 63. After comparing the list to Wikipedia (a quick check is sketched below), the reasons for this discrepancy are:

- National Park of American Samoa (est. 1988) is missing any designation

- Sequoia and Kings Canyon are listed as the same park (parkCode: seki)
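
A quick way to reproduce the count (a sketch, assuming the documented /parks endpoint, its designation field, and a registered API key):

  import json
  import urllib.request

  # Count units whose designation is exactly "National Park".
  url = ("https://developer.nps.gov/api/v1/parks"
         "?limit=500&api_key=YOUR_API_KEY")
  with urllib.request.urlopen(url) as resp:
      data = json.load(resp)

  parks = [p for p in data["data"] if p.get("designation") == "National Park"]
  print(len(parks))  # 61, not 63, for the reasons above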


FYI, there are some undocumented but public APIs if anybody needs more data. You can see them if you visit certain NPS websites.


Might be a good use case for something like this extension, which auto-builds an API spec whilst browsing a website:

https://github.com/AndrewWalsh/openapi-devtools


whoa that's sick, thanks for sharing


I think this is a funny joke about how HTTP is technically an API. However, with more and more websites fighting back against scraping (in part due to LLMs), I suspect explicit APIs are going to become increasingly important and useful.


This made me wonder why websites would be against scraping, at least as long as the content has DoS protection behind an edge cache. Aside from the copyright/etc issues.

Then it made me wonder, since AWS/Cloudflare already have ML products and already have data stored in edge caches, whether they couldn't train directly from those caches?


can you share them?


A lot of great data here too: https://irma.nps.gov/Stats/ I used it to build my National Park visitation visualization: https://jordan-vincent.com/night-under-the-stars


Your project is awesome! Nice work


Oh good. I was literally wishing for this on Saturday when trying to parse the table of which mileposts were closed on the Blue Ridge Parkway.


The NPS API is still missing a bookings API, which is the carrot that was needed for developers to create new and useful experiences.


The bookings are all done through rec.gov


I believe this is intentional... many reasons come up for why.


Does one of them rhyme with Ticketmaster?


I don't think I see visitor stats? Oh well. Historical visitor data is here: https://irma.nps.gov/Stats/SSRSReports/Park%20Specific%20Rep...


This sounds incredibly fun. I wish more government programs like this received more funding.


Their Twitter has been consistently funny. Whoever is making these decisions over there, I'd like to thank them.


Does anyone know any APIs for getting BLM land data?


I don't know, but you might want to reach out to http://www.mylandmatters.org/ - that's where I go to get a lot of land use data, so they might be able to point you in the right direction.


I think it's the ACH API: you need a routing number, an account number, and a price tag for your local republican.


I remember when I used to think only one party destroyed public lands for a paycheck....

^^ Change republican to politician

Biden granted more oil and gas drilling permits than Trump in his first 2 years in office https://news.yahoo.com/biden-granted-more-oil-and-gas-drilli...


BigQuery public datasets would be a better hosting platform for this kind of data. I worry they are not anticipating the security & budgeting issues of hosting a real-time API.

With Public Datasets, the account making queries pays for the queries. NPS only pays for the storage (which is minimal).

With this API, NPS has to pay for every call to the API. That’s not cheap.


Requiring use of a private party to access public data is usually something we discourage.


I agree with this in theory, but in practice it would be unrealistic and honestly a misuse of government funds for them to reinvent and maintain a fully in-house stack for all of their digital services.


At what cost?


They’re hosting on AWS. So either the taxpayer pays for hosting, or the customer pays.


Parent thread said:

> to access public data

Keyword is access. Hosting on AWS is an implementation detail that doesn't block the end consumer from accessing the data.


There are 4-5 other assets the customer needs in order to access it. So one more wouldn’t be a big deal.


Tell that to US government agencies publishing their announcements and other news on Medium.com.


Do you need anything other than a web browser to access it? Medium is just the server that it’s hosted on.


You have to accept the ToS and, in some cases, pay for a subscription.


Paying for a subscription only applies when the publisher has opted into monetization, which isn't the case for US government agencies.

That said, I hate Medium with a passion, and I hate that things like the Netflix tech blog are hosted there.


Private parties that the customer needs to pay to access this NPS public data:

* AWS

* Comcast for their internet service

* Apple for their laptop

* A number of software providers for their development tools.

But asking the customer to pay google to query the data is crossing the line?


This argument is nonsensical.

Site hosting is not a customer cost.

The rest you list are costs orthogonal to this service.

> But asking the customer to pay google to query the data is crossing the line?

Yes.

Why are you arguing for a US government agency to require its citizens to pay for access to data which they have already paid for by funding said agency?


Well, no, a customer has choices for most of those, because the government isn't hosting the data exclusively with a private vendor that charges the customer for access, providing an exclusive franchise to that vendor.

That was what was suggested upthread.

Requiring the user to have certain capacities to access data, where those capacities are provided by a number of competing vendors (and some by free, gratis and/or libre sources) is a very different thing.


NPS is hosting this data on AWS, a private vendor. And NPS (ultimately taxpayers) pays for every query.

So are you OK with some Chinese app company making 50 crappy NPS-themed apps and having taxpayers pay for the backend?


> So are you OK with some Chinese app company making 50 crappy NPS-themed apps and having taxpayers pay for the backend?

I will make that trade every day of the week if it means access continues to be through a standard protocol (HTTP) and not beholden to any particular vendor.


> So are you OK with some Chinese app company making 50 crappy NPS-themed apps and having taxpayers pay for the backend?

Addressed in a comment in another subthread, which I know you are aware of since you responded to it, too: https://news.ycombinator.com/item?id=39086270


Why would someone who just wants to access the data need to pay for AWS? And the rest can be avoided by using a library PC & open-source software. Or, more likely, they're already things almost everyone has on hand anyway.


Every request to EC2 costs money. TANSTAAFL


> TANSTAAFL

What?


There Ain't No Such Thing As A Free Lunch. It's an old saying meaning nothing is really free. If you aren't paying money, you're paying some other way.


> If you aren't paying money, you're paying some other way.

Taxes are, generally, money.


Who exactly am I paying for the software on my laptop?


I was hoping it was a PEMDAS for AWS costs.


What's stopping me from accessing this without an AWS account, over Frontier, on a Thinkpad?

That's the difference.


You’re thinking small. I’m thinking big. That’s the difference.


You're arguing in favor of making consumers require an account at a company that's already centralized too much of the web.

That's fundamentally a lot worse than the government paying hosting costs to one particular vendor for a commodity service.


You really aren’t. You’re talking like someone who is willfully ignorant of the decades of internet history that have preceded this conversation.

People have quite literally died over the issue of public access to public data. It’s quite an important belay point to arrest the deterioration of the spirit of open networks.


Host it as CSVs as a backup.


The majority of your bullet points would be circumvented by running your own server and developing on a Linux OS.


Where do you host this hypothetical server? How does it get internet access?


FYI, you can edit your posts for an hour. Instead of reposting, just add your new thoughts onto your previous comment?


Who makes servers?


The federal government pays Comcast to provide internet to low income households. And you can actually access this data on any old brand of laptop, or the desktop computers provided for use in most jurisdictions, and do not have to pay for any development tools to do so.


My secondhand laptop isn't Apple; my host is a Raspberry Pi, not AWS. I don't use Comcast - I have a wide choice of providers, including free ones (at my local library) - and I've never paid for a development tool.


> NPS has to pay for every call to the API. That’s not cheap.

I am perfectly fine with it being considered part of the basic, taxpayer-supported functions of government agencies to be providing the public with relevant data.

If there is a concrete abuse or wildly disproportionate cost problem in particular areas, that may need to be addressed with charges for particular kinds or patterns of access.


You might be fine, but any taxpayer expense must be justified and cheaper alternatives explored. This is someone else's money, so it is very easy to feel entitled, but every penny saved here can go into other, better things like conservation, infrastructure in parks, etc.


At what cost? REST APIs are very expensive ways for the government to make CSV data available to the public.


They are a whole lot less expensive than tracking customer usage and billing for it, and a whole lot more useful to the public than having the data nominally publicly accessible but only "on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying Beware of the Leopard." [0]

[0] Douglas Adams, Hitchhiker's Guide to the Galaxy


Not in this case; that's my whole point. They are choosing the most expensive (and riskiest) way to make CSV files available to the public.


There's likely some truth that CSV would work well here, and would likely be cheaper to operate. I wouldn't be surprised if a lot of clients are or could be doing full transfers of the data and doing their own queries.

I'd be pretty happy with sqlite dumps too.
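
For instance, a single full transfer up front and purely local queries afterwards (a sketch; parkCode is documented, the other endpoint/field names are my assumption):

  import json
  import sqlite3
  import urllib.request

  # One full transfer, then query locally instead of hitting the API.
  url = ("https://developer.nps.gov/api/v1/parks"
         "?limit=500&api_key=YOUR_API_KEY")
  with urllib.request.urlopen(url) as resp:
      rows = json.load(resp)["data"]

  db = sqlite3.connect("nps.db")
  db.execute("CREATE TABLE IF NOT EXISTS parks (code TEXT, name TEXT, states TEXT)")
  db.executemany(
      "INSERT INTO parks VALUES (?, ?, ?)",
      [(p["parkCode"], p["fullName"], p["states"]) for p in rows],
  )
  db.commit()
  print(db.execute("SELECT count(*) FROM parks").fetchone()[0])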

I don't really have an issue with the REST API, though. I wouldn't be surprised if this was just a standard, cheap-to-set-up Django + REST-library stack. Yeah, the compute costs are higher than transferring static files, but I'd be shocked if this was taking enough QPS for the difference in cost to be meaningful.

I get wanting the government to be responsible, but this veers a bit too far into Brutalist architecture as an organizational principle.


Yes, and the flip side of that is you require people to have and use Google accounts to access public data? That’s not exactly ideal.


And at what cost? Why are we paying for customers to query this data in real time?

So someone can host a ripoff NPS app on the App Store and taxpayers now pay for content hosting?


The USPS provides a free service for address validation. You have to register an account and receive an access token. If they feel your token is being used too much, they can handle it as necessary. Thinking this same concept couldn't be applied in the same way here is just a lack of imagination.

You can access it for free, but if you abuse it or break the ToS, your access is revoked. Done.


USPS is a corporation with profit and loss. I still consider this irresponsible, but theoretically I'm not paying for their largesse.


Fine, host it as CSV as a backup for the Luddites.


> BigQuery public datasets would be a better hosting platform for this kind of data. I worry they are not anticipating the security & budgeting issues of hosting a real-time API.

Then use their API to populate a BigQuery public dataset and make it available to all.

Otherwise, perhaps we, as outside observers, need to consider the possibility that those who made the decision to provide this service as such did so for reasons of which we may not be aware.


OK, good suggestion. Here you go:

nps-public-data.nps_public_data



Looks like access denied? Good work!



The following tables are now available:

parks

people

feespasses


[flagged]


> Are you new here?

Yes, I am new here. So why don't you tell me why you haven't answered this question I posited earlier:

  Why are you arguing for a US government agency to
  require its citizens to pay for access to data which
  they have already paid for by funding said agency?


There's a difference between paying for the information and paying for the expense of delivering that information. You have to pay for official copies of documents like birth/marriage/death certificates. Getting copies of public court documents costs as well, even though you can read the data for free. Toner/paper isn't free; fancy paper for certs isn't free. You can have the data, but you gotta pay for the copy.


well said


REST APIs are extremely battle-tested, easy to integrate with, and far more mainstream than BigQuery public datasets or any other niche technology that may or may not exist at some point in the future. If cost is truly an issue, perhaps the solution is to properly fund the NPS so it can make smart technology decisions.


Every REST API implementation is bespoke. What does "battle tested" mean in this sense?

Sure, the concept of REST APIs is mature, but each implementation is untested.


at what cost?


I’m not sure if you’re serious (given the spammy nature of your posts, I’m inclined to believe not), but given that REST is the de facto standard for exchange of information between machines across the internet, I think the onus is on you to estimate how much money you think the NPS stands to save by doing it your way. Then the rest of us can evaluate whether that’s a good tradeoff.


We only know, based on the information we have, that they are sharing CSV files via REST on AWS/EC2. That's the most expensive way to share it, and also risky.

What’s spammy about my post? I have asked people to focus on costs when they make general statements like “all govt data should have a REST api”.


Do you actually know that the only source of the data is a static CSV? Or is that just speculation?

I think we can dispense with the risky argument, because this API has existed for years without issue.

When you make the argument that "X is too expensive," the onus is on you to prove it's expensive in a relative sense, not simply in an absolute sense. Saving $100 matters if you're spending $1000; it probably doesn't matter if you're spending $10m. Feel free to convince us: make some estimates, crunch some numbers, and look at existing NPS IT spending and see if they seem ballpark reasonable. Otherwise you're just banging on about a left-field solution that almost no one wants because it's putatively cheaper (but by how much, you can't say).


All the APIs but one are static JSON.


CSV is also a de facto standard and is more common than REST, at 1/100 the cost.


Do you have numbers to backup the cost savings estimate? I can imagine lots of REST implementations that are really inexpensive.


Hosting CSV on S3 vs. hosting EC2 instances for an Apache REST API.


Looking at the data made available by this API, I think it's safe to say this is fine.


This is a fascinating thread under this comment. Everyone is keying off of one part of the comment (querier pays) and not the more critical issue IMO - anticipating security and budgeting issues of hosting a real-time API. You suggested an alternative and everyone is pitting the status quo against that alternative instead of maybe looking for other alternatives that help address the issue.

People here clearly don’t like a querier pays model and that’s fine. But should NPS still reinvent the wheel across the SDLC to serve this data? I think there’s a compelling argument in there.


Yes, thank you for noticing that. My bigger concern is NPS paying for expensive auto-scaling resources for what are basically CSV files that could be hosted cheaply and securely.

REST APIs are very expensive once you include compute costs, transfer fees, and the admin costs to keep them up.

Not to mention the cost of implementing a bespoke API and dealing with security issues.

All to make CSV available!


On the list of alarming or even questionable things our taxes pay for, this doesn't even make the top 100.


Start a thread on one of those; let's discuss.


I'd consider it a public transit service. We wouldn't be upset about people using shuttle buses to get to the parks, would we? I think footing the bill long term for an open platform whose principal beneficiaries use it is fine, so long as it provides a net benefit.


If you have to pay for a REST API or shuttle buses, which one gets funded?


With their API they have to write a bunch of boilerplate code to transform from their SQL db to REST. Authentication, throttling, threat prevention, encoding, etc etc.

With BigQuery they just copy the data in via CSV, and BigQuery handles the indexing & query engine.
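
To make that concrete, even a minimal read-only endpoint means owning code like this (an illustrative Flask-over-sqlite sketch; the framework and schema are my own stand-ins, not what NPS actually runs), and that's before the authentication and threat-prevention layers:

  import sqlite3

  from flask import Flask, jsonify, request

  app = Flask(__name__)

  @app.get("/parks")
  def parks():
      # Filtering & pagination parameters, capped to throttle page size.
      limit = min(int(request.args.get("limit", 50)), 500)
      start = int(request.args.get("start", 0))
      db = sqlite3.connect("nps.db")
      rows = db.execute(
          "SELECT code, name, states FROM parks LIMIT ? OFFSET ?",
          (limit, start),
      ).fetchall()
      return jsonify([{"parkCode": c, "fullName": n, "states": s}
                      for c, n, s in rows])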


> With their API they have to write a bunch of boilerplate code to transform from their SQL db to REST.

Open source tools that will present a simple, read-only REST API over an SQL db with little to no custom code exist (so do proprietary tools, often from DB vendors and sometimes as part of SQL db products). Same with NoSQL or sorta-SQL storage solutions.

The idea that they have to write a bunch of boilerplate code to do this is false. They might choose to do that, but it's definitely not necessary.

> Authentication, throttling, threat prevention, encoding, etc etc.

Again, open source canned solutions that take a little bit of configuration exist for many of those, and some of them are likely shared services that they just point at whatever service needs them.


Who says they have a SQL DB? This looks to be almost entirely static data, occasionally updated.


Whatever storage format they have, they are writing boilerplate to transform it into REST. Regardless, it would be cheaper to just ingest it into BigQuery.


It’s Apache Solr. Most of the data is static, but alerts and events get frequent updates.


Same concern about unnecessary code and compute stands.


Are you suggesting the government put their public access API behind a paywall?


This data domain doesn't need a real-time API. They could host CSVs online with some mirrors and save millions of dollars hosting this stuff.


Ah yes, I see now. Yeah, it makes no sense to offer a REST response for each request.

On that note, what would the processing entail? Processing the GET request and packaging the entire dataset into a REST object, right? Or is it a more complex API that lets you run queries against the dataset? For that matter, wouldn't downloading a CSV also have to be packaged into a REST object?


The REST API provides parameters for filtering & pagination; all of that is unnecessary. It's a few hundred MB, tops. CSV, BigQuery, anything is better than running REST on EC2.


[flagged]


> They should choose a solution that makes the customers pay for accessing the data.

They are.

The NPS's customers are the American public, and they are paying for the data via taxes (approximately; government finance is more complex than that, and taxes don't really pay for spending that's a model that really is only true when working with commodity and not fiat currency, but, its reasonable enough for this purpose.)

What you want isn't for the current customers to pay for the data, but for the consumers of the data to be viewed as customers and then charged for it. But that potentially makes things more expensive for the NPS's customers. For instance, if one of the significant consumers turns out to be other public agencies, they end up paying a direct cost that covers both access to the data and the additional billing and account-management overhead that a consumer-pays model imposes, plus the payer-side costs associated with payments as well as the payments themselves. You then end up with largely the same customers paying, but paying a whole lot of additional overhead.


The taxpayers are the patrons, not the customers.


We are both.


The taxpayers are the patrons. The API consumers are the customers. When the customers make a lot of API calls or cause security issues, the patrons have to pay.


I'd argue that this is an overly restrictive framing of the situation for a few reasons.

First, patron is just another word for customer. I understand that you're using it to distinguish between two customer types:

1. The end user (presumably a future park visitor)

2. The intermediary providing an experience to the same end users (who are presumably using the intermediary to plan a future park visit)

But this is not a formal distinction, and I think it's necessary to zoom out and look at the players involved and the nature of their relationship with the NPS.

Is it not true that "Patrons" are still likely to be the initiators of those API calls?

Is it not also true that the "Customers" exist within the same tax system as the "Patrons"? If I go to nps.gov as a "Patron", I'm directly consuming NPS resources. If I access NPS information via a site that provides an alternative experience, I (the future park goer) am still the primary beneficiary of the API call.

Let's say I build a project that calls these APIs and my goal is to help people who have accessibility needs find the parks that are most amenable to their situation. Let's say this is a passion project and I'm not doing this to make money. What you're proposing would make such a project non-viable.

And I think there's a strong case to be made that the experience described in the last paragraph will consume fewer NPS resources than navigating through NPS page after page to find what I'm looking for.

And in the end, if the information helped a park goer find a place to spend their money, this is a net positive for the NPS.


Interesting API


Why is the government cargo-culting the scourge that is API keys?

The goal of this should be for everyone to have access and lower barriers to entry, not put bureaucracy in the way of access and de facto suppress use by open source projects because each user would need their own API key unless someone publishes one.


From the site:

  The NPS Data API is open and accessible to all developers
  who wish to use NPS data in their projects.
From their "API Guides":

  Limits are placed on the number of API requests you may
  make using your API key. Rate limits may vary by service,
  but the defaults are:

    Hourly Limit: 1,000 requests per hour

  For each API key, these limits are applied across all
  developer.nps.gov API requests. Exceeding these limits
  will lead to your API key being temporarily blocked from
  making further requests. The block will automatically be
  lifted by waiting an hour.
That, along with their ToS[0], hardly seems to qualify as a "cargo-culting scourge."

0 - https://www.nps.gov/aboutus/disclaimer.htm
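
In practice the key is just a query parameter, and staying inside the default limit is easy (a sketch using the documented v1 endpoints; pacing requests at 3.6 s keeps you under 1,000/hour):

  import time
  import urllib.parse
  import urllib.request

  BASE = "https://developer.nps.gov/api/v1"
  API_KEY = "YOUR_API_KEY"  # free signup on developer.nps.gov

  def get(endpoint, **params):
      params["api_key"] = API_KEY
      url = f"{BASE}/{endpoint}?{urllib.parse.urlencode(params)}"
      with urllib.request.urlopen(url) as resp:
          return resp.read()

  # 1,000 requests/hour works out to one request every 3.6 seconds.
  for start in range(0, 200, 50):
      page = get("alerts", limit=50, start=start)
      time.sleep(3.6)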


API keys were invented as a tracking device. You sign up and then they associate all your use with one person and can do things like revoke your keys if you e.g. try to compete with the company's own products. Neither of these should be relevant to public data on a government service.

Rate limits are straightforward to implement per IP address without having any other information about anyone. The sort of person willing to bypass them by using a thousand IP addresses is the same sort of person who would sign up for a thousand API keys using fake names. How are you supposed to rate limit by API key if "anyone" can get an API key? You'd need some means to rate limit how many API keys someone could request, which was the original problem.
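
A sliding-window counter per address is all it takes (an illustrative sketch, using the same 1,000/hour default quoted upthread):

  import time
  from collections import defaultdict

  WINDOW_SECONDS = 3600
  LIMIT = 1000
  hits = defaultdict(list)

  def allow(ip: str) -> bool:
      # Drop timestamps that fell out of the window, then check the count.
      now = time.monotonic()
      hits[ip] = [t for t in hits[ip] if now - t < WINDOW_SECONDS]
      if len(hits[ip]) >= LIMIT:
          return False
      hits[ip].append(now)
      return True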


> API keys were invented as a tracking device

And that's exactly how they're used as well. They need a method to track the usage of these services because there is often a cost involved with providing them. You also need a way to block or rate limit usage that is not IP bound.

As an example, when Yr[0] opened up their APIs for free worldwide weather forecasts, it quickly spiralled out of control. I don't recall the specifics, but in short, a major phone manufacturer started using their APIs on their phones and it took down the service because of the increased load. They could have solved it by just adding more hardware - data like this is highly cacheable - but when you're dealing with taxpayers' money you generally don't want to subsidise for-profit companies. So you implement a token and tell them to implement their own caching layer on top of it, and everyone is happy.

I don't see how you'd solve something like that with anything other than a token. The methods you've mentioned in other posts simply don't work when a couple of hundred million phones ping your API every time the user unlocks their phone and the weather widget refreshes. It also creates no incentive for developers to do things right, like not checking for updates every time the user does something, even though the initial request came with a TTL and cache-control header that clearly states when the data would be updated again.

[0] https://developer.yr.no


> They could have solved it by just adding more hardware, things like this is highly cacheable, but when you're dealing with tax payers money you generally don't want to subsidise for-profit companies. So you implement a token and tell them to implement their own caching layer on top of it, and everyone is happy.

The for-profit company is happy, anyway. They get free data and you've priced the competition out of the market.

What things like this are really useful for is to create the app equivalent of weather.gov. Most for-profit "repackage government data" websites and apps are ad-laden spyware that will spin your CPU at 100% and shovel every byte of data they can hoover up into a data warehouse that sells to anyone with a buck while doing little more than displaying the government data.

If you want to create an open source one which is free and promises not to track the user, you can, but then you need the data. If you end up with millions of users, who has more resources to set up caching servers, some individual idealist with zero revenue or the United States Government?

This shouldn't even be a question. The government has to operate infrastructure that can handle millions of users for many other reasons. This should be something they're experienced in, and something like this should just fit into a slot in existing infrastructure. This is what it's for. If all you want is to provide the data for various scummy middlemen to wrap in ads and spyware then why is it an API at all instead of a static data dump / live feed with the latest changes?


> The for-profit company is happy, anyway. They get free data and you've priced the competition out of the market.

And they'll also be happy to disregard all your wishes for them to implement their own caching layer and if you have no way to block this kind of activity they absolutely will do it. As demonstrated in the example I gave you.

> If you want to create an open source one which is free and promises not to track the user, you can, but then you need the data. If you end up with millions of users, who has more resources to set up caching servers, some individual idealist with zero revenue or the United States Government?

I - as a taxpayer - am not really keen on paying for everyone to build their applications on top of it. If you create an open source application, you can always tell the users how to obtain such a token.

> This shouldn't even be a question. The government has to operate infrastructure that can handle millions of users for many other reasons. This should be something they're experienced in, and something like this should just fit into a slot in existing infrastructure. This is what it's for. If all you want is to provide the data for various scummy middlemen to wrap in ads and spyware then why is it an API at all instead of a static data dump / live feed with the latest changes?

Again - why should I as a taxpayer have to pay for that? For me, the taxpayer, the service is just as available and usable even if I have to request a token to use it. How do you propose limiting how a service can be consumed without some kind of token? We've already established that your other solutions don't work. The alternative is likely to not provide the service at all, which seems like a net loss for everyone involved: for-profit businesses, taxpayers, and open source developers.


> API keys were invented as a tracking device.

Yes, by definition.

Apparently, you did not review the "Disclaimer" link I provided. In it is the following:

  Not all information or content on this website has been
  created or is owned by the NPS. Some content is protected
  by third party rights, such as copyright, trademark,
  rights of publicity, privacy, and contractual restrictions.
  The NPS endeavors to provide information that it possesses
  about the copyright status of the content and to identify
  any other terms and conditions that may apply to use of the
  content (such as, trademark, rights of privacy or publicity,
  donor restrictions, etc.); however, the NPS can offer no
  guarantee or assurance that all pertinent information is
  provided or that the information is correct in each
  circumstance. It is your responsibility to determine what
  permission(s) you need in order to use the content and, if
  necessary, to obtain such permission.
Notice the first sentence; "Not all information or content on this website has been created or is owned by the NPS."

Perhaps there is a need for use to be "tracked" in order to ensure legal agreement to the Terms of Use?


That isn't a terms of use, it's a disclaimer. It's informing you that some of the information on the website might not be in the public domain, which is a simple factual statement that doesn't require you to agree to anything in order for it to be true or applicable.


My bad, I thought of it as a ToS. Thanks for the clarification.


Per IPv6 address? It's very difficult (impossible?) to even make IPv4-based rate limiting work.


With IPv6 you use address blocks instead of individual addresses.

IP-based rate limiting is extremely effective because it bifurcates the internet into IP addresses controlled by the attacker and ones that aren't. The attacker can only issue requests at a rate of rate limit per IP address times number of IP addresses (or IPv6 blocks) they control. Then the IP addresses under their control get denied while the IP addresses not under their control, i.e. all of the other users, are unaffected.

This only becomes a problem if they control on the order of millions of IP addresses, but then you're dealing with a sophisticated criminal organization and are probably screwed anyway.
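
Concretely, you normalize each v6 client to its block before it reaches the limiter (a sketch; treating /64 as the unit is a policy choice, not a standard):

  import ipaddress

  def rate_limit_key(addr: str) -> str:
      # Bucket IPv6 clients by /64 so one host can't dodge the limit by
      # rotating through its own block; IPv4 stays per-address.
      ip = ipaddress.ip_address(addr)
      if ip.version == 6:
          return str(ipaddress.ip_network(f"{addr}/64", strict=False))
      return addr

  print(rate_limit_key("2001:db8::1"))   # 2001:db8::/64
  print(rate_limit_key("203.0.113.7"))   # 203.0.113.7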


Yep. Very likely they are using API Gateway with Usage Plans, which is a very simple and effective way to do rate limiting and quotas.


If the provider is bearing the costs (like here) then they always need some kind of authorization, or they have no way to shut off abusers or people with misbehaving clients.

An API key is about the simplest possible way to achieve that, and appears to be perfectly adequate in this case.

What do you suggest? SAML?


> If the provider is bearing the costs (like here) then they always need some kind of authorization, or they have no way to shut off abusers or people with misbehaving clients.

HTTP is an "API" that has no API keys and all the public web servers in the world seem to manage this without any trouble.

> What do you suggest? SAML?

No authentication required by default -- it's public data. Just impose a reasonable rate limit by IP address and require registration only if someone has a legitimate reason to exceed that.


> all the public web servers in the world seem to manage this without any trouble

Incorrect. Most large web sites invest in DDOS protection e.g. Cloudflare.

Cloudflare DDOS protection as an example is a lot more sophisticated than merely counting requests per source IP (https://developers.cloudflare.com/ddos-protection/about/how-...).


Cloudflare is one of the ways they manage it.

But API keys aren't any good for that anyway because if someone is just trying to overload your service by brute force, they can send requests regardless of whether the keys are valid and still use up all your bandwidth sending error responses or your CPU/memory opening new connections prior to validating the API keys, and to avoid that you'd still need some kind of DDoS protection.

Where they actually do something is where you're doing accounting, because then if someone wants to send you a million requests, you don't block them, you just process them and send them a bill. Maybe you block them if they reach the point you don't expect them to be able to pay. But if it's a free service that anybody can sign up for as many times as they want then that doesn't do any good because the price is $0 and a rate limit per key is avoided by signing up for arbitrarily many more keys.


> HTTP is an "API" that has no API keys and all the public web servers in the world seem to manage this without any trouble.

Um, no. That’s just not true.


We're currently using a discussion forum that nobody signed up for an API key in order to make posts and you don't even need a user account in order to read. What allows them to sustain this without being destroyed by evil forces?


> nobody signed up for an API key in order to make posts

Yes you did. When you logged in, they gave you an API key in the form of a cookie that you include with every request.

And it's run at a loss by Y Combinator, which is very, very wealthy. And even Hacker News has to pay for Cloudflare and mods, on top of hardware, hosting, and traffic.


> When you logged in, they gave you an API key in the form of a cookie that you include with every request.

You can read this website (i.e. make queries against its database) without logging in. Moreover, the main thing the cookie does is not some kind of rate limiting or denial of service protection, it's assigning your username to your posts so that others can't impersonate your account. Various image boards exist that even allow you to post without logging in and they seem to be fine with it.


> You can read this website (i.e. make queries against its database) without logging in

Yeah, but the sentence I replied to was "nobody signed up for an API key in order to make posts". That claim was false. Being able to read the website is a totally different topic.


> That claim was false.

It was not. A login cookie isn't an API key. It serves a different purpose, which you can observe on the services that do have an API key and then separately require some other credentials to make posts as a particular user account.

Here's a good way to distinguish them. If I want to make my own app (in this context a web browser), do I have to maintain some intermediary servers that the app makes requests through in order to keep my, the app developer's, API key a secret from the users who are using the app? No, the user only needs their own user account, and only for the things that require a user account, and the service expects for each user to have their own account, rather than each app.


> It was not. A login cookie isn't an API key.

It was. Google "what is an api key", and the first result is

> An application programming interface (API) key is a code used to identify an application or user and is used for authentication in computer applications.

Yes, as you argue, it is indeed used to identify multi-user applications. It is also used to identify users. It is not as narrow as you thought. Learning something new is a good thing! I'll be abandoning this thread now. If you need to get the last word, go ahead. If you need a victory, then fine: I was wrong all along, you win.


Google "is a cookie an API key" and the first result is this:

https://news.ycombinator.com/item?id=39094541

Which says:

> A login cookie isn't an API key.

If the first result is authoritative then I guess that sorts it.

But your link was from this site:

https://www.fortinet.com/resources/cyberglossary/api-key

Which is confusing because it also says:

> API keys cannot be used for secure authorization because they are not as secure as authentication tokens. Instead, they identify an application or project that calls an API.

> API keys are generated by the project making a call but cannot be used to identify who created the project.

> API keys are used to identity projects, not the individual users that access a project.

Which certainly implies that API keys identify applications or projects. But it's not that confusing because when the first definition says "user" what it means in context is the application developer.

Using the same definition out of context would lead you to believe that, for example, your browser's user agent string is an API key. It's a code (i.e. symbols) that identifies an application or user (browser fingerprinting) and is used for authentication in computer applications (some sites may require you to authenticate again if your browser fingerprint changes too much). So clearly that definition is too broad without context. If you allow a loose enough definition of "code" it would make your screen resolution an API key because it can be used for fingerprinting in the same way.


> Which says:

> A login cookie isn't an API key.

You.... googled your own comment, and cited it as evidence that my google result was wrong?

I guess I'm done here.


It was the first result. Either that means it's right, and then there we are, or it means being the first result is no guarantee, and then what does that say about yours?


There's a rate limiter that kicks in if you try to post or do other things as a logged in user too fast.


That also applies when you're not logged in.


Probably either the lack of evil forces currently attempting to destroy it or cloudflare.


So we've established that it isn't API keys.


Per-IP limits don't do anything about the scenario where the API is integrated into a third-party website that sees a sudden spike in popularity. At that point, the API is providing free capacity to the third-party site. Maybe that is fine, but you seem to be ignoring the possibility.


Because it's fine. That's what it's for, isn't it? The public, via some website, is requesting the government data their tax dollars have paid for.

Which allows that website (or app) to operate with minimal resources, e.g. by a non-profit or open source project, instead of having to be a for-profit entity which needs some underhanded way to generate revenue in order to display the "free" data.


API keys are important for effective rate limiting/abuse prevention.


I don't really understand your complaints about API keys, but if you did want to make an issue of something perhaps it should be that you get your API key sent to you by email, in plaintext. Not amazing, but I guess for their threat model it's generally ok.


API keys provide a straightforward mechanism for limiting use, and for allowing clients that get lots of traction to pay for higher limits. That’s not a cargo cult, that’s just design.


Could you show us an example of a service API that you maintain that doesn't use API keys?


RSS feeds.


What makes API keys a scourge?



