
> API Terms of Use

> Applications using the Listen API must not pre-fetch, cache, index, or store any content on the server side. Note that the id and the pub_date (e.g., latest_pub_date_ms, pub_date_ms...) of a podcast or an episode are exempt from the caching restriction.
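To make the quoted exemption concrete, here is a minimal sketch of trimming a response down to the fields the terms say may be stored. The record shape and the `cacheable_subset` helper are hypothetical, and the field list only covers the names the quote spells out (the quote trails off with "...", so the real list may be longer).

```python
# Field names copied from the quoted terms. The quote ends its examples with
# "...", so this set is an assumption and may be incomplete.
EXEMPT_FIELDS = {"id", "pub_date_ms", "latest_pub_date_ms"}


def cacheable_subset(record: dict) -> dict:
    """Return only the fields the quoted terms exempt from the caching restriction."""
    return {key: value for key, value in record.items() if key in EXEMPT_FIELDS}


# Hypothetical episode record; the real API response may look different.
episode = {
    "id": "abc123",
    "title": "Example episode",
    "pub_date_ms": 1580000000000,
    "audio": "https://example.com/audio.mp3",
}

stored = cacheable_subset(episode)
print(stored)
```

Everything else (title, audio URL, and so on) would have to be fetched again on demand under a strict reading of the terms.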

Is that.. common? I've never knowingly come across anything like that before, seems weird to me. Sort of makes sense, in a 'you must not try to avoid needing to pay us more because we want more money' sort of way, but.. really? Also, almost entirely (basically, except OSS) undetectable, surely.

[Edit: failure to read my own quote correctly, thanks xd1936] --- And if you really take it seriously - 'must not [...] store any content' - it really limits what you could even use it for, not being able to store the `id` even for a later reference. I don't think that's what's intended, but it seems to be what's written. ---

(Just so I don't sound like a grumpy old git (I'm not old, at least!) - I really really really like the docs page https://www.listennotes.com/api/docs/ only thing I'd suggest perhaps is embedding the OpenAPI 'HTML' contents below the other options, rather than it being a link to follow. Awesome though.)

Map tiling APIs do this, like Mapbox and Google. Otherwise you could circumvent all but their lowest-tier subscription plans with a brain-dead caching proxy and a large disk, which is what they want to avoid.

Amazon's API famously does this as well (or used to, it's been a while) by requiring any prices you show to be no more than N minutes out of date, forcing you to basically request on demand every time you need to show one. They'd rather you just send the traffic their way for people to see the price.
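A "no older than N minutes" rule like that amounts to a cache with a hard maximum age. A rough sketch, assuming nothing about Amazon's actual API: `FreshnessCache`, `fetch_price`, and the 60-second limit are all made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class FreshnessCache:
    """Serve a stored value only while it is younger than max_age_seconds."""

    def __init__(
        self,
        max_age_seconds: float,
        fetch: Callable[[str], float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_age = max_age_seconds
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[float, float]] = {}

    def get(self, key: str) -> float:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] <= self._max_age:
            return entry[1]  # still fresh enough to display
        value = self._fetch(key)  # too old or missing: go back to the API
        self._entries[key] = (now, value)
        return value


def fetch_price(item_id: str) -> float:
    # Stand-in for a real API call; always returns the same hypothetical price.
    return 9.99


prices = FreshnessCache(max_age_seconds=60, fetch=fetch_price)
print(prices.get("item-1"))
```

The smaller the allowed age, the closer this gets to one upstream request per page view, which is the effect described above.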

Heh, yeah. I think my reaction's still similar though - why shouldn't I be allowed to do that?

The alternative of course is to charge more per tile, or have a base 'access fee' + small incremental charge. Pay per usage doesn't work best for everything, IMO.

(And I'd likely still want to come back occasionally to check it hasn't changed, even if I cached every tile forever. (Which I probably wouldn't, if the hit rate was really low, like it was a one-off, and I'm being cheap about my API usage why wouldn't I also be cheap about my disk usage.))

> why shouldn't I be allowed to do that?

Short answer: Because that's the contract.

Companies that provide data for offline use will have a separate licensing model, usually with subscriptions for updates or perhaps a finite license term. MaxMind's GeoIP database is a popular example.

That's not really an answer though, that was the starting point.

And this isn't a one-off dataset, we're discussing an API pricing model - there will be new podcasts, existing podcasts' metadata will change; people using this API will want to make repeated calls, they just might also reasonably want to cache results.

If this were my service, I just wouldn't do pay-per-API-call, or at least not only. Of course, the free tier presents more of a problem then, but I'd probably just restrict it more making it less attractive, and have a lower entry point than the $100pcm that's a flat-fee for some but not all extra features, showing images at all (and not in free), for example.

As it is, I reckon loads of users cache results - not maliciously, just because they haven't read that they're not supposed to - and that OP has no idea (because how would they).

Pay-per-use is just the simplest and most straightforward and possibly fairest way to couple the value your API gives someone with the amount they pay in return.

Or, from the eyes of the user, they get full access to the API yet don't have to pay much if their project gets no traction.

The downside is that users can lie, but it's mainly just low-end users who would lie. Pay-per-user licenses are similar: a startup or a hackathon is most likely to share the license between a few people while larger companies are going to be honest because (1) they can afford it and (2) they don't want trouble at scale.

So you can ignore most abuse.

The problem with other payment structures for ListenNotes is that it's a relatively small database. You can clone the whole thing trivially. It doesn't even mirror/host the audio feeds. Its only value is that it put in the work of structuring and normalizing the metadata.

If you built a business on top of ListenNotes, you'd save more and more money as you grow bigger and bigger if you were simply cloning the whole thing with your own crawler. So the more value you would get from ListenNotes, the less you're actually paying them. Or ListenNotes would have to price their per-call fee so high that they could somehow capture a fair price for that value yet shut out smaller users.

Turns out "courtesy agreements" generally do work at scale as larger companies become less and less likely to lie just like they become less and less likely to pirate Photoshop.

> have a lower entry point than the $100pcm that's a flat-fee for some but not all extra features, showing images at all (and not in free), for example.

The downside of this is that now you limit what people can build on cheaper tiers. In fact maybe they can't even build their compelling product without whatever content you're paywalling behind tiers they can't afford on day 1, while the goal is to let someone build anything they want on day 1 so that they are a large end-user on day 1000.

After all, the ideal isn't that you scale value with your customer's income but rather you scale in price as they convert value into income. It, of course, is all just trade-offs.

Depending on the use case, possibly a whole lot of disks.

Right, I would assume that even just the tiles for the biggest cities alone would still be way more than most would want to store. On the other hand, let's assume on the client-side, can you not even cache a tile a user just saw 10s ago but went off screen? Or is it assumed the browser will cache that tile?

> On the other hand, let's assume on the client-side, can you not even cache a tile a user just saw 10s ago but went off screen? Or is it assumed the browser will cache that tile?

I don't know the map tile terms, but the quoted limitation for this service specifies server-side caching.

I’ve noticed similar recently with many paid book search apis out there and was also grossed out.

You’re not paying for a data source at all, you’re paying for an expensive embedded application.

I don’t see how it’s remotely reasonable. The person managing this api has stricter protections on this data (though they’re not even his podcasts) than we have on our personal data.

You're not paying for the data, you're paying for the service.

This is common. Companies that provide the data for offline use tend to have a separate licensing and subscription fee structure. Companies that provide the API tend to forbid offline caching/storage of the data.

The service, though, is 'convenient access to the data [which is already out there]'. And once I've used it, I don't need it 100/sec just because that's how frequently people are using my downstream service to do something with some popular 'trending' podcast; I'm perfectly happy (and it would be a good practice to be!) caching it for some period, until I need the service again to conveniently see if the data that's already out there has changed.

> The service, though, is 'convenient access to the data [which is already out there]'.

The service is whatever is described in the contract you agree to when you purchase it.

If you don't like the terms of the contract, you can always try to negotiate an alternate agreement. Or you can choose not to purchase the service.

The seller isn't obligated to provide their services on your terms, just as you're not obligated to purchase the seller's services on their terms if you don't agree to them.

A single snapshot of an ever-changing database is the culmination of potentially years of research, payroll, and system development that API consumers precisely didn't and don't want to do. That's what gives the dataset, and thus the API, its value.

The price that captures that value would have to be much higher in the model where you only need to access the database at some interval (let's say weekly), and that's not necessarily any more palatable.

I commend the service provider for aggregating the data and making a business - hope that person is able to make a living from it.

It’s an interesting service that I would be very interested in using in providing a service of my own. And I’d be more than happy to pay for it, but those terms are a non-starter, at least for me.

The year is 2040. There’s no running water. Grocery stores mandate that all purchased liquids must be consumed prior to leaving the premises.

The year is 2050. For some reason that nobody can remember, everybody lives in "stores."

The year is 2060. “Stores” begin synthetically seeding human life in closed environments in accordance with growth hacking best practices. Product market fit declared a solved problem.

Thing is, you provide an API so that people don't use some kind of data harvester. If your API is terrible, people resort to harvesting.

> Note that the id and the pub_date of a podcast or an episode are exempt from the caching restriction

> it really limits what you could even use it for, not being able to store the `id` even for a later reference

:facepalm: - thanks, I'll (keep it but) edit my comment to reflect that correction.

At least for the actual audio, I understand that podcasters get grumpy when people cache that server-side, because they depend on server logs to get viewership numbers for advertisers, so if a popular client downloads the audio once and distributes it to all their customers, they can't make money off any of those customers.

Podcasts also often target advertisements geographically (based on IP address, I guess?). Being able to serve to each listener is part of their value proposition to advertisers.

I worked on a food tracking PWA, and getting it to be useful offline was horrible. We’d have to hit the API at least once a day to grab commonly used foods and refresh our temporary cache. The data did not change at all... eggs don’t suddenly have a different calorie count the next time you eat them lol

A database of all of the world's foods could easily be larger than I'd like a calorie counting app on my phone to be, though. So it's not necessarily silly - network can be cheaper than disk.

IIRC, Mapbox has similar terms for both their map tiles and their geo lookup results.

