
Some obvious differences...

Bing sets cookies. Qwant does not set cookies.

Qwant, i.e., lite.qwant.com, prefixes search result URLs so that they point to Qwant servers. Qwant redirects www.qwant.com to lite.qwant.com when JavaScript is disabled. Bing does not prefix search results.

Bing requires sign-up in order to use their API. Qwant's undocumented API is freely accessible, no sign-up.

Qwant, i.e., lite.qwant.com, requires a User-Agent header. Bing does not require a UA header.

Example of Qwant API

   curl "https://api.qwant.com/api/search/videos?q=example&count=150&offset=0&f=xyz&t=xyz&l=en_gb&uiv=xyz"



Speaking as someone who ran operations at a search company that had an API for accessing its search results: that API will get slammed. But not in a good way.

Crooks use search engines to find pages that have exploitable JS code or SQL injection vulnerabilities. They use them to find unprotected comment sections so they can inject spam into them. They use them to build dossiers on people by scraping public information sites. And for all that API use, they never pay a dime for access. Nor will they reveal who they are in order to get access. It just isn't how they operate.

At its peak I had over 2.6 MILLION internet hosts blacklisted from using the Blekko API. Exactly zero reached out to any of the easily found contact addresses and said "Hey, your API seems to be unresponsive" :-)


Can't this be remedied by introducing a 1-second cooldown period per IP address?


Queries-per-second (qps) limits are a good start, but I quickly learned that a 1 qps limit on "free" access meant that all of a sudden there were 1000 unique IPs making one request each.
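To make that failure mode concrete, here is a minimal sketch of a per-IP cooldown and how spreading requests over many addresses sidesteps it. The class name, the addresses, and the injected `now` timestamps are all hypothetical; this is not how Blekko implemented its limits.

```python
class PerIpCooldown:
    """Allow at most one request per `cooldown` seconds from each IP."""

    def __init__(self, cooldown=1.0):
        self.cooldown = cooldown
        self.last_seen = {}  # ip -> time of the last accepted request

    def allow(self, ip, now):
        last = self.last_seen.get(ip)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling off, reject
        self.last_seen[ip] = now
        return True


limiter = PerIpCooldown(cooldown=1.0)

# One client sending 1000 requests within a second from a single address:
# only the first request is accepted.
single = [limiter.allow("198.51.100.7", now=i * 0.001) for i in range(1000)]

# The same load spread over 1000 distinct addresses, one request each:
# every request is accepted.
botnet = [limiter.allow(f"10.0.{i // 256}.{i % 256}", now=0.0) for i in range(1000)]

print(sum(single), sum(botnet))
```

The limiter only ever sees one request per address in the second case, so it has nothing to throttle.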

Lots of botnets were exposed this way: one address searches for 'joomla v2.3', the next IP searches for 'joomla v2.3', page=2, the next IP searches for 'joomla v2.3', page=3, and so on.
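A rough sketch of how that pattern could be spotted in a query log, assuming the log is available as `(ip, query, page)` tuples. The function name, the threshold, and the log format are hypothetical, and the rule is a simple heuristic rather than the detection that was actually used.

```python
from collections import defaultdict


def find_fanned_out_queries(log, min_ips=3):
    """Return queries fetched by many IPs across many result pages.

    `log` is an iterable of (ip, query, page) tuples. A query is flagged
    when at least `min_ips` distinct addresses requested it AND at least
    `min_ips` distinct pages were requested. The page condition keeps
    merely popular queries (many IPs, all on page 1) from being flagged.
    """
    ips_by_query = defaultdict(set)
    pages_by_query = defaultdict(set)
    for ip, query, page in log:
        ips_by_query[query].add(ip)
        pages_by_query[query].add(page)

    suspicious = {}
    for query, ips in ips_by_query.items():
        if len(ips) >= min_ips and len(pages_by_query[query]) >= min_ips:
            suspicious[query] = sorted(ips)
    return suspicious


log = [
    ("203.0.113.1", "joomla v2.3", 1),
    ("203.0.113.2", "joomla v2.3", 2),
    ("203.0.113.3", "joomla v2.3", 3),
    ("198.51.100.9", "weather", 1),
    ("198.51.100.10", "weather", 1),
]
flagged = find_fanned_out_queries(log)
print(flagged)
```

A real log would also need a time window, since the same query can legitimately recur over days.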

It was an annoying but interesting problem. We could implement any arbitrary policy and then watch as the bots adjusted to come in just under that policy limit. We banned entire Ukrainian ISPs (they were a big source at the time) and then saw VPN providers become the big users. We put in limits per day. I tried a "thermal" system where IPs gained "heat" from queries and "cooled" with idle time. We built a server with a "broken" IP stack that we could send the initial TCP connect to; the server would accept the connection and then never respond. A "black hole", if you will. The trick was that we didn't actually keep sockets open; we just pretended to be the other end of a TCP connection. It did everything correctly except complete the connection. That would cause any client using an off-the-shelf IP stack to hang indefinitely.
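The "thermal" idea can be sketched roughly as follows. The constants, the linear cooling rule, and the class name are assumptions made for illustration; the comment above does not say how heat was actually computed.

```python
class ThermalLimiter:
    """Each query adds heat to an IP; idle time cools it back down."""

    def __init__(self, heat_per_query=1.0, cool_per_second=0.5, max_heat=5.0):
        self.heat_per_query = heat_per_query
        self.cool_per_second = cool_per_second
        self.max_heat = max_heat
        self.state = {}  # ip -> (heat, time of last update)

    def allow(self, ip, now):
        heat, last = self.state.get(ip, (0.0, now))
        # Cool linearly for the time elapsed since the last request.
        heat = max(0.0, heat - (now - last) * self.cool_per_second)
        if heat + self.heat_per_query > self.max_heat:
            self.state[ip] = (heat, now)  # too hot: reject without adding heat
            return False
        self.state[ip] = (heat + self.heat_per_query, now)
        return True


limiter = ThermalLimiter()

# A burst of 7 queries at the same instant: 5 are accepted, then the IP is too hot.
burst = [limiter.allow("203.0.113.5", now=0.0) for _ in range(7)]

# After 4 idle seconds the IP has cooled from 5.0 to 3.0 and is accepted again.
later = limiter.allow("203.0.113.5", now=4.0)

print(burst, later)
```

Unlike a fixed cooldown, this tolerates short bursts while still penalizing sustained use. Whether rejected queries should also add heat is a policy choice; here they do not.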

It is a game with no ending as one might say.


The curl request needs a UA header as you noticed:

   curl 'https://api.qwant.com/api/search/videos?q=example&count=150&...' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0'



