
Does not take much effort. Below is an example using curl. For reading Twitter feeds I just get the JSON and read the "full_text" objects. I have a simple custom program I wrote that turns JSON of unlimited size into something like line-delimited JSON so I can use sed, grep and awk on it, but HN readers would probably prefer jq. For checking out t.co URLs I use HTTP/1.1 pipelining.
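For jq users, extracting the same "full_text" fields could look something like this (a sketch, assuming jq is installed; the function name is mine, not part of the original setup):

```shell
# full_text (sketch): recursively pull every "full_text" value out of the
# tweet JSON on stdin, one tweet per line -- the jq equivalent of the
# line-delimit-then-grep step.
full_text() { jq -r '.. | .full_text? // empty'; }
```

Used as `full_text < 1.json > 1.txt` in place of the yy059/grep pair below.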

Usage for reading is something like (but not identical to)

   1.sh screen_name > 1.json
   yy059 < 1.json|grep full_text > 1.txt
   less 1.txt
Usage for checking out URLs is something like (but not identical to)

   unset connection
   export Connection=keep-alive
   yy059 < 1.json|grep full_text \
   |yy030 \
   |grep -E "https://t.co/.{10}$" \
   |uniq \
   |yy025 \
   |nc -vv h1b 80 \
   |sed -n '/location: /s///p' \
   |ahref > 1.htm
   links -no-connect 1.htm
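Without the custom yy tools, the redirect-chasing step can be approximated with curl reading the Location header one URL at a time (a non-pipelined sketch; expand_tco is my name for it, not the author's):

```shell
# expand_tco (sketch): read short URLs on stdin, print each one's redirect
# target by fetching headers only (-I) and extracting the Location header.
expand_tco() {
  while read -r url; do
    curl -sI "$url" | sed -n 's/^[Ll]ocation: //p' | tr -d '\r'
  done
}
```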
"ahref" is just a script that turns URLs on stdin into simple HTML on stdout.
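I do not have the original ahref, but a minimal equivalent is short (the exact markup is an assumption):

```shell
# ahref (sketch): wrap each URL from stdin in an anchor tag so the result
# can be opened in links/lynx as a clickable page.
ahref() {
  awk 'BEGIN { print "<html><body>" }
       { printf "<p><a href=\"%s\">%s</a></p>\n", $0, $0 }
       END { print "</body></html>" }'
}
```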

Alternatively, if I do not trust the URLs, I might use a script called "www" instead of ahref. It takes URLs on stdin and writes archive.org links wrapped in simple HTML to stdout, using the IA's CDX API.
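The CDX lookup is one GET per URL; a rough version of www might be (the archive.org CDX endpoint and its url/fl/limit parameters are real, the rest is my guess at the script):

```shell
# www (sketch): for each URL on stdin, ask the IA CDX API for a capture
# and emit a web.archive.org link in minimal HTML.
www() {
  echo "<html><body>"
  while read -r url; do
    rec=$(curl -s "https://web.archive.org/cdx/search/cdx?url=$url&fl=timestamp,original&limit=1")
    set -- $rec
    [ -n "${1:-}" ] && echo "<p><a href=\"https://web.archive.org/web/$1/$2\">$2</a></p>"
  done
  echo "</body></html>"
}
```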

  #!/bin/sh
  SCREEN_NAME=$1
  COUNT=500
  PUBLIC_TOKEN="Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA"
  GT=$(exec curl -A "" -s https://twitter.com/$SCREEN_NAME|sed -n '/gt=/{s/.*gt=//;s/;.*//p;}');
  echo "x-guest-token: $GT" >&2;
  REST_ID=$(exec curl -A "" -H "authorization: $PUBLIC_TOKEN" -H "content-type: application/json" -H "x-guest-token: $GT" -s "https://twitter.com/i/api/graphql/mCbpQvZAw6zu_4PvuAUVVQ/UserByScreenName?variables=%7B%22screen_name%22%3A%22$SCREEN_NAME%22%2C%22withSafetyModeUserFields%22%3Atrue%2C%22withSuperFollowsUserFields%22%3Atrue%7D"|sed 's/\(rest_id\":\"[0-9]*\)\(.*\)/\1/;s/.*\"//'); 
  echo "rest_id: $REST_ID" >&2;
  curl -A "" -H "authorization: $PUBLIC_TOKEN" -H "content-type: application/json" -H "x-guest-token: $GT" -s "https://twitter.com/i/api/graphql/3ywp9kIIW-VQOssauKmLiQ/UserTweets?variables=%7B%22userId%22%3A%22${REST_ID}%22%2C%22count%22%3A$COUNT%2C%22includePromotedContent%22%3Atrue%2C%22withQuickPromoteEligibilityTweetFields%22%3Atrue%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Atrue%7D&features=%7B%22dont_mention_me_view_api_enabled%22%3Atrue%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_uc_gql_enabled%22%3Afalse%2C%22vibe_tweet_context_enabled%22%3Afalse%2C%22responsive_web_edit_tweet_api_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22include_rts%22%3Atrue%7D"
There's no way I would use the Twitter website, as it requires enabling Javascript, and not for the user's benefit.

This solution isn't pretty but I can easily keep tabs on Twitter feeds without any need for a Twitter account, a Twitter "API key" or a so-called "modern" browser.




Can you scrape search feeds too? I.e., tweets that match a certain string?


You can scrape anything you see in the UI (and sometimes things you cannot see). Twitter makes almost no effort to stop people from using its internal APIs, which is why the claim that discontinuing the free public API is about stopping malicious bots is pretty laughable. Unless they seriously improve their ability to detect non-approved clients on the internal API, it would take any malicious actor all of a few hours to switch to the internal API for whatever they want. Honestly, I assume most bad actors are already doing this, since things like spamming were already against the ToS of the public API.


What happens if/when they block that Bearer token?


The token has been the same since at least 2020 when Twitter started using GraphQL instead of REST.

Every person visiting twitter.com is using this same token. The token is neither personal nor private.

What would be the point of changing or blocking it?


> Every person visiting twitter.com is using this same token.

It will be interesting to see whether that stays the same once they start charging for the API while leaving a huge loophole open with this token.


Twitter is not alone in using GraphQL this way, having all website visitors use the same token or key. Other websites do it, too, as shown below.

Using GraphQL like this can be an effective dark pattern: to anyone using a "modern" browser that "tech" companies control, it makes it seem as if the text of the website cannot be retrieved without Javascript enabled. That's false, but it nonetheless gets people to enable Javascript, because the website explicitly asks them to. Then the website, i.e., the "tech" company, can perform telemetry, data collection, surveillance, and other shenanigans.

Sometimes this practice might not be a deliberate dark pattern; it might just be developers using Javascript gratuitously. For example, HN search, provided by Algolia, works the same way (a shared public key, though via Algolia's REST API rather than GraphQL). HN puts URLs with pre-selected query terms and a public token ("API key") on the HN website. Everyone who uses those URLs uses the same key.

Unlike Twitter, HN itself does not ask anyone to enable Javascript. The website works fine without it, including the Algolia search, as shown below.

Usage is

   1.sh query > 1.json

   #!/bin/sh

   curl -A "" -d '{"query":"'"$*"'","analyticsTags":["web"],"page":0,"hitsPerPage":30,"minWordSizefor1Typo":4,"minWordSizefor2Typos":8,"advancedSyntax":true,"ignorePlurals":false,"clickAnalytics":true,"minProximity":7,"numericFilters":[],"tagFilters":["story",[]],"typoTolerance":"min","queryType":"prefixNone","restrictSearchableAttributes":["title","comment_text","url","story_text","author"],"getRankingInfo":true}' "https://uj5wyc0l7x-3.algolianet.com/1/indexes/Item_production_sort_date/query?x-algolia-agent=Algolia%20for%20JavaScript%20(4.0.2)%3B%20Browser%20(lite)&x-algolia-api-key=8ece23f8eb07cd25d40262a1764599b1&x-algolia-application-id=UJ5WYC0L7X"
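The hits come back as one JSON object; a jq one-liner can flatten them to title-and-link lines (the title, url and objectID fields are the standard Algolia HN ones; the function name is mine):

```shell
# hn_hits (sketch): print "title <TAB> url" per hit, falling back to the
# HN item page when a story has no external URL.
hn_hits() {
  jq -r '.hits[] | [.title, (.url // "https://news.ycombinator.com/item?id=\(.objectID)")] | @tsv'
}
```

Used as `hn_hits < 1.json`.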


Here is a non-curl version of HN search using the custom HTTP generator yy025 and h1b, an alias for the localhost address of a TLS forward proxy:

    #!/bin/sh

    export Connection=close;
    export Content_Type=x-www-form-urlencoded;
    export httpMethod=POST;
    x=$(echo '{"query":"'"$*"'","analyticsTags":["web"],"page":0,"hitsPerPage":30,"minWordSizefor1Typo":4,"minWordSizefor2Typos":8,"advancedSyntax":true,"ignorePlurals":false,"clickAnalytics":true,"minProximity":7,"numericFilters":[],"tagFilters":["story",[]],"typoTolerance":"min","queryType":"prefixNone","restrictSearchableAttributes":["title","comment_text","url","story_text","author"],"getRankingInfo":true}');
    export Content_Length=${#x};
    echo "https://uj5wyc0l7x-3.algolianet.com/1/indexes/Item_production_sort_date/query?x-algolia-agent=Algolia%20for%20JavaScript%20(4.0.2)%3B%20Browser%20(lite)&x-algolia-api-key=8ece23f8eb07cd25d40262a1764599b1&x-algolia-application-id=UJ5WYC0L7X"|(yy025;echo "$x") \
    |nc -vv h1b 80



