Don't build services based on other people's APIs.
It's sad but true. For all the talk about mashups, it's rare to find a demo that's actually cool and much rarer to find a real application because of these problems with API limits. Sure, you might be able to build something that can handle 99% of twitter users, but the interesting and profitable 1% will blow out the API limit and then you're hosed.
So, your solution to the problem (carefully stated and with recompense offered) is: "Have a time machine"?
I guess it is a good solution, but it would require a lot of fundamental physics work so it might not match the time-frame he needs.
My point, and I apologise for the snark, is that when people have a problem and ask for help, they are not asking for judgement on things they should have done in the past; they are looking for ideas on how to move forward. If you feel strongly that people who build on APIs do not deserve this help, then perhaps consider making that point on any of the frequent "X API sucks" threads that pop up from time to time.
he is both right and wrong. relying on APIs can suddenly make the world very complex and high-pressure, but on the other hand, building/owning your complete ecosystem is sometimes not doable when you're starting out (wish I'd started Twitter, but I haven't...). so his advice, while not helpful, does have some truth between the lines.
you do have a point. it's incredibly fascinating to use an API since it instantly connects you with a large ecosystem - in this case, a large twitter universe - but once you start charging and making it a business, it becomes very much a middle-man-ish, high-pressure type of situation.
Maybe I'm stating the obvious, but you write that:
So I scale back up to 100 follower requests for the next call. It goes through. Next 100, fails. I scale down again …
So, that sounds like a cache is being primed on the first request. If this is a consistent pattern, you should be able to issue a request for one record, followed by a request for 100, and get it through most of the time (e.g. unless you run into a garbage collection cycle). If you code against this assumption, you should be able to utilise 50% of the theoretical limit of 100/hour.
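In code, the idea looks roughly like this - a minimal sketch, where fetch is a hypothetical wrapper around the actual Twitter user-details call and the two-second pause is just a guess:

    import time

    def fetch_batch_with_priming(ids, fetch, pause=2.0):
        """fetch(ids) -> list of user detail dicts; assumed to raise
        TimeoutError when Twitter times out. Hypothetical wrapper,
        not a real client method."""
        try:
            # Spend one call on a single record to warm Twitter's cache.
            fetch(ids[:1])
        except TimeoutError:
            pass  # even a failed priming call may have started the cache load
        time.sleep(pause)  # give their backend a moment to finish loading
        # The full batch of 100 should now mostly be served from that cache.
        return fetch(ids)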
sometimes, it follows a clear pattern. request 100, fail. request the same 100, success. repeat. --- that's likely because they load it into the cache once they have the results. it halves my success rate, basically.
but sometimes it's just random (depending on twitter overall traffic I'd assume).
plus, some records are faulty and cannot be fetched. that causes other issues as well and wastes the api call.
1. Can you modify your system to return results based on a subset of their followers? If this is still of value to the user, it looks like the way to go, providing results based on a subset followed by another set of "final" results based on the full follower graph.
2. If you have enough celebrity-scale accounts using your service, are you able to share their followers' details and cut down on the time used to pull them?
3. On the business side, the service sounds dangerously like something Twitter would kill if it becomes successful (by some definition of successful), because you are pulling out their follower graphs. Look at what happened to Tumblr and Instagram. While it is good to milk it while you can and build it with ambition, it is also wise to look further ahead and prepare for the day it gets shut down and revenue goes to 0. If you have nothing to lose, go ahead, but be aware.
Great, great points. Here are my responses:
1) Yes, that seems to be the only way out right now. But my core features (most popular followers / most valuable followers) depend on a full set and will not be accurate otherwise. Growth data works fine though. The main reason my service got traction was the most popular followers / most valuable followers metrics, though. So that sucks. But it seems to be my only resort right now.
2) I can get IDs faster, and then theoretically use my own DB to see whether I need details for that ID (i.e. follower details) from Twitter or already have them stored. Problem is: I am duplicating Twitter's database. Also, I had this before and the database grew to an immense size and kept crashing all the time despite efforts to avoid it. And celebrities have a wide distribution of followers, so I'm unlikely to save much time by checking whose details I already got elsewhere.
3) Yes, although Twitter recently announced their quadrants of services that will be supported by them, and one of them is social analytics, which is what I do. So that should be good. But you have a point: I can't touch any money for up to 1 year in case Twitter shuts it down, because I want to pay back the remaining unused time to my users. So it's frozen money.
Focusing on 1): if "most popular follower" means "follower with the most followers", you could try one of three strategies.
First, just check which of the top 100 (1000) most-followed users follow the celebrity - this will likely eliminate the heaviest hitters immediately.
For accounts with a normal amount of followers, do as you already do.
For the in-betweeners:

    Pick your celebrity.
    For a random 1% (.1%) of his/her followers:
        Retrieve the list of people followed by this follower.
    For each user followed by a sampled follower of the celebrity:
        Increase that user's tally by 1.
    For each user in the top 1% (10%, .1%) of the resulting tally:
        Retrieve their number of followers.
    Report the user with the highest follower count from the above.
This is, of course, based on the idea that often-followed users who follow a celebrity will also have many followers among the followers of the celebrity. Results will become more accurate as you poll more followers, of course.
(I don't use Twitter or their API, and the above may be completely wrong.)
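A rough Python sketch of that sampling idea, assuming hypothetical wrappers around the relevant Twitter calls (roughly followers/ids, friends/ids and a user lookup), and restricting candidates to actual followers of the celebrity:

    import random
    from collections import Counter

    def estimate_most_popular_follower(celebrity_id, get_follower_ids,
                                       get_friend_ids, get_follower_count,
                                       sample_frac=0.01, top_frac=0.01):
        # The three get_* arguments are hypothetical API wrappers.
        followers = get_follower_ids(celebrity_id)
        follower_set = set(followers)
        sample = random.sample(followers, max(1, int(len(followers) * sample_frac)))

        # Tally how often each follower-of-the-celebrity is itself followed
        # by the sampled followers.
        tally = Counter()
        for fid in sample:
            for followed in get_friend_ids(fid):
                if followed in follower_set:  # only actual followers qualify
                    tally[followed] += 1

        # Only spend exact follower-count lookups on the top slice.
        top_n = max(1, int(len(tally) * top_frac))
        candidates = [uid for uid, _ in tally.most_common(top_n)]
        return max(candidates, key=get_follower_count)

The accuracy/cost trade-off lives in sample_frac and top_frac: a bigger sample costs more friends/ids calls but makes the tally a better proxy.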
see, this is what I came for. I believe there's room for optimization within the limits I have to adhere to right now, and your approach seems like a stab at it - I'll think it through again in a bit to see if it makes sense. you're hitting the nail on the head in terms of where my problem is: I can't change what twitter does, and I can display partial results but that's suboptimal in many ways, so how can I make these partial results the best possible quality? this is where your answer seems to make sense - thanks. I'll think about this one for sure.
Digging further into 3): if your app is proven to be in the 'good' quadrant, have you ever considered contacting Twitter directly and requesting a higher API rate limit?
I can imagine they are ok with that if your app provides value to their ecosystem.
well, i constantly mention the issue through their bug reports and discussions; they acknowledge it and promised to talk to the team about re-evaluating the way things are handled now, but I assume that means it's not a priority.
i tried signing up through one of their partner-program forms but, as expected, no response.
With the right infrastructure and tech, the "mirror" DB shouldn't crash. And also keep in mind how many people follow a lot of celebrities… their follower lists have a lot in common!
First, you could try Datasift or Gnip, who both sell twitter data, and thus have no API limit. Not sure if you can afford it, as it does have a cost.
Second, maybe you could use the streaming API: you could get part of the data that way, and have more credits.
If users follow back, you could use the sitestream, although it's quite different to work with than the REST API.
Thirdly, if I read correctly, if Alice and Bob are both your clients, and Fred follows both, you currently collect Fred's data twice, right? I would put a cache in between: Riak, a cluster of Redis, or even S3 or DynamoDB (a quick sketch follows below).
If I can help more, send me an email (in profile)
Lastly, if you have twitter investors in your userbase, ask for an intro to talk to twitter. They see the value of your service.
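To make the caching idea concrete, a minimal sketch with Redis (redis-py), assuming a hypothetical fetch_from_twitter() wrapper and an arbitrary one-day expiry:

    import json
    import redis

    r = redis.Redis()

    def get_user_details(user_id, fetch_from_twitter):
        # fetch_from_twitter(user_id) is a hypothetical wrapper around the
        # Twitter user-details call.
        key = "user:%d" % user_id
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)  # Fred was already fetched for Alice
        details = fetch_from_twitter(user_id)
        # Cache for a day so the data doesn't go too stale, then refetch.
        r.setex(key, 24 * 3600, json.dumps(details))
        return details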
Hey! Datasift and Gnip both seem to supply conversational data, which I currently don't track. My focus is on user data, which neither seems to provide. And yes, they're expensive, wow.
The streaming API idea is a good one to get the most out of all of Twitter's data sources... someone else suggested this as well. Seems like it's worth a try.
Your Alice/Bob/Fred assumption is correct. I had a cache of sorts, through a large MySQL table, but it grew way past a couple of gigabytes and kept crashing all the time, and restoring it took half a day.
The twitter investors haven't had a chance to see any of the service since they are still waiting for results :) tough one to ask for intros ;)
The first thing I would try is rebuilding the cache. Since you only need to do key-value lookups, there are many (easy) approaches that can scale better than MySQL. You could do JSON files on a filesystem (with some nesting, as you don't want to put 20 million files in one dir). Or Redis: 20 million x 1 KB is 20 GB. You will need more with overhead, but a few machines would work. Or Riak (we went past 1 billion items with Riak).
But even MySQL should be able to handle this: we had over 100 million records in MySQL (on SSDs) before switching to something else.
That seems to be the quickest win to reduce the number of requests, but how much it helps will depend on the overlap between your user sets.
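For the JSON-files-on-a-filesystem variant, the nesting could look something like this - the path scheme below is just one possible layout:

    import json
    import os

    CACHE_ROOT = "/var/cache/twitter-users"  # assumed location

    def path_for(user_id):
        # Nest by the last digits of the id, e.g. 123456789 -> .../89/67/123456789.json,
        # so no single directory ends up with 20 million files.
        s = "%09d" % user_id
        return os.path.join(CACHE_ROOT, s[-2:], s[-4:-2], "%d.json" % user_id)

    def store(user_id, details):
        p = path_for(user_id)
        os.makedirs(os.path.dirname(p), exist_ok=True)
        with open(p, "w") as f:
            json.dump(details, f)

    def load(user_id):
        try:
            with open(path_for(user_id)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None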
ha, thanks! that wasn't my intention, but well, it helps to check whether it can withstand the load.
i feel sort of bad asking for advice here, since many have better things to do and i am the one making money with this, but i've reached a point where I don't know how to continue. and this is very odd.
I'd build it with two stages. First, a cache that holds user information (sort of replicating profiles), that is, follower count etc. This would be shared among all users, and could go into a dedicated Redis instance (and, why not, also be replicated to MySQL/InnoDB for convenience).
Then the "graph" DB (follower lists), which I'd also put into Redis. With some scripting and Redis magic, you can keep users automatically sorted (server-side) by their follower count. You'll just need a lot of RAM (get a dedicated server, look at OVH or others; cloud is usually more expensive and less reliable when it comes to RAM).
You can collect profile information before they move to 1.1 (which forces auth), to populate the global DB. Then you'd only have to fetch users' follower IDs (using the 1.1 followers/ids call), which I believe is way more reliable, and progressively, pooling queries, populate the profile database in batches of 500 or 250 users, using follower lists with user details.
This means that data can be queried dynamically without killing the server (or the servers, there should be more than one), therefore allowing for "partial results" (1M followers -> info about the first 10,000 just after signing up, for example).
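A minimal sketch of that two-stage layout with redis-py - the key names and fields are assumptions, but it shows a shared profile cache plus a per-account sorted set kept ordered by follower count on the server side:

    import redis

    r = redis.Redis()

    def cache_profile(profile):
        # Stage 1: global profile cache, shared across all your users.
        r.hset("profile:%d" % profile["id"], mapping={
            "screen_name": profile["screen_name"],
            "followers_count": profile["followers_count"],
        })

    def add_follower(tracked_account_id, follower_profile):
        cache_profile(follower_profile)
        # Stage 2: per-tracked-account "graph", a sorted set scored by the
        # follower's own follower count.
        r.zadd("followers:%d" % tracked_account_id,
               {str(follower_profile["id"]): follower_profile["followers_count"]})

    def most_popular_followers(tracked_account_id, n=10):
        return r.zrevrange("followers:%d" % tracked_account_id, 0, n - 1,
                           withscores=True)

With that layout, "most popular followers" is a single ZREVRANGE and never touches the API.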
the approach you suggest here is sensible. originally, without using redis, i wanted to use mysql to cache all user data and insert/refresh details in it over time. the table quickly grew to 20M records (with meta-data taking at least 1K of data per record, if not more), and the database grew to multiple gigabytes. twitter has 140M accounts or more now, so i'd need headroom here, although i'd likely never touch a large share of twitter users.
also, the system started making sense after a while, once I had user ids that were already cached (you are correct, the IDs I get through followers/ids, which is a much more well-thought-out function in terms of limits).
but then mysql constantly crashed, and repairing/backing up a multi-gigabyte table exceeded my technical abilities, so i gave up. so I split everything up into per-user sqlite databases that I back up to S3. i lose the ability to access a shared cache of users though, since I can't query other users' sqlite databases in a sane way to see if they have meta data for a given user id.
major problem is that I believe twitter will eventually shut me down if I duplicate/replicate their user database (and I constantly need to refresh since user data will eventually be outdated).
Well, it's intended to work with followers/ids and other calls. Twitter might go after you, but that would mean disregarding how you actually use their calls… If they pull the plug, it will be because of features, not because of how you use the API :(
you could easily apply sharding to that DB table and then scale horizontally as much as you like (just think big enough, so that you don't have to re-shard too soon..)
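To make the sharding suggestion concrete, a tiny sketch - hosts and shard count are placeholders, and the only real requirement is a stable mapping from Twitter user id to shard:

    SHARDS = [
        {"host": "db1.example.com", "db": "twitter_cache"},
        {"host": "db2.example.com", "db": "twitter_cache"},
        {"host": "db3.example.com", "db": "twitter_cache"},
        {"host": "db4.example.com", "db": "twitter_cache"},
    ]

    def shard_for(user_id):
        # Twitter ids are effectively random, so modulo spreads rows evenly.
        return SHARDS[user_id % len(SHARDS)]

Defining more shards than you have physical servers (several shards per box) is the usual way to "think big enough" and postpone re-sharding.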
I am working on a project where I also have to scrape a lot of information about users (posts/tweets/statuses, photos, friends, etc.) from social networks (G+, Twitter, Facebook, YouTube, etc.). All these limitations are really annoying. Errors are expected on almost every API call (timeouts, 5xx errors, host not found, new fields in the data structures, etc.). The scraper has to be smart enough to handle all these caveats.
What I don't understand is why these major players don't want to introduce a paid API. I am ready to pay for it, and I know a lot of people who are ready to pay for it too. But please, remove these limitations and make your APIs more stable.
I have quite a few celebrities signed up, without advertising to them, including certain inventors of Twitter itself, TV personalities, major investors and others.
And all of them (1M+ followers) have to wait up to 60 days or more to get past the login page due to a bug and limits of Twitter.
I feel there's a smart way to work around this and I have always managed to do so in the past, but now, I've hit my technical limits and need help.
I am willing to split upcoming PRO account payments 50/50 with anyone able to help me code moving forward / solve this issue.
i know, but as said, it started as a funny experiment, became a toy, and then I felt bad charging more for things i cannot influence. it'll definitely be possible to charge 19.99 a month, up to 99.00 a month for corporations - there's a lot of features I can add - but right now, it'd put me under even more pressure to charge that much. it's a messed up, weird situation.
Consider increasing the highest price level. There are people who build businesses around Twitter. They will pay more than $1200/yr for something that helps them make more money than that.
yeah well, i instantly would and I have the features/service to back it up with quality data, but as long as I can't deliver any sort of service to larger accounts (1M+ followers), it doesn't make sense to charge like that. long term, no problem. right now, twitter limits me too much.
yeah, good point for sure. problem is that I am living in Europe right now (I miss the US), and it used to be a lot easier to just meet up with someone knowledgeable from any industry back in the US. Here, it's hard to do so. And cold-emailing companies in that field might come off as me trying to drain their competitive advantage, no? I'd be curious who might be able to help out. But calling up Klout would likely not get me a call back at this stage.
and just to add: I think my particular problem is - and that's sort of the selling point - that my analysis and reports are not just growth data (which is easy); my calculations require all of your follower details to provide correct results. Twitter is sending me random follower details, so having a partial set means little reliability (for up to 60 days).
I don't know where in Europe you're currently based, but Datasift are in the UK and seem to be one of the hottest "social media analytics" companies out there. Their staff seem fairly active on twitter itself, maybe give them a try [1]
Yeah, I was about to suggest Datasift as well. IIRC I had a chat with some of them at the last SiliconMilkRoundabout & they seemed like they knew their stuff.
It does occur to me that the OP is effectively trying to replicate large chunks of the twitter datastore & that's going to be very difficult to manage! It's not like twitter themselves were particularly reliable to start with, after all.
yeah, their data records are sometimes faulty, and I need to scale down (even if it's not a time out) to 1 record per request to find the faulty one among a hundred accounts. so that sucks.
and regarding replicating, originally that's what I did. I had almost 20M of Twitter's 140M records cached - but that probably wasn't cool with them in the long run, and i was unable to maintain a database with one table holding multiple gigabytes of data.
I'm happy to answer questions in regards to Klout as far as I can. We utilize GNIP and Datasift both for different situations, but we're working on a different side of data than you are, it sounds. Feel free to email me at api@klout.com.
Also: I've helped Twitter fix two other bugs that I reported, but this one is a grey area (they don't consider it a bug but a safety measure against traffic time-outs, yet they still charge me API call credits). so yeah.
the verified account will be correct tomorrow. this is a timing bug that some users encounter where one procedure finishes faster than another (it's rendered before all the data is there, somehow). the retweets are funky because Twitter is migrating to a new API version, which I've mostly adopted. but the retweets feature will be gone in the new api version - they don't offer that data anymore. so it might not be reliable right now.
Could you not think of this as a networking problem? Say you have 10 users with more than 1 million followers and there are 100 million twitter users - what is the probability that some of your other users are following them?
Perhaps for some of the accounts that don't use much credit to fetch followers, you could also fetch the accounts they follow. Then, on your backend, you can check whether this subset shows up within a previously fetched follower list of a big user.
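Put another way, as a sketch: a cheap friends-list call on a small account can fill in edges of a big account's follower graph for free (the names below are assumptions):

    def credit_free_follower_edges(small_account_id, friend_ids, tracked_big_ids):
        """friend_ids: the accounts a small (cheap-to-crawl) user follows.
        tracked_big_ids: the 1M+ accounts you track.
        Returns (small, big) pairs you can add to the big accounts' follower
        data without spending extra API calls against the big accounts."""
        friends = set(friend_ids)
        return [(small_account_id, big_id)
                for big_id in tracked_big_ids
                if big_id in friends]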
don't know the twitter api very well, so... maybe instead of scaling back and doggedly trying to get the current record set, you simply mark the first 1000 as having an error somewhere, proceed to the second 1000, and come back to the ones giving you problems later. I know eventually you have to come back and get the broken ones, but if you manage to process 80% of someone's millions of followers, you can start digging into the other 20% a bit at a time and at least provide some value for your customers in the meantime.
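Something like this, as a rough sketch (fetch_details is a hypothetical wrapper that raises TimeoutError when Twitter times out on a batch):

    from collections import deque

    def crawl_in_batches(id_batches, fetch_details):
        pending = deque(id_batches)    # e.g. lists of 100 ids each
        parked = []
        results = []
        while pending:
            batch = pending.popleft()
            try:
                results.extend(fetch_details(batch))
            except TimeoutError:
                parked.append(batch)   # come back to the broken ones later
        return results, parked         # retry `parked` in a later pass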
I haven't actually checked out your service because i'm not really a twitter person, so maybe you do this already, but could you provide statistics based on the amount of data you've got so far? I know they would be inaccurate, but you could present them as a "moving target", based on however many followers have been scanned so far. That way the user gets a little bit of value right away.
i also tried this: for every 100-user details request I pull a random set of 100 ids from different places in a user's follower list to minimize getting stuck. it helped a bit. but the main problem is still the time-outs.
I had a similar problem when using SSL connections. I'm not sure about your data, but in my case the data was pretty much public and there was no harm in using a plain HTTP connection. This significantly improved the speed and got rid of the very frequent timeouts.
Also try enabling/disabling gzip compression for API calls.
the bottleneck is twitter's api limits; data-wise and http-connection-wise I have lots of headroom.
plain http connections to parse/spider the follower records from public pages are a no-go, since twitter then blocks the IP, and scaling that out across many IPs will eventually not end well.
This may be against twitter's ToS, but can you create a bunch of accounts with different API keys to access the api concurrently and get around the rate limits?
I'm sure twitter must account for this, but how do they? You don't need to provide much information to get an API key.
yeah no, they're explicitly not allowing this and as soon as this thing grew out of weekend-project scale, i have to adhere to any rules they're pushing out. too high a risk to be shut down and locked out completely.
@mittermayr I responded to your comment too :) There isn't much you can do about the latency at their end except try and get the most out of your API quota by making sure you don't issue a call when you know you will time out (discussed in the reply).
I would be interested in knowing other solutions that work for you.
When I worked on SocialGrapple (similar featureset to Fruji), here's what I did:
* Technically: I had a well-optimized PostgreSQL database which had a few parts: Follower graph schema with a revision id (which node is following which), it got cleaned out every N revisions; a delta schema which took the last two revision ids and diff'd them; an aggregate schema which did a bunch of queries and summarized the results every T interval; metadata schema which stored cached information about each node (updated every time that user object was fetched).
I'm pretty obsessive about query and schema optimization, and I had a comprehensive benchmarking suite which helped me consistently improve performance on bulk insertion, aggregate queries, and user-facing queries. Each job was broken down into small, efficient pieces that were executed in dependency order by my custom task scheduler, Turnip (open source at https://github.com/shazow/turnip).
I don't remember the exact numbers but I was approaching 100M rows on a single 512mb Linode.
Redis would have worked too but I would have needed much more RAM or more moving pieces to move things in and out of RAM for processing. None of my queries were slow enough to worry about this.
* Pricing: As others mentioned, higher prices make it easier to scale. I charged based on the size of the account and how many accounts you wanted to monitor (basically proxies to how many API calls you'll cost me). A small account cost something like $6/mo, 5 medium accounts $14/mo, 25 bigger accounts at $50/mo, 100 large accounts (1M+ followers) at $125/mo. I had modest revenue but I can't say my pricing scheme was perfect. I was actively messing with it towards the end.
* I had a legacy Twitter whitelisted account which gave me 10K api hits per hour. This helped me a lot. At the same time, I was careful to not become too dependent on that account in case I lost it. I was well within the boundary of normal user limits the entire time and only really used my whitelisted account to experiment or backfill new data. I made sure to always make the most efficient API calls to avoid wasting them. I too had issues with timeouts but it more came in waves when Twitter was having infrastructure issues rather than consistently. It wouldn't surprise me if this has gotten worse.
Also, I used and stitched all three Twitter APIs: REST, Search, and Streaming. It was painful.
* Diversify. Twitter is becoming an increasingly developer-unfriendly platform to build on, and your business should not be dependent on it. I added Facebook support to SocialGrapple, and I was going to add Google+ support too. Today, I'd also add app.net support. That said, the majority of my business was still Twitter, and that sucked. This was a big factor in my decision to sell out and shut it down—I didn't see the developer ecosystem as a place where you can have a sustainable business, let alone a thriving one.
I actually had several conversations/negotiations with Twitter about how they'd interpret their terms of service wrt my product. It helped to know people at the company to get a favourable ruling, but I still felt like it could be reversed—err, "provided with guidance" at any moment.
For what it's worth, I found it more rewarding to build an analytics product that was super useful for a smaller group of people than a little helpful for a lot of people (I'd say tweepsect.com is the latter). Think about where on the spectrum you want to be as this makes decisions, like pricing, easier.
Best of luck! Shoot me an email (in my profile) if you'd like more details.
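To illustrate the revision/delta idea from the "Technically" point above (not SocialGrapple's actual schema, just a toy sketch): keep the follower set per crawl revision and diff consecutive revisions to get new followers and unfollowers.

    def follower_delta(previous_snapshot, current_snapshot):
        """previous_snapshot / current_snapshot: sets of follower ids taken
        at two consecutive crawl revisions."""
        new_followers = current_snapshot - previous_snapshot
        unfollowers = previous_snapshot - current_snapshot
        return new_followers, unfollowers

    # follower_delta({1, 2, 3}, {2, 3, 4}) -> ({4}, {1})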
andrey, just wanted to say thanks for this comprehensive answer. it seems like you speak from a lot of experience, and while it's scary to read through what you had to do to survive, it all makes a lot of sense and helps me pick my battles a bit.
again, thanks for this, i am going through it again later - right now i'm responding to over 60 other e-mails with help and support, just fascinating to see this. if nothing else, we can hopefully make it clear that betting on someone else's platform provides tremendous opportunity but also introduces considerable uncertainty if it takes off.
Not sure if this is against Twitter's ToS, but since follower data is public, can you make use of unused API requests from accounts with fewer followers?
Nah, unfortunately not. All calls originate from my company account or on behalf of a user. Using other people's account tokens to scan wouldn't work either, as they are only authorized to scan their own followers.
just wanted to say thanks real quick for all this help. i've received over 30 e-mails already from random people helping me out with advice. i know it's often looked down upon in comments sections to thank the community since it provides no value, but i don't give a shit right now, i just need to say thanks, so much everyone :)
If one of your requests for followers fails at 100, does it fail for a different user as well? Instead of backing down on a user, you could try delaying it for a time period and switching to another user.
also, keeping track of when the failures occur is important, and potentially valuable information in itself.
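A minimal sketch of that park-and-switch scheduling, with a per-account cooldown and a failure log (the 10-minute cooldown is an arbitrary assumption):

    import time

    COOLDOWN = 10 * 60    # park a failing account for 10 minutes (assumed)
    next_allowed = {}     # account_id -> earliest time we may retry it
    failure_log = []      # (timestamp, account_id) pairs, worth keeping around

    def record_failure(account_id):
        failure_log.append((time.time(), account_id))
        next_allowed[account_id] = time.time() + COOLDOWN

    def pick_next_account(account_ids):
        now = time.time()
        for aid in account_ids:
            if next_allowed.get(aid, 0) <= now:
                return aid
        return None       # everything is cooling down right now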
i run up to 16 crawlers at the same time. twitter rate-limits me only per user, for the calls i issue through that user's credentials, so i can go parallel easily. not much of a bottleneck on my side.
Does it fail for an individual user, or for all users in that time period? Instead of reducing the number of followers, you may just need to wait 5-10 minutes and ask for that user again with the full 100.
pretty much what I do. my local sqlite storage per user slows things down a bit (but that's good, since Twitter is even slower). so between requests, I often give Twitter enough time to finish the request for the previous 100 and store them in its cache, so that when I re-request the same 100 (i always try twice), they are often there. but not always. it's a mix of overall twitter load, plus where/how deep down these 100 followers are stored, plus whether a follower record is damaged (happens frequently), plus other time-out factors. that's what I am complaining about - it's so hard to work around this.
doesn't work. tried this briefly at the beginning but they return error pages after a certain amount of calls per IP. and no, splitting up to many IPs isn't feasible. unfortunately.
yeah, but i don't even hit the rate limit (i check before every call); i stop 2 or 3 calls shy of it, so that doesn't limit me. but if I send an API request and that request times out and does not return data, it still costs me a precious call (and puts me one closer to the rate limit).
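For reference, that check-before-every-call bookkeeping might look roughly like this - the limit, window, and margin are assumptions, and timed-out calls are deliberately counted because they still burn a credit:

    import time

    LIMIT_PER_WINDOW = 100   # assumed per-user limit
    WINDOW = 60 * 60         # one hour
    SAFETY_MARGIN = 3        # stop a few calls shy, as described above

    calls = {}               # user_id -> timestamps of recent calls

    def can_call(user_id):
        now = time.time()
        recent = [t for t in calls.get(user_id, []) if now - t < WINDOW]
        calls[user_id] = recent
        return len(recent) < LIMIT_PER_WINDOW - SAFETY_MARGIN

    def record_call(user_id):
        # Record the call even if it times out: it still burns a credit.
        calls.setdefault(user_id, []).append(time.time())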