About the data:
- DB Size: 543 million rows
- Data Size: 173GB (uncompressed)
- Stored in MySQL
- 200+ Million tweets from 13+ Million users
- Collected in 1 week
- Operation costs: 100+ dollars
- Rackspace Cloud: 1 CentOS server with 8GB RAM
- Java, memcache, MySQL and Perl for core processing
- JavaScript and PHP for analytics & visualization
* Download the data at this URL:
http://www.archive.org/details/2011-06-calufa-twitter-sql
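For a rough sense of scale, the figures listed above can be turned into a few back-of-envelope ratios. This is only a sketch: the tweet, user and cost numbers are lower bounds ("200+", "13+", "100+"), GB is assumed to mean 10^9 bytes, and the variable names are made up for illustration.

```python
# Back-of-envelope ratios derived from the stats listed above.
# Assumptions: GB = 10**9 bytes; tweet, user and cost figures are lower bounds,
# so every result here is approximate.
ROWS = 543_000_000          # DB size in rows
DATA_BYTES = 173 * 10**9    # uncompressed data size
TWEETS = 200_000_000        # "200+ Million tweets"
USERS = 13_000_000          # "13+ Million users"
COST_USD = 100              # "100+ dollars"

bytes_per_row = DATA_BYTES / ROWS
tweets_per_user = TWEETS / USERS
usd_per_million_tweets = COST_USD / (TWEETS / 1_000_000)

print(f"average row size:        ~{bytes_per_row:.0f} bytes")
print(f"tweets per user:         ~{tweets_per_user:.1f}")
print(f"cost per million tweets: ~{usd_per_million_tweets:.2f} dollars")
```

The average row size covers all 543 million rows, not just tweets, so it says nothing precise about the size of a single tweet record.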
I was part of the Web Ecology Project (and 140kit.com), both of which provided large Twitter datasets to researchers.