A few years ago I was inspired by this article: http://viniciusvacanti.com/2010/11/01/6-things-you-need-to-learn-to-build-your-own-prototype/. Now I am able to build my own Rails apps and run them in production. I am now wondering: what does it take to launch and successfully run a website with 5-10K users? All the articles I have found doing a web search were very basic.
The most difficult thing is going to be getting to 10K active users :)
These days RAM is cheap and SSD storage is also widely available. For a very long time, one of my side projects with 50K users was hosted on an EC2 small instance. With that out of the way, here are a few things you will need to take care of:
* Security (especially passwords) - Rails should take care of most of this for you, but you should ensure that you patch vulnerabilities when they are discovered. Also, stuff like allowing only key-based logins to your servers, etc.
* Backups - Take regular backups of all user data. It's also VERY important that you actually try restoring the data as well, as it's quite possible that backups are not occurring properly.
* One-click deployment - Use Capistrano or Fabric to automate your deployments (see the sketch just after this list).
* A good feedback/support system - this could even be email to begin with (depending on the volume you expect), but it should be accessible.
* Unit tests - as your app grows in complexity, you will never be able to test all the features manually. I'm not a big fan of test driven development, but really, start writing unit tests as soon as you have validated your product idea.
* Alerts, monitoring and handling downtime - Downtimes are inevitable. Your host or DNS could go down, you might run out of disk space, etc. Use something like Pingdom to alert you of such failures.
* Logging, logging, logging - I can't stress this enough. When things break, logging is crucial for piecing together what happened. Use log rotation to archive old logs so they don't hog disk space.
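For the one-click deployment bullet, a minimal Capistrano 3 setup is only a handful of lines. Something like this sketch, where the app name, repo URL and hostname are placeholders:

    # config/deploy.rb
    set :application, "myapp"
    set :repo_url,    "git@example.com:me/myapp.git"
    set :deploy_to,   "/var/www/myapp"
    set :keep_releases, 5

    # config/deploy/production.rb
    server "app1.example.com", user: "deploy", roles: %w[app web db]

After that, "cap production deploy" checks out the code and symlinks the new release as current, and "cap production deploy:rollback" takes you back one release if something goes wrong.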
> Backups - Take regular backups of all user data. It's also VERY important that you actually try restoring the data as well, as it's quite possible that backups are not occurring properly.
The part about testing your backups is huge. I can't count how many projects I've been on where we needed to restore, went looking, and only then found any number of problems. Oh, backups actually stopped last month when we ran out of space; oops, the backups only covered these 3 DBs and not the one you want; things like that. I'd also stress the importance of off-site backups. If you're using AWS for everything and your account is compromised, can they delete your backups (assuming they have full, 100% unlimited admin access to AWS)?
Which is also why if you're using stuff like AWS, Heroku, or any other third party provider (hosted Mongo, hosted ElasticSearch, Stripe, NewRelic, etc.) it's very important to ensure those passwords are secured and only the people absolutely necessary have access. Also, when offered, two-factor authentication should always be used.
> * Logging, logging, logging - I can't stress this enough. When things break, logging is crucial for piecing together what happened. Use log rotation to archive old logs so they don't hog disk space.
Depending on the service you're building, you can log too much. Consider the privacy and security implications of the existence of those logs; anything you log can be subpoenaed, but logs that don't exist cannot be.
Consider anonymizing your logs from day 1, and only turning on non-anonymous logging upon a report from a user. Alternatively, give users a "report a problem" button, and save their last N minutes of otherwise-ephemeral logs only when they hit that button.
You absolutely want to log enough to help you debug the service, but do you really need to archive old logs, or should you delete them entirely?
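A minimal sketch of that last idea (all names here are invented): keep recent lines in a bounded in-memory buffer and only persist them when the user actually asks for help.

    require "time"

    class EphemeralLog
      def initialize(window_seconds: 600)   # keep ~10 minutes of lines
        @window = window_seconds
        @buffer = []                        # [ [Time, String], ... ]
      end

      def log(line)
        now = Time.now
        @buffer << [now, line]
        # drop anything older than the window; nothing is written to disk
        @buffer.shift while @buffer.any? && (now - @buffer.first[0]) > @window
      end

      # Only called when the user hits the "report a problem" button.
      def dump_for_report(io)
        @buffer.each { |at, line| io.puts("#{at.iso8601} #{line}") }
      end
    end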
> * Logging, logging, logging - I can't stress this enough. When things break, logging is crucial for piecing together what happened. Use log rotation to archive old logs so they don't hog disk space.
+1. You can't log too much. The user who claims an important email never arrived - does your system say it was sent? The bug 3 users have reported yet no one can reproduce - what were they doing at the time, and what else was going on?
No, I'm not at that stage yet (of effectively being able to rewind application state in the log files to see what was going on), but for debugging issues in production it's exceedingly useful.
Getting loads of core services out to third parties is really wonderful for logging. E.g. if email sending happens in Mandrill, then you never need to write decent logging for it yourself, and you have a reliable source of truth!
This brings up a tangential problem I've yet to solve: how do you warn that something didn't happen when it should?
E.g. you have a script that does backups. You log the script's output, but one day something fails and the script is no longer executed.
Some form of dead man's handle is needed; the only way I can think of is to set up a monitoring service to check your log store for these entries every X hours.
I've had this same issue over and over again in my career.
I've toyed with the idea of writing a daily "sanity checker" in crontab that verifies various concepts of system health.
Examples: Did the latest batch of data transfer to S3? Did we delete old customer accounts today? Did we get any signups (because if not, something may be broken, but not triggering an exception report etc)? Did we send out daily report emails?
But I could see this easily becoming a pointless exercise, and I doubt I'd have the time to keep the sanity checker updated with the latest requirements. In fact, the sanity checker would probably become insane pretty quickly.
Perhaps the platform itself should do this for you, in some way. Idea: while coding, indicate that a given procedure should be running periodically (via some annotation, say), and then the system would log every time it occurs, with a generic task that runs periodically and scans for things that should have occurred but haven't in a while.
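Something along those lines, as pure pseudocode (should_run_every is a made-up helper, not any real API):

    # purely hypothetical: registers an expectation with the monitoring side
    should_run_every "24h", as: :purge_deleted_accounts do
      PurgeDeletedAccounts.run!
    end
    # if no ":purge_deleted_accounts ran" log entry shows up for >24h, alert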
I have always heard the opposite: that too much logging is as bad as no logging. I see the point of having the logs to be able to find out what happened, but what happens when there's so much logging that the information you need is just buried in a huge amount of noise?
This was true before Splunk. If you logged too much, your logs could start to outstrip the assumptions behind your log rotations and cause trouble. Now the common wisdom is to just log everything so you can Splunk it later if you have a problem. Verbose logging + Splunk have made production incident identification so much easier than it used to be.
Splunk DOES charge by the GB, but it's not very expensive in the long run.
My favorite systems to work with are the ones with overly verbose logs, where the overly verbose parts were clearly tagged and could be filtered out. Generally, we would never look at the verbose lines, and even when we did, we would normally have some idea what we were looking for, and be able to filter somewhat for it.
I'd actually argue it is possible to log too much if you aren't using good tools to make your logs easily searchable. Which is why you should use such tools if at all possible. Otherwise the logs can become so big that finding the entries for that bug or that email becomes pretty much impossible. This is also why it's important to take a few minutes and think about what you're logging and how. Things like request and user IDs can be invaluable. My test is usually "if I have nothing but this error message/log entry, do I have enough to begin investigating?". This is hard to get right until a bug or problem occurs and you actually need to use the logs but investing a bit of time into it can be a life saver.
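On the request-ID point: if you're on Rails (as the OP is), tagged logging gets you most of the way there with one line. A sketch, assuming a reasonably recent Rails (4.x):

    # config/environments/production.rb
    # Prefix every log line with the request id and client IP so a single
    # request can be grepped out of the noise.
    config.log_tags = [ :request_id, :remote_ip ]

Rails also returns the same id in the X-Request-Id response header, so a user's bug report (or your exception tracker) can be tied straight back to the relevant log lines.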
> Logging, logging, logging - I can't stress this enough. When things break, logging is crucial for piecing together what happened. Use log rotation to archive old logs so they don't hog disk space.
How do most people manage activity logs? Currently what we have set up is that the user id (if the user is logged in), IP address, URL they hit, user agent, and timestamp are all inserted into an activity logs table. For one particular site with an API that's being polled, the size of the DB grew pretty large.
Off-machine logging. There are commercial services (we're using Papertrail but there are tons of them), roll-your-own solutions (Elasticsearch-Logstash-Kibana), and simple solutions (syslog).
For an easy and simple solution, spin up a second instance and send logs to it via rsyslog over a private network interface. Most mature frameworks provide a method to send logs over syslog. It's UDP and very lightweight. Another plus: if you are compromised, you have another server with your logs and that server isn't running your vulnerable app.
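As a concrete sketch of that setup for a Rails app (the app name and the log host's private address below are placeholders):

    # config/environments/production.rb -- send app logs to the local
    # syslog daemon instead of a file
    require "syslog/logger"
    config.logger = Syslog::Logger.new("myapp")

    # /etc/rsyslog.conf on the app server -- forward everything to the
    # log box over UDP on the private interface:
    #   *.* @10.0.0.2:514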
We have a high logging load (we log every request) due largely to IRS requirements. I've been really happy with it over the past 6 months, but something that cannot be overstated is that you'll really need to become familiar with each one of the technologies used, as each requires its own setup and configuration. Not being familiar with any of them, it took me a solid 3 days to get to where the whole thing was usable and performant. Troubleshooting it is a breeze, and the whole system scales really easily, but a lot of that time was invested up front.
Logging every hit will always require a lot of space. But there are some tricks you can use to "compress" it: hash long strings like the URL and user agent and store the hash as binary instead of a string. A 100+ byte string can compress to just 16 or 32 bytes depending on the hash you pick. Store the hash lookup in a separate table.
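A quick sketch of that trick in Ruby (the column/table names are made up):

    require "digest"

    url = "https://example.com/some/very/long/path?utm_source=whatever"
    ua  = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537..."

    url_key = Digest::MD5.digest(url)   # 16 raw bytes vs. 60+ for the string
    ua_key  = Digest::MD5.digest(ua)    # store in a BINARY(16) column

    # The activity_logs row stores only url_key / ua_key; a separate
    # url_lookup (url_hash, url) table holds each distinct string once.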
I often found myself falling into the "I'm not using PHP, so I don't have to worry about any security holes" trap. CSRF is something you really need to watch out for if you are constructing forms manually!
No bashing intended. I started off writing bad applications in plain PHP that were full of security holes, and moved on to writing better applications in Python with the help of frameworks - at the time I didn't realise there were helper frameworks for PHP too and thought Python was infallible.
Nailed it with the first sentence: it takes 10k users to run an app with 10k users! And that will be, by far, the hardest part.
The technical stuff will be pretty much trivial. Any decently constructed app on any decent framework (Rails, etc) on any decent host (AWS, DO) would be able to handle a 10k user app (probably maxing out at 1% online at the same time) without breaking a sweat.
And you will have plenty of time to build out the tech because it will probably take you many months to get to even a few thousand users (depending on what kind of app it is, of course).
That's a good overview. I would also not forget about the type of application. Like you said, resources are pretty cheap these days; maybe leveraging the Amazon cloud and integrating with other applications could do the heavy lifting that is normally required. If someone can get to 10k active users, they should also make a note of possible scaling issues down the line.
Also, I think one should take reliability into account as well. If you have only one app server, it will fail eventually, and you'll find yourself trying to ssh into the server at 2am. Have at least a few app servers load balanced, 2 db servers replicating with auto failover, etc.
Not sure if this goes without saying, but a huge one you missed is some sort of VCS. Use what you know and be sure to commit early and often. It can seem like a hassle to begin with, but once it saves you, you'll be very thankful you're using it.
good list.
I just want to add more voice to the unit testing part. It's natural to feel that testing will slow down the development process, but once you've got some unit tests for the core parts of the app, it feels awesome and the developers will have a lot of confidence in the product.
TDD speeds up development, especially when adding features. Without full test coverage, you're relying upon manual regression testing to catch bugs. Once you have actual users, that could be catastrophic to deploy broken code without even realizing it.
Ironically, it's a completely different story on the consumer market. I bought my two 4GB sticks about 2 years ago and now they cost twice as much. 4GB can cost you $40, which is not cheap at all.
That's totally negligible compared to the cost of the rest of the machine. Imagine: $160 will get you 16GB!! That's an absolutely enormous amount of memory; most power users would be more than satisfied with that.
Not all that long ago that amount of RAM would cost more than a brand new mid sized car.
Despite what the smart engineers of HN are going to prattle on about running a web app, it's not about systems infrastructure. It's about people. 5-10K user records in the DB, relatively speaking, is a small list and can be kept on a single pretty minimal server instance.
10K user records is not the issue. It's dealing with the humans who use the app on a day to day basis.
Typically getting only a small fraction of your user base to be active in the app is pretty challenging - if you can acquire them in the first place.
That said, having even a few hundred active users can tip the scales in terms of what is manageable, depending on what the app does and whether they're paying money or not. Customer support can be a full-time job or worse. In the early days your users will discover every bug and problem imaginable.
Biggest mistake I ever made was scaling up an active user base on a free product without a revenue model. Twice I managed to hit a sweet spot in acquiring active users but because I couldn't leverage the scale to achieve anything other than more work for myself, I burned out and it collapsed very quickly. If you make more money as you grow, you can afford to invest in delegating responsibilities or at least justify it. Otherwise you've got a very stressful hobby on your hands..
Quick add-on edit:
If you're launching a web app for the first time, the biggest takeaway you should get from the comments on this thread is to anticipate that customer support will be a major challenge.
One of the best ways to prevent a flood of CS inquiries is aggressive logging and alerts to squash bugs or outages before they inconvenience too many users. Lots of great comments in here cover that point, so take notes.
I have zero costs though, since it's serverless and all hosted on Google and Github's infrastructure.
For privacy reasons I have made the deliberate choice to not include any log reporting tools or use any tracking services that could possibly make development a whole lot easier.
Thankfully I get some support from a couple of other people that mostly help with small dev tweaks and supporting users, finding out what goes wrong and providing excellent feedback.
I can really recommend setting up a subreddit so that people can help themselves and others, and you can provide some feedback where needed.
All in all the load of user support has been quite easily manageable, though now with Trakt.TV's debacle with the new API there is definite pressure to fix things.
I completely agree with this comment in every point.
However, I'll add that in my experience free customers are, by far, the worst for customer support, and as such rwhitman probably got burned disproportionately. A percentage of free customers demand straight-up magic, and get loud in public if you don't deliver. People who are paying, particularly paying significant amounts of money, expect their money to be well spent and results in proportion to what they pay. I.e., they tend to be reasonable.
Still - if you were not expecting support to be a major part of this experience - you should be now.
I run Minimal Reader [1] (which has about 5k users) and I can say that running a web app like this takes a lot of time and energy keeping everything running smoothly.
My service has a lot of moving parts, all of which are distributed among a couple dozen different servers. Keeping the technical infrastructure running smoothly requires a lot of data visualization of server stats, database stats, web request stats, worker stats, user stats, etc. I have everything piped into a nice dashboard so we can see if there is anything odd happening at a glance. When things break (and they will) you need to know where to look first.
Having 5k users also requires time to help them with support issues. Users generate a lot of bug reports, questions, and suggestions. To keep paying users happy, I offer a 1-day response time on support issues, which requires me to spend quite a bit of time sending emails.
Then, of course, if you want to grow the app, you need to spend time marketing it. We could talk for hours about this.
The list goes on and on. Feel free to shoot me an email (email in my profile) if you want to talk specifics about anything.
Not overkill as servers are not equal and different apps require different infrastructure. The bulk of my servers are small worker servers that go out and fetch feeds. Parallel downloads, high network throughput, low CPU/memory.
Most of the servers are workers which fetch feeds. Parallel downloads across a number of low-level commodity servers. Additionally, there's a master and slave Postgres database, a small Cassandra cluster, 2 redis boxes, an elasticsearch cluster, front-end servers, caching servers, etc. Shoot me an email if you want to talk in more detail.
One server to serve an RSS reader to the 5k users, 23 others to host 23 different stats and monitoring applications to make sure that one server is working OK?
This looks like a really slick app - congratulations! I'm actually working on an RSS reader at the moment with a few similar philosophies (respecting privacy, lightweight design etc.). There's a lot of competition in the RSS reader space since the sad demise of Google Reader - how did you market the site and get your first users?
I agree, I haven't seen the "Wait in line for a free account" before. Can you describe this feature in more detail? For example, how long does a user have to wait, does it allow in more users as the size of the queue grows, etc.?
It is usually set to a fixed number of users per hour with a minimum wait time. Keeps the server happy when I get HN'd/Reddit'd/etc. (Does anybody get Slashdotted anymore?)
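For anyone curious how that might look in code, here's a rough sketch of the idea (model and method names are invented, and not necessarily how this site does it): an hourly job that admits a fixed batch of the oldest signups, skipping anyone who hasn't waited the minimum time yet.

    ADMITS_PER_HOUR = 50
    MIN_WAIT        = 15.minutes   # ActiveSupport

    # run hourly from cron / clockwork / whenever
    def admit_waiting_users!
      Waitlist.where("created_at <= ?", MIN_WAIT.ago)
              .order(:created_at)
              .limit(ADMITS_PER_HOUR)
              .each do |entry|
        entry.promote_to_account!          # hypothetical: creates the account
        SignupMailer.invite(entry).deliver # and emails the invite link
      end
    end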
Thanks. I've signed up for a couple of services that do that. It seems to work out even though some people get a little annoyed that they have to wait.
It's relatively simple to create a server that can handle the traffic as long as your app is something simple. The 10k users you're talking about I'm assuming is monthly active users (meaning they visit your site once or more per month). I have a website (yes, a "web app") with 120k active users per month. It's running just fine on a relatively small server.

To get users, you usually need to give them something to come back to: something they feel they can't do without, maybe something that only you can provide, and something that makes them feel good and causes them to keep coming back to you to get their dose.

If your question is more about technical details, then I'd just suggest not worrying about that until you have to, i.e. your server is crapping out right before your eyes - that's a good problem to have. However, one mistake that I have made is to eventually give up on something that I may not see the immediate value in just b/c it's a bit too difficult to maintain. So, having said that, don't waste a second optimizing something that no one's gonna end up using in 3 months. Everything else is really just noise.
That's not a very meaningful metric on its own. 5K users per day? Per month?
How's the distribution of traffic? Do people use it spread out over the month or mainly within the last or first days of the month? Do they use it on work days or throughout the week? Are they from different time zones?
What do they do? Is there a lot of write activity or is it mainly read? Is the read stuff cacheable between users or is it highly individualised. etc. etc. etc.
Obviously this depends a lot on your application's usage patterns, but in my experience you should be able to comfortably run a web app with 30k _daily_ users (the total number of registered users could be 10x-100x that; it depends) on a single $200-300/mo AWS machine (with hourly payments; you can get a better rate by prepaying). Get a bunch of cores and plenty of memory and put both your web app and the DB server on the same box for ultimate simplicity. Postpone scaling efforts for as long as humanly possible, probably until PMF and accelerating growth.
With reasonably "low level" tooling such as Java/Clojure/Haskell/whatever and a properly configured Postgres instance you should be able to go quite far. You're very unlikely to be CPU-challenged in the web app (again, no idea what your web app is going to be doing, so it's just a guess), most of the memory and CPU will be consumed by your database server caching and running queries. You should be able to handle a good 500-1000 db transactions / sec without much hassle.
IMHO most of the challenge will be making something that 10k people will want to use daily, not actually being able to scale to that many users.
I run Plunker (http://plnkr.co), which gets up to 20k daily active users. Users hammer the preview server[1] all day long, generating a significant number of requests (by default it refreshes at each keystroke, debounced by 500ms or so).
That server runs happily as a single servo on http://modulus.io with absolutely no need for intervention on my part.
The rest of the application has similar requirements. I have one micro-equivalent server running the front-end, one the api and one the thumbnail generation. In general, this requires no hand-holding by me.
If your site is not processing or memory-intensive it should be feasible to scale to 10k users with a single $5/month instance on DigitalOcean or an equivalent level server on Heroku or Modulus or GCE.
PlainSite (http://www.plainsite.org) has about 12K users per day on average. Most of them are just browsing for cases, not paying, but some are paying.
The main stress on the system is really determined by the complexity of the SQL queries on each page. I've spent a great deal of time optimizing them, and I know there are certain ones that need to be further optimized. I have the database (MySQL) on one server, the web server and documents on another, and static resources such as images on a third, which probably isn't even necessary. All three servers run Linux and the database server has 48GB of RAM. They're hardly new; you could buy all of this equipment today for under $1,000 total.
The biggest technical bottleneck is really RAM; the biggest expense for this kind of site is bandwidth.
I have http://ficwad.com/ sitting around, with Google Analytics telling me it gets daily users in the upper end of that range. It runs on the cheapest plan webfaction offers (and I'm making it even cheaper with some affiliate credit...). The only place where it runs into issues is email, for which I had to write a little queue system that throttles sending to keep it under the plan's daily limits while still making sure that the important messages go out first.
I could make it fancier and put it on pricier hosting if I bothered to monetize it in any way.
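That kind of throttling queue doesn't need to be fancy. A rough sketch of the idea (the models and the limit are made up, and this isn't how ficwad actually does it):

    DAILY_LIMIT = 1_000   # whatever the hosting plan allows

    # run every few minutes from cron
    def drain_mail_queue
      sent_today = SentEmail.where("sent_at >= ?", Time.now.beginning_of_day).count
      budget = DAILY_LIMIT - sent_today
      return if budget <= 0

      # important mail (password resets etc.) first, digests last
      QueuedEmail.order(:priority, :created_at).limit(budget).each do |mail|
        deliver(mail)   # however delivery actually happens
        SentEmail.create!(sent_at: Time.now)
        mail.destroy
      end
    end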
I have a little over 10k DAU for http://cronometer.com/ and it runs on 2 fairly small cloud instances + a cloud database. A single host can handle the load fine, but I keep 2 running to survive failures.
And it took me nearly 4 years to get that many users. We can’t all grow like facebook!
True, it started as a hobby, so it’s been fully bootstrapped and I’ve spent less than $2500 on actual marketing to date. Advertising in the Health / Diet space is super pricy, the ROI still doesn’t make sense. CPA >>>>>> LTV
Are you talking about 5-10k simultaneous users? It's very subjective depending on the usage profile of your site. The best thing is to pick a service like AWS/Heroku/Cloud Hosting Service that allows you to grow your offering. After you get to a point of profitability, you can look at greater efficiencies like setting up your own hosting hardware depending on your needs, if that makes sense. Our company has grown to a point where we are looking at implementing our own multi-datacenter cloud to save on hosting fees. But that's a good problem to have :D
In my experience running https://thisaintnews.com the hardest part is getting and keeping users. It would probably be easier if I did some marketing, but that's too much like work! Still, it's not doing bad for something that's been around for 6 years, and hasn't been worked on in a year.
The second problem is motivation, after a certain amount of time, it becomes far less fun and much more of a burden, at which point you have to decide if you'll power through, give up, or quit totally.
The rest is just a software/hardware problem, and easily dealt with when needed.
As for the load, it's not that busy, but not that quiet for what it is (http://stats.thisaintnews.com), and it runs on a cheap server from http://www.kimsufi.com/uk/ with a Xeon(R) CPU E3-1225 V2 @ 3.20GHz, 16GB RAM, 2x 1TB HDD, unlimited bandwidth and a 1Gbps link. It only costs about £25/month IIRC.
Okay -- "what it takes" is a pretty broad question, but I'll do my best to put some thoughts together. I run a few web applications, but the most popular one is http://sleepyti.me -- with about ~50-60k views per day.
- A reliable hosting environment. I currently have a Linode VPS (basic $20 package, with $5 monthly backups) that runs http://sleepyti.me, my personal web site, an IRC server, a Mumble server, and a bunch of other stuff -- it's not even close to being maxed out resource wise, even with all the constant traffic the site is getting. It's important to remember that consistent network connectivity is a really important aspect here: a 30-minute downtime during peak hours can easily lose a lot of users. I'd say Linode is great, and I'm very happy with their service, but I also host several Sinatra web applications on a Digital Ocean VPS that only costs $5 per month (although I do my own backups, rather than using their service). I've noticed zero load-related performance impacts. Clearly, though, there is a limit to how far that can scale.
- A production web server. This probably goes without saying, but a lot of webapp developers are used to just working on their own dev environment. For my apps, I use nginx (and thin, when necessary).
- Security. Make sure that you have the basics of application security covered in your app itself. OWASP produces some pretty great "cheat sheets" that can help out in this area. Furthermore, make sure that your server is updated frequently, using SSL correctly, etc. I work in information security -- please believe me when I say that getting hacked is not something you want to deal with when you're trying to grow.
Hope this helps, and good luck with launching your apps!
Since the article you point to is about programming, I'd assume you're asking about the technical side of things rather than actually acquiring 5-10k users (which is very difficult). It's an interesting question, but as you have it phrased, it's similar to "how long is a piece of string?"
Firstly, the load of a web app is going to be dictated by what the app actually does.
Also, 5k-10k users should be clarified as to whether you mean total users or concurrent users. Capacity testing can actually be tricky - figuring out how the number of users equates to actual hits to your servers.
As an example, we have nearly 50k accounts but on average only a few hundred are using the service at the exact same time. I would guess that our app is fairly complex compared to the average app. We run 3 app servers, 1 DB master, 1 DB slave, and 2 cache servers. Our monthly hosting bill is around $1,200.
1. First off, don't think too much about what programming language/framework to use. For such a user base you can use mostly anything and it'll run smoothly on a low-spec server. Using Ruby and Rails for a website with 5-10k users (assuming they're active users) will definitely not cause problems or empty your CC.
2. Do not invest time and/or money in learning another programming language or framework until you are sure that for a specific component of your product, programming language X will perform at least 2 times better with 2 times less HW resources.
3. To stress the point about the app stack again (I saw some really pushy comments about changing the programming language): it is rarely the bottleneck of a web app. You'll scale your storage stack way earlier and more often than the app stack.
4. Know your data. That's how you decide if it's better to use a RDBMS, document store, k/v store, graph database etc. Like I said before, you're going to scale your data before any other layer becomes a problem so choosing the right data storage solution is crucial. Don't be afraid to test various storage solutions. They usually have good -> great documentation and ruby tends to be a good friend to every technology. There's a gem for everything. :)
5. Scale proportionately to your business/product growth. You will have to scale at some point. But be careful to scale in proportion to your growth. For example, if the number of users will double, get the hardware that suffices for that growth. Too few HW resources will lead to a slower user experience and thus user dissatisfaction. More HW resources than needed will increase your costs, and the resources that are not needed will sit unused. Why waste money?!
These are my 2c. As your business gets bigger - and I hope it does - other problems will occur. But usually this advice will hold up to around 100k users.
Disclaimer: this is for a generic web app as you didn't give us any details. Depending on the app, some of my points might be inaccurate or invalid.
I think you're asking a capacity planning question. To the extent that you're asking a capacity planning question, for most apps in the Rails sweet spot (CRUD which takes data from users, munges it, and stores it in the database, perhaps interacting with a few external APIs and making occasional use of more advanced things like e.g. taking uploads and storing them in S3 prior to processing by queue workers), you're going to not even notice the day that your user count hits 10k.
If you back out the numbers, they go something like this: eight hour work day, worst hour has 25% of the user base actively logged in (we'll assume it is a very sticky app), 10 significant actions per hour implies 25k or so HTTP requests which actually hit Rails, which is less than 8 requests a second. You can, trivially, serve that off of a VPS with ~2 GB of RAM and still have enough capacity to tolerate spikes/growth.
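Spelling that arithmetic out:

    users       = 10_000
    active      = users / 4        # worst hour: 25% of the base logged in
    requests_hr = active * 10      # ~10 Rails-hitting requests per user-hour
    # => 25_000 requests/hour
    requests_hr / 3600.0           # => ~6.9 requests/second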
Let's talk about the more interesting aspects of this question, which aren't mostly about capacity planning:
Monitoring: Depending on what you're doing, at some point between 0 users and 10k users, the app failing for long periods of time starts to seriously ruin peoples' days. Principally, yours. Depending on what you're doing, "long periods" can be anything from "hours" in the general case to "tens of minutes" for reasonably mission-critical B2B SaaS used in an office to "seconds" for something which could e.g. disable a customer's website if it is down (e.g. malfunctioning analytics software).
I run a business where 15 seconds of downtime means a suite of automated and semi-automated systems go into red alert mode and my phone starts blowing up. I don't do this because I love getting woken up at 4 AM in the morning, but because I hate checking my inbox at 9:30 AM in the morning and realizing that I've severely inconvenienced several hundred people.
You're going to want to build/borrow/buy sufficient reliability for whatever problem domain it is you're addressing. I wouldn't advise doing anything which requires Google-level ops skills for your first rodeo. (There is a lot to be said for making one's first business something like a WordPress plugin or ebook or whatnot where your site being down doesn't inconvenience existing customers. That way, unexpected technical issues or a SSL certificate expiring or hosting problems or what have you only cost you a fraction of a day's sales. Early on that is likely negligible. When an outage can both cost you new sales/signups and also be an emergency for 100% of your existing customer base, you have to seriously up your game with regards to reliability.)
Customer support:
Again, depending on exactly what you're doing, you will fail well in advance of your server failing on the road from 0 to 10k users. Immature apps tend to have worse support burdens than mature apps, for all the obvious reasons, and us geeks often make choices which pessimize for the ease of doing customer support.
My first business produced a tolerable rate of support requests, particularly as I got better about eliminating the things which were causing them, but I eventually burned out on it. I have a pretty good idea of what my second one would look like if it had 10k customers -- that would imply on the order of 500 tickets a day, 100+ of them requiring 20 minutes or more of remediation time. This would not be sustainable as a solo founder. (Then again, if that business had 10k customers, revenues would presumably be in the tens of millions, so I'd have some options at that point. There are many businesses which would not be able to support a dedicated CS team on only 10k customers, like e.g. many apps businesses, so you'd have to spend substantial brainsweat on making sure the per-customer support burden matched your unit economics.)
The biggest issue: selling 10,000 accounts of a SaaS app is really freaking hard.
Depends what they're doing. With a Linode + LEMP stack you can serve millions of users. I'd say MySQL is usually the bottleneck but if you have time to perfect your software, there's no telling how many users you could serve from a good VPS account. Don't get caught up in the hype. Cost of the hardware is not really prohibitive. Finding someone that knows what to buy and what to do with it, good luck!
I'm not sure how system intensive Rails app is, but we ran a Facebook game with 20K daily active users on a $8/mo Dreamhost, back in 2008, with a fantasy league and all. My point is, unless you're doing something extraordinary, I doubt you would need more than a VPS.
As others have mentioned, several times that many users can be hosted on an EC2 small instance. I suggest you start there. When moving to production, a bigger challenge is security, both in terms of intrusion and data protection. Make sure you have a good rollback feature built into your rollout regime, because mistakes can be fatal with real users. If you're using something basic like Heroku or EC2, you can scale way beyond that user count with the click of a button. Scaling up will be the least of your worries, at least for a few weeks.
If you're unsure, go with Heroku. Once you understand your system use, you can very easily switch to AWS and reduce costs.
I can think of three things that you'll really really really want to be on top of while scaling up
1. your architecture must allow for vertical scaling. this means upgrading your hardware to beefier, stronger, faster machines with more CPU power and more memory. vertical scaling is often a very cost-effective way of improving performance.
2. your architecture must allow for horizontal scaling. this means being able to provision and deploy new instances of your application servers very easily, using an automated process. more servers running in parallel is a very effective way of handling increased load.
3. you must be able to monitor and protect your systems. https everywhere. highly secure passwords everywhere, and you should rotate your passwords on a regular basis. log everything and set up services to monitor your logs and notify you when weird/bad shit happens.
It mostly depends on what the users do. If the interaction is minimal and most pages can be rendered once and then served from cache you can run this on a laptop. If interaction is intense and everything has to be generated on the fly it could tax a mid range server. It all depends on the use case.
Well, I think good indexing is key. How do you have your indexes set up? If they are aligned with what is frequently queried, then you should be in good shape. A database of 5-10K users isn't difficult these days for any major DBMS. The key lies in the other things attached to the users and the items you query; those can run into the millions. Even that is not much, but depending on how many times you query a million-row table, it could put a pinch on your optimization.
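For example (hypothetical table and columns): if an activity table is mostly queried by user and time range, index exactly that - in Rails it's a one-line migration:

    class AddUserTimeIndexToActivityLogs < ActiveRecord::Migration
      def change
        # matches queries like: WHERE user_id = ? AND created_at > ?
        add_index :activity_logs, [:user_id, :created_at]
      end
    end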
Good database mechanics is key. That is the most important thing in my opinion. That is really the whole point in Rails when you are deciding relationships: the abstraction in Rails when deciding on the best model structure is the same thing as deciding on the best and most efficient table structure in your database.
The rave about MongoDB is that it (maybe, don't quote me on this) "cures" the desire for a multi-dimensional database. However, even with MongoDB's ability to expand due to not needing a pre-defined structure, and its ability to expand out dimensionally to a certain extent, PostgreSQL (so it claims, anyway) is still more efficient if you correctly index your tables (think about how you will be querying) and create the correct relationships. Build out models. Allow flexibility.
Also, don't forget caching. Redis and background jobs are key in certain situations. However, don't get caught up in too much hype, especially hype coming from closed-source technologies (not just talking about caching technologies here, but everything in general). They will sell and produce an atmosphere of necessity, but do some research first. Don't follow the herd. I am not going to call anyone out on this. Just do the research and think about why something is necessary. I've mentioned Redis a few times and maybe that isn't even necessary either.
Most importantly, put your stuff out there. If it crashes, so what! At least you know you have something. And then you will have people who will give you advice in a coherent direction if necessary.
I salute you in your efforts. Now the most important part is put it out there and kick ass!
You may even want to look into Redis, an in-memory data store that's commonly used for caching.
Also, I forgot to mention (since I am miles deep in my own program and hoping a couple bottles of wine will assist me in getting this app hammered out): make sure you have a good git setup going. A live repo, a dev repo and a staging repo at minimum. And at "another minimum", you probably want these three repos on both your remote server and your local machine.
Eventually you will want to maybe bring on other developers. Have a good private repo on Github. That is a choice of course. This way you can choose what to merge into your live branches. You just need to find a good groove. Git is beautiful. It really does help in organization of your development process.
And analytics. I'm a bit absent on this. The reason why is that you are a developer: you will eventually know what you want to know, and even if you need to bring on another developer due to time constraints and such, you will know how to build the personalized analytics that you want out of your app. I mean, it is almost evolution. You build an app. Then you want to know what's up. You know the app best, so you will develop logging and analytical methods to provide you with what you want to know. So again: unless you are looking at a gem that will help you with visualizing what you are going to develop on the analytical side of things, don't get overwhelmed. You know best. You built it. It's beautiful, and if your app is really a hit, you will plug and chug and make it work.
Sounds like you have the skills to grow this up to 500 to 1k paying customers on your own. Early on you can increase ram/cpu to handle any initial scaling issues.
Once you even have 1k customers you'll have revenue to hire experts to help you with scaling and security.
I have an online game that has 16k active users and for the backend I use a $90 per month dedicated server. I can share the details if you want but my point is you won't necessarily need a huge server to maintain 5-10k users.
I run Reverb (https://reverb.com) and there's a lot of good information in this thread. Backups, Security, Alerts, Monitoring, all good things.
However, the biggest piece of scaling your application is automating everything you possibly can, so that you can scale when you need to. You're going to be in a bit of pain if you need to scale everything manually.
Here's a few things I automate using Jenkins:
* Creation of web application servers (whether with Puppet, Chef, Ansible, etc.) - make sure you can bring up a new node quickly and scale your app layer horizontally. Ideally, automate the addition of the new node to the LB.
* Data store backups/restores to all staging environments on a schedule (which tests the backups/restores) - done with some custom code and the Backup gem (a rough sketch follows this list). This way your dev team has access to an env that closely resembles prod and can resolve current prod bugs.
* External security scans using Nmap (again using custom scripts). The Jenkins job will fail if the output is not what it expects. This way, if we change a layer of our infrastructure, we know whether something is exposed that shouldn't be.
* Static code analysis using Brakeman
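For reference, the Backup gem piece of that is just a small Ruby DSL file - roughly something like this sketch (credentials, hostnames and names are placeholders), which a scheduled job runs and then restores into staging:

    # models/production_db.rb -- run with "backup perform -t production_db"
    Model.new(:production_db, 'Nightly Postgres dump') do
      database PostgreSQL do |db|
        db.name     = 'myapp_production'
        db.username = 'backup'
        db.host     = 'db1.internal'
      end

      compress_with Gzip

      store_with S3 do |s3|
        s3.bucket = 'myapp-backups'
        s3.path   = '/postgres'
        s3.keep   = 30   # cycle old dumps automatically
      end
    end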
Information you're going to need to scale your infra:
* Metrics on each one of your hosts. Use DataDog if you can afford it; it integrates with all major systems and technologies. Great tool.
* Log collection via something like Logstash or Loggly and being able to visualize your application and web logs.
* Application response time measurements using something like NewRelic or building your own using StatsD and tracking the heck out of your application actions
Last but not least, have a plan for failure. While you're lying in bed at night, ask yourself these questions:
* What would happen if the DB went poof? Can I restore it? How much data will I lose from the last backup? Will I know when this happens?
* What would happen if you're being scanned by 30 IPs from the Netherlands, all of which are submitting garbage data into your forms? Are you protected against this? How will the added load affect your app layer? Do you have a way to automate the responses to those requests so as to deny them? This is a case of when, not if. Be ready.
* What would happen if my site gets put on Digg(lol)?
There's no magic bullet here. It's just practice, failure, and learning from your own and others' mistakes.