
The most difficult thing is going to be getting to 10K active users :)

These days RAM is cheap and SSD storage is widely available. For a very long time, one of my side projects with 50K users was hosted on an EC2 small instance. With that out of the way, here are a few things you will need to take care of:

* Security (especially passwords) - Rails should take care of most of this for you, but you should ensure that you patch vulnerabilities when they are discovered. Also, stuff like allowing only key-based SSH login to your servers, etc.

* Backups - Take regular backups of all user data. It's also VERY important that you actually try restoring the data, as it's quite possible that backups are not occurring properly.

* One click deployment - Use Capistrano or Fabric to automate your deployments.

* A good feedback/support system - this could even be email to begin with (depending on the volume you expect), but it should be accessible.

* Unit tests - as your app grows in complexity, you will never be able to test all the features manually. I'm not a big fan of test driven development, but really, start writing unit tests as soon as you have validated your product idea.

* Alerts, monitoring and handling downtime - Downtimes are inevitable. Your host or DNS could go down, you might run out of disk space, etc. Use something like Pingdom to alert you of such failures.

* Logging, logging, logging - I can't stress this enough. When things break, logging is crucial in piecing together what happened. Use log rotation to archive old logs so they don't hog disk space (a minimal sketch follows this list).
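
To illustrate the log rotation point, here's a minimal sketch using Python's stdlib logging (assuming a Python app; most frameworks, or plain logrotate, can do the same):

    import logging
    from logging.handlers import TimedRotatingFileHandler

    # Rotate at midnight and keep 30 days of archives; older files are deleted.
    handler = TimedRotatingFileHandler("app.log", when="midnight", backupCount=30)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))

    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.info("user %s signed up", 42)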




> Backups - Take regular backups of all user data. It's also VERY important that you actually try restoring the data, as it's quite possible that backups are not occurring properly.

The part about testing your backups is huge. I can't count how many projects I've been on where we needed to restore, went looking, and found any number of problems. Oh, backups actually stopped last month when we ran out of space; oops, the backups only covered these 3 DBs and not the one you want; things like that. I'd also stress the importance of off-site backups. If you're using AWS for everything and your account is compromised, can the attacker delete your backups (assuming they have full, 100% unlimited admin access to AWS)?

Which is also why, if you're using stuff like AWS, Heroku, or any other third-party provider (hosted Mongo, hosted ElasticSearch, Stripe, NewRelic, etc.), it's very important to ensure those passwords are secured and that only the people who absolutely need access have it. Also, when offered, two-factor authentication should always be used.


And don't create API access keys on your console admin accounts.


This is great, thanks.


> * Logging, logging, logging - I can't stress this enough. When things break, logging is crucial in piecing together what happened. Use log rotation to archive old logs so they don't hog disk space.

Depending on the service you're building, you can log too much. Consider the privacy and security implications of the existence of those logs; anything you log can be subpoenaed, but logs that don't exist cannot be.

Consider anonymizing your logs from day 1, and only turning on non-anonymous logging upon a report from a user. Alternatively, give users a "report a problem" button, and save their last N minutes of otherwise-ephemeral logs only when they hit that button.

You absolutely want to log enough to help you debug the service, but do you really need to archive old logs, or should you delete them entirely?
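
For example, here's a minimal sketch of the anonymize-by-default idea in Python (the salt and field names are made up):

    import hashlib
    import logging

    SALT = b"rotate-me-periodically"  # hypothetical; rotating it breaks long-term linkability

    def anonymize(value: str) -> str:
        # Replace an identifier with a short salted hash before it hits the logs.
        return hashlib.sha256(SALT + value.encode()).hexdigest()[:12]

    logger = logging.getLogger("app")
    # Log the pseudonym, never the raw IP or user ID.
    logger.warning("login failed for ip=%s", anonymize("203.0.113.7"))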


> * Logging, logging, logging - I can't stress this enough. When things break, logging is crucial in piecing together what happened. Use log rotation to archive old logs so they don't hog disk space.

+1 You can't log too much. The user who claims an important email never arrived - does your system say it was sent? That bug 3 users have reported yet no one can reproduce - what were they doing at the time, and what else was going on?

I'm not at that stage yet (of effectively being able to rewind application state in the log files to see what was going on), but for debugging issues in production verbose logging is exceedingly useful.


Moving core services out to third parties is really wonderful for logging. E.g. if email sending happens in Mandrill, then you never need to write decent logging for it yourself, and you have a reliable source of truth!


Except you won't know if your server ever sent it to Mandrill. :) Always be extremely verbose with logging!


This brings up a tangential problem I've yet to solve: how do you get warned that something didn't happen when it should have?

E.g. you have a script that does backups. You log the script's output, but one day something fails and the script is no longer executed.

Some form of dead man's handle is needed; the only way I can think of is to set up a monitoring service to check your log store for these entries every X hours.

Any alternatives?


I've had this same issue over and over again in my career.

I've toyed with the idea of writing a daily "sanity checker" in crontab that verifies various concepts of system health.

Examples: Did the latest batch of data transfer to S3? Did we delete old customer accounts today? Did we get any signups (because if not, something may be broken, but not triggering an exception report etc)? Did we send out daily report emails?

But I could see this easily becoming a pointless exercise, and I doubt I'd have the time to keep the sanity checker updated with the latest requirements. In fact, the sanity checker would probably become insane pretty quickly.

Perhaps the platform itself should do this for you, in some way. Idea: while coding, indicate that a given procedure should be running periodically, e.g.:

    Monitor.registerPeriodicTask('email-reports', 'daily')
and then the system would log each time the task runs, with a generic watchdog task that periodically scans for things that should have occurred but haven't in a while.
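
A minimal sketch of that idea in Python, with file-based heartbeats (the path, task names, and alert mechanism are all placeholders):

    import json
    import time

    HEARTBEATS = "/var/lib/sanity/heartbeats.json"  # hypothetical path
    PERIODS = {"email-reports": 86400, "s3-transfer": 86400}  # task -> max age in seconds

    def beat(task):
        # Called by each periodic job when it completes successfully.
        try:
            with open(HEARTBEATS) as f:
                data = json.load(f)
        except FileNotFoundError:
            data = {}
        data[task] = time.time()
        with open(HEARTBEATS, "w") as f:
            json.dump(data, f)

    def check():
        # Run this from cron; alert on anything that hasn't reported in time.
        try:
            with open(HEARTBEATS) as f:
                data = json.load(f)
        except FileNotFoundError:
            data = {}
        for task, period in PERIODS.items():
            if time.time() - data.get(task, 0) > period:
                print(f"ALERT: {task} has not run recently")  # cron emails this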


Monitor that the newest backup is less than N hours old.
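
E.g. a tiny Python check run from cron (the path and threshold are placeholders):

    import glob
    import os
    import time

    backups = glob.glob("/backups/*.dump")  # hypothetical backup location
    newest = max((os.path.getmtime(p) for p in backups), default=0)
    if time.time() - newest > 26 * 3600:  # a day, plus some slack
        print("ALERT: newest backup is stale or missing")  # cron emails stdout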


Alternatively, if you bother to have proper exit codes for your backup scripts, you could check for a non-zero exit code and alert on that.

I have cron on all of my systems email STDERR to me; STDOUT is redirected to the normal logs.

Once you clean up your crons, you really only get email once a month or so, when something breaks.


Do you use something like http://habilis.net/cronic/ ?


Nope. Instead of this

    # stdout AND stderr both go to the log, so cron never emails anything
    0 1 * * * backup >/var/log/foo 2>&1
I do this

    # stderr is left for cron to capture and email; stdout still goes to the log
    0 1 * * * backup >/var/log/foo
My errors are emailed to me via standard cron conventions. Normal logging is available for me if I care to look at it.


I have always heard the opposite - that too much logging is as bad as no logging. I see the point of having the logs to be able to find out what happened, but what happens when there's so much logging that the information you need is just buried in a huge amount of noise?


This is true without the right tools. I am moving to Logstash with Kibana to address this, and it's looking very promising. See http://www.elasticsearch.org/videos/kibana-logstash/


Concur with this. Log everything, and use Logstash/Kibana to sift through it for what you are looking for.


This was true before Splunk. If you logged too much, your logs could start to outstrip the assumptions behind your log rotations and cause trouble. Now the common wisdom is to just log everything so you can Splunk it later if you have a problem. Verbose logging + Splunk have made production incident identification so much easier than it used to be.

Splunk DOES charge by the GB, but it's not very expensive in the long run.


My favorite systems to work with are the ones with overly verbose logs, where the overly verbose parts were clearly tagged and could be filtered out. Generally, we would never look at the verbose lines, and even when we did, we would normally have some idea what we were looking for, and be able to filter somewhat for it.


I'd actually argue it is possible to log too much if you aren't using good tools to make your logs easily searchable. Which is why you should use such tools if at all possible. Otherwise the logs can become so big that finding the entries for that bug or that email becomes pretty much impossible. This is also why it's important to take a few minutes and think about what you're logging and how. Things like request and user IDs can be invaluable. My test is usually "if I have nothing but this error message/log entry, do I have enough to begin investigating?". This is hard to get right until a bug or problem occurs and you actually need to use the logs but investing a bit of time into it can be a life saver.
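
One way to get those IDs onto every line is the stdlib LoggerAdapter; a minimal sketch (the field names are illustrative, and the format string expects them on every record):

    import logging

    logging.basicConfig(
        format="%(asctime)s %(levelname)s [req=%(request_id)s user=%(user_id)s] %(message)s")
    log = logging.getLogger("app")

    def handle_request(request_id, user_id):
        # Bind the IDs once; every line logged in this request carries them.
        rlog = logging.LoggerAdapter(log, {"request_id": request_id, "user_id": user_id})
        rlog.warning("payment declined")

    handle_request("4f2a9c", 1042)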


This is something I'm going to have to research. I haven't dealt with logging yet, besides reading development logs and Heroku logs.


> Logging, logging, logging - I can't stress this enough. When things break, logging is crucial in piecing together what happened. Use log rotation to archive old logs so they don't hog disk space.

How do most people manage activity logs? Currently what we have set up is the user ID (if the user is logged in), IP address, URL they hit, user agent, and timestamp, all inserted into an activity logs table. For one particular site with an API that's being polled, the DB grew pretty large.


> Logging, logging, logging - I can't stress on this enough.

There is no easier way to offload, view, filter, alert on, and search logs than Logentries:

http://www.logentries.com


Off-machine logging. There are commercial services (we're using Papertrail but there are tons of them), roll-your-own solutions (Elasticsearch-Logstash-Kibana), and simple solutions (syslog).

For an easy and simple solution, spin up a second instance and send logs to it via rsyslog over a private network interface. Most mature frameworks provide a method to send logs over syslog. It's UDP and very lightweight. Another plus: if you are compromised, you have another server with your logs and that server isn't running your vulnerable app.
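
For instance, in Python you can point the stdlib logger at a remote rsyslog over UDP (the address is a placeholder):

    import logging
    from logging.handlers import SysLogHandler

    # UDP by default: fire-and-forget, negligible overhead on the app server.
    handler = SysLogHandler(address=("10.0.0.2", 514))  # your log box's private IP
    logger = logging.getLogger("app")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("order 1234 created")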


We're running the Elasticsearch, Logstash, Kibana (ELK) stack with the recommended approach, i.e.:

  logstash client
                   \
  logstash client --> redis -> logstash "server" process -> elasticsearch <- kibana 
                   /
  logstash client  
We have a high logging load (we log every request), due largely to IRS requirements. I've been really happy with it over the past 6 months, but something that cannot be overstated is that you'll really need to become familiar with each of the technologies used, as each requires its own setup and configuration. Not being familiar with any of them, it took me a solid 3 days to get the whole thing usable and performant. Troubleshooting it is a breeze, and the whole system scales really easily, but a lot of that time was invested up front.


Just curious about the redis-in-the-middle. Any references so I can dig deeper?


Logging every hit will always require a lot of space. But there are some tricks you can use to "compress" it: hash long strings like the URL and user agent, and store the hash as binary instead of a string. A 100+ byte string can compress to just 16 or 32 bytes depending on the hash you pick. Store the hash lookup in a separate table.
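
A sketch of that trick in Python (storage is elided; MD5 is fine here since it's a lookup key, not a security boundary):

    import hashlib

    def digest16(s: str) -> bytes:
        # Collapse a long string to a fixed 16-byte binary key.
        return hashlib.md5(s.encode()).digest()

    ua = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 Chrome/32.0 Safari/537.36"
    key = digest16(ua)
    # Store `key` (e.g. BINARY(16)) in the hits table; store (key, ua) once in a lookup table.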


What is the benefit of your approach? The lookup table will still have data growth issues.


Presumably there will be many more hits than URLs, i.e. the URL table will grow much slower than the Hits table.


We keep the last 30 days and also send it out to Rollbar for notifications and analysis. It's working great!


> Rails should take care of most of this for you

I often found myself falling into the "I'm not using PHP, so I don't have to worry about any security holes" trap. CSRF is something you really need to watch out for if you are constructing forms manually!


And while you are bashing PHP, actual professionals use it all the time and don't fall for those noob mistakes. Learn some Symfony2.


No bashing intended. I started off writing bad applications in plain PHP that were full of security holes, and moved on to writing better applications in Python with the help of frameworks - at the time I didn't realise there were helper frameworks for PHP too and thought Python was infallible.


Giving OP the benefit of the doubt, I think s/he meant to say "When I'm not using a framework."


No need to construct forms manually, Rails form helpers take care of authenticity tokens automatically.


I use Python myself and had this problem until I discovered WTForms.
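
For anyone curious, here's a minimal sketch with Flask-WTF, which wraps WTForms and checks a CSRF token on every POST (the form and route are made up):

    from flask import Flask, render_template_string
    from flask_wtf import FlaskForm
    from wtforms import StringField

    app = Flask(__name__)
    app.config["SECRET_KEY"] = "change-me"  # signs the CSRF tokens

    class CommentForm(FlaskForm):
        body = StringField("body")

    @app.route("/comment", methods=["GET", "POST"])
    def comment():
        form = CommentForm()
        if form.validate_on_submit():  # rejects a missing or bad token
            return "saved"
        # form.hidden_tag() renders the hidden csrf_token input for you
        return render_template_string(
            '<form method="post">{{ form.hidden_tag() }}{{ form.body() }}'
            '<input type="submit"></form>', form=form)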


Nailed it with the first sentence: it takes 10k users to run an app with 10k users! And that will be, by far, the hardest part.

The technical stuff will be pretty much trivial. Any decently constructed app on any decent framework (Rails, etc.) on any decent host (AWS, DO) would be able to handle a 10k user app (probably maxing out at 1% online at the same time) without breaking a sweat.

And you will have plenty of time to build out the tech because it will probably take you many months to get to even a few thousand users (depending on what kind of app it is, of course).


Thanks


That's a good overview. I also wouldn't forget about the type of application. Like you said, resources are pretty cheap these days; leveraging the Amazon cloud and integrating with other applications could lift the heavy weight that is normally required. If someone can get to 10k active users, they should also take note of possible scaling issues down the line.


Also, I think one should take reliability into account as well. If you have only one app server, it will eventually fail, and you'll find yourself trying to SSH into the server at 2am. Have at least a few app servers load balanced, 2 DB servers replicating with auto failover, etc.


Not sure if this goes without saying, but a huge one you missed is some sort of VCS. Use what you know and be sure to commit early and often. It can seem like a hassle to begin with, but after it saves you once, you'll be very thankful you're using it.


Good list. I just want to add more voice to the unit testing part. It's natural to feel that testing will slow down the development process, but once you've got some unit tests for the core parts of the app, it feels awesome, and the developers will have a lot of confidence in the product.


TDD speeds up development, especially when adding features. Without full test coverage, you're relying on manual regression testing to catch bugs. Once you have actual users, deploying broken code without even realizing it could be catastrophic.
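
To make the "confidence" point concrete, even a trivial test like this (the function is made up) catches a regression before it ships:

    import unittest

    def apply_discount(price_cents, percent):
        return price_cents - price_cents * percent // 100

    class DiscountTest(unittest.TestCase):
        def test_ten_percent(self):
            self.assertEqual(apply_discount(1000, 10), 900)

        def test_zero_percent_is_identity(self):
            self.assertEqual(apply_discount(1000, 0), 1000)

    if __name__ == "__main__":
        unittest.main()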


Thank you for your input.


> These days RAM is cheap

Ironically, it's a completely different story on the consumer market. I bought my two 4GB sticks about 2 years ago and now they cost twice as much. 4GB can cost you $40, which is not cheap at all.


> 4GB can cost you $40, which is not cheap at all.

That's totally negligible compared to the cost of the rest of the machine. Imagine: $160 will get you 16GB!! That's an absolutely enormous amount of memory; most power users would be more than satisfied with that.

Not all that long ago, that amount of RAM would have cost more than a brand new mid-sized car.



