My Amazon S3 Mistake (devfactor.net)
349 points by DevFactor on Dec 30, 2014 | 165 comments



There are a few lessons here:

1. Use IAM roles. AWS keys should only have access to the specific functionality they need. Best practice is to never use your root credentials for anything; always use IAM users and unique keys for each use case, so that you can invalidate and replace them easily. Every time documentation or a tutorial asks you to insert AWS keys, your response should be "let me go create a role for that" rather than "let me go look those up". (There's a rough sketch of a bucket-scoped key just below this list.)

2. More importantly, if you ever accidentally publish or leak credentials, don't try to clean it up by deleting those commits. Invalidate the credentials immediately and re-issue them.

3. Always, always `git diff HEAD` before committing. Know what you're about to push up. This isn't just a security concern - the number of small, stupid things you'll catch that you'd otherwise end up fixing 15 seconds later is substantial. As a bonus, this incentivizes you to keep your commits small and atomic.
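To make lesson 1 concrete, here's a rough sketch of creating an IAM user whose key can only touch a single S3 bucket, using the AWS CLI. The user name, bucket name and the exact actions are placeholders; adjust them to whatever your app actually needs:

    aws iam create-user --user-name myapp-s3-only
    aws iam put-user-policy --user-name myapp-s3-only --policy-name s3-one-bucket \
      --policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
          "Effect": "Allow",
          "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
          "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
        }]
      }'
    aws iam create-access-key --user-name myapp-s3-only

Even if a key like that leaks, the worst an attacker can do is mess with that one bucket; they can't spin up EC2 instances on your dime.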

You might talk to Amazon directly - they've been known to forgive debts like these in these kinds of circumstances.


> You might talk to Amazon directly - they've been known to forgive debts like these in these kinds of circumstances.

They did:

> Lucky for me, I explained my situation to Amazon customer support – and they knew I wasn’t bitcoin mining all night. Amazon was kind enough to drop the charges this time…


Whoops. I missed that. Glad the author was able to dodge that bullet!


This was my initial reaction too, but I was pleased to find it's just a slightly misleading title :)


On HN I recall the title originally including the $2300 (which matched the blog post's title). Now the HN title does not include the $ amount. Did the mods make that change? I would think preserving the source's title would generally take precedence even if a bit misleading.


I'm nearly certain it did, which is in line with submission guidelines; certainly the title led to our mutual confusion. I do not know how/if titles can be edited.


> Always, always `git diff HEAD` before committing.

I prefer to always "git commit -v", which shows me the diff while I edit the commit message.


Call me unsophisticated, but I personally find it extremely helpful to use a GUI (currently using SourceTree) to interactively review changes before committing. I use the command line for almost everything else, but this is one of those cases where a nice GUI really seems to shine.


That's how I work these days as well. SourceTree for commits, especially if I just want to stage hunks instead of whole files. Command line for everything else.


This is exactly what I do as well; all I use SourceTree for is reviewing my changes. I'll go back to the command line to commit and push.


I'm with you on this as well. I attempt to make small localized changes, thus `git diff` usually suffices. However if I forget to commit often, or end up doing a big refactor, I will use SourceTree to aid in inspecting the diff. Additionally, I find staging hunks much more intuitive via a GUI than the command line.


That is literally the only reason I use Github for Windows. I'll have to check out SourceTree.


TortoiseGit is also not bad.


SourceTree seems like such a complicated mess. `git gui` and `gitk` are my preference.


Try the GitHub app (which works with non-GitHub git repos as well). Its feature list is TINY compared to SourceTree, it's pretty, and it has a very easy-to-use commit interface.

https://mac.github.com or https://windows.github.com


I really really really wish that GitHub's application worked on Linux. Linux is seriously missing a lot of nice GUI applications that OSX and Windows get. Actually, Linux is missing a lot of nice convenient things that Windows and OSX have. It's all technically possible on Linux, but it's intentionally left difficult (or "advanced") for no reason that I can tell. Just the other day I was trying to install Lua on a Linux machine for the first time. Lua for Windows comes as a single installer that includes Lua, over a dozen batteries-included libraries, and a text editor. To get that on Linux, you have to yum install lua, install luarocks from source, then luarocks install every library you want to have.

The only reason Linux is nicer for certain kinds of programming is because Bash and the GNU utils are so great. But why bother when there's a world of people making things so easy and smooth for Windows?


I'm on Linux


No Sourcetree either on Linux :~(


I use both in moderation.

My workflow is primarily raw CLI (no aliases, no github extensions, etc). I have vim configured as my primary editor and my .vimrc has the vim-git plugin (https://github.com/tpope/vim-git) loaded so I can enforce 50-char summary and 72-char line-wrapping.

I use other clients for special purposes:

Fugitive (https://github.com/tpope/vim-fugitive) primarily for git blame while working.

SourceTree for visualizing branches and for staging individual lines from a hunk in a finer grained fashion than `add -p` allows. Occasionally for branch maintenance when an interactive visual list is useful.

gitk for loading partial histories (eg. --author options, pickaxe option), visualizing complex branch arrangements that make SourceTree choke, anytime I need performance to search back many months.


I also use Vim and Fugitive, but instead of using a GUI I use `vimdiff` as my Git mergetool and difftool. You can use `git difftool` instead of `git diff` and your diffs will open in `vimdiff`. This also goes for any other diff programs which take command-line arguments, like some GUI programs, FWIW...
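The setup is just a couple of config lines, something like:

    git config --global diff.tool vimdiff
    git config --global merge.tool vimdiff
    git difftool HEAD    # takes the same arguments as git diff, but opens each changed file in vimdiff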


Funny, I ended up with SourceTree because it seemed like a clean and simple alternative (well, at least when I started with it) that still had all the tools I needed. Sure, it has gotten a bit more complex, but the basic functionality (as I see it: diff, stage, commit, push, pull, branch management) feels simple to me.

Could you explain what parts of it you don't like? Also, could you link to git gui? I tried to Google it, but didn't really get good results (shockingly, all the other Git GUI programs came up...).


Here's the tool I was referencing: http://git-scm.com/docs/git-gui

It just seemed so confusing compared to the other tools I mentioned, and it would try to do all this extra stuff for you automatically (i.e. adding weird extra arguments to the git commands that I had no idea what they did).


Ah, the weird extra arguments is something that I had noticed as well. And I have to admit, with some shame, that I have not taken the time to look into why.


As another commenter mentioned, tig is amazing.


tig[1] is another really nice tool to check the diff while on the working branch. It shows things like what branch different remotes are on and a nice tree of what has happened on the source tree (merges, pull requests, etc.)

[1] https://github.com/jonas/tig


I would also highly suggest people use 'git add -p' so they consciously add hunks to the commit.


This one right here. It's really useful for creating relevant commit messages, catching spurious whitespace changes, random debug code etc.

I like Sourcetree for history and diffs, but it's CLI all the way for commits.


I use git commit -p for the same reason.


Huge +1 for #3. Saves my ass all the time. I usually stage the changes I think I want to commit and use `git diff --staged` to confirm before I commit.


I like that better - I think I'll start using it. I usually end up staging the stuff I want to commit and reverting the stuff I don't, but that offers better flexibility. Thanks!


About IAM... I worked with that a LOT a couple of years ago. It's interesting but quite the rabbit hole of configurations and permissions.

I used to be an AWS fanatic and got burned. Unless you need to scale up on the spot (and have the time to code the management of that), it's a huge waste of time with no price advantage to show for it. EC2 instances have dismal performance.

Just get decent servers somewhere. For most scenarios it's hard to max out a server nowadays if you program decently - and you have to do that tenfold for AWS anyway, IMHE. The cloud is a black hole of developer time and problems.

On top of that, it's stupid to put data in the US when you are a foreign company; you are fair game for three-letter agencies, no questions asked.


I really have to wonder about the amount of time it takes to set up your own servers. I like to set up and tweak hardware for fun, but the time and starting costs seem way too high compared to spinning up a few instances to get something going quickly. Looking around here, the San Francisco Bay Area, I can't find any way to host a server as inexpensively and as quickly as I can on the cloud. Plus there are lots of alternatives to AWS which promise better performance.

BTW, tweaking around with servers can also be a major time sink.


I mean dedicated servers. Usually the provider installs the OS for you and then you take over.


In addition to git diff, I suggest git add -p and git diff --cached to enhance visibility of changes. I alias them to short mnemonics in my bash configs.
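For example, something like this in a .bashrc (the names are just my own shorthand):

    alias gap='git add -p'
    alias gdc='git diff --cached'
    alias gdh='git diff HEAD'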


Heroku doesn't require a public GitHub account.

Seems like the best lesson learned is to keep personal work personal; use private repos if using GitHub.


No, the lesson is don't commit credentials and if they leak then revoke them immediately.

And use least privilege to make sure that if you do screw up, it's as minor as possible.


+1 - Even if you are using a private repo within your company, committing credentials is foolishness. It's trivial for someone to intentionally or unintentionally leak them, or abuse them.

And personally, I've found that it's just good practice to exercise those kinds of precautions for "throw-away" projects regardless of their "throw-away" status. It keeps you honest.


Or, if you don't want to pay for the privilege of not having your work public, go use Bitbucket.


Heroku doesn't require any Github account - that's the beauty of git :)

That said, I'd still argue that you should practice defense-in-depth; plan for "this is how I'll limit damage when someone gets ahold of these credentials" rather than "Nobody will get these, so I can do dangerous things with them".

It's the same argument behind "don't do things as root unless you have no other choice"; trading security for convenience works really nicely right up to the point where you get your teeth kicked in.


*use bitbucket


Sorry for the OT, but why do you need to add HEAD? I always just do `git diff`. I tried to Google it but didn't find any good answer.


If you do just "git diff" it will show you the diff between the index (staging area) and your working tree. If you do "git diff HEAD" it will show you the difference between the latest commit and your working tree, that is, it includes the diff in the staging area.
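So, roughly:

    git diff            # working tree vs. index (unstaged changes only)
    git diff --cached   # index vs. HEAD (what you're about to commit); --staged is a synonym
    git diff HEAD       # working tree vs. HEAD (staged and unstaged changes together)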


Variant on #3: git diff master origin/master just before you git push.


Or, if your current branch has an upstream, `git diff @{u}`.


> I learned a valuable lesson here though. Don’t trust .gitignores and gems like Figaro for keeping your data safe. Open source is awesome, but if you are dealing with anything that can be scaled up to thousands of dollars per hour – at least store it in a private repo if not on your local machine.

You learned the wrong thing :-P The lesson here is "always check what you're committing"


That and: Immediately invalidate any auth information that was accidentally published, no matter how briefly.

I realize I haven't always followed this advice myself. So the reminder from the OP story is useful.


As an additional layer of safety, you can also commit to a local repo, or a private Bitbucket repo, and give it all a nice once-over before pushing everything to GH. At least this keeps your screw-ups out of public scope.


Isn't that what you're doing anyway when you use a DVCS? Granted, I use Mercurial. It has two components on your side: your working directory, and the actual repo. Committing moves changes from your working directory and saves them to a branch on your local repo. Then you get the local repo to push that change to a remote one.

In essence, "committing" then is the act of pushing your changes to your local repo. Or you could, you know, just look at the changes on disk if really required.


Yes, that's how Git works. I'm not sure what fein meant.


Hell, even without making the mistake of publishing keys, I've accidentally run up quite large bills for Amazon services; backups that failed to remove old copies was a big one. Instances that were supposed to have been shut down, but for some reason it didn't happen (I don't know if this is my mistake or a bug at Amazon...probably my mistake).

I've simply stopped using Amazon for anything tinkery, because the costs of making mistakes can be tremendous. At least when I make mistakes on my own colocated server, I know it can never cost me more than the $100/month I pay to host it. And, storage is practically infinite (4TB hard disks), and I can spin up more VMs than I would ever need for tinkering in 32GB of RAM on our "spare" server.

And, when I have needed to use a VM with some cloud host...Linode and Digital Ocean and similar may have dramatically smaller toolsets for managing virtual resources than Amazon (probably unworkably so for large deployments), but my mind has a much easier time predicting costs than with Amazon. After being surprised on more than one occasion with a ~$300 bill from Amazon (for running nothing but personal pet projects with no economic return), I turned everything off.


Something similar has happened to me, on a smaller scale. I was recently checking my bank account, and noticed a charge from AWS for something like $15. After further investigation, I had apparently been charged this $15 for about 6 months.

Hmm.

Logging into AWS and checking the Cost Explorer showed that I had been charged for a t1.small EC2 instance I had running, but when I logged into the AWS console, all I saw was a single "stopped" instance. There's no way I was being charged for cycles I'm not using, right?

Turns out that the instance I was being charged for was in us-west, and I had only been looking at us-east - and there is no indication within the actual EC2 panel that you have instances in other regions!

I was a bit perturbed, to say the least...


Almost the exact same story here. I never thought to change the region to see what was causing a $3 charge every month, and at $3 I didn't have the energy to investigate further until about a year of needling frustration built up and I reached out to Amazon support directly and they clued me in.

Years later I'm using AWS professionally, and it all seems easy now. But when you are first introduced to AWS and want to kick the tires on a toy project, it can be confusing and overwhelming.


+1 on this. I have several instances running in US East and Ireland. When you log in it just shows the instances for 1 region (usually us-east) and I have a quick "oh shit" moment wondering what happened to the Ireland servers. The AWS EC2 console really needs a way to show all instances.


Not sure why you didn't click on "Billing & Cost Management" where you can see a complete breakdown (by resource and region) of exactly what you are being charged for.


I don't know about the previous commenter, and I don't know if the tools have improved, but when I tried to use the billing and cost management page, I couldn't figure out what was actually costing so much. It broke it down into (EC2, S3, etc.), but I couldn't figure out how that mapped out to the bill (i.e. was it the snapshots of my instances, the storage for the not running instances, the old backups in S3, etc.). It didn't seem like it was telling me "exactly" what I was being charged for.

For me, when I saw a $300 bill for one running EC2 instance, a couple of halted ones, and a few hundred GB of storage for something I thought of as a "toy" project that I didn't want to invest serious time or money in, I knew I was done with AWS. A small colocated server could readily provide those resources for vastly less money (and I work on tools to manage cloud and VM resources, including a reasonably good API for spinning them up and down and such, so I don't really miss the Amazon API or UI). This was after I'd already gotten a shock from an automated backup gone wrong that ran up a huge bill. So, it took me a couple of times getting burned.

Of course, it's always been my fault for not understanding how Amazon bills for things, what services cost money (i.e. a down instance still costs money), how much things cost, and sometimes how the API works (or at least confirming that it's doing what I think it's doing, in the case of removing old backups). I'm not blaming Amazon. I'm just saying, I don't trust myself to use Amazon for anything that I'm not going to spend a lot of time and energy on, because I'm obviously not capable of using it without making mistakes when I treat it like a toy. I readily admit I shot myself in the foot; Amazon just provided the guns.


The whole billing/cost section got big improvements last year.


You can set up billing alerts so that you're notified if your AWS bill crosses a threshold.

http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/...


When I set up an AWS account for pet projects, the very first thing I do is create a billing alarm that alerts me the second my bill goes above $0. This way if something is accidentally running without my knowledge, I'll get an alert (email / SMS) and can quickly take action.
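For anyone who prefers the CLI to the console, the same alarm is roughly the following (the SNS topic and account ID are placeholders; billing metrics only live in us-east-1, and billing alerts first have to be enabled in the account's billing preferences):

    aws cloudwatch put-metric-alarm --region us-east-1 \
      --alarm-name billing-above-zero \
      --namespace "AWS/Billing" --metric-name EstimatedCharges \
      --dimensions Name=Currency,Value=USD \
      --statistic Maximum --period 21600 --evaluation-periods 1 \
      --threshold 0.01 --comparison-operator GreaterThanOrEqualToThreshold \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts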


Are there any major companies providing co-location, or is it kind of a diverse scene? I'm curious about it, but only really know about cloud VMs... basically just looking for a site where I can read about one of these services / pricing plans. My main questions are about bandwidth -- for one company I freelance for, the majority of their bill is S3 IO, and it would be huge for them to get to a capped pricing model, even if speed suffers.

Tried Googling, but it doesn't seem so straightforward... I like the idea though.


Colo can be confusing. There are large companies that have their own data centers (sometimes dozens of data centers) and many small companies that often bundle and re-sell services. I currently use CoreXchange, which is now owned by Zayo. I chose them partly because they are pretty close to me; I'm in Austin, the data center I'm hosted in is in Dallas. They have pretty clear pricing on their website and their connectivity and service is good and fast. But, there are many, many, good providers out there. I would check in your area (assuming you're in an area that has good backbone access; most large cities do), first.

There are cheaper providers and more expensive providers...sometimes, it can be hard to tell whether you get what you pay for, without actually trying it. But, most of the big colo providers have been discussed online at Web Hosting Talk or other places. When getting new service, I usually check WHT first for deals in good data centers in my local area. Sometimes, for low bandwidth (where low bandwidth may mean something still kinda huge, like several terabytes of transfer per month) a reseller may be the best deal. Sometimes, going straight to the provider is the best choice. I ended up at CoreXchange by way of a deal from ColoUnlimited.

The good thing about going this route is that ongoing costs are somewhat lower, generally speaking, though up front costs are much higher (~$3000 for a server, for example). And, costs can be predicted with high precision. You can simply say, "I want this much bandwidth." and that's what they'll give you and bill you for. You're pretty much just paying for power and bandwidth when you're in a colo; all the other stuff is up to you (hardware upgrades, replacements, etc. either need to be done via shipping gear in and out or by going on-site to make the changes).

The bad thing about going this route is that it's all on you. This isn't "managed" hosting (though they do have staff on-hand, and you can usually pay for "remote hands" to handle system upgrades and such, it tends to be very pricey for anything more than simple reboots). And, scaling is non-trivial. In the bad old days, if you had a site get crazy popular overnight, you had to figure out how to get more servers online on short notice...maybe missing an opportunity to grow and blowing your shot at a good first impression on new users. AWS and other cloud services are much more readily scalable. But, there's no reason you can't build with scaling in mind, and use cloud services for elasticity while using colocated servers for your baseline service. That's how we do for our business stuff. If we suddenly need more servers online, we spin them up via Amazon or Google Compute Engine (the software I work on supports several cloud providers, as well as building cloud infrastructure out of heterogeneous servers, so this is not much different than managing VMs on our own hardware and we can move websites and such back and forth across our own machines and AWS VMs reasonably easily).


For applications I've opted to use Heroku due to its simplicity when compared to EC2. I'm still working out the static storage though. I was impressed by Amazon customer support, so I may stay around. Simply renting a VPS somewhere is another option.


PSA

If you're running AWS, I highly recommend that all developers and sysadmins attend the free AWS training[0][1]. While you're at it, you might as well get yourself certified[2].

You might have senior expertise with system operations and application deployment, but sometimes AWS approaches things differently. The essence of the training is to always implement best practices, not just solve problems.

Also, use the opportunity of an AWS event to network with their Solution Architects. Trust me on this one. This is worth more than AWS Enterprise Support.

[0] http://aws.amazon.com/training/course-descriptions/architect...

[1] http://aws.amazon.com/training/course-descriptions/architect...

[2] http://aws.amazon.com/certification/


Erm, is it really free? I just followed your link, which led me to https://www.aws.training/home?courseid=3&language=en-us&sour... ... which led me to http://www.globalknowledge.com/training/course.asp?pageid=9&... ... and there I find out it's $2095 USD.

Hopefully I'm mistaken... because I'd love to take it if it's free!


I just noticed that the free training sessions seem to be limited to certain locations sold by "Amazon Web Services". For example, Kuala Lumpur and Bangkok are free, but other locations have varying fees.

I attended 2 free sessions in Kuala Lumpur and Singapore back in June. I always assumed it was free for all. Sorry.


Huh, a round trip to Kuala Lumpur is less than the course in the States.[1] Best excuse for an overseas flight I've seen lately.

https://www.hipmunk.com/flights/WAS-to-KUL#!dates=Jan12,Jan1...


You can use AWS IAM (https://aws.amazon.com/iam/) to help prevent something like this. AFAIK you can create a sub-account that only has access to specific resources, such as S3, and use the keys from that sub-account.

I haven't used it much, but it looks like you can be very specific in what you allow, such as only allowing access to a single bucket with S3, or a single domain with SES.


You totally can be very specific about what actions and services an IAM key has access to. The permissions model takes a bit to get your head around (it did for me, at least), but once you do, you can have very strict, precise permissions for your keys.


Yes that's correct.

I use a specific key for only allowing DNS updates to be applied to "example.com" and no other domain, for example.

As others have mentioned the billing alerts are also a useful thing, along with the IAM Simulator.


Yeah, I did that, though in my case it was a bitbucket repo I thought was private, but somehow ended up public (obviously stupid to ever be checking in the keys at all).

All I had setup was a micro S3 instance I'd been using for some toy craigslist scraping & hadn't touched for months.

Then out of the blue I get an "urgent please check your account" email from Amazon. Go check the AWS console, and what do you know -- the maximum number of maxed-out instances churning away at 100% CPU usage in every region on earth. The charges were already up to about $50,000 when I turned everything off.

I wrote a very, very apologetic email to amazon, and they forgave all the charges, for which I was very grateful.

Definitely a learning experience.


And please, for the love of all that is secure, use IAM roles. Even for your personal things. It's not that hard, and you can stop things like this from happening even when the auth credentials leak.


If the criminals can create a bot to scan for AWS keys, I wonder if GitHub couldn't create a check to detect the same and warn the committer, or maybe limit access to that data to the original committer only. It won't be 100%, but I bet the bots aren't 100% either, so if it covers most of the cases it would still be useful.

Or maybe just have a script in a local git pre-commit hook?
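A rough sketch of such a hook - saved as .git/hooks/pre-commit and made executable - could be as simple as grepping the staged diff for the AWS access key ID format:

    #!/bin/sh
    # abort the commit if anything staged looks like an AWS access key ID
    if git diff --cached -U0 | grep -E 'AKIA[0-9A-Z]{16}' >/dev/null; then
        echo "Possible AWS key in staged changes; aborting commit." >&2
        exit 1
    fi

It obviously won't catch everything (the 40-character secret keys are much harder to spot), but it's cheap insurance.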


This happened to me and I was immediately emailed by AWS with an auto-generated support case: "Your AWS account ### is compromised". The emails outlined next steps like: check for unauthorized usage, delete/rotate the key, etc.

I was very surprised/impressed.


> It won't be 100% but I bet the bots aren't 100% either

Bots having tons of false positives doesn't really matter (except to the bot maker, maybe). But GitHub having tons of false positives means customers get annoyed by false alerts, locked data, whatever.


I don't think people will be upset to get a "WARNING: You might have committed a secret" even if it's a false positive.

You might be right if it really is a ton, but then you work on your algorithm. I think the problem is so big that there really do need to be warnings for these kind of issues.


Removing such suspicious actions from the public /events API and other APIs would probably have minimal side effects, but would keep the bots that feed from those APIs from seeing it. Just one of the possibilities :)


I don't think it's github's responsibility to prevent you from shooting yourself in the foot.


It is thinking like this that keeps companies from being awesome. I've known a couple of people in real life that this has happened to. It is stupid and wasteful.


It's not an obligation, but neither is running a free git repository :) It would be a nice help though. One that can save some people $2k - not many software features have this kind of immediate impact :)


Nobody said it was their responsibility. If adding an optional feature that prevents you from shooting yourself in the foot makes people like github more, maybe it's worth it to them. "Responsibility" has nothing to do with it.


Would have to make it an option, because we check in various AWS keys into private repos.

Also how do you differentiate an AWS root key (bad) vs an IAM Key (good)?


Maybe you should rethink your policy either way.

In my circles, at least, it's standard practice to use environment variables.

But I would think clearly it'd be an option.


> Maybe you should rethink your policy either way.

Do you have articles discussing the cons of AWS keys in private repos?

We deploy our systems on vanilla EC2 instances, which are configured by using a server orchestration system (Ansible). So for any env variables to get set, we'd have to put them in config scripts, which are currently checked into github.

To make it clear, we only check in our IAM keys that are AWS service specific, like SES.


Well, no, but it would be a nice-to-have.


Lesson 1: Don't publish your passwords.

Lesson 2: When using AWS then use Billing Alarms[1].

It takes about 1 minute to setup and enables e-mail or SMS notifications at dollar-thresholds of your choice.

[1] http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/...


The corollary to that is the first thing such people/scripts do when they have access to your account is disable Billing Alarms.


Are you sure you can disable billing alarms with just your access/secret key? I thought you would need to know the user name + password for that


Having root credentials is sufficient, and based on the fact that this guy said "apparently the s3 api lets you spin up ec2 instances" it looks like he didn't touch IAM.

Root credentials are deprecated, but can still be used and if this guy used them then yes, his billing alarms could have been disabled.

There's a difference between an admin iam user (can't do billing stuff) and the root credentials (just as powerful as username / password).


Maybe they should send an email/SMS to notify you that email/SMS had been disabled. (And, an email/SMS to the old address when the email/SMS is changed too.)


Also, if you ever accidentally leak your keys... CHANGE THEM!


The same thing happened to my brother. He's been teaching himself software development for a while, and learned the same painful lesson about securing your keys. This was despite the things he did right, like:

1. Used IAM roles (but clearly he didn't lock down its permissions nearly enough)

2. Used two factor authentication.

http://snaketrials.com/2014/11/11/espionage/

http://snaketrials.com/2014/11/12/espionage-update/

I should note Amazon forgave the money his account owed, given it was his first time making this mistake on their systems. Amazon told him the servers were probably being used to mine bitcoins.

edit - Oh, and on his blog he says it was $2000 over 12 hours. It turned out to be $5000 over 2 hours!


This seems to be bad design on Amazon's part as well. There should be some kind of manual thresholding in place to make sure this doesn't happen.


There is, hence the emails and phone calls that were missed.


Amazon also places limits on all of its services. I would think one of the reasons for these default limits is to help with cases of abuse. You can learn more about them at http://docs.aws.amazon.com/general/latest/gr/aws_service_lim...


Amazon has limits that take time to increase, which has me scratching my head - we've always had to put in a support ticket whenever moving past those limits, for instance the 50-server limit we reached.


I must be very old or very paranoid, but I wouldn't for one second think of developing from scratch using only public infrastructure.

If I want to learn Rails, I'd start by installing it and running it on my local PC. What's the point of relying on Heroku, AWS, etc. for just about everything? What's the point of using GitHub as the main hub for everything? I'm synchronizing stuff to GitHub explicitly and carefully; my casual, day-to-day operations are all running on my own hardware, network and services. Even my most simple websites run first at home, then are synced to the web after I've checked everything works fine.


Some old man Java rambling for you. :) (I am ex-Rails actually)

One of the big things about Java that at first I thought was tedious but have now come to love is the build process. By creating the "built" project in a new folder, it is like you are guaranteeing that only stuff you manually specify to be in the production code actually makes it in.

I know .gitignore is supposed to handle this, but it just seems dangerous to rely on source-control-related filters when there is really sensitive security info involved.

In Node I've used Grunt & really like that model. The idea being that you have the project run a build, then check that into a "build" branch in git & deploy from there. It's also nice because I can uglify etc. if I'm not worried about preserving the source.

I guess for distributed coding this is tough because you want your peers to have access to the raw source + build instructions, but if I were working on a really security-sensitive project (I do enterprise, so I'm starting to understand the need for extra precautions) I would probably only distribute code that goes through a (perhaps even thin/transparent) build process & then figure out a way to integrate changes back into the source branch.

I know it must sound like overkill to the open source world & especially for Rails projects, where there ordinarily is no build stage, but it seems like there is some level of danger in distributing a source control repo where you are relying on you + any collaborators to properly configure .gitignore and never accidentally check anything sensitive into the repo. I am decent with git but no expert; I still sometimes accidentally stage things I shouldn't when initially checking in a .gitignore -- and I have much more faith in myself than some of my peers, hah.


> The idea being that you have the project run a build then check that into a "build" branch in git & deploy from there.

This doesn't at all solve the OP's problem, you shouldn't be putting your build under source control in lieu of your source code.


Well, the way that it works in Grunt is actually that you have your source branch and then there is a separate build branch -- so the source is still versioned, but only the artifacts from the "build" branch are presented to the public / pushed to your server, whatever.

I just appreciate the security builds add. In enterprise coding the paradigm is very simple -- pretty much keep whatever you want in source, but only built artifacts are ever deployed to the servers. In a closed-source setting this is perfect.

I understand what you mean though it doesn't really solve the issue of distributed, open-source development. (I guess no one would really make a public repo unless they wanted to share, huh?)


I think a lot of the comments here are good - but a thing to keep in mind with GitHub is that their free repo hosting is, by default, public. Which means that your code is indexed and searchable - by anyone visiting GitHub, including anonymous users. They do give you free private repos if you are a college student or professor.

I am a big advocate of self-hosting - not because services like GitHub suck (in fact I'm a big fan of GitHub, Bitbucket and other services) but because you have more control over your own code (that, and you can set up private repos).

Here is a small list of self hosted solutions (I forked indefero into srchub - many don't like the google code feel but I personally like it for the simplicity and the fact I can easily fix/modify/add on to it): http://softwarerecs.stackexchange.com/questions/3506/self-ho...


I think it should also be noted that if you're using Heroku, you've already got a "private git host" right there. People get so used to "git push heroku master" as an idiom that it doesn't occur to them that "git pull heroku" works too. You can collaborate using Heroku as a single-source-of-truth just as well as you can using Github or Bitbucket.


FYI, bitbucket and possibly other services offer private repos for free (in order to compete with GitHub).


tl;dr: free services are fine - just make sure you have backups of all your data.

For most individual projects they are probably good enough. However, the thing to keep in mind is that for free they usually limit features (I think Bitbucket limits the number of developers) or remove features altogether (Google Code removing downloads support; GitHub did at one time but reintroduced it). I'm not saying that I think they should be offering everything for free and shouldn't take away features - they have to make money - but in my opinion always have backups and a plan B in case you need to jump ship.

I was a huge google code fan (the design isn't Web 2.0 with social integrations - but I prefer functional over design) until they pulled the plug on downloads support (which for most Linux people isn't a big problem - a make && make install and it's compiled AND installed - for Windows it's not that easy). Which forced me to self host - I do mirror many of my public projects on github but in the event github management decides to remove/reduce a feature I won't be struggling to find a new "home" for my ~100+ projects.

Also - if you look at the issue tracker for google code it's pretty obvious that Google is no longer supporting or even monitoring it (several spam issues have been appearing). I don't have a crystal ball - but my best guess is that Google will pull the plug on Google Code as well. And it would make sense - most people have already "fled" Google Code for github/bitbucket etc.

With that said - let me put on my tin foil hat and say that even if we ignore any potential future issues there could be potential privacy issues if you upload your code to a private repo hosted by someone else. I'm personally not the type to snoop, but that might not stop some lowly intern from getting curious. Or even some bug on their site that makes private repos public (for a short time anyways). Did you hear about the time that Dropbox made passwords optional for four hours [1]?

[1] - http://techcrunch.com/2011/06/20/dropbox-security-bug-made-p...


There really ought to be a way to cap your bill. Alarms aren't enough when large bills can be run up overnight.


And what would the action be when that limit is reached? Terminate all your EC2 instances? Delete your objects from S3/Glacier? Destroy all your RDS/DynamoDB/SDB tables? Disable your CloudFront CDN assets? Stop returning DNS answers from Route 53?

This blog genre would then be "Amazon deleted all my stuff and ruined my business because they didn't think I'd pay $26". As the story notes, they tried to reach out multiple times using multiple contacts. There are also existing tools for setting up specific billing alerts if you really want to take some action at $26.


Yeah, I don't get it.

Why doesn't an account automatically come with, let's say, a $100/mo default cap, and also a $25/day default cap? Someone playing around would just leave those defaults alone and thereby not wind up looking like a total fool.

Using Wikipedia for a guide, a micro instance only costs $0.013 per hour. An instance with 30 GiB memory and 8 cores only costs $1.00 per hour. So the cap numbers I suggested would work fine for many people.

For some reason Amazon allows people to shoot themselves in the foot too easily. Yes, they usually (?) waive the charges, but it seems like such a waste of resources overall. It wouldn't be so bad if bitcoin mining were more profitable, but $1000 spent at Amazon probably mines $10 or less in bitcoin. What a waste of energy.


I documented a similar experience just over a year ago[1].

I ended up helping give a talk about my experience at the Amazon Summit in Sydney. I hope I made a good cautionary tale to the devs/ops/managers attending.

[1]https://news.ycombinator.com/item?id=6911908


In addition to the steps mentioned above, one thing that has kept me safe from my own heedlessness is to never, ever store credentials in a source code tree.

If your project is reading credentials from a file, rewrite it so it reads them from environment variables.
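For example (the values below are obviously placeholders), export them in your shell profile or in a local file you source and never commit, and have the app read its environment:

    # ~/.profile, or a local file kept out of the repo
    export AWS_ACCESS_KEY_ID="AKIA...placeholder..."
    export AWS_SECRET_ACCESS_KEY="...placeholder..."
    # the app then reads the variables from its environment
    # (ENV['AWS_ACCESS_KEY_ID'] in Ruby, os.environ in Python) instead of a checked-in file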

Most IDEs make it very easy to do that, and Python's virtual environments can do that work for you. Yes, it takes more effort, and sometimes it will be a little convoluted[1].

However, it's well worth the effort as you will have a system that you can put your faith in rather than having to double check every time in order to make sure you're not about to inadvertently commit your API keys.

[1] Example of my own: Pycharm does not read variables stored in a virtual environment's configuration, so I have to set them twice.


> Unfortunately the Rails tutorial is pretty bland so about half way through I decided to snoop around to see what my options where. Somehow, almost by chance I ended up a subscriber on Lynda. Lynda only has one Rails tutorial, but its pretty in depth and is backed by a five hour Ruby tutorial.

I had the situation in reverse. I found the Lynda.com Rails tutorials first (in 2011) and found them lacking compared to Rails Tutorial. I'm not sure what the situation is now; from the Lynda.com website it seems the author (Kevin Skoglund) has updated the tutorials for Rails 4.


I thought new AWS accounts came by default with a 20 instance limit...


Yes, but 20x the largest instance option is expensive. The limits at Rackspace Cloud are a bit more sane: they limit the total "size" instead of the number of instances. So you can run many small ones without hitting the default limit, but not as many large (expensive) ones.


The author just mentioned that there were 140 servers in his account. I do know how expensive the servers can be.


They do, but the bigger instances cost a lot to rent.


> Actually hooking into the API was very straightforwards, and didn’t take more than an hour to set up.

I have learned that if this is the case, I'm doing something wrong with my AWS setup.


Yours is a great observation! With great power comes great responsibility. If you find that you got great power but don't feel you got great responsibility, then chances are you are doing it wrong.


I used some app to upload a 250GB archive to Amazon Glacier. Turns out the app was having difficulties uploading it all at once, and its queue functionality sucked. I explained this to Amazon, since they were suggesting that I use that app to begin with. Turns out I ended up using terabytes of data usage just getting everything up to Glacier because the app was faulty. They tried to bill me something like $1500, but I never paid for it and just had my data removed immediately.


Maybe GitHub could reject pushes by default that contain known magic key strings, or file combinations that contain passphrases.


I'd rather Github not become the Gmail of version control.


Github is already the Gmail of version control. Isn't this the reason the attack is possible? Every file is automatically indexed for search.


Yes. However, I meant that I don't want GitHub to start developing features that involve scanning and applying AI to codebases and their users' behavior, because I think there's a temptation for them to start marketing that data.

E.g., "Buy our engineer intelligence subscription! Find out if your job applicant is a ninja or noob!"

I suppose there's an argument that the data is already public and anyone could apply that analysis, which is true. But the difference is that Github is not profiting from this; their feature development will continue to focus on private repo owners as their customers (whose interests are pretty much aligned with public repo owners).


Is it not possible to setup some sort of dollar value or bandwidth maximum and freeze the account upon reaching that value?


For Amazon it is certainly possible, imo. They just don't do it, AFAIK. An alert that you might or might not get is obviously not as reliable as a hard limit.


I wish!

$100/mo cap would save a lot of these accounts from being hacked. My mistake was not unique, and if you browse around on Google you will find other authors who have had the same issues.


Mildly sensational conclusion. His real lesson should be "this is how I learned how to use GitHub."


Agreed. But the discussion here on Github is pretty good and tells us that it's happened to quite a lot of people. So the headline might be a bit sensational, but it's a real problem.


Definitely. But he's blaming it on learning Rails, in a way. This is coming from someone who did this at one of their jobs: I accidentally committed AWS keys, and we had dozens of servers launched on our account in minutes. It was insane.


Is there an easy way of putting a strict limit on the amount of money I'm willing to spend on AWS?


I haven't seen anything about limits, but they do let you set alerts when you pass a user-specified spending threshold.


The expensive mistake was to use Amazon at all. A domain name of your own costs about $1/month, an OVH root server less than $10. Install a minimal Debian and Linux Containers on it, and expand your own cloud if you need it, e.g. by adding cheap Hetzner servers.
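If you haven't done the container part before, the rough shape of it on a Debian root server (the container name is arbitrary) is something like:

    apt-get install lxc
    lxc-create -n web -t debian    # create a Debian container named "web"
    lxc-start -n web -d            # start it in the background
    lxc-attach -n web              # get a shell inside it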


Regardless of what service (AWS, Linode, Rackspace, OVH) if you leak keys or passwords you have the potential for costs incurred by malicious users.

I have not used OVH, but a quick perusal of the OVH API[1] suggests that you could invoke plenty of commands that would incur costs.

[1] https://api.ovh.com/console/#/order


But your app would not have any keys. You would give it a server and that would be it. There's no API use in a toy project, and so no way to leak account authentication.


Amazon has been really proactive in protecting against these kinds of things. They seem to be searching the web constantly for API keys, because they'll send you emails that say "hey we found your key here, you better do something about that".


The same thing happened to me. Almost exactly. Rails again. S3 bucket for images. Following along with 1monthrails this time. Pushed the key, fell asleep, awoke to Amazon warnings and a $2000+ bill. Also removed.

I wonder how often this happens. Are they mining bitcoins?


It would be really stupid to mine bitcoins. I assume they were mining litecoins or some other more GPU/CPU friendly coin.


Why? It's slow, but it costs them nothing...


Because they can make money much faster with litecoin.

To illustrate this, EC2 GPU instances have a NVIDIA Kepler GK104 with 1536 cores, so that is like the GTX680. It gets 120 MHash/s on bitcoin, which translates to 2 cents per month.

It gets 207 kHash/s on litecoin, which translates to 32 cents per month.

I think on vertcoin you could get 90 cents per month.


Wow the effectiveness of GPU for mining Bitcoin has dropped dramatically since last time I looked at it (admittedly a couple years ago).

If I remember correctly, you could make a few hundred bucks a month with just one high-end ATI graphics card.


I now use bitbucket.


Amazon support told me the ones spun up on my account, at least, were indeed bitcoin miners.


There seems to be some serious confusion here if this dev thinks you can spin up EC2 instances with the S3 API.

Perhaps one lesson here aside from keeping keys outside of public repositories is to learn how an API works (IAM, arns, etc) before using it.


I've said this before, but AWS billing support is usually quite sympathetic to your situation if you've made a mistake. They've dropped $120 in spot instance charges I had which went way over what I expected.


"Turns out through the S3 API you can actually spin up EC2 instances"

Can you really use the S3 API to spin up EC2 instances? Or is the guy just mixing up the fact that the credentials he used for S3 can be used for other AWS APIs?


> Can you really use the S3 API to spin up EC2 instances?

No. My guess would be a credential with allow:* access across services.


I'm not at all surprised to learn of bots cruising github looking for keys. I think a good lesson is that if you ever accidentally expose your API key - revoke or delete them immediately and generate new key.


I'm surprised no one else mentioned the Heroku mistake of putting config in a file instead of environment variables. Settings like API keys should always be in env vars on Heroku, not in a config file.


If I understand you correctly, Heroku does make that possible. You can use Heroku Toolbelt or their web dashboard to add environment variables.

Adding environment variables to a file is just a convenience.
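Right - with the Toolbelt it's something like this (the values are placeholders):

    heroku config:set AWS_ACCESS_KEY_ID=AKIA... AWS_SECRET_ACCESS_KEY=...
    heroku config    # list the config vars currently set on the app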


Damn! I know one thing for sure after reading this, I am not even trying AWS until I really really know what I am doing. For now I'll stick with VirtualBox and DigitalOcean.


Had a very similar thing happen to a co-worker... accidentally pushed an AWS key to github, and over $7,500 in charges before it was noticed.

Fortunately it was forgiven.


This happens very often. Like many others have recommended, disable the global AWS keys and use roles.


The github help page* literally states "Danger.. If you committed a key, generate a new one." In a box. In Red.

I'm not sure how much clearer GitHub could make that.

*https://help.github.com/articles/remove-sensitive-data/


I use github every day.

I've never read nor seen that page.


I wonder how much crypto they mined? Probably $1 worth but still worth it for them!


Always put your API keys in an environment variable or ~/.


Mmmh. Using an S3 key/secret to launch EC2 instances? Does anybody have a pointer to the docs for that?


They were probably using their root credentials instead of creating a narrowly focused IAM profile.



Mmh. I think the author mentioned this as well: weren't these in the past just for S3?


>Over the holidays, I opted to try to teach myself Ruby & it’s companion Rails.

shouldn't it be "its"?


[deleted]


Don't need an English major for that :)


Why do people use this? For $3000 you can run like six 4 GHz machines with 12 GB of RAM out of your house, with like 10 Mbps up and 50 Mbps down. I seriously believe the percentage of customers of Amazon computing that actually need it is like 1%.


Why do people use Amazon? A number of reasons.

Do I want to buy and manage servers myself? Would I pay for an additional computer to host my little toy Rails app? Will my ISP allow me to route network traffic to my personal IP with a standard plan? Most, at least that I've seen, prevent you from hosting.

For a little project something like the author of this article described, using virtual hosting is significantly easier (and more realistic). If you're going to run an enterprise operation, bringing that sort of hardware in house might make sense, but a lot of larger companies are still pushing that hosting out to places like Amazon as well.

Now, if you're asking why Amazon would allow (new) customers to scale up to $3000 worth of hosting overnight, maybe that's a separate issue. But how should Amazon judge that? There are probably users who want to jump straight up to that sort of scale. And evidently they did detect that the author didn't seem to fit that use case -- they actually called him about it pretty much right away.


Why doesn't Amazon just ask you on sign-up if you're going to be mining coins, and if you say no, require specific authorization from you (outside of the usual key) to start doing so? Then until you authorize it, they can just not run mining instances on your behalf. Surely it's easy for them to tell when this is being done?

EDIT: this got downvoted, but I stand by it; plus, it's a question, so you could reply and answer it. In my thinking it's the same reason there's a daily ATM withdrawal limit set by default. You can lift it, but it's there to reduce the incentive (the payoff from trying to see your PIN and then stealing your ATM card). The current policy is like the bank calling you and saying, "umm, I hope you know you've already withdrawn $7,000 and seem to be continuing." Given that bitcoin is (literally) cash, it seems to me saner not to run bitcoin mining instances by default, unless you authorize it specifically. Or can they not tell?


Bitcoin mining is not a service offered on AWS - they are spawning up EC2 instances, which are virtual machines you have root access to. From there you can mine bitcoins or do whatever else you want.

Amazon doesn't have access to the data on the instance, or a list of processes running inside of it, or similar.


I know, but even without root isn't it trivial for Amazon to unambiguously tell that this is what the VM's are doing, without looking at anything else on the instances? How can you run virtual machine instances without knowing what the CPU's and GPU's are doing? (There is no mathematical 'private computing' or secret computing over untrusted hardware, i.e. hardware run by other people, that is used in practice anywhere in the world, where for fancy mathematical reasons the operator has no idea what the CPU or GPU of the instance they're running for someone is actually computing. It's not even a cryptographic primitive people know about, and certainly not something performed by hypervisors.)

The CPU or GPU pattern of bitcoin mining must be completely unambiguous and trivial for Amazon to detect on EC2 instances. Or am I wrong for some reason?


I would think that looking at patterns on the CPU level is almost impossible. The overhead would be enormous.

For instance, if you look for a magic number: instead of putting two numbers on the stack and performing an addition, you now have to have a conditional jump before the addition.

Maybe I'm wrong, but at least I think it would be virtually impossible.


You're right, but I meant bitcoin mining specifically. It's incredibly resource-intensive, so it's not something you have to check for all the time, just when something seems to be behaving this way suddenly (and spinning up lots of instances, etc). When there's an obvious huge resource spike, you can just check to see if it's bitcoin mining.

If they want to be neutral and not do introspection like this without permission, they can ask the user on sign-up if they want "Protection from mining processes" where sudden activity spikes will cause them to do introspection and shut down an instance if it seems to be bitcoin mining.

EDIT: plus, bitcoin mining is the same operation over, and over, and over, and over again. You don't have to "catch" it doing a particular operation. A brief sample taken at any time will show the VM doing the same exact thing. (tons of SHA hashes.)

I'm not an expert though so perhaps I'm missing something!




