Fixing a Disk Space Alert at 3 AM (slashmili.org)
40 points by mrastian on July 13, 2015 | 39 comments



Randomly rewriting a file that mysqld has open seems a lot more dangerous than restarting mysqld. What if that was e.g. where mysqld was batching up query results before streaming them to the client? Now you've not merely had a query fail, you've silently sent back incorrect results.

If you can't restart mysqld you have an architectural problem. Your database will go down sometimes and your system should be built to tolerate that.


I was just going to write the exact same thing. One should at least have a simple master-to-master setup for redundancy, IMHO; then these kinds of problems are much easier to solve: just restart MySQL.


I did not even know this was a thing (without using some less well-known MySQL extension like Galera), but I just googled and found a brief document from Digital Ocean on exactly how to do this with stock MySQL. And here it is, right in the fine manual, supported since 5.1.18.

I don't know how to ask this more precisely, but how are races and concurrent writes on both masters handled? Is it still "ACID safe" writing across the cluster, or would you use something like this as a hot-spare for promotion in case of failover only?


I bet it's intended for failover. Otherwise there wouldn't be much point in something like Galera.


Yeah. I thought about it, and there is so much work on things like etcd and zookeeper, I would have already heard about it if multi-master MySQL was really a thing worth writing home about. But still, cool to find out they're advertising it.


Good thing the author wasn't randomly rewriting a file, then, and that the file was deleted and inferably unlikely to be used in a critical operation!

Restarting MySQL is a violent operation. For large databases, it can take several minutes. If I can avoid it, I do so, and this was a really clever workaround for an issue that indeed avoided a restart. That's a good thing.


> Good thing the author wasn't randomly rewriting a file, then, and that the file was deleted and inferably unlikely to be used in a critical operation!

That's not at all true. It's very common unix practice to open a file, delete it, and then use it for critical operations.
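
For anyone who hasn't run into the pattern, a minimal illustration (Python, with a made-up path and payload):

  import os

  # Create a scratch file, then immediately unlink it. The name disappears
  # from the directory, but the open descriptor keeps the inode (and its
  # disk blocks) alive until it is closed.
  fd = os.open("/tmp/scratch-example", os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
  os.unlink("/tmp/scratch-example")

  # The process can still read and write through the descriptor as usual.
  os.write(fd, b"intermediate results")
  os.lseek(fd, 0, os.SEEK_SET)
  print(os.read(fd, 1024))

  # Only here is the disk space actually released.
  os.close(fd)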


I find it quite ironic that you and the author of the article completely misunderstand what is happening.

Creating and then deleting a temporary file is a very common technique to make sure that the file will be removed as soon as the application closes it; it is also used to make sure that nothing else will tamper with it. I find it amusing that the author found a way to tamper with it anyway.

Now, depending on what the file was used for, the corrupted file either caused the database to return corrupted data to the user, write corrupted data to the database, or, in the best case, return an error to the application issuing the query.


open's O_TMPFILE flag (since Linux 3.11) does exactly that, without creating a named file and then immediately removing it from the filesystem. I'm not arguing that the other technique isn't common, though.
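
Roughly the same thing through Python's os module, assuming a kernel and Python new enough to expose os.O_TMPFILE (the directory is arbitrary):

  import os

  # O_TMPFILE creates an unnamed file on the filesystem that holds the given
  # directory: there is never a directory entry to unlink, so there is no
  # window in which another process could open or tamper with it.
  fd = os.open("/var/tmp", os.O_TMPFILE | os.O_RDWR, 0o600)
  os.write(fd, b"temporary data")
  os.close(fd)  # space is released here, just like with the unlink trick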


The older technique is probably more common since it is available in older versions of Linux and is also compatible with other Unix-based systems.

That said, it still doesn't prevent the author from modifying the file by referencing the file descriptor.


Former MySQL admin's advice (for OP, and others who run into this same issue):

Someone probably started running queries which were doing disk sorts. Look for abusive queries coming in and kill those; the temporary files will go away at that point. Based on the size of the files, simply looking for long-running queries should be sufficient; if it's the backend for a web server, it's likely that the client and server have already given up on the query (or, in the worst case, re-sent the debilitating query).
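
A rough sketch of that triage, assuming a pymysql connection and an arbitrary five-minute cutoff (the credentials and threshold are placeholders, not anything from the article):

  import pymysql

  # Hypothetical credentials; use a dedicated admin account in practice.
  conn = pymysql.connect(host="127.0.0.1", user="root", password="secret")
  with conn.cursor() as cur:
      cur.execute("SHOW FULL PROCESSLIST")
      for row in cur.fetchall():
          # Columns: Id, User, Host, db, Command, Time, State, Info
          thread_id, command, runtime, query = row[0], row[4], row[5], row[7]
          # Kill plain queries that have been running longer than five minutes.
          if command == "Query" and runtime > 300:
              print(f"killing {thread_id}: {query}")
              cur.execute(f"KILL {int(thread_id)}")
  conn.close()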

As stated by other users, truncating random files will cause more problems for MySQL than just restarting mysqld. In fact, I'd recommend going in and restarting it now, to ensure that you're in a good state. Failing over to your slave (you do have a slave and a failover procedure, right?) is less of a headache than trying to identify what problems were caused by manually truncating these sort files.

Finally, have a look at Skyline from Etsy[1]; a trend monitoring tool like this would have alerted you when the ramp started closer to 1am, well before this was suddenly an outage event.

[1] https://github.com/etsy/skyline


There's a quick mention that this was a reporting server. I'd guess the reports have a common query in them that yesterday did not spill to file sorts, and today does, so literally overnight the report process goes from using virtually no disk space to using arbitrary multiples of how much you have.

Nothing really great leaps to mind to solve this in the general case. These sorts of correlated behaviors can really be jerks.


In this case I'd look for a recent table alter which involves a blob-style column - queries involving those will always be sorted on disk. Limiting the queries to not include the blob fields, or doing alters to varchar fields (even big ones) will help this out.

Having a daemon which kills off long running queries (such as ones with extensive disk sorts) can help as well, just be sure to follow up on the queries which were killed to fix the frontend or chastise the person doing `SELECT * FROM blob_table ORDER BY id`.


Opening files and then deleting them is a classic pattern for using temp files; it makes sure that they will be gone when the process exits. Finding a deleted but still open file is no indication that it's not needed anymore.

My approach would have been to kill the MySQL worker process that keeps them alive. That way the program that started the query gets an error message, instead of whatever undefined behaviour you get by emptying the files and surprising mysqld.


Top commenter on that article:

  First of all... If you're getting paged on disk alerts at 3 in the morning,
  you're doing it wrong. Write a script that checks every file system and 
  makes sure that X amount of free space is always present, adding free space
  as-needed, and emailing you on the backend during business hours whenever 
  the volume group is nearing depletion.
I'm sorry, but no -- I'm not going to automate adding more disk to a VM, extending a volume group, and growing a filesystem. Not now, not ever. No way in hell.

That needs to be done by a human after a backup has been tested and writes have been quiesced.

  This is what grown-ups do. Storage is cheap. Downtime is not.
Then I don't want to grow up like you. Have fun recovering your corrupted filesystem.


I'm not the person who posted that comment, but let's take a step back. I have scripts which do this for me. However, I also have automation that takes backups of the systems, restores the systems on other virtual hosts, and verifies that the restores are up and running properly. In fact, if you can script one, you can generally script all of those scenarios.

For expanding database volumes, for example, I have a system which first monitors the size of my current database systems. If they get too large, it will create a new virtual machine which is bigger, restore a current backup of the running system to it, verify it, put it into rotation, make sure that all current data is replicated, and then remove the old system.

Frankly, doing this by hand would be highly error prone and dangerous. I say, script everything.


Of course I have scripts to do that stuff, but considering the potential danger involved, and having been burned so many times in the past by bad filesystems/tools/kernels, it is something I would only initiate manually -- never as a knee-jerk reaction to "low disk space" alerts.


Please don't copy-paste a comment from elsewhere in order to flame it on Hacker News. The local supply is excessive as it is.

The substantive point you're making is fine, but would be better and clearer if you expanded on it a bit and dropped the indignation.


Welcome to the devops world, where admin is boring and coding is fun...


Eh, sort of. I do Devops/Infrastructure exclusively, and while the tedious stuff is automated away, I've seen too much shit break on the other side of the fence to trust it fully (AWS API throwing errors when you absolutely need it to work, etc).

If something is broken, you should have enough automation to get it into a known good state without a human involved, while maintaining data consistency. Anything else should be automated, but you should still be around to babysit it while it's going through the motions.

If you think you can automate everything and always trust it to work flawlessly, you haven't been around long enough for the edge cases.


> If you think you can automate everything and always trust it to work flawlessly, you haven't been around long enough for the edge cases. <

And then you have some "clever" coder coming along with a fix for said edge case (that invariably creates some new edge cases just outside the domain of the fix).


That's a fallacy of an argument, really. It's not as if admins can't properly script environments and changes. Monitoring, updating, and scripting of changes happened long before the term devops came into existence.


http://www.commitstrip.com/en/2015/07/08/true-story-fixing-a...

The "just add more to the pool" solution seems akin to the "solution" in the strip above.

And I have seen similar arguments for when a daemon or server goes down. Hook it up to a script that reboots it automatically (or fires up a new instance over and over and over) and go back to coding.


I know it's risky to expand an existing disk online but isn't adding additional disks to the volume group pretty straightforward?


It is sometimes useful to keep around a 1-10GB 'ballast' file full of zeros that you can remove if you're in this kind of emergency. Tuning the root reserved space as mentioned in the article comments is another useful trick.
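
A throwaway sketch of creating such a ballast file (the path and size are arbitrary); writing real zeros, rather than just seeking to the desired size, is what keeps it from ending up sparse:

  import os

  size = 2 * 1024**3          # 2 GB of ballast; pick whatever you can spare
  chunk = b"\0" * (1 << 20)   # write in 1 MB chunks
  with open("/var/ballast", "wb") as f:
      for _ in range(size // len(chunk)):
          f.write(chunk)      # real writes, so the blocks are actually allocated
      f.flush()
      os.fsync(f.fileno())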


I love this ballast file idea. It reminds me of a programming story by Noel Llopis[0] where a game developer was working with a strict memory limit and placed static char buffer[1024 * 1024 * 2]; in the code as an insurance policy[1].

[0] http://gamesfromwithin.com

[1] http://www.dodgycoder.net/2012/02/coding-tricks-of-game-deve...


Useful indeed. But it won't help in cases like this one where the problem was that a process was consuming disk at a prodigious rate.


...and of course make sure such a file isn't 'sparse'.


I cannot imagine being in this situation and not either looking for queries to kill in mysql or restarting the daemon.

And as others have mentioned, having an open file descriptor to a deleted temp file is a classic unix pattern; truncating such files is a horrific idea.


As mentioned in the article comments, this is a classic example of why you should partition your systems.

Still an interesting way to troubleshoot and work around the issue, though. I'm sure that on duty at 3 AM I wouldn't have come up with anything better.


This is the canonical story of databases; I had similar frightening nights several years ago with my MongoDB setup. I'm not a DevOps guy, and having to wake up to figure out what was broken (and it always being the database) eventually took a toll on me. What's the point of running my database in the cloud if my storage space is still finite? Isn't the whole point of the cloud unlimited scalability? So why on earth do I have to be an expert in mdadm, LVM and all that other junk in order to make use of that unlimited number of hard drives? Ugh, sorry for the rant, but this story takes me right back to those days. I've tried doing things differently since then, experimenting with new database concepts and trying to make things better, and the good news is that I have never had this problem since - so hopefully I can spare future souls the pain and woes: http://gunDB.io/


MySQL uses TMPDIR for sorting; multiple large temp files indicate heavy queries that should be optimised and/or missing indexes.

Deleting those files while the server runs is an awesome way to "f@ck s*it up"! Do you really trust MySQL to behave nicely in that situation?

Add more disk space and point TMPDIR somewhere else than /tmp, /var/tmp or /usr/tmp!


To easily find out where disk space is going, I'm personally a big fan of `ncdu`. It's a really nice ncurses interface to du.


Second this in general, although I don't think du (and consequently ncdu) will show anything about disk space claimed by deleted files that are still held open by a process.
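
For the deleted-but-still-open case you need something like `lsof +L1`, or a quick walk of /proc (a Linux-only sketch; it needs enough privileges to read other users' fd directories):

  import os

  # Report file descriptors whose target has been deleted but is still open.
  for pid in filter(str.isdigit, os.listdir("/proc")):
      fd_dir = f"/proc/{pid}/fd"
      try:
          for fd in os.listdir(fd_dir):
              target = os.readlink(os.path.join(fd_dir, fd))
              if target.endswith(" (deleted)"):
                  size = os.stat(os.path.join(fd_dir, fd)).st_size
                  print(f"pid {pid} fd {fd}: {target} ({size} bytes)")
      except (PermissionError, FileNotFoundError):
          continue  # process exited, or we lack access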


Is there any risk to mysql when we nuke those handles?


Yes. Never do what this person did.


Probably!


On Windows we use TreeSize Free to find the offending files. Tracking down the lock and unlocking can be tricky, but Process Monitor and File Unlocker are usually the go-to tools.


I'm still looking for a way to close open sockets of a process. I know about the gdb hack [1] mentioned in the comments of the article, but it seems like an opportunity for somebody to write a nice tool for this.

[1] http://hacktracking.blogspot.nl/2013/06/closing-process-file...
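
For reference, the gdb hack in [1] boils down to attaching to the process and calling close() on the offending descriptor; a thin wrapper might look like this (the pid and fd are placeholders, and the caller needs ptrace permission on the target):

  import subprocess

  def close_remote_fd(pid: int, fd: int) -> None:
      """Attach gdb to `pid` and call close(fd) inside it. Use with care:
      the process has no idea its descriptor just went away."""
      subprocess.run(
          ["gdb", "-p", str(pid), "-batch", "-ex", f"call (int)close({fd})"],
          check=True,
      )

  # e.g. close_remote_fd(12345, 7)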



