Inventory management in MongoDB: A design philosophy I find baffling (ayende.com)
127 points by douche on June 24, 2017 | 86 comments



Just to make it clear, the issue the OP is pointing out is with whoever wrote this inexcusable piece of code in the book, not MongoDB itself. Even if the authors later clarify that there's a possibility of error, publishing such a misleading and grotesque solution for simulating transactions in a non-ACID database shows very poor judgement.

MongoDB should not be used for an online ordering system, period. But if the programmer had no better alternative than Mongo, then please use Mongo's atomic operations [1] and nested documents to make sure nasty Bad Things don't happen.

[1] https://docs.mongodb.com/manual/tutorial/model-data-for-atom...
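For the single-product case, the guarded update pattern from those docs is all you need. A minimal sketch in the mongo shell (collection, field and variable names are hypothetical):

    // Decrement stock only if enough is available; the filter and the $inc
    // are applied to the document as a single atomic operation.
    var res = db.products.updateOne(
        { _id: productId, inStock: { $gte: qty } },
        { $inc: { inStock: -qty } }
    );
    if (res.modifiedCount === 0) {
        // unknown product or not enough stock: reject the order
        // instead of letting the count go negative
    }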


In the case of an ecommerce site, where multiple products can go in a cart, I don't see any real right way to do things with MongoDB. Atomic updates on a single document just don't help much, and you can't nest arbitrary mixes of cart products.

As you say, bad idea in the first place.


I was part of a team that successfully did inventory in mongo. The cart makes zero difference if Mongo is only storing the inventory. It's fine to atomically change a single inventory row.

Operations that span multiple rows can be safely performed with a bus in front of it.

You can't judge the idea. I don't think you've quite grasped it. I agree the book example isn't great, but that's not the technology's fault.


I am assuming a system where available inventory is debited when the CC transaction completes.

How do you do that with single row only atomic transactions? And without charging $100 for what the customer thought was qty 3 $25 rakes and qty 1 $25 shovel...but is now something less than that due to concurrent purchases from other customers?


"How do you do that with single row only atomic transactions?"

In this scheme, you should theoretically be able to implement transactional protocols (two-phase commit and friends; see, e.g., http://dl.acm.org/citation.cfm?id=3027893) that ensure you do not actually sell two people the same thing, but may return an error at buy time saying the inventory disappeared while you tried to buy it.

You pay a sometimes-very-high cost in throughput.

Of course, talking about mongodb, it's probably broken in a myriad of ways no matter what you do :)


Nailed it, but I'm curious where you think the throughput suffered; we didn't encounter that problem.


It was a distributed system. The inventory was kept in Mongo with atomic debits and credits applied with a bus in front. SQL existed for the actual order, but inventory was managed in Mongo fine.

I'd like two of this product? Increment reserved in Mongo atomically, fire a bus message that inventory was reserved. I'd now like just one? Again atomically remove one from reserved and fire another message. Simplified of course.

Two users can't reserve the same inventory, reserving the inventory is atomic, and everything else happens during reconciliation. Typical distributed-systems stuff.
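A rough sketch of the reserve step (collection, field names and the bus.publish call are stand-ins, not the actual code we ran):

    // Move units from "available" to "reserved" only if enough are free;
    // the whole update is atomic on the inventory document.
    var res = db.inventory.updateOne(
        { sku: sku, available: { $gte: qty } },
        { $inc: { available: -qty, reserved: qty } }
    );
    if (res.modifiedCount === 1) {
        // hypothetical bus client; downstream handlers must be idempotent
        bus.publish('InventoryReserved', { sku: sku, qty: qty, cartId: cartId });
    } else {
        bus.publish('InventoryUnavailable', { sku: sku, qty: qty, cartId: cartId });
    }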


> It was a distributed system.

So you can't have exactly-once semantics. When you say 'bus' I assume you mean a message broker, which at best gives you at-least-once semantics. What happens when the message is delivered multiple times?


That's just how a broker works; all message handlers must be idempotent. This sometimes means storing message IDs, or creating one-directional state-change flows that reject subsequent duplicate messages.

To answer that question more succinctly, nothing special happens when a message is delivered more than once.
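One common way to get there is a dedupe collection with a unique index on the message ID; a sketch (collection and function names are made up):

    // The unique index makes the insert fail for any message seen before.
    db.processedMessages.createIndex({ messageId: 1 }, { unique: true });

    function handle(msg) {
        try {
            db.processedMessages.insertOne({ messageId: msg.id, at: new Date() });
        } catch (e) {
            if (e.code === 11000) return;  // 11000 = duplicate key: already handled, drop it
            throw e;
        }
        applyInventoryChange(msg);  // the actual work, reached at most once per message ID
    }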


At least, when we didn't screw up :)

Those were fun days.


I would assume you propagate a transaction ID or expected row version (and persist transaction histories and/or version numbers along with the inventory item row after mutation).


And what happens when that shovel gets stolen from the warehouse before it can be shipped to the customer?

You have to handle over-selling no matter what. The "source of truth" of your inventory system is whatever is actually, physically sitting in a warehouse somewhere. Not what your database says.


Yes, you have to handle overselling for other reasons, but I don't see why that's a good reason to not protect against it when you can.


I think Amazon calls it Apology Based Computing or something like that. You're better off at scale writing out a transaction log, then seeing if actions were successful and compensating otherwise.


> I was part of a team that successfully did inventory in mongo.

Yes, sure, let's all go get PhDs in electronics, networking protocols, IPC and database architecture, so we can write reliable transactions for our webshop/CRM. Or, I don't know, maybe use a mature system with multiple decades in production.


Regarding "SQL is better suited to this use case because it has transactions" comments:

Before we had 3-tier architectures, people would have designed a shopping cart use-case as a single SQL transaction that would last maybe 10 minutes. The DB would make sure everything stays consistent until the final commit. The GUI would keep an open connection to the DB the whole time.

In the web age, you want stateless services and HA. It means a transaction can't last more than a single web page. It becomes more challenging to design a shopping cart, because the DB can't handle a long-running transaction anymore.

Writing a correct system that reserves the items you put in a shopping cart, doesn't leak items, and doesn't sell the same item twice is not easy. A transaction rollback will not do the cleanup for you, because there's no long-running transaction anymore.

So SQL transactions can't help as much as you think.

Mongodb doesn't have transactions, but updates are atomic, which allows CAS and optimistic locking use cases. I agree it's less than ideal when you need to provide ACID behavior, but don't believe it's easy with SQL transactions. It's not.

The author objects to the book's suggestion of putting each object in stock in its own document, and I agree it's probably a recipe for disaster. Atomic updates make this design absurd.

You could easily db.products.update({_id: productId}, {$inc: {inStock: -5}, $addToSet: {pendingCarts: {cartId: cartId, quantity: 5, timestamp: new Date()}}}). This has the exact same atomic behavior as a SQL transaction to remove 5 from the stock and add a new "shopping cart entry" in another table.

(you still need to expire cancelled shopping carts, and you may need a transactional way of completing the order: it's also manageable if designed as an idempotent operation)
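To also guard against overselling, the stock check goes into the query itself, and expiring a cart is just the reverse update. A sketch using the same hypothetical fields as above:

    // Only decrement when at least 5 are in stock; filter + update are atomic.
    db.products.update(
        { _id: productId, inStock: { $gte: 5 } },
        { $inc: { inStock: -5 },
          $addToSet: { pendingCarts: { cartId: cartId, quantity: 5, timestamp: new Date() } } }
    );

    // A periodic job expires abandoned carts by putting the units back:
    db.products.update(
        { _id: productId, "pendingCarts.cartId": cartId },
        { $inc: { inStock: 5 }, $pull: { pendingCarts: { cartId: cartId } } }
    );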

Anyway don't over-simplify this use case and believe "a single big SQL ACID transaction would handle the problem". That's just not true.


Well put! Having worked on corporate cash accounting and inventory systems, it's pretty clear that the accounting handled by banking systems does _not_ rely on SQL transactions. It's handled by auditing and transaction-based systems (e.g. event systems) encoded in double-entry accounting. ATMs, for example, will generally dispense money up to a given limit (say $500), because below that limit the cost of lost consistency is outweighed by providing high availability. If the user double-draws and goes over their limit, they are held accountable at a later point (usually this is bounded by hard cash transfer limits).

In a warehouse inventory setting, when you _do_ have inconsistencies (e.g. lost items, misplaced orders, etc.), a system which strictly enforces "inventory limits" will often prevent employees from doing their job and shipping an item which could be sitting right in front of them but isn't accounted for in the system. Auditing combined with optimistic locking resolves this and allows accountability, tracking, and flexibility.

Those are two real-world examples which underlie the idea that ACID guarantees and locking/transactions are two separate intents. CouchDB and Couchbase both provide ACID guarantees per document, making it straightforward to implement multi-service applications using event-based systems. It's equivalent to MongoDB's CAS operations. Really, all you need is to ensure that your changes are atomic, and ACID compliance at the key/document level generally lets you do this readily.

Personally, I find that SQL-style transactions just cause lots of issues with performance and locking contention while enabling developers to skimp on thinking deeply about how to appropriately design their data flow. Sometimes that's the right call for a team, but sometimes it's not.


> I find that SQL-style transactions just cause lots of issues with performance and locking contention while enabling developers to skimp on thinking deeply about how to appropriately design their data flow

"Designing data flow" would mean having to spend more time and money for the same task?


You always have to spend time and money to get performance and consistency at scale.


Transactions are never needed with a fully normalized model so if transactions are needed it is probably because your model sucks.

Or it is because you denormalized your model because your db engine's performance sucks in which case the transactions will probably just make it worse.

Good schema design and lock-free/wait-free (transaction-free) algorithms are not "reimplementing transactions in the client."

The OP's example is garbage, but his proposed transaction solution is garbage too.


Eh? If you don't have transactions of some sort you can't update multiple tables simultaneously, which you need for a denormalised model?


If your schema is fully normalized then you never have two instances of the same data in more than one place therefore you never have to update two things simultaneously.

Anytime you need a transaction it is because you have the same data, or some calculated derivative of the same data, stored in different places, which is why they both have to be updated at the same time to maintain consistency.


"Transactions are never needed with a fully normalized model": I just don't agree. Fully normalized most likely means many tables for everything, and transactions are needed to maintain consistency.

"you denormalized your model because your db engine's performance sucks": you're most likely to denormalize because joins are costly no matter what DB you're using.

"the transactions will probably just make it worse": denormalization is most often used to reduce the number of operations (so transactions are unlikely to make things worse).

I agree about the amount of garbage in the article though.


Odd, never thought of the distributed implementations as normalized or not, though that's exactly what the underlying design is about. Thanks! You must design at a minimum a normalized way to store data reference keys, but the prevalence of SQL makes me implicitly associate 'normalization' only with SQL. It's also good to note that adding in vector clocks / revision keys lets you store denormalized data (e.g. Caching with staleness detection) in various data stores as long as the parent id and revision keys are normalized somewhere.

Re: pjc50, yes that's what calafrax means.


Would you happen to have any articles/papers/blogs/talks that demonstrate how to perform the traditional bank transaction/shopping cart examples using an event-sourced system? Curious to learn more.


This is very close to how a team I was on solved this issue at Amazon. We took money, held inventory, had a shopping cart, and it worked out fine.

A service bus was necessary, but the actual atomic transactions in MongoDB didn't fail us. We didn't lose data. While the nay-sayers discounted Mongo, we were raking in cash on top of it.


> Anyway don't over-simplify this use case and believe "a single big SQL ACID transaction would handle the problem". That's just not true.

> In the web age, you want stateless services and HA. It means a transaction can't last more than a single web page. It becomes more challenging to design a shopping cart, because the DB can't handle a long-running transaction anymore.

Or you could cheat and not update the inventory until the purchase is made. ;)

The problem with the "adjust inventory on cart" approach in low-inventory situations is that you'll have 80% of your carts holding items that won't convert until cart expiration. You only need the actual purchase to be atomic. Then, once the queued credit card transaction completes, you adjust the order to refund the inventory [declined] or ship the order [completed].

That pattern absolves you of needing complex logic and allows you to distribute the activity relatively trivially as a set of two independent idempotent operations. And if the analog portion of the process fails, the picker hits a button and the order gets queued for a refund. Once the order is cancelled, another service contacts the customer.

Cart expiration, etc. makes the system unnaturally brittle by adding non-critical steps to the process.


You don't leave the transaction open while the user browses; you check for stock at the start, possibly moving items from "in stock" to "in cart" state. Then you do the actual stock->sold transition right before you send off to the payment processor. If it's rejected, you return it to stock.
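A sketch of those state moves with per-state counters on the product document (field names are illustrative):

    // browse -> cart: claim a unit only if one is free
    db.products.update(
        { _id: productId, inStock: { $gte: 1 } },
        { $inc: { inStock: -1, inCart: 1 } }
    );

    // cart -> sold: right before calling the payment processor
    db.products.update(
        { _id: productId, inCart: { $gte: 1 } },
        { $inc: { inCart: -1, sold: 1 } }
    );

    // payment rejected: return the unit to stock
    db.products.update(
        { _id: productId, sold: { $gte: 1 } },
        { $inc: { sold: -1, inStock: 1 } }
    );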


> You could easily db.products.update({_id: productId}, {$inc: {inStock: -5}, $addToSet: {pendingCarts: {cartId: cartId, quantity: 5, timestamp: new Date()}}}).

That is unlikely to work well at much scale. At least last I knew, Mongo docs are limited to 16MB and the entire doc is read then written in cases like this, very slow on large docs. Given the amount of data that may be attached to a `product`, it's not hard to hit these limits.


Please do the math... Before this document reaches 16MB, you're bigger than Amazon. If this solution scales up to Amazon scale, that's good enough for me.


The example in the book is very simple - it can be implemented in a SQL database with two tables (products and carts). When you have more entities and relations it becomes too complex to keep all of them consistent in a denormalized scheme in MongoDB. You will have to write cron jobs that clean up broken references, and you'll still get errors.

So what I wanted to say is that the denormalization and lack of foreign keys in MongoDB are much worse than the lack of transactions.


people would have designed a shopping cart use-case as a single SQL transaction that would last maybe 10 minutes

This problem was solved in 1965 by CICS for the use case of "you're on the phone to a travel agent and they're finding you a ticket on their terminal". No "10 minute single transactions" anywhere...

In the web age, you want stateless services and HA

Those who forget history are doomed to repeat it.


I thought about CICS too, but I guess few even know what it is. SQL was the contender here...


New families of DB technologies generally traded certain things off (like ACID guarantees and transactions) in exchange for other things like scalability or flexibility. When someone comes back, and in user space reimplements the things that the DB intentionally traded off, you get the worst of all worlds. It's quite a bit like flattening one end of a screwdriver so that you can make it work to drive nails. Yes, you can make that work, and in some rare circumstances where you're trapped on a desert island that might be your only option.

The rest of us will just use a hammer.


My prior startup tried to do a security product with backend in Mongo. It really needed transactions and to avoid N+1 issues.

DB team insisted on writing "DAOs" that ended up pulling 1GB+ of data back from Mongo to merge in EACH of 100+ data points from a scanned machine. Similar issues in UI presentation. With multiple threads doing each of these things simultaneously there were many out of memory dumps. I analyzed these multiple times and told the DK VP Engineering what the problems were, and they didn't follow up for 6 months. He was gone soon after.


> DB team insisted on writing "DAOs" that ended up pulling 1GB+ of data

That DB team shouldn't be allowed near any database. Why on earth would they go for such a moronic abstraction?


I remember writing documentation for a Perl system in 2001 - it was using MySQL MyISAM tables, and the main developer had a few hundred thousand lines of Perl that acted as the same kind of client-side attempt at transactions. It was a mess, and huge amounts of money were spent trying to get the thing to work. A few months later InnoDB came along and made it apparent that trying to write transaction logic in the client was a very bad idea, which seems to be the point of this article.


That was a terrible time in history - lots of Perl web apps built using MySQL or mSQL, both of which lacked transactions, right at a time when e-commerce was taking off.

Although Oracle, Sybase, SQL Server and friends all had transactions at that time, somehow the mindset was that they were a complicated enterprise marketing gimmick, that MySQL/mSQL were faster and simpler, and that we could work around it on the client side. Seems like not much has changed.


That was a terrible time in history - lots of Perl web apps built using MySQL or mSQL

It was made worse by the MySQL team actively advocating against features they didn't have "you don't need transactions, do it in your application", "you don't need foreign keys, do it in your application" blah blah.

20+ years later they're still struggling to shoehorn it in.


lock tables


Should have a disclaimer: the author is the founder of RavenDB, and it's clear he's cherry-picking things and blaming them on the database vendor instead of whoever wrote that example.


Where did he blame it on the database vendor?


TLDR:

Using a database that doesn't offer ACID, in a manner that requires ACID has non-trivial associated costs. This may also leave you open to a number of strange situations, where inventory quantities are unknown or incorrect.


Thanks for bringing up nightmares. Inventory on the web is one thing, think about how you'd do this for inventory in person at the store when a customer has product A in their hand and cash in the other. The inventory system says there aren't any in stock - should you sell them one? Of course.

Are you tracking individual lots of inventory and the costs you paid for them? Against which lot did you sell this one? You don't have any - so how do you calculate the margin for this item you sold but don't know how much it cost or where it came from? If it is returned, do you restock that inventory?

Mongo, SQL - they have their differences, but doing inventory management is tricky no matter what technology you use.


Particularly fitting comment:

"Mongodb, the ultimate Maybe monad. With a built in fromMaybe mempty call for your convenience."

Per Hmemcpy and Michael Snoyman on Twitter.


Makes it great for building Snapchat clones though!


Did I read that right? A document per item in inventory? This seems horribly inefficient.


It depends. If you use the event sourcing design pattern and CQRS this can be very efficient, especially if you have huge number of purchase and sale events.


As someone not versed in Mongo or anything similar, what is the alternative? A large array of docs that has to be loaded in to memory to work with?


From the article:

    > The example is that if you have 10 rakes in the stores,
    > you can only sell 10 rakes. The approach that is taken
    > is quite nice, by simulating the notion of having a
    > document per each of the rakes in the store and allowing
    > users to place them in their cart.
In other words there are 10 documents in Mongo, not 1 document with a `"quantity": 10` attribute.


Imagine that each of your items in a warehouse has a unique barcode to track it. In that case you have to keep a record in the database for every item.

And with current amounts of RAM on servers there would be no problems even if you have tens of millions of items.


Imagine that each of your items in a warehouse has a unique barcode to track it. In that case you have to keep a record in the database for every item.

You could have a list inside a single document.


And what about tracking item status and location? Another list?


Lists can contain items with complex structures, not just strings.


Then it would be difficult to add a reference to a specific item somewhere else (or one would even have to duplicate the data). The benefits and disadvantages of normalization vs denormalization are long known, and denormalization is possible with classic SQL DBMSs too.


A products collection with a document per product. A users collection. An orders collection with line items in an array of embedded objects.

One would subtract the quantity from the amount field in the products collection to set the new inventory level. A new order document gets created with all the information needed to describe the order. Fields like total, subtotal, and date would sit at the root level, line items with product descriptions and prices would be embedded as an array of objects, and then a user object with user_id, name, and address would be embedded.

Dealing with documents is different, but being able to contain the entire dataset, relations included, in one place is nice.
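Roughly, such an order document might look like this (illustrative values only):

    {
        _id: ObjectId(),
        date: new Date(),
        subtotal: 100.00,
        total: 108.00,
        lineItems: [
            { productId: 1, description: "Rake",   price: 25.00, quantity: 3 },
            { productId: 2, description: "Shovel", price: 25.00, quantity: 1 }
        ],
        user: { user_id: 42, name: "Jane Doe", address: "123 Example St" }
    }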


What else would you suggest?


{"product":"rake", "in_stock":10}


What facilities are those at? What is their current status? We have a defect report; what is the case number?


If you want to use that in the real world, I'd say there is no way around a real relational database.


They should not try to emulate SQL databases, there are other ways to manage inventory without transactions.

One way is to add a field to an item that shows its status: whether it is in a warehouse, in someone's cart, ordered, or sold. Then adding an item to a cart means updating those fields. There probably is a way to do several similar updates atomically.
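A sketch of that approach, with one document per physical item (names are illustrative; the update is atomic only within a single document):

    // Claim an item only if it is still free; returns the claimed
    // document, or null if nothing matched.
    db.items.findOneAndUpdate(
        { sku: "rake", status: "in_warehouse" },
        { $set: { status: "in_cart", cartId: cartId, updatedAt: new Date() } }
    );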

Another way is to use an append-only collection that keeps a list of events, like "Item X added to cart Y" or "Item X sent to delivery".

But I guess when there are more entities and relations this would become too complex to manage. While SQL databases have no problems with hundreds of tables and thousands of columns.


> Then adding an item to a cart means updating those fields. There probably is a way to do several similar updates atomically.

Atomicity is document level, not collection level, so you can't update multiple documents atomically. Or do you plan on having `{status: 'in-cart', cartOwner: 'customer-id' | null}` and a single document for every stocked item (so 1k copies of the same book would be 1k db documents, and you'd also have all the documents for copies sold before)?

> Another way is to use append-only collection

How does it help with overselling? To decide if it's okay to append, you have to know whether the current number of items is greater than 0 (and don't forget to lock other clients out of appending for this whole process, so they wait for you to finish).


MongoDB has purpose, but inventory management is not one of those purposes.


> MongoDB has purpose, but inventory management is not one of those purposes.

What purpose does it have, frankly? The only one I see might be GridFS, for what it's worth, though I don't believe for one second the performance is that great. When it comes to document-oriented DBs, Postgres can store both JSON and XML, query them, and also do partial atomic changes. Scaling? Easier, maybe... Now, competition is good, and I'm sure the success of NoSQL DBs kind of forced traditional players to innovate. But I see no reason to use MongoDB in 2017.


"No reason to use MongoDB in 2017"

I would reply: replica sets and sharding and multi-threaded architecture and the absence of impedance mismatch, all in a single product that was designed for these features.


> I would reply: replica sets and sharding and multi-threaded architecture and the absence of impedance mismatch, all in a single product that was designed for these features.

If you don't care about data integrity, transactions and efficiency, then by all means. But in that case, a distributed file system is as efficient.

> multi-threaded architecture

I fail to see how a multi-threaded architecture is a problem with Postgres.

> the absence of impedance mismatch

Postgres supports JSON type(and XML), so no.


> > multi-threaded architecture

> I fail to see how a multi-threaded architecture is a problem with Postgres.

While it's not a fundamental issue, it does have some considerable drawbacks. A lot of the work required to make query processing use multiple cores (9.6, and the upcoming 10 release) would have been a lot easier if there were a shared memory space. There's no portable way to guarantee that shared memory allocated after process start can be mapped to the same addresses in different processes, so you have to deal with relative addressing etc., which both complicates things and slows them down.

There are also upsides, don't get me wrong. Primarily around robustness in case of failures, but also some around resource isolation.


CockroachDB called, problem solved for SQL with ACID.

Replication and Sharding are easy.


How is this relevant?

MongoDB is a document database. If you use CockroachDB you lose this capability.


CockroachDB reached 1.0 less than two months ago. I'm not sure what your domain is, but personally I would be very hesitant to go with a database which is so new.


MongoDB escaped being a glorified /dev/null just last fall.

https://jepsen.io/analyses/mongodb-3-4-0-rc3


Exactly, this is an argument in favour of waiting for the real-world behaviour of the database to be well understood.


Have you actually used replica set and sharding?

I did and it is a complete disaster in MongoDB.


Yes. With hundreds of nodes.

MongoDB also has some very large customers using it, so clearly it isn't a complete disaster. I would choose to use it over PostgreSQL any day of the week (not that there is a clear option regardless).


PostgreSQL has never had a great story for sharding large datasets.


Maybe it can be used as an advanced cache with persistence and secondary indexes?


[flagged]


I think you're trolling, but in case you aren't: structured logging, distributed caching, heterogeneous analytics.

You might prefer other solutions to each just as I prefer technologies other than Perl, but "no purpose" is incorrect.


There are better solutions to all of those use cases. Logging? ELK. Cache? Redis. Not sure what you mean by "heterogenous analytics" but that's usually a layer on top of whatever data store you choose, and we've established that MongoDB just isn't a good data store.


It's not trolling. MongoDB really doesn't have a purpose. There is nothing it is good at.

https://thehftguy.com/2017/03/29/whats-the-best-nosql-databa...

logging => elasticsearch

distributed cache => memcache

analytics => any SQL database for low volume (< 1TB). any data warehouse database for high volume (> 10TB).


"Memcache"

Requires holding all data in RAM. Loses all data on a restart. Can't cache values larger than 1MB. (Not that these are necessarily terrible; they're limitations that may not be appropriate for all uses.)

"Don't use mongo, use any data warehouse database"

That's what I suggested; you just turned it into an abstract recommendation.


All the cool kids are beating up this mongo kid, let's join in, we'll be cool too!

I'm sorry, but your comment here on HN adds no value other than just piling on. MongoDB does indeed have a purpose, and it's wildly misunderstood.

MongoDB initially optimized for engineering efficiency (allowing developers to prototype quickly) over engineering quality. This allowed them to gain market share, and with the newfound wealth (and unicorn valuation) they then bought a real storage engine which supports transactions and was written by DB veterans (of BerkeleyDB fame). Early versions were definitely buggy, but since bringing in WiredTiger they're now a serious player.

That initial gamble on optimizing for engineering ease-of-use is the reason they're not out of business, like Rethink and Basho. As a startup you have to hustle; you can't just sell on the philosophical beauty of cleanly abstracted Haskell.


That initial gamble on optimizing for engineering ease-of-use

But that was not a clever gambit. They had a bad product that they told a lot of lies about. That they got rich doing that is a sad indictment of our industry.


Have you actually used the product? And be honest here.

Because it was never a truly awful product. Yes, it had some bugs, but it was no worse than what we were seeing with Oracle or Cassandra. And if you call marketing blurbs "lies" then I am surprised you manage to survive day to day at all. PostgreSQL calls their database the "most advanced", Oracle calls theirs the "#1 Database for Cloud Scale Simplicity". It's just regular marketing fluff.


Yep. Project to replace a single 24-core, 24G, SAN-connected Oracle box with 10 48-core, 96G boxes with internal SSD running Mongo.

The Oracle box was faster and had a DBA look at it maybe once a week. The Mongo setup required a full-time team of half a dozen people. It's a complete joke. And their marketing lies included "you don't need a DBA".


I used it. It was awful.

I cannot think of a single use case for mongo.


Indeed.

The truth is that MongoDB is well suited to a fairly narrow set of use-cases, but ends up being used for all sorts of stuff in practice. Hence the weird contortions which the author observes in the book they're reading.



