The conventional wisdom is that it works better to get big with an unsustainable revenue model and convince people you can make it sustainable than to stay small with a sustainable revenue model and convince people you can make it big.
Spend whatever you can to capture the land, then charge rent.
Sometimes it turns out you can't hold the territory. Sometimes it turns out it's not worth much. But the times it works out are the times that make funds.
I'm inclined to agree with you. It's not my field but I suspect VC pressure has something to do with it: they need their 10x returns, or their unicorns, to make up for all the losses elsewhere and "get big fast" is the most reliable mechanism for doing that. Not suggesting I'm a fan of it, btw.
FWIW, when I was still a student and not in a 9-5 I wondered what it would feel like to use Basecamp. Now here I am, happily going with the flow using Slack on the free plan. One of my coworkers sometimes asks if we shouldn't get a subscription to have the full logs, but we rarely have to dive that far into the past at all.
In technology, it's often the cost of re-integrating with something else. MongoDB makes this lock-in even stronger because its API isn't SQL, so getting off is even harder than usual.
I don't really understand your question... this is the favored approach of many Silicon Valley VC firms, and they are widely considered some of the smartest tech start-up investors in the world.
If it really is a bad idea, and those firms have been so successful for so long not because of it but in spite of it, then it seems like a huge opportunity for others to come along and displace them with a better model. There's a lot of money in many of these tech markets. But we don't see that happening.
I guess the answer is that it's unintuitive... it seems like an obviously bad idea to many people, but the people that have the most experience think it's a great idea and continue to succeed with it.
There are other approaches: in China, for example, investors generally consider profitability much more important and expect it very rapidly, like in the first 6-18 months (though this has eased quite a bit towards a more SV-style model in recent years).
The gripe then is that companies can't have any long term vision or go big because they have to be making profit immediately and constantly.
Yeah, when I thought about it I realized that when I say "SV VC" I probably mean only the top 10-20 firms or so - the "top tier" or whatever: Sequoia, Accel, KPCB, Benchmark, etc. (highly subjective, but you get the idea) - and I think those guys tend to do OK. It has been mostly the same ones my whole time in tech (~10 years), and I know many go back considerably before that.
I meant that a lot of those people are down with the land-grab now / rent later model.
It's a bad idea most of the time, but sometimes it works. The smart investors have a portfolio of many bets, and it's priced in that most won't succeed. For the entrepreneur I think this is a bad idea, but for the investors it's workable if they can do it a few dozen times over.
It's just injection of both money and risk, amplifying the result. It'll fail more often than usual. But if it succeeds, wow.
It's a bad idea for the small business owner/startup to pursue, because investors buy into it for the one in however-many that gets big. One Facebook is worth many failures. And to be fair, they're not just throwing money at the wall and hoping it sticks - though there is some element of that. There are also the many companies that don't make headlines but are generally - if modestly - successful as well. So, the investors don't want to lose money, of course, but they're okay weathering several losses if the occasional win is big enough to more than compensate.
On the flip side, if you're a small business trying to grow your company, this whole "carve out the market operating at a loss in hopes that you'll make it big" idea doesn't work in your favor most of the time, statistically speaking.
Modest investment for modest gains and steady long-term growth is the smarter bet for the business owner (usually), but that's not as flashy, headline-worthy, or profitable for the investors as the go-big-or-go-home approach.
So, it depends on what perspective you're taking as to whether this "conventional wisdom" really is good or bad.
> The conventional wisdom is that it works better to get big with an unsustainable revenue model and convince people you can make it sustainable than to stay small with a sustainable revenue model and convince people you can make it big.
This used to be called fraud, but now calling it out as such is thought of as regressive and anti-entrepreneurial.
I'm sorry if my wording was confusing; in both of these situations the company is convincing people (investors) that it can become those things in the future, not that it is those things at the current time when it's not (which would likely involve some sort of deception).
It's like:
Case 1: "We are spending like mad to capture what we believe is a valuable market as fast as possible. This means we are losing money like crazy right now, but it is working and we are growing fantastically. Once we are in a large, dominant position, we believe we will be able to retain that market, so we won't have to spend as much on growth, and we can focus our energy on maximizing revenue and being capital efficient and start making a lot of profit."
Case 2: "We have validated our product market fit and business model, and are profitable. Our market share, growth and revenue numbers are small but stable and positive. We believe we will be able to rapidly accelerate our growth and market share in the future while maintaining our margin and start making a lot of profit."
Both are good, neither is fraud, but (1) currently tends to be the favored approach for venture financing, and one way or another it tends to be venture-financed companies that end up dominating the important tech markets (though that doesn't necessarily mean it's causal).
Last year (through Jan '17) they spent (very simplified):
$29.9M on Cost of Revenue (hosting fees + the people who run things + support/consultants)
$78.6M on Sales & Marketing (sales people)
$51.7M on Research & Development (programmers)
$27.1M on General & Administrative (managers/executives/HR)
This was spread across ~820 (as of July) employees. Hosting + Rent + Employees = $188M. Basically they spend $229k per employee which works out to very roughly: ~43% Tech, ~42% Sales, ~15% Management
By comparison, Cloudera is quite similar: 42% on Tech, 45% on Sales, 13% on Management. They made $261M last year and spent $448M, losing $187M across 1,470 employees ($305k per employee).
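A quick sanity check of those figures (a minimal sketch in Python; the mapping of Tech = Cost of Revenue + R&D, Sales = S&M, Management = G&A is my assumption from the percentages above):

    # Figures in $M, taken from the breakdown above.
    cor, snm, rnd, gna = 29.9, 78.6, 51.7, 27.1
    total = cor + snm + rnd + gna   # 187.3 -> "~$188M"

    print(total / 820)              # ~0.229 -> ~$229k per employee
    print((cor + rnd) / total)      # ~0.44  -> "~43% Tech"
    print(snm / total)              # ~0.42  -> "~42% Sales"
    print(gna / total)              # ~0.14  -> "~15% Management"

    # Cloudera: $448M of spend across 1,470 employees.
    print(448 / 1470)               # ~0.305 -> ~$305k per employee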
A small correction: G&A is everything that cannot be directly tied to a unit of sale. So if they have, say, employees who install and maintain Active Directory software used by everyone in the company, those salaries will also be part of G&A.
Mongo is going public for the same reason all tech companies go public in 2016+: they are done growing, they are not going to win it all after all, they're fucked, and now the investors want to liquidate.
Huh? What is your bar for "done growing"? Because the rest of the normal publicly traded market doesn't grow revenue at 50+ percent a year. The average for the S&P 500 is something more like 3-4 percent revenue growth per year.
MongoDB is part of the same thesis as something like Twilio -- they IPO'ed over a year ago with the same belief around differentiating via developer-centric customer acquisition. Twilio is still putting up ~70% yoy growth.
Twilio's growth is exciting but it isn't reflected in their stock price. Every few weeks, I find myself going back to look at their share prices.
Each time, I'm tempted to pull the trigger and invest a moderate five digit sum. I just can't go through with it. I usually buy and hold, long-term investing, and Twilio is one that I've been monitoring since their IPO.
I'd probably pull the trigger if I saw just a bit more fluctuation. It hasn't necessarily got to be just growth, but some fluctuation just to indicate that they are doing something of note.
For 70% yoy growth, it doesn't seem to be really impacting the company's valuation. I checked it earlier this week, and again just now, and it's pretty much been the same price for a while.
Twilio is trading at 10x revenue, which is what an expensive strategic SaaS acquisition would sell at. The average is 4x[0]. It's crazy overpriced even with the growth rate (it's still growing into its valuation).
I think the price was driven up to 20x because many thought it would be acquired.
It also has a ton of short interest in it - so you really shouldn't be investing in it at this point. Do some more research outside of what the price is doing before doing anything (there are undervalued stocks out there)
Yeah, I doubt I'll be investing in them. I can afford to park money in the long-term and take some risks but this one just seems stagnant.
Most of my assets are managed by a financial manager, but I do like to dabble with a smaller fund that I can risk losing entirely. I've actually had pretty good results, considering I'm not really skilled at this.
One of my methods is to simply read comments at sites like HN, Slashdot, Fark, and maybe even Reddit. I look for companies who are getting free publicity and are being reviewed by people who know more about the tech/business than I do.
I then go watch the trading volume and look for fluctuations. I'll read everything I can find about the company, including things like comments from employees at sites where employees anonymously rate their employer.
So far, it has been pretty successful. I made some pretty good money on Yahoo! at one point. I also bought ~$20,000 worth of Tesla when it was priced at $24. I still own those shares; I've not sold them. There have been a few others and they've done well enough.
I am absolutely not skilled at this. I mostly just buy and hold and plan on selling when it reaches a certain price. I don't have any of it automated, or anything like that. I've never tried to short or do put options. I'm not even entirely sure what the last one is.
It is just me playing around and learning as I go. I never spend anything that I can't afford to lose. I tend to go really slow and do a lot of reading before buying anything.
I've also had some luck with a slightly modified method. When I go to the grocery store, I'll look and see what brands are most frequently in carts and what isn't fully stocked on the shelf. I'll then look up the parent company and make a note of it. If I see that brand continually in shopping carts (I peek, I don't ogle or take pictures) then I'll do more reading and buy some shares in the parent company. I notice that they seem to have continual but slower growth than the tech companies.
It's more or less just a game. I'd absolutely not take investment advice from me and I'm pretty sure my methods are unsound and probably unorthodox. I am absolutely open to advice, however.
So, with that, I thank you immensely. I'll continue to keep an eye on Twilio but I won't jump in just yet. I may never jump in at all. It's just on my, 'mentioned positively a lot list.'
> One of my methods is to simply read comments at sites like HN, Slashdot, Fark, and maybe even Reddit.
That's a great idea and I've found companies exactly like that (although as private companies - scouting for a VC). We're really in an advantageous position working in tech because we know which companies are doing well and where companies are spending their money.
One disadvantage is that a lot of the gains stay private or with VCs - and some of the best-performing companies remain private for longer - but there is still some growth in public markets.
I tend to follow the cloud, infosec, social, and other tech public companies. I think the SaaS stocks are a bit overvalued at the moment, and there are too many that spend too much on sales and marketing to fuel growth. One of the great promises of SaaS was that it could infiltrate companies from the bottom up and bypass the S&M process and customer acquisition costs, but there really aren't many of those - and even the companies that did do that have ended up spending a lot on S&M.
I did like Twilio a lot - I liked the Authy acquisition and growing out that way. I'm not sold on the "integration without programming" promise - a lot of companies have tried just that but it hasn't worked. You need dev platforms and developers. I was hoping Twilio would provide a drop-in oauth/openid server for enterprise because the current solutions there suck (serve both internal, intra and external auth backed by LDAP, AD, RDBMS etc.)
There's a lot of value in their global integrations and routing - but I'm also not sure what the ceiling is for voice + SMS, and outside of a handful of countries in emerging markets most messaging is application-layer. Being able to push messages via WeChat, FB, WhatsApp, etc. would be interesting.
But you're right - you can't go wrong with long-term investments in these markets - either a fund or a basket of stocks you know. The Motley Fool posts are usually good value for getting up to date on some stocks, although there are a lot of people there talking up or down their own investments with what borders on FUD sometimes.
Thanks! It may not seem like it to you, but there's a lot to digest in your post. Before selling my company, I just had a bog-standard 401k. Now, I have someone who actually manages all that stuff for me.
It is quite a change. I've been retired for ten years, but I'm still getting used to certain things - and there were many things I had no compelling reason to learn.
Which is why I dabble in the market. I've even done some of my own bond selection. I am sort of partial to bonds because then I can see what project it is that I'm supporting, find out who that project is supporting, and make choices that are both ethical and financially sound (less risky might be a better term).
It has been fascinating to learn some of this stuff. I do visit Motley Fool sometimes, but I mostly only use it as an educational resource. I'm not sure I can put my finger on it, but it just seems a little bit off to me. It's a vague thing, but it seems like they are trying to push me in a certain direction, like they are trying to influence not just what I should buy but the categories I should buy in - and I'm not really sure they have my best interests in mind.
I don't know if that makes any sense, really? It just seems off to me. It's like a feeling that I'm being manipulated. I'm pretty sure I'm not a paranoid schizophrenic or anything, it just isn't where I go for specifics. I happily go there to research and learn.
I still own a whole lot of shares in the now-parent company. I've been pondering unloading some of them and moving them to a more risky position. They are steady growth and I have no complaints but I'd like to put more into the account that I control and that is one way of doing it. Again, nothing that I can't afford to lose.
You've given me a lot to think about and I'm quite grateful. I'm always open to good advice. ;-) I freely admit there are many things I don't know.
> That's a great idea and I've found companies exactly like that (although as private companies - scouting for a VC). We're really in an advantageous position working in tech because we know which companies are doing well and where companies are spending their money.
Interesting! How did you start with such a role - scouting companies for VCs?
> When I go to the grocery store, I'll look and see what brands are most frequently in carts and what isn't fully stocked on the shelf. I'll then look up the parent company and make a note of it. If I see that brand continually in shopping carts (I peek, I don't ogle or take pictures) then I'll do more reading and buy some shares in the parent company.
To be absolutely clear, I'm not a skilled investor. I'm not a professional. I have absolutely no ability to speak authoritatively on the subject. I don't even know all the terminology involved in the investment industry.
Much more likely is that they've grown to a headcount that would require making quarterly disclosures as a public company would and thus might as well IPO.
I'm asking this in good faith. What are the best use cases nowadays for Mongo? I've read that it still makes sense when you are dealing with truly self-contained documents. But even then, what is the advantage over just using Postgres or MySQL and their native JSON types?
I've read plenty of negative comments on Mongo. I'd like to hear the other side of the story, if possible.
This is probably not the answer you're looking for, but MongoDB no longer has much technical justification.
Compared to MySQL and Postgres, it does have an out-of-the-box partitioning scheme, which is something, and a reason to use it (though I'm not sure it outweighs the downsides).
Compared to newer cloud databases like Spanner, Citus, or Aurora, which scale well but also provide you with ACID guarantees reminiscent of traditional RDBMSes... there's really not much there.
The top post on HN right now is about PostgreSQL 10 features, of which the very first feature listed is "Native Partitioning" (and it's not about sharding). Relational databases have been using the term "partitioning" for a very long time so if you're going to compare features across comparable systems, you need to use the correct terminology.
Let's also keep in mind we're discussing MongoDB which has a very rich history of using misleading benchmarks to showcase their product (e.g. benchmarking the number of requests/sec the server can select() versus the fsync() rate of a RDBMS), so I think it's useful to be extremely clear what we're talking about.
Yeah a colleague referred to MongoDB as the best marketed database and he was 100% correct. They’ll make some misleading benchmarks, throw a few conferences, and voila you have something to replace your ol’ RDBMS
My favorite MongoDB moment was when Stripe wrote MoSQL so they could sync Mongo into PostgreSQL so data scientists can do joins and otherwise interpret the data. Now you have a “SQL replacement” and SQL instead of just using SQL!
I can see it for truly self-contained records at massive scale, but IMO it falls flat as an RDBMS replacement, which is how most people probably still use it.
Fair enough. When I first read your original comment, I thought that you were being unnecessarily pedantic.
I imagined that by using "partitioning" most people would probably know what I was talking about, but I've been working with MongoDB for a long time, so I probably have some cognitive bias here.
"Sharding" is the more specific/better term, but I'd still say that although more general and possibly ambiguous, partitioning is still roughly correct.
We use it for our database engine, as the nature of our data does not work in a SQL table. I work with DNA sequencing, and we store our data in a sample metadata, feature metadata, and sample-by-feature count table.
So our sample metadata has the generic metadata about each sample (where it's from, when it was collected, etc.). Our feature metadata has the particular per-feature metadata (DNA amplicon sequence variant, taxonomy, all seven taxa levels pre-split). These would work perfectly fine in a standard SQL database; the problem comes with our sample-by-feature table/collection. Currently we have over 400k unique features and tens of thousands of samples. If we were to map the frequency of each feature in each sample in a standard row/column relational table, the table would be unusably large (or simply not work, as PostgreSQL and MySQL have hard column limits). MongoDB's document model fixes all the issues this causes.
Each of our documents in the sample-by-feature collection can contain ONLY the counts of the features that were present in that sample, ignoring all the other N-thousand features that may be present in others. Building a pandas DataFrame from a query of this data just fills the absent features with NaN, and we can fill those NaNs with zeroes and carry on working with our target subset of the database.
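A minimal sketch of that layout (collection and field names are hypothetical; using pymongo and pandas):

    import pandas as pd
    from pymongo import MongoClient

    db = MongoClient()["sequencing"]  # hypothetical database name

    # Each document stores ONLY the features observed in that sample;
    # the other ~400k features are simply absent, not stored as zeroes.
    db.sample_features.insert_many([
        {"sample_id": "S1", "counts": {"ASV_00042": 17, "ASV_19034": 3}},
        {"sample_id": "S2", "counts": {"ASV_00042": 5, "ASV_77120": 121}},
    ])

    # Rebuild the dense sample-by-feature table on demand: absent features
    # come back as NaN, which gets filled with 0.
    docs = db.sample_features.find()
    table = (pd.DataFrame({d["sample_id"]: d["counts"] for d in docs})
             .T.fillna(0).astype(int))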
Your entire response seems to be about why you chose document storage instead of relational data. Relational data makes sense most of the time, but not always.
The question, again, is why you'd use MongoDB even for document storage when there are so many alternatives that are safer and better-engineered. Even Postgres has a JSON data type that, by some reports, works better than Mongo and is much safer to use.
Another thread here[1] has gone into some detail about alternatives.
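For illustration, the Postgres JSON route mentioned above looks roughly like this (a sketch with psycopg2; the DSN and table are hypothetical):

    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS events (
                id   serial PRIMARY KEY,
                body jsonb NOT NULL
            )""")
        cur.execute("INSERT INTO events (body) VALUES (%s)",
                    (Json({"type": "signup", "user": {"plan": "free"}}),))
        # Query inside the document; a GIN index on `body` makes this fast.
        cur.execute("SELECT body->'user'->>'plan' FROM events "
                    "WHERE body->>'type' = %s", ("signup",))
        print(cur.fetchone())  # ('free',)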
JSON wasn't stable in Postgres until 2012, whereas the "web scale" memes for Mongo started after its initial release in 2009. Mongo indeed had better marketing and a first mover advantage, and it never let go.
Early versions of MongoDB locked the entire database server for every single write. Later versions locked the entire database for every single write. Only since Mongo 3.0 does MMAP lock just a single collection per write, and only then did WiredTiger become available to offer you MVCC; even MongoDB's own marketing materials now advertise that you can use more of your hardware. Granted, you could always shard, though that gets very expensive very quickly.

Meanwhile, PostgreSQL had MVCC all along, and its write speed was always faster than Mongo's on a single-server basis. And you could always serialize data that changed into text fields as XML/JSON/CSV/some other format.

Mongo was mostly great marketing and a lousy product. Over time it has gotten much better; Mongo 3.2 is a way, way, way better product than Mongo 1.x. But marketing capitalized on a lot of hype which made no sense. Developers enjoyed just serializing their objects (at least the ones who didn't think of doing this into a text column in a relational DB), and the ones who did not need "web scale" had no idea about the severe concurrency issues introduced by Mongo's writes. For the rest, there are plenty of posts about hacking around it or switching from Mongo to something else.
Something I don't think is mentioned elsewhere: other databases have shipped with Mongo compatibility. This means you can migrate from Mongo to something better without changing your code.
The most mainstream and interesting of these is Cosmos:
Even prior to the JSON type, you always could have serialized the data for the sample to some kind of string (JSON/XML/CSV) and put it in a column. But I think the one thing NoSQL data stores, e.g. HBase/Cassandra and to a lesser degree MongoDB, have taught me is to pay attention to the tradeoffs: number of rows, size of index, etc.
If you process this kind of high-throughput data in R then yep, you get it as 3 data.frames as you describe, it's nicely tabular and easy to "join" by just matching index numbers.
I've been using and relying on MongoDB for years (since 2010). It's pleasant to develop for, the query interface is awesome, and the library / ORM support is top notch (Mongoid on ruby, and Mongoose on node).
I also use Postgres regularly so I can compare the 2. And in my experience, I always enjoy working with Mongo more.
Many people don't care about the developer experience and focus instead on Mongo's lack of features like joins and transactions. There are definitely tradeoffs to choosing Mongo, and I wouldn't criticize anyone for picking Postgres over Mongo; however, the amount of belly-aching about how Mongo is "the worst" was always pretty ridiculous.
If you’ve been using it since 2010, then your usage predates the aggregation framework. I’ll just state for the record that writing a JavaScript mapreduce (or telling a data scientist to write one) to count records is not a good experience.
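To make the contrast concrete, here's a hedged sketch (pymongo; database and collection names hypothetical) of the old JavaScript map-reduce next to the aggregation-framework equivalent:

    from bson.code import Code
    from pymongo import MongoClient

    db = MongoClient()["shop"]  # hypothetical names

    # The old way: JavaScript map-reduce just to count orders per status
    # (only works on server versions that still ship mapReduce).
    mapper = Code("function () { emit(this.status, 1); }")
    reducer = Code("function (key, values) { return Array.sum(values); }")
    old_way = db.command("mapReduce", "orders",
                         map=mapper, reduce=reducer, out={"inline": 1})

    # The aggregation-framework way: one declarative pipeline, no JS.
    new_way = list(db.orders.aggregate([
        {"$group": {"_id": "$status", "n": {"$sum": 1}}}
    ]))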
The other issue that got me was I was running queries that should have used an index but indexOnly kept coming back false. I dig into the Mongo JIRA and sure enough it was a bug and queries were returning incorrect values for whether they were using an index.
The reason most people got mad at Mongo was probably their benchmark gaming though. They made unsafe writes the default for a long time so their default configuration benchmark would blow away SQL databases which fsync to disk by default (very reasonably!)
I have been using Mongoose long enough (since 2012?) to have felt its pain points. The documentation is lacking (mainly in examples) and contradictory, and there are way too many breaking changes. This makes searching quite ineffective, since you have to check every single piece of code you see in depth to make sure it still works.
Huh! We started with MongoDB in 2009 and went live with it in 2010 (though I left that site in 2010) and as far as I am aware we were (one of?) the first paying customers of 10gen. I remember my boss joking about how he needed to press 10gen to actually accept money for support.
I even contracted with MongoDB Inc in 2012-2015, but gosh, "the query interface is awesome" is not my experience for complex queries. Multiple $elemMatch makes for nightmare queries. Note: $elemMatch was added because we asked for the functionality, but the syntax is not our making; that was Eliot.
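To give a flavor of the complaint, a query with nested $elemMatch quickly turns into something like this (document shape and names are hypothetical):

    from pymongo import MongoClient

    db = MongoClient()["app"]  # hypothetical names

    # "Find surveys containing an answer about widget-a scored >= 8 that
    # also has a QA note with at least one flag" -- three levels of
    # nesting before the query does anything interesting.
    query = {
        "answers": {"$elemMatch": {
            "product": "widget-a",
            "score": {"$gte": 8},
            "notes": {"$elemMatch": {
                "author": "qa",
                "flags": {"$ne": []},
            }},
        }}
    }
    results = db.surveys.find(query)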
Agreed, and the other annoying thing is that the queries and most of the APIs are not the same. So you build the query you want in JSON, and now you need to go translate that to program code :( And vice versa: if something is wrong with the query and you want to figure it out without being in a debugger, you need to translate the code back to a query. There are libraries that accept the JSON, but the standard one has an API that does not use the JSON queries directly.
We use MongoDB to build eCommerce apps at my company, using a Ruby-based platform that can be easily extended. Here are a couple of the reasons why we use it:
1. It's easier to extend the pre-defined model classes that our platform provides with new fields, without having to manage migrations; Mongoid manages the schema for you (see the sketch below).
2. Because our apps represent an eCommerce "storefront", they are _not_ the system of record for our clients. Each client has their own way of storing this data, and fulfills orders through an order management system. They control all of that, obviously, because it contains important financial data that they need to retain.
3. MongoDB is a well-supported database, with a large ecosystem and lots of users. It's a familiar DB for new developers that we hire, as well as for systems integrators who are also building apps on our platform. Using MongoCloud and official support from MongoDB, Inc., hard questions about our database technology are just a phone call away. Postgres is amazing, but it isn't supported by a company you can turn to when shit really goes south, so it's a scary choice for enterprise service providers.
4. It doesn't have terrible documentation. Pretty much everything is there, it's just hard to find. I really wish they would use something other than Confluence because the search on there is awful.
All this aside, I still use Postgres on my own Rails apps because I don't need the complex features of MongoDB. But after a few years at my current job, I can definitely see why Mongo was chosen.
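A rough Python analogue of point 1, using MongoEngine in place of Mongoid (model, field, and database names are hypothetical):

    from decimal import Decimal
    from mongoengine import Document, StringField, DecimalField, connect

    connect("storefront")  # hypothetical database name

    class Product(Document):
        name = StringField(required=True)
        price = DecimalField()
        # Adding a field is just a code change: no ALTER TABLE, no
        # migration file; existing documents simply lack it until written.
        promo_label = StringField()

    Product(name="Mug", price=Decimal("9.00"), promo_label="fall-sale").save()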
NoSQL has real use cases. The model is much closer to how you usually structure and handle data in your application. If your application handles hierarchically structured, variously shaped, low-relational data - which many do, because that's pretty much the default for how programming languages work - and does its data processing and validation at the application level, then it can be nice to have a storage system that matches that closely.
It's that situation where a single structured User object would fan out into like 10 normalized tables. It feels ridiculous. It often is a bit ridiculous. Also, I'm not sure how much open-source RDBMSes have improved in the last 8 years, but I know for a fact that MySQL used to just melt under joins like that with any serious load.
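The "one object vs. ten tables" point, sketched as a single document (field names hypothetical):

    from pymongo import MongoClient

    db = MongoClient()["app"]  # hypothetical names

    # One structured User, stored as one document instead of separate
    # users/emails/addresses/preferences tables joined back by user_id.
    db.users.insert_one({
        "_id": "u123",
        "name": "Ada",
        "emails": [{"address": "ada@example.com", "verified": True}],
        "addresses": [{"kind": "home", "city": "London"}],
        "preferences": {"theme": "dark", "newsletter": False},
    })
    user = db.users.find_one("u123")  # fetched back as one unit, no joins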
As for the recent open-source RDBMS JSON additions, they're just that: recent additions. They feel tacked on. They're treated differently than first-class values. Support in ORMs is lacking. It's hard to find examples online. They're not nearly as easy to use and well supported as hierarchical data is in systems built around it from the ground up (of course).
As for why MongoDB: because it's the most popular solution. If you hit a problem, someone has probably hit it before, and you can probably find some material online. The leading drivers and libraries are relatively mature and widely used. That makes it a much safer bet as far as the tractability of the issues you will hit.
I don't really like MongoDB, but the case for using it seems pretty straightforward to me.
Sure, but why wouldn't you use one of the myriad options offered by Google/Microsoft/Amazon? DynamoDB, Bigtable, or DocumentDB would be better options than betting on a startup, no? Their NoSQL tech was novel several years ago, but now the cloud providers are far ahead, and there's no way MongoDB will be able to catch up in terms of features, scale, reliability, and availability.
I think all of those services are cloud-based, yeah?
Cloud services are a great option to have available, but there are many reasons organizations choose to manage their own instances... pre-existing infrastructure investment, flexibility, data location and control, cost structure, regulatory requirements, etc.
I read here that Mongo has started a cloud-based offering, but the core of their business competes against other self-managed storage software like MySQL, Postgres, Hadoop, etc.
I don't use Elasticsearch for persistent storage of logs, only for the recent logs that need to be searchable (the past month or so). I've had to wipe out ES indexes enough times just to keep things humming along well.
I've been doing a stack that uses Mongo, Sails, and Vue for about a year now. What I like is that it feels like I'm just writing JavaScript all the way from the front end to the query. Mongo queries are just JSON documents. I'm not sure how nicely Postgres queries would work in Node.js. Can I just feed it a JSON object that represents a query and it will "work"?
The aggregation pipeline is also quite nice. I actually prefer it to Elasticsearch aggregations now. I've done some reasonably complex queries involving grouping deeply nested arrays of embedded documents, etc. using the aggregation pipeline.
Put it this way: with Elasticsearch, when you have nested documents and a variable schema and you want to do aggregation, you end up in mapping hell. There is none of that crap in Mongo.
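As an illustration of the sort of pipeline being described, a hedged sketch (pymongo; names hypothetical) that groups an embedded array of line items:

    from pymongo import MongoClient

    db = MongoClient()["shop"]  # hypothetical names

    # Revenue per SKU from line items embedded in each order document.
    pipeline = [
        {"$unwind": "$items"},  # flatten the embedded array
        {"$group": {"_id": "$items.sku",
                    "revenue": {"$sum": {"$multiply": ["$items.qty",
                                                       "$items.price"]}}}},
        {"$sort": {"revenue": -1}},
    ]
    for row in db.orders.aggregate(pipeline):
        print(row["_id"], row["revenue"])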
> Can I just feed it a JSON object that represents a query and it will "work"?
You can use a query builder [1][2]. It's not exactly creating a JSON object for a query, but it's still better than writing SQL directly, and IMHO it's cleaner than JSON object queries.
I think you need to work on some real systems using MongoDB. Your statement is incredibly naïve, implying that there's little complexity in using MongoDB.
1) Mongo is not better than Postgresql JSONB and other Postgresql features, especially in 10.
2) There was NEVER a Mongo cluster management tool named Sheriff Bart! It literally sells itself: Mongo Loves Sheriff Bart.
3) ArangoDB is able to provide many of the Mongo features with some ACID compliance and a better graph solution. It even has joins. I get that people use documents to avoid joins, but really, you still want them.
This is like saying "The iPhone is not better than Android". There is room for multiple offerings with different strengths and paths to market appealing to different types of users.
To extend the simile: if you care about use, freedom, and reasonableness, you'd pick Android. So too anything other than Mongo. That's what I don't get: why would anyone want to invest in Mongo? I say that as a Mongo-certified developer (it was the best way to learn at the time).
Speculating, amongst the "Android" options there are four categories:
- There are the expensive legacy relational solutions (these are on a long slow death).
- There are small open source and sometimes even closed source projects that are either going to be dead or gobbled up; or at least far too many people will fear they will which will prevent adoption.
- There are cloud-native lock-in solutions.
- There are well-established large community open source projects that stay alive because there's enough stakeholders but they don't have strong commercial backers.
All of these categories are a surprisingly interesting market opportunity for a disruptive commercially backed more vertically integrated open source player to target.
It's a bad strawman to say that PostgreSQL is ACID and MongoDB isn't. Here's the thing: even PostgreSQL isn't ACID when you throw replicas into the mix. And PostgreSQL as it stands "doesn't scale" past a single box, not if you want to do (most kinds of) replication or sharding.
The impression I get is that at scale, you don't really "need joins" and multi-table ACID for the kind of workload that MongoDB is targeting, i.e. massive throughput embarrassingly-parallel low-latency OLTP short requests that do point queries and point writes. It seems to be atomic, consistent and durable enough for that [1].
For analytical queries that really do need expressive SQL features, you generally wouldn't want to run them on the same database anyways, not when you're "at scale" and your seven-way join could impact other request latencies. You export it to Redshift or HDFS/S3, and do all the reporting/BI/analytics on it.
Second of all, the Mongo marketing and ecosystem is foremost about making dashboard CRUD apps easy without a lot of engineering. Dashboards are not about point reads. Whether or not Mongo is engineered for it, this is what the customers are using it for.
Third, I just wrote a proposal today for a client on Mongo, not at scale, who experiences silent database failures due to lack of ACID. They only noticed because Mongo sometimes tells them there are -2 people left in the room.
Nobody at scale uses Mongo, because somewhere along the way you spend your giant bags of scale-money to hire a real engineering team and fix the silent errors and failwhales you've been suffering, by migrating to an architecture that actually lets you reason about failure a little better than "it looks like something bad happened but we have no idea when, how, or why".
Repeat after me: PostgreSQL isn't ACID in a distributed system.
PostgreSQL isn't actually ACID when your database is multiple machines. PostgreSQL isn't actually ACID when you set up your first read replica on RDS. PostgreSQL isn't actually ACID when you use streaming or logical replication. PostgreSQL isn't actually ACID when you start sharding your users table. PostgreSQL isn't actually ACID in a world in which people need to scale out from one big box. It's eventually consistent, which is even less than I can say about MongoDB on a single-document level.
Personally, PostgreSQL is my favorite DBMS. It's well-engineered, extremely mature, and has lots of great features. Working seamlessly and providing strong guarantees outside a single box, unfortunately, isn't one of them. Not yet, at least.
To address the rest of your points: those people are using MongoDB wrong. PostgreSQL basically stops working well too, once you start mixing transaction processing and business reporting and BI in the same box at scale.
If you're going to claim that MongoDB is bad because it's not ACID or somesuch, you better hold PostgreSQL to the same standard, and understand precisely why it is that the two systems were built so differently.
Postgresql with read replicas is ACID-consistent for transactions, because of the single writer, and CAP-AP eventually consistent, because of the distributed reads.
It is easy to agree with you here: "PostgreSQL basically stops working well too, once you start mixing transaction processing and business reporting and BI in the same box at scale." It's worse than that: the app has to be coded before it can be scaled, so you get fucked by complexity-scale via ORM a lot sooner than by bigdata-scale. But as you sort of say, the point of a data warehouse is to be CAP-AP (eventually consistent) through read replicas without breaking ACID-consistent transactions for the system as a whole.
As to the argument "everything is bad at scale", I think that has merit too. I just gave a talk at ClojureNYC about building real apps with Postgres (tables) vs Neo4j (graph) vs Mongo (documents) vs Datomic (EAVT): https://github.com/hyperfiddle/hypercrud.browser/issues/4 (Did I say real "apps"? I meant "balls of mud")
IMO, as I've said elsewhere on this thread, I really don't think it's useful to talk about the C-in-ACID and the C-in-CAP as if they were completely different things. The original research (both Gray's and Brewer's) clearly talks about it in terms of interface guarantees on your data, and the latter is directly built upon the former. As a user, all I care about is that the new data shows up once the system accepts it, no buts.
What good is my Postgres single-node ACID if I have to do all this work to deal with replication lag?
What good is ACID writes with AP reads? It doesn't lose data! Don't like eventual consistency? You have options! Check out Datomic - ACID transactions and strongly consistent + distributed reads. http://www.datomic.com/
Git is another ACID single-writer system with strong consistency and distributed reads.
"ACID" isn't a term that used to describe distributed systems— that's physically impossible— only systems on a single machine. To say PostgreSQL isn't ACID is incorrect and disingenuous.
ACID was defined upon Jim Gray's formalization of the terms "transaction", "atomicity", "consistency", and "durability" [1]. It applies regardless of however many nodes a system consists of. It cares about a system holistically, not what it's physically made up of. Gray defined them acknowledging the preexisting work on fault tolerance and high availability with replication and hot standbys, literally situations where it's worth talking about C-as-in-CAP consistency. A transaction is literally defined upon an agreement protocol like two-phase commit. Similarly, Eric Brewer literally defined the C in CAP following the C in ACID [2].
The two concepts are clearly related, if not identical. It's not like I magically can't ever have ACID transactions once a network is involved, as long as the network heals.
To come at this from another angle: ACID transactions are an abstraction. They serve only to make it easier to reason about concurrency and synchronization. If you're telling me that this abstraction falls apart inherently because of the network, then frankly it's not a very good abstraction at all.
PostgreSQL is ACID when it's a system that encompasses a single node. PostgreSQL is not ACID when it's a system that encompasses more than one node. Simple as that.
I think it is even more disingenuous to lull people into a false sense of security just because Postgres is technically "consistent" on a single node. It's 2017 -- your data getting fsync'ed to a single disk, in a single machine on a single rack in a single datacenter in a single region -- is simply not enough anymore.
Postgres will NEVER lose an increment if two transactions try to update the same counter at the same time, even with denormalized or distributed reads. ACID means "doesn't lose data".
Postgresql with read replicas is ACID-consistent for transactions, because there is a single writer process to prevent data loss. Only the reads are distributed and thus CAP-AP eventually consistent.
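Concretely, the guarantee being claimed about counters, sketched in Python against a hypothetical room_counts table (cf. the "-2 people left in the room" example upthread):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN and table
    room_id = 42

    with conn, conn.cursor() as cur:
        # One atomic read-modify-write: the row lock serializes concurrent
        # decrements, so the count never silently drifts to -2.
        cur.execute(
            "UPDATE room_counts SET occupants = occupants - 1 "
            "WHERE room_id = %s AND occupants > 0",
            (room_id,),
        )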
What happens if a user makes an update to their data, and then my "single writer" master crashes, and I failover to a hot standby that doesn't have the new transaction? And then what happens if the user tries to update their data again?
The D in ACID addresses this. Your standby transactor process is attached to the same storage as the previous transactor (an EBS block or DynamoDB or whatever cloud HA storage people use these days). If they don't share storage, it's not a single writer.
You're not actually saying that PostgreSQL itself is ACID.
You are essentially asserting that a precondition for ACID in Postgres is that the underlying storage medium (or at least the bits on them) will never suffer permanent failures, and has infinite 9's of availability everywhere in the world, instantly. Then, and only then, can PostgreSQL provide anything resembling seamless consistency.
Sorry, no dice. This is not how distributed systems work in the world outside the walled gardens. I'm a huge proponent of the cloud, but I'm not a huge proponent of things that only work if you rely on the cloud.
A black hole might swallow the solar system and thus our ACID writer process lost a write? No: you get to choose how durable you'd like your D. Don't acknowledge the transaction as successful until you successfully commit to storage orbiting Alpha Centauri.
Non-ACID writers like Mongo lose writes routinely, every day, often without anybody noticing; there are no guarantees whatsoever about data integrity. ACID writers lose data only in six-sigma catastrophic failures, and you can control how many sigmas by tuning durability.
You asked a question about HA configurations so I answered in terms of HA configurations. Serving durable HA storage is literally what dynamo/riak is for. "The write operation is durable. dw (durable write quorum) is a configuration setting used to specify how many replicas to commit to durable storage before returning a successful response." https://stackoverflow.com/questions/22736821/riak-database-a...
Anyway, I'm interested in continuing to discuss this if you are, because this is a subject I'm beginning to write about as part of my startup, so this conversation helps me. Perhaps there's a better medium, though, if you'd like to continue?
And there's also quorum commits too! Point is, these features aren't on by default, so you really have to know what you're doing, and not just assume you're safe because of the magic word "ACID".
MongoDB is actually strongly consistent in a single document, though. It uses Raft by default to replicate changes to all replicas. In practice, it has the A, C, and D of ACID for point writes.
Those are really some low expectations. The whole point of the ACID terminology is to apply it to transactions of arbitrary complexity with respect to the underlying data store.
It's not saying much to say that a database management system is ACID as long as you constrain your updates to a single record in a single table.
Sure. I actually agree, and like I said elsewhere in this thread, Postgres is my favorite database, and using it is something that I'd like to do for as long as possible. MongoDB's main selling point, after all, is expediency.
What I'm tired of is the rhetoric built upon the dichotomy that Postgres is ACID and MongoDB/NoSQL is not. As it stands, neither is, in any scalable fashion. And in no way is Postgres a panacea for every possible kind of data problem.
And like you said. It's not saying much to say that a database management system is ACID as long as you constrain your updates to a single machine.
There are better things on the horizon, databases that can actually do it all.
I'm afraid we're in violent agreement. I never said "Postgres isn't ACID"; that's never been in dispute. I said that you can't treat Postgres with a read replica as an ACID database, because as you and I both pointed out, it's then a distributed system (no self-respecting single-node database is eventually consistent, so my use of that term should have implied a distributed system).
For fast prototyping, it's pretty hard to beat a NodeJS stack. A lot of modern web dev is one-man shows. When you're just one developer doing frontend and backend, you don't really have time to care about database migrations or designing schemas. You really just want to duct-tape enough pieces together to get your app out the door.
Caring about schema design, database migration, or even code correctness doesn't really help with this. Mongo in comparison has no learning curve, crazy number of pre-integrated libraries, and works well enough at low scale. This might sound like cancer to corporate devs but the tradeoff does make sense for individuals/demos/startups.
I've always suspected that MongoDB adoption is largely driven by Node.js; I'd be curious what portion of Mongo connections come from Node.js.
There is no really good Node.js RDBMS ORM in the class of what other environments provide. I recently had a company rip out nodejs => sequelize => pgsql and replace it with nodejs => protobufs => sqlalchemy => pgsql.
Even for fast prototyping, I feel like I end up spending more time dealing with data validity issues as the "schema" in my code falls out of sync with what's actually in the database than I would have by laying out a simple schema for my database to enforce. I think this is a YMMV situation.
You can do that with Arango. Equally quick. Equally dirty. Doesn't require a prod version with 3 nodes just to make sure, hand to God, no data is lost.
It probably has to do with being German. The US produces a lot of tech that the rest of the world knows about, due to PR. Germany? Well, they move quietly. Then BAM! ArangoDB in Poland!
Also, what does MongoDB offer that other NoSQL stores like DynamoDB, Bigtable, Azure Cosmos DB, etc. don't? I'm genuinely curious. I believe there's ACID support on these as well (probably on by default in DynamoDB).
At the time MongoDB shipped with all databases completely public, with no authentication credentials.
That meant anyone could connect over HTTP and get your data.
From the article:
The problem for MongoDB users seems to be that on some systems the default configuration has the database listening on a publicly accessible port as soon as it’s installed. Users are supposed to read the manual and set up access control and authentication after installing the software but it seems that plenty of them don’t.
Yeah, I recall. Certainly that was a major screw up but, IMO, the users share part of the blame -- if they had a firewall in place this wouldn't have been an issue.
Mongo shares the blame for making that the default and for making authentication such a pain to deal with:
- Command-line login never works; you have to get into the Mongo shell, then authenticate.
- Attaching multiple db permissions to one user has all kinds of gotchas too: I have to switch to each database and re-auth, instead of just authing as the user once and then switching DBs without further auth.
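For contrast, the pattern current drivers support (a sketch; credentials and names hypothetical): define the user once in the admin database with roles spanning several databases, authenticate once via authSource, then switch databases freely.

    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://app_user:s3cret@db.example.com/?authSource=admin"
    )
    orders = client["shop"]["orders"]         # no per-database re-auth
    events = client["analytics"]["events"]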
This is a great video, and accurately sums up MongoDB and NoSQL. I can't imagine why they need to raise money, why they spend so much money as it is, and why they think there's a big growth business in NoSQL.
1) MongoDB allows me to be more agile. I don't have to deal with database migrations (no locks when adding a new column, for example), which allows deployments at any time.
2) Indexed arrays. They let you add "tags" to any entity, which is useful in production (A/B testing, forcing specific behavior).
3) Easier to manage. I don't need an expensive PostgreSQL consultant to set it up and hope for the best.
We've actually just started feeling the pain of Mongo migrations: adding indexes to large collections. If you just create the index on the primary and let it propagate through the cluster, it will block further oplog replication on the secondaries and the whole cluster will go down for majority writes. The proper way is to take each secondary out of the cluster in turn, add the index, then step down the primary and migrate it.
It's just as much of a pain as migrating your favourite SQL database, although at least you don't need to do it for small collections or for adding columns (which is a big bonus).
Collections with tens of gigabytes and hundreds of millions of records can take a few hours per node to build an index. You'll need to think about index migration strategies before that point, though. I'd guess that once you have gigabytes of data in a collection, you'll probably have less downtime from an election than you will waiting for the index to build.
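For reference, the two build modes being weighed (a pymongo sketch; collection and field names hypothetical):

    from pymongo import MongoClient, ASCENDING

    coll = MongoClient()["mydb"]["events"]  # hypothetical names

    # The naive route described above: a plain foreground build on the
    # primary, which replicates out and can stall the secondaries' oplogs.
    coll.create_index([("user_id", ASCENDING)])

    # Pre-4.2 servers also offered background builds: slower overall, but
    # they don't block reads/writes on the node doing the building.
    coll.create_index([("created_at", ASCENDING)], background=True)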
I wonder how it feels to be running or working for a company that develops a product that almost all informed members of the developer community think is bad. They must be in some serious denial.
You say 'informed members', but most of the mongo hate seems to stem from bad reputation from the early days. Most of the hate seems to come from historical bandwagoning, rather than informed DBAs commenting on current versions.
For a long time here on HN, mongo was terrible, never to be trusted, and everyone should move to RethinkDB. But the RethinkDB boat has now foundered, and might be resuscitated by the LF, or it might sink again. If you'd followed the zeitgeist, you'd be in a worrisome spot right now.
I'm biased (I work part-time for the company), but RavenDB[0] 4.0 is damn fast, cross platform, and has beautiful tooling.
It's built on .NET Core, and our current performance is 140k writes/sec. We also bend over backwards to optimize read speeds: we generate indexes automatically based on your query patterns, discarding unused indexes and throwing server resources behind the hot ones. Also, unlike many other NoSQL databases, we support transactions and atomic writes.
I used Mongo a few years ago and found it a very painful experience. With PostgreSQL I can use SQL and get the benefits of document storage, and so far I've found it efficient and rock-solid.
There are quite a few better options for trustworthy databases. One is my employer, MarkLogic, which is an ACID compliant NoSQL document db with a lot more indexing capabilities.
Especially if you’re using the search tools to retrieve data, I think a properly indexed MarkLogic would be quite a bit faster, and still more reliable.
So, call me nuts, but then shouldn’t there be sane defaults? I’ve not set up or tinkered with MongoDB in a looooong time, but that seems like a must for any project.
EDIT: instead of downvoting me explain to me what I said wrong, I’m seriously wondering why it needs to be configured for sane defaults ..
I'm not the one who downvoted, but back when I ran a large, critical Mongo setup, essentially the core flag to set was 'safe=true' on mongod. That was it. We had ~30 instances set up with sharding and replication, doing many thousands of operations per second at peak during the day. It was a great tool then, and I'm sure it's even better now. It was just fun to hate on it, so people with zero experience using it joined in.
Safe=false was set as the default to game benchmarks, and it caused people to lose mission-critical data if they didn't realize and reconfigure. It's just sad to make such non-optimal decisions to inflate benchmarks and get PR.
For all the people asking "why don't people just use DynamoDB, BigTable, etc. if they need NoSQL", there are a few incorrect assumptions here--
1) People need NoSQL. This is rarely the case. That said, Mongo often lets you model structures how you think so it's nice for prototyping. I'm very comfortable with SQL so I wouldn't do this anymore, but I could see value in it once upon a time, especially if you're already using JS across the stack.
2) DynamoDB is not the same at all. Its query and sharding models are entirely different. In most cases, it does NOT allow you to model something usable how you think. Don't get me wrong-- it's a great technology, and we get a lot of value out of it at work, but if you're directly comparing the two, you probably haven't actually used it enough.
I have had great success with MongoDB and at this point I think it's the best database out there.
If you can live without transactions, then think about utilizing MongoDB.
- Add indexes without blocking anything
- Array indexing is transparent to you: the same query works on an array or a single field
- Partial indexes let you query on an index while keeping index size small by covering only a subset of documents (see the sketch after this list)
- TTL indexes save you from deleting documents such as session data manually
- Sparse indexes avoid indexing documents that are missing the field
- Geospatial indexing is just awesome; super easy to use
- Good-enough full-text search
- Data compression, out of the box
- Data compression for replication as well
- Super easy to manage a cluster: add, remove, hide nodes
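A few of these, sketched with pymongo (collection and field names hypothetical):

    from pymongo import MongoClient, GEOSPHERE

    coll = MongoClient()["app"]["sessions"]  # hypothetical names

    # TTL index: documents expire ~1 hour after `created_at`.
    coll.create_index("created_at", expireAfterSeconds=3600)

    # Partial index: only index documents that actually carry a promo code.
    coll.create_index(
        "promo_code",
        partialFilterExpression={"promo_code": {"$exists": True}},
    )

    # Sparse index: skip documents missing the field entirely.
    coll.create_index("nickname", sparse=True)

    # 2dsphere index for geospatial queries ($near, $geoWithin, ...).
    coll.create_index([("location", GEOSPHERE)])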
There seem to be some high-level execs that just left. I wonder why they left before the IPO - maybe their options were forced to expire? Matt Kroll was their head of sales and he just went to Google - how is Mongo going to grow without a good sales team?
Can you clarify what you mean by this? You're Matt Kroll and you're confirming that you still work for MongoDB? (I'm afraid that nothing is obvious from anyone's usernames.)
I'm Matt Kroll.
I can confirm I was never Head of Sales at MongoDB.
And I can confirm (as anyone can see on LinkedIn.com) that I departed MongoDB for Google to continue to work as an Enterprise sales rep.
Just do a bit of digging. Both names dropped appear to be accurate from usernames/profiles. Matt did indeed leave MongoDB for Google, that much is accurate.
I don't fully get why you seem so upset about it. The OP seems to have made a good-faith mistake and it doesn't seem to impact your reputation. Maybe the opposite. Why care?
Matt,
I would remove my comment, because I respect what you did for Mongo, but I hesitate because my statements are not false. I will correct that you were not "Head of Sales" - it looks like your title was Enterprise Account Executive and you were responsible for the Fintech sales?
You did leave Mongo for Google right before the IPO, right?
Your statement that I was "Head of Sales" is completely false. My job title has had nothing to do with Fintech sales since 2014. I did leave MongoDB in 2017, as did plenty of other people, and just as plenty of other people leave companies they work for every year. MongoDB, in fact, is a great company with a great product, and the reasons I left are completely personal and private. No inference can be drawn from my departure as to the state of the company- doing so would be completely unfounded and reckless.
Ric - I was never Head of Sales at MongoDB. That is false. I was merely an individual contributor as an Enterprise rep. Please edit your post immediately.
Matt has asked me to edit my post, but that does not seem possible now. This is a correction:
I made an honest mistake: I thought Matt was head of sales, but looking at LinkedIn, Matt's actual title was 'Enterprise Account Executive', 'solely responsible for managing all [Fintech] sales'.
I believe their largest sales channel is Enterprise Financial Services and so I was concerned about Mongo's IPO since some people left - especially right beforehand.
I respect Matt and believe his comments, but I did NOT lie.
For what it's worth, they are at least not trying to sell non-voting shares like Snapchat (a company with much bigger losses). The offering is for Class A shares with 1 vote per share, though the split of voting rights between Class A and Class B is still not clear:
"Outstanding shares of Class B common stock will represent approximately % of the voting power of our outstanding capital stock immediately following the closing of this offering, with our directors and executive officers and their affiliates holding approximately %, assuming in each case no exercise of the underwriters' over-allotment option."
How can you say it's not a viable business because they're losing money if you don't know their expenses? Uber & Lyft (and for many years Amazon) were/are losing money. So that means nothing unless you know why they're blowing through cash.
I don't want to see any of my favorite software go public. It creates incentives that don't do me any good, like marketing instead of improving the product.