Great suggestions, looking into this right now. First time building something like this so definitely new to some of these tools.
For scraping: Found that every Shopify store exposes a public JSON file at the same route, [Base URL]/products.json. For example, the Wildfox store has its JSON file available here: https://www.wildfox.com/products.json.
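That products.json route can be walked page by page. A minimal sketch (the `limit`/`page` query parameters are a commonly observed convention on these endpoints, not something I've verified against every store):

```javascript
// Build the public catalog URL for a given page of a Shopify store.
function productsUrl(baseUrl, page = 1, limit = 250) {
  return `${baseUrl.replace(/\/$/, "")}/products.json?limit=${limit}&page=${page}`;
}

// Pull every page until the store stops returning products.
// Uses the global fetch available in Node 18+.
async function fetchAllProducts(baseUrl) {
  const all = [];
  for (let page = 1; ; page++) {
    const res = await fetch(productsUrl(baseUrl, page));
    if (!res.ok) break;
    const { products } = await res.json();
    if (!products || products.length === 0) break; // no more pages
    all.push(...products);
  }
  return all;
}
```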
Built a crawler in plain JavaScript that runs through a list of stores I bought from BuiltWith, hits each store's JSON file of product listing data, and scrapes exactly the data we want for Agora. I'm storing it in Mongo and currently using Mongo Atlas Search (I saw they released Vector Search but haven't looked at it yet). Picking the data fields has been trial and error: keeping what the front-end experience requires without drastically increasing the size of the data set. And after initially using React, I switched to Next.js to make it easier to structure the URL of each product listing page.
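The field-picking step might look something like this. The subset kept here is illustrative (my guess at what a listing page needs, not the actual Agora schema), though `id`, `title`, `handle`, `vendor`, `variants`, and `images` are standard keys in the products.json payload:

```javascript
// Trim a raw products.json entry down to just what the front end needs.
// The chosen fields are an assumption for illustration.
function toListing(product, storeUrl) {
  const firstVariant = (product.variants || [])[0] || {};
  const firstImage = (product.images || [])[0] || {};
  return {
    id: product.id,
    title: product.title,
    vendor: product.vendor,
    price: firstVariant.price,
    imageUrl: firstImage.src, // hotlink rather than storing the image file
    url: `${storeUrl}/products/${product.handle}`, // one Next.js route per listing
  };
}
```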
Mongo will run me about $1,500 / month at the current CPU level. AWS all in will be about $700. I'm currently not storing the image files, so that reduces the cost as well.
A few improvements that have helped so far:
- Having 2 separate search indexes, one for the 'brand' and one for the 'product'. There's a second public JSON file available on all Shopify stores with relevant store data at [Base URL]/meta.json. For example: https://wildfox.com/meta.json
- Removing the "tags" that store owners set on Shopify. I believe these are there for SEO reasons. They ran 1 to 50 words per product, so removing them meaningfully reduced the size of the data set. The tradeoff is that they can't be used to improve the search experience now.
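That tags tradeoff is easy to sketch: drop the field before insert and compare serialized sizes (the sample document here is made up):

```javascript
// Drop the SEO tags field from a product document before storing it.
function stripTags({ tags, ...rest }) {
  return rest; // everything except tags
}

// Rough measure of the savings on one (made-up) document.
const withTags = {
  id: 1,
  title: "Hoodie",
  tags: "fleece, cozy, winter, sale, gift, soft, oversized",
};
const slim = stripTags(withTags);
const bytesSaved =
  JSON.stringify(withTags).length - JSON.stringify(slim).length;
```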
Hope this helps. Still wrapping my head around all of this.
Sounds like someone drank the Mongo Kool-Aid. You absolutely do not need Mongo, let alone Mongo Atlas. 25 million e-commerce product documents is measly and should fit on a single 600 GB server.
If you're already on AWS, I recommend switching to Postgres for now. For context, I have 3 RDS instances, each multi-AZ, with the biggest instance storing several billion records. My total bill for all 3 last month was $661.
Postgres has full-text search, vector search, and jsonb. With jsonb you can store and index JSON documents like you would in Mongo.
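A schema sketch of what that looks like (table and column names are illustrative; the generated-column approach needs Postgres 12+):

```sql
-- Store the scraped product documents as jsonb, Mongo-style.
CREATE TABLE products (
  id  bigint PRIMARY KEY,
  doc jsonb NOT NULL
);

-- GIN index for arbitrary jsonb containment/lookup queries.
CREATE INDEX products_doc_idx ON products USING gin (doc);

-- Full-text search over the title, kept in sync as a generated column.
ALTER TABLE products
  ADD COLUMN title_tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', coalesce(doc->>'title', ''))) STORED;
CREATE INDEX products_tsv_idx ON products USING gin (title_tsv);

-- Example query:
SELECT doc->>'title'
FROM products
WHERE title_tsv @@ plainto_tsquery('english', 'fleece hoodie');
```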
You can even do Elastic-level full text search in Postgres with pg_bm25 (disclaimer: I am one of the makers of pg_bm25). Postgres truly rules, agree on the rec :)
I’ll second the comments that $2k/month is alarmingly high, especially for the performance that you seem to be getting. When I shoved ~40M webpages into a stock ElasticSearch instance running on a 2013-era server I bought for $200 (on eBay), it handled the load when I hit the HN front page just fine. Either you’re being drastically overcharged or there’s something horribly inefficient in your setup that could probably be tweaked fairly easily to bring your prices down.
I'm biased, but I'd recommend exploring Typesense for search.
It's an open source alternative to Algolia + Pinecone, optimized for speed (it's in-memory) and for an out-of-the-box dev experience. E-commerce is also a very common use case I see among our users.
I index 40M paragraphs of legal text, bm25 and vector similarity search, at < 200ms query time, on a single $80/month Hetzner server. Email in profile if you’d like to talk.
>Mongo will run me about $1,500 / month at the current CPU level. AWS all in will be about $700. I'm currently not storing the image files, so that reduces the cost as well.
It will probably cost you just $100 to rent a server from Hetzner and do the same thing. I would also use Redis or another kind of cache to hit the DB less.
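The cache-aside pattern being suggested can be sketched like this; a plain `Map` stands in for Redis here, but the shape is the same with a real client:

```javascript
// Cache-aside: check the cache first, hit the DB only on a miss,
// and remember the result for ttlMs. loadFromDb is whatever your
// actual DB query function is.
function makeCachedLookup(loadFromDb, ttlMs = 60_000) {
  const cache = new Map(); // key -> { value, expires }
  return async function get(key) {
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // cache hit
    const value = await loadFromDb(key); // cache miss: one DB round trip
    cache.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}
```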
Yo, fuck Mongo, just use RDS or some DigitalOcean DB. Or really, just use OpenSearch/Elasticsearch, or even Typesense (don't bother with Raft, it's so broken) or Meilisearch.
We’ve interacted before on Twitter and GitHub, and I want to address your point about Raft in Typesense since you mention it explicitly:
I can confidently say that Raft in Typesense is NOT broken.
We run thousands of clusters on Typesense Cloud serving close to 2 billion searches per month, reliably.
We have airlines using us, a few national retailers with 100s of physical stores in their POS systems, logistic companies for scheduling, food delivery apps, large entertainment sites, etc - collectively these are use cases where a downtime of even an hour could cause millions of dollars in loss. And we power these reliably on Typesense Cloud, using Raft.
For an n-node cluster, the Raft protocol only guarantees auto-recovery for a failure of up to (n-1)/2 nodes. Beyond that, manual intervention is needed. This is by design, to prevent a split-brain situation. This is not a Typesense thing, but a Raft protocol thing.
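Concretely, the bound works out as follows (a trivial sketch of the quorum arithmetic, nothing Typesense-specific):

```javascript
// A Raft cluster of n nodes keeps a majority quorum, and therefore
// auto-recovers, as long as no more than floor((n-1)/2) nodes fail.
function maxAutoRecoverableFailures(n) {
  return Math.floor((n - 1) / 2);
}
// e.g. a 3-node cluster tolerates 1 failed node, a 5-node cluster 2;
// a 4-node cluster still only tolerates 1, which is why odd sizes are typical.
```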