I was at Yahoo almost a decade ago, when vector search within Vespa was first being rolled out in production use cases. It was already serving similarity search requests for Flickr back then.
Even though I'm with Zilliz/Milvus now, I wholeheartedly support and recommend folks check out and try Vespa. Congrats to the Vespa team!
EDIT: For folks on Twitter, you should follow Jo (https://twitter.com/jobergum) from Vespa if you aren't already. Great combo of technical content, hot takes, and vector database memes!
Huge congratulations to JKB, Frode, Kim, and the rest of the Vespa team! We are infinitely grateful for all of your help and advice. We are lucky enough to have Vespa as the foundation for our developer-focused enterprise search product.
Having worked with both Solr and Elastic in past search companies, it’s incredible to be able to deploy into Fortune 50 enterprises without any doubts about stability and with all the benefits of a cutting-edge hybrid engine.
Can’t wait to watch you on the next leg of your journey!
Off topic... looking forward to more engineers moving to Mastodon. I have Twitter/X blocked at DNS level and still fairly frequently encounter interesting accounts that I can't check out.
Nobody's saying it's anybody else's problem - they're just looking forward to more content being on Mastodon (or elsewhere) as people move away from X.
It's not that weird, you are reading too much into the subtext. The web is in a transitory state, platforms change, people move. Wishing for more content to be available on a specific platform without blaming the author is an acceptable comment in my view.
Is there a particular mastodon server or set of servers that the engineering community is favoring? I know it technically doesn’t matter in a federated network, but curious anyway.
I'm from Pinecone. We use proprietary indexes. We could've used HNSW but decided the high memory consumption (i.e., costly at scale) and slow index updates (i.e., data gets stale) won't cut it for production use cases.
Oh, I read a lot of HNSW stuff in your/Pinecone's blog series. (Great learning resource btw, well done!) So I assumed you were using HNSW already. It's news to me that you don't use it.
>Yahoo will own a stake in the new company and will be one of Vespa’s biggest customers for a long time to come.
What's usually going on in scenarios like this is one of two things: the subsidiary is in growth mode and needs access to public capital markets to raise large amounts for building infrastructure (and sharing the expenses with others) to compete, or what has been built is going into profitability mode and the profits are better protected/hoarded in a private company where disclosure requirements are not onerous. If this move represents the former, expect the latter in a few years, which is why Yahoo would maintain a large stake.
(I'm just pointing this out because there are frequently underlying reasons that have nothing to do with the type of stuff that's in the press release, "successful venture, income/exit event" etc.)
A "spin-in" is a form of R&D in which a company is the sole investor in a startup. It sends a team of employees off to build an experimental product and then buys that startup for a predetermined and very healthy price.
Cisco has spent on average $763 million to acquire each spin-in that these three men have founded, even though Cisco was also the sole investor in each.
So they take smart people and free them from corporate bullshit by giving them a bunch of money and telling them to do what they do best? I suppose it requires a lot of trust in those individuals, or the takeaway might just be that subjecting your high-performing employees to bureaucracy and bean-counting is not a great strategy.
They're also returning like $6 billion/year to their shareholders via dividends -- if they had just hoarded that cash, their share price would obviously be much higher.
Isn't share price influenced by future dividends? Why would preventing dividends and hoarding cash (reducing expected future earnings of shares) drive up share price?
It’s a bit more complicated than that, and it’s also a bit more complicated than what I’m about to say.
Functionally speaking, when a company issues a dividend of $x, the stock price S will drop to S-x on the ex-dividend date. This makes sense, as the company now has $x less per share. It's a wash for the shareholder as well. They had S before and now they have S-x+x. No value was created; money was simply moved.
Given the above, you can more or less value dividends using discounted cash flow. A ~3% dividend in CSCO’s case isn’t worth paying more for (increasing the share price) on its own, because the cash used to buy the shares is worth ~5.5% today according to the risk-free rate.
In fact if one were considering the dividend alone then the company should be worth less. Obviously factoring in future growth, etc… the market has landed on the current price.
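To make the S-x+x point concrete, here is a minimal sketch of the ex-dividend arithmetic described above. The share price and dividend are made-up illustrative numbers, not CSCO figures; only the ~3% yield and ~5.5% risk-free rate come from the comment.

```python
def ex_dividend_position(share_price: float, dividend: float) -> tuple[float, float]:
    """Return (ex-dividend share price, shareholder's total value per share)."""
    ex_price = share_price - dividend   # the price drops to S - x
    total_value = ex_price + dividend   # the holder now has (S - x) + x
    return ex_price, total_value


# Hypothetical numbers, chosen only for illustration.
ex_price, total_value = ex_dividend_position(share_price=50.0, dividend=1.5)

# Rates quoted in the comment above: ~3% dividend yield vs ~5.5% risk-free rate.
dividend_yield = 0.03
risk_free_rate = 0.055
dividend_alone_beats_risk_free = dividend_yield > risk_free_rate

print(ex_price, total_value, dividend_alone_beats_risk_free)
```

The shareholder's total is unchanged by the payout, and on yield alone the dividend does not beat the risk-free rate, which is the argument being made. Real prices also move for other reasons on the ex-dividend date, so treat this as the idealized case.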
Diab0lic's comment is true about future dividends - but my point is that comparing today's share price to the share price in 2000 and implying they haven't done much misses the fact that they've paid out something like $54 billion in dividends since then.
This will be wrong but true enough to make my point: with a market cap today of ~$215 billion, had CSCO not paid out dividends, their market cap would be that $215 billion (which represents current cash and the discounted value of future cash) plus the $54 billion that would also be in cash, for a total of $269 billion. That's enough to add ~$15 to their share price.
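For anyone who wants to check the back-of-envelope numbers, here is a small sketch using only the figures quoted in the comment above. The share count is not stated anywhere in the thread, so the code backs out the count implied by the ~$15 figure rather than assuming a real one.

```python
# Figures quoted in the comment above, in billions of dollars.
market_cap_bn = 215.0
dividends_paid_bn = 54.0
claimed_uplift_per_share = 15.0  # dollars per share, as claimed

# Market cap if the dividends had been retained as cash instead.
hypothetical_cap_bn = market_cap_bn + dividends_paid_bn

# Share count (in billions) implied by the claim. This is derived from the
# comment's own numbers, not taken from any filing.
implied_shares_bn = dividends_paid_bn / claimed_uplift_per_share

print(hypothetical_cap_bn, implied_shares_bn)
```

So the ~$15 figure holds if there are roughly 3.6 billion shares outstanding; with more shares, the per-share uplift would be smaller.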
Or, I suspect, a third scenario where the Vespa management team would leave and start their own company that would appreciate at a higher rate than Yahoo due to greater growth potential.
So Yahoo spins them out and has their equity grow faster than Yahoo stock, or they lose the product to a new competitor.
There's truth in what you say. I was interpreting (without studying closely) the scale of what's happening to be beyond the scale where ICs have significant leverage, so not the startup venture scale but more like B-C round. It's already clear this stuff's going to be big, and the biggest players have the most advantages; it's time to scale, quit, or pray for a proprietary technological breakthrough. But you're right about giving small equity stake holders a more pure play, though equity is an expensive way to pay people.
A third option is very simply about focus. Yahoo experienced this itself under Verizon, wasting years on useless 5G-oriented initiatives, a completely irrelevant focus for Yahoo itself. If the focus of a company's leadership isn't aligned with what benefits a subsection of that company, spin-out is a good way for that subsection to gain an aligned focus without detracting from the parent company's goals.
Vespa's been a technology built with search & ad-serving in mind, while Yahoo's other recent moves have been ad-tech layoffs alongside acquisitions in sports, finance & tech journalism, so the focus shift makes some sense here.
We’re at a moment in history where AI companies (especially ones with proven cashflow like this) are getting almost all the VC money and wild multiples.
It seems like the smart move to spin it out to access this money.
Vespa has been around for a LONG time. When I worked at Yahoo back in 2005-2007 the Flickr team were using it to scale Flickr - they were integrating it in order to serve things like public tag pages, to take the load off their MySQL cluster.
I remember understanding it as the internal equivalent to something like Solr, although actually it predated it - looks like the first public Solr open source release (coming out of CNET) was January 2006.
Yeah, I worked with the Flickr team on that project. Scaling to billions of photos, with partial update support of popularity for ranking.
Back then, the properties had to stand up their own Vespa cluster(s), later on we created a managed service out of it. And, yes, the original plan for Vespa was to be a Vertical Search Platform, that is where the name Vespa comes from. More on the history in this blog post https://blog.vespa.ai/vespa-is-becoming-its-own-company/
This is fun to see, and for a seriously deserving team!
Vespa really is an impressive platform. I've been working with search platforms for a while, starting with FAST Instream, MarkLogic, Elastic, Weaviate, and now Vespa. As others have noted, Vespa has a deep technical lineage. You can see a lot of the hard-earned lessons in the design, as well as fixes for the flaws of its spiritual predecessors (FAST et al.).
The history alone makes the project interesting. But then, in the last couple of years (or maybe earlier), they started introducing all the fundamentals for supporting deep integration of tensors and a custom HNSW implementation. Whether accident or planning, this was a nice strategic move. They were leagues ahead, just in time for the popularization of learned embeddings.
Now they have the best combination of traditional lexical search, semantic search with embeddings, and the combination of the two in hybrid search. That's without even getting to all the functionality in the multi-phase ranking, hosted ML models, and document processing engine.
Obviously I'm biased (we use Vespa at work for enterprise search products), but I can't recommend this engine/platform enough.
Vespa was actually very cool at the time as a document-oriented search engine, occupying the same niche as Solr and elaborations of that like ElasticSearch. But I don't know if it's competitive today.
This blog post says it's "developed by Yahoo" which is I guess true. But it was originally an acquisition, largely developed by a team in Norway, and apparently most Vespa development still happens there.
Very. It's outcompeting most vector databases on features and maturity when it comes to vector search, while having very powerful, flexible, and proven text search too.
The tricky part is that it's more a platform for building complex search systems than "just" a vector database. So if I found a company today whose focus is to create clever multi-phase search pipelines and train (e.g. domain-adapt) LLMs for calculating embeddings etc., then it's probably _the_ best solution by far; you can probably get away with having only AI engineers, devops, and a single programmer (who might also mostly just do devops). But if you need to deeply integrate it into a different existing search system, things are less great.
And I would love to see some modernization, like having a format for structured queries which is more widely supported than YQL... (eyeing GraphQL here)
Yeah, +1 for VERY competitive. The vector capabilities of Vespa are incredible and the text/ranking features are amazing. I don't think any other product has both of those sides as well developed.
We conducted benchmark tests on Elastic's queries per second (QPS) performance using datasets of 500,000 and 1 million vectors. The result: Zilliz was 13x and 22x faster on the 500,000-vector and 1-million-vector datasets, respectively. https://zilliz.com/blog/elasticsearch-cloud-vs-zilliz
Feel free to explore our open-source benchmarking tool, which allows you to examine our methodology and even compare it with your vector database. https://github.com/zilliztech/VectorDBBench
Actually, Vespa comes out of the same FAST company. Yahoo bought Overture/Altavista and a lot of other web search companies in 2003, including the web search division of FAST. The Enterprise search division of FAST was later acquired by Microsoft.
It had all started a bit earlier with a colleague (I assume that's you up there, jkb79 :) pulling the leg of a guy visiting us from the Bangalore office, about the polar bears roaming the streets of Trondheim. We just thought we'd prank him a bit about having more dangerous animals than they did in India. Nothing too serious.
So a few months later I woke up early one morning and saw it had been snowing all night. I made sure to get over to the office before the traffic started ruining the image of a barren, desolate place where you could entertain the idea of polar bears roaming the streets, looking for young, fresh engineering meat.
I took a few photos, walked inside, and got busy looking for photos of polar bears that could fit into this idea of upping the Dangerous Wildlife War between India and the home of the proper vikings.
So I did a little bit of photoshopping, uploaded it to my Flickr account, and sent the link to a few other yahoos around the world. We had a bit of fun with it, but I don't think anyone with access to this worldwide web of information will be too fooled. (It's quite easy to figure out that there are no polar bears on the Norwegian mainland, so I've never been too bothered about leaving it up there.)
> the Dangerous Wildlife War between India and the home of the proper vikings.
Was there another side to this war? The most dangerous animal you're likely to come across (realistically, without getting photoshop involved) in an average Indian street would be a monkey. I've also come across many varieties of snakes, from time to time, but they're generally chill and just want to be left alone. The monkeys are ruthless bastards though, not to be trifled with.
Oh, absolutely not. They had no idea, and it wasn't really a thing outside of a few minutes of pranking one engineer visiting us + a bit of photoshopping for fun.
The only ones to ever get in touch about this whole thing were 1) the Y! corporate blog, who wanted us to make a lighthearted and fun write-up about the polar bears, which I cannot seem to find online anymore, and 2) a random local from Trondheim, Norway, who sent me quite the angry e-mail telling me that I was single-handedly ruining the tourism business in town with that silly image.
Stray dogs are a much bigger problem on Indian streets. In 2020 ~6.8 million Indians were bitten by stray dogs and thousands of them die of rabies every year.
Hehe, it was a joke, we don't have polar bears on the mainland of Norway. But, it was fun to show the photo to visitors from different countries. "Be careful when you walk back to the hotel".
Wow, this brings back so many old memories of mine at Yahoo! Almost a decade ago, Vespa inside Yahoo was way ahead of its time. Glad it finally spun out. I wonder how this compares with other vector DBs like Pinecone, Chroma...
The standard generally is whether the trademark would cause confusion to or deceive consumers. Naming a company after an existing brand which is an invented word (being a corruption of googol) would fall under the latter, and intent matters too.
As to the former, trademark classes are one of the ways in which confusion is determined, but it’s not the only way for sure.
That said, if it’s a word borrowed from another language and there’s no intent to deceive, I imagine courts would allow it - see also Apple/Apple Corps, and the many companies called “Delta” not in the airline business.
Wasps are a nuisance known for causing pain and property destruction. "Vespa" as a name for a product always seemed strange to me. At least the scooters were vaguely waspish looking and buzzed around town. What's Yahoo's excuse?
Aside from the famous case of Apple Corps as noted by jollofricepeas, there are multiple other trademarks for “Apple” as a single word in the US, including for tobacco, horseshoes, books, and travel.
If you look for just pure vector similarity search, there are many alternatives. But Vespa's tensor support, multi-vector indexing and the ability to express models like colBERT (1) or cross-encoders makes it stand out if you need to move beyond pure vector search support.
Plus, for RAG use cases, it's a full-blown text search engine as well, allowing hybrid ranking combinations. Also, with many pure vector databases like Pinecone, you cannot describe an object with more than one vector. If you have different vector models for the object, you need different indexes, and then you duplicate metadata across those indexes (if you need filtering + vector search).
That's a name I haven't heard in a while. I thought the concept of it was amazing and tags were the solution to everything. Remember machine tags? Oh how things have evolved.
Not to mention the naming trend companies took on for a decade afterwards.
machine tags were stupid. tagging was about humans labeling the important aspects for retrieval. not a replacement for all metadata, or for faceted search.
What is perhaps a bit unusual is that the Vespa team has existed for a very long time. Even before the project got its name (2004/2005?). If memory serves it will be 20 years this year or next year since the first bits of what became the Vespa platform were developed. Some of the people working there have worked together since 1998-1999.
I think much of the reason for the longevity of the team is that it was kept somewhat separate from the rest of Yahoo! so they could maintain focus.
I think Vespa's growth and innovation were driven by Yahoo's resources and infrastructure. Now as a separate entity, it might struggle to maintain the same pace of development and innovation without the backing of a tech giant like Yahoo.
This is fabulous and I’m very excited to see where this project goes.
It definitely has legs, and if the team understands just how enormous the potential could be, and especially why they could disrupt, I’m even more eager to see what they do.
Ex-yahoo here (Quit in 2022, at the height of salary boom :)
Yahoo is a profitable company with at least $5B in revenue. Search and Mail are extremely profitable businesses. When was the last time you changed your mail ID? Search is at least $2B in revenue. Mail is $1B-plus in revenue. Yahoo Finance, Yahoo Sports, and Yahoo News are all highly trafficked websites, although I don't know the revenue numbers. I do wager that if Yahoo were put on the stock market, its valuation would be around $10B, although private equity spent only $4B on acquiring them. Who knows what went into the discussions.
Why did I leave? Because I figured my career growth is elsewhere.
One thing I learnt at Yahoo is that large companies take a very long time to die. You can have a profitable career in them, even if YoY growth is slowing.
I get this sentiment, but I don't know anyone in my age group (late 20s/early 30s) who uses Yahoo Mail either. My dad did use Yahoo Mail, but I'm guessing it has about 20 years left.
Interesting point but makes me wonder whether it's meaningful. There are plenty of local Chinese email providers, why would anyone choose Yahoo specifically?
Good. Similarly, there are people who are happy with Yahoo and have stuck with it for a long time. Over time, all the most important/common features in Gmail have made it into Yahoo too.
Yahoo home page is actually very engaging (in a car wreck kind of way), and they haven't updated it much in years. I definitely believe they are doing well.
I don't think the Japanese entity and US entity have any relation to each other anymore. I think US entity divested itself of Yahoo Japan completely in 2018 or so. Yahoo Japan is now collectively owned by Softbank and Naver under some holding company (it was big news in Japan when the merger happened).
Legally being placed under the same corporate entity was a while ago. As is normal for big acquisitions, actually merging the businesses is a much slower process (allegedly part of the reasons BA's first days of operating at Heathrow Terminal 5 were a mess is that it was the first time BEA and BOAC were actually working together).
My point is just, from what I've heard the switch to Line Yahoo reflects a real integration of the businesses, not just a change of brand.
No relation to Yahoo in the US. Spun off from a previous incarnation of Yahoo that doesn't exist anymore. In practice, they're two companies that confusingly share the same name.
Maps. Groups. Upcoming/Local. Flickr. Tech was great, groundbreaking at the time (however, that had its own drag on velocity). Product "management" of anything post-acquisition was demonstrably atrocious.
But like Amazon, Yahoo had developed scalable solutions in-house to run Yahoo. That was Amazon's motivation for launching AWS. They had already developed all these solutions for themselves and got the idea to monetize their infrastructure. I don't see why Yahoo couldn't have done the same thing. Plus Yahoo was sitting on Hadoop until it got spun off. And the kicker is Yahoo briefly started to offer enterprise big data solutions and even came up with a killer app that was years ahead of its time called Pipes. But for some reason Yahoo just abandoned everything.
People tend to forget, or are too young to know, just how big and innovative Yahoo was at one time. I know this may sound strange if you're younger, but Yahoo was literally the Google of its era. Yahoo was absolutely *massive* at its height.
"The deal for Alibaba was expensive and risky at the time, but ended up as the most lucrative bet in Silicon Valley: at today's prices, that stake would be worth more than $80 billion. Normally, such a success would preserve a company for years, but on Monday morning Yahoo announced that it would sell its operating business—the web properties without Alibaba and other investments—to Verizon for less than $5 billion, pennies on the dollar it would have commanded at its peak 15 years ago."
Yahoo was massive at one point and $35B didn't seem like a lot compared to its peak valuation as well as its competitors. I can see why Jerry Yang didn't want to settle when he had seen his company at a much higher valuation in the past.
I really don't know how Yahoo managed to fumble things so badly, but they were always one of the better companies as far as technology goes - this announcement today is a testament to that. It's just that their business and product side was lacking.
And also pick something that's distinctive and easily Googleable. I'm tired of companies and products picking super generic names that mean I get swamped with spammy SEOd results every time I search for information/help/reviews/etc on them.
Vespa is the Italian (and Latin) word for wasp, not sure Piaggio can claim ownership outside of the domain of commercial vehicles under idk EU or WTO rules.