I've looked into Vespa a bit lately. It looks pretty good!
I'm a little disappointed in its data type support, though. With ES you can throw deeply nested data structures (maps, arrays, arrays of maps, maps of arrays, etc. ad nauseam) at it and have them be fully indexed and searchable. But Vespa doesn't really do indexing of nested structures.
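This means that if your application's schema is already dependent on such nested data structures, you need a mapping layer that flattens your structures. For example, if you have a nested "address" object with streetAddress, city and state fields, then you have to flatten it to something like: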
{
address_streetAddress: "1 Bone Way",
address_city: "Boneville",
address_state: "WA"
}
And then, of course, you have to unflatten when you get the results (unless you only use Vespa for the IDs and look up the original data in your main data store).
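To make the flatten/unflatten mapping layer concrete, here is a minimal sketch in plain Python. It is not a Vespa API; the function names and the "_" separator are assumptions chosen to match the field names above, and it only handles nested maps (not arrays).

```python
def flatten(doc, sep="_", prefix=""):
    """Turn nested dicts into a single-level dict with joined key names."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested maps, carrying the joined name as the prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


def unflatten(flat, sep="_"):
    """Rebuild nested dicts from joined key names (inverse of flatten)."""
    doc = {}
    for name, value in flat.items():
        parts = name.split(sep)
        node = doc
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return doc


# Hypothetical nested document matching the flattened example above.
nested = {
    "address": {
        "streetAddress": "1 Bone Way",
        "city": "Boneville",
        "state": "WA",
    }
}
flat = flatten(nested)
restored = unflatten(flat)
```

One caveat with this approach: the round trip only works if the original keys never contain the separator themselves, so a real mapping layer would need an escaping rule or a separator that can't occur in field names.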
Same thing with arrays. Vespa doesn't really support arrays, whereas in ES, all attributes are technically arrays. (I.e., a "term" query/filter doesn't distinguish between the two: {term: {foo: "bar"}} will match both documents {foo: "bar"} and {foo: ["bar"]}.)
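The term-matching behavior I mean can be modeled in a few lines of Python. This is only an illustration of the semantics described above, not ES code; term_matches is a made-up helper.

```python
def term_matches(doc, field, term):
    """Model of an exact-match term filter where scalars act as one-element arrays."""
    if field not in doc:
        return False
    value = doc[field]
    # Treat a scalar as a single-element list so both shapes match the same way.
    values = value if isinstance(value, list) else [value]
    return term in values


scalar_hit = term_matches({"foo": "bar"}, "foo", "bar")
array_hit = term_matches({"foo": ["bar"]}, "foo", "bar")
```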
Another oddity is the system for updating your schema, which includes not just data model definitions, but a whole bunch of files which you upload as a batch. The programmatic API for updating the schema is a little impractical, much less practical than with ES where you can just do "curl -d @mappings.json" and you're done. Also not at all a fan of their use of XML.
Overall, Vespa feels more than a little antiquated. It's an old project, after all. That said, I'm probably willing to deal with the warts if it's more solid. I like that the core server is written in C++, not Java.
What has your experience been in terms of clustering? With ES you can just boot up a bunch of nodes and, on a good day, it will self-organize into a pretty nice and scalable setup. (On a bad day, your cluster will become "red" for unpredictable reasons.) Is Vespa as seamless here?
That's my big complaint with Solr, too, which I want to like because on paper it seems a ton more sane, but realistically the ability to throw a random JSON document at ES -- without having to sit down and pre-define the schema -- is invaluable.
That's because that page explains how to turn the mode on and off, how to fine-tune it (e.g. different date formats) and how to index formats other than JSON. Elasticsearch does not support a good chunk of this, so it has no need to document it.
And even then, it will already discuss the problem with auto-guessing the content types, something that Elasticsearch mentions only later. Solr is just more upfront and explicit about the issues.
Still, you do have a point: Solr's documentation tries to be comprehensive rather than easy to use, and that sometimes obscures the easy things.