Apache releases Lucene Core 4.0 and Solr 4.0

shadowmint · on Oct 12, 2012

If you've never heard of Lucene and are wondering why you should care, the IBM quick start guide is a decent place to get to know what it is and what you can do with it: http://www.ibm.com/developerworks/java/library/os-apache-luc...

Basically, it's a search engine; if you need to run one of those yourself, it's really neat.

encoderer · on Oct 12, 2012

Lucene + Solr is also a NoSQL database. Not the best choice for every circumstance but then again, nothing really is.

mattdeboard · on Oct 13, 2012

Indeed. But its capabilities don't really take it above what you'd get out of a normal relational DB. Or even a more traditional key/value store.

mej10 · on Oct 12, 2012

I would add that it is pretty easy to get started with on Rails as well. The Sunspot library for Solr is super easy to use and has been really flexible for our use cases. It is a solid piece of software.

gfodor · on Oct 12, 2012

Maybe it's just me, but over the years Solr and Lucene have become an impenetrable project. When I first started playing with Lucene many years ago it was very simple, transparent code backed by solid and understandable algorithms and data structures. Now of course, it was limited.

But now Solr has absorbed Lucene and it seems to do everything and the kitchen sink. The abstractions in Solr in my experience are mindbogglingly complicated if you want to extend its capabilities, because it does so many things.

I have yet to look at 4.0 but I hope they have removed as much as they have added. I think if the focus for the next release of Solr was to remove lots of unused, poorly factored, or complicated features it would be a huge benefit to the project.

gojomo · on Oct 12, 2012

Yes, a bit like Java itself, I see it as trending towards being an "Aircraft Supercarrier Battle Fleet" of functionality. (See also: the Hadoop ecosystem.)

But, it's free, powerful, and supported by a giant community... so once you get the hang of picking and choosing the parts you need, ignoring the rest until needed, you can get impressive results with minimal effort.

dude_abides · on Oct 12, 2012

Solr _is_ designed to be a lot more than Lucene.

You can always continue to use Lucene if you want to write the bulk of the search engine functionality yourself, and just need an indexing+search library.

aidos · on Oct 12, 2012

It does do a lot of stuff, most of which only gets in the way once you start scratching around and realise there's a lot to figure out.

For me one place it could really be simplified is the basic config it comes with. When you first approach the project the config files are so complete that you just have no idea where to start.

Then again, once you figure out the basics you have an amazingly powerful search engine that performs admirably and can scale to massive loads.

lazyjones · on Oct 13, 2012

You can use ElasticSearch (based on Lucene - http://www.elasticsearch.org/) if Solr/Lucene seems too complex. I wasn't very successful in getting a stable setup when I tested it (more than a year ago), but as you can see from their web page, they aim at making things easier.

dermatthias · on Oct 12, 2012

Last time I checked, Solr and Nutch (the web crawler) had really bad documentation. Only some wiki pages slapped together, mostly outdated.

Any improvements on this front?

iankp · on Oct 13, 2012

Nope, you're spot-on. The Solr Wiki reads like it expects you to be the person who wrote the entire system.

simonw · on Oct 12, 2012

Anyone know what the most exciting features are since the last major stable release?

donretag · on Oct 12, 2012

Lucid Imagination has an annotated version of the “Release Highlights” for Lucene/Solr 4.0:

http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4...

Spellchecking built into the index instead of needing an ancillary one is a big feature, IMHO.

Andrzej Białecki, Robert Muir, and Grant Ingersoll will be presenting a paper on the Lucene 4 architecture: http://opensearchlab.otago.ac.nz/paper_10.pdf

snapbug · on Oct 13, 2012

Will be? Already have, a couple of months ago.

dude_abides · on Oct 12, 2012

- near real-time indexing with soft commits

- zookeeper integration

languagehacker · on Oct 12, 2012

Updating a single field in a document is a pretty big feature. It will require a much slimmer and less monolithic pipeline for big-data indexing from multiple sources.
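
To make the single-field update concrete, here is a rough sketch of what such a request body could look like. It assumes Solr's JSON update format with the "set" modifier; the document id, the field name, and the helper function are made up for illustration, and nothing is actually sent to a server.

```python
import json


def atomic_update(doc_id, **fields):
    """Build one update document that overwrites only the given fields.

    Hypothetical helper: each field is wrapped in a {"set": value} modifier,
    so fields that are not listed are meant to be left untouched.
    """
    doc = {"id": doc_id}
    for name, value in fields.items():
        doc[name] = {"set": value}
    return doc


# "book-1" and "price" are invented example values.
payload = json.dumps([atomic_update("book-1", price=12.5)])
print(payload)
```

The payload would be POSTed to the core's update handler. Each source in a multi-source pipeline could then send only the fields it owns instead of rebuilding the whole document.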

qhoxie · on Oct 12, 2012

Mike McCandless gives a good overview here: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-l...

He has some other posts on there that do a deeper dive as well.

druiid · on Oct 12, 2012

I agree with some of the other new and exciting features already posted, but I'm surprised no one has mentioned that Solr Cloud was taken out of beta! That is pretty awesome and we will be looking at this feature closely for sure.

erickt · on Oct 12, 2012

My favorite is the faster regex engine:

https://issues.apache.org/jira/browse/LUCENE-1606

The elasticsearch folks were waiting for that before exposing regexes to the world.

awj · on Oct 12, 2012

The spatial enhancements are pretty exciting. You can support a lot of interesting things when search can be polygon queries against polygon data. WKT is a pretty fatty format though.

blutonium · on Oct 13, 2012

Spatial is great! What's better than WKT though? WKB?

awj · on Oct 15, 2012

Yeah, that's what I was thinking. Either way you (potentially) have a lot of data to push around, but encoding things as text that could largely be fixed width binary floats really bloats them up.
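
A quick way to see the size difference is to encode the same ring both ways. This is only a sketch: the coordinates are invented, the WKB writer handles just a single-ring 2D polygon in little-endian order, and it says nothing about what Solr itself stores or accepts.

```python
import struct

# Invented sample ring (closed: first point repeated at the end).
ring = [
    (-122.4194155, 37.7749295),
    (-122.4183926, 37.7749295),
    (-122.4183926, 37.7758841),
    (-122.4194155, 37.7758841),
    (-122.4194155, 37.7749295),
]


def polygon_wkt(points):
    """Text form: every coordinate is spelled out as decimal digits."""
    coords = ", ".join(f"{x!r} {y!r}" for x, y in points)
    return f"POLYGON (({coords}))"


def polygon_wkb(points):
    """Binary form: 13-byte header, then two 8-byte doubles per point."""
    # byte order flag (1 = little-endian), geometry type (3 = Polygon),
    # ring count, point count of the single ring
    header = struct.pack("<BIII", 1, 3, 1, len(points))
    body = b"".join(struct.pack("<dd", x, y) for x, y in points)
    return header + body


wkt_size = len(polygon_wkt(ring).encode("ascii"))
wkb_size = len(polygon_wkb(ring))
print(wkt_size, wkb_size)
```

The binary size is always 13 + 16 bytes per point (93 bytes for this ring), while the text size grows with the number of digits in each coordinate. With very short coordinates the text form can actually come out smaller, so the bloat mostly shows up with real-world precision.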

ajtaylor · on Oct 13, 2012

I'm really excited about all the new features in 4.0. The more I read, the more I can't wait to try it out on my dev server. The new features that are most interesting to me: spatial search improvements, atomic updates, custom scoring options, pivot faceting, speed improvements and Solr Cloud. All these have the potential to make searches on $work's website much more relevant and faster.

dotborg · on Oct 12, 2012

The API is cool and dandy, but I think the project should go in another direction. Lucene's focus should be REAL support for languages; the lack of proper dictionaries and the lame stemmers make it hard to use.

For me Lucene was always a NoSQL database, and the rest of the real search stuff I had to implement myself (not to mention the Solr mess).

It's fun and cool to work on some Java code architecture, but the real need lies somewhere else.

JackParsons · on Oct 12, 2012

SolrCloud: automated peer-to-peer scaling. Very nicely done.