Apache releases Lucene Core 4.0 and Solr 4.0 (apache.org)
126 points by andrevoget on Oct 12, 2012 | 25 comments



If you've never heard of Lucene and are wondering why you should care, the IBM quick start guide is a decent place to get to know what it is and what you can do with it: http://www.ibm.com/developerworks/java/library/os-apache-luc...

Basically, it's a search engine; if you need to run one of those yourself, it's really neat.


Lucene + Solr is also a NoSQL database. Not the best choice for every circumstance but then again, nothing really is.


Indeed. But its capabilities don't really take it above what you'd get out of a normal relational DB. Or even a more traditional key/value store.


I would add that it is pretty easy to get started with on Rails as well. The Sunspot library for Solr is super easy to use and has been really flexible for our use cases. It is a solid piece of software.


Maybe it's just me, but over the years Solr and Lucene have become an impenetrable project. When I first started playing with Lucene many years ago it was very simple, transparent code backed by solid and understandable algorithms and data structures. Of course, it was limited back then.

But now Solr has absorbed Lucene and it seems to do everything and the kitchen sink. The abstractions in Solr in my experience are mindbogglingly complicated if you want to extend its capabilities, because it does so many things.

I have yet to look at 4.0 but I hope they have removed as much as they have added. I think if the focus for the next release of Solr was to remove lots of unused, poorly factored, or complicated features it would be a huge benefit to the project.


Yes, a bit like Java itself, I see it as trending towards being an "Aircraft Supercarrier Battle Fleet" of functionality. (See also: the Hadoop ecosystem.)

But, it's free, powerful, and supported by a giant community... so once you get the hang of picking and choosing the parts you need, ignoring the rest until needed, you can get impressive results with minimal effort.


Solr _is_ designed to be a lot more than Lucene.

You can always continue to use Lucene if you want to write the bulk of the search engine functionality yourself, and just need an indexing+search library.


It does do a lot of stuff, most of which only gets in the way once you start scratching around and realise there's a lot to figure out.

For me one place it could really be simplified is the basic config it comes with. When you first approach the project the config files are so complete that you just have no idea where to start.

Then again, once you figure out the basics you have an amazingly powerful search engine that performs admirably and can scale to massive loads.


You can use ElasticSearch (based on Lucene - http://www.elasticsearch.org/) if Solr/Lucene seems too complex. I wasn't very successful in getting a stable setup when I tested it (more than a year ago), but as you can see from their web page, they aim at making things easier.


Last time I checked, Solr and Nutch (the web crawler) had really bad documentation. Only some wiki pages slapped together, mostly outdated.

Any improvements on this front?


Nope, you're spot-on. The Solr Wiki reads like it expects you to be the person who wrote the entire system.


Anyone know what the most exciting features are since the last major stable release?


Lucid Imagination has an annotated version of the “Release Highlights” for Lucene/Solr 4.0:

http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4...

Spellchecking built into the index instead of needing an ancillary one is a big feature, IMHO.

Andrzej Białecki, Robert Muir, and Grant Ingersoll will be presenting a paper on the Lucene 4 architecture: http://opensearchlab.otago.ac.nz/paper_10.pdf


Will be? Already have, a couple of months ago.


- near real-time indexing with soft commits

- zookeeper integration


Updating a single field in a document is a pretty big feature. It will require a much slimmer and less monolithic pipeline for big-data indexing from multiple sources.
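For anyone who hasn't seen the feature: as far as I know, a single-field update in Solr 4.0 is sent as a document whose field values are small modifier objects (`set`, `add`, `inc`) instead of plain values. Here is a minimal sketch that only builds such a JSON payload. The helper name, the document id, and the field names are all made up for illustration, it assumes the schema's unique key is called `id`, and nothing is actually sent to a server.

```python
import json

# Modifier names for atomic updates as I understand them from the Solr 4.0 docs.
ALLOWED_OPS = {"set", "add", "inc"}


def atomic_update(doc_id, **modifiers):
    """Build one atomic-update document (hypothetical helper).

    Each keyword argument maps a field name to an (operation, value) tuple,
    e.g. price=("set", 9.99).
    """
    doc = {"id": doc_id}  # assumes the unique key field is named "id"
    for field, (op, value) in modifiers.items():
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operation: {op!r}")
        doc[field] = {op: value}
    return doc


# Hypothetical document and fields, purely for illustration.
payload = json.dumps(
    [
        atomic_update(
            "book-42",
            price=("set", 9.99),  # overwrite the stored value
            tags=("add", "sale"),  # append to a multi-valued field
            views=("inc", 1),      # increment a numeric field
        )
    ]
)
print(payload)
```

The point is that only the changed fields travel over the wire; the rest of the document is reconstructed on the server side, which as I understand it requires the other fields to be stored.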


Mike McCandless gives a good overview here: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-l...

He has some other posts on there that do a deeper dive as well.


I agree with some of the other new and exciting features already posted, but I'm surprised no one has mentioned that Solr Cloud was taken out of beta! That is pretty awesome and we will be looking at this feature closely for sure.


My favorite is the faster regex engine:

https://issues.apache.org/jira/browse/LUCENE-1606

The elasticsearch folks were waiting for that before exposing regexes to the world.


The spatial enhancements are pretty exciting. You can support a lot of interesting things when search can be polygon queries against polygon data. WKT is a pretty fatty format though.


Spatial is great! What's better than WKT though? WKB?


Yeah, that's what I was thinking. Either way you (potentially) have a lot of data to push around, but encoding things as text that could largely be fixed width binary floats really bloats them up.
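To put a rough number on the text-versus-binary point, here is a small sketch that encodes the same polygon ring as WKT and as a hand-rolled 2D WKB polygon (little-endian, one ring, float64 coordinates) and compares the sizes. The coordinates are invented, the function names are mine, and this is not how Lucene or Solr store shapes internally.

```python
import struct

# Hypothetical closed ring of (lon, lat) pairs; the values are made up.
RING = [
    (-122.419416, 37.774929),
    (-122.405640, 37.788022),
    (-122.392197, 37.775318),
    (-122.419416, 37.774929),
]


def to_wkt(ring):
    """Render a single-ring polygon as WKT text."""
    coords = ", ".join(f"{x!r} {y!r}" for x, y in ring)
    return f"POLYGON (({coords}))"


def to_wkb(ring):
    """Render a single-ring 2D polygon as little-endian WKB."""
    # Header: byte order (1 = little-endian), geometry type (3 = Polygon), ring count.
    out = struct.pack("<BII", 1, 3, 1)
    out += struct.pack("<I", len(ring))  # number of points in the ring
    for x, y in ring:
        out += struct.pack("<dd", x, y)  # two float64 values per point
    return out


def from_wkb(data):
    """Decode what to_wkb produced (single ring, little-endian only)."""
    byte_order, geom_type, ring_count = struct.unpack_from("<BII", data, 0)
    if (byte_order, geom_type, ring_count) != (1, 3, 1):
        raise ValueError("only single-ring little-endian polygons are handled")
    (count,) = struct.unpack_from("<I", data, 9)
    return [struct.unpack_from("<dd", data, 13 + 16 * i) for i in range(count)]


wkt = to_wkt(RING)
wkb = to_wkb(RING)
print(len(wkt.encode("ascii")), "bytes as WKT,", len(wkb), "bytes as WKB")
```

The binary form costs a fixed 16 bytes per point regardless of precision, so the gap grows with the number of digits in the text. Coordinates with very few digits can actually come out smaller as WKT, and packing as float32 instead would halve the binary payload at the cost of precision.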


I'm really excited about all the new features in 4.0. The more I read, the more I can't wait to try it out on my dev server. The new features that are most interesting to me: spatial search improvements, atomic updates, custom scoring options, pivot faceting, speed improvements and Solr Cloud. All these have the potential to make searches on $work's website much more relevant and faster.


The API is cool and dandy, but I think the project should go in another direction. Lucene's focus should be REAL support for languages; the lack of proper dictionaries and the lame stemmers make it hard to use.

For me Lucene was always a NoSQL database, and the rest of the real search stuff I had to implement myself (not to mention the Solr mess).

It's fun and cool to work on some Java code architecture, but the real need lies somewhere else.


SolrCloud: automated peer-to-peer scaling. Very nicely done.



