Sunday, August 01, 2010

» openSUSE package index and search

I've been busy working on a new implementation of our beloved "webpin", the service for searching packages across the insane number of repositories and packages we have: in the distribution, in all openSUSE Build Service repositories, and on Packman. The thing is, it's a bit dated now, and its features are limited by the fact that it uses a relational database to perform search operations.

I've been digging into Apache Solr quite a bit over the last few months (did I already mention that it totally rocks? :)) and I thought... hmm... why not use that for indexing packages and repositories? So I started out on a quick prototype to see how well it suits the job and how well it performs.

The results are quite stunning, to say the least. In terms of performance, results take just a couple of milliseconds on a search index that includes openSUSE 11.1, 11.2 and 11.3, all non-home: repositories in the OBS, as well as Packman for 11.1, 11.2 and 11.3 (that's... quite a lot). The quality of the results is just as good, but that is hardly a surprise, as Solr really excels at that. It's what it has been specifically designed and implemented for, after all.

So there it is: it's already completely functional, and consists of a Solr schema definition as well as a bunch of Perl scripts to crawl, index, verify and query.

The next items on the TODO list are as follows: After that, I shall probably implement an additional REST API that supports more features, as Solr provides a wealth of more precise and/or complex search options. I will implement those (REST API and web user interface) in Java, given that there is a faster, native format for sending queries to and fetching results from Solr. That being said, applications and web frontends that interact with Solr can be written in quite a lot of programming languages.

Once I have a prototype of the above, I'll let you know and ask for testing and feedback :) If you're already interested in more information or want to help with development, please let me know (or just poke me on IRC).
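
To give an idea of how little a client needs in order to talk to Solr over plain HTTP, here is a minimal sketch in Python that only builds the query URL. It is not taken from the prototype: the base URL, the `distribution` field name and the `build_package_query` helper are assumptions made up for illustration, and the real schema may well look different. The `q`, `fq`, `wt` and `rows` parameters are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_package_query(term, distribution=None, rows=10,
                        base_url="http://localhost:8983/solr/select"):
    """Build a Solr select URL for a package search.

    base_url and the "distribution" field are hypothetical; adjust them
    to the actual Solr instance and schema.
    """
    # q: the search term, matched against the schema's default search field
    # wt: ask Solr for a JSON response
    # rows: maximum number of results to return
    params = [("q", term), ("wt", "json"), ("rows", str(rows))]
    if distribution is not None:
        # fq: a filter query that restricts results without affecting scoring
        params.append(("fq", 'distribution:"{}"'.format(distribution)))
    # urlencode percent-encodes the values and joins them with "&"
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_package_query("amarok", distribution="openSUSE 11.3"))
```

The term is passed through as-is, so characters that have a special meaning in the Solr query syntax would still need escaping before this is used for anything real.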


3 Comments:

Blogger DrTech said...

Hi Pascal

If you want a JSON interface to a Lucene-based full-text search (along with a whole lot of other stuff), check out ElasticSearch.

And if you're using Perl, then have a look at the Perl API I've written: ElasticSearch.pm v 0.16. (ElasticSearch.pm v 0.18 was just released, but is still being mirrored by CPAN.)

clint

23:04  
Blogger Loki said...

But what's the point of using ElasticSearch as opposed to Solr?
(and there's the nice WebService::Solr Perl module as well ;))
I've read the marketing blurb, but I don't see anything that would make me consider using ElasticSearch instead of Solr (but ElasticSearch definitely looks like a neat piece of software, don't get me wrong). Faceted search? And I know Solr already quite well ;)

23:49  
Blogger Claes said...

How big is the index after you have indexed all the above?

00:08  
