This page presents my pet project: the fulltext database locus. I wanted a fulltext engine which would be:

Personal but not lightweight

locus must run on hardware I can afford. It's decidedly non-distributed and has no pretensions to replace Internet search engines like AltaVista. On the other hand, I want to index all documents which fit on my disk, including CDs (e.g. with texts of Project Gutenberg), and I'll tolerate slower indexing and higher disk usage (30-70 percent of source text size for indexes) than Glimpse or Swish as a tradeoff for larger maximum database size and more focused search. locus was tested on 400MB in 1200 documents and can find uncommon words (i.e. the kind of words you would normally use to search for something) in under ten seconds.

Smart but not programmer-hostile

The ideal of fulltext search is clear: you just type in a few words and the program finds what you meant to search for. The problem is, it doesn't always work that way. So locus gives you the choice: if you just type in a few words, it uses a relatively complicated search algorithm trying to find the best match. When you're not satisfied, you can see why it found what it found and tweak parameters to your heart's content and beyond, using a simple query language. locus can search for phrases - not just on one line with exactly matching spaces, like grep, but for words near each other - as well as topics (get a word, find fifty associations in your thesaurus and search for these). Simple stemming is also supported.
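To make the difference from grep concrete, here is a minimal Python sketch of searching for words near each other and of expanding a topic through a thesaurus. It is not the code locus uses, and it says nothing about the locus query language: the function names, the `max_gap` parameter, the sample documents and the tiny thesaurus are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [word positions]} (a positional index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def near(index, first, second, max_gap=3):
    """Ids of documents where both words occur at most max_gap words apart.

    Positions count words, not characters, so line breaks and
    extra spaces between the two words do not matter.
    """
    a = index.get(first, {})
    b = index.get(second, {})
    hits = []
    for doc_id in a.keys() & b.keys():
        if any(abs(p - q) <= max_gap for p in a[doc_id] for q in b[doc_id]):
            hits.append(doc_id)
    return sorted(hits)

def topic(index, word, thesaurus):
    """Ids of documents containing the word or any of its associations."""
    words = [word] + list(thesaurus.get(word, []))
    found = set()
    for w in words:
        found.update(index.get(w, {}))
    return sorted(found)

# Hypothetical sample data, only for this sketch.
docs = {
    "a.txt": "The quick brown\nfox jumps over the lazy dog",
    "b.txt": "A fox is an animal; quick thinking is something else entirely and not brown",
}
index = build_index(docs)

print(near(index, "brown", "fox", max_gap=1))
print(near(index, "quick", "fox", max_gap=4))
print(topic(index, "hound", {"hound": ["dog"]}))
```

In the first query only a.txt matches, although "brown" and "fox" sit on different lines there; a line-oriented tool would miss it. The nested position comparison is the simplest thing that works and would be too slow for common words in a large database, so a real engine would merge sorted position lists instead.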

Universal back end for any front end

I don't like creating GUIs, and GUIs I do create tend to look awful even to me (not to mention others). So I decided to concentrate my work on the locus back end. But of course, to use a program, one needs an interface... You can specify queries on the command line and read results from standard output (or redirect them to a file), and if you want anything fancier, set up your own front end. grazer output is quite flexible - for example, you can output HTML and query locus databases through your browser.

Interested?

If you think you might use something like locus, you can have the Linux source.

Do let me know at vbar@comp.cz how you liked it. Now there's also a mailing list for locus: send an (empty) mail message to subscribe. If you have any questions, problems and/or suggestions about getting, installing, understanding, using and/or extending locus, you may want to see the FAQ before mailing me. You can also take a look at the available options to see all the exciting possibilities (well, all the exciting possibilities I cared to document - but there are enough of them). Your distribution contains just a (forever unfinished) core of locus. The newest version is always (well, modulo connection problems) available at the locus homepage.

Some additional code and data files for special uses are here; yet more are available upon request.


Last modified 02 Jan 99.