Lucene 1.9.1 API

Apache Lucene is a high-performance, full-featured text search engine library.


Core
org.apache.lucene Top-level package.
org.apache.lucene.analysis API and code to convert text into indexable tokens.
org.apache.lucene.analysis.standard A grammar-based tokenizer constructed with JavaCC.
org.apache.lucene.document The Document abstraction.
org.apache.lucene.index Code to maintain and access indices.
org.apache.lucene.queryParser A simple query parser implemented with JavaCC.
org.apache.lucene.search Search over indices.
org.apache.lucene.search.spans The calculus of spans.
org.apache.lucene.store Binary i/o API, used for all index data.
org.apache.lucene.util Some utility classes.

 

Demo
org.apache.lucene.demo  
org.apache.lucene.demo.html  

 

contrib: Analysis
org.apache.lucene.analysis.br Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.cjk Analyzer for Chinese, Japanese and Korean.
org.apache.lucene.analysis.cn Analyzer for Chinese.
org.apache.lucene.analysis.cz Analyzer for Czech.
org.apache.lucene.analysis.de Analyzer for German.
org.apache.lucene.analysis.el Analyzer for Greek.
org.apache.lucene.analysis.fr Analyzer for French.
org.apache.lucene.analysis.nl Analyzer for Dutch.
org.apache.lucene.analysis.ru Analyzer for Russian.

 

contrib: Ant
org.apache.lucene.ant  

 

contrib: Highlighter
org.apache.lucene.search.highlight The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages.

 

contrib: Lucli
lucli  

 

contrib: Memory
org.apache.lucene.index.memory High-performance single-document main memory Apache Lucene fulltext search index.

 

contrib: Miscellaneous
org.apache.lucene.misc  
org.apache.lucene.queryParser.analyzing  
org.apache.lucene.queryParser.precedence  

 

contrib: MoreLikeThis
org.apache.lucene.search.similar Document similarity query generators.

 

contrib: RegEx
org.apache.lucene.search.regex Regular expression Query.
org.apache.regexp This package exists to allow access to useful package protected data within Jakarta Regexp.

 

contrib: Snowball
net.sf.snowball Snowball system classes.
net.sf.snowball.ext Snowball generated stemmer classes.
org.apache.lucene.analysis.snowball TokenFilter and Analyzer implementations that use Snowball stemmers.

 

contrib: SpellChecker
org.apache.lucene.search.spell Suggest alternate spellings for words.

 

contrib: Surround Parser
org.apache.lucene.queryParser.surround.parser This package contains the QueryParser.jj source file for the Surround parser.
org.apache.lucene.queryParser.surround.query This package contains SrndQuery and its subclasses.

 

contrib: Swing
org.apache.lucene.swing.models Decorators for JTable TableModel and JList ListModel encapsulating Lucene indexing and searching functionality.

 

contrib: WordNet
org.apache.lucene.wordnet This package uses synonyms defined by WordNet to build a Lucene index storing them, which in turn can be used for query expansion.

 

Here is a simple example of how to use Lucene for indexing and searching, using JUnit assertions to check that the results are what we expect:

    Analyzer analyzer = new StandardAnalyzer();

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead (note that the 
    // parameter true will overwrite the index in that directory
    // if one exists):
    //Directory directory = FSDirectory.getDirectory("/tmp/testindex", true);
    IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
    iwriter.setMaxFieldLength(25000);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, Field.Store.YES,
        Field.Index.TOKENIZED));
    iwriter.addDocument(doc);
    iwriter.close();
    
    // Now search the index:
    IndexSearcher isearcher = new IndexSearcher(directory);
    // Parse a simple query that searches for "text":
    Query query = QueryParser.parse("text", "fieldname", analyzer);
    Hits hits = isearcher.search(query);
    assertEquals(1, hits.length());
    // Iterate through the results:
    for (int i = 0; i < hits.length(); i++) {
      Document hitDoc = hits.doc(i);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    isearcher.close();
    directory.close();

The Lucene API is divided into several packages, which are listed under Core above.

To use Lucene, an application should:
  1. Create Documents by adding Fields;
  2. Create an IndexWriter and add documents to it with addDocument();
  3. Call QueryParser.parse() to build a query from a string; and
  4. Create an IndexSearcher and pass the query to its search() method.
The IndexFiles and SearchFiles demo classes are simple examples of code that does this. To try them, run something like:
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
  [ ... ]

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
  [ ... thirty-four documents contain the word "chowder" ... ]

Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
  [ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
    [ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
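
The relationship between the two notations in the transcript above can be illustrated with a small standalone sketch. It is not Lucene's QueryParser and does not use the Lucene API: the class name CanonicalQuerySketch and its canonicalize method are hypothetical, it handles only flat queries (no parentheses, fields, or boosts), and plain lowercasing stands in for whatever the analyzer would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Lucene.
class CanonicalQuerySketch {
    // A clause is either a quoted phrase or a run of non-space characters.
    private static final Pattern CLAUSE = Pattern.compile("\"[^\"]*\"|\\S+");

    /** Rewrites a flat query that uses AND/OR/NOT into "+"/"-" prefix form. */
    static String canonicalize(String query) {
        List<StringBuilder> clauses = new ArrayList<>();
        boolean requireNext = false;
        boolean prohibitNext = false;
        Matcher m = CLAUSE.matcher(query);
        while (m.find()) {
            String token = m.group();
            if (token.equals("AND")) {
                // AND makes the clause before it required, and the next one too.
                if (!clauses.isEmpty()) {
                    StringBuilder prev = clauses.get(clauses.size() - 1);
                    if (prev.charAt(0) != '+' && prev.charAt(0) != '-') {
                        prev.insert(0, '+');
                    }
                }
                requireNext = true;
            } else if (token.equals("OR")) {
                // OR leaves its neighbours optional, so nothing is recorded.
            } else if (token.equals("NOT")) {
                prohibitNext = true;
            } else {
                // Assumption: lowercasing approximates the analyzer's effect.
                String text = token.toLowerCase(Locale.ROOT);
                boolean hasPrefix = text.startsWith("+") || text.startsWith("-");
                String prefix = hasPrefix ? "" : prohibitNext ? "-" : requireNext ? "+" : "";
                clauses.add(new StringBuilder(prefix).append(text));
                requireNext = false;
                prohibitNext = false;
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("\"clam chowder\" AND Manhattan"));
        System.out.println(canonicalize("chowder NOT clam"));
    }
}
```

Under these assumptions, a clause joined by AND gains a "+" prefix, a clause following NOT gains a "-" prefix, and clauses joined by OR stay unprefixed.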

The IndexHTML demo is more sophisticated. It incrementally maintains an index of HTML files, adding new files as they appear, deleting old files as they disappear, and re-indexing files as they change.
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML -create java/jdk1.1.6/docs/relnotes
adding java/jdk1.1.6/docs/relnotes/SMICopyright.html
  [ ... create an index containing all the relnotes ]

> rm java/jdk1.1.6/docs/relnotes/SMICopyright.html

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML java/jdk1.1.6/docs/relnotes
deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html
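
The bookkeeping behind this kind of incremental maintenance can be sketched in plain Java, independently of Lucene. This is not the IndexHTML demo's actual code: the class IncrementalSyncSketch, its Plan type, the shortened paths, and the timestamps are all hypothetical. The sketch assumes the index remembers a last-modified time for each path and compares that record with the files currently on disk.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not the IndexHTML demo's implementation.
class IncrementalSyncSketch {
    /** The work needed to bring an index up to date with the files on disk. */
    static final class Plan {
        final Set<String> toAdd = new TreeSet<>();
        final Set<String> toDelete = new TreeSet<>();
        final Set<String> toReindex = new TreeSet<>();
    }

    /**
     * Compares path -> last-modified maps for the indexed files and the
     * files currently on disk, and classifies each path.
     */
    static Plan plan(Map<String, Long> indexed, Map<String, Long> onDisk) {
        Plan p = new Plan();
        for (Map.Entry<String, Long> e : onDisk.entrySet()) {
            Long indexedTime = indexed.get(e.getKey());
            if (indexedTime == null) {
                p.toAdd.add(e.getKey());       // new file: not indexed yet
            } else if (!indexedTime.equals(e.getValue())) {
                p.toReindex.add(e.getKey());   // timestamp differs: re-index
            }
        }
        for (String path : indexed.keySet()) {
            if (!onDisk.containsKey(path)) {
                p.toDelete.add(path);          // file has disappeared
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up paths and timestamps, for illustration only.
        Map<String, Long> indexed = new TreeMap<>();
        indexed.put("relnotes/SMICopyright.html", 100L);
        indexed.put("relnotes/intro.html", 100L);

        Map<String, Long> onDisk = new TreeMap<>();
        onDisk.put("relnotes/intro.html", 250L);
        onDisk.put("relnotes/new.html", 300L);

        Plan p = plan(indexed, onDisk);
        System.out.println("adding " + p.toAdd);
        System.out.println("deleting " + p.toDelete);
        System.out.println("re-indexing " + p.toReindex);
    }
}
```

A real indexer would then apply the plan by removing the documents for deleted and changed files and adding documents for new and changed ones.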



Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.