org.apache.lucene.analysis.cz
Class CzechAnalyzer
public final class CzechAnalyzer
Analyzer for the Czech language. Supports an external list of stopwords (words that
will not be indexed at all).
A default set of stopwords is used unless an alternative list is specified; the
exclusion list is empty by default.
- Lukas Zapletal [lzap@root.cz]
void | loadStopWords(InputStream wordfile, String encoding) - Loads stopwords hash from resource stream (file, database...).
TokenStream | tokenStream(String fieldName, Reader reader) - Creates a TokenStream which tokenizes all the text in the provided Reader.
CZECH_STOP_WORDS
public static final String[] CZECH_STOP_WORDS
List of typical stopwords.
CzechAnalyzer
public CzechAnalyzer()
CzechAnalyzer
public CzechAnalyzer(File stopwords)
throws IOException
Builds an analyzer with the given stop words.
CzechAnalyzer
public CzechAnalyzer(HashSet stopwords)
CzechAnalyzer
public CzechAnalyzer(Hashtable stopwords)
Builds an analyzer with the given stop words.
CzechAnalyzer
public CzechAnalyzer(String[] stopwords)
Builds an analyzer with the given stop words.
loadStopWords
public void loadStopWords(InputStream wordfile,
String encoding)
Loads the stopword hash from a resource stream (file, database...).
wordfile
- File containing the wordlist
encoding
- Encoding used (win-1250, iso-8859-2, ...), null for default system encoding
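The loading step described above can be sketched in plain Java without any Lucene dependency. This is only an illustration, not the library's implementation: the class StopWordLoaderSketch and the method readStopWords are hypothetical names, and the file format (one stopword per line, blank lines ignored) is an assumption. A null encoding falls back to the platform default, as the parameter description states.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class StopWordLoaderSketch {

    // Hypothetical helper, not part of Lucene.
    // Assumes the stream holds one stopword per line.
    public static Set<String> readStopWords(InputStream wordfile, String encoding) {
        Set<String> words = new HashSet<>();
        try {
            // null encoding: decode with the platform default charset
            InputStreamReader decoder = (encoding == null)
                    ? new InputStreamReader(wordfile)
                    : new InputStreamReader(wordfile, encoding);
            try (BufferedReader reader = new BufferedReader(decoder)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String word = line.trim();
                    if (!word.isEmpty()) {
                        words.add(word); // duplicates collapse in the set
                    }
                }
            }
        } catch (IOException e) {
            // Wrapped so callers of this sketch need no checked-exception handling
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] data = "aby\nnebo\npro\n".getBytes(StandardCharsets.UTF_8);
        Set<String> words = readStopWords(new ByteArrayInputStream(data), "UTF-8");
        System.out.println(words.size() + " stopwords loaded");
    }
}
```

For a Czech word list saved in a legacy code page, the encoding name would be passed in the same way (for example "ISO-8859-2"), provided the Java runtime supports that charset.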
tokenStream
public final TokenStream tokenStream(String fieldName,
Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.
- tokenStream in interface Analyzer
- A TokenStream built from a StandardTokenizer filtered with
StandardFilter, LowerCaseFilter, and StopFilter
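The effect of the filter chain named above (tokenize, lower-case, drop stopwords) can be approximated in plain Java. This sketch does not use the Lucene classes: AnalysisChainSketch and analyze are hypothetical names, the regular-expression split is only a crude stand-in for StandardTokenizer, and the StandardFilter step is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisChainSketch {

    // Hypothetical stand-in for the tokenizer + lower-case + stop-filter chain.
    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        // Split on runs of characters that are neither letters nor digits
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // leading delimiter produces an empty first piece
            }
            String term = raw.toLowerCase(Locale.ROOT); // lower-casing step
            if (!stopWords.contains(term)) {            // stopword removal step
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("a", "nebo", "pro"));
        // prints [praha, brno, ostrava]
        System.out.println(analyze("Praha a Brno nebo Ostrava", stop));
    }
}
```

Lower-casing before the stopword check matters: a stopword list stored in lower case would otherwise miss capitalized occurrences at the start of a sentence.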
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.