org.apache.lucene.analysis.compound
Class DictionaryCompoundWordTokenFilter
java.lang.Object
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter
public class DictionaryCompoundWordTokenFilter
extends CompoundWordTokenFilterBase
A TokenFilter that decomposes compound words found in many Germanic languages.
For example, "Donaudampfschiff" becomes Donau, dampf, schiff, so that you can find
"Donaudampfschiff" even when you only enter "schiff".
It uses a brute-force algorithm: every substring of the word that fits the configured size limits is looked up in the dictionary.
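To make "brute-force" concrete, the following sketch tries every substring of a word against a dictionary of lower-case entries and keeps the hits in the word's original casing, which reproduces the Donau, dampf, schiff example above. BruteForceDecomposer is a hypothetical name; the sketch uses plain Strings and a HashSet rather than Lucene's Token or CharArraySet, and it leaves out the size limits that the full constructors accept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not Lucene's code: brute-force dictionary decomposition.
class BruteForceDecomposer {

    // Checks every substring of the word against a lower-case dictionary and returns
    // the matches in the word's original casing, ordered by start offset, then length.
    static List<String> decompose(String word, Set<String> dictionary) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < word.length(); start++) {
            for (int end = start + 1; end <= word.length(); end++) {
                String candidate = word.substring(start, end);
                if (dictionary.contains(candidate.toLowerCase(Locale.ROOT))) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        System.out.println(decompose("Donaudampfschiff", dict)); // [Donau, dampf, schiff]
    }
}
```

For a word of length n the nested loops perform n(n+1)/2 dictionary lookups, which is what makes the approach brute force.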
Constructor Summary
DictionaryCompoundWordTokenFilter(TokenStream input, java.util.Set dictionary)
DictionaryCompoundWordTokenFilter(TokenStream input, java.util.Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
DictionaryCompoundWordTokenFilter(TokenStream input, java.lang.String[] dictionary)
DictionaryCompoundWordTokenFilter(TokenStream input, java.lang.String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
java.lang.String[] dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Parameters:
input - the token stream to process
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
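The size parameters prune which words and subwords are considered at all. The sketch below reads the descriptions above literally, as strict comparisons ("longer than", "shorter than"); whether the filter itself treats these bounds as inclusive or exclusive is not stated on this page, so the exact boundaries are an assumption. SizeLimitedDecomposer is a hypothetical name, and the sketch ignores onlyLongestMatch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not Lucene's implementation: the size limits are read
// literally as strict comparisons; the real filter's boundary handling may differ.
class SizeLimitedDecomposer {

    static List<String> decompose(String word, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize, int maxSubwordSize) {
        List<String> parts = new ArrayList<>();
        // Only words longer than minWordSize are processed.
        if (word.length() <= minWordSize) {
            return parts;
        }
        for (int start = 0; start < word.length(); start++) {
            // Candidate lengths must be > minSubwordSize and < maxSubwordSize.
            for (int len = minSubwordSize + 1;
                 len < maxSubwordSize && start + len <= word.length();
                 len++) {
                String candidate = word.substring(start, start + len);
                if (dictionary.contains(candidate.toLowerCase(Locale.ROOT))) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "schiff", "au"));
        // "au" has length 2, which is not longer than minSubwordSize = 2, so it is skipped.
        System.out.println(decompose("Donaudampfschiff", dict, 5, 2, 15)); // [Donau, dampf, schiff]
        // "Schiff" is not longer than minWordSize = 6, so nothing is emitted.
        System.out.println(decompose("Schiff", dict, 6, 2, 15)); // []
    }
}
```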
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
java.lang.String[] dictionary)
Parameters:
input - the token stream to process
dictionary - the word dictionary to match against
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
java.util.Set dictionary)
Parameters:
input - the token stream to process
dictionary - the word dictionary to match against. If this is a CharArraySet,
it must have been created with ignoreCase=false and must contain only
lower-case strings.
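The CharArraySet note above implies that lookups are exact matches against lower-case entries. Assuming the filter lower-cases each candidate subword before the lookup (an assumption of this sketch), an entry such as "Donau" could never match. The hypothetical DictionaryPrep helper below uses a plain HashSet in place of CharArraySet to show the effect, and one way to normalize entries before building the set.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene; a HashSet stands in for CharArraySet.
class DictionaryPrep {

    // Builds a case-sensitive set whose entries are all lower case.
    static Set<String> lowerCaseDictionary(Collection<String> words) {
        Set<String> dict = new HashSet<>();
        for (String w : words) {
            dict.add(w.toLowerCase(Locale.ROOT));
        }
        return dict;
    }

    // Assumed lookup model: lower-case the candidate, then match exactly.
    static boolean matches(Set<String> dict, String candidate) {
        return dict.contains(candidate.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> raw = new HashSet<>(Arrays.asList("Donau", "dampf"));
        Set<String> prepared = lowerCaseDictionary(Arrays.asList("Donau", "dampf"));
        System.out.println(matches(raw, "Donau"));      // false: the set holds "Donau", not "donau"
        System.out.println(matches(prepared, "Donau")); // true
    }
}
```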
DictionaryCompoundWordTokenFilter
public DictionaryCompoundWordTokenFilter(TokenStream input,
java.util.Set dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Parameters:
input - the token stream to process
dictionary - the word dictionary to match against. If this is a CharArraySet,
it must have been created with ignoreCase=false and must contain only
lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream
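The onlyLongestMatch flag matters when dictionary entries overlap, for example when both "dampf" and "dampfschiff" are in the dictionary. The sketch below assumes the longest match is chosen separately for each start offset, so a shorter word that starts elsewhere (here "schiff") is still emitted; that scoping is an assumption, not something this page states. LongestMatchDecomposer is a hypothetical name, and the size limits are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not Lucene code: illustrates onlyLongestMatch, assuming the
// "longest" choice is made separately for each start offset.
class LongestMatchDecomposer {

    static List<String> decompose(String word, Set<String> dictionary, boolean onlyLongestMatch) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < word.length(); start++) {
            String longest = null;
            for (int end = start + 1; end <= word.length(); end++) {
                String candidate = word.substring(start, end);
                if (!dictionary.contains(candidate.toLowerCase(Locale.ROOT))) {
                    continue;
                }
                if (onlyLongestMatch) {
                    // end only grows, so each new hit is longer than the previous one.
                    longest = candidate;
                } else {
                    parts.add(candidate);
                }
            }
            if (longest != null) {
                parts.add(longest);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("dampf", "dampfschiff", "schiff"));
        System.out.println(decompose("Donaudampfschiff", dict, false)); // [dampf, dampfschiff, schiff]
        System.out.println(decompose("Donaudampfschiff", dict, true));  // [dampfschiff, schiff]
    }
}
```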
decomposeInternal
protected void decomposeInternal(Token token)
Specified by:
decomposeInternal in class CompoundWordTokenFilterBase
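"Specified by" means decomposeInternal is an abstract hook declared in CompoundWordTokenFilterBase, and this class supplies the dictionary-based version. The simplified model below shows that division of work. Beyond what this page states, it assumes that the base class emits the original token before any subwords and applies the minWordSize check before calling the hook. CompoundSplitterBase, DictionarySplitter and DecomposeHookDemo are hypothetical names, and plain Strings replace Lucene's Token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified model of the hook pattern; Strings stand in for Lucene Tokens.
class DecomposeHookDemo {

    abstract static class CompoundSplitterBase {
        private final int minWordSize;
        protected final List<String> output = new ArrayList<>();

        CompoundSplitterBase(int minWordSize) {
            this.minWordSize = minWordSize;
        }

        // Assumed flow: always keep the original token, then let the subclass
        // add subwords if the word is longer than minWordSize.
        final List<String> decompose(String token) {
            output.clear();
            output.add(token);
            if (token.length() > minWordSize) {
                decomposeInternal(token);
            }
            return new ArrayList<>(output);
        }

        // The hook each concrete splitter must implement.
        protected abstract void decomposeInternal(String token);
    }

    static class DictionarySplitter extends CompoundSplitterBase {
        private final Set<String> dictionary;

        DictionarySplitter(Set<String> dictionary, int minWordSize) {
            super(minWordSize);
            this.dictionary = dictionary;
        }

        // Dictionary-based strategy: brute-force substring lookup.
        @Override
        protected void decomposeInternal(String token) {
            for (int start = 0; start < token.length(); start++) {
                for (int end = start + 1; end <= token.length(); end++) {
                    String candidate = token.substring(start, end);
                    if (dictionary.contains(candidate.toLowerCase(Locale.ROOT))) {
                        output.add(candidate);
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        DictionarySplitter splitter = new DictionarySplitter(dict, 5);
        System.out.println(splitter.decompose("Donaudampfschiff")); // [Donaudampfschiff, Donau, dampf, schiff]
        System.out.println(splitter.decompose("Dampf"));            // [Dampf]
    }
}
```

Keeping the loop in the subclass means a different splitting strategy only has to replace decomposeInternal, while token bookkeeping stays in the base class.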
Copyright © 2000-2009 Apache Software Foundation. All Rights Reserved.