Class Hunspell
java.lang.Object
org.apache.lucene.analysis.hunspell.Hunspell
A spell checker based on Hunspell dictionaries. This class can be used in place of native
Hunspell for many languages for spell-checking and suggesting purposes. Note that not all
languages and features are supported yet. For example:
- Hungarian (as it doesn't only rely on dictionaries, but has some logic directly in the source code)
- Languages with Unicode characters outside of the Basic Multilingual Plane
- PHONE affix file option for suggestions
The objects of this class are thread-safe.
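Because words with characters outside the Basic Multilingual Plane are listed as unsupported, a caller may want to detect such words before passing them to the checker. The following is a minimal sketch of such a pre-check using only the Java standard library; the class and method names are hypothetical and are not part of Lucene.

```java
final class BmpCheckSketch {
  /**
   * Returns true if the word contains at least one supplementary code point,
   * i.e. a character outside the Basic Multilingual Plane (above U+FFFF).
   */
  static boolean hasNonBmpCharacters(String word) {
    return word.codePoints().anyMatch(Character::isSupplementaryCodePoint);
  }
}
```

A caller could skip spell-checking, or treat the word as unknown, whenever this method returns true.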
-
Field Summary
Fields
- (package private) final Runnable checkCanceled
- (package private) final Dictionary dictionary
- private final TimeoutPolicy policy
- (package private) final Stemmer stemmer
- (package private) static final long SUGGEST_TIME_LIMIT
Constructor Summary
Constructors
- Hunspell(Dictionary dictionary)
- Hunspell(Dictionary dictionary, TimeoutPolicy policy, Runnable checkCanceled)
Method Summary
- private boolean acceptCase(WordCase originalCase, int entryId, CharsRef root)
- (package private) boolean acceptsStem(int formID)
- private boolean canBeBrokenAt(String word, String breakStr, int breakPos)
- private boolean checkCompoundPatternReplacements(CharsRef word, int pos, WordCase originalCase, Hunspell.CompoundPart prev)
- private boolean checkCompoundRules(char[] wordChars, int offset, int length, List<IntsRef> words)
- private boolean checkCompounds(char[] wordChars, int length, WordCase originalCase)
- private boolean checkCompounds(CharsRef word, WordCase originalCase, Hunspell.CompoundPart prev)
- private boolean checkCompoundsAfter(WordCase originalCase, Hunspell.CompoundPart prev)
- private boolean checkLastCompoundPart(char[] wordChars, int start, int length, List<IntsRef> words)
- (package private) Boolean checkSimpleWord(char[] wordChars, int length, WordCase originalCase)
- private Runnable checkTimeLimit(String word, Set<Suggestion> suggestions, long timeLimitMs)
- private boolean checkWord
- (package private) boolean checkWord
- private boolean containsSharpS(char[] word, int offset, int length)
- private void doSuggest(String word, WordCase wordCase, LinkedHashSet<Suggestion> suggestions, Runnable checkCanceled)
- private Root<CharsRef> findStem(char[] wordChars, int offset, int length, WordCase originalCase, WordContext context)
- getRoots: Find all roots that could result in the given word after case conversion and adding affixes.
- private boolean hasForceUCaseProblem(Root<?> root, WordCase originalCase, char[] wordChars)
- private boolean hasTooManyBreakOccurrences
- private static boolean isDigit(char c)
- private static boolean isNumber
- private boolean mayBreakIntoCompounds(char[] chars, int offset, int length, int breakPos)
- postprocess(Collection<Suggestion> suggestions)
- boolean spell
- private boolean spellClean(String word)
- private boolean spellWithTrailingDots(String word)
- private boolean tryBreaks
-
Field Details
-
SUGGEST_TIME_LIMIT
static final long SUGGEST_TIME_LIMIT
-
dictionary
-
stemmer
-
policy
-
checkCanceled
-
-
Constructor Details
-
Hunspell
-
Hunspell
- Parameters:
- policy - a strategy determining what to do when API calls take too much time
- checkCanceled - an object that's called periodically, which makes it possible to interrupt spell-checking or suggestion generation by throwing an exception
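The checkCanceled contract can be illustrated without Lucene. The following is a minimal sketch under stated assumptions, not the library's implementation: CancellationSketch, runSteps and cancelWhen are hypothetical names, and the loop only stands in for a long-running check that polls the callback.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

final class CancellationSketch {
  /**
   * Hypothetical stand-in for a long-running computation.
   * It invokes checkCanceled before every step and returns the number of completed steps.
   */
  static int runSteps(int steps, Runnable checkCanceled) {
    int done = 0;
    for (int i = 0; i < steps; i++) {
      checkCanceled.run(); // an exception thrown here aborts the whole computation
      done++;
    }
    return done;
  }

  /** Builds a callback that throws as soon as the shared flag has been set. */
  static Runnable cancelWhen(AtomicBoolean canceled) {
    return () -> {
      if (canceled.get()) {
        throw new CancellationException("spell-checking canceled");
      }
    };
  }
}
```

A callback built this way could be passed as the checkCanceled argument of the three-argument constructor, with another thread setting the flag when the result is no longer needed.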
-
-
Method Details
-
spell
- Returns:
- whether the given word's spelling is considered correct according to Hunspell rules
-
spellClean
-
spellWithTrailingDots
-
checkWord
-
checkSimpleWord
-
checkWord
-
checkCompounds
-
findStem
private Root<CharsRef> findStem(char[] wordChars, int offset, int length, WordCase originalCase, WordContext context)
-
acceptCase
-
containsSharpS
private boolean containsSharpS(char[] word, int offset, int length)
-
acceptsStem
boolean acceptsStem(int formID)
-
checkCompounds
-
checkCompoundPatternReplacements
private boolean checkCompoundPatternReplacements(CharsRef word, int pos, WordCase originalCase, Hunspell.CompoundPart prev)
-
checkCompoundsAfter
-
hasForceUCaseProblem
-
getRoots
Find all roots that could result in the given word after case conversion and adding affixes. This corresponds to the original hunspell -s (stemming) functionality.
Some affix rules are relaxed in this stemming process: e.g. explicitly forbidden words are still returned. Some of the returned roots may be synthetic and not directly occur in the *.dic file (but differ from some existing entries in case). No roots are returned for compound words.
The returned roots may be used to retrieve morphological data via Dictionary.lookupEntries(java.lang.String).
-
mayBreakIntoCompounds
private boolean mayBreakIntoCompounds(char[] chars, int offset, int length, int breakPos)
-
checkCompoundRules
-
checkLastCompoundPart
-
isNumber
-
isDigit
private static boolean isDigit(char c)
-
tryBreaks
-
hasTooManyBreakOccurrences
-
canBeBrokenAt
-
suggest
- Returns:
- suggestions for the given misspelled word
- Throws:
- SuggestionTimeoutException - if the computation takes too long and TimeoutPolicy.THROW_EXCEPTION was specified in the constructor
-
suggest
- Parameters:
- word - the misspelled word to calculate suggestions for
- timeLimitMs - the duration limit in milliseconds, after which the associated TimeoutPolicy's effects (exception or partial result) may kick in
- Throws:
- SuggestionTimeoutException - if the computation takes too long and TimeoutPolicy.THROW_EXCEPTION was specified in the constructor
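The two timeout effects mentioned above (exception or partial result) can be sketched in isolation. This is an illustration under stated assumptions rather than Lucene's code: TimeLimitSketch, its Policy enum (whose RETURN_PARTIAL_RESULT constant name is an assumption here), TimeLimitExceeded and collect are all hypothetical, and the clock is injected so that the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

final class TimeLimitSketch {
  /** Hypothetical mirror of the two effects described in the documentation. */
  enum Policy { THROW_EXCEPTION, RETURN_PARTIAL_RESULT }

  /** Hypothetical exception type, standing in for a timeout exception. */
  static final class TimeLimitExceeded extends RuntimeException {
    TimeLimitExceeded(String message) {
      super(message);
    }
  }

  /**
   * Collects candidates one by one and checks the elapsed time before each of them.
   * Once the limit is exceeded, either throws or returns what has been collected so far.
   */
  static List<String> collect(
      List<String> candidates, long timeLimitMs, Policy policy, LongSupplier clockMs) {
    long start = clockMs.getAsLong();
    List<String> result = new ArrayList<>();
    for (String candidate : candidates) {
      if (clockMs.getAsLong() - start > timeLimitMs) {
        if (policy == Policy.THROW_EXCEPTION) {
          throw new TimeLimitExceeded("time limit of " + timeLimitMs + " ms exceeded");
        }
        return result; // partial result: only the candidates processed in time
      }
      result.add(candidate);
    }
    return result;
  }
}
```

In real use the clock argument would typically be System::currentTimeMillis; a fake clock is convenient for tests.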
-
doSuggest
private void doSuggest(String word, WordCase wordCase, LinkedHashSet<Suggestion> suggestions, Runnable checkCanceled)
-
checkTimeLimit
-
postprocess
-
modifyChunksBetweenDashes
-