Rowles.LeanLucene.Analysis.Stemmers
Classes
ArabicStemmer
Arabic light stemmer. Removes common Arabic prefixes and suffixes without performing full morphological analysis or root extraction. Based on the Khoja and Garside (1999) light-stemming approach. Expects lowercased, fully vowelised or unvowelised Unicode Arabic input. Hamza normalisation (أ إ آ → ا) should be applied upstream.
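The upstream hamza normalisation mentioned above could be sketched roughly as follows. This is an illustrative helper only: the names `ArabicNormalisation` and `NormaliseHamza` are assumptions and are not part of the documented API.

```csharp
using System.Text;

// Hypothetical helper, not part of Rowles.LeanLucene.
public static class ArabicNormalisation
{
    // Maps alef with madda (U+0622), alef with hamza above (U+0623) and
    // alef with hamza below (U+0625) to bare alef (U+0627).
    // All other characters are copied through unchanged.
    public static string NormaliseHamza(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool isHamzaAlef = c == '\u0622' || c == '\u0623' || c == '\u0625';
            sb.Append(isHamzaAlef ? '\u0627' : c);
        }
        return sb.ToString();
    }
}
```

A caller would run this over each token before handing it to the stemmer.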
ChineseStemmer
Chinese stemmer. Identity implementation: returns the input unchanged.
DutchStemmer
Dutch Snowball-inspired stemmer. Handles common Dutch inflectional and derivational suffixes. Expects lowercased input. Dutch vowel sequences (ij, oe, eu, ui) are not decomposed here; apply normalisation upstream if needed.
EnglishStemmer
English stemmer wrapping the existing Porter stemmer implementation.
FrenchStemmer
French Snowball-inspired stemmer. Handles common French suffixes.
GermanStemmer
German Snowball-inspired stemmer. Handles common German inflectional and derivational suffixes. Operates on lowercased input and folds umlauts (ä→a, ö→o, ü→u) and ß→ss as a preliminary step, mirroring Snowball's approach for German.
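The folding step described above might look something like the sketch below. It is not the library's actual code: `GermanFolding` and `Fold` are hypothetical names, and the sketch assumes precomposed (single code point) umlaut characters.

```csharp
using System.Text;

// Hypothetical illustration of the folding step, not the library's implementation.
public static class GermanFolding
{
    // Folds ä→a, ö→o, ü→u and ß→ss on already-lowercased input.
    // Assumes precomposed characters (e.g. U+00E4), not base letter + combining diaeresis.
    public static string Fold(string lowercased)
    {
        var sb = new StringBuilder(lowercased.Length + 4);
        foreach (char c in lowercased)
        {
            switch (c)
            {
                case '\u00E4': sb.Append('a'); break;   // ä
                case '\u00F6': sb.Append('o'); break;   // ö
                case '\u00FC': sb.Append('u'); break;   // ü
                case '\u00DF': sb.Append("ss"); break;  // ß
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

Because ß expands to two characters, the output can be longer than the input, which is why the buffer is built up rather than edited in place.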
ItalianStemmer
Italian Snowball-inspired stemmer. Handles common Italian inflectional and derivational suffixes. Expects lowercased, Unicode-normalised input.
JapaneseStemmer
Japanese stemmer. Identity implementation: returns the input unchanged.
KoreanStemmer
Korean stemmer. Identity implementation: returns the input unchanged.
PortugueseStemmer
Portuguese Snowball-inspired stemmer. Handles common Portuguese inflectional and derivational suffixes. Covers both European (pt-PT) and Brazilian (pt-BR) variants. Expects lowercased, Unicode-normalised input.
RussianStemmer
Russian Snowball-inspired stemmer. Strips common Russian inflectional endings written in Cyrillic. Based on the Dovgal/Snowball Russian algorithm. Expects lowercased input. The letters е and ё are not equated; normalise upstream if they should be treated as the same letter.
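If е and ё should be treated as the same letter, one possible upstream step is a simple character replacement. The helper below is a sketch under that assumption; `RussianNormalisation` and `FoldYo` are hypothetical names, not library members.

```csharp
// Hypothetical helper, not part of Rowles.LeanLucene.
public static class RussianNormalisation
{
    // Replaces lowercase ё (U+0451) with е (U+0435).
    // Input is assumed to be lowercased already, so uppercase Ё is not handled.
    public static string FoldYo(string lowercased)
    {
        return lowercased.Replace('\u0451', '\u0435');
    }
}
```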
SpanishStemmer
Spanish Snowball-inspired stemmer. Handles common Spanish inflectional and derivational suffixes. Expects lowercased, Unicode-normalised input; accented vowels (á, é, í, ó, ú) are treated as distinct from their unaccented counterparts.
Interfaces
IStemmer
Stemming contract. Implementations reduce a word to its stem, which is not necessarily a linguistic root.
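This page does not list the members of the interface, so the following is only a sketch of what a single-method stemming contract and an identity implementation could look like. `IStemmerSketch`, its `Stem` method, and the other names are hypothetical stand-ins; the real `IStemmer` signature may differ.

```csharp
// Hypothetical stand-in for a stemming contract; the real IStemmer
// members are not shown in this reference and may differ.
public interface IStemmerSketch
{
    string Stem(string word);
}

// Identity implementation: returns each word unchanged.
public sealed class IdentityStemmerSketch : IStemmerSketch
{
    public string Stem(string word)
    {
        return word;
    }
}

public static class StemmerSketchDemo
{
    // Applies the given stemmer to every token and returns a new array.
    public static string[] StemAll(IStemmerSketch stemmer, string[] tokens)
    {
        var result = new string[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            result[i] = stemmer.Stem(tokens[i]);
        }
        return result;
    }
}
```

Coding against an interface like this lets an analysis pipeline swap language-specific stemmers without changing the calling code.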