Public namespace Rowles.LeanLucene.Analysis.Stemmers

Classes

Public class ArabicStemmer

Arabic light stemmer. Removes common Arabic prefixes and suffixes without performing full morphological analysis or root extraction. Based on the Khoja and Garside (1999) light-stemming approach. Expects lowercased, fully vowelised or unvowelised Unicode Arabic input. Hamza normalisation (أ إ آ → ا) should be applied upstream.
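
The upstream hamza normalisation mentioned above could be sketched as follows. This is a minimal illustration, not library code: the `ArabicPreNormaliser` class and its `FoldHamza` method are hypothetical names, and the sketch assumes only the three alef forms listed in the description need folding.

```csharp
using System.Text;

// Hypothetical upstream helper; not part of Rowles.LeanLucene.
public static class ArabicPreNormaliser
{
    // Folds the hamza-carrying alef forms onto bare alef (U+0627):
    //   U+0622 alef with madda above, U+0623 alef with hamza above,
    //   U+0625 alef with hamza below.
    public static string FoldHamza(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            switch (c)
            {
                case '\u0622':
                case '\u0623':
                case '\u0625':
                    sb.Append('\u0627');
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

For example, `ArabicPreNormaliser.FoldHamza("أحمد")` returns `"احمد"`; the result would then be passed to the stemmer.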

Public class ChineseStemmer

Chinese stemmer. Identity implementation: returns the input unchanged.

Public class DutchStemmer

Dutch Snowball-inspired stemmer. Handles common Dutch inflectional and derivational suffixes. Expects lowercased input. Dutch vowel sequences (ij, oe, eu, ui) are not decomposed here; apply normalisation upstream if needed.

Public class EnglishStemmer

English stemmer wrapping the existing Porter stemmer implementation.

Public class FrenchStemmer

French Snowball-inspired stemmer. Handles common French suffixes.

Public class GermanStemmer

German Snowball-inspired stemmer. Handles common German inflectional and derivational suffixes. Operates on lowercased input and folds umlauts (ä→a, ö→o, ü→u) and ß→ss as a preliminary step, mirroring Snowball's approach for German.
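
The preliminary folding step described above might look roughly like this. It is a sketch under assumptions: `GermanFoldingSketch` and `Fold` are hypothetical names, the input is taken to be already lowercased, and umlauts are assumed to arrive as single precomposed characters. The library's actual implementation may differ.

```csharp
using System.Text;

// Hypothetical illustration of the folding step; not the library's code.
public static class GermanFoldingSketch
{
    // Maps ä→a, ö→o, ü→u and expands ß to "ss"; all other characters pass through.
    public static string Fold(string lowercased)
    {
        // ß expands to two characters, so leave a little headroom.
        var sb = new StringBuilder(lowercased.Length + 4);
        foreach (char c in lowercased)
        {
            switch (c)
            {
                case 'ä': sb.Append('a'); break;
                case 'ö': sb.Append('o'); break;
                case 'ü': sb.Append('u'); break;
                case 'ß': sb.Append("ss"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, `GermanFoldingSketch.Fold("straße")` returns `"strasse"` and `GermanFoldingSketch.Fold("häuser")` returns `"hauser"`.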

Public class ItalianStemmer

Italian Snowball-inspired stemmer. Handles common Italian inflectional and derivational suffixes. Expects lowercased, Unicode-normalised input.

Public class JapaneseStemmer

Japanese stemmer. Identity implementation: returns the input unchanged.

Public class KoreanStemmer

Korean stemmer. Identity implementation: returns the input unchanged.

Public class PortugueseStemmer

Portuguese Snowball-inspired stemmer. Handles common Portuguese inflectional and derivational suffixes. Covers both European (pt-PT) and Brazilian (pt-BR) variants. Expects lowercased, Unicode-normalised input.

Public class RussianStemmer

Russian Snowball-inspired stemmer. Strips common Russian inflectional endings written in Cyrillic. Based on the Dovgal/Snowball Russian algorithm. Expects lowercased input. The letters е and ё are not equated; normalise upstream if they should match.
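
If е and ё should be treated as the same letter, one possible upstream step is shown below. `RussianPreNormaliser` and `Prepare` are hypothetical names introduced for illustration. Lowercasing runs first so that an uppercase Ё is also caught.

```csharp
// Hypothetical upstream helper; not part of Rowles.LeanLucene.
public static class RussianPreNormaliser
{
    // Lowercases the input, then replaces ё (U+0451) with е (U+0435).
    public static string Prepare(string input)
    {
        return input.ToLowerInvariant().Replace('\u0451', '\u0435');
    }
}
```

For example, `RussianPreNormaliser.Prepare("Ёлка")` returns `"елка"`.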

Public class SpanishStemmer

Spanish Snowball-inspired stemmer. Handles common Spanish inflectional and derivational suffixes. Expects lowercased, Unicode-normalised input; accented vowels (á, é, í, ó, ú) are treated as distinct characters.
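
Because accented vowels are matched as distinct characters, input that encodes an accent as a base letter plus a combining mark would presumably not match. The sketch below assumes the normalisation requirement refers to Unicode composition (NFC), which the description does not state explicitly. `SpanishPreNormaliser` and `Prepare` are hypothetical names.

```csharp
using System.Text;

// Hypothetical upstream helper; not part of Rowles.LeanLucene.
public static class SpanishPreNormaliser
{
    // Composes base letter + combining accent into a single code point (NFC),
    // then lowercases with the invariant culture.
    public static string Prepare(string input)
    {
        return input.Normalize(NormalizationForm.FormC).ToLowerInvariant();
    }
}
```

Here `SpanishPreNormaliser.Prepare("Cancio\u0301n")`, where `\u0301` is a combining acute accent, returns `"canción"` with ó as the single character U+00F3.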

Interfaces

Public interface IStemmer

Stemming contract. Implementations reduce a word to its stem.
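
The member signature of `IStemmer` is not shown on this page, so the sketch below uses a stand-in interface, `IStemmerSketch`, with an assumed single `Stem(string)` method. It illustrates what an identity implementation of such a contract could look like, in the spirit of the Chinese, Japanese, and Korean stemmers listed above. All names here are hypothetical.

```csharp
// Stand-in for the real contract; the actual IStemmer members may differ.
public interface IStemmerSketch
{
    string Stem(string word);
}

// Identity implementation: hands the word back unchanged.
public sealed class IdentityStemmerSketch : IStemmerSketch
{
    public string Stem(string word)
    {
        return word;
    }
}
```

An identity stemmer lets a language plug into the same pipeline as the others without altering its tokens: `new IdentityStemmerSketch().Stem("東京")` returns `"東京"`.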