ChineseStemmer

Namespace: Rowles.LeanLucene.Analysis.Stemmers

Assembly: Rowles.LeanLucene.dll

Chinese stemmer: an identity implementation that returns each token unchanged.

public sealed class ChineseStemmer : IStemmer

Implements: IStemmer

Remarks

Mandarin Chinese is an isolating language: words do not inflect via suffixes, so suffix-stripping stemming is linguistically inappropriate. The morphological unit in Chinese is the character (字) or multi-character word (词), not a stem produced by affix removal.

Meaningful normalisation for Chinese search involves:

Word segmentation (e.g. jieba, Lucene's CJK analyser, or a dictionary-based tokeniser)
Simplified ↔ Traditional character conversion
Full-width → half-width normalisation
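As a rough illustration of the last item, the sketch below folds full-width ASCII variants (U+FF01 to U+FF5E) and the ideographic space (U+3000) to their half-width equivalents. The WidthNormaliser class and its ToHalfWidth method are hypothetical names used only for this example; they are not part of Rowles.LeanLucene.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Rowles.LeanLucene.
public static class WidthNormaliser
{
    // Maps full-width ASCII variants and the ideographic space to
    // their half-width forms; every other character is copied as-is.
    public static string ToHalfWidth(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c >= '\uFF01' && c <= '\uFF5E')
                sb.Append((char)(c - 0xFEE0)); // full-width '!'..'~' sit 0xFEE0 above ASCII
            else if (c == '\u3000')
                sb.Append(' ');                // ideographic space -> ASCII space
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, WidthNormaliser.ToHalfWidth("ＡＢＣ１２３") yields "ABC123". NFKC normalisation via string.Normalize(NormalizationForm.FormKC) folds these characters too, but it also rewrites other compatibility characters, so the explicit mapping is the narrower choice when only width folding is wanted.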

This class is provided so the IStemmer pipeline compiles uniformly across all supported languages. It performs no segmentation itself: run proper word segmentation upstream and pass the resulting tokens here.
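A minimal sketch of that arrangement follows, with segmentation running first and an identity stemmer applied to each token. Everything here is an assumption made for illustration: ISketchStemmer stands in for the library's IStemmer (whose exact signature is not shown on this page), IdentityStemmer stands in for ChineseStemmer, and the per-code-point segmenter is only a placeholder for a real word segmenter such as a dictionary-based tokeniser. It assumes .NET Core 3.0 or later for System.Text.Rune.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical stand-in for the library's IStemmer; the real signature may differ.
public interface ISketchStemmer
{
    string Stem(string word);
}

// Identity stemmer: returns every token unchanged.
public sealed class IdentityStemmer : ISketchStemmer
{
    public string Stem(string word) => word;
}

public static class ChinesePipelineSketch
{
    // Placeholder segmenter: emits one token per non-whitespace code point.
    // A production pipeline would use a proper word segmenter instead.
    public static IEnumerable<string> SegmentUnigrams(string text)
    {
        foreach (Rune r in text.EnumerateRunes())
        {
            if (!Rune.IsWhiteSpace(r))
                yield return r.ToString();
        }
    }

    // Segmentation happens first; the stemmer only ever sees finished tokens.
    public static List<string> Analyse(string text, ISketchStemmer stemmer) =>
        SegmentUnigrams(text).Select(stemmer.Stem).ToList();
}
```

Calling ChinesePipelineSketch.Analyse("搜索 引擎", new IdentityStemmer()) produces the tokens 搜, 索, 引, 擎. The stemmer stage leaves them untouched, so all of the language-specific work sits in the segmenter.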

Methods

Stem(string): Returns the stemmed form of the word. Because this is an identity implementation, the input is returned unchanged.
