
ChineseStemmer
- Namespace
- Rowles.LeanLucene.Analysis.Stemmers
- Assembly
- Rowles.LeanLucene.dll
Chinese stemmer: an identity implementation that returns each token unchanged.
public sealed class ChineseStemmer : IStemmer
- Implements
- IStemmer
Remarks
Mandarin Chinese is an isolating language: words do not inflect via suffixes, so suffix-stripping stemming is linguistically inappropriate. The morphological unit in Chinese is the character (字) or multi-character word (词), not a stem produced by affix removal.
Meaningful normalisation for Chinese search involves:
- Word segmentation (e.g. jieba, Lucene's CJK analyser, or a dictionary-based tokeniser)
- Simplified ↔ Traditional character conversion
- Full-width → half-width normalisation
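As an illustration of the last item, the following is a minimal sketch of full-width to half-width normalisation. `WidthNormaliser` and `ToHalfWidth` are hypothetical names used only for this example and are not part of Rowles.LeanLucene. The sketch relies on the fixed offset between the Unicode full-width ASCII variants (U+FF01 to U+FF5E) and their ASCII counterparts.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of Rowles.LeanLucene.
public static class WidthNormaliser
{
    // Maps full-width ASCII variants (U+FF01..U+FF5E) to their half-width
    // counterparts (U+0021..U+007E) and the ideographic space (U+3000)
    // to an ordinary space. All other characters are copied unchanged.
    public static string ToHalfWidth(string input)
    {
        if (input is null) throw new ArgumentNullException(nameof(input));

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c >= '\uFF01' && c <= '\uFF5E')
                sb.Append((char)(c - 0xFEE0)); // fixed offset to the ASCII range
            else if (c == '\u3000')
                sb.Append(' ');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Under these assumptions, `WidthNormaliser.ToHalfWidth("ＡＢＣ１２３")` yields `"ABC123"`, while Chinese characters pass through untouched. Running a step like this before segmentation means the tokeniser sees one form of each Latin letter and digit.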
This class is provided so the IStemmer pipeline compiles uniformly across all supported languages. Wire up proper segmentation as a pre-tokenisation step before passing tokens here.
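The arrangement described above can be sketched roughly as follows. Everything in this block is a stand-in for illustration: `ISketchStemmer` assumes the library's IStemmer exposes a single `string Stem(string)` method (the actual signature is not shown on this page), and `Segment` is a deliberately naive character-unigram splitter, not a real Chinese word segmenter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the library's IStemmer; the real interface may differ.
public interface ISketchStemmer
{
    string Stem(string word);
}

// Identity stemmer: returns every token exactly as received.
public sealed class IdentityStemmer : ISketchStemmer
{
    public string Stem(string word) => word;
}

public static class SketchPipeline
{
    // Naive placeholder segmenter: one token per non-whitespace UTF-16 char.
    // A real pipeline would use a dictionary-based or statistical segmenter
    // and would also need to handle surrogate pairs.
    public static IEnumerable<string> Segment(string text) =>
        text.Where(c => !char.IsWhiteSpace(c)).Select(c => c.ToString());

    // Segmentation runs first; the stemmer only ever sees finished tokens.
    public static List<string> Analyse(string text, ISketchStemmer stemmer) =>
        Segment(text).Select(stemmer.Stem).ToList();
}
```

With this sketch, `SketchPipeline.Analyse("我爱 北京", new IdentityStemmer())` produces the four single-character tokens 我, 爱, 北, 京. The stemming stage contributes nothing for Chinese, so all meaningful normalisation has to happen in the segmentation step that precedes it.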
Methods
Stem(string)
Returns the stemmed form of the word. Because this is an identity implementation, the word is returned unchanged.