Class ChineseStemmer

Namespace
Rowles.LeanLucene.Analysis.Stemmers
Assembly
Rowles.LeanLucene.dll

Chinese stemmer: an identity implementation that returns each token unchanged.

public sealed class ChineseStemmer : IStemmer
Implements
IStemmer

Remarks

Mandarin Chinese is an isolating language: words do not inflect via suffixes, so suffix-stripping stemming is linguistically inappropriate. The morphological unit in Chinese is the character (字) or multi-character word (词), not a stem produced by affix removal.

Meaningful normalisation for Chinese search involves:

  • Word segmentation (e.g. jieba, Lucene's CJK analyser, or a dictionary-based tokeniser)
  • Simplified ↔ Traditional character conversion
  • Full-width → half-width normalisation
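
Of the steps listed above, width normalisation is the simplest to illustrate. The following is a minimal sketch, not part of Rowles.LeanLucene: the class name WidthFoldingSketch and the method ToHalfWidth are hypothetical. It assumes that folding the full-width ASCII variants (U+FF01 to U+FF5E) and the ideographic space (U+3000) is sufficient for the index in question; a production pipeline might prefer full Unicode NFKC normalisation instead.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Rowles.LeanLucene.
public static class WidthFoldingSketch
{
    // Maps full-width ASCII variants (U+FF01..U+FF5E) to their half-width
    // counterparts and the ideographic space (U+3000) to an ASCII space.
    // All other characters, including Chinese characters, pass through unchanged.
    public static string ToHalfWidth(string token)
    {
        if (token is null) throw new ArgumentNullException(nameof(token));

        char[] chars = token.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            char c = chars[i];
            if (c >= '\uFF01' && c <= '\uFF5E')
            {
                // The full-width block sits at a fixed offset of 0xFEE0 from ASCII.
                chars[i] = (char)(c - 0xFEE0);
            }
            else if (c == '\u3000')
            {
                chars[i] = ' ';
            }
        }
        return new string(chars);
    }
}
```

With this sketch, a token written with full-width letters and digits, such as "ＡＢＣ１２３", folds to "ABC123", so both spellings would match the same indexed term.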

This class is provided so the IStemmer pipeline compiles uniformly across all supported languages. Wire up proper segmentation as a pre-tokenisation step before passing tokens here.
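
The ordering described above (segment first, then stem) might look like the sketch below. Everything here is an assumption made for illustration: ChinesePipelineSketch, SegmentUnigrams and Analyse are hypothetical names, the per-character segmenter is only a crude placeholder for a real dictionary-based tokeniser, and the stem delegate stands in for a call to Stem on the assumption that it takes and returns a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pipeline sketch; not part of Rowles.LeanLucene.
public static class ChinesePipelineSketch
{
    // Placeholder segmenter: emits every non-whitespace UTF-16 code unit as its
    // own token (a unigram baseline). Characters outside the Basic Multilingual
    // Plane would be split into surrogate halves, so use a real segmenter in practice.
    public static IEnumerable<string> SegmentUnigrams(string text)
    {
        foreach (char c in text)
        {
            if (!char.IsWhiteSpace(c))
            {
                yield return c.ToString();
            }
        }
    }

    // Segmentation runs first; the stemmer is then applied to each token.
    public static List<string> Analyse(string text, Func<string, string> stem)
    {
        return SegmentUnigrams(text).Select(stem).ToList();
    }
}
```

Passing an identity delegate, for example `ChinesePipelineSketch.Analyse("我爱北京", t => t)`, yields the four single-character tokens unchanged, which shows that the quality of the output depends entirely on the segmentation step.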

Methods

Stem(string)

Returns the stemmed form of the word. Because this is an identity implementation, the word is returned unchanged.
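
To make the identity behaviour concrete, here is a minimal sketch. The interface IStemmerSketch and the class IdentityStemmerSketch are hypothetical stand-ins written for this example; the real IStemmer interface may declare a different signature or additional members, and the sketch assumes only that Stem takes and returns a string.

```csharp
using System;

// Hypothetical stand-in for the library's IStemmer; the real interface may differ.
public interface IStemmerSketch
{
    string Stem(string word);
}

// Identity stemmer: performs no suffix stripping and returns the input as-is.
public sealed class IdentityStemmerSketch : IStemmerSketch
{
    public string Stem(string word) => word;
}
```

Under these assumptions, `new IdentityStemmerSketch().Stem("北京")` returns "北京", the same string it was given.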