
KoreanStemmer
- Namespace
- Rowles.LeanLucene.Analysis.Stemmers
- Assembly
- Rowles.LeanLucene.dll
Korean stemmer: an identity implementation that returns each word unchanged.
public sealed class KoreanStemmer : IStemmer
KoreanStemmer
- Implements
Remarks
Korean is a highly agglutinative language in which grammatical information (tense, case, honorific level, negation, aspect) is encoded in chains of bound morphemes attached to a content root. For example, 먹다 (eat) yields 먹었습니다, 먹고 싶어요, and 먹히다. Locating the boundaries between morphemes can require phonological rules (e.g. consonant assimilation), so they cannot be resolved by simple string suffix removal.
Recommended pre-processing for Korean search:
- POS-tagging and morpheme segmentation with Mecab-ko, Komoran, or Nori (Lucene's KoreanAnalyzer)
- Lemmatisation to dictionary base form (원형)
- Jamo decomposition for sub-syllable indexing when required
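As an illustration of the Jamo decomposition step, the sketch below splits precomposed Hangul syllables into conjoining jamo using the arithmetic defined in the Unicode Standard. The class name `JamoSketch` and method `Decompose` are hypothetical and are not part of Rowles.LeanLucene; this is a minimal sketch, not the library's method.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Rowles.LeanLucene.
public static class JamoSketch
{
    // Constants from the Unicode Hangul syllable decomposition algorithm.
    private const int SBase = 0xAC00; // first precomposed syllable (가)
    private const int LBase = 0x1100; // first leading consonant
    private const int VBase = 0x1161; // first vowel
    private const int TBase = 0x11A7; // one before the first trailing consonant
    private const int TCount = 28;
    private const int NCount = 21 * TCount;  // 588 syllables per leading consonant
    private const int SCount = 19 * NCount;  // 11172 precomposed syllables

    public static string Decompose(string text)
    {
        var sb = new StringBuilder(text.Length * 3);
        foreach (char c in text)
        {
            int sIndex = c - SBase;
            if (sIndex < 0 || sIndex >= SCount)
            {
                sb.Append(c); // not a precomposed Hangul syllable: keep as-is
                continue;
            }
            sb.Append((char)(LBase + sIndex / NCount));
            sb.Append((char)(VBase + (sIndex % NCount) / TCount));
            int t = sIndex % TCount;
            if (t != 0) sb.Append((char)(TBase + t)); // trailing consonant, if any
        }
        return sb.ToString();
    }
}
```

With this sketch, 먹 (U+BA39) decomposes to U+1106 U+1165 U+11A8, so a query on the leading consonant and vowel alone can match at the sub-syllable level. The built-in `string.Normalize(NormalizationForm.FormD)` should produce the same conjoining jamo for Hangul syllables and may be preferable in practice.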
This class is provided so that the IStemmer pipeline compiles uniformly across all supported languages; it performs no stemming of its own.
Methods
Stem(string)
Returns the stemmed form of the word. Because this is an identity implementation, the input is returned unchanged.
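The following sketch shows what an identity stemmer amounts to. `IStemmerSketch` is a hypothetical stand-in for the library's IStemmer, whose exact signature is not shown on this page; the `string Stem(string)` shape is an assumption based on the method listing above.

```csharp
using System;

// Hypothetical stand-in for Rowles.LeanLucene's IStemmer; the real interface may differ.
public interface IStemmerSketch
{
    string Stem(string word);
}

// Identity stemmer: returns its input untouched, leaving morphological
// analysis to an upstream Korean-aware tokenizer or lemmatiser.
public sealed class IdentityStemmerSketch : IStemmerSketch
{
    public string Stem(string word) => word;
}
```

Under these assumptions, `new IdentityStemmerSketch().Stem("먹었습니다")` returns `"먹었습니다"` unchanged, so any reduction to the base form 먹다 has to happen before the token reaches the stemmer.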