Class KoreanStemmer

Namespace
Rowles.LeanLucene.Analysis.Stemmers
Assembly
Rowles.LeanLucene.dll

Korean stemmer: an identity implementation that returns each word unchanged.

public sealed class KoreanStemmer : IStemmer
Implements
IStemmer

Remarks

Korean is a highly agglutinative language: grammatical information (tense, case, honorific level, negation, aspect) is encoded in chains of bound morphemes attached to a content root. For example, 먹다 (eat) yields 먹었습니다, 먹고 싶어요, and 먹히다. Morpheme boundaries can involve phonological changes (e.g. consonant assimilation), so simple string suffix removal cannot reliably recover the root.

Recommended pre-processing for Korean search:

  • POS tagging and morpheme segmentation with MeCab-ko, KOMORAN, or Nori (Lucene's KoreanAnalyzer)
  • Lemmatisation to dictionary base form (원형)
  • Jamo decomposition for sub-syllable indexing when required
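As an illustration of the last item, Jamo decomposition can be sketched with the arithmetic mapping that the Unicode Standard defines for precomposed Hangul syllables. The `JamoSketch` class and its `Decompose` method are hypothetical names used only for this example; they are not part of Rowles.LeanLucene.

```csharp
using System.Text;

// Hypothetical helper for illustration only; not part of Rowles.LeanLucene.
public static class JamoSketch
{
    // Constants from the Unicode Hangul syllable decomposition algorithm.
    private const int SBase = 0xAC00; // first precomposed syllable (가)
    private const int LBase = 0x1100; // first leading consonant
    private const int VBase = 0x1161; // first vowel
    private const int TBase = 0x11A7; // one before the first trailing consonant
    private const int VCount = 21;
    private const int TCount = 28;
    private const int SCount = 11172; // total number of precomposed syllables

    // Splits each precomposed Hangul syllable into its conjoining jamo.
    // Characters outside the syllable block are copied through unchanged.
    public static string Decompose(string text)
    {
        var sb = new StringBuilder(text.Length * 3);
        foreach (char c in text)
        {
            int sIndex = c - SBase;
            if (sIndex < 0 || sIndex >= SCount)
            {
                sb.Append(c);
                continue;
            }
            sb.Append((char)(LBase + sIndex / (VCount * TCount)));
            sb.Append((char)(VBase + (sIndex % (VCount * TCount)) / TCount));
            int tIndex = sIndex % TCount;
            if (tIndex != 0)
            {
                // Only syllables with a final consonant get a third jamo.
                sb.Append((char)(TBase + tIndex));
            }
        }
        return sb.ToString();
    }
}
```

Under this sketch, `JamoSketch.Decompose("먹")` yields the three code points U+1106, U+1165, U+11A8. Indexing the decomposed form lets a query match on individual consonants and vowels rather than on whole syllable blocks.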

This class is provided so the IStemmer pipeline compiles uniformly across all supported languages.

Methods

Stem(string)

Returns the stemmed form of the word. Because this class is an identity implementation, the word is returned unchanged.
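A minimal sketch of what an identity stemmer looks like follows. The exact shape of `IStemmer` is not shown on this page, so the example declares its own stand-in interface, `IStemmerSketch`, with an assumed `string Stem(string word)` signature; both type names are hypothetical and the real library code may differ.

```csharp
// Stand-in for the library's IStemmer. The single-method shape is an
// assumption made for this example, not the documented interface.
public interface IStemmerSketch
{
    string Stem(string word);
}

// Identity stemmer: performs no morphological analysis and hands the
// input back as-is, so callers can use one code path for every language.
public sealed class IdentityStemmerSketch : IStemmerSketch
{
    public string Stem(string word) => word;
}
```

With this sketch, `new IdentityStemmerSketch().Stem("먹었습니다")` returns the same string it was given, so any real normalisation has to come from the pre-processing steps listed in the remarks.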