Public namespace Rowles.LeanLucene.Analysis.Filters

Classes

Public class AccentFoldingFilter

Normalises accented characters to their unaccented base form (e.g., é→e, ñ→n, ü→u) for language-neutral matching. Applies Unicode canonical decomposition, then strips the combining marks.
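
The decompose-then-strip approach described above can be sketched in a few lines. This is an illustration in Python using the standard `unicodedata` module, not the library's C# implementation; the function name `fold_accents` is hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Illustrative only: canonical decomposition (NFD), then drop combining marks."""
    # NFD splits a precomposed character such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Unicode general categories starting with "M" are the mark categories.
    return "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )
```

Letters that have no canonical decomposition, such as ø or ß, pass through this sketch unchanged; whether AccentFoldingFilter handles them separately is not stated here.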

Public class HtmlStripCharFilter

Strips HTML/XML tags from input text, leaving only the text content.

Public class LowercaseFilter

Performs an in-place lowercase transformation on tokens or a character buffer.

Public class MappingCharFilter

Maps specific characters or strings to replacements using a lookup table. Useful for normalising special characters (e.g., smart quotes → straight quotes).
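
A minimal sketch of table-driven replacement, written in Python for illustration. The function name `apply_mappings` and the longest-key-first matching rule are assumptions made for this example, not documented behaviour of MappingCharFilter.

```python
def apply_mappings(text: str, mappings: dict[str, str]) -> str:
    """Illustrative only: replace each occurrence of a key with its mapped value."""
    # Assumption: when several keys match at one position, the longest wins.
    keys = sorted((k for k in mappings if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])
                i += len(key)
                break
        else:
            # No key matched here, so copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical table: smart quotes to straight quotes.
SMART_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
```

With this table, `apply_mappings("\u201cit\u2019s\u201d", SMART_QUOTES)` yields the same text with straight quotes.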

Public class PatternReplaceCharFilter

Replaces text matching a regex pattern with a replacement string.
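
As a rough illustration of pattern-based replacement, the Python sketch below collapses runs of whitespace to a single space. It uses Python's `re` module, whose syntax differs in places from .NET regular expressions, and the helper name `pattern_replace` is hypothetical.

```python
import re


def pattern_replace(text: str, pattern: str, replacement: str) -> str:
    """Illustrative only: replace every regex match in the raw input text."""
    return re.sub(pattern, replacement, text)
```

For example, `pattern_replace("a  b\t\nc", r"\s+", " ")` gives `"a b c"`.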

Public class PorterStemmerFilter

Implements the Porter stemming algorithm (Porter, 1980) for English as an ITokenFilter. Operates on tokens in place, replacing each token's text with its stemmed form.

Public class StopWordFilter

Removes common English stop words from a token list using a frozen set for fast, allocation-free lookups.
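
The idea of filtering against an immutable set can be sketched as follows. This is Python for illustration only: the word list is a small hypothetical sample rather than the filter's actual list, and the sketch assumes tokens have already been lowercased.

```python
# Hypothetical sample; the real filter's stop word list is not reproduced here.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Illustrative only: keep tokens that are not in the immutable stop set."""
    # Membership tests on a frozenset are hash lookups, O(1) on average.
    return [token for token in tokens if token not in STOP_WORDS]
```

For example, `remove_stop_words(["the", "fox", "is", "in", "a", "box"])` returns `["fox", "box"]`.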

Public class SynonymGraphFilter

Token filter that supports multi-token synonym expansion using a trie-based SynonymMap. Uses longest-match lookahead for multi-word synonyms and inserts replacement tokens at the same position offsets.

Public class SynonymMap

Trie-based synonym map supporting multi-token source phrases. Used by SynonymGraphFilter for longest-match multi-token synonym expansion.
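
The trie lookup and longest-match expansion described for SynonymMap and SynonymGraphFilter can be sketched roughly as below. This is a Python illustration with hypothetical names (`SynonymTrie`, `expand`); it assumes the original tokens are kept and each synonym is emitted at the position of the first matched token, which may differ from the library's actual output.

```python
class SynonymTrie:
    """Illustrative trie keyed by token, so source phrases may span several tokens."""

    def __init__(self):
        self.children = {}
        self.synonyms = []

    def add(self, phrase, synonym):
        node = self
        for word in phrase:
            node = node.children.setdefault(word, SynonymTrie())
        node.synonyms.append(synonym)

    def longest_match(self, tokens, start):
        """Return (length, synonyms) for the longest phrase starting at `start`."""
        node = self
        best_len, best_synonyms = 0, []
        i = start
        # Look ahead while the trie has a branch for the next token.
        while i < len(tokens) and tokens[i] in node.children:
            node = node.children[tokens[i]]
            i += 1
            if node.synonyms:
                best_len, best_synonyms = i - start, node.synonyms
        return best_len, best_synonyms


def expand(tokens, trie):
    """Return (position, term) pairs with synonyms added at the match position."""
    out = []
    i = 0
    while i < len(tokens):
        length, synonyms = trie.longest_match(tokens, i)
        if length == 0:
            out.append((i, tokens[i]))
            i += 1
            continue
        # Assumption: keep the matched source tokens at their own positions.
        for offset in range(length):
            out.append((i + offset, tokens[i + offset]))
        # Assumption: each synonym shares the position of the first matched token.
        for synonym in synonyms:
            out.append((i, synonym))
        i += length
    return out
```

With the phrases "new york" → "ny" and "new york city" → "nyc" both registered, the input `["visit", "new", "york", "city", "now"]` matches the three-token phrase and emits only "nyc", because the lookahead prefers the longest match.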

Interfaces

Public interface ICharFilter

Interface for character-level filters that transform raw text before tokenisation. Char filters run before the tokeniser, operating on the entire input string.
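
The ordering described above, with char filters applied to the whole input before tokenisation, can be sketched as a simple pipeline. This is Python for illustration; `analyse`, `strip_tags`, and `straighten_apostrophes` are hypothetical names, the regex-based tag stripping is a deliberate simplification, and none of this reflects the actual ICharFilter signature.

```python
import re
from typing import Callable, Iterable

# Assumption: a char filter is modelled as a function from string to string.
CharFilter = Callable[[str], str]


def strip_tags(text: str) -> str:
    # Naive tag removal for illustration; real HTML needs a more careful parser.
    return re.sub(r"<[^>]+>", " ", text)


def straighten_apostrophes(text: str) -> str:
    return text.replace("\u2019", "'")


def analyse(
    text: str,
    char_filters: Iterable[CharFilter],
    tokenise: Callable[[str], list[str]],
) -> list[str]:
    """Illustrative only: run every char filter on the full text, then tokenise."""
    for char_filter in char_filters:
        text = char_filter(text)
    return tokenise(text)
```

For example, `analyse("<p>It\u2019s <b>fine</b></p>", [strip_tags, straighten_apostrophes], str.split)` produces `["It's", "fine"]`.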