Rowles.LeanLucene.Analysis.Filters
Classes
AccentFoldingFilter
Normalises accented/diacritic characters to their ASCII base form (e.g., é→e, ñ→n, ü→u) for language-neutral matching. Uses Unicode canonical decomposition followed by stripping combining marks.
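The decompose-then-strip approach described above can be sketched in a few lines. This is a minimal Python illustration of the general technique, not the library's own code; the function name `fold_accents` is hypothetical, and it assumes that "combining marks" means Unicode non-spacing marks (category `Mn`).

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Fold accented characters to their base form (illustrative sketch)."""
    # NFD splits a precomposed character such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the non-spacing combining marks, keeping only the base characters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

Characters with no canonical decomposition (for example ø or ß) pass through unchanged under this approach, so a filter that must reach pure ASCII would need an additional mapping step for them.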
HtmlStripCharFilter
Strips HTML/XML tags from input text, leaving only the text content.
LowercaseFilter
Performs an in-place lowercase transformation on tokens or a character buffer.
MappingCharFilter
Maps specific characters or strings to replacements using a lookup table. Useful for normalising special characters (e.g., smart quotes → straight quotes).
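As a rough illustration of table-driven replacement, the Python sketch below scans the input and substitutes the longest matching key at each position. The name `apply_mappings` and the longest-key-first rule are assumptions made for this example; the source does not say how the filter resolves overlapping keys.

```python
def apply_mappings(text: str, mappings: dict[str, str]) -> str:
    """Replace every occurrence of a mapping key with its value (sketch)."""
    # Try longer keys first so "..." wins over "." when both are mapped.
    keys = sorted((k for k in mappings if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])
                i += len(key)
                break
        else:
            # No key matches at this position: copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


# Example table: smart quotes to straight quotes.
SMART_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
```

For instance, `apply_mappings("\u201cHi\u201d", SMART_QUOTES)` yields the same text wrapped in straight double quotes.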
PatternReplaceCharFilter
Replaces text matching a regex pattern with a replacement string.
PorterStemmerFilter
Implements the Porter stemming algorithm as an ITokenFilter, following Porter's 1980 specification for English. Operates on tokens in place, replacing each token's text with its stemmed form.
StopWordFilter
Removes common English stop words from a token list using a frozen set for fast, allocation-free lookups.
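The idea of filtering against an immutable set can be sketched as follows. This is a Python illustration only: the word list is a small made-up sample rather than the library's actual stop word set, and `remove_stop_words` is a hypothetical name.

```python
# A tiny sample list; the real filter's word list is not given in this page.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is"})


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Return the tokens that are not stop words, preserving order (sketch)."""
    # Membership tests against a frozenset are average-case O(1) hash lookups.
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

The sketch assumes tokens have already been lowercased, which is why such a filter typically runs after a lowercase step.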
SynonymGraphFilter
Token filter that supports multi-token synonym expansion using a trie-based SynonymMap. Uses longest-match lookahead for multi-word synonyms and inserts replacement tokens at the same position offsets.
SynonymMap
Trie-based synonym map supporting multi-token source phrases. Used by SynonymGraphFilter for longest-match multi-token synonym expansion.
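Longest-match lookup over a token trie, as described for SynonymMap and SynonymGraphFilter, might look roughly like the Python sketch below. All names here (`TrieNode`, `SynonymTrie`, `expand`) are hypothetical, and the output convention of emitting each synonym at the position of the first matched token is an assumption about what "same position offsets" means.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next token -> TrieNode
        self.synonyms = None  # set when a source phrase ends at this node


class SynonymTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, phrase, synonyms):
        """Register a source phrase (list of tokens) and its synonyms."""
        node = self.root
        for tok in phrase:
            node = node.children.setdefault(tok, TrieNode())
        node.synonyms = list(synonyms)

    def longest_match(self, tokens, start):
        """Return (length, synonyms) for the longest phrase starting at start."""
        node, best_len, best = self.root, 0, None
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i])
            if node is None:
                break
            if node.synonyms is not None:
                best_len, best = i - start + 1, node.synonyms
        return best_len, best


def expand(tokens, trie):
    """Emit (token, position) pairs, adding synonyms alongside matched phrases."""
    out, i = [], 0
    while i < len(tokens):
        length, synonyms = trie.longest_match(tokens, i)
        if length == 0:
            out.append((tokens[i], i))
            i += 1
            continue
        # Keep the original tokens at their own positions.
        out.extend((tokens[i + k], i + k) for k in range(length))
        # Assumed convention: synonyms share the first matched token's position.
        out.extend((syn, i) for syn in synonyms)
        i += length
    return out
```

With "new york" mapped to "nyc" and "new" mapped to "fresh", the input "visit new york city" picks the two-token match, so "nyc" is emitted and "fresh" is not.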
Interfaces
ICharFilter
Interface for character-level filters that transform raw text before tokenisation. Char filters run before the tokeniser, operating on the entire input string.
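The ordering described above, with char filters applied to the whole input before tokenisation, can be sketched as a simple pipeline. This Python example is an illustration of the pattern, not the ICharFilter interface itself: the names are hypothetical, the tag stripper is a deliberately naive regex, and whitespace splitting stands in for a real tokeniser.

```python
import re
from typing import Callable, Iterable

# In this sketch a char filter is any function from string to string.
CharFilter = Callable[[str], str]


def strip_tags(text: str) -> str:
    # Naive: replaces anything that looks like <...> with a space.
    return re.sub(r"<[^>]+>", " ", text)


def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


def analyse(text: str, char_filters: Iterable[CharFilter]) -> list[str]:
    """Run each char filter over the full input, then tokenise the result."""
    for char_filter in char_filters:
        text = char_filter(text)
    return text.split()  # stand-in tokeniser: split on whitespace
```

Calling `analyse("<p>Hello <b>World</b></p>", [strip_tags, collapse_whitespace])` returns `["Hello", "World"]`. Filter order matters, since each filter sees the output of the one before it.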