Public namespace Rowles.LeanCorpus.Analysis.Analysers

Classes

Public class Analyser

Composable analyser that runs a tokeniser followed by a chain of filters.
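The composition described above (one tokeniser, then filters applied in order) can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the library's API: ComposableAnalyser, whitespace_tokeniser, lowercase_filter, and make_stop_filter are hypothetical names, and the stop-word set is invented for the example.

```python
from typing import Callable, Iterable, List

# Hypothetical type aliases: a tokeniser turns text into tokens,
# a filter turns one token list into another.
Tokeniser = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_tokeniser(text: str) -> List[str]:
    """Split on runs of whitespace."""
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def make_stop_filter(stop_words: Iterable[str]) -> TokenFilter:
    """Build a filter that drops any token found in stop_words."""
    stop = frozenset(stop_words)

    def stop_filter(tokens: List[str]) -> List[str]:
        return [t for t in tokens if t not in stop]

    return stop_filter


class ComposableAnalyser:
    """Hypothetical stand-in: runs a tokeniser, then each filter in order."""

    def __init__(self, tokeniser: Tokeniser, filters: Iterable[TokenFilter]) -> None:
        self._tokeniser = tokeniser
        self._filters = list(filters)

    def analyse(self, text: str) -> List[str]:
        tokens = self._tokeniser(text)
        for token_filter in self._filters:
            tokens = token_filter(tokens)
        return tokens


analyser = ComposableAnalyser(
    whitespace_tokeniser,
    [lowercase_filter, make_stop_filter({"the", "a"})],
)
result = analyser.analyse("The Quick brown A fox")
```

Filter order matters in this sketch: lowercasing runs before stop-word removal, so "The" and "A" are matched against the lowercase stop-word set and dropped.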

Public class AnalyserFactory

Factory for creating language-specific analysers.

Public class KeywordAnalyser

Analyser that treats the complete input as a single token. The returned token list is reused across calls; callers must not hold references to it beyond the current invocation.

Thread-safety: This class maintains instance-level buffers for performance. Each instance should be used by a single thread, or callers should create separate instances per thread.
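The reuse caveat is easy to trip over, so here is a minimal sketch of the hazard and the usual remedy (copy the tokens before the next call). It is written in Python for illustration only; ReusingKeywordAnalyser and its _tokens_buf field are hypothetical stand-ins, not the library's implementation.

```python
from typing import List


class ReusingKeywordAnalyser:
    """Hypothetical stand-in: emits the whole input as one token and
    returns the same list object on every call."""

    def __init__(self) -> None:
        self._tokens_buf: List[str] = []  # instance-level buffer, reused

    def analyse(self, text: str) -> List[str]:
        self._tokens_buf.clear()
        self._tokens_buf.append(text)
        return self._tokens_buf


analyser = ReusingKeywordAnalyser()

first = analyser.analyse("New York")
first_copy = list(first)  # snapshot taken before the next invocation

second = analyser.analyse("Los Angeles")

# `first` and `second` are the same list, so the earlier result has been
# overwritten; only the snapshot still holds the original token.
same_object = first is second
```

The same reasoning explains the thread-safety note: two threads calling one instance would clear and refill the same buffer concurrently.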

Public class LanguageAnalyser

Configurable analyser that chains a tokeniser, lowercase normalisation, stop-word removal, and optional stemming. Used by AnalyserFactory for language-specific analysis pipelines.

Public class SimpleAnalyser

Analyser that splits text into letter-only tokens and lowercases them without stop-word removal. The returned token list is reused across calls; callers must not hold references to it beyond the current invocation.

Thread-safety: This class maintains instance-level buffers for performance. Each instance should be used by a single thread, or callers should create separate instances per thread.

Public class StandardAnalyser

Default analyser combining tokenisation, lowercase normalisation, and stop-word removal into a single pipeline. Uses original input offsets for lowercasing to avoid double string allocation. The returned token list is reused across calls; callers must not hold references to it beyond the current invocation.

Thread-safety: This class maintains instance-level buffers (_tokensBuf, _lowerBuf, _internCache) for performance. Each instance should be used by a single thread, or callers should create separate instances per thread (as IndexWriter does in AddDocumentsConcurrent).
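One way to follow the "separate instance per thread" advice is to keep an analyser in thread-local storage. The sketch below is a Python illustration under that assumption; it does not show how IndexWriter.AddDocumentsConcurrent is actually implemented, and BufferedAnalyser, analyser_for_current_thread, and analyse_document are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


class BufferedAnalyser:
    """Hypothetical stand-in for an analyser with an instance-level buffer."""

    def __init__(self) -> None:
        self._tokens_buf: List[str] = []

    def analyse(self, text: str) -> List[str]:
        self._tokens_buf.clear()
        for word in text.split():
            self._tokens_buf.append(word.lower())
        return self._tokens_buf


_local = threading.local()


def analyser_for_current_thread() -> BufferedAnalyser:
    """Lazily create one analyser per thread, so buffers are never shared."""
    analyser = getattr(_local, "analyser", None)
    if analyser is None:
        analyser = BufferedAnalyser()
        _local.analyser = analyser
    return analyser


def analyse_document(text: str) -> List[str]:
    # Copy the tokens: the buffer will be reused by this thread's next call.
    return list(analyser_for_current_thread().analyse(text))


documents = ["Alpha Beta", "Gamma Delta", "Epsilon Zeta"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(analyse_document, documents))
```

Each worker thread builds its own analyser on first use, and every result is copied out of the buffer before the thread analyses its next document.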

Public class StemmedAnalyser

Extends StandardAnalyser with Porter stemming for improved recall. Pipeline: tokenise → lowercase → stop-word removal → Porter stem.
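To show why the stemming stage helps recall, here is a rough Python sketch of the four stages in order. The stemmer is deliberately a toy suffix stripper, not the Porter algorithm, and the stop-word set, toy_stem, and stemmed_analyse are all invented for the example.

```python
from typing import List

# Illustrative stop-word set; the library's real list is not shown in this page.
STOP_WORDS = frozenset({"the", "a", "of"})


def toy_stem(token: str) -> str:
    """NOT Porter stemming: strips a few common suffixes, keeping at
    least three characters of the stem."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemmed_analyse(text: str) -> List[str]:
    tokens = text.split()                                # tokenise
    tokens = [t.lower() for t in tokens]                 # lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [toy_stem(t) for t in tokens]                 # stem


indexed_terms = stemmed_analyse("The Indexing of Documents")
query_terms = stemmed_analyse("indexed document")
```

Both the indexed text and the query reduce to the same stems ("index", "document"), so a query for "indexed" can match a document that only contains "Indexing".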

Internal class TokenTextCache

Public class WhitespaceAnalyser

Analyser that splits text only on whitespace and applies no token filters. The returned token list is reused across calls; callers must not hold references to it beyond the current invocation.

Thread-safety: This class maintains instance-level buffers for performance. Each instance should be used by a single thread, or callers should create separate instances per thread.

Interfaces

Public interface IAnalyser

Analyses input text into a list of tokens for indexing or querying.