Public sealed class NGramTokeniser

Namespace
Rowles.LeanLucene.Analysis.Tokenisers
Assembly
Rowles.LeanLucene.dll

Splits text into all contiguous character substrings of length in [MinGram, MaxGram]. Useful for partial-word matching and CJK text.
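The splitting rule above can be illustrated with a minimal, self-contained sketch. The `NGrams` helper below is hypothetical and is not the library's implementation; the order in which grams are emitted (by start position, then by length) and the argument validation are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Rowles.LeanLucene: enumerates every
// contiguous substring whose length lies in [minGram, maxGram].
static List<string> NGrams(ReadOnlySpan<char> text, int minGram, int maxGram)
{
    // Assumed validation: the range must be non-empty and start at 1 or more.
    if (minGram < 1 || maxGram < minGram)
        throw new ArgumentOutOfRangeException(nameof(minGram));

    var grams = new List<string>();
    for (int start = 0; start < text.Length; start++)
    {
        // Stop growing once the gram would run past the end of the text.
        for (int len = minGram; len <= maxGram && start + len <= text.Length; len++)
        {
            grams.Add(text.Slice(start, len).ToString());
        }
    }
    return grams;
}

Console.WriteLine(string.Join(", ", NGrams("lucene", 2, 3)));
// prints: lu, luc, uc, uce, ce, cen, en, ene, ne
```

Because every start position contributes up to MaxGram - MinGram + 1 grams, the output grows roughly linearly with the text length multiplied by the width of the gram range, which is worth keeping in mind when choosing the range for long fields.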

Thread-safety: Instances are not safe for concurrent use. Each instance maintains its own intern cache (_internCache) for performance, so an instance should be used by one thread at a time; create a separate instance for each thread that tokenises concurrently.
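One way to follow the per-thread guidance is `System.Threading.ThreadLocal<T>`, which lazily creates one value per thread. The sketch below is an illustration only: a plain `Dictionary<string, string>` stands in for a tokeniser with its private cache, and the `Intern` helper is hypothetical. With the real class, the factory would presumably construct a new NGramTokeniser instead.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for a tokeniser instance: a plain, non-thread-safe intern cache.
// ThreadLocal<T> hands every thread its own copy, so no locking is needed.
var perThreadCache = new ThreadLocal<Dictionary<string, string>>(
    () => new Dictionary<string, string>(), trackAllValues: true);

// Hypothetical helper: returns the cached instance of a token string,
// using only the cache that belongs to the calling thread.
string Intern(string token)
{
    var cache = perThreadCache.Value;
    if (!cache.TryGetValue(token, out var existing))
    {
        cache[token] = token;
        existing = token;
    }
    return existing;
}

// Several workers intern the same tokens without sharing any state.
Parallel.For(0, 8, i =>
{
    Intern("ab");
    Intern("bc");
});

// Each thread that took part holds its own two-entry cache.
foreach (var cache in perThreadCache.Values)
{
    Console.WriteLine(cache.Count);
}
```

The trade-off is memory: each thread builds its own cache, so identical tokens may be stored once per thread rather than once per process.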

public sealed class NGramTokeniser : ITokeniser
Implements
ITokeniser

Constructors

Public constructor NGramTokeniser(int, int)

Initialises a new NGramTokeniser with the specified gram size range.

Properties

Public read-only property MaxGram

Gets the maximum n-gram length (inclusive).

Public read-only property MinGram

Gets the minimum n-gram length (inclusive).

Methods

Public method Tokenise(ReadOnlySpan<char>)

Splits the input text into a list of n-gram tokens: every contiguous character substring whose length lies in [MinGram, MaxGram].