
NGramTokeniser
- Namespace: Rowles.LeanLucene.Analysis.Tokenisers
- Assembly: Rowles.LeanLucene.dll
Splits text into all contiguous character substrings (n-grams) whose length is between MinGram and MaxGram, inclusive. Useful for partial-word matching and CJK text.
Thread-safety: Instances are not safe for concurrent use, because this class maintains an instance-level intern cache (_internCache) for performance. Use each instance from a single thread, or create a separate instance per thread.
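One way to follow the per-thread guidance is to hold instances in a ThreadLocal<T>. The sketch below is only an illustration: CachingTokeniserStandIn is a hypothetical stand-in with an unsynchronised per-instance cache, used so the example compiles without the library. In real code the factory would presumably construct an NGramTokeniser instead.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in, NOT part of Rowles.LeanLucene. It mimics a class
// that keeps an unsynchronised, instance-level cache.
public sealed class CachingTokeniserStandIn
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    // Returns a cached copy of the string; unsafe if called from several threads at once.
    public string Intern(string value)
    {
        if (!_cache.TryGetValue(value, out var cached))
        {
            _cache[value] = value;
            cached = value;
        }
        return cached;
    }
}

public static class PerThreadSketch
{
    // Each thread lazily gets its own instance, so the cache is never shared.
    private static readonly ThreadLocal<CachingTokeniserStandIn> Local =
        new ThreadLocal<CachingTokeniserStandIn>(() => new CachingTokeniserStandIn());

    public static string InternOnCurrentThread(string value)
    {
        return Local.Value.Intern(value);
    }

    public static void Main()
    {
        // Worker threads only ever touch their own thread-local instance.
        Parallel.For(0, 8, i => InternOnCurrentThread("gram" + (i % 3)));
        Console.WriteLine("done");
    }
}
```

The trade-off is one cache per thread rather than one shared cache, which costs some memory but avoids locking.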
public sealed class NGramTokeniser : ITokeniser
- Implements: ITokeniser
Constructors
NGramTokeniser(int, int)
Initialises a new NGramTokeniser with the specified gram size range.
Properties
MaxGram
Gets the maximum n-gram length (inclusive).
MinGram
Gets the minimum n-gram length (inclusive).
Methods
Tokenise(ReadOnlySpan<char>)
Splits the input text into n-gram tokens: every contiguous character substring whose length is between MinGram and MaxGram, inclusive.
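To make the n-gram behaviour concrete, here is a minimal standalone sketch of character n-gram generation. It does not call the library: NGramSketch.Grams is a hypothetical helper, and the order in which it emits grams (by start position, then by length) is an assumption. The real Tokenise return type and ordering may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Rowles.LeanLucene.
public static class NGramSketch
{
    // Returns every contiguous substring of text with length in [minGram, maxGram].
    public static List<string> Grams(ReadOnlySpan<char> text, int minGram, int maxGram)
    {
        if (minGram < 1 || maxGram < minGram)
            throw new ArgumentOutOfRangeException(nameof(minGram), "Require 1 <= minGram <= maxGram.");

        var grams = new List<string>();
        for (int start = 0; start < text.Length; start++)
        {
            // Stop growing the gram once it would run past the end of the input.
            for (int len = minGram; len <= maxGram && start + len <= text.Length; len++)
            {
                grams.Add(text.Slice(start, len).ToString());
            }
        }
        return grams;
    }

    public static void Main()
    {
        // With minGram = 2 and maxGram = 3, "lucene" yields nine grams:
        // lu, luc, uc, uce, ce, cen, en, ene, ne
        foreach (var gram in Grams("lucene", 2, 3))
        {
            Console.WriteLine(gram);
        }
    }
}
```

Because every start position contributes up to MaxGram - MinGram + 1 grams, a wide range multiplies the number of tokens produced, so keeping the range narrow is usually worthwhile.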