
EdgeNGramTokeniser
- Namespace
- Rowles.LeanLucene.Analysis.Tokenisers
- Assembly
- Rowles.LeanLucene.dll
Splits text into character substrings of length [MinGram, MaxGram] anchored at the start of each whitespace-delimited token (edge n-grams).
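The edge n-gram behaviour described above can be illustrated with a minimal standalone sketch. It is not the library's implementation: the class name EdgeNGramSketch, the method EdgeNGrams, the string-based signature, and the choice to emit nothing for tokens shorter than minGram are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; names and edge-case handling are assumptions,
// not the behaviour of Rowles.LeanLucene.
public static class EdgeNGramSketch
{
    // Whitespace characters used to delimit tokens in this sketch.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    // For each whitespace-delimited token, emits its prefixes whose length
    // lies in the inclusive range [minGram, maxGram].
    public static List<string> EdgeNGrams(string text, int minGram, int maxGram)
    {
        if (minGram < 1)
            throw new ArgumentOutOfRangeException(nameof(minGram));
        if (maxGram < minGram)
            throw new ArgumentOutOfRangeException(nameof(maxGram));

        var grams = new List<string>();
        foreach (var word in text.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // A prefix can never be longer than the token itself.
            int longest = Math.Min(maxGram, word.Length);

            // Assumption: tokens shorter than minGram produce no grams.
            for (int length = minGram; length <= longest; length++)
                grams.Add(word.Substring(0, length));
        }
        return grams;
    }
}
```

With minGram = 2 and maxGram = 4, the input "quick fox" yields qu, qui, quic, fo, fox in this sketch: every gram starts at the beginning of its token, which is what distinguishes edge n-grams from general n-grams.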
Thread-safety: Instances are not safe for concurrent use. This class maintains an instance-level intern cache (_internCache) for performance, so each instance should be used by a single thread; callers that tokenise on multiple threads should create a separate instance per thread.
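One possible way to follow the one-instance-per-thread guidance is System.Threading.ThreadLocal<T>. The sketch below uses a hypothetical stand-in class, CachedTokeniserStandIn, with its own non-thread-safe cache so that the example compiles without the library; in real code the factory would presumably construct an EdgeNGramTokeniser instead. The holder class PerThreadTokenisers is likewise an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a tokeniser with a non-thread-safe,
// instance-level cache. It is not part of Rowles.LeanLucene.
public sealed class CachedTokeniserStandIn
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    // Returns the cached copy of the value, adding it on first sight.
    // Dictionary<TKey, TValue> is not safe for concurrent writes.
    public string Intern(string value)
    {
        if (!_cache.TryGetValue(value, out var cached))
        {
            cached = value;
            _cache[value] = cached;
        }
        return cached;
    }
}

public static class PerThreadTokenisers
{
    // The factory runs once per thread, on that thread's first access.
    private static readonly ThreadLocal<CachedTokeniserStandIn> Local =
        new ThreadLocal<CachedTokeniserStandIn>(() => new CachedTokeniserStandIn());

    // The instance owned by the calling thread.
    public static CachedTokeniserStandIn Current => Local.Value;
}
```

Each thread that reads PerThreadTokenisers.Current gets its own instance, so no cache is ever shared between threads. The trade-off is that each thread warms its own cache independently.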
public sealed class EdgeNGramTokeniser : ITokeniser
- Inheritance
- object
- EdgeNGramTokeniser
- Implements
- ITokeniser
Constructors
EdgeNGramTokeniser(int, int)
Initialises a new EdgeNGramTokeniser with the specified gram size range.
Properties
MaxGram
Gets the maximum n-gram length (inclusive).
MinGram
Gets the minimum n-gram length (inclusive).
Methods
Tokenise(ReadOnlySpan<char>)
Splits the input text into a list of edge n-gram tokens: for each whitespace-delimited token, the prefixes of length MinGram to MaxGram (inclusive).