Public sealed class EdgeNGramTokeniser

Namespace
Rowles.LeanLucene.Analysis.Tokenisers
Assembly
Rowles.LeanLucene.dll

Splits text into character substrings of length [MinGram, MaxGram] anchored at the start of each whitespace-delimited token (edge n-grams).
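The description above can be illustrated with a minimal, standalone sketch of edge n-gram generation. It does not call Rowles.LeanLucene: `EdgeNGramSketch` and `EdgeNGrams` are hypothetical names, and two behaviours are assumptions rather than documented facts about EdgeNGramTokeniser: words shorter than the minimum length emit nothing, and only space, tab, CR and LF are treated as whitespace.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Rowles.LeanLucene.
public static class EdgeNGramSketch
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    public static List<string> EdgeNGrams(string text, int minGram, int maxGram)
    {
        if (minGram < 1)
            throw new ArgumentOutOfRangeException(nameof(minGram));
        if (maxGram < minGram)
            throw new ArgumentOutOfRangeException(nameof(maxGram));

        var grams = new List<string>();

        // Split into whitespace-delimited words, skipping empty entries.
        foreach (var word in text.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries))
        {
            // Assumption: a word shorter than minGram produces no grams.
            int longest = Math.Min(maxGram, word.Length);

            // Every gram is anchored at the start of the word (the "edge").
            for (int length = minGram; length <= longest; length++)
                grams.Add(word.Substring(0, length));
        }

        return grams;
    }
}
```

Under these assumptions, `EdgeNGramSketch.EdgeNGrams("quick fox", 2, 4)` yields `qu`, `qui`, `quic`, `fo`, `fox`.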

Thread-safety: Instances are not intended for concurrent use. Each instance maintains its own intern cache (_internCache) for performance, so an instance should be used by a single thread; multi-threaded callers should create a separate instance per thread.
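One way to follow the per-thread guidance is `System.Threading.ThreadLocal<T>`, which lazily creates one value per thread. The sketch below is self-contained and uses a hypothetical stand-in class (`CachingTokeniserStandIn`) with an unsynchronised per-instance cache; it is not the library's implementation, only an illustration of the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a tokeniser that keeps an unsynchronised
// per-instance intern cache. Not part of Rowles.LeanLucene.
public sealed class CachingTokeniserStandIn
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    public string Intern(string value)
    {
        if (!_cache.TryGetValue(value, out var existing))
        {
            _cache[value] = value;
            existing = value;
        }
        return existing;
    }
}

public static class PerThreadDemo
{
    // One instance per thread, created on first access from that thread.
    private static readonly ThreadLocal<CachingTokeniserStandIn> Tokeniser =
        new ThreadLocal<CachingTokeniserStandIn>(() => new CachingTokeniserStandIn());

    public static int Run()
    {
        int processed = 0;
        Parallel.For(0, 1000, i =>
        {
            // Tokeniser.Value belongs to the current thread only,
            // so the dictionary is never touched by two threads at once.
            Tokeniser.Value.Intern("token" + (i % 10));
            Interlocked.Increment(ref processed);
        });
        return processed;
    }
}
```

With the real class, the factory delegate would construct an EdgeNGramTokeniser instead of the stand-in. The trade-off is that each thread builds and warms its own cache.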

public sealed class EdgeNGramTokeniser : ITokeniser
Implements
ITokeniser

Constructors

Public constructor EdgeNGramTokeniser(int, int)

Initialises a new EdgeNGramTokeniser with the specified gram size range.

Properties

Public read-only property MaxGram

Gets the maximum n-gram length (inclusive).

Public read-only property MinGram

Gets the minimum n-gram length (inclusive).

Methods

Public method Tokenise(ReadOnlySpan<char>)

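Splits the input text at word boundaries and returns the resulting list of tokens. For this tokeniser, the tokens are the edge n-grams of each whitespace-delimited word.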