Public sealed class Tokeniser

Namespace
Rowles.LeanLucene.Analysis.Tokenisers
Assembly
Rowles.LeanLucene.dll

Slices input text into tokens at word boundaries, splitting on whitespace and punctuation whilst tracking character offsets.

public sealed class Tokeniser : ITokeniser
Implements
ITokeniser

Public method Tokenise(ReadOnlySpan<char>)

Splits the input text into a list of tokens at word boundaries.
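
As a rough illustration of word-boundary splitting with offset tracking, the sketch below scans the input once and records each token together with its start and end position. It is a hypothetical stand-in, not the LeanLucene implementation: the `WordBoundarySketch` class, the tuple return type, and the rule that a token is a maximal run of letters or digits are all assumptions made for this example. The real `Tokenise` return type and boundary rules may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration; not the library's Tokeniser.
public static class WordBoundarySketch
{
    // Assumption: a token is a maximal run of letters or digits, and
    // everything else (whitespace, punctuation, symbols) is a separator.
    // End offsets are exclusive.
    public static List<(string Text, int Start, int End)> Tokenise(ReadOnlySpan<char> input)
    {
        var tokens = new List<(string Text, int Start, int End)>();
        int i = 0;
        while (i < input.Length)
        {
            // Skip any separators before the next token.
            while (i < input.Length && !char.IsLetterOrDigit(input[i])) i++;
            if (i == input.Length) break;

            // Consume the token and remember where it began.
            int start = i;
            while (i < input.Length && char.IsLetterOrDigit(input[i])) i++;

            tokens.Add((input.Slice(start, i - start).ToString(), start, i));
        }
        return tokens;
    }
}
```

Under these assumptions, `WordBoundarySketch.Tokenise("Hello, world!")` yields two tokens: `Hello` at offsets 0 to 5 and `world` at offsets 7 to 12.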

Public method TokeniseOffsets(ReadOnlySpan<char>, List<(int Start, int End)>)

Emits token offsets without allocating any strings. Used by StandardAnalyser to defer string materialisation until after filtering.
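
The benefit of deferring string materialisation can be sketched as follows: offsets are written into a caller-supplied list, a filter runs on the offsets alone, and only the surviving tokens are turned into strings. This is a minimal sketch under stated assumptions, not the library's code. The `OffsetSketch` class, the letters-or-digits boundary rule, the append-without-clearing behaviour, and the minimum-length filter are all hypothetical. The source does not describe how StandardAnalyser actually filters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration; not the library's Tokeniser.
public static class OffsetSketch
{
    // Appends (Start, End) pairs to the caller's list without creating strings.
    // Assumptions: tokens are runs of letters or digits, End is exclusive,
    // and the list is appended to rather than cleared.
    public static void TokeniseOffsets(ReadOnlySpan<char> input, List<(int Start, int End)> offsets)
    {
        int i = 0;
        while (i < input.Length)
        {
            while (i < input.Length && !char.IsLetterOrDigit(input[i])) i++;
            if (i == input.Length) break;

            int start = i;
            while (i < input.Length && char.IsLetterOrDigit(input[i])) i++;

            offsets.Add((start, i));
        }
    }

    // Hypothetical filter step: drop tokens shorter than minLength, then
    // allocate strings only for the tokens that remain.
    public static List<string> MaterialiseLongTokens(ReadOnlySpan<char> input, int minLength)
    {
        var offsets = new List<(int Start, int End)>();
        TokeniseOffsets(input, offsets);

        var result = new List<string>();
        foreach (var (start, end) in offsets)
        {
            if (end - start < minLength) continue; // filtered out, no string allocated
            result.Add(input.Slice(start, end - start).ToString());
        }
        return result;
    }
}
```

In this sketch, `OffsetSketch.MaterialiseLongTokens("a cat sat down", 3)` allocates strings for `cat`, `sat`, and `down` only. The single-character token `a` is discarded while it is still just a pair of integers.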