
Tokeniser
- Namespace
- Rowles.LeanLucene.Analysis.Tokenisers
- Assembly
- Rowles.LeanLucene.dll
Slices input text into tokens at word boundaries, splitting on whitespace and punctuation whilst tracking character offsets.
public sealed class Tokeniser : ITokeniser
Tokeniser
- Implements
- ITokeniser
Tokenise(ReadOnlySpan<char>)
Splits the input text into a list of tokens at word boundaries.
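To illustrate what splitting at word boundaries with offset tracking can look like, here is a minimal self-contained sketch. It does not call the library: `SketchToken` and `WordSplitSketch.Split` are hypothetical names, and the rule that a token is a maximal run of letters or digits is an assumption. The actual return type and boundary rules of `Tokenise` are not stated on this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token shape for illustration only; End is exclusive (an assumption).
public readonly record struct SketchToken(string Text, int Start, int End);

public static class WordSplitSketch
{
    // Assumed rule: a token is a maximal run of letters or digits;
    // whitespace and punctuation act as separators.
    public static List<SketchToken> Split(ReadOnlySpan<char> input)
    {
        var tokens = new List<SketchToken>();
        int start = -1; // -1 means "not currently inside a token"

        // Iterate one position past the end so a trailing token is flushed.
        for (int i = 0; i <= input.Length; i++)
        {
            bool inWord = i < input.Length && char.IsLetterOrDigit(input[i]);
            if (inWord && start < 0)
            {
                start = i; // a new token begins here
            }
            else if (!inWord && start >= 0)
            {
                // The token ended at i; record its text and offsets.
                tokens.Add(new SketchToken(input.Slice(start, i - start).ToString(), start, i));
                start = -1;
            }
        }
        return tokens;
    }
}
```

Under these assumptions, `WordSplitSketch.Split("Hello, world!")` yields two tokens: `Hello` at offsets 0 to 5 and `world` at offsets 7 to 12.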
TokeniseOffsets(ReadOnlySpan<char>, List<(int Start, int End)>)
Emits token offsets without allocating any strings. Used by StandardAnalyser to defer string materialisation until after filtering.
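The benefit of deferring string materialisation can be sketched as follows. This is a standalone illustration, not the library's implementation: `OffsetSketch`, `CollectOffsets`, and `MaterialiseSurvivors` are hypothetical names. The parameter list of `CollectOffsets` mirrors the documented one, but the `void` return type, the exclusive `End` offset, the letter-or-digit rule, and the length-based filter are all assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class OffsetSketch
{
    // Parameter list mirrors the documented one; the void return type,
    // the exclusive End, and the letter-or-digit rule are assumptions.
    public static void CollectOffsets(ReadOnlySpan<char> input, List<(int Start, int End)> offsets)
    {
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= input.Length; i++)
        {
            bool inWord = i < input.Length && char.IsLetterOrDigit(input[i]);
            if (inWord && start < 0)
            {
                start = i;
            }
            else if (!inWord && start >= 0)
            {
                offsets.Add((start, i)); // value tuple only, no string is created
                start = -1;
            }
        }
    }

    // Hypothetical filter step: drop tokens shorter than minLength using
    // offsets alone, then allocate strings only for the survivors.
    public static List<string> MaterialiseSurvivors(
        string text, List<(int Start, int End)> offsets, int minLength)
    {
        var result = new List<string>();
        foreach (var (start, end) in offsets)
        {
            if (end - start >= minLength)
            {
                result.Add(text.Substring(start, end - start));
            }
        }
        return result;
    }
}
```

In this sketch, tokens rejected by the filter never become strings, so their cost is limited to one value tuple in the list. A caller could also reuse a single offsets list across inputs by calling `Clear()` between calls.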