Public sealed class WhitespaceTokeniser

Namespace
Rowles.LeanCorpus.Analysis.Tokenisers
Assembly
Rowles.LeanCorpus.dll

Splits input text into tokens separated only by whitespace.

public sealed class WhitespaceTokeniser : ITokeniser
Implements
ITokeniser

Methods

Public method Tokenise(ReadOnlySpan<char>)

Splits the input text into a list of tokens at word boundaries, which for this tokeniser are runs of whitespace.
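
As a rough illustration of what whitespace-only splitting means in practice, the sketch below is a standalone stand-in, not the library's implementation. The class name WhitespaceSplitSketch, the method name Split, the List<string> return type, and the use of char.IsWhiteSpace to define whitespace are all assumptions made for this example; the page does not state the return type of Tokenise or how it classifies whitespace.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; not Rowles.LeanCorpus code.
public static class WhitespaceSplitSketch
{
    // Returns each maximal run of non-whitespace characters as a string.
    // Assumption: "whitespace" means whatever char.IsWhiteSpace reports.
    public static List<string> Split(ReadOnlySpan<char> text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            // Skip the whitespace run that precedes the next token.
            while (i < text.Length && char.IsWhiteSpace(text[i])) i++;
            int start = i;

            // Advance to the first whitespace character after the token.
            while (i < text.Length && !char.IsWhiteSpace(text[i])) i++;

            // Only emit when at least one character was consumed.
            if (i > start) tokens.Add(text.Slice(start, i - start).ToString());
        }
        return tokens;
    }
}
```

Under these assumptions, WhitespaceSplitSketch.Split("Hello, world!  foo\tbar") yields "Hello,", "world!", "foo" and "bar". Punctuation stays attached to its neighbouring characters because whitespace is the only separator.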

Public method TokeniseOffsets(ReadOnlySpan<char>, List<Token>)

Emits whitespace-delimited tokens into the supplied list.

Internal method TokeniseOffsets(ReadOnlySpan<char>, List<(int Start, int End)>)

Emits whitespace-delimited token offsets into the supplied list without materialising token text.
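
The offset-only idea can be sketched as follows. This is a minimal standalone example, not the library's internal method: the names WhitespaceOffsetSketch and CollectOffsets are hypothetical, End is assumed to be exclusive, and the sketch appends to the supplied list without clearing it. None of those details are confirmed by this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; not Rowles.LeanCorpus code.
public static class WhitespaceOffsetSketch
{
    // Appends one (Start, End) pair per whitespace-delimited token.
    // Assumption: End is exclusive, so the token text is text[Start..End].
    // No strings are allocated; only integer pairs are stored.
    public static void CollectOffsets(ReadOnlySpan<char> text, List<(int Start, int End)> offsets)
    {
        int i = 0;
        while (i < text.Length)
        {
            // Skip leading whitespace.
            while (i < text.Length && char.IsWhiteSpace(text[i])) i++;
            int start = i;

            // Scan to the end of the current token.
            while (i < text.Length && !char.IsWhiteSpace(text[i])) i++;

            if (i > start) offsets.Add((start, i));
        }
    }
}
```

With these assumptions, calling CollectOffsets on " ab  cde" appends (1, 3) and (5, 8). A caller that later needs the text of a token can slice the original span with text.Slice(Start, End - Start), so the cost of creating strings is paid only for the tokens that are actually used.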