
LetterTokeniser
- Namespace: Rowles.LeanCorpus.Analysis.Tokenisers
- Assembly: Rowles.LeanCorpus.dll
Splits input text into letter-only tokens, discarding digits and punctuation.
public sealed class LetterTokeniser : ITokeniser
- Implements: ITokeniser
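The letter-only splitting described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the library's source: `LetterSplitSketch` and `Split` are invented names. It assumes a "letter" is any UTF-16 code unit for which `char.IsLetter` returns true, so every other character (digit, punctuation, whitespace) ends the current token and is discarded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch only; not the Rowles.LeanCorpus implementation.
// Assumption: a "letter" is any char for which char.IsLetter returns true.
// Every other character closes the current token and is dropped.
public static class LetterSplitSketch
{
    public static List<string> Split(ReadOnlySpan<char> text)
    {
        var tokens = new List<string>();
        int start = -1; // start index of the current letter run, or -1 if none

        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsLetter(text[i]))
            {
                if (start < 0) start = i; // a new run of letters begins
            }
            else if (start >= 0)
            {
                // A non-letter ends the run: materialise it as a string.
                tokens.Add(text.Slice(start, i - start).ToString());
                start = -1;
            }
        }

        // Flush a run that extends to the end of the input.
        if (start >= 0)
            tokens.Add(text.Slice(start).ToString());

        return tokens;
    }
}
```

Under these assumptions, `Split("Hello, world 42times!")` yields `Hello`, `world` and `times`: the digits in `42times` act as a boundary instead of being kept. Because the sketch tests single UTF-16 code units, letters outside the Basic Multilingual Plane (encoded as surrogate pairs) would be discarded; the real class may handle them differently.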
Methods
Tokenise(ReadOnlySpan<char>)
Splits the input text into a list of letter-only tokens, treating digits, punctuation and other non-letter characters as boundaries.
TokeniseOffsets(ReadOnlySpan<char>, List<Token>)
Emits letter-only tokens into the supplied list.
TokeniseOffsets(ReadOnlySpan<char>, List<(int Start, int End)>)
Emits letter-only token offsets into the supplied list without materialising token text.
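The offsets-only overload avoids allocating a string per token. The hypothetical sketch below illustrates the idea. `LetterOffsetsSketch` and `SplitOffsets` are invented names, and it assumes that `End` is exclusive and that results are appended to the caller's list without clearing it. The API listing does not confirm either assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of an offsets-only letter tokeniser.
// Assumptions (not stated in the API listing):
//   - End is exclusive, so the token is text[Start..End];
//   - results are appended to the caller's list, which is not cleared.
public static class LetterOffsetsSketch
{
    public static void SplitOffsets(ReadOnlySpan<char> text, List<(int Start, int End)> output)
    {
        int start = -1; // start index of the current letter run, or -1 if none

        for (int i = 0; i < text.Length; i++)
        {
            bool isLetter = char.IsLetter(text[i]);
            if (isLetter && start < 0)
            {
                start = i; // a new run of letters begins
            }
            else if (!isLetter && start >= 0)
            {
                output.Add((start, i)); // record bounds only; no string is created
                start = -1;
            }
        }

        // Close a run that extends to the end of the input.
        if (start >= 0)
            output.Add((start, text.Length));
    }
}
```

With this design a caller can reuse one list across many inputs (clearing it between calls) and slice `text[start..end]` only for the tokens it actually needs.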