

FSTReader
- Namespace: Rowles.LeanLucene.Codecs.Fst
- Assembly: Rowles.LeanLucene.dll
Reads a v2 .dic file, a compact, byte-keyed, sorted term dictionary. All data is loaded into contiguous arrays when the file is opened, so no per-term strings are allocated. Binary search compares raw UTF-8 bytes, which is roughly 3× faster than comparing char spans.
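To illustrate the storage idea, here is a minimal C# sketch that assumes a hypothetical in-memory layout: one `byte[]` holding every term's UTF-8 bytes plus an `int[]` of start offsets. The actual v2 .dic layout is not described in this reference, and `PackedTermDictionary` / `TryFind` are illustrative names, not members of `FSTReader`.

```csharp
using System;
using System.Text;

// Hypothetical layout for illustration only (not the real v2 .dic format):
// every term's UTF-8 bytes live in one byte[], addressed by a start-offset array.
public sealed class PackedTermDictionary
{
    private readonly byte[] _blob;      // concatenated UTF-8 bytes of all terms, in sorted order
    private readonly int[] _starts;     // term i occupies _blob[_starts[i] .. _starts[i + 1])
    private readonly long[] _postings;  // postings offset of term i

    // sortedTerms must already be in UTF-8 byte order (ordinal order for BMP-only text).
    public PackedTermDictionary(string[] sortedTerms, long[] postings)
    {
        _starts = new int[sortedTerms.Length + 1];
        for (int i = 0; i < sortedTerms.Length; i++)
            _starts[i + 1] = _starts[i] + Encoding.UTF8.GetByteCount(sortedTerms[i]);
        _blob = new byte[_starts[sortedTerms.Length]];
        for (int i = 0; i < sortedTerms.Length; i++)
            Encoding.UTF8.GetBytes(sortedTerms[i], 0, sortedTerms[i].Length, _blob, _starts[i]);
        _postings = postings;
    }

    public int Count => _postings.Length;

    // A view into the shared blob; no string or array is allocated per term.
    public ReadOnlySpan<byte> TermAt(int i) => _blob.AsSpan(_starts[i], _starts[i + 1] - _starts[i]);

    public bool TryFind(ReadOnlySpan<byte> key, out long postingsOffset)
    {
        int lo = 0, hi = Count - 1;
        while (lo <= hi)
        {
            int mid = lo + ((hi - lo) >> 1);
            int cmp = TermAt(mid).SequenceCompareTo(key);   // unsigned byte-by-byte comparison
            if (cmp == 0) { postingsOffset = _postings[mid]; return true; }
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        postingsOffset = -1;
        return false;
    }
}
```

Because `SequenceCompareTo` compares bytes as unsigned values, the binary search is only valid if the terms were sorted in UTF-8 byte order when the arrays were built.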
internal sealed class FSTReader
EnumerateAllTerms()
Enumerates all terms and their postings offsets in sorted order.
GetAllTermsForField(string)
Returns all terms for a field, i.e. every term whose qualified key starts with the prefix "field\0" (the field name followed by a NUL character).
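A sketch of the field scan, assuming the qualified-key layout described above (field name, a NUL byte, then the term) and using a plain sorted `byte[][]` as a stand-in for the reader's packed arrays. `FieldTermScan` is a hypothetical helper: it binary-searches to the first key at or after "field\0" and then walks forward while keys keep that prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FieldTermScan
{
    // First index whose key is >= target in unsigned byte order.
    public static int LowerBound(byte[][] sortedKeys, ReadOnlySpan<byte> target)
    {
        int lo = 0, hi = sortedKeys.Length;
        while (lo < hi)
        {
            int mid = lo + ((hi - lo) >> 1);
            if (sortedKeys[mid].AsSpan().SequenceCompareTo(target) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Every key of the form "field\0term" shares the prefix "field\0", so all of a
    // field's terms form one contiguous run in the sorted array.
    public static List<string> TermsForField(byte[][] sortedKeys, string field)
    {
        byte[] prefix = Encoding.UTF8.GetBytes(field + "\0");
        var result = new List<string>();
        for (int i = LowerBound(sortedKeys, prefix); i < sortedKeys.Length; i++)
        {
            ReadOnlySpan<byte> key = sortedKeys[i];
            if (!key.StartsWith(prefix)) break;                               // left the field's run
            result.Add(Encoding.UTF8.GetString(key.Slice(prefix.Length)));    // bare term text
        }
        return result;
    }
}
```

The trailing NUL is what keeps fields apart: a scan for `title` stops before keys of a field named `titles`, because `\0` sorts below every other byte.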
GetFuzzyMatches(string, ReadOnlySpan<char>, int, int)
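Returns every term in a field whose Levenshtein distance from the query term is at most maxEdits, together with that distance.
Uses prefix-sharing dynamic programming over the sorted terms: consecutive terms that share a prefix reuse the Levenshtein DP rows already computed for their longest common prefix, so only the rows for the remaining characters are recomputed.
When the minimum of a row exceeds maxEdits, no term with that prefix can match, so the reader uses a binary search to skip every term sharing that dead prefix.
When more than maxExpansions terms match, only the maxExpansions closest are kept.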
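The prefix-sharing scheme can be sketched in C# as follows. This is an illustration under stated assumptions, not the reader's code: it works on a single field's bare terms as ordinal-sorted `string`s rather than UTF-8 bytes, the parameter order (maxEdits, then maxExpansions) is assumed, and ties in distance are broken by term order. `PrefixSharingFuzzy.Match` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSharingFuzzy
{
    // Illustration only: works on sorted strings (ordinal order) rather than UTF-8 bytes.
    public static List<(string Term, int Distance)> Match(
        string[] sortedTerms, string query, int maxEdits, int maxExpansions)
    {
        int m = query.Length;
        // rows[k] is the Levenshtein DP row for the first k characters of the current term.
        var rows = new List<int[]> { new int[m + 1] };
        for (int j = 0; j <= m; j++) rows[0][j] = j;

        var matches = new List<(string Term, int Distance)>();
        string prev = "";
        int validDepth = 0;   // rows[1..validDepth] were computed for prefixes of prev

        int i = 0;
        while (i < sortedTerms.Length)
        {
            string t = sortedTerms[i];

            // Reuse every row that belongs to the prefix t shares with prev.
            int lcp = 0, limit = Math.Min(validDepth, Math.Min(prev.Length, t.Length));
            while (lcp < limit && prev[lcp] == t[lcp]) lcp++;

            int deadAt = -1;
            for (int k = lcp + 1; k <= t.Length; k++)
            {
                if (rows.Count <= k) rows.Add(new int[m + 1]);
                int[] above = rows[k - 1], cur = rows[k];
                cur[0] = k;
                int rowMin = k;
                for (int j = 1; j <= m; j++)
                {
                    int cost = query[j - 1] == t[k - 1] ? 0 : 1;
                    cur[j] = Math.Min(Math.Min(above[j] + 1, cur[j - 1] + 1), above[j - 1] + cost);
                    rowMin = Math.Min(rowMin, cur[j]);
                }
                // Row minima never decrease, so no extension of t[0..k) can get back under maxEdits.
                if (rowMin > maxEdits) { deadAt = k; break; }
            }

            prev = t;
            if (deadAt >= 0)
            {
                validDepth = deadAt;
                i = FirstWithoutPrefix(sortedTerms, i + 1, t.Substring(0, deadAt));
                continue;
            }

            validDepth = t.Length;
            int distance = rows[t.Length][m];
            if (distance <= maxEdits) matches.Add((t, distance));
            i++;
        }

        // Keep only the closest maxExpansions matches (ties broken by term order, an assumption).
        if (matches.Count > maxExpansions)
        {
            matches.Sort((a, b) => a.Distance != b.Distance
                ? a.Distance.CompareTo(b.Distance)
                : string.CompareOrdinal(a.Term, b.Term));
            matches.RemoveRange(maxExpansions, matches.Count - maxExpansions);
        }
        return matches;
    }

    // Terms starting with prefix form one contiguous run beginning at from - 1.
    static int FirstWithoutPrefix(string[] terms, int from, string prefix)
    {
        int lo = from, hi = terms.Length;
        while (lo < hi)
        {
            int mid = lo + ((hi - lo) >> 1);
            if (terms[mid].StartsWith(prefix, StringComparison.Ordinal)) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

For a query `band` with maxEdits 1, the term `bandana` dies at row 6 (`bandan`), and any later term starting with `bandan` is skipped without computing a single row for it.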
GetTermsInRange(string, string?, string?, bool, bool)
Returns the terms for a field whose bare value (the term text after the field prefix) falls within a lexicographic range.
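A minimal sketch of the range filter over one field's sorted bare terms. The parameter semantics are inferred from the signature and are assumptions: a `null` bound is treated as unbounded, and the two `bool` flags as includeLower and includeUpper. `TermRange.InRange` is a hypothetical name, and it compares with `string.CompareOrdinal` for readability rather than on UTF-8 bytes.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class TermRange
{
    // Assumes sortedBareTerms is in ordinal order and that a null bound means "unbounded".
    public static List<string> InRange(string[] sortedBareTerms, string? lower, string? upper,
                                       bool includeLower, bool includeUpper)
    {
        // Binary search for the first term that satisfies the lower bound.
        int start = 0, hi = sortedBareTerms.Length;
        if (lower != null)
        {
            while (start < hi)
            {
                int mid = start + ((hi - start) >> 1);
                int cmp = string.CompareOrdinal(sortedBareTerms[mid], lower);
                bool belowLower = includeLower ? cmp < 0 : cmp <= 0;
                if (belowLower) start = mid + 1; else hi = mid;
            }
        }

        // Scan forward until a term passes the upper bound.
        var result = new List<string>();
        for (int i = start; i < sortedBareTerms.Length; i++)
        {
            if (upper != null)
            {
                int cmp = string.CompareOrdinal(sortedBareTerms[i], upper);
                if (includeUpper ? cmp > 0 : cmp >= 0) break;
            }
            result.Add(sortedBareTerms[i]);
        }
        return result;
    }
}
```

Since the terms are sorted, only the lower bound needs a binary search; the scan stops at the first term past the upper bound.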
GetTermsMatching(string, ReadOnlySpan<char>)
Returns all terms for the given field that match a wildcard pattern.
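The wildcard syntax is not specified in this reference. The sketch below assumes the common Lucene-style convention where `*` matches any run of characters (including none) and `?` matches exactly one. It uses the standard iterative matcher that remembers the last `*` and backtracks to it on a mismatch, so no recursion is needed. `WildcardTerms` is a hypothetical helper meant to be applied to a field's bare terms.

```csharp
using System;
using System.Collections.Generic;

public static class WildcardTerms
{
    // Assumed syntax: '*' matches any run of characters (including none), '?' exactly one.
    public static bool Matches(ReadOnlySpan<char> text, ReadOnlySpan<char> pattern)
    {
        int t = 0, p = 0, starP = -1, starT = 0;
        while (t < text.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                starP = p++;      // remember the star; first try letting it match nothing
                starT = t;
            }
            else if (p < pattern.Length && (pattern[p] == '?' || pattern[p] == text[t]))
            {
                t++; p++;
            }
            else if (starP >= 0)
            {
                p = starP + 1;    // backtrack: let the last star absorb one more character
                t = ++starT;
            }
            else return false;
        }
        while (p < pattern.Length && pattern[p] == '*') p++;   // trailing stars match empty
        return p == pattern.Length;
    }

    public static List<string> Filter(IEnumerable<string> bareTerms, string pattern)
    {
        var result = new List<string>();
        foreach (string term in bareTerms)
            if (Matches(term, pattern)) result.Add(term);
        return result;
    }
}
```

Checking `*` before the literal comparison matters: otherwise a literal `*` in the term would be consumed as an ordinary character and the star's backtracking point would never be recorded.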
GetTermsMatchingRegex(string, Regex)
Returns terms for a field whose bare text matches the given compiled regex.
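A sketch of the regex filter, assuming the qualified-key layout (`field\0term`) and a plain `string` sequence in place of the packed arrays; `RegexTermFilter` is hypothetical. Note that `Regex.IsMatch` succeeds when the pattern matches anywhere in the input, so with this sketch a pattern must be anchored with `^` and `$` when the whole term has to match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexTermFilter
{
    // qualifiedTerms: "field\0term" keys (hypothetical in-memory stand-in for the dictionary).
    public static List<string> Matching(IEnumerable<string> qualifiedTerms, string field, Regex regex)
    {
        string prefix = field + "\0";
        var result = new List<string>();
        foreach (string key in qualifiedTerms)
        {
            if (!key.StartsWith(prefix, StringComparison.Ordinal)) continue; // another field
            string bare = key.Substring(prefix.Length);                      // strip "field\0"
            if (regex.IsMatch(bare)) result.Add(bare);
        }
        return result;
    }
}
```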
GetTermsWithPrefix(ReadOnlySpan<char>)
Returns all terms whose qualified key (field name, NUL separator, term) starts with the given prefix.
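One way to bound a prefix range with two binary searches, sketched under assumptions: the keys are a sorted `byte[][]`, and the end of the range is found by searching for the prefix's successor (the prefix with trailing 0xFF bytes dropped and its last byte incremented). The source does not say whether `GetTermsWithPrefix` does this or scans forward from the start; `PrefixRange` is a hypothetical helper.

```csharp
#nullable enable
using System;

public static class PrefixRange
{
    // First index whose key is >= target in unsigned byte order.
    public static int LowerBound(byte[][] sortedKeys, ReadOnlySpan<byte> target)
    {
        int lo = 0, hi = sortedKeys.Length;
        while (lo < hi)
        {
            int mid = lo + ((hi - lo) >> 1);
            if (sortedKeys[mid].AsSpan().SequenceCompareTo(target) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Smallest byte string greater than every string starting with prefix,
    // or null when none exists (empty prefix or all 0xFF bytes).
    static byte[]? Successor(ReadOnlySpan<byte> prefix)
    {
        int n = prefix.Length;
        while (n > 0 && prefix[n - 1] == 0xFF) n--;
        if (n == 0) return null;
        byte[] s = prefix.Slice(0, n).ToArray();
        s[n - 1]++;
        return s;
    }

    // Returns the half-open index range [Start, End) of keys that start with prefix.
    public static (int Start, int End) Find(byte[][] sortedKeys, ReadOnlySpan<byte> prefix)
    {
        int start = LowerBound(sortedKeys, prefix);
        byte[]? succ = Successor(prefix);
        int end = succ == null ? sortedKeys.Length : LowerBound(sortedKeys, succ);
        return (start, end);
    }
}
```

Every key that starts with the prefix sorts at or after the prefix and strictly before its successor, and no other key falls in between, so `[Start, End)` is exactly the matching run.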
IntersectAutomaton(string, IAutomaton)
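Intersects the field's term dictionary with an automaton and returns the matching terms. Matching operates on the bare term bytes (the bytes after the field prefix), and CanMatch is used to prune prefixes from which the automaton can no longer reach an accepting state.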
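The pruning idea can be sketched with a hypothetical `IByteAutomaton` interface (`Start`, `Step`, `IsAccept`, `CanMatch`). The members of the library's real `IAutomaton` are not documented here, so these names and their semantics are assumptions. When `CanMatch` reports that no continuation can be accepted after the first k bytes of a term, every term sharing those k bytes is skipped with a binary search.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical automaton contract; not the library's IAutomaton.
public interface IByteAutomaton
{
    int Start { get; }
    int Step(int state, byte b);   // -1 means the dead state
    bool IsAccept(int state);
    bool CanMatch(int state);      // false when no continuation from state can be accepted
}

// Example automaton: accepts every term that starts with a fixed byte prefix.
public sealed class LiteralPrefixAutomaton : IByteAutomaton
{
    private readonly byte[] _prefix;
    public LiteralPrefixAutomaton(byte[] prefix) => _prefix = prefix;
    public int Start => 0;
    public int Step(int state, byte b) =>
        state < 0 ? -1
        : state == _prefix.Length ? state                  // prefix consumed: stay accepting
        : b == _prefix[state] ? state + 1 : -1;
    public bool IsAccept(int state) => state == _prefix.Length;
    public bool CanMatch(int state) => state >= 0;
}

public static class AutomatonIntersect
{
    public static List<string> Intersect(byte[][] sortedBareTerms, IByteAutomaton automaton)
    {
        var hits = new List<string>();
        if (!automaton.CanMatch(automaton.Start)) return hits;
        int i = 0;
        while (i < sortedBareTerms.Length)
        {
            byte[] term = sortedBareTerms[i];
            int state = automaton.Start, deadAt = -1;
            for (int k = 0; k < term.Length; k++)
            {
                state = automaton.Step(state, term[k]);
                if (state < 0 || !automaton.CanMatch(state)) { deadAt = k + 1; break; }
            }
            if (deadAt < 0)
            {
                if (automaton.IsAccept(state)) hits.Add(Encoding.UTF8.GetString(term));
                i++;
            }
            else
            {
                // No term extending term[0..deadAt) can match: jump past all of them.
                i = FirstWithoutPrefix(sortedBareTerms, i + 1, term.AsSpan(0, deadAt));
            }
        }
        return hits;
    }

    static int FirstWithoutPrefix(byte[][] terms, int from, ReadOnlySpan<byte> prefix)
    {
        int lo = from, hi = terms.Length;
        while (lo < hi)
        {
            int mid = lo + ((hi - lo) >> 1);
            if (terms[mid].AsSpan().StartsWith(prefix)) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```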
Open(IndexInput)
Opens a v2 dictionary from an IndexInput positioned just after the codec header.
TryGetPostingsOffset(ReadOnlySpan<byte>, out long)
Looks up the postings offset for a UTF-8 byte key with a hash table: O(1) in the average case, walking the collision chain when several keys share a bucket.
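A minimal sketch of a chained hash index over UTF-8 keys, assuming a power-of-two bucket table, 32-bit FNV-1a hashing, and chain links stored in a parallel `int[]`. The source does not state which hash function or table layout the reader uses; `ByteKeyHashIndex` is hypothetical, and the `byte[][]` key array stands in for the packed term blob.

```csharp
using System;

public sealed class ByteKeyHashIndex
{
    private readonly byte[][] _keys;   // stand-in for the packed term bytes
    private readonly long[] _offsets;  // postings offset per key
    private readonly int[] _heads;     // bucket -> first key index in its chain, or -1
    private readonly int[] _next;      // key index -> next key index in the same bucket, or -1
    private readonly int _mask;

    public ByteKeyHashIndex(byte[][] keys, long[] offsets)
    {
        _keys = keys;
        _offsets = offsets;
        int size = 1;
        while (size < keys.Length * 2) size <<= 1;   // power of two, load factor <= 0.5
        _mask = size - 1;
        _heads = new int[size];
        Array.Fill(_heads, -1);
        _next = new int[keys.Length];
        for (int i = 0; i < keys.Length; i++)
        {
            int bucket = (int)(Fnv1a(keys[i]) & (uint)_mask);
            _next[i] = _heads[bucket];   // prepend key i to its bucket's chain
            _heads[bucket] = i;
        }
    }

    // 32-bit FNV-1a (an assumed choice of hash function).
    static uint Fnv1a(ReadOnlySpan<byte> data)
    {
        uint h = 2166136261u;
        foreach (byte x in data) { h ^= x; h = unchecked(h * 16777619u); }
        return h;
    }

    public bool TryGetPostingsOffset(ReadOnlySpan<byte> key, out long offset)
    {
        // Walk the chain for the key's bucket until the bytes match exactly.
        for (int i = _heads[(int)(Fnv1a(key) & (uint)_mask)]; i >= 0; i = _next[i])
        {
            if (key.SequenceEqual(_keys[i])) { offset = _offsets[i]; return true; }
        }
        offset = -1;
        return false;
    }
}
```

Storing the chains as integer indexes into parallel arrays, rather than as node objects, fits the class description's goal of avoiding per-term allocations.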
TryGetPostingsOffset(ReadOnlySpan<char>, out long)
Looks up the postings offset for a char-span key with an O(log N) binary search; the key is encoded to UTF-8 internally before comparison.
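A sketch of the char-span overload, assuming it encodes the key into a stack buffer for short keys and a pooled array for long ones, then runs the same byte-wise binary search. The 256-byte threshold and the `CharKeyLookup` name are assumptions, not details from the source.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.Text;

public static class CharKeyLookup
{
    public static bool TryGetPostingsOffset(byte[][] sortedKeys, long[] offsets,
                                            ReadOnlySpan<char> key, out long offset)
    {
        // Worst-case UTF-8 size; short keys use the stack, long keys a pooled array (assumed threshold).
        int max = Encoding.UTF8.GetMaxByteCount(key.Length);
        byte[]? rented = null;
        Span<byte> buffer = max <= 256 ? stackalloc byte[256] : (rented = ArrayPool<byte>.Shared.Rent(max));
        try
        {
            int len = Encoding.UTF8.GetBytes(key, buffer);
            ReadOnlySpan<byte> utf8 = buffer.Slice(0, len);

            // Binary search over keys sorted in unsigned UTF-8 byte order.
            int lo = 0, hi = sortedKeys.Length - 1;
            while (lo <= hi)
            {
                int mid = lo + ((hi - lo) >> 1);
                int cmp = sortedKeys[mid].AsSpan().SequenceCompareTo(utf8);
                if (cmp == 0) { offset = offsets[mid]; return true; }
                if (cmp < 0) lo = mid + 1; else hi = mid - 1;
            }
            offset = -1;
            return false;
        }
        finally
        {
            if (rented != null) ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

Encoding once up front keeps every comparison in the loop byte-wise, which is the same representation the byte-key overload uses.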