
IndexWriterConfig
- Namespace: Rowles.LeanLucene.Index.Indexer
- Assembly: Rowles.LeanLucene.dll
Configuration for the IndexWriter.
public sealed class IndexWriterConfig
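As a rough illustration of how the properties below might be set, the following fragment uses an object initialiser. It assumes a public parameterless constructor and publicly settable properties with `int` and `bool` types, none of which this reference confirms; the values shown are either the documented defaults or arbitrary.

```csharp
using Rowles.LeanLucene.Index.Indexer;

// Assumption: IndexWriterConfig has a parameterless constructor and settable properties.
var config = new IndexWriterConfig
{
    MaxBufferedDocs = 10_000,   // arbitrary example value
    MergeThreshold = 10,        // documented default
    StoredFieldBlockSize = 16,  // documented default
    DurableCommits = true,      // documented default
    NormaliseVectors = true,    // documented default
};
```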
Properties
AnalyserInternCacheSize
Maximum number of entries in the StandardAnalyser token intern cache. Larger caches reduce per-token string allocation for repeated terms. Default: 4096.
BKDMaxLeafSize
Maximum number of point values in a BKD tree leaf node. Smaller leaves give faster range queries at the cost of larger index files. Default: 512.
BuildHnswOnFlush
Build an HNSW graph for every vector field at flush time. Disable to fall back to a flat brute-force scan (useful for tiny indices where the build overhead outweighs the benefit). Default: true.
CharFilters
Character-level filters applied to text before tokenisation. Runs in order before the analyser. Default: empty (no char filters).
CompressionPolicy
Compression algorithm for stored fields. Options: None, Lz4, Zstandard. Default: Lz4 (fast decompression).
DefaultAnalyser
Default analyser used for fields without a specific mapping.
DeletionPolicy
Deletion policy applied after each commit. Default: keep latest only.
DurableCommits
When true (default), Commit() flushes file contents and directory metadata to disk via fsync before and after the segments_N rename, guaranteeing the commit survives a power loss. Disable only for write-heavy benchmarks where durability is not required; correctness suffers if the host crashes mid-commit.
FieldAnalysers
Per-field analyser overrides. Key is the field name.
HnswBuildConfig
HNSW build configuration applied to every vector field. See HnswBuildConfig.
HnswSeed
Optional deterministic seed for HNSW graph construction. When null, a random seed is generated per segment and persisted into the .hnsw file. Set explicitly for reproducible builds.
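To see why a fixed seed makes a build reproducible, consider the random layer assignment used in the original HNSW algorithm, where each node's top layer is drawn as floor(-ln(u) × mL) with mL = 1 / ln(M). The sketch below is illustrative only: `HnswLevelSketch`, `AssignLevels`, and the parameter `m` are hypothetical names, and whether this library draws levels the same way is an assumption.

```csharp
using System;

public static class HnswLevelSketch
{
    // Draws a top layer for each of `count` nodes: level = floor(-ln(u) * mL), mL = 1 / ln(m).
    // Hypothetical helper; not part of the library's API.
    public static int[] AssignLevels(int seed, int count, int m = 16)
    {
        var rng = new Random(seed);          // same seed gives the same sequence of draws
        double mL = 1.0 / Math.Log(m);
        var levels = new int[count];
        for (int i = 0; i < count; i++)
        {
            double u = 1.0 - rng.NextDouble(); // u is in (0, 1], which avoids ln(0)
            levels[i] = (int)Math.Floor(-Math.Log(u) * mL);
        }
        return levels;
    }
}
```

Two calls with the same seed and count return identical level arrays, so the resulting graph shape does not depend on when or where the segment was built.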
IndexSort
Optional index-time sort order. When set, documents within each segment are physically reordered at flush time. Default: null (insertion order).
MaxBufferedDocs
Maximum number of buffered documents before an automatic flush.
MaxQueuedDocs
Maximum number of documents that can be queued for indexing before AddDocument blocks. Provides backpressure to prevent unbounded memory growth. Set to 0 to disable (not recommended). Default: 2 × MaxBufferedDocs.
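The backpressure described here behaves like a bounded producer/consumer queue. The sketch below models it with the standard `BlockingCollection<T>`; the `BackpressureSketch` class is hypothetical and says nothing about how the library actually queues documents.

```csharp
using System.Collections.Concurrent;

public static class BackpressureSketch
{
    // Hypothetical helper: a queue bounded at 2 x maxBufferedDocs, mirroring the documented default.
    public static BlockingCollection<string> CreateQueue(int maxBufferedDocs) =>
        new BlockingCollection<string>(boundedCapacity: 2 * maxBufferedDocs);
}
```

With such a queue, `Add` blocks the producer once the capacity is reached and resumes when a consumer takes an item, while `TryAdd` returns false immediately instead of blocking.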
MaxTokensPerDocument
Maximum number of tokens allowed per text field per document. 0 means unlimited (no budget enforcement). Default: 0.
MergeThreshold
Segment count threshold that triggers a tiered merge. When the number of segments at a given size tier reaches this value, the smallest are merged. Default: 10.
MergeThrottleSegments
Maximum number of unmerged segments before AddDocument blocks until a merge completes. Provides backpressure to prevent unbounded segment accumulation. Default: 0 (disabled).
Metrics
Metrics collector for flush, merge, and commit latency tracking. Default: NullMetricsCollector (no-op).
NormaliseVectors
Whether vector fields should be normalised (L2) at index time. When true, dot product equals cosine similarity, enabling cheaper search. Default: true.
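The equivalence between dot product and cosine similarity can be checked with a few lines of arithmetic. The `VectorMath` helper below is a self-contained sketch with hypothetical names; it does not use any library type.

```csharp
using System;

public static class VectorMath
{
    // Returns a copy of v scaled to unit L2 length (a zero vector is returned unchanged).
    public static float[] Normalise(float[] v)
    {
        double sumSq = 0;
        foreach (var x in v) sumSq += (double)x * x;
        double norm = Math.Sqrt(sumSq);
        if (norm == 0) return (float[])v.Clone();
        var result = new float[v.Length];
        for (int i = 0; i < v.Length; i++) result[i] = (float)(v[i] / norm);
        return result;
    }

    public static double Dot(float[] a, float[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++) sum += (double)a[i] * b[i];
        return sum;
    }

    // Full cosine similarity: needs both vector norms at query time.
    public static double Cosine(float[] a, float[] b) =>
        Dot(a, b) / (Math.Sqrt(Dot(a, a)) * Math.Sqrt(Dot(b, b)));
}
```

For a = (3, 4) and b = (4, 3), `Cosine(a, b)` is 24/25, and `Dot(Normalise(a), Normalise(b))` gives the same value up to float rounding, without computing any norms at search time.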
PostingsSkipInterval
Skip interval for postings lists. Every N-th doc ID gets a skip pointer for O(log N) advance. Must be consistent between write and merge paths. Default: 128.
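A simplified, single-level model of skip pointers may help here: record the doc ID at every interval-th position, binary search those entries to find the right block, then scan within it. `SkipPostings` is a hypothetical in-memory sketch; the library's on-disk format and any multi-level structure are not described in this reference.

```csharp
using System;
using System.Collections.Generic;

public sealed class SkipPostings
{
    private readonly int[] _docIds;    // sorted ascending
    private readonly int[] _skipDocs;  // doc ID at every interval-th position
    private readonly int _interval;

    public SkipPostings(int[] sortedDocIds, int interval)
    {
        _docIds = sortedDocIds;
        _interval = interval;
        var skips = new List<int>();
        for (int i = 0; i < sortedDocIds.Length; i += interval) skips.Add(sortedDocIds[i]);
        _skipDocs = skips.ToArray();
    }

    // Returns the position of the first doc ID >= target, or -1 if there is none.
    public int Advance(int target)
    {
        // Binary search for the last skip entry <= target.
        int idx = Array.BinarySearch(_skipDocs, target);
        if (idx < 0) idx = ~idx - 1;   // insertion point minus one
        int start = Math.Max(idx, 0) * _interval;
        // Linear scan starting at the chosen skip entry.
        for (int i = start; i < _docIds.Length; i++)
            if (_docIds[i] >= target) return i;
        return -1;
    }
}
```

A smaller interval means more skip entries (a larger index) but a shorter scan per advance; a larger interval trades the other way.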
RamBufferSizeMB
RAM buffer size in megabytes before an automatic flush.
Schema
Optional schema defining per-field types and validation rules. When null (default), documents are accepted without schema validation.
Similarity
Scoring model used by IndexSearcher. Default: BM25.
StopWords
Custom stop words for the default StandardAnalyser. When null, the built-in English stop word list is used. Set to an empty list to disable stop word removal.
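The three cases (null, empty, custom list) can be summarised in a small resolver. This is a sketch only: `StopWordSketch` is hypothetical, and its `BuiltIn` array is a placeholder rather than the library's actual English list.

```csharp
using System.Collections.Generic;

public static class StopWordSketch
{
    // Placeholder only: the real built-in English list is not shown in this reference.
    private static readonly string[] BuiltIn = { "a", "an", "the" };

    // null -> built-in list; empty -> no stop words removed; otherwise the custom list.
    public static ISet<string> Resolve(IEnumerable<string> custom) =>
        new HashSet<string>(custom ?? BuiltIn);
}
```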
StorePayloads
Whether to store per-position payloads in the postings.
StoreTermVectors
Whether to store term vectors for text fields.
StoredFieldBlockSize
Number of documents per stored field block. Larger blocks compress better but increase random-access cost. Default: 16.
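The random-access cost follows from simple arithmetic. Assuming every block holds exactly the configured number of documents (an assumption about the layout), a segment-local doc ID maps to a block and an offset as in this hypothetical helper:

```csharp
public static class StoredFieldBlocks
{
    // Maps a segment-local doc ID to (block index, offset inside the block),
    // assuming every block holds exactly blockSize documents.
    public static (int Block, int Offset) Locate(int docId, int blockSize = 16) =>
        (docId / blockSize, docId % blockSize);
}
```

With the default block size of 16, doc 37 lives in block 2 at offset 5, so fetching it means decompressing that whole block. Larger blocks increase the amount decompressed per lookup.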
TokenBudgetPolicy
Action taken when a document exceeds MaxTokensPerDocument. Default: Truncate.
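A minimal sketch of what a Truncate policy could look like, combined with the rule that a MaxTokensPerDocument of 0 means unlimited. `TokenBudgetSketch` is a hypothetical helper; the library's actual enforcement point and any other policy values are not shown in this reference.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TokenBudgetSketch
{
    // Keeps at most maxTokens tokens; 0 means unlimited, as documented for MaxTokensPerDocument.
    public static IReadOnlyList<string> Truncate(IEnumerable<string> tokens, int maxTokens) =>
        maxTokens == 0 ? tokens.ToList() : tokens.Take(maxTokens).ToList();
}
```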