IndexWriterConfig

Namespace: Rowles.LeanLucene.Index.Indexer

Assembly: Rowles.LeanLucene.dll

Configuration for the IndexWriter.

public sealed class IndexWriterConfig

Properties

AnalyserInternCacheSize: Maximum number of entries in the StandardAnalyser token intern cache. Larger caches reduce per-token string allocation for repeated terms. Default: 4096.

BKDMaxLeafSize: Maximum number of point values in a BKD tree leaf node. Smaller leaves give faster range queries at the cost of larger index files. Default: 512.

BuildHnswOnFlush: Build an HNSW graph for every vector field at flush time. Disable to fall back to a flat brute-force scan (useful for tiny indices where the build overhead outweighs the benefit). Default: true.

CharFilters: Character-level filters applied to text before tokenisation. The filters run in order, before the analyser. Default: empty (no char filters).

CompressionPolicy: Compression algorithm for stored fields. Options: None, Lz4, Zstandard. Default: Lz4 (fast decompression).

DefaultAnalyser: Default analyser used for fields without a specific mapping.

DeletionPolicy: Deletion policy applied after each commit. Default: keep latest only.

DurableCommits: When true (default), Commit() flushes file contents and directory metadata to disk via fsync before and after the segments_N rename, guaranteeing the commit survives a power loss. Disable only for write-heavy benchmarks where durability is not required; correctness suffers if the host crashes mid-commit.

FieldAnalysers: Per-field analyser overrides. Key is the field name.

HnswBuildConfig: HNSW build configuration applied to every vector field. See HnswBuildConfig.

HnswSeed: Optional deterministic seed for HNSW graph construction. When null, a random seed is generated per segment and persisted into the .hnsw file. Set explicitly for reproducible builds.

IndexSort: Optional index-time sort order. When set, documents within each segment are physically reordered at flush time. Default: null (insertion order).

MaxBufferedDocs: Maximum number of buffered documents before an automatic flush.

MaxQueuedDocs: Maximum number of documents that can be queued for indexing before AddDocument blocks. Provides backpressure to prevent unbounded memory growth. Set to 0 to disable (not recommended). Default: 2 × MaxBufferedDocs.

MaxTokensPerDocument: Maximum number of tokens allowed per text field per document. 0 means unlimited (no budget enforcement). Default: 0.

MergeThreshold: Segment count threshold that triggers a tiered merge. When the number of segments at a given size tier reaches this value, the smallest segments are merged. Default: 10.

MergeThrottleSegments: Maximum number of unmerged segments before AddDocument blocks until a merge completes. Provides backpressure to prevent unbounded segment accumulation. Default: 0 (disabled).

Metrics: Metrics collector for flush, merge, and commit latency tracking. Default: NullMetricsCollector (no-op).

NormaliseVectors: Whether vector fields should be normalised (L2) at index time. When true, dot product equals cosine similarity, enabling cheaper search. Default: true.
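The equivalence behind NormaliseVectors can be checked with a small, self-contained sketch. The VectorMath class and its methods are hypothetical helpers written for this illustration; they are not part of Rowles.LeanLucene and do not describe how the library normalises vectors internally.

```csharp
using System;

// Illustrative helpers only; not part of Rowles.LeanLucene.
public static class VectorMath
{
    // Returns a copy of v scaled to unit L2 length.
    // A zero vector is returned unchanged to avoid dividing by zero.
    public static float[] Normalise(float[] v)
    {
        double sumSquares = 0;
        foreach (float x in v) sumSquares += (double)x * x;
        double norm = Math.Sqrt(sumSquares);

        var result = new float[v.Length];
        for (int i = 0; i < v.Length; i++)
            result[i] = norm == 0 ? v[i] : (float)(v[i] / norm);
        return result;
    }

    // Plain dot product, accumulated in double precision.
    public static double Dot(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double sum = 0;
        for (int i = 0; i < a.Length; i++) sum += (double)a[i] * b[i];
        return sum;
    }

    // Cosine similarity: dot product divided by the product of the L2 norms.
    public static double Cosine(float[] a, float[] b)
        => Dot(a, b) / (Math.Sqrt(Dot(a, a)) * Math.Sqrt(Dot(b, b)));
}
```

For a = (3, 4) and b = (4, 3), Cosine(a, b) and Dot(Normalise(a), Normalise(b)) agree to within floating-point rounding. Once vectors are stored at unit length, the two norm computations can be skipped at query time, which is the saving the property description refers to.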

PostingsSkipInterval: Skip interval for postings lists. Every N-th doc ID gets a skip pointer for O(log N) advance. Must be consistent between write and merge paths. Default: 128.

RamBufferSizeMB: RAM buffer size in megabytes before an automatic flush.

Schema: Optional schema defining per-field types and validation rules. When null (default), documents are accepted without schema validation.

Similarity: Scoring model used by IndexSearcher. Default: BM25.

StopWords: Custom stop words for the default StandardAnalyser. When null, the built-in English stop word list is used. Set to an empty list to disable stop word removal.

StorePayloads: Whether to store per-position payloads in the postings.

StoreTermVectors: Whether to store term vectors for text fields.

StoredFieldBlockSize: Number of documents per stored field block. Larger blocks compress better but increase random-access cost. Default: 16.

TokenBudgetPolicy: Action taken when a document exceeds MaxTokensPerDocument. Default: Truncate.
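A minimal configuration sketch using only properties listed above. It assumes IndexWriterConfig has a public parameterless constructor, that these properties are settable from an object initialiser, and that the numeric properties are integers; this page confirms none of those details. Values marked "illustrative" are arbitrary examples, not recommendations.

```csharp
using Rowles.LeanLucene.Index.Indexer;

// Assumption: parameterless constructor and settable (set/init) properties.
var config = new IndexWriterConfig
{
    DurableCommits = true,          // documented default, stated explicitly
    MaxTokensPerDocument = 10_000,  // illustrative; 0 (default) means unlimited
    MergeThreshold = 10,            // documented default
    MergeThrottleSegments = 40,     // illustrative; 0 (default) disables backpressure
    StoredFieldBlockSize = 16,      // documented default
    BuildHnswOnFlush = false,       // fall back to flat scan, e.g. for a tiny index
    NormaliseVectors = true,        // documented default
};
```

Properties left out of the initialiser keep their defaults. Enum-typed and object-typed properties such as CompressionPolicy, TokenBudgetPolicy, and Similarity are omitted here because their type names are not given on this page.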
