Vector search
LeanLucene stores dense float vectors per segment and builds an HNSW graph at flush time by default. Searches use HNSW when a graph is present, then rerank the shortlist with exact cosine similarity.
Index a vector
using Rowles.LeanLucene.Document.Fields;
var doc = new LeanDocument();
doc.Add(new StringField("id", "v1"));
doc.Add(new VectorField("embedding", new float[] { 0.1f, 0.2f, 0.3f, 0.4f }));
writer.AddDocument(doc);
All vectors written to the same field must have the same dimensionality.
Vector fields are normalised at index time by default. This keeps cosine search cheap and consistent.
Query
using Rowles.LeanLucene.Search.Queries;
var query = new VectorQuery(
"embedding",
queryVector,
topK: 10,
efSearch: 128,
oversamplingFactor: 2);
var hits = searcher.Search(query, topN: 10);
The score is cosine similarity, range [-1, 1] (typically [0, 1] for normalised
vectors).
Build settings
using Rowles.LeanLucene.Codecs.Hnsw;
using Rowles.LeanLucene.Index.Indexer;
var config = new IndexWriterConfig
{
NormaliseVectors = true,
BuildHnswOnFlush = true,
HnswBuildConfig = new HnswBuildConfig
{
M = 16,
EfConstruction = 100,
},
};
Set HnswSeed for reproducible graph builds.
Hybrid retrieval
Combine a vector query with a text query through RRF, or add a
filter directly to VectorQuery:
var filter = new TermQuery("category", "docs");
var query = new VectorQuery("embedding", queryVector, topK: 10, filter: filter);
Implementation note
If no HNSW graph exists, LeanLucene falls back to a flat SIMD scan. Vector readers are opened lazily, so non-vector searches do not pay the mmap cost.