Indexes

SecantusDB has a real query planner with index acceleration across most of the shapes pymongo produces. This page documents what’s wired up and how to verify it.

Creating indexes

Standard pymongo API:

coll.create_index("email", unique=True)
coll.create_index([("a", 1), ("b", -1)], name="ab_compound")
coll.create_index("status", partialFilterExpression={"active": True})
coll.create_index("createdAt", expireAfterSeconds=3600)

The _id index always exists: it is the document table itself, walked in WT-key order.

What find() accelerates

find() routes through the index entries table for:

Single-field filters

Bare equality ({field: v}), $eq, $in, and any combination of $gt / $gte / $lt / $lte against a single-field index.

coll.create_index("score")
coll.find({"score": {"$gte": 80, "$lt": 100}})  # IXSCAN on score

When no single-field index covers the filtered field, a compound index whose leading field is that field is used instead. Equality lookups become prefix scans on enc(v) + COMPOUND_SEP; range bounds are evaluated with a scan over the leading field only, which uses startswith(esc_X + esc_compound_sep) to identify boundary rows.
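The equality-as-prefix-scan idea can be illustrated with a small standalone sketch. This is not SecantusDB's code: `enc`, `SEP`, and the string-only keys are simplifying assumptions standing in for the real value encoder and COMPOUND_SEP.

```python
import bisect

# Hypothetical stand-ins: the real encoder and separator are not shown here.
SEP = b"\x00"


def enc(value: str) -> bytes:
    """Toy encoder: UTF-8 bytes (assumes values never contain SEP)."""
    return value.encode("utf-8")


def prefix_scan(sorted_keys: list[bytes], leading_value: str) -> list[bytes]:
    """Return every compound key whose leading field equals leading_value."""
    prefix = enc(leading_value) + SEP
    i = bisect.bisect_left(sorted_keys, prefix)  # first key >= prefix
    hits = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        hits.append(sorted_keys[i])
        i += 1
    return hits


# Compound keys for a two-field index: enc(a) + SEP + enc(b).
keys = sorted(
    enc(a) + SEP + enc(b)
    for a, b in [("alice", "x"), ("al", "y"), ("alice", "z"), ("bob", "x")]
)
alice_rows = prefix_scan(keys, "alice")
al_rows = prefix_scan(keys, "al")
```

Including the separator in the prefix is what keeps a lookup for "al" from also matching the "alice" rows.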

Multi-field bare-equality filters

When the filter’s fields form a leading prefix of an ASC compound index (compared as a set), the lookup runs as an exact match if the filter covers the whole index, or as a prefix scan if it covers a strict leading prefix.

Filter field order doesn’t matter — {b: 20, a: 1} finds the same {a: 1, b: 1} index as {a: 1, b: 20}.
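The set-wise prefix rule can be sketched as a small classifier. The function name and return values are hypothetical and only illustrate the rule described above; they are not part of SecantusDB's API.

```python
def classify_compound_match(filter_fields, index_fields):
    """Return "exact", "prefix", or None for a bare-equality filter.

    filter_fields: field names used in the filter, in any order.
    index_fields:  the compound index's fields, in index order.
    """
    wanted = set(filter_fields)
    n = len(wanted)
    if n == 0 or n > len(index_fields):
        return None
    # The filter must cover exactly the first n index fields (as a set).
    if set(index_fields[:n]) != wanted:
        return None
    return "exact" if n == len(index_fields) else "prefix"


index_ab = ["a", "b"]
order_independent = (
    classify_compound_match(["b", "a"], index_ab)
    == classify_compound_match(["a", "b"], index_ab)
)
```

A filter on b alone returns None here, because b is not a leading prefix of the {a, b} index.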

Compound prefix + trailing operator

{a: 5, b: 10, c: {$gt: 20}} walks a compound {a, b, c} index by pinning the prefix from the equalities and applying the operator’s bounds to the next column. Supports $eq / $in / $gt / $gte / $lt / $lte on the trailing field.
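As an illustration of pinning the prefix and bounding the next column, here is a minimal sketch over a sorted list of tuples. It assumes numeric values, an all-ASC index, and an operator on the last indexed column; `trailing_range_scan` is a hypothetical name, and the real entries table stores encoded bytes rather than tuples.

```python
import bisect
import math


def trailing_range_scan(entries, prefix, *, gt=None, gte=None, lt=None, lte=None):
    """Slice sorted (a, b, c) tuples: equality on prefix, range on the last column."""
    # Lower edge of the slice.
    if gt is not None:
        lo = bisect.bisect_right(entries, prefix + (gt,))
    elif gte is not None:
        lo = bisect.bisect_left(entries, prefix + (gte,))
    else:
        lo = bisect.bisect_left(entries, prefix + (-math.inf,))
    # Upper edge of the slice.
    if lt is not None:
        hi = bisect.bisect_left(entries, prefix + (lt,))
    elif lte is not None:
        hi = bisect.bisect_right(entries, prefix + (lte,))
    else:
        hi = bisect.bisect_right(entries, prefix + (math.inf,))
    return entries[lo:hi]


entries = sorted([
    (5, 10, 15), (5, 10, 20), (5, 10, 25),
    (5, 10, 30), (5, 11, 40), (4, 10, 50),
])
# Equivalent of {a: 5, b: 10, c: {$gt: 20}} against an {a, b, c} index.
rows = trailing_range_scan(entries, (5, 10), gt=20)
```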

Mixed-direction compound indexes

Compound indexes accept any per-field direction ({a: 1, b: -1}, {a: -1, b: -1}, etc.). Each field is byte-encoded with encode_value_directed(value, dir) so the entries table sorts in the index’s natural order. When the trailing field is DESC, the operator semantics flip ($gt becomes upper-exclusive in byte order).
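One common way to get direction-aware byte ordering is to complement the bytes of DESC fields. The sketch below shows that technique on fixed-width unsigned integers; `encode_directed` is a hypothetical stand-in, and nothing here claims to match how encode_value_directed is actually implemented.

```python
import struct


def encode_directed(value: int, direction: int) -> bytes:
    """Toy encoder for unsigned 32-bit ints; direction is 1 (ASC) or -1 (DESC)."""
    raw = struct.pack(">I", value)  # big-endian: byte order equals numeric order
    if direction == 1:
        return raw
    return bytes(0xFF - b for b in raw)  # complementing reverses the byte order


def index_key(doc: dict, key_spec: list) -> bytes:
    """Concatenate the per-field encodings (fixed width, so no separator needed)."""
    return b"".join(encode_directed(doc[f], d) for f, d in key_spec)


docs = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 0, "b": 5}]
spec = [("a", 1), ("b", -1)]
ordered = sorted(docs, key=lambda d: index_key(d, spec))
```

Because larger DESC values encode to smaller bytes, a $gt bound on a DESC field limits the scan from above in byte order, which is the flip described in the paragraph above.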

Partial indexes

partialFilterExpression is honoured at write time (only matching docs get entries) and at query time:

coll.create_index("n", partialFilterExpression={"status": "active"})

# Uses the index — query implies the partial filter.
coll.find({"status": "active", "n": 5})

# Falls back to scan — query doesn't imply the partial filter.
coll.find({"n": 5})

The picker requires every key/value in the partial filter to appear with the same bare value in the user filter. Partial-filter keys are stripped before matching against the index key spec, so a query like {status: "active", n: 5} against a partial {n: 1} index with filter {status: "active"} correctly uses the index.

The check is conservative: operator-form clauses or document-level operators ($or, $expr, …) in the partial filter aren’t recognised as implied.
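The implication rule can be sketched in a few lines. Both function names are hypothetical, and the sketch treats every dict value as operator-form, which is a simplification (it would also reject embedded-document equality).

```python
def implies_partial(user_filter: dict, partial_filter: dict) -> bool:
    """True if every partial-filter key appears in the user filter with the same bare value."""
    for key, want in partial_filter.items():
        # Document-level operators and operator-form clauses are never "implied".
        if key.startswith("$") or isinstance(want, dict):
            return False
        if key not in user_filter:
            return False
        have = user_filter[key]
        if isinstance(have, dict) or have != want:
            return False
    return True


def strip_partial_keys(user_filter: dict, partial_filter: dict) -> dict:
    """Drop partial-filter keys so the remainder can be matched against the key spec."""
    return {k: v for k, v in user_filter.items() if k not in partial_filter}


partial = {"status": "active"}
query = {"status": "active", "n": 5}
can_use = implies_partial(query, partial)
remaining = strip_partial_keys(query, partial)
```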

Multikey fallback

SecantusDB doesn’t yet support per-element multikey indexing. Instead, an index is flagged multikey: True at insert / update / create_index time when any indexed field on a doc is a list. The flag is sticky and is never cleared. The picker skips multikey-flagged indexes, so find() falls back to a full scan + matches() and array-element queries ({tags: "python"} against {tags: ["python", "go"]}) return the correct rows.
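A minimal sketch of the sticky flag, assuming index metadata is a plain dict with "fields" and "multikey" entries. That layout and both function names are hypothetical, and dotted field paths are not handled.

```python
def note_write(index_meta: dict, doc: dict) -> None:
    """Set the sticky multikey flag if any indexed field on the doc is a list."""
    if any(isinstance(doc.get(field), list) for field in index_meta["fields"]):
        index_meta["multikey"] = True  # never reset to False afterwards


def usable_indexes(indexes: list) -> list:
    """The picker only considers indexes that are not flagged multikey."""
    return [ix for ix in indexes if not ix["multikey"]]


tags_index = {"name": "tags_1", "fields": ["tags"], "multikey": False}
note_write(tags_index, {"tags": "python"})          # scalar: flag stays False
flag_after_scalar = tags_index["multikey"]
note_write(tags_index, {"tags": ["python", "go"]})  # list: flag flips to True
note_write(tags_index, {"tags": "rust"})            # later scalars do not clear it
```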

Sort acceleration

If the sort field matches an index’s leading field, the post-sort step is skipped — find() walks the index in order:

coll.create_index([("createdAt", -1)])
list(coll.find().sort("createdAt", -1))   # forward walk of DESC index
list(coll.find().sort("createdAt", 1))    # backward walk of DESC index

Single-field sort can use any matching index — single-field or compound, ASC or DESC. Multi-field sort still falls back to in-memory sort_docs (today only single-field sort is index-accelerated).
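The forward/backward walk can be pictured with a sorted list standing in for the index. `walk_for_sort` is a hypothetical helper written for this illustration only.

```python
def walk_for_sort(entries, index_direction: int, sort_direction: int) -> list:
    """entries are already in the index's stored order; directions are 1 or -1."""
    if index_direction == sort_direction:
        return list(entries)        # forward walk
    return list(reversed(entries))  # backward walk, still no post-sort


# A DESC index on createdAt stores its entries newest-first.
desc_index = sorted([3, 1, 2], reverse=True)
newest_first = walk_for_sort(desc_index, -1, -1)
oldest_first = walk_for_sort(desc_index, -1, 1)
```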

Hints

hint is honoured on both find and aggregate:

| Hint value | Behaviour |
|---|---|
| Index name string | Walk that index |
| Key-spec dict matching an index | Walk that index |
| "$natural" | Force collection scan |
| "_id_" / {_id: 1} | Walk doc table order |

An unknown hint surfaces as a BadValue (code 2) error to the client. The hint can also align with the sort spec to skip the post-sort step.
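The hint rules above could be resolved along the following lines. This is a toy sketch: the `BadValue` class, the returned tuples, and the shape of the `indexes` mapping are all assumptions made for illustration, not SecantusDB internals.

```python
class BadValue(Exception):
    """Stand-in for the error reported to the client (code 2)."""
    code = 2


def resolve_hint(hint, indexes: dict):
    """indexes maps index name -> key spec as a list of (field, direction)."""
    if hint == "$natural":
        return ("COLLSCAN", None)
    if hint == "_id_" or hint == {"_id": 1}:
        return ("DOC_TABLE", "_id_")
    if isinstance(hint, str):
        if hint in indexes:
            return ("IXSCAN", hint)
    elif isinstance(hint, dict):
        wanted = list(hint.items())  # dicts preserve key order
        for name, spec in indexes.items():
            if spec == wanted:
                return ("IXSCAN", name)
    raise BadValue(f"unknown hint: {hint!r}")


indexes = {"ab_compound": [("a", 1), ("b", -1)], "score_1": [("score", 1)]}
by_name = resolve_hint("ab_compound", indexes)
by_spec = resolve_hint({"a": 1, "b": -1}, indexes)
```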

aggregate lifts a leading $match stage into the initial fetch’s filter so a pipeline starting with [{$match: {...}}] benefits from the same index acceleration as find.
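Lifting a leading $match can be sketched as splitting the pipeline into an initial filter and the remaining stages. `lift_leading_match` is a hypothetical name used only for this illustration.

```python
def lift_leading_match(pipeline: list):
    """Return (initial_filter, remaining_stages) for an aggregation pipeline."""
    if pipeline and "$match" in pipeline[0]:
        return pipeline[0]["$match"], pipeline[1:]
    return {}, list(pipeline)  # nothing to lift: fetch everything


pipeline = [
    {"$match": {"score": {"$gte": 80}}},
    {"$group": {"_id": "$team", "n": {"$sum": 1}}},
]
initial_filter, rest = lift_leading_match(pipeline)
```

The lifted filter can then be planned exactly like a find() filter, which is why the same indexes apply.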

TTL indexes

expireAfterSeconds is honoured by Storage.prune_ttl(db, coll, *, now=None), which walks the collection, deletes docs whose indexed datetime field is older than now - expireAfterSeconds, and removes their index entries.

The clock is injectable so tests can drive expiry deterministically. There is no background sweeper: real MongoDB prunes every 60s, whereas SecantusDB requires the caller to invoke prune_ttl explicitly. These are the right ergonomics for a test harness, because the test that wants TTL behaviour fires the prune itself.

import datetime as dt  # dt.UTC requires Python 3.11+

from pymongo import MongoClient
from secantus import SecantusDBServer

with SecantusDBServer(port=0, storage_path=":memory:") as server:
    client = MongoClient(server.uri)
    coll = client["db"]["events"]
    coll.create_index("createdAt", expireAfterSeconds=60)
    coll.insert_one({"createdAt": dt.datetime(2026, 1, 1, tzinfo=dt.UTC)})

    # Force expiry from the test:
    pruned = server.storage.prune_ttl(
        "db", "events",
        now=dt.datetime(2026, 1, 1, 0, 5, tzinfo=dt.UTC),
    )
    assert pruned == 1

Docs without the TTL field, with non-date values, or with values inside the window are left untouched.
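The pruning rule itself, with an injectable clock, can be sketched over a plain list of dicts. `prune_expired` is a hypothetical stand-in for illustration, not the body of Storage.prune_ttl, and it assumes timezone-aware datetimes throughout.

```python
import datetime as dt


def prune_expired(docs: list, field: str, expire_after_seconds: int, *, now=None) -> int:
    """Delete docs whose datetime field is older than now - expire_after_seconds."""
    if now is None:
        now = dt.datetime.now(dt.timezone.utc)
    cutoff = now - dt.timedelta(seconds=expire_after_seconds)
    survivors = [
        d for d in docs
        # Missing fields and non-date values are never pruned.
        if not (isinstance(d.get(field), dt.datetime) and d[field] < cutoff)
    ]
    pruned = len(docs) - len(survivors)
    docs[:] = survivors  # mutate in place, like a delete
    return pruned


t0 = dt.datetime(2026, 1, 1, tzinfo=dt.timezone.utc)
docs = [
    {"createdAt": t0},                                   # outside the window
    {"createdAt": t0 + dt.timedelta(seconds=270)},       # inside the window
    {"createdAt": "not a date"},                         # non-date value
    {"other": 1},                                        # field missing
]
pruned = prune_expired(docs, "createdAt", 60, now=t0 + dt.timedelta(minutes=5))
```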

explain

explain reports IXSCAN when an index would be used and COLLSCAN otherwise:

plan = coll.find({"n": 5}).explain()["queryPlanner"]["winningPlan"]
# IXSCAN: {"stage": "FETCH", "filter": ...,
#          "inputStage": {"stage": "IXSCAN", "indexName": "n_1",
#                         "keyPattern": {"n": 1}, "direction": "forward"}}
# COLLSCAN: {"stage": "COLLSCAN", "filter": ...}

Storage.explain_plan(...) mirrors find_matching’s routing decisions without executing them and is exposed on the public storage API.
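In a test, a small helper can make assertions on the plan easier to read. The sketch below walks the winningPlan shape shown above; `plan_stages` is a hypothetical helper, and it only follows a single inputStage chain.

```python
def plan_stages(plan: dict) -> list:
    """Return the stage names from the top of the plan down to the leaf."""
    stages = []
    node = plan
    while node is not None:
        stages.append(node["stage"])
        node = node.get("inputStage")
    return stages


# Plan shapes copied from the explain output described above.
ixscan_plan = {
    "stage": "FETCH",
    "inputStage": {
        "stage": "IXSCAN",
        "indexName": "n_1",
        "keyPattern": {"n": 1},
        "direction": "forward",
    },
}
collscan_plan = {"stage": "COLLSCAN"}

uses_index = "IXSCAN" in plan_stages(ixscan_plan)
```

With a live collection, the same helper would be applied to coll.find(...).explain()["queryPlanner"]["winningPlan"].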

Geospatial — 2d and 2dsphere

Both index types ship and accelerate $geoWithin / $geoIntersects / $near / $nearSphere plus the $geoNear aggregation stage. See the dedicated Geospatial page for the operator-by-operator reference, doc-side shapes accepted, distance-unit conventions across the GeoJSON / legacy-planar / legacy-spherical spec forms, and the worked deployment example.

Quick shapes:

coll.create_index([("loc", "2dsphere")])               # GeoJSON, spherical
coll.create_index([("loc", "2d")])                     # legacy [x, y] pairs, planar
coll.create_index([("loc", "2dsphere"), ("cat", 1)])   # compound geo + scalar

Natural iteration order

Walking the doc table in WT-key order yields docs in MongoDB’s natural sort order: numeric for int/float/Decimal128 (with cross-type collision preserved by the lexical-decimal encoding), chronological for ObjectId, lexical for strings, etc. update_matching(multi=False) and find() without sort walk in this order, matching mongod.

Index acceleration paths summary

| Filter shape | Path |
|---|---|
| {f: v} (bare-equality, single field) | Single-field index, prefix scan on compound, or partial-index match |
| {f: {$eq: v}} / {f: {$in: [...]}} | Same as bare-equality |
| {f: {$gt/$gte/$lt/$lte: v}} | Range scan |
| {f1: v1, f2: v2, ...} (bare-eq) | Compound prefix exact / scan |
| {f1: v1, f2: {$gt: v}} | Compound prefix + trailing operator |
| {f: array} against a multikey index | Falls back to scan |
| {partial_filter_keys, ...index_keys} | Partial-index path |
| Sort {f: ±1} aligned with an index leading field | B-tree walk in (reversed) order, no post-sort |
| hint | Forces a specific index / $natural |

Acceleration summary across index types

| Index type | Filter / sort shape | Path |
|---|---|---|
| Single-field B-tree | equality / $in / $gt/$gte/$lt/$lte / sort | IXSCAN |
| Compound B-tree | bare-eq prefix; eq prefix + trailing operator on the next column; ASC/DESC mix; multi-field sort that matches or exactly inverts the key spec | IXSCAN, no post-sort |
| Partial | when the user filter implies the partial expression | IXSCAN |
| Multikey | equality / $in / range on the array column; whole-array equality goes through the canonical key entry | IXSCAN (sort acceleration skipped: multikey doesn’t preserve a single natural order) |
| TTL | timestamp range + prune sweeper drives expiry | IXSCAN |
| 2dsphere | $geoWithin / $geoIntersects / $near / $nearSphere / $geoNear via S2 cell-covering scan | IXSCAN (see Geospatial) |
| 2d | same operators via quadtree-decomposed Z-order range scan over a bit-interleaved geohash | IXSCAN (see Geospatial) |
| Compound geo + scalar | geo column drives the cell scan; trailing scalar(s) filtered at the verifier step | IXSCAN |

What’s still missing

  • Per-index collation: createIndexes stores the option on the index spec, but entries are written in BSON codepoint order (no collation-aware sort key). Queries that carry collation fall through to COLLSCAN by design. The per-query collation infrastructure does honour collation for find / count / distinct / findAndModify; only the index-side enforcement is missing.

  • TTL background sweeper: prune_ttl is opt-in; there is no 60-second cadence sweeper. Real mongod runs one; for an in-process test surrogate the explicit-call ergonomics suit the audience better.

  • Text / hashed indexes: out of scope (no full-text engine; no practical workload pulling hashed shard-key behaviour into an in-process surrogate).