# Concurrency: what scales, what doesn't

SecantusDB is a single-process embeddable MongoDB server. This page is about what that means for **concurrent writers**: many client connections issuing inserts/updates/deletes at the same time.

The short version: **don't expect write throughput to scale with the number of concurrent writers**. The ceiling is in WiredTiger itself, not in SecantusDB's Python layer above it. If your workload depends on multi-writer scaling, run a real `mongod` instead.

## What scales fine

- **Concurrent reads.** Multiple `find` / `count` / `aggregate` calls against the same or different collections run in parallel under WiredTiger's MVCC. Reads don't block writes and don't block other reads.
- **Per-connection isolation.** Each TCP connection gets its own server thread and its own WiredTiger session. Sessions don't contend with each other for reads.
- **Single-writer throughput.** A single connection driving inserts via `insert_many` (batched) hits ~5,000 docs/s on commodity laptop hardware with logging on, or ~30,000+ docs/s with logging disabled (which trades crash durability for speed; not recommended for real workloads).

## What doesn't scale

Aggregate write throughput across multiple writer connections. **Adding writer connections does not increase aggregate throughput past N≈2, and at N=4+ it can actively decrease it.**

We measured this carefully because the question kept coming up. The benchmark and the data are at `bench/wt_poc/`; you can re-run it on your hardware to confirm.
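For a quick client-side check of the same question, a minimal sketch of a multi-writer harness follows. It is not the code in `bench/wt_poc/`; `measure_aggregate` and `write_batch` are hypothetical names introduced here, and the server address in the trailing comment is an assumption.

```python
import threading
import time
from typing import Callable, List


def measure_aggregate(n_writers: int, total_docs: int, batch: int,
                      write_batch: Callable[[int, List[dict]], None]) -> float:
    """Return aggregate docs/s when total_docs is split across n_writers threads."""
    per_writer = total_docs // n_writers
    # n_writers + 1 parties: every worker plus the timing thread.
    barrier = threading.Barrier(n_writers + 1)

    def worker(writer_id: int) -> None:
        barrier.wait()  # release all writers at the same moment
        sent = 0
        while sent < per_writer:
            size = min(batch, per_writer - sent)
            docs = [{"writer": writer_id, "seq": sent + i} for i in range(size)]
            write_batch(writer_id, docs)
            sent += size

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_writers)]
    for t in threads:
        t.start()
    barrier.wait()
    start = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return (per_writer * n_writers) / elapsed


# Hypothetical wiring against a running server (address is an assumption):
#   from pymongo import MongoClient
#   clients = [MongoClient("mongodb://localhost:27017") for _ in range(4)]
#   rate = measure_aggregate(
#       4, 50_000, 100,
#       lambda i, docs: clients[i]["bench"][f"writers_{i}"].insert_many(docs))
```

Giving each writer its own client and its own collection mirrors the "N writers writing to their own table" shape of the benchmark, so any contention you see is not caused by sharing one connection. Run it with `n_writers` set to 1, 2, 4 and 8 and compare the returned rates.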
### The headline number

`bench/wt_poc/run.py` runs the same workload (50,000 row inserts, each row ~1 KiB, partitioned across N writers writing to their own table) through three paths, two of which are tabulated here:

| N writers | Pure-C + pthread (no Python) | Python + WT SWIG bindings |
|---|---|---|
| 1 | 276,449 rows/s (1.00×) | 116,578 rows/s (1.00×) |
| 2 | 340,106 rows/s (1.23×) | 87,010 rows/s (0.75×) |
| 4 | 352,731 rows/s (1.28×) | 67,660 rows/s (0.58×) |
| 8 | 285,146 rows/s (1.03×) | 58,751 rows/s (0.50×) |

The pure-C column is the theoretical best case: pthreads, no GIL, no Python on the hot path, calling `libwiredtiger` directly. **Even that** peaks at ~1.3× of single-thread aggregate throughput (1.23× at N=2, 1.28× at N=4) and falls back to 1.03× at N=8. The bottleneck is at the WT C library level: B-tree page locks, log write serialisation, cache eviction, internal scheduler.

It's the same library `mongod` uses, but `mongod` gets multi-writer scaling by running a careful C++ scheduler above WT that takes advantage of lower-level WT primitives (per-cursor concurrency hints, parallel cursor batches, careful checkpoint coordination). SecantusDB doesn't have that scheduler, and writing one isn't a SecantusDB project; it would essentially be re-implementing `mongod`.

### Why disabling logging doesn't fix it

A natural follow-up: maybe the journal is the serialiser. We tested that with the same C benchmark and `log=(enabled=false)`:

| N writers | Pure-C + pthread, no log |
|---|---|
| 1 | 1,007,557 rows/s (1.00×) |
| 2 | 1,156,150 rows/s (1.15×) |
| 4 | 700,035 rows/s (**0.69×**) |
| 8 | 347,176 rows/s (**0.34×**) |

Single-thread is much faster (~3.6×), but multi-thread scaling is *worse*: aggregate throughput drops to 0.69× of the single-thread rate at N=4 and 0.34× at N=8. Disabling logging is a single-writer optimisation that loses crash durability AND fails to deliver concurrency.

### What this means for your workload

- **One connection doing batched writes** is the fastest configuration and what we recommend for tests / dev / single-process applications.
`pymongo`'s `insert_many` with batch=100 is ~5,000 docs/s on commodity hardware with full durability.
- **Many connections doing concurrent writes** caps around the single-writer rate and may go *slower* if you push N high. Run a real `mongod` if your workload depends on this.
- **Many connections doing concurrent reads** scales fine. Reads use MVCC snapshots and don't contend.
- **Mixed read/write at moderate N** works as expected: writes serialise, reads run in parallel against an MVCC snapshot.

## Mitigations within SecantusDB

If you genuinely need higher single-process write throughput from SecantusDB, the levers are:

1. **Batch larger.** `insert_many` with batch=100 is ~2× the throughput of `insert_one`. Going larger has diminishing returns.
2. **Reduce server-side work.** Drop indexes you don't need. Each index adds per-doc encode + WT cursor write.
3. **Disable the oplog if you don't need change streams.** Pass `replica_set_name=None` to `SecantusDBServer` (or run without `--auth` *and* without a replica-set advertisement). Halves per-write WT cursor traffic.
4. **`writeConcern: w:0`** for fire-and-forget writes: pymongo doesn't wait for the server's ack. Throughput climbs on the client side; server-side cost is unchanged.

## What we tried, what didn't work

The path we took to nail this down (preserved here so future contributors don't re-walk it):

- **Lock-decomposition** (replace global `Storage._lock` with per-collection locks + tiny `_oplog_seq_lock`). Did clean up several internal correctness issues (see the `tasks/wt-concurrency-plan.md` writeup) but didn't move multi-writer scaling. Bottleneck wasn't the Python lock layer.
- **Profiling the insert hot path** (`bench/profile_insert.py`). Showed 50%+ of wall time was in WiredTiger's SWIG-generated Python bindings (`wiredtiger/packing.py`), not in our code. Suggested a Cython rebind would help.
- **The pure-C pthread benchmark** (`bench/wt_poc/`).
Killed the Cython rebind hypothesis: even with no Python anywhere, WT itself doesn't scale past N≈2. The bindings are a constant overhead; removing them wouldn't change the multi-writer story.

The artefacts of all three exploration tracks are kept in the repo as reproducible evidence. Re-run them when somebody asks "but what if we just X?" and confirm the numbers haven't moved.

## Tracking

`tests/test_concurrency.py` is marked `xfail` (expected-fail). It encodes the goal "2 concurrent writers >= 0.7× of one", which the storage backend cannot deliver. Useful as a regression *detector*: if WiredTiger ever ships a higher-concurrency story upstream, that test will unexpectedly pass and the surprise will surface in the test logs.
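For readers who haven't used expected-fail markers, a minimal sketch of the shape such a test can take is shown here. It is not the contents of `tests/test_concurrency.py`: `measure_docs_per_s`, `scaling_ratio` and `GOAL_RATIO` are hypothetical names, and the sketch assumes the goal compares aggregate two-writer throughput against the single-writer rate, which the text above does not spell out.

```python
import pytest

GOAL_RATIO = 0.7  # from the stated goal: 2 concurrent writers >= 0.7x of one


def scaling_ratio(single_docs_per_s: float, multi_docs_per_s: float) -> float:
    """Multi-writer throughput expressed as a multiple of the single-writer rate."""
    if single_docs_per_s <= 0:
        raise ValueError("single-writer rate must be positive")
    return multi_docs_per_s / single_docs_per_s


def measure_docs_per_s(n_writers: int) -> float:
    """Hypothetical placeholder: run n_writers connections and return docs/s."""
    raise NotImplementedError("wire this to a real benchmark run")


# strict=False: an unexpected pass is reported as XPASS instead of failing the
# run, so an upstream improvement shows up in the logs without breaking CI.
@pytest.mark.xfail(reason="storage backend does not scale across writers",
                   strict=False)
def test_two_writers_reach_goal():
    single = measure_docs_per_s(1)
    double = measure_docs_per_s(2)
    assert scaling_ratio(single, double) >= GOAL_RATIO
```

Until `measure_docs_per_s` is wired to a real run, the placeholder's exception also counts as an expected failure, so the marker hides the missing implementation; passing `raises=AssertionError` to `xfail` would narrow it to genuine threshold misses.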