Concurrency: what scales, what doesn’t
SecantusDB is a single-process embeddable MongoDB server. This page is about what that means for concurrent writers — many client connections issuing inserts/updates/deletes at the same time.
The short version: don’t expect write throughput to scale with the
number of concurrent writers. The ceiling is in WiredTiger itself,
not in SecantusDB’s Python layer above it. If your workload depends on
multi-writer scaling, run a real mongod instead.
What scales fine
- Concurrent reads. Multiple find/count/aggregate calls against the same or different collections run in parallel under WiredTiger’s MVCC. Reads don’t block writes and don’t block other reads.
- Per-connection isolation. Each TCP connection gets its own server thread and its own WiredTiger session. Sessions don’t contend on each other for reads.
- Single-writer throughput. A single connection driving inserts via insert_many (batched) hits ~5,000 docs/s on commodity laptop hardware with logging on, or ~30,000+ docs/s with logging disabled (which trades crash durability for speed; not recommended for real workloads).
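The single-writer pattern described above can be sketched as follows. This is a minimal illustration, not SecantusDB code: `chunked` and `insert_in_batches` are hypothetical helper names, and `collection` is assumed to be any object exposing a pymongo-style `insert_many(list_of_docs)` method.

```python
from itertools import islice


def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_in_batches(collection, docs, batch_size=100):
    """Send docs over one connection, one insert_many call per batch.

    `collection` is assumed to expose a pymongo-style insert_many(list).
    Returns the number of documents sent.
    """
    sent = 0
    for batch in chunked(docs, batch_size):
        collection.insert_many(batch)
        sent += len(batch)
    return sent
```

With pymongo, `collection` would be something like `MongoClient(uri)["mydb"]["mycoll"]`, where `uri`, `mydb` and `mycoll` are placeholders for your own server address and names.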
What doesn’t scale
Aggregate write throughput across multiple writer connections. Adding writer connections does not increase aggregate throughput past N≈2 — and at N=4+ it can actively decrease it.
We measured this carefully because the question kept coming up. The
benchmark and the data are at bench/wt_poc/; you can re-run it on
your hardware to confirm.
The headline number
bench/wt_poc/run.py runs the same workload (50,000 row inserts,
each row ~1 KiB, partitioned across N writers, each writing to its
own table) through three paths:
| N writers | Pure-C + pthread (no Python) | Python + WT SWIG bindings |
|---|---|---|
| 1 | 276,449 rows/s (1.00×) | 116,578 rows/s (1.00×) |
| 2 | 340,106 rows/s (1.23×) | 87,010 rows/s (0.75×) |
| 4 | 352,731 rows/s (1.28×) | 67,660 rows/s (0.58×) |
| 8 | 285,146 rows/s (1.03×) | 58,751 rows/s (0.50×) |
The pure-C column is the theoretical best case: pthreads, no GIL, no
Python on the hot path, calling libwiredtiger directly. Even
that peaks at about 1.3× the single-thread aggregate throughput
(1.23× at N=2, 1.28× at N=4) and falls back to 1.03× at N=8.
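The multipliers in parentheses are each column's throughput divided by its own N=1 value. A short sketch that reproduces them from the table's numbers (the dictionary and function names are illustrative, not part of the benchmark):

```python
# Throughput in rows/s, copied from the table above.
PURE_C = {1: 276_449, 2: 340_106, 4: 352_731, 8: 285_146}
PY_SWIG = {1: 116_578, 2: 87_010, 4: 67_660, 8: 58_751}


def relative_scaling(rates):
    """Return {N: throughput(N) / throughput(1)}, rounded to 2 places."""
    base = rates[1]
    return {n: round(rate / base, 2) for n, rate in rates.items()}


for name, rates in (("pure-C", PURE_C), ("python", PY_SWIG)):
    print(name, relative_scaling(rates))
```

The same function can be applied to numbers from a re-run on your own hardware to compare scaling shapes rather than absolute rates.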
The bottleneck is at the WT C library level — B-tree page locks, log
write serialisation, cache eviction, internal scheduler. It’s the
same library mongod uses, but mongod gets multi-writer scaling by
running a careful C++ scheduler above WT that takes advantage of
lower-level WT primitives (per-cursor concurrency hints, parallel
cursor batches, careful checkpoint coordination). SecantusDB doesn’t
have that scheduler — and writing one isn’t a SecantusDB project; it
would essentially be re-implementing mongod.
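For readers who want to try the same workload shape against another backend, here is a rough sketch of a partitioned-writer timing harness. It is not bench/wt_poc/run.py: `run_partitioned` and `make_dict_writer` are hypothetical names, and the in-memory dict backend is only a stand-in so the sketch runs anywhere. Pure-Python threads are GIL-bound, so this shows the measurement shape only and says nothing about WiredTiger itself.

```python
import threading
import time


def run_partitioned(n_writers, total_rows, make_writer):
    """Split total_rows across n_writers threads and time the whole run.

    make_writer(writer_id) must return a callable write(row_no) that stores
    one row in that writer's own table. Any remainder of the integer
    division is dropped. Returns aggregate rows per second.
    """
    per_writer = total_rows // n_writers
    start_gate = threading.Barrier(n_writers + 1)

    def worker(writer_id):
        write = make_writer(writer_id)  # set-up happens before timing starts
        start_gate.wait()
        for row_no in range(per_writer):
            write(row_no)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_writers)]
    for t in threads:
        t.start()
    start_gate.wait()  # release all writers together
    started = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - started
    return (per_writer * n_writers) / elapsed


def make_dict_writer(tables):
    """Stand-in backend: each writer stores ~1 KiB rows in its own dict."""
    payload = b"x" * 1024

    def factory(writer_id):
        table = tables.setdefault(writer_id, {})

        def write(row_no):
            table[row_no] = payload

        return write

    return factory
```

Calling `run_partitioned(n, 50_000, make_dict_writer({}))` for n in 1, 2, 4, 8 gives one rate per writer count; swapping `make_dict_writer` for a factory that opens a real session and cursor is where a backend-specific benchmark would differ.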
Why disabling logging doesn’t fix it
A natural follow-up: maybe the journal is the serialiser. We tested
that — same C benchmark, log=(enabled=false):
| N writers | Pure-C + pthread, no log |
|---|---|
| 1 | 1,007,557 rows/s (1.00×) |
| 2 | 1,156,150 rows/s (1.15×) |
| 4 | 700,035 rows/s (0.69×) |
| 8 | 347,176 rows/s (0.34×) |
Single-writer throughput is much higher (about 3.6×: 1,007,557 vs 276,449 rows/s), but scaling is worse: relative to one writer, throughput drops to 0.69× at N=4 and 0.34× at N=8. Disabling logging is a single-writer optimisation that loses crash durability and still fails to deliver concurrency.
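One way to see this is to divide the no-log rate by the logged rate at each writer count, using the pure-C figures from the two tables on this page. The sketch below does only that arithmetic; the names are illustrative.

```python
# Pure-C throughput in rows/s, copied from the two tables on this page.
LOGGED = {1: 276_449, 2: 340_106, 4: 352_731, 8: 285_146}
UNLOGGED = {1: 1_007_557, 2: 1_156_150, 4: 700_035, 8: 347_176}


def logging_off_gain(logged, unlogged):
    """Return {N: unlogged(N) / logged(N)}, rounded to 2 places."""
    return {n: round(unlogged[n] / logged[n], 2) for n in logged}


print(logging_off_gain(LOGGED, UNLOGGED))
```

The gain from turning the log off shrinks from roughly 3.6× with one writer to roughly 1.2× with eight, which is consistent with the log not being the only serialisation point.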
What this means for your workload
- One connection doing batched writes is the fastest configuration and what we recommend for tests / dev / single-process applications. pymongo’s insert_many with batch=100 is ~5,000 docs/s on commodity hardware with full durability.
- Many connections doing concurrent writes caps around the single-writer rate and may go slower if you push N high. Run a real mongod if your workload depends on this.
- Many connections doing concurrent reads scales fine. Reads use MVCC snapshots and don’t contend.
- Mixed read/write at moderate N works as expected: writes serialise, reads run in parallel against an MVCC snapshot.
Mitigations within SecantusDB
If you genuinely need higher single-process write throughput from SecantusDB, the levers are:
- Batch larger. insert_many with batch=100 is ~2× the throughput of insert_one. Going larger has diminishing returns.
- Reduce server-side work. Drop indexes you don’t need. Each index adds per-doc encode + WT cursor write.
- Disable the oplog if you don’t need change streams. Pass replica_set_name=None to SecantusDBServer (or run without --auth and without a replica-set advertisement). Halves per-write WT cursor traffic.
- Use writeConcern: w:0 for fire-and-forget writes. pymongo doesn’t wait for the server’s ack. Throughput climbs on the client side; server-side cost is unchanged.
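As an illustration of the "drop indexes you don’t need" lever, here is a small sketch built on pymongo’s `Collection.index_information()` and `Collection.drop_index()`. The helper name `drop_unneeded_indexes` is hypothetical, and the sketch assumes the server answers the index listing and drop commands the way mongod does.

```python
def drop_unneeded_indexes(collection, keep):
    """Drop every secondary index whose name is not in `keep`.

    `collection` is assumed to be a pymongo-style Collection. The mandatory
    `_id_` index is always left alone. Returns the dropped names, sorted.
    """
    keep = set(keep) | {"_id_"}
    dropped = []
    for name in sorted(collection.index_information()):
        if name not in keep:
            collection.drop_index(name)
            dropped.append(name)
    return dropped
```

Passing the names of the indexes your queries actually rely on as `keep` leaves those in place; everything else stops costing an extra write per document.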
What we tried, what didn’t work
The path we took to nail this down (preserved here so future contributors don’t re-walk it):
1. Lock-decomposition (replace global Storage._lock with per-collection locks + tiny _oplog_seq_lock). Did clean up several internal correctness issues (see the tasks/wt-concurrency-plan.md writeup) but didn’t move multi-writer scaling. Bottleneck wasn’t the Python lock layer.
2. Profiling the insert hot path (bench/profile_insert.py). Showed 50%+ of wall time was in WiredTiger’s SWIG-generated Python bindings (wiredtiger/packing.py), not in our code. Suggested a Cython rebind would help.
3. The pure-C pthread benchmark (bench/wt_poc/). Killed the Cython rebind hypothesis: even with no Python anywhere, WT itself doesn’t scale past N≈2. The bindings are a constant overhead; removing them wouldn’t change the multi-writer story.
The artefacts of all three exploration tracks are kept in the repo as reproducible evidence. Re-run them when somebody asks “but what if we just X?” and confirm the numbers haven’t moved.
Tracking
tests/test_concurrency.py is marked xfail (expected-fail) — it
encodes the goal “2 concurrent writers >= 0.7× of one” which the
storage backend cannot deliver. Useful as a regression detector: if
WiredTiger ever ships a higher-concurrency story upstream, that test
will unexpectedly pass and the surprise will surface in the test logs.
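The shape of such an expected-fail test can be sketched with pytest’s `xfail` marker. This is not the contents of tests/test_concurrency.py: `meets_goal`, `measure_rate` and the test name are hypothetical, `measure_rate` is left as a placeholder, and `strict=False` is an assumption chosen so that an unexpected pass is reported as XPASS instead of failing the run.

```python
import pytest

GOAL_RATIO = 0.7  # from the text: 2 writers should reach >= 0.7x of one


def meets_goal(single_rate, dual_rate, goal=GOAL_RATIO):
    """True when two concurrent writers reach the target share of one."""
    return dual_rate >= goal * single_rate


def measure_rate(n_writers):
    """Hypothetical placeholder for a real throughput measurement."""
    raise NotImplementedError("plug in a benchmark against the server")


@pytest.mark.xfail(
    reason="storage backend does not scale across concurrent writers",
    strict=False,  # assumption: an unexpected pass shows up as XPASS
)
def test_two_writers_keep_up():
    single = measure_rate(1)
    dual = measure_rate(2)
    assert meets_goal(single, dual)
```

Running pytest with `-rxX` lists both expected failures and unexpected passes in the summary, which is where a change in upstream behaviour would become visible.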