Aggregation
aggregate runs MongoDB pipeline stages against the collection. Stages are
applied in order; each stage gets the documents emitted by the previous one.
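The in-order behaviour can be pictured as a fold over stage functions. The following is a minimal sketch in plain Python, not SecantusDB's implementation: run_pipeline, match_stage, and limit_stage are hypothetical names, and the matcher handles only top-level equality.

```python
from functools import reduce
from itertools import islice


def match_stage(query):
    """Return a stage that keeps docs whose top-level fields equal `query`."""
    def stage(docs):
        return (d for d in docs if all(d.get(k) == v for k, v in query.items()))
    return stage


def limit_stage(n):
    """Return a stage that passes through at most `n` documents."""
    def stage(docs):
        return islice(docs, n)
    return stage


def run_pipeline(docs, stages):
    """Feed each stage the documents emitted by the previous one, in order."""
    return list(reduce(lambda stream, stage: stage(stream), stages, iter(docs)))


docs = [
    {"_id": 1, "status": "open"},
    {"_id": 2, "status": "closed"},
    {"_id": 3, "status": "open"},
]

# Same two stages, different order, different result.
match_then_limit = run_pipeline(docs, [match_stage({"status": "closed"}), limit_stage(1)])
limit_then_match = run_pipeline(docs, [limit_stage(1), match_stage({"status": "closed"})])
```

Swapping the two stages changes the output because the limit sees a different input stream in each case.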
Pipeline stages
| Stage | Notes |
|---|---|
| | Same matcher as |
| | Single-doc result with the named field |
| | Standard semantics |
| | Multi-field BSON cross-type sort; uses index acceleration for single-field sorts where possible |
| | Inclusion / exclusion / computed fields. |
| | Add / overwrite fields (computed via expressions) |
| | Remove paths (string or array) |
| | Path string or doc form ( |
| | Numeric ranges and fixed-duration date units ( |
| | |
| | Replace the root with a sub-document |
| | See accumulators below |
| | Both simple ( |
| | |
| | Equivalent to |
| | Run multiple sub-pipelines in parallel |
| | |
| | |
| | Replace target collection with pipeline output |
| | Count + size metrics; capped bounds ( |
| | Session and operation enumeration |
| | Spherical ( |
| | Recursive lookup with |
| | Inline document source (5.1+) |
| | Pipeline-form change-stream entry point |
$group accumulators
$sum, $count, $avg, $min, $max, $first, $last, $push,
$addToSet. Group buckets are emitted in first-seen order (matches
unsharded MongoDB; sharded behaviour isn’t modelled).
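First-seen ordering falls out naturally when buckets live in an insertion-ordered mapping. A minimal sketch, assuming hashable group keys and a single counting accumulator; group_count is a hypothetical helper, not part of SecantusDB:

```python
def group_count(docs, key):
    """Count docs per `key` value, emitting buckets in first-seen order."""
    buckets = {}  # Python dicts preserve insertion order (3.7+)
    for doc in docs:
        k = doc.get(key)  # assumes the group key is hashable
        buckets[k] = buckets.get(k, 0) + 1
    return [{"_id": k, "count": n} for k, n in buckets.items()]


docs = [{"city": "Cork"}, {"city": "Dublin"}, {"city": "Cork"}, {"city": "Galway"}]
result = group_count(docs, "city")
```

Here "Cork" is emitted first because it was seen first, even though its second occurrence arrives after "Dublin".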
Expression operators
The $expr operator inside $match, plus computed fields in $project /
$addFields, runs through a single expression evaluator
(secantus.expressions.evaluate).
Field paths and variables
- "$x.y": path into the current doc.
- "$$ROOT" / "$$CURRENT": current doc.
- "$$varname": user variable (set via $let, $lookup’s let, or $merge’s let).
- "$$ROOT.field.path" / "$$varname.field.path": walk a dotted path into a resolved variable (useful inside $merge’s whenMatched sub-pipeline for $$new.field).
- {$literal: ...}: bypass field-path / operator interpretation.
- "$$REMOVE": sentinel that removes the field when used as a computed-field value in $project / $addFields / $setField.
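The resolution rules above can be sketched in a few lines. This is an illustration only and does not reproduce secantus.expressions.evaluate: resolve, walk, and the REMOVE object are hypothetical names, array traversal is omitted, and a missing path simply yields None.

```python
REMOVE = object()  # hypothetical stand-in for the "$$REMOVE" sentinel


def walk(value, dotted):
    """Follow a dotted path through nested dicts; None if any step is missing."""
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def resolve(expr, doc, variables=None):
    """Resolve "$path" or "$$var[.path]" against `doc`; other values pass through."""
    scope = {"ROOT": doc, "CURRENT": doc, "REMOVE": REMOVE, **(variables or {})}
    if isinstance(expr, dict) and set(expr) == {"$literal"}:
        return expr["$literal"]  # bypass field-path interpretation entirely
    if not isinstance(expr, str) or not expr.startswith("$"):
        return expr
    if expr.startswith("$$"):
        name, _, rest = expr[2:].partition(".")
        if name not in scope:
            raise KeyError(f"undefined variable: {name}")
        return walk(scope[name], rest) if rest else scope[name]
    return walk(doc, expr[1:])


doc = {"x": {"y": 5}}
new = {"x": {"y": 7}}  # e.g. the $$new variable inside a whenMatched sub-pipeline
```

With these definitions, resolve("$x.y", doc) reads from the current document, while resolve("$$new.x.y", doc, {"new": new}) walks into the user variable instead.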
Arithmetic and comparison
$add, $subtract, $multiply, $divide, $mod, $abs, $ceil,
$floor, $exp, $ln, $log, $log10, $pow, $sqrt, $round,
$trunc. Comparison: $eq, $ne, $gt, $gte, $lt, $lte, $cmp.
Logical and conditional
$and, $or, $not, $cond (dict or array form), $ifNull, $switch,
$let.
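The two $cond spellings carry the same three parts. The sketch below shows both as Python dicts, as a driver would send them, together with a hypothetical normalize_cond helper that reduces either form to an (if, then, else) triple; the field name qty is made up for the example.

```python
def normalize_cond(spec):
    """Return (if, then, else) from either the dict or the array form of $cond."""
    if isinstance(spec, dict):
        return spec["if"], spec["then"], spec["else"]
    if isinstance(spec, (list, tuple)) and len(spec) == 3:
        return tuple(spec)
    raise ValueError("$cond expects {if, then, else} or a 3-element array")


# Dict form: named branches.
dict_form = {"$cond": {"if": {"$gte": ["$qty", 100]}, "then": "bulk", "else": "retail"}}

# Array form: positional [if, then, else].
array_form = {"$cond": [{"$gte": ["$qty", 100]}, "bulk", "retail"]}
```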
Strings
$concat, $split, $trim, $ltrim, $rtrim, $substrCP, $strLenCP,
$indexOfCP, $toLower, $toUpper, $toString, $regexMatch,
$regexFind, $regexFindAll.
Arrays
$size, $arrayElemAt, $first, $last, $slice, $concatArrays,
$reverseArray, $in, $filter, $map, $reduce, $range, $zip,
$arrayToObject, $objectToArray.
Documents
$mergeObjects, $objectToArray, $arrayToObject, $setField,
$getField, $unsetField.
Dates
$year, $month, $dayOfMonth, $dayOfWeek, $hour, $minute,
$second, $millisecond, $dateToString, $dateFromString.
$dateToString and $dateFromString accept a timezone argument:
- IANA names: "Europe/Dublin", "America/New_York" (via zoneinfo).
- UTC offsets: "+05:30", "-04:00", "+0530" (fixed-offset tzinfo).
- Aliases: "UTC", "GMT", "Etc/UTC", "Etc/GMT".
$dateToString: input datetime is treated as UTC if naive (matching BSON
Date semantics) and converted to the requested zone before formatting.
$dateFromString: when the parsed string has no zone info, the requested
timezone becomes the input’s tzinfo, so the returned datetime represents
the correct UTC instant.
Unknown timezone names raise an error (no silent misinterpretation).
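The timezone rules above can be approximated with the standard library. This is a sketch under stated assumptions rather than SecantusDB's code: resolve_tz, to_zone_for_formatting, and attach_zone_after_parsing are hypothetical names, and the offset pattern covers only the "+HH:MM" and "+HHMM" shapes listed above.

```python
import re
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

_OFFSET = re.compile(r"([+-])(\d{2}):?(\d{2})")
_UTC_ALIASES = {"UTC", "GMT", "Etc/UTC", "Etc/GMT"}


def resolve_tz(name):
    """Map a timezone argument to a tzinfo; raise ValueError for unknown names."""
    if name in _UTC_ALIASES:
        return timezone.utc
    m = _OFFSET.fullmatch(name)
    if m:
        sign = 1 if m.group(1) == "+" else -1
        return timezone(sign * timedelta(hours=int(m.group(2)), minutes=int(m.group(3))))
    try:
        return ZoneInfo(name)  # IANA lookup; needs system tz data or the tzdata package
    except (ZoneInfoNotFoundError, ValueError) as exc:
        raise ValueError(f"unknown timezone: {name!r}") from exc


def to_zone_for_formatting(dt, tz_name):
    """$dateToString side: treat a naive datetime as UTC, then convert to the zone."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(resolve_tz(tz_name))


def attach_zone_after_parsing(parsed, tz_name):
    """$dateFromString side: a zone-less parse is local to tz_name; return the UTC instant."""
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=resolve_tz(tz_name))
    return parsed.astimezone(timezone.utc)


noon = datetime(2024, 1, 1, 12, 0)  # naive
formatted_local = to_zone_for_formatting(noon, "+05:30")
parsed_utc = attach_zone_after_parsing(noon, "-04:00")
```

The two directions are asymmetric on purpose: formatting starts from a UTC instant and shifts the wall clock, whereas parsing starts from a wall clock and recovers the UTC instant.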
Type checks and conversions
$type, $toInt, $toLong, $toDouble, $toBool, $toDecimal,
$toString, $toObjectId, $toDate. The $type operator returns the
BSON type alias for a value; the int32-vs-int64 distinction depends on
Python value range rather than the original BSON tag (which we throw away
on decode).
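A range-based classification of this kind might look like the sketch below. It assumes the cutoff is the signed 32-bit range and that the aliases are "int" and "long" as in MongoDB; int_type_alias is a hypothetical helper, and values beyond 64 bits are not handled.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def int_type_alias(value):
    """Return "int" or "long" for a Python int, based only on its magnitude."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected a non-bool int")
    return "int" if INT32_MIN <= value <= INT32_MAX else "long"


small = int_type_alias(5)        # fits in 32 bits
large = int_type_alias(2**31)    # one past the int32 maximum
```

One consequence of classifying by range is that a value originally stored as a 64-bit integer but small enough for 32 bits would report as "int".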
What’s not supported
- $where / $function / $accumulator: all three evaluate user-supplied JavaScript and would need an embedded JS engine, sandbox, and BSON↔JS shim layer. SecantusDB doesn’t ship a JS runtime; not on the roadmap (see tasks/backlog.md §4 for the rejected lang: "python" alternative and the trusted-plugin escape hatch).
- mapReduce: same JS-runtime dependency; also explicitly deprecated by MongoDB (removed from the Stable API in 5.0). The canonical emit(this.<field>, 1) + values.length “count by field” pattern is recognised and translated to an equivalent $group aggregation; non-canonical bodies return {results: [], ok: 1} so wire-shape probes pass.
- $densify with month / quarter / year units: rejected; would need relativedelta-style arithmetic and a new dependency.
- Text search ($text, $meta: "textScore"): would need a full-text index implementation.
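To see why the canonical mapReduce "count by field" pattern has a $group equivalent, the sketch below emulates both in plain Python. The function names are hypothetical, and the {_id, value} output shape is an assumption modelled on mapReduce results, not a statement of what SecantusDB emits.

```python
from collections import defaultdict


def map_reduce_count(docs, field):
    """Emulate emit(this.<field>, 1) followed by a reduce returning values.length."""
    emitted = defaultdict(list)
    for doc in docs:
        emitted[doc.get(field)].append(1)  # map step: emit(key, 1)
    # reduce step: the number of emitted values per key
    return [{"_id": k, "value": len(v)} for k, v in emitted.items()]


def group_sum_one(docs, field):
    """The same tally phrased as a $group with a {$sum: 1} accumulator."""
    counts = {}
    for doc in docs:
        key = doc.get(field)
        counts[key] = counts.get(key, 0) + 1
    return [{"_id": k, "value": n} for k, n in counts.items()]


docs = [{"tag": "a"}, {"tag": "b"}, {"tag": "a"}]
via_map_reduce = map_reduce_count(docs, "tag")
via_group = group_sum_one(docs, "tag")
```

Because every emitted value is 1, counting the values per key and summing 1 per document give the same number.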
Pipeline tips
- Put $match first so it can be lifted into the initial fetch’s filter and benefit from index acceleration.
- $lookup joins are O(N+M) in memory; use $match before $lookup to shrink the outer side.
- $sort followed by $limit is NOT yet a single optimised stage: the sort runs over the full collection, then the limit truncates. For test workloads this is fine; for large simulated datasets, prefer to sort by an indexed field so the sort itself is index-walked.
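One common way to get an O(N+M) equality join is a hash join: index the inner side once, then probe it once per outer document. The sketch below illustrates that idea and why filtering the outer side first helps; hash_lookup and the sample collections are hypothetical, and whether SecantusDB uses exactly this strategy is an assumption.

```python
from collections import defaultdict


def hash_lookup(outer, inner, local_field, foreign_field, as_field):
    """Equality join: index `inner` once (M steps), then probe per outer doc (N steps)."""
    index = defaultdict(list)
    for doc in inner:
        index[doc.get(foreign_field)].append(doc)
    return [
        {**doc, as_field: index.get(doc.get(local_field), [])}
        for doc in outer
    ]


orders = [
    {"_id": 1, "sku": "a", "status": "open"},
    {"_id": 2, "sku": "b", "status": "closed"},
    {"_id": 3, "sku": "a", "status": "open"},
]
items = [{"sku": "a", "name": "anvil"}, {"sku": "b", "name": "bolt"}]

# Filtering first (the $match-before-$lookup tip) leaves fewer outer docs to probe.
open_orders = [o for o in orders if o["status"] == "open"]
joined = hash_lookup(open_orders, items, "sku", "sku", "item")
```

Outer documents with no match still appear in the output with an empty array, mirroring the left-outer behaviour of $lookup.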