
Methodology

The benchmark methodology mirrors BullMQ’s upstream bullmq-bench so the numbers are directly comparable. This page lays out the host, the scenarios, the harness flags, and the BullMQ reference run.

Version

| Component | Version |
| --- | --- |
| Rust | 1.85+ (2024 edition) |
| ChasquiMQ engine | 1.0 (post-#79 main HEAD) |
| ChasquiMQ bench harness | chasquimq-bench (in-repo) |
| BullMQ | 5.76.4 (latest at measurement time) |
| BullMQ bench harness | bullmq-bench (upstream, not vendored) |
| Bun | 1.2.2 (used to run BullMQ bench) |
| Redis | 8.6.2 (Docker redis:8.6, default config, single instance, no AOF / no RDB) |
| Host | Apple M3, 8 logical cores, 8 GB RAM, macOS 15 (Darwin 24.6.0) |

Redis runs in Docker on the same machine as the bench client, so the bench process and Redis share CPU. BullMQ's published methodology uses a separate, beefier client box precisely to avoid this. Numbers in this section are therefore bounded by single-host contention and should not be compared to BullMQ's published cross-host blog numbers. They remain valid as our internal A/B because ChasquiMQ runs on the same host against the same Redis.

All four scenarios mirror bullmq-suite.ts from bullmq-bench:

| Scenario | Warmup | Bench jobs | Bulk size | Payload | Worker concurrency |
| --- | --- | --- | --- | --- | --- |
| queue-add | 1,000 | 1,000 | — (single add) | 10×10 nested UUIDs | n/a |
| queue-add-bulk | 1,000 | 10,000 | 50 | 1×1 tiny | n/a |
| worker-generic | 1,000 | 1,000 | n/a | 1×1 tiny | default (1) |
| worker-concurrent | 1,000 | 10,000 | n/a | 1×1 tiny | 100 |
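For concreteness, the payload labels above can be read roughly as follows. This is a hedged sketch: the exact builders in bullmq-suite.ts are not reproduced here, and `nested_payload` / `tiny_payload` are illustrative names, not harness code.

```rust
// Illustrative payload shapes only; the exact bullmq-suite.ts builders
// are an assumption, not reproduced source.

// "10x10 nested UUIDs": 10 keys, each holding 10 UUID-shaped strings.
fn nested_payload() -> Vec<(String, Vec<String>)> {
    (0..10)
        .map(|i| {
            let values: Vec<String> = (0..10)
                // Placeholder UUID-shaped strings, not real v4 UUIDs.
                .map(|j| format!("00000000-0000-4000-8000-{:04}{:08}", i, j))
                .collect();
            (format!("key{}", i), values)
        })
        .collect()
}

// "1x1 tiny": a single small key/value pair.
fn tiny_payload() -> (String, String) {
    ("k".to_string(), "v".to_string())
}

fn main() {
    let p = nested_payload();
    println!("{} keys x {} values each", p.len(), p[0].1.len()); // 10 keys x 10 values each
    let _ = tiny_payload();
}
```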

queue-add-bulk and worker-concurrent are the two scenarios that matter for the headline throughput claim. queue-add and worker-generic are latency-bound (single in-flight op) — useful for direction-only comparison, not for a 3× claim.

```shell
cargo run -p chasquimq-bench --release -- \
  --repeats 5 --scale 5 --discard-slowest 1 \
  --scenario queue-add,queue-add-bulk,worker-generic,worker-concurrent
```
  • --repeats 5 — run each scenario 5 times.
  • --scale 5 — multiply the harness defaults’ job count by 5 (so queue-add-bulk writes 50,000 jobs per run, not 10,000).
  • --discard-slowest 1 — drop the slowest run before computing the mean. Reduces tail influence.
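Taken together, the flags mean queue-add-bulk writes 50,000 jobs per run, and the reported mean averages the 4 fastest of 5 repeats. The discard-then-average step can be sketched as follows (illustrative helper, not the harness's actual code):

```rust
// Sketch: mean throughput after dropping the N slowest repeats.
// `discard_slowest` mirrors the --discard-slowest flag; names are illustrative.
fn mean_after_discard(mut runs_jobs_per_sec: Vec<f64>, discard_slowest: usize) -> f64 {
    // Slowest run = lowest jobs/sec; sort ascending and drop the front.
    runs_jobs_per_sec.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let kept = &runs_jobs_per_sec[discard_slowest..];
    kept.iter().sum::<f64>() / kept.len() as f64
}

fn main() {
    // 5 repeats; drop the 1 slowest (62k), average the remaining 4.
    let runs = vec![70_000.0, 68_000.0, 62_000.0, 71_000.0, 69_000.0];
    println!("{}", mean_after_discard(runs, 1)); // 69500
}
```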

The harness reports mean, p50, p95, p99, stddev, CPU% (× core), and jobs/CPU-sec — see The 1.0 numbers for the full distribution table.
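The percentile columns are order statistics over the per-run samples. A sketch using the nearest-rank method (the harness's exact interpolation isn't specified here):

```rust
// Sketch: nearest-rank percentile over a sample set.
// The harness may interpolate differently; this is illustrative.
fn percentile(samples: &mut [f64], p: f64) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = samples.len();
    // Nearest-rank: smallest index whose rank covers p% of the samples.
    let rank = ((p / 100.0) * n as f64).ceil() as usize;
    samples[rank.max(1) - 1]
}

fn main() {
    let mut latencies_ms = vec![1.0, 2.0, 3.0, 4.0, 100.0];
    println!("p50 = {}", percentile(&mut latencies_ms, 50.0)); // p50 = 3
    println!("p99 = {}", percentile(&mut latencies_ms, 99.0)); // p99 = 100
}
```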

```shell
cd ~/Projects/experiments/bullmq-bench
BULLMQ_BENCH_REDIS_HOST=127.0.0.1 bun src/index.ts
```

The upstream harness runs each scenario once per invocation. We run it once per side-by-side measurement. The harness pre-loads warmupJobsNum + benchmarkJobsNum jobs, starts the stopwatch when the warmup count is hit, stops at the total, and reports 1000 * jobsTotal / time_ms as jobs/sec.
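The reported figure reduces to a single formula. The upstream harness computes it in TypeScript; the Rust below is an illustrative restatement:

```rust
// The upstream harness's throughput formula:
// jobs/sec = 1000 * jobsTotal / time_ms,
// where the stopwatch covers only the post-warmup window.
fn jobs_per_sec(jobs_total: u64, time_ms: f64) -> f64 {
    1000.0 * jobs_total as f64 / time_ms
}

fn main() {
    // e.g. 10,000 bench jobs processed in 250 ms -> 40,000 jobs/sec
    println!("{}", jobs_per_sec(10_000, 250.0)); // 40000
}
```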

Note: the upstream bullmq-bench repo’s package.json says "bullmq": "latest" but the lockfile pinned an older 4.x. Run bun add bullmq@latest after cloning if you re-baseline against current BullMQ.

Both runs use:

  • Loopback Redis (no network).
  • No enableAutoPipelining on the BullMQ side (it hurts the worker scenarios by 38% on loopback — see Performance trade-offs).
  • No persistence (default in-memory Redis).
  • The same M3 host, ideally with the same load avg.
  • No core pinning for either process: the macOS scheduler places the bench client and Redis freely across all 8 cores.

worker-concurrent is the most contention-sensitive scenario in the suite. The engine ceiling reproduces only on a quiet host (load avg < 1). On a contended Mac (browser, other agents, multiple Docker containers), the ceiling drops by 73% even though the engine code didn’t change.

The honest pattern: report quiet-host and contended-host numbers side by side. The quiet-host run is the engine ceiling — the marketing-defensible number. The contended-host run is what users will see when they reproduce on a busy laptop. Calibrating against your own environment matters.

The host-load gate formalizes this: a “host contention” explanation for a worker-concurrent regression is valid only when the engine code didn’t change between runs.

We tested pinning Redis to cores 0–3 with docker run --cpuset-cpus="0-3" ... to see if separating Redis from the bench client would push throughput up. It hurt by 10–15% on the throughput-bound scenarios. The OS scheduler still places Bun on cores 0–3 (the cpuset constrains the Redis container, not other processes), so the result was sharing 4 cores between bench client and Redis instead of 8 — strictly worse.

To genuinely separate the bench client from Redis on this host, we'd also need to pin Bun (e.g., taskset -c 4-7 on Linux; macOS exposes no taskset equivalent). The unpinned baseline is therefore the realistic single-host configuration.

Two open caveats:

  • Latency p99. The harness reports per-scenario distribution stats but doesn’t aggregate dispatch-to-ack p99. The handler_duration_us instrumentation exists; the bench wrapper hasn’t aggregated it yet.
  • Worker CPU% vs. BullMQ. ChasquiMQ’s bench measures its own CPU%. bullmq-bench does not. To claim “≥50% less worker CPU” defensibly, we’d need to instrument BullMQ’s bench process with top -pid or a CPU-aware wrapper. Not done end-to-end.

The 1.0 ship deliberately doesn’t claim a CPU% delta against BullMQ — only ChasquiMQ’s own CPU% (jobs/CPU-sec) is reported.
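jobs/CPU-sec normalizes throughput by CPU time rather than wall time. A minimal sketch, assuming CPU% is reported as a percentage of one core (so 250% means 2.5 cores busy):

```rust
// Sketch: jobs per CPU-second. Illustrative, not the harness's code.
// cpu_pct_of_core: CPU% where 100.0 = one core fully busy.
fn jobs_per_cpu_sec(jobs_total: u64, wall_secs: f64, cpu_pct_of_core: f64) -> f64 {
    let cpu_secs = wall_secs * cpu_pct_of_core / 100.0;
    jobs_total as f64 / cpu_secs
}

fn main() {
    // 50,000 jobs in 2 s at 250% CPU -> 5 CPU-seconds -> 10,000 jobs/CPU-sec
    println!("{}", jobs_per_cpu_sec(50_000, 2.0, 250.0)); // 10000
}
```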

For the actual numbers: The 1.0 numbers. For the host-load reasoning: Regressions and floors.