Stress Testing Bugsink
If you’re expecting more than a few events per second (say, 5/s or half a million per day), it’s worth knowing how far your setup will stretch before it starts dropping, lagging, or breaking down. Bugsink comes with a simple stress testing tool to help you get a rough idea.
This article explains how to use it, what tradeoffs to keep in mind, and how to get a sense of whether you’re ingesting events faster than you can digest them.
These are my first notes on stress testing Bugsink. It’s not a polished guide, just something I wanted to share and refer back to.
Why stress test?
Stress testing can help answer simple but important questions:
- Can my setup keep up with production traffic?
- Where do things slow down, or fall over?
- Can I tune anything to make it better?
The answers depend on your data, your hardware, and your database – so it’s better to measure than to guess.
General setup for high-throughput
If you’re stress testing, you’re likely operating at the scale where external event storage is worth considering.
Bugsink’s default setup stores full event payloads in the database. That’s great for simplicity, but has downsides at scale:
- Disk usage is tied directly to database size
- Migrations get slower and riskier
- Backups and restores become unwieldy
Bugsink supports storing event payloads as flat files (or other backends) while keeping the metadata in the DB. It’s a good idea in any high-throughput setup, not just for archival. For details, see Moving Event Data Out of the Database.
Remove Rate-Limits & Set Quota
To get meaningful results from a stress test, you’ll need to remove any rate limits and ensure the event quota is high enough.
Start by adjusting the per-project rate limits in your configuration:
MAX_EVENTS_PER_PROJECT_PER_HOUR
MAX_EVENTS_PER_PROJECT_PER_5_MINUTES
Set these high enough that your test traffic won’t get 429’d. After changing the values and restarting the server, give it about 5 minutes to register the new settings. Bugsink doesn’t apply the change instantly; it needs to observe a full interval under the new configuration before accepting more events.
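For example, a minimal sketch of bumping both limits, assuming your settings live in a bugsink_conf.py you can edit directly (the path, the values, and the config mechanism are assumptions here; adapt them to however your deployment is configured, e.g. when running in Docker):

# Sketch: append higher per-project rate limits, then restart Bugsink.
cat >> /path/to/bugsink_conf.py <<'EOF'
MAX_EVENTS_PER_PROJECT_PER_HOUR = 1_000_000
MAX_EVENTS_PER_PROJECT_PER_5_MINUTES = 100_000
EOF
# Restart the server and wait ~5 minutes, as described above.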
Next, in the Bugsink UI, check the project’s retention settings. Specifically:
project.retention_max_event_count
Set this low enough (or send enough events) that evictions are actually triggered during your test. Evictions take time, and you want to measure the system under real-world conditions; a very high value can give an unrealistically smooth result by deferring eviction pressure until after the run.
At the same time, set it high enough that the database accumulates enough data to become realistically slow (various operations take longer with more data). A reasonable rule of thumb: use roughly the number of events you expect to keep around in production, because that's the retention you'll actually be running with.
Sample data
The stress test tool needs a sample event to send. The best way to get one is to download one of your own real events, preferably one that’s slightly larger than average. You can do this from the Bugsink UI, where it says “Download”.
Alternatively, you can pick a sample event from the Bugsink sample event repository. This is a good option if you want a payload of a known size, for example to match what another user reported using, or to control event size as a variable in your test.
Running the stress test
The stress test is a command-line tool that sends a lot of events to Bugsink, and measures how long it takes to process them.
The stress test tool is included with the default Bugsink install, but that’s not always where you want to run it. In many cases you’ll want to generate load from a different machine, and running it locally can be awkward anyway, especially if you’re using Docker.
To install it independently (assuming you have Python, pip, and a virtualenv set up):
pip install bugsink
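If you’re starting from a bare machine, the whole setup is only a few commands (a sketch; the virtualenv name is arbitrary):

# Create an isolated environment on the load-generating machine
python3 -m venv stress-venv
. stress-venv/bin/activate
# Installing the bugsink package also installs the stress test tool
pip install bugsink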
The command itself is a subcommand of the bugsink-manage tool called stress_test, and can be run like this:
bugsink-manage stress_test [options] sample.json
Important options:
- --dsn DSN: where to send the data
- --threads: how many threads to run
- --requests: how many requests per thread
- --compress: choose from gzip, deflate, or br; leave blank to disable compression
The total number of events sent is the product of the two: requests × threads (e.g. 100 requests on 10 threads sends 1,000 events).
An example:
bugsink-manage stress_test sample.json \
--dsn=https://a2f..1a2@yourhost.com/1 \
--requests 100 --threads 10 --compress=br
You can also loop this command to keep sending data until you stop it:
while true; do bugsink-manage stress_test ...; done
Optional modifiers
There are several flags to vary or randomize the data:
- --fresh-timestamp: set the timestamp to now
- --fresh-trace: generate new span/trace IDs
- --random-type: randomize the exception type
- --tag foo:bar: override or add tags (use RANDOM as the value to generate one)
These are useful to simulate a more realistic stream of varied input, or to bypass duplicate-detection.
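For example, a sustained run that varies the data on every iteration could look like this (a sketch: the DSN is a placeholder and release is just an arbitrary tag key chosen for illustration):

# Keep sending varied, non-duplicate events until interrupted
while true; do
  bugsink-manage stress_test sample.json \
    --dsn=https://your-key@yourhost.com/1 \
    --requests 100 --threads 10 --compress=gzip \
    --fresh-timestamp --fresh-trace --random-type \
    --tag release:RANDOM
done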
Tradeoffs and caveats
Some things to keep in mind:
- Data is preloaded. The command loads all requests into memory before sending. Large payloads or large request counts may lead to OOM crashes (of your test script, not Bugsink).
- Single-machine pressure. If you run the command on the same machine as the server under test, you’re stressing CPU, disk, and I/O all on the same box.
- Network pressure. The test is limited by the network speed between the client and server. If you’re running on a different machine, make sure the network isn’t the bottleneck.
- Synthetic traffic. Because the data is based on a single sample, it may not be representative of your real traffic. The --random-type, RANDOM tag-value, and --fresh-trace options help generate more realistic traffic.
If you want to tweak behavior, the code is not hard to follow – see the stress_test command in the Bugsink repo on GitHub.
Measuring digestion
Just because your server accepts 100 events per second doesn’t mean it can process them. Bugsink separates ingestion from digestion (in the recommended configuration), and digestion speed is often the real bottleneck.
Details on how to measure digestion speed are in the Snappea Stats article; the short version is:
bugsink-manage showstat snappea-stats --task-name=digest
(Give it a few minutes to actually digest some events before running the command.)
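To see how digestion keeps up over the course of a run, one simple approach is to poll that same command, e.g. with watch (assuming it’s available on your system):

# Re-run the digestion stats every 60 seconds while the stress test runs
watch -n 60 bugsink-manage showstat snappea-stats --task-name=digest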
Please share your results
Performance varies a lot based on hardware, database engine, and configuration. If you run these tests, please consider:
- Sharing your results (publicly if you can)
- Trying different setups (Postgres vs. MySQL, different thread counts, compression settings)
- Reporting bottlenecks or surprising results
You’ll help yourself – and the rest of us – get a better sense of how Bugsink behaves under load.