Housekeeping

At high volume or over time, Bugsink can fill up disk space. Although it’s designed to minimize manual cleanup, larger or long-running installations may require occasional housekeeping to keep the system running smoothly. This page outlines the key areas to focus on.

Note: since Bugsink is a relatively new project, a lot of the housekeeping features are still evolving and somewhat scattered across different parts of the system. This page serves as a starting point for understanding the current state of housekeeping in Bugsink, but expect more unified and comprehensive documentation in the future.

On this page, we will cover:

  • Where to look: what kinds of things can fill up, and how to detect whether cleanup is needed
  • What to do: how to clean up the things that can fill up, which commands are available, and how to automate them

We’ll focus on Bugsink itself, but a later section briefly touches on database-level cleanup, since the database is typically the thing that fills up first and is the most opaque.

General setup tips

If you’re considering how to run cleanup tasks, you’re likely operating at the scale where external event storage is worth considering.

Bugsink’s default setup stores full event payloads in the database. That’s great for simplicity, but has downsides at scale:

  • Disk usage is tied directly to database size
  • Migrations get slower and riskier
  • Backups and restores become unwieldy
  • It’s more opaque: seeing what is taking up space is harder

Bugsink supports storing event payloads as flat files (or other backends) while keeping the metadata in the DB. This is a good idea in any high-throughput setup, not just for archival. For details, see moving event data out of the database.

Event eviction and retention

Bugsink automatically deletes events over time to manage disk usage. This is based on a smart retention algorithm that tries to keep the most relevant events while discarding older or less useful ones.

The default number of events to keep per-project is 10,000; this setting can be adjusted in the project settings UI. Retention is applied automatically during normal operation (as part of the digest process), so you don’t need to run a separate cleanup job for this.

When an event is deleted, Bugsink also removes most related data like tags and metadata, although tag values may become orphaned (see below).

For more background, see Rate Limits and Retention.

Locations on disk

These are the main places where Bugsink stores data that can grow over time.

  • Database: Primary storage for all Bugsink-related data, including issues, tags, etc. In the default setup this includes the event payload verbatim; if you’re using the event storage feature, the event payloads are stored in a separate location, but the metadata is still in the database.

  • Ingestion store: holds events while they are being ingested. This location is short-lived in principle, but on a high-volume Bugsink it can fill up in proportion to the backlog of events waiting to be processed. It may also fill up if the ingestion worker is not running, is misconfigured, or is crashing. Configured as INGEST_STORE_BASE_DIR; the default is /tmp/bugsink/ingestion.

  • File event storage: if you’re using the file event storage feature, this is where the event payloads are stored. This is configured in bugsink_conf.py via BUGSINK["EVENT_STORAGES"], or in Docker via FILE_EVENT_STORAGE_PATH and FILE_EVENT_STORAGE_USE_FOR_WRITE.
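To see how much space the on-disk locations above occupy, a small script can sum the file sizes under each directory. The following is a minimal sketch, not part of Bugsink: the function names are made up for illustration, and only the default ingestion path /tmp/bugsink/ingestion is taken from this page. Any file event storage path is whatever you configured yourself.

```python
import os


def directory_size_bytes(path):
    """Return the total size in bytes of all regular files below `path`.

    A missing directory counts as 0 bytes, because os.walk silently
    yields nothing for paths it cannot list.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                # lstat: do not follow symlinks, count the link itself
                total += os.lstat(file_path).st_size
            except OSError:
                # Files in the ingestion store are short-lived and may
                # disappear between listing and stat; skip those.
                continue
    return total


def size_report(paths):
    """Map each path to its total size in bytes."""
    return {path: directory_size_bytes(path) for path in paths}
```

For example, `size_report(["/tmp/bugsink/ingestion"])` returns a dictionary with one entry for the default ingestion store. A value that keeps growing between runs suggests a processing backlog or a worker that is not running.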

Identifying what needs cleanup

You can get a rough overview of the contents of the database using the /counts/ page in the UI (as a superuser; simply visit https://YOURBUGSINK/counts/). On this page you can see counts of various objects in the system, such as issues, events, tags, and more.

This can help you get a sense of how much data is stored in your database, and can serve as a starting point for identifying what might need cleanup.

For example, if you see a very high number of tags or events compared to issues, it may indicate that there are orphaned rows that need cleanup. Alternatively: if you see relatively low numbers across the board, but your database keeps growing, your problem may be at the level of the database itself, and you may need to look into database-level cleanup.

Figure: per-table row counts on the counts page in the Bugsink UI, showing counts of various objects in the system.

Bugsink’s built-in cleanup features

Use these tools to detect and resolve housekeeping issues:

  • bugsink-manage vacuum_tags: Removes unused TagKey and TagValue entries left behind when Events or Issues are deleted. For efficiency reasons, unused tag values are not checked at each event eviction or issue deletion, so run this periodically to keep the tag tables from growing indefinitely.

  • bugsink-manage cleanup_eventstorage <storage>: Removes stored event payloads that no longer have a matching Event in the database. In theory this already happens during Event deletion, but in practice it is not always reliable, because the event storage is decoupled from the database by design.

  • bugsink-manage make_consistent [--dry-run]: Deletes dangling objects (Events, Issues, etc.) and updates counters. In theory Bugsink keeps these consistent itself, but the command may be needed if the database was modified directly, or possibly after a crash (that would be a bug in Bugsink, so feel free to report it on GitHub).

It might make sense to set up a periodic job (e.g. daily or weekly) to run vacuum_tags and cleanup_eventstorage, while make_consistent is typically run on demand.
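One way to set up such a periodic job is a small wrapper script that a scheduler (cron, a systemd timer, or similar) calls daily or weekly. The sketch below is an illustration under assumptions, not an official Bugsink tool: it assumes bugsink-manage is on the PATH of the user running the job, and the storage name passed in is a placeholder for whatever name your own event storage configuration uses. Only the two subcommands themselves come from the list above.

```python
import subprocess


def build_commands(storage_name):
    """Return the periodic housekeeping commands, in the order to run them.

    `storage_name` is the <storage> argument of cleanup_eventstorage; the
    value is specific to your installation.
    """
    return [
        ["bugsink-manage", "vacuum_tags"],
        ["bugsink-manage", "cleanup_eventstorage", storage_name],
    ]


def run_housekeeping(storage_name, runner=subprocess.run):
    """Run each housekeeping command in turn.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit status, so a failing step stops the run and the
    scheduler can report the failure.
    """
    for command in build_commands(storage_name):
        runner(command, check=True)
```

Calling `run_housekeeping("<storage>")` with your own storage name from the scheduled script runs both commands. make_consistent is deliberately left out here, since it is typically run on demand.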

Deleting projects and issues may make sense too, though you should be aware that deleting issues is often not what you want.

DB-level cleanup

All relational databases have trade-offs around how they handle deletions and disk usage. Most don’t immediately reclaim space when rows are deleted; they mark the space as reusable instead. Over time, especially with frequent insert/delete cycles, this can cause database files to grow unexpectedly.

If you suspect this is happening, you may need to run a vacuum or similar command to reclaim space. The exact command depends on your database; search for terms like

  • VACUUM <your database>
  • table bloat <your database>
  • optimize table <your database>

or consult the docs for your specific backend.
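The effect described above can be reproduced with SQLite using only Python’s standard library. This is a self-contained demonstration on a throwaway database, not a procedure for a live Bugsink database: the table name and function name are invented for the example, and other backends use different commands. It shows that deleting rows leaves the file size unchanged, while VACUUM shrinks it.

```python
import os
import sqlite3


def measure_vacuum_effect(db_path, rows=2000):
    """Create a scratch SQLite database and return three file sizes:
    (after inserting rows, after deleting them, after VACUUM)."""
    conn = sqlite3.connect(db_path)
    try:
        # Make the behaviour explicit: freed pages are kept for reuse
        # rather than returned to the filesystem.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute("CREATE TABLE payload (id INTEGER PRIMARY KEY, body TEXT)")
        conn.executemany(
            "INSERT INTO payload (body) VALUES (?)",
            (("x" * 1000,) for _ in range(rows)),
        )
        conn.commit()
        size_full = os.path.getsize(db_path)

        conn.execute("DELETE FROM payload")
        conn.commit()
        size_after_delete = os.path.getsize(db_path)

        # VACUUM cannot run inside a transaction, hence the commit above.
        conn.execute("VACUUM")
        size_after_vacuum = os.path.getsize(db_path)
    finally:
        conn.close()
    return size_full, size_after_delete, size_after_vacuum
```

On a real SQLite database, keep in mind that VACUUM rebuilds the whole file: it needs free disk space for a temporary copy and blocks other writers while it runs, so it is best done during a quiet period and after taking a backup.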

Future improvements

Bugsink is a relatively new project, and many of the housekeeping features are still evolving. The goal is to eventually have more automated and comprehensive cleanup processes that require less manual intervention.

At the time of writing, some of these improvements are tracked as work-in-progress issues on GitHub, where you can follow their progress.

Conclusion

Bugsink is built to clean up after itself where possible, deleting related data when issues or events are removed. Still, over time, and especially at higher volumes, some manual housekeeping may be needed to free up disk space or clarify what’s actually being stored.

The tools and options described above help you stay ahead of that, until more of this becomes automatic.