Skip to content

storage: override Pebble's ExitFunc to route through fatal logging#166840

Draft
williamchoe3 wants to merge 1 commit into
cockroachdb:masterfrom
williamchoe3:pebble-exit-func-crdb
Draft

storage: override Pebble's ExitFunc to route through fatal logging#166840
williamchoe3 wants to merge 1 commit into
cockroachdb:masterfrom
williamchoe3:pebble-exit-func-crdb

Conversation

@williamchoe3

Copy link
Copy Markdown
Contributor

Summary

Set pebble.Options.ExitFunc so that Pebble invariant violation exits (cache assertions, memtable checks, sstable reader checks) route through CockroachDB's log.Fatal path instead of calling os.Exit(1) directly.

Without this, Pebble's 18 direct os.Exit(1) calls bypass all logging, crash reporting (Sentry), and marker file writing — the process just vanishes with exit code 1 and no indication of why.

Depends on: cockroachdb/pebble#5865 (adds Options.ExitFunc to Pebble)

Note

This PR will not compile until the Pebble dependency is updated to include cockroachdb/pebble#5865.

Set pebble.Options.ExitFunc so that Pebble invariant violation exits
(cache assertions, memtable checks, sstable reader checks) route
through CockroachDB's log.Fatal path instead of calling os.Exit(1)
directly. This gives visibility into why the process died via log
lines, Sentry crash reports, and fatal exit marker files.

Requires cockroachdb/pebble#5865.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@trunk-io

trunk-io Bot commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@blathers-crl

blathers-crl Bot commented Mar 26, 2026

Copy link
Copy Markdown

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@williamchoe3

Copy link
Copy Markdown
Contributor Author

Alternative approach: skip log.Fatal, write marker file directly

The current implementation routes Pebble's os.Exit(1) calls through log.Fatal, which gives us log lines, Sentry reports, and marker files — but runs the full logging pipeline on potentially corrupted in-process memory (these exits guard invariant violations in Pebble's cache, memtable, etc.).

A lighter approach that minimizes the window of running with corrupted state:

cfg.opts.ExitFunc = func(code int) {
    msg := fmt.Sprintf("pebble: invariant violation (exit code %d)", code)
    // Write marker file directly — no mutex, no logging infrastructure
    for _, spec := range stores {
        auxDir := filepath.Join(spec.Path, "auxiliary")
        _ = os.WriteFile(filepath.Join(auxDir, "_FATAL_EXIT.txt"), []byte(msg), 0644)
    }
    // Exit immediately — don't go through log.Fatal
    os.Exit(code)
}

Tradeoffs:

log.Fatal approach (current) Direct os.Exit approach (alt)
Marker file for roachtest
Log line in cockroach logs
Sentry crash report
_CRITICAL_ALERT.txt ❌ (unless added) ❌ (unless added)
Risk of running on corrupted state Higher (mutex acquisition, network call, disk flushes) Minimal (one os.WriteFile then exit)

The direct approach is safer for invariant violations where in-process memory may be corrupted (e.g. cache linked list pointers, reference count underflow). The log.Fatal approach gives better observability but relies on the logging infrastructure's mutexes being independent from the corrupted Pebble state — which is likely but not guaranteed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants