sharedcache: implement minimal version of metadata persistence by joshimhoff · Pull Request #2577 · cockroachdb/pebble

joshimhoff · 2023-05-29T16:17:50Z

Part of #2542.

sharedcache: implement minimal version of metadata persistence

This commit implements an approach to persisting metadata proposed by Radu.
The metadata being persisted is the mapping from <filenum, logical block
offset> to the index of the cache block at which the data is located on disk.
The basic approach is to write changes to the metadata, plus metadata pointed
at by a pointer based on a sequence number, to a circular log that can be
replayed at server start time.

As of this commit, metadata persistence is neither performant nor safe in the
presence of crashes or certain errors happening at file write time. We also
need to implement the working set idea proposed by Radu, but this commit is
large enough without that.
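
To make the replay idea concrete, here is a minimal sketch of rebuilding the in-memory mapping from a log. It is not the code in this PR: the `key` and `logEntry` types, their field names, and the `replay` function are hypothetical, and the sketch assumes each log entry carries its own sequence number so that entries can be applied oldest first.

```go
package main

import (
	"fmt"
	"sort"
)

// key identifies a logical block of a file (hypothetical type).
type key struct {
	fileNum            uint64
	logicalBlockOffset uint64
}

// logEntry is one metadata change read back from the log (hypothetical type).
type logEntry struct {
	seq             uint64
	k               key
	cacheBlockIndex int64
}

// replay rebuilds the <filenum, logical block offset> -> cache block index
// mapping. Entries are applied in ascending sequence order, so a newer entry
// for the same key overrides an older one.
func replay(log []logEntry) map[key]int64 {
	sorted := append([]logEntry(nil), log...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].seq < sorted[j].seq })
	m := make(map[key]int64, len(sorted))
	for _, e := range sorted {
		m[e.k] = e.cacheBlockIndex
	}
	return m
}

func main() {
	// A circular log is not stored in sequence order once it has wrapped.
	log := []logEntry{
		{seq: 3, k: key{1, 0}, cacheBlockIndex: 7},
		{seq: 1, k: key{1, 0}, cacheBlockIndex: 2},
		{seq: 2, k: key{2, 32768}, cacheBlockIndex: 5},
	}
	m := replay(log)
	fmt.Println(m[key{1, 0}], m[key{2, 32768}])
}
```

Sorting by sequence number is what lets the replay ignore where in the circular log an entry physically sits.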

RaduBerinde

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick and @joshimhoff)

objstorage/objstorageprovider/sharedcache/shared_cache.go line 219 at r1 (raw file):

	file              vfs.File
	numDataBlocks     int64
	numMetadataBlocks int64

The cache block size shouldn't affect how we structure the metadata.

objstorage/objstorageprovider/sharedcache/shared_cache.go line 246 at r1 (raw file):

// TODO(josh): We should make this type more space-efficient later on.
type metadataLogEntryBatch struct {

I think our basic structure should be something that is 4KB in size (4KB is typically the native storage block size) and stores as many records as fit in that size. We don't want to write less than 4KB at a time because that will end up being a read-modify-write under the covers.

The current batch can keep accumulating entries, and whenever we need to flush we rewrite the entire 4KB block; when the batch gets full, we move to the next metadata block and start a new batch.

objstorage/objstorageprovider/sharedcache/shared_cache.go line 285 at r1 (raw file):

	}

	// Compute the number of cache blocks used for metadata and for data.

I think it would make sense to separate the metadata and data into different files. It will make things less error-prone.

objstorage/objstorageprovider/sharedcache/shared_cache.go line 319 at r1 (raw file):

		logPtr := unsafe.Pointer(&logAsBytes[0])
		log = unsafe.Slice((*metadataLogEntryBatch)(logPtr), s.numDataBlocks)

This unsafe stuff won't work across different architectures.
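
Reinterpreting raw bytes as a struct slice bakes in the host's byte order and the compiler's struct layout. One portable alternative, sketched here as an assumption rather than as what the PR ends up doing, is to serialize with `encoding/binary` and an explicit byte order. The `logEntry` type and its fields are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// logEntry is a hypothetical fixed-size record. Every field has an explicit
// width, and the fields are exported so encoding/binary can set them.
type logEntry struct {
	FileNum            uint64
	LogicalBlockOffset uint64
	CacheBlockIndex    uint64
}

// encodeEntries writes the entries in little-endian order regardless of the
// host architecture.
func encodeEntries(entries []logEntry) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, entries); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeEntries reads n entries back using the same explicit byte order.
func decodeEntries(raw []byte, n int) ([]logEntry, error) {
	entries := make([]logEntry, n)
	if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, entries); err != nil {
		return nil, err
	}
	return entries, nil
}

func main() {
	in := []logEntry{{1, 0, 7}, {2, 32768, 5}}
	raw, err := encodeEntries(in)
	if err != nil {
		panic(err)
	}
	out, err := decodeEntries(raw, len(in))
	if err != nil {
		panic(err)
	}
	fmt.Println(binary.Size(logEntry{}), len(raw), out)
}
```

The reflection-based `binary.Read` and `binary.Write` are slower than a cast; hand-written `PutUint64` and `Uint64` calls are the usual faster option with the same portability.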

objstorage/objstorageprovider/sharedcache/shared_cache.go line 592 at r1 (raw file):

	}

	// TODO(josh): Keep seq from overflowing, by moding by something, or similar.

We can make it a uint64 and not worry about overflow (especially if we'll have one per 4KB block).
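
A quick back-of-the-envelope check of the overflow claim. The rate of one million increments per second is an assumed, deliberately pessimistic figure, and `yearsToOverflow` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// yearsToOverflow returns roughly how many years a uint64 counter lasts when
// it is incremented at the given rate.
func yearsToOverflow(incrementsPerSecond float64) float64 {
	const secondsPerYear = 365.25 * 24 * 60 * 60
	return float64(math.MaxUint64) / incrementsPerSecond / secondsPerYear
}

func main() {
	// Assumed rate: one sequence number per microsecond.
	fmt.Printf("%.0f years\n", yearsToOverflow(1e6))
}
```

Even at that rate the counter lasts on the order of 584,000 years, and one sequence number per 4KB block would advance far more slowly.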

joshimhoff

Thank you for the feedback!

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick, @joshimhoff, and @RaduBerinde)

objstorage/objstorageprovider/sharedcache/shared_cache.go line 219 at r1 (raw file):