sharedcache: implement minimal version of metadata persistence#2577
sharedcache: implement minimal version of metadata persistence#2577joshimhoff wants to merge 1 commit into
Conversation
96b8b88 to
79b4f34
Compare
ad88620 to
8bf3ec6
Compare
This commit implements an approach to persisting metadata thought of by Radu. The metadata being persisted is the mapping from <filenum, logical block offset> to the index of the cache block at which the data is located on disk. The basic approach is to write changes to the metadata, plus metadata pointed at by a ptr based on a sequence number, to a circular log that can be replayed at server start time. As of this commit, metadata persistence is neither performant nor safe in the presence of crashes or certain errors happening at file write time. We need to also implement the working set idea thought of by Radu, but this commit is large enough without that.
8bf3ec6 to
ed318e1
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick and @joshimhoff)
objstorage/objstorageprovider/sharedcache/shared_cache.go line 219 at r1 (raw file):
file vfs.File numDataBlocks int64 numMetadataBlocks int64
The cache block size shouldn't make a difference around how we structure the metadata.
objstorage/objstorageprovider/sharedcache/shared_cache.go line 246 at r1 (raw file):
// TODO(josh): We should make this type more space-efficient later on. type metadataLogEntryBatch struct {
I think our basic structure should be something that is 4KB in size (4KB is typically the native storage block size) and can store as many records that fit in that size. We don't want to write less than 4KB at a time because that will end up being a read-modify-write under the covers.
The current batch can keep accumulating entries and whenever we need to flush we rewrite the entire 4KB every time; when the batch gets full we move to the next metadata block and start a new batch.
objstorage/objstorageprovider/sharedcache/shared_cache.go line 285 at r1 (raw file):
} // Compute the number of cache blocks used for metadata and for data.
I think it would make sense to separate the metadata and data in different files. It will make things less error-prone.
objstorage/objstorageprovider/sharedcache/shared_cache.go line 319 at r1 (raw file):
logPtr := unsafe.Pointer(&logAsBytes[0]) log = unsafe.Slice((*metadataLogEntryBatch)(logPtr), s.numDataBlocks)
This unsafe stuff won't work across different architectures
objstorage/objstorageprovider/sharedcache/shared_cache.go line 592 at r1 (raw file):
} // TODO(josh): Keep seq from overflowing, by moding by something, or similar.
Can make it uint64 and not worry about overflow (especially if we'll have one per 4KB block).
joshimhoff
left a comment
There was a problem hiding this comment.
Thank you for the feedback!
Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick, @joshimhoff, and @RaduBerinde)
objstorage/objstorageprovider/sharedcache/shared_cache.go line 219 at r1 (raw file):
Previously, RaduBerinde wrote…
The cache block size shouldn't make a difference around how we structure the metadata.
Are you saying to not size the metadata section in increments of cache block size? I can fix that.
objstorage/objstorageprovider/sharedcache/shared_cache.go line 246 at r1 (raw file):
Previously, RaduBerinde wrote…
I think our basic structure should be something that is 4KB in size (4KB is typically the native storage block size) and can store as many records that fit in that size. We don't want to write less than 4KB at a time because that will end up being a read-modify-write under the covers.
The current batch can keep accumulating entries and whenever we need to flush we rewrite the entire 4KB every time; when the batch gets full we move to the next metadata block and start a new batch.
Makes sense to me. Do you want the batching to be done in this PR?
objstorage/objstorageprovider/sharedcache/shared_cache.go line 285 at r1 (raw file):
Previously, RaduBerinde wrote…
I think it would make sense to separate the metadata and data in different files. It will make things less error-prone.
Good idea. Will do.
objstorage/objstorageprovider/sharedcache/shared_cache.go line 319 at r1 (raw file):
Previously, RaduBerinde wrote…
This unsafe stuff won't work across different architectures
Agreed. I wrote a TODO about this.
I don't have a lot of experience here, except using json.Marshall, which is inappropriate! Do you have a suggestion for how to serialize the metadata log?
I can also see fixing this in a later PR. Whatever you think!
objstorage/objstorageprovider/sharedcache/shared_cache.go line 592 at r1 (raw file):
Previously, RaduBerinde wrote…
Can make it uint64 and not worry about overflow (especially if we'll have one per 4KB block).
Will do. Thanks!
joshimhoff
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick and @RaduBerinde)
objstorage/objstorageprovider/sharedcache/shared_cache.go line 246 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Makes sense to me. Do you want the batching to be done in this PR?
I was sort of thinking it would make sense to implement batching together with the working set idea, but I guess the two are independent, so happy to do batching now.
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick and @joshimhoff)
objstorage/objstorageprovider/sharedcache/shared_cache.go line 219 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Are you saying to not size the metadata section in increments of cache block size? I can fix that.
Right.
objstorage/objstorageprovider/sharedcache/shared_cache.go line 246 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
I was sort of thinking it would make sense to implement batching together with the working set idea, but I guess the two are independent, so happy to do batching now.
It's up to you.
objstorage/objstorageprovider/sharedcache/shared_cache.go line 319 at r1 (raw file):
Previously, joshimhoff (Josh Imhoff) wrote…
Agreed. I wrote a TODO about this.
I don't have a lot of experience here, except using
json.Marshall, which is inappropriate! Do you have a suggestion for how to serialize the metadata log?I can also see fixing this in a later PR. Whatever you think!
There's an encoding package which has things like LittleEndian.PutUint32 https://pkg.go.dev/encoding/binary@go1.20.4#ByteOrder
|
Since our initial focus is performant secondary cache without metadata persistence, I will move this to draft. |
Part of #2542.
sharedcache: implement minimal version of metadata persistence
This commit implements an approach to persisting metadata thought of by Radu.
The metadata being persisted is the mapping from <filenum, logical block
offset> to the index of the cache block at which the data is located on disk.
The basic approach is to write changes to the metadata, plus metadata pointed
at by a ptr based on a sequence number, to a circular log that can be replayed
at server start time.
As of this commit, metadata persistence is neither performant nor safe in the
presence of crashes or certain errors happening at file write time. We need to
also implement the working set idea thought of by Radu, but this commit is
large enough without that.