Skip to content

sharedcache: implement minimal version of metadata persistence#2577

Draft
joshimhoff wants to merge 1 commit into
cockroachdb:masterfrom
joshimhoff:persist_the_metadata_two
Draft

sharedcache: implement minimal version of metadata persistence#2577
joshimhoff wants to merge 1 commit into
cockroachdb:masterfrom
joshimhoff:persist_the_metadata_two

Conversation

@joshimhoff

@joshimhoff joshimhoff commented May 29, 2023

Copy link
Copy Markdown
Contributor

Part of #2542.

sharedcache: implement minimal version of metadata persistence

This commit implements an approach to persisting metadata thought of by Radu.
The metadata being persisted is the mapping from <filenum, logical block
offset> to the index of the cache block at which the data is located on disk.
The basic approach is to write changes to the metadata, plus metadata pointed
at by a ptr based on a sequence number, to a circular log that can be replayed
at server start time.

As of this commit, metadata persistence is neither performant nor safe in the
presence of crashes or certain errors happening at file write time. We need to
also implement the working set idea thought of by Radu, but this commit is
large enough without that.

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@joshimhoff joshimhoff force-pushed the persist_the_metadata_two branch from 96b8b88 to 79b4f34 Compare May 29, 2023 16:21
@joshimhoff joshimhoff marked this pull request as ready for review May 29, 2023 16:23
@joshimhoff joshimhoff requested review from a team, RaduBerinde and bananabrick and removed request for RaduBerinde May 29, 2023 16:23
@joshimhoff joshimhoff force-pushed the persist_the_metadata_two branch 3 times, most recently from ad88620 to 8bf3ec6 Compare May 29, 2023 16:41
This commit implements an approach to persisting metadata thought of by Radu.
The metadata being persisted is the mapping from <filenum, logical block
offset> to the index of the cache block at which the data is located on disk.
The basic approach is to write changes to the metadata, plus metadata pointed
at by a ptr based on a sequence number, to a circular log that can be replayed
at server start time.

As of this commit, metadata persistence is neither performant nor safe in the
presence of crashes or certain errors happening at file write time. We need to
also implement the working set idea thought of by Radu, but this commit is
large enough without that.
@joshimhoff joshimhoff force-pushed the persist_the_metadata_two branch from 8bf3ec6 to ed318e1 Compare May 29, 2023 17:12

@RaduBerinde RaduBerinde left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick and @joshimhoff)


objstorage/objstorageprovider/sharedcache/shared_cache.go line 219 at r1 (raw file):

	file              vfs.File
	numDataBlocks     int64
	numMetadataBlocks int64

The cache block size shouldn't make a difference around how we structure the metadata.


objstorage/objstorageprovider/sharedcache/shared_cache.go line 246 at r1 (raw file):

// TODO(josh): We should make this type more space-efficient later on.
type metadataLogEntryBatch struct {

I think our basic structure should be something that is 4KB in size (4KB is typically the native storage block size) and can store as many records that fit in that size. We don't want to write less than 4KB at a time because that will end up being a read-modify-write under the covers.

The current batch can keep accumulating entries and whenever we need to flush we rewrite the entire 4KB every time; when the batch gets full we move to the next metadata block and start a new batch.


objstorage/objstorageprovider/sharedcache/shared_cache.go line 285 at r1 (raw file):

	}

	// Compute the number of cache blocks used for metadata and for data.

I think it would make sense to separate the metadata and data in different files. It will make things less error-prone.


objstorage/objstorageprovider/sharedcache/shared_cache.go line 319 at r1 (raw file):

		logPtr := unsafe.Pointer(&logAsBytes[0])
		log = unsafe.Slice((*metadataLogEntryBatch)(logPtr), s.numDataBlocks)

This unsafe stuff won't work across different architectures


objstorage/objstorageprovider/sharedcache/shared_cache.go line 592 at r1 (raw file):

	}

	// TODO(josh): Keep seq from overflowing, by moding by something, or similar.

Can make it uint64 and not worry about overflow (especially if we'll have one per 4KB block).

@joshimhoff joshimhoff requested a review from RaduBerinde May 31, 2023 19:20

@joshimhoff joshimhoff left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback!

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick, @joshimhoff, and @RaduBerinde)


objstorage/objstorageprovider/sharedcache/shared_cache.go line 219 at r1 (raw file):

Previously, RaduBerinde wrote…

The cache block size shouldn't make a difference around how we structure the metadata.

Are you saying to not size the metadata section in increments of cache block size? I can fix that.


objstorage/objstorageprovider/sharedcache/shared_cache.go line 246 at r1 (raw file):

Previously, RaduBerinde wrote…

I think our basic structure should be something that is 4KB in size (4KB is typically the native storage block size) and can store as many records that fit in that size. We don't want to write less than 4KB at a time because that will end up being a read-modify-write under the covers.

The current batch can keep accumulating entries and whenever we need to flush we rewrite the entire 4KB every time; when the batch gets full we move to the next metadata block and start a new batch.

Makes sense to me. Do you want the batching to be done in this PR?


objstorage/objstorageprovider/sharedcache/shared_cache.go line 285 at r1 (raw file):

Previously, RaduBerinde wrote…

I think it would make sense to separate the metadata and data in different files. It will make things less error-prone.

Good idea. Will do.


objstorage/objstorageprovider/sharedcache/shared_cache.go line 319 at r1 (raw file):

Previously, RaduBerinde wrote…

This unsafe stuff won't work across different architectures

Agreed. I wrote a TODO about this.

I don't have a lot of experience here, except using json.Marshall, which is inappropriate! Do you have a suggestion for how to serialize the metadata log?

I can also see fixing this in a later PR. Whatever you think!


objstorage/objstorageprovider/sharedcache/shared_cache.go line 592 at r1 (raw file):

Previously, RaduBerinde wrote…

Can make it uint64 and not worry about overflow (especially if we'll have one per 4KB block).

Will do. Thanks!

@joshimhoff joshimhoff left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick and @RaduBerinde)


objstorage/objstorageprovider/sharedcache/shared_cache.go line 246 at r1 (raw file):

Previously, joshimhoff (Josh Imhoff) wrote…

Makes sense to me. Do you want the batching to be done in this PR?

I was sort of thinking it would make sense to implement batching together with the working set idea, but I guess the two are independent, so happy to do batching now.

@RaduBerinde RaduBerinde left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @bananabrick and @joshimhoff)


objstorage/objstorageprovider/sharedcache/shared_cache.go line 219 at r1 (raw file):

Previously, joshimhoff (Josh Imhoff) wrote…

Are you saying to not size the metadata section in increments of cache block size? I can fix that.

Right.


objstorage/objstorageprovider/sharedcache/shared_cache.go line 246 at r1 (raw file):

Previously, joshimhoff (Josh Imhoff) wrote…

I was sort of thinking it would make sense to implement batching together with the working set idea, but I guess the two are independent, so happy to do batching now.

It's up to you.


objstorage/objstorageprovider/sharedcache/shared_cache.go line 319 at r1 (raw file):

Previously, joshimhoff (Josh Imhoff) wrote…

Agreed. I wrote a TODO about this.

I don't have a lot of experience here, except using json.Marshall, which is inappropriate! Do you have a suggestion for how to serialize the metadata log?

I can also see fixing this in a later PR. Whatever you think!

There's an encoding package which has things like LittleEndian.PutUint32 https://pkg.go.dev/encoding/binary@go1.20.4#ByteOrder

@joshimhoff

joshimhoff commented Jun 7, 2023

Copy link
Copy Markdown
Contributor Author

Since our initial focus is performant secondary cache without metadata persistence, I will move this to draft.

@joshimhoff joshimhoff marked this pull request as draft June 7, 2023 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants