Skip to content
Draft
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 28 additions & 10 deletions packages/dds/cell/src/cell.ts
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,12 @@
* Licensed under the MIT License.
*/

import { assert, unreachableCase } from "@fluidframework/core-utils/internal";
import {
DoublyLinkedList,
type ListNode,
assert,
unreachableCase,
} from "@fluidframework/core-utils/internal";
import type {
IChannelAttributes,
IFluidDataStoreRuntime,
Expand Down Expand Up @@ -57,6 +62,18 @@ interface ICellValue {
attribution?: AttributionKey;
}

/**
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: ICellPendingLocalOpMetadata is documented as "Kept private to this module," and its only producer is createLocalOpMetadata on SharedCell<T> (cell.ts:319-329), which threads a concrete T. The <T = any> default has no caller and forces the // eslint-disable-next-line @typescript-eslint/no-explicit-any directly above it. The pattern on SharedCell<T = any> / ISharedCell<T = any> / ICellLocalOpMetadata<T = any> exists because those are public-shaped types with external callers; this new private extension has none, and wes-carlson explicitly pushed back on any in this file during PR #10776.

Drop the = any default and the eslint suppression:

interface ICellPendingLocalOpMetadata<T> extends ICellLocalOpMetadata<T> {
	pendingNode: ListNode<number>;
}

* Internal extension of {@link ICellLocalOpMetadata} that carries a direct reference
* to the corresponding node in the pending message list. Holding the node enables
* O(1) removal from arbitrary positions in the pending list, which is required for
* future squash support. Kept private to this module so the public metadata interface
* does not leak `ListNode` (a runtime implementation detail).
*/
// eslint-disable-next-line @typescript-eslint/no-explicit-any
interface ICellPendingLocalOpMetadata<T = any> extends ICellLocalOpMetadata<T> {
pendingNode: ListNode<number>;
}

const snapshotFileName = "header";

/**
Expand Down Expand Up @@ -85,7 +102,7 @@ export class SharedCell<T = any>
*/
private messageIdObserved: number = -1;

private readonly pendingMessageIds: number[] = [];
private readonly pendingMessageIds = new DoublyLinkedList<number>();

private attribution: AttributionKey | undefined;

Expand Down Expand Up @@ -264,9 +281,7 @@ export class SharedCell<T = any>
0x00c /* "messageId is incorrect from from the local client's ACK" */,
);
assert(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: The messageId !== messageIdObserved watermark predicate is a sound proxy for "any pending writes?" today because the lifecycle is strictly push-on-submit / shift-on-ACK / pop-on-rollback — messageId advances exactly when pendingMessageIds grows, and the head moves in lockstep with ACKs.

This PR's stated motivation (per the description) is to enable future O(1) list.remove(node) for arbitrary-position drop in the squash follow-up. As soon as squash/arbitrary-removal lands and a mid-queue entry is dropped, messageId (the last-submitted id) keeps advancing while messageIdObserved (the last-ACKed id) lags, leaving the predicate true even when no actual pending op remains — which would suppress remote-op application indefinitely.

The cross-cutting invariant — that this guard must move to !this.pendingMessageIds.empty when arbitrary-position removal lands — is easy to miss from inside a future squash PR that's focused on reSubmitSquashed plumbing. Two options:

  1. Migrate the guard to !this.pendingMessageIds.empty in this PR. Behavioral no-op today, and you're already touching every other site that reads the queue.
  2. Add a one-line // TODO: at this predicate referencing the squash/arbitrary-removal follow-up, so the next maintainer can't silently miss the dependency.

Either is fine; the goal is just to make the coupling visible at the predicate site.

// eslint-disable-next-line @typescript-eslint/prefer-optional-chain -- TODO: ADO#58518 Code owners should verify if this code change is safe and make it if so or update this comment otherwise
this.pendingMessageIds !== undefined &&
this.pendingMessageIds[0] === cellOpMetadata.pendingMessageId,
this.pendingMessageIds.first?.data === cellOpMetadata.pendingMessageId,
0x471 /* Unexpected pending message received */,
);
this.pendingMessageIds.shift();
Expand Down Expand Up @@ -304,11 +319,14 @@ export class SharedCell<T = any>
private createLocalOpMetadata(
op: ICellOperation,
previousValue?: Serializable<T>,
): ICellLocalOpMetadata {
): ICellPendingLocalOpMetadata<T> {
const pendingMessageId = ++this.messageId;
this.pendingMessageIds.push(pendingMessageId);
const localMetadata: ICellLocalOpMetadata = {
// Use `last` so this remains correct if a future change appends multiple
// pending ids in a single push call (for the single-item case `first === last`).
const { last: pendingNode } = this.pendingMessageIds.push(pendingMessageId);
const localMetadata: ICellPendingLocalOpMetadata<T> = {
pendingMessageId,
pendingNode,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: PR #12273 wired applyStashedOp through createLocalOpMetadata, so stashed ops push onto pendingMessageIds and emit ICellPendingLocalOpMetadata carrying a pendingNode. The new Pending op bookkeeping describe covers direct cell.set(...) submits and LIFO rollback but doesn't exercise stash → rehydrate → ACK.

This is narrower than it might look: applyStashedOp isn't an independent producer — it delegates to set()/delete()submitCellMessage()createLocalOpMetadata(), the same path the new tests already cover, so the "different list instance" failure mode is ruled out by construction. The residual risk is a future regression in stash-rehydrate sequencing through processCore's ICellPendingLocalOpMetadata cast surfacing only in production.

One applyStashedOp round-trip test in the new Pending op bookkeeping describe (drive a stashed set/delete op, then ACK, assert no 0x471 and ordering preserved) closes that gap cheaply. If you prefer to defer to the squash follow-up that first reads pendingNode, note that in the PR description so the gap is intentional.

Is there an existing stash → rehydrate → ACK regression elsewhere in the DDS test corpus that exercises applyStashedOp end-to-end through the pending-id queue? If yes, this is fully addressed by reference.

previousValue,
};
return localMetadata;
Expand Down Expand Up @@ -352,8 +370,8 @@ export class SharedCell<T = any>
this.setCore(cellOpMetadata.previousValue as Serializable<T>);
}

const lastPendingMessageId = this.pendingMessageIds.pop();
if (lastPendingMessageId !== cellOpMetadata.pendingMessageId) {
const lastPendingNode = this.pendingMessageIds.pop();
if (lastPendingNode?.data !== cellOpMetadata.pendingMessageId) {
throw new Error("Rollback op does not match last pending");
}
} else {
Expand Down
66 changes: 66 additions & 0 deletions packages/dds/cell/src/test/cell.spec.ts
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import { strict as assert } from "node:assert";
import { type IGCTestProvider, runGCTests } from "@fluid-private/test-dds-utils";
import { AttachState } from "@fluidframework/container-definitions";
import {
type MockContainerRuntime,
MockContainerRuntimeFactory,
MockContainerRuntimeFactoryForReconnection,
type MockContainerRuntimeForReconnection,
Expand Down Expand Up @@ -488,6 +489,71 @@ describe("Cell", () => {
});
});

describe("Pending op bookkeeping", () => {
it("drains many pending sets via incremental ACKs without assert and preserves ordering", () => {
const containerRuntimeFactory = new MockContainerRuntimeFactory();
const cell1 = createConnectedCell("cell1", containerRuntimeFactory);
const cell2 = createConnectedCell("cell2", containerRuntimeFactory);

const values = ["v0", "v1", "v2", "v3", "v4"];
for (const v of values) {
cell1.set(v);
}

// Incrementally ACK each pending op one at a time.
// This exercises the per-ACK pendingMessageIds.shift() path and would assert-fail
// (0x471 "Unexpected pending message received") if the pending list order or the
// pendingNode bookkeeping were wrong.
for (const _ of values) {
containerRuntimeFactory.processSomeMessages(1);
// Local cell continues to reflect its latest local write throughout.
assert.equal(
cell1.get(),
values.at(-1),
"local cell should retain latest pending value while ACKs drain",
);
}

// After all ACKs, both cells must converge on the final value in order.
assert.equal(cell1.get(), values.at(-1), "cell1 final value");
assert.equal(cell2.get(), values.at(-1), "cell2 final value");
});

it("rolls back multiple pending sets in LIFO order against the expected pending id", () => {
const containerRuntimeFactory = new MockContainerRuntimeFactory({ flushMode: 1 });
const dataStoreRuntime = new MockFluidDataStoreRuntime();
const containerRuntime: MockContainerRuntime =
containerRuntimeFactory.createContainerRuntime(dataStoreRuntime);
const services = {
deltaConnection: dataStoreRuntime.createDeltaConnection(),
objectStorage: new MockStorage(),
};
const cell = new SharedCell("cell-rollback", dataStoreRuntime, CellFactory.Attributes);
cell.connect(services);

// Three pending sets; nothing has been flushed/sequenced yet.
cell.set("a");
cell.set("b");
cell.set("c");
assert.equal(cell.get(), "c", "latest local value should be visible before rollback");

// Rolls back in LIFO order; each rollback should match the last pending id (popped from the list).
// If pendingMessageIds was tracked incorrectly, rollback() would throw
// "Rollback op does not match last pending".
assert.doesNotThrow(
() => containerRuntime.rollback?.(),
"rollback should pop pending ids in LIFO order",
);

// After rollback, the cell value reverts to the pre-first-set value (undefined).
assert.equal(
cell.get(),
undefined,
"cell should be empty after rolling back all pending sets",
);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review: MockContainerRuntime.rollback is declared optional (public rollback?(): void in packages/runtime/test-runtime-utils/src/mocks.ts:449-454), so containerRuntime.rollback?.() short-circuits to undefined if the method is ever removed/renamed. assert.doesNotThrow then succeeds vacuously, and the test's whole purpose — exercising the new pop()?.data rollback path — is silently lost; the subsequent cell.get() === undefined check would fail for the wrong reason.

Guard the invocation so the test fails loudly if the mock loses the method.

Suggested change
);
assert(
typeof containerRuntime.rollback === "function",
"MockContainerRuntime.rollback must exist for this test",
);
assert.doesNotThrow(
() => containerRuntime.rollback!(),
"rollback should pop pending ids in LIFO order",
);

});
});

describe("Garbage Collection", () => {
class GCSharedCellProvider implements IGCTestProvider {
private subCellCount = 0;
Expand Down
Loading