Configure rate limits on VirtualMCPServer PR B 2 #5522

Open
Sanskarzz wants to merge 1 commit into stacklok:main from Sanskarzz:ratelimitingVMCP3

@Sanskarzz

Contributor

Summary

This PR adds optimizer-aware tool-name resolution to vMCP rate limiting.

PR #5276 wired VirtualMCPServer.spec.config.rateLimiting into the vMCP runtime for normal tools/call requests, where the parsed MCP resource ID is already the backend tool name. When the vMCP optimizer is enabled, clients call the optimizer meta-tool call_tool, and the real backend tool name is carried inside arguments["tool_name"]. Without this follow-up, per-tool rate limits are evaluated against call_tool instead of the backend tool the optimizer is routing to.
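
To make the mismatch concrete, here is a minimal sketch of how the rate-limit key changes for an optimizer request. It is not the code in this PR: `rateLimitKey`, the `optimizerEnabled` flag, the tool name `fetch_issue`, and the `parameters` argument are hypothetical names chosen for illustration. Only the `tools/call` method, the `call_tool` meta-tool, and `arguments["tool_name"]` come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolsCallParams mirrors the params object of an MCP tools/call request.
type toolsCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// rateLimitKey is a hypothetical helper: it returns the tool name a per-tool
// limiter would key on. With optimizerEnabled set, a call_tool request is
// resolved to the inner backend tool carried in arguments["tool_name"].
func rateLimitKey(raw []byte, optimizerEnabled bool) (string, error) {
	var req struct {
		Method string          `json:"method"`
		Params toolsCallParams `json:"params"`
	}
	if err := json.Unmarshal(raw, &req); err != nil {
		return "", err
	}
	if req.Method != "tools/call" {
		return "", fmt.Errorf("not a tools/call request: %q", req.Method)
	}
	if optimizerEnabled && req.Params.Name == "call_tool" {
		// Fall back to the outer name when tool_name is missing, empty,
		// or not a string.
		if inner, ok := req.Params.Arguments["tool_name"].(string); ok && inner != "" {
			return inner, nil
		}
	}
	return req.Params.Name, nil
}

func main() {
	// Assumed request shape; "parameters" is an illustrative argument name.
	body := []byte(`{"jsonrpc":"2.0","id":1,"method":"tools/call",
		"params":{"name":"call_tool",
		"arguments":{"tool_name":"fetch_issue","parameters":{"id":42}}}}`)

	before, _ := rateLimitKey(body, false) // key without inner-tool resolution
	after, _ := rateLimitKey(body, true)   // key with inner-tool resolution
	fmt.Println(before, after)             // prints "call_tool fetch_issue"
}
```

Without resolution, every optimizer call lands in the `call_tool` bucket regardless of which backend tool it targets, which is the behavior this PR sets out to fix.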

Fixes #4552

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

This PR does not change the CRD schema or v1beta1 API surface.

Changes

| File | Change |
| --- | --- |
| pkg/vmcp/cli/serve.go | Passes the existing optimizer passThroughTools map into the vMCP rate-limit factory. |
| pkg/vmcp/ratelimit/factory/middleware.go | Adds vMCP-local optimizer call_tool inner-tool resolution before invoking the shared rate-limit middleware. |
| pkg/vmcp/ratelimit/factory/middleware_test.go | Adds optimizer-specific unit coverage for inner-tool resolution, fallback paths, and middleware bucket behavior. |
| test/e2e/thv-operator/virtualmcp/virtualmcp_rate_limiting_test.go | Adds focused E2E coverage for optimizer call_tool rate limiting by inner backend tool name. |

Does this introduce a user-facing change?

Yes. When vMCP optimizer is enabled, per-tool rate limits now apply to the real backend tool name passed through call_tool, instead of applying to the optimizer meta-tool name.

Implementation plan

Approved implementation plan
  1. Reuse the existing optimizer passThroughTools map computed in pkg/vmcp/cli/serve.go.
  2. Pass that map into pkg/vmcp/ratelimit/factory.NewMiddleware.
  3. In the vMCP rate-limit factory, detect optimizer pass-through tools/call requests.
  4. If the request is call_tool and arguments["tool_name"] is a non-empty string, shallow-copy the parsed MCP request and replace only ResourceID with the inner backend tool name.
  5. Invoke the existing shared pkg/ratelimit middleware with that temporary parsed request so rate-limit buckets use the backend tool name.
  6. Restore the original request before downstream vMCP handling so optimizer routing still receives the original call_tool request.
  7. Add focused unit tests for extraction, fallback behavior, and Redis-backed bucket behavior.
  8. Add E2E coverage for an optimizer-enabled VMCP where the second call_tool invocation for the same inner tool is rate-limited.
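
Steps 3 to 6 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in pkg/vmcp/ratelimit/factory: `parsedRequest`, the context keys, `optimizerAware`, and `runDemo` are hypothetical, the pass-through map is assumed to be a set of tool names, and the shared middleware is replaced by an in-memory stand-in that allows one call per key.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// parsedRequest is a hypothetical stand-in for the parsed MCP request.
type parsedRequest struct {
	Method, ResourceID string
	Arguments          map[string]any
}

type parsedKey struct{} // context key for the parsed request the limiter reads
type origKey struct{}   // context key used to carry the original for restoration

func withParsed(ctx context.Context, p *parsedRequest) context.Context {
	return context.WithValue(ctx, parsedKey{}, p)
}

func parsedFrom(ctx context.Context) *parsedRequest {
	p, _ := ctx.Value(parsedKey{}).(*parsedRequest)
	return p
}

// resolveInnerTool reports the backend tool behind an optimizer call_tool request.
func resolveInnerTool(p *parsedRequest, passThrough map[string]struct{}) (string, bool) {
	if p == nil || p.Method != "tools/call" || p.ResourceID != "call_tool" {
		return "", false
	}
	if _, ok := passThrough[p.ResourceID]; !ok { // assumption: map is keyed by tool name
		return "", false
	}
	name, ok := p.Arguments["tool_name"].(string)
	return name, ok && name != ""
}

// optimizerAware wraps a shared rate-limit middleware so it sees the inner tool
// name, while the downstream handler still sees the original call_tool request.
func optimizerAware(shared func(http.Handler) http.Handler, passThrough map[string]struct{}) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		restore := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if orig, ok := r.Context().Value(origKey{}).(*parsedRequest); ok {
				r = r.WithContext(withParsed(r.Context(), orig)) // step 6
			}
			next.ServeHTTP(w, r)
		})
		limited := shared(restore) // build the shared middleware once
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			orig := parsedFrom(r.Context())
			if inner, ok := resolveInnerTool(orig, passThrough); ok {
				override := *orig // step 4: shallow copy, change ResourceID only
				override.ResourceID = inner
				ctx := context.WithValue(r.Context(), origKey{}, orig)
				r = r.WithContext(withParsed(ctx, &override))
			}
			limited.ServeHTTP(w, r) // step 5
		})
	}
}

// runDemo sends `calls` identical call_tool requests through the chain.
func runDemo(calls int) (codes []int, downstreamSaw []string, keys map[string]int) {
	keys = map[string]int{}
	shared := func(next http.Handler) http.Handler { // stand-in: one call per key
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := parsedFrom(r.Context()).ResourceID
			if keys[key]++; keys[key] > 1 {
				http.Error(w, "rate limited", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
	downstream := http.HandlerFunc(func(_ http.ResponseWriter, r *http.Request) {
		downstreamSaw = append(downstreamSaw, parsedFrom(r.Context()).ResourceID)
	})
	h := optimizerAware(shared, map[string]struct{}{"call_tool": {}})(downstream)
	for i := 0; i < calls; i++ {
		p := &parsedRequest{Method: "tools/call", ResourceID: "call_tool",
			Arguments: map[string]any{"tool_name": "fetch_issue"}}
		req := httptest.NewRequest(http.MethodPost, "/mcp", nil)
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req.WithContext(withParsed(req.Context(), p)))
		codes = append(codes, rec.Code)
	}
	return codes, downstreamSaw, keys
}

func main() {
	fmt.Println(runDemo(2))
}
```

In this sketch the first request passes and the downstream handler observes `call_tool`, while the second request is rejected with a 429 because both were counted against the `fetch_issue` bucket. That is the same shape as the E2E scenario in step 8.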

Special notes for reviewers

  • This PR intentionally keeps optimizer knowledge out of pkg/ratelimit.
  • The shared pkg/ratelimit.NewMiddleware remains the owner of Redis setup, limiter construction, fail-open behavior, identity extraction, and the JSON-RPC 429 response.
  • The only vMCP-specific behavior added here is resolving optimizer call_tool to the inner backend tool name before the shared rate-limit middleware checks the bucket.
  • The parsed request override is scoped only to the rate-limit middleware call; downstream vMCP handlers continue to see the original optimizer call_tool request.

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
@github-actions github-actions Bot added the size/M Medium PR: 300-599 lines changed label Jun 14, 2026
@Sanskarzz Sanskarzz changed the title Add ratelimiting support for vmcp optimizer Configure rate limits on VirtualMCPServer PR B 2 Jun 14, 2026
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Jun 14, 2026
@codecov

codecov Bot commented Jun 14, 2026

Codecov Report

❌ Patch coverage is 85.29412% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.72%. Comparing base (c346ed7) to head (57c3c4e).

Files with missing lines Patch % Lines
pkg/vmcp/cli/serve.go 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5522      +/-   ##
==========================================
+ Coverage   69.70%   69.72%   +0.02%     
==========================================
  Files         645      645              
  Lines       65598    65627      +29     
==========================================
+ Hits        45724    45758      +34     
+ Misses      16530    16523       -7     
- Partials     3344     3346       +2     

@jerm-dro

jerm-dro commented Jun 15, 2026

Contributor

@Sanskarzz — for context, this is about Trey's in-flight vMCP interface refactor (epic #5419, RFC THV-0076), which splits vMCP into a domain core (the VMCP interface, core.New(cfg)) behind a thin transport layer (server.Serve). Phase 2 (#5431) re-homes the optimizer and the middleware chain onto that split. With that in mind —

Let's park this for now. In the post-refactor vMCP, per-tool rate limiting doesn't need to be HTTP middleware at all — it fits more naturally as a VMCP decorator at the CallTool seam. (It isn't in Trey's refactor scope simply because it hadn't landed when he started — not an oversight.)

The reason it fits: the optimizer is becoming the outermost VMCP layer, resolving call_tool → the real backend tool. Anything that needs to key off the real tool name therefore sits below the optimizer — authorization already does (it's the core admission seam, #5438), and rate limiting has the exact same dependency. So the flow becomes optimizer → {authz, rate-limit} → core: by the time CallTool reaches the limiter the tool name is already resolved, the bucket keys correctly, and the whole mechanism this PR adds to pkg/vmcp/ratelimit/factory (parsed-request rewrite, context juggling) disappears.

Roughly (signatures illustrative — align with the post-#5431 VMCP interface):

// Composition root (cli/serve.go, post-refactor):
var v core.VMCP = core.New(coreCfg) // authz admission seam lives here (#5438) — below the optimizer
v = ratelimit.WrapVMCP(v, limiter)  // keys the per-tool bucket on the resolved tool name
v = optimizer.WrapVMCP(v, ...)      // outermost: resolves call_tool -> backend tool first
srv, _ := server.Serve(ctx, v, serverCfg)

// pkg/vmcp/ratelimit — a VMCP decorator instead of HTTP middleware:
type rateLimitedVMCP struct {
    core.VMCP // embed: every other method passes through unchanged
    limiter   Limiter
}

func (v *rateLimitedVMCP) CallTool(
    ctx context.Context, id *auth.Identity, tool string, args map[string]any, meta vmcp.Meta,
) (*vmcp.CallToolResult, error) {
    // `tool` is already the backend tool name — the optimizer decorator wraps
    // this one and resolved call_tool -> tool before delegating here.
    if err := v.limiter.Allow(ctx, id, tool); err != nil {
        return nil, err // typed rate-limit error; transport maps it to the JSON-RPC 429
    }
    return v.VMCP.CallTool(ctx, id, tool, args, meta)
}

The one bit of prep this needs: pkg/ratelimit currently bundles the bucket decision with HTTP concerns (identity extraction, writing the 429). The decorator only wants the decision — so we'd expose limiter.Allow(ctx, id, tool) error as a standalone call and leave the transport-shaped pieces (identity at the edge, 429 mapping) where they are. No behavior change, just splitting "decide" from "HTTP-wrap."
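
A minimal sketch of that "decide" versus "HTTP-wrap" split, for illustration only. It is not pkg/ratelimit's API: `ErrRateLimited`, `fixedBudget`, `statusFor`, and `runSequence` are hypothetical, the identity is a plain string standing in for `*auth.Identity`, and an HTTP status code stands in for the JSON-RPC 429 response the transport would write.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"sync"
)

// ErrRateLimited is a hypothetical typed error returned by the decision step.
var ErrRateLimited = errors.New("rate limit exceeded")

// Limiter is the decision-only surface: no HTTP types, no response writing.
type Limiter interface {
	Allow(ctx context.Context, identity, tool string) error
}

// fixedBudget is a toy in-memory limiter that allows maxCalls per
// (identity, tool) pair and never refills. A real limiter would be
// Redis-backed and time-windowed.
type fixedBudget struct {
	mu       sync.Mutex
	maxCalls int
	used     map[string]int
}

func (l *fixedBudget) Allow(_ context.Context, identity, tool string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	key := identity + "|" + tool
	if l.used[key] >= l.maxCalls {
		return fmt.Errorf("%w: tool %q", ErrRateLimited, tool)
	}
	l.used[key]++
	return nil
}

// statusFor is the transport-side mapping: it is the only place that knows
// a rate-limit decision becomes a 429.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrRateLimited):
		return http.StatusTooManyRequests
	default:
		return http.StatusInternalServerError
	}
}

// runSequence checks each tool call for one caller against a budget of one
// call per tool and returns the status the transport would report.
func runSequence(tools ...string) []int {
	var lim Limiter = &fixedBudget{maxCalls: 1, used: map[string]int{}}
	ctx := context.Background()
	out := make([]int, 0, len(tools))
	for _, tool := range tools {
		out = append(out, statusFor(lim.Allow(ctx, "alice", tool)))
	}
	return out
}

func main() {
	fmt.Println(runSequence("fetch_issue", "fetch_issue", "list_repos")) // prints "[200 429 200]"
}
```

Because the error is wrapped with `%w`, a decorator can return it unchanged and the transport can still recognize it with `errors.Is`, so the limiter never needs to see an `http.ResponseWriter`.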

If you'd like to take this on as the VMCP-wrapper version, that'd be very welcome — otherwise we can fold it in once the refactor lands. Either way I'd rather not merge the middleware-layer change now and then unwind it.

CC @tgrunnagle for awareness — no action needed from you.

@Sanskarzz

Contributor Author

@jerm-dro Thanks, that makes sense. Once #5431 lands, I’m happy to rework this as a VMCP decorator and split the rate-limit decision from the HTTP transport response mapping as you suggested.
