fix(opensearch): set pool_maxsize so the shared client keeps a real connection pool#15682
fix(opensearch): set pool_maxsize so the shared client keeps a real connection pool#15682ef-rintaro wants to merge 1 commit into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughOpenSearch client initialization is updated to add ChangesOpenSearch Connection Pool Configuration
🎯 2 (Simple) | ⏱️ ~5 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
a36f64a to
50a7311
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
rag/utils/opensearch_conn.py (1)
76-85: Confirmpool_maxsizesupport; keep 32 but justify/make configurable
pool_maxsizeis a supportedOpenSearch()constructor parameter in opensearch-py (it feeds the underlying HTTP/urllib3 connection pool sizing); default is typically around 10 when unset.- Repo search shows only this
OpenSearch(instantiation inrag/utils/opensearch_conn.py, so the change is localized.- Since
32is materially above the typical default, ensure the PR rationale (and/or load/concurrency assumptions) is sufficient, or consider making the value configurable rather than hard-coded.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rag/utils/opensearch_conn.py` around lines 76 - 85, The OpenSearch constructor is being passed pool_maxsize=32 (in the OpenSearch(...) call) which is supported but higher than typical defaults; change this to a configurable value instead of a hard-coded 32 by reading a config or environment variable (e.g., OPENSEARCH_POOL_MAXSIZE) with a sensible default of 32, validate/coerce it to an int, and pass that variable into the OpenSearch(..., pool_maxsize=...) parameter; update any docstring or comment in rag/utils/opensearch_conn.py to justify the default and note that it can be tuned for load/concurrency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@rag/utils/opensearch_conn.py`:
- Around line 76-85: The OpenSearch constructor is being passed pool_maxsize=32
(in the OpenSearch(...) call) which is supported but higher than typical
defaults; change this to a configurable value instead of a hard-coded 32 by
reading a config or environment variable (e.g., OPENSEARCH_POOL_MAXSIZE) with a
sensible default of 32, validate/coerce it to an int, and pass that variable
into the OpenSearch(..., pool_maxsize=...) parameter; update any docstring or
comment in rag/utils/opensearch_conn.py to justify the default and note that it
can be tuned for load/concurrency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 35a17e44-1b7c-4218-baf8-c76f93c4b9a3
📒 Files selected for processing (1)
rag/utils/opensearch_conn.py
What problem does this PR solve?
On the OpenSearch backend,
OSConnection(a process-wide@singleton) builds itsclient without
pool_maxsize. In opensearch-py 2.7.1 the underlying urllib3HTTPConnectionPoolthen falls back tomaxsize=1(
opensearchpy/connection/http_urllib3.py:maxsizeis only set whenpool_maxsizeis a truthy int). Because the one client is shared by everyconcurrent consumer -- sync Quart views run in a thread pool, the task executor's
asyncio.gatherfan-out, and thecluster.health()probe -- requests serializeon a single HTTP connection and urllib3 logs:
Each discard forces a fresh TLS handshake, degrading throughput and latency.
The Elasticsearch backend does not hit this: elastic-transport defaults to
connections_per_node=10(elastic_transport/_models.py), so its shared clientalready keeps a real pool. This is purely an OpenSearch-vs-Elasticsearch default
asymmetry.
Fix
Pass
pool_maxsizeto the OpenSearch client so the shared singleton keeps a realconnection pool, matching the Elasticsearch backend. Constructor-only change; no
behavior change for single-threaded callers.
Type of change
Affected backends
OpenSearch only.