fix(opensearch): set pool_maxsize so the shared client keeps a real connection pool #15682

Open
ef-rintaro wants to merge 1 commit into infiniflow:main from ef-rintaro:fix/opensearch-pool-maxsize

@ef-rintaro
Contributor

What problem does this PR solve?

On the OpenSearch backend, OSConnection (a process-wide @singleton) builds its
client without pool_maxsize. In opensearch-py 2.7.1 the underlying urllib3
HTTPConnectionPool then falls back to maxsize=1
(opensearchpy/connection/http_urllib3.py: maxsize is only set when
pool_maxsize is a truthy int). Because the one client is shared by every
concurrent consumer -- sync Quart views run in a thread pool, the task executor's
asyncio.gather fan-out, and the cluster.health() probe -- requests serialize
on a single HTTP connection and urllib3 logs:

Connection pool is full, discarding connection: <endpoint>. Connection pool size: 1

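Each discarded connection has to be replaced later by a new one, which costs a
fresh TCP connect and, over HTTPS, a fresh TLS handshake. This degrades both
throughput and latency.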
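
The discard behavior can be illustrated with a toy model. The sketch below is not urllib3 code: `ToyPool` and `simulate` are hypothetical names, and it assumes only that idle connections sit in a bounded LIFO queue, that a caller opens a new connection when none is idle, and that a returned connection is dropped when the queue is already full.

```python
import queue


class ToyPool:
    """Toy model of a non-blocking LIFO connection pool (not urllib3 itself)."""

    def __init__(self, maxsize: int) -> None:
        self._idle = queue.LifoQueue(maxsize)
        self.created = 0    # connections opened (each would cost a handshake)
        self.discarded = 0  # connections dropped because the pool was full

    def acquire(self) -> int:
        try:
            return self._idle.get(block=False)
        except queue.Empty:
            # No idle connection available: open a new one.
            self.created += 1
            return self.created

    def release(self, conn: int) -> None:
        try:
            self._idle.put(conn, block=False)
        except queue.Full:
            # Corresponds to the "pool is full, discarding connection" case.
            self.discarded += 1


def simulate(maxsize: int, concurrent: int, rounds: int) -> ToyPool:
    """Hold `concurrent` connections at once, release them, repeat."""
    pool = ToyPool(maxsize)
    for _ in range(rounds):
        held = [pool.acquire() for _ in range(concurrent)]
        for conn in held:
            pool.release(conn)
    return pool


small = simulate(maxsize=1, concurrent=4, rounds=3)
large = simulate(maxsize=10, concurrent=4, rounds=3)
print(small.created, small.discarded)  # 10 9
print(large.created, large.discarded)  # 4 0
```

With four callers overlapping per round, the size-1 pool in this model keeps opening and dropping connections every round, while the size-10 pool opens four connections once and reuses them.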

The Elasticsearch backend does not hit this: elastic-transport defaults to
connections_per_node=10 (elastic_transport/_models.py), so its shared client
already keeps a real pool. This is purely an OpenSearch-vs-Elasticsearch default
asymmetry.

Fix

Pass pool_maxsize=10 to the OpenSearch client so the shared singleton keeps a
real connection pool, matching the Elasticsearch backend's default of 10
connections per node. This is a constructor-only change; single-threaded
callers see no behavior change.
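
A minimal sketch of what such a constructor call could look like, not the actual diff. `build_client_kwargs` and `make_client` are hypothetical helpers, the host URL in the usage comment is a placeholder, and the values 10 and 600 are the ones reported in the review walkthrough for this PR. The opensearch-py import is deferred so the argument-building part can be read and run on its own.

```python
from typing import Any, Dict, List

POOL_MAXSIZE = 10      # pool size reported in the review walkthrough
TIMEOUT_SECONDS = 600  # existing timeout, reported as unchanged


def build_client_kwargs(
    hosts: List[str], pool_maxsize: int = POOL_MAXSIZE
) -> Dict[str, Any]:
    """Build keyword arguments for the OpenSearch client (hypothetical helper)."""
    # Reject values that the "truthy int" check described above would ignore.
    if isinstance(pool_maxsize, bool) or not isinstance(pool_maxsize, int):
        raise ValueError("pool_maxsize must be an int")
    if pool_maxsize < 1:
        raise ValueError("pool_maxsize must be positive")
    return {
        "hosts": hosts,
        "timeout": TIMEOUT_SECONDS,
        "pool_maxsize": pool_maxsize,
    }


def make_client(hosts: List[str]):
    """Create the shared client; requires opensearch-py to be installed."""
    from opensearchpy import OpenSearch  # imported lazily on purpose

    return OpenSearch(**build_client_kwargs(hosts))


# Usage (placeholder host, assumes a reachable cluster):
# client = make_client(["http://localhost:9200"])
```

Validating the value up front is a design choice of this sketch: a zero or non-int value would otherwise be skipped silently and leave the pool at its size-1 fallback.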

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

Affected backends

OpenSearch only.

@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. 🐞 bug Something isn't working, pull request that fix bug. labels Jun 5, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jun 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 897baec0-1b40-44af-bd41-80bbf79b66e4

📥 Commits

Reviewing files that changed from the base of the PR and between a36f64a and 50a7311.

📒 Files selected for processing (1)
  • rag/utils/opensearch_conn.py

📝 Walkthrough

OpenSearch client initialization is updated to add pool_maxsize=10 to the OpenSearch(...) constructor while keeping timeout=600, documenting urllib3's default effective maxsize=1 and reducing pool contention for the shared singleton client.

Changes

OpenSearch Connection Pool Configuration

| Layer / File(s) | Summary |
| --- | --- |
| OpenSearch connection pool sizing: rag/utils/opensearch_conn.py | OpenSearch client initialization adds pool_maxsize=10 to expand the HTTP connection pool for concurrent thread access, preserving the existing timeout=600 setting. |

🎯 2 (Simple) | ⏱️ ~5 minutes

"I'm a rabbit by the server stream,
I widened pools so threads may dream,
Ten lanes now run, no single-file queue,
OpenSearch hums — concurrency anew! 🐇"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and clearly describes the main change: setting pool_maxsize for the OpenSearch shared client to maintain a real connection pool. |
| Description check | ✅ Passed | The PR description comprehensively covers the problem, root cause, fix, and change type; all required template sections are present and well-documented. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@ef-rintaro ef-rintaro force-pushed the fix/opensearch-pool-maxsize branch from a36f64a to 50a7311 Compare June 5, 2026 01:54
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
rag/utils/opensearch_conn.py (1)

76-85: Confirm pool_maxsize support; keep 32 but justify/make configurable

  • pool_maxsize is a supported OpenSearch() constructor parameter in opensearch-py (it feeds the underlying HTTP/urllib3 connection pool sizing); default is typically around 10 when unset.
  • Repo search shows only this OpenSearch( instantiation in rag/utils/opensearch_conn.py, so the change is localized.
  • Since 32 is materially above the typical default, ensure the PR rationale (and/or load/concurrency assumptions) is sufficient, or consider making the value configurable rather than hard-coded.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rag/utils/opensearch_conn.py` around lines 76 - 85, The OpenSearch
constructor is being passed pool_maxsize=32 (in the OpenSearch(...) call) which
is supported but higher than typical defaults; change this to a configurable
value instead of a hard-coded 32 by reading a config or environment variable
(e.g., OPENSEARCH_POOL_MAXSIZE) with a sensible default of 32, validate/coerce
it to an int, and pass that variable into the OpenSearch(..., pool_maxsize=...)
parameter; update any docstring or comment in rag/utils/opensearch_conn.py to
justify the default and note that it can be tuned for load/concurrency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@rag/utils/opensearch_conn.py`:
- Around line 76-85: The OpenSearch constructor is being passed pool_maxsize=32
(in the OpenSearch(...) call) which is supported but higher than typical
defaults; change this to a configurable value instead of a hard-coded 32 by
reading a config or environment variable (e.g., OPENSEARCH_POOL_MAXSIZE) with a
sensible default of 32, validate/coerce it to an int, and pass that variable
into the OpenSearch(..., pool_maxsize=...) parameter; update any docstring or
comment in rag/utils/opensearch_conn.py to justify the default and note that it
can be tuned for load/concurrency.
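
The configurability suggested in this review could be sketched roughly as follows. This is an illustration only, not code from the PR: `resolve_pool_maxsize` is a hypothetical helper, `OPENSEARCH_POOL_MAXSIZE` is the example variable name from the review comment, and the default of 32 reflects the commit that was reviewed here (the later walkthrough reports 10).

```python
import os
from typing import Mapping, Optional

ENV_NAME = "OPENSEARCH_POOL_MAXSIZE"  # example name taken from the review
DEFAULT_POOL_MAXSIZE = 32             # default proposed in the review comment


def resolve_pool_maxsize(
    env: Optional[Mapping[str, str]] = None,
    default: int = DEFAULT_POOL_MAXSIZE,
) -> int:
    """Read the pool size from the environment, falling back to `default`."""
    source = os.environ if env is None else env
    raw = source.get(ENV_NAME)
    if raw is None or not raw.strip():
        return default
    try:
        value = int(raw)
    except ValueError:
        # Unparseable input: keep the default rather than fail at import time.
        return default
    # Zero or negative sizes would disable the fix, so treat them as unset.
    return value if value > 0 else default


print(resolve_pool_maxsize({}))                # 32
print(resolve_pool_maxsize({ENV_NAME: "16"}))  # 16
```

Falling back silently on bad input is one possible policy; raising an error or logging a warning would be equally reasonable, depending on how strictly the deployment wants misconfiguration surfaced.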

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 35a17e44-1b7c-4218-baf8-c76f93c4b9a3

📥 Commits

Reviewing files that changed from the base of the PR and between 794c1f4 and a36f64a.

📒 Files selected for processing (1)
  • rag/utils/opensearch_conn.py
