fix(deepdoc): silence ORT CUDA EP load failure on CUDA-13 hosts (#15687) by Rene0422 · Pull Request #15701 · infiniflow/ragflow

Rene0422 · 2026-06-05T08:20:44Z

Summary

After bumping onnxruntime-gpu from 1.19.2 → 1.23.2 (commit f128a1fa9, shipped in v0.24.0 and v0.25.x), users running RAGFlow on hosts with CUDA 13 see the following errors on every OCR model load:

Failed to load library libonnxruntime_providers_cuda.so with error:
libcublasLt.so.12: cannot open shared object file.
Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*

Root cause

onnxruntime-gpu==1.23.2 (the only pre-built x86_64 Linux wheel on PyPI) is built against the CUDA 12 ABI and dlopens libcublasLt.so.12 / libcudnn.so.9 at provider-registration time. The Docker image (ubuntu:24.04 base) bundles no CUDA user-mode libs; it relies on nvidia-container-toolkit to inject them from the host at /usr/lib/x86_64-linux-gnu/. On a CUDA-13 host the toolkit injects libcublasLt.so.13 / libcudnn.so.10, so the cu12 SONAMEs the ORT wheel needs cannot be resolved anywhere on the loader's search path (LD_LIBRARY_PATH included) and provider registration fails.

cuda_is_available() in deepdoc/vision/ocr.py decided whether to request CUDAExecutionProvider based solely on torch.cuda.is_available(). Torch only needs libcuda.so.1 (the driver lib, backwards-compatible), so it reports CUDA as available on a CUDA-13 host. ORT then fails the actual CUDA EP load, prints two errors per model, and silently falls back to CPU.
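To make the decision point concrete, here is a minimal sketch of how a CUDA-availability boolean might feed the provider list handed to ONNX Runtime. The helper name `select_providers` and the exact list layout are assumptions for illustration, not the code in ocr.py; only the provider names and the `providers=` argument of `onnxruntime.InferenceSession` are taken from ORT itself.

```python
def select_providers(cuda_ok, device_id=0):
    """Build a provider list in decreasing order of precedence.

    Hypothetical helper: the result is the kind of value that would be
    passed as onnxruntime.InferenceSession(..., providers=...).
    """
    if cuda_ok:
        # Request the CUDA EP for one device and keep CPU as the fallback.
        return [
            ("CUDAExecutionProvider", {"device_id": device_id}),
            "CPUExecutionProvider",
        ]
    # CPU is chosen explicitly, so ORT never attempts the CUDA EP load.
    return ["CPUExecutionProvider"]
```

If `cuda_ok` comes from torch alone, the first branch is taken on a CUDA-13 host and ORT attempts the failing load; gating it on a stricter check keeps that attempt from happening at all.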

Fix

Before reporting CUDA as available, probe with ctypes.CDLL for the exact cu12 SONAMEs ORT will need. If either is missing, log one actionable warning and return False so the existing CPU code path is taken explicitly. GPU inference is unchanged when the cu12 libs are present (CUDA-12 host or future bundled wheels).
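The probe described above could look roughly like the following. This is a sketch under stated assumptions, not the code in the PR: the function names, the injectable `loader` parameter, and the warning text are hypothetical, and the torch check is reduced to a boolean argument so the example runs without torch installed.

```python
import ctypes
import logging

# SONAMEs named in this PR description as required by the cu12 ORT wheel.
REQUIRED_SONAMES = ("libcublasLt.so.12", "libcudnn.so.9")


def missing_cuda12_libs(sonames=REQUIRED_SONAMES, loader=ctypes.CDLL):
    """Return the SONAMEs that the dynamic loader cannot open."""
    missing = []
    for name in sonames:
        try:
            loader(name)  # ctypes.CDLL raises OSError when dlopen fails
        except OSError:
            missing.append(name)
    return missing


def cuda_is_usable(torch_says_cuda, loader=ctypes.CDLL):
    """Hypothetical stand-in for cuda_is_available(): torch check, then probe."""
    if not torch_says_cuda:
        return False  # CPU-only host: the probe is skipped entirely
    missing = missing_cuda12_libs(loader=loader)
    if missing:
        logging.warning(
            "CUDA 12 runtime libraries not found (%s); using CPU for OCR.",
            ", ".join(missing),
        )
        return False
    return True
```

Passing the loader as a parameter is only a convenience for testing; in the real function the call would presumably go straight to `ctypes.CDLL`.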

This is a targeted, dependency-free fix:

No bump to onnxruntime-gpu (no cu13 stable wheel on default PyPI yet).
No bundling of nvidia-*-cu12 wheels (would add ~1 GB to the image).
No change to the LD_LIBRARY_PATH setup in entrypoint.sh.
ORT is only used in this one file; no other call sites need the same probe.

Users who want GPU inference on a CUDA-13 host now get a clear single-line hint and can either install the cu12 user-mode libs in the container or switch to a CUDA-12 host. Users on CUDA-12 hosts see no change. Users with no GPU at all see one cleaner warning instead of two ORT errors per model.

Test plan

CUDA-12 host with GPU: confirm OCR still selects CUDAExecutionProvider (look for load_model ... uses GPU log lines).
CUDA-13 host with GPU (the issue's reproducer): confirm OCR now selects CPUExecutionProvider (look for load_model ... uses CPU) and the libcublasLt.so.12 not found warning appears once per worker instead of the two ORT errors per model.
CPU-only host: confirm OCR still selects CPUExecutionProvider with no extra warnings (the probe runs only when torch.cuda.is_available() is true, so CPU hosts are unaffected).
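The manual checks above need real hardware. As a complement, the CUDA-12 and CUDA-13 cases can probably be approximated in a unit test by patching `ctypes.CDLL`. The sketch below is an assumption about how such a test might be written; `cu12_libs_present` is a hypothetical stand-in for the probe, not the function in ocr.py.

```python
import ctypes
from unittest import mock

CU12_SONAMES = ("libcublasLt.so.12", "libcudnn.so.9")


def cu12_libs_present(sonames=CU12_SONAMES):
    """Hypothetical probe: True only if every SONAME can be opened."""
    for name in sonames:
        try:
            ctypes.CDLL(name)
        except OSError:
            return False
    return True


def simulate_cuda13_host():
    """Pretend the loader cannot find any of the cu12 libraries."""
    def fake_cdll(name, *args, **kwargs):
        if name in CU12_SONAMES:
            raise OSError(f"{name}: cannot open shared object file")
        return mock.Mock()

    with mock.patch.object(ctypes, "CDLL", side_effect=fake_cdll):
        return cu12_libs_present()


def simulate_cuda12_host():
    """Pretend every requested library loads successfully."""
    with mock.patch.object(ctypes, "CDLL", return_value=mock.Mock()):
        return cu12_libs_present()
```

This only exercises the branch logic; it does not replace the on-host checks, since it says nothing about what ORT itself does at provider registration.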

Files changed

deepdoc/vision/ocr.py — cuda_is_available() now probes for libcublasLt.so.12 / libcudnn.so.9 after the torch check.

Type of change

Bug Fix (non-breaking change which fixes an issue)
New Feature (non-breaking change which adds functionality)
Documentation Update
Refactoring
Performance Improvement
Other (please describe):

coderabbitai · 2026-06-05T08:21:02Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b893a77-c003-4ffe-bc4e-15b3d180ebd8

📥 Commits

Reviewing files that changed from the base of the PR and between 4718f4d and f711a90.

📒 Files selected for processing (1)

deepdoc/vision/ocr.py

🚧 Files skipped from review as they are similar to previous changes (1)

deepdoc/vision/ocr.py

📝 Walkthrough

Walkthrough

Adds a runtime CUDA/cuDNN probe to load_model (disabling CUDA when required SONAMEs are missing), adds __del__ finalizers to TextRecognizer and TextDetector, extracts polygon clipping into TextDetector.clip_det_res, and adds many docstrings across OCR helper methods.

Changes

OCR runtime, cleanup, and clipping updates

| Layer / File(s) | Summary |
| --- | --- |
| CUDA availability probe and load_model doc `deepdoc/vision/ocr.py` | Expanded `load_model` doc and new probe that checks `torch.cuda` for the requested `device_id` and attempts to `ctypes.CDLL` load `libcublasLt.so.12` and `libcudnn.so.9`; missing libraries log a warning and force CPU provider selection. |
| TextRecognizer docs, helpers, and finalizer `deepdoc/vision/ocr.py` | Docstrings added for `TextRecognizer` constructor and many `resize_norm_*` / helper functions; `TextRecognizer.__del__` added to call `close()`. |
| TextDetector clip helper, docs, and finalizer `deepdoc/vision/ocr.py` | Polygon coordinate clamping extracted into `TextDetector.clip_det_res`; `filter_tag_det_res` now calls the helper. Added `order_points_clockwise` docstring, `TextDetector.__del__` finalizer, and updated close/call docstrings. |
| OCR high-level API docstrings `deepdoc/vision/ocr.py` | Docstrings added for `OCR.detect`, `OCR.recognize`, `OCR.recognize_batch`, and `OCR.__call__` describing return formats and confidence/drop behavior. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐰 I sniff the shared libs under moonlight glow,
With ctypes paws I give them a gentle go,
If libcublasLt or cuDNN hide away,
I warn, then hop to CPU without delay,
Cleanups tidy, polygons clipped — off I go.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and specifically describes the main fix: silencing ORT CUDA EP load failure on CUDA-13 hosts, which is exactly what the PR addresses. |
| Description check | ✅ Passed | The description provides comprehensive detail on the problem, root cause, fix strategy, and test plan, with all required template sections completed (problem statement and type of change clearly indicated). |
| Linked Issues check | ✅ Passed | The PR directly addresses `#15687` by implementing the ctypes.CDLL probe for cu12 SONAMEs to detect missing CUDA libraries and log a single warning before falling back to CPU, matching all stated objectives. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the cuda_is_available() function and related docstring improvements in ocr.py; the `__del__` finalizers are minor cleanup additions that do not deviate from the stated fix objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

deepdoc/vision/ocr.py (2)
86-114: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The CUDA fallback warning is still emitted once per model load, not once per worker.

cuda_is_available() is recreated on each load_model() miss, and det.onnx / rec.onnx use different cache keys. On a CUDA-13 host this will log the same warning twice in one worker, which misses the PR goal of a single actionable warning per worker. Cache the probe result/warning at module scope or per device_id.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepdoc/vision/ocr.py` around lines 86 - 114, The CUDA probe in
cuda_is_available() is run every time load_model() misses (and det.onnx/rec.onnx
use different cache keys), causing duplicate warnings; change it to cache the
boolean probe result and warning state at module scope keyed by device_id (or a
single module-level sentinel if device_id is unused) so cuda_is_available()
returns the cached value and emits the warning only the first time per
worker/device_id; update any callers (e.g., load_model(), det.onnx, rec.onnx
paths) to call cuda_is_available() unchanged but rely on the cached result.
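One way the suggested caching might be done is a module-level dictionary keyed by `device_id`, so the probe and its warning run at most once per device in each worker process. The names below (`cached_cuda_probe`, `_PROBE_RESULTS`) and the injectable `probe` callable are hypothetical and serve only to keep the sketch self-contained.

```python
import logging
import threading

_PROBE_LOCK = threading.Lock()
_PROBE_RESULTS = {}  # device_id -> bool, filled in on first use


def cached_cuda_probe(device_id, probe):
    """Run probe(device_id) once per device and remember the answer.

    The warning is logged only when the result is first computed, so
    loading det.onnx and rec.onnx in the same worker logs it once.
    """
    with _PROBE_LOCK:
        if device_id not in _PROBE_RESULTS:
            ok = bool(probe(device_id))
            if not ok:
                logging.warning(
                    "CUDA EP unavailable for device %s; using CPU.", device_id
                )
            _PROBE_RESULTS[device_id] = ok
        return _PROBE_RESULTS[device_id]
```

A `functools.lru_cache` on the probe would give similar behaviour with less code; the explicit dictionary just makes the once-per-device warning easier to see.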
390-447: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

close() / __del__() still do not release the cached ONNX sessions.

Deleting self.predictor only drops the instance reference. load_model() keeps the same (sess, run_options) tuple alive in module-global loaded_models, so these new finalizers will not actually reclaim the session or GPU memory until process exit. If cleanup is part of this change, the cache needs ref-counting or explicit eviction too.

Also applies to: 540-579
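The ref-counting option mentioned here could be sketched as below. This is an illustrative pattern only: the class `RefCountedCache` and its methods are hypothetical, the cached value is an arbitrary object rather than a real ONNX session, and dropping the last reference does not by itself guarantee that GPU memory is returned immediately.

```python
import threading


class RefCountedCache:
    """Cache that evicts an entry when its last user releases it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> [value, refcount]

    def acquire(self, key, factory):
        """Return the cached value for key, creating it on first use."""
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [factory(), 0]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, key):
        """Drop one reference; return True if the entry was evicted."""
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return False
            entry[1] -= 1
            if entry[1] <= 0:
                del self._entries[key]
                return True
            return False

    def __len__(self):
        with self._lock:
            return len(self._entries)
```

Under this scheme `close()` would call `release()` with the same key that `load_model()` used, so the cache stops holding the session once the last detector or recognizer using it has been closed.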

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepdoc/vision/ocr.py`:
- Around line 753-754: The docstring for OCR.__call__ claims it returns
(filtered_boxes, [(text, score), ...], time_dict) but the implementation returns
a single list via list(zip(...)); update the docstring to describe the actual
return value (a list of (box, (text, score)) or whatever the list(zip(...))
contains) or alternatively change the implementation to return the documented
tuple (filtered_boxes, texts_scores_list, time_dict). Locate OCR.__call__ and
either edit its docstring to match the output of list(zip(...)) or modify the
return statement to assemble and return (filtered_boxes, list(zip(...)),
time_dict) so the API and docs are consistent.

---

Outside diff comments:
In `@deepdoc/vision/ocr.py`:
- Around line 86-114: The CUDA probe in cuda_is_available() is run every time
load_model() misses (and det.onnx/rec.onnx use different cache keys), causing
duplicate warnings; change it to cache the boolean probe result and warning
state at module scope keyed by device_id (or a single module-level sentinel if
device_id is unused) so cuda_is_available() returns the cached value and emits
the warning only the first time per worker/device_id; update any callers (e.g.,
load_model(), det.onnx, rec.onnx paths) to call cuda_is_available() unchanged
but rely on the cached result.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7573b17b-c26c-4ccf-8842-5aa6ad910aed

📥 Commits

Reviewing files that changed from the base of the PR and between 212cd13 and 4718f4d.

📒 Files selected for processing (1)

deepdoc/vision/ocr.py

dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. 🐞 bug Something isn't working, pull request that fix bug. labels Jun 5, 2026

fix(deepdoc): silence ORT CUDA EP load failure on CUDA-13 hosts (infiniflow#15687)

212cd13

Rene0422 force-pushed the fix/ort-cuda12-probe-15687 branch from 974c598 to 212cd13 Compare June 8, 2026 11:16

docs(ocr): add one-line docstrings throughout deepdoc/vision/ocr.py

4718f4d

dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Jun 8, 2026

coderabbitai Bot reviewed Jun 8, 2026

Comment thread deepdoc/vision/ocr.py Outdated

docs(ocr): correct OCR.__call__ docstring to match actual return shape

f711a90

Rene0422 wants to merge 3 commits into infiniflow:main from Rene0422:fix/ort-cuda12-probe-15687
