
fix(deepdoc): silence ORT CUDA EP load failure on CUDA-13 hosts (#15687) #15701

Open
Rene0422 wants to merge 3 commits into infiniflow:main from Rene0422:fix/ort-cuda12-probe-15687

Conversation

@Rene0422
Contributor

@Rene0422 Rene0422 commented Jun 5, 2026

Summary

Fixes #15687.

After bumping onnxruntime-gpu from 1.19.2 to 1.23.2 (commit f128a1fa9, shipped in v0.24.0 and v0.25.x), users running RAGFlow on hosts with CUDA 13 see the following errors on every OCR model load:

Failed to load library libonnxruntime_providers_cuda.so with error:
libcublasLt.so.12: cannot open shared object file.
Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*

Root cause

onnxruntime-gpu==1.23.2 (the only pre-built x86_64 Linux wheel on PyPI) is built against the CUDA 12 ABI and dlopens libcublasLt.so.12 / libcudnn.so.9 at provider-registration time. The Docker image (ubuntu:24.04 base) bundles no CUDA user-mode libs; it relies on nvidia-container-toolkit to inject them from the host at /usr/lib/x86_64-linux-gnu/. On a CUDA-13 host the toolkit injects libcublasLt.so.13 / libcudnn.so.10, so the cu12 SONAMEs the ORT wheel needs are nowhere on LD_LIBRARY_PATH and provider registration fails.

cuda_is_available() in deepdoc/vision/ocr.py was deciding to ask for CUDAExecutionProvider based solely on torch.cuda.is_available(). Torch only needs libcuda.so.1 (the driver lib, backwards-compatible), so it's happy on a CUDA-13 host — but ORT then fails the actual CUDA EP load, prints two warnings per model, and silently falls back to CPU.

Fix

Before reporting CUDA as available, probe with ctypes.CDLL for the exact cu12 SONAMEs ORT will need. If either is missing, log one actionable warning and return False so the existing CPU code path is taken explicitly. GPU inference is unchanged when the cu12 libs are present (CUDA-12 host or future bundled wheels).
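
A minimal sketch of the probe described above, not the PR's actual diff. The SONAMEs come from this description; the helper names `missing_shared_libs` and `ort_cuda_usable`, the `loader` parameter, and the warning text are hypothetical and exist only to make the idea testable without a GPU.

```python
import ctypes
import logging

# SONAMEs named in this PR description; everything else here is illustrative.
REQUIRED_CU12_LIBS = ("libcublasLt.so.12", "libcudnn.so.9")


def missing_shared_libs(sonames=REQUIRED_CU12_LIBS, loader=ctypes.CDLL):
    """Return the SONAMEs that the dynamic loader cannot resolve."""
    missing = []
    for name in sonames:
        try:
            loader(name)  # ctypes.CDLL raises OSError when the library is not found
        except OSError:
            missing.append(name)
    return missing


def ort_cuda_usable(torch_cuda_ok, sonames=REQUIRED_CU12_LIBS, loader=ctypes.CDLL):
    """Report CUDA as usable only if torch sees a GPU and the cu12 libs load."""
    if not torch_cuda_ok:
        return False  # CPU-only host: skip the probe, emit no warning
    missing = missing_shared_libs(sonames, loader)
    if missing:
        logging.warning(
            "CUDA 12 runtime libraries not found (%s); using CPUExecutionProvider.",
            ", ".join(missing),
        )
        return False
    return True
```

In real use the first argument would be the result of `torch.cuda.is_available()`; it is passed in here so the sketch does not depend on torch being installed.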

This is a targeted, dependency-free fix:

  • No bump to onnxruntime-gpu (no cu13 stable wheel on default PyPI yet).
  • No bundling of nvidia-*-cu12 wheels (would add ~1 GB to the image).
  • No change to the LD_LIBRARY_PATH setup in entrypoint.sh.
  • ORT is only used in this one file; no other call sites need the same probe.

Users who want GPU inference on a CUDA-13 host now get a clear single-line hint and can either install the cu12 user-mode libs in the container or switch to a CUDA-12 host. Users on CUDA-12 hosts see no change. Users with no GPU at all see one cleaner warning instead of two ORT errors per model.

Test plan

  • CUDA-12 host with GPU: confirm OCR still selects CUDAExecutionProvider (look for load_model ... uses GPU log lines).
  • CUDA-13 host with GPU (the issue's reproducer): confirm OCR now selects CPUExecutionProvider (look for load_model ... uses CPU) and the libcublasLt.so.12 not found warning appears once per worker instead of the two ORT errors per model.
  • CPU-only host: confirm OCR still selects CPUExecutionProvider with no extra warnings (the probe runs only when torch.cuda.is_available() is true, so CPU hosts are unaffected).

Files changed

  • deepdoc/vision/ocr.py: cuda_is_available() now probes for libcublasLt.so.12 / libcudnn.so.9 after the torch check.

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@dosubot dosubot Bot added the size:S and 🐞 bug labels Jun 5, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jun 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b893a77-c003-4ffe-bc4e-15b3d180ebd8

📥 Commits

Reviewing files that changed from the base of the PR and between 4718f4d and f711a90.

📒 Files selected for processing (1)
  • deepdoc/vision/ocr.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepdoc/vision/ocr.py

📝 Walkthrough

Walkthrough

Adds a runtime CUDA/cuDNN probe to load_model (disabling CUDA when required SONAMEs are missing), adds __del__ finalizers to TextRecognizer and TextDetector, extracts polygon clipping into TextDetector.clip_det_res, and adds many docstrings across OCR helper methods.

Changes

OCR runtime, cleanup, and clipping updates

| Layer / File(s) | Summary |
| --- | --- |
| CUDA availability probe and load_model doc (deepdoc/vision/ocr.py) | Expanded load_model doc and new probe that checks torch.cuda for the requested device_id and attempts to ctypes.CDLL load libcublasLt.so.12 and libcudnn.so.9; missing libraries log a warning and force CPU provider selection. |
| TextRecognizer docs, helpers, and finalizer (deepdoc/vision/ocr.py) | Docstrings added for TextRecognizer constructor and many resize_norm_* / helper functions; TextRecognizer.__del__ added to call close(). |
| TextDetector clip helper, docs, and finalizer (deepdoc/vision/ocr.py) | Polygon coordinate clamping extracted into TextDetector.clip_det_res; filter_tag_det_res now calls the helper. Added order_points_clockwise docstring, TextDetector.__del__ finalizer, and updated close/call docstrings. |
| OCR high-level API docstrings (deepdoc/vision/ocr.py) | Docstrings added for OCR.detect, OCR.recognize, OCR.recognize_batch, and OCR.__call__ describing return formats and confidence/drop behavior. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐰 I sniff the shared libs under moonlight glow,
With ctypes paws I give them a gentle go,
If libcublasLt or cuDNN hide away,
I warn, then hop to CPU without delay,
Cleanups tidy, polygons clipped — off I go.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and specifically describes the main fix: silencing ORT CUDA EP load failure on CUDA-13 hosts, which is exactly what the PR addresses. |
| Description check | ✅ Passed | The description provides comprehensive detail on the problem, root cause, fix strategy, and test plan, with all required template sections completed (problem statement and type of change clearly indicated). |
| Linked Issues check | ✅ Passed | The PR directly addresses #15687 by implementing the ctypes.CDLL probe for cu12 SONAMEs to detect missing CUDA libraries and log a single warning before falling back to CPU, matching all stated objectives. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the cuda_is_available() function and related docstring improvements in ocr.py; the __del__ finalizers are minor cleanup additions that do not deviate from the stated fix objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@Rene0422 Rene0422 force-pushed the fix/ort-cuda12-probe-15687 branch from 974c598 to 212cd13 Compare June 8, 2026 11:16
@dosubot dosubot Bot added the size:M label and removed the size:S label Jun 8, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
deepdoc/vision/ocr.py (2)

86-114: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The CUDA fallback warning is still emitted once per model load, not once per worker.

cuda_is_available() is recreated on each load_model() miss, and det.onnx / rec.onnx use different cache keys. On a CUDA-13 host this will log the same warning twice in one worker, which misses the PR goal of a single actionable warning per worker. Cache the probe result/warning at module scope or per device_id.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepdoc/vision/ocr.py` around lines 86 - 114, The CUDA probe in
cuda_is_available() is run every time load_model() misses (and det.onnx/rec.onnx
use different cache keys), causing duplicate warnings; change it to cache the
boolean probe result and warning state at module scope keyed by device_id (or a
single module-level sentinel if device_id is unused) so cuda_is_available()
returns the cached value and emits the warning only the first time per
worker/device_id; update any callers (e.g., load_model(), det.onnx, rec.onnx
paths) to call cuda_is_available() unchanged but rely on the cached result.
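
One way the caching suggested in this comment could look, as a sketch under assumptions rather than the change that was pushed. The names `_probe_cache` and `cached_cuda_probe` are hypothetical, and `probe` stands in for whatever function performs the torch check and the ctypes probe.

```python
import threading

# Hypothetical module-level cache: device_id -> probe result.
_probe_cache = {}
_probe_lock = threading.Lock()


def cached_cuda_probe(device_id, probe):
    """Run `probe(device_id)` at most once per device_id in this process."""
    with _probe_lock:
        if device_id not in _probe_cache:
            # The probe, and any warning it logs, runs only on the first call.
            _probe_cache[device_id] = bool(probe(device_id))
        return _probe_cache[device_id]
```

Because the cache lives at module scope, loading det.onnx and rec.onnx in the same worker would share one probe result per device_id. Decorating a zero-side-effect probe with `functools.lru_cache` would be a shorter alternative with the same effect.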

390-447: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

close() / __del__() still do not release the cached ONNX sessions.

Deleting self.predictor only drops the instance reference. load_model() keeps the same (sess, run_options) tuple alive in module-global loaded_models, so these new finalizers will not actually reclaim the session or GPU memory until process exit. If cleanup is part of this change, the cache needs ref-counting or explicit eviction too.

Also applies to: 540-579
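
The ref-counting option mentioned above could be sketched roughly as follows. `RefCountedCache` is a hypothetical helper, not code from deepdoc/vision/ocr.py, and `factory` stands in for whatever creates the cached (sess, run_options) tuple.

```python
import threading


class RefCountedCache:
    """Hypothetical cache that evicts an entry once its last user releases it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> [value, refcount]

    def acquire(self, key, factory):
        """Return the cached value for `key`, creating it with `factory()` if absent."""
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [factory(), 0]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, key):
        """Drop one reference; return True if the entry was evicted."""
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return False
            entry[1] -= 1
            if entry[1] <= 0:
                # Last user is gone: drop the cache's own reference as well.
                del self._entries[key]
                return True
            return False

    def __len__(self):
        with self._lock:
            return len(self._entries)
```

With a scheme like this, close() would call `release(key)` instead of only deleting the instance attribute, so the session becomes collectable once no detector or recognizer still uses it.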

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepdoc/vision/ocr.py`:
- Around line 753-754: The docstring for OCR.__call__ claims it returns
(filtered_boxes, [(text, score), ...], time_dict) but the implementation returns
a single list via list(zip(...)); update the docstring to describe the actual
return value (a list of (box, (text, score)) or whatever the list(zip(...))
contains) or alternatively change the implementation to return the documented
tuple (filtered_boxes, texts_scores_list, time_dict). Locate OCR.__call__ and
either edit its docstring to match the output of list(zip(...)) or modify the
return statement to assemble and return (filtered_boxes, list(zip(...)),
time_dict) so the API and docs are consistent.

---

Outside diff comments:
In `@deepdoc/vision/ocr.py`:
- Around line 86-114: The CUDA probe in cuda_is_available() is run every time
load_model() misses (and det.onnx/rec.onnx use different cache keys), causing
duplicate warnings; change it to cache the boolean probe result and warning
state at module scope keyed by device_id (or a single module-level sentinel if
device_id is unused) so cuda_is_available() returns the cached value and emits
the warning only the first time per worker/device_id; update any callers (e.g.,
load_model(), det.onnx, rec.onnx paths) to call cuda_is_available() unchanged
but rely on the cached result.
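
To illustrate the OCR.__call__ docstring mismatch raised in the inline comment, the sketch below contrasts the two return shapes. The sample data, the variable names, and the `time_dict` keys are hypothetical; the exact arguments passed to `zip` in the real method are not shown in this thread.

```python
# Hypothetical sample data standing in for detector and recognizer output.
filtered_boxes = [
    [[0, 0], [10, 0], [10, 5], [0, 5]],
    [[0, 6], [8, 6], [8, 11], [0, 11]],
]
rec_res = [("hello", 0.98), ("world", 0.91)]

# Shape of a `list(zip(...))` return: one (box, (text, score)) pair per region.
paired = list(zip(filtered_boxes, rec_res))

# Shape the docstring describes instead: a 3-tuple (time_dict is a placeholder).
time_dict = {"det": 0.0, "rec": 0.0}
documented = (filtered_boxes, rec_res, time_dict)
```

A caller that unpacks three values would fail against the first shape, which is why the docstring and the return statement need to agree.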
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7573b17b-c26c-4ccf-8842-5aa6ad910aed

📥 Commits

Reviewing files that changed from the base of the PR and between 212cd13 and 4718f4d.

📒 Files selected for processing (1)
  • deepdoc/vision/ocr.py

Comment thread deepdoc/vision/ocr.py Outdated
Labels

🐞 bug Something isn't working, pull request that fix bug. size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Question]: [BUG] ONNX Runtime fails to load CUDA libraries after upgrade from v0.23.1 to v0.24.0/v0.25.6 (CUDA 13.0 environment)

1 participant