
Add unpdf-markdown text-extraction backend #20

Draft
bosd wants to merge 1 commit into py-pdf:main from bosd:add-unpdf-markdown


@bosd bosd commented May 23, 2026


unpdf-markdown 0.6.4 (MIT) is a Rust-backed extractor imported as unpdf. Adds an unpdf_markdown_get_text adapter (using to_text for plain-text ground-truth comparison), the Library entry, and the requirement.

to_text can raise RuntimeError on PDFs it parses but cannot extract text from; that is caught so a single document does not abort the whole suite.
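A minimal sketch of the error handling described above, not the code in this PR. The helper name `make_safe_get_text` is hypothetical, and the extractor is passed in as a callable because the exact signature of `unpdf.to_text` is not shown here; the sketch assumes it accepts the PDF bytes and returns a string.

```python
from typing import Callable


def make_safe_get_text(to_text: Callable[[bytes], str]) -> Callable[[bytes], str]:
    """Wrap an extractor so a RuntimeError on one PDF yields empty text.

    `to_text` stands in for the library's extraction function. That it
    takes raw PDF bytes is an assumption made for this sketch.
    """

    def get_text(data: bytes) -> str:
        try:
            return to_text(data)
        except RuntimeError:
            # The PDF parsed but no text could be extracted: return an
            # empty string so the remaining documents are still processed.
            return ""

    return get_text


# Stand-in extractors used only to demonstrate the wrapper's behaviour.
def _working_extractor(data: bytes) -> str:
    return data.decode("ascii")


def _failing_extractor(data: bytes) -> str:
    raise RuntimeError("no extractable text")


safe_ok = make_safe_get_text(_working_extractor)
safe_fail = make_safe_get_text(_failing_extractor)
```

Only `RuntimeError` is caught, so other exception types still surface instead of being silently recorded as empty output.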

