A pure-Rust reimplementation of PyMuPDF.
pdfspine is a permissively licensed (Apache-2.0) alternative to the AGPL-licensed PyMuPDF that mirrors its API: pure Rust, no C blob, with PyO3 Python bindings and an opt-in import fitz shim. License cleanliness is enforced in CI via cargo-deny rather than promised.
Permissive licensing and pure-Rust safety, without giving up the PyMuPDF API.
Apache-2.0, license-clean
Permissive throughout; cargo-deny gates the shipped dependency graph to exclude GPL / AGPL / LGPL / MPL / SSPL licenses. CI-enforced, not a promise.
Pure Rust, no C blob
Self-contained wheels with no system zlib or other C linkage and no bundled prebuilt engine, which is the key difference from pdfium wrappers.
import fitz compatible
An opt-in shim lets much existing PyMuPDF code run unmodified, and it is collision-safe alongside a real PyMuPDF install in the same environment.
Memory-safe by construction
#![forbid(unsafe_code)] in every first-party crate except the single audited PyO3 FFI chokepoint.
Text & tables
get_text in text / words / blocks / dict / rawdict / json / html / xhtml / xml formats; find_tables with merged-cell detection and export to markdown / html.
Edit & save
Full save and byte-exact incremental save, garbage collection, page ops, insert_pdf merge / split, metadata / TOC, and encryption write.
Render
get_pixmap rasterizes vector graphics, text, images, and shadings via tiny-skia, with near-parity SSIM against fitz output and about 1.74× faster than fitz.
OCR
A pluggable engine: a Tesseract adapter and a pure-Rust PaddleOCR engine (PP-OCRv4, embedded models) that beats fitz on CJK scans.
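As a rough illustration of the text-extraction path, the sketch below groups get_text("words") output into lines. It assumes pdfspine returns the same word tuples as PyMuPDF, (x0, y0, x1, y1, word, block_no, line_no, word_no), and that the shim is importable as fitz. The helper names words_to_lines and lines_from_pdf are made up for this example and are not part of either library.

```python
from collections import defaultdict


def words_to_lines(words):
    """Group PyMuPDF-style word tuples into lines of text.

    Each tuple is (x0, y0, x1, y1, word, block_no, line_no, word_no).
    Lines are returned in (block_no, line_no) order.
    """
    grouped = defaultdict(list)
    for _x0, _y0, _x1, _y1, text, block_no, line_no, word_no in words:
        grouped[(block_no, line_no)].append((word_no, text))

    lines = []
    for key in sorted(grouped):
        ordered = sorted(grouped[key])  # order words by word_no within the line
        lines.append(" ".join(text for _, text in ordered))
    return lines


def lines_from_pdf(path, page_number=0):
    """Hypothetical usage: read one page and return its lines of text."""
    # Assumption: `fitz` resolves to PyMuPDF or to the pdfspine shim.
    import fitz

    doc = fitz.open(path)
    try:
        return words_to_lines(doc[page_number].get_text("words"))
    finally:
        doc.close()
```

Keeping the grouping logic separate from the file access means the same helper can be run against either backend to compare their output.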
Alpha / pre-1.0, but the core is feature-complete.
About 88.7% of the PyMuPDF 1.24 public API is implemented and tested. Not yet on PyPI; build from source for now.
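Since coverage is partial, it can help to check up front whether the calls your own code depends on are present. The following is a minimal sketch, not a pdfspine tool: missing_api is a hypothetical helper, and it assumes the module under test exposes classes under the same names PyMuPDF uses (for example Document and Page).

```python
def missing_api(module, required):
    """Report which required methods a module's classes do not provide.

    `required` maps a class name to the method names your code calls,
    e.g. {"Page": ["get_text", "find_tables"]}. Returns a dict with the
    same shape that contains only the gaps (empty when all are present).
    """
    gaps = {}
    for class_name, methods in required.items():
        cls = getattr(module, class_name, None)
        if cls is None:
            # The whole class is absent, so every requested method is missing.
            gaps[class_name] = list(methods)
            continue
        absent = [name for name in methods if not hasattr(cls, name)]
        if absent:
            gaps[class_name] = absent
    return gaps
```

Passing the imported fitz module together with the methods your code actually uses gives a quick yes/no before attempting a migration.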
Read
Parsing, malformed-PDF repair, and decryption (RC4 / AES-128 / AES-256, R2–R6).
Text & tables
get_text formats, search_for, and find_tables with merged-cell detection.
Edit & save
Byte-exact incremental save, page ops, insert_pdf merge / split, encryption.
Render & OCR
get_pixmap rasterizer plus Tesseract and a pure-Rust PaddleOCR engine.
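To show how the rasterizer is typically driven, here is a minimal sketch that writes one PNG per page. It relies only on calls that exist in PyMuPDF (iterating a document, page.get_pixmap(dpi=...), pixmap.save(filename)) and assumes pdfspine mirrors them. The function name render_pages and the output naming scheme are illustrative choices.

```python
from pathlib import Path


def render_pages(doc, out_dir, dpi=150):
    """Rasterize every page of `doc` to a PNG and return the target paths.

    `doc` is any iterable of PyMuPDF-style pages, such as the object
    returned by fitz.open(...).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for index, page in enumerate(doc, start=1):
        pix = page.get_pixmap(dpi=dpi)  # render the page at the given DPI
        target = out_dir / f"page-{index:03d}.png"
        pix.save(str(target))  # output format is inferred from the extension
        written.append(target)
    return written
```

Because the function only touches the page and pixmap interfaces, the same code can be pointed at PyMuPDF and at pdfspine to compare output and timing.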