Apache-2.0 · pure-Rust · import fitz

A pure-Rust reimplementation of PyMuPDF.

pdfspine is a drop-in-shaped, permissively-licensed (Apache-2.0) alternative to AGPL PyMuPDF — pure Rust, no C blob, with PyO3 Python bindings and an opt-in import fitz shim. License cleanliness is CI-enforced via cargo-deny, not a promise.


Why pdfspine

Permissive licensing and pure-Rust safety, without giving up the PyMuPDF API.

Apache-2.0, license-clean

Every dependency is permissively licensed. cargo-deny gates the shipped dependency graph in CI and rejects GPL / AGPL / LGPL / MPL / SSPL.

Pure Rust, no C blob

Self-contained wheels with no system zlib / C linkage and no bundled prebuilt engine — the differentiator vs pdfium wrappers.

import fitz compatible

An opt-in shim lets much existing PyMuPDF code run unmodified, and it is collision-safe when a real PyMuPDF is installed in the same environment.
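As a rough illustration of the kind of existing code this refers to, the sketch below uses only standard PyMuPDF calls (fitz.open, iterating pages, Page.get_text). The helper names page_texts and open_and_extract are hypothetical, and how the shim is switched on is not shown because this page does not describe it.

```python
from typing import Any, Iterable, List


def page_texts(doc: Iterable[Any]) -> List[str]:
    """Return the plain text of each page, using PyMuPDF-style calls."""
    # Iterating a PyMuPDF Document yields Page objects, and
    # Page.get_text("text") returns that page's plain text as a str.
    return [page.get_text("text") for page in doc]


def open_and_extract(path: str) -> List[str]:
    """Open a PDF through whatever `fitz` module is importable and extract text."""
    # Imported lazily so this file loads even where no `fitz` is installed.
    # Whether the name resolves to PyMuPDF or to the pdfspine shim depends on
    # the environment; that selection is an assumption left outside this sketch.
    import fitz

    doc = fitz.open(path)
    try:
        return page_texts(doc)
    finally:
        doc.close()
```

Keeping the extraction logic in a function that only touches the document object means the same code path can be exercised against either backend.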

Memory-safe by construction

#![forbid(unsafe_code)] in every first-party crate except the single audited PyO3 FFI chokepoint.

Text & tables

get_text in text / words / blocks / dict / rawdict / json / html / xhtml / xml; find_tables with merged-cell detection and export to markdown / html.
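To make the words format concrete: in PyMuPDF, get_text("words") returns tuples of the form (x0, y0, x1, y1, word, block_no, line_no, word_no). The sketch below regroups such tuples into text lines. It assumes pdfspine returns the same tuple layout, which its compatibility goal suggests but this page does not confirm; words_to_lines is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Layout documented by PyMuPDF for get_text("words").
Word = Tuple[float, float, float, float, str, int, int, int]


def words_to_lines(words: Sequence[Word]) -> List[str]:
    """Join words into lines, ordered by (block_no, line_no, word_no)."""
    grouped: Dict[Tuple[int, int], List[Tuple[int, str]]] = defaultdict(list)
    for _x0, _y0, _x1, _y1, text, block_no, line_no, word_no in words:
        grouped[(block_no, line_no)].append((word_no, text))
    # Sort lines by their (block, line) key, then words by word_no.
    return [
        " ".join(text for _, text in sorted(items))
        for _, items in sorted(grouped.items())
    ]
```

A typical call would pass page.get_text("words") straight into words_to_lines.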

Edit & save

Full and byte-exact incremental save, garbage collection, page ops, insert_pdf merge / split, metadata / TOC, and encryption write.
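The merge and incremental-save calls can be sketched with PyMuPDF's own method names (insert_pdf, can_save_incrementally, save with incremental=True). Whether pdfspine implements every one of these is an assumption, since only part of the API is covered; append_pages and save_changes are hypothetical helpers.

```python
from typing import Any


def append_pages(target: Any, source: Any, first: int = -1, last: int = -1) -> int:
    """Append pages first..last of `source` to `target`; return the new page count."""
    # PyMuPDF's Document.insert_pdf treats -1 as "first page" for from_page
    # and "last page" for to_page, so the defaults copy the whole source.
    target.insert_pdf(source, from_page=first, to_page=last)
    return target.page_count


def save_changes(doc: Any, keep_encryption: int) -> bool:
    """Save incrementally to the file `doc` was opened from, if possible."""
    # An incremental save appends to the original file, so it must target
    # the document's own path and must keep the existing encryption method.
    if not doc.can_save_incrementally():
        return False
    doc.save(doc.name, incremental=True, encryption=keep_encryption)
    return True
```

With PyMuPDF, keep_encryption would be fitz.PDF_ENCRYPT_KEEP. Splitting follows the same pattern: open an empty document and call append_pages with an explicit page range.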

Render

get_pixmap rasterizes vector graphics, text, images, and shadings via tiny-skia, with near-parity SSIM against fitz output and about 1.74× faster.
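A minimal rendering sketch, using the PyMuPDF signatures Page.get_pixmap(dpi=...) and Pixmap.save(filename). It assumes pdfspine accepts the same dpi keyword; render_page_png is a hypothetical helper.

```python
from typing import Any, Tuple


def render_page_png(page: Any, out_path: str, dpi: int = 150) -> Tuple[int, int]:
    """Rasterize one page at `dpi` and write it to `out_path`."""
    # In PyMuPDF, get_pixmap(dpi=...) scales the page from its native
    # 72 points per inch to the requested resolution.
    pix = page.get_pixmap(dpi=dpi)
    # Pixmap.save picks the image format from the file extension.
    pix.save(out_path)
    return pix.width, pix.height
```

Returning the pixel dimensions makes it easy to check that two backends produced same-sized images before comparing them.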

OCR

A pluggable engine: a Tesseract adapter and a pure-Rust PaddleOCR engine (PP-OCRv4, embedded models) that beats fitz on CJK scans.

Status

Alpha / pre-1.0 — but the core is feature-complete.

About 88.7% of the PyMuPDF 1.24 public API is implemented and tested. Not yet on PyPI — build from source for now.

Read

Parse, malformed-PDF repair, and decrypt (RC4 / AES-128 / AES-256, R2–R6).
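For the decrypt path, PyMuPDF code usually checks Document.needs_pass and then calls Document.authenticate, which returns a positive value on success and 0 on failure. The sketch below assumes pdfspine mirrors that behavior; unlock is a hypothetical helper.

```python
from typing import Any


def unlock(doc: Any, password: str) -> bool:
    """Return True if `doc` is readable, authenticating first when required."""
    # Unencrypted documents need no password at all.
    if not doc.needs_pass:
        return True
    # PyMuPDF's authenticate returns a positive int on success, 0 on failure.
    return doc.authenticate(password) > 0
```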

Text & tables

get_text formats, search_for, and find_tables with merged-cell detection.

Edit & save

Byte-exact incremental save, page ops, insert_pdf merge / split, encryption.

Render & OCR

get_pixmap rasterizer plus Tesseract and a pure-Rust PaddleOCR engine.