
Migrating from PyMuPDF

An honest account of pdfspine's PyMuPDF compatibility — the opt-in fitz shim, coverage at a glance, how gaps behave, the feature mapping, and what differs or is out of scope.

pdfspine is designed so that existing PyMuPDF code can run unmodified for the supported subset. This page is the honest, no-marketing account of what works, what differs, and what isn't there yet.

import fitz — one opt-in step

The compatibility shim maps PyMuPDF's exact names onto pdfspine. It is opt-in so a default install never collides with a real PyMuPDF in the same environment. Either import the shim under its submodule name (no global-name collision):

import pdfspine.fitz as fitz      # the shim, always available

doc = fitz.open("input.pdf")
page = doc[0]
text = page.get_text()
pix = page.get_pixmap(dpi=150)
pix.save("out.png")
doc.save("out.pdf")

…or, to keep an unmodified import fitz working, opt in once at startup:

import pdfspine
pdfspine.install_fitz_shim()      # registers global `fitz` / `pymupdf`
import fitz                       # now resolves to the pdfspine shim

install_fitz_shim() is idempotent and uses setdefault, so it never clobbers a real PyMuPDF you have already imported. Both import pymupdf and from pdfspine import pymupdf are supported the same way. For new code, prefer the native package:

import pdfspine
doc = pdfspine.open("input.pdf")

Both expose the identical open, Document, Page, Pixmap, DisplayList, TextPage, Annot, Widget, Shape, Table, and geometry classes.
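To make the idempotency claim concrete, here is a minimal sketch of how a setdefault-based registration behaves. It is not pdfspine's actual implementation: the helper `register_shim` and the module name `demo_fitz` are hypothetical stand-ins, chosen so the example never touches a real `fitz` in your environment.

```python
import sys
import types


def register_shim(name: str, shim: types.ModuleType) -> types.ModuleType:
    """Register `shim` under `name` unless a module is already registered there."""
    # dict.setdefault inserts only when the key is absent, so a module that is
    # already present (for example a real PyMuPDF) is returned untouched.
    return sys.modules.setdefault(name, shim)


# Hypothetical stand-in module; a real shim would re-export the library's API.
shim = types.ModuleType("demo_fitz")

first = register_shim("demo_fitz", shim)
# A second call with a different module object is a no-op: the first one wins.
second = register_shim("demo_fitz", types.ModuleType("demo_fitz"))

import demo_fitz  # the import system finds the entry in sys.modules

print(first is shim, second is shim, demo_fitz is shim)
```

Because the first registration wins, calling the installer repeatedly, or after a real PyMuPDF import, leaves whatever was registered first in place.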

Coverage at a glance

The baseline is PyMuPDF 1.24.x. The machine-readable COMPAT.toml in the repository tracks the disposition of every public PyMuPDF symbol:

| Disposition | Count | What it means |
|---|---|---|
| Implemented | 647 | Works today; does not raise on use. |
| Deferred | 56 | Known and planned for a later milestone. |
| Out-of-scope | 66 | Intentionally never in v1. |
| Total | 769 | 84.1% implemented |
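The percentage in the Total row follows directly from the three counts. As a quick sanity check (the dictionary keys are illustrative labels, not COMPAT.toml field names):

```python
# Counts copied from the coverage table above.
counts = {"implemented": 647, "deferred": 56, "out_of_scope": 66}

total = sum(counts.values())
implemented_pct = round(100 * counts["implemented"] / total, 1)

print(total, implemented_pct)  # 769 84.1
```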

"Implemented" means the method exists and returns a result of the right shape. Byte-for-byte / pixel-for-pixel agreement with PyMuPDF across a real PDF corpus is still being validated. Verify output on your own documents before relying on it.

How gaps behave

Anything not yet implemented raises a typed, catchable pdfspine.PdfUnsupportedError (aliased as fitz.PdfUnsupportedError) with a hint — never a bare AttributeError. That means you can detect and handle gaps cleanly:

import pdfspine

doc = pdfspine.open("input.pdf")

try:
    doc.some_unimplemented_method()   # placeholder for any not-yet-implemented call
except pdfspine.PdfUnsupportedError as e:
    print("not yet:", e)
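If the same fallback is needed in several places, the pattern can be wrapped in a small helper. The sketch below is one possible shape, not part of pdfspine: `call_or_default` is a hypothetical name, and the stand-in exception class exists only so the snippet runs where pdfspine is not installed.

```python
try:
    from pdfspine import PdfUnsupportedError
except ImportError:
    # Stand-in so this sketch runs without pdfspine installed.
    class PdfUnsupportedError(Exception):
        pass


def call_or_default(func, *args, default=None, **kwargs):
    """Call `func`; return `default` if the feature is reported as unsupported."""
    try:
        return func(*args, **kwargs)
    except PdfUnsupportedError:
        return default


def not_yet():
    # Simulates a deferred method.
    raise PdfUnsupportedError("not implemented yet")


print(call_or_default(not_yet, default="fallback"))  # fallback
```

In real code you might write `call_or_default(lambda: page.get_textbox(rect), default="")`. Passing a lambda keeps both the attribute lookup and the call inside the `try`, which matters if the error is raised at lookup time rather than at call time.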

PyMuPDF exception names are aliased onto the typed hierarchy, so existing except clauses keep working:

| PyMuPDF name | pdfspine type |
|---|---|
| fitz.FileDataError | PdfSyntaxError |
| fitz.EmptyFileError | PdfSyntaxError |
| fitz.FileNotFoundError | built-in FileNotFoundError |
| fitz.mupdf_display_errors | PdfError |
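Aliasing works because an alias is simply a second name bound to the same class object, so an `except` clause written against the old name still matches. The classes below are stand-ins defined for illustration only; in particular, the subclass relationship between them is an assumption, not a statement about pdfspine's real hierarchy.

```python
class PdfError(Exception):
    """Stand-in for a base error type."""


class PdfSyntaxError(PdfError):
    """Stand-in for a syntax error type (subclassing is assumed here)."""


# The alias: a PyMuPDF-style name bound to the same class object.
FileDataError = PdfSyntaxError


def load(data: bytes) -> str:
    # Toy parser: reject anything that does not start with a PDF header.
    if not data.startswith(b"%PDF-"):
        raise PdfSyntaxError("missing %PDF- header")
    return "ok"


try:
    load(b"not a pdf")
except FileDataError as exc:  # legacy PyMuPDF-style clause
    caught = type(exc).__name__

print(caught)  # PdfSyntaxError
```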

What is 100% compatible

The geometry layer (Point, Rect, IRect, Matrix, Quad) mirrors PyMuPDF 1.24.x arithmetic exactly — operators, transforms, inversion, morph / torect, quad convexity — as a documented contract. These classes are also sequences, so r[0], tuple(r), and unpacking all behave like PyMuPDF.
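The sequence behaviour can be exercised with a few lines. This sketch imports `Rect` from the shim when it is available and otherwise falls back to a `namedtuple` stand-in (an assumption made purely so the snippet runs anywhere); the stand-in mimics only indexing, `tuple()`, and unpacking, none of the geometry arithmetic.

```python
from collections import namedtuple

try:
    from pdfspine.fitz import Rect
except ImportError:
    # Stand-in with the same four fields; it has no geometry operators.
    Rect = namedtuple("Rect", "x0 y0 x1 y1")

r = Rect(10, 20, 110, 70)

first = r[0]            # index access
coords = tuple(r)       # conversion to a plain tuple
x0, y0, x1, y1 = r      # unpacking
width, height = x1 - x0, y1 - y0

print(first, coords, width, height)
```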

Feature mapping

| Area | PyMuPDF | pdfspine | Status |
|---|---|---|---|
| Open / pages | fitz.open, doc[i], page_count | same | ✅ Implemented |
| Metadata | doc.metadata, set_metadata | same | ✅ Implemented |
| XMP | get_xml_metadata / set_xml_metadata | same | ✅ Implemented |
| Encryption (read) | authenticate, needs_pass, permissions | same | ✅ Implemented |
| Encryption (write) | save(encryption=...) | RC4 / AES-128 / AES-256 | ✅ Implemented |
| Text | get_text("text"/"words"/"blocks"/"dict"/"rawdict"/"json"/"html"/"xhtml"/"xml") | same | ✅ Implemented |
| Search | search_for (rects / quads) | same | ✅ Implemented |
| TextPage | get_textpage, extract* | same | ✅ Implemented |
| Tables | find_tables, to_markdown | + to_html | ✅ Implemented |
| Render | get_pixmap (DPI / matrix / clip / colorspace / alpha) | same | ✅ Implemented |
| DisplayList | get_displaylist, get_pixmap | same | ✅ Implemented |
| SVG | get_svg_image | same | ✅ Implemented |
| Pixmap | save / tobytes / samples / buffer protocol | same | ✅ Implemented |
| Save | save, ez_save, tobytes/write, incremental= | same | ✅ Implemented |
| Page ops | new_page, delete_page, select | same | ✅ Implemented |
| Merge | insert_pdf | same | ✅ Implemented |
| TOC | get_toc, set_toc | same | ✅ Implemented |
| Links | get_links, insert_link, delete_link | same | ✅ Implemented |
| Annotations | add_*_annot, annots, delete_annot | same | ✅ Implemented |
| Forms | is_form_pdf, widgets, form_fill, form_flatten | same | ✅ Implemented |
| Redaction | add_redact_annot, apply_redactions | same | ✅ Implemented |
| Sanitize | scrub, bake | same | ✅ Implemented (subset of toggles) |
| Embedded files | embfile_* | same | ✅ Implemented |
| OCG / layers | get_ocgs, add_ocg, get_layer, set_layer, set_oc | same | ✅ Implemented (read + add/toggle/bind) |
| xref read | xref_object, xref_stream, xref_get_key, … | same | ✅ Implemented |

What differs

  • scrub toggles: the full PyMuPDF toggle set is accepted, but only a subset (metadata, JavaScript, attached/embedded files, links, XMP) is acted on; the remaining toggles are silently ignored.
  • insert_image(pixmap=...): not yet supported; pass the image as stream= bytes or filename= instead.
  • Deprecated camelCase aliases: PyMuPDF's old getText / getPixmap / setMetadata style names are provided as aliases where they existed, so legacy code keeps working.
  • to_html() on tables: a pdfspine extra beyond PyMuPDF.
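For code that currently calls insert_image(pixmap=...), one workaround is to encode the pixmap to bytes first and pass those as stream=. The helper below is a sketch under two assumptions: that pdfspine's Pixmap.tobytes accepts "png" as PyMuPDF's does, and that insert_image takes a rect plus stream= in the same way. The name `insert_pixmap` is hypothetical.

```python
def insert_pixmap(page, rect, pix):
    """Place a Pixmap's image into `rect` without using the pixmap= argument."""
    # Encode the pixmap as PNG in memory (assumes a PyMuPDF-style tobytes).
    png_bytes = pix.tobytes("png")
    # Hand the encoded bytes to insert_image through stream=.
    return page.insert_image(rect, stream=png_bytes)
```

A call would look like `insert_pixmap(page, fitz.Rect(0, 0, 200, 100), pix)`. The PNG round trip costs an encode and a decode, so for large images passing filename= may be the cheaper route when the image already exists on disk.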

What is not yet implemented

These are deferred (planned) — they raise PdfUnsupportedError today:

  • convert_to_pdf / image-as-document inputs (milestone M5).
  • Page-level word/block helpers (get_text_words, get_text_blocks, get_textbox), get_texttrace (M2 follow-ups).
  • Image-info helpers (get_image_info, get_image_bbox, get_image_rects).
  • show_pdf_page, write_text, insert_font, replace_image, delete_image.
  • Page-label read/write, copy_page / move_page / delete_pages.
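Before migrating, it can help to find out whether your code calls any of the deferred methods. The following sketch scans Python source with the standard-library `ast` module. The name set is copied from the list above (page-label methods are left out because the list does not name them), and `find_deferred_calls` is a hypothetical helper, not a pdfspine tool. Matching is by method name only, so an unrelated object with a same-named method would also be reported.

```python
import ast

# Method names copied from the deferred list above.
DEFERRED = {
    "convert_to_pdf", "get_text_words", "get_text_blocks", "get_textbox",
    "get_texttrace", "get_image_info", "get_image_bbox", "get_image_rects",
    "show_pdf_page", "write_text", "insert_font", "replace_image",
    "delete_image", "copy_page", "move_page", "delete_pages",
}


def find_deferred_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for each method call whose name is in DEFERRED."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in DEFERRED):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


sample = (
    "text = page.get_textbox(rect)\n"
    "doc.save('out.pdf')\n"
    "doc.move_page(1, 0)\n"
)
print(find_deferred_calls(sample))  # [(1, 'get_textbox'), (3, 'move_page')]
```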

Out of scope for v1

Intentionally never in v1 (these raise PdfUnsupportedError):

  • OCR (get_textpage_ocr).
  • EPUB-class reflow (doc.layout, chapters, locations, bookmarks).
  • HTML/CSS layout (insert_htmlbox).
  • Journalling / undo-redo (journal_*, save_snapshot).
  • Full Unicode shaping (complex scripts).

Consult the repository's COMPAT.toml for the authoritative, per-symbol disposition — it is CI-enforced to stay in sync with the code.
