🔥

PyPDF2? No, use PyMuPDF

Published 2022/12/10

Python

PyMuPDF

tech

PyPDF2 throws errors when reading and writing.

Sometimes, at least. It happens fairly often with the PDFs we handle, so it became a real problem.

    (<class 'PyPDF2.utils.PdfReadError'>, PdfReadError('Illegal character in Name Object',), <traceback object at 0x0123456789ABCDEF>)

PyMuPDF handles them fine! No problem!

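There is a fair amount of Japanese-language material on PyPDF2, but for PyMuPDF almost everything is in English.
So I'm writing down the methods I actually used.
Wand + ImageMagick also looked usable, but I got stuck on the installation.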

Implementation with PyPDF2

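    from PyPDF2 import PdfFileReader, PdfFileWriter

    # Example: extract only some of the pages.
    # pdf_path is assumed to be a pathlib.Path.
    # PdfFileReader / PdfFileWriter are the legacy class names (PyPDF2 before 3.0).
    reader = PdfFileReader(str(pdf_path))
    writer = PdfFileWriter()
    page = 1234567890  # placeholder: 0-based index of the page to extract
    writer.addPage(reader.getPage(page))

    save_path = "hogepiyo.pdf"
    with open(save_path, 'wb') as f:
        writer.write(f)

    # Example: get the number of pages.
    with open(pdf_path, mode='rb') as f:
        reader = PdfFileReader(f, strict=True)
        _pages = reader.getNumPages()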

Implementation with PyMuPDF

First, install it

    pip install --upgrade pymupdf

Implementation with PyMuPDF

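    # The import name is "fitz". The package is called PyMuPDF, but for historical reasons it is imported this way.
    import fitz

    # Example: extract only some of the pages.
    # pdf_path is assumed to be a pathlib.Path.
    reader = fitz.open(str(pdf_path))
    writer = fitz.open()  # no arguments: creates a new, empty PDF
    page = 1234567890  # placeholder: 0-based index of the page to extract
    writer.insert_pdf(reader, from_page=page, to_page=page)  # called insertPDF in older releases

    save_path = "hogepiyo.pdf"
    writer.save(save_path)

    # Example: get the number of pages.
    _pages = fitz.open(str(pdf_path)).page_count  # called pageCount in older releases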

