How I Built an E-book Capture Tool with AI as a Python Beginner

Introduction

I have almost no experience with programming.
However, driven solely by the desire to "save the e-books I bought to my own computer," I built a Python tool together with an AI.

The result is a tool that does what I set out to build.

In this article, I will honestly share "how I gave instructions to the AI" and "what obstacles I encountered." I hope this serves as a reference for others who are in the same boat: "I have something I want to make, but I can't write code."


What I Built

General-Purpose E-book Capture Tool

  • Automatically turns pages in an e-book viewer while saving screenshots.
  • Compiles images into a PDF at the end.
  • Operates via a GUI (just by pressing buttons).
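
The core of such a tool is a loop that saves a screenshot, sends a page-turn key, and waits for the viewer to redraw. The sketch below is a minimal illustration of that loop and is not taken from the repository: `capture_book`, `grab_with_mss`, and `turn_with_pyautogui` are hypothetical names, and the right-arrow key and the one-second delay are assumptions that depend on the viewer.

```python
import time
from pathlib import Path
from typing import Callable


def capture_book(
    grab_page: Callable[[Path], None],
    turn_page: Callable[[], None],
    out_dir: Path,
    max_pages: int,
    delay_s: float = 1.0,
) -> list[Path]:
    """Save one screenshot per page, turning the page after each capture."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    for page_no in range(1, max_pages + 1):
        target = out_dir / f"page_{page_no:04d}.png"
        grab_page(target)    # write the current screen to disk
        saved.append(target)
        turn_page()          # ask the viewer for the next page
        time.sleep(delay_s)  # give the viewer time to redraw
    return saved


def grab_with_mss(target: Path) -> None:
    """Capture one monitor with mss (imported lazily)."""
    import mss
    import mss.tools

    with mss.mss() as sct:
        # monitors[0] is the combined virtual screen; monitors[1] is the first monitor.
        shot = sct.grab(sct.monitors[1])
        mss.tools.to_png(shot.rgb, shot.size, output=str(target))


def turn_with_pyautogui() -> None:
    """Send a right-arrow key press (assumed page-turn key for the viewer)."""
    import pyautogui

    pyautogui.press("right")
```

Passing the capture and page-turn steps in as functions keeps the loop testable without a screen. In real use it could be called as `capture_book(grab_with_mss, turn_with_pyautogui, Path("captures"), max_pages=300)`.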

It is available on GitHub:
https://github.com/mamez31/ebook-capture-tool


What I Did

Step 1: First, I Created a Specification Document

Rather than having the AI write code right away, I first created a specification document summarizing "what I want to build."

By bouncing ideas back and forth with Claude, I nailed down the following details:

  • What functions are necessary?
  • What settings should be configurable in the GUI?
  • Where should files be saved?
  • How should the end of the book be detected?

By creating a solid specification document, my instructions to the AI remained consistent. I believe this was the most important step.
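
One way to make the answers to these questions concrete is to write them down as a single settings object that the GUI fills in. The following is only an illustrative sketch: `CaptureSettings`, its field names, and its default values are all hypothetical and are not taken from the actual tool.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CaptureSettings:
    """Hypothetical settings object; every field name here is illustrative."""

    output_dir: Path               # where screenshots and the PDF are saved
    max_pages: int = 500           # hard stop in case end detection fails
    page_delay_s: float = 1.0      # wait after each page turn
    end_detection: str = "image"   # "none", "image", or "text"

    def problems(self) -> list[str]:
        """Return human-readable validation errors (empty when valid)."""
        found = []
        if self.max_pages < 1:
            found.append("max_pages must be at least 1")
        if self.page_delay_s < 0:
            found.append("page_delay_s must not be negative")
        if self.end_detection not in {"none", "image", "text"}:
            found.append(f"unknown end_detection: {self.end_detection!r}")
        return found
```

A GUI could call `problems()` before starting a capture and show the messages to the user instead of failing halfway through a book.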

Step 2: I Built It Incrementally in Phases

Trying to have everything built at once leads to complex code and tons of bugs.

So, I divided the project into 5 phases and proceeded while verifying the operation at each stage.

Phase    Content
Phase 1  GUI, Capture, Saving, Test Mode
Phase 2  PDF Generation, Two-Page Spread Splitting
Phase 3  Image-based Termination Detection (OpenCV)
Phase 4  Text-based Termination Detection (OCR)
Phase 5  Help Screen, Log Improvements

Since I confirmed "it works!" at each phase before moving on, I knew exactly where a problem occurred if something went wrong.
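
To give a feel for what image-based termination detection in Phase 3 can involve, here is a minimal sketch of one common idea: if the screen stops changing after several page turns, the book has probably ended. This is an assumed approach and not necessarily what the tool does. It is also written in plain Python over raw pixel bytes so that it runs without dependencies; with OpenCV the per-pixel difference could instead be computed with `cv2.absdiff`. The names `mean_abs_diff` and `EndOfBookDetector`, the threshold, and the patience value are all hypothetical.

```python
from typing import Optional, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-value difference between two equally sized pixel buffers."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class EndOfBookDetector:
    """Report the end once `patience` consecutive frames barely change."""

    def __init__(self, threshold: float = 1.0, patience: int = 2) -> None:
        self.threshold = threshold
        self.patience = patience
        self._previous: Optional[Sequence[int]] = None
        self._unchanged = 0

    def update(self, frame: Sequence[int]) -> bool:
        """Feed the latest frame; return True when the book seems finished."""
        same_size = self._previous is not None and len(frame) == len(self._previous)
        if same_size and mean_abs_diff(frame, self._previous) < self.threshold:
            self._unchanged += 1   # page turn had no visible effect
        else:
            self._unchanged = 0    # screen changed, so we are still mid-book
        self._previous = frame
        return self._unchanged >= self.patience
```

Requiring more than one unchanged frame guards against stopping early when two consecutive pages happen to look alike, such as blank pages.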

Step 3: I Gave Instructions to Claude Code

I used Claude Code for the actual code generation.

The instructions themselves were simple. For example, this was my prompt for Phase 1:

Please implement Phase 1.

[Implementation Scope]
・tkinter GUI
・Screenshot acquisition
・Image saving
・Stop button
・Test mode

[Things NOT to implement yet]
・PDF generation
・OCR
・Termination detection

[Important]
Please prioritize the "minimum viable product that definitely works" first.

The key point is to explicitly state "things not to implement yet." If you don't, the AI will try to implement everything at once, and things will quickly get out of hand.
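
Because every phase prompt follows the same three-part shape, it can be generated from two lists. The helper below is a hypothetical convenience and not part of the tool; `build_phase_prompt` simply reproduces the layout of the Phase 1 prompt shown above.

```python
def build_phase_prompt(phase: int, in_scope: list[str], out_of_scope: list[str]) -> str:
    """Assemble a phase prompt with explicit 'do' and 'do not yet' lists."""
    lines = [f"Please implement Phase {phase}.", "", "[Implementation Scope]"]
    lines += [f"・{item}" for item in in_scope]
    lines += ["", "[Things NOT to implement yet]"]
    lines += [f"・{item}" for item in out_of_scope]
    lines += [
        "",
        "[Important]",
        'Please prioritize the "minimum viable product that definitely works" first.',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_phase_prompt(
        2,
        in_scope=["PDF generation", "Two-page spread splitting"],
        out_of_scope=["OCR", "Termination detection"],
    ))
```

Keeping the out-of-scope list as a required argument makes it hard to forget the part of the prompt that stops the AI from implementing everything at once.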


Challenges Faced

Installing Tesseract

The OCR function depends on a separate program called Tesseract. Even after I installed it, OCR failed with an error because the PATH was not set correctly.

Solution: I set the PATH environment variable in PowerShell so that Tesseract could be found.
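
An alternative to editing PATH is to tell pytesseract where the executable lives, using its `pytesseract.pytesseract.tesseract_cmd` setting. The sketch below first checks PATH and then falls back to a list of candidate locations. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical, and the Windows path in `DEFAULT_CANDIDATES` is only an assumed install location that should be adjusted to the actual machine.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Assumed install location on Windows; change this to match your machine.
DEFAULT_CANDIDATES = (r"C:\Program Files\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def configure_pytesseract() -> str:
    """Point pytesseract at the executable, failing early with a clear message."""
    import pytesseract  # imported lazily so the lookup above works without it

    exe = find_tesseract()
    if exe is None:
        raise RuntimeError("Tesseract not found: install it or add it to PATH")
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling `configure_pytesseract()` once at startup turns a confusing OCR failure in the middle of a capture into an immediate, readable error.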

Handling Two-Page Spreads

When I split captured two-page spreads into single pages, only the first page came out misaligned. This turned out to be a result of how the e-book viewer application itself behaves.

Solution: I changed my workflow to capture with the two-page spread display turned off.
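
For context, splitting a spread usually means cropping the screenshot down the middle, which assumes both pages are the same width and meet exactly at the center. Any page that the viewer lays out differently breaks that assumption. The sketch below shows such a midline split; it is an illustration and not the tool's code, and `spread_boxes`, `split_spread`, and the `right_to_left` flag are hypothetical names.

```python
from typing import Tuple

# (left, upper, right, lower), the box format that Pillow's Image.crop() expects.
Box = Tuple[int, int, int, int]


def spread_boxes(width: int, height: int, right_to_left: bool = False) -> Tuple[Box, Box]:
    """Return crop boxes for the two pages of a spread, in reading order."""
    middle = width // 2
    left_page: Box = (0, 0, middle, height)
    right_page: Box = (middle, 0, width, height)
    # Books that read right to left have their first page on the right half.
    return (right_page, left_page) if right_to_left else (left_page, right_page)


def split_spread(path: str, right_to_left: bool = False):
    """Crop a saved spread screenshot into two Pillow images."""
    from PIL import Image  # lazy import: only needed when actually cropping

    with Image.open(path) as img:
        first, second = spread_boxes(img.width, img.height, right_to_left)
        return img.crop(first), img.crop(second)
```

Capturing with the spread display turned off avoids this assumption entirely, since each screenshot then contains exactly one page.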


Reflections

I barely wrote any code myself, yet I successfully completed a tool that works as expected.

I realized that the key is not just to outsource everything to AI, but to think carefully about "what I want to build" and communicate that clearly. If you properly create a specification, divide the work into phases, and verify operations at each step, even a beginner can build tools with AI.


Technologies Used

  • Python / tkinter
  • mss (for screenshots)
  • Pillow (for image processing)
  • pyautogui (for sending keyboard input)
  • opencv-python (for image recognition)
  • pytesseract (for OCR)
  • Claude Code (for code generation)
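
As an example of how one of these libraries fits in, Pillow alone can cover the PDF step: it writes a multi-page PDF when `save()` is called with `save_all=True` and `append_images`. The sketch below is a minimal illustration under assumed file naming (`*.png` in one folder) and is not the repository's code; `natural_sorted` and `images_to_pdf` are hypothetical names.

```python
import re
from pathlib import Path
from typing import Iterable, List


def natural_sorted(paths: Iterable[Path]) -> List[Path]:
    """Sort so that page_2.png comes before page_10.png."""
    def key(path: Path):
        # Split into text and number runs; compare number runs as integers.
        return [int(part) if part.isdecimal() else part.lower()
                for part in re.split(r"(\d+)", path.name)]
    return sorted(paths, key=key)


def images_to_pdf(image_dir: Path, pdf_path: Path) -> int:
    """Combine every PNG in image_dir into one PDF; return the page count."""
    from PIL import Image  # lazy import so the sorting helper has no dependency

    files = natural_sorted(image_dir.glob("*.png"))
    if not files:
        raise ValueError(f"no PNG files found in {image_dir}")
    # PDF pages cannot carry an alpha channel, so convert everything to RGB.
    pages = [Image.open(f).convert("RGB") for f in files]
    pages[0].save(pdf_path, save_all=True, append_images=pages[1:])
    return len(pages)
```

Sorting by the numeric part of the file name matters once a book passes nine pages, because plain alphabetical order would place `page_10.png` before `page_2.png`.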

Important Notes

  • This is a tool created for personal learning purposes.
  • It is intended for personal use of content you have purchased yourself.
  • Please check the terms of service for each e-book service and use this tool at your own risk.
