Translated by AI
How Far Can Humans Step Back from Creation? Building an Autonomous 4-Panel Manga System with Gemini 2.0 Thinking and Imagen 3
What is "Nano Banana Pro Powered Super AI 4-koma System / Autonomous 4-Panel Manga Generation System"?

1. Introduction: AI Evolving from "Tool" to "Director"
In manga production, AI has mostly played a supporting role: humans refine prompts and repeatedly "pull the gacha" until a usable image appears. This project was developed to overturn that process by positioning the AI as the director, with the aim of automating everything from brainstorming through composition, direction, and rendering.
Based on the design philosophy that "humans intentionally withdraw from the creative process," we have implemented original protocols to control the inherent weaknesses of diffusion models at the code level.

2. Technical Challenges Resolved
When generating "4-panel manga" using image generation AI (Diffusion Models), the following three hurdles arise:
- Spatial Collapse: Failure to maintain accurate panel layouts or borders, leading to content merging.
- Lack of Identity: Character designs (hairstyle, glasses, clothing) change from panel to panel within the same canvas.
- Decision-Making Dependency: Reliance on humans for the creative step of "what to draw."
To address these issues, we implemented the following technologies.
3. Physical Spatial Constraints: ABSOLUTE PHYSICAL GEOMETRY LOCK
Instead of asking in natural language to "draw in 4 panels," we write the canvas geometry into the prompt as explicit physical constraints: aspect ratio, orientation, margins, and strip layout.
// Excerpt from src/App.jsx
// Force 4 equally spaced horizontal strips within a 2:3 vertical canvas
const geometryLock = `
[ABSOLUTE PHYSICAL GEOMETRY LOCK - v121.3]
(Aspect Ratio: 2:3 Vertical ONLY).
(Orientation: Portrait Mode).
(Physical Barrier: Top 15% and bottom 15% are solid pure white blocks).
(Structure: 4 EQUAL-SIZED HORIZONTAL STRIPS stacked vertically).
(Panel Width: EXTEND TO THE VERY EDGES. ZERO PADDING ON SIDES).
`;
Reserving the top and bottom 15% of the canvas as a blank "Physical Barrier" leaves room for the AI to write a title or signature and keeps the panel layout from being cut off at the canvas edges. The four strips share the remaining 70% of the height, which is 17.5% each.
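To make the geometry concrete, the percentages in the lock can be converted into pixel rectangles. The following is a minimal sketch, not code from the repository: `computePanelRects` and its parameters are hypothetical names, and it assumes the 15% margins and four equal strips stated in the prompt above.

```javascript
// Hypothetical helper (not part of the original project): converts the
// geometry-lock percentages into pixel rectangles for a given canvas size.
function computePanelRects(width, height, marginPercent = 15, panelCount = 4) {
  // Blank band reserved at the top and at the bottom of the canvas
  const margin = Math.round((height * marginPercent) / 100);
  const usable = height - 2 * margin;
  const rects = [];
  for (let i = 0; i < panelCount; i++) {
    // Round the strip boundaries (not the heights) so strips tile without gaps
    const top = margin + Math.round((usable * i) / panelCount);
    const bottom = margin + Math.round((usable * (i + 1)) / panelCount);
    // Panels span the full width: zero padding on the sides
    rects.push({ x: 0, y: top, width, height: bottom - top });
  }
  return rects;
}
```

For an 800×1200 canvas (2:3), this gives a 180 px blank band at the top and bottom and four strips of 210 px each, starting at y = 180. A check like this could be used to verify a generated image against the intended layout, although the project itself only states the constraints in the prompt.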
4. Mathematical Control of Feature Quantities: Weighted Immutable Prompts
To maintain character consistency, we assign mathematical weights to features extracted during the analysis phase and inject them into the prompt as immutable constants.
// Logic for weighted injection to ensure feature consistency
// Example: specify intensities such as female:1.6, glasses:1.2
// Use a template literal (backticks) so that ${...} is actually interpolated
const VAR_CAST_LIST = `${castList.replace(/\n/g, ', ')}`;
Stating weights explicitly, as in "female:1.6", is intended to make each feature hard for the model to drop. This reduces "omissions" of features such as glasses or hair color, and it reduces characters "merging" (fusing) across multiple panels.
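As an illustration of how such a cast list might be assembled before injection, here is a minimal sketch. The data shape (`name`, `features`) and the function names are assumptions made for this example; the article does not show how `castList` is actually built.

```javascript
// Hypothetical data shape: each cast member is { name, features: { tag: weight } }.
// Formats one member as "Name (tag:weight, tag:weight)".
function formatCastMember(member) {
  const tags = Object.entries(member.features)
    .map(([tag, weight]) => `${tag}:${weight.toFixed(1)}`)
    .join(", ");
  return `${member.name} (${tags})`;
}

// Produces one line per cast member, matching the newline-separated
// castList string that the excerpt above flattens with replace(/\n/g, ', ').
function buildCastList(cast) {
  return cast.map(formatCastMember).join("\n");
}
```

Calling `buildCastList` on two members and then applying the same `replace(/\n/g, ', ')` as in the excerpt yields a single comma-separated line suitable for embedding in a prompt.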
5. Robust Model Cycling Strategy
Recent models such as Gemini 2.0 Flash Thinking are experimental, so calls to them are prone to failing with 404 errors or hitting quota limits. To cope with this, we built a fallback mechanism that cycles through the available models in priority order.
// Excerpt from src/lib/gemini.js
const MODEL_IDS = [
  "gemini-2.0-flash-thinking-exp-01-21", // Prioritize the latest thinking model
  "gemini-2.0-flash-thinking-exp",
  "gemini-2.0-flash",
  "gemini-1.5-flash-latest" // Fall back to a stable model as the last resort
];
export const callThinkingGemini = async (prompt, ...) => {
  let lastError;
  for (const modelId of MODEL_IDS) {
    try {
      // Execute connection to the model and generation process
      // (creating `model` from modelId is elided in this excerpt)
      const result = await model.generateContent(...);
      // (extracting finalOutput and thought from result is elided as well)
      return { text: finalOutput, thought: thought };
    } catch (err) {
      console.warn(`Model ${modelId} failed:`, err.message);
      lastError = err;
      // Fall through to the next model immediately and continue generation
    }
  }
  // Every model failed: report it instead of silently returning undefined
  throw new Error(`All models failed; last error: ${lastError?.message}`);
};
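The fallback loop can also be separated from the SDK so that it is easy to exercise with a stub. The sketch below is an illustration under assumptions, not the project's code: `generateWithFallback` is a hypothetical name, and `generate` is a caller-supplied async function `(modelId, prompt) => text` standing in for the real API call.

```javascript
// Hypothetical, SDK-independent version of the fallback loop.
// `generate` is an assumed async callback: (modelId, prompt) => Promise<string>.
async function generateWithFallback(modelIds, prompt, generate) {
  const failures = [];
  for (const modelId of modelIds) {
    try {
      const text = await generate(modelId, prompt);
      // Report which model actually answered, which is useful for logging
      return { modelId, text };
    } catch (err) {
      // Record the failure and move on to the next model in priority order
      failures.push(`${modelId}: ${err.message}`);
    }
  }
  // No model succeeded: surface every failure in one error
  throw new Error(`All models failed -> ${failures.join("; ")}`);
}
```

Passing the API call in as a parameter keeps the retry policy testable without network access, and collecting one message per model makes it clearer whether a run failed because of a missing model or an exhausted quota.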
Furthermore, the thought part of the Gemini 2.0 response is extracted and displayed in real time in the frontend's "Neural Process" log, so users can transparently monitor the AI's thinking process.
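How the thought is separated from the final answer is not shown in the excerpt. One possible approach is sketched below, under the assumption that the response arrives as a list of parts shaped like `{ text, thought }`, where `thought: true` marks reasoning text. Both this shape and the name `splitThoughts` are assumptions for illustration and should be checked against the response format of the model actually in use.

```javascript
// Assumed part shape: { text: string, thought?: boolean }.
// Separates reasoning parts from answer parts and returns the same
// { text, thought } shape that callThinkingGemini returns.
function splitThoughts(parts) {
  const thoughtChunks = [];
  const answerChunks = [];
  for (const part of parts) {
    // Skip parts that carry no text (for example, non-text payloads)
    if (typeof part.text !== "string") continue;
    if (part.thought === true) {
      thoughtChunks.push(part.text);
    } else {
      answerChunks.push(part.text);
    }
  }
  return { text: answerChunks.join("\n"), thought: thoughtChunks.join("\n") };
}
```

The `thought` string can then be appended to a log panel while `text` continues through the rest of the pipeline.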
6. Rendering Optimization for Imagen 3
For the final render, the call to the Imagen 3 API sets its parameters so that the vertically stacked structure of a 4-panel manga is preserved.
// Excerpt from src/lib/imagen.js
parameters: {
  sampleCount: 1,
  aspectRatio: "9:16", // Specify an ultra-tall ratio optimal for vertical 4-panel stacking
  personGeneration: "allow_adult"
}
Setting aspectRatio: "9:16" explicitly yields a tall, narrow canvas, which reduces horizontal margins and lets the four panels be rendered densely.
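A small builder can keep these parameters in one place and reject unsupported ratios before a request is sent. This is a sketch under assumptions: `buildImagenRequest` is a hypothetical name, the `instances` wrapper around the prompt is my assumption about the request body, and the list of accepted aspect ratios reflects my understanding of Imagen 3 and should be verified against the current API documentation.

```javascript
// Assumption: aspect ratios accepted by Imagen 3 (verify against current docs).
const SUPPORTED_ASPECT_RATIOS = ["1:1", "3:4", "4:3", "9:16", "16:9"];

// Hypothetical builder for the request body; only the `parameters` object
// is taken from the article, the `instances` wrapper is an assumption.
function buildImagenRequest(prompt, aspectRatio = "9:16") {
  if (!SUPPORTED_ASPECT_RATIOS.includes(aspectRatio)) {
    throw new RangeError(`Unsupported aspect ratio: ${aspectRatio}`);
  }
  return {
    instances: [{ prompt }],
    parameters: {
      sampleCount: 1,
      aspectRatio,
      personGeneration: "allow_adult",
    },
  };
}
```

Under this assumed list, the 2:3 ratio named in the geometry lock prompt cannot be expressed through the `aspectRatio` parameter, so the prompt text and the API parameter are worth keeping in sync.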
Tech Stack
- Frontend: React 19 / Vite 7 / Tailwind CSS
- AI Engine: Google Gemini API & Imagen 3
- Architecture: Asynchronous prompt compilation utilizing thinking traces
Conclusion
This system is an experiment to test a future in which AI goes beyond being a "human tool" and behaves as an autonomous "creator." The source code is released under the MIT License, and the logic under CC BY-NC-SA 4.0.
I hope that fellow engineers will further refine this protocol with their own hands.
Repository: https://github.com/FURUYAN1234/nano-banana-pro
Demo: https://furuyan1234.github.io/nano-banana-pro/
Technical Article (note): https://note.com/happy_duck780/n/ndf063558c1f5