Google Jules Verification Log: Procedure for Delivering Outputs on GitHub and Evaluating Redesign Capabilities
This article is a re-edited version of an experience report published on ayatabi.net, from a technical perspective.
The moment I heard at an LT session that "there's a development agent that can be run even from a smartphone," it no longer felt like a theoretical discussion, and I immediately began verification.
What you'll learn in this article
- What kind of agent Google Jules is (clarifying its mechanism)
- Reproducible steps to generate deliverables assuming GitHub Pages (creating a demo site)
- How far updates leaning towards "redesign" rather than "correction" of existing code are possible
- Pros and cons discovered through use (decision-making factors)
Target Audience
- Those who develop with GitHub and are accustomed to PR-based operations
- Those who have used Cursor / Gemini CLI / Claude Code and are looking for "the next step"
- Those who prefer to judge by "what works" (deliverable-oriented people rather than conversationalists)
Prerequisites
- GitHub account (capable of repository creation and PR management)
- Ability to enable GitHub Pages (public repositories are required on the free plan; private repositories are available on paid plans; to make the verification reproducible, this article assumes a public configuration)
- Jules available for your region and account status (jules.google.com)
What is Jules? (Overview)
Jules is a coding agent that "autonomously progresses tasks on GitHub repositories and returns deliverables as PRs." The official changelog explains that when a task is given, it sets up a new development environment on a VM, proceeds with dependency preparation, test execution, and PR creation. (jules.google)
Its launch was announced on May 19, 2025, and its capabilities include "bug fixes," "dependency updates," "migrations," "feature additions," and "PRs with test results." (jules.google)
Later, a CLI (Jules Tools) was also provided, installed via `npm install -g @google/jules` or run with `npx @google/jules`. (jules.google)
Verification 1: Having it build an entire demo site
Here, I'm checking if it "produces deliverables," not if it's "good at conversation."
Deliverables
- Demo Site: https://ayanecen.github.io/GoogleJules
Steps
The Jules UI tends to be updated frequently, so here I'll stick to "the minimum steps whose underlying concept doesn't change."
1. Create an empty GitHub repository
   - Example: `yourname/jules-demo`
2. Enable GitHub Pages
   - Settings → Pages → Deploy from branch
   - Select `main` / `(root)` for the branch (it's fine to try the standard configuration first)
3. Give Jules a task
   - Goal: create a static site that requires no build step and runs on GitHub Pages
   - Example conditions:
     - Complete with `index.html` / `style.css` / `script.js` only
     - Minimal top navigation such as "Description," "Demo," "Contact"
     - Include one mini-game (Canvas, etc.)
     - Include one news-style UI (dummy data is fine)
4. Review the Plan (work plan)
   - Check that the file structure is not overly complex
   - Check that the configuration does not require a build to run on Pages
5. Have it create a PR, review the diff, then merge
6. Check the Pages URL
   - Changes may take a few minutes to appear (this is normal GitHub Pages behavior)
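As a concrete illustration of the "news-style UI (dummy data is fine)" condition, one way to keep the result reviewable is to separate data and rendering from the DOM. This is my own minimal sketch, not Jules' actual output; the names (`newsItems`, `renderNews`) are hypothetical:

```javascript
// Hypothetical sketch (not Jules' output): a news-style UI driven by dummy data.
// Rendering is a pure function that returns an HTML string, so it shows up
// cleanly in a PR diff and needs no build step to run on GitHub Pages.
const newsItems = [
  { date: "2025-01-01", title: "Sample announcement" },
  { date: "2025-02-01", title: "Another dummy item" },
];

function renderNews(items) {
  return items
    .map((item) => `<li><time>${item.date}</time> ${item.title}</li>`)
    .join("\n");
}

// On the actual page, script.js would attach this to the DOM, e.g.:
// document.querySelector("#news").innerHTML = renderNews(newsItems);
console.log(renderNews(newsItems));
```

Keeping rendering as a pure function also makes step 5 (reviewing the diff) easier, since logic changes don't get tangled up with markup changes.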
What was confirmed
- Changes are grouped by PR and appear as reviewable diffs
- It works directly on GitHub Pages without a build step
When things go wrong
- Pages 404 → First suspect Pages settings (branch/root)
- Screen appears but doesn't work → suspect the `script.js` loading path (use a relative path)
Verification 2: Having it "redesign" Whac-A-Mole
I focused on whether it could "reconstruct the structure" rather than just "add features" here.
| Before Update | After Update |
|---|---|
| (screenshot in the original article) | (screenshot in the original article) |
Observations
- UI changes (pastel tones, rounded cards)
- Added sound effects and ON/OFF toggle
- Difficulty selection (Easy/Normal/Hard)
Tips for requests
This is a "writing method that works" derived from my verification, not a factual procedure.
- Impression: vague requests tended to result in vague structures, while concretizing the intent tended to produce outputs with well-organized internal state.
- Specific examples (instructions that can be used as-is):
  - "Consolidate state into `gameState` (score, timeLeft, isSoundOn, difficulty), separating rendering, input, and logic."
  - "Vary appearance interval and time limit by difficulty, and display the current difficulty in the UI."
  - "Keep sound effects lightweight with the Web Audio API (no external files)."
Verification 3: Making Neon Sentinel a "game" by having it recognize its purpose
I checked whether, once given a purpose, it could turn a functional-but-meaningless state into an actual game.
| Before Update | After Update |
|---|---|
| (screenshot in the original article) | (screenshot in the original article) |
Observations
- Clarification of HUD (Score / Lives / Level)
- Full-window Canvas, organization of background and effects
- Adding the objective "protect from meteors" clarified the meaning of play (i.e., what needed to be achieved was verbalized)
- In this update, the design did not explicitly assume a window size, so difficulty rose significantly on wider displays:
  - The Canvas became full-window
  - The expanded drawing area likely increased enemy spawn positions and processing range
  - As a result, the area the player needed to cover grew
  - The wider the screen, the higher the perceived difficulty
Hypothesis
- The addition of purpose may have made it easier for the agent to design "necessary display elements" and "collision detection/life system," potentially leading to more structured output.
- The responsive conditions and reference resolution were not included in the specifications, which may have led to a design where the drawing area was simply scaled.
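One way to verbalize the missing spec would be to fix a reference resolution and scale only the drawing, not the playfield. This is a hypothetical sketch of that design choice (the 960x540 reference and the function name are my assumptions, not part of the generated code):

```javascript
// Hypothetical sketch: game logic always runs in a fixed logical resolution;
// only rendering is scaled to the actual window. Spawn positions and the
// area the player must cover then stay constant regardless of display width.
const REFERENCE = { width: 960, height: 540 };

function renderScaleFor(canvasWidth, canvasHeight) {
  // Uniform scale that fits the logical playfield into the window
  // (letterboxing the remainder) instead of enlarging the playfield itself.
  return Math.min(canvasWidth / REFERENCE.width, canvasHeight / REFERENCE.height);
}

console.log(renderScaleFor(960, 540));   // → 1 (reference size)
console.log(renderScaleFor(1920, 1080)); // → 2 (drawn 2x, same logical playfield)
```

In a Canvas game, this scale would typically be applied with `ctx.setTransform(scale, 0, 0, scale, offsetX, offsetY)` before drawing, so the logic layer never sees the physical pixel size.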
Impression
- What changed most here was not the amount of code, but the feeling that "the specification text" began to be the core of the product. It felt like the task was to fix "the meaning of the game" first, rather than writing code.
- This felt like an example of "unwritten specifications not being reflected" rather than a bug. In other words, I learned that unless both "what to create" and "under which environment it should run" are verbalized, the difficulty and experience will not be stable.
Jules' Strengths and Weaknesses
Strengths
- Returns PRs: Delivers results in a reviewable diff format
- Assumes a test-inclusive flow: Explicitly states its philosophy of setting up a VM, then proceeding with dependencies → tests → PRs (jules.google)
- CLI available: Provides room for local operations and script integration (jules.google)
Weaknesses / Constraints
- Results are driven by the "quality of specifications": The time spent verbalizing specifications tends to dominate over code generation time (this can also be an advantage)
- Due to its GitHub-centric nature, it may not be suitable for local-only workflows
Summary
- Jules can be categorized as an agent that "autonomously performs tasks on GitHub and creates PRs." (jules.google)
- For verification, it's quicker to evaluate based on "deliverables (demos working on Pages)" than conversations
- In situations involving structure (e.g., state management, UI/logic separation), it appears better suited to "redesign" than to piecemeal "correction"
- The most critical part is "verbalizing the specifications" (this is key)
The emotional tone of the experience (the atmosphere of the LT session, the buzz of the discussion) can be found in the original article.
Note that this verification is based on behavior as of February 2026.