Translated by AI
Introducing TODOMaker: An App to Automatically Turn Complex Procedures into TODO Lists
This is a submission for the AI Agent Hackathon with Google Cloud. I created an app called TODOMaker.
By the way, it's tax return season.
What I Made
I created a mobile app called TODOMaker. It is a Flutter app that works on both iOS and Android.
TODOMaker creates a TODO list when you tell the AI about something you "want to do," "want to become," or "have to do." For example, the following use cases are possible:
- Want to do: Marriage procedures
- Want to become: How to become a programmer
- Have to do: How to file a tax return
It's hard to even start researching something you "don't know" or "don't understand." Especially tax returns. To lower this hurdle, the app automatically collects information from the web and breaks it down into a TODO list. When a more detailed explanation is needed, Gemini on Vertex AI looks up information via Google Search (Retrieval) and explains each step carefully. The sources can also be checked through Grounding.
Target User Persona and Challenges
User Persona
- Users who struggle with searching in a browser, or with getting an AI to extract and summarize information.
- People who want to know what steps to take and in what order when starting something.
Specific examples are as follows:
- People who "want to do" something like marriage procedures but don't know what to do.
- People who have a vision of what they "want to become" but don't know what steps to follow.
- Users who "have to do" something like a tax return but feel a high hurdle because there is so much they "don't know" or "don't understand." Also, users like myself who forget what to do every year even though they file a tax return annually.
- People facing procedures like tax returns where it's better to research and review each step.
- Users starting their tax returns now. People who want to lighten the burden of filing tax returns.
Challenges and Solutions
To summarize the challenges: "Procedures for unfamiliar actions are unclear, making the hurdle to act high." The solution is: "Clearly provide the steps for unfamiliar actions and lower the hurdle to act." I will describe this in more detail below.
As mentioned earlier, taking action on something you "don't know" or "don't understand" is very costly. Especially tax returns. When such problems arise, most people probably repeat actions like searching, reading articles, and understanding on a browser, and then create a checklist by breaking down actions and prioritizing them in their own notebook. Or, in some cases, it's so tedious that people often feel like escaping or procrastinating.
Of course, these days you can use ChatGPT to ask AI and have it output something like a TODO list in Markdown format. However, since it's something you're not used to, you also want explanations and evidence for each step. Since it's not dedicated software for creating TODO lists, it doesn't format things that conveniently. It's a hassle to manually repeat AI queries, searches, and formatting yourself.
With TODOMaker, simply telling the AI "I want to do this" generates a TODO list for achieving it. Each TODO item comes with an explanation and links to its information sources, so explanations and (presumably reasonably reliable) web sources are available without the hassle of searching, and you can quickly check anything that is unclear. The AI performs the searching, article reading, understanding, action breakdown, and prioritization described above on your behalf. If something in the generated output concerns you, the detailed AI summary and the sources presented through Grounding let you approach the TODO items with more confidence.
This leads to "clearly providing the steps for unfamiliar actions and lowering the hurdle to act."
Unfamiliar actions. Yes. Tax returns. I'm talking about you. I do it every year but always forget.
Feature Introduction
First, here is a video.
- Video notes
- The fast scrolling is a known Flutter issue when running on the iPhone Simulator, so please ignore it.
- As mentioned later, the design involves multiple calls via the Gemini API, so there are parts where loading takes time. I am using Cloud Tasks to mitigate this as much as possible.
Main Features
- Input "what you want to do" to the 🤖. In the demo, this corresponds to the "How to file a tax return" part.
- After waiting, a TODO list is generated. The structure is divided into a parent Task and child TODOs, so I will use these terms from now on. On the screen, the Task roughly corresponds to the "How to file a tax return" card, and a TODO is each checklist item.
- Both Task and TODO have detail pages. In each, you can check detailed explanations and verify the information sources.
- You can check off both Task and TODO items to mark them as completed.
- The time required for each TODO task is calculated by AI. The total time for all TODOs is also displayed on the Task detail page.
- From the TODO detail screen, you can add entries to your device's calendar app based on the estimated duration.
- From the Task and TODO detail screens, the app can search for relevant locations based on address input (e.g., home or work) or your current location. For example, for a tax return, it shows information for tax offices or government buildings. Location information includes phone numbers and email addresses in addition to coordinates. Map, phone, and email apps can be opened from these.
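The parent/child structure and the time totals described above can be pictured with a minimal sketch. The interfaces and field names here are assumptions made for illustration, not the app's actual Firestore schema.

```typescript
// Hypothetical shapes; field names are assumptions, not the app's real schema.
interface Todo {
  title: string;
  estimatedMinutes: number; // duration estimated by the AI
  done: boolean;
}

interface Task {
  title: string; // e.g. "How to file a tax return"
  todos: Todo[];
}

// Total time shown on the Task detail page: the sum over all child TODOs.
function totalMinutes(task: Task): number {
  return task.todos.reduce((sum, todo) => sum + todo.estimatedMinutes, 0);
}

// Calendar entry for one TODO: the end time is the start plus the estimate.
function calendarSlot(todo: Todo, start: Date): { title: string; start: Date; end: Date } {
  const end = new Date(start.getTime() + todo.estimatedMinutes * 60000);
  return { title: todo.title, start, end };
}

const taxReturn: Task = {
  title: "How to file a tax return",
  todos: [
    { title: "Gather receipts", estimatedMinutes: 60, done: false },
    { title: "Fill in the form", estimatedMinutes: 90, done: false },
  ],
};

const taxReturnTotal = totalMinutes(taxReturn); // 150
```

Keeping the estimate on each TODO and deriving the Task total from it means the total never has to be stored or kept in sync separately.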
Bonus
- Tapping somewhere 10 times makes something appear. Please give me a star.
- To the judges: if it appears, it will disappear after restarting the app, so please don't deduct points. I was just frustrated with my tax return.
Architecture
System diagram:

The overall flow is:
- Send input from the Flutter app to Cloud Functions.
- Cloud Functions delegates heavy processing or tasks that can be parallelized to Cloud Tasks. Light processing is handled directly.
- Query Gemini through GenKit's Vertex AI module or directly from step 2.
- Format the returned query results and write them to Firestore.
- Since the client is set to detect Firestore changes, the UI updates each time a response from Gemini returns.
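The flow above can be sketched with in-memory stand-ins. Everything here is hypothetical: FakeStore plays the role of Firestore, a plain array plays the role of the Cloud Tasks queue, and askModel stands in for a Gemini call. None of these are Firebase or GenKit APIs, and the TODO titles are passed in directly to keep the sketch short.

```typescript
// All names below are hypothetical stand-ins, not Firebase or GenKit APIs.
type Listener = (path: string, value: string) => void;

// Stand-in for Firestore: stores documents and notifies subscribers on every write.
class FakeStore {
  private docs = new Map<string, string>();
  private listeners: Listener[] = [];

  onChange(listener: Listener): void {
    this.listeners.push(listener);
  }

  write(path: string, value: string): void {
    this.docs.set(path, value);
    for (const listener of this.listeners) listener(path, value);
  }

  read(path: string): string | undefined {
    return this.docs.get(path);
  }
}

// Stand-in for a Cloud Tasks job and for a Gemini call.
type Job = { path: string; prompt: string };
type AskModel = (prompt: string) => string;

// Entry function: light work is done directly, heavy work is enqueued per TODO.
function handleInput(input: string, todoTitles: string[], store: FakeStore, queue: Job[]): void {
  store.write("task", input);
  todoTitles.forEach((title, i) => {
    queue.push({ path: `task/todos/${i}`, prompt: `Explain the step "${title}" for: ${input}` });
  });
}

// Worker: query the model, format the answer, and write it to the store.
// Jobs are drained one by one here; a real queue would dispatch them concurrently.
function drainQueue(queue: Job[], askModel: AskModel, store: FakeStore): void {
  while (queue.length > 0) {
    const job = queue.shift();
    if (job === undefined) break;
    store.write(job.path, askModel(job.prompt).trim());
  }
}

// Client side: the UI reacts to each write as it arrives.
const store = new FakeStore();
const uiUpdates: string[] = [];
store.onChange((path) => {
  uiUpdates.push(path);
});

const queue: Job[] = [];
handleInput("How to file a tax return", ["Gather receipts", "Fill in the form"], store, queue);
drainQueue(queue, (prompt) => ` answer to: ${prompt} `, store);
// uiUpdates is now ["task", "task/todos/0", "task/todos/1"]
```

Because the client only listens for store changes, the entry function can return right away and each TODO appears in the UI as soon as its own job finishes.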
Client
- Flutter
- Works on both iOS and Android.
- Firebase SDK
- Firebase Auth
- Cloud Firestore
- Cloud Functions
Backend
The following services were used through Firebase:
- Cloud Firestore
- Cloud Functions
- Cloud Tasks
Additionally, Vertex AI was used with the following framework:
- GenKit
- Some parts use Gemini from other libraries because GenKit had unsupported interfaces.
Selection Criteria
- I chose a technology stack that could cover the judging criteria as broadly as possible.
- Initially, I misunderstood the hackathon's premise and built on Cloud Run. Once I realized this, I ported the infrastructure to the Firebase family midway, because there is a wealth of information on Flutter x Firebase and an mBaaS is very easy to use for mobile app development.
Efforts and Challenges
- I chose GenKit because it makes verifying behavior easy. There are downsides, such as the interface for the Gemini 2.0 x Grounding combination not being ready yet. Therefore, I used the 1.5 series in many places. However, there were no major issues with accuracy, and the difference was just a slight variation in how the config is written.
- Specifically, up to 1.5 it's { googleSearchRetrieval }, and from 2.0 it's { googleSearch }.
- I didn't initially intend to go as far as using Cloud Tasks, but the number of TODOs became quite large, and without parallel processing it became extremely slow. Tax returns, for example, have many items. Also, Gemini often stops with a 429 Too Many Requests error, so Cloud Tasks, which retries whenever the handler throws an error, seemed like a good match, and I introduced it early.
- I thought applying for a limit increase might mitigate or solve the problem, but because the outcome was uncertain and I only started development in February, I leaned towards an architecture that could handle things reliably.
- I prepared prompts for each Task and TODO property as much as possible. In terms of behavior, this is a trade-off between AI response accuracy and speed. I feel that the simpler the prompt, the more the AI provides the response the developer wants. This time, instead of preparing a "super prompt" that returns various things at once, I developed it by introducing Cloud Tasks. As a result, I feel that fine-tuning the prompts was completed quickly.
- I wanted to submit an Android app, so building a mobile app was a given. Finally, to add a "mobile app-like" feel, I included location information (coordinates, maps, phone, email) and calendar registration features. Being able to use Geocoding easily is an especially big benefit of mobile apps.
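The config difference between the 1.5 and 2.0 series mentioned above could be isolated in one small helper. This is a sketch under assumptions: the helper name groundingToolFor is made up, and model names are assumed to look like "gemini-1.5-flash" or "gemini-2.0-flash".

```typescript
// The two grounding tool shapes; an empty object means "use default settings".
type GroundingTool =
  | { googleSearchRetrieval: Record<string, never> }
  | { googleSearch: Record<string, never> };

// Hypothetical helper: choose the tool key from the model series.
// Assumes model names of the form "gemini-<major>.<minor>-...".
function groundingToolFor(model: string): GroundingTool {
  const match = /^gemini-(\d+)\./.exec(model);
  if (match === null) {
    throw new Error(`Unrecognized model name: ${model}`);
  }
  const major = Number(match[1]);
  // Up to the 1.5 series the key is googleSearchRetrieval; from 2.0 it is googleSearch.
  return major >= 2 ? { googleSearch: {} } : { googleSearchRetrieval: {} };
}

const toolsFor15 = [groundingToolFor("gemini-1.5-flash")]; // [{ googleSearchRetrieval: {} }]
const toolsFor20 = [groundingToolFor("gemini-2.0-flash")]; // [{ googleSearch: {} }]
```

With the choice in one place, switching a prompt from the 1.5 series to 2.0 only needs the model name to change.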
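The retry behavior that made Cloud Tasks a good fit for 429 errors can be illustrated with a tiny simulation. This is not the Cloud Tasks API: runWithRetries is a hypothetical stand-in for the queue's retry policy, it runs synchronously, and it skips the waiting (backoff) a real queue would do between attempts.

```typescript
type Outcome =
  | { ok: true; attempts: number; value: string }
  | { ok: false; attempts: number };

// Hypothetical stand-in for a queue's retry policy:
// re-run the handler every time it throws, up to maxAttempts.
function runWithRetries(handler: () => string, maxAttempts: number): Outcome {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, attempts: attempt, value: handler() };
    } catch {
      // A thrown error marks this attempt as failed; the loop schedules the next one.
    }
  }
  return { ok: false, attempts: maxAttempts };
}

// A handler that is rate-limited twice before the model call succeeds.
let modelCalls = 0;
const flakyHandler = (): string => {
  modelCalls += 1;
  if (modelCalls <= 2) {
    throw new Error("429 Too Many Requests");
  }
  return "explanation text";
};

const outcome = runWithRetries(flakyHandler, 5);
// outcome is { ok: true, attempts: 3, value: "explanation text" }
```

The handler itself stays simple: it does not catch the rate-limit error, it just lets it propagate so the queue decides when to try again.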
Notes for running the app
Since I am making the repository public during the review, I thought some people might want to try running it locally, so here are a few notes. I won't go into detail about basic setup like environment variables; those will be covered in the repository's README.
- 429 Too Many Requests errors occasionally occur with Gemini. This happens more easily because Gemini is called in parallel via Cloud Tasks. While retries are configured to some extent, there is no mechanism to synchronize the client-side display with the progress or error status of the asynchronous Cloud Tasks processing. A simple "stop and try again if it takes too long" feature is provided, so if generation takes more than 5 minutes, please try again. The best solution is to ask Google to raise the Gemini limits.
- In Gemini's JSON Mode, when a string | null type is specified, the string "null" is often returned. This should be resolved as AI accuracy improves, but there are places where it might be displayed as is.
- Everything works on physical devices. On simulators or emulators, some features may not work or may behave inconsistently, specifically Geocoding, Location services, and Calendar integration.
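One way to guard against the "null" string issue is to normalize nullable fields right after parsing the model output. This is a minimal sketch, not what the app currently does: normalizeNullable and the Place shape are hypothetical names chosen for illustration.

```typescript
// Treat the literal string "null" (any casing, with surrounding whitespace) as a missing value.
function normalizeNullable(value: string | null | undefined): string | null {
  if (value === null || value === undefined) return null;
  return value.trim().toLowerCase() === "null" ? null : value;
}

// Hypothetical shape for a location returned in JSON Mode.
interface Place {
  name: string;
  phone: string | null;
  email: string | null;
}

// Example model output where one nullable field came back as the string "null".
const rawPlace = JSON.parse('{"name":"Tax office","phone":"null","email":null}') as Place;

const place: Place = {
  name: rawPlace.name,
  phone: normalizeNullable(rawPlace.phone),
  email: normalizeNullable(rawPlace.email),
};
// place.phone and place.email are both null, so the UI can hide them
```

The trade-off is that a field whose real value is the word "null" would also be dropped, which seems acceptable for phone numbers and email addresses.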
Summary
With TODOMaker, I can now take action in unknown territories with more agility. It's a truly wonderful app. Now I really have no choice but to do my tax return...
I have published the repository for the hackathon submission, so please give it a star if you like! ⭐️
That's it! \(^o^)/