Translated by AI
Capabilities and Limitations of ChatGPT's Agent Mode for Open Data Exploration
tl;dr
- Objective: Visualize factors behind the increase in Sika deer (declining hunters, warm winters/reduced snowfall, changes in capture pressure, etc.) using a dashboard to advance the discussion.
- Hypotheses for Verification: ① Declining and aging hunter population → reduced management capacity, ② Warm winters and reduced snowfall → increased survival rates, ③ Changes in the composition of capture pressure → impact on population numbers, (Next version) ④ Policy events and changes in land use contributing to fluctuations.
- Data Automatically Found and Adopted by AI: Estimated population (quantiles), number of captures (breakdown), and hunting license holders (by age) from the White Paper on the Environment; daily snow depth → annual aggregation for representative observation points (Nara/Niigata/Gifu); and climate indices by prefecture (2019) as supplementary data.
- Main Preprocessing: Converted old .XLS to CSV, converted Japanese calendar to Western calendar, normalized headings, converted from wide to long format, and aggregated annual snow data (snowy days/average/maximum/total). All output as UTF-8 CSV for immediate import into Tableau.
- What was achieved in Agent Mode: Navigated to Excel tables on official sites → conversion → shaping, batch retrieval of annual CSVs, lightweight ETL, and automatic file output.
- What Agent Mode is not good at: Automatic retrieval from login-protected/dynamic pages/old formats; comprehensive coverage of population by prefecture requires additional work.
Author's Thoughts
- If you summarize the context of what hypothesis you want to verify beforehand (using AI for this as well) and communicate it to the AI as a prompt, it will do a pretty good job just by being told "bring some good data." It was very helpful that it automatically found tools to convert even old .xls formats into CSV.
- Even if recent data exists, it sometimes fails to reach it and brings back data from an incomplete range of years. Therefore, a process of human verification and scrutiny is essential.
- If you ask the Agent, it will return data that is at a stage where it can be turned into a graph just by importing it into a BI tool. Conversely, that’s all it is, so rather than data worth deep exploration, it tends to be data that only leads to obvious conclusions. (Though that in itself is appreciated.)
- As of October 2025, if you're using AI for open data exploration, it's best to use an AI Agent in the phase of brainstorming a theme and figuring out what kind of data might be available.

Data for a relatively ordinary graph like this can be retrieved quickly (though several parts might be missing).
Background
Problem Statement: Why are wild deer and boars increasing so much in Japan? → But data preprocessing is tedious → Can ChatGPT do it?
Many of you have probably wanted to perform exploratory data analysis and visualize data (in Tableau) to verify hypotheses for questions like this that suddenly come to mind. However, the process of searching for, retrieving, and processing public data is extremely tedious. (This is a personal opinion.)
Therefore, I conducted an experiment to see how much of the data preprocessing can be left to AI and how much can be done using ChatGPT's Agent mode.
Tableau Umauma Kai
The catalyst was an event called Tableau Umauma Kai. The Tableau community is very active, and I participated in a session aimed at "becoming better at creating Vizzes (data visualizations) while eating delicious food." The theme this time was "Gibier (wild game) Yakiniku." It was incredibly delicious.

What is Gibier?
(This section is just a copy-paste of search results from Perplexity)
Gibier refers to meat obtained by hunting wild birds and animals for food, as well as dishes made using that meat. [1][2][5]
Etymology and Definition
- "Gibier" originates from French and is translated as "wild animal meat" in Japanese. [2][3]
- Unlike livestock (beef, pork, chicken, etc.), it refers to animals raised in the wild (deer, wild boar, duck, bear, rabbit, pheasant, etc.). [6][8][1]
Characteristics of Gibier
- Since it is obtained through hunting, there are unique differences in taste and meat quality depending on the animal's activity level and diet. [1]
- Japan has a long-standing tradition of eating gibier as part of its traditional food culture, and in recent years, its use has been promoted as a countermeasure against crop damage. [3][9]
- In Europe, particularly France, it developed as a luxury ingredient and aristocratic cuisine. [5][6]
Examples of Typical Gibier
- Deer (venison), wild boar, bear, rabbit, duck, pheasant, and pigeon are common. [7][8][1]
- Each has a different taste and preparation method; dishes like "Botan-nabe" (boar) and "Momiji-nabe" (deer) are well-known in Japan. [3][7]
Precautions
- Because wild animal meat carries hygiene risks (such as hepatitis E or parasites), thorough cooking is recommended. [2]
Thus, gibier is a food culture that celebrates the blessings of nature and is gaining increasing attention today from the perspectives of sustainability and regional revitalization. [6][3]
Following this delicious experience, the question of why wild deer and boar populations are increasing in the first place arose, leading to the experiment described at the beginning.
Experiment Overview
To create an MVP (Minimum Viable Product) dashboard in Tableau that demonstrates the factors contributing to the increase of Sika deer in Japan—such as the absence of natural enemies, the decline in hunters, warm winters and reduced snowfall, the impact of conservation policies, and changes in land use—in a visible format, I collected and shaped open data in a short amount of time. This article is a chronicle of that exploration and a summary of what I was able to achieve and where I struggled using ChatGPT in Agent mode (an environment capable of browser operations, file processing, and light Python data shaping).
Data exploration plan devised by AI
In order to construct a compelling story quickly, I collected the following as essentials for the MVP:
- Long-term trends in estimated population (Nationwide / Honshu and southward)
  - Extracted from the Excel tables in the Ministry of the Environment's "White Paper on the Environment" → converted to CSV.
  - Estimated population including quantiles (0.05–0.95) → tracking trends based on the median (Q0.50).
- Trends in capture numbers (Nationwide)
  - Extracted from Excel tables in the same White Paper → converted to CSV.
  - Breakdown of "hunting," "permitted capture," and "designated projects" + total → proxy variables for anthropogenic pressure.
- Number of hunting license holders (By age / Long-term)
  - Excel data of license holders by age from the White Paper → reshaped into long format, including annual totals.
  - Directly linked to visualizing aging and decline.
- Snowfall and temperature (Climate factors)
  - ① 2019 "Annual Average Temperature" and "Annual Number of Snowy Days" by prefecture (derived from ranking tables).
  - ② As a "fixed-point observation" of snowfall, I collected daily snow depth from archive records for observation points located in prefectures with high deer populations and aggregated them annually (e.g., representative stations in Nara, Niigata, and Gifu).
With these, the four core axes of "Population, Capture, Hunters, and Snowfall" were established, making it possible to construct a story with a minimal dashboard.
While the AI reports that it "collected" the data and that everything was "ready," in reality (likely due to token constraints) it could not bring all the data in one go, and some data was missing. Take the AI's confident tone with a grain of salt.
Record of Data Exploration
1) White Paper Excel tables → CSV conversion
- Objective: Handle estimated population, capture numbers, and hunters (by age) as annual time-series data.
- What (the AI) did:
  - Navigated from the White Paper PDF table pages to the Excel attachments.
  - Read `.xlsx` files directly; converted `.xls` files to CSV using LibreOffice (headless).
  - Performed normalization of Japanese headings, converted Japanese era names to Western years, handled multi-row headers, and converted data to long format.
- Results: Prepared clean CSVs ready for Tableau, allowing for immediate creation of annual line and area charts.
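A minimal pandas sketch of the shaping described above (era-label conversion, wide → long melt). The tiny inline table and column names are hypothetical stand-ins for the actual White Paper sheets, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for a White Paper sheet: one row per indicator,
# one column per Japanese-era year (wide format).
wide = pd.DataFrame({
    "indicator": ["estimated_population_q50"],
    "平成30年": [2440],
    "令和元年": [2490],
})

# Era start years minus one, so era year N maps to base + N.
ERA_BASE = {"明治": 1867, "大正": 1911, "昭和": 1925, "平成": 1988, "令和": 2018}

def to_western_year(label: str) -> int:
    """Convert an era-year label like '平成30年' or '令和元年' to a Western year."""
    era, rest = label[:2], label[2:].rstrip("年")
    n = 1 if rest == "元" else int(rest)  # 元年 means year 1 of the era
    return ERA_BASE[era] + n

# Wide → long: one (indicator, year, value) row per original cell.
tidy = wide.melt(id_vars="indicator", var_name="era_year", value_name="value")
tidy["year"] = tidy["era_year"].map(to_western_year)
print(tidy[["indicator", "year", "value"]])
```

Multi-row headers from the real sheets would additionally be read with `pd.read_excel(..., header=[0, 1])` and flattened before this step.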
2) "Fixed-point" snowfall data (1989–present, no need for nationwide coverage)
- Objective: Track the hypothesis of warm winters and reduced snowfall over the long term at the "same observation points."
- What was done:
  - Downloaded annual CSVs (station × year) from meteorological offices.
  - Obtained daily snow depth (mm) for Nara (Katsuragi), Niigata (Niigata), and Gifu (Takayama).
  - Aggregated number of snowy days / average, maximum, and total snow depth by year.
- Insights: Nara is a warm region with extremely few snowy days to begin with; heavy-snow regions like Niigata have more explanatory power for the "reduced snowfall → higher survival rate" effect. The annual fluctuations in Niigata were effective for the story.
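The annual aggregation above (snowy days plus mean/max/total snow depth per station and year) can be sketched with a pandas groupby. The stations and numbers below are illustrative, not the downloaded records.

```python
import pandas as pd

# Illustrative daily records; the real input is one CSV per station × year.
daily = pd.DataFrame({
    "station": ["Niigata"] * 4 + ["Nara"] * 4,
    "date": pd.to_datetime(
        ["1990-01-01", "1990-01-02", "1991-01-01", "1991-01-02"] * 2
    ),
    "snow_depth_mm": [300, 0, 150, 120, 0, 0, 10, 0],
})
daily["year"] = daily["date"].dt.year

annual = (
    daily.groupby(["station", "year"])["snow_depth_mm"]
    .agg(
        snowy_days=lambda s: (s > 0).sum(),  # days with any snow on the ground
        snow_depth_mean="mean",
        snow_depth_max="max",
        snow_depth_total="sum",
    )
    .reset_index()
)
print(annual)
```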
Obstacles Encountered (Overcome / Bypassed)
Here too, the AI is the protagonist: the AI (GPT-5's Agent mode) stumbled, thought, and overcame the challenges, while the human just leisurely drank tea.
- Loading old Excel (.xls) files
  → These could not be read with standard Python, so the issue was resolved by converting them to CSV using LibreOffice.
- Multi-row headers, Japanese era names, and full-width spaces
  → Unified through column name normalization, Japanese-to-Western calendar mapping, and conversion to long format.
- Site cross-domain restrictions and login requirements
  → Some statistical portals have restrictions on automatic downloads. We reached primary Excel/CSV files directly where possible and switched to alternative open data or fixed-point observations at representative stations for difficult cases.
- Prioritizing "fixed-point observation" over "prefectural averages" for climate data
  → Since prefectural averages have limits in terms of coverage of years and indicators, switching to "station × year" data, which is long-term and highly reproducible, proved successful.
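The LibreOffice fix above amounts to one headless shell invocation per file. Here is a sketch that only builds the command (the real run would execute it via `subprocess.run`); the paths are hypothetical.

```python
from pathlib import Path

def libreoffice_csv_cmd(xls_path: Path, out_dir: Path) -> list[str]:
    """Build the headless LibreOffice command that converts one old .xls to CSV."""
    return [
        "libreoffice", "--headless",
        "--convert-to", "csv",
        "--outdir", str(out_dir),
        str(xls_path),
    ]

# Hypothetical paths, mirroring the /raw → /out layout used later in the article.
cmd = libreoffice_csv_cmd(Path("raw/env_whitepaper/table1.xls"), Path("out/csv"))
print(" ".join(cmd))
# Executing would be: subprocess.run(cmd, check=True)
```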
Preprocessing Recipe (What was done this time)
- Column name cleaning (Japanese → English, removal of spaces, line breaks, and full-width characters)
- Applied conversion dictionary for Japanese calendar → Western calendar
- Pivoting (Wide → Long format)
- Forced conversion from string → numeric; used `NaN`/0 for missing values depending on the context
- Annual aggregation (snowy days = count of days where `snow_depth_mm > 0`; average/maximum/total snow depth)
- Export: UTF-8 CSV (interoperable with Tableau, Sheets, and Python)
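The first and fourth recipe items (column-name cleaning and forced numeric conversion) might look like this in pandas; the rename dictionary and sample values are made-up illustrations, not the actual White Paper columns.

```python
import pandas as pd

# Hypothetical Japanese → English column map.
RENAME = {"捕獲数": "captures", "年度": "year"}

def clean_column(name: str) -> str:
    """Strip line breaks plus full-width/ASCII spaces, then map Japanese to English."""
    name = name.replace("\n", "").replace("\u3000", "").replace(" ", "")
    return RENAME.get(name, name)

# Messy headers (embedded newline, full-width space) and messy values
# (thousands separator, full-width dash for "no data").
df = pd.DataFrame({"年度\n": ["2019", "2020"], "捕獲　数": ["1,234", "－"]})
df.columns = [clean_column(c) for c in df.columns]

# Forced string → numeric: drop thousands separators, coerce failures to NaN.
df["captures"] = pd.to_numeric(df["captures"].str.replace(",", ""), errors="coerce")
df["year"] = pd.to_numeric(df["year"])
print(df)
```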
Minimal Storyboard in Tableau
The term "minimal" is used because we were discussing the MVP (Minimum Viable Product) in the preceding context.
- Estimated Population (Q0.50) Trend: Line chart of national time series
- Capture Count Stack: Stacked bar chart of hunting, permitted, and designated projects
- Aging Hunters: Area chart or population pyramid by age group
- Fixed-Point Snowfall Comparison: Annual snowy days in heavy-snow regions (Niigata) vs. warm regions (Nara)
  → Highlight the overlap between "warm winter / low-snowfall years" and periods of population increase
- (Optional) Annotation Layer: Note policy shifts (e.g., lifting of the ban on hunting female deer) using vertical lines
ChatGPT in Agent Mode: What it could do
- Data exploration and reaching sources from official websites
  - Navigated to embedded Excel files in white paper charts.
  - Implemented creative ways to batch-retrieve annual observation CSVs (station × year).
- File conversion, shaping, and merging
  - `.xls` → CSV conversion, reading `.xlsx`, and fixing headers containing Japanese characters.
  - Lightweight ETL such as converting to long format and annual aggregation.
- Optimization for "ready-to-use" output
  - UTF-8 CSV format compatible with Tableau, Sheets, and Python.
  - Making column names and data types BI-friendly.
It is extremely well-suited for getting data into a "usable form" in a short amount of time.
ChatGPT in Agent Mode: Weak Points / Cautions
- Automatic retrieval from sites requiring login/API keys or dynamically generated pages
  - Login requirements, cross-domain restrictions, and cookie control on statistical portals are barriers.
  - Workaround: Find bypass routes to primary distribution Excel/CSV files, or if absolutely impossible, limit to representative points or switch to another primary source.
- Old Excel formats (.xls) and Japanese-specific notation inconsistencies
  - Often gets stuck in environments where additional libraries cannot be used.
  - Workaround: Persistently shape the data using LibreOffice headless conversion, or row skipping + regular expressions.
- Guaranteeing perfect comprehensiveness in a short time
  - Example: Immediate retrieval of a unified format for Sika deer populations by prefecture is difficult.
  - Workaround: First create a compelling MVP using national trends + capture + hunters + fixed-point snowfall, and then expand to regional data in subsequent steps.
- Differences in units and definitions
  - Snowfall has many indicators such as "snowfall," "snow depth," and "snowy days," and units (cm/mm/days) are also mixed.
  - Workaround: Clearly state definitions and embed units in column names (e.g., `snow_depth_mm`).
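One cheap way to enforce the "embed units in column names" workaround is a guard run just before export. The allowed suffixes and key columns below are assumptions for illustration, not a fixed standard.

```python
# Unit suffixes this project's columns are assumed to use.
UNIT_SUFFIXES = ("_mm", "_cm", "_days", "_count", "_year")

def check_units(columns, keys=("station", "year", "prefecture")):
    """Return measurement columns whose names do not declare a unit."""
    return [c for c in columns if c not in keys and not c.endswith(UNIT_SUFFIXES)]

# 'snow_depth' lacks a unit suffix and gets flagged; 'snow_depth_mm' passes.
assert check_units(["station", "year", "snow_depth_mm", "snow_depth"]) == ["snow_depth"]
```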
Reproducible Workflow (Easier if set as an internal standard)
- Data Acquisition
  - White Paper chart Excel → `/raw/env_whitepaper/*.xlsx|.xls`
  - Station × year CSV → `/raw/snow/{station}/{year}.csv.gz`
- Preprocessing (Lightweight ETL)
  - Format conversion (`.xls` → CSV), column name normalization, Japanese calendar → Western calendar, numeric conversion.
  - Wide → long format, alignment by keys (Year, Prefecture, Station).
- Feature Engineering
  - Number of snowy days, average/maximum/total snow depth.
  - Ratio of hunter age groups and median age (optional).
- Verification View
  - Line charts (population, captures), stacked bars (capture breakdown), population pyramids (hunters), snowfall comparison.
- Export
  - Import `/out/csv/*.csv` into Tableau and bind them to the workbook template.
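Given the `/raw/snow/{station}/{year}.csv.gz` layout above, a one-pass loader could look like the sketch below. It writes a tiny fake layout into a temporary directory so the example stays self-contained; the station names and values are invented.

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd

def load_snow(raw_dir: Path) -> pd.DataFrame:
    """Concatenate every {station}/{year}.csv.gz under raw_dir into one long table."""
    frames = []
    for path in sorted(raw_dir.glob("*/*.csv.gz")):
        df = pd.read_csv(path)  # pandas decompresses .gz transparently by extension
        df["station"] = path.parent.name           # directory name = station
        df["year"] = int(path.stem.removesuffix(".csv"))  # file name = year
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# Build a tiny fake /raw/snow layout to demonstrate.
tmp = Path(tempfile.mkdtemp())
for station, year, depth in [("niigata", 1990, 300), ("nara", 1990, 0)]:
    d = tmp / station
    d.mkdir(exist_ok=True)
    with gzip.open(d / f"{year}.csv.gz", "wt") as f:
        f.write(f"date,snow_depth_mm\n{year}-01-01,{depth}\n")

snow = load_snow(tmp)
print(snow)
```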
Even Better Practices (Practical Tips)
- Gradually increase snowfall data for "representative points × multiple prefectures" (Hokkaido, Aomori, Akita, Niigata, Nagano, Gifu, etc.).
- Prepare a list of policy event years (years when the female deer hunting ban was lifted, years when capture projects were expanded, etc.) → Present causal hypotheses using vertical line annotations.
- Add just 1–2 land-use proxies (abandoned farmland, forest age composition, broad-leaved tree ratio, etc.).
- Create a data dictionary first (column names, units, definitions, source URLs, update frequency).
Summary
- ChatGPT in Agent mode is excellent at "reaching primary data, shaping it in the shortest time, and loading it into BI tools."
- On the other hand, it often gets stuck on login-protected APIs, highly dynamic sites, and old formats. A practical solution is to bypass these by mixing in workaround routes (alternative sources, representative points, format conversion, manual steps).
- The workflow in this article (national trends + capture + hunters + fixed-point snowfall) alone serves as a strong starting point for demonstrating the "warm winter/reduced snowfall × population dynamics" relationship. By expanding this with regional details and layering policy events, it can grow into a dashboard capable of supporting decision-making.
Appendix 1: Hypotheses and the Data Searched/Acquired
| Hypothesis | Objective of Verification | Data Searched/Acquired (Source) | Status/Notes |
|---|---|---|---|
| Declining and aging hunter population is weakening population management capacity | Understand long-term trends in the number of hunters and changes in age composition | Number of hunting license holders (by age, nationwide) (Excel tables from the White Paper on the Environment) | Acquired and shaped (converted to long format, created annual totals) |
| Climate change (warm winters, reduced snowfall) is pushing up overwintering rates | Check if reduced snow is expanding the "survival environment" | Daily snow depth at observation points (from 1989): Nara (Katsuragi), Niigata (Niigata), Gifu (Takayama) / Annual aggregation (snowy days, average, maximum, total) | Acquired and aggregated annually (Nara = rare snow, Niigata = high annual fluctuation with explanatory power) |
| Population has increased since the 1990s | Visualize long-term growth trends | Estimated population (Honshu and southward, quantiles 0.05–0.95) (White Paper Excel) | Acquired and converted to CSV (Median Q0.50 as main index) |
| Changes in capture pressure (hunting, permitted, designated) affect population | Understand changes in capture composition and total volume | Number of Sika deer captures (hunting / permitted / designated / total) (White Paper Excel) | Acquired and converted to CSV (Visualized as annual stacks) |
| (Supplementary) Understand wide-area trends of warm winters at the prefectural scale | Check consistency between station data and prefectural scale | By prefecture: Annual average temperature, annual number of snowy days (2019) (Derived from public ranking tables) | Acquired (Adopted 2019 data for comparison as indicators were available) |
| (Next version) Impact of laws and policy changes on increases/decreases | Overlay event years and dynamics using vertical annotations | Years when female deer hunting ban was lifted, years when damage countermeasure projects were expanded, etc. (Chronology) | Under collection (Candidates listed for MVP) |
| (Next version) Changes in habitat environment (abandoned farmland, forest composition) | Verify hypothesis of expanding food resources and hiding environments | Trends in abandoned farmland area (National/Prefectural), Forest resource statistics, etc. | Subsequent acquisition (Currently listed as candidates) |
| (Next version) Population and captures by prefecture | Increase explanatory power of regional differences | Number of captures/slaughtered by prefecture, population estimation by prefecture | Some access restrictions → Gradual expansion in next version |
Note: Natural enemies (extinction of wolves) are positioned as qualitative background facts due to historical reality; quantitative data is unavailable (lack of comparative targets in time and space).
Appendix 2: Main Tools/Functions used in Agent Mode (Based on this actual work)
Data Acquisition
- Web Browser Operations: Navigated to and downloaded Excel attachments from the White Paper on the Environment table pages.
- Batch Retrieval of Annual CSVs (Station × Year): Acquired annual archives of snowfall depth on a yearly basis (Nara, Niigata, Gifu, etc.).
Conversion and Preprocessing
- LibreOffice (headless): Batch converted old `.xls` → CSV (read `.xlsx` directly).
- Python (pandas / csv / gzip):
  - Column name normalization (removal of Japanese, line breaks, and full-width spaces).
  - Japanese calendar → Western calendar conversion.
  - Wide → long format shaping, numeric conversion (handling missing values as 0/NaN based on context).
  - Daily snow depth → annual aggregation (snowy days, average, maximum, total).
- Minimization of shell work: Organized downloading and file placement (`/raw` → `/out`).
Verification and Visualization Prep
- Exported as UTF-8 CSV (interoperable with Tableau, Google Sheets, and Python).
- Tableau-ready schema (Date/Year keys, long format, explicit units like `snow_depth_mm`).
Constraints and Workarounds
- Portals requiring login/cookies or sites with strong cross-domain restrictions → Reached primary Excel/CSV files directly, or pivoted to representative observation stations.
- Old Excel (.xls) or multi-row headings → Resolved with LibreOffice conversion + normalization.
- Prefectural average climate indicators may have missing years/items → Focused on long-term trends of fixed-point observations.
Discussion