Speculating on the Tech Stack Behind Takahiro Anno's Broad Listening Based on His Video Statements
Introduction: Source Video
This article is based on the following video, in which Takahiro Anno explains the mechanism for "compressing high-dimensional vectors into two dimensions."
(The relevant explanation starts around 17:20)
Background
During the 2024 Tokyo Gubernatorial Election, Takahiro Anno used a method combining text vectorization and dimensionality reduction to analyze and visualize over 10,000 comments left on a YouTube video (his dialogue with candidate Shinji Ishimaru). This serves as the technical foundation for "Broad Listening" (a method of aggregating vast amounts of public opinion via AI to update democracy).
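Overview of the Technology Stack
The pipeline consists of three phases: text vectorization (embedding), dimensionality reduction, and visualization.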
Details of Each Phase
Phase 1: Text Vectorization (Embedding)
When text from each comment is sent to the API, a 1536-dimensional vector quantifying its meaning is returned. A key feature is that semantically similar sentences (e.g., "I support you" and "Good luck") end up positioned closely in the vector space.
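As a rough illustration of this phase, the following sketch sends comments to the embeddings endpoint in batches and compares two resulting vectors by cosine similarity. It assumes the official `openai` Python package (v1-style client); the helper names `embed_comments` and `cosine_similarity`, the batch size, and the default model are assumptions made for this example and are not confirmed by the video.

```python
import math
from typing import List, Sequence


def embed_comments(client, texts: Sequence[str],
                   model: str = "text-embedding-ada-002",
                   batch_size: int = 100) -> List[List[float]]:
    """Return one embedding vector per comment, in input order.

    `client` is assumed to be an `openai.OpenAI()` instance. The model name
    and batch size are illustrative assumptions.
    """
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        response = client.embeddings.create(model=model, input=batch)
        # Each returned item carries the index of its input within the batch.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(list(item.embedding))
    return vectors


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Usage (needs the `openai` package and OPENAI_API_KEY; not executed here):
#   from openai import OpenAI
#   vecs = embed_comments(OpenAI(), ["I support you", "Good luck"])
#   print(cosine_similarity(vecs[0], vecs[1]))
```

Semantically close comments should yield a higher cosine similarity than unrelated ones, which is the property the later phases rely on.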
Speculation regarding the model used
Mr. Anno mentioned the following in the video:
- "I was using an OpenAI Embedding model."
- "Converting to about 1500-dimensional vectors."
Both points are consistent with text-embedding-ada-002 (output: 1536 dimensions), which was for a long time the most widely used OpenAI embedding model. Note, however, that text-embedding-3-small, released in January 2024, also outputs 1536 dimensions by default, so the "about 1500 dimensions" comment alone cannot distinguish between the two.
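Note: If you were to attempt this now, the successor models text-embedding-3-small (output: 1536 dimensions) or text-embedding-3-large (output: 3072 dimensions) are recommended. Both offer improved performance, and text-embedding-3-small is also priced lower than text-embedding-ada-002. Either can be swapped into the same pipeline, provided all comments are re-embedded with the new model, since vectors produced by different models are not comparable.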
Phase 2: Dimensionality Reduction
Since a 1536-dimensional space cannot be visualized or interpreted directly by humans, dimensionality reduction is performed to project the data onto a 2D plane.
The most important aspect of this compression is preserving "semantic proximity" as "distance proximity." Similar opinions are placed near each other, while different opinions are placed further apart, resulting in clusters of opinions being visualized like a map.
UMAP vs. t-SNE: Which was used?
Although Mr. Anno did not explicitly mention the algorithm name, the standard choices for this use case are the following two:
| Item | UMAP | t-SNE |
|---|---|---|
| Processing Speed | Fast (scales well to large datasets) | Relatively slow |
| Global Structure Retention | Relatively good (tends to keep the overall layout of clusters) | Weak (focuses on local structure) |
| Parameter Tuning | Relatively easy | Requires delicate adjustment |
| Suitability for ~10,000 items | Very good | Good |
Given that more than 10,000 data points were handled and the goal was to grasp the big picture, UMAP is the more likely choice, although this remains an inference. It is also consistent with Mr. Anno's visual description of "continuously drawing maps."
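A minimal sketch of this phase, assuming the `umap-learn` package, might look like the following. The function name `reduce_to_2d`, the parameter values, and the choice of cosine distance are assumptions for illustration; the video does not state which algorithm or settings were used. If `umap-learn` is not installed, the sketch falls back to scikit-learn's t-SNE, the other candidate discussed above.

```python
import numpy as np


def reduce_to_2d(vectors, n_neighbors=15, min_dist=0.1, seed=42):
    """Project (n_samples, n_dims) embeddings onto 2D coordinates."""
    X = np.asarray(vectors, dtype=np.float32)
    try:
        import umap  # installed as the `umap-learn` package
    except ImportError:
        # Fallback: scikit-learn's t-SNE (perplexity must stay below n_samples).
        from sklearn.manifold import TSNE
        reducer = TSNE(n_components=2, metric="cosine", init="random",
                       perplexity=min(30, len(X) - 1), random_state=seed)
    else:
        reducer = umap.UMAP(n_components=2, n_neighbors=n_neighbors,
                            min_dist=min_dist, metric="cosine",
                            random_state=seed)
    # Returns an array of shape (n_samples, 2): one (x, y) pair per comment.
    return reducer.fit_transform(X)
```

Fixing `random_state` makes the layout reproducible between runs, which helps when the same map is regenerated as new comments arrive.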
Phase 3: Visualization
By plotting the 2D coordinates obtained through dimensionality reduction as a scatter plot, the big picture of public opinion is displayed like a "map."
This makes it possible to see at a glance that, for example, "there are many supportive opinions around here" or "criticism of policy is concentrated here," without having to read all 10,000 comments individually.
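The plotting step could be sketched with matplotlib as below. The function name `plot_opinion_map`, the output file name, and the styling choices are hypothetical; the source does not say which plotting library was used.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: render to a file, no display needed
import matplotlib.pyplot as plt


def plot_opinion_map(coords, path="opinion_map.png"):
    """Save a scatter plot of 2D comment coordinates and return the file path."""
    xs = [point[0] for point in coords]
    ys = [point[1] for point in coords]
    fig, ax = plt.subplots(figsize=(8, 8))
    # Small, translucent dots make dense regions stand out.
    ax.scatter(xs, ys, s=5, alpha=0.5)
    ax.set_title("Map of comments (one dot per comment)")
    # Axis values of such a projection carry no direct meaning, so hide them.
    ax.set_xticks([])
    ax.set_yticks([])
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return path
```

For interactive exploration (hovering over a dot to read the comment), a library such as plotly would be a natural substitute for the static image.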
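Summary of the Entire Processing Flow
YouTube comments (10,000+) → embedding via the OpenAI API (about 1,500 dimensions per comment) → dimensionality reduction to 2D (UMAP or t-SNE) → scatter plot as a "map" of opinions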
Additional Practical Notes
Required Environment/Tools
| Usage | Example Tools |
|---|---|
| Comment Collection | YouTube Data API v3 |
| Vectorization | OpenAI Embedding API (text-embedding-ada-002 or successor text-embedding-3-small/large) |
| Dimensionality Reduction | Python umap-learn library / scikit-learn t-SNE |
| Visualization | matplotlib, plotly, streamlit, etc. |
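For the comment collection row, one possible approach is to page through the `commentThreads.list` endpoint of the YouTube Data API v3. The sketch below uses only the standard library; the helper names `fetch_page` and `collect_comments` are hypothetical, and whether Mr. Anno collected comments this way is not stated in the video.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def fetch_page(params):
    """Call commentThreads.list once and return the decoded JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def collect_comments(video_id, api_key, fetch=fetch_page, max_comments=10000):
    """Page through the top-level comments of one video and return their text."""
    comments = []
    params = {"part": "snippet", "videoId": video_id, "maxResults": 100,
              "textFormat": "plainText", "key": api_key}
    while len(comments) < max_comments:
        page = fetch(params)
        for item in page.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comments.append(snippet["textDisplay"])
        token = page.get("nextPageToken")
        if not token:
            break  # no further pages
        params = {**params, "pageToken": token}
    return comments[:max_comments]
```

Only top-level comments are gathered here; replies are ignored to keep the sketch short. The `fetch` argument exists so the paging logic can be exercised without network access.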
Cost Estimation (Reference)
- OpenAI Embedding API: Billed per input token. For about 10,000 short comments, the total is roughly tens of yen, and at most a few hundred yen for long comments or a larger model.
- Dimensionality Reduction/Visualization: Executable on a local PC (GPU not required).
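The embedding cost can be estimated with simple arithmetic. In the sketch below, every input value is an assumption to be replaced with real figures: the average token count per comment, the per-token price (check OpenAI's current pricing page), and the exchange rate.

```python
def estimate_embedding_cost_jpy(n_comments, avg_tokens_per_comment,
                                usd_per_million_tokens, jpy_per_usd):
    """Estimate the total embedding cost in yen from per-token pricing."""
    total_tokens = n_comments * avg_tokens_per_comment
    cost_usd = total_tokens / 1_000_000 * usd_per_million_tokens
    return cost_usd * jpy_per_usd


# Assumed inputs: 10,000 comments, 100 tokens each, 0.10 USD per million
# tokens, and 150 JPY per USD.
cost = estimate_embedding_cost_jpy(10_000, 100, 0.10, 150)
print(f"Estimated cost: about {cost:.0f} yen")
```

Under these assumed inputs the estimate comes to about 15 yen, which falls within the range given above.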
Conclusion
What Mr. Anno achieved with this mechanism was not just data analysis, but a new approach to democracy: "structurally understanding vast amounts of public opinion." While the technology stack (embedding, dimensionality reduction, and visualization) is well established, the significance lies in its practical application within the context of an election.
Technically, this pipeline is relatively easy to replicate, so it should be applicable to various scenarios, not limited to YouTube comments, such as free-form survey responses or social media post analysis.