Humans Have Become the Rate-Limiting Step in Research and Verification
💡 Written in collaboration with Generative AI.
Introduction
I stopped opening Jupyter Notebooks.
This is a change I noticed recently. Analysis used to mean Notebooks: running cells one by one, checking the results, repeating the trial and error. That was the norm.
Now, if I tell the AI, "validate this hypothesis with this data," a script is generated, executed, and the results are summarized. The time spent clicking through cells in a Notebook is gone.
The iteration for research, validation, and analysis has become lightning-fast. And yet, the overall speed of research hasn't increased as much as I expected.
That's because the rate-limiting step has shifted from "experimentation" to "humans."
Which Parts of Research Got Faster?
I'll describe this following the actual research flow.
1. Investigation
When I say "I want to do something like this," it provides ideas.
When I say "I need some papers to make this happen," it researches and summarizes them.
If I upload a paper PDF, it gets summarized. Organizing related research has gone from 2 hours to 10 minutes (even here, humans are the bottleneck because it takes time to understand it afterward).
2. Prototyping Code
Saying "Implement this algorithm in Python" produces working code. If an error occurs, pasting the error message fixes it. Of course, it creates tests together with the code.
3. Analysis and Evaluation
Saying "Perform a significance test on this data" or "Run a regression analysis and interpret the coefficients" produces the results.
Preprocessing is also finished with a single prompt: "Handle missing values appropriately," or "Detect outliers."
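In practice, "perform a significance test" usually comes back as a short generated script. As a rough illustration of what such a script might contain (the numbers below are invented, and a real analysis would more likely reach for scipy.stats), here is a stdlib-only permutation test on the difference in means:

```python
import random

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Approximate two-sided p-value for the difference in means.

    Repeatedly relabel the pooled samples at random and count how often
    the shuffled mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_iter

# Toy data (made up for illustration): group b is clearly shifted upward.
a = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1]
b = [6.0, 6.2, 5.9, 6.1, 6.3, 5.8]
p = permutation_test(a, b)
```

The point is not this particular test; it's that the AI can write, run, and interpret something like it in a single round trip.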
Evaluating LLM output can also be automated using LLM-as-a-judge (a method where an LLM evaluates another LLM). Previously, I used MLflow for experiment management, but now AI even builds the evaluation pipelines for me.
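For readers who haven't seen LLM-as-a-judge, the core of the pattern is small: wrap the output to be graded in a rubric prompt, send it to a judge model, and parse out a score. A minimal sketch follows; `call_llm`, the rubric wording, and the 1-to-5 scale are all illustrative assumptions, and the judge is stubbed out rather than a real model call:

```python
def judge_answer(question, answer, call_llm):
    """LLM-as-a-judge sketch: one model grades another model's answer.

    `call_llm` stands in for whatever client you use (an API wrapper,
    a local model, ...); it takes a prompt string and returns a reply string.
    """
    prompt = (
        "You are a strict evaluator. Score the answer from 1 (poor) to 5 "
        "(excellent) for factual accuracy and relevance to the question.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with the score only."
    )
    reply = call_llm(prompt)
    digits = [c for c in reply if c.isdigit()]
    # Fall back to the lowest score if the judge's reply can't be parsed.
    return int(digits[0]) if digits else 1

# Stubbed judge for illustration; a real pipeline would call an actual LLM.
fake_llm = lambda prompt: "4"
score = judge_answer("What is the capital of France?", "Paris.", fake_llm)
```

A real pipeline would also log the judge's reasoning and spot-check scores against human labels, which is exactly the "does it match human evaluations?" question that comes up in meetings.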
4. Visualization
Saying "Turn this result into a graph of publication quality" produces it. Adjusting colors, fonts, and legends is just a matter of saying "make it a bit simpler." The time spent reading matplotlib documentation is gone.
Beyond just analysis, it easily creates UIs and mocks with Streamlit. It has become easier to present.
5. Discussion
Asking "Consider what can be said from these results from three perspectives" produces an initial draft.
It can be used as a sounding board to compare against one's own ideas.
6. Documentation
By throwing in the analysis results and saying "Summarize this in the format of a report," the draft is complete.
The time spent formatting has vanished.
The time spent clicking through cells in a Notebook has vanished.
The "Hypothesis → Implementation → Validation" loop can now cycle many times a day. Previously, once or twice a day was the limit.
The time spent on manual tasks has approached zero.
What is Becoming the Rate-Limiting Factor?
1. Time for "Explaining while Showing"
Creating a mock with Streamlit takes an instant. A dashboard for a demo can also be completed in 30 minutes.
But it takes a week to set up an occasion where I can say, "Please take a look at this."
Coordinating the schedules of stakeholders, booking a meeting room, and at last the demo can happen. Preparing it takes 5 minutes; arranging the showing takes 5 days.
AI can build the UI. But the time to look at it can only come from humans.
2. Domain Expert Time
"How should we interpret this result from a business perspective?"
Without asking someone who knows the field, the meaning of the analysis remains unclear. But that person is busy. It takes a week to schedule a 30-minute meeting.
AI can output p-values. But only humans can answer the question, "So what?"
3. Explanation Cost of Using AI
I present an analysis that uses LLMs, evaluated with LLM-as-a-judge. The results look good.
"So, can we trust this result?"
"Is an LLM evaluating an LLM? Is that okay?"
"Does it match human evaluations?"
"What about hallucinations?"
I traced it with Langfuse and visualized the prompts. Still, a meeting is needed to explain all that.
AI can give answers. But it is humans who explain "whether we can trust that answer." The newer the technology, the higher the explanation cost.
4. Consensus Building on Results
Analysis results are out. Reports are made.
"Let's set up a meeting to discuss how to interpret these results." Aligning the schedules of six participants pushes it two weeks out.
In the meantime, the next action comes to a halt.
5. My Own Understanding Can't Keep Up
AI summarized ten papers for me. The related-work survey is done.
But I still need time to understand it all.
What has "sped up" is the task, not the understanding. Human cognitive speed has not changed.
6. Waiting for Reviews and Approvals
I design an analysis approach and ask, "Is it okay to proceed in this direction?"
Then I wait for the review, coordinating the stakeholders' schedules. Since review meetings are held once a week, it's a one-week wait at the very least.
AI can create a design document in 5 minutes. Approval takes 5 days.
Why Does This Happen?
The bottleneck of research has changed.
Traditional: analysis and implementation take time → waiting time isn't a concern
Current: analysis and implementation are instantaneous → waiting time is almost everything
Previously, a Slack reply would arrive while I was "writing analysis code." Now, ten analyses are finished while "waiting for a reply."
The time spent running cells in a Notebook was actually a "buffer for waiting time." Now that it's gone, the wait is fully exposed.
It's not that the waiting time has become more noticeable. It's that there is nothing left but waiting time.
Why It's Especially Tough for Researchers and Data Scientists
I think engineers are in the same boat. However, researchers and data scientists face additional hardships.
High Uncertainty
Engineers often have a clear idea of "what to build." In research, you don't know "what you will discover."
Because of this, it's difficult to take the "just move forward and pivot if it fails" approach. You want to reach a consensus. However, the time it takes to reach that consensus becomes the bottleneck.
Being a "Connector"
Domain knowledge is held by those on the front lines. Decision-making authority lies with management. You are the one who must explain the technology.
Data scientists are "connectors." The more parties you have to connect, the more waiting time increases. There are simply more stakeholders involved than in software engineering.
Tendency to Use New Technologies
LLM-as-a-judge, RAG, agents. Trying out new technologies is part of the job.
But the newer the tech, the more often you're asked, "Is that reliable?" The time saved on the work itself is now consumed by explanations.
Ideas for Solutions (Still Exploring)
To be honest, I haven't found the answer yet.
The "Proceed Before Asking" Style
Instead of "Is this okay?", try "I'm moving forward with this. Please stop me if there are any issues."
Shifting from prior approval to post-reporting. It increases risk, but it also increases speed.
Enriching Asynchronous Documentation
Instead of explaining in meetings, share documents in advance. Creating documents is fast if you use AI.
"Please read this in advance. If you have any questions, we can discuss them in the meeting."
Recording and sharing Streamlit demos can also reduce the need for synchronous sessions.
Splitting Consensus Building
Instead of "deciding everything before proceeding," try "agreeing only on the direction first, then figuring out the details on the go."
Accept uncertainty. Move forward on the premise that you can backtrack if you make a mistake.
Self-Checking with AI Before Human Review
Analysis approaches, statistical methods, the validity of interpretations: run them past AI as a sounding board before showing them to humans, and eliminate the obvious flaws first. This reduces the review load on humans.
Building a "Track Record" of AI Utilization
The strongest response to "Can we trust AI results?" is "It was right last time too."
Build up small successes to lower explanation costs. Trust takes time.
Is This a "Problem" That Needs to Be Solved?
Let's pause for a moment.
Is it truly a bad thing for humans to be the rate-limiting factor?
- Confirmation by domain experts prevents off-target analysis.
- Consensus building reduces the risk of results not being utilized.
- Skepticism toward AI results prevents hallucinations.
There are values protected by being "slow."
If AI cycles everything at high speed, we will only mass-produce output that is quick to make but that no one reads or looks at.
Even if we graduate from Notebooks, we cannot graduate from being human.
Things I'd Like to Ask
If there are people in the same situation, I'd love to hear from you.
- How do you deal with this bottleneck?
- Do you have any tips for reducing the "wait"?
- Conversely, are there points where you think "this must involve a human"?
I would appreciate it if you could let me know in the comments.