Translated by AI
Classifying Machine Learning Roles: My Personal Observations
This article is the 6th day of the SmartHR Advent Calendar 2025 Series 2🎄
Introduction
I joined SmartHR this October as the third ML Engineer. While SmartHR itself has been around for over 10 years, it seems that the company has only recently begun to focus heavily on "AI feature development." I suspect this is because AI development—which targets long-term returns through massive investments of cost and time—didn't quite match the sense of speed typical of young SaaS companies. Perhaps because of this, people who have been at SmartHR for a long time sometimes ask me questions like, "What exactly does an ML Engineer do? Is it different from a Data Scientist?"
So, based purely on what I have personally observed, I have classified the various roles in the machine learning field 💡

1. Data Scientist
This job title is extremely diverse, and even with the same name, the orientations can be completely different. Therefore, I've divided them into three major categories this time.
1-1. Research Data Scientist
[Main Active Venues]
Academic conferences (NLP, JSAI, etc.), university labs, R&D departments of large corporations
[Main Orientations]
Instead of just using existing models, they are passionate about designing new architectures and aiming for SoTA (State-of-the-Art: the highest performance at the present time).
Their greatest strength lies in "not being swayed by superficial numbers." They can rigorously evaluate not just leaderboard scores but also why a model works and whether it truly generalizes, applying logical and mathematical reasoning. In discussions, they seek the truth rather than rushing to easy conclusions. This attitude helps mitigate project failure risks and increases the probability of success.
Their reward system often lies in "theoretical novelty" or "correctness." As a result, their rhythm may occasionally clash with approaches like "let's just get it working to increase sales." The tendency for discussions to become academic rather than starting with the conclusion is also a characteristic born of their integrity.

1-2. Analytical Data Scientist
[Main Active Venues]
Marketing departments, corporate planning offices
[Main Orientations]
They do not necessarily insist on building "machine learning models." Their specialties are aggregation using SQL and Pandas, and statistical approaches including causal inference.
The most prominent characteristics of individuals in this role are strong EDA (Exploratory Data Analysis) skills and business sensitivity. They excel at translating "what the data says" into terms the management team can understand and at guiding decision-making.
Since their goal is often "decision-making support through reports," they tend not to dive too deep into the "engineering" realm of system automation and persistence. Some specialize in quickly providing insights rather than implementing code into the product.
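To make "aggregation using SQL" concrete, here is a minimal sketch of the kind of query that often sits behind a decision-support report. The table, column names, and figures are all hypothetical and exist only for illustration; it uses Python's built-in sqlite3 so it runs without any setup.

```python
import sqlite3


def monthly_signup_summary(rows):
    """Sum signups per month from (month, plan, n_signups) rows."""
    # Hypothetical in-memory table; a real analysis would query a warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signups (month TEXT, plan TEXT, n_signups INTEGER)"
    )
    conn.executemany("INSERT INTO signups VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT month, SUM(n_signups) AS total "
        "FROM signups GROUP BY month ORDER BY month"
    ).fetchall()
    conn.close()
    return result


# Made-up sample data, not real figures.
sample_rows = [
    ("2025-10", "free", 120),
    ("2025-10", "paid", 30),
    ("2025-11", "free", 150),
    ("2025-11", "paid", 45),
]
summary = monthly_signup_summary(sample_rows)
for month, total in summary:
    print(month, total)
```

The query itself is trivial; the value this role adds is in deciding which cut of the data answers the business question and in explaining the result.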

1-3. Model Development Data Scientist
[Main Active Venues]
Kaggle, development departments of large corporations
[Main Orientations]
The difference from the analytical type is that the goal is the "high-accuracy model itself" rather than a "report." While they do perform data analysis, it is positioned primarily as "preprocessing" for the sake of building the model.
They handle a wide range of technologies from Deep Learning (Vision, NLP) to tabular data models like LightGBM, and possess the skills to push accuracy to its limits within given requirements. This group includes a broad spectrum from veterans who have mastered traditional machine learning techniques to young talents utilizing the latest technology.
They do care about "business impact" and "operational costs," but they are particularly passionate about pursuing accuracy. They are unbeatable within the laboratory setting of a Jupyter Notebook, yet they often have room for growth in general software development practices (such as CI/CD and API design) when it comes to integrating models into production systems.

2. ML Engineer
[Main Active Venues]
Development departments of large corporations
[Main Orientations]
I think many people in this role are those who were previously "Model Development Data Scientists" and adapted to the rough seas of the real world (production environments).
Their role is to implement delicate models created by craftsmen as robust systems that run 24/7 (MLOps). They take experimental code and elevate it to production-grade code, considering scalability, inference speed, and fault tolerance. They are bilingual, speaking the languages of both machine learning and infrastructure engineering.
Compared to those coming from a Web engineering background, their adaptation to modern DevOps culture—such as CI/CD and TDD (Test-Driven Development)—is sometimes still a work in progress. Also, because their scope of responsibility is so broad, they tend to become "broad but shallow" generalists, and some struggle with the difficulty of dedicating time to deep dives into specific areas.

3. LLM Engineer
[Main Active Venues]
Active in a wide range of companies, from startups to large corporations
[Main Orientations]
This is a role that emerged after the debut of ChatGPT. These individuals overturn the traditional common sense of machine learning. They skip the long process of "collecting and training data" and instead use APIs and prompt engineering to prototype RAG (Retrieval-Augmented Generation) and AI agents at an incredible speed.
Since deep mathematical knowledge of machine learning is not strictly mandatory, many transition from Web server-side or infrastructure engineering backgrounds, making this a very diverse group.
I have the impression that many prioritize speed and use the extremely convenient hammer of LLMs first, rather than opting for time-consuming traditional machine learning or rule-based approaches.
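For readers unfamiliar with RAG, the following toy sketch shows the basic shape: retrieve the most relevant document for a question, then put it into the prompt. It is an illustration under heavy assumptions, not how any particular product works. Real systems typically use embedding models and a vector store rather than the word-count similarity used here, and the documents and function names are hypothetical. No LLM API is called; the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, documents, top_k=1):
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(retrieve(query, documents, top_k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical knowledge-base snippets, made up for this example.
docs = [
    "Paid leave requests are submitted from the attendance menu.",
    "Payslips are published on the 25th of each month.",
    "Password resets are handled from the login screen.",
]
question = "When are payslips published?"
prompt = build_prompt(question, docs)
print(prompt)
```

The resulting prompt would then be sent to an LLM API. Notice that nothing here involves collecting training data or training a model, which is why a prototype like this can be built so quickly.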

4. Data Engineer
[Main Active Venues]
Development departments of large corporations
[Main Orientations]
They are a vital presence, preparing the foundation on which all the previously mentioned roles operate. In fact, I believe this is the talent that many companies currently need the most, even if that need often goes unrecognized.
They are experts in data integration technologies and sometimes manage everything up to cloud governance and security. Thanks to them, other roles can focus on analysis and model development without getting bogged down in the mud. Another characteristic is their high level of professional pride as "unsung heroes," happy to see the team win even if they aren't in the spotlight themselves.
Because they dedicate their full effort to maintaining the reliability and quality of the infrastructure, there is often a clear division of labor where they do not dive deep into the internals of machine learning models or the analysis algorithms themselves.

Summary
The roles can be summarized by needs as follows:
- If you want to explore unknown truths → Research Data Scientist
- If you want a compass for management → Analytical Data Scientist
- If you want to develop high-accuracy models → Model Development Data Scientist
- If you want to integrate models into products → ML Engineer
- If you want to produce results quickly using LLMs → LLM Engineer
- If you want to solidify the foundation for the above roles to succeed → Data Engineer