30 Core Concepts Every AI Engineer Should Master

From tokens to MCP, from RAG to vector databases—a quick reference to rapidly build your AI engineering knowledge framework.

Introduction

If you start as an AI engineer now, your screen will be flooded with new terminology:

"Use RAG to augment LLMs, store in vector databases with Embeddings, perform tool calls via MCP, and optimize output with Few-shot Prompting..."

It sounds like an alien language.

The concepts themselves are not difficult. The hard part is that they are rarely presented in an organized way, so the relationships between them stay unclear.

This article serves as a knowledge map to help you build a complete AI engineering knowledge framework.


Layer 1: Fundamental Concepts (Required)

1. Token

Definition: The smallest unit of text a model processes; think of it as a "word fragment."

Examples:

  • "Hello World" → ["Hello", " World"] (2 tokens)
  • "你好世界" → ["你好", "世界"] (2 tokens in this illustration; the actual split depends on the tokenizer, and Chinese text often costs about one token or more per character)

Importance:

  • API providers bill by token (not by character count)
  • Models have a token limit (e.g., GPT-4 Turbo supports 128k tokens)
  • Optimizing token usage = reducing costs

Tools:
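One commonly used tool for OpenAI models is the open-source tiktoken library. The sketch below counts tokens with it when it is available and otherwise falls back to a rough estimate. The four-characters-per-token fallback and the price passed to estimate_cost are illustrative assumptions, not real figures.

```python
def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken if possible, else use a rough estimate."""
    try:
        import tiktoken  # third-party: pip install tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # Assumption: about 4 characters per token, a crude rule of thumb
        # for English text. Used when tiktoken is missing or cannot load.
        return max(1, len(text) // 4)


def estimate_cost(num_tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count into a cost, given a hypothetical price."""
    return num_tokens / 1_000_000 * usd_per_million_tokens


token_count = count_tokens("Hello World")
print(token_count, estimate_cost(token_count, usd_per_million_tokens=10.0))
```

Counting before sending a request makes it possible to reject or shorten oversized inputs and to budget costs in advance.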


2. Context / Context Window

Definition: The amount of content a model can "remember" at once.

Analogy: Your "short-term memory" when chatting with a friend. If the conversation gets too long, you forget what was said earlier; models are the same.

Examples:

  • GPT-4 Turbo: 128k tokens (approx. 100,000 Chinese characters)
  • Claude 3: 200k tokens
  • Gemini 1.5 Pro: 1M tokens

What happens when you run out of Context?

  • The earliest content is dropped (or the request is rejected), so the model "forgets" it
  • You need to work around the limit using RAG or summarization techniques
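A simple way to stay inside the window is to keep only the most recent messages that fit a token budget. The following is a minimal sketch of that idea; trim_history and rough_token_count are hypothetical helpers, and counting whitespace-separated words is an assumption standing in for a real tokenizer.

```python
def rough_token_count(text: str) -> int:
    # Assumption: whitespace-separated words stand in for tokens.
    return len(text.split())


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[dict] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = rough_token_count(message["content"])
        if used + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = [
    {"role": "user", "content": "My name is Taro and I live in Tokyo"},
    {"role": "assistant", "content": "Nice to meet you Taro"},
    {"role": "user", "content": "What is my name"},
]
trimmed = trim_history(history, max_tokens=10)
print([m["content"] for m in trimmed])
```

With a budget of 10, the first message no longer fits, which is exactly the "forgetting" described above: the model can no longer answer the final question.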

3. Prompt / Prompt Engineering

Definition: Instructions and input provided to the AI.

Advanced Techniques:

Few-shot Prompting

Providing examples to the model to help it learn patterns:

Example:
Input: "The weather is nice today" → Sentiment: Positive
Input: "This product is the worst" → Sentiment: Negative

Analyze now: "This feature is okay" → Sentiment: ?
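In practice, a few-shot prompt like this is usually assembled from a list of labeled examples. A minimal sketch, where build_few_shot_prompt is a hypothetical helper and the layout simply mirrors the example above:

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Format labeled examples followed by the unlabeled query."""
    lines = [f'Input: "{text}" → Sentiment: {label}' for text, label in examples]
    lines.append("")  # blank line separates the examples from the query
    lines.append(f'Analyze now: "{query}" → Sentiment:')
    return "\n".join(lines)


examples = [
    ("The weather is nice today", "Positive"),
    ("This product is the worst", "Negative"),
]
prompt = build_few_shot_prompt(examples, "This feature is okay")
print(prompt)
```

Keeping the examples in a list makes it easy to swap them per task or to choose the most relevant ones at runtime.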

Chain-of-Thought (CoT)

Making the model "think slowly":

Problem: Taro has 5 apples, eats 2, and buys 3. How many does he have now?

Answer (Standard): 6

Answer (CoT):
1. Initial: 5
2. Ate 2: 5 - 2 = 3
3. Bought 3: 3 + 3 = 6
Final answer: 6

System Prompt

Defining the model's "persona" and behavioral rules:

System: You are a strict legal assistant. You must cite legal articles in your responses.
User: How should I handle a breach of contract?
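In chat-style APIs, the system prompt and the user input usually travel as separate role-tagged messages. A minimal sketch of that structure follows; build_messages is a hypothetical helper, and the role names follow the common system/user convention rather than any one vendor's full schema.

```python
def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Pair a system prompt with the user's message in chat format."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]


messages = build_messages(
    "You are a strict legal assistant. You must cite legal articles in your responses.",
    "How should I handle a breach of contract?",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```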

4. Temperature / Top-p

Definition: Controlling the "randomness" of the model's output.

Temperature:

  • 0.0: Near-deterministic; the model almost always picks the most likely token (suitable for code generation)
  • 1.0: Standard randomness (suitable for chat)
  • 2.0: Very high randomness; output can become incoherent (use sparingly, e.g., for brainstorming)

Top-p (Nucleus Sampling):

  • 0.1: Only sample from the most likely tokens whose cumulative probability reaches 10% (conservative)
  • 0.9: Sample from the most likely tokens whose cumulative probability reaches 90% (balanced)

Rules of thumb:

  • Code: temperature=0, top_p=0.1
  • Articles: temperature=0.7, top_p=0.9
  • Creative work: temperature=1.5, top_p=0.95
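To see what these two parameters do mechanically, here is a small sketch in plain Python. The logits are made-up numbers, and real inference stacks implement this on tensors; the functions below only illustrate the idea. The temperature must be greater than zero here (APIs typically treat 0 as "always pick the top token").

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], top_p: float) -> list[int]:
    """Indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: list[int] = []
    cumulative = 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
print(sharp, flat)
print(top_p_filter(sharp, top_p=0.5), top_p_filter(flat, top_p=0.9))
```

A low temperature concentrates probability on the top token, and a low top_p then cuts the candidate set down to that token alone; a high temperature with a high top_p keeps all three candidates in play.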

5. Embedding / Vectorization

Definition: Converting text into numerical vectors (sequences of floating-point numbers).

Necessity:

  • Computers cannot directly measure how similar the meanings of "cat" and "dog" are
  • But the distance between vectors [0.2, 0.8, ...] and [0.3, 0.7, ...] can be calculated

Example:

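from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

text = "Artificial intelligence changes the world"
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text
)
# The vector is nested inside the response object
vector = response.data[0].embedding
print(len(vector))  # 1536 dimensions by default for this model
# vector looks like [0.023, -0.45, 0.78, ..., 0.12]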

Applications:

  • Semantic search
  • Recommendation systems
  • Foundation of RAG
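All of these applications rest on comparing vectors. Cosine similarity is one common measure; the sketch below uses tiny made-up three-dimensional vectors in place of real embeddings, which have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only; not real model output.
cat = [0.2, 0.8, 0.1]
dog = [0.3, 0.7, 0.1]
car = [0.9, 0.1, 0.4]

print(cosine_similarity(cat, dog))  # close to 1: similar direction
print(cosine_similarity(cat, car))  # noticeably lower
```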

Layer 2: Architecture and Models

6. Transformer

Definition: The core architecture of modern LLMs (proposed by Google in 2017).

Core Innovation: Self-Attention mechanism

  • Traditional RNN: Processes tokens one at a time in sequence, which is slow and makes it hard to retain information over long text
  • Transformer: Parallel processing, fast, capable of handling long-range dependencies
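The core computation of self-attention is scaled dot-product attention: each token scores every other token, turns the scores into weights, and takes a weighted average of the values. The sketch below shows that step in plain Python. As a simplifying assumption, the token vectors are used directly as queries, keys, and values; a real Transformer first applies learned projection matrices and uses multiple heads.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(
    queries: list[list[float]],
    keys: list[list[float]],
    values: list[list[float]],
) -> list[list[float]]:
    """Scaled dot-product attention over a short sequence."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this token against every token, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
attended = self_attention(tokens, tokens, tokens)
for row in attended:
    print([round(x, 3) for x in row])
```

Each output row depends on all input tokens at once, and the rows can be computed independently of one another, which is what makes the architecture parallelizable.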

Architecture variants:

  • Encoder-only: Understanding input (e.g., BERT)
  • Decoder-only: Generating output (e.g., GPT)
  • Encoder-Decoder: Sequence-to-sequence tasks such as translation (e.g., T5)

7. LLM (Large Language Model)

Definition: A language model with a very large number of parameters (typically billions), trained on large text corpora.

Scale Comparison:

  • GPT-3: 175B parameters
  • GPT-4: Not officially disclosed (unofficial estimates are around 1.7T parameters)
  • Llama 2: 7B / 13B / 70B parameters

10. RAG (Retrieval-Augmented Generation)

Definition: Retrieval + Generation: first fetch relevant documents, then have the model answer with them as context.

Workflow:

  1. User question: "What is the context length of Llama 3?"
  2. Retrieval: Find relevant documents in the knowledge base
  3. Generation: Pass the document and the question together to the LLM to get an answer
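The retrieval and prompt-assembly steps can be sketched as follows. This is a toy illustration: scoring by word overlap is an assumption standing in for embedding-based search, the knowledge base is three sample sentences, and the final LLM call is left out.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace("?", "").replace(".", "").split())


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context_docs: list[str]) -> str:
    """Combine retrieved documents and the question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


knowledge_base = [  # sample documents for illustration
    "Llama 3 was released with a context length of 8k tokens.",
    "Pinecone is a managed vector database.",
    "Temperature controls the randomness of sampling.",
]
question = "What is the context length of Llama 3?"
retrieved = retrieve(question, knowledge_base)
prompt = build_prompt(question, retrieved)
print(prompt)  # this prompt would then be sent to the LLM
```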

Why RAG is necessary:

  • Model knowledge has a training cutoff (for example, GPT-4-family models stop somewhere between 2021 and 2023, depending on the version)
  • Models have never seen your company's private data
  • Fine-tuning is expensive, whereas RAG is more flexible

RAG vs Fine-tuning:

Scenario | Use RAG | Use Fine-tuning
Frequently updated data | Yes | No
Learn specific style | No | Yes
Cost-conscious | Yes | No
Need to cite sources | Yes | No

11. Vector Database

Definition: A database dedicated to storing and searching vectors.

Why not use a standard DB?:

  • Vector dimensions are high (768/1536/3072 dimensions)
  • Requires efficient "similarity search" (KNN/ANN)
  • Standard databases lack native vector indexes, so similarity search is slow or unavailable without an extension

Mainstream Products:

  • Pinecone: Managed service, easy
  • Weaviate: Open-source, supports hybrid search
  • Qdrant: Open-source, written in Rust, high performance
  • Milvus: Open-source, developed by Zilliz, built for large scale
  • Chroma: Open-source, Python-native, lightweight
  • pgvector: PostgreSQL extension
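Whatever the product, the core operation is the same: store vectors with an ID and return the nearest ones to a query. The sketch below is a brute-force, in-memory version of that idea. TinyVectorStore is a hypothetical class, not the API of any product listed above; real systems replace the linear scan with approximate nearest neighbor (ANN) indexes.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class TinyVectorStore:
    """Hypothetical in-memory store with exact (brute-force) KNN search."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, item_id: str, vector: list[float]) -> None:
        self._items.append((item_id, vector))

    def search(self, query: list[float], k: int = 2) -> list[tuple[str, float]]:
        # Score every stored vector, then keep the k best matches.
        scored = [(cosine(query, vector), item_id) for item_id, vector in self._items]
        scored.sort(reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]


store = TinyVectorStore()
store.add("cat", [0.2, 0.8])  # made-up two-dimensional vectors
store.add("dog", [0.3, 0.7])
store.add("car", [0.9, 0.1])

results = store.search([0.25, 0.75], k=2)
print(results)
```

The linear scan costs time proportional to the number of stored vectors per query, which is why dedicated indexes matter once a collection grows large.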

15. Function Calling / Tool Use

Definition: Allowing an LLM to call external tools (APIs, DBs, calculators, etc.).

Example:

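from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# JSON Schema description of the tool the model is allowed to call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Tokyo"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is the weather in Tokyo?"}],
    tools=tools
)

# The model does not run the tool itself; it returns the call it wants made
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)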

Applications:

  • Real-time data lookup (stock prices, weather)
  • Executing actions (sending emails, booking tickets)
  • Accessing private data (internal databases)
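The model only proposes a call; your application has to execute it. The sketch below shows one way to dispatch a proposed call to a local function. The get_weather stub and TOOL_REGISTRY are hypothetical; the only assumption taken from the API is that the arguments arrive as a JSON string.

```python
import json


def get_weather(location: str) -> str:
    # Hypothetical stub; a real implementation would call a weather API.
    return f"Sunny in {location}"


# Maps tool names the model may request to local Python functions.
TOOL_REGISTRY = {"get_weather": get_weather}


def execute_tool_call(name: str, arguments_json: str) -> str:
    """Look up the requested tool and call it with the decoded arguments."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    arguments = json.loads(arguments_json)
    return TOOL_REGISTRY[name](**arguments)


result = execute_tool_call("get_weather", '{"location": "Tokyo"}')
print(result)
```

The result is then sent back to the model as a tool message so it can write the final answer. Using an explicit registry also keeps the model from invoking anything you did not deliberately expose.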

16. MCP (Model Context Protocol)

Definition: An open protocol, introduced by Anthropic in November 2024, that standardizes how AI applications connect to tools and data sources.

Problem solved:

  • Previously, each AI application implemented tool calling independently
  • MCP standardizes this, making tools reusable across applications

Architecture:

AI App ←→ MCP Client ←→ MCP Server ←→ Tools/Data Sources
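On the wire, MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method. The sketch below builds such a request and answers it with a toy handler. It is a simplified illustration, not the official SDK: the handler, the get_weather stub, and the error code choice are assumptions, and transport and initialization are omitted.

```python
import json


def make_tool_call_request(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking a server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def handle_request(request: dict, tools: dict) -> dict:
    """Toy server side: run the named tool and wrap its text result."""
    params = request["params"]
    tool = tools.get(params["name"])
    if tool is None:
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32602, "message": f"Unknown tool: {params['name']}"},
        }
    text = tool(**params["arguments"])
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": text}]},
    }


def get_weather(location: str) -> str:
    return f"Sunny in {location}"  # hypothetical stub


request = make_tool_call_request(1, "get_weather", {"location": "Tokyo"})
response = handle_request(request, {"get_weather": get_weather})
print(json.dumps(response, indent=2))
```

Because every server speaks the same message format, a client written once can talk to any tool server.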

Significance:

  • Similar to the USB protocol: connect everything with one standard
  • Foundation for future AI tool ecosystems

Layer 3: Engineering Practice

18. Agent

Definition: AI that can autonomously reason, call tools, and complete tasks.

Workflow:

  1. Reasoning: Task analysis
  2. Acting: Calling tools
  3. Observing: Checking results
  4. Repeating: Until completion

Example:

Task: Book a flight from Tokyo to Osaka for tomorrow

Agent:
  Reasoning: Flight information is needed
  Acting: Call search_flights("Tokyo", "Osaka", "tomorrow")
  Observing: Found 3 flights
  Reasoning: User selection required
  Acting: Call ask_user("Which flight would you like?")
  Observing: User chose flight 2
  Acting: Call book_flight(flight_id=2)
  Observing: Booking successful
  Done!
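The loop behind this trace can be sketched in a few lines. Everything here is a stand-in: the tools are stubs, and scripted_policy is a hard-coded assumption that replaces the LLM's reasoning step. For brevity the sketch skips the ask_user step and picks flight 2 directly.

```python
from typing import Callable


def search_flights(origin: str, destination: str) -> list[int]:
    return [1, 2, 3]  # hypothetical flight ids


def book_flight(flight_id: int) -> str:
    return f"Booked flight {flight_id}"


TOOLS: dict[str, Callable] = {
    "search_flights": search_flights,
    "book_flight": book_flight,
}


def scripted_policy(history: list) -> dict:
    """Stand-in for the LLM: choose the next action from past observations."""
    if not history:
        return {"tool": "search_flights",
                "args": {"origin": "Tokyo", "destination": "Osaka"}}
    if len(history) == 1:
        flights = history[0]
        return {"tool": "book_flight", "args": {"flight_id": flights[1]}}
    return {"tool": None, "answer": history[-1]}


def run_agent(policy: Callable[[list], dict], max_steps: int = 5) -> str:
    history: list = []
    for _ in range(max_steps):
        decision = policy(history)               # Reasoning
        if decision["tool"] is None:
            return decision["answer"]            # Done
        tool = TOOLS[decision["tool"]]
        observation = tool(**decision["args"])   # Acting
        history.append(observation)              # Observing
    return "Stopped: step limit reached"


print(run_agent(scripted_policy))
```

The max_steps cap is a small but important safeguard: without it, a model that never decides it is finished would loop forever.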

19. Streaming Output

Definition: Returning results incrementally without waiting for the full generation to complete.

UX:

  • Non-streaming: Wait 10 seconds → See complete answer
  • Streaming: See the first tokens almost immediately, with the rest displayed incrementally
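The difference is easy to simulate with a Python generator. In this sketch, stream_tokens is a hypothetical stand-in for a model API that yields chunks as they are produced; the word-level chunking and the delay are assumptions for illustration.

```python
import time
from typing import Iterator


def stream_tokens(text: str, delay_seconds: float = 0.0) -> Iterator[str]:
    """Yield the text one word at a time, simulating generation latency."""
    for word in text.split(" "):
        time.sleep(delay_seconds)
        yield word + " "


chunks = []
for chunk in stream_tokens("Streaming shows partial output early"):
    print(chunk, end="", flush=True)  # display each chunk as soon as it arrives
    chunks.append(chunk)
print()
```

The total generation time is unchanged; what improves is the time until the user sees the first output.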

21. SDD (Specification-Driven Development)

Definition: Defining "specifications" (specs) first, then generating code.

Traditional development:

Requirements → Design → Code → Testing

SDD:

Requirements → Create Specs (test cases/constraints) → AI generation → Automated verification
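The automated verification step can be as simple as running generated code against the spec's test cases. A minimal sketch follows; SPEC, verify, and the two candidate functions are all hypothetical, with generated_add and buggy_add standing in for AI-generated implementations.

```python
from typing import Callable

# The spec: input arguments paired with the expected result.
SPEC = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]


def verify(candidate: Callable, spec: list) -> list[str]:
    """Run the candidate against every case and collect the failures."""
    failures = []
    for args, expected in spec:
        actual = candidate(*args)
        if actual != expected:
            failures.append(f"{args}: expected {expected}, got {actual}")
    return failures


def generated_add(a: int, b: int) -> int:
    return a + b  # pretend this came from the AI


def buggy_add(a: int, b: int) -> int:
    return a - b  # a flawed generation the spec should catch


print(verify(generated_add, SPEC))
print(verify(buggy_add, SPEC))
```

A failing report can be fed back to the model as the prompt for its next attempt, closing the loop without a human reading the code first.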

Significance:

  • Development paradigm in the AI era
  • Humans focus on "what they want," AI is responsible for "how to do it"

30. Hallucination

Definition: When the model "confidently spews nonsense."

Example:

User: Introduce the book "Quantum Buddhism"
AI: This book was published by Taro Yamada in 2018...
(Actually, this book does not exist)

Causes:

  • Models "predict the next word," they don't "query a database"
  • Noise in training data

Mitigation methods:

  1. RAG: Provide authentic documents
  2. Lower Temperature: Reduce randomness
  3. Prompt constraints: Tell it to say "I don't know" if it doesn't know
  4. Citation: Require the model to cite its basis
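Methods 1, 3, and 4 can be combined in a single prompt, and citations can be checked mechanically afterwards. The sketch below shows one way to do that; both helpers are hypothetical, the bracketed [id] citation format is an assumption, and the answer string is hand-written to stand in for model output.

```python
import re


def build_grounded_prompt(question: str, sources: dict[str, str]) -> str:
    """Provide sources, require citations, and allow 'I don't know'."""
    listing = "\n".join(f"[{source_id}] {text}" for source_id, text in sources.items())
    return (
        "Answer using only the sources below and cite them like [id].\n"
        'If the sources do not contain the answer, reply "I don\'t know".\n\n'
        f"Sources:\n{listing}\n\nQuestion: {question}"
    )


def find_unsupported_citations(answer: str, sources: dict[str, str]) -> list[str]:
    """Return cited ids that do not match any provided source."""
    cited_ids = re.findall(r"\[(\w+)\]", answer)
    return [cited for cited in cited_ids if cited not in sources]


sources = {"doc1": "GPT-4 Turbo supports a 128k-token context window."}
prompt = build_grounded_prompt("What is the context limit?", sources)

# Hand-written example answer containing one invented citation.
answer = "The limit is 128k tokens [doc1], as confirmed in [doc9]."
print(find_unsupported_citations(answer, sources))
```

A check like this only catches invented source ids; it does not prove that the cited source actually supports the claim.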

How to learn these concepts?

1. Hierarchical Learning

Week 1: Token, Context, Prompt, Temperature (Experiment by writing a few prompts)
Week 2: Embedding, RAG, Vector DB (Build a simple Q&A system)
Week 3: Function Calling, Agent (Make AI call a weather API)
Week 4: Fine-tuning, Quantization (Fine-tune a small model)

2. Practice

Project suggestions:

  • Week 1: Personal Knowledge Base Q&A (RAG)
  • Week 2: Multi-tool AI Assistant (Function Calling)
  • Week 3: Code Generator (Few-shot + CoT)
  • Week 4: Fine-tuned Customer Service Bot

Conclusion

These 30 concepts are the "Periodic Table" of AI engineering.

You don't need to learn everything at once. But knowing that they exist, their relationships, and where to use them will help you quickly identify issues when you encounter them.

Remember two things:

  1. Concepts are static; applications bring them to life. Don't memorize definitions; understand the scenarios.
  2. AI technology iterates rapidly. Today's best practice might be obsolete in half a year. Keep learning and embrace change.

Choose one concept you are interested in and try it out now.

The best way to understand AI is to work with AI.


The original version of this article can be found here → hongqi-lgs.github.io
