iTranslated by AI
Data Modeling - Part 3: Proposals for the Next Generation (Part 1)
📘 Part 3: Proposals for the Next Generation (Part 1) — Reconstructing Data Modeling
In Chapter 8, we address three issues related to structure (Structure) among the ER diagram challenges revealed in Part 2.
- Organizing the scope that ER diagrams can handle and the scope that should be handled through abstraction.
- Layering into Conceptual, Logical, and Physical, and redefining the ER diagram as the “Logical Layer.”
- Moving away from the situation where everything is forced into the ER diagram, and separating the “Scope of the ER Diagram” from the “Scope of DB-Specific Design.”
All of these involve “Redesigning the Model Structure”—determining which layer handles the structure, what the responsibilities of the ER diagram are, and where to delegate to another layer. We will clearly define the three layers (Abstract, Logical, Physical) that handle “Structure.”
Regarding point 4, which has a different nature from structure:
4. Running structure and intent (Intent) side-by-side, assuming an operation where intent naturally remains.
This will be handled in Part 4 (Chapter 9).
✨ Chapter 8: CLM — Redefinition of Layered Structure
CLM (Canonical Layered Modeling) is a new modeling-based design system for consistently describing entity relationships, attributes, constraints, and reasons by organizing the four layers: Abstract, Logical, Physical, and Intent.
It is positioned as a mechanism to organize the defensive scope of ER diagrams and to systematize the bridge from Abstraction → Structure → Implementation.
In next-generation modeling, it is essential to clearly separate the three layers of Abstract Layer (Concept/Semantics), Logical Layer (Structure), and Physical Layer (Instantiation/SQL) and define their roles and transition rules.
Figure 8.0 Process Diagram
Through this, consistency of “Semantics → Structure → SQL” is maintained, making the data model resilient to change and creating a system where misunderstandings are less likely to occur.
🔵 8.1 【Abstract Layer (Conceptual Layer)】— Handling Semantics
The Abstract Layer is the stage where the real world is organized as meaning (concepts) before being converted into data.
A. Reality → Abstraction: Define it even if it’s not used
- Identification of Entities
- Consideration of the existence of Attributes (at the “meaning” level)
- Definition of relationships between entities (Relation)
- Semantics of business rules
- Judgment of “Persistent or Derived” (Consideration of whether to make it a View)
- Definition of Boundaries
Note: Column names, types, constraints, etc., are not determined.
🔵 8.2 【Logical Layer】— Handling Data Structure
The Logical Layer is the stage where only those elements deemed necessary in the Abstract Layer are realized into a structure that can be handled as data.
B. Abstraction → Logical: Selection of attributes is completed within Abstraction
- Only attributes judged as "necessary" in the Abstract Layer proceed to the Logical Layer.
- Re-evaluating necessity is not done in the Logical Layer.
- Policies for persistence/derivation are also decided in the Abstract Layer.
- Guarantees that the Abstract Layer acts as the "entry point for structuring."
✔ What the Logical Layer handles:
- Columnization (Finalizing names)
- Logical types (TEXT / NUMBER / DATE, etc.)
- Primary keys and Foreign keys
- NOT NULL / UNIQUE / CHECK (Logical constraints)
- Materialization of normalization/denormalization
- Abstract indexes (at the necessity level)
- Structure of Views / Generated columns (Structural definitions, not SELECT statements)
🔴 8.3 【Physical Layer】— Handling SQL Parameters
The Physical Layer is the stage where the logical structure is instantiated as SQL for a specific DB.
C. Logical → Physical: Specialize in Detailization of Logical Structure
-
Only the structure from the Logical Layer is instantiated in the Physical Layer.
-
The Physical Layer must not uniquely add columns.
-
The Physical Layer specializes in determining SQL parameters.
- Index types
- Partitioning methods
- RLS (Row-Level Security)
- Storage parameters
✔ What the Physical Layer handles
- Index types (B-tree / GIN / GiST / BRIN / Partial / Expression)
- Partitioning (Range / List / Hash)
- Materialized View (Physicalization)
- Generated columns STORED / VIRTUAL
- Tablespaces / Storage settings
- RLS (Row-Level Security)
- Role / Grant (Permissions)
- Tenant isolation methods (RLS / Schema / Column)
- Sharding (Shard keys)
- Extensions (PostGIS, etc.)
✨ 8.4 Modeling Notation — Differentiation through Canonical Relationship Graphs and Views
In next-generation modeling, the Abstract Layer and Logical Layer are centrally managed as the same Canonical Relationship Graph.
The Canonical Relationship Graph is the "reference point to prevent semantics and structure from fluctuating," and serves as the unique master copy that all views refer to.
On top of that, the representation is switched between the Abstract Layer as a "view emphasizing semantics" and the Logical Layer as a "view emphasizing structure."
Figure 8.5 Canonical Relationship Graph (Canonical Graph)
✅ Role of the Canonical Relationship Graph (Canonical Relationship Graph)
The Canonical Relationship Graph is the sole "truth" in three-layer modeling and serves as the common foundation for Abstract Views and Logical Views.
The Canonical Relationship Graph maintains only the following skeleton:
- Definition of Entities
- Relationships (Relation) and Cardinality (1..n / n..n) between entities
- Definition of Boundaries (Boundary / Context)
- Distinction between persistent and derived (persistent / derived)
- Existence of Attributes (at the semantic level)
Note: Column names, types, constraints, and SQL syntax are not included in the Canonical Graph.
✅ Abstract View
An Abstract View displays the Canonical Relationship Graph as a "diagram for understanding and agreeing upon the Meaning (Concept)."
- Shows Entities as conceptual entities
- Shows Attributes by existence and meaning only
- Shows Relations as semantic associations
- Notation can be lightweight; detailed line types or physical elements are not used
The purpose of the Abstract View is to reach an agreement on which elements of reality were abstracted as meanings and how the boundaries were cut.
✅ Logical View
A Logical View displays the Canonical Relationship Graph as a "diagram for realizing the data structure (Structure)."
- Determines Tables / Views according to the persistent / derived distinction
- Columnizes and shows only those attributes deemed necessary in the Abstract Layer
- Realizes Relations as FK structures and expands many-to-many relationships into bridge tables
- ER diagram notation (Crow’s Foot, etc.) may be used
The purpose of the Logical View is to present the results of converting meanings into structures in a reviewable format.
✅ Handling the Physical Layer (SQL as the Source of Truth, not Diagrams)
The Physical Layer is the stage where the Logical View is instantiated into specific DB entities (SQL / DDL). In principle, there are no diagrams for the Physical Layer.
The master copy of the Physical Layer is the SQL itself.
- Index types
- Partitioning methods
- RLS / Role / Grant
- Materialized View
- Generated columns (STORED / VIRTUAL)
- DB-specific designs such as storage/extensions
This is because DDL is more accurate than diagrams for these elements, and forcing them into a diagram would cause it to lose its value as a common language.
✨ 8.5 Next-Generation Modeling Process — From Upstream to Downstream, and Always Returning
In next-generation modeling, the fundamental approach is to proceed with design in the order of Abstract → Logical → Physical. This sequence is not merely a convention; it is a prerequisite for ensuring the consistency and change resilience required in modern data design.
Here, we organize the steps for proceeding with three-layer modeling.
8.5.1 The Grand Principle of the Process: Proceed from Upstream to Downstream
The grand principle that must be clarified first can be summarized in this one sentence:
Modeling must always proceed from upstream to downstream, in the order of Abstract → Logical → Physical.
Define "meaning" upstream, realize it into "structure" in the middle stream, and finalize it as "entity (SQL)" downstream.
If this order becomes ambiguous, the logical and physical layers become the de facto "design source," leaving the meaning (abstraction) behind. Many of the issues seen in Part 2 were the result of this order breaking down.
8.5.2 Abstract Layer: Organizing Meaning
In the Abstract Layer, actual business operations are structured as meaning (Concept) before being converted into data.
Things to be decided here are as follows:
- Definition of Entities
- Enumeration of Attributes (at the semantic level)
- Necessity/Unnecessity of attributes
- Persistent or Derived (whether it becomes a View)
- Relationships between entities (Relation)
- Semantics of business rules
- Definition of Boundaries
It is important in the Abstract Layer to "not decide on data types or constraints." This is because this is the stage for defining "what to handle," not "how to handle it."
8.5.3 Logical Layer: Finalizing Structure
The Logical Layer is the stage where elements deemed "necessary" in the Abstract Layer are realized into a structure that can actually be handled as data.
The main tasks performed in the Logical Layer are as follows:
- Columnization (Finalizing names)
- Determination of logical types (TEXT/NUMBER/DATE, etc.)
- Setting Primary keys and Foreign keys
- Logical constraints such as NOT NULL/UNIQUE/CHECK
- Structuring Tables/Views based on the persistent/derived decisions
- Normalization/Denormalization
- Judgment of abstract indexes (at the necessity level)
The important point here is as follows:
In the Logical Layer, the "meaning" decided in the Abstract Layer is directly reflected in the structure. Do not re-evaluate necessity in the Logical Layer.
By using the Abstract Layer as the "entry point," the consistency of the structure is maintained.
8.5.4 Physical Layer: Instantiation into SQL
The Physical Layer is the stage where the logical structure is implemented for a specific DB product as SQL (DDL).
The targets to be handled are as follows:
- Index types (B-tree/GIN/GiST/BRIN, etc.)
- Partitioning methods
- Generated columns (STORED/VIRTUAL)
- Materialized Views
- Tablespaces/Storage parameters
- RLS (Row-Level Security)
- Role / Grant (Permissions)
- Tenant isolation methods (RLS/Schema/Column)
- Sharding
- Extensions (PostGIS, etc.)
The Physical Layer is "not a layer where structure may be changed."
The Physical Layer is a place to specify the structure of the Logical Layer into SQL parameters, not a place to create a new structure.
8.5.5 Always Feedback Changes Found in Later Stages Upstream
Changes will occur. In fact, in modern systems, change is a given.
However, a design that only modifies the downstream layers when a change occurs and does not feedback to the upstream is a recipe for future collapse.
Problems found in later stages must always be fed back upstream. This is to maintain the consistency of Abstract → Logical → Physical.
It is important to note that the act of returning is not "rework" but a fundamental action to maintain consistency.
✨ 8.6 Handling DBMS-Specific Features — Separation of "Meaning" and "Implementation"
DBMSs have many implementation-dependent features such as Sequence, Identity, Partition, and RLS. Mixing these into the upstream of modeling significantly compromises the reusability and change resilience of the model.
CLM adopts the following principles:
1. All specific features are handled in the Physical Layer and are not brought into the Abstract or Logical layers.
The Logical Layer only handles "requirements as meaning," while the implementation method is delegated to each DBMS.
2. "Constraints as meaning" are recorded as Intent in the Logical Layer.
Primary key generation methods, continuity of numbering, presence or absence of branch numbers—these are organized as meaning, not implementation, and remain in the Logical Layer.
3. "Implementation quirks" are recorded as Intent in the Physical Layer.
Sequence gaps, Identity behavior, RLS performance impact, and other product-dependent properties are handled in the Physical Layer.
Note that specific guidelines for specific features are detailed in the "CLM Specification."
📘 Appendix A: CLM / ER / ANSI Three-Layer Comparison (Revised Edition: Correctly Understanding ANSI)
🆚 1. CLM / ER / ANSI — Comparison of Modeling Framework Positioning
| Aspect | CLM (Canonical Layered Modeling) | ER Modeling | ANSI/SPARC Three-Level Architecture |
|---|---|---|---|
| Positioning | The "Practical Design Framework" central to this book | Modeling method to represent data structures diagrammatically | "Theoretical Architecture" for ensuring data independence (Not a modeling method) |
| Purpose | Ensuring consistency and change resilience from Semantics → Structure → SQL | Organizing and visualizing data structures | Principle of maintaining data independence through External, Conceptual, and Internal levels |
| Defined Content | Layer boundaries, responsibilities, change rules, view systems | Notation (Chen notation, Crow's Foot) | Layer concepts only (External/Conceptual/Internal) |
| Diagrams | ✔ Abstract/Logical views + Canonical Relationship Graph | ✔ ER Diagram | ❌ Notation is not prescribed |
| Meaning of "Conceptual Level" | Organizing meaning and judging necessity of attributes | Often mixed with the Logical level in practice | Cross-application common concepts (Abstract) |
| Meaning of "Logical Level" | Structuring decisions from the Abstract Layer | Structure-centric | Intermediate between ANSI's Conceptual and Internal levels (Not clearly defined) |
| Meaning of "Physical Level" | Implementation such as SQL, storage, and partitioning | Sometimes treated as a derivative of ERD | DB-specific internal structure |
| Applicable Domain | Supports RDB, NoSQL, and Event-Driven | RDB-centric | General database conceptual model |
| Essence | "Reconstructed Modeling Framework" for practical use | Structural representation | Architecture principle (Not modeling) |
🆚 2. Comparison by Layer (Difference in "What to Handle")
(1) Comparison of Meaning (Abstract) Layer
| Aspect | CLM: Abstract Layer | ER: Conceptual Model | ANSI: Conceptual Schema |
|---|---|---|---|
| Handling of Attributes | Enumeration of existence + Necessity judgment | Mixed use of writing/not writing attributes depending on the site | Does not handle attributes (Pure semantic layer) |
| Relationships | Relation as meaning | Same as ER | Conceptual associations but no notation rules |
| Persistence/Derivation | Determined here | Often mixed | No specific rules |
| Modeling Granularity | Directly linked to practice | Depends on the practice | Highly abstract and difficult to apply to practice |
👉 CLM clearly defines the "scope of decision-making in the Abstract Layer."
In ER, conceptual and logical aspects tend to be mixed, and ANSI is too abstract to be easily adopted in practice.
(2) Comparison of Structure (Logical) Layer
| Aspect | CLM: Logical Layer | ER: Logical Layer | ANSI: Mix of External/Conceptual |
|---|---|---|---|
| Columns | Only elements deemed "necessary" in Abstract are realized | Depends on the site | No specific rules |
| Constraints | Handles up to logical constraints | Same | No specific rules |
| View/Generated Col | Structures persistence/derivation decisions | Notation varies by site | No specific rules |
👉 CLM has strong consistency in that it "flows Abstract decisions directly into Structure."
(3) Comparison of Substance (Physical) Layer
| Aspect | CLM: Physical Layer | ER: Physical | ANSI: Internal Schema |
|---|---|---|---|
| Role | SQL is the master copy, no diagrams | Many sites force everything into diagrams | Theoretical role of physical structure |
| Content | Index / Partition / RLS / Grant | Some elements may be included in diagrams | Low-level implementation elements (but no notation rules) |
👉 CLM is clear that "SQL is the master copy, not the diagram."
ER tends to cram everything into diagrams, leading to confusion.
🧩 Appendix B: CLM Internal Specifications (Layered Inclusion Matrix)
| Layer | CLM: Abstract Layer (Concept) | CLM: Logical Layer (Structure) | CLM: Physical Layer (SQL) |
|---|---|---|---|
| Target | Meaning / Concept | Data Structure | SQL Instance |
| Entity | ✔ Define | ✔ Finalize as Table | Generate with DDL |
| Attribute | ✔ Enumerate existence only ✔ Judgment of necessity |
✔ Finalize column names ✔ Logical types/constraints |
Types, constraints, Storage definitions |
| Relation | ✔ Semantic association | ✔ Realize as FK/Bridge table | Implementation of constraints (FK/Index) |
| Rule (Business Rule) | ✔ Define at semantic level | Partially reflected in structure | Check, Trigger, RLS, etc. |
| Persistence/Derivation | ✔ Decided in Abstract Layer | ✔ Branch into Table / View | Implementation of View / MView |
| View/Diagram Notation | Abstract View | Logical View (ER Diagram) | No diagram (SQL is the master) |
| Corresponding ANSI Layer | Close to ANSI Conceptual but clear for practice | Intermediate between ANSI Conceptual and Internal | ANSI Internal Layer |
| Correspondence in ER Modeling | Equivalent to Conceptual Model | Equivalent to Logical Model | Equivalent to Physical Model |
In the next chapter, we will organize how to preserve the "reasons for judgment = Intent" that supports the structure.
To be continued in the second part
Discussion