iTranslated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors. View original article
🪜

Data Modeling - Part 3: Proposals for the Next Generation (Part 1)

に公開

📘 Part 3: Proposals for the Next Generation (Part 1) — Reconstructing Data Modeling


In Chapter 8, we address three issues related to structure (Structure) among the ER diagram challenges revealed in Part 2.

  1. Organizing the scope that ER diagrams can handle and the scope that should be handled through abstraction.
  2. Layering into Conceptual, Logical, and Physical, and redefining the ER diagram as the “Logical Layer.”
  3. Moving away from the situation where everything is forced into the ER diagram, and separating the “Scope of the ER Diagram” from the “Scope of DB-Specific Design.”

All of these involve “Redesigning the Model Structure”—determining which layer handles the structure, what the responsibilities of the ER diagram are, and where to delegate to another layer. We will clearly define the three layers (Abstract, Logical, Physical) that handle “Structure.”

Regarding point 4, which has a different nature from structure:
4. Running structure and intent (Intent) side-by-side, assuming an operation where intent naturally remains.
This will be handled in Part 4 (Chapter 9).


Chapter 8: CLM — Redefinition of Layered Structure

CLM (Canonical Layered Modeling) is a new modeling-based design system for consistently describing entity relationships, attributes, constraints, and reasons by organizing the four layers: Abstract, Logical, Physical, and Intent.

It is positioned as a mechanism to organize the defensive scope of ER diagrams and to systematize the bridge from Abstraction → Structure → Implementation.

In next-generation modeling, it is essential to clearly separate the three layers of Abstract Layer (Concept/Semantics), Logical Layer (Structure), and Physical Layer (Instantiation/SQL) and define their roles and transition rules.

Figure 8.0 Process Diagram

Through this, consistency of “Semantics → Structure → SQL” is maintained, making the data model resilient to change and creating a system where misunderstandings are less likely to occur.


🔵 8.1 【Abstract Layer (Conceptual Layer)】— Handling Semantics

The Abstract Layer is the stage where the real world is organized as meaning (concepts) before being converted into data.

A. Reality → Abstraction: Define it even if it’s not used

  1. Identification of Entities
  2. Consideration of the existence of Attributes (at the “meaning” level)
  3. Definition of relationships between entities (Relation)
  4. Semantics of business rules
  5. Judgment of “Persistent or Derived” (Consideration of whether to make it a View)
  6. Definition of Boundaries

Note: Column names, types, constraints, etc., are not determined.


🔵 8.2 【Logical Layer】— Handling Data Structure

The Logical Layer is the stage where only those elements deemed necessary in the Abstract Layer are realized into a structure that can be handled as data.

B. Abstraction → Logical: Selection of attributes is completed within Abstraction

  • Only attributes judged as "necessary" in the Abstract Layer proceed to the Logical Layer.
  • Re-evaluating necessity is not done in the Logical Layer.
  • Policies for persistence/derivation are also decided in the Abstract Layer.
  • Guarantees that the Abstract Layer acts as the "entry point for structuring."

✔ What the Logical Layer handles:

  • Columnization (Finalizing names)
  • Logical types (TEXT / NUMBER / DATE, etc.)
  • Primary keys and Foreign keys
  • NOT NULL / UNIQUE / CHECK (Logical constraints)
  • Materialization of normalization/denormalization
  • Abstract indexes (at the necessity level)
  • Structure of Views / Generated columns (Structural definitions, not SELECT statements)

🔴 8.3 【Physical Layer】— Handling SQL Parameters

The Physical Layer is the stage where the logical structure is instantiated as SQL for a specific DB.

C. Logical → Physical: Specialize in Detailization of Logical Structure

  • Only the structure from the Logical Layer is instantiated in the Physical Layer.

  • The Physical Layer must not uniquely add columns.

  • The Physical Layer specializes in determining SQL parameters.

    • Index types
    • Partitioning methods
    • RLS (Row-Level Security)
    • Storage parameters

✔ What the Physical Layer handles

  • Index types (B-tree / GIN / GiST / BRIN / Partial / Expression)
  • Partitioning (Range / List / Hash)
  • Materialized View (Physicalization)
  • Generated columns STORED / VIRTUAL
  • Tablespaces / Storage settings
  • RLS (Row-Level Security)
  • Role / Grant (Permissions)
  • Tenant isolation methods (RLS / Schema / Column)
  • Sharding (Shard keys)
  • Extensions (PostGIS, etc.)

8.4 Modeling Notation — Differentiation through Canonical Relationship Graphs and Views

In next-generation modeling, the Abstract Layer and Logical Layer are centrally managed as the same Canonical Relationship Graph.

The Canonical Relationship Graph is the "reference point to prevent semantics and structure from fluctuating," and serves as the unique master copy that all views refer to.

On top of that, the representation is switched between the Abstract Layer as a "view emphasizing semantics" and the Logical Layer as a "view emphasizing structure."

Figure 8.5 Canonical Relationship Graph (Canonical Graph)


Role of the Canonical Relationship Graph (Canonical Relationship Graph)

The Canonical Relationship Graph is the sole "truth" in three-layer modeling and serves as the common foundation for Abstract Views and Logical Views.

The Canonical Relationship Graph maintains only the following skeleton:

  • Definition of Entities
  • Relationships (Relation) and Cardinality (1..n / n..n) between entities
  • Definition of Boundaries (Boundary / Context)
  • Distinction between persistent and derived (persistent / derived)
  • Existence of Attributes (at the semantic level)

Note: Column names, types, constraints, and SQL syntax are not included in the Canonical Graph.


Abstract View

An Abstract View displays the Canonical Relationship Graph as a "diagram for understanding and agreeing upon the Meaning (Concept)."

  • Shows Entities as conceptual entities
  • Shows Attributes by existence and meaning only
  • Shows Relations as semantic associations
  • Notation can be lightweight; detailed line types or physical elements are not used

The purpose of the Abstract View is to reach an agreement on which elements of reality were abstracted as meanings and how the boundaries were cut.


Logical View

A Logical View displays the Canonical Relationship Graph as a "diagram for realizing the data structure (Structure)."

  • Determines Tables / Views according to the persistent / derived distinction
  • Columnizes and shows only those attributes deemed necessary in the Abstract Layer
  • Realizes Relations as FK structures and expands many-to-many relationships into bridge tables
  • ER diagram notation (Crow’s Foot, etc.) may be used

The purpose of the Logical View is to present the results of converting meanings into structures in a reviewable format.


Handling the Physical Layer (SQL as the Source of Truth, not Diagrams)

The Physical Layer is the stage where the Logical View is instantiated into specific DB entities (SQL / DDL). In principle, there are no diagrams for the Physical Layer.

The master copy of the Physical Layer is the SQL itself.

  • Index types
  • Partitioning methods
  • RLS / Role / Grant
  • Materialized View
  • Generated columns (STORED / VIRTUAL)
  • DB-specific designs such as storage/extensions

This is because DDL is more accurate than diagrams for these elements, and forcing them into a diagram would cause it to lose its value as a common language.


8.5 Next-Generation Modeling Process — From Upstream to Downstream, and Always Returning

In next-generation modeling, the fundamental approach is to proceed with design in the order of Abstract → Logical → Physical. This sequence is not merely a convention; it is a prerequisite for ensuring the consistency and change resilience required in modern data design.

Here, we organize the steps for proceeding with three-layer modeling.


8.5.1 The Grand Principle of the Process: Proceed from Upstream to Downstream

The grand principle that must be clarified first can be summarized in this one sentence:

Modeling must always proceed from upstream to downstream, in the order of Abstract → Logical → Physical.

Define "meaning" upstream, realize it into "structure" in the middle stream, and finalize it as "entity (SQL)" downstream.

If this order becomes ambiguous, the logical and physical layers become the de facto "design source," leaving the meaning (abstraction) behind. Many of the issues seen in Part 2 were the result of this order breaking down.


8.5.2 Abstract Layer: Organizing Meaning

In the Abstract Layer, actual business operations are structured as meaning (Concept) before being converted into data.

Things to be decided here are as follows:

  • Definition of Entities
  • Enumeration of Attributes (at the semantic level)
  • Necessity/Unnecessity of attributes
  • Persistent or Derived (whether it becomes a View)
  • Relationships between entities (Relation)
  • Semantics of business rules
  • Definition of Boundaries

It is important in the Abstract Layer to "not decide on data types or constraints." This is because this is the stage for defining "what to handle," not "how to handle it."


8.5.3 Logical Layer: Finalizing Structure

The Logical Layer is the stage where elements deemed "necessary" in the Abstract Layer are realized into a structure that can actually be handled as data.

The main tasks performed in the Logical Layer are as follows:

  • Columnization (Finalizing names)
  • Determination of logical types (TEXT/NUMBER/DATE, etc.)
  • Setting Primary keys and Foreign keys
  • Logical constraints such as NOT NULL/UNIQUE/CHECK
  • Structuring Tables/Views based on the persistent/derived decisions
  • Normalization/Denormalization
  • Judgment of abstract indexes (at the necessity level)

The important point here is as follows:

In the Logical Layer, the "meaning" decided in the Abstract Layer is directly reflected in the structure. Do not re-evaluate necessity in the Logical Layer.

By using the Abstract Layer as the "entry point," the consistency of the structure is maintained.


8.5.4 Physical Layer: Instantiation into SQL

The Physical Layer is the stage where the logical structure is implemented for a specific DB product as SQL (DDL).

The targets to be handled are as follows:

  • Index types (B-tree/GIN/GiST/BRIN, etc.)
  • Partitioning methods
  • Generated columns (STORED/VIRTUAL)
  • Materialized Views
  • Tablespaces/Storage parameters
  • RLS (Row-Level Security)
  • Role / Grant (Permissions)
  • Tenant isolation methods (RLS/Schema/Column)
  • Sharding
  • Extensions (PostGIS, etc.)

The Physical Layer is "not a layer where structure may be changed."

The Physical Layer is a place to specify the structure of the Logical Layer into SQL parameters, not a place to create a new structure.


8.5.5 Always Feedback Changes Found in Later Stages Upstream

Changes will occur. In fact, in modern systems, change is a given.

However, a design that only modifies the downstream layers when a change occurs and does not feedback to the upstream is a recipe for future collapse.

Problems found in later stages must always be fed back upstream. This is to maintain the consistency of Abstract → Logical → Physical.

It is important to note that the act of returning is not "rework" but a fundamental action to maintain consistency.


✨ 8.6 Handling DBMS-Specific Features — Separation of "Meaning" and "Implementation"

DBMSs have many implementation-dependent features such as Sequence, Identity, Partition, and RLS. Mixing these into the upstream of modeling significantly compromises the reusability and change resilience of the model.

CLM adopts the following principles:

1. All specific features are handled in the Physical Layer and are not brought into the Abstract or Logical layers.
The Logical Layer only handles "requirements as meaning," while the implementation method is delegated to each DBMS.

2. "Constraints as meaning" are recorded as Intent in the Logical Layer.
Primary key generation methods, continuity of numbering, presence or absence of branch numbers—these are organized as meaning, not implementation, and remain in the Logical Layer.

3. "Implementation quirks" are recorded as Intent in the Physical Layer.
Sequence gaps, Identity behavior, RLS performance impact, and other product-dependent properties are handled in the Physical Layer.

Note that specific guidelines for specific features are detailed in the "CLM Specification."


📘 Appendix A: CLM / ER / ANSI Three-Layer Comparison (Revised Edition: Correctly Understanding ANSI)

🆚 1. CLM / ER / ANSI — Comparison of Modeling Framework Positioning

Aspect CLM (Canonical Layered Modeling) ER Modeling ANSI/SPARC Three-Level Architecture
Positioning The "Practical Design Framework" central to this book Modeling method to represent data structures diagrammatically "Theoretical Architecture" for ensuring data independence (Not a modeling method)
Purpose Ensuring consistency and change resilience from Semantics → Structure → SQL Organizing and visualizing data structures Principle of maintaining data independence through External, Conceptual, and Internal levels
Defined Content Layer boundaries, responsibilities, change rules, view systems Notation (Chen notation, Crow's Foot) Layer concepts only (External/Conceptual/Internal)
Diagrams ✔ Abstract/Logical views + Canonical Relationship Graph ✔ ER Diagram Notation is not prescribed
Meaning of "Conceptual Level" Organizing meaning and judging necessity of attributes Often mixed with the Logical level in practice Cross-application common concepts (Abstract)
Meaning of "Logical Level" Structuring decisions from the Abstract Layer Structure-centric Intermediate between ANSI's Conceptual and Internal levels (Not clearly defined)
Meaning of "Physical Level" Implementation such as SQL, storage, and partitioning Sometimes treated as a derivative of ERD DB-specific internal structure
Applicable Domain Supports RDB, NoSQL, and Event-Driven RDB-centric General database conceptual model
Essence "Reconstructed Modeling Framework" for practical use Structural representation Architecture principle (Not modeling)

🆚 2. Comparison by Layer (Difference in "What to Handle")

(1) Comparison of Meaning (Abstract) Layer

Aspect CLM: Abstract Layer ER: Conceptual Model ANSI: Conceptual Schema
Handling of Attributes Enumeration of existence + Necessity judgment Mixed use of writing/not writing attributes depending on the site Does not handle attributes (Pure semantic layer)
Relationships Relation as meaning Same as ER Conceptual associations but no notation rules
Persistence/Derivation Determined here Often mixed No specific rules
Modeling Granularity Directly linked to practice Depends on the practice Highly abstract and difficult to apply to practice

👉 CLM clearly defines the "scope of decision-making in the Abstract Layer."
In ER, conceptual and logical aspects tend to be mixed, and ANSI is too abstract to be easily adopted in practice.

(2) Comparison of Structure (Logical) Layer

Aspect CLM: Logical Layer ER: Logical Layer ANSI: Mix of External/Conceptual
Columns Only elements deemed "necessary" in Abstract are realized Depends on the site No specific rules
Constraints Handles up to logical constraints Same No specific rules
View/Generated Col Structures persistence/derivation decisions Notation varies by site No specific rules

👉 CLM has strong consistency in that it "flows Abstract decisions directly into Structure."

(3) Comparison of Substance (Physical) Layer

Aspect CLM: Physical Layer ER: Physical ANSI: Internal Schema
Role SQL is the master copy, no diagrams Many sites force everything into diagrams Theoretical role of physical structure
Content Index / Partition / RLS / Grant Some elements may be included in diagrams Low-level implementation elements (but no notation rules)

👉 CLM is clear that "SQL is the master copy, not the diagram."
ER tends to cram everything into diagrams, leading to confusion.


🧩 Appendix B: CLM Internal Specifications (Layered Inclusion Matrix)

Layer CLM: Abstract Layer (Concept) CLM: Logical Layer (Structure) CLM: Physical Layer (SQL)
Target Meaning / Concept Data Structure SQL Instance
Entity ✔ Define ✔ Finalize as Table Generate with DDL
Attribute ✔ Enumerate existence only
✔ Judgment of necessity
✔ Finalize column names
✔ Logical types/constraints
Types, constraints, Storage definitions
Relation ✔ Semantic association ✔ Realize as FK/Bridge table Implementation of constraints (FK/Index)
Rule (Business Rule) ✔ Define at semantic level Partially reflected in structure Check, Trigger, RLS, etc.
Persistence/Derivation ✔ Decided in Abstract Layer ✔ Branch into Table / View Implementation of View / MView
View/Diagram Notation Abstract View Logical View (ER Diagram) No diagram (SQL is the master)
Corresponding ANSI Layer Close to ANSI Conceptual but clear for practice Intermediate between ANSI Conceptual and Internal ANSI Internal Layer
Correspondence in ER Modeling Equivalent to Conceptual Model Equivalent to Logical Model Equivalent to Physical Model

In the next chapter, we will organize how to preserve the "reasons for judgment = Intent" that supports the structure.
To be continued in the second part

Discussion