Growing SuccuLM (2) — Model Compression with Tensor Networks Revisited

Purpose

This article applies the techniques from Exploring Transformers (7) — Model Compression and Anatomical Study of a GPT-2 Like Model using Tensor Networks to the project Growing a Succulent LM (1) — From Dataset Creation to Model Training.

In practice, tensorizing a layer increases its parameter count before any truncation is applied, so at this scale the tensorized model is not actually smaller than the original. The objective here is not to reduce size, but to observe how compression (or corruption) affects the model's behavior.

By the Way...

In Exploring Transformers (7), I only decomposed the FFN layers using tensor networks. I thought it might be interesting to target the input embedding layer as well, since it also contains a large matrix. However, I decided against it for the following reasons:

target = 10
input = torch.LongTensor([[target]])
model.eval()
with torch.no_grad():
    output = model.embed(input)
    print(torch.allclose(output, model.embed.weight[target, :]))

The PyTorch implementation of nn.Embedding is fairly involved if you trace through it, but as used here it is simply a lookup table: it treats the token index as a row number and returns the corresponding row of its weight matrix, which is what the check above confirms.

Compressing this layer would therefore perturb the vectors tied to individual token indices, potentially linking tokens to unrelated words and making it hard to tell what is actually happening.
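The lookup-table behavior can also be seen as a matrix product with one-hot vectors, which is why each row of the embedding matrix is tied to exactly one token. The following is a minimal sketch with toy sizes (the vocabulary size and dimension are assumptions for illustration, not SuccuLM's actual values):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, d_model = 8, 4  # toy sizes, not SuccuLM's
embed = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[3, 5, 3]])  # shape: (batch=1, seq=3)

with torch.no_grad():
    # Row lookup, as nn.Embedding does it.
    looked_up = embed(token_ids)  # (1, 3, d_model)
    # The same result via an explicit one-hot matrix product.
    one_hot = F.one_hot(token_ids, num_classes=vocab_size).float()  # (1, 3, vocab_size)
    via_matmul = one_hot @ embed.weight  # (1, 3, d_model)

same = torch.allclose(looked_up, via_matmul)
print(same)
```

Because row i is used only for token i, any approximation error introduced in the matrix lands directly on individual tokens' vectors rather than being spread across a computation.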

Changes to SuccuLM

The basic approach is the same as in Exploring Transformers (7), but since SuccuLM now implements multi-head attention, I repost the relevant code for reference.

Importing Modules, etc.

Hidden because there are no major changes
import copy
import json
import random
import os
import re
import string
import time
from contextlib import contextmanager
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
from enum import IntEnum
import matplotlib.pyplot as plt


SEED = 42
random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    print("Use GPU")
    torch.cuda.manual_seed_all(SEED)
else:
    print("Use CPU")

Dataset, Trainer, Utilities, etc.

Hidden because there are no major changes
with open("succulm_embeddings.json", "r") as fin:
    embeddings = json.load(fin)

with open("succulm_normal_sentence_pairs.json", "r") as fin:
    normal_sentence_pairs = json.load(fin)

with open("succulm_question_answer_pairs.json", "r") as fin:
    question_answer_pairs = json.load(fin)


def tokenize(sentence: str):
    for sign in [".", ",", "?", "!"]:
        sentence = sentence.replace(sign, f" {sign} ")
    tokens = re.split(r"\s+", sentence.strip())
    return tokens


class SentencePairDataset(Dataset):
    def __init__(
        self,
        data_pairs: list[tuple[str, str]],
        word2id: dict[str, int],
        max_len: int = 29,
    ):
        self.data_pairs = data_pairs
        self.word2id = word2id
        self.max_len = max_len
        self.max_ids_len = 0  # for debug

    def __len__(self):
        return len(self.data_pairs)

    def __getitem__(self, idx):
        s1, s2 = self.data_pairs[idx]

        # Do not use any fancy tricks, just connect them as natural English
        combined_text = s1 + " " + s2
        tokens = tokenize(combined_text.lower())

        # [BOS] + concatenated tokens + [EOS]
        ids = [self.word2id['<BOS>']] + \
              [self.word2id.get(t, self.word2id.get('<UNK>', 0)) for t in tokens] + \
              [self.word2id['<EOS>']]

        if len(ids) > self.max_ids_len:
            self.max_ids_len = len(ids)

        # Shift input (x) and label (y) by one
        x = ids[:-1]
        y = ids[1:]

        # Padding
        pad_id = self.word2id['<PAD>']
        if len(x) < self.max_len:
            x = x + [pad_id] * (self.max_len - len(x))
            y = y + [pad_id] * (self.max_len - len(y))
        else:
            x = x[:self.max_len]
            y = y[:self.max_len]

        return torch.tensor(x, dtype=torch.long), torch.tensor(y, dtype=torch.long)


def make_word2id(embeddings):
    keys = sorted(embeddings)
    special_tokens = ["<PAD>", "<BOS>", "<EOS>", "<UNK>"]
    for k in special_tokens:
        keys.remove(k)

    return {k: i for i, k in enumerate(special_tokens + keys)}

word2id = make_word2id(embeddings)
id2word = {v: k for k, v in word2id.items()}


# ----- Training and Evaluation -----
def train(model, loader, optimizer, criterion, device):
    model.train()
    total_loss = 0
    for x, y in loader:
        x = x.to(device)
        y = y.to(device)
        optimizer.zero_grad()
        logits = model(x)
        loss = criterion(logits.view(-1, VOCAB_SIZE), y.view(-1))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

def evaluate(model, loader, criterion, device):
    model.eval()
    total_loss = 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device)
            y = y.to(device)
            logits = model(x)
            loss = criterion(logits.view(-1, VOCAB_SIZE), y.view(-1))
            total_loss += loss.item()
    return total_loss / len(loader)


def save_checkpoint(model, word2id, profile_name, settings_dict, filepath="succulm_checkpoint.pt"):
    """
    A function that saves not only the model weights (state_dict), but also the settings and vocabulary necessary for restoration.
    """
    checkpoint = {
        "state_dict": model.state_dict(),
        "profile_name": profile_name,
        "model_config": settings_dict[profile_name],
        "d_model": model.d_model,
        "max_len": model.max_len,
        "word2id": word2id
    }

    torch.save(checkpoint, filepath)
    print(f"✅ Model saved: {filepath} (Profile: {profile_name})")


def load_checkpoint(filepath, device="cpu"):
    """
    A function that completely restores the model and vocabulary from a saved checkpoint.
    """
    if not os.path.exists(filepath):
        raise FileNotFoundError(f"File not found: {filepath}")

    checkpoint = torch.load(filepath, map_location=device, weights_only=False)

    word2id_loaded = checkpoint["word2id"]
    config = checkpoint["model_config"]
    vocab_size = len(word2id_loaded)

    print(f"📂 Checkpoint loaded (Profile: {checkpoint['profile_name']})")

    model_loaded = SuccuLM(
        vocab_size=vocab_size,
        d_model=checkpoint["d_model"],
        max_len=checkpoint["max_len"],
        pad_idx=word2id_loaded["<PAD>"],
        bos_idx=word2id_loaded["<BOS>"],
        eos_idx=word2id_loaded["<EOS>"],
        embed_weights=None,
        n_layers=config["n_layers"],
        ffn_expansion=config["ffn_expansion"],
        n_attn_heads=config["n_attn_heads"]
    )

    model_loaded.load_state_dict(checkpoint["state_dict"])
    model_loaded = model_loaded.to(device)
    model_loaded.eval()

    print("✅ Model restoration complete!")
    return model_loaded, word2id_loaded


def sample_generate(
    prompt,
    model,
    word2id,
    id2word,
    max_seq_len,
    temperature=1.0,
    save_probs=False
):
    if isinstance(prompt, str):
        prompt_tokens = tokenize(prompt.lower())
    else:
        prompt_tokens = prompt

    BOS = word2id["<BOS>"]
    EOS = word2id["<EOS>"]
    PAD = word2id["<PAD>"]

    start_tokens = [BOS] + [word2id.get(tok, word2id["<UNK>"]) for tok in prompt_tokens]
    start_tokens = torch.tensor(start_tokens, dtype=torch.long)

    generated_result = model.generate(
        start_tokens,
        eos_idx=EOS,
        pad_idx=PAD,
        max_gen=max_seq_len,
        temperature=temperature,
        save_probs=save_probs
    )
    if save_probs:
        out_token_ids, probs_list = generated_result
    else:
        out_token_ids = generated_result
        probs_list = []

    result = []
    for tid in out_token_ids[1:]:
        if tid == EOS:
            break
        if tid == PAD:
            continue
        result.append(id2word.get(tid, "<UNK>"))

    if save_probs:
        return result[len(prompt_tokens):], probs_list
    return result[len(prompt_tokens):]

def talk(prompt, model, temperature, save_probs=False):
    generated_result = sample_generate(
        prompt, model, word2id, id2word, max_seq_len,
        temperature=temperature, save_probs=save_probs
    )
    probs_list = []
    if save_probs:
        out_seq, probs_list = generated_result
    else:
        out_seq = generated_result

    result = " ".join(out_seq)
    for sign in [".", ",", "?", "!"]:
        result = result.replace(f" {sign}", sign)
    print(f"{prompt=} {result=}")
    return probs_list

Model Changes

Hidden because there are no changes to position encoding either
from contextlib import contextmanager

# ----- Position Encoding -----
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        if d_model > 1:
            pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)
    def forward(self, x):
        return x + self.pe[:x.size(1)]
# Block class combining Attention and FFN into one layer
class Block(nn.Module):
    def __init__(self, d_model, ffn_expansion=1, n_attn_heads=1, verbose=False):
        super().__init__()
        assert d_model % n_attn_heads == 0, (
            f"d_model ({d_model}) must be divisible by n_attn_heads ({n_attn_heads})"
        )
        self.n_attn_heads = n_attn_heads
        self.d_head = d_model // n_attn_heads

        self.ln1 = nn.LayerNorm(d_model)
        self.q_linears = nn.ModuleList([nn.Linear(d_model, self.d_head, bias=True) for _ in range(n_attn_heads)])
        self.k_linears = nn.ModuleList([nn.Linear(d_model, self.d_head, bias=True) for _ in range(n_attn_heads)])
        self.v_linears = nn.ModuleList([nn.Linear(d_model, self.d_head, bias=True) for _ in range(n_attn_heads)])
        self.attn_out = nn.Linear(d_model, d_model, bias=True)

        self.ln2 = nn.LayerNorm(d_model)
        hidden_dim = d_model * ffn_expansion
        self.ffn = nn.Sequential(
            nn.Linear(d_model, hidden_dim, bias=True),
            nn.ReLU(),
            nn.Linear(hidden_dim, d_model, bias=True)
        )
        self.d_model = d_model
        self.set_verbose(verbose)

    def set_verbose(self, verbose):
        self.verbose = verbose

    def forward(self, x, mask=None, return_attn=False, return_cossim=False):
        # Pre-LN structure: LN -> Multi-Head Attention -> Add
        attn_in = self.ln1(x)

        head_outputs = []
        head_attns = []
        for q_lin, k_lin, v_lin in zip(self.q_linears, self.k_linears, self.v_linears):
            Q = q_lin(attn_in)
            K = k_lin(attn_in)
            V = v_lin(attn_in)

            d_k = Q.size(-1)
            scores = torch.matmul(Q, K.transpose(-2, -1)) / np.sqrt(d_k)
            if mask is not None:
                scores = scores.masked_fill(mask, float('-inf'))

            attn = torch.softmax(scores, dim=-1)
            head_outputs.append(torch.matmul(attn, V))
            head_attns.append(attn)

        concat = torch.cat(head_outputs, dim=-1)
        attn_out = self.attn_out(concat)
        x1 = x + attn_out

        # Pre-LN -> FFN -> Add
        ffn_in = self.ln2(x1)
        ffn_out = self.ffn(ffn_in)

        x1_ = x1.detach().cpu()
        ffn_out_ = ffn_out.detach().cpu()
        norm_x = torch.linalg.vector_norm(x1_, ord=2, dim=-1)
        norm_ffn = torch.linalg.vector_norm(ffn_out_, ord=2, dim=-1)
        sim = F.cosine_similarity(x1_, x1_ + ffn_out_, dim=-1)

        if self.verbose:
            print(
                f"[Layer DEBUG] "
                f"L2(x): {norm_x.mean().item():.4f}, "
                f"L2(ffn): {norm_ffn.mean().item():.4f} | "
                f"CosSim: {sim.mean().item():.4f}"
            )

        x2 = x1 + ffn_out

        if return_attn:
            if return_cossim:
                return x2, attn, sim.numpy().mean().item()
            return x2, attn
        if return_cossim:
            return x2, sim.numpy().mean().item()
        return x2


# Main model
class SuccuLM(nn.Module):
    def __init__(self, vocab_size, d_model=32, max_len=16,
                 pad_idx=0, bos_idx=1, eos_idx=2, embed_weights=None,
                 n_layers=1, ffn_expansion=1, n_attn_heads=1, verbose=False):
        super().__init__()
        self.d_model = d_model
        self.cossim_list = []

        if embed_weights is not None:
            if isinstance(embed_weights, torch.Tensor):
                embed_weights = embed_weights.clone()
            elif isinstance(embed_weights, np.ndarray):
                embed_weights = embed_weights.copy()
            self.embed = nn.Embedding.from_pretrained(
                torch.tensor(embed_weights, dtype=torch.float32),
                freeze=False,
            )
        else:
            self.embed = nn.Embedding(vocab_size, d_model)

        self.pos_enc = PositionalEncoding(d_model, max_len)

        self.blocks = nn.ModuleList([
            Block(d_model, ffn_expansion=ffn_expansion, n_attn_heads=n_attn_heads)
            for _ in range(n_layers)
        ])

        self.ln_f = nn.LayerNorm(d_model)

        self.max_len = max_len
        self.attn_weights = None
        self.pad_idx = pad_idx
        self.bos_idx = bos_idx
        self.eos_idx = eos_idx

        self.set_verbose(verbose)

    def set_verbose(self, verbose):
        self.verbose = verbose
        for block in self.blocks:
            block.set_verbose(verbose)

    @contextmanager
    def verbose_mode(self, verbose: bool = True):
        """
        Context manager to temporarily change the verbose flag
        """
        old_verbose = self.verbose
        self.set_verbose(verbose)
        try:
            yield
        finally:
            self.set_verbose(old_verbose)

    def forward(self, x, return_attn=False):
        x_seq = self.embed(x)
        x_seq = self.pos_enc(x_seq)

        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool().to(x.device)

        attns = []
        for block in self.blocks:
            if return_attn:
                x_seq, attn = block(x_seq, mask=mask, return_attn=True)
                attns.append(attn)
            else:
                if self.verbose:
                    x_seq, cossim = block(x_seq, mask=mask, return_cossim=True)
                    self.cossim_list.append(cossim)
                else:
                    x_seq = block(x_seq, mask=mask)

        x_seq = self.ln_f(x_seq)

        logits = torch.matmul(x_seq, self.embed.weight.t())

        if return_attn:
            self.attn_weights = [a.detach().cpu().numpy() for a in attns[-1]]
            return logits, attns
        return logits

    def generate(self, start_tokens, eos_idx, pad_idx, max_gen=None, temperature=1.0, save_probs=False):
        if isinstance(start_tokens, torch.Tensor):
            tokens = start_tokens.tolist()
        else:
            tokens = list(start_tokens)
        eos_idx = eos_idx if eos_idx is not None else self.eos_idx
        pad_idx = pad_idx if pad_idx is not None else self.pad_idx

        self.eval()
        max_gen = max_gen or self.max_len
        probs_list = []
        for _ in range(max_gen - len(tokens)):
            inp = torch.tensor(tokens, dtype=torch.long).unsqueeze(0).to(next(self.parameters()).device)
            logits = self.forward(inp)
            next_token_logits = logits[0, len(tokens)-1] / temperature
            probs = torch.softmax(next_token_logits, dim=-1)
            if save_probs:
                probs_list.append(probs.cpu().detach().numpy().tolist())
            next_token = torch.multinomial(probs, num_samples=1).item()
            if next_token == eos_idx:
                break
            tokens.append(next_token)

        while len(tokens) < self.max_len:
            tokens.append(pad_idx)
        if save_probs:
            return tokens, probs_list
        return tokens

Loading the Trained Model

I will load a pre-trained model. Because I expanded the dataset, training now takes about 80 seconds, up from about 60 seconds before. That is still short, but to avoid waiting on every run I use the saved checkpoint.

load_profile = "upper_middle"
model, word2id = load_checkpoint(f"succulm_{load_profile}.pt")
id2word = {v: k for k, v in word2id.items()}
print(f"{len(word2id)=}")
assert len(id2word) == len(word2id)

Utilities for Tensor Network Conversion and Generation

Hidden because there are no major changes
from ttz.tt import TTLayer


def tensorized_model(
    tgt_blk_indices: list[int], mid_bond_dims: list[int], verbose: bool=False, profile="upper_middle"
):
    model_tt, _ = load_checkpoint(f"succulm_{profile}.pt")
    for tgt_blk_idx, mid_bond_dim in zip(tgt_blk_indices, mid_bond_dims):
        block = model_tt.blocks[tgt_blk_idx]
        if verbose:
            print(block.ffn[0].weight.shape, block.ffn[2].weight.shape)

        # Inputs like [1, 6, 50] or [1, 8, 50] come in.
        # [0]: Batch size
        # [1]: Sequence length
        # [2]: Hidden dimension

        # 200x50 -> (5x10)x(10x20), bond_dims=[10, mid_bond_dim, 10]
        tt_layer0 = TTLayer.from_linear_layer(
            [5, 10], [10, 20], block.ffn[0],
            bond_dims=[10, mid_bond_dim, 10],
        )
        if verbose:
            print("Ws:", [W.shape for W in tt_layer0.tt_W])

        # 50x200 -> (10x20)x(5x10), bond_dims=[5, mid_bond_dim, 20]
        tt_layer2 = TTLayer.from_linear_layer(
            [10, 20], [5, 10], block.ffn[2],
            bond_dims=[5, mid_bond_dim, 20],
        )
        if verbose:
            print("Ws:", [W.shape for W in tt_layer2.tt_W])

        blocks_ffn = nn.Sequential(
            tt_layer0,
            block.ffn[1],  # ReLU
            tt_layer2,
        )

        block.ffn = blocks_ffn

    if verbose:
        print(model_tt)

    return model_tt

def sample_generate(
    prompt,
    model,
    word2id,
    id2word,
    max_seq_len,
    temperature=1.0,
    save_probs=False
):
    if isinstance(prompt, str):
        prompt_tokens = tokenize(prompt.lower())
    else:
        prompt_tokens = prompt

    BOS = word2id["<BOS>"]
    EOS = word2id["<EOS>"]
    PAD = word2id["<PAD>"]

    start_tokens = [BOS] + [word2id.get(tok, word2id["<UNK>"]) for tok in prompt_tokens]
    start_tokens = torch.tensor(start_tokens, dtype=torch.long)

    generated_result = model.generate(
        start_tokens,
        eos_idx=EOS,
        pad_idx=PAD,
        max_gen=max_seq_len,
        temperature=temperature,
        save_probs=save_probs
    )
    if save_probs:
        out_token_ids, probs_list = generated_result
    else:
        out_token_ids = generated_result
        probs_list = []

    result = []
    for tid in out_token_ids[1:]:  # [1:] excludes BOS
        if tid == EOS:
            break
        if tid == PAD:
            continue
        result.append(id2word.get(tid, "<UNK>"))

    if save_probs:
        return result[len(prompt_tokens):], probs_list
    return result[len(prompt_tokens):]

def talk(prompt, model, temperature, save_probs=False):
    generated_result = sample_generate(
        prompt, model, word2id, id2word, max_seq_len,
        temperature=temperature, save_probs=save_probs
    )
    probs_list = []
    if save_probs:
        out_seq, probs_list = generated_result
    else:
        out_seq = generated_result

    result = " ".join(out_seq)
    for sign in [".", ",", "?", "!"]:
        result = result.replace(f" {sign}", sign)
    print(f"{prompt=} {result=}")
    return probs_list

Utility to View Generation Examples

def model_example(model, temperature):
    # Questions in the training data
    talk("ruby is a boy succulent plant.", model, temperature)  # winy is a girl succulent plant.
    talk("can you bloom, winy?", model, temperature)  # sometimes i bloom white flowers.
    talk("does winy sleep on the window?", model, temperature)  # no, she sleeps with ruby in the pot.
    talk("how many days do you grow in this pot?", model, temperature)  # i grow for many days with my friend.
    talk("winy feels the soft morning air.", model, temperature)  # she is calm and looks at the window.
    print()
    # Questions not in the training data
    talk("who is winy?", model, temperature)
    talk("does ruby know three winters?", model, temperature)
    talk("why is ruby red today?", model, temperature)
    talk("how calm is winy?", model, temperature)
    talk("ruby and winy are friends.", model, temperature)
    talk("is ruby a boy?", model, temperature)
    talk("is winy a boy?", model, temperature)
    talk("is ruby a girl?", model, temperature)
    talk("is winy a girl?", model, temperature);

Generation Experiments

Comparison with Previous Results

I will try the same experiment as in Exploring Transformers (7). For comparison, I tested two patterns:

  1. Compression is loosened progressively from the first FFN layer to the last (the middle bond dimension increases with depth).
  2. Compression is strengthened progressively from the first FFN layer to the last (the middle bond dimension decreases with depth).

I changed the compression parameters slightly, because both showed severe content corruption when using the same parameters as before.

temperature = 0.6

random.seed(SEED)
torch.manual_seed(SEED)

model_example(tensorized_model([0, 1, 2, 3], [39, 42, 46, 47]), temperature)
print("-"*50)
model_example(tensorized_model([0, 1, 2, 3], [47, 46, 42, 39]), temperature)

(Example outputs omitted for brevity)

Honestly, the two look about the same. Since I built the dataset as carefully as I could, I expected the differences to be easier to interpret than with MiniGPT2, but that does not seem to be so.

This may simply be how these models behave, or a difference might appear with more aggressive compression on a larger model. At this small scale, it is hard to draw any clear conclusions.
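One caveat when comparing the two runs: the seed is set once before both calls, so the second run starts from whatever RNG state the first run left behind. If the goal is for both configurations to see the same random stream, one option is to reseed immediately before each run. The sketch below illustrates the difference with a toy distribution standing in for the model's next-token probabilities (sample_tokens is a hypothetical helper, not part of SuccuLM):

```python
import torch

SEED = 42
probs = torch.tensor([0.1, 0.2, 0.3, 0.4])  # toy next-token distribution

def sample_tokens(n):
    # Draw n token ids one at a time, as generate() does with torch.multinomial.
    return [torch.multinomial(probs, num_samples=1).item() for _ in range(n)]

# Seed once, then two runs: the second run continues from the advanced RNG state.
torch.manual_seed(SEED)
run_a = sample_tokens(20)
run_b = sample_tokens(20)

# Reseed before each run: both runs consume the same random stream.
torch.manual_seed(SEED)
run_c = sample_tokens(20)
torch.manual_seed(SEED)
run_d = sample_tokens(20)

print(run_a == run_c, run_c == run_d)
```

Note that constructing a model also draws from the RNG when layers are initialized, so in the experiments here the reseed would probably need to come after tensorized_model(...) returns and before model_example(...) is called.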

Corrupting FFN Layers Sequentially

Since corrupting all FFN layers at once might not be a good idea, I will try corrupting only one of the four FFN layers at a time.

temperature = 0.6

random.seed(SEED)
torch.manual_seed(SEED)

model_example(tensorized_model([0], [27]), temperature)
print("-"*50)
model_example(tensorized_model([1], [27]), temperature)
print("-"*50)
model_example(tensorized_model([2], [27]), temperature)
print("-"*50)
model_example(tensorized_model([3], [27]), temperature)

(Example outputs omitted for brevity)

Corrupting layer 0 seems to break sentence structure significantly, while corrupting later layers preserves the appearance of a sentence comparatively well. The answers to the gender questions also seem reasonably consistent with the characters.

There was, however, a strange phenomenon that I will call the "sophia trap", which occurred once when layer 1 was corrupted. sophia is a hidden token: I prepared an embedding for it, but it never appeared in the training data, so it is effectively an unknown word.

sophia also appeared occasionally when layer 3 was corrupted, and even the special token <BOS> showed up. This is not a desirable state.
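To make "once" and "occasionally" more concrete, the occurrences could be counted rather than spotted by eye. The following is a minimal sketch, assuming the generations are available as lists of word strings like those returned by sample_generate(); count_watched_tokens and the sample data are hypothetical and only illustrate the bookkeeping:

```python
from collections import Counter

def count_watched_tokens(generations, watched):
    """Count how often each watched token appears across generated token lists."""
    counts = Counter()
    for tokens in generations:
        for tok in tokens:
            if tok in watched:
                counts[tok] += 1
    # Report every watched token, including those that never appeared.
    return {tok: counts[tok] for tok in watched}

# Hypothetical generations, shaped like the word lists sample_generate() returns.
generations = [
    ["she", "is", "calm", "."],
    ["sophia", "sophia", "is", "a", "plant", "."],
    ["<BOS>", "winy", "is", "sophia", "."],
]
watched = ["sophia", "<BOS>"]
trap_counts = count_watched_tokens(generations, watched)
print(trap_counts)
```

Running the same prompts several times per configuration and comparing these counts would give a rough frequency for the "sophia trap" per corrupted layer.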

Corrupting FFN Layers Even More Severely

Let's increase the compression intensity.

temperature = 0.6

random.seed(SEED)
torch.manual_seed(SEED)

model_example(tensorized_model([0], [20]), temperature)
print("-"*50)
model_example(tensorized_model([1], [20]), temperature)
print("-"*50)
model_example(tensorized_model([2], [20]), temperature)
print("-"*50)
model_example(tensorized_model([3], [20]), temperature)

(Example outputs omitted for brevity)

The "sophia trap" occurred quite frequently during the 3rd layer corruption.

Corrupting the 0th layer still results in bizarre sentences. Corruption of the 1st and 2nd layers is ambiguous, but both seem to produce reasonably natural-looking text.

It may be better not to corrupt the first and last layers too heavily. On the other hand, corrupting the inner layers may cause more hallucinations (I see several inconsistencies regarding characters and gender), so output that looks fine is not necessarily good.

There is also the possibility that it is simply over-corrupted.
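One rough way to gauge whether a given bond dimension is "too much" is to look at how approximation error grows as the allowed rank shrinks. The sketch below is only an analogy: it uses plain truncated SVD on a random 200x50 matrix, not ttz's TTLayer and not a trained SuccuLM weight, so the numbers are illustrative and the real tensor-train truncation may behave differently.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((200, 50))  # stand-in for a 200x50 FFN weight, not a trained one

U, S, Vt = np.linalg.svd(W, full_matrices=False)

def relative_error(rank):
    # Keep the largest `rank` singular values and measure what is lost.
    W_r = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    return np.linalg.norm(W - W_r) / np.linalg.norm(W)

# Ranks chosen to mirror the middle bond dimensions used in the experiments.
errors = {r: relative_error(r) for r in (50, 47, 39, 30, 27, 20)}
for r, e in errors.items():
    print(f"rank={r:2d}  relative error={e:.3f}")
```

Applying the same measurement to the actual trained weights and their tensorized reconstructions would show how far each setting departs from the original layer, which could help separate "over-corrupted" from "interestingly corrupted".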

Corrupting Two Consecutive FFN Layers

I will try corrupting the first two layers, the last two layers, and the middle two layers.

temperature = 0.6

random.seed(SEED)
torch.manual_seed(SEED)

model_example(tensorized_model([0, 1], [30, 30]), temperature)
print("-"*50)
model_example(tensorized_model([2, 3], [30, 30]), temperature)
print("-"*50)
model_example(tensorized_model([1, 2], [30, 30]), temperature)

(Example outputs omitted for brevity)

If anything, corrupting the first two layers seems to have a larger impact than corrupting the last two. The "sophia trap" occurred once when the middle two layers were corrupted.

Discussion

This is only a rough overview, but the observations are as follows:

  • Corrupting the first layer (layer 0) tends to break the structure of the sentences themselves.
  • Corrupting the last layer (layer 3) tends to pull in strange vocabulary, which suggests confusion in word selection.
  • Severe corruption seems to trigger the "sophia trap", where the output falls into the unknown word sophia, and this appears more likely in later layers.
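These observations are judged by eye. One way to put a number on them, sketched here under assumptions, is to compare the next-token distributions of the original and a tensorized model with KL divergence. The per-step probabilities returned by talk(..., save_probs=True) could serve as input, though only the first generated step shares exactly the same context across models, since later steps depend on the sampled tokens. The distributions below are made-up toy values:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

# Toy distributions over a 4-word vocabulary (hypothetical values).
p_original = [0.70, 0.20, 0.05, 0.05]
p_mild = [0.60, 0.25, 0.10, 0.05]    # slightly shifted
p_severe = [0.10, 0.10, 0.10, 0.70]  # mass moved to a different word

kl_mild = kl_divergence(p_original, p_mild)
kl_severe = kl_divergence(p_original, p_severe)
print(f"{kl_mild:.4f} {kl_severe:.4f}")
```

A larger value means the corrupted model's distribution has moved further from the original at that step, which gives a per-layer comparison that does not depend on reading the sampled text.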

AI Analysis

I asked Gemini 3 Flash to summarize the observations, and it provided the following analysis:

SuccuLM v1.0 Internal Functional Map

  • 1. FFN 0: Frontal Lobe ── "Spine of Logic"

    • Role: Construction of syntax. Maintains the order of words and the "form" of sentences.
    • Behavior when corrupted: Complete agrammatism. "She is red." becomes "red she is..." or "she red window," breaking into fragments. It cannot close sentences, and the framework of thought collapses under its own weight.
    • Significance of protection: As long as this is alive, the output maintains the appearance of a "sentence" even if the content is insane.
  • 2. FFN 1: Temporal Lobe ── "Echo of Memory"

    • Role: Recall of context and maintenance of short-term memory. Controls connections (collocations) with previous words.
    • Behavior when corrupted: Repetition and absurdity. Meaningless loops (can can can) or abnormal repetition of specific vocabulary (like Winy). The Sophia loop is also accelerated by this "bug in recall," getting stuck in past patterns.
    • Significance of protection: Maintains the tempo and "plausibility" of the conversation.
  • 3. FFN 2: Association Cortex ── "Crucible of Meaning"

    • Role: Integration of semantics (meaning/attributes). Handles attribute linking like "Ruby = Red = Plant," governing characterization and "truths of the SuccuLM universe."
    • Behavior when corrupted: Inversion and blurring of attributes. Begins to spit out "poetic lies" that are factually incorrect but grammatically correct, such as "Ruby is a girl" or "Winy is angry."
    • Recommendation for surgery: Corrupting this is the "least bad" and most interesting. Because the grammar (FFN 0) is correct, you can obtain deep outputs like those of an "intelligent madman."
  • 4. FFN 3: Broca's Area ── "Gateway of Speech"

    • Role: Final vocabulary selection and adjustment of output smoothness.
    • Behavior when corrupted: Articulation disorder and terminal collapse. Meaning (will) is formed internally, but immediately before output, it gets sucked into the singularity of "sophia" or <BOS> leaks out.
    • Significance of protection: Ensures vocabulary resolution (154-word selection accuracy) and prevents the intrusion of unintended symbols.

Summary

  • Because analysis becomes difficult when the vocabulary is too large, I built a dataset with only 154 words and trained SuccuLM at a minimal scale, but it did not yield the analysis results I had hoped for.
  • On the other hand, the patterns in how the model breaks may have become slightly easier to see.
  • Depending on how the model is corrupted, there seem to be hallucination-like failures where a character's gender is swapped while the output still looks like a sentence. Compressing a model crudely on the sole basis that the output "looks okay" therefore seems dangerous.
  • Also, since this is rather like surgery and would not make for pleasant pictures, I did not create illustrations this time.