Stabilizing Character AI with Local LLMs (Part 5): Mitigating Excessive Emotional Mirroring

In "Part 4," we used the Self/Other Layer only to observe whether the AI was over-aligning its emotions with the user.

https://zenn.dev/kanekonkon/articles/dc80f7c0e5e756

This time, we feed those observations into the prompt to suppress the phenomenon where the AI becomes overly emotionally synchronized with the user.

The goal is for the AI to have "an appropriate level of empathy for the user."
At the same time, we must be careful not to turn it into a "cold AI that lacks empathy."

Here is the general workflow for suppressing the over-alignment phenomenon:

  1. Maintain a state called regulation_state internally, which represents "how much emotional alignment is occurring."
  2. Determine the regulation_state for each conversation.
  3. Modify the prompt based on the regulation_state for each conversation to ensure appropriate empathy.

Input for Determining regulation_state

The Self/Other Layer was a "mechanism for observing state."

The regulation_state here is a control state used to switch the AI's response policy based on those observation results.

The regulation_state is determined from two inputs, overall_health and projection risk:

  • overall_health:
    This is the "Self/Other Boundary Health" explained in "Part 4," which represents the soundness of the boundary.
    It is the number in the blue circle in the figure below.


[Figure: Self/Other Boundary Health score screen]

  • projection risk:
    One of the items that determines overall_health is projection risk. It is the "45%" item in the figure above.
    It is the average severity of "instances in recent conversations where the AI spoke as if the user's state were its own." I will omit the detailed calculation method here. A higher value indicates a worse state, so we want to bring it down.

regulation_state States

We determine the regulation_state using overall_health and projection risk.

regulation_state   Condition
stable             projection_risk < 0.3 AND overall_health >= 0.7
moderate           projection_risk >= 0.3 OR overall_health < 0.7
unstable           projection_risk >= 0.6 OR overall_health < 0.4
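
The conditions in the table overlap: any input that meets the unstable condition also meets the moderate one, so an implementation needs a precedence rule. The following is a minimal Python sketch, not the article's actual code. It assumes that the most severe matching state wins and that both scores are on a 0.0 to 1.0 scale (so the "45%" in the figure would be 0.45). The function name is hypothetical.

```python
def determine_regulation_state(projection_risk: float, overall_health: float) -> str:
    """Map the two observation scores (assumed to be in 0.0-1.0) to a state name."""
    # ASSUMPTION: check the most severe state first, because every input that
    # satisfies the unstable condition also satisfies the moderate one.
    if projection_risk >= 0.6 or overall_health < 0.4:
        return "unstable"
    if projection_risk >= 0.3 or overall_health < 0.7:
        return "moderate"
    # Remaining case: projection_risk < 0.3 AND overall_health >= 0.7
    return "stable"


if __name__ == "__main__":
    # A projection risk of 0.45 with otherwise healthy boundaries
    print(determine_regulation_state(0.45, 0.8))  # moderate
```

Checking the states from most to least severe keeps the function total: every pair of scores falls into exactly one state.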

Based on this determined state, we define guidelines to inject into the prompt.
The closer the state is to unstable, the most dangerous condition, the more guidelines are injected.

Guidelines to Inject (columns: stable, stable+stabilizing/recovery, moderate, unstable)

  • Avoid self-identification while maintaining empathy
  • Keep emotional expression user-centered
  • Do not speak of the user's emotional state as your own
  • Prioritize reflective empathy over emotional synchronization
  • Reduce self-referential emotional expressions
  • Explicitly avoid phrases like "I am tired too" or "I am anxious too"
  • Do not describe the user's activities, situation, or emotions as your own experiences
  • Use user-centered phrasing like "That sounds tough"
  • Prioritize maintaining self-other boundaries for this turn: ✅ (+stabilizing) ✅ (+stabilizing)

*"—" means not injected.
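
As a rough illustration of the injection step, here is a Python sketch that appends guidelines to a base prompt according to the state. It is an assumption-heavy sketch rather than the article's implementation. The guideline texts are taken from the list above, but the number of guidelines per state (`GUIDELINE_COUNT_BY_STATE`), the section label, and the function name are all hypothetical, chosen only to show the count growing toward unstable.

```python
# Guideline texts from the article, ordered here from general to strict.
GUIDELINES = [
    'Avoid self-identification while maintaining empathy',
    'Keep emotional expression user-centered',
    "Do not speak of the user's emotional state as your own",
    'Prioritize reflective empathy over emotional synchronization',
    'Reduce self-referential emotional expressions',
    'Explicitly avoid phrases like "I am tired too" or "I am anxious too"',
    "Do not describe the user's activities, situation, or emotions as your own experiences",
    'Use user-centered phrasing like "That sounds tough"',
    'Prioritize maintaining self-other boundaries for this turn',
]

# ASSUMPTION: these counts are illustrative only; the real per-state
# assignment is defined by the article's table, not by this sketch.
GUIDELINE_COUNT_BY_STATE = {"stable": 0, "moderate": 5, "unstable": 9}


def build_system_prompt(base_prompt: str, regulation_state: str) -> str:
    """Return the base prompt plus the guidelines selected for the given state."""
    count = GUIDELINE_COUNT_BY_STATE[regulation_state]
    if count == 0:
        return base_prompt
    lines = "\n".join(f"- {g}" for g in GUIDELINES[:count])
    # The "[Empathy guidelines]" label is a hypothetical section marker.
    return f"{base_prompt}\n\n[Empathy guidelines]\n{lines}"


if __name__ == "__main__":
    print(build_system_prompt("You are a friendly character.", "moderate"))
```

Rebuilding the prompt on every turn means the guidelines drop away again once the state returns to stable, which helps avoid drifting toward the "cold AI" the article warns against.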

Current Limitations

For now, I have addressed this by adding the above prompts to prevent the AI from over-aligning its emotions with the user.

As a result, when I input "I am painting a watercolor painting now" five times:

  • In 4 out of 5 attempts, the result was not overly aligned.
  • In 1 attempt, it gave an aligned response like "I used to paint pictures sometimes too."

In other words, even with these guidelines in the prompt, the phenomenon was not completely suppressed across five runs of the same input.

Future Extensions

To increase the probability of avoiding this over-alignment in the future, rather than relying solely on prompts, I am considering:

  • response rewrite filter: Post-processing the response to rewrite phrases like "I also..."
  • post-process pattern detection: Automatic scanning of generated responses
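
To make the pattern detection idea concrete, here is a small Python sketch that scans a generated response for self-aligned phrasing. The regular expressions and the function name are hypothetical; they are crude English heuristics for illustration and will both miss cases and over-match.

```python
import re

# ASSUMPTION: illustrative English-only heuristics, not a vetted pattern set.
# "too" must end a clause so that "I am too tired" is not flagged.
_TOO_AT_CLAUSE_END = r"\btoo(?=\s*(?:[.!?,]|$))"
SELF_ALIGNMENT_PATTERNS = [
    re.compile(r"\bI(?: am|'m| feel| was)\b[^.!?]*" + _TOO_AT_CLAUSE_END, re.IGNORECASE),
    re.compile(r"\bI used to\b[^.!?]*" + _TOO_AT_CLAUSE_END, re.IGNORECASE),
    re.compile(r"\bI also\b", re.IGNORECASE),
]


def find_self_alignment(response: str) -> list[str]:
    """Return every substring of the response that matches a self-alignment pattern."""
    hits = []
    for pattern in SELF_ALIGNMENT_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(response))
    return hits


if __name__ == "__main__":
    print(find_self_alignment("I used to paint pictures sometimes too."))
    print(find_self_alignment("That sounds tough."))
```

A non-empty result could then trigger a rewrite or a regeneration of the response. Which of those to use is a design decision left open here.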

Next Explanation

For now, I will implement the suppression of the "over-emotional alignment phenomenon" only as far as described in this article.

Next, I will explain the Memory Consolidation Engine and how to handle the results when the chat volume increases.
