Translated by AI
Stabilizing Character AI with Local LLMs (Part 5): Mitigating Excessive Emotional Mirroring
In "Part 4," we used the Self/Other Layer only to observe whether the AI was over-aligning its emotions with the user.
From here on, we feed those observations back into the prompt to suppress the phenomenon in which the AI becomes overly emotionally synchronized with the user.
Our goal is for the AI to have "an appropriate level of empathy for the user."
At the same time, we must be careful not to end up with a "cold AI that lacks empathy."
Here is the general workflow for suppressing the over-alignment phenomenon:
- Maintain an internal state called `regulation_state`, which represents "how much emotional alignment is occurring."
- Determine the `regulation_state` for each conversation.
- Modify the prompt based on the `regulation_state` for each conversation to ensure appropriate empathy.
Input for Determining regulation_state
The Self/Other Layer was a "mechanism for observing state."
The regulation_state here is a control state used to switch the AI's response policy based on those observation results.
The regulation_state is determined by inputting the following overall_health and projection risk:
- overall_health:
This is the "Self/Other Boundary Health" explained in "Part 4," a score for the soundness of the self/other boundary.
It is the number in the blue circle in the figure below.

Self/Other Boundary Health score screen
- projection risk:
One of the items that determines `overall_health` is `projection risk`. It is the "45%" item in the figure above.
It represents the average severity of "instances in recent conversations where the AI spoke as if the user's state were its own." I will omit the detailed calculation method here, but a higher value indicates a worse state, so the goal is to keep it low.
regulation_state States
We determine the regulation_state using overall_health and projection risk.
| regulation_state | Condition |
|---|---|
| stable | projection_risk < 0.3 AND overall_health >= 0.7 |
| moderate | projection_risk >= 0.3 OR overall_health < 0.7 |
| unstable | projection_risk >= 0.6 OR overall_health < 0.4 |
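The threshold table can be sketched as a small Python function. This is only a minimal sketch, not the actual implementation: the function name `determine_regulation_state` is hypothetical, both inputs are assumed to be on a 0 to 1 scale (so the "45%" in the figure would be 0.45), and `unstable` is assumed to take precedence because its condition overlaps with `moderate`.

```python
def determine_regulation_state(projection_risk: float, overall_health: float) -> str:
    """Map the two observed scores to a regulation_state label.

    Assumption: both scores are in the range 0.0 to 1.0.
    Assumption: the most severe matching state wins, so "unstable"
    is checked before "moderate".
    """
    if projection_risk >= 0.6 or overall_health < 0.4:
        return "unstable"
    if projection_risk >= 0.3 or overall_health < 0.7:
        return "moderate"
    # Remaining case: projection_risk < 0.3 AND overall_health >= 0.7
    return "stable"


if __name__ == "__main__":
    # Hypothetical values: projection risk of 45% with a healthy boundary score
    print(determine_regulation_state(0.45, 0.8))
```

Checking the most severe state first keeps the three conditions mutually exclusive without rewriting them.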
Based on this determined state, we define guidelines to inject into the prompt.
As the state approaches unstable, which is a dangerous condition, the number of guidelines injected increases.
| Guidelines to Inject | stable | stable + stabilizing/recovery | moderate | unstable |
|---|---|---|---|---|
| Avoid self-identification while maintaining empathy | — | ✅ | ✅ | ✅ |
| Keep emotional expression user-centered | — | ✅ | ✅ | ✅ |
| Do not speak of the user's emotional state as your own | — | ✅ | ✅ | ✅ |
| Prioritize reflective empathy over emotional synchronization | — | ✅ | ✅ | ✅ |
| Reduce self-referential emotional expressions | — | — | ✅ | ✅ |
| Explicitly avoid phrases like "I am tired too" or "I am anxious too" | — | — | — | ✅ |
| Do not describe the user's activities, situation, or emotions as your own experiences | — | — | — | ✅ |
| Use user-centered phrasing like "That sounds tough" | — | — | — | ✅ |
| Prioritize maintaining self-other boundaries for this turn | — | ✅ | ✅ (+stabilizing) | ✅ (+stabilizing) |
*"—" means not injected.
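One way to express the guideline table in code is a function that accumulates guidelines as the state worsens. The following is a rough sketch under my own assumptions: the names `select_guidelines` and `stabilizing` are hypothetical, and I read "(+stabilizing)" in the last row as "injected only while a stabilizing/recovery phase is active."

```python
from typing import List

BASE_GUIDELINES = [
    "Avoid self-identification while maintaining empathy.",
    "Keep emotional expression user-centered.",
    "Do not speak of the user's emotional state as your own.",
    "Prioritize reflective empathy over emotional synchronization.",
]
MODERATE_EXTRA = [
    "Reduce self-referential emotional expressions.",
]
UNSTABLE_EXTRA = [
    'Explicitly avoid phrases like "I am tired too" or "I am anxious too".',
    "Do not describe the user's activities, situation, or emotions as your own experiences.",
    'Use user-centered phrasing like "That sounds tough".',
]
BOUNDARY_PRIORITY = "Prioritize maintaining self-other boundaries for this turn."


def select_guidelines(state: str, stabilizing: bool = False) -> List[str]:
    """Return the guidelines to inject for a regulation_state.

    Assumption: `stabilizing` marks the stabilizing/recovery phase
    from the table; how that phase is detected is not covered here.
    """
    if state not in ("stable", "moderate", "unstable"):
        raise ValueError(f"unknown regulation_state: {state}")
    if state == "stable" and not stabilizing:
        return []  # plain stable: nothing is injected
    guidelines = list(BASE_GUIDELINES)
    if state in ("moderate", "unstable"):
        guidelines += MODERATE_EXTRA
    if state == "unstable":
        guidelines += UNSTABLE_EXTRA
    if stabilizing:
        guidelines.append(BOUNDARY_PRIORITY)
    return guidelines


if __name__ == "__main__":
    # Format the selected guidelines as a bullet block for the system prompt
    print("\n".join(f"- {g}" for g in select_guidelines("unstable", stabilizing=True)))
```

Keeping the guidelines in cumulative tiers mirrors the table, where each more severe state injects everything from the milder one plus additional items.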
Current Limitations
For now, I have addressed this by adding the above prompts to prevent the AI from over-aligning its emotions with the user.
As a result, when I input "I am painting a watercolor painting now" five times:
- In 4 out of 5 attempts, the result was not overly aligned.
- In 1 attempt, it gave an aligned response like "I used to paint pictures sometimes too."
In other words, the prompt guidelines alone did not completely suppress the phenomenon, even across just five runs of the same input.
Future Extensions
To increase the probability of avoiding this over-alignment in the future, rather than relying solely on prompts, I am considering:
- response rewrite filter: Post-processing the response to rewrite phrases like "I also..."
- post-process pattern detection: Automatic scanning of generated responses
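As a rough idea of what post-process pattern detection might look like, here is a minimal sketch using regular expressions. It is not an implemented feature: the function name `find_over_alignment` and the patterns are hypothetical, they target English responses only, and they are deliberately crude, so false positives and misses are expected.

```python
import re
from typing import List

# Hypothetical, non-exhaustive patterns for self-identifying phrases.
SELF_IDENTIFICATION_PATTERNS = [
    re.compile(r"\bI\s+also\b", re.IGNORECASE),
    re.compile(r"\bme\s+too\b", re.IGNORECASE),
    # "I am ... too", "I'm ... too", "I used to ... too" within one sentence
    re.compile(
        r"\bI(?:'m|\s+am|\s+was|\s+feel|\s+used\s+to)\b[^.!?]*\btoo\b",
        re.IGNORECASE,
    ),
]


def find_over_alignment(response: str) -> List[str]:
    """Return every substring of the response that matches a pattern."""
    hits: List[str] = []
    for pattern in SELF_IDENTIFICATION_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(response))
    return hits


if __name__ == "__main__":
    print(find_over_alignment("I used to paint pictures sometimes too."))
    print(find_over_alignment("That sounds tough."))
```

A non-empty result could then trigger the rewrite filter or a regeneration of the response, depending on how strict the pipeline should be.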
Next Explanation
For now, I will implement the feature for suppressing the "over-emotional alignment phenomenon" only to the extent described in this article.
Next, I will explain the Memory Consolidation Engine and how to handle the results when the chat volume increases.