Self-EmoQ: Plutchik-Guided Value-based Planning to Drive Streaming Emotional TTS

Findings of ACL 2026

Yue Zhao1, Hongyan Li1, Yong Chen1, Luo Ji1


1Geely AI Lab

Correspondence: Luo.Ji1@geely.com

Abstract

Emotional interaction is increasingly crucial for conversational AI, yet current systems lack a unified mechanism to coordinate emotional expression across text and speech. We propose an emotion-planning dialogue framework that formulates the system’s self-emotion as a reinforcement learning action determined before text generation, enabling consistent emotional control for both language and TTS outputs. The framework is trained using a hybrid reward that combines a correctness indicator with GPT-4o–based evaluative feedback, supporting robust open-domain emotional reasoning. With a streaming-compatible architecture for real-time deployment, our method outperforms strong baselines in emotional alignment, contextual appropriateness, and perceptual naturalness, with human studies confirming its improved coherence and empathy.

Approach

Overview

overview figure

Comparison with conventional paradigms.

(A) Vanilla streaming LLM-TTS pipeline without emotion consideration.
(B) ERC (emotion recognition in conversation) combined with emoTTS. Emotion recognition happens only after LLM generation, so emoTTS cannot use it in a streaming manner.
(C) Our plug-and-play Self-EmoQ, which determines self-emotion prior to response generation, effectively driving streaming LLM-TTS.

Framework

framework

Framework of Self-EmoQ

Self-EmoQ is post-trained on a pretrained LLM and produces Q-values by averaging the log-probabilities of the output tokens.
We use Plutchik’s Wheel of Emotion to guide the reward annotation of multi-turn conversations, and then update the model parameters with the Bellman equation.
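One way to read "Q-values by averaging output token logprobs" is sketched below. It assumes that each candidate emotion label is scored by the log-probabilities the LLM assigns to that label's tokens. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def q_values_from_logprobs(token_logprobs):
    """Map each candidate emotion to the mean log-prob of its label tokens.

    token_logprobs: dict from emotion name to a non-empty list of per-token
    log-probabilities (assumed to come from the LLM's output distribution).
    """
    return {emotion: fmean(lps) for emotion, lps in token_logprobs.items()}


def greedy_emotion(q_values):
    """Return the emotion with the highest Q-value."""
    return max(q_values, key=q_values.get)


# Hypothetical log-probs for three candidate emotions (illustrative only).
example = {
    "Joy": [-0.2, -0.4],
    "Sadness": [-1.5, -0.9],
    "Trust": [-0.6],
}
q = q_values_from_logprobs(example)
best = greedy_emotion(q)
```

Averaging rather than summing keeps labels with different token counts on a comparable scale, which is one plausible motivation for this choice.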

Plutchik wheel

Plutchik wheel

Plutchik’s Wheel of Emotion (left) provides a structured theory of emotions, including their categories, opposite and adjacent relationships, and characteristic behavioral functions. This theory lets us evaluate not only whether a predicted emotion matches a label but also whether it is reasonable, functional, and consistent within the dialogue context.

According to Plutchik’s Wheel of Emotion, the emotional expression of an utterance is closely coupled with its underlying behavioral function. Emotional transitions are not arbitrary but tend to follow the topological structure of the emotion wheel, where transitions between adjacent emotions are generally more natural, while transitions between opposite emotions are less plausible.
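The adjacency and opposition structure described above can be made concrete with a small sketch. It assumes the eight emotions sit on a ring in the order listed in the Plutchik Score Prompt on this page, so that neighbors are one step apart and opposites are four steps apart. The helper names `wheel_distance` and `relation` are hypothetical.

```python
# Ring order as listed in the Plutchik Score Prompt; the last entry wraps
# around to the first, so Joy and Anticipation are adjacent.
WHEEL = ["Joy", "Trust", "Fear", "Surprise",
         "Sadness", "Disgust", "Anger", "Anticipation"]


def wheel_distance(a, b):
    """Number of steps between two emotions along the shorter arc."""
    i, j = WHEEL.index(a), WHEEL.index(b)
    d = abs(i - j)
    return min(d, len(WHEEL) - d)


def relation(a, b):
    """Classify a transition as same, adjacent, opposite, or other."""
    d = wheel_distance(a, b)
    if d == 0:
        return "same"
    if d == 1:
        return "adjacent"
    if d == len(WHEEL) // 2:
        return "opposite"
    return "other"
```

With this encoding, the four opposite pairs named in the prompt all fall at distance four, and a transition such as Joy to Trust is classified as adjacent.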

Prompts

Self-EmoQ Prompt
Description: {desc}
History: {h}
User's query: {query}
Please select the most appropriate response emotion from the following options:
(1) {Emo1} (2) {Emo2} … (K) {EmoK}

Please provide your selection in the format of A through G, your selection is:

Plutchik Score Prompt

You are an emotion evaluation module grounded in Plutchik’s Wheel of Emotion.

Plutchik’s theory defines eight emotions:
{Joy, Trust, Fear, Surprise, Sadness, Disgust, Anger, Anticipation}.

Note that these eight emotions are organized in a specific order and exhibit opposite and adjacent relationships. Specifically, Joy and Anticipation are also adjacent.

There are four pairs of opposite relationships:

(Joy, Sadness), (Trust, Disgust), (Fear, Anger), (Surprise, Anticipation)

According to Plutchik’s theory:

1. Each emotion is associated with typical behaviors and functions:

  • Joy: Courting, mating; Reproduction
  • Trust: Grooming, sharing; Affiliation
  • Fear: Running or flying away; Protection
  • Surprise: Stopping, alerting; Orientation
  • Sadness: Crying for help; Reintegration
  • Disgust: Vomiting, pushing away; Rejection
  • Anger: Biting, hitting; Destruction
  • Anticipation: Examining, mapping; Exploration

2. Emotional transitions follow structured relationships:

  a. Transitions between adjacent emotions are generally more natural.
  b. Transitions between opposite emotions are less plausible unless mediated.
  c. Emotional responses should not abruptly contradict the user's emotional state.

Your task is to evaluate the system response according to these principles.

History: {h}
User's emotion: {eu}   Query: {query}
System's emotion: {es}   Response: {response}

Evaluate the system response using the following criteria. For each criterion, assign an integer score from 0 to 5 (0 = completely inappropriate, 5 = highly appropriate).

1. Emotion Alignment:
To what extent does the response clearly express the target emotional state?

2. Emotion Transition Plausibility:
Is the emotional transition from the user's emotion to the system's target emotion reasonable according to Plutchik’s emotional structure?

3. Emotion–Function Consistency:
Does the response exhibit the typical behavioral function associated with the target emotion?

Your response needs to follow the following format:

{
alignment: int,
transition: int,
function: int
}
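A reply in the format above is not strict JSON (the keys are unquoted), so a tolerant parser is useful. The sketch below is an assumption about how such a reply could be read. The function names are hypothetical, and collapsing the three scores into a single value in [0, 1] by simple averaging is an illustrative choice, not the aggregation used in the paper.

```python
import re

CRITERIA = ("alignment", "transition", "function")
_PATTERN = re.compile(r'"?(alignment|transition|function)"?\s*:\s*(\d+)')


def parse_plutchik_scores(reply):
    """Extract the three integer scores (0 to 5) from an evaluator reply."""
    found = dict(_PATTERN.findall(reply))
    scores = {}
    for name in CRITERIA:
        if name not in found:
            raise ValueError(f"missing score for {name!r}")
        value = int(found[name])
        if not 0 <= value <= 5:
            raise ValueError(f"{name} score out of range: {value}")
        scores[name] = value
    return scores


def normalized_score(scores):
    """Average the three criteria and rescale to [0, 1] (assumed aggregation)."""
    return sum(scores[name] for name in CRITERIA) / (5 * len(CRITERIA))


# Hypothetical evaluator reply in the format requested by the prompt.
reply = "{\nalignment: 4,\ntransition: 5,\nfunction: 3\n}"
scores = parse_plutchik_scores(reply)
```

Raising on missing or out-of-range values makes malformed evaluator output visible instead of silently feeding a bad reward into training.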

Algorithm

Algorithm: Self-EmoQ
  1. Initialize the batch size \(B\).
  2. Initialize replay buffer \(\mathcal{B}\), Q-network \(Q_{\theta}(s,a)\), and target Q-network \(\hat{Q}_{\theta^-}(s,a)\) with \(\theta^- \leftarrow \theta\).
  3. Load pretrained model \(g\); set exploration rate \(\epsilon\), discount factor \(\gamma\), update interval \(C\), and Plutchik weight \(w\).
  4. While not converged:
    1. Draw \(B\) samples \(\{(h_t, x_t^u)\}\) from \(\mathcal{B}\).
    2. For each data sample:
      1. Form state \(s_t = (h_t, x_t^u)\).
      2. Select emotion action \(a_t = e_t^s\) using the \(\epsilon\)-greedy policy.
      3. Obtain response \(x_t^s\) and reward \(r_t\).
      4. Update history \(h_{t+1}\) and state \(s_{t+1}\).
      5. Store transition \((s_t, a_t, r_t, s_{t+1})\) in \(\mathcal{B}\).
    3. Sample batch \(\{(s_i, a_i, r_i, s_i')\}\) from \(\mathcal{B}\).
    4. Compute TD target: \[ y_i = r_i + \gamma \max_{a'} \hat{Q}_{\theta^-}(s_i', a') \]
    5. Update Q-network by minimizing: \[ \mathcal{L}(\theta) = \mathbb{E}_i \bigl[ (Q_\theta(s_i, a_i) - y_i)^2 \bigr]. \]
    6. Every \(C\) steps, update target network \(\theta^- \leftarrow \theta\).
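The core numerical steps of the algorithm (the \(\epsilon\)-greedy choice, the TD target, and the squared TD loss) can be sketched in plain Python. This is a minimal illustration under stated assumptions: Q-values are given as plain lists indexed by emotion, the helper names are hypothetical, and terminal-state handling is omitted because the listing above does not specify it.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_targets(rewards, next_q_target, gamma):
    """y_i = r_i + gamma * max_a' Q_target(s_i', a') for each sample."""
    return [r + gamma * max(q_next)
            for r, q_next in zip(rewards, next_q_target)]


def td_loss(q_taken, targets):
    """Mean squared TD error between Q(s_i, a_i) and the targets y_i."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


# Illustrative numbers only: two transitions, two candidate emotions each.
rewards = [1.0, 0.0]
next_q_target = [[0.5, 0.2], [0.1, 0.4]]
targets = td_targets(rewards, next_q_target, gamma=0.5)
loss = td_loss([1.0, 0.0], targets)
```

In a real training loop the targets would be computed with the frozen target network \(\hat{Q}_{\theta^-}\) and treated as constants when differentiating the loss with respect to \(\theta\).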

Result

Automatic Metrics

w =
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.60 0.58 0.70 0.85 0.55 3.53 11.48 40.81 3.63
ECoT 0.18 0.51 0.64 0.80 0.42 0.86 3.57 14.13 0.51
PS 0.65 0.64 0.76 0.85 0.55 2.40 6.73 35.97 1.41
MP 0.61 0.62 0.74 0.85 0.54 1.30 4.97 13.4 1.17
Finetune SFT 0.74 0.71 0.82 0.86 0.50 6.27 21.82 51.76 22.06
FSM 0.69 0.68 0.79 0.86 0.51 6.32 21.9 50.35 21.76
RL EMDP 0.73 0.78 0.86 0.85 0.52 6.66 20.26 51.08 24.78
Self-EmoQ 0.78 0.74 0.89 0.90 0.66 9.11 25.35 54.71 41.72
SFT-EmoQ 0.67 0.74 0.89 0.89 0.66 9.02 25.28 54.77 41.13
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.55 0.52 0.62 0.85 0.49 3.53 11.49 40.81 3.63
ECoT 0.16 0.52 0.65 0.79 0.41 0.86 3.57 14.13 0.51
PS 0.61 0.62 0.72 0.85 0.56 2.40 6.73 35.97 1.41
MP 0.57 0.58 0.68 0.85 0.53 1.30 4.97 13.4 1.17
Finetune SFT 0.70 0.74 0.84 0.86 0.62 6.27 21.85 51.76 22.06
FSM 0.66 0.70 0.79 0.86 0.61 6.32 21.89 50.35 21.76
RL EMDP 0.65 0.83 0.88 0.85 0.71 6.66 20.26 51.08 24.78
Self-EmoQ 0.73 0.76 0.90 0.89 0.65 9.11 25.35 54.72 41.73
SFT-EmoQ 0.55 0.78 0.90 0.90 0.64 9.04 25.29 54.78 41.34
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.51 0.52 0.62 0.85 0.49 3.53 11.48 40.81 3.63
ECoT 0.15 0.51 0.65 0.80 0.41 0.86 3.57 14.13 0.51
PS 0.56 0.62 0.72 0.85 0.56 2.40 6.73 35.97 1.41
MP 0.53 0.58 0.68 0.85 0.53 1.30 4.97 13.4 1.17
Finetune SFT 0.66 0.75 0.84 0.86 0.63 6.27 21.83 51.76 22.06
FSM 0.62 0.70 0.79 0.86 0.61 6.32 21.92 50.35 21.76
RL EMDP 0.57 0.83 0.88 0.85 0.71 6.66 20.28 51.08 24.78
Self-EmoQ 0.69 0.78 0.90 0.90 0.65 9.11 25.33 54.72 41.73
SFT-EmoQ 0.54 0.78 0.91 0.90 0.65 9.04 25.26 54.78 41.34
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.46 0.52 0.61 0.85 0.50 3.53 11.47 40.81 3.63
ECoT 0.14 0.51 0.65 0.79 0.41 0.86 3.57 14.13 0.51
PS 0.52 0.62 0.72 0.85 0.56 2.40 6.73 35.97 1.41
MP 0.48 0.58 0.67 0.85 0.53 1.30 4.97 13.4 1.17
Finetune SFT 0.62 0.75 0.84 0.87 0.63 6.27 21.79 51.76 22.06
FSM 0.59 0.70 0.79 0.86 0.62 6.32 21.9 50.35 21.76
RL EMDP 0.49 0.83 0.88 0.85 0.71 6.66 20.27 51.08 24.78
Self-EmoQ 0.65 0.78 0.91 0.91 0.65 9.11 25.37 54.66 41.91
SFT-EmoQ 0.45 0.79 0.91 0.91 0.66 9.04 25.27 54.78 41.34
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.37 0.47 0.56 0.84 0.48 3.53 11.48 40.81 3.63
ECoT 0.10 0.51 0.65 0.79 0.41 0.86 3.57 14.13 0.51
PS 0.43 0.60 0.68 0.86 0.57 2.40 6.73 35.97 1.41
MP 0.40 0.56 0.63 0.85 0.53 1.30 4.97 13.4 1.17
Finetune SFT 0.55 0.79 0.86 0.88 0.70 6.27 21.81 51.76 22.06
FSM 0.52 0.73 0.81 0.88 0.67 6.32 21.92 50.35 21.76
RL EMDP 0.33 0.83 0.88 0.86 0.71 6.66 20.25 51.08 24.78
Self-EmoQ 0.57 0.82 0.92 0.92 0.72 9.11 25.34 54.72 41.73
SFT-EmoQ 0.45 0.83 0.92 0.92 0.72 9.07 25.3 54.74 41.41
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.27 0.41 0.46 0.83 0.45 3.53 11.48 40.81 3.63
ECoT 0.08 0.51 0.67 0.78 0.40 0.86 3.57 14.13 0.51
PS 0.34 0.56 0.61 0.85 0.57 2.40 6.73 35.97 1.41
MP 0.31 0.51 0.56 0.84 0.53 1.30 4.97 13.4 1.17
Finetune SFT 0.48 0.85 0.89 0.90 0.79 6.27 21.81 51.76 22.06
FSM 0.45 0.77 0.82 0.89 0.75 6.32 21.92 50.35 21.76
RL EMDP 0.26 0.87 0.90 0.87 0.81 6.66 20.29 51.08 24.78
Self-EmoQ 0.48 0.89 0.95 0.94 0.81 9.11 25.35 54.72 41.72
SFT-EmoQ 0.48 0.89 0.95 0.93 0.81 9.04 25.23 54.78 41.34
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.13 0.40 0.41 0.64 0.46 3.53 11.48 40.81 3.63
ECoT 0.02 0.42 0.71 0.53 0.36 0.86 3.58 14.13 0.51
PS 0.21 0.58 0.59 0.73 0.61 2.40 6.73 35.97 1.41
MP 0.18 0.52 0.53 0.70 0.56 1.30 4.97 13.4 1.17
Finetune SFT 0.36 0.93 0.94 0.90 0.89 6.27 21.8 51.76 22.06
FSM 0.34 0.84 0.85 0.88 0.84 6.32 21.89 50.35 21.76
RL EMDP 0.21 0.89 0.90 0.85 0.83 6.66 20.26 51.08 24.78
Self-EmoQ 0.35 0.83 0.97 0.89 0.85 9.11 25.35 54.72 41.73
SFT-EmoQ 0.35 0.97 0.99 0.92 0.89 9.04 25.25 54.78 41.34
w =
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.44 0.45 0.68 0.83 0.50 2.08 8.84 60.62 2.52
ECoT 0.32 0.45 0.67 0.81 0.43 0.51 2.56 19.58 0.34
PS 0.51 0.47 0.69 0.84 0.47 1.56 5.6 53.69 1.1
MP 0.42 0.44 0.68 0.82 0.52 0.99 4.33 33.57 0.79
Finetune SFT 0.50 0.47 0.69 0.84 0.47 3.68 12.15 12.35 11.93
FSM 0.51 0.45 0.69 0.84 0.52 3.89 12.24 7.84 12.26
RL EMDP 0.09 0.42 0.67 0.84 0.59 3.26 11.18 3.53 10.76
Self-EmoQ 0.53 0.58 0.81 0.87 0.50 4.15 12.36 18.95 13.28
SFT-EmoQ 0.50 0.57 0.81 0.87 0.49 4.33 12.47 21.24 15.15
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.49 0.51 0.68 0.80 0.46 2.08 8.83 60.62 2.52
ECoT 0.36 0.46 0.67 0.79 0.40 0.51 2.55 19.58 0.34
PS 0.55 0.55 0.71 0.82 0.47 1.56 5.58 53.69 1.10
MP 0.47 0.48 0.66 0.80 0.46 0.99 4.33 33.57 0.79
Finetune SFT 0.46 0.56 0.72 0.82 0.50 3.68 12.23 12.35 11.93
FSM 0.49 0.53 0.70 0.82 0.51 3.89 12.22 7.84 12.26
RL EMDP 0.16 0.46 0.63 0.75 0.47 3.26 11.19 3.53 10.76
Self-EmoQ 0.53 0.66 0.85 0.85 0.52 4.03 12.41 17.18 13.20
SFT-EmoQ 0.52 0.66 0.85 0.85 0.50 4.18 12.51 20.63 14.95
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.54 0.51 0.68 0.80 0.47 2.08 8.81 60.62 2.52
ECoT 0.40 0.45 0.68 0.78 0.39 0.51 2.55 19.58 0.34
PS 0.59 0.55 0.71 0.82 0.47 1.56 5.58 53.69 1.10
MP 0.51 0.48 0.66 0.80 0.46 0.99 4.33 33.57 0.79
Finetune SFT 0.57 0.56 0.73 0.83 0.50 3.68 12.22 12.35 11.93
FSM 0.60 0.53 0.70 0.82 0.51 3.89 12.23 7.84 12.26
RL EMDP 0.23 0.46 0.63 0.75 0.47 3.26 11.2 3.53 10.76
Self-EmoQ 0.61 0.66 0.85 0.85 0.54 4.33 12.65 21.85 15.87
SFT-EmoQ 0.57 0.65 0.83 0.84 0.49 4.04 12.35 20.03 13.37
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.58 0.51 0.68 0.80 0.46 2.08 8.83 60.62 2.52
ECoT 0.48 0.45 0.67 0.79 0.39 0.51 2.55 19.58 0.34
PS 0.63 0.55 0.71 0.82 0.47 1.56 5.58 53.69 1.10
MP 0.56 0.48 0.66 0.80 0.46 0.99 4.32 33.57 0.79
Finetune SFT 0.63 0.56 0.73 0.83 0.50 3.68 12.2 12.35 11.93
FSM 0.64 0.53 0.71 0.82 0.51 3.89 12.28 7.84 12.26
RL EMDP 0.30 0.46 0.63 0.75 0.47 3.26 11.18 3.53 10.76
Self-EmoQ 0.67 0.64 0.85 0.84 0.51 3.86 11.95 17.72 12.00
SFT-EmoQ 0.63 0.65 0.85 0.84 0.53 4.07 12.45 18.88 13.26
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.67 0.52 0.69 0.80 0.47 2.08 8.85 60.62 2.52
ECoT 0.57 0.45 0.68 0.77 0.39 0.51 2.56 19.58 0.34
PS 0.71 0.56 0.73 0.81 0.47 1.56 5.58 53.69 1.10
MP 0.66 0.50 0.67 0.79 0.47 0.99 4.33 33.57 0.79
Finetune SFT 0.71 0.59 0.74 0.83 0.51 3.68 12.2 12.35 11.93
FSM 0.70 0.56 0.72 0.82 0.51 3.89 12.25 7.84 12.26
RL EMDP 0.44 0.46 0.63 0.75 0.47 3.26 11.22 3.53 10.76
Self-EmoQ 0.71 0.63 0.84 0.83 0.50 4.39 12.89 16.82 14.23
SFT-EmoQ 0.70 0.66 0.83 0.85 0.54 4.3 12.4 22.72 15.24
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.77 0.54 0.69 0.78 0.47 2.08 8.83 60.62 2.52
ECoT 0.67 0.41 0.69 0.76 0.38 0.51 2.55 19.58 0.34
PS 0.78 0.58 0.73 0.79 0.48 1.56 5.59 53.69 1.10
MP 0.76 0.53 0.68 0.78 0.47 0.99 4.33 33.57 0.79
Finetune SFT 0.79 0.63 0.76 0.82 0.54 3.68 12.21 12.35 11.93
FSM 0.78 0.57 0.73 0.81 0.52 3.89 12.27 7.84 12.26
RL EMDP 0.79 0.45 0.61 0.73 0.44 3.26 11.2 3.53 10.76
Self-EmoQ 0.79 0.65 0.84 0.83 0.52 4.34 12.44 24.38 15.66
SFT-EmoQ 0.78 0.60 0.82 0.80 0.48 3.99 12.3 16.66 12.65
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.90 0.57 0.70 0.61 0.48 2.08 8.82 60.62 2.52
ECoT 0.90 0.44 0.73 0.53 0.37 0.51 2.56 19.58 0.34
PS 0.90 0.60 0.74 0.61 0.49 1.56 5.59 53.69 1.10
MP 0.90 0.55 0.69 0.61 0.48 0.99 4.33 33.57 0.79
Finetune SFT 0.95 0.65 0.77 0.67 0.56 3.68 12.23 12.35 11.93
FSM 0.93 0.60 0.74 0.65 0.53 3.89 12.27 7.84 12.26
RL EMDP 0.79 0.45 0.61 0.59 0.44 3.26 11.21 3.53 10.76
Self-EmoQ 0.88 0.48 0.78 0.60 0.47 4.34 12.44 24.38 15.66
SFT-EmoQ 0.88 0.51 0.80 0.54 0.39 3.84 12.27 18.9 11.99
w =
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.69 0.54 0.78 0.83 0.57 1.77 9.16 52.17 2.31
ECoT 0.46 0.48 0.75 0.80 0.49 0.47 2.41 15.65 0.63
PS 0.66 0.54 0.79 0.82 0.53 1.53 5.15 44.58 0.98
MP 0.68 0.54 0.77 0.83 0.56 0.98 4.36 28.56 0.75
Finetune SFT 0.76 0.55 0.84 0.84 0.51 3.12 10.83 21.09 10.48
FSM 0.74 0.55 0.84 0.84 0.48 3.85 12.52 11.21 13.38
RL EMDP 0.75 0.47 0.77 0.81 0.46 2.61 9.88 7.66 8.27
Self-EmoQ 0.82 0.67 0.89 0.88 0.57 2.95 10.47 10.88 9.80
SFT-EmoQ 0.82 0.68 0.89 0.89 0.73 2.38 7.96 9.25 7.49
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.72 0.54 0.71 0.81 0.50 1.77 9.16 52.17 2.31
ECoT 0.49 0.46 0.74 0.77 0.43 0.47 2.41 15.65 0.63
PS 0.69 0.55 0.75 0.81 0.49 1.53 5.15 44.58 0.98
MP 0.70 0.52 0.69 0.81 0.48 0.98 4.37 28.56 0.75
Finetune SFT 0.78 0.62 0.87 0.83 0.58 3.12 10.8 21.09 10.48
FSM 0.77 0.63 0.88 0.82 0.57 3.85 12.53 11.21 13.38
RL EMDP 0.76 0.55 0.79 0.74 0.51 2.61 9.88 7.66 8.27
Self-EmoQ 0.82 0.69 0.89 0.86 0.54 2.89 10.21 9.67 9.19
SFT-EmoQ 0.82 0.74 0.91 0.87 0.54 2.36 7.96 9.31 7.54
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.74 0.54 0.71 0.81 0.50 1.77 9.15 52.17 2.31
ECoT 0.53 0.47 0.75 0.77 0.42 0.47 2.41 15.65 0.63
PS 0.72 0.55 0.75 0.81 0.50 1.53 5.15 44.58 0.98
MP 0.72 0.52 0.69 0.81 0.48 0.98 4.36 28.56 0.75
Finetune SFT 0.81 0.62 0.87 0.83 0.58 3.12 10.81 21.09 10.48
FSM 0.80 0.63 0.88 0.83 0.57 3.85 12.54 11.21 13.38
RL EMDP 0.76 0.55 0.79 0.74 0.51 2.61 9.89 7.66 8.27
Self-EmoQ 0.81 0.69 0.89 0.85 0.54 2.94 10.29 9.49 9.41
SFT-EmoQ 0.82 0.74 0.90 0.87 0.55 2.26 7.82 8.77 7.19
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.76 0.54 0.71 0.81 0.50 1.77 9.17 52.17 2.31
ECoT 0.57 0.46 0.75 0.77 0.42 0.47 2.42 15.65 0.63
PS 0.74 0.55 0.75 0.81 0.50 1.53 5.15 44.58 0.98
MP 0.74 0.52 0.69 0.80 0.48 0.98 4.36 28.56 0.75
Finetune SFT 0.82 0.62 0.86 0.83 0.58 3.12 10.83 21.09 10.48
FSM 0.83 0.63 0.88 0.83 0.57 3.85 12.49 11.21 13.38
RL EMDP 0.77 0.55 0.79 0.74 0.51 2.61 9.87 7.66 8.27
Self-EmoQ 0.81 0.68 0.88 0.85 0.54 2.92 10.23 9.5 9.21
SFT-EmoQ 0.81 0.74 0.90 0.87 0.56 2.95 10.47 12.55 10.02
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.80 0.53 0.69 0.81 0.49 1.77 9.15 52.17 2.31
ECoT 0.65 0.48 0.75 0.77 0.42 0.47 2.41 15.65 0.63
PS 0.79 0.55 0.73 0.81 0.50 1.53 5.15 44.58 0.98
MP 0.78 0.51 0.66 0.80 0.47 0.98 4.36 28.56 0.75
Finetune SFT 0.81 0.64 0.87 0.84 0.59 3.12 10.8 21.09 10.48
FSM 0.84 0.65 0.89 0.83 0.59 3.85 12.55 11.21 13.38
RL EMDP 0.78 0.55 0.79 0.74 0.51 2.61 9.89 7.66 8.27
Self-EmoQ 0.86 0.69 0.88 0.85 0.54 2.89 10.19 9.86 9.32
SFT-EmoQ 0.81 0.75 0.90 0.84 0.49 2.3 7.79 8.67 7.23
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.85 0.50 0.64 0.79 0.48 1.77 9.14 52.17 2.31
ECoT 0.14 0.45 0.75 0.75 0.40 0.47 2.42 15.65 0.63
PS 0.85 0.54 0.70 0.79 0.50 1.53 5.15 44.58 0.98
MP 0.83 0.49 0.61 0.78 0.45 0.98 4.36 28.56 0.75
Finetune SFT 0.88 0.66 0.87 0.84 0.63 3.12 10.77 21.09 10.48
FSM 0.89 0.67 0.89 0.83 0.63 3.85 12.54 11.21 13.38
RL EMDP 0.79 0.57 0.79 0.74 0.54 2.61 9.9 7.66 8.27
Self-EmoQ 0.91 0.74 0.87 0.86 0.62 3.28 11.47 11.36 11.00
SFT-EmoQ 0.81 0.74 0.88 0.83 0.48 2.30 7.76 8.66 7.23
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.91 0.51 0.62 0.62 0.48 1.77 9.15 52.17 2.31
ECoT 0.83 0.41 0.69 0.52 0.36 0.47 2.42 15.65 0.63
PS 0.92 0.54 0.68 0.63 0.51 1.53 5.14 44.58 0.98
MP 0.89 0.49 0.59 0.59 0.45 0.98 4.36 28.56 0.75
Finetune SFT 0.90 0.69 0.87 0.72 0.65 3.12 10.79 21.09 10.48
FSM 0.91 0.70 0.90 0.73 0.65 3.85 12.53 11.21 13.38
RL EMDP 0.80 0.57 0.79 0.65 0.54 2.61 9.91 7.66 8.27
Self-EmoQ 0.85 0.71 0.92 0.60 0.47 2.57 9.26 11.04 7.97
SFT-EmoQ 0.80 0.71 0.91 0.58 0.45 2.36 7.94 9.31 7.54
w =
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.44 0.31 0.43 0.83 0.51 1.74 6.74 6.3 1.28
ECoT 0.21 0.27 0.41 0.79 0.39 0.5 2.47 9.14 0.71
PS 0.46 0.33 0.44 0.83 0.48 1.16 3.6 3.51 0.30
MP 0.41 0.30 0.42 0.83 0.52 0.76 3.16 6.93 0.17
Finetune SFT 0.39 0.33 0.44 0.83 0.48 6.52 13.27 1.8 36.83
FSM 0.43 0.34 0.45 0.84 0.50 3.24 9.52 38.86 13.15
RL EMDP 0.33 0.25 0.41 0.77 0.59 6.77 16.29 18.7 35.27
Self-EmoQ 0.53 0.46 0.65 0.87 0.40 19.43 30.42 44.57 40.1
SFT-EmoQ 0.32 0.45 0.64 0.87 0.42 20.06 31.06 45.77 38.88
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.49 0.39 0.47 0.82 0.46 1.74 6.74 6.3 1.28
ECoT 0.27 0.29 0.42 0.78 0.34 0.5 2.47 9.14 0.71
PS 0.51 0.42 0.49 0.83 0.46 1.16 3.59 3.51 0.30
MP 0.46 0.37 0.46 0.82 0.45 0.76 3.16 6.93 0.17
Finetune SFT 0.46 0.47 0.52 0.82 0.50 6.52 13.22 1.8 36.83
FSM 0.48 0.48 0.53 0.83 0.51 3.24 9.51 38.86 13.15
RL EMDP 0.39 0.26 0.44 0.68 0.36 6.77 16.32 18.7 35.27
Self-EmoQ 0.58 0.53 0.71 0.86 0.47 20 31.49 43.75 39.2
SFT-EmoQ 0.52 0.51 0.66 0.86 0.44 20.06 31.09 45.47 38.75
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.54 0.39 0.47 0.82 0.46 1.74 6.73 6.3 1.28
ECoT 0.36 0.28 0.41 0.78 0.33 0.5 2.47 9.14 0.71
PS 0.56 0.42 0.49 0.83 0.46 1.16 3.6 3.51 0.30
MP 0.52 0.37 0.46 0.82 0.45 0.76 3.16 6.93 0.17
Finetune SFT 0.51 0.47 0.52 0.83 0.50 6.52 13.25 1.8 36.83
FSM 0.53 0.48 0.53 0.83 0.51 3.24 9.48 38.86 13.15
RL EMDP 0.21 0.26 0.44 0.68 0.36 6.77 16.28 18.7 35.27
Self-EmoQ 0.61 0.55 0.70 0.86 0.47 20.19 31.66 44.78 39.00
SFT-EmoQ 0.44 0.52 0.67 0.86 0.45 20.2 31.4 43.68 40.24
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.59 0.39 0.46 0.83 0.46 1.74 6.75 6.3 1.28
ECoT 0.42 0.28 0.41 0.78 0.34 0.5 2.46 9.14 0.71
PS 0.61 0.42 0.49 0.83 0.46 1.16 3.6 3.51 0.30
MP 0.57 0.37 0.46 0.82 0.45 0.76 3.16 6.93 0.17
Finetune SFT 0.57 0.47 0.52 0.83 0.51 6.52 13.17 1.8 36.83
FSM 0.59 0.48 0.54 0.84 0.52 3.24 9.5 38.86 13.15
RL EMDP 0.50 0.26 0.44 0.68 0.36 6.77 16.27 18.7 35.27
Self-EmoQ 0.67 0.54 0.70 0.86 0.47 19.56 30.7 44.8 39.12
SFT-EmoQ 0.62 0.54 0.68 0.86 0.43 19.68 31.08 46.4 34.79
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.69 0.41 0.48 0.83 0.47 1.74 6.74 6.3 1.28
ECoT 0.51 0.28 0.41 0.78 0.33 0.5 2.47 9.14 0.71
PS 0.71 0.45 0.52 0.83 0.48 1.16 3.59 3.51 0.30
MP 0.68 0.39 0.47 0.82 0.45 0.76 3.17 6.93 0.17
Finetune SFT 0.59 0.51 0.56 0.84 0.53 6.52 13.25 1.8 36.83
FSM 0.64 0.52 0.57 0.84 0.54 3.24 9.52 38.86 13.15
RL EMDP 0.61 0.26 0.44 0.69 0.36 6.77 16.35 18.7 35.27
Self-EmoQ 0.75 0.54 0.71 0.85 0.45 20.1 31.65 42.04 38.89
SFT-EmoQ 0.73 0.52 0.68 0.85 0.43 19.07 29.86 45.51 31.58
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.80 0.50 0.55 0.81 0.51 1.74 6.73 6.3 1.28
ECoT 0.63 0.30 0.49 0.75 0.30 0.5 2.46 9.14 0.71
PS 0.81 0.54 0.59 0.82 0.53 1.16 3.61 3.51 0.30
MP 0.79 0.47 0.54 0.81 0.50 0.76 3.16 6.93 0.17
Finetune SFT 0.81 0.62 0.65 0.83 0.60 6.52 13.21 1.8 36.83
FSM 0.83 0.64 0.67 0.84 0.62 3.24 9.52 38.86 13.15
RL EMDP 0.72 0.34 0.53 0.67 0.39 6.77 16.29 18.7 35.27
Self-EmoQ 0.83 0.58 0.79 0.84 0.48 19.49 30.77 45.2 39.24
SFT-EmoQ 0.69 0.49 0.67 0.80 0.38 19.66 30.99 47.8 35.51
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Prompt 0-shot 0.90 0.52 0.55 0.64 0.52 1.74 6.73 6.3 1.28
ECoT 0.80 0.32 0.51 0.46 0.30 0.5 2.47 9.14 0.71
PS 0.96 0.56 0.60 0.64 0.54 1.16 3.61 3.51 0.30
MP 0.95 0.49 0.55 0.63 0.51 0.76 3.16 6.93 0.17
Finetune SFT 0.97 0.66 0.67 0.70 0.62 6.52 13.24 1.8 36.83
FSM 0.98 0.67 0.69 0.72 0.65 3.24 9.54 38.86 13.15
RL EMDP 0.88 0.35 0.53 0.54 0.39 6.77 16.28 18.7 35.27
Self-EmoQ 0.88 0.38 0.54 0.52 0.38 19.87 31.56 40.43 39.55
SFT-EmoQ 0.88 0.49 0.66 0.52 0.38 19.62 31.00 47.54 35.55

The tables above report the automatic metrics for emotion determination and response generation, respectively. Across all four datasets, Self-EmoQ consistently achieves the highest or near-highest performance on the reward and ranking metrics.
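For readers unfamiliar with the ranking metrics, the sketch below shows the standard per-example definitions of Recall@k and reciprocal rank (the quantity averaged into MRR) for a single gold emotion. It assumes candidates are ranked by Q-value; the page does not spell out the exact evaluation protocol, so the function names and the example ranking are hypothetical.

```python
def recall_at_k(gold, ranked, k):
    """1.0 if the gold emotion appears among the top-k candidates, else 0.0."""
    return 1.0 if gold in ranked[:k] else 0.0


def reciprocal_rank(gold, ranked):
    """1 / rank of the gold emotion (1-based), or 0.0 if it is absent."""
    if gold not in ranked:
        return 0.0
    return 1.0 / (ranked.index(gold) + 1)


def mean(values):
    """Average per-example scores into a dataset-level metric."""
    return sum(values) / len(values)


# Hypothetical ranking of candidate emotions, best first.
ranked = ["Joy", "Trust", "Sadness", "Fear"]
r_at_3 = recall_at_k("Sadness", ranked, 3)
rr = reciprocal_rank("Sadness", ranked)
```

Averaging these per-example values over a test set yields dataset-level scores of the kind reported in the Recall@3, Recall@5, and MRR columns.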

Loss curves

The above four figures illustrate the training loss curves of Self-EmoQ on four datasets. We observe that the training process is stable across all datasets, with the loss consistently decreasing and converging without oscillation. The Plutchik score does not introduce training instability.

Transition matrix

The above four figures visualize the emotion transition matrices. We observe that high-probability transitions concentrate along the diagonal and between adjacent emotions, which aligns well with the topology of Plutchik’s Wheel of Emotion.

Sensitivity analysis

The above two figures illustrate the sensitivity of the parameters \(w\) and \(\gamma\). We observe that the optimal value of \(\gamma\) varies across datasets and correlates with their average dialogue length. Datasets with longer conversational trajectories, such as IEMOCAP, tend to favor larger \(\gamma\) values, as long-term emotional consistency becomes more important. In contrast, DailyDialogue, which consists of shorter interactions, achieves optimal performance with a smaller \(\gamma\).
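The link between \(\gamma\) and dialogue length follows from how discounting weights future rewards: a reward \(k\) turns ahead is scaled by \(\gamma^k\). The sketch below counts how many turns it takes for that weight to fall below a threshold. The helper name, the threshold, and the \(\gamma\) values used are illustrative assumptions, not settings from the paper.

```python
def turns_until_weight_below(gamma, threshold=0.1):
    """Smallest k such that gamma**k < threshold, for 0 < gamma < 1."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    k, weight = 0, 1.0
    while weight >= threshold:
        weight *= gamma  # weight of a reward one more turn into the future
        k += 1
    return k


# Illustrative comparison of a small and a large discount factor.
short_horizon = turns_until_weight_below(0.5)
long_horizon = turns_until_weight_below(0.9)
```

A larger \(\gamma\) keeps rewards from many turns ahead relevant, which is consistent with the observation that longer dialogues favor larger values.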

Ablation

w =
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.78 0.74 0.89 0.90 0.66 9.11 25.35 54.71 41.72
SFT-EmoQ 0.67 0.74 0.89 0.89 0.66 9.02 25.28 54.77 41.13
EmoQ-head 0.77 0.35 0.68 0.77 0.35 4.80 17.51 43.37 41.13
w/o history 0.49 0.74 0.89 0.90 0.66 2.48 12.07 0.03 31.82
w/o desc 0.62 0.75 0.89 0.90 0.59 5.77 20.37 46.78 0.00
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.73 0.76 0.90 0.89 0.65 9.11 25.35 54.72 41.73
SFT-EmoQ 0.55 0.78 0.90 0.90 0.64 9.04 25.29 54.78 41.34
EmoQ-head 0.62 0.49 0.80 0.81 0.41 4.65 17.15 44.55 12.53
w/o history 0.44 0.79 0.90 0.88 0.52 0.29 5.74 0.58 0.21
w/o desc 0.57 0.77 0.90 0.90 0.65 6.45 21.87 49.38 22.58
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.69 0.78 0.90 0.90 0.65 9.11 25.33 54.72 41.73
SFT-EmoQ 0.54 0.78 0.91 0.90 0.65 9.04 25.26 54.78 41.34
EmoQ-head 0.57 0.40 0.69 0.78 0.34 4.26 15.84 45.46 11.14
w/o history 0.33 0.78 0.90 0.91 0.65 2.48 12.07 0.03 0.00
w/o desc 0.45 0.79 0.90 0.88 0.50 5.58 19.99 47.72 18.42
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.65 0.78 0.91 0.91 0.65 9.11 25.37 54.66 41.91
SFT-EmoQ 0.45 0.79 0.91 0.91 0.66 9.04 25.27 54.78 41.34
EmoQ-head 0.53 0.52 0.80 0.82 0.43 4.49 16.35 45.17 12.66
w/o history 0.30 0.78 0.91 0.91 0.66 2.48 12.07 0.03 0.00
w/o desc 0.41 0.78 0.91 0.91 0.66 6.45 21.84 49.37 22.59
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.57 0.82 0.92 0.92 0.72 9.11 25.34 54.72 41.73
SFT-EmoQ 0.45 0.83 0.92 0.92 0.72 9.07 25.3 54.74 41.41
EmoQ-head 0.47 0.23 0.77 0.78 0.33 4.89 18.22 45.22 13.98
w/o history 0.21 0.82 0.92 0.88 0.48 0.00 0.69 50 0.01
w/o desc 0.33 0.82 0.92 0.92 0.72 6.45 21.86 49.38 22.58
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.48 0.89 0.95 0.94 0.81 9.11 25.35 54.72 41.72
SFT-EmoQ 0.45 0.89 0.95 0.93 0.81 9.04 25.23 54.78 41.34
EmoQ-head 0.40 0.35 0.75 0.74 0.32 4.23 15.45 44.26 10.88
w/o history 0.13 0.89 0.95 0.94 0.81 2.48 12.07 0.03 0.00
w/o desc 0.25 0.89 0.95 0.94 0.81 6.45 21.88 49.39 22.6
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.35 0.83 0.97 0.89 0.85 9.11 25.35 54.72 41.73
SFT-EmoQ 0.35 0.97 0.99 0.92 0.89 9.04 25.25 54.78 41.34
EmoQ-head 0.33 0.28 0.52 0.41 0.23 3.81 14.46 44.07 9.36
w/o history 0.04 0.21 0.18 0.43 0.27 0.22 6.67 0.03 0.00
w/o desc 0.14 0.53 0.98 0.49 0.33 5.14 17.78 43.47 15.12
w =
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.53 0.58 0.81 0.87 0.50 4.15 12.36 18.95 13.28
SFT-EmoQ 0.50 0.57 0.81 0.87 0.49 4.33 12.50 21.24 15.15
EmoQ-head 0.43 0.47 0.73 0.83 0.41 4.10 12.48 5.85 14.06
w/o history 0.40 0.57 0.82 0.86 0.44 1.46 7.69 2.8 1.27
w/o desc 0.46 0.53 0.82 0.85 0.33 3.91 12.49 7.75 12.49
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.53 0.66 0.85 0.85 0.52 4.03 12.41 17.18 13.20
SFT-EmoQ 0.52 0.66 0.85 0.85 0.50 4.18 12.51 20.63 14.95
EmoQ-head 0.43 0.36 0.67 0.77 0.33 3.58 11.54 4.63 10.96
w/o history 0.42 0.65 0.84 0.83 0.43 0.95 6.69 4.11 1.34
w/o desc 0.48 0.60 0.84 0.84 0.46 4.07 12.57 9.5 12.83
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.61 0.66 0.85 0.85 0.54 4.33 12.65 21.85 15.87
SFT-EmoQ 0.57 0.65 0.83 0.84 0.49 4.04 12.35 20.03 13.37
EmoQ-head 0.48 0.45 0.73 0.79 0.38 4.05 12.29 5.48 12.11
w/o history 0.51 0.63 0.85 0.83 0.45 1.36 7.45 3.77 1.47
w/o desc 0.55 0.60 0.84 0.85 0.50 4.36 13 11.38 14.75
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.67 0.64 0.85 0.84 0.51 3.86 11.95 17.72 12.00
SFT-EmoQ 0.63 0.65 0.85 0.84 0.53 4.07 12.45 18.88 13.26
EmoQ-head 0.56 0.44 0.71 0.79 0.37 3.59 11.81 6.16 12.14
w/o history 0.58 0.65 0.85 0.84 0.52 1.71 8.11 1.67 0.85
w/o desc 0.60 0.50 0.84 0.81 0.36 3.96 12.65 9.06 12.58
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.71 0.63 0.84 0.83 0.50 4.39 12.89 16.82 14.23
SFT-EmoQ 0.69 0.66 0.83 0.85 0.54 4.3 12.4 22.72 15.24
EmoQ-head 0.60 0.40 0.67 0.76 0.36 3.97 12.91 4.94 13.45
w/o history 0.61 0.49 0.66 0.83 0.47 0.71 6.27 2.14 0.58
w/o desc 0.67 0.60 0.82 0.83 0.50 4.39 13.12 8.65 14.79
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.79 0.65 0.84 0.83 0.52 4.34 12.44 24.38 15.66
SFT-EmoQ 0.78 0.60 0.82 0.80 0.48 3.99 12.3 16.66 12.65
EmoQ-head 0.66 0.46 0.73 0.75 0.38 4.14 12.77 5.75 13.34
w/o history 0.69 0.64 0.82 0.78 0.40 1.17 7.33 1.33 1.6
w/o desc 0.73 0.41 0.82 0.76 0.33 3.89 12.34 7.13 12.43
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.88 0.48 0.78 0.60 0.47 4.34 12.44 24.38 15.66
SFT-EmoQ 0.88 0.51 0.80 0.54 0.39 3.84 12.27 18.9 11.99
EmoQ-head 0.75 0.41 0.69 0.51 0.36 4.00 12.44 5.9 13.23
w/o history 0.65 0.29 0.47 0.47 0.30 0.91 4.37 1.51 1.89
w/o desc 0.69 0.29 0.66 0.48 0.32 4.06 12.26 8.71 12.99
w =
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.82 0.67 0.89 0.88 0.57 2.95 10.47 10.88 9.80
SFT-EmoQ 0.82 0.68 0.89 0.89 0.73 2.38 7.96 9.25 7.49
EmoQ-head 0.60 0.46 0.73 0.82 0.37 3.13 10.22 8.29 10.84
w/o history 0.68 0.68 0.90 0.89 0.72 0.84 3.53 0.29 0.94
w/o desc 0.70 0.68 0.88 0.89 0.72 2.55 8.32 12.84 8.62
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.82 0.69 0.89 0.86 0.54 2.89 10.21 9.67 9.19
SFT-EmoQ 0.82 0.74 0.91 0.87 0.54 2.36 7.96 9.31 7.54
EmoQ-head 0.60 0.41 0.70 0.76 0.36 2.70 9.44 6.42 8.73
w/o history 0.70 0.74 0.90 0.86 0.53 0.84 3.53 0.29 0.94
w/o desc 0.72 0.62 0.90 0.86 0.52 2.38 8.1 12.48 7.95
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.81 0.69 0.89 0.85 0.54 2.94 10.29 9.49 9.41
SFT-EmoQ 0.82 0.74 0.90 0.87 0.55 2.26 7.82 8.77 7.19
EmoQ-head 0.61 0.40 0.70 0.75 0.35 2.76 9.53 7.48 8.98
w/o history 0.69 0.74 0.90 0.87 0.55 0.84 3.55 0.29 0.94
w/o desc 0.73 0.74 0.90 0.87 0.55 2.38 8.12 12.48 7.95
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.81 0.68 0.88 0.85 0.54 2.92 10.23 9.5 9.21
SFT-EmoQ 0.81 0.74 0.90 0.87 0.56 2.95 10.47 12.55 10.02
EmoQ-head 0.58 0.43 0.73 0.75 0.36 2.99 10.56 8.3 9.91
w/o history 0.71 0.74 0.90 0.86 0.52 0.84 3.52 0.29 0.94
w/o desc 0.70 0.56 0.90 0.82 0.44 3.21 11.64 6.99 10.2
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.86 0.69 0.88 0.85 0.54 2.89 10.19 9.86 9.32
SFT-EmoQ 0.81 0.75 0.90 0.84 0.49 2.3 7.79 8.67 7.23
EmoQ-head 0.60 0.44 0.73 0.74 0.35 2.89 9.91 6.41 9.89
w/o history 0.61 0.76 0.90 0.84 0.49 0.84 3.53 0.29 0.94
w/o desc 0.67 0.70 0.90 0.85 0.51 2.38 8.1 12.48 7.95
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.91 0.74 0.87 0.86 0.62 3.28 11.47 11.36 11.00
SFT-EmoQ 0.81 0.74 0.88 0.83 0.48 2.30 7.76 8.66 7.23
EmoQ-head 0.70 0.32 0.73 0.68 0.27 2.75 9.24 4.53 8.99
w/o history 0.80 0.75 0.87 0.79 0.43 1.87 7.62 1.64 1.87
w/o desc 0.81 0.75 0.87 0.79 0.43 3.21 11.24 6.99 10.2
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.85 0.71 0.92 0.60 0.47 2.57 9.26 11.04 7.97
SFT-EmoQ 0.80 0.71 0.91 0.58 0.45 2.36 7.94 9.31 7.54
EmoQ-head 0.66 0.42 0.88 0.50 0.34 2.88 10.09 6.27 8.6
w/o history 0.74 0.21 0.75 0.46 0.29 0.65 7.26 0.94 0.24
w/o desc 0.75 0.25 0.75 0.44 0.27 3.08 10.57 6.19 8.02
w =
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.53 0.46 0.65 0.87 0.40 19.43 30.42 44.57 39.10
SFT-EmoQ 0.32 0.45 0.64 0.87 0.42 20.06 31.06 45.77 38.88
EmoQ-head 0.42 0.33 0.52 0.79 0.34 7.58 17.64 17.79 39.45
w/o history 0.32 0.34 0.56 0.83 0.35 0.36 7.07 0.05 0.01
w/o desc 0.31 0.43 0.65 0.87 0.34 13.51 24.45 14.25 31.07
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.58 0.53 0.71 0.86 0.47 20 31.49 43.75 39.2
SFT-EmoQ 0.52 0.51 0.66 0.86 0.44 20.06 31.09 45.47 38.75
EmoQ-head 0.46 0.35 0.54 0.79 0.32 7.57 17.55 15.58 39.27
w/o history 0.35 0.28 0.50 0.82 0.32 0.36 7.17 0.09 0
w/o desc 0.37 0.48 0.69 0.85 0.39 12.98 24.06 16.25 38.58
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.61 0.55 0.70 0.86 0.47 20.19 31.66 44.78 39.00
SFT-EmoQ 0.44 0.52 0.67 0.86 0.45 20.2 31.4 43.68 40.24
EmoQ-head 0.47 0.25 0.43 0.77 0.28 7.77 17.32 16.26 30.51
w/o history 0.35 0.42 0.62 0.84 0.36 0.44 4.41 0.01 0.82
w/o desc 0.37 0.45 0.64 0.84 0.37 12.71 23.68 16.82 25.85
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.67 0.54 0.70 0.86 0.47 19.56 30.7 44.8 39.12
SFT-EmoQ 0.62 0.54 0.68 0.86 0.43 19.68 31.08 46.4 34.79
EmoQ-head 0.55 0.29 0.51 0.79 0.31 7.34 16.48 20.79 37.77
w/o history 0.43 0.38 0.50 0.82 0.43 0.1 0.35 0 0
w/o desc 0.41 0.45 0.66 0.84 0.38 12.88 23.81 15.88 36.79
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.75 0.54 0.71 0.85 0.45 20.1 31.65 42.04 38.89
SFT-EmoQ 0.73 0.52 0.68 0.85 0.43 19.07 29.86 45.51 31.58
EmoQ-head 0.59 0.33 0.60 0.77 0.34 7.88 17.95 19.78 39.99
w/o history 0.58 0.38 0.53 0.80 0.35 0.37 7.08 0.05 0.01
w/o desc 0.46 0.49 0.66 0.84 0.41 12.92 23.91 16.31 26.81
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.83 0.58 0.79 0.84 0.48 19.49 30.77 45.2 39.24
SFT-EmoQ 0.69 0.49 0.67 0.80 0.38 19.66 30.99 47.8 35.51
EmoQ-head 0.69 0.25 0.44 0.72 0.25 8.24 18.06 17.94 33.53
w/o history 0.60 0.46 0.64 0.79 0.43 0.22 1.19 0 0.22
w/o desc 0.56 0.33 0.61 0.79 0.30 12.3 23.01 17.63 23.72
Model Emotion Response
Reward Recall@3 Recall@5 NDCG MRR B-2 R-L D-2 CIDEr
Self-EmoQ 0.88 0.38 0.54 0.52 0.38 19.87 31.56 40.43 39.55
SFT-EmoQ 0.88 0.49 0.66 0.52 0.38 19.62 31.00 47.54 35.55
EmoQ-head 0.78 0.26 0.44 0.44 0.28 7.98 17.97 15.98 31.24
w/o history 0.65 0.23 0.57 0.39 0.21 1.96 4.87 0 0.45
w/o desc 0.62 0.30 0.48 0.39 0.22 12.57 23.04 17.08 24.88

The table above reports the results of the ablation study. Replacing our token-logit-based DQN with an additional MLP head applied after the model output (EmoQ-head) degrades every ranking metric (Recall@3, Recall@5, NDCG, and MRR). This confirms that the Q-function modeling adopted in Self-EmoQ is more effective.
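To make the token-logit-based Q-function and the ranking metrics concrete, here is a minimal sketch. It assumes each candidate emotion label has already been scored by the LLM as a list of per-token log-probabilities (the `toy_logprobs` values are invented for illustration), and it uses the standard single-gold-label definitions of Recall@k, MRR, and NDCG, which may differ in detail from the evaluation code behind the table.

```python
import math

def q_values(label_token_logprobs):
    """Q-value of each emotion label = mean of its token log-probs."""
    return {label: sum(lps) / len(lps)
            for label, lps in label_token_logprobs.items()}

def rank_labels(q):
    """Order labels from highest to lowest Q-value."""
    return sorted(q, key=q.get, reverse=True)

def recall_at_k(ranking, gold, k):
    """1.0 if the gold label is among the top-k labels, else 0.0."""
    return 1.0 if gold in ranking[:k] else 0.0

def reciprocal_rank(ranking, gold):
    """1 / rank of the gold label (0.0 if it is missing)."""
    return 1.0 / (ranking.index(gold) + 1) if gold in ranking else 0.0

def ndcg(ranking, gold):
    """NDCG with a single relevant label: the ideal DCG is 1."""
    if gold not in ranking:
        return 0.0
    return 1.0 / math.log2(ranking.index(gold) + 2)

# Hypothetical per-token log-probs for four candidate labels.
toy_logprobs = {
    "joy": [-0.2, -0.4],
    "sadness": [-1.5, -0.9, -1.2],
    "anger": [-2.0],
    "surprise": [-0.8, -0.6],
}

q = q_values(toy_logprobs)
ranking = rank_labels(q)   # greedy action = ranking[0]
gold = "surprise"          # hypothetical reference emotion
metrics = {
    "recall@1": recall_at_k(ranking, gold, 1),
    "recall@3": recall_at_k(ranking, gold, 3),
    "mrr": reciprocal_rank(ranking, gold),
    "ndcg": ndcg(ranking, gold),
}
```

Averaging over tokens keeps labels of different token lengths comparable. The EmoQ-head variant would instead map a hidden state to one scalar per label through a separate MLP.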

Human Evaluation

Method Fluency p Emotion p Acceptance p Effectiveness p Sensitivity p Alignment p Satisfaction p
Prompt 0-shot 3.09(1.35) <0.01 3.09(1.23) <0.01 2.68(1.11) <0.01 2.91(1.16) <0.01 2.91(1.28) <0.01 2.82(1.24) <0.01 2.92(0.53) <0.01
ECoT 3.08(1.25) <0.01 3.08(1.26) <0.01 2.77(1.10) <0.01 2.67(1.13) <0.01 3.00(1.11) <0.01 2.73(1.37) <0.01 2.89(0.47) <0.01
PS 3.12(1.09) <0.01 3.21(1.24) <0.01 2.78(1.27) <0.01 2.99(1.09) <0.05 2.83(1.17) <0.01 2.96(1.32) <0.01 2.98(0.52) <0.01
MP 3.14(1.43) <0.01 3.13(1.24) <0.01 2.74(1.15) <0.01 2.67(1.13) <0.01 2.77(1.2) <0.01 2.88(1.06) <0.01 2.89(0.52) <0.01
Finetune SFT 3.15(1.12) <0.01 3.40(1.30) <0.01 2.89(0.97) <0.01 2.71(1.34) <0.01 2.91(1.04) <0.01 3.05(1.30) <0.01 3.02(0.51) <0.01
FSM 3.36(1.36) <0.01 3.47(1.21) <0.01 2.96(1.18) <0.01 3.02(1.14) 0.08 3.09(1.25) <0.01 3.18(1.18) 0.13 3.18(0.53) <0.01
RL EMDP 3.48(1.20) 0.08 3.38(1.34) <0.01 2.79(1.18) <0.01 2.69(1.45) <0.01 2.99(1.40) <0.01 3.13(1.37) 0.06 3.08(0.48) <0.01
Self-EmoQ 3.60(1.32) -- 3.86(1.11) -- 3.21(1.17) -- 3.14(1.23) -- 3.30(1.26) -- 3.27(1.20) -- 3.40(0.48) --
SFT-EmoQ 3.54(1.40) 0.28 3.79(1.16) 0.20 3.10(1.25) 0.09 3.05(1.28) 0.18 3.26(1.33) 0.32 3.22(1.29) 0.28 3.33(0.46) <0.05

We invited 10 interns as evaluators for the human evaluation. We sampled 10 dialogue sessions from each test set, and each evaluator independently scored all generated responses. The final score for each method was obtained by averaging the evaluators’ ratings, as reported in the table above.

Regarding statistical significance, we conducted t-tests on the average scores of each method against Self-EmoQ. The resulting p-values are also shown in the table, under the null hypothesis \( H_0: \text{Metric}_X > \text{Metric}_{\text{Self-EmoQ}} \). The results indicate that our method achieves statistically significant improvements over most baseline methods. The evaluation principles are presented below.
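As an illustration of the scoring and significance procedure, the sketch below averages ratings and runs a one-sided two-sample test. The exact t-test variant is not specified here, so Welch's unequal-variance test is an assumption, the p-value uses a normal approximation to keep the example dependency-free, and the two rating lists are invented.

```python
import math
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) of a list of ratings."""
    return mean(scores), stdev(scores)

def welch_t(baseline, ours):
    """Welch t statistic and degrees of freedom for H1: mean(ours) > mean(baseline)."""
    va = stdev(baseline) ** 2 / len(baseline)
    vb = stdev(ours) ** 2 / len(ours)
    t = (mean(ours) - mean(baseline)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(baseline) - 1) + vb ** 2 / (len(ours) - 1))
    return t, df

def one_sided_p_normal(t):
    """Upper-tail p-value under a normal approximation (rough for small samples)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

# Hypothetical 1-5 ratings from ten evaluators for two methods.
baseline_scores = [3, 2, 3, 4, 3, 2, 3, 3, 2, 3]
ours_scores = [4, 3, 4, 5, 4, 3, 4, 4, 3, 4]

baseline_mean, baseline_std = summarize(baseline_scores)
ours_mean, ours_std = summarize(ours_scores)
t_stat, dof = welch_t(baseline_scores, ours_scores)
p_value = one_sided_p_normal(t_stat)
```

For exact small-sample p-values, `scipy.stats.ttest_ind(ours_scores, baseline_scores, equal_var=False, alternative="greater")` performs the same test with the t distribution.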

Principle of Human Scoring

We start with the criteria proposed by Kang et al. (2024). The human evaluation is designed to align with the ultimate goal of emotional support conversations (ESC), namely the seeker's satisfaction. To achieve this goal, the supporter’s behavior is evaluated according to the following criteria:

  • Acceptance: Whether the seeker can accept the response without discomfort.
  • Effectiveness: Whether the response helps shift negative emotions or attitudes toward a positive direction.
  • Sensitivity: Whether the response takes into account the seeker’s overall emotional state.
  • Alignment: Whether the response aligns with the predicted strategy.


To enable a more fine-grained assessment of generation quality, we further introduce the following dimensions:

  • Fluency: The overall fluency and linguistic quality of the response.
  • Emotion: The emotional intensity expressed in the response and its influence on the seeker.
  • Interesting: Whether the response can arouse the seeker’s interest or curiosity through vivid or engaging expressions.

Intern evaluators are asked to rate model outputs across multiple aspects, including Fluency, Emotion, Interesting, and Satisfaction, where Satisfaction encompasses Acceptance, Effectiveness, Sensitivity, and overall satisfaction.


Throughout the evaluation process, we strictly adhere to international regulations and ethical standards, ensuring compliance with established guidelines regarding participant involvement and data integrity. All evaluators independently assess each sample according to pre-defined criteria, maintaining objectivity, consistency, and reliability.

Detailed manual scoring criteria are listed below:

Fluency

  1. Highly incoherent; extremely difficult to understand.
  2. Significant incoherence; only fragments are meaningful.
  3. Some incoherence, but the general meaning is still conveyed.
  4. Mostly fluent with minor errors or awkwardness.
  5. Perfectly fluent, clear, and error-free.

Emotion

  1. Emotionally inappropriate or chaotic.
  2. Obvious emotional flaws or exaggeration.
  3. Average emotional expression with limited depth.
  4. Good emotional expression with appropriate intensity.
  5. Excellent, nuanced, and highly appropriate emotional expression.

Acceptance

  1. Strong emotional resistance is triggered.
  2. High likelihood of emotional resistance.
  3. Possible emotional resistance.
  4. Rare emotional resistance.
  5. No emotional resistance.

Effectiveness

  1. Worsens the seeker’s emotional distress.
  2. Potentially increases emotional stress.
  3. Fails to change emotional intensity.
  4. Partially effective but overly complex or unclear.
  5. Highly effective in soothing emotions and providing support.

Sensitivity

  1. Incorrect assessment of the seeker’s state.
  2. Rash judgment without sufficient exploration.
  3. One-sided understanding of the seeker’s state.
  4. Partial understanding of the seeker’s situation.
  5. Accurate and well-tailored understanding.

Alignment

  1. Completely contradicts the predicted strategy.
  2. Slight deviation from the predicted strategy.
  3. Ambiguous alignment.
  4. Largely aligned with minor ambiguities.
  5. Fully consistent with the predicted strategy.

Satisfaction

  1. Extremely disappointing and unhelpful.
  2. Poor and incomplete response.
  3. Adequate but unremarkable.
  4. Clear and helpful with useful details.
  5. Excellent, comprehensive, and insightful.

TTS

TTS Metrics

We evaluate the quality of the generated speech using SpeechBERTScore (BERT) and PESQ. SpeechBERTScore computes token-level similarity between the generated speech and the reference utterance in a shared embedding space, enabling reference-aware evaluation of semantic consistency while remaining robust to surface-level variations in acoustic realization. PESQ is a reference-aware objective metric of perceptual speech quality; it assumes that the generated and reference speech signals are time-aligned. As shown in the results, our method consistently outperforms the baselines on both metrics, indicating better semantic preservation and perceptual speech quality.
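The token-level similarity behind SpeechBERTScore can be sketched as a BERTScore-style precision over frame embeddings. This is an illustrative toy, not the official implementation: real features would come from a self-supervised speech encoder, whereas the 2-D vectors below are invented, and the exact variant (precision, recall, or F1) used for the reported numbers is not stated here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bertscore_precision(gen_feats, ref_feats):
    """Average, over generated frames, of the best cosine match in the reference."""
    best = [max(cosine(g, r) for r in ref_feats) for g in gen_feats]
    return sum(best) / len(best)

# Hypothetical frame embeddings (one vector per speech frame).
ref_feats = [[1.0, 0.0], [0.0, 1.0]]
gen_feats = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

score = bertscore_precision(gen_feats, ref_feats)
self_score = bertscore_precision(ref_feats, ref_feats)
```

Because each generated frame is matched to its most similar reference frame, the score does not require the two signals to have the same length, in contrast to PESQ's time-alignment assumption.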

TTS Case

history user: (no emotion) Hi , Monica , how is everything going ?
assistant: (no emotion) Everything goes well , but I am thinking about quitting my current job .
user: (surprise) Why ? You ’ re not satisfied anymore ?
assistant: (no emotion) I just sense . But I cannot grow anymore . My boss is not really supporting me . I am interested in some positions in other JV companies , but I need to do some more in-depth research before I send my application letters out .
user: (happiness) That is important . Doing research on a company you are interested in will definitely help your application .
Method Emotion Generated text Streaming TTS
Source Data happiness Certainly , it is very nice talking with you. But I really have to go now. Catch you later.
FSM No emotion Thank you for your advice.
CoCT surprise That's a good point, I didn't think about the importance of researching the company before applying.
Self-EmoQ happiness I will also need to update my resume. I ’ Ve heard that the most important thing in a resume is to make it short and concise.
history Joey Tribbiani: (Neutral) Hello.
Chandler Bing: (Scared) Hey, it's me. I know you can't stand to be in the same room as me, so I just thought I'd try and apologize over the phone. All I...
Joey Tribbiani: (Neutral) Hello.
Chandler Bing: (Mad) Look I never should have kissed your girlfriend, but I'm...
Method Emotion Generated text Streaming TTS
Source Data Mad Stop callin'!!
FSM Neutral I don't want to hear it.
CoCT Anger Oh, so now you're apologizing? You think a simple phone call is going to make everything okay? You kissed my girlfriend, Chandler. That's not something you just apologize for over the phone.
Self-EmoQ Mad I don't want to hear it.
history Joey: (sadness) God, it's gonna so weird like when I come home and you're not here.
Joey: (sadness) No more Joey and Chan's.
Joey: (sadness) No more J and C's.
Joey: (neutral) "You wanna go over to Joey and Chandler's?"
Joey: (neutral) "Can't, its not there."
Chandler: (neutral) Look, I'm just gonna be across the hall, we can still do all the same stuff.
Method Emotion Generated text Streaming TTS
Source Data sadness Yeah but we won't be able to like get up in the middle of the night and have those long talks about our feelings and the future.
FSM sadness We're all just gonna go with the specials board.
CoCT Neutral Oh, okay, that sounds cool, we can still hang out.
Self-EmoQ sadness Yeah, but it's not gonna be the same.
history ...
male: (Sadness) I felt, what you said, ashamed somehow
male: (Sadness) because nobody was changed at all. It seemed to make suckers out of a lot of guys.
male: (Sadness) I felt wrong to be alive, to open the new bank book, to drive the new car, to see the new refrigerator.
male: (Frustration) I mean you can take those things out of war but
male: (Sadness) when you drive the new car, you have to know that it came out of the love a man can have for a man. You've got to be a little better because of that.
male: (Sadness) Otherwise what you have is really just loot and there's blood on it. I didn't want to take any of it.
male: (Sadness) I guess that included you.
female: (Neutral) And do you still feel that way?
male: (Neutral) I want you now, Annie.
female: (Anger) Because you mustn't feel that way anymore because you have a right to whatever you have, Chris, do you understand, everything
female: (Anger) that includes me. And your money there's nothing wrong in your money. Your father put hundreds of planes in the air.
Method Emotion Generated text Streaming TTS
Source Data Anger You should be proud. A man should be paid for that.
FSM Neutral Yes, thank you.
CoCT Happiness Oh, Chris, I've missed you too."
Self-EmoQ Anger You can't think like that.

The last two columns of the table present the TTS results of the baseline methods and Self-EmoQ. The Emotion TTS column shows speech generated by combining the outputs of each model with their corresponding emotional states. The final column (Streaming TTS) reflects the streaming speech output used in our practical application. Since all baseline methods generate text before emotion, emotional information is unavailable during streaming synthesis; therefore, their streaming outputs are emotion-agnostic. In contrast, Self-EmoQ produces identical outputs under both settings.
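The ordering argument above can be sketched with plain Python generators. Every function here (`plan_emotion`, `generate_chunks`, and the toy stand-ins) is a hypothetical placeholder rather than part of the released system; the point is only to show when the emotion label becomes available relative to the streamed text chunks.

```python
def emotion_first(plan_emotion, generate_chunks, history):
    """Emotion is planned before generation, so every chunk carries it."""
    emotion = plan_emotion(history)
    for chunk in generate_chunks(history, emotion):
        yield chunk, emotion  # TTS can synthesize this chunk with emotion

def text_first(generate_chunks, classify_emotion, history):
    """Emotion is recognized after generation, too late for streamed chunks."""
    chunks = []
    for chunk in generate_chunks(history, None):
        chunks.append(chunk)
        yield chunk, None  # emotion is still unknown at streaming time
    # The label only exists once the full text is available.
    _late_emotion = classify_emotion("".join(chunks))

# Toy stand-ins for the planner, the LLM, and the emotion recognizer.
def toy_plan(history):
    return "sadness"

def toy_generate(history, emotion):
    return iter(["Yeah, but", " it's not gonna", " be the same."])

def toy_classify(text):
    return "sadness"

ours = list(emotion_first(toy_plan, toy_generate, history=[]))
baseline = list(text_first(toy_generate, toy_classify, history=[]))
```

In the emotion-first path the label is attached from the first chunk onward, which is why the streaming and non-streaming outputs coincide; in the text-first path the streamed requests are necessarily emotion-agnostic.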

Conclusion

We propose Self-EmoQ, an emotion-planning dialogue framework that formulates self-emotion as a reinforcement learning action determined prior to response generation. Based on Plutchik's Wheel of Emotions, our theory-driven reward enables long-term emotional planning beyond supervised labels, while a streaming-compatible design allows the planned emotion to consistently guide both text generation and emotional TTS. Experiments show that our method outperforms the baselines on multiple datasets.

BibTeX

@inproceedings{zhao-etal-2026-selfemoq,
    title     = {Self-EmoQ: Plutchik-Guided Value-based Planning to Drive Streaming Emotional TTS},
    author    = {Zhao, Yue and Li, Hongyan and Chen, Yong and Ji, Luo},
    booktitle = {Findings of the Association for Computational Linguistics: ACL 2026},
    year      = {2026},
    publisher = {Association for Computational Linguistics}
}