EmoQ Project Page

Automatic Metrics

w =

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.60	0.58	0.70	0.85	0.55	3.53	11.48	40.81	3.63
	ECoT	0.18	0.51	0.64	0.80	0.42	0.86	3.57	14.13	0.51
	PS	0.65	0.64	0.76	0.85	0.55	2.40	6.73	35.97	1.41
	MP	0.61	0.62	0.74	0.85	0.54	1.30	4.97	13.4	1.17
Finetune	SFT	0.74	0.71	0.82	0.86	0.50	6.27	21.82	51.76	22.06
Finetune	FSM	0.69	0.68	0.79	0.86	0.51	6.32	21.9	50.35	21.76
RL	EMDP	0.73	0.78	0.86	0.85	0.52	6.66	20.26	51.08	24.78
	Self-EmoQ	0.78	0.74	0.89	0.90	0.66	9.11	25.35	54.71	41.72
	SFT-EmoQ	0.67	0.74	0.89	0.89	0.66	9.02	25.28	54.77	41.13

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.55	0.52	0.62	0.85	0.49	3.53	11.49	40.81	3.63
	ECoT	0.16	0.52	0.65	0.79	0.41	0.86	3.57	14.13	0.51
	PS	0.61	0.62	0.72	0.85	0.56	2.40	6.73	35.97	1.41
	MP	0.57	0.58	0.68	0.85	0.53	1.30	4.97	13.4	1.17
Finetune	SFT	0.70	0.74	0.84	0.86	0.62	6.27	21.85	51.76	22.06
Finetune	FSM	0.66	0.70	0.79	0.86	0.61	6.32	21.89	50.35	21.76
RL	EMDP	0.65	0.83	0.88	0.85	0.71	6.66	20.26	51.08	24.78
	Self-EmoQ	0.73	0.76	0.90	0.89	0.65	9.11	25.35	54.72	41.73
	SFT-EmoQ	0.55	0.78	0.90	0.90	0.64	9.04	25.29	54.78	41.34

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.51	0.52	0.62	0.85	0.49	3.53	11.48	40.81	3.63
	ECoT	0.15	0.51	0.65	0.80	0.41	0.86	3.57	14.13	0.51
	PS	0.56	0.62	0.72	0.85	0.56	2.40	6.73	35.97	1.41
	MP	0.53	0.58	0.68	0.85	0.53	1.30	4.97	13.4	1.17
Finetune	SFT	0.66	0.75	0.84	0.86	0.63	6.27	21.83	51.76	22.06
Finetune	FSM	0.62	0.70	0.79	0.86	0.61	6.32	21.92	50.35	21.76
RL	EMDP	0.57	0.83	0.88	0.85	0.71	6.66	20.28	51.08	24.78
	Self-EmoQ	0.69	0.78	0.90	0.90	0.65	9.11	25.33	54.72	41.73
	SFT-EmoQ	0.54	0.78	0.91	0.90	0.65	9.04	25.26	54.78	41.34

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.46	0.52	0.61	0.85	0.50	3.53	11.47	40.81	3.63
	ECoT	0.14	0.51	0.65	0.79	0.41	0.86	3.57	14.13	0.51
	PS	0.52	0.62	0.72	0.85	0.56	2.40	6.73	35.97	1.41
	MP	0.48	0.58	0.67	0.85	0.53	1.30	4.97	13.4	1.17
Finetune	SFT	0.62	0.75	0.84	0.87	0.63	6.27	21.79	51.76	22.06
Finetune	FSM	0.59	0.70	0.79	0.86	0.62	6.32	21.9	50.35	21.76
RL	EMDP	0.49	0.83	0.88	0.85	0.71	6.66	20.27	51.08	24.78
	Self-EmoQ	0.65	0.78	0.91	0.91	0.65	9.11	25.37	54.66	41.91
	SFT-EmoQ	0.45	0.79	0.91	0.91	0.66	9.04	25.27	54.78	41.34

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.37	0.47	0.56	0.84	0.48	3.53	11.48	40.81	3.63
	ECoT	0.10	0.51	0.65	0.79	0.41	0.86	3.57	14.13	0.51
	PS	0.43	0.60	0.68	0.86	0.57	2.40	6.73	35.97	1.41
	MP	0.40	0.56	0.63	0.85	0.53	1.30	4.97	13.4	1.17
Finetune	SFT	0.55	0.79	0.86	0.88	0.70	6.27	21.81	51.76	22.06
Finetune	FSM	0.52	0.73	0.81	0.88	0.67	6.32	21.92	50.35	21.76
RL	EMDP	0.33	0.83	0.88	0.86	0.71	6.66	20.25	51.08	24.78
	Self-EmoQ	0.57	0.82	0.92	0.92	0.72	9.11	25.34	54.72	41.73
	SFT-EmoQ	0.45	0.83	0.92	0.92	0.72	9.07	25.3	54.74	41.41

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.27	0.41	0.46	0.83	0.45	3.53	11.48	40.81	3.63
	ECoT	0.08	0.51	0.67	0.78	0.40	0.86	3.57	14.13	0.51
	PS	0.34	0.56	0.61	0.85	0.57	2.40	6.73	35.97	1.41
	MP	0.31	0.51	0.56	0.84	0.53	1.30	4.97	13.4	1.17
Finetune	SFT	0.48	0.85	0.89	0.90	0.79	6.27	21.81	51.76	22.06
Finetune	FSM	0.45	0.77	0.82	0.89	0.75	6.32	21.92	50.35	21.76
RL	EMDP	0.26	0.87	0.90	0.87	0.81	6.66	20.29	51.08	24.78
	Self-EmoQ	0.48	0.89	0.95	0.94	0.81	9.11	25.35	54.72	41.72
	SFT-EmoQ	0.48	0.89	0.95	0.93	0.81	9.04	25.23	54.78	41.34

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.13	0.40	0.41	0.64	0.46	3.53	11.48	40.81	3.63
	ECoT	0.02	0.42	0.71	0.53	0.36	0.86	3.58	14.13	0.51
	PS	0.21	0.58	0.59	0.73	0.61	2.40	6.73	35.97	1.41
	MP	0.18	0.52	0.53	0.70	0.56	1.30	4.97	13.4	1.17
Finetune	SFT	0.36	0.93	0.94	0.90	0.89	6.27	21.8	51.76	22.06
Finetune	FSM	0.34	0.84	0.85	0.88	0.84	6.32	21.89	50.35	21.76
RL	EMDP	0.21	0.89	0.90	0.85	0.83	6.66	20.26	51.08	24.78
	Self-EmoQ	0.35	0.83	0.97	0.89	0.85	9.11	25.35	54.72	41.73
	SFT-EmoQ	0.35	0.97	0.99	0.92	0.89	9.04	25.25	54.78	41.34

w =

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.44	0.45	0.68	0.83	0.50	2.08	8.84	60.62	2.52
	ECoT	0.32	0.45	0.67	0.81	0.43	0.51	2.56	19.58	0.34
	PS	0.51	0.47	0.69	0.84	0.47	1.56	5.6	53.69	1.1
	MP	0.42	0.44	0.68	0.82	0.52	0.99	4.33	33.57	0.79
Finetune	SFT	0.50	0.47	0.69	0.84	0.47	3.68	12.15	12.35	11.93
Finetune	FSM	0.51	0.45	0.69	0.84	0.52	3.89	12.24	7.84	12.26
RL	EMDP	0.09	0.42	0.67	0.84	0.59	3.26	11.18	3.53	10.76
	Self-EmoQ	0.53	0.58	0.81	0.87	0.50	4.15	12.36	18.95	13.28
	SFT-EmoQ	0.50	0.57	0.81	0.87	0.49	4.33	12.47	21.24	15.15

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.49	0.51	0.68	0.80	0.46	2.08	8.83	60.62	2.52
	ECoT	0.36	0.46	0.67	0.79	0.40	0.51	2.55	19.58	0.34
	PS	0.55	0.55	0.71	0.82	0.47	1.56	5.58	53.69	1.10
	MP	0.47	0.48	0.66	0.80	0.46	0.99	4.33	33.57	0.79
Finetune	SFT	0.46	0.56	0.72	0.82	0.50	3.68	12.23	12.35	11.93
Finetune	FSM	0.49	0.53	0.70	0.82	0.51	3.89	12.22	7.84	12.26
RL	EMDP	0.16	0.46	0.63	0.75	0.47	3.26	11.19	3.53	10.76
	Self-EmoQ	0.53	0.66	0.85	0.85	0.52	4.03	12.41	17.18	13.20
	SFT-EmoQ	0.52	0.66	0.85	0.85	0.50	4.18	12.51	20.63	14.95

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.54	0.51	0.68	0.80	0.47	2.08	8.81	60.62	2.52
	ECoT	0.40	0.45	0.68	0.78	0.39	0.51	2.55	19.58	0.34
	PS	0.59	0.55	0.71	0.82	0.47	1.56	5.58	53.69	1.10
	MP	0.51	0.48	0.66	0.80	0.46	0.99	4.33	33.57	0.79
Finetune	SFT	0.57	0.56	0.73	0.83	0.50	3.68	12.22	12.35	11.93
Finetune	FSM	0.60	0.53	0.70	0.82	0.51	3.89	12.23	7.84	12.26
RL	EMDP	0.23	0.46	0.63	0.75	0.47	3.26	11.2	3.53	10.76
	Self-EmoQ	0.61	0.66	0.85	0.85	0.54	4.33	12.65	21.85	15.87
	SFT-EmoQ	0.57	0.65	0.83	0.84	0.49	4.04	12.35	20.03	13.37

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.58	0.51	0.68	0.80	0.46	2.08	8.83	60.62	2.52
	ECoT	0.48	0.45	0.67	0.79	0.39	0.51	2.55	19.58	0.34
	PS	0.63	0.55	0.71	0.82	0.47	1.56	5.58	53.69	1.10
	MP	0.56	0.48	0.66	0.80	0.46	0.99	4.32	33.57	0.79
Finetune	SFT	0.63	0.56	0.73	0.83	0.50	3.68	12.2	12.35	11.93
Finetune	FSM	0.64	0.53	0.71	0.82	0.51	3.89	12.28	7.84	12.26
RL	EMDP	0.30	0.46	0.63	0.75	0.47	3.26	11.18	3.53	10.76
	Self-EmoQ	0.67	0.64	0.85	0.84	0.51	3.86	11.95	17.72	12.00
	SFT-EmoQ	0.63	0.65	0.85	0.84	0.53	4.07	12.45	18.88	13.26

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.67	0.52	0.69	0.80	0.47	2.08	8.85	60.62	2.52
	ECoT	0.57	0.45	0.68	0.77	0.39	0.51	2.56	19.58	0.34
	PS	0.71	0.56	0.73	0.81	0.47	1.56	5.58	53.69	1.10
	MP	0.66	0.50	0.67	0.79	0.47	0.99	4.33	33.57	0.79
Finetune	SFT	0.71	0.59	0.74	0.83	0.51	3.68	12.2	12.35	11.93
Finetune	FSM	0.70	0.56	0.72	0.82	0.51	3.89	12.25	7.84	12.26
RL	EMDP	0.44	0.46	0.63	0.75	0.47	3.26	11.22	3.53	10.76
	Self-EmoQ	0.71	0.63	0.84	0.83	0.50	4.39	12.89	16.82	14.23
	SFT-EmoQ	0.70	0.66	0.83	0.85	0.54	4.3	12.4	22.72	15.24

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.77	0.54	0.69	0.78	0.47	2.08	8.83	60.62	2.52
	ECoT	0.67	0.41	0.69	0.76	0.38	0.51	2.55	19.58	0.34
	PS	0.78	0.58	0.73	0.79	0.48	1.56	5.59	53.69	1.10
	MP	0.76	0.53	0.68	0.78	0.47	0.99	4.33	33.57	0.79
Finetune	SFT	0.79	0.63	0.76	0.82	0.54	3.68	12.21	12.35	11.93
Finetune	FSM	0.78	0.57	0.73	0.81	0.52	3.89	12.27	7.84	12.26
RL	EMDP	0.79	0.45	0.61	0.73	0.44	3.26	11.2	3.53	10.76
	Self-EmoQ	0.79	0.65	0.84	0.83	0.52	4.34	12.44	24.38	15.66
	SFT-EmoQ	0.78	0.60	0.82	0.80	0.48	3.99	12.3	16.66	12.65

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.90	0.57	0.70	0.61	0.48	2.08	8.82	60.62	2.52
	ECoT	0.90	0.44	0.73	0.53	0.37	0.51	2.56	19.58	0.34
	PS	0.90	0.60	0.74	0.61	0.49	1.56	5.59	53.69	1.10
	MP	0.90	0.55	0.69	0.61	0.48	0.99	4.33	33.57	0.79
Finetune	SFT	0.95	0.65	0.77	0.67	0.56	3.68	12.23	12.35	11.93
Finetune	FSM	0.93	0.60	0.74	0.65	0.53	3.89	12.27	7.84	12.26
RL	EMDP	0.79	0.45	0.61	0.59	0.44	3.26	11.21	3.53	10.76
	Self-EmoQ	0.88	0.48	0.78	0.60	0.47	4.34	12.44	24.38	15.66
	SFT-EmoQ	0.88	0.51	0.80	0.54	0.39	3.84	12.27	18.9	11.99

w =

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.69	0.54	0.78	0.83	0.57	1.77	9.16	52.17	2.31
	ECoT	0.46	0.48	0.75	0.80	0.49	0.47	2.41	15.65	0.63
	PS	0.66	0.54	0.79	0.82	0.53	1.53	5.15	44.58	0.98
	MP	0.68	0.54	0.77	0.83	0.56	0.98	4.36	28.56	0.75
Finetune	SFT	0.76	0.55	0.84	0.84	0.51	3.12	10.83	21.09	10.48
Finetune	FSM	0.74	0.55	0.84	0.84	0.48	3.85	12.52	11.21	13.38
RL	EMDP	0.75	0.47	0.77	0.81	0.46	2.61	9.88	7.66	8.27
	Self-EmoQ	0.82	0.67	0.89	0.88	0.57	2.95	10.47	10.88	9.80
	SFT-EmoQ	0.82	0.68	0.89	0.89	0.73	2.38	7.96	9.25	7.49

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.72	0.54	0.71	0.81	0.50	1.77	9.16	52.17	2.31
	ECoT	0.49	0.46	0.74	0.77	0.43	0.47	2.41	15.65	0.63
	PS	0.69	0.55	0.75	0.81	0.49	1.53	5.15	44.58	0.98
	MP	0.70	0.52	0.69	0.81	0.48	0.98	4.37	28.56	0.75
Finetune	SFT	0.78	0.62	0.87	0.83	0.58	3.12	10.8	21.09	10.48
Finetune	FSM	0.77	0.63	0.88	0.82	0.57	3.85	12.53	11.21	13.38
RL	EMDP	0.76	0.55	0.79	0.74	0.51	2.61	9.88	7.66	8.27
	Self-EmoQ	0.82	0.69	0.89	0.86	0.54	2.89	10.21	9.67	9.19
	SFT-EmoQ	0.82	0.74	0.91	0.87	0.54	2.36	7.96	9.31	7.54

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.74	0.54	0.71	0.81	0.50	1.77	9.15	52.17	2.31
	ECoT	0.53	0.47	0.75	0.77	0.42	0.47	2.41	15.65	0.63
	PS	0.72	0.55	0.75	0.81	0.50	1.53	5.15	44.58	0.98
	MP	0.72	0.52	0.69	0.81	0.48	0.98	4.36	28.56	0.75
Finetune	SFT	0.81	0.62	0.87	0.83	0.58	3.12	10.81	21.09	10.48
Finetune	FSM	0.80	0.63	0.88	0.83	0.57	3.85	12.54	11.21	13.38
RL	EMDP	0.76	0.55	0.79	0.74	0.51	2.61	9.89	7.66	8.27
	Self-EmoQ	0.81	0.69	0.89	0.85	0.54	2.94	10.29	9.49	9.41
	SFT-EmoQ	0.82	0.74	0.90	0.87	0.55	2.26	7.82	8.77	7.19

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.76	0.54	0.71	0.81	0.50	1.77	9.17	52.17	2.31
	ECoT	0.57	0.46	0.75	0.77	0.42	0.47	2.42	15.65	0.63
	PS	0.74	0.55	0.75	0.81	0.50	1.53	5.15	44.58	0.98
	MP	0.74	0.52	0.69	0.80	0.48	0.98	4.36	28.56	0.75
Finetune	SFT	0.82	0.62	0.86	0.83	0.58	3.12	10.83	21.09	10.48
Finetune	FSM	0.83	0.63	0.88	0.83	0.57	3.85	12.49	11.21	13.38
RL	EMDP	0.77	0.55	0.79	0.74	0.51	2.61	9.87	7.66	8.27
	Self-EmoQ	0.81	0.68	0.88	0.85	0.54	2.92	10.23	9.5	9.21
	SFT-EmoQ	0.81	0.74	0.90	0.87	0.56	2.95	10.47	12.55	10.02

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.80	0.53	0.69	0.81	0.49	1.77	9.15	52.17	2.31
	ECoT	0.65	0.48	0.75	0.77	0.42	0.47	2.41	15.65	0.63
	PS	0.79	0.55	0.73	0.81	0.50	1.53	5.15	44.58	0.98
	MP	0.78	0.51	0.66	0.80	0.47	0.98	4.36	28.56	0.75
Finetune	SFT	0.81	0.64	0.87	0.84	0.59	3.12	10.8	21.09	10.48
Finetune	FSM	0.84	0.65	0.89	0.83	0.59	3.85	12.55	11.21	13.38
RL	EMDP	0.78	0.55	0.79	0.74	0.51	2.61	9.89	7.66	8.27
	Self-EmoQ	0.86	0.69	0.88	0.85	0.54	2.89	10.19	9.86	9.32
	SFT-EmoQ	0.81	0.75	0.90	0.84	0.49	2.3	7.79	8.67	7.23

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.85	0.50	0.64	0.79	0.48	1.77	9.14	52.17	2.31
	ECoT	0.14	0.45	0.75	0.75	0.40	0.47	2.42	15.65	0.63
	PS	0.85	0.54	0.70	0.79	0.50	1.53	5.15	44.58	0.98
	MP	0.83	0.49	0.61	0.78	0.45	0.98	4.36	28.56	0.75
Finetune	SFT	0.88	0.66	0.87	0.84	0.63	3.12	10.77	21.09	10.48
Finetune	FSM	0.89	0.67	0.89	0.83	0.63	3.85	12.54	11.21	13.38
RL	EMDP	0.79	0.57	0.79	0.74	0.54	2.61	9.9	7.66	8.27
	Self-EmoQ	0.91	0.74	0.87	0.86	0.62	3.28	11.47	11.36	11.00
	SFT-EmoQ	0.81	0.74	0.88	0.83	0.48	2.30	7.76	8.66	7.23

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.91	0.51	0.62	0.62	0.48	1.77	9.15	52.17	2.31
	ECoT	0.83	0.41	0.69	0.52	0.36	0.47	2.42	15.65	0.63
	PS	0.92	0.54	0.68	0.63	0.51	1.53	5.14	44.58	0.98
	MP	0.89	0.49	0.59	0.59	0.45	0.98	4.36	28.56	0.75
Finetune	SFT	0.90	0.69	0.87	0.72	0.65	3.12	10.79	21.09	10.48
Finetune	FSM	0.91	0.70	0.90	0.73	0.65	3.85	12.53	11.21	13.38
RL	EMDP	0.80	0.57	0.79	0.65	0.54	2.61	9.91	7.66	8.27
	Self-EmoQ	0.85	0.71	0.92	0.60	0.47	2.57	9.26	11.04	7.97
	SFT-EmoQ	0.80	0.71	0.91	0.58	0.45	2.36	7.94	9.31	7.54

w =

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.44	0.31	0.43	0.83	0.51	1.74	6.74	6.3	1.28
	ECoT	0.21	0.27	0.41	0.79	0.39	0.5	2.47	9.14	0.71
	PS	0.46	0.33	0.44	0.83	0.48	1.16	3.6	3.51	0.30
	MP	0.41	0.30	0.42	0.83	0.52	0.76	3.16	6.93	0.17
Finetune	SFT	0.39	0.33	0.44	0.83	0.48	6.52	13.27	1.8	36.83
Finetune	FSM	0.43	0.34	0.45	0.84	0.50	3.24	9.52	38.86	13.15
RL	EMDP	0.33	0.25	0.41	0.77	0.59	6.77	16.29	18.7	35.27
	Self-EmoQ	0.53	0.46	0.65	0.87	0.40	19.43	30.42	44.57	40.1
	SFT-EmoQ	0.32	0.45	0.64	0.87	0.42	20.06	31.06	45.77	38.88

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.49	0.39	0.47	0.82	0.46	1.74	6.74	6.3	1.28
	ECoT	0.27	0.29	0.42	0.78	0.34	0.5	2.47	9.14	0.71
	PS	0.51	0.42	0.49	0.83	0.46	1.16	3.59	3.51	0.30
	MP	0.46	0.37	0.46	0.82	0.45	0.76	3.16	6.93	0.17
Finetune	SFT	0.46	0.47	0.52	0.82	0.50	6.52	13.22	1.8	36.83
Finetune	FSM	0.48	0.48	0.53	0.83	0.51	3.24	9.51	38.86	13.15
RL	EMDP	0.39	0.26	0.44	0.68	0.36	6.77	16.32	18.7	35.27
	Self-EmoQ	0.58	0.53	0.71	0.86	0.47	20	31.49	43.75	39.2
	SFT-EmoQ	0.52	0.51	0.66	0.86	0.44	20.06	31.09	45.47	38.75

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.54	0.39	0.47	0.82	0.46	1.74	6.73	6.3	1.28
	ECoT	0.36	0.28	0.41	0.78	0.33	0.5	2.47	9.14	0.71
	PS	0.56	0.42	0.49	0.83	0.46	1.16	3.6	3.51	0.30
	MP	0.52	0.37	0.46	0.82	0.45	0.76	3.16	6.93	0.17
Finetune	SFT	0.51	0.47	0.52	0.83	0.50	6.52	13.25	1.8	36.83
Finetune	FSM	0.53	0.48	0.53	0.83	0.51	3.24	9.48	38.86	13.15
RL	EMDP	0.21	0.26	0.44	0.68	0.36	6.77	16.28	18.7	35.27
	Self-EmoQ	0.61	0.55	0.70	0.86	0.47	20.19	31.66	44.78	39.00
	SFT-EmoQ	0.44	0.52	0.67	0.86	0.45	20.2	31.4	43.68	40.24

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.59	0.39	0.46	0.83	0.46	1.74	6.75	6.3	1.28
	ECoT	0.42	0.28	0.41	0.78	0.34	0.5	2.46	9.14	0.71
	PS	0.61	0.42	0.49	0.83	0.46	1.16	3.6	3.51	0.30
	MP	0.57	0.37	0.46	0.82	0.45	0.76	3.16	6.93	0.17
Finetune	SFT	0.57	0.47	0.52	0.83	0.51	6.52	13.17	1.8	36.83
Finetune	FSM	0.59	0.48	0.54	0.84	0.52	3.24	9.5	38.86	13.15
RL	EMDP	0.50	0.26	0.44	0.68	0.36	6.77	16.27	18.7	35.27
	Self-EmoQ	0.67	0.54	0.70	0.86	0.47	19.56	30.7	44.8	39.12
	SFT-EmoQ	0.62	0.54	0.68	0.86	0.43	19.68	31.08	46.4	34.79

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.69	0.41	0.48	0.83	0.47	1.74	6.74	6.3	1.28
	ECoT	0.51	0.28	0.41	0.78	0.33	0.5	2.47	9.14	0.71
	PS	0.71	0.45	0.52	0.83	0.48	1.16	3.59	3.51	0.30
	MP	0.68	0.39	0.47	0.82	0.45	0.76	3.17	6.93	0.17
Finetune	SFT	0.59	0.51	0.56	0.84	0.53	6.52	13.25	1.8	36.83
Finetune	FSM	0.64	0.52	0.57	0.84	0.54	3.24	9.52	38.86	13.15
RL	EMDP	0.61	0.26	0.44	0.69	0.36	6.77	16.35	18.7	35.27
	Self-EmoQ	0.75	0.54	0.71	0.85	0.45	20.1	31.65	42.04	38.89
	SFT-EmoQ	0.73	0.52	0.68	0.85	0.43	19.07	29.86	45.51	31.58

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.80	0.50	0.55	0.81	0.51	1.74	6.73	6.3	1.28
	ECoT	0.63	0.30	0.49	0.75	0.30	0.5	2.46	9.14	0.71
	PS	0.81	0.54	0.59	0.82	0.53	1.16	3.61	3.51	0.30
	MP	0.79	0.47	0.54	0.81	0.50	0.76	3.16	6.93	0.17
Finetune	SFT	0.81	0.62	0.65	0.83	0.60	6.52	13.21	1.8	36.83
Finetune	FSM	0.83	0.64	0.67	0.84	0.62	3.24	9.52	38.86	13.15
RL	EMDP	0.72	0.34	0.53	0.67	0.39	6.77	16.29	18.7	35.27
	Self-EmoQ	0.83	0.58	0.79	0.84	0.48	19.49	30.77	45.2	39.24
	SFT-EmoQ	0.69	0.49	0.67	0.80	0.38	19.66	30.99	47.8	35.51

Model		Emotion					Response
Model		Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Prompt	0-shot	0.90	0.52	0.55	0.64	0.52	1.74	6.73	6.3	1.28
	ECoT	0.80	0.32	0.51	0.46	0.30	0.5	2.47	9.14	0.71
	PS	0.96	0.56	0.60	0.64	0.54	1.16	3.61	3.51	0.30
	MP	0.95	0.49	0.55	0.63	0.51	0.76	3.16	6.93	0.17
Finetune	SFT	0.97	0.66	0.67	0.70	0.62	6.52	13.24	1.8	36.83
Finetune	FSM	0.98	0.67	0.69	0.72	0.65	3.24	9.54	38.86	13.15
RL	EMDP	0.88	0.35	0.53	0.54	0.39	6.77	16.28	18.7	35.27
	Self-EmoQ	0.88	0.38	0.54	0.52	0.38	19.87	31.56	40.43	39.55
	SFT-EmoQ	0.88	0.49	0.66	0.52	0.38	19.62	31.00	47.54	35.55

The tables above report the automatic metrics for emotion determination and response generation, respectively. Across all four datasets, Self-EmoQ consistently achieves the highest or near-highest performance on the reward and ranking metrics.
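As a reading aid for the ranking columns (Recall@3, Recall@5, NDCG, MRR), the following is a minimal sketch of how such metrics can be computed for one turn. It assumes a single gold emotion label and binary relevance; the page does not say how the reported numbers were computed (NDCG in particular may use graded relevance), and the function names and toy ranking are hypothetical.

```python
import math
from typing import Sequence


def recall_at_k(ranked: Sequence[str], gold: str, k: int) -> float:
    """Return 1.0 if the gold label is among the top-k ranked labels, else 0.0."""
    return 1.0 if gold in ranked[:k] else 0.0


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1 / rank of the gold label (1-indexed), or 0.0 if it is absent."""
    for rank, label in enumerate(ranked, start=1):
        if label == gold:
            return 1.0 / rank
    return 0.0


def ndcg_single_gold(ranked: Sequence[str], gold: str) -> float:
    """NDCG with binary relevance and one relevant item.

    The ideal DCG is 1 / log2(2) = 1, so NDCG reduces to 1 / log2(rank + 1).
    """
    for rank, label in enumerate(ranked, start=1):
        if label == gold:
            return 1.0 / math.log2(rank + 1)
    return 0.0


# Hypothetical ranking of candidate emotions for one turn (not taken from the paper).
ranked = ["joy", "trust", "sadness", "anger", "fear"]
gold = "sadness"

r1 = recall_at_k(ranked, gold, 1)    # gold is not the top-1 label
r3 = recall_at_k(ranked, gold, 3)    # gold is within the top 3
rr = reciprocal_rank(ranked, gold)   # gold sits at rank 3
nd = ndcg_single_gold(ranked, gold)  # 1 / log2(4)
```

Dataset-level scores would then be the mean of these per-turn values; MRR is the mean of the reciprocal ranks.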

Loss curves

The above four figures illustrate the training loss curves of Self-EmoQ on four datasets. We observe that the training process is stable across all datasets, with the loss consistently decreasing and converging without oscillation. The Plutchik score does not introduce training instability.

Transition matrix

The above four figures visualize the emotion transition matrices. We observe that high-probability transitions concentrate along the diagonal and between adjacent emotions, which aligns well with the topology of Plutchik’s Wheel of Emotion.
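The observation above can be checked numerically with a sketch like the following. It estimates a row-normalized transition matrix from emotion sequences and measures how much probability mass falls on transitions within one step on the wheel. The label set (Plutchik's eight primary emotions in wheel order), the helper names, and the toy sequences are assumptions for illustration; the datasets on this page may use different label inventories.

```python
from collections import Counter
from typing import List, Sequence

# Plutchik's eight primary emotions, listed in wheel order.
WHEEL = ["joy", "trust", "fear", "surprise",
         "sadness", "disgust", "anger", "anticipation"]


def wheel_distance(a: str, b: str) -> int:
    """Number of steps between two emotions around the wheel (0 to 4)."""
    d = abs(WHEEL.index(a) - WHEEL.index(b))
    return min(d, len(WHEEL) - d)


def transition_matrix(sequences: Sequence[Sequence[str]]) -> List[List[float]]:
    """Row-normalized counts of consecutive emotion pairs; empty rows stay all zero."""
    counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    matrix = []
    for a in WHEEL:
        total = sum(counts[(a, b)] for b in WHEEL)
        matrix.append([counts[(a, b)] / total if total else 0.0 for b in WHEEL])
    return matrix


def near_mass(matrix: List[List[float]], max_dist: int = 1) -> float:
    """Mean, over non-empty rows, of the probability mass within max_dist wheel steps."""
    masses = []
    for i, row in enumerate(matrix):
        if sum(row) == 0:
            continue
        masses.append(sum(p for j, p in enumerate(row)
                          if wheel_distance(WHEEL[i], WHEEL[j]) <= max_dist))
    return sum(masses) / len(masses) if masses else 0.0


# Hypothetical emotion trajectories (not from the datasets).
toy = [["joy", "joy", "trust", "fear"], ["sadness", "sadness", "disgust"]]
T = transition_matrix(toy)
mass = near_mass(T)
```

A value of `near_mass` close to 1 corresponds to the pattern described above, where transitions concentrate on the diagonal and on neighbouring emotions.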

Sensitivity analysis

The above two figures illustrate the sensitivity of the parameters \(w\) and \(\gamma\). We observe that the optimal value of \(\gamma\) varies across datasets and correlates with their average dialogue length. Datasets with longer conversational trajectories, such as IEMOCAP, tend to favor larger \(\gamma\) values, as long-term emotional consistency becomes more important. In contrast, DailyDialog, which consists of shorter interactions, achieves optimal performance with a smaller \(\gamma\).
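To make the link between \(\gamma\) and dialogue length concrete, here is a small sketch that assumes \(\gamma\) acts as a standard reinforcement-learning discount factor (the page does not state this explicitly). Under that assumption, a reward \(t\) steps ahead is weighted by \(\gamma^t\), and \(1/(1-\gamma)\) is a common rule-of-thumb for the effective planning horizon. The reward values are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * rewards[t], accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb horizon 1 / (1 - gamma); requires 0 <= gamma < 1."""
    return 1.0 / (1.0 - gamma)


# Hypothetical per-turn rewards for a three-turn dialogue.
rewards = [1.0, 1.0, 1.0]
g_small = discounted_return(rewards, 0.5)  # 1 + 0.5 + 0.25
g_large = discounted_return(rewards, 0.9)  # 1 + 0.9 + 0.81

h_small = effective_horizon(0.5)
h_large = effective_horizon(0.9)
```

With a small \(\gamma\), later turns contribute little to the return, which is adequate for short dialogues; a larger \(\gamma\) keeps distant turns relevant, which matters more when conversations are long.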

Ablation

w =

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.78	0.74	0.89	0.90	0.66	9.11	25.35	54.71	41.72
SFT-EmoQ	0.67	0.74	0.89	0.89	0.66	9.02	25.28	54.77	41.13
EmoQ-head	0.77	0.35	0.68	0.77	0.35	4.80	17.51	43.37	41.13
w/o history	0.49	0.74	0.89	0.90	0.66	2.48	12.07	0.03	31.82
w/o desc	0.62	0.75	0.89	0.90	0.59	5.77	20.37	46.78	0.00

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.73	0.76	0.90	0.89	0.65	9.11	25.35	54.72	41.73
SFT-EmoQ	0.55	0.78	0.90	0.90	0.64	9.04	25.29	54.78	41.34
EmoQ-head	0.62	0.49	0.80	0.81	0.41	4.65	17.15	44.55	12.53
w/o history	0.44	0.79	0.90	0.88	0.52	0.29	5.74	0.58	0.21
w/o desc	0.57	0.77	0.90	0.90	0.65	6.45	21.87	49.38	22.58

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.69	0.78	0.90	0.90	0.65	9.11	25.33	54.72	41.73
SFT-EmoQ	0.54	0.78	0.91	0.90	0.65	9.04	25.26	54.78	41.34
EmoQ-head	0.57	0.40	0.69	0.78	0.34	4.26	15.84	45.46	11.14
w/o history	0.33	0.78	0.90	0.91	0.65	2.48	12.07	0.03	0.00
w/o desc	0.45	0.79	0.90	0.88	0.50	5.58	19.99	47.72	18.42

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.65	0.78	0.91	0.91	0.65	9.11	25.37	54.66	41.91
SFT-EmoQ	0.45	0.79	0.91	0.91	0.66	9.04	25.27	54.78	41.34
EmoQ-head	0.53	0.52	0.80	0.82	0.43	4.49	16.35	45.17	12.66
w/o history	0.30	0.78	0.91	0.91	0.66	2.48	12.07	0.03	0.00
w/o desc	0.41	0.78	0.91	0.91	0.66	6.45	21.84	49.37	22.59

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.57	0.82	0.92	0.92	0.72	9.11	25.34	54.72	41.73
SFT-EmoQ	0.45	0.83	0.92	0.92	0.72	9.07	25.3	54.74	41.41
EmoQ-head	0.47	0.23	0.77	0.78	0.33	4.89	18.22	45.22	13.98
w/o history	0.21	0.82	0.92	0.88	0.48	0.00	0.69	50	0.01
w/o desc	0.33	0.82	0.92	0.92	0.72	6.45	21.86	49.38	22.58

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.48	0.89	0.95	0.94	0.81	9.11	25.35	54.72	41.72
SFT-EmoQ	0.45	0.89	0.95	0.93	0.81	9.04	25.23	54.78	41.34
EmoQ-head	0.40	0.35	0.75	0.74	0.32	4.23	15.45	44.26	10.88
w/o history	0.13	0.89	0.95	0.94	0.81	2.48	12.07	0.03	0.00
w/o desc	0.25	0.89	0.95	0.94	0.81	6.45	21.88	49.39	22.6

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.35	0.83	0.97	0.89	0.85	9.11	25.35	54.72	41.73
SFT-EmoQ	0.35	0.97	0.99	0.92	0.89	9.04	25.25	54.78	41.34
EmoQ-head	0.33	0.28	0.52	0.41	0.23	3.81	14.46	44.07	9.36
w/o history	0.04	0.21	0.18	0.43	0.27	0.22	6.67	0.03	0.00
w/o desc	0.14	0.53	0.98	0.49	0.33	5.14	17.78	43.47	15.12

w =

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.53	0.58	0.81	0.87	0.50	4.15	12.36	18.95	13.28
SFT-EmoQ	0.50	0.57	0.81	0.87	0.49	4.33	12.50	21.24	15.15
EmoQ-head	0.43	0.47	0.73	0.83	0.41	4.10	12.48	5.85	14.06
w/o history	0.40	0.57	0.82	0.86	0.44	1.46	7.69	2.8	1.27
w/o desc	0.46	0.53	0.82	0.85	0.33	3.91	12.49	7.75	12.49

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.53	0.66	0.85	0.85	0.52	4.03	12.41	17.18	13.20
SFT-EmoQ	0.52	0.66	0.85	0.85	0.50	4.18	12.51	20.63	14.95
EmoQ-head	0.43	0.36	0.67	0.77	0.33	3.58	11.54	4.63	10.96
w/o history	0.42	0.65	0.84	0.83	0.43	0.95	6.69	4.11	1.34
w/o desc	0.48	0.60	0.84	0.84	0.46	4.07	12.57	9.5	12.83

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.61	0.66	0.85	0.85	0.54	4.33	12.65	21.85	15.87
SFT-EmoQ	0.57	0.65	0.83	0.84	0.49	4.04	12.35	20.03	13.37
EmoQ-head	0.48	0.45	0.73	0.79	0.38	4.05	12.29	5.48	12.11
w/o history	0.51	0.63	0.85	0.83	0.45	1.36	7.45	3.77	1.47
w/o desc	0.55	0.60	0.84	0.85	0.50	4.36	13	11.38	14.75

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.67	0.64	0.85	0.84	0.51	3.86	11.95	17.72	12.00
SFT-EmoQ	0.63	0.65	0.85	0.84	0.53	4.07	12.45	18.88	13.26
EmoQ-head	0.56	0.44	0.71	0.79	0.37	3.59	11.81	6.16	12.14
w/o history	0.58	0.65	0.85	0.84	0.52	1.71	8.11	1.67	0.85
w/o desc	0.60	0.50	0.84	0.81	0.36	3.96	12.65	9.06	12.58

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.71	0.63	0.84	0.83	0.50	4.39	12.89	16.82	14.23
SFT-EmoQ	0.69	0.66	0.83	0.85	0.54	4.3	12.4	22.72	15.24
EmoQ-head	0.60	0.40	0.67	0.76	0.36	3.97	12.91	4.94	13.45
w/o history	0.61	0.49	0.66	0.83	0.47	0.71	6.27	2.14	0.58
w/o desc	0.67	0.60	0.82	0.83	0.50	4.39	13.12	8.65	14.79

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.79	0.65	0.84	0.83	0.52	4.34	12.44	24.38	15.66
SFT-EmoQ	0.78	0.60	0.82	0.80	0.48	3.99	12.3	16.66	12.65
EmoQ-head	0.66	0.46	0.73	0.75	0.38	4.14	12.77	5.75	13.34
w/o history	0.69	0.64	0.82	0.78	0.40	1.17	7.33	1.33	1.6
w/o desc	0.73	0.41	0.82	0.76	0.33	3.89	12.34	7.13	12.43

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.88	0.48	0.78	0.60	0.47	4.34	12.44	24.38	15.66
SFT-EmoQ	0.88	0.51	0.80	0.54	0.39	3.84	12.27	18.9	11.99
EmoQ-head	0.75	0.41	0.69	0.51	0.36	4.00	12.44	5.9	13.23
w/o history	0.65	0.29	0.47	0.47	0.30	0.91	4.37	1.51	1.89
w/o desc	0.69	0.29	0.66	0.48	0.32	4.06	12.26	8.71	12.99

w =

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.82	0.67	0.89	0.88	0.57	2.95	10.47	10.88	9.80
SFT-EmoQ	0.82	0.68	0.89	0.89	0.73	2.38	7.96	9.25	7.49
EmoQ-head	0.60	0.46	0.73	0.82	0.37	3.13	10.22	8.29	10.84
w/o history	0.68	0.68	0.90	0.89	0.72	0.84	3.53	0.29	0.94
w/o desc	0.70	0.68	0.88	0.89	0.72	2.55	8.32	12.84	8.62

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.82	0.69	0.89	0.86	0.54	2.89	10.21	9.67	9.19
SFT-EmoQ	0.82	0.74	0.91	0.87	0.54	2.36	7.96	9.31	7.54
EmoQ-head	0.60	0.41	0.70	0.76	0.36	2.70	9.44	6.42	8.73
w/o history	0.70	0.74	0.90	0.86	0.53	0.84	3.53	0.29	0.94
w/o desc	0.72	0.62	0.90	0.86	0.52	2.38	8.1	12.48	7.95

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.81	0.69	0.89	0.85	0.54	2.94	10.29	9.49	9.41
SFT-EmoQ	0.82	0.74	0.90	0.87	0.55	2.26	7.82	8.77	7.19
EmoQ-head	0.61	0.40	0.70	0.75	0.35	2.76	9.53	7.48	8.98
w/o history	0.69	0.74	0.90	0.87	0.55	0.84	3.55	0.29	0.94
w/o desc	0.73	0.74	0.90	0.87	0.55	2.38	8.12	12.48	7.95

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.81	0.68	0.88	0.85	0.54	2.92	10.23	9.5	9.21
SFT-EmoQ	0.81	0.74	0.90	0.87	0.56	2.95	10.47	12.55	10.02
EmoQ-head	0.58	0.43	0.73	0.75	0.36	2.99	10.56	8.3	9.91
w/o history	0.71	0.74	0.90	0.86	0.52	0.84	3.52	0.29	0.94
w/o desc	0.70	0.56	0.90	0.82	0.44	3.21	11.64	6.99	10.2

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.86	0.69	0.88	0.85	0.54	2.89	10.19	9.86	9.32
SFT-EmoQ	0.81	0.75	0.90	0.84	0.49	2.3	7.79	8.67	7.23
EmoQ-head	0.60	0.44	0.73	0.74	0.35	2.89	9.91	6.41	9.89
w/o history	0.61	0.76	0.90	0.84	0.49	0.84	3.53	0.29	0.94
w/o desc	0.67	0.70	0.90	0.85	0.51	2.38	8.1	12.48	7.95

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.91	0.74	0.87	0.86	0.62	3.28	11.47	11.36	11.00
SFT-EmoQ	0.81	0.74	0.88	0.83	0.48	2.30	7.76	8.66	7.23
EmoQ-head	0.70	0.32	0.73	0.68	0.27	2.75	9.24	4.53	8.99
w/o history	0.80	0.75	0.87	0.79	0.43	1.87	7.62	1.64	1.87
w/o desc	0.81	0.75	0.87	0.79	0.43	3.21	11.24	6.99	10.2

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.85	0.71	0.92	0.60	0.47	2.57	9.26	11.04	7.97
SFT-EmoQ	0.80	0.71	0.91	0.58	0.45	2.36	7.94	9.31	7.54
EmoQ-head	0.66	0.42	0.88	0.50	0.34	2.88	10.09	6.27	8.6
w/o history	0.74	0.21	0.75	0.46	0.29	0.65	7.26	0.94	0.24
w/o desc	0.75	0.25	0.75	0.44	0.27	3.08	10.57	6.19	8.02

w =

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.53	0.46	0.65	0.87	0.40	19.43	30.42	44.57	39.10
SFT-EmoQ	0.32	0.45	0.64	0.87	0.42	20.06	31.06	45.77	38.88
EmoQ-head	0.42	0.33	0.52	0.79	0.34	7.58	17.64	17.79	39.45
w/o history	0.32	0.34	0.56	0.83	0.35	0.36	7.07	0.05	0.01
w/o desc	0.31	0.43	0.65	0.87	0.34	13.51	24.45	14.25	31.07

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.58	0.53	0.71	0.86	0.47	20	31.49	43.75	39.2
SFT-EmoQ	0.52	0.51	0.66	0.86	0.44	20.06	31.09	45.47	38.75
EmoQ-head	0.46	0.35	0.54	0.79	0.32	7.57	17.55	15.58	39.27
w/o history	0.35	0.28	0.50	0.82	0.32	0.36	7.17	0.09	0
w/o desc	0.37	0.48	0.69	0.85	0.39	12.98	24.06	16.25	38.58

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.61	0.55	0.70	0.86	0.47	20.19	31.66	44.78	39.00
SFT-EmoQ	0.44	0.52	0.67	0.86	0.45	20.2	31.4	43.68	40.24
EmoQ-head	0.47	0.25	0.43	0.77	0.28	7.77	17.32	16.26	30.51
w/o history	0.35	0.42	0.62	0.84	0.36	0.44	4.41	0.01	0.82
w/o desc	0.37	0.45	0.64	0.84	0.37	12.71	23.68	16.82	25.85

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.67	0.54	0.70	0.86	0.47	19.56	30.7	44.8	39.12
SFT-EmoQ	0.62	0.54	0.68	0.86	0.43	19.68	31.08	46.4	34.79
EmoQ-head	0.55	0.29	0.51	0.79	0.31	7.34	16.48	20.79	37.77
w/o history	0.43	0.38	0.50	0.82	0.43	0.1	0.35	0	0
w/o desc	0.41	0.45	0.66	0.84	0.38	12.88	23.81	15.88	36.79

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.75	0.54	0.71	0.85	0.45	20.1	31.65	42.04	38.89
SFT-EmoQ	0.73	0.52	0.68	0.85	0.43	19.07	29.86	45.51	31.58
EmoQ-head	0.59	0.33	0.60	0.77	0.34	7.88	17.95	19.78	39.99
w/o history	0.58	0.38	0.53	0.80	0.35	0.37	7.08	0.05	0.01
w/o desc	0.46	0.49	0.66	0.84	0.41	12.92	23.91	16.31	26.81

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.83	0.58	0.79	0.84	0.48	19.49	30.77	45.2	39.24
SFT-EmoQ	0.69	0.49	0.67	0.80	0.38	19.66	30.99	47.8	35.51
EmoQ-head	0.69	0.25	0.44	0.72	0.25	8.24	18.06	17.94	33.53
w/o history	0.60	0.46	0.64	0.79	0.43	0.22	1.19	0	0.22
w/o desc	0.56	0.33	0.61	0.79	0.30	12.3	23.01	17.63	23.72

Model	Emotion					Response
Model	Reward	Recall@3	Recall@5	NDCG	MRR	B-2	R-L	D-2	CIDEr
Self-EmoQ	0.88	0.38	0.54	0.52	0.38	19.87	31.56	40.43	39.55
SFT-EmoQ	0.88	0.49	0.66	0.52	0.38	19.62	31.00	47.54	35.55
EmoQ-head	0.78	0.26	0.44	0.44	0.28	7.98	17.97	15.98	31.24
w/o history	0.65	0.23	0.57	0.39	0.21	1.96	4.87	0	0.45
w/o desc	0.62	0.30	0.48	0.39	0.22	12.57	23.04	17.08	24.88

The tables above report the results of the ablation study. Replacing our token-logit-based DQN with an additional MLP head applied after the model output (EmoQ-head) degrades all ranking metrics. This result confirms that the Q-function modeling adopted in Self-EmoQ is more effective.
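The contrast between the two Q-function parameterizations can be pictured with the toy sketch below. It is only an illustration under stated assumptions, not the authors' implementation: it assumes each emotion action maps to one vocabulary token whose next-token logit is read as the Q-value, and it stands in for the MLP head with a single linear layer. All names, token ids, and numbers are hypothetical.

```python
from typing import Dict, List


def q_from_token_logits(logits: List[float],
                        emotion_token_ids: Dict[str, int]) -> Dict[str, float]:
    """Token-logit view: Q(s, a) is the language model's logit at the token for emotion a."""
    return {emotion: logits[tok] for emotion, tok in emotion_token_ids.items()}


def q_from_linear_head(hidden: List[float],
                       weights: Dict[str, List[float]],
                       bias: Dict[str, float]) -> Dict[str, float]:
    """Separate-head view: Q(s, a) comes from extra parameters applied to a hidden state."""
    return {a: sum(w * h for w, h in zip(weights[a], hidden)) + bias[a]
            for a in weights}


def rank_actions(q_values: Dict[str, float]) -> List[str]:
    """Emotions sorted by descending Q-value; this ordering feeds the ranking metrics."""
    return sorted(q_values, key=q_values.get, reverse=True)


# Hypothetical four-token vocabulary and emotion-to-token mapping.
logits = [0.1, 2.0, -1.0, 0.7]
emotion_token_ids = {"joy": 1, "sadness": 2, "anger": 3}

q_logit = q_from_token_logits(logits, emotion_token_ids)
ranking = rank_actions(q_logit)

# Hypothetical hidden state and head parameters for the alternative.
hidden = [1.0, -1.0]
weights = {"joy": [0.5, 0.0], "sadness": [0.0, 1.0], "anger": [1.0, 1.0]}
bias = {"joy": 0.0, "sadness": 0.5, "anger": 0.0}
q_head = q_from_linear_head(hidden, weights, bias)
```

In the first view the Q-values reuse the pretrained output layer, whereas the second introduces new parameters that must be learned from scratch; the ablation above compares these two choices empirically.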

Human Evaluation

Model		Fluency	p	Emotion	p	Acceptance	p	Effectiveness	p	Sensitivity	p	Alignment	p	Satisfaction	p
Prompt	0-shot	3.09(1.35)	<0.01	3.09(1.23)	<0.01	2.68(1.11)	<0.01	2.91(1.16)	<0.01	2.91(1.28)	<0.01	2.82(1.24)	<0.01	2.92(0.53)	<0.01
	ECoT	3.08(1.25)	<0.01	3.08(1.26)	<0.01	2.77(1.10)	<0.01	2.67(1.13)	<0.01	3.00(1.11)	<0.01	2.73(1.37)	<0.01	2.89(0.47)	<0.01
	PS	3.12(1.09)	<0.01	3.21(1.24)	<0.01	2.78(1.27)	<0.01	2.99(1.09)	<0.05	2.83(1.17)	<0.01	2.96(1.32)	<0.01	2.98(0.52)	<0.01
	MP	3.14(1.43)	<0.01	3.13(1.24)	<0.01	2.74(1.15)	<0.01	2.67(1.13)	<0.01	2.77(1.2)	<0.01	2.88(1.06)	<0.01	2.89(0.52)	<0.01
Finetune	SFT	3.15(1.12)	<0.01	3.40(1.30)	<0.01	2.89(0.97)	<0.01	2.71(1.34)	<0.01	2.91(1.04)	<0.01	3.05(1.30)	<0.01	3.02(0.51)	<0.01
Finetune	FSM	3.36(1.36)	<0.01	3.47(1.21)	<0.01	2.96(1.18)	<0.01	3.02(1.14)	0.08	3.09(1.25)	<0.01	3.18(1.18)	0.13	3.18(0.53)	<0.01
RL	EMDP	3.48(1.20)	0.08	3.38(1.34)	<0.01	2.79(1.18)	<0.01	2.69(1.45)	<0.01	2.99(1.40)	<0.01	3.13(1.37)	0.06	3.08(0.48)	<0.01
	Self-EmoQ	3.60(1.32)	--	3.86(1.11)	--	3.21(1.17)	--	3.14(1.23)	--	3.30(1.26)	--	3.27(1.20)	--	3.40(0.48)	--
	SFT-EmoQ	3.54(1.40)	0.28	3.79(1.16)	0.20	3.10(1.25)	0.09	3.05(1.28)	0.18	3.26(1.33)	0.32	3.22(1.29)	0.28	3.33(0.46)	<0.05

We invited 10 interns as evaluators for the human evaluation. We sampled 10 dialogue sessions from each test set, and each evaluator independently scored all generated responses. The final score for each method was obtained by averaging the evaluators’ ratings, as reported in the table above.

Regarding statistical significance, we conducted t-tests on the average scores between different methods. The results of the significance tests are also shown in the table, under the null hypothesis \( H_0: \text{Metric}_X > \text{Metric}_{\text{Self-EmoQ}} \). The results indicate that our method achieves statistically significant improvements over most baseline methods. The evaluation principles are presented below.
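For illustration, a comparison of this kind can be sketched from summary statistics alone. The example below computes Welch's t statistic for the Fluency scores of Self-EmoQ (3.60, 1.32) and 0-shot (3.09, 1.35) taken from the table. Several things are assumptions: that the parenthesized values are standard deviations, that the test is unpaired and of Welch type, that each method has 100 ratings (the page does not give the sample size), and that a normal approximation to the t distribution is acceptable at that size.

```python
import math
from statistics import NormalDist
from typing import Tuple


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


def upper_tail_p_normal(t: float) -> float:
    """One-sided p-value P(Z >= t) under a normal approximation (large df only)."""
    return 1.0 - NormalDist().cdf(t)


# Means and spreads from the Fluency column; n = 100 per method is hypothetical.
N_RATINGS = 100
t_stat, dof = welch_t(3.60, 1.32, N_RATINGS, 3.09, 1.35, N_RATINGS)
p_approx = upper_tail_p_normal(t_stat)
```

Under these assumptions the one-sided p-value falls below 0.01, in line with the entry reported for 0-shot Fluency; a different sample size or a paired design would change the exact value.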

Principle of Human Scoring

We start with the criteria proposed by Kang et al. (2024). The human evaluation is designed to align with the ultimate goal of emotional support conversations (ESC), namely the seeker's satisfaction. To achieve this goal, the supporter’s behavior is evaluated according to the following criteria:

Acceptance: Whether the seeker can accept the response without discomfort.
Effectiveness: Whether the response helps shift negative emotions or attitudes toward a positive direction.
Sensitivity: Whether the response takes into account the seeker’s overall emotional state.
Alignment: Whether the response aligns with the predicted strategy.

To enable a more fine-grained assessment of generation quality, we further introduce the following dimensions:

Fluency: The overall fluency and linguistic quality of the response.
Emotion: The emotional intensity expressed in the response and its influence on the seeker.
Interesting: Whether the response can arouse the seeker’s interest or curiosity through vivid or engaging expressions.

Intern evaluators are asked to rate model outputs across multiple aspects, including Fluency, Emotion, Interesting, and Satisfaction, where Satisfaction encompasses Acceptance, Effectiveness, Sensitivity, and overall satisfaction.

Throughout the evaluation process, we strictly adhere to international regulations and ethical standards, ensuring compliance with established guidelines regarding participant involvement and data integrity. All evaluators independently assess each sample according to pre-defined criteria, maintaining objectivity, consistency, and reliability.

Detailed manual scoring criteria are listed below:

Fluency

1. Highly incoherent; extremely difficult to understand.
2. Significant incoherence; only fragments are meaningful.
3. Some incoherence, but the general meaning is still conveyed.
4. Mostly fluent with minor errors or awkwardness.
5. Perfectly fluent, clear, and error-free.

Emotion

1. Emotionally inappropriate or chaotic.
2. Obvious emotional flaws or exaggeration.
3. Average emotional expression with limited depth.
4. Good emotional expression with appropriate intensity.
5. Excellent, nuanced, and highly appropriate emotional expression.

Acceptance

1. Strong emotional resistance is triggered.
2. High likelihood of emotional resistance.
3. Possible emotional resistance.
4. Rare emotional resistance.
5. No emotional resistance.

Effectiveness

1. Worsens the seeker’s emotional distress.
2. Potentially increases emotional stress.
3. Fails to change emotional intensity.
4. Partially effective but overly complex or unclear.
5. Highly effective in soothing emotions and providing support.

Sensitivity

1. Incorrect assessment of the seeker’s state.
2. Rash judgment without sufficient exploration.
3. One-sided understanding of the seeker’s state.
4. Partial understanding of the seeker’s situation.
5. Accurate and well-tailored understanding.

Alignment

1. Completely contradicts the predicted strategy.
2. Slight deviation from the predicted strategy.
3. Ambiguous alignment.
4. Largely aligned with minor ambiguities.
5. Fully consistent with the predicted strategy.

Satisfaction

1. Extremely disappointing and unhelpful.
2. Poor and incomplete response.
3. Adequate but unremarkable.
4. Clear and helpful with useful details.
5. Excellent, comprehensive, and insightful.

Self-EmoQ: Plutchik-Guided Value-based Planning to Drive Streaming Emotional TTS

Abstract

Approach

Overview

Comparison with conventional paradigms.

Framework

Framework of Self-EmoQ

Plutchik wheel

Prompts

Algorithm

Result