Multi-Turn Chat Evaluation
This template uses the example available here: Multi-turn Chat Labeling: Evaluating Virtual Assistant Conversations
You can use this example to evaluate multi-turn chat conversations in Label Studio, identifying areas to enhance your virtual assistant’s performance and user experience.
For this example, you will need the following:
- Label Studio instance
- Label Studio SDK (
pip install label-studio-sdk
) - Python 3.8+ with pandas
Labeling configuration
In this example, the labeling configuration is dynamically generated. This is necessary because each chat has a different number of turns (questions and responses).
To build your own template XML, you will need to follow the steps outlined in the following notebook: Evaluating Virtual Assistant Conversations.ipynb
However, here is an example of the labeling configuration for a 5-turn chat:
<View>
<Style>
.root {
font-family: Arial, sans-serif;
display: flex;
flex-direction: column;
height: 100vh; /* Full height of the viewport */
margin: 0;
padding: 0;
}
.container {
display: flex;
flex: 1;
gap: 20px;
height: 100%; /* Ensure it stretches to fill the root height */
overflow: hidden; /* Prevent scrolling at the container level */
}
.column {
flex: 1;
display: flex;
flex-direction: column;
overflow: hidden; /* Prevent column itself from scrolling */
}
.dialogue {
max-width: 750px;
border: 1px solid #ccc;
padding: 10px;
border-radius: 5px;
background-color: #f8f9fa;
overflow-y: auto; /* Enable vertical scrolling */
flex: 1; /* Stretch to fill the available height */
}
.questions {
border: 1px solid #ddd;
padding: 10px;
border-radius: 5px;
background-color: #f8f9fa;
overflow-y: auto; /* Enable vertical scrolling */
flex: 1; /* Stretch to fill the available height */
}
.panel {
margin-bottom: 10px;
padding: 10px;
border: 1px solid #e9ecef;
border-radius: 5px;
background-color: #f8f9fa;
}
.panel-header {
font-weight: bold;
margin-bottom: 10px;
}
.section-header {
margin-bottom: 10px;
}
.turn-1 {
border: 2px solid #6A5ACD;
background-color: #EDEDFD;
padding: 10px;
border-radius: 5px;
margin-bottom: 20px;
}
.turn-2 {
border: 2px solid #2E8B57;
background-color: #EAF5F1;
padding: 10px;
border-radius: 5px;
margin-bottom: 20px;
}
.turn-3 {
border: 2px solid #FF4500;
background-color: #FFF4EC;
padding: 10px;
border-radius: 5px;
margin-bottom: 20px;
}
.turn-4 {
border: 2px solid #DC143C;
background-color: #FDECEC;
padding: 10px;
border-radius: 5px;
margin-bottom: 20px;
}
.turn-5 {
border: 2px solid #4B0082;
background-color: #F3EAFD;
padding: 10px;
border-radius: 5px;
margin-bottom: 20px;
}
</Style>
<View className="root">
<Header value="Dialogue and Questions" />
<View className="container">
<View className="column">
<View className="dialogue">
<Header value="Full Conversation" />
<Paragraphs name="prg" value="$messages" layout="dialogue" nameKey="role" textKey="content" />
</View>
</View>
<View className="column">
<View className="questions">
<Header value="Answer the questions for each turn" className="section-header" />
<Collapse>
<Panel value="Turn 1" className="panel-header">
<View className="panel-turn turn-1">
<Paragraphs name="turn1_prg" value="$turn1_dialogue" layout="dialogue" nameKey="role" textKey="content" />
<Header value="What is the user's intent in this turn?" />
<Choices name="turn1_user_intent" toName="turn1_prg" choice="multiple">
<Choice value="Product Inquiry" />
<Choice value="Order Status" />
<Choice value="Return/Exchange Inquiry" />
<Choice value="Payment/Refund Inquiry" />
<Choice value="Complaint" />
<Choice value="Store/Location Information" />
<Choice value="Other" />
</Choices>
<Header value="Did the assistant’s response address the user's intent?" />
<Choices name="turn1_response_address_intent" toName="turn1_prg" choice="single">
<Choice value="Fully Addressed" />
<Choice value="Partially Addressed" />
<Choice value="Not Addressed" />
</Choices>
<Header value="Is the assistant’s response accurate and helpful?" />
<Choices name="turn1_response_accuracy_helpfulness" toName="turn1_prg" choice="single">
<Choice value="Yes, Accurate and Helpful" />
<Choice value="Yes, Accurate but Unhelpful" />
<Choice value="No, Inaccurate" />
<Choice value="No Response" />
</Choices>
<Header value="What action is implied by the assistant’s response (if any)?" />
<Choices name="turn1_response_action" toName="turn1_prg" choice="multiple">
<Choice value="Provide More Information to the User" />
<Choice value="Request More Information from the User" />
<Choice value="Escalate to Human Support" />
<Choice value="Redirect to a Different Team/Resource" />
<Choice value="Confirm Action Taken" />
<Choice value="No Action/Response" />
</Choices>
</View>
</Panel>
<Panel value="Turn 2" className="panel-header">
<View className="panel-turn turn-2">
<Paragraphs name="turn2_prg" value="$turn2_dialogue" layout="dialogue" nameKey="role" textKey="content" />
<Header value="What is the user's intent in this turn?" />
<Choices name="turn2_user_intent" toName="turn2_prg" choice="multiple">
<Choice value="Product Inquiry" />
<Choice value="Order Status" />
<Choice value="Return/Exchange Inquiry" />
<Choice value="Payment/Refund Inquiry" />
<Choice value="Complaint" />
<Choice value="Store/Location Information" />
<Choice value="Other" />
</Choices>
<Header value="Did the assistant’s response address the user's intent?" />
<Choices name="turn2_response_address_intent" toName="turn2_prg" choice="single">
<Choice value="Fully Addressed" />
<Choice value="Partially Addressed" />
<Choice value="Not Addressed" />
</Choices>
<Header value="Is the assistant’s response accurate and helpful?" />
<Choices name="turn2_response_accuracy_helpfulness" toName="turn2_prg" choice="single">
<Choice value="Yes, Accurate and Helpful" />
<Choice value="Yes, Accurate but Unhelpful" />
<Choice value="No, Inaccurate" />
<Choice value="No Response" />
</Choices>
<Header value="What action is implied by the assistant’s response (if any)?" />
<Choices name="turn2_response_action" toName="turn2_prg" choice="multiple">
<Choice value="Provide More Information to the User" />
<Choice value="Request More Information from the User" />
<Choice value="Escalate to Human Support" />
<Choice value="Redirect to a Different Team/Resource" />
<Choice value="Confirm Action Taken" />
<Choice value="No Action/Response" />
</Choices>
</View>
</Panel>
<Panel value="Turn 3" className="panel-header">
<View className="panel-turn turn-3">
<Paragraphs name="turn3_prg" value="$turn3_dialogue" layout="dialogue" nameKey="role" textKey="content" />
<Header value="What is the user's intent in this turn?" />
<Choices name="turn3_user_intent" toName="turn3_prg" choice="multiple">
<Choice value="Product Inquiry" />
<Choice value="Order Status" />
<Choice value="Return/Exchange Inquiry" />
<Choice value="Payment/Refund Inquiry" />
<Choice value="Complaint" />
<Choice value="Store/Location Information" />
<Choice value="Other" />
</Choices>
<Header value="Did the assistant’s response address the user's intent?" />
<Choices name="turn3_response_address_intent" toName="turn3_prg" choice="single">
<Choice value="Fully Addressed" />
<Choice value="Partially Addressed" />
<Choice value="Not Addressed" />
</Choices>
<Header value="Is the assistant’s response accurate and helpful?" />
<Choices name="turn3_response_accuracy_helpfulness" toName="turn3_prg" choice="single">
<Choice value="Yes, Accurate and Helpful" />
<Choice value="Yes, Accurate but Unhelpful" />
<Choice value="No, Inaccurate" />
<Choice value="No Response" />
</Choices>
<Header value="What action is implied by the assistant’s response (if any)?" />
<Choices name="turn3_response_action" toName="turn3_prg" choice="multiple">
<Choice value="Provide More Information to the User" />
<Choice value="Request More Information from the User" />
<Choice value="Escalate to Human Support" />
<Choice value="Redirect to a Different Team/Resource" />
<Choice value="Confirm Action Taken" />
<Choice value="No Action/Response" />
</Choices>
</View>
</Panel>
<Panel value="Turn 4" className="panel-header">
<View className="panel-turn turn-4">
<Paragraphs name="turn4_prg" value="$turn4_dialogue" layout="dialogue" nameKey="role" textKey="content" />
<Header value="What is the user's intent in this turn?" />
<Choices name="turn4_user_intent" toName="turn4_prg" choice="multiple">
<Choice value="Product Inquiry" />
<Choice value="Order Status" />
<Choice value="Return/Exchange Inquiry" />
<Choice value="Payment/Refund Inquiry" />
<Choice value="Complaint" />
<Choice value="Store/Location Information" />
<Choice value="Other" />
</Choices>
<Header value="Did the assistant’s response address the user's intent?" />
<Choices name="turn4_response_address_intent" toName="turn4_prg" choice="single">
<Choice value="Fully Addressed" />
<Choice value="Partially Addressed" />
<Choice value="Not Addressed" />
</Choices>
<Header value="Is the assistant’s response accurate and helpful?" />
<Choices name="turn4_response_accuracy_helpfulness" toName="turn4_prg" choice="single">
<Choice value="Yes, Accurate and Helpful" />
<Choice value="Yes, Accurate but Unhelpful" />
<Choice value="No, Inaccurate" />
<Choice value="No Response" />
</Choices>
<Header value="What action is implied by the assistant’s response (if any)?" />
<Choices name="turn4_response_action" toName="turn4_prg" choice="multiple">
<Choice value="Provide More Information to the User" />
<Choice value="Request More Information from the User" />
<Choice value="Escalate to Human Support" />
<Choice value="Redirect to a Different Team/Resource" />
<Choice value="Confirm Action Taken" />
<Choice value="No Action/Response" />
</Choices>
</View>
</Panel>
<Panel value="Turn 5" className="panel-header">
<View className="panel-turn turn-5">
<Paragraphs name="turn5_prg" value="$turn5_dialogue" layout="dialogue" nameKey="role" textKey="content" />
<Header value="What is the user's intent in this turn?" />
<Choices name="turn5_user_intent" toName="turn5_prg" choice="multiple">
<Choice value="Product Inquiry" />
<Choice value="Order Status" />
<Choice value="Return/Exchange Inquiry" />
<Choice value="Payment/Refund Inquiry" />
<Choice value="Complaint" />
<Choice value="Store/Location Information" />
<Choice value="Other" />
</Choices>
<Header value="Did the assistant’s response address the user's intent?" />
<Choices name="turn5_response_address_intent" toName="turn5_prg" choice="single">
<Choice value="Fully Addressed" />
<Choice value="Partially Addressed" />
<Choice value="Not Addressed" />
</Choices>
<Header value="Is the assistant’s response accurate and helpful?" />
<Choices name="turn5_response_accuracy_helpfulness" toName="turn5_prg" choice="single">
<Choice value="Yes, Accurate and Helpful" />
<Choice value="Yes, Accurate but Unhelpful" />
<Choice value="No, Inaccurate" />
<Choice value="No Response" />
</Choices>
<Header value="What action is implied by the assistant’s response (if any)?" />
<Choices name="turn5_response_action" toName="turn5_prg" choice="multiple">
<Choice value="Provide More Information to the User" />
<Choice value="Request More Information from the User" />
<Choice value="Escalate to Human Support" />
<Choice value="Redirect to a Different Team/Resource" />
<Choice value="Confirm Action Taken" />
<Choice value="No Action/Response" />
</Choices>
</View>
</Panel>
</Collapse>
</View>
</View>
</View>
</View>
</View>
<!-- {
"data": {
"messages": [
{
"role": "user",
"content": "Hello, I need help with my account."
},
{
"role": "assistant",
"content": "Sure, I'd be happy to assist you. What seems to be the issue?"
},
{
"role": "user",
"content": "I can't access my account settings."
},
{
"role": "assistant",
"content": "Let's reset your password to regain access."
}
],
"turn1_dialogue": [
{
"role": "user",
"content": "Hello, I need help with my account."
},
{
"role": "assistant",
"content": "Sure, I'd be happy to assist you. What seems to be the issue?"
}
],
"turn2_dialogue": [
{
"role": "user",
"content": "I can't access my account settings."
},
{
"role": "assistant",
"content": "Let's reset your password to regain access."
}
],
"turn3_dialogue": [
{
"role": "",
"content": ""
},
{
"role": "",
"content": ""
}
],
"turn4_dialogue": [
{
"role": "",
"content": ""
},
{
"role": "",
"content": ""
}
],
"turn5_dialogue": [
{
"role": "",
"content": ""
},
{
"role": "",
"content": ""
}
]
}
} -->
About the labeling configuration
Paragraphs
<Paragraphs name="prg" value="$messages" layout="dialogue" nameKey="role" textKey="content" />
This displays the entire conversation in one column under “Full Conversation” using a Paragraphs tag. It shows each message (with role and content) as a dialogue.
On the other column, it organizes annotation questions by turn. Each “Turn” is inside a collapsible <Panel>
component and has its own <Paragraphs>
tag. For example:
<Paragraphs name="turn1_prg" value="$turn1_dialogue" layout="dialogue" … />
This lets you see only the subset of the conversation relevant to that turn.
Choices
For each turn, there are multiple blocks, each focusing on different questions:
- User’s intent in this turn (multiple choice).
- Whether the assistant’s response addresses that intent (single choice).
- Whether the assistant’s response is accurate/helpful (single choice).
- The implied “action” of the assistant’s response (multiple choice).
The toName
attributes (for instance, toName="turn1_prg"
) tie each set of choices to that turn’s Paragraphs object, so each question is specifically linked to the text of that turn.