LLM-Based Open-Domain Integrated Task and Knowledge Assistants with Programmable Policies
Harshit Joshi, Shicheng Liu, James Chen, Robert Weigle, Monica S. Lam · July 8, 2024
Summary
KITA is a programmable framework for building task-oriented conversational agents that give reliable, grounded responses governed by controllable policies, which developers specify declaratively in a KITA Worksheet. Unlike agents built directly on Large Language Models (LLMs), KITA is resilient to diverse user queries, grounds its answers in knowledge sources, and makes policies easy to program through its declarative paradigm. In a real-user study with 62 participants, it outperformed GPT-4 with function calling in execution accuracy, dialogue act accuracy, and goal completion rate. KITA offers explicit control and high-level support for integrated knowledge assistants, addressing the limitations of traditional slot-filling frameworks and dialogue trees. The KITA Worksheet manages dialogue flow, tracks user-supplied values, and supports full compositionality between API calls and knowledge queries. The system was evaluated across three domains: restaurant reservations, ticket submissions, and course enrollments.
The paper also analyzes the system's interactions with users across these domains, covering agent actions, error analysis, and user feedback. Most errors stem from the assistant failing to request specific details, which leads to premature API calls and misinterpreted user instructions; the agent often recovers by asking for additional information. User feedback is largely positive, praising accuracy, smooth conversation flow, and confirmation of selections, though some users noted slow responses, the need to restructure queries, and a somewhat forced conversational tone. A recurring issue is that the agent sometimes submits a final API call without first seeking confirmation, which can lead to incorrect actions. The paper additionally describes a mapping between dialog acts in KITA and the StarV2 banking system, e.g., 'AskField' for requesting user information and 'Report' for status updates.
Introduction
Background
Overview of KITA as a programmable framework
Importance of reliable, grounded responses in conversational agents
Objective
Aim of the research: comparing KITA with Large Language Models (LLMs)
Focus on KITA's resilience, knowledge integration, and ease of programming policies
Method
Data Collection
Real-user study methodology
Participants and their interactions with KITA
Data Analysis
Metrics for evaluation: execution accuracy, dialogue act accuracy, goal completion rate
Comparison with GPT-4 performance
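The three metrics above can be sketched in code. This is not the paper's evaluation harness; the field names (`exec_correct`, `act_correct`, `goal_completed`) are assumptions for illustration only: execution and dialogue-act accuracy are per-turn, while goal completion is per-dialogue.

```python
def evaluate(turns, dialogues):
    """Compute the three reported metrics from hypothetical logs.

    turns: list of dicts with boolean keys 'exec_correct' (the agent's
           API/query call matched the gold call) and 'act_correct'
           (the agent's dialogue act matched the gold act).
    dialogues: list of dicts with boolean key 'goal_completed'
               (the user's goal was achieved by the end of the dialogue).
    """
    exec_acc = sum(t["exec_correct"] for t in turns) / len(turns)
    act_acc = sum(t["act_correct"] for t in turns) / len(turns)
    goal_rate = sum(d["goal_completed"] for d in dialogues) / len(dialogues)
    return {
        "execution_accuracy": exec_acc,
        "dialogue_act_accuracy": act_acc,
        "goal_completion_rate": goal_rate,
    }
```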
System Architecture
KITA Worksheet
Role in managing dialogue flow
Tracking user-supplied values
Support for full compositionality between APIs and queries
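To make the worksheet idea concrete, here is a minimal sketch of a declarative policy that collects required fields before an API call. This is not KITA's actual worksheet syntax; the class names, field names, and the `book_restaurant` API are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Field:
    """One slot the agent must fill before the API can be called."""
    name: str
    description: str
    required: bool = True
    value: object = None  # tracked user-supplied value

@dataclass
class Worksheet:
    """Declarative policy: which fields to collect and which API to call."""
    api: str
    fields: list

    def missing(self):
        # Fields the dialogue manager still needs to ask the user for.
        return [f for f in self.fields if f.required and f.value is None]

# A hypothetical restaurant-reservation worksheet.
book = Worksheet(api="book_restaurant", fields=[
    Field("restaurant", "restaurant chosen from the knowledge source"),
    Field("date", "reservation date"),
    Field("party_size", "number of guests"),
])
book.fields[1].value = "2024-07-10"  # the user supplied a date
```

Because the policy is data rather than a hand-written dialogue tree, the dialogue manager can decide what to ask next from `missing()` regardless of the order in which the user volunteers values.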
Integration with Knowledge Sources
How KITA leverages external knowledge for responses
Advantages over traditional slot-filling frameworks and dialogue trees
Performance Evaluation
Domain-Specific Analysis
Evaluation across three domains: restaurant reservations, ticket submissions, course enrollments
Comparison with GPT-4 in function calling capabilities
User Interaction and Feedback
Error Analysis
Common errors and their causes
AI's adaptability in recovering from errors
User Feedback
Positive aspects: accuracy, conversation flow, confirmation of selections
Challenges: response speed, query restructuring, conversation tone
Specific issue: lack of confirmation before final API calls
Dialog Act Mapping
KITA and StarV2 Systems
Mapping dialog acts between KITA and StarV2 banking systems
'AskField' for user information and 'Report' for status updates
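The act mapping described above might be implemented as a simple lookup table. Only the KITA act names 'AskField' and 'Report' come from the summary; the StarV2-side names on the right are illustrative placeholders, not the paper's actual mapping.

```python
# Mapping from KITA dialogue acts to StarV2 banking-system acts.
# The target names ("request", "inform") are hypothetical placeholders.
KITA_TO_STARV2 = {
    "AskField": "request",  # ask the user for a piece of information
    "Report": "inform",     # report a status update to the user
}

def map_act(kita_act):
    """Translate a KITA act name, falling back to the original if unmapped."""
    return KITA_TO_STARV2.get(kita_act, kita_act)
```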
Conclusion
Summary of Findings
KITA's performance and user experience
Comparison with GPT-4 and implications for conversational AI
Future Directions
Potential improvements and areas for further research
Categories: Computation and Language, Programming Languages, Artificial Intelligence