Capstone Project

Evaluation CoPilot

Problem

The rapid integration of Large Language Models (LLMs) into application development makes it difficult for developers to understand and trust AI-generated responses.

Approach

Our team combined user-centered design with technological innovation, starting with extensive user research to understand developers’ needs and pain points. We then iteratively developed a series of prototypes within an Agile framework, incorporating feedback from user-testing sessions and drawing on the latest research in the field.

Solution

“Evaluation CoPilot” is a web app that demystifies LLM evaluation metrics for developers, offering an intuitive platform to test, understand, and refine AI-generated text. It provides clear, actionable feedback on how to improve prompts for better LLM responses, so developers can make the AI in their applications more reliable and effective.
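The project’s code isn’t shown here, but as a rough illustration of the kind of metric the tool demystifies, below is a minimal, hypothetical Python sketch of a toy “groundedness” score: the fraction of a response’s words that also appear in a source context. Every name in it (the `groundedness` function, the example strings) is an illustrative assumption, not the project’s implementation.

```python
# Hypothetical sketch: a toy "groundedness" metric of the kind an
# evaluation tool might surface. It scores how much of a response
# is supported by a source context. Not the project's actual code.

import re


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def groundedness(response: str, context: str) -> float:
    """Fraction of response tokens that also appear in the context.

    1.0 means every word in the response is drawn from the context;
    low scores flag content the model may have invented.
    """
    resp = _tokens(response)
    if not resp:
        return 0.0
    return len(resp & _tokens(context)) / len(resp)


if __name__ == "__main__":
    ctx = "The Eiffel Tower is 330 metres tall and located in Paris."
    good = "The Eiffel Tower is located in Paris."
    bad = "The Eiffel Tower was built by Roman engineers."
    print(f"grounded response:     {groundedness(good, ctx):.2f}")  # 1.00
    print(f"hallucinated response: {groundedness(bad, ctx):.2f}")   # 0.38
```

Real evaluation suites rely on far richer signals (semantic similarity, LLM-as-judge rubrics), but even a toy score like this shows the kind of raw number the app turns into plain-language, actionable prompt feedback.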
