Advanced Testing Using Evaluations

Learn how to bulk generate, run, and analyze test cases efficiently to validate your AI agents' behavior across multiple scenarios.

The Evaluations feature in MindStudio allows you to test AI workflows at scale using autogenerated or manually defined test cases. This is especially helpful for validating workflows like moderation filters, where consistent logic must be applied across many inputs.

Why Use Evaluations?

Manually testing workflows via the preview debugger becomes inefficient as the number of test cases grows. Evaluations allow you to:

  • Autogenerate test cases with AI

  • Specify expected outputs

  • Run tests in bulk

  • Compare actual vs. expected results

  • Use fuzzy matching for flexible validation

Sample Use Case: Spam Detection

In this example, an AI workflow is designed to detect spam comments and flag violations based on defined community guidelines. The workflow takes in a comment via a launch variable and outputs two fields (sketched in code after the list below):

  • A boolean indicating whether it's spam

  • An array of flags indicating types of violations
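
For illustration, the structured result for a single comment might look like the following. This is a minimal sketch, not MindStudio's internal format: the is_spam and flags field names match the expected results used later on this page, while the comment launch variable name and the sample values are assumed placeholders.

```python
# Illustrative sketch only -- not MindStudio's internal representation.

# Input: a comment passed into the workflow via a launch variable
launch_variables = {
    "comment": "Buy followers now!!! Click my profile for cheap deals."  # assumed variable name
}

# Output: the structured result the workflow returns
workflow_output = {
    "is_spam": True,                    # boolean: does the comment violate the guidelines?
    "flags": ["promotional", "spam"],   # array of violation types (example values)
}
```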

Creating and Running Test Cases

Step 1: Access the Evaluations Tab

  • Navigate to the top-level "Evaluations" tab in your project.

  • Click New Test Case to manually add a test or use Autogenerate to let AI create test cases for you.

Step 2: Autogenerate Violating Test Cases

  • Input guidance like “generate five test cases that are in violation of our guidelines.”

  • AI will produce sample comments with the correct input structure.

  • Add expected results (e.g., "is_spam": true, "flags": ["hateful", "off-topic"]), as sketched below.
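
As a rough sketch, each test case pairs an input comment with the expected result you define. The structure below is illustrative only; the is_spam and flags keys come from the example above, while the surrounding layout and sample comment are assumptions.

```python
# Hypothetical representation of one violating test case (illustrative only).
violating_test_case = {
    "input": {
        "comment": "You people are idiots, this has nothing to do with the topic."
    },
    "expected": {
        "is_spam": True,
        "flags": ["hateful", "off-topic"],  # expected flags from the example above
    },
}
```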

Step 3: Run Test Cases

  • Click Run All to test all cases in parallel.

  • MindStudio will show which tests pass or fail based on comparison with expected results.

  • Each test can be inspected in the debugger.

Step 4: Autogenerate Non-Violating Test Cases

  • Repeat the process with a new prompt: “generate five comments not in violation.”

  • Provide expected results (e.g., "is_spam": false, "flags": []).

  • Run the new set and verify accuracy.

Matching Methods

MindStudio supports two types of result matching:

  • Literal Match: Requires the actual output to exactly match the expected value.

  • Fuzzy Match: Allows minor differences or variations in phrasing, which is useful for outputs with dynamic AI wording (see the sketch after this list).
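
Conceptually, the difference between the two methods can be sketched like this. The comparison below is a simplified illustration, not MindStudio's actual matching logic; fuzzy matching is approximated here with normalization and a similarity threshold.

```python
from difflib import SequenceMatcher

def literal_match(actual: str, expected: str) -> bool:
    # Pass only if the output is exactly the expected value.
    return actual == expected

def fuzzy_match(actual: str, expected: str, threshold: float = 0.9) -> bool:
    # Tolerate minor wording differences: normalize case and whitespace,
    # then accept anything above a similarity threshold.
    a = " ".join(actual.lower().split())
    e = " ".join(expected.lower().split())
    return SequenceMatcher(None, a, e).ratio() >= threshold

print(literal_match("Is spam", "is spam"))  # False -- capitalization differs
print(fuzzy_match("Is spam", "is spam"))    # True  -- minor variation tolerated
```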

Benefits of Evaluations

  • Run many test cases at once

  • Easily edit and rerun failing cases

  • Debug individual results

  • Improve the reliability of your AI workflows


Evaluations are a key tool for ensuring your AI behaves as expected at scale. Whether you're building content filters, classifiers, or other deterministic logic, this feature helps you confidently validate your workflows.