Evaluations


MindStudio's Evaluations feature enables you to rigorously test the accuracy and consistency of your workflows. By creating structured tests, you can validate expected outcomes, identify areas for improvement, and ensure your workflows are functioning as intended.


Requirements

To use the Evaluations tool effectively, your workflow must meet the following requirements (see the sketch after this list):

  • Launch Variables: The workflow should be configured with launch variables to define inputs.

  • End Block: The workflow must contain an End block to return outputs.
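
Together, these two requirements define the contract an evaluation exercises: the launch variables are the evaluation's input, and the value returned by the End block is what gets compared against the expected result. The sketch below illustrates that contract in TypeScript; the `runWorkflow` helper and the variable name `content` are placeholders for illustration, not MindStudio's actual API.

```typescript
// Illustrative sketch only; these names are assumptions, not MindStudio's API.
type LaunchVariables = Record<string, string>;

interface WorkflowRun {
  output: string; // the value returned by the workflow's End block
}

// Placeholder for however you invoke the workflow (UI run, API, or NPM package).
declare function runWorkflow(vars: LaunchVariables): Promise<WorkflowRun>;

async function example(): Promise<void> {
  // The launch variables are what an evaluation feeds in...
  const run = await runWorkflow({
    content: "This review is just copied from another site. Plagiarism!",
  });
  // ...and the End block's output is what gets compared to the expected result.
  console.log(run.output); // e.g. "Plagiarism or Copyright Violation"
}
```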


Steps to Evaluate a Workflow

1. Create Evaluation(s)

Evaluations are structured test cases you can create to validate your workflow's output. There are two main ways to create evaluations:

Manually:

Define each test case from scratch by specifying the input variables and expected results. This approach gives you complete control over each evaluation.

Using the Generate Button:

Click the Generate button to automatically populate evaluation cases using default or existing input data. Choose the number of test cases and, optionally, give the generator additional context about the cases you want generated.

2. Run the Evaluations

Once you've created your evaluations, you can run them all at once or individually to test your workflow's output against the defined expectations.

Run all test cases at once

Click Run all at the top left to run all of the test cases at once. Results may take longer to appear depending on the size of the workflow and the number of test cases you created.

Run an individual test case

Hover over the left side of a test case row and click the Play icon to run that case on its own. This is ideal when you don't want to rerun previous tests.
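
Conceptually, Run all simply executes every case in the table, while the Play icon executes a single row. The sketch below illustrates that difference; the `runWorkflow` helper and case shape are assumptions for illustration, not how MindStudio runs evaluations internally.

```typescript
// Conceptual sketch of "Run all" vs. running a single case; not MindStudio internals.
interface TestCase {
  input: Record<string, string>; // launch variables
  expected: string;              // expected result
}

// Placeholder for however the workflow is executed.
declare function runWorkflow(vars: Record<string, string>): Promise<string>;

// "Run all": execute every case; more cases and larger workflows take longer.
async function runAll(cases: TestCase[]): Promise<void> {
  const results = await Promise.all(cases.map((c) => runWorkflow(c.input)));
  results.forEach((actual, i) =>
    console.log(`Case ${i + 1}: expected "${cases[i].expected}", got "${actual}"`)
  );
}

// Play icon: rerun a single case without repeating the others.
async function runOne(c: TestCase): Promise<void> {
  console.log(`expected "${c.expected}", got "${await runWorkflow(c.input)}"`);
}
```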


Anatomy of an Evaluation

Each evaluation consists of three main components:

Input

The set of variables or data points that the workflow will process.

Expected Result

The anticipated output for the given input. This can be configured for:

  • Literal Match: The output must exactly match the expected result.

  • Fuzzy Match: The output can vary slightly and still be considered correct if it meets specified criteria.

Result

The actual output produced by the workflow, displayed alongside the expected result for comparison.
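
As a rough mental model, a literal match is a strict equality check, while a fuzzy match applies looser criteria so that minor wording differences still count as correct. The sketch below is illustrative only; MindStudio performs this comparison for you, and its fuzzy matching may be more sophisticated (for example, model-based) than the simple normalization shown here.

```typescript
// Illustrative only: how literal vs. fuzzy comparison could be modeled.
// MindStudio performs this check for you; the logic here is an assumption.
interface EvaluationCase {
  input: Record<string, string>; // launch variables
  expected: string;              // expected result
  match: "literal" | "fuzzy";
}

function literalMatch(actual: string, expected: string): boolean {
  // Literal: the output must exactly match the expected result.
  return actual === expected;
}

function fuzzyMatch(actual: string, expected: string): boolean {
  // One possible relaxed check: case- and whitespace-insensitive containment.
  const normalize = (s: string) => s.toLowerCase().replace(/\s+/g, " ").trim();
  return normalize(actual).includes(normalize(expected));
}

function passes(c: EvaluationCase, actual: string): boolean {
  return c.match === "literal"
    ? literalMatch(actual, c.expected)
    : fuzzyMatch(actual, c.expected);
}
```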


Exporting Evaluations to CSV

Evaluations can be exported to a CSV file for sharing or further analysis by clicking the Export button at the bottom right. The export includes all inputs, expected results, and actual results, making it easy to share evaluation runs with team members or stakeholders.
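
The exported file is essentially one row per evaluation containing its input, expected result, and actual result. The exact column names in MindStudio's export may differ; the sketch below only shows how such rows could be assembled into CSV.

```typescript
// Illustrative sketch of the exported structure; actual column names may differ.
interface EvaluationRow {
  input: string;
  expected: string;
  actual: string;
}

function toCsv(rows: EvaluationRow[]): string {
  const escape = (s: string) => `"${s.replace(/"/g, '""')}"`;
  const header = ["Input", "Expected Result", "Result"].join(",");
  const body = rows.map((r) => [r.input, r.expected, r.actual].map(escape).join(","));
  return [header, ...body].join("\n");
}
```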


Usage Example: Content Moderation

In this example, we'll validate the accuracy of a Content Moderation workflow that classifies user-generated content. Each evaluation tests the workflow's ability to label inputs accurately:

  • Input: A piece of text or content submitted for moderation, such as:

    • "This review is just copied from another site. Plagiarism!"

    • "I hate this product and the company that makes it. They are the worst!"

  • Expected Result: The anticipated classification for each input, such as:

    • Plagiarism or Copyright Violation

    • Hate Speech or Offensive Content

  • Result: The actual classification provided by the workflow.

Evaluations will indicate whether the workflow correctly classifies content into categories such as "Clear," "Spam or Promotional Content," or "False or Misleading Information."

For example, the workflow should identify "I hate this product and the company that makes it. They are the worst!" as Hate Speech or Offensive Content.
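
Expressed as data, this evaluation set is just a list of inputs paired with the classifications you expect back. The launch variable name `content` below is illustrative; use whatever launch variable your workflow defines. A fuzzy match is often the better fit for classification outputs, since the model's wording can vary slightly.

```typescript
// The content-moderation evaluation set from this example as structured data.
// The variable name "content" is a placeholder for your workflow's launch variable.
const moderationCases = [
  {
    input: { content: "This review is just copied from another site. Plagiarism!" },
    expected: "Plagiarism or Copyright Violation",
  },
  {
    input: { content: "I hate this product and the company that makes it. They are the worst!" },
    expected: "Hate Speech or Offensive Content",
  },
];
```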
