Evaluations
MindStudio's Evaluations feature enables you to rigorously test the accuracy and consistency of your workflows. By creating structured tests, you can validate expected outcomes, identify areas for improvement, and ensure your workflows are functioning as intended.
To use the Evaluations tool effectively, your workflow must meet the following requirements:
Launch Variables: The workflow should be configured with launch variables to define inputs.
End Block: The workflow must contain an End block to return outputs.
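Conceptually, a workflow that meets these requirements behaves like a function: the launch variables are its inputs, and whatever the End block returns is its output. The sketch below is illustrative only; `run_workflow` and its variables are hypothetical stand-ins, not part of MindStudio's API.

```python
# Illustrative only: from the evaluator's point of view, a workflow with
# launch variables and an End block is a function from named inputs to an output.
def run_workflow(launch_variables: dict[str, str]) -> str:
    """Hypothetical stand-in for a MindStudio workflow.

    The dictionary keys play the role of launch variables; the return value
    plays the role of the output returned by the End block.
    """
    # ... the real logic lives inside the workflow's blocks ...
    return "workflow output"
```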
Evaluations are structured test cases you can create to validate your workflow's output. There are two main ways to create evaluations:
Define each test case from scratch by specifying the input variables and expected results. This approach gives you complete control over each evaluation.
Click the Generate button to automatically populate evaluation cases using default or existing input data. Choose the number of test cases and, optionally, give the generator additional context about the test cases you want generated.
Once you've created your evaluations, you can run them all at once or individually to test your workflow's output against the defined expectations.
Click Run all at the top left to run all of the test cases at once. Results may take longer to appear depending on the size of the workflow and the number of test cases you created.
Hover over the left side of a test case row and click the Play icon to run that test case individually. This is ideal if you don’t want to rerun previous tests.
Each evaluation consists of three main components:
The set of variables or data points that the workflow will process.
The anticipated output for the given input. Matching can be configured in one of two ways:
Literal Match: The output must exactly match the expected result.
Fuzzy Match: The output can vary slightly and still be considered correct if it meets specified criteria.
The actual output produced by the workflow, displayed alongside the expected result for comparison.
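As a rough sketch of how these pieces fit together, the example below models an evaluation as a small record and checks the actual result against the expected one. It is not MindStudio's implementation: the `Evaluation` class and `passes` function are hypothetical, and the similarity-ratio check only stands in for whatever criteria the platform applies to fuzzy matches.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Evaluation:
    """One test case: input variables, the expected result, and the actual result."""
    inputs: dict[str, str]       # the variables the workflow will process
    expected: str                # the anticipated output
    actual: str | None = None    # filled in after the workflow runs


def passes(evaluation: Evaluation, match: str = "literal", threshold: float = 0.8) -> bool:
    """Check whether the actual result satisfies the expected result.

    "literal" requires an exact match; "fuzzy" accepts near matches. The
    similarity-ratio check here is only a stand-in for the platform's own criteria.
    """
    if evaluation.actual is None:
        return False
    if match == "literal":
        return evaluation.actual == evaluation.expected
    return SequenceMatcher(None, evaluation.actual, evaluation.expected).ratio() >= threshold
```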
Evaluations can be exported to a CSV file for sharing or further analysis by clicking the Export button at the bottom right. The export includes all inputs, expected results, and actual results, making it easy to share evaluation outcomes with team members or stakeholders.
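If you want to post-process an export outside MindStudio, a short script like the one below can summarize it. The column names used here ("Input", "Expected Result", "Result") are assumptions based on the components described above; check the header row of your own export and adjust accordingly.

```python
import csv

# Assumes one row per evaluation with columns for the input, expected result,
# and actual result. The header names below are assumptions and may differ
# from the actual export.
with open("evaluations.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

passed = sum(1 for row in rows if row["Result"] == row["Expected Result"])
print(f"{passed}/{len(rows)} evaluations matched their expected result exactly")
```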
In this example, we'll validate the accuracy of a Content Moderation workflow that classifies user-generated content. Each evaluation tests the workflow's ability to label inputs accurately:
Input: A piece of text or content submitted for moderation, such as:
"This review is just copied from another site. Plagiarism!"
"I hate this product and the company that makes it. They are the worst!"
Expected Result: The anticipated classification for each input, such as:
Plagiarism or Copyright Violation
Hate Speech or Offensive Content
Result: The actual classification provided by the workflow.
Evaluations will indicate whether the workflow correctly classifies content into categories such as "Clear," "Spam or Promotional Content," or "False or Misleading Information."
For example, the workflow should identify "I hate this product and the company that makes it. They are the worst!" as Hate Speech or Offensive Content.
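Written out as plain test data, the example evaluations above might look like the sketch below. The expected labels come from the categories mentioned in this example, while `run_content_moderation` is a hypothetical placeholder for the workflow itself.

```python
def run_content_moderation(content: str) -> str:
    """Hypothetical placeholder for the moderation workflow (not MindStudio's API)."""
    return "Clear"  # a real run would return one of the moderation categories


# The example evaluations above, expressed as input/expected-result pairs.
moderation_cases = [
    {
        "input": "This review is just copied from another site. Plagiarism!",
        "expected": "Plagiarism or Copyright Violation",
    },
    {
        "input": "I hate this product and the company that makes it. They are the worst!",
        "expected": "Hate Speech or Offensive Content",
    },
]

for case in moderation_cases:
    actual = run_content_moderation(case["input"])
    status = "PASS" if actual == case["expected"] else "FAIL"
    print(f'{status}: expected {case["expected"]!r}, got {actual!r}')
```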