# Examples

This section provides practical examples of using Viteval for different scenarios.

## Basic Text Evaluation

```ts
import { evaluate, scorers } from 'viteval';

evaluate('Simple QA', {
  data: async () => [
    { input: 'What is the capital of France?', expected: 'Paris' },
    { input: 'What is 2+2?', expected: '4' },
  ],
  task: async (input) => {
    return await callYourLLM(input);
  },
  scorers: [scorers.levenshtein],
  threshold: 0.8,
});
```
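The `levenshtein` scorer compares the task's output to `expected` by edit distance. As an illustration of the underlying idea only (a minimal sketch, not Viteval's actual implementation), a normalized Levenshtein similarity can be computed like this:

```ts
// Classic Levenshtein edit distance with a single-row DP table.
function editDistance(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0];
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1,                              // deletion
        dp[j - 1] + 1,                          // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Normalize to a [0, 1] score: 1 means identical, 0 means maximally different.
function levenshteinSimilarity(output: string, expected: string): number {
  const max = Math.max(output.length, expected.length);
  return max === 0 ? 1 : 1 - editDistance(output, expected) / max;
}
```

A score of this shape is what the `threshold: 0.8` above is compared against: near-miss answers like `"paris"` vs `"Paris"` still score high, while unrelated strings score near zero.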

## Custom Dataset

```ts
import { evaluate, scorers } from 'viteval';
import { defineDataset } from 'viteval/dataset';

const mathDataset = defineDataset({
  name: 'basic-math',
  data: async () => {
    const problems = [];
    for (let i = 0; i < 100; i++) {
      const a = Math.floor(Math.random() * 10);
      const b = Math.floor(Math.random() * 10);
      problems.push({
        input: `What is ${a} + ${b}?`,
        expected: String(a + b),
      });
    }
    return problems;
  },
});

evaluate('Math solver', {
  data: () => mathDataset.data(),
  task: async (input) => await solveMath(input),
  scorers: [scorers.exactMatch],
  threshold: 0.9,
});
```
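Because the dataset above calls `Math.random()`, every run evaluates a different set of problems, which makes scores harder to compare across runs. If you want reproducible runs, one option is to generate the data with a seeded PRNG. A sketch (the `mulberry32` helper is illustrative, not part of Viteval):

```ts
// Tiny deterministic PRNG (mulberry32) returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same shape of data as the `basic-math` dataset above, but reproducible:
// the same seed always yields the same problems.
function seededMathProblems(seed: number, count: number) {
  const rand = mulberry32(seed);
  const problems: { input: string; expected: string }[] = [];
  for (let i = 0; i < count; i++) {
    const a = Math.floor(rand() * 10);
    const b = Math.floor(rand() * 10);
    problems.push({
      input: `What is ${a} + ${b}?`,
      expected: String(a + b),
    });
  }
  return problems;
}
```

You could then return `seededMathProblems(42, 100)` from the dataset's `data` function to pin the evaluation set.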

## Multiple Scorers

```ts
import { evaluate, scorers } from 'viteval';

evaluate('Content quality', {
  data: async () => loadQuestions(),
  task: async (input) => await generateContent(input),
  scorers: [
    scorers.factual,          // Must be factually correct
    scorers.answerSimilarity, // Must be semantically similar
    scorers.moderation,       // Must be safe content
  ],
  threshold: 0.8,
});
```
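With multiple scorers, each test case produces several scores in `[0, 1]` that must be reconciled with the single `threshold`. As an illustration of one common aggregation strategy only (Viteval's actual aggregation may differ; check its documentation), averaging the per-scorer results looks like this:

```ts
// Illustrative aggregation: average the per-scorer results for one test
// case and compare the mean against the eval's threshold.
function passesThreshold(scores: number[], threshold: number): boolean {
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return mean >= threshold;
}
```

Under this strategy a case can pass overall even if one scorer dips below the threshold, as long as the others compensate; a stricter alternative is requiring every individual score to clear the threshold.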

## Runnable Examples

Several runnable examples are available in the Viteval GitHub repository:

- **Simple Evaluation**: simple text-based evaluation
- **Complex Evaluation**: complex, real-world evaluation