Runs

A run is a single end-to-end cycle: compile your behaviour spec into training data, optionally augment the examples with AI, fine-tune the model, and automatically evaluate the result.

The Run Object

{
  "id": "e0b7694b-2c65-4199-89a1-fc54a6a6010c",
  "behavior_spec_id": "cafd8799-...",
  "run_number": 1,
  "status": "completed",
  "spec_snapshot": { ... },
  "dataset_id": "dc66546b-...",
  "fine_tune_job_id": "b3e2b918-...",
  "model_id": "96e9f0d9-...",
  "hyperparameters": {
    "augment": true,
    "n_epochs": 4,
    "lora_rank": 8,
    "lora_alpha": 16
  },
  "eval_summary": {
    "total": 5,
    "avg_score": 0.82,
    "pass_rate": 0.8,
    "scoring_method": "llm_judge",
    "regressions": 0,
    "improvements": 3
  },
  "started_at": "2026-03-06T10:30:00.000Z",
  "completed_at": "2026-03-06T10:57:50.000Z"
}
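For client code, the run object can be mapped onto a small typed structure. The sketch below is a minimal Python illustration, not an official SDK. The `Run` class and `parse_timestamp` helper are hypothetical names, and only a subset of fields is kept. It also assumes that fields such as `model_id`, `eval_summary`, and `completed_at` may be null or absent until the run finishes.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as "2026-03-06T10:30:00.000Z"."""
    if value is None:
        return None
    # Python < 3.11 rejects a trailing "Z", so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Hypothetical client-side model of a run; only a subset of fields is kept.
@dataclass
class Run:
    id: str
    run_number: int
    status: str
    model_id: Optional[str]
    avg_score: Optional[float]
    pass_rate: Optional[float]
    started_at: Optional[datetime]
    completed_at: Optional[datetime]

    @classmethod
    def from_json(cls, data: dict) -> "Run":
        # Assumption: eval_summary is null or missing until evaluation finishes.
        summary = data.get("eval_summary") or {}
        return cls(
            id=data["id"],
            run_number=data["run_number"],
            status=data["status"],
            model_id=data.get("model_id"),
            avg_score=summary.get("avg_score"),
            pass_rate=summary.get("pass_rate"),
            started_at=parse_timestamp(data.get("started_at")),
            completed_at=parse_timestamp(data.get("completed_at")),
        )

    def duration_seconds(self) -> Optional[float]:
        """Wall-clock time from start to completion, or None if not finished."""
        if self.started_at is None or self.completed_at is None:
            return None
        return (self.completed_at - self.started_at).total_seconds()


raw = """{
  "id": "e0b7694b-2c65-4199-89a1-fc54a6a6010c",
  "run_number": 1,
  "status": "completed",
  "model_id": "96e9f0d9-...",
  "eval_summary": {"avg_score": 0.82, "pass_rate": 0.8},
  "started_at": "2026-03-06T10:30:00.000Z",
  "completed_at": "2026-03-06T10:57:50.000Z"
}"""
run = Run.from_json(json.loads(raw))
print(run.status, run.duration_seconds())  # completed 1670.0
```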

Run Lifecycle

Status	Description
`preparing`	Compiling the spec → augmenting examples (if `augment` is enabled) → uploading training data to the provider
`training`	Fine-tuning job running on Together AI
`evaluating`	Fine-tuned model being tested against the spec's examples
`completed`	Evaluation finished; results are in `eval_summary`
`failed`	The run hit an error; check the `error` field
`cancelled`	Manually cancelled
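When scripting against the API, it helps to treat the status values as a closed set so that an unexpected value fails loudly. This sketch assumes that `completed`, `failed`, and `cancelled` are terminal, since the cancel endpoint only accepts the other three. `RunStatus` is an illustrative name, not part of the API.

```python
from enum import Enum


class RunStatus(str, Enum):
    """Run statuses from the lifecycle table (illustrative client-side enum)."""

    PREPARING = "preparing"
    TRAINING = "training"
    EVALUATING = "evaluating"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

    @property
    def is_terminal(self) -> bool:
        # Assumption: no further transitions happen after these three statuses.
        return self in (RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.CANCELLED)


status = RunStatus("training")  # an unknown value raises ValueError
print(status.is_terminal)  # False
```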

Spec Snapshot

Every run captures a `spec_snapshot`: a frozen copy of the behaviour spec at run time. You can freely edit your spec between runs; each run preserves exactly what it trained on.
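Because each run keeps its own snapshot, you can compare a run's `spec_snapshot` with the current spec to see what has changed since that run. The field names in the usage example are hypothetical, because the snapshot's contents are elided above. The `changed_fields` helper, also a hypothetical name, compares top-level fields only.

```python
from typing import Any, Dict, List


def changed_fields(snapshot: Dict[str, Any], current: Dict[str, Any]) -> List[str]:
    """Top-level fields whose values differ between a run's snapshot and the current spec.

    A key missing on one side is treated as None on that side.
    """
    keys = set(snapshot) | set(current)
    return sorted(k for k in keys if snapshot.get(k) != current.get(k))


# Hypothetical spec fields; the real snapshot contents are not shown in these docs.
snapshot = {"system_prompt": "Be concise.", "examples": [{"input": "hi", "output": "Hello!"}]}
current = {"system_prompt": "Be concise and formal.", "examples": [{"input": "hi", "output": "Hello!"}]}
print(changed_fields(snapshot, current))  # ['system_prompt']
```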

Eval Summary

Field	Description
`total`	Number of examples evaluated
`avg_score`	Mean score across all examples (0–1)
`pass_rate`	Fraction of examples that passed (score ≥ 0.7)
`exact_match_rate`	Fraction of examples with a near-perfect score (≥ 0.95)
`avg_latency_ms`	Mean inference latency per example, in milliseconds
`scoring_method`	`llm_judge` or `similarity`
`regressions`	Number of examples whose score dropped by ≥ 0.1 compared with the previous run
`improvements`	Number of examples whose score rose by ≥ 0.1 compared with the previous run
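The thresholds in this table are enough to re-derive the summary from per-example scores, which can help when reading the per-example results of a run. This is a sketch, not the service's implementation. It assumes that examples are matched across runs by a stable key (hypothetically, the prompt). It also assumes that a regression or improvement means an absolute score change of at least 0.1.

```python
from statistics import mean
from typing import Dict, Optional

PASS_THRESHOLD = 0.7     # pass_rate counts scores >= 0.7
NEAR_PERFECT = 0.95      # exact_match_rate counts scores >= 0.95
CHANGE_THRESHOLD = 0.1   # regressions/improvements: score change of at least 0.1


def summarize(scores: Dict[str, float], previous: Optional[Dict[str, float]] = None) -> dict:
    """Re-derive summary metrics from per-example scores.

    Keys are hypothetical stable example identifiers (e.g. the prompt).
    """
    if not scores:
        raise ValueError("need at least one scored example")
    values = list(scores.values())
    n = len(values)
    summary = {
        "total": n,
        "avg_score": mean(values),
        "pass_rate": sum(v >= PASS_THRESHOLD for v in values) / n,
        "exact_match_rate": sum(v >= NEAR_PERFECT for v in values) / n,
        "regressions": 0,
        "improvements": 0,
    }
    for key, score in scores.items():
        if previous is None or key not in previous:
            continue  # no earlier score to compare against
        # Round away float noise so that e.g. 0.8 - 0.7 still counts as a 0.1 change.
        delta = round(score - previous[key], 9)
        if delta <= -CHANGE_THRESHOLD:
            summary["regressions"] += 1
        elif delta >= CHANGE_THRESHOLD:
            summary["improvements"] += 1
    return summary


current = {"refund": 0.9, "hours": 0.6, "greeting": 1.0, "shipping": 0.75, "returns": 0.85}
previous = {"refund": 0.7, "hours": 0.8, "greeting": 0.98, "shipping": 0.75, "returns": 0.5}
s = summarize(current, previous)
print(s["pass_rate"], s["exact_match_rate"], s["regressions"], s["improvements"])  # 0.8 0.2 1 2
```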

Start a Run

POST /api/v1/behavior-specs/:id/runs

curl -X POST https://api.tunedtensor.com/v1/behavior-specs/:id/runs \
  -H "Authorization: Bearer tt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "augment": true,
    "hyperparameters": {
      "n_epochs": 4,
      "learning_rate": 0.00002,
      "lora_rank": 8,
      "lora_alpha": 16
    }
  }'
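The same request can be made from Python with only the standard library. This is a minimal sketch. `build_request` and `send` are hypothetical helper names, the base URL is taken from the curl examples, and the spec ID is a placeholder. Building the request separately from sending it means the request can be checked without network access.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.tunedtensor.com/v1"  # as used in the curl examples


def build_request(method: str, path: str, api_key: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) an authenticated request mirroring the curl examples."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(req: urllib.request.Request) -> dict:
    """Perform the HTTP call and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


spec_id = "your-spec-id"  # placeholder
req = build_request(
    "POST",
    f"/behavior-specs/{spec_id}/runs",
    "tt_your_api_key",
    {
        "augment": True,
        "hyperparameters": {"n_epochs": 4, "learning_rate": 0.00002, "lora_rank": 8, "lora_alpha": 16},
    },
)
# run = send(req)  # uncomment to actually start the run
```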

Parameter	Default	Description
`augment`	true	Use AI to expand examples into a larger training set
`hyperparameters.n_epochs`	4	Number of training epochs (1–20)
`hyperparameters.learning_rate`	auto	Learning rate (chosen automatically if omitted)
`hyperparameters.batch_size`	8	Training batch size (minimum 8)
`hyperparameters.lora_rank`	8	LoRA adapter rank
`hyperparameters.lora_alpha`	16	LoRA alpha scaling factor
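Client-side validation catches out-of-range values before a run is queued. The sketch below encodes the documented limits: `n_epochs` from 1 to 20 and `batch_size` of at least 8. It omits `learning_rate` when it is not set, assuming that an absent value is what triggers the `auto` default. The positive-learning-rate check is an extra sanity check of our own, not a documented rule. `build_run_payload` is a hypothetical name.

```python
from typing import Optional


def build_run_payload(
    augment: bool = True,
    n_epochs: int = 4,
    learning_rate: Optional[float] = None,
    batch_size: int = 8,
    lora_rank: int = 8,
    lora_alpha: int = 16,
) -> dict:
    """Validate options against the documented limits and build the request body."""
    if not 1 <= n_epochs <= 20:
        raise ValueError("n_epochs must be between 1 and 20")
    if batch_size < 8:
        raise ValueError("batch_size must be at least 8")
    if learning_rate is not None and learning_rate <= 0:
        raise ValueError("learning_rate must be positive")  # our own sanity check
    hyperparameters = {
        "n_epochs": n_epochs,
        "batch_size": batch_size,
        "lora_rank": lora_rank,
        "lora_alpha": lora_alpha,
    }
    # Leave learning_rate out entirely so the service can pick it ("auto").
    if learning_rate is not None:
        hyperparameters["learning_rate"] = learning_rate
    return {"augment": augment, "hyperparameters": hyperparameters}


payload = build_run_payload(n_epochs=6, learning_rate=2e-5)
print(payload)
```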

Returns immediately with the new run in `preparing` status. The remaining stages run asynchronously; poll `GET /api/v1/runs/:id` to follow the run's progress.
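Because the run proceeds asynchronously, a client usually polls `GET /api/v1/runs/:id` until the status stops changing. In this sketch, `fetch_run` is any zero-argument callable that returns the current run object, for example a wrapper around that endpoint. The 30-second interval and 3-hour timeout are arbitrary choices, not documented limits. `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_run(
    fetch_run: Callable[[], Dict],
    poll_interval: float = 30.0,
    timeout: float = 3 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the run reaches a terminal status and return the final run object."""
    waited = 0.0
    while True:
        run = fetch_run()
        if run["status"] in TERMINAL_STATUSES:
            return run  # on "failed", inspect run["error"]
        if waited >= timeout:
            raise TimeoutError(f"run still {run['status']} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


# Simulated responses stand in for real GET /api/v1/runs/:id calls.
responses = iter([
    {"status": "preparing"},
    {"status": "training"},
    {"status": "evaluating"},
    {"status": "completed", "eval_summary": {"avg_score": 0.82}},
])
final = wait_for_run(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])  # completed
```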

List Runs for a Spec

GET /api/v1/behavior-specs/:id/runs

curl https://api.tunedtensor.com/v1/behavior-specs/:id/runs \
  -H "Authorization: Bearer tt_your_api_key"
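A common use of this list is choosing which run's model to keep. The sketch below assumes the response has already been unpacked into a list of run objects, since the response envelope is not documented here. It picks the completed run with the highest `avg_score` and prefers the later run on ties. `best_completed_run` is a hypothetical helper, and the `model_id` values are placeholders.

```python
from typing import Dict, List, Optional


def best_completed_run(runs: List[Dict]) -> Optional[Dict]:
    """Completed run with the highest avg_score; ties go to the higher run_number."""
    completed = [
        r for r in runs
        if r.get("status") == "completed"
        and (r.get("eval_summary") or {}).get("avg_score") is not None
    ]
    if not completed:
        return None
    return max(completed, key=lambda r: (r["eval_summary"]["avg_score"], r["run_number"]))


runs = [
    {"run_number": 1, "status": "completed", "model_id": "model-a", "eval_summary": {"avg_score": 0.74}},
    {"run_number": 2, "status": "completed", "model_id": "model-b", "eval_summary": {"avg_score": 0.82}},
    {"run_number": 3, "status": "failed", "model_id": None, "eval_summary": None},
]
print(best_completed_run(runs)["model_id"])  # model-b
```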

List All Runs

GET /api/v1/runs

curl https://api.tunedtensor.com/v1/runs \
  -H "Authorization: Bearer tt_your_api_key"

Returns runs across all specs. Each run includes a `_spec_name` field with the name of its spec, for display.
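The `_spec_name` field makes it easy to build an overview across specs. This sketch counts runs per status for each spec name, using made-up spec names. If two specs can share a display name, grouping by `behavior_spec_id` instead would be safer. `status_by_spec` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def status_by_spec(runs: List[Dict]) -> Dict[str, Counter]:
    """Count runs per status for each spec, keyed by the display name in _spec_name."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for run in runs:
        table[run.get("_spec_name", "(unnamed spec)")][run["status"]] += 1
    return dict(table)


runs = [
    {"_spec_name": "Support tone", "status": "completed"},
    {"_spec_name": "Support tone", "status": "failed"},
    {"_spec_name": "SQL helper", "status": "training"},
]
for name, counts in sorted(status_by_spec(runs).items()):
    print(name, dict(counts))
```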

Get Run Detail

GET /api/v1/runs/:id

curl https://api.tunedtensor.com/v1/runs/:id \
  -H "Authorization: Bearer tt_your_api_key"

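Returns the full run with `_evals`, the per-example results sorted by score (worst first). Each eval includes:

Field	Description
`prompt`	Input sent to the model
`expected`	Expected output from the spec
`actual`	Output the model produced
`score`	Score from 0 to 1
`passed`	Whether the example passed (boolean)
`reasoning`	The LLM judge's explanation of the score
`latency_ms`	Inference time in milliseconds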
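Because `_evals` arrives worst-first, the failures worth inspecting are at the top. The sketch below pulls out failed examples and formats them with the judge's reasoning for review. The eval records are invented for illustration, and `failing_evals` and `format_report` are hypothetical helper names.

```python
from typing import Dict, List


def failing_evals(evals: List[Dict], limit: int = 5) -> List[Dict]:
    """Up to `limit` failed examples, lowest score first."""
    failed = [e for e in evals if not e["passed"]]
    # _evals is already worst-first; sorting again just guards against reordering.
    return sorted(failed, key=lambda e: e["score"])[:limit]


def format_report(evals: List[Dict], limit: int = 5) -> str:
    """Render failed examples with the judge's reasoning for quick review."""
    blocks = []
    for e in failing_evals(evals, limit):
        blocks.append(
            f"[{e['score']:.2f}] {e['prompt']}\n"
            f"  expected: {e['expected']}\n"
            f"  actual:   {e['actual']}\n"
            f"  judge:    {e.get('reasoning', '')}"
        )
    return "\n\n".join(blocks)


# Invented eval records with the fields listed above.
evals = [
    {"prompt": "What is the refund window?", "expected": "30 days.", "actual": "I'm not sure.",
     "score": 0.2, "passed": False, "reasoning": "Does not state the policy.", "latency_ms": 410},
    {"prompt": "Do you ship abroad?", "expected": "Yes, to the EU.", "actual": "Yes.",
     "score": 0.5, "passed": False, "reasoning": "Omits the destinations.", "latency_ms": 395},
    {"prompt": "When are you open?", "expected": "9am to 5pm.", "actual": "9am-5pm.",
     "score": 0.9, "passed": True, "reasoning": "Equivalent answer.", "latency_ms": 380},
]
print(format_report(evals))
```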

Cancel a Run

POST /api/v1/runs/:id/cancel

curl -X POST https://api.tunedtensor.com/v1/runs/:id/cancel \
  -H "Authorization: Bearer tt_your_api_key"

Cancels a run in `preparing`, `training`, or `evaluating` status. If the provider fine-tuning job is already running, it is cancelled as well.
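A script can check the status before calling the cancel endpoint, so it does not try to cancel a run that has already finished. Here `post_cancel` is a hypothetical callable that sends `POST /api/v1/runs/:id/cancel`, and `cancel_if_active` is an illustrative name. The status can still change between reading it and cancelling, so treat the API's response as authoritative.

```python
from typing import Callable, Dict

CANCELLABLE = {"preparing", "training", "evaluating"}


def cancel_if_active(run: Dict, post_cancel: Callable[[str], Dict]) -> bool:
    """Send a cancel request only when the run's status allows it.

    Returns True if a cancel request was sent.
    """
    if run["status"] not in CANCELLABLE:
        return False  # completed, failed or already cancelled: nothing to cancel
    post_cancel(run["id"])
    return True


sent = []
cancel_if_active({"id": "run-1", "status": "training"}, lambda run_id: sent.append(run_id) or {})
cancel_if_active({"id": "run-2", "status": "completed"}, lambda run_id: sent.append(run_id) or {})
print(sent)  # ['run-1']
```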