Datasets

Datasets are JSONL files used for fine-tuning. They are usually auto-generated when you start a run from a behaviour spec, but you can also upload them manually.

Auto-Generated Datasets

When you start a run, the platform automatically:

1. Compiles your behaviour spec into JSONL chat format (system + user + assistant messages)
2. If augmentation is enabled, uses Claude to expand your examples into a larger, more diverse training set (typically 5–10 examples → 30–40 rows)
3. Uploads the compiled dataset to storage

Auto-generated datasets are named "Spec Name - Run #N".
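
As a rough illustration of the compile step, the sketch below turns a spec into chat-format JSONL rows and builds a name in the documented pattern. The spec shape (system_prompt, examples with input and output) and both function names are hypothetical, chosen only for this example; the platform's actual spec schema and compiler are not described here.

```python
import json


def dataset_name(spec_name: str, run_number: int) -> str:
    """Build a name in the documented "Spec Name - Run #N" pattern."""
    return f"{spec_name} - Run #{run_number}"


def compile_spec(spec: dict) -> str:
    """Emit one chat-format JSONL row per example.

    The keys read from `spec` are assumptions made for this sketch.
    """
    rows = []
    for example in spec["examples"]:
        rows.append({
            "messages": [
                {"role": "system", "content": spec["system_prompt"]},
                {"role": "user", "content": example["input"]},
                {"role": "assistant", "content": example["output"]},
            ]
        })
    # One JSON object per line, each line newline-terminated.
    return "".join(json.dumps(row) + "\n" for row in rows)


spec = {
    "name": "Customer Support Bot",
    "system_prompt": "You are a helpful support agent.",
    "examples": [
        {"input": "Hello", "output": "Hi!"},
        {"input": "Help", "output": "Sure!"},
    ],
}
jsonl = compile_spec(spec)
name = dataset_name(spec["name"], 8)
```

Augmentation is left out of the sketch because it depends on a model call.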

The Dataset Object

{
  "id": "dc66546b-48b3-4490-8baf-9b50aa78130c",
  "name": "Customer Support Bot - Run #8",
  "description": "Auto-compiled from behaviour spec. 36 examples (augmented).",
  "format": "jsonl",
  "status": "validated",
  "row_count": 36,
  "file_size_bytes": 36922,
  "created_at": "2026-03-06T10:44:30.000Z"
}
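
If you consume this object from Python, a small typed wrapper can make the fields easier to work with. The Dataset class and parse_dataset helper below are illustrative names, not part of any official client, and the field list simply mirrors the sample object above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Dataset:
    id: str
    name: str
    description: str
    format: str
    status: str
    row_count: int
    file_size_bytes: int
    created_at: datetime


def parse_dataset(payload: dict) -> Dataset:
    """Convert a decoded JSON dataset object into a Dataset."""
    # Rewrite the trailing "Z" as an explicit UTC offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    created = datetime.fromisoformat(payload["created_at"].replace("Z", "+00:00"))
    return Dataset(
        id=payload["id"],
        name=payload["name"],
        description=payload["description"],
        format=payload["format"],
        status=payload["status"],
        row_count=payload["row_count"],
        file_size_bytes=payload["file_size_bytes"],
        created_at=created,
    )


dataset = parse_dataset({
    "id": "dc66546b-48b3-4490-8baf-9b50aa78130c",
    "name": "Customer Support Bot - Run #8",
    "description": "Auto-compiled from behaviour spec. 36 examples (augmented).",
    "format": "jsonl",
    "status": "validated",
    "row_count": 36,
    "file_size_bytes": 36922,
    "created_at": "2026-03-06T10:44:30.000Z",
})
```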

Upload a Dataset

POST /api/v1/datasets (multipart form data)

curl -X POST https://api.tunedtensor.com/v1/datasets \
  -H "Authorization: Bearer tt_your_api_key" \
  -F "name=my-training-data" \
  -F "description=Custom training dataset" \
  -F "file=@training.jsonl"

The file must be in JSONL format. Each line is checked to confirm that it parses as JSON. The dataset's status is set to "validated" if every line parses, or to "invalid", with error details, if any line fails.
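
To catch parse errors before uploading, you can run a similar per-line check locally. This is a minimal sketch of the idea, not the platform's validator; find_invalid_lines is a hypothetical helper, and treating blank lines as errors is an assumption.

```python
import json
from typing import Iterable


def find_invalid_lines(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (line_number, error_message) for every line that is not valid JSON."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            # Blank lines also land here, since an empty string is not JSON.
            errors.append((number, exc.msg))
    return errors


sample = [
    '{"messages": []}',
    '{"messages": [}',  # malformed: the array is never closed
]
problems = find_invalid_lines(sample)
status = "invalid" if problems else "validated"
```

For a real file, pass the open file handle instead of a list, for example find_invalid_lines(open("training.jsonl", encoding="utf-8")).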

JSONL Format

Each line should be a JSON object with a messages array:

{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi!"}]}
{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "Help"}, {"role": "assistant", "content": "Sure!"}]}
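
The upload check described above only confirms that each line is valid JSON, so a stricter local check of the row shape may be useful. The sketch below assumes the three roles shown in the examples are the only ones expected and that content is always a string; neither rule is stated by the platform, and row_problems is a hypothetical helper.

```python
import json

# Roles that appear in the documented examples; treating this as the
# full allowed set is an assumption.
ALLOWED_ROLES = ("system", "user", "assistant")


def row_problems(line: str) -> list[str]:
    """Describe structural problems in one JSONL row (empty list if none).

    Assumes `line` already parses as JSON.
    """
    row = json.loads(line)
    if not isinstance(row, dict) or not isinstance(row.get("messages"), list):
        return ['row must be an object with a "messages" array']
    problems = []
    for index, message in enumerate(row["messages"]):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {index} has unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index} has non-string content")
    return problems


good = '{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi!"}]}'
bad = '{"messages": [{"role": "bot", "content": 1}]}'
good_result = row_problems(good)
bad_result = row_problems(bad)
```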

List Datasets

GET /api/v1/datasets

curl https://api.tunedtensor.com/v1/datasets \
  -H "Authorization: Bearer tt_your_api_key"

Get a Dataset

GET /api/v1/datasets/:id
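
A request for a single dataset might look like the following in Python, using only the standard library. The base URL and bearer-token header are copied from the curl examples on this page; the helper names are hypothetical, and the sketch assumes the response body is JSON without assuming its exact shape.

```python
import json
import urllib.request

# Base URL as used in the curl examples on this page.
API_BASE = "https://api.tunedtensor.com/v1"


def get_dataset_request(dataset_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one dataset."""
    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_dataset(dataset_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(get_dataset_request(dataset_id, api_key)) as response:
        return json.load(response)


request = get_dataset_request("dc66546b-48b3-4490-8baf-9b50aa78130c", "tt_your_api_key")
```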

Delete a Dataset

DELETE /api/v1/datasets/:id

Deletes the dataset record and the underlying file from storage.
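
A deletion call could be sketched in Python as shown below. The base URL and auth header follow the curl examples on this page; delete_dataset_request and delete_dataset are hypothetical names, and because the success status code is not documented here, the sketch simply returns whatever status the server sends.

```python
import urllib.request

# Base URL as used in the curl examples on this page.
API_BASE = "https://api.tunedtensor.com/v1"


def delete_dataset_request(dataset_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one dataset."""
    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )


def delete_dataset(dataset_id: str, api_key: str) -> int:
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(delete_dataset_request(dataset_id, api_key)) as response:
        return response.status


request = delete_dataset_request("dc66546b-48b3-4490-8baf-9b50aa78130c", "tt_your_api_key")
```

Since the underlying file is removed along with the record, it may be worth keeping a local copy of any manually uploaded JSONL before calling this.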