Tuned Tensor

Playground

Test base and fine-tuned models interactively using Together AI serverless inference. Compare outputs side by side to evaluate the effect of fine-tuning.

Overview

The Playground is available at Dashboard → Playground and via two REST endpoints. It supports every serverless base model on Together AI plus any fine-tuned model you have created through a run.

Key capabilities:

  • Single & compare mode — run one model, or two side by side on the same prompt
  • System prompt — optionally prepend a system message
  • Temperature & max tokens — adjustable per request (temperature 0–2, 64–4,096 tokens)
  • Latency & token metrics — every response displays wall-clock latency and prompt/completion token counts

Available Base Models

These models are available for serverless inference in the Playground. They are separate from the fine-tunable base models listed on the Overview page.

| Model | Context Length |
|---|---|
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131,072 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 131,072 |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1,048,576 |
| Qwen/Qwen2.5-7B-Instruct-Turbo | 32,768 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 32,768 |
| deepseek-ai/DeepSeek-V3.1 | 131,072 |
| google/gemma-3n-E4B-it | 32,768 |

Fine-tuned models you create through runs also appear in the model selector and can be used via the API by passing their provider_model_id as the model parameter.

List Playground Models

GET /api/v1/playground/models

curl https://api.tunedtensor.com/v1/playground/models \
  -H "Authorization: Bearer tt_your_api_key"

Response:

{
  "data": {
    "base_models": [
      {
        "id": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        "name": "Llama 3.1 8B Instruct Turbo",
        "type": "base"
      }
    ],
    "fine_tuned_models": [
      {
        "id": "user/Llama-3.2-3B-Instruct-ft-abc123",
        "name": "Llama-3.2-3B-Instruct-ft-abc123",
        "type": "fine-tuned",
        "base_model": "meta-llama/Llama-3.2-3B-Instruct"
      }
    ]
  }
}

| Field | Description |
|---|---|
| base_models | Together AI serverless models available for inference |
| fine_tuned_models | Your fine-tuned models (includes base_model for reference) |
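The two lists in this response can be flattened into a single set of entries for a model picker. A minimal client-side sketch — the `playground_models_to_choices` helper is hypothetical, not part of the API:

```python
# Flatten a /playground/models response into (id, label) choices
# for a model selector. Hypothetical helper, not part of the API.
def playground_models_to_choices(response: dict) -> list[tuple[str, str]]:
    data = response["data"]
    choices = []
    for m in data.get("base_models", []):
        choices.append((m["id"], m["name"]))
    for m in data.get("fine_tuned_models", []):
        # Show the base model alongside fine-tuned entries for context.
        choices.append((m["id"], f"{m['name']} (base: {m['base_model']})"))
    return choices

# Sample response from the docs above.
sample = {
    "data": {
        "base_models": [
            {"id": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
             "name": "Llama 3.1 8B Instruct Turbo", "type": "base"}
        ],
        "fine_tuned_models": [
            {"id": "user/Llama-3.2-3B-Instruct-ft-abc123",
             "name": "Llama-3.2-3B-Instruct-ft-abc123",
             "type": "fine-tuned",
             "base_model": "meta-llama/Llama-3.2-3B-Instruct"}
        ],
    }
}
choices = playground_models_to_choices(sample)
```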

Run a Completion

POST /api/v1/playground/completions

curl -X POST https://api.tunedtensor.com/v1/playground/completions \
  -H "Authorization: Bearer tt_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain LoRA fine-tuning in one paragraph." }
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'

Request Body

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | | Model ID (base or fine-tuned). Required. |
| messages | {role, content}[] | | Chat messages. At least one required. Roles: system, user, assistant. |
| temperature | number | 0.7 | Sampling temperature (0–2). |
| max_tokens | integer | 1024 | Maximum tokens to generate (1–4,096). |
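The same request can be sent from Python with the standard library. This sketch validates the parameter ranges from the table above client-side before building the body; the `build_completion_request` helper and the placeholder API key are illustrative, not part of any SDK:

```python
import json
import urllib.request

API_KEY = "tt_your_api_key"  # placeholder

def build_completion_request(model, messages, temperature=0.7, max_tokens=1024):
    """Validate parameters client-side (same ranges the API enforces)
    and return the JSON body for POST /api/v1/playground/completions."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("at least one message is required")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in [1, 4096]")
    return {"model": model, "messages": messages,
            "temperature": temperature, "max_tokens": max_tokens}

body = build_completion_request(
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Explain LoRA fine-tuning in one paragraph."}],
)

req = urllib.request.Request(
    "https://api.tunedtensor.com/v1/playground/completions",
    data=json.dumps(body).encode(),
    headers={"Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json"},
    method="POST",
)
# resp = urllib.request.urlopen(req)  # uncomment to actually call the API
```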

Response

{
  "data": {
    "content": "LoRA (Low-Rank Adaptation) is a parameter-efficient ...",
    "latency_ms": 823,
    "usage": {
      "prompt_tokens": 28,
      "completion_tokens": 156
    }
  }
}

| Field | Description |
|---|---|
| content | Generated text from the model |
| latency_ms | Wall-clock inference time in milliseconds |
| usage.prompt_tokens | Tokens consumed by the input prompt |
| usage.completion_tokens | Tokens generated in the response |
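The latency and usage fields can be combined into derived metrics such as total tokens and generation throughput. A small sketch using the sample response above — the throughput figure is computed client-side, not returned by the API:

```python
# Sample response from the docs above.
response = {
    "data": {
        "content": "LoRA (Low-Rank Adaptation) is a parameter-efficient ...",
        "latency_ms": 823,
        "usage": {"prompt_tokens": 28, "completion_tokens": 156},
    }
}

data = response["data"]
total_tokens = data["usage"]["prompt_tokens"] + data["usage"]["completion_tokens"]
# Completion tokens per second of wall-clock inference time.
tokens_per_second = data["usage"]["completion_tokens"] / (data["latency_ms"] / 1000)
```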

Error Codes

| Status | Code | Meaning |
|---|---|---|
| 400 | validation_error | Invalid request body (missing model, empty messages, etc.) |
| 404 | model_not_found | Model is not a supported base model and not one of your fine-tuned models |
| 429 | rate_limited | Too many requests — retry after a short delay |
| 500 | inference_error | Upstream provider error |
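The 429 advice above can be handled with a simple retry loop. A minimal sketch — `do_request` is a hypothetical callable returning a (status, body) pair, stubbed here so the behaviour is visible without network access:

```python
import time

def call_with_retry(do_request, max_attempts=3, base_delay=1.0):
    """Retry a playground request on 429 with exponential backoff.
    do_request() must return a (status, body) pair."""
    for attempt in range(max_attempts):
        status, body = do_request()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return status, body

# Simulated: the first call is rate-limited, the second succeeds.
responses = iter([
    (429, {"error": "rate_limited"}),
    (200, {"data": {"content": "ok"}}),
])
status, body = call_with_retry(lambda: next(responses), base_delay=0.0)
```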

Comparing Base vs Fine-Tuned

A common workflow is to compare a base model against your fine-tuned version to verify that fine-tuning improved behaviour:

  1. Open the Playground and enable Compare mode
  2. Select the base model (e.g. meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo) in Model A
  3. Select your fine-tuned model in Model B
  4. Enter a system prompt and user message from your behaviour spec
  5. Click Run — both models run in parallel and responses appear side by side with latency and token metrics

Via the API, make two separate POST /api/v1/playground/completions calls with the same messages but different model values.
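The two API calls can be issued in parallel, mirroring Compare mode in the UI. A sketch using a thread pool — `run_completion` is a hypothetical stand-in for a POST /api/v1/playground/completions call, stubbed here so the structure is runnable:

```python
from concurrent.futures import ThreadPoolExecutor

def run_completion(model, messages):
    # Stub for a POST /api/v1/playground/completions call;
    # a real implementation would send an HTTP request here.
    return {"model": model, "content": f"[{model} reply]"}

messages = [{"role": "user", "content": "Explain LoRA fine-tuning."}]
model_a = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"  # base model
model_b = "user/Llama-3.2-3B-Instruct-ft-abc123"         # your fine-tuned model

# Same messages, different model values, run in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(run_completion, model_a, messages)
    future_b = pool.submit(run_completion, model_b, messages)
    side_by_side = [future_a.result(), future_b.result()]
```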