Playground
Test base and fine-tuned models interactively using Together AI serverless inference. Compare outputs side by side to evaluate the effect of fine-tuning.
Overview
The Playground is available at Dashboard → Playground and via two REST endpoints. It supports every serverless base model on Together AI plus any fine-tuned model you have created through a run.
Key capabilities:
- Single & compare mode — run one model, or two side by side on the same prompt
- System prompt — optionally prepend a system message
- Temperature & max tokens — adjustable per request (0–2 temperature, 64–4,096 tokens)
- Latency & token metrics — every response displays wall-clock latency and prompt/completion token counts
Available Base Models
These models are available for serverless inference in the Playground. They are separate from the fine-tunable base models listed on the Overview page.
| Model | Context Length |
|---|---|
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131,072 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 131,072 |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1,048,576 |
| Qwen/Qwen2.5-7B-Instruct-Turbo | 32,768 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 32,768 |
| deepseek-ai/DeepSeek-V3.1 | 131,072 |
| google/gemma-3n-E4B-it | 32,768 |
Fine-tuned models you create through runs also appear in the model selector and can be used via the API by passing their provider_model_id.
List Playground Models
GET /api/v1/playground/models
curl https://api.tunedtensor.com/v1/playground/models \
  -H "Authorization: Bearer tt_your_api_key"

Response:
{
"data": {
"base_models": [
{
"id": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
"name": "Llama 3.1 8B Instruct Turbo",
"type": "base"
}
],
"fine_tuned_models": [
{
"id": "user/Llama-3.2-3B-Instruct-ft-abc123",
"name": "Llama-3.2-3B-Instruct-ft-abc123",
"type": "fine-tuned",
"base_model": "meta-llama/Llama-3.2-3B-Instruct"
}
]
}
}

| Field | Description |
|---|---|
| base_models | Together AI serverless models available for inference |
| fine_tuned_models | Your fine-tuned models (each entry includes its base_model for reference) |
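A client will typically merge both groups into a single model picker. A minimal Python sketch of that, working directly on the response shape shown above (the `selector_options` helper is illustrative, not part of the API, and no network call is made here):

```python
# Example /playground/models response, shaped as documented above.
response = {
    "data": {
        "base_models": [
            {"id": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
             "name": "Llama 3.1 8B Instruct Turbo",
             "type": "base"},
        ],
        "fine_tuned_models": [
            {"id": "user/Llama-3.2-3B-Instruct-ft-abc123",
             "name": "Llama-3.2-3B-Instruct-ft-abc123",
             "type": "fine-tuned",
             "base_model": "meta-llama/Llama-3.2-3B-Instruct"},
        ],
    }
}

def selector_options(payload):
    """Flatten both groups into one list, fine-tuned models first."""
    data = payload["data"]
    return data["fine_tuned_models"] + data["base_models"]

options = selector_options(response)
```

Either kind of ID from this list can then be passed as `model` in the completions request below.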
Run a Completion
POST /api/v1/playground/completions
curl -X POST https://api.tunedtensor.com/v1/playground/completions \
-H "Authorization: Bearer tt_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Explain LoRA fine-tuning in one paragraph." }
],
"temperature": 0.7,
"max_tokens": 1024
}'

Request Body
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | — | Model ID (base or fine-tuned). Required. |
| messages | {role, content}[] | — | Chat messages. At least one required. Roles: system, user, assistant. |
| temperature | number | 0.7 | Sampling temperature (0–2). |
| max_tokens | integer | 1024 | Maximum tokens to generate (1–4,096). |
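The constraints in this table can be checked client-side before sending, so obvious mistakes never cost a round trip. A minimal sketch, assuming the documented ranges (the error messages here are illustrative, not the API's exact `validation_error` text):

```python
VALID_ROLES = {"system", "user", "assistant"}

def validate_request(body):
    """Raise ValueError if `body` violates the documented constraints."""
    if not body.get("model"):
        raise ValueError("model is required")
    messages = body.get("messages") or []
    if not messages:
        raise ValueError("at least one message is required")
    for message in messages:
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"invalid role: {message.get('role')!r}")
    temperature = body.get("temperature", 0.7)
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    max_tokens = body.get("max_tokens", 1024)
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be between 1 and 4096")
    return body
```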
Response
{
"data": {
"content": "LoRA (Low-Rank Adaptation) is a parameter-efficient ...",
"latency_ms": 823,
"usage": {
"prompt_tokens": 28,
"completion_tokens": 156
}
}
}

| Field | Description |
|---|---|
| content | Generated text from the model |
| latency_ms | Wall-clock inference time in milliseconds |
| usage.prompt_tokens | Tokens consumed by the input prompt |
| usage.completion_tokens | Tokens generated in the response |
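Together, latency_ms and usage give a rough generation-speed figure, which is handy when comparing models in the Playground. A hypothetical helper (not part of the API):

```python
def tokens_per_second(response_data):
    """Approximate decode throughput from the documented response fields.

    Note this uses total wall-clock latency, so it understates pure decode
    speed slightly (time-to-first-token is included in the denominator).
    """
    completion = response_data["usage"]["completion_tokens"]
    seconds = response_data["latency_ms"] / 1000
    return completion / seconds

# The example response from above: 156 tokens in 823 ms.
example = {
    "content": "LoRA (Low-Rank Adaptation) is a parameter-efficient ...",
    "latency_ms": 823,
    "usage": {"prompt_tokens": 28, "completion_tokens": 156},
}
```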
Error Codes
| Status | Code | Meaning |
|---|---|---|
| 400 | validation_error | Invalid request body (missing model, empty messages, etc.) |
| 404 | model_not_found | Model is not a supported base model and not one of your fine-tuned models |
| 429 | rate_limited | Too many requests — retry after a short delay |
| 500 | inference_error | Upstream provider error |
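Since 429 asks you to retry after a short delay, a thin exponential-backoff wrapper is usually enough. A sketch, where `send` is a stand-in for your actual HTTP call to the completions endpoint:

```python
import time

def with_backoff(send, max_retries=3, base_delay=1.0):
    """Retry `send()` on rate limiting (HTTP 429) with exponential backoff.

    `send` is any callable returning (status_code, payload); it stands in
    for a real POST to /api/v1/playground/completions. Delays grow as
    base_delay, 2*base_delay, 4*base_delay, ...
    """
    for attempt in range(max_retries + 1):
        status, payload = send()
        if status != 429:
            return status, payload
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))
    return status, payload
```

Only 429 is worth retrying blindly; a 400 or 404 will fail the same way every time, and a 500 may warrant a single retry at most.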
Comparing Base vs Fine-Tuned
A common workflow is to compare a base model against your fine-tuned version to verify that fine-tuning improved behaviour:
- Open the Playground and enable Compare mode
- Select the base model (e.g. meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo) in Model A
- Select your fine-tuned model in Model B
- Enter a system prompt and user message from your behaviour spec
- Click Run — both models run in parallel and responses appear side by side with latency and token metrics
Via the API, make two separate POST /api/v1/playground/completions calls with the same messages but different model values.
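The two calls are independent, so they can run in parallel much like the Playground's Compare mode. A sketch using Python's standard library; `run_completion` is a placeholder for the actual POST request, not a provided client:

```python
from concurrent.futures import ThreadPoolExecutor

def run_completion(model, messages):
    """Placeholder for POST /api/v1/playground/completions.

    A real implementation would send {"model": model, "messages": messages}
    with your API key and return the parsed response body.
    """
    return {"model": model, "content": f"(response from {model})"}

def compare(model_a, model_b, messages):
    """Run the same messages against two models concurrently.

    Results come back in (model_a, model_b) order regardless of which
    request finishes first.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_completion, m, messages)
                   for m in (model_a, model_b)]
        return [f.result() for f in futures]
```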