Data-on-Demand · for AI fine-tuning

The dataset is the half of fine-tuning nobody hands you.

Describe the task. We build a clean, answer-verified dataset and deliver it HuggingFace-ready — drop the repo straight into Gradients, TRL, Axolotl, or Unsloth. Every checkable answer is verified in code, not trusted from a model.


row 0428 · verified-math-reasoning-3k · train split

instruction: An item costs $161. It is on sale with a 25% discount. What is the final price in dollars?

output: …25% of $161 = $40.25 · $161 − $40.25 = $120.75 · The answer is 120.75

ground truth: 120.75 · computed in Python, not by the model

✓ answer matches ground truth — row kept · mismatches are discarded

How it works

Python owns the truth. The model only writes the prose.

Most synthetic datasets trust the language model to be right. This one doesn't. The correct answer is computed independently before the model writes a word — then the model's answer is checked against it.

step 01

Construct

Each problem is built in code with a known-correct answer as ground truth.
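
As a rough illustration of this step, here is a minimal sketch of a problem generator for the discount example shown in the sample row. The function names, the price range, and the list of discount percentages are assumptions made for illustration, not the production generator.

```python
import random
from decimal import Decimal


def discounted_price(price: int, percent: int) -> Decimal:
    """Final price after a percentage discount, in exact decimal arithmetic."""
    # Decimal avoids binary floating-point rounding on cent values.
    final = Decimal(price) * (Decimal(100) - Decimal(percent)) / Decimal(100)
    return final.quantize(Decimal("0.01"))


def make_discount_problem(rng: random.Random) -> dict:
    """Build one word problem whose answer is known before any model runs."""
    price = rng.randint(20, 500)                         # hypothetical price range
    percent = rng.choice([10, 15, 20, 25, 30, 40, 50])   # hypothetical discounts
    instruction = (
        f"An item costs ${price}. It is on sale with a {percent}% discount. "
        "What is the final price in dollars?"
    )
    return {
        "instruction": instruction,
        "ground_truth": str(discounted_price(price, percent)),
    }
```

With these assumptions, `discounted_price(161, 25)` gives the 120.75 shown in the sample row, and the model never sees that value.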

step 02

Generate

The model writes step-by-step reasoning and a final answer for the problem.

step 03

Verify

The model's answer is checked against ground truth in code. Mismatches are discarded.
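
The check itself can be sketched in a few lines. This version assumes the model ends its reasoning with the phrase "The answer is ...", as in the sample row; the regular expression and function names are illustrative assumptions rather than the actual verifier.

```python
import re
from decimal import Decimal, InvalidOperation

# Assumed answer convention: "The answer is <number>", optionally with "$" or commas.
ANSWER_RE = re.compile(r"The answer is\s*\$?(-?\d[\d,]*(?:\.\d+)?)")


def extract_answer(output: str):
    """Return the last stated final answer as a Decimal, or None if absent."""
    matches = ANSWER_RE.findall(output)
    if not matches:
        return None
    try:
        return Decimal(matches[-1].replace(",", ""))
    except InvalidOperation:
        return None


def is_verified(output: str, ground_truth: str) -> bool:
    """Keep a row only when the model's answer equals the code-computed one."""
    answer = extract_answer(output)
    return answer is not None and answer == Decimal(ground_truth)
```

Comparing as `Decimal` rather than as strings means "120.75" and "120.750" count as the same answer, while a row with no parseable final answer is treated as a mismatch and dropped.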

step 04

Deliver

Deduplicated, split train/val/test, documented, pushed to a HuggingFace repo.
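
For a sense of what deduplication and splitting involve, here is a small sketch. The normalization rule (lowercase, collapsed whitespace), the 80/10/10 proportions, and the fixed seed are assumptions for illustration; the real build may use different rules. Pushing to a HuggingFace repo is left out.

```python
import hashlib
import random


def dedupe(rows):
    """Drop rows whose instruction repeats, ignoring case and extra whitespace."""
    seen, unique = set(), []
    for row in rows:
        normalized = " ".join(row["instruction"].lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def split_rows(rows, seed=0, val_frac=0.1, test_frac=0.1):
    """Shuffle deterministically, then cut into train / validation / test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "validation": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }
```

Deduplicating before splitting matters: it keeps a near-identical problem from landing in both train and test.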

100%

of checkable answers verified against code-computed ground truth

Alpaca

instruction / input / output — Gradients-ready, drop-in for TRL · Axolotl · Unsloth
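
To make the schema concrete, here is a sketch of one Alpaca-style record written out as a JSON Lines row, reusing the sample shown at the top of the page. The empty `input` field and the JSONL serialization are assumptions; the delivered repo may store rows in another file format.

```python
import json

# One record with the three Alpaca fields. The output text is copied from the
# sample row, including its leading ellipsis where the reasoning is elided.
row = {
    "instruction": (
        "An item costs $161. It is on sale with a 25% discount. "
        "What is the final price in dollars?"
    ),
    "input": "",  # assumed empty: the sample row shows no separate input
    "output": "…25% of $161 = $40.25 · $161 − $40.25 = $120.75 · The answer is 120.75",
}


def to_jsonl(rows):
    """Serialize records one JSON object per line, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)


line = to_jsonl([row])
```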

24h

standard turnaround on a verifiable build · Apache-2.0 licensed, commercial use allowed

Pricing

Priced like a fine-tuning job — not a data subscription.

A Gradients run costs $100–500 and you still have to bring the data. We supply the verified dataset for a flat per-build price. No setup fee, no per-record meter.

Tier S

$75

1,000 – 2,000 rows

Answer-verified rows
Train / val / test split
Dataset card + license
HuggingFace repo

Tier M · most fine-tunes

$150

2,000 – 5,000 rows

Everything in S
Larger, richer coverage
Schema tuned to your trainer
Gradients drop-in instructions

Tier L

$300

5,000 – 10,000 rows

Everything in M
Highest volume tier
Priority build queue
Custom or larger builds scoped on request

Order

Two ways in: an agent endpoint, or a human note.

Autonomous agents — call the hsh-finetune-dataset tool on our x402 endpoint. Describe the task, pay in USDC, receive the repo.

https://dod.hshintelligence.com/mcp

Humans — tell us the task, the domain, and how many rows. We scope it, build it, and send the HuggingFace repo. Verifiable tasks are quoted and confirmed fast.

data@hshintelligence.com