HSH·Intelligence
Data-on-Demand · for AI fine-tuning

The dataset is the half of fine-tuning nobody hands you.

Describe the task. We build a clean, answer-verified dataset and deliver it HuggingFace-ready — drop the repo straight into Gradients, TRL, Axolotl, or Unsloth. Every checkable answer is verified in code, not trusted from a model.

row 0428 · verified-math-reasoning-3k · train split
instruction: An item costs $161. It is on sale with a 25% discount. What is the final price in dollars?
output: …25% of $161 = $40.25 · $161 − $40.25 = $120.75 · The answer is 120.75
ground truth: 120.75 · computed in Python, not the model
answer matches ground truth — row kept · mismatches are discarded
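
As an illustration, the check on the sample row above could look something like the following. This is a minimal sketch, not the production verifier: the function names `discounted_price` and `extract_final_answer`, and the assumption that the final answer always follows the phrase "The answer is", are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

def discounted_price(price: str, percent_off: str) -> Decimal:
    """Ground truth: price after a percentage discount, in exact decimal arithmetic."""
    p = Decimal(price)
    return p - p * Decimal(percent_off) / Decimal(100)

def extract_final_answer(output: str) -> Optional[Decimal]:
    """Return the number following 'The answer is', or None if the phrase is absent."""
    match = re.search(r"The answer is\s*\$?(-?\d+(?:\.\d+)?)", output)
    return Decimal(match.group(1)) if match else None

# The sample row, with the elided start of the output left as-is.
ground_truth = discounted_price("161", "25")
model_output = "…25% of $161 = $40.25 · $161 − $40.25 = $120.75 · The answer is 120.75"

# Keep the row only when the model's final answer equals the computed value.
keep_row = extract_final_answer(model_output) == ground_truth
print(ground_truth, keep_row)  # 120.75 True
```

Decimal arithmetic is used here so that money values compare exactly; a row whose output contains no parseable final answer compares unequal and would be dropped.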
How it works

Python owns the truth. The model only writes the prose.

Most synthetic datasets trust the language model to be right. This one doesn't. The correct answer is computed independently before the model writes a word — then the model's answer is checked against it.

step 01

Construct

Each problem is built in code with a known-correct answer as ground truth.

step 02

Generate

The model writes step-by-step reasoning and a final answer for the problem.

step 03

Verify

The model's answer is checked against ground truth in code. Mismatches are discarded.

step 04

Deliver

Deduplicated, split train/val/test, documented, pushed to a HuggingFace repo.
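
The four steps above can be sketched end to end in a few lines. Everything specific here is an assumption made for illustration: `fake_model` stands in for a real language model and is deliberately wrong about one time in five, the problem template is borrowed from the sample row, and the 80/10/10 split ratio is a guess rather than a documented setting.

```python
import random

def construct(rng):
    """Step 01: build a problem in code, together with its known-correct answer."""
    price = rng.randrange(20, 500)
    percent = rng.choice([10, 20, 25, 50])
    instruction = (
        f"An item costs ${price}. It is on sale with a {percent}% discount. "
        "What is the final price in dollars?"
    )
    truth = round(price * (100 - percent) / 100, 2)
    return instruction, truth

def fake_model(truth, rng):
    """Step 02 stand-in (hypothetical): returns a wrong answer roughly 20% of the time."""
    answer = truth if rng.random() > 0.2 else truth + 1
    return f"The answer is {answer}"

def verify(output, truth):
    """Step 03: parse the claimed answer from the text and compare it to ground truth."""
    claimed = float(output.rsplit("The answer is ", 1)[1])
    return claimed == truth

def build(n_attempts, seed=0):
    """Run all four steps and return deduplicated train/validation/test splits."""
    rng = random.Random(seed)
    seen, rows = set(), []
    for _ in range(n_attempts):
        instruction, truth = construct(rng)
        output = fake_model(truth, rng)
        if not verify(output, truth):
            continue  # mismatch: discard the row
        if instruction in seen:
            continue  # Step 04: drop exact duplicate problems
        seen.add(instruction)
        rows.append({"instruction": instruction, "input": "", "output": output})
    rng.shuffle(rows)
    n_holdout = len(rows) // 10  # assumed 80/10/10 split
    return {
        "validation": rows[:n_holdout],
        "test": rows[n_holdout:2 * n_holdout],
        "train": rows[2 * n_holdout:],
    }

splits = build(500)
print({name: len(rows) for name, rows in splits.items()})
```

The point of the design is that `verify` never consults the model about correctness: the only source of truth is the value computed in `construct`.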

100%
of checkable answers verified against code-computed ground truth
Alpaca
instruction / input / output — Gradients-ready, drop-in for TRL · Axolotl · Unsloth
24h
standard turnaround on a verifiable build · Apache-2.0, commercial use
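
For readers unfamiliar with the Alpaca layout, each row is a record with three string fields: `instruction`, `input`, and `output`, where `input` is conventionally left empty when the task needs no extra context. The sketch below serializes the sample row to JSON Lines; the helper name `to_jsonl` and the choice of JSONL as the on-disk format are assumptions for illustration, not a description of the delivered repo.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")

def to_jsonl(rows):
    """Serialize Alpaca-style rows to JSON Lines, rejecting rows with missing fields."""
    lines = []
    for row in rows:
        missing = [key for key in REQUIRED_KEYS if key not in row]
        if missing:
            raise ValueError(f"row is missing fields: {missing}")
        # Keep only the three Alpaca fields, in a stable order.
        lines.append(json.dumps({key: row[key] for key in REQUIRED_KEYS}, ensure_ascii=False))
    return "\n".join(lines) + "\n"

row = {
    "instruction": "An item costs $161. It is on sale with a 25% discount. "
                   "What is the final price in dollars?",
    "input": "",  # no extra context for this task
    "output": "…25% of $161 = $40.25 · $161 − $40.25 = $120.75 · The answer is 120.75",
}

print(to_jsonl([row]), end="")
```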
Pricing

Priced like a fine-tuning job — not a data subscription.

A Gradients run costs $100–500 and you still have to bring the data. We supply the verified dataset for a flat per-build price. No setup fee, no per-record meter.

Tier S
$75
1,000 – 2,000 rows
  • Answer-verified rows
  • Train / val / test split
  • Dataset card + license
  • HuggingFace repo
Tier M · most fine-tunes
$150
2,000 – 5,000 rows
  • Everything in S
  • Larger, richer coverage
  • Schema tuned to your trainer
  • Gradients drop-in instructions
Tier L
$300
5,000 – 10,000 rows
  • Everything in M
  • Highest volume tier
  • Priority build queue
  • Custom / larger scoped on request
Order

Two ways in: an agent endpoint, or a human note.

A

Autonomous agents — call the hsh-finetune-dataset tool on our x402 endpoint. Describe the task, pay in USDC, receive the repo.

https://dod.hshintelligence.com/mcp
B

Humans — tell us the task, the domain, and how many rows. We scope it, build it, and send the HuggingFace repo. Verifiable tasks are quoted and confirmed fast.

data@hshintelligence.com