TransXLab
Rust CLI · Zero Dependencies · <1s Validation

Don't waste compute on doomed runs.

TransXLab validates and designs LLM fine-tuning configurations before training starts. One binary. No Python. Catches the mistakes that cost you $665 and a weekend.

3.3 MB — Single binary, no runtime
20 — Failure mode signatures
25 — Hyperparameter rules
<1s — Full validation pass
Fine-tuning is trial by fire.
Usually, you just get burned.
  • VRAM overflow 40 minutes in. You eyeballed the memory math. The OOM killer didn't.
  • Learning rate off by 10x. Loss looks fine for an hour, then diverges. You find out after the cloud bill hits.
  • Template contamination in your data. Every row starts with the same prompt. The model memorizes the wrapper, not the task.
  • No config validation anywhere. HuggingFace Trainer will happily launch a 93 GB job on a 24 GB card. It's not its problem.
  • Postmortem is archaeology. When a run fails at epoch 8, you're reverse-engineering logs with no tooling.
$665 — Wasted on one doomed run
AC-v2: full fine-tune of Llama-3-8B. Wrong learning rate. Wrong epoch count. Wrong VRAM estimate.
Every issue was catchable before launch.
Three-level validation pipeline.

TransXLab runs your config through a layered analysis in under a second. Each stage builds on the last. Nothing ships to the GPU until everything passes.

1 — Preflight

Environment checks and hardware validation before anything else runs.

  • GPU detection and VRAM inventory
  • CUDA version compatibility
  • Disk space for checkpoints
  • Model download verification
  • Dependency availability
2 — Design

Architecture analysis and hyperparameter validation against 25 rules.

  • VRAM estimation (model + optimizer + gradient)
  • Learning rate range validation
  • Batch size / gradient accumulation sizing
  • LoRA rank and target module recommendations
  • Epoch count and overfitting risk
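The VRAM estimate behind the first bullet is, at its core, per-parameter arithmetic. Here is a minimal sketch of that kind of estimate, assuming fp16 weights and gradients (2 bytes/param) and half-precision Adam moment buffers (4 bytes/param total) — the function name and constants are illustrative, not TransXLab's exact model, and activations are excluded because they scale with batch size and sequence length:

```rust
// Back-of-envelope VRAM for a full fine-tune, in GB.
// Assumed byte counts (illustrative, not TransXLab's exact model):
//   weights   — fp16, 2 bytes/param
//   optimizer — Adam moments in half precision, 4 bytes/param total
//   gradients — fp16, 2 bytes/param
// Activations are omitted: they depend on batch size and sequence length.
fn full_finetune_vram_gb(params_billions: f64) -> f64 {
    let weights = params_billions * 2.0;
    let optimizer = params_billions * 4.0;
    let gradients = params_billions * 2.0;
    weights + optimizer + gradients
}

fn main() {
    // Llama-3-8B has ~8.03B parameters: ~64 GB before a single
    // activation is stored, already far past a 24 GB card.
    println!("{:.1} GB", full_finetune_vram_gb(8.03));
}
```

Even this crude version catches the AC-v2 failure: the static footprint alone dwarfs the available card.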
3 — Data Strategy

Training data quality analysis to catch contamination and distributional issues.

  • Self-BLEU diversity scoring
  • Template contamination detection
  • Token length distribution analysis
  • Class balance assessment
  • Train/eval split validation
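Template contamination can be approximated with a leading n-gram frequency check: if most samples open with the same 4-gram, the model learns the wrapper instead of the task. A rough Rust sketch of that heuristic — the function name is hypothetical, and TransXLab's actual scorer is self-BLEU based, not this shortcut:

```rust
use std::collections::HashMap;

// What fraction of samples open with the same leading n-gram?
// A crude stand-in for template-contamination detection.
fn leading_ngram_share(samples: &[&str], n: usize) -> f64 {
    let mut counts: HashMap<Vec<&str>, usize> = HashMap::new();
    for s in samples {
        let gram: Vec<&str> = s.split_whitespace().take(n).collect();
        if gram.len() == n {
            *counts.entry(gram).or_insert(0) += 1;
        }
    }
    let max = counts.values().copied().max().unwrap_or(0);
    max as f64 / samples.len() as f64
}

fn main() {
    let data = [
        "Below is an instruction that describes a task.",
        "Below is an instruction that you must follow.",
        "Summarize the following article in one sentence.",
    ];
    // Two of three samples share the same leading 4-gram.
    println!("{:.2}", leading_ngram_share(&data, 4));
}
```

A share near 1.0 is the "87% of samples" situation from the AC-v2 report: the dataset is teaching the prompt scaffold, not the task.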
Built for ML engineers who ship.

Zero Dependencies

Single 3.3 MB static binary. No Python, no pip, no conda. Copy it to your server and go.

HuggingFace Hub Integration

Auto-detects model architecture, parameter count, and precision from any Hub model ID.

Config Generation

Generates validated configs for HF Trainer, Axolotl, and LLaMA-Factory from your spec.

Cloud Cost Estimation

Estimates cost across 7 GPU tiers and 4 cloud providers before you commit to a run.

CI/CD Gating

Use --fail-on warn|fail with JSON output to gate training pipelines in CI.

Postmortem Diagnosis

Feed failed training logs in. TransXLab matches against 20 failure mode signatures to tell you what went wrong.

AC-v2: The $665 postmortem that started it all.

A full fine-tune of Llama-3-8B. Every parameter was plausible. All of them were wrong. TransXLab flags every one of them in under a second.

The Doomed Config

Model Llama-3-8B
Method Full Fine-Tune
Learning Rate 1e-4
Epochs 10
VRAM Available 24 GB
VRAM Required 93.4 GB
Data Self-BLEU 0.697
Cost $665
  • FAIL VRAM overflow: 93.4 GB required vs 31.8 GB available (24 GB physical + 7.8 GB shared). Will OOM before the first backward pass.
  • FAIL Learning rate 1e-4 is 3.3x too high for an 8B full fine-tune. Recommended: 3e-5 to avoid divergence.
  • WARN 10 epochs on a small dataset risks catastrophic overfitting. Recommended: 2-3 epochs with early stopping.
  • WARN Template contamination detected: self-BLEU = 0.697 indicates high prefix repetition. The model will memorize wrappers.
$ transxlab validate --config ac-v2.yaml

TransXLab v0.1.0 // validate & design before you train

== PREFLIGHT ==
[PASS] CUDA 12.1 detected
[PASS] GPU: NVIDIA RTX 4090 (24 GB)
[PASS] Disk: 847 GB free
[FAIL] VRAM insufficient
       Required: 93.4 GB (model=16.1 + optimizer=32.1 + gradients=16.1 + activations=29.1)
       Available: 31.8 GB (24 GB physical + 7.8 GB shared)
       Recommendation: Use LoRA (r=16) to reduce to ~18.2 GB

== DESIGN ==
[FAIL] Learning rate 1e-4 exceeds safe range for 8B full fine-tune
       Max recommended: 3.5e-5 | Optimal: 3e-5
       Rule: lr_max = 1e-4 / sqrt(params_B) for full fine-tune
[WARN] Epoch count 10 likely to overfit
       Dataset size: 2,847 samples
       Recommended: 2-3 epochs with eval_steps=50, early_stopping_patience=3
[PASS] Batch size 4 with gradient_accumulation_steps=8
[PASS] Weight decay 0.01 within range
[PASS] Warmup ratio 0.03 appropriate

== DATA STRATEGY ==
[WARN] Template contamination detected
       Self-BLEU: 0.697 (threshold: 0.5)
       Top repeated 4-gram: "Below is an instruction that" (87% of samples)
       Recommendation: Strip template wrappers, diversify instruction phrasing
[PASS] Token length distribution: mean=342, std=128
[PASS] No class imbalance detected

== SUMMARY ==
2 FAILURES  2 WARNINGS  7 PASSED

VERDICT: DO NOT TRAIN
Fix VRAM and learning rate issues before proceeding.
Estimated cost if run anyway: $665 across 4x A100 for ~18 hours.

Run transxlab design --model meta-llama/Llama-3-8B --method lora for a corrected config.
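The learning-rate ceiling in the Design output above follows the printed rule, lr_max = 1e-4 / sqrt(params_B). It is easy to check by hand; a one-function sketch:

```rust
// Full-fine-tune learning-rate ceiling, per the rule quoted in the
// Design output: lr_max = 1e-4 / sqrt(params_B).
fn lr_max_full_finetune(params_billions: f64) -> f64 {
    1e-4 / params_billions.sqrt()
}

fn main() {
    // For an 8B model the ceiling is ~3.5e-5, so the configured
    // 1e-4 overshoots it by roughly 3x.
    println!("{:.2e}", lr_max_full_finetune(8.0));
}
```

The intuition behind the 1/sqrt scaling: larger models have sharper loss landscapes, so the safe step size shrinks as parameter count grows.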
Up and running in 30 seconds.
Step 1 — Install

Download the binary

# Linux / macOS
$ curl -fsSL https://github.com/zamfir70/transxlab/releases/latest/download/transxlab \
    -o /usr/local/bin/transxlab
$ chmod +x /usr/local/bin/transxlab

# Verify
$ transxlab --version
transxlab 0.1.0 (3.3 MB, zero dependencies)
Step 2 — Validate

Check your config

# Validate an existing config
$ transxlab validate --config my-run.yaml

# Design a new config from scratch
$ transxlab design \
    --model meta-llama/Llama-3-8B \
    --method lora \
    --gpu "RTX 4090"

# CI gate: fail pipeline on warnings
$ transxlab validate --config run.yaml \
    --fail-on warn --output json
Step 3 — Estimate

Know your costs

$ transxlab cost --config my-run.yaml

Cost Estimates (3 epochs, 2,847 samples)

Provider        GPU          $/hr   Hours   Total
─────────────────────────────────────────────────
Lambda          A100 80GB    $1.10   4.2    $4.62
RunPod          A100 80GB    $1.64   4.2    $6.89
AWS             p4d.24xl     $3.93   4.2    $16.51
GCP             a2-highgpu   $3.67   4.2    $15.41
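The totals in this table are simply hourly rate × estimated wall-clock hours. A minimal sketch using the table's own figures (real cloud pricing drifts, so treat the rates as snapshots):

```rust
// Run cost = hourly GPU rate × estimated training hours.
fn run_cost(rate_per_hr: f64, hours: f64) -> f64 {
    rate_per_hr * hours
}

fn main() {
    let hours = 4.2; // wall-clock estimate from the table above
    for (provider, rate) in [("Lambda", 1.10), ("RunPod", 1.64), ("AWS", 3.93), ("GCP", 3.67)] {
        println!("{provider:<8} ${:.2}", run_cost(rate, hours));
    }
}
```

The hard part is estimating `hours`, which depends on tokens, batch size, and GPU throughput; once that number exists, the cost comparison is arithmetic.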
Step 4 — Diagnose

Analyze failed runs

$ transxlab postmortem --log training.log

Postmortem Analysis

[MATCH] Loss Divergence @ step 1,247
  Pattern: loss > 2x moving average for 50+ steps
  Cause:   Learning rate too high after warmup
  Fix:     Reduce lr by 3-5x or use cosine schedule

[MATCH] Gradient Norm Spike @ step 1,190
  Pattern: grad_norm > 10x baseline
  Cause:   Unstable gradients; typically precedes loss divergence
  Fix:     Add max_grad_norm=1.0 clipping
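The Loss Divergence signature above — loss exceeding 2x its moving average for 50+ steps — can be sketched as a streak check over a trailing window. The window size and thresholds here are assumptions, not TransXLab's tuned values:

```rust
// Flag loss divergence: loss exceeds `factor` × its trailing moving
// average for `patience` consecutive steps. Returns the 0-based step
// where the qualifying streak began, or None if training stayed stable.
fn detect_divergence(
    losses: &[f64],
    window: usize,   // trailing moving-average window
    factor: f64,     // e.g. 2.0 — "loss > 2x moving average"
    patience: usize, // e.g. 50 — "for 50+ steps"
) -> Option<usize> {
    let mut streak = 0;
    for i in window..losses.len() {
        let avg: f64 = losses[i - window..i].iter().sum::<f64>() / window as f64;
        if losses[i] > factor * avg {
            streak += 1;
            if streak >= patience {
                return Some(i + 1 - patience);
            }
        } else {
            streak = 0;
        }
    }
    None
}

fn main() {
    // Synthetic run: stable for 100 steps, then exponential blow-up.
    let mut losses = vec![1.0; 100];
    let mut l = 1.0;
    for _ in 0..120 {
        l *= 1.3;
        losses.push(l);
    }
    println!("{:?}", detect_divergence(&losses, 10, 2.0, 50));
}
```

The patience requirement is what separates divergence from ordinary loss spikes: a single bad batch resets the streak, while a true runaway keeps outpacing its own moving average.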
TransXLab + TransXform

TransXLab validates before training. TransXform supervises during training. Together, they cover the full fine-tuning lifecycle — from config validation to live monitoring, early stopping, and checkpoint management.

Use TransXLab to design and gate your run. Hand the validated config to TransXform to execute it with live loss monitoring, automatic early stopping, and structured experiment logging.

  1. TransXLab — Before Training: validate config, estimate cost, check VRAM, analyze data quality
  2. Config handoff: validated YAML exported to HF Trainer / Axolotl / LLaMA-Factory format
  3. TransXform — During Training: live monitoring, early stopping, checkpoint management, experiment tracking
  4. Postmortem — After Training: if anything goes wrong, feed logs back to TransXLab for failure diagnosis