Generate Fraud Detection ML Training Data
Generate production-ready fraud detection datasets with extreme class imbalance (2% fraud), log-normal transaction amounts, and behavioral risk signals, ready for anomaly detection models.
Fraud Detection template configuration
Here's the pre-built template configuration. Customize everything after loading.
{
  "templateName": "Fraud Detection",
  "taskType": "classification",
  "numSamples": 10000,
  "features": [
    { "name": "transaction_amount", "type": "numeric", "distribution": "log-normal", "mean": 150, "std": 200 },
    { "name": "distance_from_home", "type": "numeric", "distribution": "exponential", "mean": 25 },
    { "name": "time_since_last", "type": "numeric", "distribution": "exponential", "mean": 48 },
    { "name": "merchant_category", "type": "categorical", "categories": ["retail", "online", "travel", "food", "gas"] },
    { "name": "is_international", "type": "boolean", "trueRatio": 0.08 }
  ],
  "target": { "labels": ["legitimate", "fraud"], "weights": [98, 2] },
  "noise": 0.1
}
Built for Fraud
Every feature is configured with domain-appropriate distributions and realistic parameters.
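If you edit the template by hand, a quick sanity check can catch obvious mistakes before you generate anything. The sketch below parses the template JSON with Python's standard library and applies a few illustrative checks. The `check_config` helper and its rules are assumptions made for this example, not part of SynthForge IO.

```python
import json

# Template JSON copied from the configuration shown on this page.
TEMPLATE_JSON = """
{
  "templateName": "Fraud Detection",
  "taskType": "classification",
  "numSamples": 10000,
  "features": [
    { "name": "transaction_amount", "type": "numeric", "distribution": "log-normal", "mean": 150, "std": 200 },
    { "name": "distance_from_home", "type": "numeric", "distribution": "exponential", "mean": 25 },
    { "name": "time_since_last", "type": "numeric", "distribution": "exponential", "mean": 48 },
    { "name": "merchant_category", "type": "categorical", "categories": ["retail", "online", "travel", "food", "gas"] },
    { "name": "is_international", "type": "boolean", "trueRatio": 0.08 }
  ],
  "target": { "labels": ["legitimate", "fraud"], "weights": [98, 2] },
  "noise": 0.1
}
"""


def check_config(config):
    """Hypothetical helper: return a list of problems (empty means all checks passed)."""
    problems = []
    target = config["target"]
    if len(target["labels"]) != len(target["weights"]):
        problems.append("target labels and weights differ in length")
    if any(w <= 0 for w in target["weights"]):
        problems.append("target weights must be positive")
    for feature in config["features"]:
        kind = feature["type"]
        # Log-normal and exponential distributions both need a positive mean.
        if kind == "numeric" and feature.get("mean", 0) <= 0:
            problems.append(f"{feature['name']}: mean must be positive")
        if kind == "categorical" and not feature.get("categories"):
            problems.append(f"{feature['name']}: no categories listed")
        if kind == "boolean" and not 0 <= feature.get("trueRatio", -1) <= 1:
            problems.append(f"{feature['name']}: trueRatio outside [0, 1]")
    return problems


config = json.loads(TEMPLATE_JSON)
problems = check_config(config)
# Share of the fraud label implied by the target weights (2 out of 98 + 2).
fraud_share = config["target"]["weights"][1] / sum(config["target"]["weights"])
print(problems, fraud_share)
```

For the unmodified template the problem list is empty and the implied fraud share is 0.02.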
Extreme Class Imbalance (2/98)
Pre-configured with 2% fraud / 98% legitimate split reflecting real-world fraud rates. Test how your model handles rare event detection with SMOTE, undersampling, or cost-sensitive learning.
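As one illustration of cost-sensitive learning, the sketch below computes "balanced" class weights for a 2/98 split using the common heuristic n_samples / (n_classes * class_count). The function name is hypothetical and the label list is built by hand rather than exported from the tool.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hand-built labels mirroring the default 10,000 rows at a 2% fraud rate.
labels = ["legitimate"] * 9800 + ["fraud"] * 200
weights = balanced_class_weights(labels)
print(weights)
```

With these counts a fraud row is weighted 25.0 and a legitimate row about 0.51, so each missed fraud case costs the model 49 times as much as a misclassified legitimate one.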
Transaction Amount Distributions
Transaction amounts follow log-normal distributions. Most transactions are small, with a long tail of large purchases. Configure the mean and standard deviation to match your domain.
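To see what a log-normal with mean 150 and standard deviation 200 looks like, the sketch below converts those two numbers into the parameters of the underlying normal distribution and draws samples with Python's standard library. It assumes the template's `mean` and `std` describe the amounts themselves rather than their logarithms. The template does not say which, so treat that as an assumption.

```python
import math
import random


def lognormal_params(mean, std):
    """Convert a target mean/std of the amounts into mu/sigma of the underlying normal."""
    sigma_sq = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)


# Assumption: 150 and 200 are the mean and std of the amounts, not of log(amount).
mu, sigma = lognormal_params(150, 200)

rng = random.Random(42)  # fixed seed so the run is repeatable
amounts = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]

sample_mean = sum(amounts) / len(amounts)
median = math.exp(mu)  # the median of a log-normal is exp(mu)
share_below_mean = sum(a < 150 for a in amounts) / len(amounts)
print(round(sample_mean, 1), round(median, 1), round(share_below_mean, 3))
```

Under that assumption the median amount is 90, well below the mean of 150, and about 69% of transactions fall below the mean. That is the "mostly small, long tail" shape described above.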
Behavioral Distance Features
Distance from home and time since last transaction use exponential distributions. These behavioral signals model how fraudulent transactions differ from normal spending patterns.
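For a feel of how an exponential feature behaves, the sketch below compares the closed-form tail probability P(X > x) = exp(-x / mean) with a simulated sample for `distance_from_home` (mean 25). The helper name is hypothetical, and the template does not state units, so the numbers are left unit-free.

```python
import math
import random


def exponential_tail(threshold, mean):
    """P(X > threshold) for an exponential variable with the given mean."""
    return math.exp(-threshold / mean)


rng = random.Random(7)  # fixed seed so the run is repeatable
# random.expovariate takes the rate (1 / mean), not the mean itself.
distances = [rng.expovariate(1 / 25) for _ in range(100_000)]

empirical = sum(d > 100 for d in distances) / len(distances)
print(round(exponential_tail(100, 25), 4), round(empirical, 4))
```

Only about 1.8% of rows land beyond four times the mean, so large distances are rare but present.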
Merchant Category Risk
Five merchant categories with configurable weights. Model category-specific fraud risk and test how merchant type contributes to fraud detection accuracy.
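Category weights can be pictured as a weighted draw over the five labels. The sketch below uses `random.choices` from the standard library. The weights are invented for illustration, since the template shown on this page lists the categories without weights.

```python
import random
from collections import Counter

CATEGORIES = ["retail", "online", "travel", "food", "gas"]
# Hypothetical weights for illustration only; the template lists no weights.
WEIGHTS = [35, 30, 5, 20, 10]

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.choices(CATEGORIES, weights=WEIGHTS, k=10_000)

# Observed share of each category in the sample.
shares = {category: count / len(sample) for category, count in Counter(sample).items()}
print(shares)
```

The observed shares land close to the weights divided by their sum, which is how relative weights translate into category frequencies.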
Who uses Fraud Detection training data?
Fintech Fraud Teams
Build and benchmark fraud detection models with realistic transaction data. Test SMOTE, threshold tuning, and cost-sensitive approaches with known ground truth.
Risk & Compliance Analysts
Evaluate fraud detection rules and thresholds on controlled data. Because the data is entirely synthetic, it contains no real cardholder information and raises no PCI-DSS concerns.
ML Researchers
Study extreme class imbalance techniques with configurable fraud rates. Compare oversampling, undersampling, and ensemble methods on controlled datasets.
Extreme Imbalance Handling
Fraud detection requires models that work with extremely rare positive events. SynthForge IO generates datasets purpose-built for this challenge.
2% Fraud Rate Default
The default 2/98 split matches real-world fraud rates. At 10,000 rows, you get ~200 fraud cases, enough to train and evaluate, while maintaining realistic rarity.
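The "~200" figure is an expectation, not a guarantee. The sketch below works out the expected fraud count and its spread, assuming each row is labeled independently with probability 2%. That is an assumption about the generator, which may instead fix the count exactly.

```python
import math


def fraud_count_summary(num_rows, fraud_rate):
    """Expected number of fraud rows and its binomial standard deviation."""
    expected = num_rows * fraud_rate
    std = math.sqrt(num_rows * fraud_rate * (1 - fraud_rate))
    return expected, std


expected, std = fraud_count_summary(10_000, 0.02)
# Fraud rows left in the held-out part of an 80/20 train/test split.
test_fraud_rows = expected * 0.2
print(round(expected), round(std, 1), round(test_fraud_rows))
```

Under independent labeling the count is 200 give or take about 14, and an 80/20 split leaves roughly 40 fraud rows for evaluation.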
Adjustable Imbalance
Set the fraud rate from 0.1% (extreme rarity) to 50% (balanced). Test how your model degrades as the positive class becomes rarer.
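If you script your experiments, changing the fraud rate amounts to rewriting the `target.weights` pair. The sketch below is a hypothetical helper that does this on a copy of the config. It assumes the weights are percentages in the order legitimate, fraud (as in the template) and that fractional weights are accepted, which this page does not confirm.

```python
import copy


def with_fraud_rate(config, fraud_rate):
    """Return a copy of the config with target weights set for the given fraud rate."""
    # Range taken from the text above: 0.1% to 50%.
    if not 0.001 <= fraud_rate <= 0.5:
        raise ValueError("fraud_rate must be between 0.001 and 0.5")
    updated = copy.deepcopy(config)
    fraud_pct = round(fraud_rate * 100, 6)
    # Order follows the template's labels: ["legitimate", "fraud"].
    updated["target"]["weights"] = [round(100 - fraud_pct, 6), fraud_pct]
    return updated


base = {"target": {"labels": ["legitimate", "fraud"], "weights": [98, 2]}}
rare = with_fraud_rate(base, 0.001)
balanced = with_fraud_rate(base, 0.5)
print(rare["target"]["weights"], balanced["target"]["weights"])
```

Working on a deep copy keeps the base template intact, so you can sweep several fraud rates from the same starting point.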
10,000 Row Default
The 10,000-row default yields about 200 fraud cases at the 2% rate. At more extreme ratios the positive class thins out quickly (a 0.1% rate leaves only about 10 fraud rows in 10,000), so scale up to 100K+ rows for production-scale testing.
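A rough way to pick a dataset size is to work backwards from the number of fraud rows you want. The sketch below does that arithmetic. The helper name and the 200-row target are illustrative choices, not tool defaults.

```python
import math


def rows_needed(min_fraud_rows, fraud_rate):
    """Smallest row count whose expected fraud count reaches min_fraud_rows."""
    # Round before ceil so floating-point noise cannot push the result up by one.
    return math.ceil(round(min_fraud_rows / fraud_rate, 6))


for rate in (0.02, 0.01, 0.001):
    print(f"{rate:.1%} fraud rate -> {rows_needed(200, rate):,} rows for ~200 fraud cases")
```

Keeping about 200 expected fraud rows takes 10,000 rows at 2%, 20,000 at 1%, and 200,000 at 0.1%.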
Evaluation-Ready
Evaluate with precision-recall curves, F1 scores, and AUC-PR rather than accuracy alone. The baseline model evaluation highlights metrics appropriate for imbalanced data.
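To see why accuracy misleads at a 2% fraud rate, the sketch below scores a "model" that labels every row legitimate, then computes average precision (one common estimate of AUC-PR) on a tiny hand-made example. Both functions are plain-Python illustrations, not the tool's baseline evaluation, and the ranking does not treat tied scores specially.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def average_precision(y_true, scores):
    """Mean of precision@k taken at the rank of each true fraud row."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hand-built labels mirroring 10,000 rows at a 2% fraud rate.
y_true = [0] * 9800 + [1] * 200
always_legit = [0] * len(y_true)
baseline = classification_metrics(y_true, always_legit)
print(baseline)

# Tiny ranking example: fraud rows sit at ranks 1 and 3.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(round(ap, 4))
```

The always-legitimate baseline reaches 98% accuracy while catching no fraud at all (recall and F1 are both 0), which is why the precision-recall view matters here.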
Frequently asked questions
Why is the default dataset 10,000 rows?
Can I adjust the fraud rate?
Is this data PCI-DSS compliant?
What export formats are available?
How should I evaluate fraud detection models on this data?
Start Generating Fraud Detection Training Data
Load the Fraud Detection template, customize features and parameters, and export publication-ready datasets in seconds.