SynthForge IO

Generate realistic data.
Export to any format.

100+ field types, statistical distributions, conditional overrides, parallel generation at 60K+ rows/sec, and exports to CSV, SQL, JSON, MongoDB, Parquet, and SQLite.

100+ field types | 7 SQL dialects | Statistical distributions | 6 export formats

100+ Realistic Field Types

Domain-specific data generators across 13 categories

Basic

Integer, float, string, text, boolean, UUID

Personal

First/last name, email, phone, SSN, gender

Address

Street, city, state, postal code, country

Date & Time

Date, datetime, timestamp, time zones

Commerce

Product name, price, SKU, company, department

Internet

URL, IPv4, IPv6, MAC address, domain

Financial

Credit card, IBAN, account number, routing

Healthcare

ICD-10, CPT, NDC codes, blood type, MRN

Geographic

Latitude, longitude, timezone, coordinates

Semantic

Job title, color, language, industry

Boolean

Yes/no, true/false, active/inactive variants

Custom

Enums, regex patterns, computed values

Fantasy

Names, races, classes, weapons, spells, items

Constraints & Correlations

Fine-grained control so your generated data looks and behaves like the real thing

Value Ranges

Set minimum and maximum values for numeric fields. Define valid ranges for prices, ages, quantities.

min: 0, max: 999.99

String Length

Control minimum and maximum length for text fields. Perfect for usernames, descriptions, IDs.

min_length: 3, max_length: 50

Regex Patterns

Generate data matching specific patterns. Ideal for custom IDs, codes, formatted strings.

pattern: "[A-Z]{2}-[0-9]{4}"
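The example pattern reads as two uppercase letters, a hyphen, and four digits. As a rough sketch of what a pattern-driven generator has to guarantee, the Python below builds values for that one pattern by sampling each character class, then checks every value with `re.fullmatch`. It is a hand-written generator for a single fixed pattern, not a general regex-to-string engine, and `random_code` is a hypothetical name, not part of SynthForge IO.

```python
import random
import re
import string

# The example pattern with explicit repetition counts:
# two uppercase letters, a hyphen, four digits.
PATTERN = re.compile(r"[A-Z]{2}-[0-9]{4}")


def random_code(rng: random.Random) -> str:
    """Hypothetical helper: build one value by sampling each character class."""
    letters = "".join(rng.choice(string.ascii_uppercase) for _ in range(2))
    digits = "".join(rng.choice(string.digits) for _ in range(4))
    return f"{letters}-{digits}"


rng = random.Random(42)
codes = [random_code(rng) for _ in range(5)]

# Validate: every generated value must match the whole pattern.
assert all(PATTERN.fullmatch(code) for code in codes)
print(codes)
```

A general implementation would first parse the pattern into character classes and repetition counts; the validation step with `re.fullmatch` stays the same either way.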

Enumerated Values

Restrict fields to specific allowed values. Great for status fields, categories, types.

enum: ["active", "pending", "closed"]

Weighted Distribution

Set the share of each value as a percentage, or apply statistical distributions such as Normal, LogNormal, Exponential, and Triangular to numeric fields.

Normal(mean: 50, std: 15)
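Percentage weights over a field's values map directly onto weighted sampling. This sketch uses Python's standard `random.choices` with hypothetical 70/20/10 weights for the status values from the enum example; the weights and the `sample_statuses` helper are illustrative assumptions, not SynthForge IO settings.

```python
import random
from collections import Counter

# Hypothetical percentage weights for the enum values shown earlier.
STATUS_WEIGHTS = {"active": 70, "pending": 20, "closed": 10}


def sample_statuses(n: int, seed: int = 0) -> list[str]:
    """Draw n values with probability proportional to each value's weight."""
    rng = random.Random(seed)
    values = list(STATUS_WEIGHTS)
    weights = list(STATUS_WEIGHTS.values())
    return rng.choices(values, weights=weights, k=n)


counts = Counter(sample_statuses(10_000))
# Observed shares should land close to 0.7 / 0.2 / 0.1.
print({status: counts[status] / 10_000 for status in STATUS_WEIGHTS})
```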

Date Ranges

Constrain dates to specific periods. Set start and end for timestamps and date fields.

start: 2020-01-01, end: 2024-12-31

Field Correlations

Fields that make sense together - cities match their states, product prices vary by category, birth dates match age ranges, and order totals match line item sums. Correlations catch bugs that only surface with realistic value combinations.
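One way to keep order totals consistent with their line items is to generate the items first and derive the total from them, rather than sampling both independently. The sketch below assumes a simple order/line-item shape and unit prices between 1.00 and 999.99 (both hypothetical), and uses `Decimal` so the total matches the item sum exactly instead of drifting through float rounding.

```python
import random
from decimal import Decimal


def make_order(order_id: int, rng: random.Random) -> dict:
    """Generate line items first, then derive the order total from them."""
    items = []
    for line_no in range(1, rng.randint(1, 5) + 1):
        unit_price = Decimal(rng.randint(100, 99_999)) / 100  # 1.00 to 999.99
        quantity = rng.randint(1, 10)
        items.append({
            "line_no": line_no,
            "unit_price": unit_price,
            "quantity": quantity,
            "line_total": unit_price * quantity,
        })
    # The total is computed, never sampled, so it always equals the item sum.
    total = sum((item["line_total"] for item in items), Decimal("0"))
    return {"order_id": order_id, "items": items, "total": total}


rng = random.Random(3)
orders = [make_order(i, rng) for i in range(1, 101)]
print(orders[0]["total"], len(orders[0]["items"]))
```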

Statistical Distributions

Generate statistically realistic data instead of flat random values

Probability Distributions

Model real-world data patterns

Apply Normal, LogNormal, Exponential, or Triangular distributions to any numeric field. Generate salary data that clusters around a median, response times with a long tail, or test scores that follow a bell curve.

Normal(mean: 75000, std: 15000)
LogNormal(mean: 3.5, sigma: 0.8)
Exponential(lambda: 0.5)
Triangular(low: 1, mode: 5, high: 10)
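Python's standard `random` module has samplers for all four distributions, which makes it easy to see what the parameters above do. This sketch assumes the LogNormal `mean` and `sigma` describe the underlying normal distribution (so the median is e^3.5, about 33) and that Exponential `lambda` is a rate (mean 1 / 0.5 = 2); SynthForge IO's exact parameterization may differ. The column names are hypothetical.

```python
import random
import statistics

rng = random.Random(7)
N = 20_000

# One sampler per distribution listed above; column names are hypothetical.
samplers = {
    # Normal(mean, std): values cluster around 75,000
    "salary": lambda: rng.gauss(75_000, 15_000),
    # LogNormal: mu and sigma of the underlying normal, i.e. exp(Normal(3.5, 0.8))
    "response_ms": lambda: rng.lognormvariate(3.5, 0.8),
    # Exponential: lambda is the rate, so the mean is 1 / 0.5 = 2
    "wait_s": lambda: rng.expovariate(0.5),
    # Triangular: note Python's argument order is (low, high, mode)
    "rating": lambda: rng.triangular(1, 10, 5),
}

columns = {name: [draw() for _ in range(N)] for name, draw in samplers.items()}
for name, values in columns.items():
    print(f"{name}: mean={statistics.mean(values):.2f} median={statistics.median(values):.2f}")
```

The long right tail of `response_ms` shows up as a mean noticeably above its median, which is exactly the "response times with a long tail" shape described above.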

Conditional Overrides

Distributions that vary by context

Change how a field is distributed based on another field's value. Salary distributions that shift by department, price ranges that vary by product category, or age distributions that differ by region. Your test data reflects the same patterns as production.

When department = "Engineering" then salary ~ Normal(120k, 20k)
When department = "Marketing" then salary ~ Normal(85k, 15k)
When category = "Electronics" then price ~ LogNormal(5.5, 1.2)
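A conditional override amounts to choosing the distribution's parameters from the controlling field before sampling. The sketch mirrors the department rules above with Python's `random.gauss`; the `Sales` department, the fallback distribution, and the `salary_for` helper are hypothetical additions for illustration.

```python
import random
import statistics

# Hypothetical override table mirroring the rules above:
# department value -> (mean, std) for a Normal salary distribution.
SALARY_OVERRIDES = {
    "Engineering": (120_000, 20_000),
    "Marketing": (85_000, 15_000),
}
DEFAULT_SALARY = (70_000, 15_000)  # hypothetical fallback when no rule matches


def salary_for(department: str, rng: random.Random) -> float:
    """Pick the distribution from the controlling field, then sample it."""
    mean, std = SALARY_OVERRIDES.get(department, DEFAULT_SALARY)
    return rng.gauss(mean, std)


rng = random.Random(11)
departments = ["Engineering", "Marketing", "Sales"]
rows = []
for _ in range(9_000):
    dept = rng.choice(departments)  # the controlling field is generated first
    rows.append({"department": dept, "salary": salary_for(dept, rng)})

for dept in departments:
    salaries = [r["salary"] for r in rows if r["department"] == dept]
    print(dept, round(statistics.mean(salaries)))
```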

Intelligent Data Generation

Fields that understand each other - dependencies and derived rules produce data that makes sense together

Field Dependencies

Lookup tables keep related values consistent

ZIP codes automatically match their city and state. Use built-in geographic lookup tables or define your own - SynthForge IO recommends which parent fields to depend on.

Field dependency configuration showing hq_zip depending on hq_city and hq_state via US Geographic lookup table
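A lookup-table dependency can be sketched by sampling the parent fields from the table first and then choosing the child value only from rows that match them. The five-row table below is a hypothetical stand-in for a full geographic lookup table, reusing the `hq_city`, `hq_state`, and `hq_zip` field names from the configuration described above; it is not SynthForge IO's implementation.

```python
import random

# Hypothetical miniature lookup table: (zip, city, state).
US_ZIP_LOOKUP = [
    ("10001", "New York", "NY"),
    ("10002", "New York", "NY"),
    ("60601", "Chicago", "IL"),
    ("94105", "San Francisco", "CA"),
    ("78701", "Austin", "TX"),
]

# Index the child values (ZIPs) by their parent fields (city, state).
ZIPS_BY_CITY_STATE: dict[tuple[str, str], list[str]] = {}
for zip_code, city, state in US_ZIP_LOOKUP:
    ZIPS_BY_CITY_STATE.setdefault((city, state), []).append(zip_code)


def company_row(rng: random.Random) -> dict[str, str]:
    """Sample the parent fields first, then a ZIP that belongs to them."""
    hq_city, hq_state = rng.choice(list(ZIPS_BY_CITY_STATE))
    hq_zip = rng.choice(ZIPS_BY_CITY_STATE[(hq_city, hq_state)])
    return {"hq_city": hq_city, "hq_state": hq_state, "hq_zip": hq_zip}


rng = random.Random(5)
rows = [company_row(rng) for _ in range(20)]
print(rows[:3])
```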

Derived Rules

Conditional logic that mirrors real-world relationships

Define WHEN/THEN rules with formulas that reference other fields. Example: when age >= 40 and BMI >= 30, multiply heart rate by 1.5. Build realistic correlations that simple random generation can't produce.

Derived field rules configuration showing conditional logic for systolic_bp based on age and BMI values
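A WHEN/THEN rule can be modeled as a condition plus a formula applied to an already-generated row. This sketch encodes the heart-rate example from the text; the `DerivedRule` class, `apply_rules`, and the field names `age`, `bmi`, and `heart_rate` are hypothetical, and rules are assumed to run in list order after the base values are sampled.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, float]


@dataclass
class DerivedRule:
    """Hypothetical WHEN/THEN rule: if when(row) is true, set field to then(row)."""
    field: str
    when: Callable[[Row], bool]
    then: Callable[[Row], float]


RULES = [
    # WHEN age >= 40 AND bmi >= 30 THEN heart_rate = heart_rate * 1.5
    DerivedRule(
        field="heart_rate",
        when=lambda r: r["age"] >= 40 and r["bmi"] >= 30,
        then=lambda r: r["heart_rate"] * 1.5,
    ),
]


def apply_rules(row: Row, rules: list[DerivedRule]) -> Row:
    """Apply rules in order to a copy of an already-generated row."""
    out = dict(row)
    for rule in rules:
        if rule.when(out):
            out[rule.field] = rule.then(out)
    return out


print(apply_rules({"age": 52, "bmi": 31.0, "heart_rate": 70.0}, RULES))
# {'age': 52, 'bmi': 31.0, 'heart_rate': 105.0}
print(apply_rules({"age": 35, "bmi": 31.0, "heart_rate": 70.0}, RULES))
# {'age': 35, 'bmi': 31.0, 'heart_rate': 70.0}
```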

The Most Flexible Export Options Available

No other synthetic data generator supports this many database types, export formats, and configuration options, all included for free.

CSV | SQL | JSON | MongoDB | Parquet | SQLite

Relational Databases

7 SQL dialects with dialect-specific DDL and bulk loading

PostgreSQL

COPY command, serial/identity columns, array types

MySQL

LOAD DATA INFILE, auto_increment, engine options

SQL Server

BULK INSERT, identity columns, T-SQL syntax

SQLite

.import command, lightweight DDL, binary file export

MariaDB

MySQL-compatible with MariaDB-specific optimizations

DuckDB

COPY FROM, columnar analytics-optimized DDL

CockroachDB

IMPORT INTO, distributed-compatible DDL

Every dialect includes CREATE TABLE, bulk loading commands, and FK constraints optimized for the target database.

Document, Analytics & Files

MongoDB, JSON variants, Parquet, CSV, and SQLite binary

MongoDB

Embedded documents from related tables, mongoimport-ready output

JSON - 6 structures

Records, Table, Lines / NDJSON, Formatted, Compact, Flat

Parquet - 4 compression algorithms

Snappy, Gzip, LZ4, Zstd

CSV & SQLite Binary

Standard CSV export and downloadable SQLite database files ready to query
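Embedding related rows works by grouping child records under their parent's key and dropping the now-redundant foreign key. The sketch below uses two tiny hypothetical tables and writes one compact JSON document per line (NDJSON), which is the form `mongoimport` reads when `--jsonArray` is not set; the table names, fields, and `embed` helper are illustrative only.

```python
import json
from collections import defaultdict

# Hypothetical parent and child tables.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.5},
    {"order_id": 11, "customer_id": 1, "total": 12.0},
    {"order_id": 12, "customer_id": 2, "total": 99.99},
]


def embed(parents: list[dict], children: list[dict], key: str, as_field: str) -> list[dict]:
    """Nest each parent's child rows under `as_field`, dropping the foreign key."""
    grouped: dict = defaultdict(list)
    for child in children:
        grouped[child[key]].append({k: v for k, v in child.items() if k != key})
    return [{**parent, as_field: grouped.get(parent[key], [])} for parent in parents]


docs = embed(customers, orders, key="customer_id", as_field="orders")
# One compact JSON document per line (NDJSON).
ndjson = "\n".join(json.dumps(doc, separators=(",", ":")) for doc in docs)
print(ndjson)
```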

Generate Data. Visualize It Your Way.

Take SynthForge IO data and plug it into your favorite tools - these charts were built from our Ecommerce and Banking template schemas using Matplotlib

Data generated by SynthForge IO - charts rendered with Matplotlib

Built for Scale

Generate million-row datasets without breaking a sweat

Parallel Generation

Three levels of parallelism: row chunks across CPU cores, independent entities generated concurrently, and exports written in parallel. Achieves 60,000+ rows per second on multi-core systems.
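Row-chunk parallelism can be sketched with `concurrent.futures`: split the row range into fixed-size chunks, give each chunk its own seed so the output does not depend on scheduling, and merge results in chunk order. The chunk size, seed scheme, and row shape are hypothetical. The demo uses `ThreadPoolExecutor` so it runs anywhere; because CPython threads do not execute Python code on several cores at once, true multi-core generation would pass a `ProcessPoolExecutor` instead, which `generate_chunk` supports because it is a picklable top-level function.

```python
import random
from concurrent.futures import Executor, ThreadPoolExecutor

CHUNK_SIZE = 10_000  # hypothetical rows per chunk


def generate_chunk(task: tuple[int, int, int, int]) -> list[dict]:
    """Generate rows [start, stop) with a seed derived from the chunk index."""
    chunk_index, start, stop, base_seed = task
    rng = random.Random(base_seed + chunk_index)
    return [{"id": i, "score": rng.randint(0, 100)} for i in range(start, stop)]


def generate_rows(total: int, executor: Executor, base_seed: int = 1234) -> list[dict]:
    """Fan chunks out to the executor and concatenate them in chunk order."""
    tasks = [
        (index, start, min(start + CHUNK_SIZE, total), base_seed)
        for index, start in enumerate(range(0, total, CHUNK_SIZE))
    ]
    rows: list[dict] = []
    # Executor.map yields results in task order, whichever worker finishes first.
    for chunk in executor.map(generate_chunk, tasks):
        rows.extend(chunk)
    return rows


with ThreadPoolExecutor(max_workers=4) as pool:
    rows = generate_rows(35_000, pool)
print(len(rows), rows[0])
```

With `ProcessPoolExecutor`, keep the pool creation under an `if __name__ == "__main__":` guard on platforms that start workers with `spawn`.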

Real-Time Progress

Live progress tracking that reports actual rows completed per entity, not a fake progress bar. See exactly where your generation stands with per-table breakdowns updated every second.

Million-Row Datasets

Generate production-scale datasets with full referential integrity. Automatic dependency ordering and chunk-based processing keep memory usage constant regardless of dataset size.
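Dependency ordering is a topological sort over foreign keys: a table is generated only after every table it references. The sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical four-table schema. Tables that become ready in the same batch do not depend on each other, which is also what makes generating independent entities concurrently safe.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
FOREIGN_KEYS = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
}

sorter = TopologicalSorter(FOREIGN_KEYS)
sorter.prepare()
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # every referenced table is already generated
    batches.append(ready)
    sorter.done(*ready)

print(batches)
# [['customers', 'products'], ['orders'], ['order_items']]
```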

Built for Developers

Real scenarios, real solutions

Application Developer

"Test your API with realistic payloads"

Generate request/response data that matches your schema. Catch edge cases before they reach production. Fill your dev database with realistic records.

QA Engineer

"Edge cases don't hide from realistic data"

Generate thousands of test records with diverse, realistic values. Cover boundary conditions automatically. Test with production-scale volumes.

Data Engineer

"Prototype pipelines with production-scale data"

Test ETL pipelines before production data exists. Validate transformations with realistic datasets. Generate millions of rows for load testing.

Frequently Asked Questions

What field types does SynthForge IO support?

SynthForge IO offers 100+ field types across 13 categories: Basic (integer, float, string, UUID), Personal (name, email, phone, SSN), Address (street, city, state, postal code), Date & Time, Commerce (product, price, SKU), Internet (URL, IP, domain), Financial (credit card, IBAN), Healthcare (ICD-10, CPT, blood type), Geographic (lat/long, timezone), Semantic (job title, color, language), Boolean, Custom (enums, regex, computed), and Fantasy (names, races, weapons).

What export formats are available?

SynthForge IO exports to CSV, SQL (with INSERT statements for PostgreSQL, MySQL, SQLite, SQL Server, MariaDB, DuckDB, and CockroachDB), JSON, MongoDB (with embedded documents), Apache Parquet, and SQLite database files. Each SQL export includes CREATE TABLE statements, bulk loading commands, and FK constraints optimized for the target database.

How do constraints and correlations work?

You can set value ranges, string length limits, regex patterns, enumerated values, distribution ratios, and date ranges on any field. Field dependencies use lookup tables to keep related values consistent (e.g., ZIP codes match their city and state). Derived rules let you define WHEN/THEN conditional logic with formulas that reference other fields.

What are statistical distributions and conditional overrides?

Statistical distributions let you apply Normal, LogNormal, Exponential, or Triangular probability distributions to numeric fields - so salary data clusters around a median, response times have a realistic long tail, or test scores follow a bell curve. Conditional overrides take this further: you can vary the distribution based on another field's value. For example, salary distributions that shift by department or price ranges that vary by product category.

How fast is data generation?

SynthForge IO uses three levels of parallelism: row chunks are distributed across CPU cores, independent entities are generated concurrently, and exports are written in parallel. This achieves 60,000+ rows per second on multi-core systems, making million-row datasets practical. Real-time progress tracking shows actual rows completed per entity as generation runs.

Start Generating Realistic Data

100+ field types, fine-grained constraints, and export to any format your stack needs.