SynthForge IO

Generate realistic data.
Export to any format.

111 field types, statistical distributions, conditional weight overrides, and exports to CSV, SQL (7 dialects), JSON, JSONL, and Parquet.

111 field types · 7 SQL dialects · Statistical distributions · 5 export formats

111 Realistic Field Types

Domain-specific data generators across 19 categories

Basic & primitives

Integer, float, string, text, boolean, UUID, enum, date, datetime, integer sequence

Personal

Names, age, date of birth, gender, nationality, SSN, blood type, prefix, suffix, occupation

Contact

Email, phone number, area code, username, password

Address

Street, city, state, postal code, country, building number, secondary address, full address

Geographic

Latitude, longitude, license plate

Date & time

Future date, past date, time

Business & company

Company name, department, industry, sector, sub-industry, job title, catch phrase, vehicle

Commerce & products

Order ID, product name, product category, subcategory, type

Financial

Credit card, IBAN, SWIFT, currency code/amount, decimal, EAN, stock ticker, price, percentage, tax rate, discount

Internet & web

URL, domain name, image URL, emoji, hashtag, slug

Networking & developer

IPv4, IPv6, MAC address, port, HTTP method, MIME type, file extension, user agent, API key, locale, language code

Text & content

Word, sentence, paragraph, book title, ISBN-13

Healthcare

CPT code & description, NDC code/drug name/generic name, ICD-10 code & description

Hardware

CPU, GPU

Food & drink

Dish, drink

Education

Academic degree, university

Color

Color name, hex color

Measurement

Height, weight

Fantasy & creative

Character class, item, name, race, spell, weapon

Constraints & Correlations

Fine-grained control so your generated data looks and behaves like the real thing

Value Ranges

Set minimum and maximum values for numeric fields. Define valid ranges for prices, ages, quantities.

min: 0, max: 999.99

String Length

Control minimum and maximum length for text fields. Perfect for usernames, descriptions, IDs.

min_length: 3, max_length: 50

Regex Patterns

Generate data matching specific patterns. Ideal for custom IDs, codes, formatted strings.

pattern: "[A-Z]{2}-[0-9]{4}"
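
As a rough illustration of what a pattern constraint produces, here is a minimal Python sketch. It assumes the pattern is meant as two uppercase letters, a hyphen, and four digits (`[A-Z]{2}-[0-9]{4}`). The helper `make_code` is hypothetical and is not SynthForge's own generator.

```python
import random
import re
import string

# Assumed reading of the pattern: two uppercase letters, a hyphen, four digits.
PATTERN = re.compile(r"[A-Z]{2}-[0-9]{4}")


def make_code(rng: random.Random) -> str:
    """Build one value for this fixed pattern (hypothetical helper)."""
    letters = "".join(rng.choices(string.ascii_uppercase, k=2))
    digits = "".join(rng.choices(string.digits, k=4))
    return f"{letters}-{digits}"


rng = random.Random(42)
codes = [make_code(rng) for _ in range(5)]

# Every generated value should match the whole pattern, not just a prefix.
all_match = all(PATTERN.fullmatch(code) for code in codes)
```

Checking generated values with `re.fullmatch` in a test suite is a cheap way to confirm that exported IDs still satisfy the format your application expects.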

Enumerated Values

Restrict fields to specific allowed values. Great for status fields, categories, types.

enum: ["active", "pending", "closed"]

Weighted Distribution

Control percentage distribution across values, or apply statistical distributions like Normal, LogNormal, Exponential, and Triangular to numeric fields.

Normal(mean: 50, std: 15)
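
The two modes described above can be approximated with Python's standard library. This is a sketch only: the 70/20/10 weights, the 0 to 100 clamp, and the helper name `normal_score` are assumptions chosen for illustration.

```python
import random

rng = random.Random(7)

# Percentage weights across enumerated values (illustrative 70/20/10 split).
statuses = ["active", "pending", "closed"]
weights = [70, 20, 10]
status_sample = rng.choices(statuses, weights=weights, k=1000)


def normal_score(rng, mean=50.0, std=15.0, lo=0.0, hi=100.0):
    """Draw from Normal(mean, std) and clamp into [lo, hi] (assumed range)."""
    return min(hi, max(lo, rng.gauss(mean, std)))


scores = [normal_score(rng) for _ in range(1000)]
average_score = sum(scores) / len(scores)
```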

Date Ranges

Constrain dates to specific periods. Set start and end for timestamps and date fields.

start: 2020-01-01, end: 2024-12-31
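
A date-range constraint like the one above amounts to sampling a day offset inside the window. The sketch below assumes both endpoints are inclusive; `random_date` is a hypothetical helper name.

```python
import random
from datetime import date, timedelta


def random_date(start: date, end: date, rng: random.Random) -> date:
    """Pick a date uniformly between start and end, both inclusive (assumed)."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


rng = random.Random(3)
start, end = date(2020, 1, 1), date(2024, 12, 31)
dates = [random_date(start, end, rng) for _ in range(100)]
```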

Field Correlations

Fields that make sense together - cities match their states, product prices vary by category, birth dates match age ranges, and order totals match line item sums. Correlations catch bugs that only surface with realistic value combinations.

Statistical Distributions

Generate statistically realistic data instead of flat random values

Probability Distributions

Model real-world data patterns

Apply Normal, LogNormal, Exponential, or Triangular distributions to any numeric field. Generate salary data that clusters around a median, response times with a long tail, or test scores that follow a bell curve.

Normal mean: 75000, std: 15000
LogNormal mean: 3.5, sigma: 0.8
Exponential lambda: 0.5
Triangular low: 1, mode: 5, high: 10
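
To get a feel for how these four shapes differ, the same parameters can be sampled with Python's `random` module. This is a sketch under assumptions: the LogNormal `mean` is treated as the mean of the underlying normal (mu), which may not be how SynthForge interprets it, and the variable names are illustrative.

```python
import random
import statistics

rng = random.Random(2024)
N = 10_000

# Normal: values cluster symmetrically around the mean.
salaries = [rng.gauss(75_000, 15_000) for _ in range(N)]

# LogNormal: always positive, with a long right tail (mu=3.5, sigma=0.8).
latencies = [rng.lognormvariate(3.5, 0.8) for _ in range(N)]

# Exponential: lambda=0.5 gives an expected value of 1 / 0.5 = 2.
waits = [rng.expovariate(0.5) for _ in range(N)]

# Triangular: note that Python's argument order is (low, high, mode).
ratings = [rng.triangular(1, 10, 5) for _ in range(N)]

summary = {
    "salary_mean": statistics.mean(salaries),
    "latency_mean": statistics.mean(latencies),
    "latency_median": statistics.median(latencies),
    "wait_mean": statistics.mean(waits),
}
```

In the LogNormal sample the mean sits well above the median, which is the long-tail behavior described above for response times.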

Conditional Overrides

Distributions that vary by context

Change how a field is distributed based on another field's value. Salary distributions that shift by department, price ranges that vary by product category, or age distributions that differ by region. Your test data reflects the same patterns as production.

When department = "Engineering" then salary ~ Normal(120k, 20k)
When department = "Marketing" then salary ~ Normal(85k, 15k)
When category = "Electronics" then price ~ LogNormal(5.5, 1.2)
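
The rules above can be read as a lookup from one field's value to another field's distribution parameters. Here is a minimal Python sketch of that idea; the table `SALARY_BY_DEPT`, the fallback `DEFAULT_SALARY`, the extra "Support" department, and the zero floor are all assumptions for illustration, not SynthForge internals.

```python
import random

# Hypothetical override table: department -> (mean, std) for salary.
SALARY_BY_DEPT = {
    "Engineering": (120_000, 20_000),
    "Marketing": (85_000, 15_000),
}
DEFAULT_SALARY = (75_000, 15_000)  # assumed fallback when no rule matches


def sample_salary(department: str, rng: random.Random) -> float:
    """Draw a salary from the Normal distribution selected by department."""
    mean, std = SALARY_BY_DEPT.get(department, DEFAULT_SALARY)
    return max(0.0, rng.gauss(mean, std))  # floor at zero (assumption)


rng = random.Random(1)
rows = []
for _ in range(2000):
    dept = rng.choice(["Engineering", "Marketing", "Support"])
    rows.append({"department": dept, "salary": sample_salary(dept, rng)})


def avg_salary(dept: str) -> float:
    values = [r["salary"] for r in rows if r["department"] == dept]
    return sum(values) / len(values)
```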

Intelligent Data Generation

Fields that understand each other - dependencies and derived rules produce data that makes sense together

Field Dependencies

Lookup tables keep related values consistent

AI-generated schemas with state, city, and ZIP columns get coherent geographic dependencies wired automatically - no clicks required. Same for sector/industry/sub-industry chains. ZIP codes always match their city and state by construction; industry codes always roll up to a real sector. Hand-edit the dependencies in the visual editor when you want different parent fields.

[Screenshot: field dependency configuration showing hq_zip depending on hq_city and hq_state via the US Geographic lookup table]
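
One way to picture a lookup-table dependency: the child field is only ever drawn from rows that match the parent values, so mismatches cannot occur. The sketch below uses a tiny illustrative table (`GEO_LOOKUP`) and hypothetical helper names; a real geographic table would be far larger.

```python
import random

# Tiny illustrative lookup table standing in for a full geographic dataset.
GEO_LOOKUP = [
    {"state": "CA", "city": "San Francisco", "zip": "94103"},
    {"state": "CA", "city": "Los Angeles", "zip": "90012"},
    {"state": "NY", "city": "New York", "zip": "10001"},
    {"state": "TX", "city": "Austin", "zip": "78701"},
]


def zips_for(state: str, city: str) -> list:
    """Return every ZIP in the table that belongs to this state and city."""
    return [r["zip"] for r in GEO_LOOKUP if r["state"] == state and r["city"] == city]


def generate_row(rng: random.Random) -> dict:
    # Choose the parent fields first, then restrict the child to matching rows.
    parent = rng.choice(GEO_LOOKUP)
    hq_state, hq_city = parent["state"], parent["city"]
    hq_zip = rng.choice(zips_for(hq_state, hq_city))
    return {"hq_state": hq_state, "hq_city": hq_city, "hq_zip": hq_zip}


rng = random.Random(5)
companies = [generate_row(rng) for _ in range(50)]
```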

Derived Rules

Conditional logic that mirrors real-world relationships

Define WHEN/THEN rules with formulas that reference other fields. Example: when age >= 40 and BMI >= 30, multiply heart rate by 1.5. Build realistic correlations that simple random generation can't produce.

[Screenshot: derived field rules configuration showing conditional logic for systolic_bp based on age and BMI values]
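
The heart-rate example above maps naturally onto a small function applied after the base fields are generated. This is a sketch; the field names `age`, `bmi`, and `heart_rate` and the rounding step are assumptions.

```python
def apply_heart_rate_rule(row: dict) -> dict:
    """WHEN age >= 40 AND bmi >= 30 THEN heart_rate = heart_rate * 1.5."""
    out = dict(row)  # copy so the input row is left untouched
    if out["age"] >= 40 and out["bmi"] >= 30:
        out["heart_rate"] = round(out["heart_rate"] * 1.5)
    return out


matching = apply_heart_rate_rule({"age": 45, "bmi": 32.0, "heart_rate": 70})
too_young = apply_heart_rate_rule({"age": 30, "bmi": 32.0, "heart_rate": 70})
low_bmi = apply_heart_rate_rule({"age": 45, "bmi": 24.0, "heart_rate": 70})
```

Only the first row satisfies both conditions, so only its heart rate is scaled.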

Cardinality Ratios

Per-relationship row count math, so child tables scale with their parents

Per-Relationship Ratios

Children scale with parents, not by hand

Set "8-12 patients per doctor" or "100-500 line items per order" instead of hand-typing a row count for every child table. SynthForge picks a uniform-random multiplier per parent and resolves child counts in topological order. Multiple parent ratios on the same child (M:N junction tables) sum naturally.

doctors: 50 (fixed)
patients: 8 - 12 per doctor → ~500 rows
visits: 1 - 5 per patient → ~1,500 rows
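
The row-count math above can be sketched in a few lines of Python. This is an illustration of the described behavior (a uniform-random multiplier per parent, topological resolution, multiple parent ratios summed), not SynthForge's implementation; `FIXED`, `RATIOS`, and `resolve_counts` are hypothetical names.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

FIXED = {"doctors": 50}
# child table -> list of (parent table, min per parent, max per parent)
RATIOS = {
    "patients": [("doctors", 8, 12)],
    "visits": [("patients", 1, 5)],
}


def resolve_counts(fixed: dict, ratios: dict, rng: random.Random) -> dict:
    """Resolve child row counts parent-first; several parent ratios are summed."""
    # Each table maps to the set of parents that must be resolved before it.
    graph = {child: {parent for parent, _, _ in rules} for child, rules in ratios.items()}
    for table in fixed:
        graph.setdefault(table, set())

    counts = dict(fixed)
    for table in TopologicalSorter(graph).static_order():
        if table in counts:
            continue  # fixed row count, nothing to compute
        total = 0
        for parent, lo, hi in ratios[table]:
            # One uniform-random multiplier per parent row.
            total += sum(rng.randint(lo, hi) for _ in range(counts[parent]))
        counts[table] = total
    return counts


counts = resolve_counts(FIXED, RATIOS, random.Random(11))
```

With 50 doctors and 8 to 12 patients each, `counts["patients"]` lands between 400 and 600, averaging about 500, and visits average about three per patient.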

Auto-Saved Defaults

Last generation's settings come back next time

Every successful generation persists its row counts and ratios as the schema's defaults. Open the same schema's generation form tomorrow, next month, or from a different machine - the inputs come back exactly the way you left them. No more re-typing 100, 100, 100 across every table.

Generation #1: set patients = 1000, doctors = 50, visits = 2-4 per doctor
Generation #2 (same schema): form pre-fills with all of those values
Override anything, regenerate, the new values become the next default

The Most Flexible Export Options Available

No other synthetic data generator supports this many database types, export formats, and configuration options, all included for free.

CSV

SQL

JSON

MongoDB

Parquet

SQLite

Relational Databases

7 SQL dialects with dialect-specific DDL and bulk loading

PostgreSQL

COPY command, serial/identity columns, array types

MySQL

LOAD DATA INFILE, auto_increment, engine options

SQL Server

BULK INSERT, identity columns, T-SQL syntax

SQLite

.import command, lightweight DDL, binary file export

MariaDB

MySQL-compatible with MariaDB-specific optimizations

DuckDB

COPY FROM, columnar analytics-optimized DDL

CockroachDB

IMPORT INTO, distributed-compatible DDL

Every dialect includes CREATE TABLE, bulk loading commands, and FK constraints optimized for the target database.

Document, Analytics & Files

MongoDB, JSON variants, Parquet, CSV, and SQLite binary

MongoDB

Embedded documents from related tables, mongoimport-ready output

JSON - 6 structures

Records, Table, Lines / NDJSON, Formatted, Compact, Flat
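
For readers unfamiliar with these names, the sketch below shows how a few of them commonly differ. It assumes "Records" means an array of row objects, "Lines / NDJSON" means one JSON object per line, and "Formatted" versus "Compact" refers to whitespace. The exact SynthForge output may differ.

```python
import json

rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

# Records, formatted: one array of objects, indented for reading.
records_formatted = json.dumps(rows, indent=2)

# Records, compact: the same array with no optional whitespace.
records_compact = json.dumps(rows, separators=(",", ":"))

# Lines / NDJSON: one standalone JSON object per line, easy to stream.
ndjson = "\n".join(json.dumps(row) for row in rows)

# NDJSON can be parsed line by line without loading the whole file.
parsed_lines = [json.loads(line) for line in ndjson.splitlines()]
```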

Parquet - 4 compression algorithms

Snappy, Gzip, LZ4, Zstd

CSV & SQLite Binary

Standard CSV export and downloadable SQLite database files ready to query

Generate Data. Visualize It Your Way.

Take SynthForge IO data and plug it into your favorite tools - these charts were built from our Ecommerce and Banking template schemas using Matplotlib

Data generated by SynthForge IO - charts rendered with Matplotlib

Built for Scale

Real-Time Progress

Live progress tracking that reports actual rows completed per entity, not a fake progress bar. See exactly where your generation stands with per-table breakdowns updated every second.

Million-Row Datasets

Generate production-scale datasets with full referential integrity. Automatic dependency ordering and chunk-based processing keep memory usage constant regardless of dataset size.
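
Chunk-based processing can be pictured as a generator that hands rows to the writer one batch at a time, so memory depends on the chunk size rather than the total row count. The sketch below is a simplified illustration: the function names, the `id`/`age` columns, and the in-memory buffer (standing in for a real file) are all assumptions.

```python
import csv
import io
import random


def generate_chunks(total_rows: int, chunk_size: int, rng: random.Random):
    """Yield lists of at most chunk_size rows; only one chunk exists at a time."""
    next_id = 1
    while next_id <= total_rows:
        last_id = min(next_id + chunk_size - 1, total_rows)
        yield [(i, rng.randint(18, 90)) for i in range(next_id, last_id + 1)]
        next_id = last_id + 1


def write_csv(out, total_rows: int, chunk_size: int = 10_000, seed: int = 0) -> int:
    """Stream generated rows to out chunk by chunk and return the row count."""
    rng = random.Random(seed)
    writer = csv.writer(out)
    writer.writerow(["id", "age"])
    written = 0
    for chunk in generate_chunks(total_rows, chunk_size, rng):
        writer.writerows(chunk)
        written += len(chunk)  # a natural place to report progress
    return written


# StringIO stands in for a file on disk so the example is self-contained.
buffer = io.StringIO()
rows_written = write_csv(buffer, total_rows=25_000, chunk_size=10_000)
```

The running `written` counter is also the kind of value a per-table progress report could be built from.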

Built for Developers

Real scenarios, real solutions

Application Developer

"Test your API with realistic payloads"

Generate request/response data that matches your schema. Catch edge cases before they reach production. Fill your dev database with realistic records.

QA Engineer

"Edge cases don't hide from realistic data"

Generate thousands of test records with diverse, realistic values. Cover boundary conditions automatically. Test with production-scale volumes.

Data Engineer

"Prototype pipelines with production-scale data"

Test ETL pipelines before production data exists. Validate transformations with realistic datasets. Generate millions of rows for load testing.

Frequently Asked Questions

What field types does SynthForge IO support?

SynthForge IO ships 111 field types across 19 categories: Basic primitives (integer, float, string, text, boolean, UUID, enum, date, datetime, integer sequences), Personal (first/last/full name, age, date of birth, gender, nationality, occupation, prefix, suffix, SSN, blood type), Contact (email, phone, area code, username, password), Address (street, city, state, postal code, country, building number, full address, secondary address), Geographic (latitude, longitude, license plate), Date & time (future date, past date, time), Business & company (catch phrase, company name, department, industry, sector, sub-industry, job title, vehicle), Commerce & products (order ID, product name, product category / subcategory / type), Financial (credit card, IBAN, SWIFT, currency code, currency amount, decimal, EAN, stock ticker, discount, price, percentage, tax rate), Internet & web (URL, domain, image URL, emoji, hashtag, slug), Networking & developer (IPv4/IPv6, MAC, port, HTTP method, MIME type, file extension, user agent, API key, locale, language code), Text & content (word, sentence, paragraph, book title, ISBN-13), Healthcare (CPT code/description, NDC code/drug/generic, ICD-10 code/description), Hardware (CPU, GPU), Food & drink (dish, drink), Education (academic degree, university), Color (color name, hex color), Measurement (height, weight), and Fantasy & creative (character class, item, name, race, spell, weapon).

What export formats are available?

SynthForge IO exports to CSV, SQL (DDL + INSERT statements for PostgreSQL, MySQL, SQLite, SQL Server, MariaDB, DuckDB, and CockroachDB), JSON, JSONL, and Apache Parquet. JSON / JSONL output is MongoDB-ready - load it via mongoimport. Each SQL export includes CREATE TABLE statements, bulk loading commands (\copy, LOAD DATA INFILE, bcp, etc.), and FK constraints optimized for the target database.

How do constraints and correlations work?

You can set value ranges, string length limits, regex patterns, enumerated values, distribution ratios, and date ranges on any field. Field dependencies use lookup tables to keep related values consistent (e.g., ZIP codes match their city and state). Derived rules let you define WHEN/THEN conditional logic with formulas that reference other fields.

What are statistical distributions and conditional overrides?

Statistical distributions let you apply Normal, LogNormal, Exponential, Triangular, or Uniform probability distributions to numeric fields - so salary data clusters around a median, response times have a realistic long tail, or test scores follow a bell curve. Conditional weight overrides take this further: you can vary sampling weights based on another field's value (e.g., shift category weights by department, or product-tier weights by region).

Can I set per-relationship row count ratios (e.g. N children per parent)?

Yes. For any child table that has a foreign key to a parent, the generation form lets you toggle from a fixed row count to a 'per parent row' ratio. Specify a min - max range (e.g. '8 to 12 patients per doctor') and SynthForge picks a uniform-random multiplier per parent. Ratios resolve in topological FK order, so a chain like doctors -> patients -> visits cascades automatically. Multi-parent ratios on the same junction table sum together (so '20 appointments per doctor' + '5 appointments per patient' produces a junction that respects both axes). Top-level tables stay as plain row counts.

Does SynthForge remember my last generation settings?

Yes. Every successful dataset job persists its row counts and any ratios as the schema's defaults. The next time you open the generation form for that schema, the inputs pre-fill with what you last submitted. Override anything before regenerating; the new values overwrite the defaults. Works across browsers and devices since the defaults live on the schema record, not in local storage.

Start Generating Realistic Data

111 field types, fine-grained constraints, and export to any format your stack needs.