SynthForge IO

Integration · DuckDB

Generate DuckDB test data, with foreign keys intact

Realistic, referentially-correct multi-table datasets you can load into DuckDB with one COPY. Free, no signup.

Short answer

Design your tables in SynthForge, generate, and export the DuckDB bundle. You get DuckDB-dialect CREATE TABLE statements with foreign keys inline, CSV data files, and an import_duckdb.sql script that loads everything with native COPY commands. Every child row references a real parent row by construction.

Why generate rather than hand-roll

If you have ever populated a DuckDB file for a demo or a benchmark, you have probably written a script with random values, then discovered your orders reference customers that do not exist, then added dedup and parent-lookup code that has nothing to do with what you were testing.

SynthForge resolves the table dependency graph, generates parents first, and draws each foreign key from the parent IDs that actually exist, so the data is referentially valid the moment it lands. A single COPY per table loads it, with no server to stand up.
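The idea of "parents first, then sample each foreign key from IDs that exist" can be sketched in a few lines of Python. This is an illustration of the technique, not SynthForge's actual implementation: the `DEPENDS_ON` mapping, the `generate` function, and the `<parent>_id` column naming are all assumptions made up for the example.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema: each table maps to the parent tables its foreign keys point at.
DEPENDS_ON = {
    "customers": [],
    "orders": ["customers"],
    "line_items": ["orders"],
}

def generate(row_counts, seed=0):
    """Generate parents before children; draw every FK from existing parent IDs."""
    rng = random.Random(seed)
    # static_order() yields a table only after every table it depends on.
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    tables = {}
    for name in order:
        # Collect the IDs already generated for each parent of this table.
        parent_ids = {p: [row["id"] for row in tables[p]] for p in DEPENDS_ON[name]}
        rows = []
        for i in range(1, row_counts[name] + 1):
            row = {"id": i}
            for parent, ids in parent_ids.items():
                # Illustrative naming only: "orders" -> "order_id".
                row[f"{parent.rstrip('s')}_id"] = rng.choice(ids)
            rows.append(row)
        tables[name] = rows
    return order, tables

order, tables = generate({"customers": 3, "orders": 10, "line_items": 25})
```

Because a child row can only pick from IDs that were already generated, no orphan can be produced, and no dedup or parent-lookup pass is needed afterwards.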

How to do it

1. Define the schema

Describe it in plain English ("a customers table, an orders table with a customer_id foreign key, line_items referencing orders"), build it in the visual editor, or paste a CREATE TABLE script and let the importer pick out tables, columns, and single-column foreign keys.

2. Set distributions and cardinality (optional)

Numeric columns can follow Normal, LogNormal, Exponential, or Triangular distributions. Set per-relationship ratios like "5 to 20 orders per customer" instead of hand-typing a row count for every child table. This is the step that makes the data look real.
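To make the two settings concrete, here is a rough Python sketch using only the standard library. It is not how SynthForge is implemented; `sample_amount`, `orders_for`, the column names, and every numeric parameter are invented for illustration.

```python
import random
from collections import Counter

rng = random.Random(42)

def sample_amount(kind):
    """Draw one value from the named distribution (parameters are made up)."""
    if kind == "normal":
        return rng.gauss(100.0, 15.0)             # mean, standard deviation
    if kind == "lognormal":
        return rng.lognormvariate(3.0, 0.5)       # mu, sigma of the underlying normal
    if kind == "exponential":
        return rng.expovariate(1 / 50.0)          # rate = 1 / mean
    if kind == "triangular":
        return rng.triangular(10.0, 200.0, 40.0)  # low, high, mode
    raise ValueError(f"unknown distribution: {kind}")

def orders_for(customer_ids, low=5, high=20):
    """Give each customer between low and high orders (inclusive)."""
    orders = []
    for cid in customer_ids:
        for _ in range(rng.randint(low, high)):
            orders.append({
                "id": len(orders) + 1,
                "customer_id": cid,
                "total": round(sample_amount("lognormal"), 2),
            })
    return orders

orders = orders_for(range(1, 101))
per_customer = Counter(o["customer_id"] for o in orders)
```

With a ratio, the child row count falls out of the parent count: 100 customers at 5 to 20 orders each yields somewhere between 500 and 2,000 orders, without anyone typing that number in.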

3. Generate and export DuckDB

Pick DuckDB as the SQL dialect. The export bundle contains import_duckdb.sql (the CREATE TABLE DDL with foreign keys declared inline, plus a COPY statement per table) and one csv/<table>.csv data file per table.

4. Load it

Run the script from the extracted bundle directory so the relative csv/ paths resolve. One command creates every table and loads every row:

bash
duckdb mydb.duckdb < import_duckdb.sql

# the script runs native COPY statements like:
#   COPY customers FROM 'csv/customers.csv' (FORMAT CSV, HEADER);
#   COPY orders    FROM 'csv/orders.csv'    (FORMAT CSV, HEADER);
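
If you want to spot-check referential integrity yourself, before or after loading, a small script over the CSV files is enough. The sketch below is a hedged example, not part of the bundle: `orphan_count` is a hypothetical helper, and the `id` and `customer_id` column names are assumptions that depend on your schema. It builds tiny stand-in files so it runs on its own.

```python
import csv
import tempfile
from pathlib import Path

def orphan_count(child_csv, fk_column, parent_csv, pk_column="id"):
    """Count child rows whose foreign key matches no parent key."""
    with open(parent_csv, newline="") as f:
        parent_keys = {row[pk_column] for row in csv.DictReader(f)}
    with open(child_csv, newline="") as f:
        return sum(1 for row in csv.DictReader(f) if row[fk_column] not in parent_keys)

# Stand-in files for the demo; with a real bundle, point the function at
# csv/orders.csv and csv/customers.csv instead (column names are assumed).
demo = Path(tempfile.mkdtemp())
(demo / "customers.csv").write_text("id,name\n1,Ada\n2,Grace\n")
(demo / "orders.csv").write_text("id,customer_id\n10,1\n11,2\n12,2\n")
(demo / "bad_orders.csv").write_text("id,customer_id\n13,99\n")

clean = orphan_count(demo / "orders.csv", "customer_id", demo / "customers.csv")
broken = orphan_count(demo / "bad_orders.csv", "customer_id", demo / "customers.csv")
```

Here `clean` is 0 and `broken` is 1. Note that this simple version compares keys as strings and would count an empty (nullable) foreign key as an orphan, so adjust it if your schema allows NULL references.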

5. (Alternative) Use the Parquet export for analytics

SynthForge also exports Parquet, which DuckDB reads natively. Handy when you are benchmarking analytical queries rather than loading tables.

sql
CREATE TABLE orders AS
SELECT * FROM read_parquet('parquet/orders.parquet');

Frequently asked questions

Does DuckDB enforce the foreign keys?
Yes. DuckDB enforces FOREIGN KEY constraints, and SynthForge declares them inline in the CREATE TABLE statements, so a load containing a child row with a missing parent would be rejected. That does not happen with generated data: it is referentially valid by construction, because child foreign keys are sampled from generated parent IDs and every reference resolves.

How large a dataset can I generate?
Up to 10,000,000 rows per table per request, with per-account rate limits. A roughly one-million-row table generates in minutes; larger volumes take proportionally longer.

Can I get the same schema for Postgres or another engine later?
Yes. The same schema exports to PostgreSQL, MySQL, SQLite, SQL Server, MariaDB, DuckDB, and CockroachDB, plus JSON, JSONL, and Parquet.

Does SynthForge use my real data?
No. SynthForge generates greenfield data from the schema you give it; it does not ingest or de-identify a real database. If you need to anonymize real production data, that calls for a different kind of tool (see Tonic or NVIDIA NeMo).

Ready to get started?

Design a multi-table schema, generate referentially-intact data, and export the DuckDB bundle. No credit card.