SynthForge IO

Use case · Verified May 2026

Load testing data, generated correctly

k6 and Locust drive request load. Your database needs to look like prod, or the test only proves your dev DB is small. SynthForge fills the gap.

Short answer

Generate a multi-table dataset shaped like your prod database (right row counts, right distributions, FK-respecting), bulk-load it into your load-test database, then drive traffic against it. The results are honest because the database under test is honest.

The situation

A k6 script that exercises /api/orders/{id} is meaningless if the orders table has 200 rows. The query planner picks a different plan, the cache is hot for everything, and your test passes for the wrong reason.

What you want is a load-test database with the rough shape of prod: enough rows that index selectivity matters, distributions that match real traffic (a few customers with thousands of orders, most with one or two), and full FK integrity so cascading queries do not 404 mid-test.

The hand-rolled version: a Python script with Faker plus a parent/child loop plus a deduplication helper plus a 'why does this take six hours' moment. The SynthForge version: define the schema once, generate the dataset, bulk-load.

How to do it

1. Mirror your prod schema

Paste a CREATE TABLE script into SynthForge. Mark the FK columns. Save.

2. Set distributions that look like prod

Per-customer order counts often follow a power-law: most customers have 1-3 orders, a few have 1000+. SynthForge supports weighted FK sampling for exactly this shape. For numeric columns: LogNormal for prices, Exponential for time-between-events, Normal for ages.
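The shape described above can be sketched with nothing but Python's standard library. This is an illustrative stand-in for SynthForge's generators, not its actual API; the bucket weights and parameters are assumptions chosen to produce a plausible long tail:

```python
import random

random.seed(42)

def sample_order_count():
    """Power-law-ish per-customer fan-out: most customers light, a heavy tail."""
    # Bucket weights are illustrative assumptions, not SynthForge internals.
    bucket = random.choices(["light", "medium", "heavy"], weights=[85, 14, 1])[0]
    if bucket == "light":
        return random.randint(1, 3)
    if bucket == "medium":
        return random.randint(4, 50)
    return random.randint(100, 2000)

counts = [sample_order_count() for _ in range(10_000)]

# LogNormal for prices (long right tail), Exponential for time-between-events.
price = round(random.lognormvariate(3.0, 0.8), 2)
gap_seconds = random.expovariate(1 / 3600)  # mean of one hour between events
```

If you histogram `counts`, the bulk sits at 1-3 with a thin tail out past 1000, which is the fan-out pattern that makes index selectivity behave like prod.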

3. Set realistic per-table row counts

The numbers that matter are the ones that match prod's order of magnitude. 1M customers, 30M orders, and 100M line_items is a typical shape for a mid-stage SaaS. Use multiple SynthForge jobs to exceed the 10M-rows-per-request cap.
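The job arithmetic for those targets is just a ceiling division against the per-request cap. A minimal sketch (the target counts are the examples from above):

```python
import math

REQUEST_CAP = 10_000_000  # per-request row cap, per the step above

targets = {"customers": 1_000_000, "orders": 30_000_000, "line_items": 100_000_000}

# Number of generation jobs needed per table
jobs = {table: math.ceil(rows / REQUEST_CAP) for table, rows in targets.items()}
```

That works out to 1 job for customers, 3 for orders, and 10 for line_items.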

4. Bulk-load before the load test

Use the dialect-specific loader the export includes: PostgreSQL \copy, MySQL LOAD DATA INFILE, SQL Server bcp. All of them are orders of magnitude faster than row-at-a-time INSERTs.

bash
# Postgres bulk load (fastest path): DDL first, then \copy per table
psql "$LOADTEST_DB" -f synthforge-ddl.sql
for tbl in customers orders line_items; do
  psql "$LOADTEST_DB" -c "\copy $tbl FROM '$tbl.csv' CSV HEADER"
done

# Then run k6
k6 run loadtest.js
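If a table was generated across several jobs, the chunk CSVs need merging before \copy (each export carries its own header row). A minimal stdlib sketch; the file names and the two-chunk demo at the bottom are hypothetical:

```python
import glob
import os
import shutil
import tempfile

def concat_csv_chunks(pattern, out_path):
    """Merge chunked CSV exports into one file, keeping a single header row."""
    paths = sorted(glob.glob(pattern))
    with open(out_path, "w", newline="") as out:
        for n, path in enumerate(paths):
            with open(path) as f:
                header = f.readline()
                if n == 0:
                    out.write(header)       # keep the first header only
                shutil.copyfileobj(f, out)  # stream the remaining data rows
    return len(paths)

# Tiny demo with two hypothetical chunk files
d = tempfile.mkdtemp()
for i, rows in enumerate((["1,a"], ["2,b", "3,c"])):
    with open(os.path.join(d, f"orders_{i}.csv"), "w") as f:
        f.write("order_id,status\n" + "\n".join(rows) + "\n")

n_chunks = concat_csv_chunks(os.path.join(d, "orders_*.csv"),
                             os.path.join(d, "orders.csv"))
merged = open(os.path.join(d, "orders.csv")).read().splitlines()
```

The merged file drops straight into the \copy loop above with CSV HEADER still valid.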

5. Vary the dataset between runs

Change row counts. Change distributions. Test the case where 99% of customers are new vs the case where 99% have order history. SynthForge regenerates in seconds.
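One way to keep those variations honest is a small scenario matrix that pins each run's shape. The config keys here are hypothetical bookkeeping, not SynthForge parameters; the two scenarios are the extremes described above:

```python
# Hypothetical scenario matrix: same schema, different shape per run.
scenarios = {
    "mostly_new":    {"customers": 1_000_000, "orders": 1_200_000},   # ~99% new
    "mostly_repeat": {"customers": 1_000_000, "orders": 30_000_000},  # deep history
}

# Average FK fan-out per scenario, useful for sanity-checking the shape
fanout = {name: cfg["orders"] / cfg["customers"] for name, cfg in scenarios.items()}
```

Regenerate per scenario, reload, rerun k6, and the diff between runs is attributable to the data shape rather than to drift.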

When something else is the right call

Honest alternatives for when SynthForge is not the best fit.

k6 / Locust / JMeter / Gatling

These are the load drivers, not the data layer. They sit on top of whatever data you have. SynthForge fills the data layer.

Replaying production traffic against a sanitized prod copy

Highest fidelity, highest cost: it requires a full de-identification pipeline (e.g. Tonic Structural). Worth it for serious capacity-planning work.

pgbench / sysbench

Synthetic micro-benchmarks for the database itself, not your app. Good for raw DB performance, not realistic application load tests.

Frequently asked questions

How big a dataset do I really need for a load test?
Big enough that index selectivity behaves like prod. For most apps, that means rows per table in the millions. The query planner cares about cardinality; if your test DB is 1000x smaller than prod, the planner picks different plans and your test is misleading.
Can SynthForge generate distributions that look like real traffic?
Within the bounds of its supported distributions: Normal, LogNormal, Exponential, Triangular, Uniform. Plus weighted FK sampling for power-law-ish per-parent counts. For more exotic distributions (mixtures of Poisson processes, copula-driven correlations), you will hit its limits.
How do I handle a 100M-row table?
Split into multiple SynthForge generation requests (10M each is the cap), concatenate the CSVs, and bulk-load. Or generate the parent table once and the child table in chunks that each reference the parent IDs.
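The parent-once, children-in-chunks approach can be sketched in plain Python. This is an illustrative sketch, not SynthForge's mechanism; the table names are the ones used above, and the row counts are shrunk so the demo runs instantly:

```python
import csv
import os
import random
import tempfile

random.seed(0)

def write_child_chunk(path, parent_ids, n_rows, start_id):
    """Write one chunk of a child table; every FK is sampled from the full
    parent ID set, so referential integrity holds across all chunks."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["line_item_id", "order_id"])
        for i in range(n_rows):
            w.writerow([start_id + i, random.choice(parent_ids)])

# Generate the parent IDs once, then child chunks that all reference them.
parent_ids = list(range(1, 1001))  # stand-in for the generated orders table
chunk_dir = tempfile.mkdtemp()
paths = []
for chunk_no in range(3):          # 3 tiny chunks instead of 10 x 10M
    path = os.path.join(chunk_dir, f"line_items_{chunk_no}.csv")
    write_child_chunk(path, parent_ids, n_rows=100, start_id=chunk_no * 100)
    paths.append(path)

total = sum(sum(1 for _ in open(p)) - 1 for p in paths)  # minus header rows
```

Because every chunk draws FKs from the same parent ID set, concatenating the chunks (or loading them one by one) never produces an orphaned child row.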
Does the SynthForge data hit the same query plans as my real data?
Often yes if you match row counts, FK fan-out, and column-value distributions. Run EXPLAIN on the same query against both prod and the load-test DB to verify; if plans diverge, tune the distributions.

Related

Other use cases

Try SynthForge for free

Design a multi-table schema, generate referentially intact data, and export to your database. No credit card.