Blog · May 14, 2026

Why typing 100, 100, 100, 100 across every table got old

We just shipped per-relationship cardinality ratios in SynthForge. You can now say “8 to 12 patients per doctor” instead of hand-computing row counts for every child table. Settings are also auto-saved as defaults, so the last generation’s values come back next time.

features synthetic-data engineering

The friction we couldn’t unsee

Pick any non-trivial schema. Doctors, patients, visits, prescriptions. Or customers, orders, line items, refunds. Every time you went to generate data, you’d type a row count for each table by hand.

Some teams pick round numbers - 100 doctors, 1000 patients, 5000 visits - and call it close enough. The ratios those numbers imply are accidental: 10 patients per doctor and 5 visits per patient on average, and the 10-per-doctor figure only holds if each patient sees a single doctor across all those visits. Nobody chose those ratios on purpose. The shape of the data is off, but for a sanity-check seed, it doesn’t matter.

It does matter when you’re building load-testing data. Or training data. Or anything where the cardinalities matter to the system you’re testing. A query plan that handles “1 line item per order” beautifully can melt down on “200 line items per order.” Your tests have to look like production, and production has structure: each order has somewhere between 1 and 50 line items, with most clustering around 3-5. Each user has 0-3 active subscriptions. Each warehouse has 50-500 SKUs.

You knew this. You’d open a calculator, multiply through, and type the result into the row count box. Then you’d realize the AI agent emitted six tables instead of four and you’d start over.

What we just shipped

The SynthForge dataset generation form now lets you toggle any child table - any table that has a foreign key to a parent - from a fixed row count to a per-parent ratio.

It looks like this:

doctors:        50            (fixed)
patients:       8 to 12 per doctor      (~500 rows projected)
visits:         1 to 5 per patient      (~1,500 rows projected)
prescriptions:  0 to 3 per visit        (~2,250 rows projected)

You set the parents. SynthForge derives the children, in topological FK order. Each parent picks a uniform-random multiplier inside your range, so you get organic variation - one doctor has 9 patients, another has 11, rather than every doctor getting exactly 10. Live preview shows the projected total as you type.
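The projected totals are plain expected-value arithmetic: parent rows times the midpoint of the range. Here is a minimal sketch, assuming the multiplier is a uniform draw over the inclusive range; the function and argument names are hypothetical and not part of SynthForge’s API.

```python
def project_rows(fixed, ratios, order):
    """Expected row count per table.

    fixed:  {table: row_count} for top-level tables
    ratios: {child: (parent, lo, hi)}, an inclusive per-parent range
    order:  table names, parents listed before their children
    """
    projected = dict(fixed)
    for table in order:
        if table in projected:
            continue  # fixed count, nothing to derive
        parent, lo, hi = ratios[table]
        # The mean of a uniform draw over [lo, hi] is the midpoint.
        projected[table] = projected[parent] * (lo + hi) / 2
    return projected


projection = project_rows(
    {"doctors": 50},
    {
        "patients": ("doctors", 8, 12),
        "visits": ("patients", 1, 5),
        "prescriptions": ("visits", 0, 3),
    },
    ["doctors", "patients", "visits", "prescriptions"],
)
print(projection)
```

With the example schema this gives 500, 1,500, and 2,250 for the three child tables. Actual generated counts will land near those figures, not exactly on them.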

Multi-parent ratios sum. So if you have a junction table for many-to-many relationships - say appointments linking doctors and patients - you can set “20 appointments per doctor” AND “5 appointments per patient”, and SynthForge sizes the junction table from both: 50 doctors x 20 + 1,000 patients x 5 = 6,000 appointments. The contributions add; they don’t multiply.
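As a quick check of the junction-table arithmetic, here is a small sketch that sums each parent’s contribution. It assumes fixed per-parent values (min equal to max) to keep the numbers exact; the function name is illustrative only.

```python
def junction_rows(parent_counts, per_parent):
    """Sum each parent's contribution to a multi-parent child table.

    parent_counts: {parent_table: resolved row count}
    per_parent:    {parent_table: child rows per parent row}
    """
    return sum(parent_counts[p] * per_parent[p] for p in per_parent)


appointments = junction_rows(
    {"doctors": 50, "patients": 1000},
    {"doctors": 20, "patients": 5},
)
print(appointments)  # 50 * 20 + 1000 * 5 = 6000
```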

Top-level tables - the ones with no foreign keys to anything - stay as plain row count inputs. No new mental model for the simple case.

Auto-saved defaults

The other thing that quietly disappeared: re-typing the same numbers every generation.

Every successful dataset job now persists its row counts and ratios as the schema’s defaults. The next time you open the generation form for that schema, the inputs come back exactly the way you left them. Different browser, different machine, different month - the defaults live on the schema record, not in your local storage.

Override anything before regenerating and the new values overwrite the defaults on the next successful job. Then the cycle continues. The form has memory now.
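The save-and-restore cycle can be sketched in a few lines. This is an illustration under assumptions, not SynthForge’s storage code: a plain dict stands in for the schema record, and the `generation_defaults` key and both function names are made up for the example.

```python
def open_form(schema_record):
    """Pre-fill the form from whatever the schema record last saved."""
    return dict(schema_record.get("generation_defaults", {}))


def on_job_succeeded(schema_record, submitted):
    """Persist the submitted inputs only after a successful job."""
    schema_record["generation_defaults"] = dict(submitted)


record = {}                      # hypothetical schema record
form = open_form(record)         # empty on first use
form["doctors"] = 50
form["doctors.patients"] = {"min": 8, "max": 12}
on_job_succeeded(record, form)

restored = open_form(record)     # same values, any browser or machine
print(restored)
```

Because the defaults hang off the schema record rather than the browser, nothing about the restore step depends on who opens the form or where.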

What this took, briefly

The runtime side wasn’t actually new work. SynthForge already walked tables in topological FK order during generation - it had to, so children could read their parents’ generated IDs. We just taught it to also resolve row counts in that same order: walk parents first, look up their resolved count, multiply by the ratio, write the child’s count.
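Here is a rough sketch of that resolution walk, using Python’s standard-library `graphlib` for the topological order. The function name, the `fixed` argument, and the seed parameter are assumptions for illustration; only the `"parent.child"` key shape with `min` and `max` comes from the data model described in this post.

```python
import random
from graphlib import TopologicalSorter


def resolve_counts(fixed, ratios, seed=None):
    """Resolve row counts parents-first.

    fixed:  {table: row_count} for tables without a ratio
    ratios: {"parent.child": {"min": lo, "max": hi}}
    """
    rng = random.Random(seed)
    # Map each table to the set of parents it depends on.
    graph = {table: set() for table in fixed}
    for key in ratios:
        parent, child = key.split(".", 1)
        graph.setdefault(parent, set())
        graph.setdefault(child, set()).add(parent)

    counts = {}
    for table in TopologicalSorter(graph).static_order():
        keys = [k for k in ratios if k.split(".", 1)[1] == table]
        if not keys:
            counts[table] = fixed[table]  # top-level: plain row count
            continue
        total = 0
        for key in keys:
            parent = key.split(".", 1)[0]
            lo, hi = ratios[key]["min"], ratios[key]["max"]
            # One uniform draw per parent row; multi-parent ratios sum.
            total += sum(rng.randint(lo, hi) for _ in range(counts[parent]))
        counts[table] = total
    return counts


counts = resolve_counts(
    {"doctors": 50},
    {
        "doctors.patients": {"min": 8, "max": 12},
        "patients.visits": {"min": 1, "max": 5},
        "visits.prescriptions": {"min": 0, "max": 3},
    },
    seed=7,
)
print(counts)
```

The sketch assumes table names contain no dots and that every table without a ratio has a fixed count; a real implementation would validate both.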

The shape of the input - {"parent_table.child_table": {"min": N, "max": M}} - and the database columns to store it had been part of the data model since day one of the Cloudflare rewrite, sitting unused. They were originally ported from the legacy Django app’s contract for completeness and had been quietly waiting for someone to wire them through.
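For readers who want to produce or check that input shape, here is a small parsing sketch. The `Ratio` class and the validation rules (non-negative minimum, max not below min) are assumptions of ours; the post only specifies the key and the `min`/`max` fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ratio:
    parent: str
    child: str
    lo: int
    hi: int


def parse_ratios(raw):
    """Turn {"parent.child": {"min": N, "max": M}} into Ratio objects."""
    parsed = []
    for key, bounds in raw.items():
        parent, sep, child = key.partition(".")
        if not (sep and parent and child):
            raise ValueError(f"expected 'parent.child', got {key!r}")
        lo, hi = bounds["min"], bounds["max"]
        # Assumed sanity rules: no negative counts, no inverted range.
        if lo < 0 or hi < lo:
            raise ValueError(f"invalid range for {key!r}: {lo}..{hi}")
        parsed.append(Ratio(parent, child, lo, hi))
    return parsed


ratios = parse_ratios({"doctors.patients": {"min": 8, "max": 12}})
print(ratios)
```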

The interesting work was the user surface. The naive UI is a separate “Relationships” panel below the row counts. The version we shipped puts the toggle inline next to each child table, so you don’t have to keep two mental models. We also auto-detect FK pairs from the schema so the parent picker only shows real options.

Persistence is auto-save on every successful generation. We considered a “remember as default” checkbox but decided it added a click for behavior that’s almost never wrong - if your last generation worked and you’re regenerating, you almost always want the same shape.

The hardest design call was precedence. If a child has both an explicit row count AND a ratio against its parent, which wins? We picked: ratio wins. The mental model is “I set ratios, the system picks counts.” If you don’t want a ratio, switch the toggle off. Trying to honor both at once would surface confusing edge cases (what if the explicit count contradicts the ratio’s range?) and add no real expressiveness.
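The precedence rule is small enough to state as code. This is only an illustration of “ratio wins”; the function name and return shape are hypothetical.

```python
def resolve_mode(explicit_count, ratio):
    """Decide how a child table is sized.

    explicit_count: int or None
    ratio:          (lo, hi) tuple or None (toggle switched off)
    """
    if ratio is not None:
        # A ratio always wins; any explicit count is ignored.
        return ("ratio", ratio)
    if explicit_count is None:
        raise ValueError("table needs either a row count or a ratio")
    return ("fixed", explicit_count)


print(resolve_mode(500, (8, 12)))  # ratio takes precedence
print(resolve_mode(500, None))     # toggle off: fixed count applies
```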

What’s not in this release

Two things that some users will ask for and we deliberately deferred:

Conditional ratios. “10 to 30 line items per order, but only 1 to 5 if the order is a Refund.” Useful, but it’s really a derived-rule shape, not a cardinality shape, and we’d want to design the UI carefully. Open issue.

Ratio histograms. Right now the per-parent multiplier is uniform random within your range. A more realistic distribution would be something like LogNormal (most orders have 3 items, a few have 50). We have LogNormal for column values; cross-walking it into cardinality is straightforward when the demand surfaces.
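To make the idea concrete, here is one way a skewed per-parent multiplier could look. This is not shipped behavior and not a design commitment: it is a sketch that draws from Python’s `random.lognormvariate`, rounds, and clamps to the range. The `median` and `sigma` parameters are assumptions chosen for the example.

```python
import math
import random


def lognormal_multiplier(rng, lo, hi, median, sigma=0.75):
    """Draw a skewed child count and clamp it into [lo, hi].

    The median of a log-normal is exp(mu), so mu = log(median).
    """
    draw = rng.lognormvariate(math.log(median), sigma)
    return max(lo, min(hi, round(draw)))


rng = random.Random(0)
sample = [lognormal_multiplier(rng, 1, 50, median=3) for _ in range(10_000)]
print(min(sample), max(sample))
```

Most draws cluster around the median and a small number stretch toward the upper bound, which is closer to the “most orders have 3 items, a few have 50” shape than a uniform range.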

Try it

app.synthforge.io. Pick any schema with foreign keys (or use the AI agent to make one). Open Forge dataset, find any child table, hit the dropdown, switch to “per parent row.” Generate. Open the form again - your inputs are still there.

It’s free. No credit card. ~1M rows per table in minutes (10M hard cap), 5 GB stored per account. The full feature set is on the data generation page.
