A/B test significance & sample size calculator
Free, no signup, no email wall. Runs entirely in your browser — the same statistics ABTestly's results page uses, methodology published.
Two tools: check whether a finished (or running) test is actually significant, and work out how many visitors a future test needs. Both do the math the statistically honest way — no winner below 95% confidence, every lift shown with its confidence interval and sample size, and an automatic sample-ratio-mismatch check so a broken traffic split can't masquerade as a result.
Is my result significant?
Counts should be unique visitors and unique converters (each person counted once); otherwise the independence assumption behind the test breaks.
How many visitors do I need?
4 means 4%.
10 means a 10% relative lift (5.0% → 5.5%), not 10 percentage points (5.0% → 15.0%).
Significance is fixed at 95% (α = 0.05, two-sided). We don’t offer 90% as an alternative — multiple thresholds invite reading whichever number tells the story you want.
How this calculator works
Significance is a two-sided, two-proportion z-test with pooled standard error, the same test behind ABTestly’s Confidence column. The lift is relative, and its 95% confidence interval uses the unpooled standard error, exactly as published in how we count results. The SRM check is a chi-square goodness-of-fit test that flags a mismatch when p < 0.001 (the bar used by large experimentation platforms), and it stays quiet until 500 total visitors so it can’t false-alarm during warm-up. Sample size uses the standard two-proportion power calculation. We refuse to render a verdict when any expected cell of the implied 2×2 table falls below 5, because the normal approximation is unreliable at counts that small.
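For readers who want to check the arithmetic by hand, here is a minimal Python sketch of the test described above. It is not ABTestly's production code: the function names, the return shape, and the `min_expected` parameter are hypothetical. The confidence interval in particular is an assumption. It scales the unpooled interval for the absolute difference by the control rate, which is one simple way to put a CI on a relative lift; the published methodology may construct it differently.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test(visitors_a, conv_a, visitors_b, conv_b, min_expected=5.0):
    """Two-sided, two-proportion z-test with pooled standard error.

    Returns None (no verdict) when either arm is empty or when any expected
    cell of the 2x2 table falls below `min_expected`.
    """
    if visitors_a <= 0 or visitors_b <= 0:
        return None
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    # Expected conversions and non-conversions in each arm under the null.
    expected_cells = [
        n * rate
        for n in (visitors_a, visitors_b)
        for rate in (pooled, 1 - pooled)
    ]
    if min(expected_cells) < min_expected:
        return None
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se_pooled
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    return {"z": z, "p_value": p_value, "significant": p_value < 0.05}


def relative_lift_ci(visitors_a, conv_a, visitors_b, conv_b, level=0.95):
    """Relative lift of B over A with a CI built from the unpooled SE.

    Assumption: the absolute-difference interval is divided by the control
    rate, ignoring uncertainty in the control rate itself.
    """
    if conv_a == 0:
        raise ValueError("relative lift is undefined when the control rate is 0")
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    se_unpooled = sqrt(
        rate_a * (1 - rate_a) / visitors_a + rate_b * (1 - rate_b) / visitors_b
    )
    z_crit = _STD_NORMAL.inv_cdf(0.5 + level / 2)
    diff = rate_b - rate_a
    return (
        diff / rate_a,
        (diff - z_crit * se_unpooled) / rate_a,
        (diff + z_crit * se_unpooled) / rate_a,
    )
```

Calling `z_test(10000, 400, 10000, 480)` returns a result dictionary, while `z_test(8, 1, 8, 2)` returns `None` because the expected cells are far too small for a verdict.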
This is a fixed-horizon test, so plan once and look once. Repeatedly checking interim results and stopping when one looks significant turns a 5% false-positive rate into 20–40% — that’s the “peeking problem.” Decide your sample size up front (using the card above), run to it, then judge once. More on this from the docs.
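The peeking problem is easy to see in a simulation. The sketch below is illustrative only: it runs A/A tests (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares judging once at the end with stopping at the first significant interim look. The function names and the default settings (10 looks, 100 visitors per arm between looks, a 20% conversion rate) are assumptions chosen to keep the run short, not values from ABTestly's methodology.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value


def _is_significant(n_per_arm, conv_a, conv_b):
    """Pooled two-proportion z-test for two arms of equal size."""
    pooled = (conv_a + conv_b) / (2 * n_per_arm)
    if pooled in (0.0, 1.0):
        return False  # no variation yet, so nothing to test
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    return abs(conv_b - conv_a) / n_per_arm / se > Z_CRIT


def false_positive_rates(simulations=1000, looks=10, batch=100, rate=0.2, seed=1):
    """Return (peeking_rate, fixed_horizon_rate) over simulated A/A tests."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _sim in range(simulations):
        n = conv_a = conv_b = 0
        ever_significant = False
        significant_now = False
        for _look in range(looks):
            # Add one batch of visitors to each arm, then take a look.
            n += batch
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            significant_now = _is_significant(n, conv_a, conv_b)
            ever_significant = ever_significant or significant_now
        peeking_hits += ever_significant  # would have stopped at some look
        fixed_hits += significant_now     # judged once, at the final look
    return peeking_hits / simulations, fixed_hits / simulations
```

`false_positive_rates()` returns the two rates as a pair. The fixed-horizon rate should sit near the nominal 5%, and the peeking rate can never be lower, because the final look is one of the looks a peeker takes. Adding more looks pushes the peeking rate higher.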
Built by ABTestly. Every function on this page is verified against the production implementation by an automated drift-guard suite that imports the worker’s source directly — if the product and this page ever disagreed, our test suite would fail before the page shipped. Nothing you type leaves your browser.
Why this is stricter than most online significance calculators
Significance calculators on the internet vary more than the underlying math should permit. Most are flexible about how they let you read a result; this one isn’t, and that’s deliberate. A calculator’s job is to refuse to tell you what you want to hear when the data doesn’t support it.
| | Most online calculators | This one |
|---|---|---|
| Significance threshold | Often 90% / 95% / 99% selectable, or three rows of decisions | 95% only. Multiple thresholds invite reading whichever number tells the story you want. |
| A 94%-confident result | Often reported as “significant at 90%” by relaxing the threshold | “Below 95% — no verdict.” We won’t let you talk yourself into a winner. |
| Peeking warning | Rarely mentioned; many implicitly encourage continuous checking | One-line caveat on every “no verdict” result, linking the methodology |
| Broken traffic splits (SRM) | Not detected — a 5,000 vs 4,200 test would still report a winner | Auto-checked (chi-square, p < 0.001 — the industry standard) and flagged before any verdict |
| Tiny samples | Will render a confident-looking verdict on 8 visitors | Refuses below 5 expected conversions or non-conversions per arm |
| Difference shown as | Percentage points (+0.12 pp) — undersells real business impact | Relative lift (+25%) with its 95% CI and sample size, always together |
| Math source | Often hand-rolled, or transcribed from a spreadsheet | Test-pinned to the same code that powers ABTestly’s results page |
Where we don’t (yet) match other calculators: we only handle two variants. ABTestly the product is A/B-only today, so adding multivariate (MVT) support here, for a product that doesn’t do it, would mislead in either direction. When MVT lands in the product, it’ll show up here too, with proper Tukey HSD math, not a normal-curve approximation. Sequential testing is the other gap; it’s on the roadmap.
Quick answers
What does statistically significant mean in an A/B test?
A result is statistically significant when it would be unlikely to appear by chance if the variants truly performed the same. This calculator uses a two-tailed, two-proportion z-test and calls a result significant when confidence reaches 95% (p < 0.05) — the standard CRO convention. Below that bar the honest verdict is "still collecting", not "almost a winner".
Why is the threshold fixed at 95%?
Because offering 90% or 99% side by side encourages reading whichever number tells the story you want. ABTestly's product applies the same fixed 95% bar and shows "Still collecting" until a test clears it; the calculator behaves identically.
What is sample ratio mismatch (SRM)?
SRM means the traffic split you observed differs from the split you configured by more than chance allows (chi-square test, flagged below p = 0.001). It usually means something is systematically dropping one arm — a broken redirect, a variant erroring on one browser — and the test's numbers can't be trusted until it's fixed. This calculator checks it automatically from your visitor counts. Full guide.
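The SRM check can be reproduced in a few lines. This is a minimal sketch, assuming a two-arm test and an intended 50/50 split by default; the function name, parameter names, and return shape are hypothetical. With one degree of freedom the chi-square p-value has a closed form, so no statistics library is needed.

```python
from math import erfc, sqrt


def srm_check(visitors_a, visitors_b, expected_share_a=0.5,
              alpha=0.001, min_total=500):
    """Chi-square goodness-of-fit test of the observed split vs. the intended one."""
    total = visitors_a + visitors_b
    if total < min_total:
        # Too little traffic to judge: stay quiet rather than false-alarm.
        return {"checked": False, "p_value": None, "mismatch": False}
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return {"checked": True, "p_value": p_value, "mismatch": p_value < alpha}
```

A 5,000 vs 4,200 split gives a chi-square statistic of about 69.6, far past the p < 0.001 bar, so `srm_check(5000, 4200)` reports a mismatch. A 5,000 vs 5,100 split does not.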
How many visitors does an A/B test need?
It depends on your baseline conversion rate and the smallest relative lift worth detecting. A worked example: detecting a 10% relative lift on a 4% baseline at 95% confidence and 80% power needs 39,475 visitors per variant. The sample-size calculator above computes it for your numbers.
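The worked example can be reproduced with one common form of the two-proportion power calculation, sketched below. The function and parameter names are hypothetical, and the choice of formula is an assumption: this version uses the pooled variance for the significance term and the unpooled variance for the power term. It matches the 39,475 figure, but that alone doesn't confirm it is the exact formula the calculator uses.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative lift (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # 10% lift on 4% gives 4.4%
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


print(visitors_per_variant(0.04, 0.10))  # 39475
```

Because the required sample grows with the inverse square of the absolute difference, halving the lift you want to detect roughly quadruples the visitors you need.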
Found a number you disagree with? Email support@abtestly.com with your inputs — if the calculator is wrong we'll fix it publicly. — The ABTestly team