SEO A/B testing for developers
You know how to A/B test a checkout flow: split users 50/50, ship the variant to half of them, measure the lift. Then you try to A/B test an SEO change the same way — and quietly break your rankings. SEO split testing works, but the unit you randomize is not the user. It's the page.
Conversion-rate testing and SEO testing look like the same problem, so engineers reach for the same tool. They shouldn't. The whole method has to change, because the visitor you're optimizing for is Googlebot, and Googlebot is one user. You can't show it version A on Tuesday and version B on Wednesday and call that a test. You'll either measure noise or get flagged for cloaking.
Here's the version of A/B testing that actually holds up for organic search — and the honest limits of when you can run one at all.
The unit of randomization is the page, not the user
A CRO test splits people: a cookie or a feature flag sends user 1 to the control and user 2 to the variant, and you compare conversion between the two cohorts. That works because you have thousands of independent users to randomize over.
In organic search you have effectively one user whose behavior you're trying to change — the crawler that decides your rankings. You can't split it into cohorts. So instead of splitting users across one page, an SEO test splits pages across one change:
- Take a large group of structurally similar pages served by the same template — product pages, location pages, blog posts, category pages.
- Randomly assign each page to a control group or a variant group.
- Apply the change (a new title format, added structured data, a copy block, an internal-linking module) to the variant group only, at the template or server level, so every visitor and the crawler sees the same thing on a given page.
- Measure the difference in organic performance between the two groups.
This only works if you have the pages for it. A test needs a group large enough to carry statistical signal — typically dozens to hundreds of comparable URLs per side. That's the first hard constraint, and it's the one that decides whether you can run a clean test at all.
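The assignment step can be sketched in a few lines. This is a minimal illustration, not a prescribed method: it assumes you have a mapping of URL to pre-test clicks (the `pages` dict and the `assign_groups` name are hypothetical), and it uses matched pairs, ranking pages by traffic and flipping a coin within each adjacent pair, so the two groups start out with similar traffic. A plain random shuffle would also be a valid randomization.

```python
import random


def assign_groups(pages, seed=42):
    """Split pages into (control, variant) lists using matched pairs.

    pages: dict mapping URL -> clicks in the pre-test period (assumed input).
    seed:  fixed so the assignment is reproducible and can be audited later.
    """
    rng = random.Random(seed)
    # Rank by pre-test clicks so each pair holds pages of similar size.
    ranked = sorted(pages, key=lambda url: pages[url], reverse=True)

    control, variant = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # coin flip decides which page gets the change
        control.append(pair[0])
        variant.append(pair[1])
    # With an odd page count, the lowest-traffic page is left out of the test.
    return control, variant
```

Storing the seed and the resulting lists alongside the test means the same assignment can be rebuilt when you analyze the results.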
Why you can't just split the users (the cloaking trap)
The reason this matters isn't pedantic. Serving different content based on who's asking is exactly the behavior Google's spam policies define as cloaking: showing the crawler something other than what users get. Googlebot doesn't reliably keep cookies between requests, so a user-split tool tends to feed it a random mix of variants across visits. That makes the result unmeasurable and, if what the crawler sees diverges from what humans see, a policy violation. Google has been explicit that cloaking can get a site demoted or removed from results.
That's why SEO split tests are implemented server-side, per page — never per request based on the visitor. Each variant-group URL renders one consistent version for everyone. You're not hiding anything from the crawler; you're changing a real subset of pages and watching what happens.
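As a rough sketch of what "per page, never per request" looks like in code: the variant decision below depends only on the URL path. The paths, the title formats, and the `build_title` helper are all hypothetical, chosen for illustration. The point is the signature: the function never receives cookies, headers, or a user agent, so it cannot serve the crawler anything different from what a human gets.

```python
# Hypothetical variant group, produced once by the random assignment step
# and shipped with the deploy (a config file or database table would also work).
VARIANT_PATHS = frozenset({
    "/products/blue-widget",
    "/products/red-widget",
})


def build_title(path, product_name):
    """Return the <title> text for a product page.

    The only inputs are the page's path and its own data. No request
    headers, cookies, or user agent are consulted, so every visitor and
    every crawler sees the same title on a given URL.
    """
    if path in VARIANT_PATHS:
        # Variant: the new title format under test (illustrative format).
        return f"Buy {product_name} Online | Example Store"
    # Control: the existing title format.
    return f"{product_name} | Example Store"
```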
Measuring the result: you forecast the counterfactual
Here's where it stops looking like a t-test on two conversion rates. You can't just compare "variant clicks" to "control clicks," because the two page groups never had identical traffic to begin with, and organic traffic drifts with seasonality, demand, and algorithm updates the whole time your test runs.
So the question you actually answer is counterfactual: what would the variant group's traffic have done if you'd changed nothing — and how far did reality diverge from that?
The standard tool is Google's own CausalImpact, built on Bayesian structural time-series models (Brodersen et al., 2015). You feed it the variant group's metric (clicks, say) as the response series and the control group as a covariate. Before the change, the two move together. The model learns that relationship, then — after the change — projects where the variant should have been based on the still-unchanged control, and measures the gap.
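To make the mechanics concrete, here is a deliberately simplified stand-in. It is not CausalImpact and not a Bayesian structural time-series model: it swaps in a plain least-squares line fitted on the pre-change period, with the control group's daily clicks as the only covariate. The function names and the input shape (two equal-length lists of daily clicks plus the index of the change day) are assumptions for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_lift(control, variant, change_index):
    """Compare post-change variant clicks to a forecast counterfactual.

    control, variant: daily clicks per group, same length and date range.
    change_index:     index of the first day the change was live.
    Returns (absolute_gap, relative_gap) over the post-change period.
    """
    # Learn how the variant tracked the control before the change.
    slope, intercept = fit_line(control[:change_index], variant[:change_index])

    # Project where the variant "should" have been, using the unchanged control.
    predicted = [intercept + slope * c for c in control[change_index:]]
    actual = variant[change_index:]

    gap = sum(actual) - sum(predicted)
    return gap, gap / sum(predicted)
```

A point estimate like this says nothing about uncertainty on its own, which is why the real model's interval matters.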
Two things decide whether you believe it: the confidence level and the run length. The usual bar is 95%, meaning the interval around the estimated effect excludes zero. (CausalImpact is Bayesian, so strictly this is a 95% credible interval rather than a frequentist p-value, but it is read the same way in practice.) And you have to let the test run long enough, usually 2–4 weeks, for the crawl-and-reindex lag to play out and the interval to tighten. Call a winner on day three and you're reading provisional data and weekend seasonality, not a result.
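One rough way to sanity-check whether a measured gap stands out from ordinary noise is to bootstrap the pre-change residuals (actual minus fitted clicks). This is a crude approximation, not the posterior interval a Bayesian time-series model reports: it assumes the daily residuals are independent, which ignores autocorrelation, so treat it as a smoke test. The function names and defaults are hypothetical.

```python
import random


def noise_interval(residuals, horizon, draws=2000, seed=0):
    """Estimate a 95% range for the cumulative gap if the change did nothing.

    residuals: pre-change daily errors (actual minus fitted clicks).
    horizon:   number of post-change days being summed.
    """
    rng = random.Random(seed)
    # Simulate many "no effect" post-periods by resampling past errors.
    totals = sorted(
        sum(rng.choice(residuals) for _ in range(horizon))
        for _ in range(draws)
    )
    lower = totals[int(0.025 * draws)]
    upper = totals[int(0.975 * draws) - 1]
    return lower, upper


def stands_out(gap, residuals, horizon):
    """True if the observed cumulative gap falls outside the noise range."""
    lower, upper = noise_interval(residuals, horizon)
    return gap < lower or gap > upper
```

The range widens with `horizon`, so the same absolute gap is easier to distinguish from noise when the daily effect is consistent across the whole run.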
CRO test vs. SEO split test, side by side
| | CRO A/B test | SEO split test |
|---|---|---|
| What you randomize | Users | Pages (a template's URL set) |
| Who you optimize for | Many human visitors | One crawler (Googlebot) |
| How variants are served | Per user, client-side / flag | Per page, server / template level |
| Splitting by visitor is | The whole method | Cloaking — a policy violation |
| How you measure | Conversion rate, A vs B | Variant vs forecast counterfactual |
| Typical time to read | Hours to days | 2–4 weeks (crawl + index lag) |
| Minimum to run one | Enough traffic | Enough similar pages |
When you can't run a split test at all
The catch sits in that last row. A clean SEO split test needs a large set of interchangeable pages. Plenty of the changes you most want to validate don't have that:
- Your homepage, your top three money pages, a single high-intent landing page — there's only one of each, so there's no group to split.
- Sitewide changes — a new nav, a performance fix, a global schema rollout — hit every page at once, so there's no control group left.
For those, you can't randomize, and the gold-standard experiment is off the table. What's left is the quasi-experimental version of the same idea: ship the change to everything, then build the counterfactual from the page's own pre-change trend (and unaffected pages as covariates) and measure the divergence after the crawl lag. It's weaker than a randomized split test — you're trusting a model of "what would have happened" instead of holding out a real control — but applied honestly, with the same confidence-interval discipline, it's how you attribute a change you couldn't randomize.
That observational case is the common one for most teams, and it's exactly the method Code Results automates: it runs changepoint detection on your Search Console history, builds the no-change counterfactual, and lines each ranking shift up against the pull requests that landed before it — so even when you can't run a textbook split test, "did that change actually move us?" has a real answer instead of a guess.