Introduction
A few months ago I stumbled on something that made me think, “wait, what?” One of our testing sites was showing up in Google search results. It wasn’t supposed to be public. It was a playground for QA and developers, but there it was, indexed and discoverable.
That little surprise led to a deep dive into how search engines find non-production sites, why that’s risky, and how to stop it from happening again. The short version: the fix needs to start at the server, not in a spreadsheet or a PR comment.
In this post I’ll walk through what happened, why it matters, and a practical, prioritized checklist you can use to protect your staging and testing environments.
How environments are typically organised
Most projects don’t run on a single site. Common environments include:
- Development — where developers iterate and experiment.
- QA — for testers to validate features and find bugs.
- UAT — stakeholders and product owners double-check functionality.
- Pre-production — a final dress rehearsal before release.
- Production — the site your users actually visit.
It’s easy to assume crawlers will only index your production site. They won’t. If a URL is publicly reachable, search engines can and will crawl it.
How Google found our testing site
There are a few common ways a non-production URL ends up indexed:
- Someone shares a link in chat or a public forum.
- A sitemap accidentally includes staging URLs.
- Third-party services or monitoring tools poke the site.
- Crawlers discover links from other sites or from previous exposure.
In our case it was a combination of an exposed preview link and an openly accessible server. One quick search you can use to check indexing is:
site:staging.your-domain.com
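One of the causes listed above, a sitemap that accidentally includes staging URLs, is easy to check for automatically. Here is a minimal Python sketch, assuming a standard sitemaps.org XML sitemap. The function name non_production_urls and the production_host parameter are my own illustrative choices, not part of any tool:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def non_production_urls(sitemap_xml: str, production_host: str) -> list[str]:
    """Return every <loc> URL whose host is not the production host."""
    root = ET.fromstring(sitemap_xml)
    leaked = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        # Skip empty entries; flag anything that points at another host.
        if url and urlparse(url).hostname != production_host:
            leaked.append(url)
    return leaked


if __name__ == "__main__":
    # Hypothetical sitemap with one leaked staging URL.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.your-domain.com/</loc></url>"
        "<url><loc>https://staging.your-domain.com/pricing</loc></url>"
        "</urlset>"
    )
    print(non_production_urls(sample, "www.your-domain.com"))
```

Running something like this in CI against the generated sitemap would catch the leak before the file is ever published.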
Why this is a problem
A seemingly small leak can cause several practical headaches:
- Unfinished features go public: Test pages can reveal prototypes or pre-release content.
- Duplicate content: Search engines may be confused about which version should rank.
- Confused users: Visitors may land on a broken or incomplete experience.
- Security posture: Even harmless endpoints can expose internal patterns and URLs.
Step 1 — fix the root cause on the server
If you only ask Google to remove URLs while the site stays indexable, they’ll often come back, because removals requested through Search Console are temporary. The most reliable first step is to tell crawlers not to index the site at the infrastructure level. With Nginx we added this header:
server {
    # Prevent search engine indexing. "always" sends the header on
    # error responses (4xx/5xx) too, not only on successful ones.
    add_header X-Robots-Tag "noindex, nofollow" always;
}
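The nice thing about the X-Robots-Tag header is that it applies regardless of your front-end framework (Angular, React, or a plain static site), since the server sets it on every response. Note that crawlers still have to fetch a page to see the header, so don’t also disallow the site in robots.txt, or the noindex will never be read.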
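After deploying the header, it’s worth confirming that it actually reaches clients. The following is a rough Python sketch of such a check using only the standard library. The function names blocks_indexing and fetch_robots_headers are hypothetical, and the parsing is deliberately simplified: a bot-specific value such as "googlebot: noindex" is treated the same as a global one.

```python
from __future__ import annotations

import re
import sys
import urllib.error
import urllib.request


def blocks_indexing(header_values: list[str]) -> bool:
    """Return True if any X-Robots-Tag value carries noindex (or none)."""
    for value in header_values:
        # Split on commas and colons so "googlebot: noindex" also matches.
        tokens = [token.strip().lower() for token in re.split(r"[,:]", value)]
        if "noindex" in tokens or "none" in tokens:
            return True
    return False


def fetch_robots_headers(url: str) -> list[str]:
    """Send a HEAD request and return all X-Robots-Tag header values."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.headers.get_all("X-Robots-Tag") or []
    except urllib.error.HTTPError as error:
        # With nginx's "always" flag the header is sent on errors as well.
        return error.headers.get_all("X-Robots-Tag") or []


if __name__ == "__main__":
    # Usage: python check_noindex.py https://staging.your-domain.com/
    for target in sys.argv[1:]:
        status = "noindex" if blocks_indexing(fetch_robots_headers(target)) else "INDEXABLE"
        print(f"{target}: {status}")
```

Pointing it at a few representative URLs per environment (the home page, a deep link, a 404) gives a quick signal that the header is applied everywhere and not only on the happy path.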
Step 2 — verify ownership in Google Search Console
Once the noindex header is in place, verify the environment in Google Search Console so you can request removals and track progress. The quickest verification method is an HTML tag, for example:
<meta name="google-site-verification" content="verification-code" />
Drop that into the <head> of your page (for Angular, put it in src/index.html), deploy, then complete verification in the console.
Step 3 — request URL removal
With verification in place, use the Removals tool in Search Console:
- Open Search Console.
- Go to Removals and create a new request.
- Enter the exact URL or a prefix to remove multiple pages.
- Submit and monitor until Google confirms removal.
Step 4 — add front-end safeguards
As a belt-and-suspenders approach, add a robots meta tag to pages in non-production builds:
<meta name="robots" content="noindex,nofollow" />
This helps during short windows when server headers might not be fully rolled out or when a preview URL is temporarily exposed.
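How the tag gets into non-production builds depends on your toolchain. As one possible approach, here is a small Python sketch of a post-build step that patches the built index.html. The function name add_noindex_meta and the environment values ("production", "qa") are assumptions for illustration, not something our build actually ships with:

```python
import re

ROBOTS_META = '<meta name="robots" content="noindex,nofollow" />'


def add_noindex_meta(html: str, environment: str) -> str:
    """Insert the robots meta tag right after <head> for non-production builds."""
    # Leave production untouched, and avoid inserting the tag twice.
    if environment == "production" or 'name="robots"' in html:
        return html
    # Match <head> or <head ...attrs>, but not <header>.
    match = re.search(r"<head(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no <head> element found")
    return html[: match.end()] + "\n  " + ROBOTS_META + html[match.end():]


if __name__ == "__main__":
    page = "<html><head><title>QA</title></head><body></body></html>"
    print(add_noindex_meta(page, "qa"))
```

Failing loudly when there is no <head> is intentional: a silent no-op would leave the page indexable without anyone noticing.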
Step 5 — restrict access where possible
The most effective strategy is to keep non-production sites off the public internet entirely. Common options include:
- VPN-only access
- Single sign-on (SSO) protection
- Basic auth for quick protection
- IP allowlisting for internal teams
If crawlers can’t access the site, they can’t index it.
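To confirm that a restriction is really in effect, you can probe the site from outside your network without credentials. The sketch below is one hedged way to do that in Python. The names classify_access and probe and the four labels are made up for this example, and a "redirected" result only suggests an SSO login page, so it still needs a manual look.

```python
from __future__ import annotations

import sys
import urllib.error
import urllib.request
from urllib.parse import urlparse


def classify_access(status: int | None, requested_host: str, final_host: str | None) -> str:
    """Label how an unauthenticated request was handled."""
    if status is None:
        return "unreachable"  # e.g. VPN-only or blocked by an IP rule
    if status in (401, 403):
        return "restricted"  # basic auth or an explicit deny
    if final_host is not None and final_host != requested_host:
        return "redirected"  # possibly an SSO login page; verify manually
    return "public"


def probe(url: str) -> str:
    """Request the URL without credentials and classify the outcome."""
    requested_host = urlparse(url).hostname or ""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            final_host = urlparse(response.geturl()).hostname
            return classify_access(response.status, requested_host, final_host)
    except urllib.error.HTTPError as error:
        return classify_access(error.code, requested_host, None)
    except OSError:
        # Covers connection failures, DNS errors, and timeouts.
        return classify_access(None, requested_host, None)


if __name__ == "__main__":
    # Usage: python probe_access.py https://staging.your-domain.com/
    for target in sys.argv[1:]:
        print(f"{target}: {probe(target)}")
```

Anything that comes back as "public" for a non-production host is worth investigating before a crawler finds it.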
Don’t forget production
While you lock down non-production environments, make sure your production site stays discoverable: verify it in Search Console, publish a clean XML sitemap, and keep robots.txt and meta tags configured correctly.
Conclusion
Finding a staging site in search results was a reminder that search engines treat every reachable URL the same. The fix is pragmatic: start at the infrastructure level, verify ownership, request removals, then add front-end and access controls to avoid repeats.