Robots.txt and Sitemap Examples for Content Sites
Practical robots.txt and sitemap.xml examples for technical content sites, including safe defaults, private paths, static generators, and validation checks.
A robots.txt file and a sitemap do different jobs. The robots.txt file tells crawlers which paths they may request. The sitemap lists the canonical public URLs you want discovered. Keep both simple unless the site genuinely needs complexity.
Safe default robots.txt
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
This is enough for many small content sites. It allows crawling and points to the sitemap.
Blocking private or duplicate paths
Block paths that should not be crawled, such as internal search results, admin URLs, or generated filters. Do not block CSS, JavaScript, or image assets that are needed to render public pages.
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search/
Disallow: /preview/
Disallow: /*?sort=
Sitemap: https://example.com/sitemap.xml
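To see how these rules interact, it can help to test a few paths against them before deploying. The following is a minimal sketch, not a full robots.txt parser. It assumes the rules for the relevant user-agent group have already been extracted into a list, and it follows the commonly documented precedence: the longest matching pattern wins, and an allow rule wins a tie. The names rules, patternToRegExp, and isAllowed are hypothetical, and pattern length is measured in characters as a simplification.

```javascript
// Rules copied from the example group above (assumed already parsed).
const rules = [
  { type: "allow", pattern: "/" },
  { type: "disallow", pattern: "/admin/" },
  { type: "disallow", pattern: "/search/" },
  { type: "disallow", pattern: "/preview/" },
  { type: "disallow", pattern: "/*?sort=" }
];

// Convert a robots.txt pattern to a RegExp: "*" matches any run of
// characters, a trailing "$" anchors the end, everything else is literal.
function patternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

// The longest matching pattern wins; on a tie, allow wins.
function isAllowed(pathWithQuery, ruleList) {
  let best = null;
  for (const rule of ruleList) {
    if (!patternToRegExp(rule.pattern).test(pathWithQuery)) continue;
    const longer = best === null || rule.pattern.length > best.pattern.length;
    const tieAllow = best !== null &&
      rule.pattern.length === best.pattern.length &&
      rule.type === "allow";
    if (longer || tieAllow) best = rule;
  }
  // A path that matches no rule is allowed.
  return best === null || best.type === "allow";
}
```

With these rules, isAllowed("/admin/users", rules) is false and isAllowed("/articles/", rules) is true. Note that /*?sort= only matches when sort is the first query parameter, so a URL such as /articles/?page=2&sort=asc is still crawlable. If that matters, a second rule such as Disallow: /*&sort= is one way to cover it.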
Simple sitemap example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/articles/structured-data-json-ld-examples/</loc>
    <lastmod>2026-05-30</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Static site generator pattern
A static site can generate the sitemap from the same content registry that generates pages. This avoids stale URLs.
// Content registry: the same list drives page generation and the sitemap.
const pages = [
  { path: "/", updated: "2026-05-30", priority: "1.0" },
  { path: "/technical-seo/", updated: "2026-05-30", priority: "0.9" },
  { path: "/articles/robots-txt-sitemap-examples/", updated: "2026-05-30", priority: "0.8" }
];

// Build an absolute URL from a site-relative path.
function absoluteUrl(path) {
  return "https://example.com" + path;
}

// Render the registry as a sitemap.xml string.
function renderSitemap(items) {
  const urls = items.map((item) => {
    return [
      "  <url>",
      "    <loc>" + absoluteUrl(item.path) + "</loc>",
      "    <lastmod>" + item.updated + "</lastmod>",
      "    <priority>" + item.priority + "</priority>",
      "  </url>"
    ].join("\n");
  }).join("\n");

  return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
    "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" +
    urls +
    "\n</urlset>\n";
}
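The generator above concatenates paths straight into XML. That is fine for simple slugs, but the sitemap protocol requires URL values to be entity-escaped, so a path containing an ampersand or a quote would produce an invalid file. A minimal sketch of an escaping step follows; escapeXml and renderLoc are hypothetical helper names, not part of the generator above.

```javascript
// Escape the five characters that must be entity-escaped in XML values.
// The ampersand is replaced first so later entities are not double-escaped.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: render one <loc> line with the URL escaped.
function renderLoc(url) {
  return "    <loc>" + escapeXml(url) + "</loc>";
}
```

One way to use it is to wrap the absoluteUrl(item.path) call in escapeXml when building the loc line.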
What not to include in a sitemap
- Draft pages.
- Redirected URLs.
- Noindex pages.
- Internal search result URLs.
- Tracking-parameter versions of the same page.
- Thin tag archives that have no unique editorial value.
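Most of these exclusions can be applied as a filter on the content registry before the sitemap is rendered. The sketch below assumes each registry entry may carry draft, noindex, and redirectTo fields; those field names and the sample entries are hypothetical. Thin tag archives are an editorial judgment and are not detected here.

```javascript
// Hypothetical registry entries; field names are assumptions for illustration.
const registry = [
  { path: "/", updated: "2026-05-30" },
  { path: "/articles/robots-txt-sitemap-examples/", updated: "2026-05-30" },
  { path: "/articles/unfinished/", updated: "2026-05-30", draft: true },
  { path: "/old-guide/", updated: "2026-05-30", redirectTo: "/articles/robots-txt-sitemap-examples/" },
  { path: "/thanks/", updated: "2026-05-30", noindex: true },
  { path: "/search/?q=robots", updated: "2026-05-30" },
  { path: "/articles/robots-txt-sitemap-examples/?utm_source=newsletter", updated: "2026-05-30" }
];

// Return true only for pages that belong in the sitemap.
function isSitemapEligible(page) {
  // Drafts, noindex pages, and redirected URLs are never listed.
  if (page.draft || page.noindex || page.redirectTo) return false;
  // Internal search results are excluded by path prefix.
  if (page.path.startsWith("/search/")) return false;
  // Any query string is treated as a parameter variant of a clean URL.
  if (page.path.includes("?")) return false;
  return true;
}

const sitemapPages = registry.filter(isSitemapEligible);
```

Here sitemapPages keeps only the home page and the clean article URL. Treating every query string as a duplicate is a deliberate simplification; a site whose canonical URLs use query parameters would need a narrower check.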
Validation checklist
- /robots.txt returns text/plain or readable text.
- The sitemap URL listed in robots.txt is absolute.
- Every sitemap URL returns a 200 status.
- Sitemap URLs match canonical URLs exactly.
- Important pages are internally linked, not only listed in the sitemap.
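The status check in this list is easy to automate. The sketch below assumes a runtime with a global fetch (for example Node 18 or later) and accepts the fetch function as a parameter so it can be replaced in tests. extractLocs and findNon200 are hypothetical names, and the regular expression is a shortcut for simple sitemaps rather than a real XML parser; it also leaves entity-escaped values such as &amp; undecoded.

```javascript
// Pull every <loc> value out of a sitemap document.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
}

// Request each URL and collect the ones that do not answer 200.
// redirect: "manual" keeps a redirect visible instead of following it.
async function findNon200(urls, fetchFn = fetch) {
  const failures = [];
  for (const url of urls) {
    const response = await fetchFn(url, { method: "HEAD", redirect: "manual" });
    if (response.status !== 200) {
      failures.push({ url, status: response.status });
    }
  }
  return failures;
}
```

A typical run would fetch the sitemap text, pass it through extractLocs, and print whatever findNon200 returns. Some servers answer HEAD differently from GET, so switching the method to GET is a reasonable fallback if results look wrong.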
WordPress robots.txt pattern
A WordPress site usually does not need a restrictive robots file. Allow public content, allow uploaded media, block administrative paths, and point to the sitemap. Avoid blocking theme, plugin, CSS, or JavaScript assets that are needed to render pages.
User-agent: *
Allow: /
Allow: /wp-content/uploads/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /*?s=
Sitemap: https://example.com/sitemap.xml
Sitemap index example
As a site grows, a sitemap index is easier to maintain than one large file. WordPress SEO plugins often create an index that points to page, post, category, and media sitemaps. That is acceptable as long as each child sitemap lists canonical public URLs.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
    <lastmod>2026-05-31T08:00:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/article-sitemap.xml</loc>
    <lastmod>2026-05-31T08:00:00+00:00</lastmod>
  </sitemap>
</sitemapindex>
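For a static site, the same idea can be sketched in code. The sitemap protocol limits a single sitemap file to 50,000 URLs (and 50 MB uncompressed), so one approach is to split the URL list into chunks and point an index at the resulting files. The function names and the /sitemap-N.xml naming below are hypothetical conventions, not something a particular plugin produces.

```javascript
// Per-file URL limit from the sitemap protocol.
const MAX_URLS_PER_SITEMAP = 50000;

// Split a list into consecutive chunks of at most `size` items.
function chunk(list, size) {
  const chunks = [];
  for (let i = 0; i < list.length; i += size) {
    chunks.push(list.slice(i, i + size));
  }
  return chunks;
}

// Render a sitemap index that points at numbered child files.
// The /sitemap-N.xml naming is an assumed convention.
function renderSitemapIndex(childCount, lastmod) {
  const entries = [];
  for (let n = 1; n <= childCount; n += 1) {
    entries.push(
      "  <sitemap>\n" +
      "    <loc>https://example.com/sitemap-" + n + ".xml</loc>\n" +
      "    <lastmod>" + lastmod + "</lastmod>\n" +
      "  </sitemap>"
    );
  }
  return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
    "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" +
    entries.join("\n") +
    "\n</sitemapindex>\n";
}
```

Calling chunk(allUrls, MAX_URLS_PER_SITEMAP) gives the groups to write as child files, and renderSitemapIndex(groups.length, "2026-05-31") produces the matching index. Most content sites stay far below the limit, so splitting by content type, as in the example above, is often the more readable choice.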
Troubleshooting crawl access
| Symptom | Likely cause | Check |
|---|---|---|
| Important page missing from sitemap | CMS page type is excluded or the page has no canonical URL. | Open the child sitemap file and search for the slug. |
| Page crawled but not useful | Thin content or duplicate intent. | Compare the page to the hub and related articles. |
| Rendered page looks broken | Robots blocks CSS or JavaScript assets. | Review disallow rules for asset folders. |
| Internal search pages appear | Search query URLs are crawlable and linked. | Block search result patterns and avoid linking to them. |
robots.txt vs noindex vs canonical
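These controls are often confused. Use the right one for the job. A robots.txt rule stops crawling, but it does not guarantee that a URL stays out of search results, and it does not make weak public content stronger. A noindex directive only works if the page can be crawled, so do not also block that page in robots.txt. A canonical tag identifies the preferred version among duplicates; it is not a substitute for merging pages that serve the same task.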
| Need | Use | Example |
|---|---|---|
| Keep crawlers out of admin pages. | robots.txt disallow. | Disallow: /wp-admin/ |
| Keep a public utility page out of search results. | meta robots noindex. | <meta name="robots" content="noindex, follow"> |
| Handle tracking-parameter duplicates. | Canonical tag. | Canonical to the clean article URL. |
| Replace an old URL permanently. | 301 redirect. | Redirect old guide to the new canonical guide. |
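Because these controls interact, it can be useful to flag pages where they contradict each other. The sketch below works on a hypothetical page record; the field names (url, canonical, noindex, blockedByRobots, inSitemap) are assumptions and would need to be filled in by whatever crawl or build step the site already has.

```javascript
// Return a list of contradictory signals for one page record.
// All field names are assumptions made for this illustration.
function findSignalConflicts(page) {
  const conflicts = [];
  // A crawler that is blocked never fetches the page, so it never sees noindex.
  if (page.blockedByRobots && page.noindex) {
    conflicts.push("noindex cannot be read while robots.txt blocks the page");
  }
  // The sitemap should list only indexable URLs.
  if (page.inSitemap && page.noindex) {
    conflicts.push("noindex page is listed in the sitemap");
  }
  // Listing a URL that crawlers may not fetch sends mixed signals.
  if (page.inSitemap && page.blockedByRobots) {
    conflicts.push("sitemap lists a URL that robots.txt blocks");
  }
  // Sitemap URLs should match the canonical URL exactly.
  if (page.inSitemap && page.canonical !== page.url) {
    conflicts.push("sitemap URL differs from the canonical URL");
  }
  return conflicts;
}
```

An empty result means none of these four contradictions were found; it does not prove the page is configured well.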
Submission-ready checklist
- The sitemap index or sitemap file returns 200.
- Each listed URL is canonical and returns 200.
- No draft, preview, search result, or redirected URL is listed.
- robots.txt does not block CSS, JavaScript, images, or public article paths.
- The sitemap URL in robots.txt exactly matches the submitted sitemap URL.
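The last item in this checklist can be checked mechanically. The following sketch reads the Sitemap lines out of a robots.txt body and compares them with the URL you plan to submit. The function names are hypothetical, and the comparison is an exact string match on purpose, so differences in scheme, host, or trailing slash show up as a mismatch.

```javascript
// Collect the value of every "Sitemap:" line in a robots.txt body.
// The field name is matched case-insensitively and "#" comments are dropped.
function sitemapUrlsFromRobots(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim())
    .filter((line) => /^sitemap\s*:/i.test(line))
    .map((line) => line.slice(line.indexOf(":") + 1).trim());
}

// True when the submitted URL appears verbatim in robots.txt.
function matchesSubmitted(robotsTxt, submittedUrl) {
  return sitemapUrlsFromRobots(robotsTxt).includes(submittedUrl);
}
```

For the safe default file shown earlier, matchesSubmitted(text, "https://example.com/sitemap.xml") is true, while the http:// or www variant of the same address is reported as a mismatch.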
Practical rollout notes
Use this guide when the site is ready to expose a new section, or when a CMS or plugin update has changed the generated robots.txt or sitemap output. The goal is simple discovery, not a clever robots file.
Acceptance criteria
- robots.txt allows public content and required assets.
- The sitemap URL listed in robots.txt returns 200.
- Every sitemap URL is canonical and indexable.
- Search, admin, preview, and duplicate utility paths are handled deliberately.
When to revisit
Revisit after adding a new content type, changing permalink structure, installing a sitemap plugin, or moving from a static host to WordPress.

