Robots.txt and Sitemap Examples for Content Sites

A robots.txt file and a sitemap do different jobs. The robots.txt file tells crawlers which paths they may fetch. The sitemap lists the canonical public URLs you want discovered. Keep both simple unless the site genuinely needs complexity.

Safe default robots.txt

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

This is enough for many small content sites. It allows crawling and points to the sitemap.

Blocking private or duplicate paths

Block paths that should not be crawled, such as internal search results, admin URLs, or generated filters. Do not block CSS, JavaScript, or image assets that are needed to render public pages.

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search/
Disallow: /preview/
Disallow: /*?sort=

Sitemap: https://example.com/sitemap.xml
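
To see which URLs a rule set like this affects, the matching can be sketched in a few lines. This is a minimal illustration of the behavior described in RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end, the longest matching pattern wins, and Allow wins a tie). The names `patternToRegExp` and `isAllowed` are hypothetical, and real crawlers may differ in edge cases.

```javascript
// Hypothetical helper: turn a robots.txt path pattern into a RegExp.
// "*" matches any run of characters; a trailing "$" anchors the end.
function patternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

// rules: [{ type: "allow" | "disallow", pattern: "/admin/" }, ...]
// The longest matching pattern wins; on equal length, allow wins.
function isAllowed(rules, pathAndQuery) {
  let best = null;
  for (const rule of rules) {
    if (rule.pattern === "") continue; // an empty pattern matches nothing
    if (!patternToRegExp(rule.pattern).test(pathAndQuery)) continue;
    const longer = best === null || rule.pattern.length > best.pattern.length;
    const tieAllow = best !== null &&
      rule.pattern.length === best.pattern.length &&
      rule.type === "allow";
    if (longer || tieAllow) best = rule;
  }
  // With no matching rule, the path is allowed by default.
  return best === null || best.type === "allow";
}

// The rule set from the example above, as data.
const rules = [
  { type: "allow", pattern: "/" },
  { type: "disallow", pattern: "/admin/" },
  { type: "disallow", pattern: "/search/" },
  { type: "disallow", pattern: "/preview/" },
  { type: "disallow", pattern: "/*?sort=" }
];
```

Under these assumptions, `isAllowed(rules, "/articles/?sort=asc")` is false, but `isAllowed(rules, "/articles/?page=2&sort=asc")` is true, because `/*?sort=` only matches when `sort` is the first query parameter. If that case matters, an extra `Disallow: /*&sort=` line would cover it.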

Simple sitemap example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-05-30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/articles/structured-data-json-ld-examples/</loc>
    <lastmod>2026-05-30</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Static site generator pattern

A static site can generate the sitemap from the same content registry that generates pages. Because pages and sitemap entries come from one source, the sitemap cannot list a URL that no longer exists.

const pages = [
  { path: "/", updated: "2026-05-30", priority: "1.0" },
  { path: "/technical-seo/", updated: "2026-05-30", priority: "0.9" },
  { path: "/articles/robots-txt-sitemap-examples/", updated: "2026-05-30", priority: "0.8" }
];

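// Prefix a registry path with the site origin.
function absoluteUrl(path) {
  return "https://example.com" + path;
}

// Escape XML special characters so a URL containing "&" still yields valid XML.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build one <url> entry per registry item and wrap them in a <urlset>.
function renderSitemap(items) {
  const urls = items.map((item) => {
    return [
      "  <url>",
      "    <loc>" + escapeXml(absoluteUrl(item.path)) + "</loc>",
      "    <lastmod>" + escapeXml(item.updated) + "</lastmod>",
      "    <priority>" + escapeXml(item.priority) + "</priority>",
      "  </url>"
    ].join("\n");
  }).join("\n");

  return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
    "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" +
    urls +
    "\n</urlset>\n";
}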
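
The same build step could also emit robots.txt, so the Sitemap line never drifts from where the sitemap is actually deployed. The sketch below is one possible shape, not a standard API: the `robotsConfig` object and the `renderRobots` function are hypothetical names chosen for illustration.

```javascript
// Hypothetical config shape for a generated robots.txt.
const robotsConfig = {
  groups: [
    {
      userAgent: "*",
      allow: ["/"],
      disallow: ["/admin/", "/search/", "/preview/", "/*?sort="]
    }
  ],
  sitemaps: ["https://example.com/sitemap.xml"]
};

// Render each user-agent group, then the Sitemap lines.
function renderRobots(config) {
  const lines = [];
  for (const group of config.groups) {
    lines.push("User-agent: " + group.userAgent);
    for (const path of group.allow) lines.push("Allow: " + path);
    for (const path of group.disallow) lines.push("Disallow: " + path);
    lines.push(""); // blank line closes the group
  }
  for (const url of config.sitemaps) {
    // Fail the build early if a sitemap URL is not absolute.
    if (!/^https?:\/\//.test(url)) {
      throw new Error("Sitemap URL must be absolute: " + url);
    }
    lines.push("Sitemap: " + url);
  }
  return lines.join("\n") + "\n";
}
```

Throwing on a relative sitemap URL is a design choice for this sketch: it turns a silent misconfiguration into a failed build.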

What not to include in a sitemap

Draft pages.
Redirected URLs.
Noindex pages.
Internal search result URLs.
Tracking-parameter versions of the same page.
Thin tag archives that have no unique editorial value.
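
The exclusions above can be applied as a filter before the sitemap is rendered. This is a sketch under stated assumptions: the page fields `draft`, `redirectTo`, `noindex`, and `thin`, the `/search/` prefix, and the tracking-parameter list are all hypothetical and would need to match how the site actually marks its pages.

```javascript
// Assumed tracking parameters; adjust to what the site actually receives.
const TRACKING_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"];

// True when the path carries one of the assumed tracking parameters.
function hasTrackingParam(path) {
  const query = path.split("?")[1] || "";
  const params = new URLSearchParams(query);
  return TRACKING_PARAMS.some((name) => params.has(name));
}

// One check per item in the exclusion list.
function isSitemapEligible(page) {
  if (page.draft) return false;                        // draft pages
  if (page.redirectTo) return false;                   // redirected URLs
  if (page.noindex) return false;                      // noindex pages
  if (page.path.startsWith("/search/")) return false;  // internal search results
  if (hasTrackingParam(page.path)) return false;       // tracking-parameter versions
  if (page.thin) return false;                         // thin tag archives
  return true;
}

function sitemapPages(pages) {
  return pages.filter(isSitemapEligible);
}

const examplePages = [
  { path: "/" },
  { path: "/drafts/new-guide/", draft: true },
  { path: "/old-guide/", redirectTo: "/articles/new-guide/" },
  { path: "/thanks/", noindex: true },
  { path: "/search/?q=seo" },
  { path: "/articles/new-guide/?utm_source=newsletter" },
  { path: "/tag/misc/", thin: true },
  { path: "/articles/new-guide/" }
];
```

With this example data, `sitemapPages(examplePages)` keeps only the home page and the clean article URL.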

Validation checklist

/robots.txt returns a 200 status and is served as plain text (ideally Content-Type: text/plain).
The sitemap URL listed in robots.txt is absolute.
Every sitemap URL returns a 200 status.
Sitemap URLs match canonical URLs exactly.
Important pages are internally linked, not only listed in the sitemap.
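
Part of this checklist can be automated offline, before any HTTP requests are made. The sketch below covers only the structural checks (absolute URLs, expected origin, no query strings, no duplicates); it does not fetch anything, so status codes and canonical tags still need a separate check. The function names and the regex-based `<loc>` extraction are assumptions for illustration, not a full XML parser.

```javascript
// Pull every <loc> value out of a sitemap string (simple regex, not a full XML parser).
function extractLocs(xml) {
  return Array.from(xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g), (m) => m[1]);
}

// Return a list of { loc, problem } objects; an empty list means no structural issues found.
function findSitemapProblems(xml, expectedOrigin) {
  const problems = [];
  const seen = new Set();
  for (const loc of extractLocs(xml)) {
    let url;
    try {
      url = new URL(loc); // throws for relative URLs
    } catch (err) {
      problems.push({ loc: loc, problem: "not an absolute URL" });
      continue;
    }
    if (url.origin !== expectedOrigin) {
      problems.push({ loc: loc, problem: "unexpected origin" });
    }
    if (url.search !== "") {
      problems.push({ loc: loc, problem: "has a query string" });
    }
    if (seen.has(loc)) {
      problems.push({ loc: loc, problem: "duplicate entry" });
    }
    seen.add(loc);
  }
  return problems;
}
```

A build script could call `findSitemapProblems(xml, "https://example.com")` and fail when the returned list is not empty.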

WordPress robots.txt pattern

A WordPress site usually does not need a restrictive robots file. Allow public content, allow uploaded media, block administrative paths, and point to the sitemap. Avoid blocking theme, plugin, CSS, or JavaScript assets that are needed to render pages. Keep /wp-admin/admin-ajax.php crawlable, as the default WordPress robots.txt does, because some themes and plugins call it from public pages.

User-agent: *
Allow: /
Allow: /wp-content/uploads/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /*?s=

Sitemap: https://example.com/sitemap.xml

Sitemap index example

As a site grows, a sitemap index is easier to maintain than one large file. WordPress SEO plugins often create an index that points to page, post, category, and media sitemaps. That is acceptable as long as each child sitemap lists canonical public URLs.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
    <lastmod>2026-05-31T08:00:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/article-sitemap.xml</loc>
    <lastmod>2026-05-31T08:00:00+00:00</lastmod>
  </sitemap>
</sitemapindex>
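
For a generated site, the split into child sitemaps can be planned in code. The sitemaps.org protocol limits a single sitemap to 50,000 URLs, so the sketch below chunks a URL list and renders an index that points at each chunk. The `sitemap-N.xml` naming, the `baseUrl` parameter, and the function names are assumptions for illustration.

```javascript
const MAX_URLS_PER_SITEMAP = 50000; // per-file URL limit in the sitemaps.org protocol

// Split an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Assign each group of URLs to a child sitemap file (hypothetical naming scheme).
function planSitemaps(urls, baseUrl, size = MAX_URLS_PER_SITEMAP) {
  return chunk(urls, size).map((group, index) => ({
    loc: baseUrl + "/sitemap-" + (index + 1) + ".xml",
    urls: group
  }));
}

// Render the index file that lists every child sitemap.
function renderSitemapIndex(children, lastmod) {
  const entries = children.map((child) => {
    return [
      "  <sitemap>",
      "    <loc>" + child.loc + "</loc>",
      "    <lastmod>" + lastmod + "</lastmod>",
      "  </sitemap>"
    ].join("\n");
  }).join("\n");

  return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
    "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n" +
    entries +
    "\n</sitemapindex>\n";
}
```

Splitting by content type, as SEO plugins tend to do, is an equally valid plan. Chunking by count is simply the easiest rule to show in a short example.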

Troubleshooting crawl access

Symptom	Likely cause	Check
Important page missing from sitemap	CMS page type excluded or no canonical URL.	Open the child sitemap file and search for the slug.
Page crawled but not useful	Thin content or duplicate intent.	Compare the page to the hub and related articles.
Rendered page looks broken	Robots blocks CSS or JavaScript assets.	Review disallow rules for asset folders.
Internal search pages appear	Search query URLs are crawlable and linked.	Block search result patterns and avoid linking to them.

robots.txt vs noindex vs canonical

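These controls are often confused, so pick the one that matches the job. A robots.txt rule stops crawling, but it neither improves weak public content nor reliably removes a URL from search results: a blocked URL can still be indexed if other pages link to it. A noindex directive only works when crawlers can fetch the page, so do not combine it with a Disallow rule for the same URL. A canonical tag identifies the preferred version among duplicates, but it is not a substitute for merging pages that serve the same task.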

Need	Use	Example
Keep admin pages out of crawler paths.	robots.txt disallow.	`Disallow: /wp-admin/`
Keep a public utility page out of search results.	meta robots noindex.	`<meta name="robots" content="noindex, follow">`
Handle tracking-parameter duplicates.	Canonical tag.	Canonical to the clean article URL.
Replace an old URL permanently.	301 redirect.	Redirect old guide to the new canonical guide.
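
For the tracking-parameter row, the clean URL for the canonical tag can be derived from the requested URL. The sketch below uses the standard `URL` API. The list of tracking parameters is an assumption and should be adjusted to what the site actually receives, and `canonicalUrl` and `canonicalLinkTag` are hypothetical helper names.

```javascript
// Assumed tracking parameters: anything starting with "utm_", plus two common click IDs.
const TRACKING_PREFIXES = ["utm_"];
const TRACKING_NAMES = ["fbclid", "gclid"];

// Strip tracking parameters and the fragment, keeping meaningful parameters such as "page".
function canonicalUrl(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first, because deleting while iterating can skip entries.
  for (const name of Array.from(url.searchParams.keys())) {
    const tracked = TRACKING_NAMES.includes(name) ||
      TRACKING_PREFIXES.some((prefix) => name.startsWith(prefix));
    if (tracked) url.searchParams.delete(name);
  }
  url.hash = "";
  return url.toString();
}

// Build the <link rel="canonical"> tag, escaping "&" for use inside an HTML attribute.
function canonicalLinkTag(rawUrl) {
  const href = canonicalUrl(rawUrl).replace(/&/g, "&amp;");
  return "<link rel=\"canonical\" href=\"" + href + "\">";
}
```

The same `canonicalUrl` output is what should appear in the sitemap, so that sitemap URLs and canonical URLs match exactly.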

Submission-ready checklist

The sitemap index or sitemap file returns 200.
Each listed URL is canonical and returns 200.
No draft, preview, search result, or redirected URL is listed.
robots.txt does not block CSS, JavaScript, images, or public article paths.
The sitemap URL in robots.txt exactly matches the submitted sitemap URL.

Practical rollout notes

Use this guide when the site is ready to expose a new section, or when a CMS or plugin update has changed the generated robots.txt or sitemap output. The goal is simple discovery, not a clever robots file.

Acceptance criteria

robots.txt allows public content and required assets.
The sitemap URL listed in robots.txt returns 200.
Every sitemap URL is canonical and indexable.
Search, admin, preview, and duplicate utility paths are handled deliberately.

When to revisit

Revisit after adding a new content type, changing permalink structure, installing a sitemap plugin, or moving from a static host to WordPress.