📘 Why Am I Seeing Pages I Don’t Recognize Being Crawled on My Site?

Understanding archive and taxonomy pages in your CMS.

Updated this week

If you’ve noticed a large number of unfamiliar pages being crawled on your site, such as URLs containing /author/, /category/, /tag/, or /date/, you’re not alone: this is very common. These URLs are typically auto-generated by your content management system (CMS), especially on platforms like WordPress.

This article will help you understand what these pages are, why they’re being crawled, and how to control their visibility.

👣 Step-by-Step Instructions

1. What Are These Pages?

Many CMSs, including WordPress, automatically generate archive pages you may not have manually created:

  • Author Archives – Lists all posts by a specific author (e.g., /author/admin/)

  • Date Archives – Groups content by month or year (e.g., /2023/07/)

  • Category Pages – Lists posts under a specific category (e.g., /category/news/)

  • Tag Pages – Shows content tagged with a specific keyword (e.g., /tag/seo/)

These pages are public by default and can be discovered by any crawler or search engine.
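To see how many of the URLs in a crawl report fall into these groups, you can match each URL's path against the patterns listed above. The snippet below is a minimal sketch: the function name `classify_archive_url` and the exact regular expressions are illustrative assumptions based on the example URLs in this article, and your CMS may use a different permalink structure.

```python
import re
from urllib.parse import urlparse

# Assumed patterns, modeled on the example URLs in this article.
# Adjust them if your CMS uses different archive paths.
ARCHIVE_PATTERNS = {
    "author": re.compile(r"^/author/[^/]+/?"),
    "category": re.compile(r"^/category/[^/]+/?"),
    "tag": re.compile(r"^/tag/[^/]+/?"),
    # Matches /2023/, /2023/07/, /2023/07/15/ and the same under /date/,
    # but not a post URL such as /2023/07/my-post/.
    "date": re.compile(r"^/(date/)?\d{4}(/\d{2}){0,2}/?$"),
}


def classify_archive_url(url):
    """Return the archive type for a URL, or None if it matches no pattern."""
    path = urlparse(url).path
    for kind, pattern in ARCHIVE_PATTERNS.items():
        if pattern.match(path):
            return kind
    return None
```

For example, `classify_archive_url("https://example.com/tag/seo/")` returns `"tag"`, while a regular page such as `https://example.com/about/` returns `None`.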

2. Why Are These Pages Being Crawled?

  1. Crawlers follow links to every publicly accessible page they can discover.

  2. If these archive pages are linked internally (e.g., sidebar, footer, sitemap), they’re considered part of your site’s structure.

  3. Unless deliberately disabled or blocked, they will continue to exist and be crawled.

3. How Can I Control This?

  1. Disable in CMS Settings

    • Many platforms let you turn off author archives, date archives, or other taxonomy pages directly from site settings.

  2. Block Crawling with robots.txt

    • Add Disallow rules for the paths you don’t want crawled (e.g., /author/ or /tag/).

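  3. Add noindex Meta Tags

    • If you want the pages to exist but not appear in search results, configure your site to apply a noindex tag.

    • This lets them be crawled but prevents indexing. Crawlers can only read the tag on pages they are allowed to fetch, so don’t also block those same paths in robots.txt.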
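If you'd like to sanity-check the robots.txt and noindex options before relying on them, the sketch below uses only Python's standard library. The rules in `ROBOTS_TXT`, the `example.com` URLs, and the helper names `is_crawl_allowed` and `has_noindex` are hypothetical, chosen for illustration; substitute your own paths and pages.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules that block author and tag archives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /author/
Disallow: /tag/
"""


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Check whether the given robots.txt text allows crawling a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            directives = [d.strip().lower() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex
```

With the rules above, `is_crawl_allowed(ROBOTS_TXT, "https://example.com/author/admin/")` is `False`, while a category URL is still allowed. Python's built-in parser matches rules by path prefix only, so date archives such as /2023/07/, which share no common prefix, are usually easier to handle with the CMS setting or a noindex tag.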

FAQs

❓ Why do these pages exist if I didn’t create them?

💡 Your CMS automatically generates them as part of its default functionality.

❓ Should I block or keep these pages?

⚡ If they don’t serve SEO or user navigation purposes, it’s often best to block or noindex them.

❓ What happens if I disable them?

✅ They’ll no longer be accessible, meaning crawlers and users won’t see them.

Closing Note

✅ These unfamiliar URLs are usually nothing to worry about, since they’re simply auto-generated by your CMS. By adjusting settings, updating your robots.txt, or applying meta tags, you can control whether they’re crawled or indexed.
