Site Audit Knowledge Base

Understanding Site Audits

A site audit is a comprehensive analysis of your website's technical SEO health and performance. Using automated crawlers, it systematically examines your website's pages to identify issues that could impact your search engine rankings, user experience, and overall site performance. The crawler acts like a search engine bot, visiting your pages and collecting data about various technical aspects such as load speed, meta tags, broken links, and content quality.
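
Conceptually, the crawler's loop is: fetch a page, read a few on-page signals, and queue the internal links it finds. The short Python sketch below (using the third-party requests and BeautifulSoup libraries, with a placeholder URL) illustrates the kind of data collected for a single page; the actual Search Atlas crawler checks many more signals.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

# Placeholder URL; the real crawler starts from the URL you configure.
start_url = "https://www.example.com/"
response = requests.get(start_url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# A few of the on-page signals a site audit reports on
print("Status code:", response.status_code)
print("Title:", soup.title.get_text(strip=True) if soup.title else "(missing)")
meta_desc = soup.find("meta", attrs={"name": "description"})
print("Meta description:", meta_desc.get("content", "").strip() if meta_desc else "(missing)")
h1 = soup.find("h1")
print("H1:", h1.get_text(strip=True) if h1 else "(missing)")

# Internal links the crawler would visit next
internal_links = {
    urljoin(start_url, a["href"])
    for a in soup.find_all("a", href=True)
    if urlparse(urljoin(start_url, a["href"])).netloc == urlparse(start_url).netloc
}
print("Internal links found:", len(internal_links))
```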

Setting Up Your Site Audit

Follow these steps to configure your site audit for optimal results:

1. Enter Your Website URL

  • Input the complete URL of the website you want to audit (e.g., https://www.example.com)

  • Verify that the URL includes the correct protocol (http:// or https://)

2. Configure Crawl Depth

  • Determine the total number of pages you want the crawler to analyze

  • Recommended Setting: Set the limit to your total number of pages plus 10%

    • Example: If your site has 1000 pages, set the limit to 1100

  • This buffer ensures complete coverage and accounts for any new pages

3. Set Crawl Frequency

  • Choose how often you want the crawler to audit your site

  • Minimum Recommended Frequency: Monthly

  • For Critical Pages: Weekly

  • Factors to consider when setting frequency:

    • How often your content changes

    • Site size and complexity

    • Resource allocation

4. Choose User Agent

  • Select the crawler's user agent to simulate specific search engine behavior

  • Recommended Options:

    • Our custom Search Atlas user agent

    • Googlebot Mobile

5. Adjust Crawl Speed

  • Set the pace at which the crawler analyzes your pages

  • Key Considerations:

    • Faster crawls: Quick overview of major issues

    • Slower crawls: More detailed analysis and deeper insights

    • Server capacity and bandwidth limitations

  • Important: A slower, more thorough crawl typically yields more detailed and accurate results

6. URL Exclusion Conditions (Beta)

  • Configure which parts of your website should be excluded from the crawl

  • Helps optimize crawl budget and focus analysis on relevant pages

  • Two primary configuration options (see the sketch after this list):

    • URL Exclusion Rules

      • Exclude URLs containing specific terms

      • Example: Excluding "/blog" will prevent the crawler from analyzing any URLs containing this path

      • Useful for skipping sections like admin pages, author archives, or specific categories

    • URL Inclusion Exceptions

      • Add specific exceptions to your exclusion rules

      • Example: If you've excluded "/blog", you can add "/blog/*" to include all blog posts while still excluding the main blog page

      • Allows for granular control over which pages are analyzed
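
The exclusion and exception rules can be thought of as a two-step filter: a URL is skipped if it contains an exclusion term, unless it also matches an inclusion exception. The Python sketch below is only a conceptual model of that logic, using the "/blog" example above; the exact matching behaviour of the Beta feature may differ.

```python
from fnmatch import fnmatch

# Conceptual model of URL exclusion rules with inclusion exceptions.
# The exact matching behaviour of the Beta feature may differ.
exclusion_terms = ["/blog"]          # skip any URL containing these terms
inclusion_exceptions = ["*/blog/*"]  # ...unless the URL matches one of these patterns

def should_crawl(url: str) -> bool:
    excluded = any(term in url for term in exclusion_terms)
    excepted = any(fnmatch(url, pattern) for pattern in inclusion_exceptions)
    return not excluded or excepted

print(should_crawl("https://www.example.com/about"))         # True  (not excluded)
print(should_crawl("https://www.example.com/blog"))          # False (main blog page excluded)
print(should_crawl("https://www.example.com/blog/my-post"))  # True  (exception matches)
```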

Remember to periodically review and adjust these settings as your website grows and evolves. Regular audits help maintain optimal site performance and identify potential issues before they impact your search engine rankings.

Crawl Monitoring [Beta]

The Crawl Monitoring feature provides real-time tracking and analysis of how search engines and AI bots interact with your website. This tool helps you optimize your site's visibility and ensure efficient indexing by monitoring crawler behavior, distribution, and patterns.

To access Crawl Monitoring, navigate to the Site Audit dashboard and select the Crawl Monitoring tab.

Dashboard Components:

  • Crawler Distribution Table

    • Shows active crawlers (e.g., Google, Bing, Google-Mobile)

    • Displays total requests per crawler

    • Includes interactive graphs for Historical Crawl Activity visualization

  • Historical Crawl Activity Graph

    • Presents crawler activity over time

    • Color-coded by crawler type

    • Switchable between Daily/Weekly/Monthly views

    • Hover tooltips show detailed metrics for specific dates

Key Metrics:

  • Site Indexation Percentage

    • Visual representation of crawled vs. uncrawled pages

    • Updates in real-time as crawlers access your site

    • Helps identify indexing gaps and opportunities

  • Crawl Purpose Analysis

    • Distinguishes between discovery and refresh crawls

    • Discovery: These are crawls where search engines find and index new pages on your site for the first time

    • Refresh: These are crawls where search engines revisit already-known pages to check for updates

    • Shows percentage distribution of crawler intentions

    • Helps understand how search engines are processing your content

  • Device Distribution

    • Breaks down crawler activity by device type (Desktop/Mobile)

    • Helps ensure proper resource allocation for different user agents

  • Crawl Frequency Metrics

    • Shows activity patterns across different timeframes:

      • Last 7 days

      • Last 30 days

      • Last 6 months

      • Last 1 year

    • Helps identify trends and patterns in crawler behavior

Best Practices:

  • Regularly monitor the "Not Crawled" percentage to identify potential technical issues

  • Use the device distribution data to inform mobile optimization efforts

  • Track crawl frequency patterns to optimize content update schedules

  • Monitor crawler distribution to ensure balanced visibility across search engines (see the sketch below)
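
Under the hood, per-crawler request counts like those in the distribution table come from recognizing known bot user agents in your server's access logs. The Python sketch below shows the general idea with a placeholder log file in combined log format; Crawl Monitoring gathers and visualizes this data for you automatically.

```python
from collections import Counter

def classify_crawler(user_agent: str) -> str | None:
    """Rough classification based on well-known bot identifiers in the UA string."""
    if "Googlebot" in user_agent:
        return "Google-Mobile" if "Mobile" in user_agent else "Google"
    if "bingbot" in user_agent.lower():
        return "Bing"
    return None

counts = Counter()
# "access.log" is a placeholder for a combined-format access log,
# where the user agent is the last quoted field on each line.
with open("access.log", encoding="utf-8") as log:
    for line in log:
        parts = line.rsplit('"', 2)
        if len(parts) < 3:
            continue
        label = classify_crawler(parts[-2])
        if label:
            counts[label] += 1

for crawler, total in counts.most_common():
    print(f"{crawler}: {total} requests")
```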

Content Velocity [Beta]

Content Velocity is an analytical tool that measures and tracks your website's publication patterns. By analyzing your sitemap data, it helps you understand and optimize your content publishing strategy by providing detailed temporal insights.

To access Content Velocity, navigate to the Site Audit dashboard and select the Content Velocity tab.

Dashboard Components:

  • URL Publication Tracking

    • Lists all published URLs

    • Provides publication timestamps

    • Includes direct links to content

    • Allows for URL selection and filtering

  • Publication Frequency Analysis

    • Daily View

      • Interactive graph with hover details

      • Highlights peak publishing days

      • Helps identify publishing gaps

    • Monthly Distribution

      • Bar chart showing monthly publication volume

      • Color-coded by content categories

    • Yearly Overview

      • Pie chart displaying yearly publication totals

      • Quick comparison between years

      • Publication growth indicators

Keep in mind:

  • The tool updates daily based on your sitemap (see the sketch at the end of this section)

  • Historical data is maintained for comprehensive trend analysis

  • Data can be filtered by sitemaps and/or date ranges

Best Practices:

  • Monitor publication consistency to maintain steady content flow

  • Use historical data to plan future content schedules

  • Identify successful publishing patterns

  • Track seasonal content trends

  • Analyze the impact of publishing frequency on site performance
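
Because Content Velocity is derived from your sitemap, you can approximate the monthly distribution yourself by reading the date fields from the sitemap XML. The Python sketch below uses a placeholder sitemap URL and buckets <lastmod> dates (a rough proxy for publication dates) by month; sitemap index files with nested sitemaps would need an extra level of fetching.

```python
from collections import Counter
from xml.etree import ElementTree
import requests

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ElementTree.fromstring(requests.get(SITEMAP_URL, timeout=10).content)

per_month = Counter()
for url in root.findall("sm:url", NS):
    lastmod = url.findtext("sm:lastmod", default="", namespaces=NS)
    if len(lastmod) >= 7:            # e.g. "2024-05-17"
        per_month[lastmod[:7]] += 1  # bucket by YYYY-MM

for month, total in sorted(per_month.items()):
    print(month, total)
```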


Whitelisting Search Atlas IP in CDNs

When you run a site audit or monitor your site health with Search Atlas, our crawlers can sometimes be blocked by CDNs.

A Content Delivery Network (CDN) is a group of servers that helps deliver website content faster. If your website relies on a CDN to improve its performance, the CDN’s firewall may block our crawler from accessing your site.

Resolving this issue is straightforward and requires whitelisting the Search Atlas site crawler.

Search Atlas uses one of the following IP addresses to monitor websites:

  • 168.151.102.206

  • 161.123.88.40

  • 161.123.75.228

  • 161.123.90.214

  • 161.123.81.184

  • 168.151.116.203

  • 168.151.138.43

  • 161.123.105.249

  • 161.123.77.229

  • 161.123.107.109

  • 161.123.77.171

  • 161.123.90.252

  • 209.95.170.40

  • 168.151.116.237

  • 161.123.78.218

  • 161.123.104.45

  • 161.123.85.66

  • 161.123.82.188

  • 209.95.171.24

  • 161.123.83.48

You will need to whitelist all of these IP addresses to run successful SEO audits via Search Atlas.
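
The exact whitelisting steps depend on your CDN (many providers expose IP allow rules in their firewall or bot-protection settings), but the underlying rule is always the same: requests from the addresses above should bypass bot blocking. As a conceptual illustration only, the check such a rule performs is equivalent to this Python snippet:

```python
from ipaddress import ip_address

# The Search Atlas crawler IPs listed above (abbreviated here; add them all).
SEARCH_ATLAS_IPS = {
    "168.151.102.206",
    "161.123.88.40",
    "161.123.75.228",
    # ... remaining addresses from the list above
}

def is_search_atlas(request_ip: str) -> bool:
    """Return True if the request comes from a whitelisted Search Atlas crawler IP."""
    return str(ip_address(request_ip)) in SEARCH_ATLAS_IPS

print(is_search_atlas("168.151.102.206"))  # True  -> allow / skip bot protection
print(is_search_atlas("203.0.113.7"))      # False -> apply normal firewall rules
```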

FAQs

  1. Which user agent should I use? We recommend using Googlebot Mobile. This is because Google indexes the mobile version of your site first (mobile-first indexing) and most organic traffic comes from mobile devices. As a result, technical errors in a mobile environment are usually more important. However, if you experience crawling issues, feel free to try one of our other user agents, such as Search Atlas!

  2. How often should I recrawl the site? We recommend monthly crawls at a minimum. It's also good practice to set up more frequent crawls for priority pages, such as the home page and main navigation pages; we recommend crawling these weekly or even daily, depending on how frequently the website is updated.

  3. What is the crawl budget and what should I set it to? We suggest calculating it as the total number of pages on your website plus 10%.

  4. Why is the Site Audit still showing issues I fixed with OTTO SEO? Site Audit doesn't support JavaScript rendering, so it won't reflect the changes made through OTTO.

  5. Why weren't all of my pages audited? Sometimes the website owner blocks pages from crawlers using a robots.txt file, a meta robots tag, or a server-side restriction. Please make sure to whitelist our crawler to avoid these issues (a robots.txt check is sketched after these FAQs).

  6. Why am I getting the error “Crawl budget setting too high”? This can happen if your Site Audit quota has already expired.

  7. Why did the crawler only find one page on my site? If our crawler found only one page, it's often due to a lack of outgoing internal links from your homepage. This can happen for several reasons:

    1. JavaScript Rendering: Our Site Audit tool currently doesn’t support JavaScript rendering. If your site’s navigation relies on JavaScript, the crawler may be unable to locate and follow links beyond the homepage.

    2. Sitemap Issues: If your sitemap is incomplete or outdated, it may not accurately reflect all the pages on your site. This can leave the crawler without links to follow to additional pages.

    3. Blocked Resources: Check that our crawler isn’t blocked from accessing internal links by server-side settings such as Cloudflare’s “Bot Fight Mode”; see our article on how to disable those settings.

  8. Why is the crawl monitoring of my site audit empty? If your site audit's crawl monitoring shows no data, this is typically caused by one of these common issues:

    1. Missing OTTO Script: The script hasn't been installed on your website

    2. Outdated Version: You're running an older version of the OTTO script that needs updating

    3. Implementation Issues: The script might not be properly implemented across all pages

      For step-by-step solutions to these issues, follow our installation guide
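
As noted in FAQ 5, pages can be excluded from an audit by robots.txt rules. Python's standard urllib.robotparser module offers a quick way to check whether a given user agent is allowed to fetch a URL; the site and user-agent token below are placeholders, not the exact values our crawler uses.

```python
from urllib.robotparser import RobotFileParser

# Parse the site's robots.txt (placeholder domain).
robots = RobotFileParser("https://www.example.com/robots.txt")
robots.read()

# Placeholder user-agent token; substitute the crawler you care about.
user_agent = "SearchAtlasBot"
for url in ("https://www.example.com/", "https://www.example.com/private/report"):
    print(f"{user_agent} may fetch {url}:", robots.can_fetch(user_agent, url))
```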

Glossary

  • Depth: Page depth is a metric that measures how many clicks it takes for a user to get from the home page to a given page on the site.

  • Site Health score: Measured out of 1000 points, with 1000 being the optimal result. This metric helps the user understand their site's technical health and how to improve it. Anything above 800 is considered acceptable.

  • Site Health changes: A graph showing whether the Site Health score is improving or worsening over time.

  • All page changes: Track when improvements are made, as well as which pages were added, changed, redirected, or removed.

  • Total issue changes: Track how many of the issues found have been addressed.

  • All Page types: Check the status of the page in the code.

  • Site Indexability: Track what pages are indexable and what pages are not.

  • Chrome User Experience Report: The Chrome User Experience Report (CrUX) provides user experience metrics showing how real-world Chrome users experience millions of websites.

  • Pages to Crawl: Select how many pages you want to crawl at once. By default, the first 100 pages are crawled; if a domain exceeds that, the number can be adjusted.

  • Crawl Speed: Define how quickly the crawler analyzes the website. The default is 20 pages per second. The more time spent crawling, the more granular the data can be.

  • Crawl frequency: How often the data is updated with a re-crawl. The default is every 7 days, and you can adjust it.

  • Page: The exact URL of the page analyzed.

  • Type: A page type classifies the content of a page. Typical examples of page types are “Landing page”, “Homepage”, “Product page”, or “Blog post”.

  • Importance: An estimate, based on the previous metrics, of how important a page is; the closer a page is to the home page, the more important it tends to be.

  • Page Health: An estimate of how healthy a page is, from 0 to 1000, based on the number of issues found on that specific page.

  • HTTPS: Hypertext transfer protocol secure (HTTPS) is the secure version of HTTP, which is the primary protocol used to send data between a web browser and a website.

  • Status: The HTTP status code of the page. A 200 status tends to signify a stable page, while 4xx or 5xx statuses tend to signify issues.

  • Title: Here the user can find the exact title given to the page.

  • Meta Description: A meta description tag generally informs and interests users with a short, relevant summary of what a particular page is about. It is like a pitch that convinces the user that the page is exactly what they're looking for.

  • H1: Here the user can find the exact H1 present on the page.

  • Indexable: Shows whether the issues present on a page leave it indexable or not.

  • In XML Sitemap: Whether the page in question is present in the domain's sitemap.

  • Incoming Internal Links: Internal links are hyperlinks that point to different pages on the same website. These differ from external links, which link to pages on other websites.

  • HREFLANG: An HTML attribute used to specify the language and geographical targeting of a webpage. If you have multiple versions of the same page in different languages, you can use the hreflang tag to tell search engines like Google about these variations, which helps them serve the correct version to their users (see the sketch after this glossary).

  • Schema Org Types: Schema.org is defined as two hierarchies: one for textual property values and one for the things they describe. The main schema.org hierarchy is a collection of types (or "classes"), each of which has one or more parent types.

  • View button: When clicked, it takes the user to Page Insights, where they can analyze a specific page and see all the issues that affect it.

  • Create segment: Allows the user to filter and divide the data however they prefer: add new column filters from within the table, sort columns, and include a certain number of pages in each segment.

  • Share Audit: Share the audit via URL.

  • GSC/GA: Connect both Google integrations (Google Search Console and Google Analytics) to enrich the data.

  • Export: Export the audit data as an XLS file.

  • Manage Columns: Select and reorder the columns you want to see.
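
The hreflang annotations mentioned above most commonly appear as <link rel="alternate" hreflang="…" href="…"> tags in the page head (they can also be declared in HTTP headers or in the sitemap). As a quick illustration, this Python snippet with the BeautifulSoup library lists the alternates a page declares; the URL is a placeholder.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL; substitute a page that has hreflang alternates.
url = "https://www.example.com/"
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

for link in soup.find_all("link", rel="alternate", hreflang=True):
    print(link["hreflang"], "->", link.get("href"))
```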
