Site Audit Knowledge Base

Understanding Site Audits

A site audit is a comprehensive analysis of your website's technical SEO health and performance. Using automated crawlers, it systematically examines your website's pages to identify issues that could impact your search engine rankings, user experience, and overall site performance. The crawler acts like a search engine bot, visiting your pages and collecting data about various technical aspects such as load speed, meta tags, broken links, and content quality.
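
Conceptually, the crawler's loop is: fetch a page, read a few on-page signals, and queue the internal links it finds. The short Python sketch below (using the third-party requests and BeautifulSoup libraries, with a placeholder URL) illustrates the kind of data collected for a single page; the actual Search Atlas crawler checks many more signals.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

# Placeholder URL; the real crawler starts from the URL you configure.
start_url = "https://www.example.com/"
response = requests.get(start_url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# A few of the on-page signals a site audit reports on
print("Status code:", response.status_code)
print("Title:", soup.title.get_text(strip=True) if soup.title else "(missing)")
meta_desc = soup.find("meta", attrs={"name": "description"})
print("Meta description:", meta_desc.get("content", "").strip() if meta_desc else "(missing)")
h1 = soup.find("h1")
print("H1:", h1.get_text(strip=True) if h1 else "(missing)")

# Internal links the crawler would visit next
internal_links = {
    urljoin(start_url, a["href"])
    for a in soup.find_all("a", href=True)
    if urlparse(urljoin(start_url, a["href"])).netloc == urlparse(start_url).netloc
}
print("Internal links found:", len(internal_links))
```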

Setting Up Your Site Audit

Follow these steps to configure your site audit for optimal results:

1. Enter Your Website URL

  • Input the complete URL of the website you want to audit (e.g., https://www.example.com)

  • Verify that the URL includes the correct protocol (http:// or https://)

2. Configure Crawl Depth

  • Determine the total number of pages you want the crawler to analyze

  • Recommended Setting: Set the limit to your total number of pages plus 10%

    • Example: If your site has 1000 pages, set the limit to 1100

  • This buffer ensures complete coverage and accounts for any new pages

3. Set Crawl Frequency

  • Choose how often you want the crawler to audit your site

  • Minimum Recommended Frequency: Monthly

  • For Critical Pages: Weekly

  • Factors to consider when setting frequency:

    • How often your content changes

    • Site size and complexity

    • Resource allocation

4. Choose User Agent

  • Select the crawler's user agent to simulate specific search engine behavior

  • Recommended Options:

    • Our custom Search Atlas user agent

    • Googlebot Mobile

5. Adjust Crawl Speed

  • Set the pace at which the crawler analyzes your pages

  • Key Considerations:

    • Faster crawls: Quick overview of major issues

    • Slower crawls: More detailed analysis and deeper insights

    • Server capacity and bandwidth limitations

  • Important: A slower, more thorough crawl typically yields more detailed and accurate results

6. URL Exclusion Conditions (Beta)

  • Configure which parts of your website should be excluded from the crawl

  • Helps optimize crawl budget and focus analysis on relevant pages

  • Two primary configuration options (see the sketch after this list):

    • URL Exclusion Rules

      • Exclude URLs containing specific terms

      • Example: Excluding "/blog" will prevent the crawler from analyzing any URLs containing this path

      • Useful for skipping sections like admin pages, author archives, or specific categories

    • URL Inclusion Exceptions

      • Add specific exceptions to your exclusion rules

      • Example: If you've excluded "/blog", you can add "/blog/*" to include all blog posts while still excluding the main blog page

      • Allows for granular control over which pages are analyzed
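
The exclusion and exception rules can be thought of as a two-step filter: a URL is skipped if it contains an exclusion term, unless it also matches an inclusion exception. The Python sketch below is only a conceptual model of that logic, using the "/blog" example above; the exact matching behaviour of the Beta feature may differ.

```python
from fnmatch import fnmatch

# Conceptual model of URL exclusion rules with inclusion exceptions.
# The exact matching behaviour of the Beta feature may differ.
exclusion_terms = ["/blog"]          # skip any URL containing these terms
inclusion_exceptions = ["*/blog/*"]  # ...unless the URL matches one of these patterns

def should_crawl(url: str) -> bool:
    excluded = any(term in url for term in exclusion_terms)
    excepted = any(fnmatch(url, pattern) for pattern in inclusion_exceptions)
    return not excluded or excepted

print(should_crawl("https://www.example.com/about"))         # True  (not excluded)
print(should_crawl("https://www.example.com/blog"))          # False (main blog page excluded)
print(should_crawl("https://www.example.com/blog/my-post"))  # True  (exception matches)
```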

Remember to periodically review and adjust these settings as your website grows and evolves. Regular audits help maintain optimal site performance and identify potential issues before they impact your search engine rankings.

Crawl Monitoring [Beta]

The Crawl Monitoring feature provides real-time tracking and analysis of how search engines and AI bots interact with your website. This tool helps you optimize your site's visibility and ensure efficient indexing by monitoring crawler behavior, distribution, and patterns.

To access Crawl Monitoring, navigate to the Site Audit dashboard and select the Crawl Monitoring tab.

Dashboard Components:

  • Crawler Distribution Table

    • Shows active crawlers (e.g., Google, Bing, Google-Mobile)

    • Displays total requests per crawler

    • Includes interactive graphs for Historical Crawl Activity visualization

  • Historical Crawl Activity Graph

    • Presents crawler activity over time

    • Color-coded by crawler type

    • Switchable between Daily/Weekly/Monthly views

    • Hover tooltips show detailed metrics for specific dates

Key Metrics:

  • Site Indexation Percentage

    • Visual representation of crawled vs. uncrawled pages

    • Updates in real-time as crawlers access your site

    • Helps identify indexing gaps and opportunities

  • Crawl Purpose Analysis

    • Distinguishes between discovery and refresh crawls

    • Discovery: These are crawls where search engines find and index new pages on your site for the first time

    • Refresh: These are crawls where search engines revisit already-known pages to check for updates

    • Shows percentage distribution of crawler intentions

    • Helps understand how search engines are processing your content

  • Device Distribution

    • Breaks down crawler activity by device type (Desktop/Mobile)

    • Helps ensure proper resource allocation for different user agents

  • Crawl Frequency Metrics

    • Shows activity patterns across different timeframes:

      • Last 7 days

      • Last 30 days

      • Last 6 months

      • Last 1 year

    • Helps identify trends and patterns in crawler behavior

Best Practices:

  • Regularly monitor the "Not Crawled" percentage to identify potential technical issues

  • Use the device distribution data to inform mobile optimization efforts

  • Track crawl frequency patterns to optimize content update schedules

  • Monitor crawler distribution to ensure balanced visibility across search engines (see the sketch below)
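
Under the hood, per-crawler request counts like those in the distribution table come from recognizing known bot user agents in your server's access logs. The Python sketch below shows the general idea with a placeholder log file in combined log format; Crawl Monitoring gathers and visualizes this data for you automatically.

```python
from collections import Counter

def classify_crawler(user_agent: str) -> str | None:
    """Rough classification based on well-known bot identifiers in the UA string."""
    if "Googlebot" in user_agent:
        return "Google-Mobile" if "Mobile" in user_agent else "Google"
    if "bingbot" in user_agent.lower():
        return "Bing"
    return None

counts = Counter()
# "access.log" is a placeholder for a combined-format access log,
# where the user agent is the last quoted field on each line.
with open("access.log", encoding="utf-8") as log:
    for line in log:
        parts = line.rsplit('"', 2)
        if len(parts) < 3:
            continue
        label = classify_crawler(parts[-2])
        if label:
            counts[label] += 1

for crawler, total in counts.most_common():
    print(f"{crawler}: {total} requests")
```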

Content Velocity [Beta]

Content Velocity is an analytical tool that measures and tracks your website's publication patterns. By analyzing your sitemap data, it helps you understand and optimize your content publishing strategy by providing detailed temporal insights.

To access Content Velocity, navigate to the Site Audit dashboard and select the Content Velocity tab.

Dashboard Components:

  • URL Publication Tracking

    • Lists all published URLs

    • Provides publication timestamps

    • Includes direct links to content

    • Allows for URL selection and filtering

  • Publication Frequency Analysis

    • Daily View

      • Interactive graph with hover details

      • Highlights peak publishing days

      • Helps identify publishing gaps

    • Monthly Distribution

      • Bar chart showing monthly publication volume

      • Color-coded by content categories

    • Yearly Overview

      • Pie chart displaying yearly publication totals

      • Quick comparison between years

      • Publication growth indicators

Keep in mind:

  • The tool updates daily based on your sitemap (see the sketch at the end of this section)

  • Historical data is maintained for comprehensive trend analysis

  • Data can be filtered by sitemaps and/or date ranges

Best Practices:

  • Monitor publication consistency to maintain steady content flow

  • Use historical data to plan future content schedules

  • Identify successful publishing patterns

  • Track seasonal content trends

  • Analyze the impact of publishing frequency on site performance
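
Because Content Velocity is derived from your sitemap, you can approximate the monthly distribution yourself by reading the date fields from the sitemap XML. The Python sketch below uses a placeholder sitemap URL and buckets <lastmod> dates (a rough proxy for publication dates) by month; sitemap index files with nested sitemaps would need an extra level of fetching.

```python
from collections import Counter
from xml.etree import ElementTree
import requests

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ElementTree.fromstring(requests.get(SITEMAP_URL, timeout=10).content)

per_month = Counter()
for url in root.findall("sm:url", NS):
    lastmod = url.findtext("sm:lastmod", default="", namespaces=NS)
    if len(lastmod) >= 7:            # e.g. "2024-05-17"
        per_month[lastmod[:7]] += 1  # bucket by YYYY-MM

for month, total in sorted(per_month.items()):
    print(month, total)
```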


Whitelisting Search Atlas IP in CDNs

When you run a site audit or monitor your site health with Search Atlas, our crawlers can sometimes be blocked by CDNs.

A Content Delivery Network (CDN) is a group of servers that helps deliver website content faster. If your website relies on a CDN to improve its performance, the CDN’s firewall may block our crawler from accessing your site.

Resolving this issue is straightforward and requires whitelisting the Search Atlas site crawler.

Search Atlas uses one of the following IP addresses to monitor websites:

  • 168.151.102.206

  • 161.123.88.40

  • 161.123.75.228

  • 161.123.90.214

  • 161.123.81.184

  • 168.151.116.203

  • 168.151.138.43

  • 161.123.105.249

  • 161.123.77.229

  • 161.123.107.109

  • 161.123.77.171

  • 161.123.90.252

  • 209.95.170.40

  • 168.151.116.237

  • 161.123.78.218

  • 161.123.104.45

  • 161.123.85.66

  • 161.123.82.188

  • 209.95.171.24

  • 161.123.83.48

You will need to whitelist all of these IP addresses to run successful SEO audits via Search Atlas.
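
The exact whitelisting steps depend on your CDN (many providers expose IP allow rules in their firewall or bot-protection settings), but the underlying rule is always the same: requests from the addresses above should bypass bot blocking. As a conceptual illustration only, the check such a rule performs is equivalent to this Python snippet:

```python
from ipaddress import ip_address

# The Search Atlas crawler IPs listed above (abbreviated here; add them all).
SEARCH_ATLAS_IPS = {
    "168.151.102.206",
    "161.123.88.40",
    "161.123.75.228",
    # ... remaining addresses from the list above
}

def is_search_atlas(request_ip: str) -> bool:
    """Return True if the request comes from a whitelisted Search Atlas crawler IP."""
    return str(ip_address(request_ip)) in SEARCH_ATLAS_IPS

print(is_search_atlas("168.151.102.206"))  # True  -> allow / skip bot protection
print(is_search_atlas("203.0.113.7"))      # False -> apply normal firewall rules
```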

FAQs

  1. Which user agent should I use? We recommend using Googlebot Mobile. This is because Google indexes the mobile version of your site first (mobile-first indexing) and most organic traffic comes from mobile devices. As a result, technical errors in a mobile environment are usually more important. However, if you experience crawling issues, feel free to try one of our other user agents, such as Search Atlas!

  2. How often should I recrawl the site? We recommend monthly crawls at a minimum. It's also good practice to set up more frequent crawls for priority pages, such as the home page and main navigation pages; we recommend crawling these weekly or even daily, depending on how frequently the website is updated.

  3. What is the crawl budget and what should I set it to? We suggest calculating it as the total number of pages on your website plus 10%.

  4. Why is the Site Audit still showing issues I fixed with OTTO SEO? Site Audit doesn't support JavaScript rendering, so it won't reflect the changes made through OTTO.

  5. Why weren't all of my pages audited? Sometimes the website owner blocks pages from crawlers using a robots.txt file, a meta robots tag, or a server-side restriction. Please make sure to whitelist our crawler to avoid these issues (a robots.txt check is sketched after these FAQs).

  6. Why am I getting the error “Crawl budget setting too high”? This can happen if your Site Audit quota has already expired.

  7. Why did the crawler only find one page on my site? If our crawler found only one page, it's often due to a lack of outgoing internal links from your homepage. This can happen for several reasons:

    1. JavaScript Rendering: Our Site Audit tool currently doesn’t support JavaScript rendering. If your site’s navigation relies on JavaScript, the crawler may be unable to locate and follow links beyond the homepage.

    2. Sitemap Issues: If your sitemap is incomplete or outdated, it may not accurately reflect all the pages on your site. This can leave the crawler without links to follow to additional pages.

    3. Blocked Resources: Check that our crawler isn’t blocked from accessing internal links by server-side settings such as Cloudflare’s “Bot Fight Mode”; see our article on how to disable those settings.

  8. Why is the crawl monitoring of my site audit empty? If your site audit's crawl monitoring shows no data, this is typically caused by one of these common issues:

    1. Missing OTTO Script: The script hasn't been installed on your website

    2. Outdated Version: You're running an older version of the OTTO script that needs updating

    3. Implementation Issues: The script might not be properly implemented across all pages

      For step-by-step solutions to these issues, follow our installation guide
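
As noted in FAQ 5, pages can be excluded from an audit by robots.txt rules. Python's standard urllib.robotparser module offers a quick way to check whether a given user agent is allowed to fetch a URL; the site and user-agent token below are placeholders, not the exact values our crawler uses.

```python
from urllib.robotparser import RobotFileParser

# Parse the site's robots.txt (placeholder domain).
robots = RobotFileParser("https://www.example.com/robots.txt")
robots.read()

# Placeholder user-agent token; substitute the crawler you care about.
user_agent = "SearchAtlasBot"
for url in ("https://www.example.com/", "https://www.example.com/private/report"):
    print(f"{user_agent} may fetch {url}:", robots.can_fetch(user_agent, url))
```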

Glossary

  • Depth: Page depth is a metric that measures how many clicks it takes for a user to get from the home page to a given page on the site.

  • Site Health score: Measured out of 1000 points, with 1000 being the optimal result. This metric helps the user understand their site's technical health and how to improve it. Anything above 800 is considered acceptable.

  • Site Health changes: A graph showing whether the Site Health score is improving or worsening over time.

  • All page changes: Track when improvements are made, as well as which pages were added, changed, redirected, or removed.

  • Total issue changes: Track how many of the issues found have been addressed.

  • All Page types: Check the status of the page in the code.

  • Site Indexability: Track what pages are indexable and what pages are not.

  • Chrome User Experience Report: The Chrome User Experience Report (CrUX) provides user experience metrics showing how real-world Chrome users experience millions of websites.

  • Pages to Crawl: Select how many pages you want to crawl at once. By default, the first 100 pages are crawled; if a domain exceeds that, the number can be adjusted.

  • Crawl Speed: Define how quickly the crawler analyzes the website. The default is 20 pages per second. The more time spent crawling, the more granular the data can be.

  • Crawl frequency: How often the data is updated with a re-crawl. The default is every 7 days, and you can adjust it.

  • Page: The exact URL of the page analyzed.

  • Type: A page type classifies the content of a page. Typical examples of page types are “Landing page”, “Homepage”, “Product page”, or “Blog post”.

  • Importance: An estimate, based on the previous metrics, of how important a page is; the closer a page is to the home page, the more important it tends to be.

  • Page Health: An estimate of how healthy a page is, from 0 to 1000, based on the number of issues found on that specific page.

  • HTTPS: Hypertext transfer protocol secure (HTTPS) is the secure version of HTTP, which is the primary protocol used to send data between a web browser and a website.

  • Status: The HTTP status code of the page. A 200 status tends to signify a stable page, while 4xx or 5xx statuses tend to signify issues.

  • Title: Here the user can find the exact title given to the page.

  • Meta Description: A meta description tag generally informs and interests users with a short, relevant summary of what a particular page is about. It is like a pitch that convinces the user that the page is exactly what they're looking for.

  • H1: Here the user can find the exact H1 present on the page.

  • Indexable: Shows whether the issues present on a page leave it indexable or not.

  • In XML Sitemap: Whether the page in question is present in the domain's sitemap.

  • Incoming Internal Links: Internal links are hyperlinks that point to different pages on the same website. These differ from external links, which link to pages on other websites.

  • HREFLANG: An HTML attribute used to specify the language and geographical targeting of a webpage. If you have multiple versions of the same page in different languages, you can use the hreflang tag to tell search engines like Google about these variations, which helps them serve the correct version to their users (see the sketch after this glossary).

  • Schema Org Types: Schema.org is defined as two hierarchies: one for textual property values and one for the things they describe. The main schema.org hierarchy is a collection of types (or "classes"), each of which has one or more parent types.

  • View button: When clicked, it takes the user to Page Insights, where they can analyze a specific page and see all the issues that affect it.

  • Create segment: Allows the user to filter and divide the data however they prefer: add new column filters from within the table, sort columns, and include a certain number of pages in each segment.

  • Share Audit: Share the audit via URL.

  • GSC/GA: Connect both Google integrations (Google Search Console and Google Analytics) to enrich the data.

  • Export: Export the audit data as an XLS file.

  • Manage Columns: Select and reorder the columns you want to see.
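
The hreflang annotations mentioned above most commonly appear as <link rel="alternate" hreflang="…" href="…"> tags in the page head (they can also be declared in HTTP headers or in the sitemap). As a quick illustration, this Python snippet with the BeautifulSoup library lists the alternates a page declares; the URL is a placeholder.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL; substitute a page that has hreflang alternates.
url = "https://www.example.com/"
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

for link in soup.find_all("link", rel="alternate", hreflang=True):
    print(link["hreflang"], "->", link.get("href"))
```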
