Understanding Site Audits
A site audit is a comprehensive analysis of your website's technical SEO health and performance. Using automated crawlers, it systematically examines your website's pages to identify issues that could impact your search engine rankings, user experience, and overall site performance. The crawler acts like a search engine bot, visiting your pages and collecting data about various technical aspects such as load speed, meta tags, broken links, and content quality.
Setting Up Your Site Audit
Follow these steps to configure your site audit for optimal results:
1. Enter Your Website URL
Input the complete URL of the website you want to audit (e.g., https://www.example.com)
Verify that the URL includes the correct protocol (http:// or https://)
2. Configure Crawl Depth
Determine the total number of pages you want the crawler to analyze
Recommended Setting: Set the limit to your total number of pages plus 10%
Example: If your site has 1000 pages, set the limit to 1100
This buffer ensures complete coverage and accounts for any new pages
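As a quick illustration of the math, the snippet below computes the limit from a known page count (the 1000-page figure is just the example above, not a value the tool provides):

```python
# Compute a crawl limit with a 10% buffer over the known page count.
def crawl_limit(total_pages: int, buffer: float = 0.10) -> int:
    """Return the page limit to enter in the audit settings."""
    return round(total_pages * (1 + buffer))

print(crawl_limit(1000))  # 1100, matching the example above
```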
3. Set Crawl Frequency
Choose how often you want the crawler to audit your site
Minimum Recommended Frequency: Monthly
For Critical Pages: Weekly
Factors to consider when setting frequency:
How often your content changes
Site size and complexity
Resource allocation
4. Choose User Agent
Select the crawler's user agent to simulate specific search engine behavior
Recommended Options:
Our custom Searchatlas user agent
Googlebot Mobile
5. Adjust Crawl Speed
Set the pace at which the crawler analyzes your pages
Key Considerations:
Faster crawls: Quick overview of major issues
Slower crawls: More detailed analysis and deeper insights
Server capacity and bandwidth limitations
Important: A slower, more thorough crawl typically yields more detailed and accurate results
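To get a feel for what a given speed means in practice, here is a rough back-of-the-envelope estimate. It assumes the 20 pages-per-second default mentioned in the glossary and ignores server throttling, so treat it as a sketch rather than a guarantee:

```python
# Rough estimate of crawl duration at a given speed.
# 20 pages/second is the default mentioned in the glossary; real throughput
# depends on your server capacity and any speed limits you configure.
def estimated_crawl_seconds(pages: int, pages_per_second: float = 20.0) -> float:
    return pages / pages_per_second

print(estimated_crawl_seconds(1100))  # 55.0 seconds at full speed
```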
6. URL Exclusion Conditions (Beta)
Configure which parts of your website should be excluded from the crawl
Helps optimize crawl budget and focus analysis on relevant pages
Two primary configuration options:
URL Exclusion Rules
Exclude URLs containing specific terms
Example: Excluding "/blog" will prevent the crawler from analyzing any URLs containing this path
Useful for skipping sections like admin pages, author archives, or specific categories
URL Inclusion Exceptions
Add specific exceptions to your exclusion rules
Example: If you've excluded "/blog", you can add "/blog/*" to include all blog posts while still excluding the main blog page
Allows for granular control over which pages are analyzed
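The sketch below shows one way such rules could be evaluated against a URL path. The matching logic (substring exclusions with glob-style exceptions) is an assumption made for illustration, not the crawler's actual implementation:

```python
from fnmatch import fnmatch

# Hypothetical rule evaluation: a URL is skipped if it contains an exclusion
# term, unless it also matches an inclusion exception (exceptions win).
exclusions = ["/blog"]    # exclude URLs containing these terms
exceptions = ["/blog/*"]  # but keep URLs matching these patterns

def should_crawl(path: str) -> bool:
    if any(fnmatch(path, pattern) for pattern in exceptions):
        return True
    return not any(term in path for term in exclusions)

print(should_crawl("/blog"))          # False: main blog page is excluded
print(should_crawl("/blog/my-post"))  # True:  matches the /blog/* exception
print(should_crawl("/pricing"))       # True:  no rule applies
```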
Remember to periodically review and adjust these settings as your website grows and evolves. Regular audits help maintain optimal site performance and identify potential issues before they impact your search engine rankings.
Crawl Monitoring [Beta]
The Crawl Monitoring feature provides real-time tracking and analysis of how search engines and AI bots interact with your website. This tool helps you optimize your site's visibility and ensure efficient indexing by monitoring crawler behavior, distribution, and patterns.
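Conceptually, crawler distribution boils down to counting requests per bot. The sketch below is not how the feature itself collects data; it simply tallies crawler requests in standard access-log lines to illustrate the idea, and the sample log lines and user-agent checks are simplified placeholders:

```python
from collections import Counter

# Illustrative only: tally bot requests per crawler from web server log lines.
# Matching on a User-Agent substring is a simplification.
CRAWLER_TOKENS = {"Googlebot": "Google", "bingbot": "Bing"}

def crawler_distribution(log_lines):
    counts = Counter()
    for line in log_lines:
        for token, label in CRAWLER_TOKENS.items():
            if token in line:
                counts[label] += 1
                break
    return counts

sample = [
    '66.249.66.1 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '40.77.167.1 - - "GET /pricing HTTP/1.1" 200 "Mozilla/5.0 (compatible; bingbot/2.0)"',
]
print(crawler_distribution(sample))  # Counter({'Google': 1, 'Bing': 1})
```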
To access Crawl Monitoring, navigate to the Site Audit dashboard and select the Crawl Monitoring tab.
Dashboard Components:
Crawler Distribution Table
Shows active crawlers (e.g., Google, Bing, Google-Mobile)
Displays total requests per crawler
Includes interactive graphs for Historical Crawl Activity visualization
Historical Crawl Activity Graph
Presents crawler activity over time
Color-coded by crawler type
Switchable between Daily/Weekly/Monthly views
Hover tooltips show detailed metrics for specific dates
Key Metrics:
Site Indexation Percentage
Visual representation of crawled vs. uncrawled pages
Updates in real-time as crawlers access your site
Helps identify indexing gaps and opportunities
Crawl Purpose Analysis
Distinguishes between discovery and refresh crawls
Discovery: These are crawls where search engines find and index new pages on your site for the first time
Refresh: These are crawls where search engines revisit already-known pages to check for updates
Shows percentage distribution of crawler intentions
Helps understand how search engines are processing your content
Device Distribution
Breaks down crawler activity by device type (Desktop/Mobile)
Helps ensure proper resource allocation for different user agents
Crawl Frequency Metrics
Shows activity patterns across different timeframes:
Last 7 days
Last 30 days
Last 6 months
Last 1 year
Helps identify trends and patterns in crawler behavior
Best Practices:
Regularly monitor the "Not Crawled" percentage to identify potential technical issues
Use the device distribution data to inform mobile optimization efforts
Track crawl frequency patterns to optimize content update schedules
Monitor crawler distribution to ensure balanced visibility across search engines
Content Velocity [Beta]
Content Velocity is an analytical tool that measures and tracks your website's publication patterns. By analyzing your sitemap data, it helps you understand and optimize your content publishing strategy by providing detailed temporal insights.
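As a rough illustration of the underlying idea, the sketch below groups the lastmod dates in a sitemap into monthly counts. The sitemap URL is a placeholder and this is not the tool's actual implementation, just a way to see what temporal insights from sitemap data look like:

```python
from collections import Counter
from urllib.request import urlopen
from xml.etree import ElementTree

# Illustrative only: group sitemap <lastmod> dates by month to see
# publication volume over time. The URL below is a placeholder.
SITEMAP_URL = "https://www.example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def monthly_publication_counts(sitemap_url: str) -> Counter:
    tree = ElementTree.parse(urlopen(sitemap_url))
    counts = Counter()
    # Entries without a <lastmod> tag are simply skipped.
    for lastmod in tree.findall(".//sm:url/sm:lastmod", NS):
        if lastmod.text:
            counts[lastmod.text[:7]] += 1  # "YYYY-MM"
    return counts

print(monthly_publication_counts(SITEMAP_URL).most_common(3))
```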
To access Content Velocity, navigate to the Site Audit dashboard and select the Content Velocity tab.
Dashboard Components:
URL Publication Tracking
Lists all published URLs
Provides publication timestamps
Includes direct links to content
Allows for URL selection and filtering
Publication Frequency Analysis
Daily View
Interactive graph with hover details
Highlights peak publishing days
Helps identify publishing gaps

Monthly Distribution
Bar chart showing monthly publication volume
Color-coded by content categories
Yearly Overview
Pie chart displaying yearly publication totals
Quick comparison between years
Publication growth indicators
Keep in mind:
The tool updates daily based on your sitemap
Historical data is maintained for comprehensive trend analysis
Data can be filtered by sitemaps and/or date ranges
Best Practices:
Monitor publication consistency to maintain steady content flow
Use historical data to plan future content schedules
Identify successful publishing patterns
Track seasonal content trends
Analyze the impact of publishing frequency on site performance
Whitelisting Search Atlas IP in CDNs
When you run a site audit and monitor your site health with Search Atlas, our crawlers can sometimes be blocked by CDNs.
Content Delivery Networks (CDNs) are a group of servers that help deliver website content faster. If your website relies on a CDN to increase its performance, then the CDN’s firewall may block our crawler from accessing your site.
Resolving this issue is straightforward and requires whitelisting the Search Atlas site crawler.
Search Atlas uses one of the following IP addresses to monitor websites:
168.151.102.206
161.123.88.40
161.123.75.228
161.123.90.214
161.123.81.184
168.151.116.203
168.151.138.43
161.123.105.249
161.123.77.229
161.123.107.109
161.123.77.171
161.123.90.252
209.95.170.40
168.151.116.237
161.123.78.218
161.123.104.45
161.123.85.66
161.123.82.188
209.95.171.24
161.123.83.48
You will need to whitelist all of these IP addresses to run successful SEO audits via Search Atlas.
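The exact allowlist syntax depends on your CDN or firewall, so follow your provider's documentation for the actual rules. As a sanity check on the logic, here is a minimal Python sketch that tests whether a request IP belongs to the list above:

```python
import ipaddress

# The crawler IPs listed above; allowlist them in your CDN or firewall.
SEARCH_ATLAS_IPS = {
    "168.151.102.206", "161.123.88.40", "161.123.75.228", "161.123.90.214",
    "161.123.81.184", "168.151.116.203", "168.151.138.43", "161.123.105.249",
    "161.123.77.229", "161.123.107.109", "161.123.77.171", "161.123.90.252",
    "209.95.170.40", "168.151.116.237", "161.123.78.218", "161.123.104.45",
    "161.123.85.66", "161.123.82.188", "209.95.171.24", "161.123.83.48",
}

def is_search_atlas(ip: str) -> bool:
    """Return True if the request IP is one of the Search Atlas crawler addresses."""
    return str(ipaddress.ip_address(ip)) in SEARCH_ATLAS_IPS

print(is_search_atlas("161.123.88.40"))  # True
print(is_search_atlas("203.0.113.7"))    # False (documentation-range example address)
```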
FAQs
Which user agent should I use? We recommend using Googlebot Mobile. This is because Google indexes the mobile version of your site first (mobile-first indexing) and most organic traffic comes from mobile devices. As a result, technical errors in a mobile environment are usually more important. However, if you experience issues crawling, feel free to try any one of our other user agents, like Search Atlas!
How often should I recrawl the site? We recommend monthly crawls at a minimum, though it's good practice to set up more frequent crawls for priority pages, such as the home page and main navigation pages. We recommend crawling priority pages weekly, or even daily, depending on how frequently the website is updated.
What is the crawl budget and what should I set it to? We suggest setting it to your website's total page count plus 10%.
Why is the Site Audit still showing issues I fixed with OTTO SEO? Site Audit doesn't render JavaScript, so it won't reflect the changes made through OTTO.
Why weren't all of my pages audited? Sometimes the website owner blocks pages from crawlers using the robots.txt file, a meta robots tag, or a server-side restriction. Please make sure to whitelist our crawler to avoid these issues.
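If you suspect robots.txt is the cause, a quick check like the one below will tell you whether a given page is allowed for a given crawler. This is only a sketch; the user-agent token and URLs are placeholders to replace with your own:

```python
from urllib.robotparser import RobotFileParser

# Check whether robots.txt allows a given user agent to fetch a page.
# The user-agent token and URLs below are placeholders.
parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

print(parser.can_fetch("SearchAtlas", "https://www.example.com/pricing"))
```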
Why am I getting the error “Crawl budget setting too high”? This can happen if the Site Audit quota has already expired.
Why did the crawler only find one page on my site? If our crawler found only one page, it's often due to a lack of outgoing internal links from your homepage. This can happen for several reasons:
JavaScript Rendering: Our Site Audit tool currently doesn’t support JavaScript rendering. If your site’s navigation relies on JavaScript, the crawler may be unable to locate and follow links beyond the homepage.
Sitemap Issues: If your sitemap is incomplete or outdated, it may not accurately reflect all the pages on your site. This can leave the crawler without links to follow to additional pages.
Blocked Resources: Check that our crawler isn’t blocked from accessing internal links by server-side settings like Cloudflare’s “Bot Fight Mode”; see our article on how to disable it.
Why is the crawl monitoring of my site audit empty? If your site audit's crawl monitoring shows no data, this is typically caused by one of these common issues:
Missing OTTO Script: The script hasn't been installed on your website
Outdated Version: You're running an older version of OTTO script that needs updating
Implementation Issues: The script might not be properly implemented across all pages
For step-by-step solutions to these issues, follow our installation guide.
Glossary
Depth: Page depth is a metric that measures how many clicks it takes for a user to get from the home page to a given page on the site.
Site Health score: Measured out of 1000 points, with 1000 being the optimal result. This metric helps the user understand their score and how to improve the technical health of a website. Anything above 800 is acceptable.
Site Health changes: A graph showing whether the Site Health score is improving or worsening over time.
All page changes: Track when improvements are made, as well as which pages were added, changed, redirected, or removed.
Total issue changes: Track how many of the issues found have been addressed.
All Page types: Check the status of the page in the code.
Site Indexability: Track what pages are indexable and what pages are not.
Chrome User Experience Report: The Chrome User Experience Report (CrUX) provides user experience metrics showing how real-world Chrome users experience millions of websites.
Pages to Crawl: Select how many pages you want to crawl at once. By default, the first 100 pages are crawled; if a domain exceeds that, the number can be adjusted.
Crawl Speed: Define how quickly the crawler analyzes the website. The default is 20 pages per second. The more time spent crawling, the more granular the data can be.
Crawl frequency: The frequency at which the data is updated with a re-crawl. By default, the site is re-crawled every 7 days, and you can adjust this.
Page: The exact URL of the page analyzed.
Type: A page type is a content type that defines the field structure for a web page or any other composition of content. Typical examples of page types are “Landing page”, “Homepage”, “Product page”, or “Blog post”.
Importance: Based on the previous metrics, an estimate of how important a page is; the closer a page is to the home page, the more important it tends to be.
Page Health: An estimate of how healthy the page is, from 0 to 1000, depending on the number of issues found on that specific page.
HTTPS: Hypertext transfer protocol secure (HTTPS) is the secure version of HTTP, which is the primary protocol used to send data between a web browser and a website.
Status: The HTTP status code of the page. A 200 status tends to signify a stable page, while statuses like 400 or 500 tend to signify issues.
Title: Here the user can find the exact title given to the page.
Meta Description: A meta description tag generally informs and interests users with a short, relevant summary of what a particular page is about. It is like a pitch that convinces the user that the page is exactly what they're looking for.
H1: Here the user can find the exact H1 present on the page.
Indexable: Shows whether a page is indexable or not, based on the issues present on it.
In XML Sitemap: Whether the page in question is present in the domain's sitemap.
Incoming Internal Links: Internal links are hyperlinks that point to different pages on the same website. These differ from external links, which link to pages on other websites.
HREFLANG: An HTML attribute used to specify the language and geographical targeting of a webpage. If you have multiple versions of the same page in different languages, you can use the hreflang tag to tell search engines like Google about these variations. This helps them serve the correct version to their users.
Schema Org Types: Schema.org is defined as two hierarchies: one for textual property values, and one for the things that they describe. This is the main schema.org hierarchy: a collection of types (or "classes"), each of which has one or more parent types.
View button: When clicked it takes the user to the page insights where they can analyze specific pages and see all the issues that affect them.
Create segment: Allows the user to filter and divide the data however they prefer: add new column filters from within the table, sort columns, and include a certain number of pages in each segment.
Share Audit: Share the audit via URL.
GSC/GA: Connect both Google integrations (Google Search Console and Google Analytics) to enrich the data.
Export: Export the data as an XLS file.
Manage Columns: Select and reorder the columns you want to see.