If you manage a large ecommerce website, the concept of “crawl budget” should probably be on your radar if it isn’t already. Even if you only end up crossing it off your list of issues, it’s important knowledge for any ecommerce professional to have.
If you manage a smaller site, crawl budget is unlikely to be a concern for you.
What is Crawl Budget?
Google defines crawl budget as “the number of URLs Googlebot can and wants to crawl.” The search engine determines how many pages of your site will be crawled each day based on a variety of factors. The result is your crawl budget. Not having enough crawl budget can impact the visibility of your new content and the changes you make to existing content.
What Does a Crawl Budget Issue Look Like?
What displays in a Google search result is not always the latest version of a web page. What you see, instead, is the most recent version of a page crawled and memorized (indexed) by Google.
When a crawl budget issue affects a site, it means too few pages are crawled periodically relative to the site’s total number of pages. When that is the case, any page changes or additions may not be seen by Google – or Google users – for a longer-than-normal period of time. Since the first goal of any web page is to be found, this is an important issue to diagnose and address.
Crawl Rate Limit + Crawl Demand = Crawl Budget
A crawl rate limit is applied by Google to limit the number of pages crawled on a site during a given period of time. Limits are applied because an excessive crawl rate could strain a site’s server, which in turn can have negative consequences for users. Slow sites have a lower crawl rate limit than sites that respond quickly. On-site errors and other issues can also affect crawl rate limits.
Crawl demand refers to how many pages, and how often, Google wants to crawl a site. Crawl demand increases as a site or page becomes more popular, and decreases when the opposite occurs.
Put together, crawl rate limit and crawl demand dictate the average number of daily pages Google crawls on your site – the crawl budget.
Identifying Crawl Budget Issues
Yoast outlines a number of quick and easy steps to identify a site’s crawl budget, which we’ve summarized below. As a caveat, and as Yoast correctly notes, this strategy assumes your site does not have a large number of URLs that are crawled but not indexed (via noindex meta tags, for example). Otherwise, you should be fine to take these steps:
- Determine the total number of pages on your site. You can do this by looking at your XML sitemap, if you have one – typically /sitemap.xml after your domain.
- Go to your Google Search Console dashboard and look at the pages crawled per day (under “Crawl” then “Crawl Stats”).
- Divide the total number of pages on your site by the average number of pages crawled each day.
- If the result is greater than 10, meaning your site has more than 10 times as many pages as Google crawls each day, remediation should be considered.
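The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive tool. It assumes your sitemap is a single flat file (not a sitemap index pointing to other sitemaps), and that you have copied the average pages crawled per day out of Google Search Console by hand. The function names and the sample sitemap are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical three-page sitemap, used only for illustration.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/shirts</loc></url>
  <url><loc>https://example.com/pants</loc></url>
</urlset>
"""


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <url> entries in a flat sitemap (assumes no sitemap index)."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall(f"{SITEMAP_NS}url"))


def crawl_ratio(total_pages: int, avg_crawled_per_day: float) -> float:
    """Total pages divided by the average number of pages crawled per day."""
    if avg_crawled_per_day <= 0:
        raise ValueError("avg_crawled_per_day must be positive")
    return total_pages / avg_crawled_per_day


def needs_remediation(total_pages: int, avg_crawled_per_day: float,
                      threshold: float = 10.0) -> bool:
    """Apply the rule of thumb above: flag the site if the ratio exceeds 10."""
    return crawl_ratio(total_pages, avg_crawled_per_day) > threshold


if __name__ == "__main__":
    print(count_sitemap_urls(SAMPLE_SITEMAP))
    # A hypothetical site with 50,000 pages and 2,500 pages crawled per day.
    print(crawl_ratio(50_000, 2_500))
    print(needs_remediation(50_000, 2_500))
```

With the hypothetical figures above, the ratio is 20, which is over the threshold of 10, so the site would be worth a closer look.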
Addressing Crawl Budget Deficits
Tackling a crawl budget deficit can be a challenge, but there are steps you can take to improve site structure and quality and, hopefully, increase the number of pages that Google crawls each day.
Reducing unnecessary URLs can be a good first step. Many ecommerce sites, for example, have a faceted navigation structure that can create a large number of URLs. A product page for clothing may include filters for size, color, and other product features. In the eyes of the search engines, each filter option can create a new page that needs to be crawled, even though the average user believes they are, practically speaking, still on the same page.
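To get a feel for how much faceted navigation inflates a URL count, the rough sketch below strips filter parameters from a list of URLs and compares how many crawlable URLs there are with how many underlying pages they represent. It assumes the filters appear as query parameters named size and color. Those parameter names, the example URLs, and the function names are all hypothetical, so adjust them to match your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of the faceted-navigation query parameters.
FACET_PARAMS = frozenset({"size", "color"})


def strip_facets(url: str, facet_params=FACET_PARAMS) -> str:
    """Return the URL with any facet query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in facet_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def facet_inflation(urls):
    """Return (number of distinct URLs, number of distinct facet-free URLs)."""
    distinct_urls = set(urls)
    base_pages = {strip_facets(url) for url in distinct_urls}
    return len(distinct_urls), len(base_pages)


if __name__ == "__main__":
    example_urls = [
        "https://example.com/shirts",
        "https://example.com/shirts?color=red",
        "https://example.com/shirts?color=blue&size=m",
        "https://example.com/shirts?size=l",
        "https://example.com/shirts?page=2",
    ]
    crawlable, underlying = facet_inflation(example_urls)
    print(f"{crawlable} crawlable URLs for {underlying} underlying pages")
```

In this example, five crawlable URLs collapse to two underlying pages once the filters are ignored. Running the same comparison over a crawl export of a real site can show whether faceted navigation is a meaningful share of its URLs.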
Blocking these unnecessary faceted navigation pages via your robots.txt file may be a good strategy in such instances. Doing so can drastically reduce your overall page count, which in itself can have a huge impact on any crawl budget issues. This strategy, along with other important SEO considerations, has been laid out nicely on the Moz blog.
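Before publishing a robots.txt change, it can help to check that the rule blocks what you intend and nothing more. The sketch below uses Python's standard-library urllib.robotparser for that. It assumes, purely for illustration, that filtered pages live under a /shop/filter/ path. That path, the example URLs, and the is_crawlable helper are hypothetical. Note also that the standard-library parser does simple prefix matching, so it will not reproduce the wildcard patterns that Google supports in robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks an assumed faceted-navigation path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /shop/filter/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Check whether the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/shop/filter/color-red/",  # filtered view
        "https://example.com/shop/shirts/",            # regular category page
    ):
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{status}: {url}")
```

A check like this is only a sanity test of the rules as written. How a given search engine interprets them, and how quickly it reacts to the change, is up to that search engine.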
Additionally, a number of other unnecessary URL types have been outlined by Google.
While addressing unnecessary URLs provides a more manageable page count that Google has a better capacity to crawl, the second prong of attack involves working to prompt Google to crawl more pages per day. This involves fixing crawl errors using the Google Search Console crawl error report, and addressing other site errors.
Though it is a longer-term strategy involving a robust content strategy to engage the online community you serve, additional inbound links to your site can also help to show interest in your site and prompt more pages to be crawled.
Your Next Steps
If you reached the end of this post, you likely suspect you’re facing a crawl budget challenge, or you just want to do everything you can to ensure your ecommerce site is as healthy as possible. Rest assured, there is a wealth of resources online to help you go down each rabbit hole and ultimately decide on a strategy that is right for you.
The first step is determining if there is an issue to be addressed, because knowledge is power.
Think you've determined an issue? Let us help.