Crawling
The process by which search engines discover and download pages from the web using automated programs called crawlers.
Definition
Crawling is the stage in which a search engine discovers pages on the web and downloads their text, images, and other content using automated programs known as crawlers. It is the first step before a page can be analysed and considered for indexing.
Google has stated that crawling uses an algorithmic process to decide which sites to visit and how often, and that crawlers adjust their speed to avoid overloading a server. Not every discovered URL is crawled: pages can be blocked by robots.txt, gated behind a login, or simply not reached. Being crawled is also distinct from being indexed, as Google processes a crawled page before deciding whether to add it to the index at all.
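Google's crawlers are far more sophisticated than any short example, but a toy fetch loop can make the mechanics concrete. The sketch below uses only Python's standard library; the URLs, the User-Agent string, and the one-second delay are illustrative assumptions, not values any real crawler uses.

```python
import time
import urllib.request

# Illustrative values only; real crawlers tune these per site.
USER_AGENT = "ExampleCrawler/1.0"   # hypothetical crawler name
CRAWL_DELAY_SECONDS = 1.0           # assumed politeness delay between requests

def fetch(url: str) -> bytes:
    """Download one page, identifying the crawler via its User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()

if __name__ == "__main__":
    for url in ["https://example.com/", "https://example.com/about"]:
        html = fetch(url)
        print(url, len(html), "bytes downloaded")
        time.sleep(CRAWL_DELAY_SECONDS)  # pause so the server is not overloaded
```

The pause between requests stands in for the speed adjustment described above: a crawler slows down rather than hammering a single host.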
Examples
Discovery via links
Googlebot follows a link from an already-known category page to a newly published article, downloading that article during a crawl.
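A crude way to picture this kind of discovery is to parse the anchors out of a page that has already been fetched and treat any new URLs as candidates for the crawl queue. The category-page URL and HTML below are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags, resolved against the page URL."""
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

# Hypothetical category page that links to a newly published article.
category_html = '<a href="/articles/new-post">Read the new article</a>'
extractor = LinkExtractor("https://example.com/category/")
extractor.feed(category_html)
print(extractor.links)  # ['https://example.com/articles/new-post']
```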
Crawling blocked by robots.txt
A staging subdomain disallowed in robots.txt is not crawled, so its pages are not downloaded for processing.
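One way to see the effect of a disallow rule is with Python's built-in robots.txt parser. The staging host and the blanket Disallow rule below are assumptions for the example; a real crawler would first fetch /robots.txt from the host rather than reading it from a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served by a staging subdomain that blocks all crawling.
robots_txt = """User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks the rules before requesting any URL on the host.
url = "https://staging.example.com/internal-draft"
if parser.can_fetch("Googlebot", url):
    print("Allowed: the page could be crawled")
else:
    print("Blocked: the page is never downloaded for processing")
```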
Related terms
- Googlebot: The generic name for Google's web crawlers — the automated software that discovers and fetches pages for inclusion in Google Search.
- Indexing: The process by which a search engine analyses a fetched page and stores information about it so the page can later be returned in search results.
- Crawl Budget: The number of URLs a search engine crawler will fetch and the rate at which it fetches them on a given site.
- robots.txt: A plain-text file at the root of a domain that tells crawlers which paths they may or may not request.
- Sitemap: A file, usually XML, that lists URLs on a site so search engines can discover and crawl them more efficiently (a minimal example follows this list).
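As a small illustration of the Sitemap entry above, the snippet below builds a minimal sitemap with Python's standard library; the two URLs are made up, and real sitemaps usually carry extra fields such as lastmod.

```python
import xml.etree.ElementTree as ET

# Minimal sitemap with two made-up URLs, using the standard sitemap namespace.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
urlset = ET.Element("urlset", xmlns=NS)
for page in ["https://example.com/", "https://example.com/articles/new-post"]:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page

print(ET.tostring(urlset, encoding="unicode"))
```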
Where QueryCatch uses this
Last updated: 16/05/2026