Duplicate Content

Substantively identical or very similar content that appears at more than one URL, either within a single site or across different sites.

Definition

Duplicate content describes blocks of content that match or closely resemble each other at multiple URLs. Search engines have to choose one canonical version to index, and ranking signals such as links may be split across the duplicates.

Duplicate content commonly arises from URL parameters, session IDs, HTTP and HTTPS variants, www and non-www hosts, printer-friendly pages, syndication and faceted navigation. Google does not treat most duplication as a manual-action offence; instead it clusters the duplicates and selects a canonical URL using signals like sitemap inclusion, internal linking, redirects and rel=canonical annotations. Site owners can influence that selection by consolidating URLs with 301 redirects, declaring rel=canonical annotations or noindexing low-value variants.
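The sources of duplication listed above can be illustrated with a small URL-normalisation sketch. It is only an approximation for auditing your own URLs, not a description of how Google clusters pages. The function name `canonical_key`, the ignored parameter names and the choice to prefer HTTPS and the non-www host are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters assumed not to change page content.
IGNORED_PARAMS = {"ref", "sessionid", "sid"}
IGNORED_PREFIXES = ("utm_",)


def canonical_key(url: str) -> str:
    """Collapse common duplicate-producing URL variants into one key."""
    parts = urlsplit(url)

    # Treat www and non-www hosts as the same site (assumption).
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    # Drop tracking and session parameters, keep the rest in a stable order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
        and not key.lower().startswith(IGNORED_PREFIXES)
    ]
    kept.sort()

    # Ignore trailing slashes, but keep the root path as "/".
    path = parts.path.rstrip("/") or "/"

    # Prefer HTTPS and discard the fragment.
    return urlunsplit(("https", host, path, urlencode(kept), ""))
```

URLs that map to the same key are candidates for consolidation with a 301 redirect or a rel=canonical annotation. Parameters that do change the content, such as a hypothetical `color` filter, are kept and so produce a different key.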

Examples

Tracking parameters create duplicates
An online store reaches the same product page via `/shoes/runner`, `/shoes/runner?utm_source=email` and `/shoes/runner?ref=affiliate`. Google clusters the three URLs and picks one canonical URL to show in results.

Syndicated article on a partner site
A news publisher republishes an article on a partner domain without a rel=canonical annotation pointing back to the original. Google has to decide which copy to show in Search, and the partner's version sometimes outranks the source.
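For the syndication case, one way to check whether a republished copy declares a canonical is to look for a `<link rel="canonical">` element in its HTML. The following is a minimal sketch using Python's standard-library `html.parser`; the names `CanonicalFinder` and `find_canonical` are hypothetical, and the sketch only inspects markup that has already been fetched.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is kept; later ones are ignored.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html: str):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

If `find_canonical` returns `None` for the partner's copy, or a URL on the partner's own domain, the copy is not pointing back to the original article.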
