Content: Semantically Similar

Semantically Similar

Pages whose content is semantically similar, based on the configured cosine similarity threshold. High semantic similarity can be completely normal, but it can also indicate duplicate or overlapping content that should be reviewed.

How to Analyse in the SEO Spider

View URLs with this issue in the ‘Content’ tab and ‘Semantically Similar’ filter.

To populate this filter, an embeddings prompt must be set up with an AI provider, as outlined in the embeddings configuration. ‘Enable Semantic Similarity’ must be selected in ‘Config > Content > Embeddings’, and a post-crawl ‘Crawl Analysis’ must be performed.

Semantic similarity scores range from 0 to 1, and pages scoring above 0.95 are considered semantically similar by default. The threshold can be adjusted via ‘Config > Content > Embeddings’, down to as low as 0.5.
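As a rough illustration of the scoring described above, the sketch below computes the cosine similarity of two embedding vectors and compares it to a threshold. It is not the SEO Spider's own implementation. The function names, the toy vectors, and the strict "greater than" comparison are assumptions made for the example. Raw cosine similarity can in general fall between -1 and 1, so the 0 to 1 range is taken here as a property of the scores the tool reports.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar (assumption).
        return 0.0
    return dot / (norm_a * norm_b)


def is_semantically_similar(a, b, threshold=0.95):
    """True when the score is above the threshold (0.95 mirrors the default)."""
    return cosine_similarity(a, b) > threshold


# Toy three-dimensional "embeddings"; real embeddings have far more dimensions.
page_a = [1.0, 2.0, 3.0]
page_b = [1.0, 2.0, 3.1]
page_c = [3.0, -1.0, 0.0]

print(is_semantically_similar(page_a, page_b))
print(is_semantically_similar(page_a, page_c))
```

Lowering the threshold argument towards 0.5 makes the check more permissive, which is the effect of lowering the configured threshold.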

The ‘Semantic Similarity Score’ column displays the similarity to the page displayed in the ‘Closest Semantically Similar Address’ column.

The ‘No. Semantically Similar’ column displays the number of pages that are similar to the page, based on the similarity threshold. These pages can be viewed in the lower ‘Duplicate Details’ tab under the ‘Semantic Similarity’ filter.
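The relationship between the closest address, its score, and the count of similar pages can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual algorithm: the `similarity_report` function, the dictionary keys, and the example URLs and vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_report(embeddings, threshold=0.95):
    """For each URL, find its closest other URL, that score, and the
    number of other URLs scoring above the threshold."""
    report = {}
    for url, vec in embeddings.items():
        scores = {
            other: cosine_similarity(vec, other_vec)
            for other, other_vec in embeddings.items()
            if other != url
        }
        if not scores:
            # Only one page in the set, so there is nothing to compare with.
            report[url] = {"closest": None, "score": None, "count": 0}
            continue
        closest = max(scores, key=scores.get)
        report[url] = {
            "closest": closest,
            "score": scores[closest],
            "count": sum(1 for s in scores.values() if s > threshold),
        }
    return report


# Hypothetical URLs with toy two-dimensional embeddings.
pages = {
    "/red-shoes": [1.0, 0.0],
    "/crimson-shoes": [1.0, 0.1],
    "/contact": [0.0, 1.0],
}

for url, row in similarity_report(pages).items():
    print(url, row)
```

In this sketch every page has a closest address, even when its count of similar pages is zero, because the closest match is simply the highest-scoring other page.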

The algorithm is run against text on the page, rather than the full HTML. The content used for this analysis can be configured under ‘Config > Content > Area’.

Export in bulk using ‘Bulk Export > Content > Semantically Similar’.

Please read our tutorial, ‘How to Identify Semantically Similar Pages & Outliers’.

What Triggers This Issue

This issue is triggered when pages are semantically similar in content, based on the configured cosine similarity threshold (0.95 by default).

How To Fix

Review semantically similar content to ensure that highly similar pages genuinely merit being standalone pages, rather than duplicating each other by covering the same subject multiple times, which can cause cannibalisation issues or crawling and indexing inefficiencies.

It is normal to have highly semantically similar content when talking about closely related subjects. Leave pages as they are where appropriate or, if required, consider making pages more unique, or consolidating, blocking, or removing them.

Consider internal linking opportunities between highly semantically similar content.
