How to Identify Thin and Duplicate Content Issues on Your Website
Content is a major factor in Google’s ranking algorithm. While “high-quality content” can seem somewhat subjective, there are objective guidelines that all content creators should follow:
- Avoid duplicate content (copied text that shows up on multiple pages)
- Avoid thin content (fewer than 350 words)
- Write unique metadata for every page
- Use headings in the correct hierarchical order (H1, H2, H3, etc.)
It’s a good idea to analyze your website’s content periodically to check for any of these potential issues. This task can be time-consuming, but luckily, there are tools that can automate the process.
Using Web Crawlers to Identify Thin Content
Rather than manually going through every page and checking word count, you can use a web crawler to quickly identify which pages need more content. In the example below, we used Screaming Frog, a paid tool popular within the SEO community.
Copy and paste your domain name into the crawler and click “Start” to run a new crawl. After it’s finished, click the export button.
This will export an Excel or CSV file of all the data. The resulting file will have a lot of columns that you won’t need, so feel free to hide whatever isn’t useful. You will want to focus on these columns:
- Status Code
- Word Count
For ease of use, filter the rows so that you’re only looking at relevant pages. First, filter the content type column to display only text/html pages. Then, filter the status codes so you only see 200, 301, or 302. Finally, filter indexability to show only “indexable” pages.
Sort by word count in ascending order to easily view which pages contain thin content. However, there are a few nuances to keep in mind.
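If you would rather script the filtering and sorting than do it by hand in Excel, the steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the column names ("Address", "Content Type", "Status Code", "Indexability", "Word Count") are assumptions about the export and may differ in your file, and the helper names are hypothetical.

```python
import csv

# Assumption: status codes appear as plain strings such as "200" in the CSV.
KEEP_STATUS = {"200", "301", "302"}

def load_rows(path):
    """Read a CSV export into a list of dicts keyed by column header."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))

def thin_content_candidates(rows):
    """Apply the three filters, then sort by word count, lowest first."""
    kept = []
    for row in rows:
        # Keep only HTML pages (assumed column name: "Content Type").
        if not row.get("Content Type", "").lower().startswith("text/html"):
            continue
        # Keep only the status codes named in the article.
        if row.get("Status Code", "").strip() not in KEEP_STATUS:
            continue
        # Keep only pages the crawler marked as indexable.
        if row.get("Indexability", "").strip().lower() != "indexable":
            continue
        kept.append(row)
    # Blank word counts are treated as 0 so they sort to the top.
    return sorted(kept, key=lambda r: int(r.get("Word Count") or 0))
```

To use it, call `thin_content_candidates(load_rows("your_export.csv"))`, where the file name is a placeholder, and read the first few rows of the result.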
The Purpose of the Page
If the page is image-heavy like a photo gallery or there isn’t much to say about the subject, don’t add unnecessary fluff simply to increase the word count. If you can’t add real value to a thin page, take it as a sign to reevaluate the page’s importance. Ask yourself whether the content could be included somewhere else or removed entirely.
It should be noted that there are exceptions. For example, a Contact Us or About Us page is useful, even if it may not include a lot of content. Order forms with minimal content should also remain on your site.
Take into Account Extra Words
Web crawlers look at all content on a page; they are unable to distinguish between the main content and the other words on the page. For example, most websites have header and/or footer navigation menus that link out to the most important pages. Web crawlers tally the text within these navigation menus and include it in your word count, inflating your total.
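To get a feel for how much of a page's word count comes from navigation, you can count the words yourself. The sketch below uses Python's standard-library HTML parser and assumes the extra text sits inside `<header>`, `<footer>`, or `<nav>` elements; many templates mark these areas differently, so treat the tag list as an assumption to adjust. It does not reproduce how any particular crawler counts words.

```python
from html.parser import HTMLParser

# Assumption: navigation and other repeated text lives inside these elements.
BOILERPLATE_TAGS = {"header", "footer", "nav"}
# Text inside these elements is never visible page copy.
HIDDEN_TAGS = {"script", "style"}

class WordCounter(HTMLParser):
    """Counts all visible words and, separately, words inside boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.boilerplate_depth = 0
        self.hidden_depth = 0
        self.total_words = 0
        self.boilerplate_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.boilerplate_depth += 1
        elif tag in HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.boilerplate_depth:
            self.boilerplate_depth -= 1
        elif tag in HIDDEN_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth:
            return  # skip script and style contents
        words = len(data.split())
        self.total_words += words
        if self.boilerplate_depth:
            self.boilerplate_words += words

def count_words(html):
    """Return (total_words, boilerplate_words) for an HTML string."""
    parser = WordCounter()
    parser.feed(html)
    parser.close()
    return parser.total_words, parser.boilerplate_words
```

Subtracting the second number from the first gives a rough count of the main copy on its own.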
A simple calculation to determine which pages actually need more content is as follows:
Suggested minimum (350) + Total count of extra text = Content benchmark
On the Web Talent website, our benchmark is 508 after we take into account our footer navigation. To identify which pages need a content refresh, our SEO team will adjust Excel’s filters to flag pages with fewer than 508 words.
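The same arithmetic can be written as a short Python sketch. A benchmark of 508 implies 158 words of extra text (508 - 350). The function names and the page data in the usage note are hypothetical and only illustrate the calculation.

```python
SUGGESTED_MINIMUM = 350  # suggested minimum word count from the article

def content_benchmark(extra_words, minimum=SUGGESTED_MINIMUM):
    """Suggested minimum + total count of extra text = content benchmark."""
    return minimum + extra_words

def pages_below_benchmark(word_counts, extra_words):
    """Return the URLs whose crawled word count falls below the benchmark."""
    benchmark = content_benchmark(extra_words)
    return sorted(url for url, count in word_counts.items() if count < benchmark)
```

For example, `content_benchmark(158)` gives 508, and `pages_below_benchmark({"/gallery": 410, "/services": 1200}, 158)` flags only the hypothetical "/gallery" page.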
Though it is a manual process, this method narrows down your search list and helps you focus on the pages that are most in need of a content overhaul.
Other Data You Can Use from Screaming Frog
Screaming Frog is great for diagnosing issues beyond thin content. You can also use it to look for missing metadata and title tags, long meta descriptions, and multiple H1 tags on a page.
These are all content issues that, once fixed, give your website a better chance of ranking well on Google.
Tools for Identifying Duplicate Content
Siteliner is a helpful tool for identifying duplicate content. The crawler scans your entire website, identifies pages that have repeat text, and tells you how much of the text matches. It will even highlight the duplicate text with color coding to make it easier for you to visualize the content issues.
Siteliner, however, runs into the same issues as any web crawler and will include the header and footer text in the total number of matching words. Be sure to evaluate the pages it is flagging to see if the content truly is duplicated.
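When you review a flagged pair of pages, one rough way to double-check the match is to compare the main copy of each page after you have removed the shared header and footer text. The sketch below measures how many five-word sequences from one page also appear on the other. This is a generic word-shingle comparison chosen for illustration, not Siteliner's method, and the window size of five is an arbitrary assumption.

```python
def shingles(text, size=5):
    """Return the set of consecutive word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap_ratio(page_a, page_b, size=5):
    """Share of page_a's word sequences that also appear in page_b (0.0 to 1.0)."""
    a = shingles(page_a, size)
    b = shingles(page_b, size)
    if not a:
        return 0.0  # page_a is too short to compare
    return len(a & b) / len(a)
```

A ratio near 1.0 suggests the body copy really is repeated, while a low ratio suggests the tool was mostly matching navigation text.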
Need Help Auditing Your Content?
Maintaining website copy takes valuable time away from your other daily tasks. If you’d like help with the process, Web Talent has an experienced content marketing team ready to dive into your site content and identify areas of opportunity. Learn more about our content marketing strategy services.