Why I Love Google's Supplemental Index
Forbes recently wrote an article about Google's supplemental results, painting the supplemental index as webpage hell. The article states that pages in Google's supplemental index are trusted less than pages in the regular index:
Google's programmers appear to have created the supplemental index with the best intentions. It's designed to lighten the workload of Google's "spider," the algorithm that constantly combs and categorizes the Web's pages. Google uses the index as a holding pen for pages it deems to be of low quality or designed to appear artificially high in search results.
Matt Cutts was quick to state that supplemental results are not a big deal, and Rand said much the same here, but supplemental results ARE a big deal: they are an indication of the health of a website.
I have worked on some of the largest sites and networks of sites on the web (hundreds of millions of pages and up). When looking for duplicate content or information architecture issues, the search engines do not let you dig deep enough to see every indexing problem, so one of the first things I do is use this search to find low quality pages (i.e. pages that suck PageRank and do not add much unique content to the site). After you find some of the major issues, you can dig deeper by filtering out the core problems that showed up in your first supplemental searches. For example, here are threadwatch.org's supplemental results that do not contain the word node in the URL.
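To make the linked searches concrete (the exact operators, and how Google handles them, may have changed since this was written), the Threadwatch example is just a site: query combined with a negative inurl: filter, something along the lines of:

site:threadwatch.org -inurl:node

That lists indexed pages on threadwatch.org whose URLs do not contain the word node, and you can then scan the results for listings labeled Supplemental Result. The same pattern works for any domain and any URL token you want to filter out.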
If you have duplicate content issues, at best you are splitting your PageRank, but you may also be hurting your crawl priorities. If Google thinks 90% of a site is garbage (or not worth trusting much), I am willing to bet that they also trust everything else on that domain a bit less than they otherwise would, and are more restrictive about crawling the rest of the site. As noted in Wasting Link Authority on Ineffective Internal Link Structure, ShoeMoney increased his search traffic 1,400% after blocking some of his supplemental pages.
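If you want to follow ShoeMoney's lead and block the junk, one common approach (not necessarily the one he used) is robots.txt. Here is a minimal sketch, assuming you have identified hypothetical /print/ and /search/ sections as the main sources of supplemental clutter:

User-agent: *
Disallow: /print/
Disallow: /search/

Keep in mind that a robots.txt Disallow only stops crawling; it does not instantly remove URLs that are already indexed, so it can take a while for the change to show up in your supplemental counts.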
Comments
Slightly off topic, but Matt Cutts left an interesting comment on Rand's post:
"duplicate content doesn't make you more likely to have pages in the supplemental index in my experience. It could be a symptom but not a cause, e.g. lots of duplicate content implies lots of pages, and potentially less PageRank for each of those pages. So trying to surface an entire large catalog of pages would mean less PageRank for each page, which could lead to those pages being less likely to be included in our main web index."
He adds:
"I'm not aware of an explicit mechanism whereby duplicate content is more likely to be in our supplemental results, but I'm also happy to admit that as supplemental results are different from webspam, I'm not the expert at Google on every aspect of supplemental results."
One problem I see is tag pages. I show a lot of supplemental results for pages like www.domain.com/tag/xxxxx
I rank for many more terms by using tags on my WordPress blog, but the majority of those tag pages eventually fall into the supplemental index and suck PageRank. We're talking 2,800+ tag pages.
The same goes for the WordPress pages that are built for comment and post feeds, like www.domain.com/xxxxxx/feed
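(One common fix, though this is only a sketch of the options: add a robots meta tag to the head of the tag archive template, for example

<meta name="robots" content="noindex,follow" />

so the 2,800+ tag pages stay out of the index while the links on them still get followed. The /feed URLs are XML rather than HTML pages, so they are usually easier to handle with a robots.txt Disallow along the lines of the sketch above.)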
Google's reporting of supplemental results is a little messed up at the moment, although your toolbar also seems to report totally different numbers.
Let's look at your specific search method for SEOBook.com:
Duplicate content on SEOBook.com search
Now let's compare that to my site, which has a lot of duplicate content and much less total PageRank and far fewer links:
Duplicate content on andybeard.eu search
There are also some new things going on that seem fairly bugged: pages that look like duplicate content based on the toolbar PageRank export (they show grey) still rank for reasonably competitive keywords.