As a public-facing SEO with many thousands of customers at the beginning of the SEO learning cycle, many of the most common questions I get asked stem from Google half-truths. I thought it would be worthwhile to write a few of these down to save myself time answering the same emails each day.
It may be inappropriate to label Google a liar for doing the following. A more appropriate, and fairer, label would be an intentionally deceitful organization.
Want Link Data? Go Elsewhere
Google offers a link: function which shows a sampling of inbound links to a website. A few years back they had a much smaller allotment of machines for handling link queries, so they showed mostly a sample of the most authoritative inbound links to a site. Then they switched to showing mostly lower-quality inbound links while filtering out most of the better ones. Their explanation: they doubled the size of the sample and showed more links to smaller mom-and-pop websites that lacked high-authority inbound links, so it was a good feature for users.
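For reference, the syntax is just the operator plus a domain (the domain below is a placeholder):

```
link:www.example.com
```

Run it against your own site and compare the thin sample it returns to what you know actually links to you; the gap is the point of this section.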
When people at Google talk about this, they tend to speak "from a historical perspective" and explain how they used to be limited, yet they still use virtually ALL link data in calculating result relevancy. Given that last statement, the "from a historical perspective" framing is self-serving positioning around not providing a useful feature, because they want to make SEO harder.
Want further proof? If you sign up for Google Webmaster Central and verify your site, they will show you all your linkage data. I don't trust Google Webmaster Central, because they profile SEOs and hand edit the search results. Signing up is probably an unwise decision.
Google does offer us a free tool to estimate link authority though: PageRank.
Google PageRank
For as hyped as Google PageRank is, Google sure goes out of their way to ensure the displayed values are inaccurate. They only update the Google Toolbar display about once every three months. Even then, the update is not fresh as of that day; the stats might be from a week, two weeks, or a month ago. The toolbar is also sometimes buggy and shows the wrong PageRank values, such that viewing the same page multiple times in a row will yield a different PageRank value each time.
The only reasons they still place PageRank on the toolbar are that they get free marketing out of it, and it helps them collect more usage data. Years ago Apostolos Gerasoulis, the search scientist behind Teoma, said Google doesn't rely heavily on PageRank to score relevancy. Gigablast's Matt Wells said similar:
PageRank is just a silly idea in practice, but it is beautiful mathematically. You start off with a simple idea, such as the quality of a page is the sum of the quality of the pages that link to it, times a scalar. This sets you up with finding the eigenvectors of a huge sparse matrix. And because it is so much work, Google appears not to be updating its PageRank values that much.
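To make the math in that quote concrete, here is a toy power-iteration sketch in Python. The four-page link graph is made up for illustration, and the 0.85 damping factor is the commonly cited value from the original PageRank paper, not anything Google has confirmed about its live system:

```python
# Toy PageRank via power iteration: each page's score is the damped sum
# of shares passed along by the pages linking to it. The graph below is
# purely illustrative.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
damping = 0.85          # commonly published value; an assumption here
n = len(links)
rank = {page: 1.0 / n for page in links}  # start with a uniform vector

for _ in range(50):     # iterate until the vector settles
    new_rank = {page: (1 - damping) / n for page in links}
    for page, outlinks in links.items():
        share = rank[page] / len(outlinks)  # each page splits its score
        for target in outlinks:
            new_rank[target] += damping * share
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Even on this toy graph you can see why the computation is expensive at web scale: every iteration touches every link in the graph.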
Any webmaster with an old URL that ranks exceptionally well with almost no PageRank knows that PageRank didn't drive them to outrank people with 10 times their link equity.
PageRank is important for one aspect of information retrieval though: crawl depth. If you have a lot of PageRank you will get crawled deeply and more frequently. If not, they will crawl you shallowly, and perhaps place many of your pages in the supplemental results.
Are My Pages in Google's Supplemental Results?
Want to know what pages from your site are rarely crawled, cached, or updated? Want to know where your duplicate content issues exist? Want to know which pages on your site Google doesn't trust links from, or doesn't trust enough to rank well for many search results? Look at the supplemental results. Oops, they took that label out of the results, but here is a more complicated search you can use to find your supplemental results, at least until it gets disabled. Jim Boykin and Danny Sullivan also offered tips on finding supplemental pages.
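For those who want something to paste into the search box, one widely circulated query at the time looked like the one below (substitute your own domain; the trailing exclusion term is a nonsense string that seems to trip the right code path, and Google could break it at any moment):

```
site:example.com *** -sljktf
```

If the pages that come back are pages you care about ranking, you have work to do.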
Right now a Google search for Google supplemental results provides low-quality search results, because most of the top results (including my #1 ranking at the moment) do not discuss how to find supplemental results. Unfortunately, if you only heard of supplemental results AFTER they took the label out, you likely have no way of telling whether any of your pages are supplemental, which is good for seasoned marketers but bad for mom-and-pop sites. And if I don't edit my post, people will think I am a liar or an idiot, because Google is deceptive.
If they truly wanted to make the world's information universally accessible and useful, why would they remove this label? At the very least, if webmasters paid attention to this label they would structure their sites better and save Google bandwidth by keeping it from crawling as many low-quality pages.
The easiest way to get out of the supplemental results is to clean up site structure issues and build more high quality link authority. Cleaning up site structure issues is much harder now that it is harder to see what is in the supplemental results, and each day it gets harder to build honest links due to Google spreading FUD about links...
Organic Linking Patterns
With all the FUD Google spreads about paid links, they make many webmasters afraid to link out to other sites, which reduces the quality of information available on those sites and prevents some quality sites from ranking where they should. Nofollow is not about being organic. In fact, it was a tool created to directly manipulate the public's perception of linking. To appreciate how out of hand it has gotten, consider the following situation.
A friend's business got covered by a mainstream media site. They wrote an entire article about his business but did not link to him, because they felt linking would have made the article too promotional. Imagine being the topic of an article, and the source of content for other sites, without getting attribution for any of it. That is the side effect of Google's bought-links FUD.
Instead of promoting quality content, current relevancy algorithms support information pollution. Google goes out and scares you about keeping your link profile natural, while letting proxies hijack your rankings. And they have known about this problem for years, just like the 302 redirect problem.
Since Google's link guidelines are self-serving and out of nature with the realities of the web, what happens if I get aggressive with link building and eventually get busted for doing the same things my competitors are getting away with doing?
Lose All Your Link Equity (and Your Content and Your Brand, Too!)
Your Site Goes to Jail, You DO NOT Collect $200
Many webmasters have suffered the fate of a hand edit recently. The site of mine that they hand edited had about 95% of its links cleanly built by me, with the other 5% in place before I bought the site. Because it was my site, they wiped away ALL of its link equity via a hand edit (simply because I bought a site that had some link equity). What makes a hand edit worse is when they follow it up by paying an AdSense spammer who steals all of your content and then ranks his site where you ranked prior to the edit.
When sites are hand penalized, they typically do not even rank for their own brand-related keywords unless the webmaster buys their own brand name in Google AdWords, which means Google is willing to sacrifice even their relevancy to punish webmasters who fall outside Google's ever-shifting rule-set. Unfortunately that punishment is doled out unevenly. Large corporations can own 10 first-page rankings, or use 100 subdomains, but if you do the same with a smaller website, expect a swift hand edit. Even programmers who support Google's API get a hand edit from time to time.
Rank Checkers & Google API Keys
Were you one of the early programmers to build a tool that used the SOAP version of Google's API? Sorry, but they no longer offer Google Search API keys. Their (formerly useful) API has come back as an academic-only project, which they can use to recruit college students studying information retrieval.
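For anyone who never saw it, here is roughly what tools built on the old SOAP API looked like. This is a from-memory sketch, not a working program: the license key below is a placeholder, keys are no longer issued, the endpoint is dead, and I am assuming the Python SOAPpy library for illustration.

```python
# Roughly what a tool built on the old Google SOAP Search API did.
# Assumes the SOAPpy library; the key is a placeholder, and Google
# stopped issuing keys, so this no longer runs against a live endpoint.
from SOAPpy import WSDL

GOOGLE_WSDL = "http://api.google.com/GoogleSearch.wsdl"
LICENSE_KEY = "your-license-key"  # formerly issued per developer

server = WSDL.Proxy(GOOGLE_WSDL)
results = server.doGoogleSearch(
    LICENSE_KEY,  # per-developer key, capped at 1,000 queries a day
    "seo tools",  # the query
    0,            # index of the first result
    10,           # results per request (the API capped this at 10)
    False,        # filter near-duplicate results
    "",           # country/topic restrict
    False,        # SafeSearch
    "",           # language restrict
    "utf-8",      # input encoding
    "utf-8",      # output encoding
)
for result in results.resultElements:
    print(result.URL, result.title)
```

Rank checkers paged through results ten at a time with calls like that until they found a client's URL.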
Anyone who built a tool based on Google's old API now has to explain to people why their tools broke. Google wanted the tools to break so they could replace the useful API with something worse. In fact, Google is pulling back more data in other spots, even when third parties create tools to add features that should have been core to Google's products. Let's take a look at AdSense.
Google AdSense Stats
Google does not tell smaller webmasters what their payout percentage is, what keywords triggered the ads, or what ads get clicked on. Some third party tools were created to help track the ads and keywords, but Google disabled those.
If you think about it, Google is directly undermining the profitability of their partners by hoarding data. If I know which sections of my site perform well, it is easier for me to create high-value content in those areas. And the more profitable my site is, the more money I have to reinvest into building more, and higher quality, content.
It doesn't make sense that they ban high quality content just because it is owned by an SEO, then fund the growing dirty field of cybersquatting. I invested nearly $100,000 into building an AdSense site, until it got hand edited and I realized how AdSense really works: cannibalizing the value of content and making your site too dependent on Google as a traffic source.
Summary
If Google were honestly interested in creating a maximally efficient marketplace, they wouldn't disable third-party tools, hold back information, and keep changing their systems to confuse webmasters. They wouldn't hand edit real sites that thousands of webmasters vote for. And they would not be spreading FUD throughout the market. They would simply find a way to monetize the markets, push out inefficiencies, and grow additional revenue streams.
In some cases, if you register your site with Google they may give you a few more crumbs of info, but unless you have scale they want you to fail. What they really want, like any for-profit, power-hungry authoritative system, is control of your attention and information, so they can ensure as many dollars as possible flow through them. Look no further than their position on the health care industry to see their true vision for making information universally accessible and useful. Ads are a type of information:
We can place text ads, video ads, and rich media ads in paid search results or in relevant websites within our ever-expanding content network. Whatever the problem, Google can act as a platform for educating the public and promoting your message. We help you connect your company’s assets while helping users find the information they seek.
Looking Forward
Eventually Google will launch Claim Your Content.com, which is yet another way for them to get you to register your work with them so they can more easily profile webmasters and hand edit SEO owned sites. Allegedly it will help prevent content theft, but once it comes out, expect duplicate content filters to stagnate or get worse unless you sign up for their service. Dirty? Yes, but so is deceiving people to increase corporate profits. The above 7 examples are just a small set of the whole.