Spying on Google: What is Spam? What is Relevant? Read This to Find Out

You can read a lot about what search engineers want by looking at how the search results change. You can learn a bit more by listening to how they try to guide / influence / manipulate the market while engaging in discourse. And you can learn a lot more by reading their guidelines for how they expect people to rate search quality.

The reasons that the internal communication documents are so powerful are:

  • they do not discuss search from an "in an ideal world" approach, but cover the current marketplace from a pragmatic standpoint, solving real issues
  • the documents may display algorithmic holes that require manual intervention
  • the documents may show clues as to the hints search engineers give raters to quickly infer quality and relevancy
  • the documents show issues or relevancy infractions that merit a lower relevancy rating
  • the documents show how ratings change based on the quality and availability of information on the topic
  • the documents show how something that is considered spam in some instances is considered fine if it is associated with a large, well-known brand
  • the documents show how things that are relevant in some verticals are irrelevant in others if Google runs a competing offering
  • the current documents are the result of years of back and forth communication between quality raters and search engineers

For organic search junkies the Google Gods have tossed us another gift. An SEO Black Hat member discovered an April 2007 Google Evaluation Guidelines document, referenced here.

In April 2007 Yahoo! Music did offer lyrics, but the official Google query evaluation guidelines from that time frame stated:

Exceptions (Scraped Content that is not Spam) Lyrics, poems, ringtones (that the user programs rather than downloads), quotes, and proverbs have no central authority. When you see pages with this content, you cannot judge it to have been copied, and the pages should not be assigned a Spam label. Unfortunately, some content is written specifically for Spam pages and you will not find it on another source.

Although you may be convinced that the intent is to deceive, if the content makes sense and appears original, you will not be able to label such pages Spam.

In a sense, if a spammer or copyright violator is the only person providing the information online for free, it is not considered spam, even if it would have been deemed spam by the traditional guidelines. The same is likely true if Google is working on business negotiations to own that content directly (how could they state there are no central authority sites for music lyrics when sites like Yahoo! Music offer them?).

Because Google has not partnered up with the record labels to create a Google database of lyrics, somehow those copyright violations are deemed acceptable even if they would have been judged as spam under Google's typical guidelines. And, of course, after Google creates a relationship to get those lyrics hosted on Google.com, many of those lyrics sites will indeed be deemed spammers.

In other words, spam is only spam if it does not help Google achieve its business objectives. Who cares about the laws. Good to know.

You can compare the current query evaluation and rater document to the 2003 versions I referenced here and here. And the 2007 document has been leaked online.

Published: March 11, 2008 by Aaron Wall in google

Comments

SEO Junkie
March 13, 2008 - 6:35am

In other words, spam is only spam if it does not help Google achieve its business objectives. Who cares about the laws. Good to know.

Evil is whatever Sergey decides is evil. ;-)

Justin Goldberg
March 13, 2008 - 7:59pm

Someone post the full document to rapidshare or, even better, scribd!

SlightlyShadySEO
March 13, 2008 - 10:26pm

I read that document cover to cover. Incredibly useful stuff there. You beat me to blogging about it though ;-)
And no one should leak the whole document btw. Keep it at least semi-private(the super details) and it stays useful. Of course, I doubt this will happen.

March 13, 2008 - 10:28pm

And no one should leak the whole document btw. Keep it at least semi-private(the super details) and it stays useful. Of course, I doubt this will happen.

Some of the people who linked to this post provided multiple public links to the full document in different formats (web based and PDF).

Lee Johnson
September 15, 2008 - 3:33pm

After reading the PDF I was fascinated to find that Google are taking spam seriously. (Oh really!!)
How many times have you seen crap in the top 10 of the SERPs?

Justin Seibert
September 17, 2008 - 1:48pm

Great stuff - thanks, Aaron. Bummed I'm just stumbling across this now.

September 17, 2008 - 2:31pm

Glad you like it. :)
