Google Search Result Quality Evaluators

Google's search quality evaluation process may have been around for years.

SearchBistro recently posted a 22-page PDF titled General Guidelines on Random-Query Evaluation that was last revised on December 31, 2003. In addition to posting the Random-Query Evaluation PDF, Henk van Ess has recently posted:

  • examples of offensive (or low quality) sites

  • some whitelisted sites:
    Here is a non-exhaustive "white list" of the sites whose pages are not to be rated as Offensive (nor as Erroneous):
    Kelkoo, Shopping.com, dealtime.com, bizrate.com, bizrate.lycos.com, dooyoo.com;

  • tips for rating sites:
    If it's a machine-generated, no added value affiliates, it's Spam. If it provides some unique values, for example, customer feedback, local information, it should be rated on the merit scale even if it has some affiliates. Similarly, if the game site allows you to download a game, without being intrusive (i.e. install a spyware without notice), it should be rated on the merit scale, instead of Spam.

  • how reviewers communicate to resolve disagreements when their quality scores are far apart

Search Classification Types:
The Google review guide classifies searches as being

  • navigational (example: a search for United Airlines)

  • informational (example: how do I...)
  • transactional (example: buy 18K White Gold Omega Watch)
  • any mixture of the above categories.

Resource Quality Rating:
Google then asks raters to classify sites listed in random queries using the following categories:

  • Vital

    • Most queries, especially generic queries, do not have a Vital result.

    • Vital result example: search for Ask Jeeves returns www.ask.com.
  • Useful
    • These should have some of the following characteristics (although a page will likely not exhibit all of them): comprehensive, high quality, answers the query precisely, timely, authoritative.

    • This is the highest rating attainable for most pages on most search queries.
    • Useful result example: search for USA Patriot Act returning the ACLU page covering the USA Patriot Act.
    • For some plural queries, such as Newspapers in Scotland, the best results may be lists of related sites. Reviewers must also check some links on the page to ensure the page is functional.
  • Relevant
    • One step down from Useful. Relevant results may satisfy only one important facet of a query, whereas Useful results are expected to be more broad and thorough.

    • Results that would have been Vital if a more common interpretation did not overshadow them are considered Relevant.
  • Not Relevant
    • Not Relevant results are related to the topic but do not help users.

    • If a person searching for real estate finds a San Diego real estate website, that result would probably be Not Relevant, since most people searching for that term do not live in, or want to move to, San Diego.
    • Just as the San Diego example is too narrow geographically, other sites can be too narrow in ways unrelated to location, such as being outdated or covering only one narrow subtopic of the query.
  • Off Topic
    • The page is not useful; it is irrelevant to the query.

    • Usually occurs when text-matching algorithms do not account for terms that have multiple meanings.
  • Offensive
    • Pages or sites that typically have no merit for any query.

    • Example Offensive sites: spyware, unrequested porn, AdSense scraper and other keyword net type sites, etc.
  • Erroneous
  • Didn't Load
  • Foreign Language
  • Unrated

Vital through Offensive are listed in order of quality, from best to worst. Erroneous through Unrated are treated as non-votes. When in doubt between two rating values, raters are expected to choose the lower of the two.
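The ordering, the non-vote categories, and the "rate lower when in doubt" rule can be modeled as an ordered scale. This Python sketch is hypothetical: the guidelines only give the order, so the numeric values and the names `Rating`, `to_vote`, and `resolve_doubt` are assumptions made for illustration.

```python
from enum import IntEnum
from typing import Optional


class Rating(IntEnum):
    """Merit scale; a higher value means a better result.
    The numbers are illustrative; only the ordering comes from the guidelines."""
    OFFENSIVE = 0
    OFF_TOPIC = 1
    NOT_RELEVANT = 2
    RELEVANT = 3
    USEFUL = 4
    VITAL = 5


# Erroneous through Unrated are non-votes, so they sit outside the merit scale.
NON_VOTES = {"Erroneous", "Didn't Load", "Foreign Language", "Unrated"}


def to_vote(label: str) -> Optional[Rating]:
    """Map a rating label to a point on the scale, or None for a non-vote."""
    if label in NON_VOTES:
        return None
    return Rating[label.upper().replace(" ", "_")]


def resolve_doubt(first: Rating, second: Rating) -> Rating:
    """When torn between two ratings, the rater picks the lower one."""
    return min(first, second)


print(to_vote("Off Topic").name)                          # OFF_TOPIC
print(to_vote("Didn't Load"))                             # None (a non-vote)
print(resolve_doubt(Rating.USEFUL, Rating.RELEVANT).name)  # RELEVANT
```

Using `IntEnum` keeps the ratings comparable with `<` and `min`, which is all the tie-break rule needs; non-votes are kept out of the enum so they can never be mistaken for a low score.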

Why this is Important:
Learning what evaluators are asked to look for, and how they judge it, makes it easier to understand how to deliver what the search engines want.

This post was a quick review of General Guidelines on Random-Query Evaluation. If you are seriously interested in SEO, it is well worth your time to read the original document, which lists many more examples and goes into far greater detail than this post.

Random Thoughts:
Given how low the wages are for these positions ($10 to $20 an hour), you have to wonder:

  • why it took so long for this information to come out

  • if some of these people are using the information they gained from participating in other ways
  • if these people know anything about Google's business model, and how much THEY could be making on a per-click basis if they created well-cited content that fit Google's guidelines.
  • and, on a far-off tangent, what would happen if Google's business model made self-employment so profitable that Google could not afford to pay workers

AdSense Tips, Google Israel, Google Updates Webmaster Guidelines, More...

AdSense:
A subscribers-only thread at WMW offering tips for making money from AdSense

Here is an example of OddSod's advice:

AdWords cost per click:
  • unreliable hosting: £0.04
  • server going down: £0.04

If you do a special landing page and convert those visitors into "dedicated server" ad clicks (£3.00, of which you'll likely get £2), you need only a 2% CTR to break even. Many sites find it quite easy to achieve 5%.
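OddSod's break-even figure comes from dividing the cost of each bought visitor by what you keep per ad click: £0.04 / £2 = 2%. A minimal Python sketch of that arithmetic, assuming every bought AdWords click delivers exactly one visitor and that your share of each £3.00 ad click is about £2 (the helper names are hypothetical):

```python
def break_even_ctr(cost_per_visit: float, earnings_per_ad_click: float) -> float:
    """CTR at which ad earnings per visitor exactly cover the cost of buying that visitor."""
    return cost_per_visit / earnings_per_ad_click


def profit_per_visit(cost_per_visit: float, earnings_per_ad_click: float, ctr: float) -> float:
    """Expected profit per bought visitor (negative means a loss)."""
    return ctr * earnings_per_ad_click - cost_per_visit


# Figures from OddSod's example: £0.04 per bought click, about £2 kept per £3.00 ad click.
COST_PER_VISIT = 0.04
EARNINGS_PER_AD_CLICK = 2.00

print(f"Break-even CTR: {break_even_ctr(COST_PER_VISIT, EARNINGS_PER_AD_CLICK):.0%}")
print(f"Profit per visitor at 5% CTR: £{profit_per_visit(COST_PER_VISIT, EARNINGS_PER_AD_CLICK, 0.05):.2f}")
```

Run as-is, it reports a 2% break-even CTR and about £0.06 of profit per visitor at the 5% CTR the advice says many sites reach.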

Shalom:
One of the few words I remember from my brief stay in Israel. Apparently Google wants to go there too, as they are pondering opening up a new office.

Webmaster Guidelines:
Google recently updated them.

The Beauty of Search:
rant post by Sebastian

Google Server IP Address:
DaveN pointed at a cool new Firefox plugin that shows the IP address your search results are coming from.

Glossaries:
A hip SEO technique, says Woz.

Sites Positioned Above Mine:
thread about ways to penalize sites which are overtly manipulating search relevancy. A few interesting posts and points of view in there, as well as links to a white paper on the topic.

ClickTracks:
Interview of John Marshall

Like the French, Germans to Challenge Google Print

Not too long ago there were many reports of the French trying to rival the Google Print program. The International Herald Tribune reports the same thing is now happening in Germany.

Then this year, when Google started wooing publishers to sign on for its own digital book project, that German executive, Matthias Ulmer, decided the time was ripe to seize control with a homegrown counterattack.

Now Ulmer and a five-member task force of the German book trade association Börsenverein are organizing their own digital indexing project, Volltextsuche Online. The effort of the 6,000-member association of booksellers and publishers comes in reaction to Google's plans, unveiled in December, to start digitizing books in the world, with the first step being major university library collections in the United States.

Ultimately, variety is going to be important for keeping information flowing freely. A few companies controlling the information supply is a scary thought, though it looks as though that is where we are headed.

The scalability of search and requested ad networks requires that anyone jumping into the market either

  • creates a strong brand in a niche, or

  • jumps in big

It will be hard for Google and others to appease publishers as they try to convince them to allow others to control free copies of their content, which at the least will transform the publishing business model and could eventually undermine large segments of it.

Even if some of these uprisings have little effect on the outcome, being a leader of an outraged group helps market the leaders as market leaders. An article with "challenges Google" in the title is bound to get syndicated thousands of times.

If you can find some angle for going against Google that others find noble, it might be some of the cheapest marketing you ever experience.

Whether or not people view Google's desire to control information as evil, it is hard to deny that they are at the forefront of pushing others to modernize their data.

Google Using Human Reviewers, Google Launches Google Sitemaps

Human Reviewers:
Google using humans to improve relevancy. They may eventually accept feedback on AdSense publisher quality. Maybe.

GoogleGuy:
exclusive thread. He thinks summer is a good time to code. New Orleans is only a few weeks off.

Lazy Crawling:
One of Google's major hang-ups with paid inclusion was that it allowed lazy crawling. That no longer appears to be an issue, as Danny spots the new free Google Sitemaps program. FAQs here.

Google Toolbar PageRank Missing, Google Engineers at WMW Conference, Yahoo! & DMOZ Weighting

PageRank:
goes missing from the toolbar, though Brett Tabke said it is just a temporary glitch.

Google Engineers:
to appear at the New Orleans WMW conference

Does Reciprocal Linking Work?
Recently I saw the Blue Gecko SEO forums ranking at #10 for SEO. Most of his link popularity looked like it came from link trades associated with his webmaster resources directory. The reasons people say link trades do not work are mostly that:

  • they are usually slow and expensive to build if you do not outsource or automate

  • most people exchanging links in bulk are not doing so with quality sites

DMOZ Weighting in Yahoo!:
I created a one-page site about Effexor which is listed in DMOZ. I have not built any other linkage data, and it is ranking in the mid-30s for Effexor out of over 7,000,000 sites.

A Trawl Through a Little Bit of Fishtory

Once Upon a Time in a Galaxy Far Far Away...

from The Anatomy of a Large-Scale Hypertextual Web Search Engine

Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is "The Effect of Cellular Phone Use Upon Driver Attention", a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98]. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

a few years later:

It looks bad, coming days after the recent song-and-dance at the Google Factory Tour about how much energy is supposedly expended on core search and ads. Here's a personalized home page, but don't worry, we're not a portal, Google said.

Funny, this type of inattention is exactly what made people get turned off from the portals of the past, when they lost focus on search quality. Yahoo seems to have fixed this redirect hijacking problem, but Google is still struggling with it?

Danny Sullivan, on Google's own site being hijacked in Google's results. Danny rarely sounds ticked off, but that post hints at more than a little disappointment in Google.

Someone Who Hates Google...

Should write a press release about Google not being able to control its search index, and about nefarious webmasters hijacking other sites to remove them from the search index. Why?

Currently the ball is in Google's court, with SEO being branded as spam and scum of the web.

If someone could push the idea that Google cannot even control its own index, or how its own site ranks, then perhaps they could shift the frame, saying that Google does not know how to control its index and needs the help of good SEOs to improve its search relevancy.

They should reference:

Search engines have yet to be seriously challenged about:

Most consumers do not realize how search results are manipulated, and most don't even know where the ads are.

It would be cool to see an SEO more daring or less lazy than me use this opportunity to toot their own horn and talk about how they help Google solve a problem it hasn't fully figured out yet - relevancy.

It would certainly be cheap marketing if you get national media coverage with the current feeding frenzy for Google's stock.

Google Hijacked in their own Search Results

Google Hijacked in Google:
Official Google AdSense site bitten by a meta refresh. Hmm. Low-quality site? More at ThreadWatch

For those who spin all the ethics stuff, do you think Google knew of the problem and was lying when they said it was no big deal? If so, is it ethical for them to tell blatant lies? If not, how is it that SEOs know more about Google's search engine than Google does, when Google generally discounts the whole concept of SEO?

Yahoo! Q Challenge:
What's up with a $5,000 prize? That surely is not much of a payout for the value they could create with that contest. I might need to create a similar marketing program for myself. hehehe

Exalead:
Michael Nguyen posts about some of their search features.

Novice Spam Tool:
I have not tried it, but someone promoted this site http://searchpr.info/test.php, via forum spam of course.

Yahoo! Public Site Match:
Nothing more than a PR stunt? It sure smells the part. A while ago they promoted that program a good bit, but it sure is hard to find information about it nowadays.

Masochistic Behavior:
reading IHY forums. I don't know of anywhere else where a single comment can earn you pages about what a horrible person you are. SEO is doomed. We are all evil. hehehe

Lots of good ones in that rant thread, but here is one of my all-time favorite Doug quotes:

Most journalists I know of at least fall on one side or the other.

Another scary thing with that thread is I find myself agreeing with Glengara!

Open Source Rank Checker:
I have not tried it, but a friend pointed me to this software. I am not sure how it plays with Google, since they have been blocking some automated software.

ODP Should Close Shop?
Danny Sullivan weighs in on the ODP's recent site submission status closure.

Black Market Porn:
UK bans selling porn DVDs over the web. UK prostitution market to soar ;)

Funny:
There is a website that qualifies you and prints out your ordained minister certification in under a minute. Today a person tried to justify my giving away my business model to them because they had spent that minute printing one out.

Evolution of Yahoo! Search:
article about Yahoo! creating their search service. thanks to RC

Google Portal, Stemming, DMOZ Submission Review

Portalized:
Google offers portalization of Google.com. Danny Sullivan has an in-depth review. They have a number of features and intend to add many more, such as RSS feed support.

Stemming:
Rand points out a post by Xan on stemming and a free online stemming tool

DMOZ:
kills the submission status review. Now it's even easier to be corrupt ;)

New York Times:
Begins charging for some of their content. Most of their content remains free. They are also replacing the CEO of About.com.

When Not to Submit to Directories:
When a person creates about half a dozen general directories and promotes them all together, that is not building value; that is trying to cash out and milk the web.

Many directory owners have become exceedingly greedy recently. All the while search algorithms continue to advance and few of the directory owners are actually trying to build any legitimate value.

The Search:
You can pre order John Battelle's new book. He said if you use this link he may be able to autograph it for you, assuming he can work out the shipping details.

The Size of Google's Index:
might have been a bit frothy

Google Factory Tour:
video presentations (should be up soon); Philipp Lenssen has highlights

Mirago AdSense:
Apparently they have a product similar to AdSense, which might be useful for companies like HotNacho.

SimCity & Google Earth

SimCity was always one of my favorite games. kpaul recently noticed a new site called Chicago Crime, which overlays crimes on their locations using Google Maps. It is pretty scary to see that Chicago had more than one murder a day last month.

What kind of ad marketplace would Google have if they:

  • integrated Google maps and public data into a social network

  • which linked to - or allowed people to upload - business feedback (think Local Froogle)
    • should I buy from here?

    • what other businesses are cheaper or provide better service?
    • should I consider working here?
    • who else is hiring in this field or near here?

    and destination reviews

    • is this place worth visiting?

    • when is best?
    • who has the best travel deals?

They could also show the history and trust rating of reviewers, and let you decide how many social connections away you are willing to accept reviews from. Maybe they could match up personalities or demographic profiles if people gave them that data, or let you create your own combined metric.
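Accepting reviews only from people within a chosen number of social connections amounts to a breadth-first search over the friend graph. A rough Python sketch of that idea, assuming a hypothetical adjacency-list graph and reviews tagged with an `author` field; the function names are made up for illustration and nothing here reflects an actual Google product:

```python
from collections import deque


def within_hops(graph: dict[str, list[str]], me: str, max_hops: int) -> set[str]:
    """Everyone reachable from `me` in at most `max_hops` friend links, excluding `me`."""
    distance = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # at the hop limit: do not look further out
        for friend in graph.get(person, []):
            if friend not in distance:  # BFS reaches each person first by a shortest path
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return set(distance) - {me}


def trusted_reviews(reviews: list[dict[str, str]], graph: dict[str, list[str]],
                    me: str, max_hops: int) -> list[dict[str, str]]:
    """Keep only reviews written by people within `max_hops` of `me`."""
    allowed = within_hops(graph, me, max_hops)
    return [review for review in reviews if review["author"] in allowed]


# Hypothetical data: me -> alice -> bob -> carol
friends = {"me": ["alice"], "alice": ["me", "bob"], "bob": ["alice", "carol"]}
reviews = [
    {"author": "bob", "text": "Fast shipping, would buy again."},
    {"author": "carol", "text": "Overpriced."},
]
print(trusted_reviews(reviews, friends, "me", max_hops=2))  # only bob's review
```

Breadth-first order matters here: it guarantees each person is labeled with their shortest distance, so someone two links away is never excluded just because a longer path to them was found first.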

Add a strong recommending engine technology to that (like how Amazon.com says "of the people who viewed this product ultimately 37% ended up buying XYZ") and Google will serve ads that know what you want even when you don't.
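A statistic like "of the people who viewed this product ultimately 37% ended up buying XYZ" can be computed from a chronological activity log. This Python sketch assumes a hypothetical log of `(user, action, item)` tuples and counts only purchases made after the user viewed the product; it illustrates the idea and is not a description of Amazon.com's actual method.

```python
def bought_after_viewing(events: list[tuple[str, str, str]], viewed: str) -> dict[str, float]:
    """Share of users who viewed `viewed` and later bought each item.

    `events` is a hypothetical log of (user, action, item) tuples in chronological
    order, where action is "view" or "buy".
    """
    viewers: set[str] = set()
    buyers: dict[str, set[str]] = {}
    for user, action, item in events:
        if action == "view" and item == viewed:
            viewers.add(user)
        elif action == "buy" and user in viewers:
            # Only purchases made after viewing `viewed` count.
            buyers.setdefault(item, set()).add(user)
    if not viewers:
        return {}
    return {item: len(users) / len(viewers) for item, users in buyers.items()}


log = [
    ("u1", "view", "watch"), ("u2", "view", "watch"), ("u3", "view", "watch"),
    ("u1", "buy", "strap"), ("u2", "buy", "watch"), ("u4", "buy", "strap"),
]
for item, share in bought_after_viewing(log, "watch").items():
    print(f"Of the people who viewed the watch, {share:.0%} ended up buying the {item}")
```

Counting distinct users rather than purchase events keeps a single repeat buyer from inflating the percentage.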

Google has data worth lots and lots of money. It will be interesting to see how they aggregate content and collect feedback to leverage their market position.

Any merchant heavily exposed to the web which is not building communities or other hard to replicate assets may end up in the hurt locker in the next couple years.

Google's ad serving technology is still somewhat primitive. As time passes and more major networks leverage their market positions, more and more merchants will get marginalized by the forces that be.
