How Search Engines Work: Search Engine Relevancy Reviewed
This article is a fairly comprehensive review of search engine relevancy algorithms, published by Aaron Wall on June 13, 2006. While some of the general details have changed, the major themes referenced in this article were still relevant when I reviewed it a year after publishing it.
However, when I reviewed it again on January 12, 2011, a number of significant changes had taken place:
- Yahoo! Search is now powered by Bing in the United States and Google in Japan.
- Ask announced they were leaving the search space to focus on Q&A, with their core search to be powered by another search engine.
- A couple newer smaller search engines (like Blekko and DuckDuckGo) have launched.
- Some foreign search engines that dominate their home markets (like Yandex and Baidu) are looking to become global players.
- Search engines have ...
- increasingly pushed to monetize their organic search results.
- pushed more on automating search query suggestions.
- evolved the search interface.
- placed more focus on search personalization & localization.
- built more advanced anti-spam penalties + filters.
In spite of the changes since publication, this article still makes for a great historical reference & helps readers understand a variety of approaches that have been used in the search space, both from a business model and a relevancy perspective.
This work is licensed under a Creative Commons Attribution 3.0 United States License.
Article Sections
- Overview
- The Short Version
- Yahoo! Search
- MSN Search
- Google
- Ask
- Vertical Search & Other Search Companies
- Thanks
Why Did I Write This Article?
Many people think a search engine is broken or their site is banned if they do not rank in one engine while they rank great in others. This article will hopefully help searchers understand the different relevancy criteria used by different engines. While people have looked at search engine ranking factors on a global level I do not think anyone has spent much time comparing and contrasting how different search engines compute their relevancy scores or bias their algorithms.
I think one of the hardest parts of explaining SEO is that the web is such a diverse place where things that make sense in one industry would make no sense in another industry. In addition, different engines look for different things to determine search relevancy. Things that help you rank in one engine could preclude you from ranking in another.
There is enough traffic out there that it might make sense to do ultra aggressive SEO on one burnable site for quick rankings in MSN and Yahoo! while spending time using slow growth long term low risk techniques on a site you eventually want to rank in Google.
I thought it would be worth compiling notes comparing how relevancy is defined by each engine (or how I perceive it based on my experiences). This page makes no aim to be comprehensive, but is designed more for making it easy for people new to the web to understand the differences between the different engines.
Bias is Universal. The Definition of Relevancy is Not.
Each major large scale search engine is run by a large corporation. Each of these corporations is a for-profit entity with its own underlying principles or core beliefs, which help guide how it crafts search relevancy. While some engines automatically evolve search relevancy via genetic algorithms, each major search engine still has some human input in how relevancy is calculated (at the very least, humans write some of the algorithms).
Alejandro M. Diaz wrote a PDF research paper about Google's biases.
The Bias of this Article
When I originally started writing this article I wanted it to approach search relevancy from more of an academic standpoint, but my perspective on search is that of someone who understands it primarily from a marketing perspective.
The Short Version
Yahoo!
- has been in the search game for many years.
- is better than MSN but nowhere near as good as Google at determining if a link is a natural citation or not.
- has a ton of internal content and a paid inclusion program, both of which give them incentive to bias search results toward commercial results
- things like cheesy off topic reciprocal links still work great in Yahoo!
MSN Search
- new to the search game
- is bad at determining if a link is natural or artificial in nature
- due to sucking at link analysis they place too much weight on the page content
- their poor relevancy algorithms cause a heavy bias toward commercial results
- likes bursty recent links
- new sites that are generally untrusted in other systems can rank quickly in MSN Search
- things like cheesy off topic reciprocal links still work great in MSN Search
Google
- has been in the search game a long time, and saw the web graph when it was much cleaner than the current web graph
- is much better than the other engines at determining if a link is a true editorial citation or an artificial link
- looks for natural link growth over time
- heavily biases search results toward informational resources
- trusts old sites way too much
- a page on a site or subdomain of a site with significant age or link related trust can rank much better than it should, even with no external citations
- they have aggressive duplicate content filters that filter out many pages with similar content
- if a page is obviously focused on a term they may filter the document out for that term. On-page variation and link anchor text variation are important. A page with a single reference or a few references to a modifier will frequently outrank pages that are heavily focused on a search phrase containing that modifier
- crawl depth is determined not only by link quantity, but also link quality. Excessive low quality links may make your site less likely to be crawled deep or even included in the index.
- things like cheesy off topic reciprocal links are generally ineffective in Google when you consider the associated opportunity cost
Ask
- looks at topical communities
- due to their heavy emphasis on topical communities they are slow to rank sites until they are heavily cited from within their topical community
- due to their limited market share they probably are not worth paying much attention to unless you are in a vertical where they have a strong brand that drives significant search traffic
Yahoo! Search
Yahoo! was founded in 1994 by David Filo and Jerry Yang as a directory of websites. For many years they outsourced their search service to other providers, but by the end of 2002 they realized the importance and value of search and started aggressively acquiring search companies.
Overture purchased AllTheWeb and AltaVista. Yahoo! purchased Inktomi (in December 2002) and then consumed Overture (in July of 2003), and combined the technologies from the various search companies they bought to make a new search engine. Yahoo! dumped Google in favor of their own in house technology on February 17th, 2004.
Yahoo! has a cool Netrospective of their first 10 years, and Bill Slawski posted a list of many of the companies Yahoo! consumed since Overture.
On Page Content
Yahoo! offers a paid inclusion program, so when Yahoo! Search users click on high ranked paid inclusion results in the organic search results Yahoo! profits. In part to make it easy for paid inclusion participants to rank, I believe Yahoo! places greater weight on on-the-page content than a search engine like Google does.
Being the #1 content destination site on the web, Yahoo! has a boatload of their own content which they frequently reference in the search results. Since they have so much of their own content and make money from some commercial organic search results it might make sense for them to bias their search results a bit toward commercial websites.
Using descriptive page titles and page content goes a long way in Yahoo!
In my opinion their results seem to be biased more toward commerce than informational sites, when compared with Google.
Crawling
Yahoo! is pretty good at crawling sites deeply so long as they have sufficient link popularity to get all their pages indexed. One note of caution is that Yahoo! may not want to deeply index sites with many variables in the URL string, especially since
- Yahoo! already has a boatload of their own content they would like to promote (including verticals like Yahoo! Shopping)
- Yahoo! offers paid inclusion, which can help Yahoo! increase revenue by charging merchants to index some of their deep database contents.
You can use Yahoo! Site Explorer to see how well they are indexing your site and which sites link at your site.
Query Processing
Certain words in a search query are better at defining the goals of the searcher. If you search Yahoo! for something like "how to SEO", many of the top ranked results will have "how to" and "SEO" in the page titles, which might indicate that Yahoo! puts quite a bit of weight even on common words that occur in the search query.
Yahoo! seems to be more about text matching when compared to Google, which seems to be more about concept matching.
Link Reputation
Yahoo! is still fairly easy to manipulate using low to mid quality links and somewhat-to-aggressively focused anchor text. Rand Fishkin recently posted about many Technorati pages ranking well for their core terms in Yahoo!. Those pages primarily have the exact same anchor text in almost all of the links pointing at them.
Sites with the trust score of Technorati may be able to get away with more unnatural patterns than most webmasters can, but I have seen sites flamethrown with poorly mixed anchor text on low quality links, only to see the sites rank pretty well in Yahoo! quickly.
Page vs Site
A few years ago at a Search Engine Strategies conference Jon Glick stated that Yahoo! looked at both links to a page and links to a site when determining the relevancy of a page. Pages on newer sites can still rank well even if their associated domain does not have much trust built up yet so long as they have some descriptive inbound links.
Site Age
Yahoo! may place some weight on older sites, but the effect is nowhere near as pronounced as the effect in Google's SERPs.
It is not unreasonable for new sites to rank in Yahoo! in as little as 2 or 3 months.
Paid Search
Yahoo! prices their ads in an open auction, with the highest bidder ranking the highest. By early 2007 they aim to make Yahoo! Search Marketing more of a closed system which factors in clickthrough rate (and other algorithmic factors) into their ad ranking algorithm.
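To make the difference concrete, here is a minimal Python sketch (with made-up bids and clickthrough rates) contrasting a pure highest-bid auction with the clickthrough-weighted ranking Yahoo! is moving toward. It illustrates the general idea only, not Yahoo!'s actual formula:

```python
# Hypothetical sketch contrasting a pure highest-bid ad auction with a
# clickthrough-weighted one. All bids and CTR values are made up.

ads = [
    {"advertiser": "A", "max_bid": 5.00, "ctr": 0.010},
    {"advertiser": "B", "max_bid": 3.00, "ctr": 0.030},
    {"advertiser": "C", "max_bid": 4.00, "ctr": 0.015},
]

# Open auction: whoever bids the most ranks the highest.
by_bid = sorted(ads, key=lambda ad: ad["max_bid"], reverse=True)

# Closed system: rank by expected revenue per impression (bid x CTR),
# so a cheaper but more clickable ad can outrank a big bidder.
by_revenue = sorted(ads, key=lambda ad: ad["max_bid"] * ad["ctr"], reverse=True)

print([ad["advertiser"] for ad in by_bid])      # ['A', 'C', 'B']
print([ad["advertiser"] for ad in by_revenue])  # ['B', 'C', 'A']
```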
Yahoo! also offers a paid inclusion program which charges a flat rate per click to list your site in Yahoo!'s organic search results.
Yahoo! also offers a contextual ad network. The Yahoo! Publisher program does not have the depth that Google's ad system has, and they seem to be trying to make up for that by biasing their targeting to more expensive ads, which generally causes their syndicated ads to have a higher click cost but lower average clickthrough rate.
Editorial
Yahoo! has many editorial elements to their search product. When a person pays for Yahoo! Search Submit that content is reviewed to ensure it matches Yahoo!'s quality guidelines. Sites submitted to the Yahoo! Directory are reviewed for quality as well.
In addition to those two forms of paid reviews, Yahoo! also frequently reviews their search results in many industries. For competitive search queries some of the top search results may be hand coded. If you search for Viagra, for example, the top 5 listings looked useful, and then I had to scroll down to #82 before I found another result that wasn't spammy.
Yahoo! also manually reviews some of the spammy categories somewhat frequently and then reviews other samples of their index. Sometimes you will see a referral like http://corp.yahoo-inc.com/project/health-blogs/keepers if they reviewed your site and rated it well.
Sites which have been editorially reviewed and were of decent quality may be given a small boost in relevancy score. Sites which were reviewed and are of poor quality may be demoted in relevancy or removed from the search index.
Yahoo! has published their content quality guidelines. Some sites that are filtered out of search results by automated algorithms may return if the site cleans up the associated problems, but typically if any engine manually reviews your site and removes it for spamming you have to clean it up and then plead your case. You can request to have your domain evaluated for re-inclusion using this form.
Social Aspects
Yahoo! firmly believes in the human aspect of search. They paid many millions of dollars to buy Del.icio.us, a social bookmarking site. They also have a similar product native to Yahoo! called My Web.
Yahoo! has also pushed a question answering service called Yahoo! Answers which they heavily promote in their search results and throughout their network. Yahoo! Answers allows anyone to ask or answer questions. Yahoo! is also trying to mix amateur content from Yahoo! Answers with professionally sourced content in verticals such as Yahoo! Tech.
Yahoo! SEO Tools
Yahoo! has a number of useful SEO tools.
- Overture Keyword Selector Tool - shows prior month search volumes across Yahoo! and their search network.
- Overture View Bids Tool - displays the top ads and bid prices by keyword in the Yahoo! Search Marketing ad network.
- Yahoo! Site Explorer - shows which pages Yahoo! has indexed from a site and which pages they know of that link at pages on your site.
- Yahoo! Mindset - shows you how Yahoo! can bias search results more toward informational or commercial search results.
- Yahoo! Advanced Search Page - makes it easy to look for .edu and .gov backlinks
- while doing link:http://www.site.com/page.html searches (links to an individual page)
- while doing linkdomain:www.site.com/ searches (links to any page on a particular domain)
- Yahoo! Buzz - shows current popular searches
Yahoo! Business Perspectives
Being the largest content site on the web means Yahoo! runs into some inefficiency issues, since they are in effect a large internal customer of their own search service. For example, Yahoo! Shopping was a large link buyer for a period of time while Yahoo! Search pushed the message that they didn't agree with link buying. Between offering paid inclusion and having so much internal content, it makes sense for Yahoo! to have a somewhat commercial bias to their search results.
They believe strongly in the human and social aspects of search, pushing products like Yahoo! Answers and My Yahoo!.
I think Yahoo!'s biggest weakness is the diverse set of things that they do. In many fields they not only have internal customers, but in some fields they have product duplication, like with Yahoo! My Web and Del.icio.us.
Search Marketing Perspective
I believe if you do standard textbook SEO practices and actively build quality links it is reasonable to expect to be able to rank well in Yahoo! within 2 or 3 months. If you are trying to rank for highly spammed keyword phrases keep in mind that the top 5 or so results may be editorially selected, but if you use longer tail search queries or look beyond the top 5 for highly profitable terms you can see that many people are indeed still spamming them to bits.
As Yahoo! pushes more of their vertical offerings it may make sense to give your site and brand additional exposure to Yahoo!'s traffic by doing things like providing a few authoritative answers to topically relevant questions on Yahoo! Answers.
Learn More
- Yahoo! Search Content Quality Guidelines
- Yahoo! Search Help
- Yahoo! Search Blog
- Yahoo! Search Submit - paid inclusion
- Yahoo! Publisher Search Blog - blog discussing Yahoo!'s contextual ad product
- Yahoo! Research - Yahoo!'s research lab
Worker Blogs
- Jeremy Zawodny - probably known as one of the top 10 most famous bloggers on the web
- Tim Converse - it is surprising his blog doesn't get more comments on it for how interesting it is
- Tim Mayer - are you going to update this anytime soon Tim?
MSN Search
MSN Search had many incarnations, being powered by the likes of Inktomi and Looksmart for a number of years. After Yahoo! bought Inktomi and Overture it was obvious to Microsoft that they needed to develop their own search product. They launched their technology preview of their search engine around July 1st of 2004. They formally switched from Yahoo! organic search results to their own in house technology on January 31st, 2005. MSN announced they dumped Yahoo!'s search ad program on May 4th, 2006.
On Page Content
Using descriptive page titles and page content goes a long way to help you rank in MSN. I have seen examples of many domains that ranked for things like
state name + insurance type + insurance
on sites that were not very authoritative which only had a few instances of state name and insurance as the anchor text. Adding the word health, life, etc. to the page title made the site relevant for those types of insurance, in spite of the site having few authoritative links and no relevant anchor text for those specific niches.
Additionally, internal pages on sites like those can rank well for many relevant queries just by being hyper focused, but MSN currently drives little traffic when compared with the likes of Google.
Crawling
MSN has gotten better at crawling, but I still think Yahoo! and Google are much better at it. It is best to avoid session IDs, sending bots cookies, or using many variables in the URL strings. MSN is nowhere near as comprehensive as Yahoo! or Google at crawling deeply through large sites like eBay.com or Amazon.com.
Query Processing
I believe MSN might be a bit better than Yahoo! at processing queries for meaning instead of taking them quite so literally, but I do not believe they are as good as Google is at it.
While MSN offers a tool that estimates how commercial a page or query is, I think their inability to distinguish quality links from low quality links makes their results exceptionally biased toward commercial results.
Link Reputation
By the time Microsoft got in the search game the web graph was polluted with spammy and bought links. Because of this, and Microsoft's limited crawling history, they are not as good as the other major search engines at telling the difference between real organic citations and low quality links.
MSN Search reacts much more quickly than the other engines in ranking new sites that show link bursts. Sites with relatively few quality links that gain enough descriptive links are able to quickly rank in MSN. I have seen sites rank for one of the top few dozen most expensive phrases on the net in about a week.
Page vs Site
I think all major search engines consider site authority when evaluating individual pages, but with MSN it seems as though you do not need to build as much site authority as you would to rank well in the other engines.
Site Age
Due to MSN's limited crawling history and the web graph being highly polluted before they got into search they are not as good as the other engines at determining age related trust scores. New sites doing general textbook SEO and acquiring a few descriptive inbound links (perhaps even low quality links) can rank well in MSN within a month.
Paid Search
Microsoft's paid search product, AdCenter, is the most advanced search ad platform on the web. Like Google, MSN ranks ads based on both max bid price and ad clickthrough rate. In addition to those relevancy factors MSN also allows you to place adjustable bids based on demographic details. For example, a mortgage lead from a wealthy older person might be worth more than an equivalent search from a younger and poorer person.
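As a rough sketch of what adjustable demographic bids amount to, consider the following; the segments and multipliers are invented for illustration and are not AdCenter's real interface or values:

```python
# Hypothetical sketch of demographic bid adjustments, loosely modeled on
# the idea behind AdCenter's incremental bidding. All values are invented.

base_bid = 2.00  # base max bid per click, in dollars

# Incremental multipliers by demographic segment (illustrative only).
age_multiplier = {"18-24": 1.0, "25-49": 1.1, "50+": 1.3}
income_multiplier = {"low": 1.0, "mid": 1.1, "high": 1.4}

def effective_bid(age_group: str, income: str) -> float:
    """Return the bid after demographic adjustments are applied."""
    return base_bid * age_multiplier[age_group] * income_multiplier[income]

# A wealthy older mortgage searcher is worth a higher bid than a
# younger, lower-income searcher typing the same query.
print(effective_bid("50+", "high"))   # 3.64
print(effective_bid("18-24", "low"))  # 2.0
```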
Editorial
All major search engines have internal relevancy measurement teams. MSN seems to be highly lacking in this department, or they are trying to use the fact that their search results are spammy as a marketing angle.
MSN is running many promotional campaigns to try to get people to try out MSN Search, and in many cases some of the searches they are sending people to have bogus spam or pornography type results in them. A good example of this is when they used Stacy Keibler to market their Celebrity Maps product. As of writing this, their top search result for Stacy Keibler is still pure spam.
Based on MSN's lack of feedback or concern when the obvious search spam noted above was pointed out on a popular search marketing community site, I think MSN is trying to automate much of their spam detection, but it is not a topic you see people talk about very often. Here are MSN's Guidelines for Successful Indexing, but they still have a lot of spam in their search results. ;)
Social Aspects
Microsoft continues to lag in understanding what the web is about. Executives there should read The Cluetrain Manifesto. Twice. Or maybe three times.
They don't get the web. They are a software company posing as a web company.
They launch many products as though they still have the market stranglehold monopolies they once enjoyed, and as though they are not rapidly losing them. Many of Microsoft's most innovative moves get little coverage because when they launch key products they often launch them without supporting other browsers, while trying to lock you into logging in to Microsoft services.
MSN SEO Tools
MSN has a wide array of new and interesting search marketing tools. Their biggest limiting factor is MSN's limited search market share.
Some of the more interesting tools are
- Keyword Search Funnel Tool - shows terms that people search for before or after they search for a particular keyword
- Demographic Prediction Tool - predicts the demographics of searchers by keyword or site visitors by website
- Online Commercial Intention Detection Tool - estimates the probability of a search query or web page being commercial-informational, commercial-transactional, or non-commercial
- Search Result Clustering Tool - clusters search results based on related topics
You can view more of their tools under the demo section at Microsoft's Adlab.
MSN Business Perspectives
Microsoft has too many search brands for a company building their own technology in house.
They have MSN Search, Microsoft AdCenter, and Windows Live Search. All of these are pretty much the same thing meshed together; the only real difference between them is that Microsoft does not know which brand they want to push.
Microsoft also heavily undermines their own credibility by recommending doorway page generator software and fake Alexa traffic generator software.
It seems as though Microsoft is big, slow moving, and late to the game.
Search Marketing Perspective
I believe if you do standard textbook SEO practices and actively build links it is reasonable to expect to be able to rank well in MSN within about a month. If you are trying to rank for highly spammed keyword phrases keep in mind that many of the top results will have thousands and thousands of spammy links. The biggest benefit to new webmasters trying to rank in Microsoft is how quickly they rank new sites which have shown inbound link bursts.
One note of caution with Microsoft Search is that they are so new to the market that they are rapidly changing their relevancy algorithms as they try to play catch up with Yahoo! and Google, both of which had many years of a head start on them. Having said that, expect that sometimes you will rank where your site does not belong, and over time some of those rankings may go away. Additionally sometimes they may not rank you where you do belong, and the rankings will continue to shift to and fro as they keep testing new technologies.
Microsoft has a small market share, but the biggest things a search marketer has to consider with Microsoft are their vast vats of cash and their dominance on the operating system front.
So far they have lost many distribution battles to Google, but they picked up Amazon.com as a partner, and they can use their operating system software pricing to gain influence over computer manufacturer related distribution partnerships.
The next version of Internet Explorer will integrate search into the browser. This may increase the overall size of the search market by making search more convenient, and boost Microsoft's share of the search pie. This will also require search engines to bid for placement as the default search provider, and nobody is sitting on as much cash as Microsoft is.
Microsoft has one of the largest email user bases. They have been testing integrating search and showing contextually relevant ads in desktop email software. Microsoft also purchased Massive, Inc., a firm which places ads in video games.
Microsoft users tend to be default users who are less advertisement averse than a typical Google user. Even though Microsoft has a small market share they should not be overlooked, due to their primitive search algorithms (and thus ease of relevancy manipulation), defaultish users, and the potential market growth opportunity associated with the launch of their next web browser.
Learn More
- MSN Guidelines for Successful Indexing
- MSN Site Owner Help
- MSN Search Blog
- MSN AdCenter Blog
- Microsoft AdLab
- Microsoft Research
Worker Blogs
- Robert Scoble - he is probably known as one of the top 10 bloggers, but after working for Microsoft for years he left on June 10th, 2006.
Google
Google sprang out of a Stanford research project to find authoritative link sources on the web. In January of 1996 Larry Page and Sergey Brin began working on BackRub (what a horrible name, eh?).
After they tried shopping the Google search technology around to no avail they decided to set up their own search company. Within a few years of forming the company they won distribution partnerships with AOL and Yahoo! that helped build their brand as the industry leader in search. Traditionally, search was viewed as a loss leader:
Despite the dotcom fever of the day, they had little interest in building a company of their own around the technology they had developed.
Among those they called on was friend and Yahoo! founder David Filo. Filo agreed that their technology was solid, but encouraged Larry and Sergey to grow the service themselves by starting a search engine company. "When it's fully developed and scalable," he told them, "let's talk again." Others were less interested in Google, as it was now known. One portal CEO told them, "As long as we're 80 percent as good as our competitors, that's good enough. Our users don't really care about search."
Google did not have a profitable business model until the third iteration of their popular AdWords advertising program in February of 2002, and was worth over 100 billion dollars by the end of 2005.
On Page Content
If a phrase is obviously targeted (ie: the exact same phrase appears in most of the following locations: most of your inbound links, internal links, the start of your page title, the beginning of your first page header, etc.) then Google may filter the document out of the search results for that phrase. Other search engines may have similar algorithms, but if they do those algorithms are not as sophisticated or aggressively deployed as those used by Google.
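As a rough illustration of the kind of check such a filter might run, here is a speculative sketch; the signals and thresholds are my own guesses at the general technique, not Google's actual algorithm:

```python
# Speculative over-optimization check: flag a page only when the exact
# target phrase saturates every signal at once. Thresholds are invented.

def exact_match_fraction(texts: list[str], phrase: str) -> float:
    """Fraction of the given snippets containing the exact phrase."""
    if not texts:
        return 0.0
    return sum(phrase.lower() in t.lower() for t in texts) / len(texts)

def looks_over_optimized(phrase, title, inbound_anchors, internal_anchors):
    signals = [
        title.lower().startswith(phrase.lower()),
        exact_match_fraction(inbound_anchors, phrase) > 0.8,
        exact_match_fraction(internal_anchors, phrase) > 0.8,
    ]
    # Only trip the filter when every signal is saturated with the phrase.
    return all(signals)

print(looks_over_optimized(
    "discount widgets",
    title="Discount Widgets - Discount Widgets Store",
    inbound_anchors=["discount widgets"] * 9 + ["cheap widgets"],
    internal_anchors=["discount widgets"] * 10,
))  # True: title, inbound anchors, and internal links all repeat the phrase
```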
Google is scanning millions of books, which should help them create an algorithm that is pretty good at differentiating real text patterns from spammy manipulative text (although I have seen many garbage content cloaked pages ranking well in Google, especially for 3 and 4 word search queries).
You need to write naturally and make your copy look more like a news article than a heavily SEOed page if you want to rank well in Google. Sometimes using fewer occurrences of the phrase you want to rank for will work better than using more.
You also want to sprinkle modifiers and semantically related text in your pages that you want to rank well in Google.
Some of Google's content filters may look at pages on a page by page basis while others may look across a site or a section of a site to see how similar different pages on the same site are. If many pages are exceptionally similar to content on your own site or content on other sites Google may be less willing to crawl those pages and may throw them into their supplemental index. Pages in the supplemental index rarely rank well, since generally they are trusted far less than pages in the regular search index.
Duplicate content detection is not just based on some magical percentage of similar content on a page, but is based on a variety of factors. Both Bill Slawski and Todd Malicoat offer great posts about duplicate content detection. This shingles PDF explains some duplicate content detection techniques.
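The shingling technique those resources describe is simple to sketch: break each document into overlapping word n-grams ("shingles") and compare the resulting sets; a high overlap suggests a near-duplicate. A minimal illustration, not any engine's production implementation:

```python
# Minimal shingle-based duplicate detection: documents whose word
# n-gram sets overlap heavily are likely near-duplicates.

def shingles(text: str, n: int = 4) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def resemblance(a: str, b: str, n: int = 4) -> float:
    """Jaccard similarity of the two documents' shingle sets."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
doc2 = "the quick brown fox jumps over the lazy dog near the river today"
print(round(resemblance(doc1, doc2), 2))  # ~0.82, a likely near-duplicate
```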
I wrote a blog post about natural SEO copywriting which expounds on the points of writing unique natural content that will rank well in Google.
Crawling
While Google is more efficient at crawling than competing engines, it appears as though with Google's BigDaddy update they are looking at both inbound and outbound link quality to help set crawl priority, crawl depth, and whether or not a site even gets crawled at all. To quote Matt Cutts:
The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site. Examples that might cause that include excessive reciprocal links, linking to spammy neighborhoods on the web, or link buying/selling.
In the past crawl depth was generally a function of PageRank (PageRank is a measure of link equity - and the more of it you had the better you would get indexed), but now adding in this crawl penalty for having an excessive portion of your inbound or outbound links pointing into low quality parts of the web creates an added cost which makes dealing in spammy low quality links far less appealing for those who want to rank in Google.
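One way to picture the change is crawl priority as link equity discounted by spam association. The following sketch is speculative (the penalty shape and cutoff are invented), but it captures the shift Matt Cutts describes:

```python
# Speculative sketch: crawl budget as PageRank-style link equity
# discounted by the share of spammy inbound/outbound links.

def crawl_priority(pagerank: float,
                   spammy_inlink_share: float,
                   spammy_outlink_share: float) -> float:
    """Higher values get crawled deeper; heavy spam association cuts the budget."""
    spam_share = max(spammy_inlink_share, spammy_outlink_share)
    if spam_share > 0.9:
        # So little trust in the links that the site may not be crawled at all.
        return 0.0
    return pagerank * (1.0 - spam_share)

print(crawl_priority(0.8, 0.10, 0.20))  # 0.64: a clean-ish site keeps its budget
print(crawl_priority(0.8, 0.95, 0.20))  # 0.0: "no pages in Bigdaddy" territory
```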
Query Processing
While I mentioned above that Yahoo! seemed to have a bit of a bias toward commercial search results it is also worth noting that Google's organic search results are heavily biased toward informational websites and web pages.
Google is much better than Yahoo! or MSN at determining the true intent of a query and trying to match that instead of doing direct text matching. Common words like "how to" may be significantly deweighted compared to other terms in the search query that provide a better discrimination value.
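The classic way to express that discrimination value is inverse document frequency: the rarer a term is across the corpus, the more weight it earns in a query. A textbook sketch on a toy corpus:

```python
# Textbook inverse document frequency: common query words earn almost
# no weight, while rare specific terms dominate the relevancy score.
import math

corpus = [
    "how to bake bread at home",
    "how to fix a flat tire",
    "how to train a puppy",
    "phenylketonuria screening guidelines",
]

def idf(term: str) -> float:
    df = sum(1 for doc in corpus if term in doc.split())
    return math.log(len(corpus) / (1 + df))

for term in ["how", "to", "phenylketonuria"]:
    print(term, round(idf(term), 2))
# "how" and "to" appear almost everywhere, so they score 0.0;
# "phenylketonuria" is rare, so it scores 0.69 and drives the match.
```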
Google and some of the other major search engines may try to answer many common related questions to the concept being searched for. For example, in a given set of search results you may see any of the following:
- a relevant .gov and/or .edu document
- a recent news article about the topic
- a page from a well known directory such as DMOZ or the Yahoo! Directory
- a page from the Wikipedia
- an archived page from an authority site about the topic
- the authoritative document about the history of the field and recent changes
- a smaller hyper focused authority site on the topic
- a PDF report on the topic
- a relevant Amazon, eBay, or shopping comparison page on the topic
- one of the most well branded and well known niche retailers catering to that market
- product manufacturer or wholesaler sites
- a blog post / review from a popular community or blog site about a slightly broader field
Some of the top results may answer specific relevant queries or be hard to beat, while others might be easy to compete with. You just have to think of how and why each result was chosen to be in the top 10 to learn which one you will be competing against and which ones may perhaps fall away over time.
Link Reputation
PageRank is a weighted measure of link popularity, but Google's search algorithms have moved far beyond just looking at PageRank.
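For reference, the textbook power-iteration form of PageRank looks like this on a toy link graph; production systems operate at vastly larger scale, but the underlying recursion, where a page's score is fed by the scores of the pages linking to it, is the same:

```python
# Textbook PageRank via power iteration on a toy link graph.
# Each key is a page; its list holds the pages it links to.

graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def pagerank(graph, damping=0.85, iterations=50):
    pages = list(graph)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in graph.items():
            share = rank[page] / len(outlinks)  # equity split across outlinks
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

for page, score in sorted(pagerank(graph).items(), key=lambda x: -x[1]):
    print(page, round(score, 3))  # "c" collects the most link equity
```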
As mentioned above, gaining an excessive number of low quality links may hurt your ability to get indexed in Google, so stay away from known spammy link exchange hubs and other sources of junk links. I still sometimes get a few junk links, but I make sure that I try to offset any junky link by getting a greater number of good links.
If your site ranks well some garbage automated links will end up linking to you whether you like it or not. Don't worry about those links, just worry about trying to get a few real high quality editorial links.
Google is much better at being able to determine the difference between real editorial citations and low quality, spammy, bought, or artificial links.
When determining link reputation Google (and other engines) may look at
- link age
- rate of link acquisition
- anchor text diversity
- deep link ratio
- link source quality (based on who links to them and who else they link at)
- whether links are editorial citations in real content (or if they are on spammy pages or near other obviously non-editorial links)
- does anybody actually click on the link?
It is generally believed that .edu and .gov links are trusted highly in Google because they are generally harder to influence than the average .com link, but keep in mind that there are some junky .edu links too (I have seen stuff like .edu casino link exchange directories). While the TrustRank research paper had some names from Yahoo! on it, I think it is worth reading the TrustRank research paper (PDF) and the link spam mass estimation paper (PDF), or at least my condensed versions of them here and here, to understand how Google is looking at links.
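The core idea in the TrustRank paper is a biased PageRank: rather than restarting at random pages, the computation restarts only at a hand-vetted seed set, so trust flows outward from known-good sites and decays with distance. A compact sketch of that idea, on a toy graph of my own invention:

```python
# TrustRank-style sketch: PageRank biased toward a trusted seed set,
# so trust attenuates with distance from vetted sites.

graph = {
    "seed_directory": ["good_site_a", "good_site_b"],
    "good_site_a": ["good_site_c"],
    "good_site_b": [],
    "good_site_c": [],
    "spam_hub": ["spam_site"],
    "spam_site": ["spam_hub"],
}
seeds = {"seed_directory"}

def trustrank(graph, seeds, damping=0.85, iterations=50):
    pages = list(graph)
    seed_mass = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_mass)
    for _ in range(iterations):
        # Restart probability goes only to the seed set, not to all pages.
        new_trust = {p: (1.0 - damping) * seed_mass[p] for p in pages}
        for page, outlinks in graph.items():
            for target in outlinks:
                new_trust[target] += damping * trust[page] / len(outlinks)
        trust = new_trust
    return trust

for page, score in sorted(trustrank(graph, seeds).items(), key=lambda x: -x[1]):
    print(page, round(score, 4))
# Trust decays with each hop from the seed; the spam cluster,
# unreachable from any seed, ends up with zero trust.
```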
When getting links for Google it is best to look in virgin lands that have not been combed over heavily by other SEOs. Either get real editorial citations or get citations from quality sites that have not yet been abused by others. Google may strip the ability to pass link authority (even from quality sites) if those sites are known obvious link sellers or other types of link manipulators. Make sure you mix up your anchor text and get some links with semantically related text.
Google likely collects usage data via Google search, Google Analytics, Google AdWords, Google AdSense, Google news, Google accounts, Google notebook, Google calendar, Google talk, Google's feed reader, Google search history annotations, and Gmail. They also created a Firefox browser bookmark sync tool and an anti-phishing tool which is built into Firefox, and have a relationship with Opera (another web browser company). Most likely they can lay some of this data over the top of the link graph to record a corroborating source of the legitimacy of the linkage data. Other search engines may also look at usage data.
Page vs Site
Sites need to earn a certain amount of trust before they can rank for competitive search queries in Google. If you put up a new page on a new site and expect it to rank right away for competitive terms you are probably going to be disappointed.
If you put that exact same content on an old trusted domain and link to it from another page on that domain it can leverage the domain trust to quickly rank and bypass the concept many people call the Google Sandbox.
Many people have been exploiting this algorithmic hole by throwing up spammy subdomains on free hosting sites or other authoritative sites that allow users to sign up for a cheap or free publishing account. This is polluting Google's SERPs pretty badly, so they are going to have to make some major changes on this front pretty soon.
Site Age
Google filed a patent about information retrieval based on historical data which stated many of the things they may look for when determining how much to trust a site. Many of the things I mentioned in the link section above are relevant to the site age related trust (ie: to be well trusted due to site age you need to have at least some link trust score and some age score).
I have seen some old sites with exclusively low quality links rank well in Google based primarily on their site age, but if a site is old AND has powerful links it can go a long way to helping you rank just about any page you write (so long as you write it fairly naturally).
Older trusted sites may also be given a pass on many things that would cause newer lesser trusted sites to be demoted or de-indexed.
The Google Sandbox is a concept many SEOs mention frequently. The idea of the 'box is that new sites that should be relevant struggle to rank for some queries they would be expected to rank for. While some people have dismissed the existence of the sandbox as garbage, Google's Matt Cutts said in an interview that they did not intentionally create the sandbox effect, but that it was created as a side effect of their algorithms:
"I think a lot of what's perceived as the sandbox is artefacts where, in our indexing, some data may take longer to be computed than other data."
You can listen to the full Matt Cutts audio interviews here and here.
Paid Search
Google AdWords factors in max bid price and clickthrough rate into their ad algorithm. In addition they automate reviewing landing page quality to use that as another factor in their ad relevancy algorithm to reduce the amount of arbitrage and other noisy signals in the AdWords program.
The Google AdSense program is an extension of Google AdWords which offers a vast ad network across many content websites that distribute contextually relevant Google ads. These ads are sold on a cost per click or flat rate CPM basis.
Editorial
Google is known to be far more aggressive with their filters and algorithms than the other search engines are. They are known to throw the baby out with the bath water quite often. They flat out despise relevancy manipulation, and have shown they are willing to trade some short term relevancy if it guides people along toward making higher quality content.
Short term if your site is filtered out of the results during an update it may be worth looking into common footprints of sites that were hurt in that update, but it is probably not worth changing your site structure and content format over one update if you are creating true value add content that is aimed at your customer base. Sometimes Google goes too far with their filters and then adjusts them back.
Google published their official webmaster guidelines and their thoughts on SEO. Matt Cutts is also known to publish SEO tips on his personal blog. Keep in mind that Matt's job as Google's search quality leader may bias his perspective a bit.
A site by the name of Search Bistro uncovered a couple internal Google documents which have been used to teach remote quality raters what to look for when evaluating search quality since at least 2003:
- Google Spam Recognition Guide for Raters (doc) - discusses the types of sites Google considers spam. Generally sites which do not add any direct value to the search or commerce experience.
- General Guidelines on Random-Query Evaluation (PDF) - shows how sites can be classified based on their value, from vital to useful to relevant to not relevant to off topic to offensive
These raters may be used to
- help train the search algorithms,
- flag low quality sites for internal reviews, or
- manually review suspected spam sites
If Google bans or penalizes your site due to an automated filter and it is your first infraction, the site can usually return to the index within about 60 days of you fixing the problem. If Google manually bans your site you have to clean it up and plead your case to get reincluded. To do so, their webmaster guidelines state that you have to click a request reinclusion link from within the Google Sitemaps program.
Google Sitemaps gives you a bit of useful information from Google about what keywords your site is ranking for and which keywords people are clicking on your listing.
Social Aspects
Google allows people to write notes about different websites they visit using Google Notebook. Google also allows you to mark and share your favorite feeds and posts. Google also lets you flavorize search boxes on your site to be biased towards the topics your website covers.
Google is not as entrenched in the social aspects of search as Yahoo! is, but Google seems to throw out many more small tests hoping that one will perhaps stick. They are trying to make software more collaborative and trying to get people to share things like spreadsheets and calendars, while also integrating chat into email. If they can create a framework where things mesh well they may be able to gain further marketshare by offering free productivity tools.
Google SEO Tools
- Google Sitemaps - helps you determine if Google is having problems indexing your site.
- AdWords Keyword Tool - shows keywords related to an entered keyword, web page, or web site
- AdWords Traffic Estimator - estimates the bid price required to rank #1 in roughly 85% of Google AdWords ad displays for a keyword, and how much traffic an AdWords ad would drive
- Google Suggest - auto completes search queries based on the most common searches starting with the characters or words you have entered
- Google Trends - shows multi-year search trends
- Google Sets - creates semantically related keyword sets based on keyword(s) you enter
- Google Zeitgeist - shows quickly rising and falling search queries
- Google related sites - shows sites that Google thinks are related to your site (related:www.site.com)
- Google related word search - shows terms semantically related to a keyword (~term -term)
Business Perspectives
Google has the largest search distribution, the largest ad network, and by far the most efficient search ad auction. They have aggressively extended their brand and amazing search distribution network through partnerships with small web publishers, traditional media companies, portals like AOL, computer and other hardware manufacturers such as Dell, and popular web browsers such as Firefox and Opera.
I think Google's biggest strength is also their biggest weakness. With some aspects of business they are exceptionally idealistic. While that may provide them an amazingly cheap marketing vehicle for spreading their messages and core beliefs it could also be part of what unravels Google.
As they throw out bits of their relevancy in an attempt to keep their algorithm hard to manipulate they create holes where competing search businesses can become more efficient.
In the real world there are celebrity endorsements. Google's idealistic hatred of bought links and other things which act like online celebrity endorsements may leave holes in their algorithms, business model, and business philosophy that allow a competitor to sneak in and grab a large segment of the market by making the celebrity endorsement factor part of the way businesses are marketed.
Search Marketing Perspective
If you are new to a market and are trying to compete for generic competitive terms it can take a year or more to rank well in Google. Buying older established sites with aged trusted quality citations might also be a good way to enter competitive marketplaces.
If you have better products than the competition, are a strong viral marketer, or can afford to combine your SEO efforts with traditional marketing it is much easier to get natural citations than if you try to force your way into the index.
Creating a small site with high quality unique content and focusing on getting a few exceptionally high quality links can help a new site rank quickly. In the past I believed that a link was a link and that there was just about no such thing as a bad link, but Google has changed that significantly over the last few years. With Google sometimes less is more.
At this point, links that seem relatively expensive at first glance compared to cheaper alternatives (like the $299 a year Yahoo! Directory listing) can sometimes be a great buy: owners of the most spammy sites would not want their sites manually reviewed by any of the major search companies, so both Yahoo! and Google are likely to place more than average weight on a Yahoo! Directory listing.
Also getting a few citations from high quality relevant related resources can go a long way to improving your overall Google search relevancy.
Right now I think Google is doing a junky job with some of their search relevancy, by placing too much trust on older domains and favoring pages that have only one or few occurrences of certain modifiers on their pages. In doing this they are ranking many cloaked pages for terms other than the terms they are targeting, and I have seen many instances of things like Google ranking real content home mortgage pages for student loan searches, largely because student loans was in the global site navigation on the home mortgage page.
Learn More
- Google on SEO
- Google's Webmaster Guidelines
- Google Spam Recognition Guide for Raters (doc)
- General Guidelines on Random-Query Evaluation (PDF)
- Google Blog
- Google AdWords
- Google AdWords Blog
- Google AdSense
- Google AdSense Blog
- Google Sitemaps
- Google Sitemaps Blog
- Papers Written by Googlers
- patent about information retrieval based on historical data
Worker Blogs
- Matt Cutts - Matt is an amazingly friendly and absurdly accessible guy given his position as the head of Google's search quality team.
- Adam Lasnik - a sharded version of Matt Cutts. A Cuttlet, if you will.
Ask
Ask was originally created as Ask Jeeves, and was founded by Garrett Gruener and David Warthen in 1996 and launched in April of 1997. It was a natural language query processing engine that used editors to match common search queries, and backfilled the search results via a meta search engine that searched other popular engines.
As the web scaled and other search technologies improved Ask Jeeves tried using other technologies, such as Direct Hit (which roughly based popularity on page views until it was spammed to death), and then in 2001 they acquired Teoma, which is the core search technology they still use today. In March of 2005 InterActive Corp. announced they were buying Ask Jeeves, and by March of 2006 they dumped Jeeves, changing the brand to Ask.
On Page Content
For topics where there is a large community Ask is good at matching concepts and authoritative sources. Where those communities do not exist Ask relies a bit much on the on page content and is pretty susceptible to repetitive keyword dense search spam.
Crawling
Ask is generally slower at crawling new pages and sites than the other major engines are. They also own Bloglines, which gives them incentive to quickly index popular blog content and other rapidly updated content channels.
Query Processing
I believe Ask has a heavy bias toward topical authority sites independent of anchor text or on the page content. This has a large effect on the result set they provide for any query, in that it creates a result set that is more conceptually and community oriented than keyword oriented.
Link Reputation
Ask is focused on topical communities using a concept they call Subject-Specific Popularity℠. This means that if you are entering a saturated or hyper-saturated field, Ask will generally be one of the slowest engines to rank your site, since they will only trust it after many topical authorities have shown they trust it by citing it. Due to their heavy bias toward topical communities, for generic searches they seem to weigh the number of quality related citations you have far more heavily than anchor text. For queries where there is not much of a topical community their relevancy algorithms are nowhere near as sharp.
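Ask's approach descends from Kleinberg's hubs-and-authorities work (linked in the Learn More section below): within a query-specific community, good hubs point at good authorities, and the two scores reinforce each other iteratively. A minimal sketch of that mutual reinforcement, on an invented topical community:

```python
# Minimal hubs-and-authorities (HITS) sketch on a tiny topical community.

graph = {
    "resource_list": ["expert_a", "expert_b", "expert_c"],
    "topical_blog": ["expert_a", "expert_b"],
    "expert_a": ["expert_b"],
    "expert_b": [],
    "expert_c": [],
}

def hits(graph, iterations=30):
    pages = list(graph)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores pointing at it.
        for p in pages:
            auth[p] = sum(hub[q] for q in pages if p in graph[q])
        # A page's hub score is the sum of the authorities it points at.
        for p in pages:
            hub[p] = sum(auth[q] for q in graph[p])
        # Normalize so scores stay bounded across iterations.
        a_norm = sum(v * v for v in auth.values()) ** 0.5
        h_norm = sum(v * v for v in hub.values()) ** 0.5
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

hub, auth = hits(graph)
print(max(auth, key=auth.get))  # expert_b: the community's top authority
print(max(hub, key=hub.get))    # resource_list: the community's top hub
```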
Page vs Site
Pages on a well referenced trusted site tend to rank better than one would expect. For example, I saw some spammy press releases on a popular press release site ranking well for some generic SEO related queries. Presumably many companies link to some of their press release pages and this perhaps helps those types of sites be seen as community hubs.
Site Age
Directly I do not believe it is much of a factor. Indirectly I believe it is important in that it usually takes some finite amount of time to become a site that is approved by your topical peers.
Paid Search
Ask gets most of their paid search ads from Google AdWords. Some ad buyers in verticals where Ask users convert well may also want to buy ads directly from Ask. Ask will only place their internal ads above the Google AdWords ads if they feel the internal ads will bring in more revenue.
Editorial
Ask heavily relies upon the topical communities and industry experts to in essence be the editors of their search results. They give an overview of their ExpertRank technology on their web search FAQ page. While they have such limited distribution that few people talk about their search spam policies they reference a customer feedback form on their editorial guidelines page.
Social Aspects
Ask is a true underdog in the search space. While they offer Bloglines and many of the save a search personalization type features that many other search companies offer they do not have the critical mass of users that some of the other major search companies have.
Ask SEO Tools
Ask search results show related search phrases in the right hand column. Due to the nature of their algorithms Ask is generally not good at offering link citation searches, but recently their Bloglines service has allowed you to look for blog citations by authority, date, or relevance.
Business Perspectives
Ask is owned by InterActive Corp. While Ask is considered to be far behind in the running for search volume, Barry Diller, their CEO, has made a large comeback in the television space in the past.
InterActive Corp. has some of the strongest brands in expensive niches such as online dating, loans, and travel. If they sell ad inventory space in some of those top tier markets they can have a significant effect on the search landscape in many markets. They also push the Ask.com search box on many of those well branded sites.
Ask's two biggest weak points are
- their limited distribution
- their heavy reliance on Google for providing their sponsored links
Search Marketing Perspective
Ask generally has such limited market share that I have never really worried much about them. If I was in a vertical where they drove significant traffic I might pay a bit more attention to them. If you are in one of the verticals where they have a strong brand it will be worth it to watch how they bias their search results and / or search ads toward their other internal properties, and how strongly they push their search offering on those properties.
In areas where there is a limited relevant community to build their topical community around, their relevancy drops sharply.
Learn More about Ask
- Ask.com Editorial Guidelines
- Ask.com Web Search FAQ page
- Official Ask Jeeves Blog
- Jon Kleinberg's home page - has links to Kleinberg's papers on hubs and authorities. Jon's research was a large part of the foundation that led to Ask's current search algorithm.
- Mike Grehan's Topic Distillation PDF - PDF about hubs & authorities
Other Search Systems
Classic large scale hypertextual search engines are only one type of an information retrieval and organization system. There are many other types of search that we do not think of as being search.
Example Vertical Search Engines
General large scale web search is just one type of search. There are many other types of search engines and information organization tools, for example
- the Yellow Pages
- Television program guides
- directories like DMOZ, the Yahoo! Directory, LII, or specialty directories
- encyclopedia type sites like Wikipedia
- large general structured databases like Google Base
- shopping search like Froogle
- local search like Google Local
- news search like Yahoo! News
- blog search like Technorati
- tag search like Del.icio.us
- video search like YouTube
- photo search like Flickr
- meme trackers like Techmeme
- social communities like Digg
- social networks like MySpace
- some people may also rely on an individual content channel or a group of them to find the most interesting things and deliver them through daily updated content streams
Limitations of Vertical Search
Vertical search services typically
- have exceptionally limited scale because they run everything through central editors, or
- have less editorial control than large scale search engines.
Vertical search is limited because
- there are fewer available signs of quality they can measure in their algorithms, and
- many of them use how recent something is as one of their sorting signals, and
- they typically have less content available (due to content copyright restrictions or limited available content)
Ways to Organize Vertical Search
Thus vertical search services have to rely on trusting vetted content partners or they must heavily rely on things like usage data. Some vertical services are even based around displaying the most popular items (like Digg) or most frequently viewed items (like Popular Google Videos or Google Video Movers & Shakers).
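A common pattern for combining popularity with recency is to decay raw votes or views by age, so fresh items with momentum surface quickly. This generic sketch uses arbitrary constants and is not any particular site's formula:

```python
# Generic popularity-decayed-by-age sketch, the sort of sorting signal
# a Digg-style vertical might use. Constants are arbitrary.

items = [
    {"title": "old classic", "votes": 900, "age_hours": 240},
    {"title": "rising story", "votes": 120, "age_hours": 3},
    {"title": "steady item", "votes": 300, "age_hours": 48},
]

def hotness(item, gravity=1.5):
    """Votes discounted by age: newer items need fewer votes to rank."""
    return item["votes"] / (item["age_hours"] + 2) ** gravity

for item in sorted(items, key=hotness, reverse=True):
    print(item["title"], round(hotness(item), 3))
# The young "rising story" outranks items with far more total votes.
```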
Many search systems are thin me-too aggregators, but by limiting their data sources, structuring their sources, or providing a different means of search, many of these search systems are more useful than general global search systems.
Vertical Folds Into General Search
Danny Sullivan wrote a piece called Searching With Invisible Tabs which talks about how Google will fold vertical search into their global search product.
From a marketing perspective the things you have to remember are
- some vertical search services are harder to get into due to editorial vetting processes
- some vertical search services are easy to dominate due to limited competition
- vertical search is being folded into many global search products
Search Engines as Efficient Media Companies
Google, Microsoft, and Yahoo! are all trying to increase ad network efficiency and extend current ad networks to place more relevant ads in current ad systems and to increase the accessibility of ads to smaller merchants via automation and hyper-targeting.
- Google is launching a payment processor which will help them get better conversion statistics. That will allow them to improve ad targeting and increase ad automation.
- Google is trying to roll out free WiFi services so they can track mobile search ads.
- Companies like Olive Software will make it easier for offline content to come online, and make it easy to cite those works like regular web pages. After a few media companies find a way to make sense of the search model many others will likely trickle across.
- Yahoo! bought Flickr, Del.icio.us, Upcoming, and many other vertical search companies.
- Microsoft bought Massive, Inc., a company which sells targeted ads in video games.
- Google paid roughly 1 billion dollars to buy dMarc Broadcasting, a radio advertising automation network.
- Google is trying to sell ads in print media.
- Google is doing research on passive automation of social web integration into the television experience (PDF).
The search battle is in general a battle for near-perfect market data, which can be leveraged in a near infinite number of ways, with each additional vertical or efficiency lending the network a potentially larger scale AND an even greater network efficiency.
Due to brand limitations and limited market size many vertical search services will remain unscathed by global search powerhouses, but as search improves in efficiency the major search engines will swallow many additional markets and make them more efficient.
Thanks to...
- Danny Sullivan and all the other people that were easy to reference while writing this.
- Google, for being evil and divergent enough with their algorithm to make such an article a worthwhile endeavor.
- You, for reading that much. :)