The Google Crawling Sandbox

With Matt Cutts's recent post about the changing quality signals needed to get indexed in Google, and with sites that have excessive low quality links getting crawled more shallowly (and some of them not getting crawled at all), some people are comparing Google's tightening crawl standards to an early version of the way the Google Sandbox prevents new or untrusted sites from ranking. WebmasterWorld has a roughly 7 page thread about BigDaddy, where Graywolf said:

I'm personally of the opinion that we're starting to see the 'sandbox of crawling'

What is the Optimal Site Size?

Some people in the thread are asking what the optimal site size is for crawling, or whether they should change their internal navigation to accommodate the new Google, but I think that to some extent misses the mark.

If you completely change your site structure away from being usable just to do things that might appease a Google in a state of flux, you are missing the real message they are trying to send. If you rely too heavily on Google you might find they are in a constant state of being broken, at least from your perspective ;)

Site size should depend largely on:

  • how much unique content you can create around the topic
  • how well you can coax others into wanting to create unique topical content for you
  • how people shop
  • how people search for information
  • how much brand strength you have (a smaller site may make it easier to build a stronger niche-specific brand, and in most cases less content of higher quality is far more remarkable than lots of junk content)

It is often better to have smaller sites so that you can focus the branding message. When you look at some of the mega sites, like eBay, they are exceptionally weak on deep links, but they also have enough authority, mindshare, and quality link reputation that they are still well represented in Google.

Scaling Out Link Quality and Unique Content

Another big issue with crawl depth is not just link quality, but also how unique the content is on a per-page level. I was recently asked how much link popularity would be needed to get a 100,000,000 page site with cross referenced locations and categories indexed. My response was that I didn't think they could create that much content AND have it unique enough to keep it all indexed AND build enough linkage data to make Google want to index it all.
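For a rough sense of the scale involved, here is a back-of-the-envelope sketch (my own illustration in Python, using hypothetical facet counts rather than figures from that site) of how quickly cross-referencing locations against categories multiplies into pages that would each still need unique content:

    # Hypothetical facet sizes -- not figures from the site in question.
    locations = 10_000    # e.g. cities or postal areas
    categories = 10_000   # e.g. product or service categories

    pages = locations * categories
    print(f"Cross-referenced pages: {pages:,}")        # 100,000,000

    # If each page needed, say, ~250 words of genuinely unique copy to be
    # worth indexing, the writing burden alone would be staggering:
    unique_words = pages * 250
    print(f"Unique words required: {unique_words:,}")  # 25,000,000,000

Even before you get to link popularity, the content requirement alone makes a site that large very hard to justify.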

Sometimes less is more.

The same goes for links. If you go too hard after acquiring links the sandbox is a very real phenomenon. If you get real editorial citations and / or go for fewer, higher quality links you will probably end up ranking in Google more quickly.

While it may help to be selective about how many links you build (and what sources you are willing to get links from), there is also great value in being selective about whom you are willing to link to AND in linking out to many quality resources that would be hard to make look spammy. Rand recently posted:

From a trustworthy source - Googlebowling is totally possible, but you need to use patterns that would show that the site has "participated" in the program. What does that mean? Check who they link to - see if you can make the same spammy links point to those places and watch for link spam schemes that aren't in the business of pointing to people who don't pay them.

So if you make your site an island, or only partner with other sources that would be easy to take out, you limit your stability.

What Makes a Site More Stable?

The big sites that will have 100,000 pages stick in the SERPs are real brands and/or sites that offer added-value features. Can individuals create sites at that scale that will still stick? I think they can, but there has to be a comment-worthy element to them: they have to find a way to leverage and structure data, be remarkable, and / or have an architecture for social participation and content generation.

The Net Cost & Value of Large Algorithmic Swings

Some people say that wild search algorithm swings are not a big deal since for every person losing someone else must gain, so the net effect does not drive people toward paid ads, but I do not buy that.

If your sites are thin spam sites and you have limited real costs, the algorithmic swings might not be a big deal, but when businesses grow quickly or have their income drop sharply it affects their profitability, both as they scale up and as they scale down. You also have to factor in the cost of monitoring site rankings and building links.

At the very least, the ability to turn traffic flows on or off (or at least finely adjust them) makes PPC ads an appealing supplement for real businesses with real employees and real business costs. Dan Thies mentioned that he likes PPC ads largely for this reason when I interviewed him about a year ago.

As Google makes it harder to spam and catches spam more quickly, eventually the opportunity cost of spamming or running cheesy, no-value-add thin sites will exceed the potential profit most people could attain.

Authority Systems Influence the Networks They Measure

Some people are looking to create systems that measure influence, arguing that as attention grows scarcer it will increase in value:

Attention is what people produce (as in "hand over the money" or "look at this ad") in exchange for information and experience. As Lanham writes in The Economics of Attention, the most successful artists and companies are the ones that grab attention and shape it, in other words, that exercise influence. With so much information, simply paying attention is the equivalent of consuming a meal or a tube of toothpaste.

Any system that measures influence also exerts influence on the market it measures. A retail site in an under-marketed industry that randomly winds up on the Delicious popular list or on Memeorandum one day will likely outdistance competitors that never get that exposure.

Search has a self-reinforcing aspect to it as your links build up. A site with a strong history of top rankings gets links that other sites won't get. Each additional link is a revalidation of quality. The people at Google realize that they have a profound effect on how the web grows. Now that they have enough content to establish a baseline in most commercial markets they can be more selective about what they are willing to crawl and rank. And they are cleaning up the noise in their ad market as well.
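To make that compounding effect concrete, here is a toy "rich get richer" simulation (my own sketch, not any actual ranking model): each new link is assumed to go to a site in proportion to the links it already has, so an early lead keeps widening.

    import random

    random.seed(42)

    # Hypothetical starting points: an established site versus a newcomer.
    links = {"established": 50, "newcomer": 1}
    baseline = 1  # small chance of being discovered regardless of existing links

    for _ in range(1000):
        sites = list(links)
        weights = [links[s] + baseline for s in sites]
        winner = random.choices(sites, weights=weights, k=1)[0]
        links[winner] += 1

    print(links)  # the established site ends up with the vast majority of new links

Under those assumptions the gap only widens over time, which is roughly the dynamic described above: each top ranking attracts links that then help defend the ranking.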

The Infinite Web, Almost

Some people view the web as an infinite space, but as mentioned above, there is going to be a limit to how much attention and mindshare anything can have.

The Tragedy of the Commons is a must-read for anyone who earns a living by spreading messages online, especially if you believe the web to be infinite. While storage and access are approaching free, eventually there is going to be a flood of traditional media content online. When it is easy to link to individual pages or chapters of books, the level of quality needed to compete is going to increase drastically in many markets.

So you can do some of the things that Graywolf suggested to help make your site Google friendly in the short term, but the whole point of these sorts of changes at Google is to find and return legitimately useful content. The less your site needs to rely on Google the more Google will be willing to rely on your site.

If you just try to fit where Google is today, expect to get punched in the head at least once a year. If you create things that people are likely to cite or share, you should be future friendly.

Published: June 5, 2006 by Aaron Wall in seo tips

Comments

Kyle
June 5, 2006 - 9:58pm

So this is what you've been up to the last few post-less days? It was worth the wait, great post!

June 5, 2006 - 10:16pm

Wow... you've given me a lot to think about with this one – most of my sites are doing quite well in the SERPs, but one in particular got hit badly by Big Daddy, and I think I'm starting to understand why.

"The less your site needs to rely on Google the more Google will be willing to rely on your site."

That's one quote I need to print out and stick to the wall above my desk... perhaps forgetting about google is the best way to win their approval :)

June 5, 2006 - 11:01pm

The less your site needs to rely on Google the more Google will be willing to rely on your site.

It's like Google Zen ;)

June 5, 2006 - 11:47pm

As usual, excellent post. I think you are damn right about the last part -- if you want to stick around in this game you better be producing content that really has value. The days of those buying crappy re-written articles or pounding out BS content all night are numbered.

The good news is that old media types are still very fussy about copyright and where their work is reproduced. So, while some are making a killing online (AskTheBuilder.com), most are still missing the party due to rigid beliefs.

I think SEO is becoming more about reality and less about all the little technical details. In a few years the really successful "blackhats" will be better social engineers than programming wizards; but the truth is some are at that point already.

June 6, 2006 - 10:45am

It does seem harder to get out of the sandbox these days. However, this should definitely increase the quality of indexed websites.

June 7, 2006 - 1:04am

Yes, the quote "The less your site needs to rely on Google the more Google will be willing to rely on your site." is true when you are trying to optimize page content, as that should be natural and usable. Still, you can't expect to do NOTHING and get anywhere in Google, as you still need/should practice SEO to the extent of acquiring more links, submitting to directories, etc.
Obviously some sites are exceptions, as they may have never practiced SEO and still rank in Google, but most sites, such as SEOBook.com, I'm sure have put effort into gaining links etc.

To sum it up:
Content = be natural and don't try too hard, and Google may just end up liking your site better.
Links = usually it requires at least some initial effort in practicing SEO to rank in Google, although as Aaron mentioned in his ebook, once you rank really high like SEOBook.com does, gaining more PageRank and better placement depends more on the content of your site and less on you actually needing to practice SEO.

Graham Smith
June 7, 2006 - 6:27am

I have been using Google to keep track of the site www.workwise.org.uk, by searching for "site:www.workwise.org.uk updated". Just lately this has stopped working as well as it used to (only 4 results now, where there used to be about 17). If Google doesn't index this site, is there any other search engine I can use instead?

June 7, 2006 - 6:49am

Yahoo! and MSN both offer site: searches, and Yahoo! also offers a product called Site Explorer which allows you to surf through the pieces of your site structure that are indexed in Yahoo!

nickg
May 27, 2008 - 4:53pm

If you are interested in seeing the Google Sandbox in action, please take a look at the following:

webmaster-diary.griddler.co.uk/Apr2008.aspx

...you can see a graph of visitor numbers getting to a site from Google search from the inception of a site, through indexing of the first pages, through the google sandbox, and out the other side.

No black hat SEO was used, and the sandbox effect lasted about 1 month.

I hope you find this useful

Nick

eljefe
February 20, 2009 - 9:11pm

The sandbox is getting harder and harder to get out of. From my own experience I now think it takes 9 months to get out, where before it was 3 months. I don't believe the sandbox is different just because of niches; I think it affects all sites equally.
