With Matt Cutts's recent post about the changing quality signals needed to get indexed in Google, and with sites carrying excessive low quality links getting crawled more shallowly (and some of them not getting crawled at all), some people are comparing Google's rising crawling standards to an early version of the Google Sandbox, which prevents new or untrusted sites from ranking. WebmasterWorld has a roughly 7 page thread about BigDaddy, where Graywolf said:
I'm personally of the opinion that we're starting to see the 'sandbox of crawling'
What is the Optimal Site Size?
Some people in the thread are asking about the optimal site size for crawling, or whether they should change their internal navigation to accommodate the new Google, but to some extent I think that misses the mark.
If you completely change your site structure away from being usable just to do things that might appease a Google in a state of flux, you are missing the real message they are trying to send. If you rely too heavily on Google, you might find they are in a constant state of being broken, at least from your perspective ;)
The site size should depend largely on:
- how much unique content you can create around the topic
- how well you can coax others into wanting to create unique topical content for you
- how people shop
- how people search for information
- how much brand strength you have (a smaller site may make it easier to build a strong niche specific brand, and in most cases a smaller amount of higher quality content is far more remarkable than lots of junk content)
Many times it is better to have smaller sites so that you can focus the branding message. When you look at some of the mega sites, like eBay, they are exceptionally weak on deep links, but they also have enough authority, mindshare, and quality link reputation that they are still well represented in Google.
Scaling Out Link Quality and Unique Content
Another big issue with crawl depth is not only link quality, but also how unique the content is on a per page level. I was recently asked about how much link popularity was needed to index a 100,000,000 page site with cross referenced locations and categories. My response was that I didn't think they could create that much content AND have it unique enough to keep it all indexed AND build enough linkage data to make Google want to index it all.
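To put a rough number on why that struck me as unrealistic, here is a quick back-of-the-envelope sketch. The category and location counts are purely hypothetical, picked only to show how fast cross-referencing multiplies pages while the unique content per page stays close to zero:

```python
# Hypothetical illustration of a cross-referenced location x category site.
# The counts below are assumptions chosen to show the scale, not real figures.

categories = 2_000     # assumed number of categories
locations = 50_000     # assumed number of locations (cities, zip codes, etc.)

# Every category/location pairing becomes its own URL.
generated_pages = categories * locations
print(f"{generated_pages:,} generated pages")   # 100,000,000 generated pages

# Each page usually differs from its neighbors only by the two labels dropped
# into a shared template, so most of those pages read as near-duplicates --
# which is why keeping them all indexed takes far more unique content and
# linkage data than most sites can realistically build.
```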
Sometimes less is more.
The same goes for links too. If you push too hard at acquiring links, the sandbox is a very real phenomenon. If you earn real editorial citations and/or go for fewer, higher quality links, you will probably end up ranking more quickly in Google.
While it may help to be selective with how many links you build (and what sources you are willing to get links from), there is also great value in being selective about who you are willing to link to, AND in linking out to many quality resources that would be hard to make look spammy. Rand recently posted:
From a trustworthy source - Googlebowling is totally possible, but you need to use patterns that would show that the site has "participated" in the program. What does that mean? Check who they link to - see if you can make the same spammy links point to those places and watch for link spam schemes that aren't in the business of pointing to people who don't pay them.
So if you make your site an island, or only partner with other sources that would be easy to take out, you limit your own stability.
What Makes a Site More Stable?
The big sites that will have 100,000 pages stick in the SERPs are real brands and/or sites that offer added value features. Can individuals create sites at that scale that will still stick? I think they can, but there has to be a comment worthy element to them: they have to find a way to leverage and structure data, be remarkable, and/or have an architecture for social participation and content generation.
The Net Cost & Value of Large Algorithmic Swings
Some people say that wild search algorithm swings are not a big deal since for every person who loses, someone else must gain, so the net effect does not drive people toward paid ads, but I do not buy that.
If your sites are thin spam sites and you have limited real costs, the algorithmic swings might not be a big deal, but when businesses grow quickly or have their income drop sharply it affects their profitability, both as they scale up and as they scale down. You also have to factor in the cost of monitoring rankings and building links.
At the very least, the ability to turn on or turn off traffic flows (or at least finely adjust them) makes PPC ads an appealing supplement to real businesses with real employees and real business costs. Dan Thies mentioned his liking of PPC ads largely for this reason when I interviewed him about a year ago.
As Google makes it harder to spam and catches spam more quickly, eventually the opportunity cost of spamming or running cheesy, no-value-add thin sites will exceed the potential profit most people could attain.
Authority Systems Influence the Networks They Measure
Some people are looking to create systems that measure influence, arguing that as attention grows scarcer it will increase in value:
Attention is what people produce (as in "hand over the money" or "look at this ad") in exchange for information and experience. As Lanham writes in The Economics of Attention, the most successful artists and companies are the ones that grab attention and shape it, in other words, that exercise influence. With so much information, simply paying attention is the equivalent of consuming a meal or a tube of toothpaste.
Any system that measures influence also exerts an influence on the market it measures. A retail site in an under-marketed industry that randomly winds up on the Delicious popular list or Memeorandum one day will likely outdistance competitors that do not.
Search has a self reinforcing aspect to it as your links build up. A site with a strong history of top rankings gets links that other sites won't get, and each additional link is a revalidation of quality. The people at Google realize that they have a profound effect on how the web grows. Now that they have enough content to establish a baseline in most commercial markets, they can be more selective about what they are willing to crawl and rank. And they are cleaning up the noise in their ad market as well.
The Infinite Web, Almost
Some people view the web as an infinite space, but as mentioned above, there is going to be a limit to how much attention and mindshare anything can have.
The Tragedy of the Commons is a must read for anyone who earns a living by spreading messages online, especially if you believe the web to be infinite. While storage and access are approaching free, eventually there is going to be a flood of traditional media content online. When it is easy to link to pages or chapters of books, the level of quality needed to compete is going to increase drastically in many markets.
So you can do some of the things Graywolf suggested to help make your site Google friendly in the short term, but the whole point of these sorts of changes at Google is to find and return legitimately useful content. The less your site needs to rely on Google, the more Google will be willing to rely on your site.
If you just try to fit where Google is at today, expect to get punched in the head at least once a year. If you create things that people are likely to cite or share, you should be future friendly.