It is no secret that in the past Rand and I have had some minor differences of opinion (mainly on outing). ;)
But in spite of those, there is no denying that he is an astute marketer. So I thought it would be fun to ask him about his background in SEO and to articulate his take on where some of our opinions differ. Interestingly, it turns out we share far more views than I thought! Hope you enjoy the interview. :)
Throughout your history in the SEO field, what are some of your biggest personal achievements?
The first one would have to be digging myself (and my Mom) out of bankruptcy when we were still a small, sole proprietorship. Since then, there have been a lot of amazing times:
- The first time I spoke at a conference (SES Toronto in 2004)
- Transitioning from a consulting to a software business
- Taking venture capital
- Building a team (not just making hires)
- Having dinner with the UN Secretary General (Ban Ki-moon) and presenting to their CTO on SEO - it was amazing to hear stories about how people in conflict-ridden parts of the world used search to find safe havens, escape and transmit information, and about the UN's missed opportunities around SEO. I'd never really thought of our profession as having life-or-death consequences until then.
- Making the Inc 500 list for Fastest Growing Companies in the US (during a nasty recession)
- Probably my biggest personal achievement, though, is my relationship with my wife. I know that no matter what happens to me in any other part of my life, I have her support and love forever. That gets a guy like me through a lot of tough times.
My wife and I in San Francisco (via her blog)
What are the biggest counter-intuitive things you have learned in SEO (e.g. things that theoretically shouldn't work, but wow, they do - or the opposite, things that should work but don't)?
The most obvious one I think about regularly is that the "best content rarely wins." The content that best leverages (intentionally or not) the system's pulleys and levers will rise up much faster than the material the search engines "intended" to rank first.
Another big one is the success of very aggressive sales tactics and very negative, hateful content and personalities. Perhaps because of the way I grew up or my perspective on the world, I always thought of those things as impediments to financial success, but that's not really the case. They do, however, seem to have a low correlation with self-satisfaction and happiness, and I suppose, for the people/organizations with those issues, that's even worse.
A very specific, technical tactic that I'm always surprised to see work is the placement of very obvious paid text links. We realized a few months back that with Linkscape's index, we could ID 90%+ of paid link spam with a fairly simple process:
- Grab the top 10K or 100K most monetizable query terms/phrases (via something like a "top AdSense payout" list)
- Find any page on the web that contains 2+ external anchor text links pointing to separate websites (e.g. Page A has a link that says "office supplies" linking to 123.com and another link that says "student credit card" linking to 456.com)
- Remove the value passed by those links in any link metric calculation (which won't hurt the relevancy/ranking of any pages, but will remove the effects of nearly all paid links)
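Sketched in code, that heuristic might look something like this - a minimal sketch, with hypothetical data structures (a production version would of course run over Linkscape's actual index):

```python
# Hypothetical sketch of the paid-link heuristic described above.
# Assumes `pages` is an iterable of (page_url, links) where links is a
# list of (anchor_text, target_domain) tuples, and `monetizable` is a
# set of high-payout phrases (e.g. from a "top AdSense payout" list).

def flag_likely_paid_link_pages(pages, monetizable, threshold=2):
    """Yield pages whose outbound links look like sold text links."""
    for page_url, links in pages:
        suspicious = {
            (anchor, domain)
            for anchor, domain in links
            if anchor.lower() in monetizable
        }
        # Require 2+ monetizable anchors pointing at *separate* domains,
        # per the "office supplies" / "student credit card" example.
        domains = {domain for _, domain in suspicious}
        if len(suspicious) >= threshold and len(domains) >= threshold:
            yield page_url  # discount these links when computing metrics
```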
We've not done the work to implement this, so perhaps there's some peculiar reason why applying it is harder than we think. But, it strikes me that even if you could only do it for pages with 3 or 4+ links in this fashion, you'd still eliminate a ton of the web's "paid" link graph. The fact that Google clearly hasn't done this makes me think it must not work, but I'm still struggling to understand why.
BTW - I asked some SEOs about making this a metric available through Linkscape/Open Site Explorer (like a "likelihood this page contains paid links" metric) and they all said "don't build it!" so we probably won't in the near term.
One of the big marketing angles you guys tried to push hard on was the concept of transparency. Because of that you got some pretty bad blowback when Linkscape launched (& perhaps on a few other occasions). Do you feel pushing on the transparency angle has helped or hurt you overall?
I think those inside the SEO community often perceive a conflict or tiff internally as having a much broader reach than it really does. I'd agree that folks like you and I, and maybe even a few hundred or even a thousand industry insiders are aware of and take something away from those types of events, but SEOmoz as a software company with thousands of paying subscribers and hundreds of thousands of members seems to be far less impacted than I am personally.
Re: Linkscape controversy - there have been a few - but honestly, the worst reputation/brand problems we ever had have always been with regards to personal issues or disputes (a comment on someone's blog or something we wrote or allowed to be published on YOUmoz). I don't have a good explanation for why they crop up, but I can say that they seem to have a nearly predictable pattern at this point (I'm sure you recognize this as well - think I've seen you write fairly eloquently on the subject). That does make it easier to handle - it's the unpredictable that's scary.
We certainly maintain transparency as a core value and we're always trying to do more to promote it. To me, core value means "things we value more than revenue or profits" and so even if it's had some hard-to-measure, adverse impact, we'd maintain it. We've actually got a poster hanging up in the office that our design team made:
An excerpt from our TAGFEE poster
There's a quote I love on this topic that explains it more eloquently than I can:
"(Our) core values might become a competive advantage, but that is not why we have them. We have them because they define for us what we stand for, and we would hold them even if they became a competitive disadvantage." - Ralph Larson, CEO of Johnson and Johnson
What type of businesses do you think do well with transparency? What type of businesses do you feel do poorly with it?
Hmm... Not something I've tried to apply to every type of business, but my feeling is that nearly every company can benefit from it, though it also exposes you to new risk. Even being the transparency-loving type, I'd probably say that military contractors, patent trolls and sausage manufacturers wouldn't do so well.
How have you been able to manage the transparency angle while having investors?
I thought it would be tougher after taking investment, but they've actually been very supportive in nearly every case (some parts of Linkscape, particularly those relating to our patent filings, being exceptions). I don't know if that would be true had we taken on different backers, but that's why the startup advice to choose your investors like you choose your husband/wife is so wise.
When you took investment money did you mainly just get capital? What other intangibles came with it? How have your investors helped shape your business model?
It certainly made us much more focused on the software model. As you noted, we dropped consulting in 2010 entirely, and we've generally limited any form of non-scalable revenue to help fit with the goals of a VC-backed business. We did gain some great advisors and a lot more respect in many technology and startup circles that would have been tough without the presence of venture funds (although I think that's shifting somewhat given the changes of the past 2-3 years in the startup world).
Have you guys ever considered buying out your investors? Are you worried what might happen to your company if/when it gets sold?
While we'd love to, I doubt that would ever be possible (barring some sort of massive personal windfall outside of SEOmoz). Every dollar we make gets our investors more excited about the future of the company and less likely to want to sell their shares before we reach our full potential. Remember that with VC, the idea is high risk, high reward, so technically, they'd rather we go for broke and fall to pieces than do a mid-size, but profitable deal. Adding $5 or $10 million back to a $300+ million fund is largely useless to a VC, so a bankruptcy while trying to return $50 or $100 million is a very tolerated, sometimes preferable result.
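To put rough (and entirely hypothetical) numbers on that fund math:

```python
# Hypothetical fund math: why a modest exit barely registers for a VC.
fund_size = 300_000_000          # a $300M+ fund
target_multiple = 3              # funds commonly aim to return ~3x
needed = fund_size * target_multiple

modest_exit_return = 10_000_000  # the fund's share of a mid-size sale
print(modest_exit_return / needed)  # ~0.011 -> about 1% of the goal
```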
I wrote about this more in my Venture Capital Process post (where I talked about failing to raise money in summer 2009).
Now that you are already well known & well funded you are taking a fairly low risk strategy to SEO, but if you were brand new to the space & had limited capital would you spam to generate some starting capital? At what point would you consider spamming being a smaller risk than obscurity?
You ask great questions. :-)
While I don't think spam has any moral or ethical problems, I don't know that I'd ever be able to convince myself that spam would be a more worthwhile endeavor than brand building for a white hat property. Overnight successes take years of hard work, and I'd much rather get started as a scrappy, bootstrapping company than build up a reserve with spam dollars and waste that time. However, I certainly don't think that applies to everyone. As you know, I've got lots of friends who've done plenty of shady stuff (probably a lot I don't even want to know about!), but that doesn't mean I respect them any less.
Speaking of low risk SEO, why do you think neither of our sites has hit the #1 slot yet in Google for "seo"? And do you think that ranking would have much business impact?
We've looked at the query in our ranking models and I think it's unlikely we could ever beat out the Wikipedia result, Google or SEO.com (unless GG pulls back on their exact-match domain biasing preference). That said, we should both be overtaking SEOchat.com fairly soon (and some of the spammier results that temporarily pop in and out). Some of our engineers think that more LDA work might help us to better understand these super-competitive queries.
SERPs analysis of "SEO" in Google.com w/ Linkscape Metrics + LDA (click for larger)
In terms of business impact - yeah, I think for either of us it would be quite a boon actually (and I rarely feel that way about any particular single term/phrase). It would really be less the traffic than the associated perception.
As an SEO selling something unique (eg: not selling a commodity that can be found elsewhere & not as an affiliate) I have found word of mouth marketing is a much more effective sales channel than SEO. Do you think the search results are overblown as a concern within the SEO industry? Do you find most of your sales come from word of mouth?
I see where you're coming from, but in our analyses, it's always been a combination of things that leads to a sale. People search and find us, then browse around. Or they hear of us and search for information about us. Then they'll find us through social media or a referring site and maybe they'll sign up for a free account. They'll get a few emails from us, have a look at PRO and go away. Then a couple months later they'll be more serious about SEO and search for a tool or answer and come across us again and finally decide, "OK, these guys are clearly a good choice."
This is what makes last touch attribution so dangerous, but it also speaks to the importance of having a marketing/brand presence across multiple channels. I think you could certainly make the case that many of us in the SEO field see every problem as a nail and our profession as the hammer.
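Here's a toy illustration of the attribution point, with a made-up conversion path:

```python
# Toy comparison of last-touch vs. linear attribution for the multi-visit
# path described above (hypothetical data).
from collections import Counter

conversion_path = ["search", "social", "email", "search", "direct"]

# Last-touch: the final channel gets all the credit.
last_touch = Counter({conversion_path[-1]: 1.0})

# Linear: credit is split evenly across every touchpoint.
linear = Counter()
for channel in conversion_path:
    linear[channel] += 1.0 / len(conversion_path)

print(last_touch)  # direct gets 100% of the credit
print(linear)      # search 40%; social, email, direct 20% each
```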
What business models do you feel search fits well with, and what business models do you feel search is a poor fit for?
I think it's terrific for a business that has content or products they can monetize over the web that also relate to things people are already searching for. It's much less ideal for a product/service/business that's "inventing" something new that's yet to be in demand by a searching population. If you're solving a problem that people already have an identified pain point around, whether that's informational, transactional or entertainment-driven, search is fantastic. If that pain point isn't sharp enough or old enough to have generated an existing search audience, branding, outreach, PR and classic advertising may actually do better to move the needle.
Have you ever told a business that you felt SEO would offer too low of a yield to be worth doing?
Actually yes! I was advising a local startup in Seattle a couple years ago called Gist and told them that SEO couldn't really do much for them until people started realizing the need for social plugins for email and searching for them. This is the case with a lot of startups, I think.
In an interview on Mixergy you mentioned racking up a good bit of debt when you got started in search. If a person is new to the web, when would you recommend they use debt leverage to grow?
Never, if you're smart. Or, at least, never in the quantities I did. The web is so much less costly to build on nowadays and the lean startup movement has produced so many great companies (many of them only small successes, but still profitable) from $10K or less that it just doesn't make sense, especially with the horror that is today's debt market, to go too far down that route. If you can get a low-cost loan from a family member or a startup grant through a government-backed, low interest program, sure, but credit card debt (which is where I started) is really not an option anymore.
How were you able to maintain presence and generally seem so happy publicly when you first got started, even with the stress of that debt?
To be honest, I really just didn't think about it much. If you have $30K in debt, you're constantly thinking about how to pay it off month by month and day by day. When you're $450K in debt with collectors coming after you and your wife paying the rent, you think about how to make a success big enough to pay it all off or declare bankruptcy - might as well go with the former until life runs you into the latter. There's just not much else to do.
As Bob Dylan says - "when you got nothing, you got nothing to lose."
Many people new to the field are afraid to speak publicly, but you were fairly well received right off the start. What prepared you for speaking & what are keys to making a good presentation?
Oh man - I sucked pretty hard my first few presentations. I think everyone does. The only reason I was well received, at least in my opinion, is because I'd already built a following on the web and had a positive reputation that carried over from that. The only thing that really prepared me for big presentations (things like the talk to Google's webspam/search quality team or keynotes at conferences) was lots and lots of experience and for that I'll always be grateful to Danny Sullivan for giving me a shot.
I'd say to others - start small, get as many gigs as you can, use video to help (if you're great on camera, you'll be good in front of a live audience) and try to emulate speakers and presentations you've loved.
When large companies violate Google's guidelines repeatedly usually nothing happens. To cite a random example...I don't know...hmm Mahalo. And yet smaller companies when outed often get crushed due to Google's huge marketshare. Because of the delta between those 2 responses, I believe that outing smaller businesses is generally bogus because it strips freedoms away from individuals while promoting large corporations that foist ugly externalities onto society. Do you disagree with any of that? :D
I think I agree with nearly all of that statement, though I'd still say it's no more "bogus" to out small spammers than it is to spam. I would agree it's not cool that Google applies its standards unfairly, but it's hard to imagine a world where they didn't. If mikeyspaydayloans.info isn't in Google's index, no one thinks worse of Google. If Disney.com isn't in Google (even if they bought every link in the blogosphere), searchers are going to lose faith and switch engines. The sensible response from any player in such an environment is to only violate guidelines if you're big enough to get away with it or diversified enough to not care.
I'm unhappy with how Google treats these issues, but I'm equally unhappy with how spam distorts the perception of the SEO field. Barely a day goes by without a thought leader in the technology field maligning our industry - and 9 times out of 10 that's because of the "small" spammers. If we protect them by saying SEOs shouldn't "out" one another, we bolster that terrible impression. I don't think most web spam should even have the distinction of being classified as "SEO" and I don't think any SEO professionals who want our field to be taken seriously by marketing and engineering departments should protect those who foist their ugly externalities onto us.
I know we disagree on this, but it's always an interesting discussion :-)
One of the most remarkable things about the SEO industry is the gap in earnings potential between practicing it (as a publisher) and teaching it / consulting. Why do you think such a large gap exists today?
Teaching has always been an altruist's pursuit. Look at teachers in nearly every other field - they earn dramatically less than their production/publishing oriented peers. Those who teach computer science never earn what computer scientists who work at Google or Microsoft make. Those who teach math are far less well compensated than their compatriots working as "quants" on Wall Street. It's a sad reality, but it's why I have so much respect for companies like Market Motive, Third Door Media and Online Marketing Connect, who are trying to both teach and build profitable businesses. I love the alignment of noble pursuits with profitable ones.
You guys exited the consulting area in spite of being able to charge top rates due to brand recognition. Do you think lots of consultants will follow suit and move into other areas? How do you see SEO business models evolving over the next 3 to 5 years?
I don't think so - our consulting business was going very well and I've heard and seen a lot of growth from my friends who run SEO consulting firms. The margins and exit price valuations wouldn't have made sense for VCs, but I don't think it was a bad business at all and others are clearly doing remarkable things. Just look at iCrossing's recent sale to Hearst for $325 million. You can build an amazing company with consulting - it's just not the route we took.
In regards to the evolution of the SEO business model, I'd say we're likely to see more sophistication, more automation, more scalability (and hopefully, more software to help with those) over the next few years from both in-house SEOs and external agencies/consultants. It's sometimes surprising to me how little SEO consulting has progressed since 2002 vs. fields like email marketing or analytics, where software has become standard and tons of great companies compete (well, Google's actually made competition a bit more challenging in the analytics space, but creative companies like KISSmetrics and Unbounce are still doing cool, interesting things).
Small businesses in many ways seem like the most under-served market, but also the hardest to serve (since they have limited time AND small budgets). Do you think the rise of maps & other verticals gives them a big opportunity, or is it just more layers of complexity they need to learn?
Probably more the former than the latter. The small business owners I know and interact with in my area (and wherever I seem to visit) are only barely getting savvy to the web as a major driver of revenue. I think it might take another 10 years or more before we see true maturity and savvy from local businesses. Of course, that gives a huge competitive advantage to those who are willing to invest the time and resources into doing it right, but it means a less "complete" map of the local world in the online one, which as a consumer (or a search engine) is less than ideal.
When does the delta between paid search & SEO investment begin to shrink (if ever)?
I think it's probably shrinking right now. Paid search is so heavily invested in that I think it's fair to call it a mature market (at least in global web search, though, re: your previous question, probably not in local). SEO is ramping up with a higher CAGR (Compound Annual Growth Rate) according to Forrester, so that delta should be shrinking.
via Forrester Research's Interactive Marketing Forecast 2009-2014
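For reference, CAGR is just the smoothed annual growth rate implied by a start value, an end value and a time period - a quick sketch with made-up numbers (not Forrester's actual figures):

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by start/end values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Hypothetical: a channel growing from $1.5B to $3.4B over 5 years
print(cagr(1.5e9, 3.4e9, 5))  # ~0.178 -> roughly an 18% CAGR
```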
Oftentimes a Google policy sounds like something coming out of a conflicted government economist's mouth. But even Google has invested in an affiliate network, which entails controlling your HTML links based on payment. How much further do you think Google can grow before they collapse under complexity or draw enough regulatory attention to be forced to change?
I think if they tread carefully and invest heavily in political donations and public relations, they can likely maintain another very positive 5-10 years. What the web looks like at that time is anyone's guess, and the unpredictable nature and wild shifts probably help them avoid most regulation. Certainly the rise of Facebook has helped reduce their risk exposure from government intervention, even if they may not be entirely happy with their inability to compete in the social web.
I remember you once posted about getting lots of traffic from Facebook & Twitter, but almost 0 sales from it. Does there become a point where search is not the center of the web (in terms of monetization), or are most of these networks sorta only worthwhile from a branding perspective?
As direct traffic portals, it's hard to imagine a Facebook/Twitter user being as engaged in the buying/researching process as a Google searcher. Those companies may launch products that compete with Google's model or intent, but as they exist today, I don't foresee them being a direct sales channel. They're great for traffic, branding, recognition and ad-revenue model sites, but they're of little threat to marketers concerned with the relevance or value of search disappearing.
What are the major differences between LDA & LSI?
They're both methodologies for building a vector space model of terms/phrases and measuring the distance between them as a way to find more "relevant" content. My understanding is that LSI, which was first developed in 1988, has lots of scaling issues. Its cousin, PLSI (probabilistic LSI), attempted to address some of those when it came out in 1999, but still has scaling problems (the Internet is really big!) and often will bias to more complex solutions when a basic one is the right choice.
LDA (Latent Dirichlet Allocation), which started in 2002, is a more scalable (though still imperfect) system with the same intuition and goals - it attempts to mathematically show distances between concepts and words. All of the major search engines have lots of employees who've studied this in university and many folks at Google have written papers and publications on LDA. Our understanding is that it's almost universally preferred to LSI/PLSI as a methodology for vector space models, but it's also very likely that Google's gone above and beyond this work, perhaps substantially.
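To make the shared intuition concrete, here's a toy vector-space example - not LDA or LSI themselves, just raw term vectors and cosine similarity, the basic "distance between texts" idea both build on:

```python
# Minimal vector-space illustration: plain term counts + cosine similarity.
# (Toy data; LDA/LSI build far more sophisticated vectors than raw counts.)
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    terms = set(a) | set(b)
    dot = sum(a[t] * b[t] for t in terms)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

doc1 = Counter("seo search engine optimization ranking links".split())
doc2 = Counter("search ranking algorithm links engine".split())
doc3 = Counter("sausage manufacturing quality control".split())

print(cosine(doc1, doc2))  # high - related topics
print(cosine(doc1, doc3))  # 0.0 - unrelated
```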
The "brand" update was subsequently described as being due to looking at search query chains. In a Wired article Amit Singhal also highlighted how Google looks for entities in their bi-gram breakage process & how search query sequences often help them figure out such relationships. How were you guys able to build a similar database without access to the search sessions, or were you able to purchase search data?
In a vector space model for a search function, the distances and datasets leverage the corpus rather than query logs. Essentially, with LDA (or LSI or even TF*IDF), you want to be able to calculate relevance before you ever serve up your first search query. Our LDA work and the LDA tool in labs today use a corpus of about 8 million documents (from Wikipedia). Google's would almost certainly use their web index (or portions of it).
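As a tiny sketch of "relevance before the first query": with TF*IDF, for instance, the weights come entirely from the corpus at index time - no query logs involved (toy data):

```python
# Toy TF*IDF: everything here is derived from the corpus alone,
# before any query is ever served (no query logs required).
import math

corpus = [
    "seo tools and link analysis".split(),
    "link building guide".split(),
    "banana bread recipe".split(),
]

def idf(term, docs):
    """Inverse document frequency, computed once over the corpus."""
    df = sum(term in doc for doc in docs)
    return math.log(len(docs) / df) if df else 0.0

def score(query, doc, docs):
    return sum(doc.count(t) * idf(t, docs) for t in query.split())

print(score("link analysis", corpus[0], corpus))  # highest score
print(score("link analysis", corpus[2], corpus))  # 0.0
```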
It's certainly possible that query data is also leveraged for a similar purpose (though due to how people search - with short terms and phrases rather than long, connected groups of words - it's probably in a different way). This might even be something that helps extend their competitive advantage (given their domination of market share).
Sometimes one can see Google's ontology change over time (based on sharp ranking increases and drops for outlier pages which target related keywords but not the core keyword, or when search results for 2 similar keywords keep bouncing between showing the exact same results and showing vastly different results). How do you guys account for these sorts of changes?
Thus far, we haven't been changing the model - it just launched last week. However, one nice thing we get to do consistently is to run our models against Google's search results. Thus, if Google does change, our scores (and eventually, the recommendations we hope to make) should change as well. This is the nice part about not having to "beat" Google in relevance (as a competing search engine might want to do) but simply to determine where Google's at today.
For a long time one of the things I have loathed most in the SEO space was clunky all-in-one desktop tools that often misguide you into trying to change your keyword density on the word "the" and other such idiocy. Part of the reason we have spent thousands of dollars offering free Firefox extensions was my disgust toward a lot of those all-in-one tools. A lot of the best SEOs tend to prefer a roll-your-own, mix-and-match approach to SEO. Recently you launched a web application which aims to sorta do it all-in-one. What were the key things you felt you had to get right with it to make it better than the desktop software so many loathe?
I think our impetus for building the web app was taken from the way software has evolved in nearly every other web marketing vertical. In online surveys, you had one-time, self-built systems, and folks like Wufoo and SurveyMonkey have done a great job making that a consolidated, simple, powerful software experience. That goes for lots of others like:
- PPC - Google has really taken the cake here with AdWords integration and the launch of Optimizer and even GA
- CRM - Salesforce, of course, was the original "all-in-one" web marketing software, and they've shown what a remarkable company you can build with that model. InfusionSoft and other players are now quickly building great businesses, too.
- Email Marketing - ExactTarget, Constant Contact, MailChimp, MyEmma, iContact and many more have built businesses doing tens to hundreds of millions of dollars per year with "all-in-one" software for handling email marketing.
- Banner Ads - platforms like aQuantive, DoubleClick, AdReady, etc. have built and are building scalable solutions that drive billions in online advertising
- Analytics - remember when we had one-off, log-file analysis tools and analytics consultants who built their own tools to dig into your data? Those consultants are still here, but they're now armed with much more powerful tools - Google Analytics, Omniture, Webtrends, etc. (and new players like KISSmetrics, too)
You're likely spot-on in thinking that power players will continue to mash up and hack their own solutions, build their own tools and protect their secret processes to make them more exclusive in the market and (hopefully) competitive. But, these folks are on the far edge of the bell curve. In every one of the industries above (and many others), it looks like the way to build a scalable software product that many, many people adopt, use and love is to optimize for the middle to upper end of the bell curve (what we'd probably call "intermediate" to "advanced" SEOs, rather than the outlier experts).
When you gather ranking data do you use APIs to do so? If not, how hard has it been on the technical front scaling up to that level of data extraction?
Some data we can get through APIs, but most isn't available in that fashion, so relatively robust networks are required to effectively get the information. Luckily, we've got a pretty terrific team of engineers and a VP of Engineering who's done data extraction work previously for Amazon, Microsoft and others. I'd certainly say that it ranks in the top 10 technical challenges we've faced, but probably not the top 3.
What do you gain by doing the all-in-one approach that a roll your own type misses out on?
Convenience, consistency, UI/UX, user-friendliness and scalability are all big gains. However, the compromise is that you may lose some of that "secret-sauce" feeling and the power that comes from handling any weird situation or result in a hands-on, one-to-one fashion. Plenty of folks using our web app have already pointed out edge-case scenarios where we're probably not taking the ideal approach, and those kinks will take time to be ironed out.
Some firms use predictive analytics to automatically change page titles & other attributes on the fly. Do you see much risk to that approach? Do you eventually see SEO companies offering CMS tools as part of their packages to lock in customers, while integrating the SEO process at a much deeper level?
When we were out pitching to take venture capital last summer, a lot of VCs felt that this was the way to go and that we should have products on this front.
Personally, I don't like it, and I'd be surprised if it worked. Here's why:
- Editors/writers should be responsible for content, not machine-generated systems built to optimize for search engines. Yes, those machine systems can and should make recommendations, but I fear for the future of your content and usability should "perfect SEO" be the driving force behind every word and phrase on your site.
- With links being such a powerful signal, it's far better to have a slightly less well-targeted page that people actually want to link to than a "perfect" page that reads like machine-generated content.
- I think content creators who take pride in their work are the ones who'll be better rewarded by the engines (at least in the long term - hopefully your crusade against Demand Media, et al. will help with that), and those are the same type of creators who won't permit a system like this to automatically change their content based on algorithmic evaluation.
There are cases I could see where something like this would be pretty awesome, though - e.g. a 404 detector that automatically 301s pages it sees earning real links back to the page it thinks was the most likely intended target.
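A rough sketch of what that detector might look like (hypothetical URLs; a real version would use link data to decide which 404s are worth redirecting, and smarter matching to pick the target):

```python
# Hypothetical sketch of the 404 -> 301 idea: for a 404ing URL that has
# earned inbound links, guess the most likely intended target among the
# site's live URLs and emit a redirect rule.
import difflib

live_urls = [
    "/blog/seo-guide",
    "/blog/link-building",
    "/tools/rank-tracker",
]

def suggest_redirect(dead_url, candidates, cutoff=0.6):
    """Return the closest live URL, or None if nothing is similar enough."""
    match = difflib.get_close_matches(dead_url, candidates, n=1, cutoff=cutoff)
    return match[0] if match else None

dead = "/blog/seo-gudie"  # a linked-to URL now returning 404
target = suggest_redirect(dead, live_urls)
if target:
    print(f"Redirect 301 {dead} {target}")  # Apache-style rule
```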
On your blog recently there was a big fuss after you changed your domain authority modeling scores. Were you surprised by that backlash? What caused such a drastic change to your scores?
We were surprised only until we realized that somehow, our internal testing missed some pretty obvious boneheaded scores.
Basically, we calculate DA and PA using machine learning models. When those models find better "correlated" results, we put them in the system and build new scores. Unfortunately, in the late August release, the models had much better average correlation but some really terrifically bad outliers (lots of junky single-page keyword-match domains got DAs of 100 for example).
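As a sketch of the kind of sanity check that catches such outliers (hypothetical data - not SEOmoz's actual pipeline), you can compare scores against observed rankings with a rank correlation and then run an explicit outlier pass:

```python
# A model can have decent *average* rank correlation while individual
# scores are badly wrong, so an explicit outlier pass is also needed.
# (Hypothetical data - not SEOmoz's actual pipeline.)
from scipy.stats import spearmanr

authority = [95, 88, 76, 64, 55, 43, 30, 22, 15, 9, 5, 100]
rank_pos  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 50]

# Higher authority should mean a better (lower) rank position.
rho, _ = spearmanr(authority, [-r for r in rank_pos])
print(f"overall rho: {rho:.2f}")  # ~0.54: still positive despite the outlier

# Explicit outlier pass: very high score but terrible observed rank.
flagged = [(a, r) for a, r in zip(authority, rank_pos) if a >= 90 and r >= 20]
print(flagged)  # -> [(100, 50)]: the junk domain with an inflated score
```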
We just rolled out updated scores (far ahead of our expected schedule - we thought it would take weeks), and they look much better. We're always open to feedback, though!
When I got into SEO (and for the first couple years) it seemed like you could analyze a person's top backlinks and then literally just go out and duplicate most of them fairly easily. Since then people have become more aware of SEO, Google has cracked down on paid links, etc. etc. etc. Based on that, a lot of my approach to SEO has moved away from analysis and more toward just trying to do creative marketing & hope some % of it sticks. Do you view data as being a bit of a sacred cow, or more of just a rough starting point to build from? How has your perception as to the value of data & approach to SEO changed over time?
I think your approach is almost exactly the same as mine. The data about links, on-page, social stats, topic models, etc. is great for the analysis process, but it's much harder to simply say "OK, I'll just do what they did and then get one more link," than it was when we started out.
That analysis and ongoing metrics tracking is still super-valuable, IMO, because it helps define the distance between you and the leaders and gives critical insight into making the right strategic/tactical decisions. It's also great to determine whether you're making progress or not. But, yes, I'd agree that it's nowhere near as cut-and-dried as it once was.
The frustrating part for us at SEOmoz is we feel like we're only now producing/providing enough data to be good at these analyses. I wish that 6-7 years ago we'd been able to do it (of course, it would have cost a lot more back then, and the market probably wasn't mature enough to support our current business model).
How much time do you suggest people should spend analyzing data vs implementing strategies? What are some of the biggest & easiest wins often found in the data?
I think that's actually the big win with the web app (or with competitive software products like Raven, Conductor, Brightedge, etc). You can spend a lot less time on the collection/analysis of data and a lot more on taking the problems/opportunities identified and doing the real work of solving those issues.
Big wins in our new web app for me have been ID'ing pages through the weekly crawl that need obvious fixing (404s and 500s are included, like Google Webmaster Tools, but so are 20+ other data points they don't show, like 302s, incorrect rel canonicals, etc.).
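A stripped-down sketch of that kind of page-health check (illustrative only - not SEOmoz's crawler - using the common requests and BeautifulSoup libraries):

```python
# Minimal page-health check in the spirit of those crawl diagnostics:
# status codes (404/500/302) plus a rel=canonical mismatch test.
import requests
from bs4 import BeautifulSoup

def check_page(url):
    issues = []
    resp = requests.get(url, allow_redirects=False, timeout=10)
    if resp.status_code in (404, 500):
        issues.append(f"{resp.status_code} error")
    elif resp.status_code == 302:
        issues.append("302 (temporary) redirect - consider a 301")
    elif resp.status_code == 200:
        soup = BeautifulSoup(resp.text, "html.parser")
        canonical = soup.find("link", rel="canonical")
        if canonical and canonical.get("href") != url:
            issues.append(f"canonical points elsewhere: {canonical.get('href')}")
    return issues

print(check_page("http://example.com/"))
```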
Blekko has got a lot of good press by sharing their ranking models & link data. Their biggest downside so far in their beta is the limited size of their index, which is perhaps due to a cost-benefit analysis, & perhaps they will expand their index size before they publicly launch. In some areas of the web Google crawls & indexes more than I would expect, while not going too deeply into others. Do you try to track Google's crawls in any way? How do you manage your crawl to try to get the deep stuff Google has while not getting the deep stuff that Google doesn't have?
Yeah - we definitely map our crawls against Google, Bing and Majestic on a semi-regular basis. I can give you a general sense of how we see ourselves performing against each:
- Google - the freshest and most "complete" (without including much spam/junk) of the indices. A given Linkscape index is likely around 40-60% of the Google index in a similar timeframe, but we tend to do pretty well on coverage of domains and well-linked-to pages, though worse on deep crawling in big sites.
- Bing - they've got a large index like Google, but we actually seem to beat them in freshness for many of the less popular corners of the web (though they're still much faster about catching popular news/blogs/etc from trusted sources since they update multiple times daily vs. our once-per-month updates).
- Majestic - dramatically larger in number of URLs than Google, Bing or Linkscape, but not as good as any of those about freshness or canonicalization (we'll often see hundreds of URLs in the index that are essentially the same page with weird URL parameters). We like a lot of their features and certainly their size is enviable, but we're probably not going to move to a model of continuous additions rather than set updates (unless we get a lot more bandwidth/processing power at dramatically lower rates).
The problem with maintaining old URLs became clearer when we analyzed decay on the WWW
In terms of reaching the deep corners of the web, we've generally found that limiting spam and "thin" content is the big problem at those ends of the spectrum. Just as email traffic is estimated to be 90%+ spam, it's quite possible that the web, if every page were truly crawled and included, would have similar proportions. Our big steps to help this are using metrics like mozTrust, mozRank and some of our PA/DA work to help guide the crawl. As we scale up index size (probably December/January of this year), that will likely become a bigger challenge.
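In practice, "using metrics to guide the crawl" typically means prioritizing the frontier by score - something like this sketch (the scheduling logic is hypothetical; mozTrust-style scores are assumed to be given):

```python
# Sketch of a metric-guided crawl frontier: higher-trust URLs get crawled
# first, so spam/thin corners of the web sink to the bottom of the queue.
import heapq

def crawl(seed_urls, trust_score, fetch_links, budget=1000):
    """trust_score(url) -> float; fetch_links(url) -> list of outlinks."""
    # heapq is a min-heap, so push negated scores to pop highest-trust first
    frontier = [(-trust_score(u), u) for u in seed_urls]
    heapq.heapify(frontier)
    seen = set(seed_urls)
    while frontier and budget > 0:
        _, url = heapq.heappop(frontier)
        budget -= 1
        for link in fetch_links(url):   # crawl the page, extract links
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-trust_score(link), link))
```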
---
Thanks Rand. You can read his latest thoughts on the SEOmoz blog and follow him on Twitter at @randfish.