Duplicate Duplicate Content

Both Bill Slawski and Todd Malicoat wrote great posts about duplicate content detection and how to avoid producing duplicate content.

Todd also posted a link to this shingles PDF, which describes some of the ways to detect duplicate content.
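
Shingling itself is fairly easy to picture. Here is a minimal PHP sketch of the general idea (my own illustration, not code from the paper): break each document into overlapping 4-word shingles and compare the two sets with a Jaccard score, where values near 1 suggest near-duplicate pages.

<code>
<?php
// Build the set of overlapping $w-word shingles for a block of text.
function shingles($text, $w = 4) {
    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $set = array();
    for ($i = 0; $i + $w <= count($words); $i++) {
        $set[implode(' ', array_slice($words, $i, $w))] = true;
    }
    return $set;
}

// Jaccard similarity of two shingle sets: values near 1 suggest near-duplicate pages.
function shingleSimilarity($a, $b) {
    $intersection = count(array_intersect_key($a, $b));
    $union = count($a) + count($b) - $intersection;
    return $union > 0 ? $intersection / $union : 0;
}

// Example strings standing in for the text of two pages.
$pageOne = "SEO Book is a guide to search engine optimization written by Aaron Wall";
$pageTwo = "SEO Book is a guide to search engine optimization that Aaron Wall wrote";
echo shingleSimilarity(shingles($pageOne), shingles($pageTwo));
?>
</code>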

Web Credibility & SEO

Peter D recently mentioned a PDF on website credibility.

If you think of a search engine as a user trying to perceive how credible documents are, then many of those factors make a lot of sense from an SEO perspective, because:

  • Your site visitors will consider many perceived credibility factors when deciding whether or not to buy, transact, or link to your site. Credibility is the key to conversion, especially with expensive or non-commodity products and services.
  • Search engines also evaluate how others perceive your site by looking at linkage data and usage data.

It might also be worth taking a look at Beyond Algorithms: A Librarian's Guide to Finding Web Sites You Can Trust. Consider that many media members use criteria similar to those in the above two documents, and it is easy to see how librarians, media members, and other authoritative voices propagate trust through the web, and why many sites lack credible citations until they sell themselves as being credible enough to merit quality citations.

Paying Extra For Good Design is Well Worth It

I have bought a couple of blog designs. SEO Book.com was designed by one of my favorite blog designers. Another blog design I bought was cheaper and for a network of blogs, and it came out far less appealing. Had I not been an SEO who looks at site structure frequently, I might not have noticed many of the hidden costs that came with the bargain design. Here are some examples of things that were totally jacked up with the design I got a deal on:

  1. Does not look as professional: of course I expected this part, and sorta factored it into my consideration of value at the lower price point. What I did not factor in was all of the following:

  2. Same page title on every page: well, obviously that sucks. How well will search engines understand the differences between documents when I throw one of the keys away at hello? So I had to go through, find the appropriate archive and individual post Typepad tags, and fix up the templates to offer unique page titles on individual post and archive pages.
  3. Header links to alternate version of homepage: the site was designed such that all the internal link popularity flowed to site.com/index.html instead of site.com. Some search engines are still having canonicalization issues, so that had to get fixed.
  4. Lack of modularity: although the designer knew I was going to use the template across a series of blogs, they chose to manually type out the URL paths and site anchor text when that could easily have been done with the built-in template tags. I eventually had to go through and add those tags myself so the design could be duplicated across roughly 20 blogs without needing to spend an hour per blog editing templates.
  5. noindex nofollow: whether out of an attempt to sabotage a client or out of sheer incompetence, the designer included noindex and nofollow tags in most of the page templates (see the snippet after this list).
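
For reference, the kind of template markup behind problems 2, 3, and 5 looked roughly like this (an illustrative snippet, not the actual template):

<code>
<title>My Blog Network</title> <!-- the same title hard coded into every page template -->
<meta name="robots" content="noindex,nofollow"> <!-- tells search engines to skip the page entirely -->
<a href="http://site.com/index.html">Home</a> <!-- points at the duplicate /index.html instead of http://site.com/ -->
</code>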

So let's say you save a few grand by going with a cheaper designer. What are the potential hidden costs of those savings?

  1. less professional design: I think this factor has to be broken down by the quality of your site

    • high quality content: if you are going to make a high quality site you might as well make the design look nice too. Links snowball on themselves, and a few more links today may be a hundred more next month and a thousand next year. Or imagine the cost if you missed out on those links. Eeek!

    • low quality content: for sites that are borderline spam, sometimes the difference between staying indexed or being booted out of the search results altogether is bridged by a decent design. A good design can carry bad content to some extent. Bad content + bad design = much more likely to get the boot for being spam.
  2. poor page titles: this can easily cut your search referral traffic in half. Given that the people who reference your work are people who somehow found it, cutting off one of your most important inroads can cost a lot over time, especially when you consider how links build on themselves over time.
  3. canonicalization issues: this could cause indexing problems and prevent your homepage from ranking as well as it should, and potentially cause worse problems too.
  4. noindex nofollow: I guess it depends on how you monetize your site, but cutting the search engines off at hello is not a good way to work your way up to exposure.

Someone newer to the web than I am probably would not have caught all of those errors, so for some people the problems could have lasted months or years without being fixed.

I don't think great design has to be expensive either. I am a fan of buying a great logo and then just using an ultra clean site design, and just letting the links and headings sorta match the colors of the logo. That is how this site was for about a year and a half before I found the designer who did a kick ass job designing the current version.

On top of design affecting how willing people are to link to or read your site, it also plays a major role in determining how well your site will convert. Some ugly sites sell, but if you are selling something that is high end and individually branded I think a great design can also play a big role in helping build your credibility and boosting your conversion rates.

One thing I find frustrating is that Wikipedia lists its SEO page as part of its spamming series, and yet you have people designing hundreds or thousands of websites with these sorts of information architecture errors in them.

.htaccess, 301 Redirects & SEO: Guest Post by NotSleepy

Tony Spencer here doing a guest spot on SEOBook. Aaron was asking me some 301 redirect questions a while back and recently asked me if I would drop in for some tips on common scenarios, so here goes. Feel free to drop me any questions in the comments box.

301 non-www to www

From what I can tell Google has yet to clean up the canonicalization problem that arises when the www version of your site gets indexed along with the non-www version (i.e. http://www.seobook.com & http://seobook.com).

<code>
RewriteEngine On

# 301 any request for seobook.com to the same path on www.seobook.com
RewriteCond %{HTTP_HOST} ^seobook\.com [NC]
RewriteRule ^(.*)$ http://www.seobook.com/$1 [L,R=301]
</code>

The '(.*)$' says that we'll take anything that comes after http://seobook.com and append it to the end of 'http://www.seobook.com' (that's the '$1' part) and redirect to that URL. For more grit on how this works, check out a good regular expressions resource or two.

Note: You only have to enter 'RewriteEngine On' once at the top of your .htaccess file.
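
To sanity check the rule, request a non-www URL and look at the response headers (assuming you have curl handy; the path here is just an example). You should see something like:

<code>
curl -I http://seobook.com/archives/001234.shtml

HTTP/1.1 301 Moved Permanently
Location: http://www.seobook.com/archives/001234.shtml
</code>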

Alternatively you may choose to do this 301 redirect in the Apache config file, httpd.conf.

<code>
<VirtualHost 67.xx.xx.xx>
ServerName www.seobook.com
ServerAdmin webmaster@seobook.com
DocumentRoot /home/seobook/public_html
</VirtualHost>

<VirtualHost 67.xx.xx.xx>
ServerName seobook.com
RedirectMatch permanent ^/(.*) http://www.seobook.com/$1
</VirtualHost>
</code>

Note that control panels like cPanel often place a 'ServerAlias seobook.com' in the first VirtualHost entry, which would negate the second VirtualHost, so be sure to remove the non-www ServerAlias.

301 www to non-www

Finally, the 301 redirect from the www version to the non-www version would look like:

<code>
# 301 any request for www.seobook.com to the same path on seobook.com
RewriteCond %{HTTP_HOST} ^www\.seobook\.com [NC]
RewriteRule ^(.*)$ http://seobook.com/$1 [L,R=301]
</code>

Redirect All Files in a Folder to One File

Let's say you no longer carry 'Super Hot Product' and hence want to redirect all requests to the folder /superhotproduct to a single page called /new-hot-stuff.php. This redirect can be accomplished easily by adding the following to your .htaccess file:

<code>
RewriteRule ^superhotproduct(.*)$ /new-hot-stuff.php [L,R=301]
</code>

But what if you want to do the same as the above example EXCEPT for one file? In the next example all files from the /superhotproduct/ folder will redirect to the /new-hot-stuff.php file EXCEPT /superhotproduct/tony.html, which will redirect to /imakemoney.html:

<code>
# the more specific rule must come first; the [L] flag stops processing once a rule matches
RewriteRule ^superhotproduct/tony\.html /imakemoney.html [L,R=301]
RewriteRule ^superhotproduct(.*)$ /new-hot-stuff.php [L,R=301]
</code>

Redirect a Dynamic URL to a New Single File

It's common that one will need to redirect dynamic URLs with parameters to a single static file:

<code>
# RewriteRule never sees the query string, so match it with a RewriteCond;
# the trailing '?' on the target drops the old query string from the new URL
RewriteCond %{QUERY_STRING} ^id=(.*)$
RewriteRule ^article\.jsp$ /latestnews.htm? [L,R=301]
</code>

In the above example, a request to a dynamic URL such as http://www.seobook.com/article.jsp?id=8932
will be redirected to http://www.seobook.com/latestnews.htm

SSL https to http

This one is more difficult, but I have experienced serious canonicalization problems when the secure https version of my site was fully indexed alongside my http version. I have yet to find a way to redirect https for the bots only, so the only solution I have for now is to attempt to tell the bots not to index the https version. There are only two ways I know to do this and neither is pretty.

1. Create the following PHP file and include it at the top of each page:

<code>
<?php
// emit a noindex meta tag only when the page is served over https
if (isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on') {
    echo '<meta name="robots" content="noindex,nofollow">'. "\n";
}
?>
</code>

2. Cloak your robots.txt file.
If a visitor comes from https and happens to be one of the known bots such as googlebot, you will display:

<code>
User-agent: *
Disallow: /
</code>

Otherwise display your normal robots.txt. To do this you'll need to alter your .htaccess file to treat .txt files as PHP or some other dynamic language, and then write the cloaking code.
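
Roughly, that setup might look like the following sketch, assuming an Apache host that lets .htaccess override the handler (the exact handler name varies by server), and simplifying the bot check to just Googlebot:

<code>
# .htaccess: serve robots.txt through the PHP interpreter (handler name varies by host)
<Files "robots.txt">
ForceType application/x-httpd-php
</Files>
</code>

<code>
<?php
// robots.txt served as PHP: block known bots on the https version, serve normal rules otherwise
header('Content-Type: text/plain');
$isSecure = isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on';
$isBot = isset($_SERVER['HTTP_USER_AGENT']) && stristr($_SERVER['HTTP_USER_AGENT'], 'googlebot');
if ($isSecure && $isBot) {
    echo "User-agent: *\n";
    echo "Disallow: /\n";
} else {
    echo "User-agent: *\n";
    echo "Disallow: /cgi-bin/\n"; // whatever your normal robots.txt rules are
}
?>
</code>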

I really wish the search engines would get together and add a new attribute to robots.txt
that would allow us to stop them from indexing https URLs.

Getting Spammy With it!!!

Ok, maybe you aren't getting spammy with it, but you just need to redirect a shit ton of pages. First of all, it'll take you a long time to type them all into .htaccess; secondly, too many entries in .htaccess tend to slow Apache down; and third, it's too prone to human error. So hire a programmer and do some dynamic redirecting from code.

The following example is in PHP but is easy to do in any language. Let's say you switched to a new system and all files that ended in the old ID need to be redirected. First, create a database table that will hold the old ID and the new URL to redirect to:

<code>
old_id INT
new_url VARCHAR(255)
</code>
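
In MySQL terms that might look something like this (the table name matches the redirects table queried by the script below):

<code>
CREATE TABLE redirects (
old_id INT NOT NULL,
new_url VARCHAR(255) NOT NULL,
PRIMARY KEY (old_id)
);
</code>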

Next, write code to populate it with your old IDs and your new URLs.
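
For example (made-up values; note that new_url is stored relative to the domain, because redirectold.php prepends the domain itself):

<code>
INSERT INTO redirects (old_id, new_url) VALUES (8932, 'new-category/blue-widget.php');
</code>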

Next, add the following line to .htaccess:

<code>
# in .htaccess the leading slash is stripped before matching, so don't start the pattern with '/'
RewriteRule ^product-(.*)_([0-9]+)\.php /redirectold.php?productid=$2 [L]
</code>

Then create the PHP file redirectold.php which will handle the 301:

<code>
<?php
function getRedirectUrl($productid) {
// Connect to the database
$dServer = "localhost";
$dDb = "mydbname";
$dUser = "mydb_user";
$dPass = "password";

$s = @mysql_connect($dServer, $dUser, $dPass)
or die("Couldn't connect to database server");

@mysql_select_db($dDb, $s)
or die("Couldn't connect to database");

$query = "SELECT new_url FROM redirects WHERE old_id = ". $productid;
mysql_query($query);
$result = mysql_query($query);
$hasRecords = mysql_num_rows($result) == 0 ? false : true;
if (!$hasRecords) {
$ret = 'http://www.yoursite.com/';
} else {
while($row = mysql_fetch_array($result))
{
$ret = 'http://www.yoursite.com/'. $row["new_url"];
}
}
mysql_close($s);
return $ret;
}

$productid = $_GET["productid"];
$url = getRedirectUrl($productid);

header("HTTP/1.1 301 Moved Permanently");
header("Location: $url");
exit();
?>
</code>

Now all requests to your old URLs will call redirectold.php, which will look up the new URL and return an HTTP 301 redirect to the new URL.

Questions?

Ask them here and I'll do what I can.

Inbound Link Quality Extension

Bob Mutch at SEO Company created an inbound link quality extension for Firefox. You can download the extension from his home page, or access the tool online (again on his home page, but the web based tool has been slow). The tool checks to see if a site is listed in the Yahoo! Directory or DMOZ. In addition it searches Yahoo! for the number of .edu and .gov links pointing at a website.

The extension looks like this:
SEO Company link quality extension.

While in some cases there are some .edu and .gov sites that offer up spammy links, the theory behind the tool is that most .edu or .gov links are going to be harder to get / more pure / of higher quality than the average link from most commercial sites. In that sense, the raw number of .edu and .gov links can be seen as a rough proxy for whether a site has any quality natural editorial inbound links, and as an estimate of the depth of quality citations a site has received.

Buying Domains Before They go to Auction

So there was an old domain name I really wanted. I saw that the site was down and that the PageRank was already stripped (which happens to most expiring domains anyhow) and the name was kinda junky, but I was hoping that it would go to auction and I would be the only one backordering it. Oh how I was wrong.

It just cost me about $4,000 to buy a generic domain with 0 PageRank because I was too dumb to try to get it earlier, perhaps while it still had PageRank. So the tip is, if you see a site down and think you would like the traffic stream the domain enjoys you are probably better off asking the current owner if they would part with it for a few hundred dollars instead of paying $4,000 at auction for it ;)

The Google Crawling Sandbox

With Matt Cutts's recent post about the changing quality signals needed to get indexed in Google, and sites with excessive low quality links getting crawled more shallowly (and some of them not getting crawled at all), some people are comparing Google's tightening crawling standards to an early version of the way the Google Sandbox prevents new or untrusted sites from ranking. WebmasterWorld has a 7ish page thread about BigDaddy, where Graywolf said:

I'm personally of the opinion that we're starting to see the 'sandbox of crawling'

What is the Optimal Site Size?

Some people in the thread are asking for an optimal site size for crawling, or if they should change their internal navigation to accommodate the new Google, but I think that to some extent misses the mark.

If you completely change your site structure away from being usable to do things that might appease a Google in a state of flux, you are missing the real message they are trying to send. If you rely too heavily on Google you might find they are in a constant state of being broken, at least from your perspective ;)

The site size should depend largely on:

  • how much unique content you can create around the topic

  • how well you can coax others into wanting to create unique topical content for you
  • how people shop
  • how people search for information
  • how much brand strength you have (a smaller site may make it easier to build a stronger niche-specific brand, and in most cases less content of higher quality is far more remarkable than lots of junk content)

Many times it is better to have smaller sites so that you can focus the branding messages. When you look at some of the mega sites, like eBay, they are exceptionally weak on deep links, but they also have enough authority, mindshare, and quality link reputation to where they are still represented well in Google.

Scaling Out Link Quality and Unique Content

Another big issue with crawl depth is not only link quality, but also how unique the content is on a per page level. I was recently asked about how much link popularity was needed to index a 100,000,000 page site with cross referenced locations and categories. My response was that I didn't think they could create that much content AND have it unique enough to keep it all indexed AND build enough linkage data to make Google want to index it all.

Sometimes less is more.

The same goes for links too. If you go too hard after acquiring links the sandbox is a real and true phenomenon. If you get real editorial citations and / or go for fewer and higher quality links you will probably end up ranking quicker in Google.

While it may help to be selective about how many links you build (and what sources you are willing to get links from), it is also valuable to be selective about who you are willing to link to, AND to link out to many quality resources that would be hard to make look spammy. Rand recently posted:

From a trustworthy source - Googlebowling is totally possible, but you need to use patterns that would show that the site has "participated" in the program. What does that mean? Check who they link to - see if you can make the same spammy links point to those places and watch for link spam schemes that aren't in the business of pointing to people who don't pay them.

So if you make your site an island, or only partner with other sources that would be easy to take out, you limit your stability.

What Makes a Site More Stable?

The big sites that will have 100,000 pages stick in the SERPs are real brands and/or sites that offer added value features. Can individuals create sites at that scale that will still stick? I think they can, but there has to be a comment-worthy element to them. They have to find a way to leverage and structure data, and/or they need an architecture for social participation and content generation.

The Net Cost & Value of Large Algorithmic Swings

Some people say that wild search algorithmic swings are not a big deal since for every person losing someone else must gain, so the net effect does not drive people toward paid ads, but I do not buy that.

If your sites are thin spam sites and you have limited real costs the algorithmic swings might not be a big deal, but when businesses grow quickly or have their income sharply drop it affects their profitability, both as they scale up and scale down. You also have to factor in the cost of monitoring site rankings and link building.

At the very least, the ability to turn on or turn off traffic flows (or at least finely adjust them) makes PPC ads an appealing supplement to real businesses with real employees and real business costs. Dan Thies mentioned his liking of PPC ads largely for this reason when I interviewed him about a year ago.

As Google makes it harder to spam and catches spam quicker eventually the opportunity cost of spamming or running cheesy no value add thin sites will exceed the potential profit most people could attain.

Authority Systems Influence the Networks They Measure

Some people are looking to create systems that measure influence, arguing that as attention grows scarcer it will increase in value:

Attention is what people produce (as in "hand over the money" or "look at this ad") in exchange for information and experience. As Lanham writes in The Economics of Attention, the most successful artists and companies are the ones that grab attention and shape it, in other words, that exercise influence. With so much information, simply paying attention is the equivalent of consuming a meal or a tube of toothpaste.

Any system that measures influence also exerts an influence on the market it measures. A retail site in an under-marketed industry that randomly winds up on the Delicious popular list or Memeorandum one day will likely outdistance competitors that do not.

Search has a self reinforcing aspect to it as your links build up. A site with a strong history of top rankings gets links that other sites won't get. Each additional link is a revalidation of quality. The people at Google realize that they have a profound effect on how the web grows. Now that they have enough content to establish a baseline in most commercial markets they can be more selective with what they are willing to crawl and rank. And they are cleaning up the noise in their ad market as well.

The Infinite Web, Almost

Some people view the web as an infinite space, but as mentioned above, there is going to be a limit to how much attention and mindshare anything can have.

The Tragedy of the Commons is a must read for anyone who earns a living by spreading messages online, especially if you believe the web to be infinite. While storage and access are approaching free, eventually there is going to be a flood of traditional media content online. When it is easy to link at pages or chapters of books the level of quality needed to compete is going to drastically increase in many markets.

So you can do some of the things that Graywolf suggested to help make your site Google friendly in the short term, but the whole point of these sorts of changes at Google is to find and return legitimate useful content. The less your site needs to rely on Google, the more Google will be willing to rely on your site.

If you just try to fit where Google is at today expect to get punched in the head at least once a year. If you create things that people are likely to cite or share you should be future friendly.

Do You Sell iPods?

In the last three days, three or four friends compared good marketing and branding with the iPod. How does that relate to SEO? Peter Da Vanzo recently posted about his iPod:

When I was considering buying a music player, some music-gadget obsessed friends offered a wealth of well-meaning advice. "No", they said, "don't get an iPod because it can't do xyz, unlike the XRX2000 (or whatever), which can do so much more! More stuff! Oh, and the iPod is overpriced". Those weren't the exact words, but that was the gist.

They were probably right, but the problem is: I don't care.

I knew that if I bought anything else, I'd always think "yeah, but it's not an iPod".

The other day in an IM Andy Hagans also mentioned his iPod:

I buy iPods regularly even though I know they're not better. For 3 times the price of the competition. Because I 'trust' them somehow.

How does all this relate to marketing? If you want to do well long-term you have to sell your product or service as a non-commodity. The more your product, service, or business is sold as a piece of art, or as something thought to be worth paying more for, the more you have to move beyond being approved of on a rational level and the more you need a strong appeal on an emotional level.

The link profile of this site is far less than perfect, but a large part of the heavy anchor text focus on the phrase SEO Book is because I wanted to create a strong brand. If my inbound anchor text were mixed better this site could probably get a ton more traffic, but traffic without a strong branding element has much less value, especially when you sell an ebook for about 4 times the price that most physical books sell for.

Just like selling products, you also have to sell being link worthy if you want to integrate SEO into your marketing plan. It is hard to do that just by emulating what already exists. To get big rewards you have to create something that is conceptually different, such that you are memorable and evoke an emotional response. If you manage to do that, and occasionally target different customers and traffic streams than your competition by focusing on adding value to their experience, it is hard to fail.

The reasons that legitimate content works so well are:

  • most markets usually take a while to react to quality content
  • because of that delay, it typically takes spending months or years over-investing before seeing any type of return on the effort required to create something unique and useful that will stand the test of time
  • most people looking to make a quick buck are all fighting for the same shallow traffic sources and are not willing to spend the time to deeply research their topic or emotionally invest in their content enough for it to pay off

Not every page is going to win awards or have a net positive return for the effort that went into it, but as you build a variety of legitimate useful original pages over time the site authority starts to build on itself and eventually you snowball toward the top.

Monetizing Traffic - SEO is Pointless if You Throw Your Traffic Away

Recently 11 blogs from the Fine Fools network sold with content, designs, and links for a total of $4,500. I am still busy kicking myself in the teeth, because I would have paid much more than that for those blogs. Those blogs were generating over 300,000 monthly pageviews, but the sites were generating only roughly $300 in monthly revenues.

Without even adding any content to those sites, given their traffic volume and link authority (most of the sites were strong PageRank 6 sites with natural backlink profiles) I could have easily increased the income to over $5,000 a month (ie: had the network more than pay for itself in the first month of ownership).

I think that limited $300 / month revenue figure is a great example of why it is worth worrying about more than just pageviews. SEO is just one piece of the puzzle, and usually most sites have big obvious on site gains that could be pursued long before you look to invest heavily in increasing traffic.

I get some people who tell me that they are already getting a million pageviews a month and they want me to guarantee they will get 3 million pageviews a month if they read my $79 book. If you can get 2 million monthly pageviews for $79 please let me know the source and I will follow you with a few thousand dollars in hand. When you are at that scale the issue is not that you need more distribution. If you can't make money with hundreds of thousands or millions of pageviews you ought to consider changing your revenue model.

I realize that celebrity sites might get tons of low quality traffic, but how hard would it be to add ring tone affiliate ads, concert ticket text links, or dating affiliate ads to the sites? How hard would it be to write a dating ebook you sold for $30? And the blog A Man's View could have easily been changed to a porn blog that would make in excess of $5,000 a month by itself.

I guess a valuable lesson here is that networks that don't profit will eventually fall apart and/or will be sold for well below what they are worth. Another valuable lesson might be that there is still a huge disconnect between traffic and value in the minds of most webmasters, and the WWW still has near-endless profit opportunities lying about.

A friend normally gives me the scoop on auctions at Sitepoint, but something went wrong on this one. I am still kicking myself in the teeth. I would have loved to have bought those blogs, especially that cheap. Damn damn damn damn etc ;)

Microsoft AdCenter Ad Labs Tools

After I pointed out Microsoft's AdCenter Labs yesterday, WMW started a thread about some of the new tools. The selection of tools is diverse, interesting, and useful enough to be well worth a review.

Content Categorization Engine:

This tool tells you ways your site may be categorized. Useful for:

  • helping you determine what types of webmasters might be interested in linking at your website

  • determining what type of affiliate ads you may want to consider using
  • seeing how well search engines understand what your site is about

Try Microsoft's Content Categorization Engine

Microsoft Content Categorization Engine Output.

Keyword Categorization Engine:

Similar to the content categorization engine, but for keywords. In addition to the uses described above this tool can also show you how well your page is aligned with your core keywords.
Try Microsoft's Keyword Categorization Engine

Microsoft Keyword Categorization Engine Output.

Demographics Prediction Tool:

Shows the age groups and gender of searchers for a particular query or visitors to a specific URL. Useful for:

  • showing the most common markets for a search query or domain.

  • showing you how well your site audience is aligned with your core keywords (for example, if a site lacks corporate bullshitspeak™, it would be unsurprising that the viewers of that site would be younger than the demographic averages for a field which is typically targeted toward older people who can't get enough corporate bullshitspeak™)
  • the most common visitor groups and mindsets for a site or query might be obvious, but some of the secondary and tertiary markets may be far less well defined. This tool can help you find some of those other markets.

Try Microsoft's Demographics Prediction Tool

Microsoft Demographic Prediction Tool Output.

Seasonal Search Volume Forecast Tool:

Shows seasonal search spikes. It is like a hybrid between Google Trends and Google Suggest, but it will also show you relevant keyword phrases that have your keyword in the middle of them. This tool does not seem to have as much depth as Google Trends (i.e. surprisingly few searches show results). They also seem to have stripped out many gambling and porn related keywords. Unlike Google, MSN places search volume numbers on their trends. Useful for:

  • showing seasonal keyword trends

Try Microsoft's Search Forecast Tool

Microsoft Search Forecast Tool.

Keyword Search Funnel Tool:

Shows you the words people search for before or after they search for a specific search query. Useful for:

  • finding common spelling errors

  • finding related keywords that may not show up on most keyword tools

Try Microsoft's Keyword Funnel Tool

Microsoft Keyword Funnel Tool.

Detecting Online Commercial Intent Tool:

Shows you Microsoft's opinion of the probability of a query or a page being informational, commercial-informational, or commercial-transactional in nature. Works well in conjunction with Yahoo! Mindset. Useful for:

  • seeing how commercial they think a term or page is, which is important because it is believed that some search engines, such as Google, have a heavy informational bias to their search results.

Try Microsoft's Online Commercial Intent Tool
Microsoft Online Commercial Intent Tool.

Local Ads:

Microsoft has a Seattle only local ad engine, a keyword mutation tool (to find misspelled keywords), an acronym resolution tool, a keyword group detection tool (like Google Sets), and a search result clustering tool.

I added links to these tools on my keyword research and niche discovery tool.
