Google: Enabling & Profiting from Information Pollution

The most recent blog meme is that Google's Blogger is a mass spam system.

David Sifry says he thinks that between 2% and 8% of blogs are spam, but I suspect that, just like people, his systems are not good at detecting much of the spam.

Google maintains that they care:

When spam goes up, it directly affects the quality of those results. I'm exceedingly sympathetic with these folks because, well, we run one of those services ourselves.

But do they really care?

Think of blog search as a form of vertical search. If blog search is less useful, and filtering through the spam

  • kills profit margins

  • slows blog search innovation

then more people will opt to use general search.

While Jeff Jarvis thinks Google should share its tricks for not indexing blog spam, I don't see why they would want to. Since Google has not put much effort into making their blog search anywhere near as good as their regular search, I don't think they mind if nearly all blog search engines are full of spam.

Blog search full of spam = user may as well use general search = $ for Google. And, on another front, that helps Google ensure blog search sucks really bad until they create the solution, and then they get credit for doing right what their competitors could not :)

Just as a curiosity question, how hard would it be to attenuate trust, only trusting new blogs if they were co-cited by multiple trusted sites? There has to be an algorithmic way to do it (a rough sketch follows the list below). If you were worried about new sites being locked out, then you could offer multiple search options:

  • the filtered trusted version

  • the unfiltered version
  • perhaps people could even enter their own trusted friends, levels of trust, minimum trusted citations, or make trust a slidable scale & use AJAX to reorder the results as the trust score is adjusted
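
For what it's worth, here is a minimal sketch of the co-citation idea in Python. It is my own illustration (the function name, parameters, and number of propagation rounds are all assumptions), not any engine's actual algorithm: a new blog only earns trust once enough already-trusted sites cite it.

    # A minimal sketch of co-citation trust attenuation (my own illustration,
    # not any search engine's real algorithm). A new site is only admitted to
    # the trusted set once at least `min_citations` trusted sites link to it.

    def attenuate_trust(seed_trusted, links, min_citations=2, rounds=3):
        """
        seed_trusted:  set of hand-picked trusted sites
        links:         dict mapping each site to the set of sites it links to
        min_citations: trusted co-citations a new site needs to earn trust
        rounds:        how many times trust may propagate outward
        """
        trusted = set(seed_trusted)
        for _ in range(rounds):
            citations = {}
            for site in trusted:
                for target in links.get(site, set()):
                    if target not in trusted:
                        citations[target] = citations.get(target, 0) + 1
            newly_trusted = {s for s, n in citations.items() if n >= min_citations}
            if not newly_trusted:
                break
            trusted |= newly_trusted
        return trusted

    # blogC is co-cited by both seeds, so it earns trust; blogD is not.
    links = {
        "seedA": {"blogC", "blogD"},
        "seedB": {"blogC"},
        "blogC": {"blogE"},
    }
    print(attenuate_trust({"seedA", "seedB"}, links, min_citations=2))

The min_citations knob is the "minimum trusted citations" option from the list above, and since recomputing with a different threshold is cheap, a sliding trust scale in the UI is not far-fetched.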

On top of owning general search Google also wants to be the first port of call for vertical search. Just look at their recent desires on the real estate front.

Through monetizing spam production with AdSense and making publishing free and easy, Google pollutes competing information systems for personal profit.

The same thing that is going on in the vertical search scene is also going on in general search. Google has an algorithmic probationary period for most new sites. The same sites tend to rank in MSN Search & Yahoo! Search quicker and easier.

By paying search spammers via AdSense, Google is funding the information pollution that undermines the usefulness of competing search products. As I have stated in the past, Google generally does not give a shit if AdSense is on spam sites or sites that make money stealing others' copyrighted work.

Now what happens if Google ends up indexing AdSense spam sites? Well, suddenly it is a real issue then, and they pull out the "we care" card. Matt Cutts recommends you report it to Google, but the hidden message there is that Google cares only when the spam ranks in Google.

Meanwhile all the A-list bloggers are asking Google to fix the problem while failing to realize the profits this problem brings Google.

Maybe a large part of being the company that organizes the world's information is encouraging entrepreneurs to stuff garbage into rich competitors' databases.

Gems in the Comments

One of the best parts of questioning something that is generally thought well of (like Wikipedia & Web2.blah) is the quality of the responses.

Chris Tolles, ODP co-founder, had the following to say about the Wikipedia:

Well, the idealism is part of the package here -- and something you need to consider when you're building and marketing products, or managing your career. If someone's going to go out and harness the public to create a competitor, you might want to take it seriously. If every one of those people *believes* in what they're doing, it is a force to be reckoned with, whether or not they are right, "good", or "bad". If they believe they are building *The* machine, it's a very different amount of effort than if everyone thinks they're working as part of a machine.

NFFC also has a fun take on the Wikipedia. I think there is some bitterness in that post. Somebody doesn't like Lucozade ;)

Gmail UK No More?

From Slashdot:

As of today, UK Gmail users are seeing 'Google Mail' at the top of their Gmail accounts, and Google is warning they may lose their '@gmail.com' addresses in the future.

Google may be fighting on as many fronts as Microsoft is!

Gmail recently became Google Mail in Germany as well:

Earlier this year, Google lost the right to use Gmail in Germany, following a dispute with the Hamburg-based finance firm. Giersch registered "Gmail – und die Post geht richtig ab." with the German Patent Office five years ago. Following a court injunction awarded by a German court in favour of Giersch earlier this year, Google's German customers have been using addresses ending @googlemail.com.

More Google Print Lawsuits

Recently Google was sued by the Authors Guild for copyright infringement. It seems the Google Print product is making a few more enemies, as publishers sue Google as well:

Book publishers sued Google, escalating a nasty spat over the search-engine giant's ambitious book-indexing project.

The Association of American Publishers said it sued the Mountain View, Calif., Internet giant Wednesday morning after talks broke down.

McGraw-Hill, Wiley, Penguin, and Simon & Schuster are named as plaintiffs in the suit.

John Battelle sees no point to the suit:

I really don't get this. I have been both a publisher and an author, and I have to tell you, these guys sue for one reason and one reason alone, from what I can tell: Their legacy business model is imperiled, and they fear change. Of course, if they can get out of their own way, they'll end up making more money.

I wonder if Google will respond by blacklisting any publishers. Google only needs a few major publishers to start seeing increased revenues due to Google Print for the rest to follow along.

Search is Hot, Cold, or In the Middle?

Yahoo!'s quarterly profits were flat, but this time last year they sold a bunch of their Google stock for over $100 million, so they still posted an impressive revenue increase. Their stock is up 6% on the day.

Yahoo said late Tuesday that net revenue, which excludes fees paid to distribution partners, leaped a better-than-expected 42% to $932 million. Of those sales, Yahoo's marketing-services business -- which is made up of branded and sponsored-search advertising -- accounted for 82%, or $761.8 million, up 40% from a year ago.

I can't believe Jux2, a meta search engine with no revenue that relies on scraping, is already up to $26,000 on eBay. Some of the top bidders are smart cookies too, so I am wondering what I am missing.

LookSmart needs to do a reverse split if they hope to stay listed on Nasdaq.

Hidden AdSense Ads & Hidden Search Results

Ads as content...works well for some. You know AdSense is out of hand when premade sites are selling on eBay.

Debt Consolidation:
Fun to hear a guy whine (http://forums.seochat.com/t54393/s.html) about his 2 sites in his sig file not outranking Forbes for debt consolidation because Forbes has an ad page. :(

I bet his debt consolidation lead generation sites are informational in nature :)

Some are discussing creating link wars (http://forums.seochat.com/t54378/s.html) as if that will help them rank. Good luck knocking out Forbes.com.

If Google dials up their weighting on large authority sites before Christmas, maybe the solution is to buy ad pages on some of them. I bet there are some great underpriced ad links and advertisement pages if people would look hard enough.

Coolness:
Link to Jim Boykin's new tool...still a bad tool name though, IMHO.

Controlling Data & Helping Consumers Make it Smarter

Part 4 of my recent ongoing article...

Dynamic Site Advantages:

If you use a weblog or any other type of dynamic site, as content ages you create a large quantity of pages which can rank for a variety of terms in many engines. The site archive systems mean that posts not only get their own pages, but can also be organized by date and category. This creates what is considered to be a legitimate keyword driftnet content bank.

People can also subscribe to the feeds to remind themselves when to come back and read your new information. Many people who read feeds also write sites with feeds, and can provide you with additional link popularity and another channel to acquire readers from.

Most people who subscribe to what you have to say will be people who agree with many of your points. This means that when they talk about you or mention your site you are:

  • likely to be presented to additional like-minded people with similar biases to your own
  • in a positive manner
  • from a voice readers likely trust.

If people disagree with you and still subscribe to your feed then there is a great chance they will frequently want to say how wrong you are, maybe even linking through to your site.

Ultra Targeted Content:

Not all ideas need a whole article to explain them. Publishing your thoughts with one topic per post makes it easier for you to refer back to your own content in the future. It also makes it easier for others to point at / link to / reference it.

Ultra targeted content will also stand a good chance of ranking high for its keyword theme since it is so well targeted.

Consumer Feedback & Product Catalogs:

For a long time creating pages by keyword phrase permutation was a functional SEO strategy, but Google does not want to display hollow product databases in their regular search results. Creating industrial strength spam works well for some, but as time passes the hollow databases need to get better at remixing sources and integrating user data.

If there is commercial value for a term, Google believes Froogle & AdWords work well. It seems to be almost a yearly process that Google dials up the rankings on authority sites right around the Christmas shopping season. This forces merchants to buy into the vertical shopping sites, buy AdWords, or spend Christmas out in the cold.

Allowing user feedback and interaction makes your content more original than most competing sites. It also adds value to the consumer experience & makes it easier to link to your site, both of which make Google far more likely to want to include your site in the result set. Tim O'Reilly states Data is the Next Intel Inside:

While we've argued that business advantage via controlling software API's is much more difficult in the age of the internet, control of key data sources is not, especially if those data sources are expensive to create or amenable to increasing returns via network effects.

Google is just a giant feedback network, learning to better understand the relationships between queries, links, and content. If you own the smartest and richest consumer feedback network in your vertical you will only continue to gain profit, customers, and leverage, at least up until someone creates a better feedback network that displaces the current one.

My Lawyer Filed a Motion for Summary Judgment in the Anti-Free Speech vs Blogger Case

Recently my lawyer filed a motion for summary judgment in the Traffic Power vs SeoBook.com case.

Ariel Stern, my lawyer, also spoke with Max Spilka, who told him that Traffic Power recently switched lawyers.

One wonders why they would switch lawyers so late in the case if they had a real case. Even if they aim to waste bundles of cash, I think they know full well that truth is not on their side.

I hope other bloggers find the paperwork from my case useful. :)

Naming a Website & Redefining Language

Seth offers his new rules for naming, but unfortunately I think some of them are no good.

This:

This means that having the perfect domain name is nice, but it's WAY more important to have a name that works in technorati and yahoo and google when someone is seeking you out.

Sort of a built-in SEO strategy.

is debatable in its presentation, but this:

So, that was the first task. Find a name that came up with close to zero Google matches.

is absolutely unnecessary.

The concept of needing nearly zero competition to rank is beyond me. If you are creating something of quality, over 99% of the competing pages for any phrase are going to be of zero significance.

If your product or service is truly remarkable you should be able to redefine the meaning of language. That is what remarkable people & companies do.

By looking at the number of hits for a word you are just looking at the number of pages that have that term in them. Want a better glance at the competition? Search for the number of pages which have the word in the anchor text and the page title (that tool will be made better & open sourced...it is still very beta). Even that number does not matter much, though.
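
As a rough illustration of that kind of check (my own sketch, not the actual tool's code), you can approximate it with Google's allintitle: and allinanchor: operators and compare the reported result counts:

    # A rough sketch: build Google queries whose result counts show how many
    # pages actively target a phrase (title / anchor text) vs. merely mention it.
    from urllib.parse import quote_plus

    def competition_queries(phrase):
        """Return labeled search URLs for eyeballing real competition."""
        base = "http://www.google.com/search?q="
        return {
            "any mention":    base + quote_plus(phrase),
            "in page titles": base + quote_plus("allintitle: " + phrase),
            "in anchor text": base + quote_plus("allinanchor: " + phrase),
        }

    for label, url in competition_queries("debt consolidation").items():
        print(label + ": " + url)

Comparing the allintitle: and allinanchor: counts against the raw result count gives a much better read on how many pages are actually optimized for the phrase.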

You really only need to look at the top few results, because those are the ones you will be competing against if you are trying to own the meaning of a word or phrase.

When Nandini named a directory Web Atlas, that was a bad call, because there are authoritative, well-established .edu & .gov domains in that keyword space.

When I created Myriad Search, I did not look through the competition at all (in large part because I wanted to create Myriad Search for link popularity and personal use more than for it to spread widely).

In spite of spending under $1 advertising Myriad Search, in the first month the site already ranks at #11 or #12 in Google for the word "myriad" (out of 28,000,000+ results).

You don't really establish cult status until after many people are talking about your product. People do not search for your product until they have heard about it elsewhere.

You shouldn't think of your site as starting from zero, with every page that contains the term you want to rank for counting as competition. You should compare the quality of your idea to the quality of the top ranking ideas and see if you think it is possible to outrank them based on that.

Also notice how Seth's authoritative-sounding post title is more likely to get comments. "New rules for naming" sounds much more definitive than "naming tips & ideas."

Dynamic Sites & Social Feedback

Part 3 in a series... let me know what you think :)

Blog Software is a Simple CMS

Some of the conversations stemming from my article series, starting with Why Bloggers Hate SEO's & Why SEO's Should Love Bloggers, have stated that blogs are just a simple CMS. The one catch is that they are social in nature.

I have probably read about a couple hundred books, and have only emailed about 5 book authors to tell them how great their books were. Most of the book authors quickly replied to my emails to say thanks. This tells me that they must understand the value of having fans (Seth Godin surely fits in that group) or they are not as inundated with email as I sometimes am.

Compare the books, which take months to write, to most blogs. On blogs I have left hundreds or thousands of comments. Across my various blogs I have received thousands of feedback posts others have left. One blog is almost nothing but a framework for people to leave their comments, and yet they still do!

Some people have stated that blogs are a fad that will die out. They may be right, but if they die out it will only be if other software emerges which does a better job of social integration, as some of the current tools are lacking on many fronts.

Static Content & the Game of Margins

Some old established static sites may live on for a long time, but both directly and indirectly the web is becoming more of a read-write medium. Margins will require content to become more social.

In spite of years of branding and content creation, even the most well known publishers are caught playing the margins, selling ad space aggressively, and pushing the blame onto their advertisers.

Creating content is a game of margins. If you use a static website, and update its content to keep it current, you are writing over your old work, which means:

  • you are throwing away its historical record
  • you are creating fewer pages (which means fewer chances to pull in visitors), as each page is another search lottery ticket
  • it is likely going to be harder for an audience to find the new content
  • it is less likely people will reference the new content, since they do not know what URLs are changing when
  • it is less likely people will reference the old content, since it may eventually change
  • many people will not want to reread the parts they already read
  • as your content base grows, you are forced to worry about keeping it up to date while still trying to keep up with the news and the shifting marketplace

Add all of those things together, and a business model which could otherwise wildly succeed could easily become a complete failure.

The static site this article is on generally sucked until my blog became popular. In spite of the effort that went into writing this article, my average blog post will probably be read many more times than this article will be.

Who is a Static Site For?

When you first learn about a topic it may be useful to create a large site about the topics you are learning, just as a way of forcing yourself to learn it all. Even in doing that, so long as you map out the general hierarchy ahead of time, there is no reason to avoid creating the site using a dynamically driven database. Eventually, when I have enough time, this site will likely be shifted to a dynamic format.

The only people who can really afford to get away with using purely static sites are:

  • those who have other dynamic sites which help build their credibility & authority
  • those who are creating a site out of boredom or for a personal hobby
  • those who are not trying to profit or spread ideas
  • those who are known as the authority on their topic (who can do well in spite of the shortfalls in their publishing methods)
  • amazing writers who write so well that they can do well in spite of their publishing format
  • those who were first movers or are in niche fields with few competitors
  • those who are gurus in fields that change slowly
  • those who run tons of sites and want to make them scalable (although it is even easier to do this with dynamic sites)

In almost all the above cases I can point to examples of how using dynamic sites could save time or be more profitable.

Example of a Sucky Static Site:

Not too long ago I created a site called Link Hounds to give away free link building tools. I find the tools exceptionally useful, but the site failed to take off for a number of reasons.

  • API Limitations: when I first announced the site, people used it beyond the API limits and it did not work. I should ask the engines for increased limits (or cache and throttle requests; see the sketch after this list).
  • Lack of Incentive to Syndicate: in part to make up for the API limitations I gave away the source code and referenced tool mirrors, but some who mirrored the tool did not want to share it with others. Also, Yahoo! requires DOM XML support if you program the tool in PHP4. I should have had my friend program it in PHP5.
  • Crap Design: While the site design was not bad for free, it obviously is not something stellar.
  • Open Source & SEO: are generally not concepts that get paired together. I think it will take a bit of time for people to get used to it. An open source website recently asked me to write an article, so that may help a bit.
  • Perception of Value: People think they get what they pay for. In spite of the fact that some of my software is similar to (and in some ways better than) software that sells for $150 or more, some people think my software is worthless because it is free. Similar software with strong affiliate marketing is seen by many more people.
  • Boring / Static: If I started working a bit harder at link building and placed a blog offering a bunch of creative link tips on that site I suspect it would garner many more links.
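
As an aside, here is one way the API limit problem could be softened. This is a hypothetical Python sketch of the general cache-and-throttle technique (Link Hounds itself was PHP, and the class and parameter names here are my own inventions): repeat queries are served from a cache so they cost no quota, and live calls are spaced out.

    # Hypothetical sketch: cache API answers and throttle live calls so a
    # burst of users does not burn through a third-party API's daily quota.
    import time

    class ThrottledCache:
        def __init__(self, min_interval=1.0, ttl=3600):
            self.min_interval = min_interval  # seconds between live API calls
            self.ttl = ttl                    # seconds a cached answer stays fresh
            self.cache = {}                   # query -> (timestamp, result)
            self.last_call = 0.0

        def fetch(self, query, api_call):
            now = time.time()
            hit = self.cache.get(query)
            if hit and now - hit[0] < self.ttl:
                return hit[1]                 # cache hit: no quota spent
            wait = self.min_interval - (now - self.last_call)
            if wait > 0:
                time.sleep(wait)              # space out live calls
            result = api_call(query)
            self.last_call = time.time()
            self.cache[query] = (self.last_call, result)
            return result

    # Usage: cache = ThrottledCache(); cache.fetch("seo book", some_search_api)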

As it sits, there is little reason for people to remember to go back to the Link Hounds site, so they rarely do.

Sites that are dynamic in nature and make it easy to give feedback will fare far better.
