Many spam sites are based on automation, and the attempts to automate and mass-produce content or sites leave footprints that are easy to detect.
While MSN Search is still chock-full of spam, they are doing research to try to stop it (link via PeterD).
Our approach is to treat each spam page as a dynamic program rather than a static page, and utilize a “monkey program” [6] to analyze the traffic resulting from visiting each page with an actual browser so that the program can be executed in full fidelity.
Many successful, large-scale spammers have created a huge number of doorway pages that either redirect to or fetch ads from a single domain that is responsible for serving all target pages. By identifying those domains that serve target pages for a large number of doorway pages, we can catch major spammers' domains together with all their doorway pages and doorway domains.
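The domain-aggregation idea in that passage can be sketched in a few lines of Python. This is a minimal illustration, not MSN's implementation: it assumes you already have, for each doorway URL, the list of URLs the browser was redirected to or fetched ads from (the kind of traffic log a monkey program would record), and the function name `find_target_domains` and the `min_doorways` threshold are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host(url: str) -> str:
    """Return the lowercase hostname of a URL (empty string if none)."""
    return (urlparse(url).hostname or "").lower()


def find_target_domains(observations, min_doorways=3):
    """Find domains that serve target pages for many doorway domains.

    observations: iterable of (doorway_url, fetched_urls) pairs, where
    fetched_urls lists every URL the browser was redirected to or loaded
    while visiting the doorway page (hypothetical input format).

    Returns {target_domain: set_of_doorway_domains} for every target that
    serves at least `min_doorways` distinct doorway domains.
    """
    served = defaultdict(set)
    for doorway_url, fetched_urls in observations:
        doorway_domain = host(doorway_url)
        for url in fetched_urls:
            target = host(url)
            # Ignore requests that stay on the doorway's own domain.
            if target and target != doorway_domain:
                served[target].add(doorway_domain)
    # Keep only targets shared by enough doorways to look like one operator.
    return {
        target: doorways
        for target, doorways in served.items()
        if len(doorways) >= min_doorways
    }
```

Grouping by full hostname keeps the sketch short; a real system would probably collapse subdomains to the registrable domain (for example, with the Public Suffix List) so a spammer could not dodge the count by spreading traffic across many subdomains. Once a target domain is flagged, every doorway domain in its set comes along with it, which is the "catch major spammers' domains together with all their doorway pages" effect described above.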
Just about any piece of the publishing or monetization puzzle that is not well thought out can leave a footprint.
The downside of them doing that type of research and sharing it publicly is that they create an incentive for one person to build a bunch of spam sites pointing at a competitor just to knock the competitor's main site out of the search results. And if you think MSN has fixes in place for that sort of stuff, you would probably be wrong. Take, for example, their inept geo-location targeting algorithms.
Does Google cross-reference AdSense accounts when fighting spam? I am not certain, but some friends have recently reported occasional $8 and $9 AdSense ad clicks on some low-traffic spammy sites in a network of spammy sites linked to one AdSense account. If Google is going out of their way to filter the noise out of their ad network, it shouldn't be surprising that they would use similar data points to clean up their organic results. If you start getting a ton of traffic and/or large earnings quickly, that might flag your site for some type of editorial review.
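As an illustration of how cheap that kind of cross-referencing would be, here is a hypothetical footprint check that groups sites by the AdSense publisher ID embedded in their ad code. Nobody outside Google knows whether or how they do this; the sketch only assumes that AdSense ad code contains the publisher ID as `ca-pub-` followed by digits, and the function name `group_by_publisher` is made up for the example.

```python
import re
from collections import defaultdict

# Assumption: AdSense ad code embeds the publisher ID as "ca-pub-<digits>".
PUB_ID = re.compile(r"ca-pub-\d+")


def group_by_publisher(pages):
    """Group sites that share an AdSense publisher ID.

    pages: dict mapping a site's domain to the raw HTML of one of its pages.
    Returns {publisher_id: sorted list of domains} for every publisher ID
    that appears on two or more sites.
    """
    sites = defaultdict(set)
    for domain, html in pages.items():
        # A page may repeat its ID in several ad units; count the site once.
        for pub_id in set(PUB_ID.findall(html)):
            sites[pub_id].add(domain)
    return {
        pub_id: sorted(domains)
        for pub_id, domains in sites.items()
        if len(domains) > 1
    }
```

The point is less the code than the footprint: one shared account ID ties every site in a network together, so any signal attached to one site can easily be applied to all of them.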
How to Look Like an SEO
Google has started supporting the NoODP meta tag that MSN introduced in May. To use it, place the following code in the head of your DMOZ-listed page:
<META NAME="ROBOTS" CONTENT="NOODP">
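If you do add it, a quick way to confirm the directive actually made it into a page's markup is a small checker built on Python's standard `html.parser`. This is a sketch under simple assumptions: it only reads `<meta name="robots">` tags (not crawler-specific names such as `googlebot` or `msnbot`), and `RobotsMetaParser` and `has_noodp` are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive.
        for directive in attr_map.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def has_noodp(html: str) -> bool:
    """Return True if the page carries a robots NOODP directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noodp" in parser.directives
```

Because the directive match is case-insensitive, the uppercase form shown above and a lowercase `<meta name="robots" content="noodp">` are treated the same.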
I probably would not use that unless my DMOZ listing was really jacked. I believe it is a way to self-select as an SEO, which may not work in your favor.
I also think using nofollow tags excessively, beyond those your content management system typically adds, is another way to self-select as a known SEO.