A good SEO is both a geek and a suit, and is able to speak the language of both worlds. Providing SEO services without understanding both business and techie principles is the equivalent of trying to teach English to Spanish children without being fluent in both languages and the cultural mores of both.
A few ways to judge if you may even be a tech guy that speaks marketing:
1 - You’re a card-carrying member of the DMA, AMA, or MMA and know what USB, C-class IP, DNS, WHOIS, PPC, and G stand for.
2 - Every morning you recite the 4 P’s of the marketing mix as well as all mnemonics for the OSI Model including how it is like a 7 layer burrito.
3 - You know what city Madison Avenue is in as well as who invented the Altair, TI, and Atari.
4 - You dream about purple cows and have listened to Pink Floyd synched with The Wizard of Oz after consuming various forms of caffeine.
5 - You read Adweek, Adage, Brand Republic, Fast Magazine, Inc.com, Linux Journal, Linux Mag, and Apache Week.
6 - You use LinkedIn and 2600 in the same day.
7 - You’ve heard of Ogilvy, Leo Burnett, Saatchi and Saatchi, Crispin Porter + Bogusky, BBDO, and can speculate how many boxes they run, their bandwidth speeds and marvel at how well Grand Theft Auto would run on their intranet infrastructures.
8 - You buy text links and optimize webhost performance.
9 - You are a certified Ambassador, AdWords professional, A+, Network+, and Cisco certified MCSE (marketing computer speak explainer).
10 - You can name the top 5 most run banner ads of all time, and still surf with JavaScript off from a proxy, cloaking your user agent.
Your marketing guys and gals need to communicate the how’s and why’s of both marketing and SEO to their caffeine and sugar sipping cohorts coding databases down the hall.
SEO/SEM is getting your marketing guys to talk to your techies (and maybe even hang out and ENJOY their differences of world and work views). SEO is bridging the business/technology communication digital divide. I really enjoy both worlds, and training on the areas that oftentimes get missed when the dialogue is incomplete or strained due to how different they have traditionally been. Think of the stereotype of both groups (while stereotypes are not great, they are often based somewhat in reality), and you couldn’t have two more different types of people. These differences cause the cognitive dissonance that IS SEO.
This is the essence of SEO - project management that improves the communication between folks who specialize in marketing and those who specialize in technology, for a better understanding of how to improve the relevance of a website for optimal search performance. This is a new idea, and even the personality type is a completely new one (thus the demand right now). If you work for a marketing company, ad agency, PR firm or SEO company that needs some strategic advice - I have a deal for you. Drop me a line, and I’ll fill you in, or point you in the direction of some good folks and information that may be able to help you.
Filed under: Google, SEM Research by Stuntdubl SEO at 11:03 am, 7/20/2006
My first post on the Google cache error was pretty much a quick rundown of what I thought was possible that the error message revealed about the Google algo. Well, after reviewing the error further, I was pretty much completely wrong. It sounded like some pretty good guesses, and I stand by the fact that most of those things probably ARE in the algo somewhere, but my interpretation of the error was dead wrong.
The most glaring error that I overlooked in my excitement was that it was served up on a cached page, and thus was most likely a query-level server response (thanks to Detlev’s analysis).
Detlev’s explanation of the error is by far the best I’ve seen, and I would guess that he has come closest to what the error actually meant. Another pretty good guesstimation of the error is available from Teh Xiggeh.
Added: Wesley Tanaka has a nice writeup as well.
There have been some other discussions and explanations of the error as well, but I’m sticking with Detlev’s, since he is probably one of the most proficient SEO’s I’ve ever met (an OG SEO who has been watching algorithms since before SEO was called SEO), and his explanation seems simplest and most logical (see Occam’s Razor).
Other discussions on the “google cache error”
Please feel free to link drop any other discussions that you’ve seen on the topic.
Filed under: Buzz Marketing, SEM Research by Stuntdubl SEO at 4:15 pm, 7/17/2006
Okay, so now that Myspace is the #1 site in the world, I think it’s finally time to embrace the space with a full marketer’s mentality. Despite Myspace’s title being disputed, it’s hard to argue that there isn’t an incredibly large market with a whole lot of potential. I haven’t dived in wholeheartedly yet, but from my very unscientific experiments on myspace at school there is an extremely high level of loyalty with myspace. My favorite stat is that myspace has 40 pageviews per visitor ON AVERAGE. That was a pretty amazing number to me.
While on vacation, I’ve seen two separate instances of television ads that used Myspace URLs. I thought this kind of odd at first, but with the increasing difficulty of both purchasing (and then ranking) new domains, it makes pretty good sense. Both the movie “John Tucker Must Die” and the TV show “It’s Always Sunny in Philadelphia” advertise Myspace URLs. This is in addition to thousands of groups dedicated to virally marketing and promoting various topics, including one of my favorites, Entourage, which attempted a viral campaign via Myspace.
Benefits of My Space Optimization
Quicker Search Engine Rankings
“It’s Always Sunny” didn’t fare so well in the rankings, but “John Tucker” gets a #2 ranking for the phrase on Google with the intended myspace page.
No need for domain or hosting
Immediate audience with massive distribution
Drawbacks of My Space Optimization
No control over web host or downtime
Rather slow servers due to large volumes of traffic
No ownership of content
No ability to 301 traffic or content if it needs to move. If links are built, they can’t be redirected to appropriate areas.
Little way to track statistics and visitors (that I’m aware of)
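To make the 301 drawback concrete, here is a minimal sketch of what you give up. On a server you control, moving a page while keeping its inbound links pointed somewhere useful takes only a few lines. The paths in `MOVED` are made up for illustration, and this uses Python's built-in `http.server`, not anything Myspace offers.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping of retired paths to their new homes.
MOVED = {"/old-profile": "/new-profile"}


def redirect_target(path):
    """Return the new location for a moved path, or None if it never moved."""
    return MOVED.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers moved paths with a permanent (301) redirect."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 tells browsers and spiders that the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()
```

To try it locally you could hand `RedirectHandler` to `http.server.HTTPServer` and call `serve_forever()`. On a hosted profile page there is simply no equivalent knob to turn.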
Search rankings are nice, but I really prefer converting traffic. The TRAFFIC is definitely there; now we just need to hope that myspace users start looking for more than friends and dates, and searching for more high dollar items that can be monetized (especially if they implement their own search technology after their search function contract expires). There’s definitely a ton of potential with Myspace; it’s only a matter of getting creative to harness some of the enormous viral marketing opportunities.
MySpace Optimization Resources
Filed under: Todd Malicoat by Stuntdubl SEO at 1:35 pm, 7/13/2006
Since my blog alter-ego is a misspelling, I’ve always gotten “Did you mean stuntdouble?” when doing a query for stuntdubl. Today was the first day that I noticed there was no more “did you mean”. I’m curious if it is due to search query volume or perhaps indexed pages, or a combination ratio of the two.
I’ve heard Greg mention this before as well, as he gets suggested if you misspell his name.
Filed under: Google, SEM Research by Stuntdubl SEO at 12:56 pm, 7/11/2006
Last week, a gent by the name of Ruslan Abuzant got a rare peek at a portion of Google’s algorithm, stumbling across it when looking at the cached version of a multi-language page. He was kind enough to post his findings on digital point forums which I found via threadwatch.
Perhaps it’s because it happened over the holiday weekend, but I thought it was a bit odd that more SEO’s weren’t as excited by this as I was. No, there’s probably not A LOT that can be learned from this, but there is some, and it was finally like being “through the looking glass” to get a rare glimpse of how google really ranks pages.
pacemaker-alarm-delay-in-ms-overall-sum 2341989
pacemaker-alarm-delay-in-ms-total-count 7776761
cpu-utilization 1.28
cpu-speed 2800000000
timedout-queries_total 14227
num-docinfo_total 10680907
avg-latency-ms_total 3545152552
num-docinfo_total 10680907
num-docinfo-disk_total 2200918
queries_total 1229799558
e_supplemental=150000 –pagerank_cutoff_decrease_per_round=100 –pagerank_cutoff_increase_per_round=500 –parents=12,13,14,15,16,17,18,19,20,21,22,23 –pass_country_to_leaves –phil_max_doc_activation=0.5 –port_base=32311 –production –rewrite_noncompositional_compounds –rpc_resolve_unreachable_servers –scale_prvec4_to_prvec –sections_to_retrieve=body+url+compactanchors –servlets=ascorer –supplemental_tier_section=body+url+compactanchors –threaded_logging –nouse_compressed_urls –use_domain_match –nouse_experimental_indyrank –use_experimental_spamscore –use_gwd –use_query_classifier –use_spamscore –using_borg
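Before any guessing, one thing you can do with the counters is plain arithmetic. The sketch below assumes (and it is only an assumption) that the `_total` values are running sums and that `queries_total` is the matching count, so dividing one by the other gives a per-query average. The numbers are copied from the dump; the variable names are mine.

```python
# Counter values copied verbatim from the error dump above.
counters = {
    "pacemaker-alarm-delay-in-ms-overall-sum": 2341989,
    "pacemaker-alarm-delay-in-ms-total-count": 7776761,
    "timedout-queries_total": 14227,
    "avg-latency-ms_total": 3545152552,
    "queries_total": 1229799558,
}


def ratio(numerator_key, denominator_key):
    """Divide one counter by another (assumes they are a sum and its count)."""
    return counters[numerator_key] / counters[denominator_key]


# Hypothetical readings: each is only meaningful if the assumption above holds.
mean_latency_ms = ratio("avg-latency-ms_total", "queries_total")
timeout_rate = ratio("timedout-queries_total", "queries_total")
mean_alarm_delay_ms = ratio(
    "pacemaker-alarm-delay-in-ms-overall-sum",
    "pacemaker-alarm-delay-in-ms-total-count",
)

print(f"mean latency per query: {mean_latency_ms:.2f} ms")
print(f"share of queries timed out: {timeout_rate:.6%}")
print(f"mean pacemaker alarm delay: {mean_alarm_delay_ms:.2f} ms")
```

Read that way, the latency works out to a little under 3 ms per query, which looks more like a machine describing its own workload than a score for any one website.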
While this isn’t EXTREMELY telling, there are some things we can take a look at here that are potentially useful. Perhaps the other reason SEO’s weren’t too excited is that, as you break this down, you tend to see a lot of the variables that we often speculate about anyhow. TallTroll (hey Brendon - I’d link to ya if I knew any of your sites;)), mentioned on threadwatch a while back:
The joke is that even if they published a definitive version of the algo, the kind of people who moan about Google still wouldn’t be any better off, since they STILL wouldn’t have any clue what to do with the information. Those who do know what to do with it already have a good idea of what the algo looks like, at least in broad terms, and so will gain little themselves.
I guess most SEO’s don’t NEED to know the algorithm, because they have adapted best practices to suit their process for the most part. They may be able to adapt their process a bit if they knew the EXACT algo, but many folks have a pretty good guess of where the knobs are dialed to, although I’m certain it’s far from a comprehensive understanding of exactly what the mountain of Ph.D.’s at G, Y, and MSN have up their sleeves.
So without further ado, here’s a bit of my speculation on what I thought was one of the coolest developments in a long time. It’s only a piece of what is a much bigger thing, but I thought it was definitely worth a look, when Matt confirmed it was real (and also that we will most likely NEVER see something like this again).
**Note This is pure speculation and 99% of it may be pure trash
pacemaker-alarm-delay-in-ms-overall-sum 2341989
Best guess: Could be about anything I suppose - potentially a metric for spidering frequency to the specific page
pacemaker-alarm-delay-in-ms-total-count 7776761
Best guess: spidering frequency to entire site?
cpu-utilization 1.28
Best guess: Metric for how CPU intensive site spidering was
cpu-speed 2800000000
Best guess: Perhaps how fast to spider the website based on server performance
timedout-queries_total 14227
Best guess: How many times the web site has timed out to requests over time
num-docinfo_total 10680907
Best guess: File size of the document - last time requested
avg-latency-ms_total 3545152552
Best guess: Latency speed of the webserver serving the document requested
num-docinfo_total 10680907
Best guess: File size of the document - current request
num-docinfo-disk_total 2200918
Best guess: Total stored site size
queries_total 1229799558
Best guess: Total queries for the site category, or perhaps the specific site
Perhaps “navigational” queries are used to measure the popularity of a site?
e_supplemental=150000
Best guess: Threshold for placing results into the supplemental index
–pagerank_cutoff_decrease_per_round=100
Best guess: Some cutoff point for figuring link popularity - perhaps an incorporated trust filter to decrease link popularity by several multiples until it’s found trustworthy
–pagerank_cutoff_increase_per_round=500
Best guess: Some cutoff point for figuring link popularity - see above
–parents=12,13,14,15,16,17,18,19,20,21,22,23
Best guess: Parent topical categories (think DMOZ) - or parent pages within the site (think SE theme pyramids or virtual site hierarchy)
–pass_country_to_leaves
Best guess: Choose primary country of origin for website or page
–phil_max_doc_activation=0.5
Best guess: Threshold for maximum spidering of website
–port_base=32311
Best guess: an indicator of filetype or which datacenters the data is distributed throughout
–production
Not much to go on here -
–rewrite_noncompositional_compounds
From - Automatic Discovery of Non-Compositional Compounds
Spaces in texts of languages like English offer an easy first approximation to minimal content-bearing units. However, this approximation mis-analyzes non-compositional compounds (NCCs) such as “kick the bucket” and “hot dog.” NCCs are compound words whose meanings are a matter of convention and cannot be synthesized from the meanings of their space-delimited components.
Best guess: Sounds like some implementation of LSA/LSI to create meaning from non-standard language. Perhaps some type of language AI.
–rpc_resolve_unreachable_servers
Best guess: Have googlebot revisit unreachable servers
–scale_prvec4_to_prvec
Best guess: Adjustments on PR algo
–sections_to_retrieve=body+url+compactanchors
Best guess: Disregard navigation that is consistent throughout the website - Some type of block level analysis
–servlets=ascorer
Best guess: Who the hell knows…not much to go on here…I’m grasping at straws already if you got this far and didn’t realize it;)
–supplemental_tier_section=body+url+compactanchors
Best guess: Additional block level analysis, perhaps some duplicate content detection
–threaded_logging
Best guess: Log more in depth information (links, clickthrough rates, etc.) for this page
–nouse_compressed_urls
Best guess: Perhaps a fix for SIDs in URLs, or a way of disregarding other types of URLs that create infinite loops - disregarding any type of variables after the question mark in a URL
–use_domain_match
Best guess: Some type of Canonicalization fixes
–nouse_experimental_indyrank
Best guess: Dunno, but it sounds like a good thing to start tryin’ to figure out - perhaps they finally ARE going to roll toolbar or user data into the algo. Perhaps personalization is finally making its way in.
–use_experimental_spamscore
Best guess: Newer version of the below spamscore - a number of filters that give an indicator of how likely a page is to be spam.
–use_gwd
Best guess: not much to go on here - I’ll go with “google word database”
Other guesses have included “google web directory” or “google world domination”
–use_query_classifier
Best guess: Something as simple as
-navigational
-informational
-transactional
Similar to yahoo mindset
-or-
More likely a deeper extension of the above.
Query specific variables to certain verticals -
Think “transactional real estate” - new york real estate agent
vs.
“informational real estate” - new york real estate news
This criteria would also help to decipher which queries to serve “onebox results” for froogle/googlebase/google local/ google maps/ etc.
–use_spamscore
Best guess: The “non-beta” or working version of the above mentioned spam score that is a constant work in progress. Things like multiple dashes in a domain are good indicators of a high likelihood of a page being spam. Domain names over a certain length, and probably many other things, would fall into what could be used to evaluate a site’s “spamscore”
–using_borg
Best guess: A. Some technology or systems developed by Anita Borg (time for some homework) - or B. google really *IS* trying to take over the world, and we’re all being added to a massive database - I’m going with A as my best guess though;)
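If you want to poke at the flag string yourself, a minimal sketch like this turns it into something you can sort and diff. It assumes the common command-line convention where `name=value` sets a value, a bare name means "on", and a `no` prefix means "off". That the dump follows this convention is my assumption, and `parse_flags` is a made-up helper, not anything from Google.

```python
# The blog software rendered the leading "--" as an en dash, so accept both.
DASHES = "-\u2013\u2014"


def parse_flags(dump):
    """Split a whitespace-separated flag dump into a {name: value} dict."""
    flags = {}
    for token in dump.split():
        if token[0] not in DASHES:
            continue  # skip anything that is not a flag
        body = token.lstrip(DASHES)
        if "=" in body:
            name, value = body.split("=", 1)
            flags[name] = value
        elif body.startswith("no"):
            # Assumed convention: "nouse_x" switches "use_x" off.
            flags[body[2:]] = False
        else:
            flags[body] = True
    return flags


sample = (
    "–pagerank_cutoff_decrease_per_round=100 –production "
    "–sections_to_retrieve=body+url+compactanchors "
    "–nouse_compressed_urls –use_spamscore"
)
for name, value in sorted(parse_flags(sample).items()):
    print(name, "=", value)
```

The `no` prefix rule is the shakiest part: a flag whose real name happens to start with "no" would be misread, so treat the on/off column as a hint.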
People sometimes have a hard time understanding that algorithm variables are not necessarily good or bad, fair or unfair; they are only effective or ineffective in judging quality. People evaluate search results subjectively, but a search algo objectively applies many different criteria that make up the final result. A webmaster may think that tracking the number of times a site goes down is “unfair”, but on a massive scale it is an accurate indication of the quality of a website.
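As a toy illustration of that "effective, not fair" idea, here is what the two signals from my `–use_spamscore` guess above (multiple dashes, long domain names) might look like as code. The thresholds are invented for the example and the function is purely hypothetical; nobody outside the 'plex knows what the real filters check.

```python
def toy_spamscore(domain, max_dashes=1, max_length=25):
    """Count how many toy spam signals a bare domain like 'example.com' trips.

    Both thresholds are made-up illustration values, not known cutoffs.
    """
    name = domain.lower().split(".")[0]  # assumes no subdomain is passed in
    score = 0
    if name.count("-") > max_dashes:
        score += 1  # multiple dashes in the domain
    if len(name) > max_length:
        score += 1  # domain name over a certain length
    return score


for candidate in ("stuntdubl.com", "cheap-discount-online-pills-pharmacy.com"):
    print(candidate, toy_spamscore(candidate))
```

Neither check says anything about whether a given site deserves it. Each only has to be right often enough across millions of domains.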
I’m sure the boys at the ‘plex are getting a nice chuckle from some of my wild speculation, so I’d like to be my normal google nitpicking self and add my own two cents to Matt’s super beta-algo (I like where it’s going:):
–initial_time_travel_wormhole=“Wednesday, December 31 1969 11:11 pm”
–use_googlepray=false
–docid_size=more-than-four-bytes
–SETI_alien_communication_port=31337
–skynet_sentience=0.33
–plane_load=snakes
–pigeonrank_seed=42
–use_mentalplex=true
–unicorn_versus_werewolf=its-on-now
You may be better off with:
–initialize_flux_capacitor=“November 5, 1955, 0600 AM” (stop Doc Brown!)
–docid_size=return_to_1985
-use_googlekarma=true
–reveal_matrix=red_pill
–SETI_alien_communication_port=31337
–skynet_sentience=0.33
–plane_load=snakes
–pigeonrank_seed=42
–use_mentalplex=true
-use_googledance=tango
-use_men-in-black-flashy=true
-toolbar_phone_home=ET
-tinfoil_hat_wearer=true
-source_code_level=hello_world
–ninjas_riding_unicorns_vs_pirates_with_werewolves
Hope this helps spice things up a bit:)
We know there are hundreds if not thousands of variables and combinations, so you have pretty good odds that you can pick SOMETHING that is in the secret sauce SOMEWHERE. This could of course be just another ploy to keep SEO’s busy and wondering rather than actually WORKING on creating more websites;) Anyone else care to toss out their best guesses on what some of this stuff may or may not mean? Wasn’t anyone else excited to get a brief little peek at the code we all so diligently try to reverse engineer?
Filed under: Interviews by Stuntdubl SEO at 12:05 pm, 7/5/2006
I’ve spent a lot of time recently catching up on webmasterradio.fm podcasts, and was lucky enough to catch Jeremy (aka shoemoney) talking with Lee Dodd on Net Income. I hadn’t come across much of Lee’s stuff in the past, unfortunately, but he seems like a very intelligent guy, and certainly has a lot of insight into running an online community. During the show they talk about many of the different ways to promote and monetize a forum or social network, which can be a difficult thing to do without overly “pimping” the community. It makes for a nice listen if you have the time, and you can be certain I’ll be checking out more of Lee’s work in the future.
Lee is currently opening a new forum dubbed the EarnersForum.com that seems like a great place to get together and talk with other people who enjoy making money online. He’s running a contest giving away $1,000, a developed niche site, one hour consultation, a 1 year link, and an interview on my podcast to 5 winners getting the entire package.
I know he already has some great “earners” associated with the forum, and is planning to have some private areas for discussion. It sounds like a pretty good idea, and I wish Lee all the best with it.
Filed under: Google, Search Engine Optimization, Yahoo/ MSN by Stuntdubl SEO at 1:10 am,
Firstly, it’s a Trustbox, not a Sandbox. “Trust filters” seem to be a large portion of what has most SEO’s in a frenzy over search engines currently. There are pros and cons to the trustbox for folks on both sides of the fence, and the best thing you can do no matter which side of the game you are on is understand what the filters mean and the repercussions that they will create in the future.
So what is search engine trust?
For the purpose of keeping things simple, I would identify a site’s trust by 3 different simple criteria:
- Website Age - (most importantly the first time it was indexed)
- Total # of backlinks and the overall age of those links
- Total “trustscore” of other backlinks (How many .edu’s, .gov’s, high ACTUAL PR links, etc.)
Aaron just released an amazing SEO extension for firefox that gives some great insights to these areas.
Most trust criteria revolve around some dependence on age, which is actually a pretty good signal of quality. From things folks at Google have said in the past, the trustbox (or sandbox if you must) was the unintentional effect of some other filters that were implemented. Realizing that age was a great signal all the way around to defend against the overdependency on links, they’ve gone buckwild with age variables ever since.
I’m sure there are plenty of other things that affect trust, but these are most likely tops on the list. Think age related to just about any of the search ranking factors and it could be (or probably is) being used.
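The three criteria above can be sketched as data you could actually collect for a site. This is only a toy: the `Backlink` class, the 0-to-1 `source_trust` value, and the field names are all assumptions of mine, and I deliberately leave the three numbers uncombined because nobody outside the engines knows how they are weighted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Backlink:
    first_seen: date      # when the link was first spotted
    source_trust: float   # hypothetical 0.0-1.0 trust of the linking page


def years_between(start, end):
    """Whole-and-fractional years from start to end, never negative."""
    return max((end - start).days, 0) / 365.25


def toy_trust_signals(first_indexed, backlinks, today):
    """Return the three trust criteria as separate, uncombined numbers."""
    ages = [years_between(link.first_seen, today) for link in backlinks]
    return {
        "site_age_years": years_between(first_indexed, today),
        "backlink_count": len(backlinks),
        "avg_backlink_age_years": sum(ages) / len(ages) if ages else 0.0,
        "backlink_trust": sum(link.source_trust for link in backlinks),
    }


today = date(2006, 7, 5)
crusty = toy_trust_signals(
    date(1999, 1, 1),
    [Backlink(date(2001, 1, 1), 0.9), Backlink(date(2003, 6, 1), 0.4)],
    today,
)
newbie = toy_trust_signals(date(2006, 5, 1), [Backlink(date(2006, 6, 1), 0.9)], today)
print(crusty)
print(newbie)
```

Notice that every number except the raw link count and the trust sum is just age in disguise, which is the point of the paragraph above.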
Just how important is being trusted right now?
I figured it was about time for a rant on the trust of domains (mainly in Google), and when I spent some time on a recent roadtrip listening to some excellent Strikepoint podcasts, I really knew it was time. DaveN has some fantastic commentary on just how important trust is in ranking in google these days. I’m not sure exactly which episode it was (I listened to three or four and they were all very insightful), but Dave, Mikkel, or JasonD talk about 85% of search rankings these days being attributed to trust, and about 15% being onpage, and it is painfully true. With a few links to a highly trusted domain, and some body copy a site can rank for just about anything whether it is topically related or not.
There are examples everywhere on the web of just how critical trust is right now to top rankings. Don’t get me wrong…trust is a very good thing, and a great signal of quality, but depending almost solely on it is not the solution, as depending nearly solely on links was not the best solution.
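To see why the 85/15 split quoted from the podcast is painful, here is a back-of-the-napkin sketch. It treats the split as a simple weighted sum, which is my simplification and not a claim about how any engine works; the 0-to-1 trust and on-page scores are invented for the example.

```python
def toy_rank_score(trust, on_page, trust_weight=0.85):
    """Blend a trust score and an on-page score, both assumed to be 0.0-1.0."""
    return trust_weight * trust + (1 - trust_weight) * on_page


# A crusty trusted domain with barely related body copy...
trusted_off_topic = toy_rank_score(trust=0.9, on_page=0.2)
# ...versus a brand new site whose page is exactly on topic.
new_on_topic = toy_rank_score(trust=0.1, on_page=1.0)

print(f"trusted but off topic: {trusted_off_topic:.3f}")
print(f"new but on topic:      {new_on_topic:.3f}")
```

With weights like these, even perfect on-page relevance cannot close the gap, so the new site loses no matter how good the page is.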
Two or three years ago:
SEO = Content + high PR links
Created: a micro-economy of link buying solely for google rankings
Now
SEO = Crusty trusted domain + content
Will create: use your imagination.
Why the Overdependence on Trust Will Again Change the Web
The search engines are probably the most important aspect of the web. There are BILLIONS of pages of information available, but if you can’t find any of them, it makes instant access a WHOLE lot more difficult. The internet without search would be the equivalent of a library that was just a big pile of books that sometimes had a few similar books near each other.
“…the meaning of a link has been transformed from a reference to a vote.” - Bill Slawski, from his interview with Aaron.
“A link is a vote” has transformed the face of the web both for good and bad. It’s easy for SE’s to place all the blame on “spammers”, but to assume that there will be no manipulation with monetary stakes so high is somewhat naive as well. As long as the rewards are high, and the barrier to entry is low, there will be search engine spam. In addition to spam, there will always be folks who have a higher risk threshold for the potential of higher rewards. As everyone realized the value of a link more and more, it changed how every webmaster thought about the world wide web. The more motivated people were by money, the greater the lengths they were willing to go to in obtaining links that have their own inherent monetary value.
The over dependency on trust is the very same thing. It is going to cause trust to be abused in the very same way links were. We are already seeing the proliferation of subdomain spam, and after that is remedied there will still be the issue of hosting advertising space on a website.
One of the extremely big problems with trust filters is that they don’t seem to be retroactive…meaning that sites that were around and trusted BEFORE a particular filter was established can basically get away with murder (and they do).
The Trust Knob is Way too High…Please Turn it Back
One of the really great things about the web is that it has evened up the playing field for the little guys. The barrier to entry is constantly being raised, but for this unique window of opportunity, everyone has been given the opportunity to potentially start a successful online business if they are ambitious enough and spend time doing the right things.
Hey Google, remember when YOU were the little guy starting up in a garage ten years or so ago? Why not make the window of opportunity for little guys last just a little bit longer, and dial the trust thing back a bit eh? The trust knob has restored the balance of power right back into the hands of the big guys who can now do whatever they want with their “trusted domains” and be back in the index in days or never get removed at all. Why not give Joe’s ultra amazing toothpaste (the company with very little marketing budget because they spend their money making an amazing product) a chance to rank high for “toothpaste” for just a little bit longer instead of HELPING companies who’ve been spending millions of dollars on their “brand” instead of their product for the last decade or more?
Setting the barrier to entry so high just begs for abuse of the system. If SEO’s know that they can’t rank a new site for two years…why the hell would they bother to register a new domain…or take on a client with a brand new site? They are going to look for workarounds…and we all know what the workarounds are. The variations of these workarounds mutate and evolve to cause a whole new host of problems.
Please Google…turn the knob back before you make the problems even worse. The solution may be good in the short term, but you were great once because you helped the little guys that were hungry and cared about their customers. Focus on HELPING those people again and you will create great SERPS for your users and have to worry less about fighting spammers. Trust is a great signal of quality, but by moving so heavily to this model you are going to create the same problems that you did with the over dependency on link popularity.
Obligatory required reading on the Trustbox