Interview at SEOBuzzBox
Aaron at SEOBuzzbox does some pretty cool interviews with some notable folks in the SEO/SEM industry. I’m honored to now be one of them. Check out the interview if you get a chance.
Editor’s note: My friend Calum Coburn had a nice article about company branding within search results that he offered to share, and it seemed quite worthwhile. I hope you enjoy it as well.
An important and often overlooked aspect of the Title Element (overlooked by the big guns of Brett and Todd both) is that of branding your pages and site. Imagine that you’re the prospect, scanning the SERPs for the most authoritative article or information relating to your kw phrase (“speaker” being the kw example used below).
What if the top 5 sites use similar Titles? You’ve not got the time to read each description (meta tag description). So how do you choose? You look for authority status.
There are 2 main ways in which we can flex our authority muscles.
Let’s first consider the impact of the URL before moving on to the almighty Title Element.
1. Choice of domain name and rest of the path/URL. Consider the following speaker articles search example:
1.1. pzlyq.com/pg240.cfm
1.2. www.johnsmith.co.uk/articles/
1.3. www.speakertraining.co.uk/articles/
1.4. www.speakersinstitute.com/articles/
2. Let’s consider each in turn:
2.1. The first domain and URL tell us nothing about what to expect.
2.2. Whilst the “/articles/” directory is on topic for our search, the domain name is that of a person’s name (John Smith), and so doesn’t win on the authority stakes. The focus here appears to be UK from the extension “.co.uk”, so if I’m searching for global or foreign centric knowledge this UK focus may be inappropriate.
2.3. “training” suggests a focus on selling services rather than the development and sharing of Speaker knowledge. Additionally, there’s a “.co.uk” UK centric focus.
2.4. “speakerinstitute” appears to be focussed on developing speaker knowledge and application. It could mean a lot of things, but I do know that they have put their stake firmly in the “speaker” ground and are an “institute” in the field of speakers and speaking. Also, they are a .com, so they are more likely to be a global company, and so may cover my local needs.
The second route to branding in the SERPs is that of placing your (well chosen!) company name in your Title Element, every title element. Using our example above, the number 4 choice would look like this: “Speaker Articles | Speaker Institute”. First off, as the prospect you’re more likely to notice this Title Element than you would the URL. Second, if you’ve performed a number of related searches as you slide down from short tail towards your more refined long tail search, you will likely have started to notice the “ | Speaker Institute” popping up more and more, so…
Question: Why do advertisers screen their ad regularly or not at all?
Answer: The first 2 or 3 occurrences are noticed by your conscious mind; after that you stop noticing the advert. That’s the dangerous part for us consumers, and we need to watch out at this point. See, the advertiser wants your guard to be down. They know you’re not going to run into your closest supermarket feverishly looking for their product as a result of their funny/different/impactful ad. No, you’re going to be walking down the supermarket aisle as per usual, and either you notice their product out of the corner of your eye or you are looking for their category of product and are now automatically open to associating their brand with this category. The point is they are looking to win your trust and create an association. This is an unconscious process for the overwhelming majority of consumer products. You’re not in direct conscious control of most of your unconscious processes; the decision was made a long time before, and often not by your conscious mind. So look out! So how can we turn this to our advantage?
The parallels to search will, I hope, be obvious to you. See enough of your “ | Company Name” flashing up in front of your prospect’s eyes in the SERPs, and all of a sudden, as if by magic, you’re a trusted brand name worthy of a click in the prospect’s crucial long tail search phrase.
Notice too that if your kw phrase and domain name match (in our example “Speaker Institute” and “institutespeaker.com”), that’s a double punch into your prospect’s unconscious mind. (But I’m sure you noticed that consciously already…)
Now if you’re keeping your Title Element down to 3 to 8 kws, then does this take a bite of 2 or so words out of your Title length budget? You betcha! So for a company name of 2 words, this reduces your kw SEO budget down to 8-2=6 kws. Personally I’d say this forces good Title Crafting discipline.
If you have an article that you’d like to contribute (that hasn’t been published all over)…submissions are always welcome.
“…the meaning of a link has been transformed from a reference to a vote.” - Bill Slawski, from his interview with Aaron.
THIS is how Google has changed, and continues to change, the face of the web. Thanks for putting it into simple context for us, Bill.
Be sure to read the rest of this great interview. Bill is one of the few SEOs with the superhuman ability to translate patents and whitepapers into language the rest of us can understand.
Unique content is a valuable commodity. There was a discussion on duplicate content in the WMW supporters forum a few days ago that I thought was worthy of a post for those who aren’t subscribed there (you should be though!).
Duplicate content has become the big area of misinformation, with everyone concerned that they have hit a “duplicate content filter” or been penalized for duplicate content. Chances are you haven’t been banned or penalized unless you really have very little unique content on your entire site. For this reason, I thought I’d dig a little bit further into dupe content and remedies so I have a reference document for later.
If it were as easy as saying that any page with more than 42% duplicate content will be filtered from the search results, then all site owners and SEOs would probably grab 40% duplicate content as filler for every page. It IS NOT a percentage. There may be percentage variables that apply, but the first step to understanding duplicate content is to get out of the “magical percentage” line of thinking.
From this paper on Finding near-replicas of documents on the web, there are a few clues into the way SEs may handle duplicate content:
Clustering exact copies by checksum
Comparing doc size for exact or near exact webpages
This is generally how people think of duplicate content detection in terms of the “magical percentage”. As long as you have 20% unique content you’ll be fine…Riiiight. This overly simplistic method is at the core of dupe content detection, but it ignores the other techniques that may be applied. Many people never consider duplicate content detection much beyond this method, and thus get stuck in the “magical percentage” line of thinking.
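To make the “clustering exact copies by checksum” idea concrete, here is a minimal sketch. The URLs, page text, and the `cluster_exact_copies` helper are all made up for illustration, and SHA-256 is just one convenient checksum; nothing here claims to be how any engine actually does it.

```python
import hashlib
from collections import defaultdict


def cluster_exact_copies(pages):
    """Group URLs whose page bodies are byte-for-byte identical.

    `pages` maps a URL to its text; both are hypothetical sample data.
    """
    clusters = defaultdict(list)
    for url, body in pages.items():
        # Identical bodies always produce the same checksum.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        clusters[digest].append(url)
    # Only checksums shared by more than one URL indicate exact copies.
    return [sorted(urls) for urls in clusters.values() if len(urls) > 1]


pages = {
    "http://a.example/widgets": "Blue widgets for sale.",
    "http://b.example/widgets": "Blue widgets for sale.",
    "http://c.example/widgets": "Blue widgets for sale!",
}
print(cluster_exact_copies(pages))
```

Notice that changing a single character (the third page) is enough to escape the checksum, which is exactly why this method alone can’t be the whole story.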
Computing all-pairs document
“Chunking” documents and searching for similarities and flagging them for a “second look”
The resulting document is then chunked into smaller units…
Understanding some of the methods for filtering duplicate content is the first step in getting beyond the “magical percentage” thinking (from here on referred to as MP thinking). Imagine 10 different documents that all pull 5 lines of text from 3 documents containing 20 lines of text. The 10 different documents are most likely “unique” if the randomization settings are done well. They will all, however, have different levels of percentage similarities. Now, before reverting to the line of thinking that says “which level will get me penalized?”, consider other options for scoring the relevance of these pages. Consider also that it takes multiple iterations of processing to determine the similarities between ALL documents. Now, as a relevance engineer…how would YOU handle that mess?
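If you want to see that mess for yourself, here is a rough simulation of the scenario above. It assumes each page pulls 5 lines from each of the 3 source documents (the wording above could be read other ways), and the line names, the seed, and the overlap measure are all invented for the sketch.

```python
import random
from itertools import combinations

random.seed(42)  # fixed seed so the sketch is repeatable

# Three made-up source documents of 20 lines each.
sources = [[f"doc{d}-line{n}" for n in range(20)] for d in range(3)]


def build_page():
    """Assemble a page from 5 randomly chosen lines of each source."""
    lines = []
    for source in sources:
        lines.extend(random.sample(source, 5))
    return set(lines)


pages = [build_page() for _ in range(10)]

# Share of lines two pages have in common (shared lines / all lines).
overlaps = [len(a & b) / len(a | b) for a, b in combinations(pages, 2)]

print(f"{len(overlaps)} pairs, overlap from {min(overlaps):.0%} to {max(overlaps):.0%}")
```

Ten pages already mean 45 pairs to compare, each with its own overlap figure, so there is no single percentage to aim for.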
Sort-based approach
Sorting and finding overlap.
Probabilistic-counting based approach
Comparing the probability of dupe content based on footprint “sets” of different types if there is overlap between documents
Okay, you’re no longer an MP thinker. You’ve moved beyond wishing for the percentage you can push the limit to, and have agreed that you probably need a content writer to put something worthwhile on your site.
My other favorite whitepaper on duplicate content is:
Mirror, Mirror on the Web: A Study of Host Pairs with Replicated Content
From this we get some ideas on different levels of severity for dupe content:
Level 1 - Structural and content identity.
Every page on host A with relative path P, (i.e., a URL of the form http://A/P) is represented by a byte-wise identical page on host B, at location http://B/P, and vice versa.
Level 2 - Structural identity. Content equivalence.
Every page on host A with relative path P, is represented by an equivalent content page on host B, at location http://B/P, and vice versa.
Level 3 - Structural identity. Content similarity.
Every page on host A with relative path P, is represented by a highly similar page on host B, at location http://B/P, and vice versa.
Level 4 - Partial structural match. Content similarity.
Some pages on host A with relative path P, are represented by a page on host B, at location http://B/P, and vice versa, and these pairs of pages are highly similar.
Level 5 - Structural identity. Related content.
Every page on host A with relative path P, is represented by a page on host B, at location http://B/P, and vice versa. The pages are pair-wise related (e.g., every page is a translation of its counterpart) but in general are not syntactically similar.
Mismatch - None of the above.
It should be noted that this is my own gut feel on the topic, and there’s a very good chance I may be wrong. Take it with a grain of salt. In case you missed it, I’ve ranted about them before.
Using the levels from above:
Level 1 - Banned
example: dmoz/wiki clones
Level 2 - Banned
example: page scraped from a site using relative urls
Level 3 - Partial penalization and/or filtering on some content depending on severity of duplication.
example: same CMS system, some dupe content
i.e., osCommerce and stock product manufacturer descriptions.
Level 4 - Possible penalization and/or filtering on some content depending on severity of duplication.
example: similar to #3 - similar content and cms system
Two widget forums both ran on phpbb or vbulletin, had similar categories of content, and allowed people to post the same content in both places (creating some exact dupe content), or aggregated RSS feeds.
Level 5 - Not much to worry about
example: Two widget forums both ran on vbulletin and had similar categories of content
Mismatch = best case scenario - This is what you’re striving for. Having NO duplicate content indexed is the ideal. Your best bet is to keep all duplicate content from being indexed at all, and make sure if you use out of the box solutions that you change up the “footprints” a bit.
Filter - You have some duplicate content within your own site or an external site, or you have a lack of unique content - You’ll most likely end up with these PAGES in the supplemental index. Filters are generally page level problems that decrease rankings.
Penalty - You’ve served duplicate content one too many times. You may have served the spiders the same content so many times that they won’t come around as often (calendar software or session IDs are good examples of this). With a penalty, you may get your website spidered on a less frequent or more superficial basis (meaning you won’t get deep crawled). Penalties can be page or site level issues with varying degrees of severity that decrease rankings.
Bannings - Chances are you’ll probably KNOW when you’re banned - Otherwise it’s most likely a penalty or filter. Chances are the only ways to get outright BANNED for dupe content are cloaking others’ content, violating the DMCA, or other severely egregious offenses where you KNOW what you did. If someone did this with your branded site, you’d better start practicing your grovelling, and develop the story of how it’s all your shady SEO’s fault. Bannings are a pleasant way of someone telling you that you’re f*cked.
From yet another fantastic PDF, we get some insight on shingles (why in the world do smart people insist on using PDFs?).
Shingles
A k-shingle is a sequence of k consecutive words
- The quick brown
- quick brown fox
- brown fox jumped
- fox jumped over
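The list above can be reproduced with a few lines of code. This is only a sketch: the `shingles` and `resemblance` names are mine, the second sentence is an invented near-copy, and the overlap measure used here (shared shingles divided by all shingles) is one common choice, not necessarily what any engine uses.

```python
def shingles(text, k=3):
    """Return the set of k-shingles (runs of k consecutive words)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b, k=3):
    """Shared shingles divided by all distinct shingles of both texts."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0  # texts too short to form any shingle
    return len(sa & sb) / len(sa | sb)


original = "The quick brown fox jumped over"
near_copy = "The quick brown fox leaped over"  # one word changed

for shingle in sorted(shingles(original)):
    print(" ".join(shingle))

print(round(resemblance(original, near_copy), 2))
```

Changing one word out of six breaks two of the four shingles, so a small edit can move the resemblance score a lot more than a naive word-count percentage would suggest.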
I think TRUST plays a big role in determining which sites/pages are flagged for resemblance checks of shingles. Meaning not all sites are held to the same standards, which also makes it impossible to ever predict the magical percentage. Basing reviews on trust is probably one of the biggest helps to reducing the sample size, which, in turn, reduces the processing power required for such a massive amount of data relevance interpretation. All documents will build a set of associated shingles over time. This is why delivering content strategically based on what is the unique content of your site is so important.
Jake’s Top 6 Duplicate Content Mistakes
Courtesy of Mr. Baked’s Duplicate content presentation, and Barry’s and Mike’s coverage of the session, come the top 6 duplicate content mistakes:
To get Jake’s fixes, you’ll have to attend his Duplicate content session at SES San Jose.
Don’t have the same content indexed in two places!!! Be consistent with your linking structure!
Robots.txt - Site Level
Don’t let the bots near your folders of duplicate content - keep it all in one place for users, and don’t let the bots near it.
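As a rough illustration, the sketch below feeds a sample robots.txt to Python’s standard `urllib.robotparser` and checks which URLs a well-behaved bot would be allowed to fetch. The `/print/` and `/archive-copy/` folders and the example.com URLs are assumptions standing in for wherever your duplicate content lives.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping all bots out of two duplicate-content folders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /archive-copy/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The printer-friendly duplicate is blocked; the original article is not.
print(parser.can_fetch("*", "http://www.example.com/print/article-1.html"))
print(parser.can_fetch("*", "http://www.example.com/articles/article-1.html"))
```

This only shows what the file asks for; whether a given spider honors it is up to the spider.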
Meta robots tag - Page Level
Use variations of the robots tag to tell spiders whether to index, noindex, follow, or nofollow the given pages.
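To check what a page is actually telling the spiders, a small script can read the meta robots tag back out of the HTML. This is a sketch using Python’s standard `html.parser`; the `RobotsMetaParser` class and the sample page are invented for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


# Hypothetical printer-friendly page that should stay out of the index.
PAGE = """<html><head>
<title>Printer-friendly copy</title>
<meta name="robots" content="noindex, follow">
</head><body>Duplicate of the main article.</body></html>"""

parser = RobotsMetaParser()
parser.feed(PAGE)
print(sorted(parser.directives))
```

Here “noindex, follow” asks spiders to keep the duplicate page out of the index while still following its links.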
Rel=Nofollow - Link/Block Level beta
Okay, this is what I’m talking about as a potential positive for rel=nofollow. The trouble is that support for it from the engines is shaky at best, and I’ve been shown examples where it flat out doesn’t work. I wouldn’t rely on it at this point, but it’s probably worthwhile to use it as a failsafe to keep spiders from getting to certain areas of your site.
Other Remedies for Duplicate Content
I-frames with 0 border - put the duplicate content on a separate page and use the noindex, nofollow.
Text to image - Thanks to Web Professor.
Invisitext - Another brilliant script from Web Professor.
Just like reciprocal links, poor titles, run of site links, and a multitude of other SE variables, duplicate content occurs naturally on the web sometimes. It is not inherently a bad thing that can hurt your site. What CAN hurt your site is not having an understanding of how to handle duplicate content, and having spiders spend time indexing your duplicate content when they could be grabbing your good unique value added content.
To truly understand duplicate content issues, you need to learn what the problems associated with duplicate content are from an SE perspective, as well as HOW they are trying to remedy those issues. Understanding the strategies the SEs are using to improve relevance (in this case, trying to de-dupe their index) is important to developing strategies for new and existing sites in the months and years to come.
More duplicate content reading and resources
More Whitepapers on Duplicate Content (PDFs and required registration)
Getting hit by cars hurts. Getting hit by traffic feels good. Playing around with YouTube today and digging through it a while, I realized that some people think that getting hit by cars may actually be fun. I must say that I definitely do not encourage nor endorse getting hit by cars. With that in mind, I thought I’d give you the top 10 reasons to get hit by traffic, and not cars. The below video got #1, but there are 9 other reasons available in my favorites. If you’d like to download any of these YouTube videos, you can use the download YouTube videos Firefox plugin.
I think Juan, Johnny, and Joe get honorable mention.
If you’d like to watch for other stupid videos that I take, here’s my youtube profile.
1. Three letters - ROI
2. Every time I click on a blog result it takes me to the homepage, which stays in cache and becomes duplicate content.
3. If a blogger serves content selectively it is “cloaking”
4. Most people that talk like there are a lot of people listening are kind of as*holes.
5. Most blog searches aren’t efficient enough to fight spam effectively and scale.
1. Testing
2. Automation
3. Communication
4. Aggregation
5. Organization