Archive | January, 2012

How to optimize your sites for Google’s new “Page Layout Algorithm” update

Posted on 20 January 2012 by Scott Blanchard

A few days ago, I wrote a blog post about Google engineer Navneet Panda’s work to design a real time web page classifier for text and image data and suggested that web page layout may indeed be a ranking factor under this new “visual search” technology.

Right on cue, Google launched an algo update today called the “Page Layout Algorithm” that utilizes this technology.

Specifically, the update aims to diminish thin sites with little content value in which the ads push the content below the fold and make it difficult for users to find meaningful content. If you’ve ever tried to download a printer driver or appliance manual, you’ve seen the worst of this.

From Google’s Matt Cutts:

As we’ve mentioned previously, we’ve heard complaints from users that if they click on a result and it’s difficult to find the actual content, they aren’t happy with the experience. Rather than scrolling down the page past a slew of ads, users want to see content right away. So sites that don’t have much content “above-the-fold” can be affected by this change. If you click on a website and the part of the website you see first either doesn’t have a lot of visible content above-the-fold or dedicates a large fraction of the site’s initial screen real estate to ads, that’s not a very good user experience. Such sites may not rank as highly going forward.

With this in mind, I’d like to demonstrate how site owners and Adsense publishers can leverage ClickBump 5’s Ad Layout manager to maximize site rankings under the new algorithm, utilizing Google’s “Best Practices” for ad layout.

There are two ad layouts that Google’s Adsense guidelines specifically recommend:

Option 1 > Ad wrapped above the fold:

To use this layout with ClickBump 5, go to “ClickBump > Ads > Ad Position” and choose the 2nd wrap option to align ads to the right side of text.

Option 2 > Ads positioned below the initial content block:

This layout can also be utilized with ClickBump 5. There are a few more steps with this method, and I’ll likely make it a 1 click option in the next update of the ClickBump 5 theme framework, but you can achieve this layout now with the following settings:

ClickBump > Ads > Ad Position > (choose the 1st layout option with the text wrapped to the right side of ads)

ClickBump > Ads > Ad Headroom > (enter 200 into this field. This will move ads down 200 pixels below the opening paragraph of content.)

ClickBump > Misc > Custom > CSS (enter the following line of css into the field)

.adsense336 {width:100%; padding-top:20px;}

Update: As of version 5.3.1, released on 1/25/2012, you can position ads below the fold (i.e., after X paragraphs) within your content. For example, you can choose to place ads after the 1st, 2nd, 3rd, 4th, or 5th paragraph, as well as at the beginning or end of content, with one click.
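To make the "ad after the Nth paragraph" idea concrete, here is a minimal Python sketch. It is not ClickBump's implementation, which the post does not show; the function name `insert_ad_after_paragraph` and the sample markup are assumptions for illustration, and only the `adsense336` class name comes from the CSS line above.

```python
def insert_ad_after_paragraph(html: str, ad_html: str, n: int) -> str:
    """Return html with ad_html placed after the n-th closing </p> tag.

    n = 0 places the ad before all content. If the content has fewer
    than n paragraphs, the ad is appended at the end.
    """
    if n <= 0:
        return ad_html + html
    marker = "</p>"
    position = 0
    for _ in range(n):
        found = html.find(marker, position)
        if found == -1:
            return html + ad_html  # fewer than n paragraphs
        position = found + len(marker)
    return html[:position] + ad_html + html[position:]


# Hypothetical page content and ad block
content = "<p>Intro.</p><p>Details.</p><p>Summary.</p>"
ad = '<div class="adsense336">AD</div>'
print(insert_ad_after_paragraph(content, ad, 1))
```

With `n` set to 1, the opening paragraph stays at the top of the page and the ad follows it, which matches the "below the initial content block" layout.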

Now that you know what TO do, I’d like to share with you an ad layout that you should avoid:

You can see with this layout that the content has been pushed below the dotted line, which represents the “above the fold” area. You should avoid utilizing this layout as it may adversely impact your site’s ability to attract search traffic.

I look forward to your comments on other ways to maximize ad revenues while maintaining best practice ad layout guidelines. It’s interesting to note that since this announcement there has been much chatter regarding Google’s own abuse of these new layout guidelines with sponsored ads above search results. I suppose it’s a classic case of “Do as I say, not as I do”.

Danny Sullivan over at searchengineland.com has a comprehensive write-up about the Google update, along with a Q&A with Google’s Matt Cutts on what it means. I’d encourage you to check it out, along with my recent post on how to maximize site rankings under Panda.


6 key ingredients to Wow site visitors and please Panda

Posted on 14 January 2012 by Scott Blanchard

I’ve been doing quite a bit of research into the Google Panda update in order to determine how to maximize the “trust/value/relevant” signals that are common to the best ranked websites and minimize the “untrusted/low value/not relevant” bad signals that can cause the panda to take a bite out of your site rankings.

There are several ingredients to a successful web page in the age of Panda. First and foremost, your goal should be to create pages that “Wow” users with useful information about the subject matter.

Do this and Panda will reward you as well. For example, do a search for “Flaxseed Oil Benefits” and you will land on a #1 ranked site running ClickBump 5 that does just that.

There is an abundance of evidence, both from website owners and from published research papers by current and former Google engineers, that indicates a strong correlation between bounce rates and user satisfaction (and by extension, between lower bounce rates and higher site rankings).

In a paper titled “Predicting Bounce Rates in Sponsored Search Advertisements”, four Google engineers provide some concrete evidence that bounce rate can be utilized as a reliable metric and indicator of quality and user satisfaction:

Kaushik claims bounce rate is important for advertisers to monitor because a user who bounces from a site is unlikely to perform a conversion action such as a purchase. He suggests that high bounce rates may indicate that users are dissatisfied with page content or layout, or that the page is not well aligned to their original query

What is “Search Bounce Rate”?

When we are discussing bounce rate in the context of search, we are specifically referring to the activity of a user who clicks on a link from search and immediately returns to the search page to perform the same search or click on another search result. In contrast, the bounce rate of a given page within a website has a totally different meaning. A page that every user visits only once, clicks an outbound link from (i.e., an ad or affiliate link) and never returns to would have a 100% bounce rate. However, it would be a very effective (and lucrative) page and may still rank very highly in Google because the “search bounce rate” may be close to zero.
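The difference between the two metrics is easy to see with a small calculation. The sketch below is illustrative only: the `Visit` record and its fields are assumptions, not a format that Google or any analytics package is known to use.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    came_from_search: bool    # visitor arrived by clicking a search result
    returned_to_search: bool  # visitor went straight back to the results page
    pages_viewed: int         # pages viewed on the site during the visit


def page_bounce_rate(visits):
    """Share of all visits that viewed only a single page."""
    if not visits:
        return 0.0
    return sum(v.pages_viewed == 1 for v in visits) / len(visits)


def search_bounce_rate(visits):
    """Share of search-originated visits that returned straight to the results."""
    from_search = [v for v in visits if v.came_from_search]
    if not from_search:
        return 0.0
    return sum(v.returned_to_search for v in from_search) / len(from_search)


# Ten searchers land on the page, view only that page, then leave through
# an ad link. None of them go back to the search results.
visits = [Visit(True, False, 1) for _ in range(10)]
print(page_bounce_rate(visits))    # 1.0
print(search_bounce_rate(visits))  # 0.0
```

This is the scenario described above: every visit is a single-page visit, so the on-site bounce rate is 100%, yet no one bounced back to search.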

There is mounting evidence that Google pays close attention to “search bounce rate” as a key indicator of landing page quality.

So, how can we increase user satisfaction and decrease search bounce rates?

I want to share six essential elements that you should strongly consider implementing on your website immediately. These elements help to convey trust, increase user satisfaction and decrease bounce rates. Most importantly, they are in accordance with Google’s recommended guidelines for quality.

  1. A logo, owner photo or header graphic - Websites that are trusted and established have logos and/or header graphics relevant to their target audience. Websites that do not have logos or compelling header graphics are instantly seen as possible spam sites.
  2. An illustration, diagram or video within the content (as close to the top as possible) that helps explain the concept, product benefits, or information you are presenting. Diagrams are preferred to photos since they tend to be custom fitted to the content, rather than simply lifted from stock or web galleries. Make sure to include “Alt” text that describes what the graphic represents.
  3. An author byline - A byline is one of the easiest and most effective ways of conveying trust to the user. It frames the content as coming from a real person. For example, as opposed to merely “reading a web page”, the context is shifted to “Reading Julie’s evaluation of this product”. This feature has been added as a default option in the latest update of the ClickBump framework, version 5.2 r6.
  4. Over-deliver on content value. This begins with understanding the minimum metric for valuable content: content length. My previous absolute minimum recommendation for a single article has been no less than 350 words. I’m raising that to 750 words and suggesting a target of 1000-1500 words for maximizing content value.
  5. Use named anchors to create an index of your page - Named anchors are divisions within your page that are targeted by jump links. To create a searchable index within your page, add a jump link for each section. Google now indexes these named anchors as independent links under the same search result. You can see an example here for trans fat as well as for the search prostate cancer.
  6. Use no-index on low quality pages – Make sure that your low quality pages are set to noindex and links to these pages are set to nofollow. Also, pages such as “About Us”, “Contact Us”, “Privacy Policy”, “Earnings Disclaimer”, etc., are all candidates for nofollow/noindex. We now know that Panda looks at all pages on your site as a whole, and pages that are of lower quality can negatively impact your pages that are of higher quality.
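As an illustration of item 5, the following Python sketch generates a jump-link index and matching section headings. The section titles and function names are hypothetical, and it uses `id` attributes as the anchor targets, which is an assumption about markup rather than anything the ClickBump framework is known to emit.

```python
import re
from html import escape


def slugify(title: str) -> str:
    """Turn a section title into a lowercase, hyphenated anchor name."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_page_index(titles):
    """Return an HTML list of jump links, one per section title."""
    items = [
        f'<li><a href="#{slugify(t)}">{escape(t)}</a></li>' for t in titles
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def section_heading(title: str) -> str:
    """Return a heading carrying the anchor that the jump link targets."""
    return f'<h2 id="{slugify(title)}">{escape(title)}</h2>'


# Hypothetical section titles for a single long article
sections = ["What Is Trans Fat?", "Health Risks", "Foods to Avoid"]
print(build_page_index(sections))
print(section_heading(sections[0]))
```

Placing the generated list near the top of the article gives readers a table of contents, and each link points at the heading with the same slug.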

What does a high quality site look like?

Here are a couple of sites which use the ClickBump framework to provide informative, engaging, credible and trustworthy content. These are the types of sites that we are talking about:

http://beyondmds.com/

Above: Beyondmds.com utilizes ClickBump 5 framework with the “XFactored 2.0” Skin

http://nomorediabetes.org/

Above: nomorediabetes.org utilizes ClickBump 5 framework with the “WikiClicks” Skin


SVM – The Secret Sauce inside Google’s Panda

Posted on 13 January 2012 by Scott Blanchard

In this article, I’d like to share a bit of what I’ve learned in preparation for a book I’m writing on building high ranking websites that Wow users.

To set things up, let’s take a look at a very enlightening conversation that Wired magazine had with two of the engineers responsible for Search quality at Google, Matt Cutts and Amit Singhal:

Wired.com: What’s the code name of this update? Danny Sullivan of Search Engine Land has been calling it “Farmer” because its apparent target is content farms.

Amit Singhal: Well, we named it internally after an engineer, and his name is Panda. So internally we called it big Panda. He was one of the key guys. He basically came up with the breakthrough a few months back that made it possible.

Wired.com: What was the purpose?

Singhal: So we did Caffeine [a major update that improved Google’s indexing process] in late 2009.  Our index grew so quickly, and we were just crawling at a much faster speed. When that happened, we basically got a lot of good fresh content, and some not so good. The problem had shifted from random gibberish, which the spam team had nicely taken care of, into somewhat more like written prose. But the content was shallow.

Matt Cutts: It was like, “What’s the bare minimum that I can do that’s not spam?”  It sort of fell between our respective groups. And then we decided, okay, we’ve got to come together and figure out how to address this.

Wired.com: How do you recognize a shallow-content site? Do you have to wind up defining low quality content?

Singhal: That’s a very, very hard problem that we haven’t solved, and it’s an ongoing evolution how to solve that problem. We wanted to keep it strictly scientific, so we used our standard evaluation system that we’ve developed, where we basically sent out documents to outside testers. Then we asked the raters questions like: “Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?”

Cutts: There was an engineer who came up with a rigorous set of questions, everything from, “Do you consider this site to be authoritative? Would it be okay if this was in a magazine? Does this site have excessive ads?” Questions along those lines.

Singhal: And based on that, we basically formed some definition of what could be considered low quality. In addition, we launched the Chrome Site Blocker [allowing users to specify sites they wanted blocked from their search results] earlier , and we didn’t use that data in this change. However, we compared and it was 84 percent overlap [between sites downloaded by the Chrome blocker and downgraded by the update]. So that said that we were in the right direction.

Wired.com: But how do you implement that algorithmically?

Cutts: I think you look for signals that recreate that same intuition, that same experience that you have as an engineer and that users have. Whenever we look at the most blocked sites, it did match our intuition and experience, but the key is, you also have your experience of the sorts of sites that are going to be adding value for users versus not adding value for users. And we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons …

Singhal: You can imagine in a hyperspace a bunch of points, some points are red, some points are green, and in others there’s some mixture. Your job is to find a plane which says that most things on this side of the plane are red, and most of the things on that side of the plane are the opposite of red.

Now, as you may or may not know, on February 24, 2011, Google rolled out its most significant algorithm change since it first launched in the late ’90s. By Google’s own estimate, the update noticeably affected about 12% of US search queries. A huge impact if your site ranked for one of them.

This update has been code-named “Panda” after Navneet Panda, an Indian-born software engineer at Google. The important thing to note here is that Navneet is the person primarily responsible for the “breakthrough” that Singhal describes above, hence the name.

Through my research, and supported by Singhal’s last comment above, I’ve concluded that this breakthrough is based on SVM, or Support Vector Machine, computational analysis. So, in essence, Panda = SVM.

Want more evidence? Here’s a snippet from Navneet’s current resume at the UCSB website under “Projects > Machine Learning” (highlights added for emphasis):

• Development of indexing structures for support vector machines to enable relevant instance search in high-dimensional datasets
• Speeding up SVM training in multi-category large dataset scenarios
• Speeding up approximate SVM classification of data-streams
• Design of a real time web page classifier for text and image data

So, you can see many references to SVM and machine learning concepts, all pretty wonky stuff, but the basic concepts are outlined by Matt and Amit above in layman’s terms:

Google has basically created a dividing line or “hyperplane” representation of URLs for any given search term in which on one side are the quality sites and on the other are the bad.
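In code, a dividing hyperplane is just a weighted sum compared against zero. The sketch below is purely illustrative: the two features (content depth and ad density) and the weight values are assumptions chosen for the example, not signals or weights that Google has disclosed.

```python
def classify(weights, bias, features):
    """Return 'good' if the point lies on the positive side of the hyperplane."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "good" if score >= 0 else "bad"


# Hypothetical features: [content depth, ad density], both scaled to 0..1
weights = [1.0, -1.0]
bias = 0.0

print(classify(weights, bias, [0.9, 0.2]))  # good
print(classify(weights, bias, [0.2, 0.8]))  # bad
```

With only two features the "hyperplane" is a line. With hundreds of signals it is the same arithmetic in a higher-dimensional space.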

The way it does this is extremely clever. It all starts with Google’s vast army of human “quality raters”. These are contracted workers, normal people like you and me …

Aside: a ClickBump owner has indicated to me that he’s just been accepted into the Google rating program. I had already obtained a copy of the latest Google quality rater’s guide (March, 2011) and I’m looking forward to getting some actual rater feedback to share with you.

These “raters”, as Cutts discusses above, are contract workers who are tasked to evaluate a list of website URLs from Google’s index for a given search query. They are then asked to rate the URL in terms of overall user experience, quality and how well the page answers the target search query. All the things that are very hard for a machine or algorithm to discern – unless you can teach it to behave and respond like a rater.

Contrary to popular belief, the raters do not directly impact the rankings of the individual sites they are rating. Rather, Google uses their evaluations to create a model of the common traits (aka, signals or vectors) of websites which are viewed favorably by real people. It also creates a model of the common traits of websites which are found not useful at best or “spam” at worst.

As best I can tell at this point, this “model” is based on non-visual indicators (apart from the human raters’ visual perceptions). However, Navneet Panda has done work on using SVM for visual image analysis, as evidenced by the “Design of a real time web page classifier for text and image data” entry on his online resume at UCSB. Since we know that facial recognition software exists and has advanced to a high degree of accuracy, one could expect that the same technology could be used for rapid comparative analysis of a target URL screenshot against a known set of “Good” visual indicator cues.

What this means is that it’s conceivable that Google’s algorithm uses a screenshot of the page as a factor in its evaluation.

Google then uses all of this data to “train” its algorithm to instantly classify URLs as good or bad for any given query. In other words, if the URL exhibits an abundance of the signals common in those websites that fall on the “good” side of the hyperplane and very few of the traits of the “bad” sites, then it can make an extremely accurate prediction of the quality and user satisfaction of the URL. And most important of all, due to the massive size of the index, it can do it without ever visiting any website it passes judgment upon.
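To give a feel for what "training" on rater judgments could look like, here is a minimal linear SVM fitted by sub-gradient descent on the hinge loss, in plain Python. It is a toy sketch under stated assumptions: the two features, the eight rated pages, and the hyperparameters are all made up for illustration, and nothing here is taken from Google's actual system.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.05, reg=0.01):
    """Fit weights and bias by sub-gradient descent on the regularized hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score < 1:
                # Inside the margin or misclassified: move toward the sample.
                weights = [w + learning_rate * (y * xi - reg * w)
                           for w, xi in zip(weights, x)]
                bias += learning_rate * y
            else:
                # Safely classified: only apply the regularization shrinkage.
                weights = [w - learning_rate * reg * w for w in weights]
    return weights, bias


def predict(weights, bias, x):
    """Label a page by the side of the learned hyperplane it falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "good" if score >= 0 else "bad"


# Hypothetical rater data: [content depth, ad density]; +1 = good, -1 = bad
rated_pages = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.9, 0.3],
               [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.2, 0.6]]
rater_labels = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_linear_svm(rated_pages, rater_labels)

# Classify pages that no rater ever looked at
print(predict(weights, bias, [0.85, 0.15]))
print(predict(weights, bias, [0.15, 0.85]))
```

The point of the exercise is the last two lines: once the weights are learned from a small rated sample, any new page can be scored from its features alone.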

That last sentence is very important. As you might imagine, given the size and scope of indexing all of the pages on the web, an algorithm that only requires a glimpse of that index in order to return the top matches for any given search would certainly be considered the holy grail of search. That’s the breakthrough that Amit is talking about in the first and last paragraphs of the Wired interview excerpt above, and it can best be represented graphically like so:

[Image: hyperplane classification of data]

Illustration of hyperplane model separating good results from bad: In this case, the solid and empty dots can be correctly classified by any number of linear classifiers. H1 (blue) classifies them correctly, as does H2 (red). H2 could be considered “better” in the sense that it is also furthest from both groups. H3 (green) fails to correctly classify the dots.
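The notion of one separating line being "better" than another can be put into numbers by measuring the margin, i.e. the distance from the line to the nearest point. The sketch below uses made-up 2D points and three candidate lines labelled H1, H2 and H3 to mirror the caption; the coordinates are assumptions and are not read off the original figure.

```python
import math


def margin(weights, bias, points, labels):
    """Smallest signed distance from any point to the hyperplane.

    A negative result means at least one point is on the wrong side.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    return min(
        label * (sum(w * x for w, x in zip(weights, p)) + bias) / norm
        for p, label in zip(points, labels)
    )


# Hypothetical data: two "bad" points (-1) and two "good" points (+1)
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

candidates = {
    "H1": ([1.0, 0.0], -2.5),  # vertical line x = 2.5, separates with a thin margin
    "H2": ([1.0, 1.0], -5.5),  # diagonal line x + y = 5.5, separates with a wide margin
    "H3": ([0.0, 1.0], -0.5),  # horizontal line y = 0.5, fails to separate
}

margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins)
print(best)  # H2
```

An SVM is the procedure that searches for the widest-margin separator automatically instead of comparing a handful of hand-picked candidates.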

I’m particularly intrigued by Navneet’s last bullet point above, “Design of a real time web page classifier for text and image data”. I’ll talk more about this later, but it would seem that this statement is much more powerful than its order in the sequence would suggest.

Cutts: If someone has a specific question about, for example, why a site dropped, I think it’s fair and justifiable and defensible to tell them why that site dropped. But for example, our most recent algorithm does contain signals that can be gamed. If that one were 100 percent transparent, the bad guys would know how to optimize their way back into the rankings.

Singhal: There is absolutely no algorithm out there which, when published, would not be gamed.

So, once the site owner has done everything they can to produce content that is highly useful and relevant to the subject matter, the task of the SEO is to determine the most favorable “signals” or “vectors” that place the site as far away from the bad side of the index as possible. One might speculate that given a large enough selection of top ranked websites, you could identify the most common traits (aka signals or vectors) that they share. In the same manner, you could identify the traits common to poorly ranked websites.

In either case, I believe there will certainly be some common predictors of sites on both sides of the index and that given a set of URLs for a given search query, we should be able to more accurately predict potential search rank based on these factors.
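A first pass at finding such predictors could be as simple as counting how often each trait appears on each side. The Python sketch below is a toy illustration: the trait names and the two small samples are invented for the example, and the 0.5 threshold is an arbitrary assumption.

```python
from collections import Counter


def trait_frequency(sites):
    """Return the fraction of sites that exhibit each trait."""
    if not sites:
        return {}
    counts = Counter(trait for traits in sites for trait in set(traits))
    return {trait: n / len(sites) for trait, n in counts.items()}


# Hypothetical observations for one search query
top_ranked = [
    {"logo", "byline", "low_ad_density"},
    {"logo", "byline", "video"},
    {"logo", "low_ad_density"},
    {"logo", "byline", "low_ad_density"},
]
low_ranked = [
    {"ads_above_fold"},
    {"ads_above_fold", "logo"},
    {"thin_content", "ads_above_fold"},
    {"thin_content"},
]

top = trait_frequency(top_ranked)
low = trait_frequency(low_ranked)

# Traits much more common among top-ranked sites than low-ranked ones
distinctive = sorted(t for t in top if top[t] - low.get(t, 0.0) >= 0.5)
print(distinctive)  # ['byline', 'logo', 'low_ad_density']
```

Counting like this only surfaces correlations, so any trait it flags is a candidate to test rather than a confirmed ranking factor.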

I believe those predictors are:

1. Design and user experience: Choose site design templates that are clean, uncluttered and provide easy to understand navigation and categorization of content. A site logo is a trust signal. A header graphic, while not as strong as a logo, is also a signal of quality.

2. Content quality: It’s no longer enough just to provide unique content. Your content needs to “Wow” users into wanting to share your page with others. It should be rich and add value with photos, videos, illustrations, graphs, comparisons and well-researched data and conclusions. You should strive to tell both sides of the story, the good and the bad, so as to remain objective. All of these traits are found in the pages that people like and respond to the best. They provide a strong signal of quality and these are the types of pages that Google wants to push to the top.

3. Encourage Google +1s, Facebook likes and Twitter tweets: A social profile of a given URL should be easily indexable and a strong indicator of quality and relevance. As such, you should pay special attention to improving your page’s social profile and actively engaging with your target audience through social media.

4. Track user behavior and bounce rates closely - Google engineers have released several studies that show that bounce rate is a very accurate predictor of quality. As such, you should expect that your page’s bounce rate is of critical importance to your ranking. You want low bounce rates and long user sessions. This is a strong indicator of quality and very easy to evaluate algorithmically. You can bet that it’s a central part of the Panda formula.

At this point in the discussion, we are only scratching the surface of the inner workings of Google Panda. And as I continue to dive into SVM and Navneet’s research, I’ll be presenting more information on what we can learn from his work in order to build better websites.

The core takeaway at this point, and fodder ripe for discussion, is that, based on the core work product of Navneet Panda, along with Amit Singhal’s comments to Wired.com above, we can be reasonably certain that the algorithm that determines whether your website is on the good side or the bad side of the index is found in SVM theory.

If you want to do some reading on your own, here are a few of Navneet’s papers that delve deeper into the topic and give us a glimpse of what makes Google Panda purr:

Efficient Top-k Hyperplane Query Processing for Multimedia Information Retrieval
Navneet Panda and Edward Y. Chang
ACM International Conference on Multimedia, MM Oct. 2006

Concept Boundary Detection for Speeding up SVMs
Navneet Panda, Edward Y. Chang and Gang Wu
International Conference on Machine Learning, 2006

KDX: An Indexer for Support Vector Machines
Navneet Panda and Edward Y. Chang
Transactions of Knowledge and Data Engg., Jun 2006

Exploiting Geometry for Support Vector Machine Indexing
Navneet Panda and Edward Y. Chang
SIAM International Conference on Data Mining, 2005

Hypersphere Indexer
Navneet Panda, Edward Y. Chang and Arun Qamra
Database and Expert Systems Applications, DEXA 2006

Active Learning in Very Large Databases
Navneet Panda, Kingshy Goh and Edward Y. Chang
Journal of Multimedia Tools and Applications, Special Issue on Computer Vision Meets Databases (invited submission)

Formulating Context-dependent Similarity
Gang Wu, Navneet Panda and Edward Y. Chang
ACM International Conference on Multimedia (MM), Singapore, November 2005

Formulating Distance Functions via the Kernel Trick
Gang Wu, Navneet Panda and Edward Y. Chang
ACM International Conference on Data Mining and Knowledge Discovery KDD, 2005
