CEO Perspective:
Comparing the Proverbial Apples to Oranges



An Open Letter to the Industry Regarding Panel Versus Web Log Audience Measurement

A recent article in Mediaweek and Adweek discusses a recurring theme in the online industry, asserting that panel-based audience metrics are inaccurate because they do not match Web server logs. Since Web logs record a site's every visit, visitor and page request, many intuitively view logs as the gold standard. When third-party estimates do not match Web logs, it is easy to view this as a weakness of the panel.

As always, the devil is in the details. When you scrutinize the details, the answer to why these two measurements don’t always match up is… “it depends.” Depends on what? The exact definition of what is being compared.

Not All Measures are Created Equal
The most obvious metric where Web logs should be correct is page view counts (PVs). Unlike visitor metrics, PVs are not confounded by problems such as using cookies for counting people. Here again, however, it depends! Web logs measure hits, i.e., specific URLs requested from the Web server. Hits must be filtered carefully to yield a PV count consistent with IAB definitions. Sometimes this can be done systematically, but sometimes it’s difficult to do. Consider these challenging factors:
  1. Multiple iframes in one page can result in multiple hits.
  2. Web servers record pages served, whether or not the page has actually been loaded on the user's screen. Panel-based measurement systems can tell the difference and record pages that actually loaded.
  3. Pop-up ads count as hits in Web logs, but comScore filters them out because they are not requested by the consumer.
  4. Ad requests and ad tracking beacons that resolve at a site's domain are recorded as hits and need to be properly filtered.
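
The filtering described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not comScore's or any log analyzer's actual method: the `Hit` record, its `kind` labels and the `user_requested` flag are hypothetical, and real logs rarely carry such clean labels. Factor 2 (pages served versus pages actually loaded) is not modeled, because a server log cannot see it.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    url: str
    kind: str             # hypothetical labels: "page", "iframe", "popup", "ad", "beacon"
    user_requested: bool  # False for pop-ups and other content the consumer did not ask for

def count_page_views(hits):
    """Count only hits that are whole pages requested by the user."""
    return sum(1 for h in hits if h.kind == "page" and h.user_requested)

# A made-up log excerpt: two real page views surrounded by other hits.
sample_hits = [
    Hit("/home", "page", True),
    Hit("/home/frame-left", "iframe", True),
    Hit("/home/frame-right", "iframe", True),
    Hit("/promo", "popup", False),
    Hit("/ads/banner?id=1", "ad", False),
    Hit("/track.gif", "beacon", False),
    Hit("/article", "page", True),
]

total_hits = len(sample_hits)
page_views = count_page_views(sample_hits)
print(total_hits, page_views)
```

In this invented excerpt, seven hits reduce to two page views, which shows how an unfiltered log can overstate PVs several times over.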

A Web log-generated PV count likely falls somewhere between Web hits and real PVs, and will almost always be higher than real PVs. But the magnitude of the difference can be astounding. For instance, comScore data shows 3 times more hits than PVs for Google. In fact, a difference of up to 300 percent, which on the surface appears extraordinary, may be perfectly legitimate because we are comparing the proverbial apples to oranges. The difference between a PV and a hit can be so subtle that detecting it is more akin to telling the difference between a cucumber and a pickle! It’s no wonder that people get confused!

Are We Even in the Same Universe?
Differences in the measurement universe can add to the discrepancies. Web logs include all of the following, but comScore's panel does not measure usage from public or office-based shared machines (e.g., shared machines at schools, libraries, Internet cafes, and group PCs at work), nor does it include usage from college dorms, government offices, the military, school/university offices, or mobile phones/PDAs. Another common mistake is to compare a panel’s U.S. PV estimate to global hits recorded by a Web log. Additionally, non-user requested traffic such as robot traffic is a problem in Web logs, unless it is properly filtered. In one recent case, it was determined that robot traffic accounted for 72 percent of all Web log records. Finally, new technologies such as RSS and AJAX are increasingly causing additional discrepancies.

Cookies are not People
Comparing visitor counts is even trickier. Two people can share the same cookie if they use the same PC. Conversely, one person can be counted as two cookies if he or she uses two different PCs at home or at the office. Even more confounding, one person using a single PC can appear to the Web server as multiple cookies if the cookies on that PC are deleted or reset. And if a user’s PC doesn’t accept cookies at all, that user may be counted as a new visitor every time he or she visits the site. Finally, one must be careful to compare the same geography, especially the U.S.-based traffic to a U.S.-based site instead of traffic logs from all countries. That difference alone could be more than 100 percent.
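
A small Python sketch makes the cookie cases above concrete. The visit records are invented for illustration, and the names, PCs and cookie IDs are all hypothetical. A Web server sees only the cookie column, while the people and computers behind it are what a panel tries to estimate.

```python
# Each visit is (person, computer, cookie_id); all values are made up.
visits = [
    ("alice", "pc1", "c1"),  # Alice and Bob share one PC,
    ("bob",   "pc1", "c1"),  # so two people appear as one cookie
    ("carol", "pc2", "c2"),  # Carol uses a home PC and an office PC,
    ("carol", "pc3", "c3"),  # so one person appears as two cookies
    ("dave",  "pc4", "c4"),  # Dave deletes his cookies between visits,
    ("dave",  "pc4", "c5"),  # so one person on one PC appears as two cookies
]

unique_people = len({person for person, _, _ in visits})
unique_computers = len({pc for _, pc, _ in visits})
unique_cookies = len({cookie for _, _, cookie in visits})

print(unique_people, unique_computers, unique_cookies)
```

Here four people on four computers produce five cookies. In this toy data the cookie count does not equal the number of people, and it runs higher than the number of computers because of the deleted cookie.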

To illustrate the magnitude of these differences without releasing confidential client data, I will refer to a Slate.com article by Paul Boutin published on February 27, 2006, titled: “How many readers does Slate really have?”

Slate's Web logs show 8 million unique users based on cookie counts, while comScore and Nielsen Netratings both report 4.6 million unique visitors (UV) for Slate. However, comScore estimates Slate's worldwide unique visitors at 6.05 million, which is a more relevant comparison if the 8 million number from Slate’s Web logs includes international visitors.

Even more interesting, when comScore counts the number of unique computers (UC) that visit Slate (the unduplicated number of computers from which a visit to Slate was initiated), we estimate 7.4 million computers in the U.S. and 8.9 million computers on a worldwide basis. So, which of these comScore metrics should be used as a comparison to Slate's internal 8 million visitor estimate? At first glance, the difference between 4.6 million and 8 million seems huge. However, UCs should be more comparable to cookies, albeit still not a perfect match. To form a fair comparison, we ought to compare Slate's 8 million cookies to comScore’s 7.4 million U.S. UC estimate, or even the 8.9 million worldwide UC estimate, depending on whether the Slate numbers counted international visitors. Either way, the difference versus Slate’s internal number is now much smaller and probably within an acceptable range given all the other factors at play. Is the panel data wrong? Probably not! Is this confusing? Certainly! But this example illustrates the importance of understanding the exact definition of what is being measured before jumping to conclusions.
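
The arithmetic behind this comparison can be checked with a short Python sketch. It uses only the figures quoted above (in millions); the `percent_gap` helper and the choice to express each gap relative to the comScore estimate are assumptions made for illustration.

```python
slate_log_cookies = 8.0  # millions, from Slate's Web logs

# comScore estimates quoted in the text, in millions
comscore_estimates = {
    "U.S. unique visitors": 4.6,
    "Worldwide unique visitors": 6.05,
    "U.S. unique computers": 7.4,
    "Worldwide unique computers": 8.9,
}

def percent_gap(log_count, panel_estimate):
    """Gap between the log count and the panel estimate, as a percent of the estimate."""
    return (log_count - panel_estimate) / panel_estimate * 100

gaps = {
    name: percent_gap(slate_log_cookies, estimate)
    for name, estimate in comscore_estimates.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f}%")
```

Measured this way, the log figure is roughly 74 percent above the U.S. UV estimate but only about 8 percent above the U.S. UC estimate, and about 10 percent below the worldwide UC estimate.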

The need for transparency is often mentioned as a remedy for this issue. comScore strongly supports transparency and has a number of initiatives under way to provide the industry with greater transparency in what we do, including the MRC pre-audit. Transparency, however, must go both ways.

At the most basic level, transparency requires publishers to disclose what metrics they are using and how they are calculated. All too often, discrepancies are either explained or mostly vanish once we conduct an in-depth review — which enables us to compare cucumbers to cucumbers.

We also frequently report on e-commerce dollars transacted on a site. Dollars are statistically tougher to estimate than PVs because transactions occur far less frequently than PVs, which means that there are fewer sample observations to use in measuring them. Nevertheless, it is remarkable that the dollar differences we see between our data and the clients’ data are typically smaller than PV differences — perhaps because there is no confusion regarding the definition of a dollar!
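
The statistical point about fewer sample observations can be illustrated with a rough Python sketch. It assumes, purely for illustration, that observed events behave like simple Poisson counts, so the relative standard error of a count is about one over the square root of the number of observations. The event counts are hypothetical, and the sketch ignores variation in the dollar amount per transaction, which would add further uncertainty.

```python
import math

def relative_standard_error(observed_events):
    """Approximate relative standard error of a count under a Poisson assumption."""
    return 1 / math.sqrt(observed_events)

# Hypothetical panel observations for one site in one month
pv_rse = relative_standard_error(10_000)  # page views are frequent
txn_rse = relative_standard_error(100)    # transactions are rare

print(f"page views: {pv_rse:.1%}, transactions: {txn_rse:.1%}")
```

Under these assumptions the transaction estimate is about ten times noisier than the page view estimate, which makes the small dollar discrepancies all the more notable.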

Finally, the concern about panel quality suffering from cost reductions is just not warranted as far as comScore is concerned. We have never felt better about the size and quality of our panel than we do today.

The Numbers Seem Right… When There’s No Other Comparison
Another issue discussed in the recent Mediaweek and Adweek article is the contrast in the use of ratings between TV and the Web. The author correctly points out that agencies buy Web advertising inventory based on ad impressions delivered by the ad servers, instead of relying on third-party audience measurements as in the TV world. This is entirely appropriate. An exact count of impressions delivered by an ad server is more precise than any panel-based metric. On the other hand, the TV industry does not have TV server logs to count how many people view individual TV ads. The only available estimate is provided by panel-based TV ratings, and advertisers have no choice but to use them as a basis for payment.

This is precisely why TV ratings are considered a currency but Internet ratings are not. However, this does not mean TV ratings are more accurate. In fact, the very absence of audience census data for TV gives the illusion of fewer data problems since there are no differences to scrutinize every time one looks at the ratings numbers.

One might liken this to using a single watch to measure time. With only one watch, you really don’t know if the time is off — or by how much — so you have a false sense of accuracy even though the time could be significantly off. On the other hand, if you have two watches, you are almost always going to see a difference between the two time estimates, which leads you to question what time it really is… and which watch is right.

The TV industry measures the world with ONE watch. While that may be comforting, it is not necessarily accurate. As Paul Boutin wrote in the Slate article:

“The more I dig into how Web ratings work, the more I realize people in other media are in denial. Internet publishing is the most finely measurable medium ever invented; broadcast, movie, and print companies have no way of monitoring individual transactions from their end.”

He is absolutely right!

Magid Abraham, PhD
President & CEO
comScore Networks, Inc.