Blog - June 4, 2007

Thoughts on Wal-Mart and ISP Data

Last month in the Wall Street Journal, I read an article by Carl Bialik, aka “The Numbers Guy,” in which he mentioned how the consumer packaged goods (CPG) industry uses retailers’ UPC sales scanner data to measure manufacturers’ market share trends. According to The Numbers Guy, “Industrywide U.S. sales data have been hamstrung by the omission of the country’s biggest retailer since 2001, when Wal-Mart stopped sharing its figures.” As you can imagine, given the importance of Wal-Mart, the absence of its data from the syndicated tracking services raises major questions about how accurately and reliably those services gauge market trends.

The article reminded me that the need to build a representative sample was foremost in our minds back in 1999 when we first began designing the Comscore service. We considered building a database from clickstream data obtained directly from the ISPs, but abandoned the idea because the likelihood was too high that we wouldn’t be able to obtain cooperation from all of the major providers. An Internet user sample obtained directly from ISPs could face valid criticism if consumers’ online behavior turned out to differ across ISPs -- and if some of the major ISPs weren’t represented in the sample. Think of the need for clickstream data from each ISP in an online audience ratings service as analogous to the need for sales data from each retailer in a CPG sales tracking database. For this reason, we concluded that Comscore had to recruit a panel of Internet users directly, so that we could be sure all ISPs were represented in our database. Now, eight years later, I thought it would be interesting to test the wisdom of that decision by examining consumer behavior across the major ISPs -- as measured through the Comscore database.

I decided that examining consumers’ search behavior would provide a rather interesting case in point. I started by looking at Google’s share of the search market within the major ISPs:

As the above chart reveals, Google’s market share varies by more than 20 percentage points, from a high of almost 60 percent among Comcast subscribers to a low of slightly more than 30 percent among AOL users. That’s evidence enough to conclude that without representation from the major ISPs one would have a skewed sample when it comes to measuring the performance of the search market leader.
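To make the arithmetic behind that skew concrete, here is a minimal sketch in Python. The ISP labels, per-ISP shares and search volumes are made up purely for illustration (they are not Comscore data); only the rough 60 percent and 30 percent endpoints echo the figures above. It computes a volume-weighted overall share, then recomputes it with one ISP missing from the sample.

```python
# Hypothetical inputs, for illustration only (not Comscore data).
# Share of searches going to one engine within each ISP's subscriber base.
share_by_isp = {"isp_high": 0.60, "isp_mid": 0.45, "isp_low": 0.30}
# Assumed relative search volume contributed by each ISP's subscribers.
volume_by_isp = {"isp_high": 30, "isp_mid": 40, "isp_low": 30}


def overall_share(shares, volumes, isps=None):
    """Volume-weighted share across the ISPs included in the sample."""
    isps = list(shares) if isps is None else list(isps)
    total_volume = sum(volumes[isp] for isp in isps)
    return sum(shares[isp] * volumes[isp] for isp in isps) / total_volume


# All ISPs represented in the sample.
full_sample = overall_share(share_by_isp, volume_by_isp)

# Same calculation when one ISP declines to share its data.
partial_sample = overall_share(
    share_by_isp, volume_by_isp, isps=["isp_high", "isp_mid"]
)

print(f"All ISPs represented: {full_sample:.1%}")
print(f"One ISP missing:      {partial_sample:.1%}")
print(f"Overstatement:        {partial_sample - full_sample:.1%}")
```

With these assumed numbers, dropping the low-share ISP overstates the engine's share by several percentage points. The size and direction of the error depend entirely on which ISP is missing and how much search volume it carries.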

But, wait, there’s more.

Next, I looked at the share that each of the other major search engines -- Yahoo, MSN and AOL -- held within their “partner” ISPs. That is, SBC in the case of Yahoo, Verizon in the case of MSN and AOL in the case of AOL:

I was initially taken aback by how dramatically consumers’ online search behavior varies by ISP. Specifically, each search engine’s market share among subscribers to its partner ISP is far, far higher than the engine’s overall market share. Take MSN. Its overall share of searches is 10.3 percent. Yet, among subscribers to Verizon’s ISP service, MSN’s share climbs to 39.4 percent! But, as I thought about it, I realized that these results simply reflect the preferred relationship between the ISP and the particular search engine. As an example, the terms and conditions of the SBC high speed service specify that: “AT&T Yahoo! High Speed Internet is provided by AT&T Internet Services with customized content, services, and applications from Yahoo!” It’s easy to see how a user of SBC’s ISP could become more oriented to Yahoo’s search engine than to other engines -- especially if they’ve installed the Yahoo! toolbar. A similar dynamic exists with the other search engines and their partner ISPs.

My second reaction was to breathe a sigh of relief that back in 1999 Magid and I had the foresight not to go the route of trying to acquire user data directly from the ISPs. My analysis clearly shows that an online audience measurement service lacking clickstream data from the important ISPs can be decidedly biased.
