Keeping Ahead of Non-Human Traffic: A Q&A with Brian Pugh
The media has been abuzz lately with news stories about invalid web traffic – commonly referred to as fraud – that can wreak havoc on ad campaign delivery and diminish the value of digital as a successful advertising medium. The recency of this coverage may suggest to the industry that non-human traffic, which is often the result of bad actors trying to game the online ad market, is a relatively new scourge on the digital advertising ecosystem. In reality, invalid traffic has plagued the internet in one form or another for more than a decade, and comScore has been measuring this traffic and filtering it out of our products from the start, as explained in our blog posts from 2011 and 2012.
With this latest wave of news coverage, we’ve received questions from many of our U.S. clients about how comScore measures and reports on this traffic – specifically in our comScore validated Campaign Essentials™ (vCE®) and comScore validated Media Essentials™ (vME™) products, which validate ad inventory from the media buyer and seller perspective. To help speak to these concerns, we thought it would be a great time to sit down with Brian Pugh, comScore’s SVP of Audience, to ask him some of the most common questions we’ve gotten and to learn more from him about some forthcoming changes to our vCE and vME methodology, which will be incorporated in the 2.0 versions of the products.
Here’s what Brian had to say.
comScore: What does the media mean when it refers to fraud in digital advertising?
BRIAN PUGH: The industry has commonly been using the term fraud to describe any type of non-human or invalid traffic source that inflates site traffic and ad delivery data. The language implies that all sources of invalid traffic are intentional and malicious, but that isn’t necessarily true. Yes, there are definitely bad actors who purposefully try to game the system, and these tend to be the dominant cause of problematic digital measurement. But there are other forms that can cause the same negative outcome without being intentionally fraudulent or illegal.
Many sources of non-human traffic are a natural byproduct of some common online business practices – such as the use of spiders or bots to gather data, index web pages for search or determine page content for contextual ad placements. The IAB publishes a list of known spiders and bots so that these types of non-human sources can be filtered out of measurement to enable site traffic and ad delivery data to be appropriately measured. While such activities may not be fraudulent, they should be filtered out of measurement as invalid traffic. As such, comScore doesn’t use the term fraud when speaking about invalid traffic. Instead, we use a broader term: non-human traffic or NHT.
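As a rough illustration of the list-based filtering Brian mentions, the sketch below checks each hit's user agent against a list of known spider and bot patterns. The patterns, the `Hit` type and the function names are all hypothetical stand-ins; the actual IAB list and comScore's filtering logic are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical stand-in patterns; NOT the contents of the IAB list.
KNOWN_BOT_PATTERNS = ("googlebot", "bingbot", "crawler", "spider")


@dataclass(frozen=True)
class Hit:
    """One measured page or ad request (illustrative structure only)."""
    url: str
    user_agent: str


def is_known_bot(user_agent: str, patterns=KNOWN_BOT_PATTERNS) -> bool:
    """Return True if the user agent contains any known bot pattern."""
    ua = user_agent.lower()
    return any(pattern in ua for pattern in patterns)


def filter_known_bots(hits):
    """Split hits into (valid, invalid) based on the known-bot list."""
    valid = [h for h in hits if not is_known_bot(h.user_agent)]
    invalid = [h for h in hits if is_known_bot(h.user_agent)]
    return valid, invalid


hits = [
    Hit("/home", "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"),
    Hit("/home", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    Hit("/news", "ExampleSpider/1.0"),
]
valid, invalid = filter_known_bots(hits)
print(len(valid), "valid;", len(invalid), "filtered as known bots")
```

A list like this only catches non-human sources that identify themselves, which is why it covers the benign spiders and bots described above rather than bad actors who disguise their traffic.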
comScore: There has been recent media coverage citing a stat that over 60% of Internet traffic is non-human. Is this consistent with what comScore has seen?
BRIAN PUGH: No, at least not if you are evaluating NHT for the overall internet. A lot of times these high numbers get picked up by the media because they ring alarm bells if taken at face value. But it doesn’t mean the data is necessarily sound or representative. Measurement providers in the industry may be tempted to publish the more sensational numbers because they know NHT is a hot issue that will generate buzz, but using non-representative figures to make a point can ultimately be to the detriment of the industry. I always caution clients to take this type of data with a grain of salt, because it can lack necessary context.
For example, if a measurement provider is reporting a 60% stat, its data likely skews toward long-tail sites, which on average have higher levels of NHT. The provider may not be properly accounting for – or even including – premium publisher sites, which tend to have very low levels of NHT. One place this problem gets exacerbated is on the exchanges, which commonly have a disproportionate number of long-tail sites. If an exchange reported the percentage of its inventory that was NHT, that figure would likely be a lot higher than the number for the overall internet.
Given the depth and breadth of our measurement at comScore, we are able to get a more comprehensive read of NHT levels across all sites, content areas and categories, and this allows us to report accurately the amount of NHT in the market.
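The sampling skew described in this answer comes down to a weighted average. The sketch below uses invented numbers chosen only to show the arithmetic; they are not comScore data, and the segment names and rates are assumptions made for illustration.

```python
def blended_nht_rate(segments):
    """Weighted average NHT rate.

    segments: list of (traffic_share, nht_rate) pairs; shares must sum to 1.
    """
    total_share = sum(share for share, _ in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * rate for share, rate in segments)


# Invented, illustrative NHT rates per segment (not measured values).
LONG_TAIL_RATE = 0.70
PREMIUM_RATE = 0.05

# A sample that skews heavily toward long-tail sites.
skewed_sample = [(0.85, LONG_TAIL_RATE), (0.15, PREMIUM_RATE)]

# A sample in which premium publishers carry most of the traffic.
broader_sample = [(0.20, LONG_TAIL_RATE), (0.80, PREMIUM_RATE)]

print(f"skewed sample:  {blended_nht_rate(skewed_sample):.1%}")
print(f"broader sample: {blended_nht_rate(broader_sample):.1%}")
```

With identical per-segment rates, changing only the mix of sites in the sample moves the headline figure from roughly 60% to under 20%, which is the kind of missing context the answer warns about.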
comScore: Are there different types of non-human traffic?
BRIAN PUGH: Yes, definitely, and the various types have evolved over the years because the bad actors are forced to adapt their practices once they’ve been identified. If you look back to 2002, when comScore introduced Media Metrix and first began measuring NHT, audience numbers were being inflated through things like pop-ups. When this first started, we were the only ones measuring it, so we were the only ones around who could raise our hand to call out a bad actor. We devoted teams to finding websites that inflated their audiences and developed techniques to filter them out of our Media Metrix reporting.
As time went on, technologies became more sophisticated, and we started to see botnets and adware used around ad inventory. Bots and adware can drive up traffic in different ways and for different purposes, the simplest being to generate traffic on a page. But there are more sophisticated methods as well. For example, we’ve seen them used on e-commerce sites to trigger retargeted ads. Bots will go into the e-commerce site and view items or even put some into a shopping basket, and then the e-commerce site will target that cookie with ads. This activity gives the cookie attached to the bot a higher value because it has targeting potential, which enables the bot operators to make more money from it.
A significant breakthrough in our ability to detect many of the newer forms of botnets came from our launch of Unified Digital Measurement in 2008, which gave us census-level data assets in addition to our existing measurement panel. This added scale to our measurement, allowing us to extend the insights from our panel to a much broader range of online activity. More sources through which to detect patterns and reconcile data allowed us to get even better at NHT detection and filtration in our products.
comScore: How does comScore measure and filter out non-human traffic?
BRIAN PUGH: comScore’s NHT filtration technology in vCE and vME is the result of years of honing our methods. We’ve been filtering NHT longer than any other digital measurement company in the market, and we continue to adapt our practices to ensure our measurement remains relevant. We are getting ready to release our latest enhancement to our NHT reporting in vCE and vME, which enables our products to account for the latest techniques in creating invalid traffic and filter them accordingly.
I should mention that we don’t like to provide too much detail because the bad actors can use that to try to game the system, and I’ve seen firsthand how being overly communicative about NHT is like giving them a playbook. The way I typically talk about our measurement focuses on three core data assets that drive our methodology. These data assets are the comScore Ad Tag that we put on every ad that we measure; our comScore Census Network, which covers 85% of all devices globally and measures over 1.7 trillion interactions a month; and our global 2 million-person panel. The panel is particularly important, as it enables us to reliably distinguish real human behavior from non-human activity. Using the panel, comScore is able to passively measure the effects of malware on surfing behavior and create deterministic ways of isolating events that are initiated by NHT mechanisms. The panel is also geographically dispersed, enabling a scalable way of longitudinally measuring content on publishers without the need to rely on scrapers, spiders, and honeypots alone.
Using these three assets, we have built a practice that identifies many types of non-human traffic and creates detection methods for them. We are able to study the patterns non-human traffic exhibits on an immense scale, learn the technology and effectively create methods that remove it from our core products like Media Metrix, Video Metrix, vCE and vME. This detection technology enables us to isolate NHT more comprehensively than other solutions in the market, many of which rely exclusively on the use of ad tags and spiders or bots for their measurement. Spiders and bots only see a fraction of what the panel sees, are often served a different version of the site, and can ultimately contribute to non-viewed impressions.
comScore is focused on event-level detection, where an observed measurement can be confidently classified as non-human. Code-invoked methods from our JavaScript tags, as well as behavioral algorithms, are used to meet our objectives. We do not rely on blacklists developed from some proportional measurement of suspicious activity on websites, because this will overstate NHT levels and create false positives.
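The difference between event-level classification and a proportional blacklist can be sketched with a toy example. Everything here is an assumption made for illustration: the `flagged_nht` field stands in for the verdict of some per-event detector, the 30% threshold is arbitrary, and none of it represents comScore's actual methods.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    site: str
    flagged_nht: bool  # hypothetical per-event detector verdict


def event_level_nht_count(events):
    """Count only the events individually classified as non-human."""
    return sum(1 for e in events if e.flagged_nht)


def blacklist_nht_count(events, threshold=0.3):
    """Count every event on any site whose flagged share meets the threshold."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for e in events:
        totals[e.site] += 1
        flagged[e.site] += int(e.flagged_nht)
    blacklisted = {s for s in totals if flagged[s] / totals[s] >= threshold}
    return sum(1 for e in events if e.site in blacklisted)


# Toy data: site-a has 4 flagged events out of 10, site-b has none out of 10.
events = (
    [Event("site-a", True)] * 4
    + [Event("site-a", False)] * 6
    + [Event("site-b", False)] * 10
)

print("event-level NHT:", event_level_nht_count(events))
print("blacklist NHT:  ", blacklist_nht_count(events))
```

In this toy data the blacklist approach discards all ten events on the first site, including the six that were never flagged, which is the overstatement and false-positive problem the answer describes.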
comScore: Can you tell us more about the enhancements happening in vCE and vME?
BRIAN PUGH: As it relates to non-human traffic measurement, there are a few notable changes involved in the upcoming enhancement. The first relates to the Viewable Ad Impression Measurement Guidelines that the MRC released earlier this year in collaboration with the IAB. These guidelines, which aim to bring consistency to measurement practices across viewability vendors, recommend that non-human impressions be counted as not viewable for any and all viewability reporting. So, the first part of this enhancement is that we have altered our processes to ensure our alignment with the new standards.
Second, as part of our standard NHT measurement best practices, we have enhanced our technology for NHT detection to include the incorporation of new, advanced techniques that we believe are on the cutting edge of the industry. This includes the addition of new algorithms to address the latest NHT techniques we have observed in the market. These changes will result in more thorough and up-to-date NHT measurement in our products.
Finally, we have made some adjustments to our reporting in order to give clients a clearer read on impressions that are free from non-human traffic. In my opinion, one of the most significant of the changes is the inclusion of a Human GRP. This metric removes all NHT from the calculation and provides a truly apples-to-apples comparison to TV ratings. After all, there aren’t a lot of spiders and bots out there hiking up TV ratings, so why should a digital GRP be subject to such a situation? This metric provides the comparability that the industry continues to rely on as it evaluates its digital investments relative to other channels.
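The idea behind a Human GRP can be shown with the standard gross rating point definition (impressions divided by the target population, times 100). The sketch below simply subtracts NHT impressions before applying that formula; the exact calculation used in vCE is not described in this interview, and the figures are invented for illustration.

```python
def grp(impressions: int, target_population: int) -> float:
    """Gross rating points: impressions as a percentage of the target population."""
    if target_population <= 0:
        raise ValueError("target_population must be positive")
    return 100.0 * impressions / target_population


def human_grp(impressions: int, nht_impressions: int, target_population: int) -> float:
    """GRP computed only on impressions not classified as non-human (assumed formula)."""
    if not 0 <= nht_impressions <= impressions:
        raise ValueError("nht_impressions must be between 0 and impressions")
    return grp(impressions - nht_impressions, target_population)


# Invented campaign figures, for illustration only.
delivered = 5_000_000
non_human = 400_000
population = 10_000_000

print(f"GRP:       {grp(delivered, population):.1f}")
print(f"Human GRP: {human_grp(delivered, non_human, population):.1f}")
```

In this example the campaign's GRP drops from 50.0 to 46.0 once non-human impressions are excluded, which is the figure that would be comparable to a TV rating.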
comScore: You mentioned that NHT is removed from Media Metrix and Video Metrix in addition to vCE and vME. Is having this measurement across products important?
BRIAN PUGH: Yes. Another thing that distinguishes comScore measurement is that we filter NHT across the scope of our solutions. Rather than only measuring campaigns or only measuring audiences, we’re a holistic provider that measures both. Media buyers can use Media Metrix to accurately understand the human traffic on the websites before they buy, then use vCE to track the human impressions their campaign gets on those sites. Media sellers can use this same technology in vME to understand the human and non-human traffic on their inventory. This was done by design to offer clients consistent detection so that they can track NHT in an apples-to-apples way at every step of the process.
When we counsel clients about NHT, we encourage them to start looking for it from the beginning of the media buying process. If you see a ton of inventory on an exchange but the numbers don’t match Media Metrix, then that should raise a red flag. Buy inventory that you can expect to drive a lot of human traffic, and then use a campaign measurement tool like vCE to monitor your campaign delivery for NHT. There are providers that offer these separately, but that makes it difficult to look at the process from start to finish. This is why it is critical for us that the underlying methods for NHT detection are consistent.