Blog - July 11, 2011

Staying Ahead of Invalid Traffic in Digital Audience Measurement

Brian Pugh

Chief Information Officer
Comscore

Since the dawn of the Internet, spam has been one of the most prevalent issues plaguing legitimate digital businesses. This persistent pest takes many familiar forms, such as email, link and search spam. In Comscore’s business of measuring the digital world, we have recently encountered new varieties of spam affecting digital audience measurement, and we are taking proactive measures to address them. In doing so, we have developed new capabilities and proprietary technologies that have significantly improved our already rigorous approach.

Comscore has more than a decade of experience developing world-leading methods in digital audience measurement. At the root of sound audience measurement is the ability to accurately quantify the behavior of people, which is very different from simply counting the cookie-based traffic that appears in site server logs or web analytics data. Comscore’s Unified Digital Measurement (UDM) methodology, which combines our global panel of 2 million people with publishers’ site server data, gives us a unique ability to distinguish the behavior of people from the raw traffic recorded in site server logs and to refine our audience estimates accordingly. This capability also enables us to employ proprietary methods to identify, quantify, and filter out the portion of server traffic that originates from bots, spiders, and other non-human sources.

Comscore’s interest is in measuring digital consumer behavior, which means that we credit page views, visitation, sessions, duration, impressions and other metrics only when (1) the user took an explicit action to view the content, and (2) the content was rendered to the user in the foreground. These conditions are consistent with the audience measurement guidelines set forth by the Media Rating Council (MRC) and the Interactive Advertising Bureau (IAB). People, not server calls, are the ones who consume content, view ads and, ultimately, buy products and services, so it is the behavior of people that we aim to measure accurately.
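
As a minimal sketch, the two crediting conditions can be expressed as a single predicate over a candidate page view. The `PageViewEvent` type, its field names, and the example URLs are hypothetical; how a real measurement system establishes "explicit action" and "foreground" is far more involved and is simply assumed here as two booleans.

```python
from dataclasses import dataclass


@dataclass
class PageViewEvent:
    """A candidate page view. Hypothetical fields, for illustration only."""
    url: str
    user_initiated: bool          # condition (1): explicit user action, e.g. a click
    rendered_in_foreground: bool  # condition (2): content shown in the foreground


def is_creditable(event: PageViewEvent) -> bool:
    """Credit the view only when both conditions hold."""
    return event.user_initiated and event.rendered_in_foreground


events = [
    PageViewEvent("https://example.com/news", True, True),
    PageViewEvent("https://example.com/hidden", False, True),  # no explicit action
    PageViewEvent("https://example.com/tab", True, False),     # background only
]
credited = [e.url for e in events if is_creditable(e)]
print(credited)  # ['https://example.com/news']
```

Because the conditions are joined with `and`, failing either one is enough to withhold credit.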

Unfortunately, there are ways to mimic the behavior of people through invalid page view generation, which is intended to inflate publishers’ audience numbers in order to generate additional ad revenue. Ad impressions might be served either to bots or to users who never intended to navigate to a particular web page, leaving advertisers to foot the bill for impressions of dubious quality. Such activity should obviously not be counted as valid traffic, and Comscore takes elaborate steps to ensure that these schemes are detected and filtered out of our audience measurement data.

The Nature of Invalid Traffic
To understand the nature of this invalid traffic, it is helpful to start with a better understanding of the incentives and motivations of those in the advertising value chain. Certain players, such as Pay Per Click (PPC) providers, are paid solely based on their ability to drive traffic to publisher sites. Publishers want to sell advertising, so they’re willing to pay to acquire visitors as long as they ultimately profit from the ad revenue generation. There is nothing inherently wrong with this sort of arrangement, provided that the traffic generated is legitimate.

Unfortunately, some companies have been actively seeking to manipulate this ecosystem. Using a variety of mechanisms, they attempt to fake or redirect traffic to artificially inflate ad impressions, visits, or other measures that drive monetization.

Today there are a variety of ways in which web traffic can be artificially inflated that may look legitimate at first glance but, on further investigation, prove to be invalid. The most common mechanisms include:

Generating background web calls to mimic real visitation
Phantom surfing calls
Malware-infected computers that hijack browsing
Bot networks creating mass visitation

Background Measurement Calls
One technique is for an otherwise legitimate website to embed calls to measurement or analytics servers that report inaccurate data. As a user browses the site, analytics or audience measurement pixel calls embedded in the site code are surreptitiously made in the background, either once or at predetermined intervals. These calls aim to credit other websites with new traffic even though the user never requested or viewed those sites’ content. In simple terms, a user browses one website and, during normal usage, multiple URL calls to other external sites are initiated from the user’s computer in order to inflate traffic to those external sites.

Phantom Surfing Calls
Some firms will direct traffic to another site that the user did not intend to visit. For example, while a user browses a streaming video website such as an adult content site, JavaScript embedded in the site’s content can cause the user’s browser to request and download entire pages from a third-party website in a way that is invisible to the user. In many of these cases, the script creates a 1x1 pixel iframe and loads the third-party page into it. The user cannot observe this phantom activity, yet the browser performs a full page load of the third-party site in the background, including the complete page, the full ad load, analytics calls and so on. For all intents and purposes, this traffic looks real and legitimate to the publisher and advertisers at the end of this daisy chain, but it was never seen by an actual person. Further engagement can also be faked using this method.
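
One simple heuristic that follows from this description is to flag page loads that happen inside frames too small or too hidden for anyone to see. The sketch below is illustrative only: the `FrameLoad` fields, the `MIN_VISIBLE_PX` threshold, and the premise that frame size and visibility can be observed at all are hypothetical assumptions, not a description of Comscore’s detection.

```python
from dataclasses import dataclass


@dataclass
class FrameLoad:
    """A page load observed inside a frame. Hypothetical fields."""
    url: str
    frame_width_px: int
    frame_height_px: int
    css_hidden: bool  # e.g. display:none or visibility:hidden on the frame


# Hypothetical threshold: a frame narrower or shorter than this cannot
# meaningfully display a page to a person (a 1x1 frame falls below it).
MIN_VISIBLE_PX = 2


def looks_like_phantom_load(load: FrameLoad) -> bool:
    """Flag loads rendered in a hidden or effectively invisible frame."""
    too_small = (load.frame_width_px < MIN_VISIBLE_PX
                 or load.frame_height_px < MIN_VISIBLE_PX)
    return load.css_hidden or too_small


loads = [
    FrameLoad("https://third-party.example/article", 1, 1, False),
    FrameLoad("https://video.example/embed", 640, 360, False),
]
flagged = [l.url for l in loads if looks_like_phantom_load(l)]
print(flagged)  # ['https://third-party.example/article']
```

A size check alone is easy to evade (for example, with a larger frame positioned off-screen), which is one reason detection has to keep evolving.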

Malware
Malware, which typically infects a computer through vulnerabilities in the user’s operating system or browser, is also frequently used to generate erroneous site visits. Such programs are designed to manipulate the flow of traffic on the web. One example is the redirect virus: rather than relying on misleading links or pushing a user through intermediate websites, it actively hijacks the links the user clicks. Suppose a user runs a query on a search engine and clicks on a result. Instead of arriving at the intended destination, the user is forced by the redirect virus to another website. The click is also laundered through a network of sites to generate clicks, and therefore revenue, for the party behind the hijack.

Botnets
The next most common method is the use of bot networks, also known as botnets. A botnet is a network of infected computers that can be used to perform distributed attacks online. The recent denial-of-service attacks against Visa, PayPal and other large web entities were perpetrated using botnets. Besides distributed attacks, a botnet can be used to send a large volume of apparent visitors to a website.

Comscore’s Solution
Comscore is committed to leading the way in developing the most advanced technologies for filtering out invalid clicks and traffic. We comply with the IAB’s guidelines on what constitutes a legitimate click and on preventing click fraud. The audit standards applied by the MRC require that measurement companies filter out traffic originating from agents on the IAB list of bad agents; they also require that we proactively and continuously work to identify and exclude suspect and non-human traffic that is not on the IAB list. Comscore’s processes for identifying and removing non-human traffic were part of the audit of Comscore Direct, which received MRC accreditation in March.
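
A minimal sketch of the first requirement, filtering server-log records against a bad-agent list, might look like the following. The patterns are hypothetical placeholders, not entries from the actual IAB list, and the case-insensitive substring match and the record layout are assumptions. As the paragraph notes, list matching only removes agents that identify themselves; traffic not on the list needs separate detection.

```python
# Hypothetical placeholder patterns. The actual IAB list is not
# reproduced here; these strings are for illustration only.
BAD_AGENT_PATTERNS = ("examplebot", "demo-crawler", "fake-spider")


def is_listed_bad_agent(user_agent: str, patterns=BAD_AGENT_PATTERNS) -> bool:
    """Case-insensitive substring match of a user-agent string against the list."""
    ua = user_agent.lower()
    return any(p in ua for p in patterns)


def filter_server_log(records, patterns=BAD_AGENT_PATTERNS):
    """Drop log records whose user agent matches a listed bad agent.

    This only removes self-declared agents; unlisted non-human traffic
    still requires separate detection."""
    return [r for r in records if not is_listed_bad_agent(r["user_agent"], patterns)]


log = [
    {"url": "/home", "user_agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/5.0"},
    {"url": "/home", "user_agent": "ExampleBot/1.0 (+http://bot.example)"},
]
kept = filter_server_log(log)
print([r["user_agent"] for r in kept])
```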

Over and above the standards set forth by the MRC and IAB, Comscore has developed proprietary algorithms for invalid click detection, along with several additional assets that allow for more sophisticated detection of invalid traffic. The Comscore panel allows us to see the full browsing activity of millions of panelists, which provides insight into the complete clickstream leading up to any site visit. (Unlike Comscore’s panel-based methodology, site-centric analytics services can see at most the immediately preceding site, and only if the referral is available. In many cases the referral is unavailable or obfuscated through a series of hops that hide the real origin of the traffic.) In addition to the panel, Comscore’s census network, which reaches 95% of U.S. computers, provides deep inspection of the site-centric analytics data that publishers use. Together, these assets give Comscore the ability to detect push traffic, malware influence, automated clicks through botnets, and other mechanisms that can inflate audience and ad measurement beyond the ‘user-intent’ definition that the industry needs to follow.
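
The difference in visibility can be sketched with one panelist’s ordered clickstream. This is an illustrative simplification, not Comscore’s method: the session, URLs, and function names are hypothetical, and `immediate_referrer` assumes the best case for site-centric tools, in which the referral is actually available.

```python
from typing import List, Optional


def immediate_referrer(clickstream: List[str], target: str) -> Optional[str]:
    """Best case for a site-centric tool: the page just before the first
    visit to `target` (None if `target` is absent or was the first page)."""
    for i, url in enumerate(clickstream):
        if url == target:
            return clickstream[i - 1] if i > 0 else None
    return None


def full_path_to(clickstream: List[str], target: str) -> List[str]:
    """What a panel can see: every page visited before the first visit to `target`."""
    if target in clickstream:
        return clickstream[: clickstream.index(target)]
    return []


# Hypothetical session in which a search click is laundered through two hops.
session = [
    "search.example/?q=recipes",
    "hop1.example/r",
    "hop2.example/r",
    "publisher.example/home",
]
print(immediate_referrer(session, "publisher.example/home"))  # hop2.example/r
print(full_path_to(session, "publisher.example/home"))
```

Only the full path reveals that the visit began as an unrelated search and passed through intermediate hops; the immediate referrer shows just the last hop.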

What Can Publishers Do to Protect Themselves?
Comscore is committed to helping the industry in its fight against these illegitimate practices. Make no mistake about it: this is an arms race that will constantly demand ingenuity in fraud detection, and all of us must remain vigilant as these practices evolve. We believe the interests of the industry, advertisers and publishers alike, are best served by preserving the integrity of the ecosystem and working together to neutralize practices that might inhibit its continued growth and development.

There are a few ways to protect against this fraud. The first is for publishers to keep a close eye on their web analytics data for clues that invalid traffic inflation is occurring. For example, clicks generated by a particular third party might at first glance look like any other click, but their downstream metrics, such as site engagement and conversion, will be significantly lower. If a publisher knows that its average conversion rate is 2% but the clicks coming from a certain provider convert at 0.01%, that is a good clue that the provider is steering users to a site they never intended to visit. Another clue might be that the search terms driving traffic to the site have nothing to do with that particular business, an indication that search clicks are being hijacked. Publishers should have skilled forensic web analysts examine their traffic sources for these clues as a first line of defense against this type of fraud.
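
The conversion-rate comparison can be sketched in a few lines of Python. The visit data is synthetic, built to reproduce the 2% versus 0.01% example, and the `ratio` and `min_visits` thresholds are arbitrary assumptions that an analyst would need to tune for a real site.

```python
from collections import defaultdict


def source_stats(visits):
    """visits: iterable of (source, converted) pairs.
    Returns {source: (visit_count, conversion_rate)}."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for source, converted in visits:
        totals[source] += 1
        conversions[source] += int(converted)
    return {s: (totals[s], conversions[s] / totals[s]) for s in totals}


def suspicious_sources(visits, site_rate, ratio=0.1, min_visits=1000):
    """Flag sources with at least `min_visits` visits whose conversion rate is
    below `ratio` * `site_rate`. Both thresholds are illustrative assumptions."""
    stats = source_stats(visits)
    return sorted(
        s for s, (n, rate) in stats.items()
        if n >= min_visits and rate < ratio * site_rate
    )


# Synthetic data mirroring the example: provider_a converts at 2%,
# provider_b at 0.01% (1 conversion in 10,000 visits).
visits = [("provider_a", i < 200) for i in range(10_000)]
visits += [("provider_b", i < 1) for i in range(10_000)]

print(suspicious_sources(visits, site_rate=0.02))  # ['provider_b']
```

The `min_visits` floor keeps small sources, whose conversion rates are noisy, from being flagged on a handful of visits.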

If such clues are found, the next step should be to ask the third-party traffic providers how they are generating traffic for the site. They ought to be able to clearly explain their methodologies. Those engaging in fraudulent activity will not be able to do so.

Some Final Thoughts
Invalid traffic exists to the detriment of the entire industry. Even those who might see short-term benefits, such as being able to claim a larger audience to advertisers, need to understand that invalid practices are being actively identified and that knowingly engaging in them carries a reputational downside.

It is in publishers’ best interest to actively employ forensic web analysis to ensure that they are not unintentionally supporting invalid clicks. If such activity is discovered, publishers should immediately reexamine the relationships that are enabling it. Comscore will continue to be proactive in identifying invalid traffic and alerting affected publishers when such threats are found. By working together, we believe we can stay ahead in this dangerous cat-and-mouse game.
