November 20, 2014

A Security + Big Data Approach to Non-Human Traffic Removal

Paul Barford
SVP, Chief Scientist

To build comprehensive capabilities for removing non-human traffic (NHT), comScore has been at the forefront of combining IT security techniques with big data analytics. This approach begins by identifying threats through in-depth investigation of our diverse data assets and leads to the development of NHT filters that are deployed throughout our product lines.

Security Methods and NHT in Online Advertising
At a high level, security is based on considering how adversaries might attack and then building defenses to address potential vulnerabilities. Comprehensive removal of NHT begins by understanding the spectrum of threats in the online ad ecosystem. It is critical to be broad-minded in this effort and to recognize that adversaries are constantly developing new NHT generation capabilities.

There has been a good deal of publicity on the threats posed by bots (a bot is a compromised host system under the control of a third party) that are used for NHT generation in online advertising. While it’s true that bots play a role in NHT, they are only one of many threats to online advertising. Other prominent threats include human view/click farms, non-bot tools for traffic generation (e.g., simple scripts, plugins and pay-per-view networks) and mechanisms for hiding, redirecting or obfuscating ads. Understanding these threats requires diverse data on how they are deployed and on how their behavior differs from valid human traffic on legitimate websites and genuine interactions with ads. Fortunately, comScore’s long history and experience in digital media measurement provide certain unique data assets that offer invaluable insight into the details of these threats.

Understanding the details of threats enables the development of methods for NHT reporting. We consider the problem of detecting NHT in terms of direct and indirect methods. Direct methods attribute NHT to a single impression or user. An example of a direct NHT detection method is identifying when an ad is rendered in an invisible iframe. Indirect methods are based on anomaly detection techniques. These techniques begin by establishing a baseline of normal behavior for a population (e.g., the ways in which real users browse webpages and interact with ads). Next, a threshold is established to signal when behavior for a subset of the population deviates significantly from the norm. A simple example of an indirect NHT detection method is flagging a small population of users that generates a huge number of page visits over a very short period of time. The challenge in developing both direct and indirect methods is in gathering data and applying analyses that enable norms and thresholds to be identified.
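To make the distinction concrete, here is a minimal Python sketch of one direct check and one indirect check. It is an illustration only, not comScore's filters: the `Impression` fields, the median-based baseline and the threshold value of 5.0 are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Impression:
    # Hypothetical fields; real ad-tag signals will differ.
    frame_width: int
    frame_height: int
    css_visibility: str


def is_invisible_iframe(imp: Impression) -> bool:
    """Direct method: judge a single impression on its own evidence."""
    zero_area = imp.frame_width == 0 or imp.frame_height == 0
    return zero_area or imp.css_visibility == "hidden"


def flag_anomalous_users(visits_per_user: dict, threshold: float = 5.0) -> list:
    """Indirect method: flag users far above the population baseline.

    The baseline is the median visit count, and deviation is scaled by the
    median absolute deviation (MAD). Both choices are assumptions made for
    this sketch.
    """
    counts = list(visits_per_user.values())
    baseline = median(counts)
    mad = median(abs(c - baseline) for c in counts)
    if mad == 0:
        mad = 1.0  # avoid dividing by zero when most counts are identical
    return sorted(
        user
        for user, count in visits_per_user.items()
        if (count - baseline) / mad > threshold
    )


# Page visits per user within one short window (made-up numbers).
visits = {"u1": 12, "u2": 9, "u3": 15, "u4": 11, "u5": 10, "u6": 4000}
suspects = flag_anomalous_users(visits)
```

In this toy data only `u6` stands far outside the baseline. A median-based baseline is used here because a handful of extreme values would otherwise distort the norm they are being compared against.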

A Big Data Approach to Identifying NHT
While the term big data is often overused, it is an accurate description of the second approach needed to detect NHT. The value of massive datasets lies not merely in their size, but in how that size and granularity enable more powerful analysis, typically through the combination and reconciliation of disparate datasets. Such efforts can involve significant challenges in collecting, managing and analyzing large quantities of diverse data, including configuration and management of infrastructure, design and development of scalable algorithms, and interpretation of results that lead to effective countermeasures.

Fortunately, comScore has a great deal of experience in big data through several unique data assets. Our global audience panel, census network and ad tag deployment provide tens of billions of measurements per day. Signals from these diverse systems provide both the scope and detail necessary to identify new and emerging threats. The challenge is in identifying the NHT needles in the gigantic haystack of online ad traffic.

Leveraging a huge storage and computational infrastructure, NHT filter development at comScore is based on threat modeling, analytic methods and visualization. Our superb team of engineers and data scientists – well-versed in statistics, signal processing, machine learning, data mining, anomaly detection, and sparse/incomplete data analysis – uses processes and tools for exploratory analysis that enable signals in the data to be associated with specific threats. Visualizations such as time series, scatter plots, animations and cluster diagrams are particularly useful in these efforts since they reveal signals and relationships that might otherwise be difficult to identify numerically.

Also critical to filter development is the ability to validate – in other words, to understand the accuracy of filters across broad populations of users and deployments. We do this by using a combination of assets including our panel and census system, and a set of network honeypots, which receive only NHT and help to expose emerging threats.
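As a rough illustration of this kind of validation, the Python sketch below scores a candidate filter against two labeled sources. It assumes that honeypot events are all NHT (as described above) and, purely for the example, that a sample of panel events can be treated as human traffic. The function name, event fields and numbers are hypothetical.

```python
def validation_rates(filter_fn, honeypot_events, panel_events):
    """Return (detection_rate, false_positive_rate) for a candidate filter.

    honeypot_events: events assumed to be 100% NHT.
    panel_events: events assumed, for this sketch only, to be human.
    """
    caught = sum(1 for event in honeypot_events if filter_fn(event))
    false_alarms = sum(1 for event in panel_events if filter_fn(event))
    detection_rate = caught / len(honeypot_events) if honeypot_events else 0.0
    false_positive_rate = false_alarms / len(panel_events) if panel_events else 0.0
    return detection_rate, false_positive_rate


def too_fast(event):
    """Hypothetical filter: flag sessions with more than 100 visits per minute."""
    return event["visits_per_minute"] > 100


honeypot = [{"visits_per_minute": v} for v in (500, 300, 50, 800)]
panel = [{"visits_per_minute": v} for v in (2, 5, 150, 3, 1)]
detection, false_positives = validation_rates(too_fast, honeypot, panel)
```

With these made-up events the filter catches three of the four honeypot sessions and wrongly flags one of the five panel sessions, which is the kind of trade-off a threshold choice has to balance.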

Deploying NHT Countermeasures in comScore Products
The security and big data approaches can be exceptionally effective means of NHT detection, but their ultimate value resides in their ability to be used by our clients within comScore products. In just the past few months, we have made significant progress in integrating this technology into the comScore vCE, vME and Media Metrix product suites, which gives our clients even greater confidence that their ad and audience measurements properly filter the effects of NHT and represent the digital activity of actual human internet users. Ultimately this means planning media more efficiently, measuring campaign performance more effectively, and ensuring that ad dollars are spent more wisely while flowing to the content owners and ecosystem players who deliver real value.

Learn more about comScore NHT triple detection.
