As COVID-19 forced shutdowns and health officials urged Americans to stay home, it didn’t take a crystal ball to anticipate an increase in TV viewership. For instance, local television station viewership experienced a surge as many viewers relied on locally produced news content for frequent coronavirus updates in their communities.
It is generally known that prior to the pandemic, people were consuming more media than ever before on platforms other than traditional TV, such as computers, smartphones, tablets, and over-the-top (OTT) devices. The pandemic may have altered existing viewing behaviors and even created new ones, such as a higher likelihood of coviewing (multiple individuals watching the same content at the same time on the same screen) among certain members of captive households. It is therefore reasonable to expect the emerging media consumption landscape to consist of increasingly nuanced and multi-faceted person-level viewing behaviors that are difficult to measure reliably using traditional sample-based datasets. Instead, these behaviors are better measured by mining “big data” sets consisting of billions of viewing records to capture the person-level viewership signals embedded in household-level observations.
Comscore’s new personification solution is designed to do just that – to take full advantage of massive and passive set-top box (STB) audience information to extract reliable person-level audience estimates at granular levels in multiple contexts, including local market, cross-platform, and OTT. In this article, we present the overarching vision and overview of Comscore’s personification solution. Naturally, this solution will evolve over time alongside other methodology components, adhering to the principle of using the best-in-class data assets for each segment of the media landscape.
Comscore’s Vision for a New Personification Solution
Comscore’s cross-platform measurement solutions, including Xmedia and Comscore Campaign Ratings (CCR), have for years been transforming household TV tuning observations into person-level projections. This process, known as “personification,” is a statistical inference process by which household (HH) level media consumption data is allocated to the persons within the HH. The process starts with measurement of HH-level tuning, obtained through STB data coming from a number of MVPDs; different local markets have different combinations of MVPDs. This is a massive data footprint, with billions of tuning events per day coming from more than 75 million STBs in over 30 million households. For a large portion of these households, we have reliable and valid demographic information. There is, however, a limitation in this household-centric measurement: we only know that a TV was tuned to some channel at some time, somewhere in the HH, but not who in the HH was watching. Said differently, the measurement captures three of the four dimensions of viewership (what was watched, when it was watched, and how much was watched) but not who watched.
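To make the "three out of four dimensions" point concrete, here is a minimal sketch of what a household-level tuning record might look like. The class name, field names, and the `household_minutes` helper are hypothetical illustrations, not Comscore's actual schema; the point is only that the record carries what, when, and how much, with no field for who.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TuningEvent:
    """One STB tune (hypothetical schema): note there is no 'viewer' field."""
    household_id: str   # which household the set-top box belongs to
    stb_id: str         # which set-top box reported the tune
    content_id: str     # WHAT was watched
    start: datetime     # WHEN the tune began
    end: datetime       # WHEN the tune ended

    @property
    def minutes(self) -> float:
        """HOW MUCH was watched, in minutes."""
        return (self.end - self.start).total_seconds() / 60.0


def household_minutes(events):
    """Sum tuned minutes per (household, content) across all of its STBs."""
    totals = defaultdict(float)
    for event in events:
        totals[(event.household_id, event.content_id)] += event.minutes
    return dict(totals)
```

Everything downstream of a structure like this is household-level; assigning those minutes to individual household members is the inference step that personification has to supply.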
For the past several years, Comscore’s approach to personifying the TV viewership in its national cross-platform measurement (i.e., predicting who watched, to fill in the missing fourth dimension) has relied on a panel-based training dataset licensed from a third-party provider. The training dataset consisted of a small sample (a few thousand households) of live and time-shifted TV viewership at the person level and was fit for the purpose at hand. However, it had some serious limitations in personifying a wide variety of content reliably, due to insufficient coverage at various levels of the content hierarchy (e.g., Network, Series, Program, or Genre) and the uneven quality of the panel sample in general. Because no sample-based training dataset is big enough to personify the vast range of content available and viewed today, the Analytics & Innovation team at Comscore decided to pursue a bold approach using big-data analytics and modeling to solve the personification problem.
Comscore’s new personification solution is based on the vision that household-level measurement of TV viewership, when performed at scale, contains encoded signals that can be used to infer the demographics of the viewers. What do we need next? A statistical “decoder” that can extract those signals and predict the demographic composition of the viewers. How? This is where massive, passive, and deterministic household viewership information, combined with known household rosters and Bayesian statistics, makes the difference. Consider the illustration below, which shows one household out of the 30+ million STB households that Comscore is able to measure on any given day.
In this example, we see a four-member household with two adults and two kids, each belonging to a known demographic (age/gender) group. Each colored bar shown against a household member represents viewing of a particular piece of content, with the length of the bar representing the duration of viewership. For example, maybe the “orange” bar represents a particular episode of SpongeBob SquarePants, and the “yellow” bar a late-night news program. Here, every household member has a distinct person-level viewership signal, with both unique and overlapping patterns. For instance, all four members watched the content marked “dark blue” in roughly similar proportions. Only the kids watched “orange” and, in the same way, only Adult #2 watched the content marked “red.” For simplicity, let’s ignore the temporal ordering of the viewership (i.e., which content was watched first, second, etc.) in this example. Notice that all these person-level signals, which are not directly observed by Comscore, are embedded in the household-level signal that Comscore does observe, directly and deterministically. Pushing this concept further, we can observe with high fidelity the household signal/distribution conditioned on the presence of one or more demographic characteristics (e.g., the presence of at least one male 18-24 in three-member households). Aggregating data across millions and millions of households, we can estimate the probability of the household viewing distribution of content “c”, conditioned on the (unknown) person viewing distributions. Using augmented data sources, such as a survey or a subset of observed STB households, we can derive a candidate set of person-level demo distributions and coviewing estimates and use them within a Bayesian framework to estimate the probability of person-level demo viewing of content “c”, conditional on the observed household distribution!
Voilà, that’s exactly the output we expect from a personification solution. Furthermore, the framework provides the flexibility to recalibrate the Bayesian models as and when better-quality first- or third-party data assets become available for use as inputs to the model.
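The Bayesian step described above can be illustrated with a deliberately simplified toy, which is not Comscore's actual model. It assumes a known household roster, a prior probability per demo group of watching content "c" (in practice derived from sources such as surveys; here the numbers are made up), independence between members, and a likelihood in which the household tune is observed exactly when at least one member watched. The function name and demo labels are hypothetical.

```python
from itertools import product


def posterior_watch_probabilities(roster, prior):
    """Posterior P(member watched | household tuned to content c).

    roster: list of demo labels, one per household member.
    prior:  dict mapping demo label -> assumed prior P(watches content c).
    Toy assumptions: members watch independently, and the household tune
    is observed if and only if at least one member watched.
    """
    evidence = 0.0                        # P(household tuned)
    member_mass = [0.0] * len(roster)     # joint mass where member i watched

    # Enumerate every possible combination of who watched (1) or not (0).
    for viewers in product([0, 1], repeat=len(roster)):
        # Prior probability of this combination under independence.
        p = 1.0
        for watched, demo in zip(viewers, roster):
            p *= prior[demo] if watched else 1.0 - prior[demo]
        # Likelihood of the observed tune given this combination.
        likelihood = 1.0 if any(viewers) else 0.0
        joint = p * likelihood
        evidence += joint
        for i, watched in enumerate(viewers):
            if watched:
                member_mass[i] += joint

    if evidence == 0.0:
        raise ValueError("prior gives zero probability to the observed tune")
    # Bayes' rule: posterior = joint mass / evidence.
    return [mass / evidence for mass in member_mass]


# Made-up priors for a kids program in a two-adult, two-kid household.
roster = ["F35-49", "M35-49", "K2-11", "K2-11"]
prior = {"F35-49": 0.1, "M35-49": 0.1, "K2-11": 0.6}
posterior = posterior_watch_probabilities(roster, prior)
expected_viewers = sum(posterior)  # above 1 here, i.e. some coviewing is implied
```

Conditioning on the observed tune raises every member's probability above its prior, and the posteriors sum to the expected number of viewers for that tune, which is how coviewing shows up in this toy. A production model would presumably replace the all-or-nothing likelihood with distributions estimated from millions of households.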
Some Early Insights of the New Personification Solution
In 2021, Comscore’s Analytics team transformed this vision into reality and conducted extensive testing and validation studies of the new solution against both the legacy solution and multiple external data sources. In the interest of brevity, this section discusses a couple of examples of those test results, highlighting areas of convergence and divergence between the new and legacy solutions. The graph below shows the distribution of viewership minutes across age/gender groups derived from a day’s worth of data consisting of Live TV viewership of all news-related programs on a popular network.
There are some interesting areas of convergence between the two solutions. For instance, the two solutions are directionally aligned, with a pattern of increasing share of viewing minutes across ages 18-64 for both males and females. The two solutions also exhibit some stark differences. For instance, there is a clearly discernible increase for females aged 50-64 in the new solution (with a similar trend for males). Post-hoc validation studies suggested that the sharp increase for this demo group is in line with expectations, given the nature of the content programming on that day. Along the same lines, the new solution, as expected, indicated a very low viewership share for kids (ages 0-17), compared to a noticeable share in the legacy solution, for both males and females.
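For readers who want to reproduce this kind of comparison on their own data, the sketch below shows one way to turn person-level minutes into a share-of-viewing-minutes distribution and to compute the percentage-point gap between two solutions. The function names are hypothetical and the minutes in the usage lines are invented placeholders, not the figures behind the graph.

```python
def viewing_shares(minutes_by_demo):
    """Convert viewing minutes per demo group into shares that sum to 1."""
    total = sum(minutes_by_demo.values())
    if total <= 0:
        raise ValueError("total viewing minutes must be positive")
    return {demo: minutes / total for demo, minutes in minutes_by_demo.items()}


def share_gaps(new_shares, legacy_shares):
    """Percentage-point gap (new minus legacy) for demos present in both."""
    return {
        demo: 100.0 * (new_shares[demo] - legacy_shares[demo])
        for demo in new_shares
        if demo in legacy_shares
    }


# Invented placeholder minutes, for illustration only.
new_minutes = {"F50-64": 320.0, "M50-64": 280.0, "F0-17": 10.0, "M0-17": 10.0}
legacy_minutes = {"F50-64": 250.0, "M50-64": 230.0, "F0-17": 60.0, "M0-17": 60.0}
gaps = share_gaps(viewing_shares(new_minutes), viewing_shares(legacy_minutes))
```

Working in shares rather than raw minutes keeps the comparison independent of each solution's overall volume, so a positive gap means the new solution allocates a larger portion of the same viewing to that demo group.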
The second example below shows the distribution of viewership minutes across age/gender groups derived from the same day’s worth of data used in the previous example. In this case, the data consists of Live TV viewership of all kids-related programs on a popular network.
Just as in the previous example, there are some interesting areas of convergence and divergence between the two solutions. For instance, the two solutions are directionally aligned, with a pattern of increasing share of viewing minutes across ages 18-49 followed by a precipitous decline, for both males and females. Viewing shares for kids (m/f) are also similar across the two solutions. What is further interesting is the higher viewership share for adults for these programs, particularly in the new solution, an effect of higher coviewing (i.e., kids watching the content with adult members of the household). For instance, the viewing share for females and males 35-49 is more pronounced in the new solution than in the legacy solution (17% vs 13% for females and 16% vs 12% for males). This higher coviewing of kids with adults in the household, as picked up by the new solution, makes logical sense, especially during the pandemic, with kids stuck at home and parents inadvertently becoming an audience themselves to their kids’ screen time. It would be interesting to see whether the new solution shows a tapering of the adult viewing share for these programs when things go back to normal (whatever that would mean).
All in all, the new personification solution has so far exhibited strong performance across internal (alignment with current and historic norms) and external (alignment with marginal distributions) validation tests. As testing continues over the coming weeks, Comscore will ensure that the reliability and integrity of the new personification solution are maintained as TV viewership continues to evolve.