Topics API data: what a million impressions tell us

Don Marti, CafeMedia's VP of Ecosystem Innovation, reports on early tests of Google Chrome's new "Topics API."

In the web advertising business, the value of an ad impression that reaches you depends on your interests, among other things. Today, sites track your interests in a variety of ways, but using your interests from one site to place ads on another site is generally done with third-party cookies. Google plans to eliminate third-party cookies from the Chrome browser sometime in the next few years, so it is testing new features for matching ads to users. One of these, called “Topics API,” is intended as a way to share your interests across sites you visit, but it provides lower-quality interest data than other post-cookie ad targeting methods.

Here’s how it works. Google classifies sites by topic, based on the domain name, assigning one or a few topics per site. When you visit a site, the Chrome browser remembers that site’s topics as your topics. If the same third-party script is used by two sites, Topics from the first site can be shared on the second site. (It’s a little more complicated than that, technically, but those are the basics.) Right now, Topics API is being tested on only a small fraction of Google Chrome users. Advertisers aren’t really buying ads based on Topics API. They’re using other targeting data. To figure out whether Topics API is going to be a commercially viable replacement for third-party cookies, we need to check: are there differences in ad revenue across topics? If we see a consistent set of high- and low-revenue topics, it would indicate that the topics detected by Topics API are correlated with some data that advertisers are using to buy ads today.
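To make the simplified description above concrete, here is a toy model in Python. It is a sketch of the basics only, not Chrome’s actual algorithm: it leaves out details such as weekly epochs and randomly added topics. The `ToyBrowser` class, the `SITE_TOPICS` lookup table, and the `.example` domains are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical domain-to-topic table. Chrome uses a classifier model,
# not a hand-written lookup; this stands in for it.
SITE_TOPICS = {
    "classic-cars.example": ["/Autos & Vehicles/Classic Vehicles"],
    "recipes.example": ["/Food & Drink/Cooking & Recipes"],
}


class ToyBrowser:
    """Tracks which topics each third-party script has been able to observe."""

    def __init__(self):
        # third-party script origin -> topics of sites where it was present
        self.observed = defaultdict(set)

    def visit(self, site, third_party_scripts):
        """Record a visit; return the topics each script can read on this site."""
        # A script only learns topics from sites where it ran earlier.
        shared = {s: sorted(self.observed[s]) for s in third_party_scripts}
        # After the visit, this site's topics are added for every script present.
        for script in third_party_scripts:
            self.observed[script].update(SITE_TOPICS.get(site, []))
        return shared
```

In this model, a script that runs on both `classic-cars.example` and `recipes.example` sees the classic-vehicles topic on the recipe site, while a script that only runs on the recipe site sees nothing.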

We analyzed about a million ad impressions, and it turns out that yes, there are significant revenue differences among topics, by almost an order of magnitude. In our September-October data collection period, the biggest money topics (the topics that provide the biggest revenue lift when a user with that topic visits a site that does not have it) were:

  1. /Autos & Vehicles/Classic Vehicles
  2. /People & Society/Family & Relationships
  3. /Internet & Telecom/Web Apps & Online Tools
  4. /Jobs & Education/Education/Early Childhood Education
  5. /Finance/Accounting & Auditing
  6. /Arts & Entertainment/Acting & Theater
  7. /Hobbies & Leisure/Paintball
  8. /Law & Government/Legal/Legal Services
  9. /Computers & Electronics/Network Security
  10. /Science/Robotics

Remember, these aren’t what advertisers are using now. These Topics are associated with some quality of a user that tends to increase the revenue on sites whose topic doesn’t match the user’s topic. If you look at pairs of user and site topics, some of the biggest topic “lifts” are understandable. “Parenting” in the user’s topics is good for a 137% lift to sites about “Massively Multiplayer Games.” When the user is into “Roleplaying Games,” that brings down the revenue for ads on “Autos & Vehicles” and “Shopping” by 94%. The Topics API topic itself isn’t doing that today; rather, something about those users makes them less valuable to advertisers. The same pattern would likely continue if advertisers moved from cookie-based user interest targeting to Topics API.
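For readers who want to try this on their own data, here is one way a per-pair lift could be computed. This is a minimal sketch under our own assumptions, not the exact method behind the numbers above: it assumes each impression is a `(user_topic, site_topic, revenue)` tuple and defines lift as the relative change in mean revenue compared with all impressions on sites of that topic. The function name `topic_lift` is made up.

```python
from collections import defaultdict


def topic_lift(impressions):
    """Return {(user_topic, site_topic): lift} for cross-topic visits.

    Lift is (mean revenue for the pair / mean revenue for the site topic) - 1,
    so 1.37 would mean a 137% lift and -0.94 a 94% drop.
    """
    site_totals = defaultdict(lambda: [0.0, 0])  # site_topic -> [revenue, count]
    pair_totals = defaultdict(lambda: [0.0, 0])  # (user, site) -> [revenue, count]

    for user_topic, site_topic, revenue in impressions:
        site_totals[site_topic][0] += revenue
        site_totals[site_topic][1] += 1
        # Only count visits where the user's topic differs from the site's.
        if user_topic != site_topic:
            pair_totals[(user_topic, site_topic)][0] += revenue
            pair_totals[(user_topic, site_topic)][1] += 1

    lifts = {}
    for (user_topic, site_topic), (revenue, count) in pair_totals.items():
        base_revenue, base_count = site_totals[site_topic]
        baseline = base_revenue / base_count
        if baseline > 0:  # skip site topics with no revenue to compare against
            lifts[(user_topic, site_topic)] = (revenue / count) / baseline - 1.0
    return lifts
```

A real analysis would also need a minimum impression count per pair, since a lift computed from a handful of impressions is mostly noise.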

Topics API has very little data to work with, though: just the domain name of the site. We noticed that many of the pages on CafeMedia publisher sites are getting classified in ways that might make sense to an AI, but not to a human web reader. How-to tips for mechanics got tagged as “vehicle shopping.” That’s close, but pages about how to find shoes that fit, and how to tell if mayonnaise has gone bad, were also “vehicle shopping.” Of course, it’s impossible to put everything in the exact right category. What do you do with a page about fashionable outfits for video game characters? Gaming? Style? Google says “Soccer.” A baked beans recipe, according to Chrome, is “Dance & Electronic Music.” (Maybe it’s true what they say about the musical fruit.)

The classifications don’t look right on a spot check, but anecdotes aren’t data. So we looked at another big dataset: the classifications of a million recently visited URLs. Chrome’s Topics are assigned at the domain level, but we have a more thorough classifier, based on IBM Watson, that classifies an entire page based on the full text.

The results are remarkable. Of our “Automotive and vehicles” pages, 17% were classified by Chrome as “News.” Chrome usually labels Watson’s “technology and computing” pages as “Computers & Electronics,” but it did label 12% of them as just “News.” And 10% of the “Health and Fitness” pages found by Watson got classified by Chrome as “Reference.”
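A comparison like this comes down to a cross-tabulation of the two classifiers’ labels. The sketch below shows the idea, assuming the input is one `(watson_category, chrome_topic)` pair per URL; the function name and input format are our own illustration, not a description of the actual pipeline.

```python
from collections import Counter, defaultdict


def chrome_share_by_watson(pairs):
    """Return {watson_category: {chrome_topic: fraction of that category's URLs}}."""
    counts = defaultdict(Counter)
    for watson_category, chrome_topic in pairs:
        counts[watson_category][chrome_topic] += 1

    shares = {}
    for watson_category, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[watson_category] = {
            topic: n / total for topic, n in topic_counts.items()
        }
    return shares
```

Each inner dictionary sums to 1, so a value of 0.17 under “News” for an automotive category would correspond to the 17% figure quoted above.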

Fortunately, Topics API is just one of several options for matching an ad to the user’s interests. More accurate options include first-party or clean room data, and Seller-Defined Audiences. Topics API might be just good enough to sell some kind of ad on a lower-value or lower-reputation site, the kind of ad space that Google can monetize at large scale but that is not qualified for a high-end ad service like AdThrive. Sites with higher-engagement content, though, will be able to make better choices. In most cases, sites with a higher-value Topic will need to opt out, at least for their logged-in users, in order to block audience data from leaking toward lower-quality content.
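A site can opt a page out of Topics by sending the response header `Permissions-Policy: browsing-topics=()`. The sketch below shows one way a server might add that header for logged-in users only. The function name, the `logged_in` flag, and the plain-dictionary headers are assumptions for illustration; a real site would wire this into its own web framework.

```python
def with_topics_opt_out(headers, logged_in, opt_out_everyone=False):
    """Return a copy of the response headers, opting out of Topics when needed."""
    result = dict(headers)
    if not (logged_in or opt_out_everyone):
        return result

    directive = "browsing-topics=()"  # empty allowlist: feature disabled
    existing = result.get("Permissions-Policy", "")
    if "browsing-topics" in existing:
        # Leave an existing browsing-topics directive alone rather than duplicate it.
        return result
    # Permissions-Policy directives are comma-separated, so append to any others.
    result["Permissions-Policy"] = f"{existing}, {directive}" if existing else directive
    return result
```

Keying the header on login state matches the “at least for logged-in users” case; passing `opt_out_everyone=True` covers a site-wide opt-out.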

The decisions on how to handle Topics API and other post-cookie ad technologies will be challenging. CafeMedia has the data access and data science skills to help you make a better decision than you could alone or with a basic ad service. Should you use Topics API, or opt out? Or will Topics API go the way of FLoC? Either way, we have the data to make the right choice. As always, we continue to track and test the latest web advertising proposals to help our publishers maximize their ad revenue.