In the web advertising business, the value of an ad impression that reaches you depends on what your interests are, among other things. Today, sites use a variety of ways to track your interests, but using your interests from one site to place ads on another site is generally done with third-party cookies. The Google Chrome browser will be eliminating third-party cookies some time in the next few years, so Google is testing out new features for matching ads to users. One of these, called “Topics API,” is intended as a way to share your interests across sites you visit, but it provides lower-quality interest data than other post-cookie ad targeting methods.
Here’s how it works. Google classifies sites by topic, based on the domain name, assigning one or a few topics per site. When you visit a site, the Chrome browser remembers that site’s topics as your topics. If the same third-party script is used by two sites, Topics from the first site can be shared on the second site. (It’s a little more complicated than that, technically, but that’s the basics.) Right now, Topics API is just being tested on a small fraction of Google Chrome users. Advertisers aren’t really buying ads based on Topics API. They’re using other targeting data. To figure out whether Topics API is going to be a commercially viable replacement for third-party cookies, we need to check: are there differences in ad revenue across topics? If we see a consistent set of high and low revenue Topics, it would indicate that the topics detected by Topics API are correlated with some data that advertisers are using to buy ads today.
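The sharing step described above can be pictured with a toy model. This is only a sketch of the basic idea, not Chrome's implementation: the domain names, the `SITE_TOPICS` table, and the `ToyBrowser` class are all made up for illustration, and the real API has more moving parts than this.

```python
from collections import defaultdict

# Hypothetical domain-to-topic table. The real classifier and taxonomy are Google's.
SITE_TOPICS = {
    "classic-cars.example": {"/Autos & Vehicles/Classic Vehicles"},
    "parenting.example": {"/People & Society/Family & Relationships"},
    "news.example": {"/News"},
}


class ToyBrowser:
    """Toy model: tracks which third-party callers were present on which topics."""

    def __init__(self):
        # caller domain -> set of topics seen on sites where that caller ran
        self.observed = defaultdict(set)

    def visit(self, domain, callers):
        """Record a visit to `domain` where the given third-party callers are embedded."""
        topics = SITE_TOPICS.get(domain, set())
        for caller in callers:
            self.observed[caller] |= topics

    def topics_for(self, caller):
        """Topics a caller can learn: only those from sites where it was present."""
        return sorted(self.observed.get(caller, set()))
```

In this sketch, a script embedded on both a classic car site and a news site would see the car topic when it runs on the news site, while a script that only ran on the news site would not.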
We analyzed about a million ad impressions, and it turns out that yes, there are significant revenue differences among topics, by almost an order of magnitude. In our September-October data collection period, the biggest money topics (the topics that provide the biggest revenue lift when a user with one topic visits a site that does not have that topic) were:
- /Autos & Vehicles/Classic Vehicles
- /People & Society/Family & Relationships
- /Internet & Telecom/Web Apps & Online Tools
- /Jobs & Education/Education/Early Childhood Education
- /Finance/Accounting & Auditing
- /Arts & Entertainment/Acting & Theater
- /Hobbies & Leisure/Paintball
- /Law & Government/Legal/Legal Services
- /Computers & Electronics/Network Security
Remember, these aren’t what advertisers are using now. These Topics are associated with some quality of a user that tends to increase the revenue on sites whose topic doesn’t match the user’s topic. If you look at pairs of site topics, some of the biggest topic “lifts” are understandable. “Parenting” in the user’s topics is good for a 137% lift to sites about “Massively Multiplayer Games.” When the user is into “Roleplaying Games,” that brings down the revenue for ads on “Autos & Vehicles” and “Shopping” by 94%. It’s not the Topics API topic doing that today, but something about those users is less valuable. The same pattern would likely continue if advertisers moved from cookie-based user interest targeting to Topics API.
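For readers who want to see what a “lift” number means in practice, here is a minimal sketch. It assumes that lift compares average revenue on pages with a given site topic between users who have a given user topic and users who don't. That baseline, the `topic_lift` function, and the record layout are assumptions for illustration, not a description of the analysis behind the figures above.

```python
from statistics import mean


def topic_lift(impressions, user_topic, site_topic):
    """Percent revenue lift on `site_topic` pages when the user has `user_topic`.

    Each impression is a dict with "user_topics", "site_topics" (sets of
    strings) and "revenue" (a number). Returns None if either group is empty
    or the baseline revenue is zero.
    """
    on_site = [i for i in impressions if site_topic in i["site_topics"]]
    with_topic = [i["revenue"] for i in on_site if user_topic in i["user_topics"]]
    without = [i["revenue"] for i in on_site if user_topic not in i["user_topics"]]
    if not with_topic or not without:
        return None
    baseline = mean(without)
    if baseline == 0:
        return None
    return (mean(with_topic) / baseline - 1) * 100
```

Under this definition, a 137% lift means the impression earns about 2.37 times the baseline, and a 94% drop means it earns about 6% of the baseline.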
Topics API has very little data to work with, though: just the domain name of the site. We noticed that many of the pages on CafeMedia publisher sites are getting classified in ways that might make sense to an AI, but not to a human web reader. How-to tips for mechanics got tagged as “vehicle shopping.” That’s close, but pages about how to find shoes that fit, and how to tell if mayonnaise has gone bad, were also “vehicle shopping.” Of course, it’s impossible to put everything in the exact right category. What do you do with a page about fashionable outfits for video game characters? Gaming? Style? Google says “Soccer.” A baked beans recipe, according to Chrome, is “Dance & Electronic Music.” (Maybe it’s true what they say about the musical fruit.)
The classifications don’t look right in a spot check, but anecdotes aren’t data. So we looked at another big dataset: the classifications of a million recently visited URLs. Chrome’s Topics are assigned at the domain level, but we have a more thorough classifier, based on IBM Watson, that classifies each page based on its full text.
The results are remarkable. 17% of our “Automotive and vehicles” pages were classified by Chrome as “News.” Chrome and Watson usually agree that Watson’s “technology and computing” pages are similar to Chrome’s “Computers & Electronics,” but Chrome did label 12% of technology and computing pages as just “News.” And 10% of the “Health and Fitness” pages found by Watson got classified by Chrome as “Reference.”
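Percentages like these come from cross-tabulating the two sets of labels. A minimal sketch of that kind of tally follows. It assumes you already have one (page-level label, domain-level label) pair per URL, and the `label_shares` function name and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def label_shares(pairs):
    """Share of each domain-level label within each page-level label.

    `pairs` is an iterable of (page_label, domain_label) tuples, one per URL.
    Returns {page_label: {domain_label: fraction of that page label's URLs}}.
    """
    counts = defaultdict(Counter)
    for page_label, domain_label in pairs:
        counts[page_label][domain_label] += 1
    return {
        page_label: {
            domain_label: n / sum(by_domain.values())
            for domain_label, n in by_domain.items()
        }
        for page_label, by_domain in counts.items()
    }
```

Reading across one page-level label shows how often the domain-level classifier agrees with it and where the disagreements land.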
Fortunately, Topics API is just one of several options for matching an ad to the user’s interests. More accurate options include first-party or clean room data, and Seller-Defined Audiences. Topics API might be just good enough to sell some kind of ad on a lower-value or lower-reputation site, the kind of ad space that Google can monetize at large scale but that is not qualified for a high-end ad service like AdThrive. Sites with higher engagement content, though, will be able to make better choices. In most cases, sites with a higher-value Topic will need to opt out, or at least opt out their logged-in users, in order to block audience data from leaking toward lower-quality content.
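For sites that decide to opt out, Chrome's documentation describes a `Permissions-Policy: browsing-topics=()` response header. The sketch below shows one way a Python WSGI site might send it. The `block_topics` wrapper and its `should_block` hook are hypothetical names, and how you detect a logged-in user depends entirely on your own stack.

```python
OPT_OUT = "browsing-topics=()"


def block_topics(app, should_block=lambda environ: True):
    """Wrap a WSGI app so selected responses carry the Topics opt-out header.

    `should_block` is a hypothetical hook: return True for requests whose
    responses should opt out (for example, requests from logged-in users).
    """
    def wrapped(environ, start_response):
        if not should_block(environ):
            return app(environ, start_response)

        def patched_start(status, headers, exc_info=None):
            merged, found = [], False
            for name, value in headers:
                # Keep any existing policy and append the opt-out directive.
                if name.lower() == "permissions-policy":
                    value = f"{value}, {OPT_OUT}"
                    found = True
                merged.append((name, value))
            if not found:
                merged.append(("Permissions-Policy", OPT_OUT))
            return start_response(status, merged, exc_info)

        return app(environ, patched_start)

    return wrapped
```

With the default hook, every response opts out. Passing a `should_block` function that checks for a session cookie would limit the opt-out to logged-in users.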
The decisions on how to handle Topics API and other post-cookie ad technologies will be challenging. CafeMedia has the data access and data science skills to help you make the decision better than you could alone or with a basic ad service. Should you use Topics API, or opt out? Or will Topics API go the way of FLoC? Either way, we have the data to make the right choice. As always, we continue to track and test the latest web advertising proposals to help our publishers maximize their ad revenue.