
Why even the simplest ranking metric is more subtle than it seems.

When Netflix or Amazon highlight the most “Popular” content, what does it actually mean? Or when a music platform shows “Trending now,” or a news site shows “Most read,” what is being measured?

At first glance, calculating popularity sounds trivial: just count how many times each item was consumed — right?

If your goal is to find the best-performing items in the complete history of your data, then yes. But most real-world applications — from entertainment platforms to e-commerce to news feeds — do not want “all-time greatest hits.” They want what is performing best now.

As both an academic scientist working in recommender systems and CEO of Froomle, a personalization company working with publishers, retailers, and content platforms, I often see popularity described as “a simple baseline”. Conceptually, it is. In practice, it hides a set of deeply consequential modeling choices and becomes surprisingly intricate. These choices influence user experience, business outcomes, and the dynamics of attention across entire ecosystems.

Here, I focus on one of the most fundamental and most overlooked aspects: the time dimension of popularity. Even in academic research and benchmarks, this important aspect is typically ignored in comparative evaluations.

The Time Window Problem

Consider the following examples:

A TikTok video with the most likes in the last hour can definitely be considered “popular right now.”
For movies or books, showing the all-time greats or showing “popular today” serves very different purposes.
A typical summer product may sell enough in two months to become that year’s most popular item, but that doesn’t mean it’s useful to show it as “Top Product” in winter months.
A news article about an event with lasting impact may stay relevant for multiple days, while sports results quickly lose attention despite getting more total views. On the other hand, an all-time most-viewed article is likely only historically relevant, not current.

Popularity is always measured within a time frame, even when it is not explicitly specified. For starters, different items have different lifetimes. So the first question becomes: Over which time window is popularity best measured?

Different windows yield different interpretations:

Short windows, from minutes to hours, capture fast-moving behavior but are noisy.
Long windows, from days to weeks, provide stability but favor evergreen content.

A natural approach is to A/B test different windows or to tune them offline on historical data. But even if such a test identifies the best-performing window on average, that does not mean it performs well at all times. Traffic patterns are never uniform across the day.

A window that excels during daytime — when thousands of interactions occur — may perform poorly during low-traffic periods such as late at night. During those hours, the same time window might contain only a handful of clicks, making “popular now” indistinguishable from random noise. In other words, this approach optimizes for peak traffic, while quietly ignoring the periods where data becomes sparse and unstable.
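The sparsity problem can be made concrete with a minimal sketch of a fixed-window counter. The function name `window_counts`, the one-hour window, and the synthetic click log are all illustrative assumptions, not data from a real platform.

```python
from collections import Counter

HOUR = 3600  # seconds

def window_counts(events, now, window_seconds):
    """Count clicks per item with a timestamp in (now - window_seconds, now].

    `events` is a list of (timestamp_seconds, item_id) tuples.
    """
    start = now - window_seconds
    return Counter(item for ts, item in events if start < ts <= now)

# Synthetic log (assumption): 300 clicks in the hour before noon,
# every third one on item "b", and only 2 clicks in the hour before 3 a.m.
daytime = [(12 * HOUR - 10 * n, "b" if n % 3 == 0 else "a") for n in range(300)]
night = [(27 * HOUR - 1500, "c"), (27 * HOUR - 200, "b")]
events = daytime + night

noon = window_counts(events, now=12 * HOUR, window_seconds=HOUR)
late = window_counts(events, now=27 * HOUR, window_seconds=HOUR)
print(noon)  # 200 clicks on "a", 100 on "b": a clear ranking
print(late)  # one click each on "c" and "b": a tie decided by noise
```

The same one-hour window gives a stable ranking at noon and an essentially arbitrary one at night, which is the failure mode described above.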

A popularity metric needs to be robust not only on average, but across all traffic regimes.

“If we take ‘popular now’ literally, the only correct time window is exactly one event.”

In a purely probabilistic sense, every new click, purchase, view, or like provides a micro-update to our belief about what item will be the target of the next event. But this estimate is extremely unstable — effectively 100% for the last-clicked item, 0% for everything else.

So, while theoretically consistent, fixed windows have significant structural limitations, whether we choose them short or long.

Learning Rates: A More Robust Alternative

To avoid the pitfalls of rigid time windows, several solutions exist, such as variable time windows, or exponential decay, which gives more weight to recent interactions than to older ones, while never entirely discarding historical data.
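For reference, timestamp-based exponential decay might look like the sketch below. The class name `DecayedCounter` and the half-life parameterization are assumptions chosen for illustration; the text does not prescribe a particular form of decay.

```python
class DecayedCounter:
    """Per-item click score in which each click loses half its weight
    every `half_life_seconds` (hypothetical parameterization)."""

    def __init__(self, half_life_seconds):
        self.half_life = half_life_seconds
        self.score = {}      # item -> score as of its last update
        self.last_seen = {}  # item -> timestamp of its last update

    def value(self, item, now):
        """Return the score of `item`, decayed to time `now`."""
        if item not in self.score:
            return 0.0
        age = now - self.last_seen[item]
        return self.score[item] * 0.5 ** (age / self.half_life)

    def add(self, item, timestamp):
        """Register one click; timestamps are assumed to be non-decreasing."""
        self.score[item] = self.value(item, timestamp) + 1.0
        self.last_seen[item] = timestamp


counter = DecayedCounter(half_life_seconds=3600)
counter.add("article-1", timestamp=0)
counter.add("article-1", timestamp=3600)      # old click now counts 0.5
print(counter.value("article-1", now=7200))   # (0.5 + 1.0) * 0.5
```

Old clicks never drop out entirely, but this variant still depends on timestamps and on choosing a half-life.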

My personal favorite, however, is a variation which uses learning rates. Intuitively, its aim is to estimate the true current popularity of an item. Instead of jumping from 0% to 100% with every event, it gradually moves towards these numbers in small steps.

More concretely, we update the popularity estimate of an item after every new event, using a learning rate α ∈ [0,1]. Let P(i) be the current popularity estimate for item i, starting at 0 for every new item. When an event, such as a click, occurs on item i, we update:

for the clicked item i: P(i) ← (1 − α)·P(i) + α
for all other items j ≠ i: P(j) ← (1 − α)·P(j)

Large α results in fast adaptation, akin to a short time window, and small α results in slow adaptation, akin to a long time window.
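The update rule above can be sketched in a few lines. The class and method names (`PopularityTracker`, `observe`) are hypothetical, and this naive version touches every known item on each event, so it is meant as an illustration rather than a production design.

```python
from collections import defaultdict

class PopularityTracker:
    """Keeps an estimate P(i) per item and applies the single-event rule."""

    def __init__(self, alpha):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        self.alpha = alpha
        self.p = defaultdict(float)  # new items start at 0

    def observe(self, clicked):
        # Every known item, including the clicked one, is scaled by (1 - alpha).
        for item in self.p:
            self.p[item] *= 1.0 - self.alpha
        # The clicked item additionally receives alpha.
        self.p[clicked] += self.alpha


tracker = PopularityTracker(alpha=0.5)
for event in ["a", "b", "a"]:
    tracker.observe(event)
print(dict(tracker.p))  # {'a': 0.625, 'b': 0.25}
```

With α = 0.5 the most recent click dominates; a smaller α would let the estimate move more slowly.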

While this approach is computationally efficient, for large traffic volumes batch updates may be more suitable. Batching is not trivial either: within each batch you must decide how the events are positioned over time, because different assumptions lead to different popularity estimates.

A common approximation is to assume that events are uniformly distributed throughout the batch. Under this assumption, the closed-form batch update for an item with k(i) events in a batch of size K becomes:

for all items i: P(i) ← (1−α)ᴷ P(i) + (1 − (1−α)ᵏ⁽ⁱ⁾)

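This approximates the result of applying the single-event update rule to each of the K events in turn, without requiring event-by-event processing. The match is exact when the k(i) events on item i are the last ones in the batch; when they occur earlier, the event-by-event result is slightly lower.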
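A quick way to get a feel for the batch formula is to run it next to the event-by-event rule on a small batch. The sketch below transcribes the formula as written above; the function names and the example numbers are assumptions for illustration.

```python
from collections import Counter

def sequential_update(p, events, alpha):
    """Apply the single-event rule to each event in order."""
    p = dict(p)
    for clicked in events:
        for item in p:
            p[item] *= 1.0 - alpha
        p[clicked] = p.get(clicked, 0.0) + alpha
    return p

def batch_update(p, events, alpha):
    """Apply the closed-form batch formula once for the whole batch."""
    counts = Counter(events)          # k(i) per item
    decay = (1.0 - alpha) ** len(events)  # (1 - alpha)^K
    items = set(p) | set(counts)
    return {
        i: decay * p.get(i, 0.0) + (1.0 - (1.0 - alpha) ** counts[i])
        for i in items
    }

start = {"a": 0.4, "b": 0.2}
batch = ["b", "a", "a"]  # the clicks on "a" come last in this batch
seq = sequential_update(start, batch, alpha=0.1)
bat = batch_update(start, batch, alpha=0.1)
print(seq)  # a ≈ 0.4816, b ≈ 0.2268
print(bat)  # a ≈ 0.4816, b ≈ 0.2458
```

In this small example the two agree for the item whose events come last in the batch and differ slightly for the other one, which is where the assumption about event positions within the batch matters.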

Why this approach is effective

It adapts naturally to fluctuations in traffic volume.
It avoids instability during low-traffic periods.
It does not rely on timestamps or strict time windows.
It remains interpretable: P(i) approximates the probability that the next event will occur on item i.

This formulation captures the essential idea behind popularity as a dynamic, continuously updated estimate. Furthermore, different learning rates can be used for different recommendation surfaces at the same time, supporting both “popular now” and “historically popular” lists while avoiding the pitfalls of fixed windows.
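To illustrate the point about multiple surfaces, the sketch below runs the same click stream through two learning rates. The function name `estimate`, the two α values, and the synthetic stream are illustrative assumptions.

```python
def estimate(events, alpha):
    """Run the single-event update rule over a click stream, starting from 0."""
    p = {}
    for clicked in events:
        for item in p:
            p[item] *= 1.0 - alpha
        p[clicked] = p.get(clicked, 0.0) + alpha
    return p

# Synthetic stream (assumption): a long-running item, then a recent burst.
events = ["old"] * 200 + ["new"] * 20

fast = estimate(events, alpha=0.2)    # e.g. a "popular now" surface
slow = estimate(events, alpha=0.005)  # e.g. a "historically popular" surface

print(max(fast, key=fast.get))  # new
print(max(slow, key=slow.get))  # old

# Starting from 0, the estimates sum to 1 - (1 - alpha)^n after n events,
# so they approach a probability distribution as events accumulate.
print(sum(fast.values()), sum(slow.values()))
```

The fast tracker already sums to almost 1 after 220 events, while the slow one is still warming up at roughly 0.67. That warm-up is worth keeping in mind when reading P(i) as a probability for small α.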

Opportunity Bias: Exposure Versus Interest

Even if we measure “popular now” perfectly, another structural issue remains: items do not all receive the same opportunity to be clicked.

Some items appear at the top of a webpage — or in otherwise prominent visual positions — and therefore naturally receive far more impressions. Other items might be just as interesting or relevant, but if they appear lower on the page, or only after scrolling, their chance of ever becoming “most popular” is drastically reduced.

“Popularity reflects exposure — not just user interest.”

Popularity metrics tend to reward items that happen to be:

placed in high-visibility page locations
published during high-traffic periods
aligned with the placement priorities of editors or product teams

Meanwhile, items in low-visibility positions receive fewer impressions, and therefore fewer clicks — regardless of their potential appeal to users.

This distinction is crucial in environments where visibility is scarce and attention is highly competitive. To understand true popularity, we must consider not only how often an item is clicked, but also how often it was actually seen.

Doing this requires data about which items were actually visible to users. The learning-rate approach can then be adapted to update only these items before each click event.

The result is a much fairer estimate of what users genuinely find interesting — not just what happened to be placed at the top of the page.
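One possible reading of this adaptation is sketched below: on each click, only the items that were visible at that moment are decayed, and the clicked item receives α as before. The function name `observe_click` and the shape of the impression data (a list of visible item ids per click) are assumptions; the exact rule is not spelled out here.

```python
def observe_click(p, visible, clicked, alpha):
    """Exposure-aware update: only items that were shown are decayed.

    `p` maps item -> estimate and is modified in place.
    `visible` lists the items on screen when the click happened (assumed
    to come from impression logging). The clicked item is treated as visible.
    """
    shown = set(visible) | {clicked}
    for item in shown:
        p[item] = (1.0 - alpha) * p.get(item, 0.0)
    p[clicked] += alpha


p = {"top-story": 0.5, "below-the-fold": 0.5}
# The user saw "top-story" and "sidebar", and clicked "sidebar".
observe_click(p, visible=["top-story", "sidebar"], clicked="sidebar", alpha=0.1)
print(p)  # "below-the-fold" is unchanged because it was never shown
```

Under this reading, an item that was not shown is not penalized for failing to attract the click.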

The absence of such impression data undermines the validity of offline evaluations as it essentially corrupts historical data. If certain items never received meaningful exposure, they will appear uninteresting in the logs, simply because they had no chance to be clicked. Any offline evaluation method based on historical interactions will therefore learn from data that is already biased by past placement decisions. This makes offline comparisons less reliable and can systematically favor models that reproduce the same visibility patterns.

Beyond Popularity

The moment we use popularity in an actual recommendation list, additional questions arise. Even a perfectly measured popularity signal does not automatically translate into the most useful recommendations.

Should previously consumed items remain in the list?

Probably not. A returning reader who already clicked an article gains no value from seeing it again at the top. Removing recently consumed items not only improves usefulness, but also gives other items a chance to gain exposure.

What about items a user keeps seeing but never clicks?

Repeated impressions without interaction are informative. If a user has ignored an item several times, it becomes unlikely that showing it again will generate engagement. Impression discounting reduces the relevance of such items and prevents the system from repeatedly promoting content that is consistently unappealing to that user.
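The two adjustments discussed so far, dropping consumed items and discounting repeatedly ignored ones, could be combined in a per-user re-ranking step such as the sketch below. The geometric discount, its factor of 0.8, and the function name `rerank` are assumptions for illustration; other discount shapes are equally plausible.

```python
def rerank(popularity, consumed, impressions, discount=0.8, top_n=3):
    """Build a per-user list from a shared popularity signal.

    popularity:  item -> global popularity estimate
    consumed:    set of items this user already clicked or read
    impressions: item -> number of times this user saw it without clicking
    discount:    multiplicative penalty per ignored impression (assumed value)
    """
    scored = []
    for item, score in popularity.items():
        if item in consumed:
            continue  # already consumed: leave room for other items
        ignored = impressions.get(item, 0)
        scored.append((score * discount ** ignored, item))
    scored.sort(reverse=True)
    return [item for _, item in scored[:top_n]]


popularity = {"a": 0.50, "b": 0.30, "c": 0.15, "d": 0.05}
print(rerank(popularity, consumed={"a"}, impressions={"b": 4}))
# ['c', 'b', 'd']
```

Here "a" is removed because the user already read it, and "b" falls below "c" after being ignored four times (0.30 × 0.8⁴ ≈ 0.12).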

Should every user see the same “Top Stories”?

Not necessarily. Two readers with opposite interests, say one who only reads finance and one who only reads sports, likely benefit from different versions of the list. Even the best-measured popularity signal cannot compensate for these intrinsic differences in user preferences. Addressing this requires genuine personalization techniques that go beyond adjusting popularity and actively learn what each user is most likely to engage with. These techniques also introduce new challenges, from aligning with newsroom strategy, editorial priorities, and brand identity to maintaining diversity and avoiding filter bubbles.

The Takeaway

Popularity is not just a count. It is shaped by timing, exposure, opportunity, and user behavior, and any system that uses popularity must account for these dynamics. Once these elements are considered, popularity becomes a solid foundation for building effective and genuinely user-centric platforms.

Measuring popularity well is essential. Making it useful requires going beyond raw click counts. The subsequent steps (collaborative filtering, content-based models, and full personalization strategies) build upon that foundation and could be topics for future posts.

Originally published on Medium.

What Does “Popular” Really Mean?