Doing More With Less: Rethinking Entity-Level Sentiment at Scale

In most technology systems, there is a fundamental trade-off between cost and performance. If you want better accuracy, you typically need larger models, more compute, or more processing time, all of which increase cost. If you want to reduce cost, you usually have to accept lower accuracy, slower insights, or reduced coverage. In our case, we were able to improve both at the same time. We reduced inference costs by 45.5% while also improving accuracy by 3.02%. That combination is unusual, and the path to getting there is worth sharing.

The Problem: Understanding Sentiment at the Entity Level

We process millions of documents across news and social media every day. These documents often mention multiple companies, products, or people, and our customers care about how each of those entities is perceived.

Traditional sentiment analysis assigns a single label to an entire document. But that breaks down quickly in real-world scenarios.

Consider this example:

"Apple's pricing strategy is frustrating. Samsung, on the other hand, offers better value for money."

Figure: the example text with Apple highlighted in red and Samsung highlighted in green. A single short passage can carry different sentiment toward different entities.

A document-level model might label this as Neutral. But that loses the real insight:

Apple -> Negative (pricing criticism)
Samsung -> Positive (favorable comparison)

Entity-Level Sentiment (ELS) solves this by answering a more precise question:

What is the sentiment toward each entity mentioned in a document?

This is significantly more useful, but also much more computationally expensive.

The First Approach (v1): Accurate but Expensive

Our initial system approached ELS as a question-answering problem.

For each entity in a document:

Locate all mentions of that entity
Extract a context window around each mention
Pass that context, along with the entity, into the model
Predict sentiment (positive, negative, or neutral)
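The per-entity loop above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production implementation: `find_mentions`, `context_window`, `placeholder_model`, `v1_entity_sentiment`, and the 40-character window are all hypothetical, and joining an entity's contexts into one model input is a guess at how "one run per entity" might look. The model is a stub that always answers "neutral", because the point here is the number of model runs, not the predictions.

```python
import re

WINDOW = 40  # characters kept on each side of a mention (assumed value)


def find_mentions(document: str, entity: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for every mention of `entity`."""
    return [m.span() for m in re.finditer(re.escape(entity), document)]


def context_window(document: str, start: int, end: int) -> str:
    """Cut a fixed-size slice of text around one mention."""
    return document[max(0, start - WINDOW): end + WINDOW]


def placeholder_model(context: str, entity: str) -> str:
    """Stand-in for the expensive model run; it always answers 'neutral'."""
    return "neutral"


def v1_entity_sentiment(document, entities, model=placeholder_model):
    """Run the model once per entity and report how many runs were needed."""
    results = {}
    model_runs = 0
    for entity in entities:
        spans = find_mentions(document, entity)
        contexts = [context_window(document, s, e) for s, e in spans]
        results[entity] = model(" ".join(contexts), entity)
        model_runs += 1  # one full model run for every entity
    return results, model_runs


document = ("Apple's pricing strategy is frustrating. "
            "Samsung, on the other hand, offers better value for money.")
results, model_runs = v1_entity_sentiment(document, ["Apple", "Samsung"])
print(model_runs)  # 2: one model run per entity
```

Whatever the real model looks like, the loop structure means the run count always equals the number of entities.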

This worked well in terms of accuracy.

But it had a fundamental limitation: it scaled linearly with the number of entities.

A document with 10 entities required 10 separate model runs. At our scale, where documents often contain multiple entities, this quickly became expensive.

Figure: the v1 architecture. The document passes through a separate BERT encoder and classifier pipeline for each entity, so Apple and Samsung each trigger their own model run and receive their own sentiment label.

The core issue was simple:

We were making the model re-read the same document multiple times.

The Key Insight: Stop Re-Reading the Document

This led us to a simple but important question:

Does the model really need to process the entire document again for every entity?

The answer turned out to be no.

Transformer models build contextual representations for every token in a document during a single forward pass. By the time the model has processed the document once, it already has a rich understanding of:

What entities are present
Where they appear
How they relate to surrounding context

Running the model again for a different entity doesn’t add new information. It just repeats work that has already been done.

That realization pointed directly to a more efficient design.

The New Approach (v2): Read Once, Understand Everything

We redesigned the system around a simple principle:

Process the document once, and extract everything you need from that shared understanding.

The updated approach works as follows:

1. Single-pass encoding

We process the document once, allowing the model to build contextual representations for all tokens.

2. Entity-specific extraction

From this shared representation, we extract embeddings corresponding to each entity mention.

3. Aggregation across mentions

If an entity appears multiple times, we combine its mention-level representations (by averaging) into a single representation.

4. Sentiment prediction

This aggregated representation is used to predict sentiment for the entity.
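Steps 2 to 4 can be sketched in a few lines once the single encoder pass has produced one vector per token. Everything here is an assumption for illustration: the function names, the two-dimensional toy embeddings (made-up numbers standing in for real encoder output), the token-index spans, and the linear classifier weights. Averaging the tokens inside a single mention is also an assumption; the description above only specifies averaging across mentions.

```python
from statistics import fmean


def mean_pool(vectors):
    """Average a list of equal-length vectors component by component."""
    return [fmean(component) for component in zip(*vectors)]


def entity_representation(token_embeddings, mention_spans):
    """Steps 2 and 3: pool the tokens of each mention, then average mentions."""
    mention_vectors = [
        mean_pool(token_embeddings[start:end]) for start, end in mention_spans
    ]
    return mean_pool(mention_vectors)


def predict_sentiment(vector, weights, labels):
    """Step 4: a linear head that picks the label with the highest score."""
    scores = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return labels[scores.index(max(scores))]


def analyze_document(token_embeddings, entity_mentions, weights, labels):
    """Reuse one set of token embeddings for every entity in the document."""
    return {
        entity: predict_sentiment(
            entity_representation(token_embeddings, spans), weights, labels
        )
        for entity, spans in entity_mentions.items()
    }


# Toy output of ONE encoder pass: a made-up 2-d vector per token,
# where dim 0 loosely means "positive evidence" and dim 1 "negative evidence".
token_embeddings = [
    [0.1, 0.9],  # token 0: "Apple" (first mention)
    [0.0, 0.0],  # token 1
    [0.8, 0.2],  # token 2: "Samsung"
    [0.0, 0.0],  # token 3
    [0.3, 0.7],  # token 4: "Apple" (second mention)
]
entity_mentions = {"Apple": [(0, 1), (4, 5)], "Samsung": [(2, 3)]}
labels = ["positive", "negative", "neutral"]
weights = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]  # hypothetical classifier head

print(analyze_document(token_embeddings, entity_mentions, weights, labels))
```

With these toy numbers the sketch labels Apple as negative and Samsung as positive. Adding a third entity would add only another pooling and scoring step, not another encoder pass.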

Figure: the v2 architecture. The document is processed once through a single BERT encoder; the embeddings of each entity's mentions are then extracted and averaged before being passed to a classifier, which produces a sentiment label for every entity from the shared representation.

What Changed

Instead of one model run per entity, we now have one model run per document
Instead of recomputing context, we reuse it across all entities

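This removes the expensive encoder pass's dependency on the number of entities. What used to be O(n) encoder runs for n entities is now a single run per document. Only the lightweight extraction, averaging, and classification steps still scale with n, so the overall cost is much closer to constant per document.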
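A back-of-the-envelope cost model makes the scaling difference concrete. The unit costs below are illustrative assumptions, not measurements, and the function names are hypothetical. The model also simplifies: v1 encoded context windows rather than whole documents, and v2 ended up running a larger model, so these numbers are not expected to reproduce the 45.5% figure.

```python
def v1_cost(num_entities: int, encoder_cost: float, head_cost: float) -> float:
    """v1: every entity pays for an encoder run plus a classifier head."""
    return num_entities * (encoder_cost + head_cost)


def v2_cost(num_entities: int, encoder_cost: float, head_cost: float) -> float:
    """v2: one encoder run per document, then a cheap head per entity."""
    return encoder_cost + num_entities * head_cost


# Illustrative unit costs only (assumed, not measured).
ENCODER_COST = 1.0
HEAD_COST = 0.01

for n in (1, 2, 10):
    v1 = round(v1_cost(n, ENCODER_COST, HEAD_COST), 2)
    v2 = round(v2_cost(n, ENCODER_COST, HEAD_COST), 2)
    print(f"entities={n}  v1={v1}  v2={v2}")
```

Under these assumptions the two designs cost the same for a single entity, and the gap widens with every additional entity because v2 adds only the small per-entity head cost.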

The Results

The impact was significant across both cost and performance.

Figure: key results. The new approach reduced inference costs by 45.5% while improving accuracy by 3.02%.

The accuracy gain came from an important side effect: the efficiency improvements allowed us to deploy a larger, more capable model within the same resource constraints.

What Surprised Us

Two things stood out during this transition.

1. Simple aggregation worked better than expected

We initially assumed that averaging mention representations would be a rough approximation. In practice, it proved to be surprisingly robust.

For entities mentioned multiple times across a document, aggregation often produced more stable and reliable representations than the previous approach.
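One possible intuition for this stability, offered as an assumption rather than something the results establish: if each mention carries the same underlying signal plus roughly independent noise, averaging several mentions shrinks that noise. The small simulation below illustrates the effect with a single made-up feature; the signal value, noise level, and mention count are all hypothetical.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_SIGNAL = 1.0       # hypothetical "true" sentiment feature for an entity
NOISE = 1.0             # assumed per-mention noise level
MENTIONS = 4            # assumed number of mentions of the entity
TRIALS = 2000


def noisy_mention() -> float:
    """One mention-level feature: true signal plus independent Gaussian noise."""
    return TRUE_SIGNAL + rng.gauss(0.0, NOISE)


# Spread of the feature when an entity is represented by a single mention.
single = [noisy_mention() for _ in range(TRIALS)]

# Spread when the representation is the average over several mentions.
averaged = [
    fmean(noisy_mention() for _ in range(MENTIONS)) for _ in range(TRIALS)
]

spread_single = pstdev(single)
spread_averaged = pstdev(averaged)
print(f"single mention spread: {spread_single:.2f}")
print(f"averaged spread:       {spread_averaged:.2f}")
```

Real mention embeddings from the same document are unlikely to be fully independent, so the actual benefit may be smaller than this idealized case suggests.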

2. Less context was enough

We expected that removing broader surrounding context might hurt performance.

But in practice, focusing on mention-level context was sufficient. The model was still able to capture the necessary signals to make accurate predictions.

This suggests that, for ELS, relevant local context carries most of the signal, even if broader context can still help in edge cases.

What This Means Going Forward

Our initial approach wasn’t wrong. It was a reasonable way to tackle a complex problem.

But it was built on an assumption that didn’t hold:

That the model needed to process each entity independently.

By questioning that assumption, we uncovered a much more efficient approach.

This shift doesn’t just reduce cost. It fundamentally changes what’s possible:

Process more data in real time
Support more entities per document at little added cost
Deploy stronger models within the same infrastructure

And most importantly, it gives us a scalable foundation to build on as we continue to improve Brand Sentiment.

Final Thought

Sometimes the biggest gains don’t come from adding more complexity.

They come from recognizing and eliminating unnecessary work.

In our case, the breakthrough was simple:

Stop making the model read the same document over and over again.