"Sentiment": no target, no KPI, just a practical algorithm. On the sense and nonsense of tonality measurement

Sentiment or tonality analysis is the most requested feature of media monitoring and social listening tools. In the Excellence Forum, our confidential benchmarking circle for digital communicators, it has been hotly debated for years: everyone wants it, from the C-suite to the experts, and at the same time many know how problematic the measurement is. Can "sentiment" even work as a metric? Can "positive tone of voice" be a KPI that evaluates the quality of one's own communication and brand resonance? Our answer: no. This long read explains why.

"Sentiment" is even discussed at top level. Why on earth?

Most digital KPIs share a problem: only a few decision-makers ever see them, and only a few experts have the knowledge to interpret them correctly. "Sentiment" is different. Everyone talks about it, because EVERYONE immediately understands what is meant: "sentiment analysis" is an everyday experience. All of us rate all sorts of things every day along the categories "positive / neutral / negative". So why shouldn't a tool be able to do the same with "content"?

Some communications managers seem to think this way too and declare "positive sentiment" a strategic goal of corporate communications. Corresponding key figures (net sentiment) are anchored in management reports, and positive mentions of one's own brand become the internal currency of success. Negative mentions, in contrast, are twisted and turned in reporting because, according to the sentiment analysis, they indicate less successful communications work.
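To make the key figure concrete: "net sentiment" is usually computed as the share of positive minus the share of negative mentions. A minimal sketch, assuming one common definition (tools vary in how they define it):

```python
def net_sentiment(positive: int, negative: int, neutral: int) -> float:
    """Net sentiment as one common definition: (positive - negative) / total.

    This exact formula is an illustrative assumption; vendors differ.
    """
    total = positive + negative + neutral
    if total == 0:
        return 0.0
    return (positive - negative) / total

# Example: 160 positive, 140 negative, 700 neutral mentions
print(net_sentiment(160, 140, 700))  # 0.02: near zero despite heavy neutrality
```

Note how the large neutral bucket dilutes the figure toward zero, a pattern that will matter again in the Wirecard case below.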

Objections to a "positive sentiment" KPI are hardly to be expected inside companies. The figure is too popular, too prominent. Yet every communicator who still carries their theoretical toolkit knows that an objection would be necessary. To avoid fruitless discussions, they ignore the results of the "sentiment analysis" instead. Silently. Rightly so.

"Sentiment analysis" already fails in theory

The "basic laws" of communication theory state that the MEANING of a piece of content (including its "tonality") depends entirely on the perspective of the viewer, because only he / she / it creates, out of itself, the context that gives a piece of information a certain meaning, or none at all (relationship aspect, 2nd axiom of communication, Watzlawick et al.).

The communication axioms presuppose that the content and the meaning of a piece of "content" are by nature two fundamentally different things that maintain a rather indirect and autonomous relationship to each other. An on/off relationship, so to speak: she (meaning) in Munich, he (content) in Hamburg. The two meet only in the heads of recipients, ever changing. And yet they are supposed to be in a steady relationship.

You have to be clear about this: the information, the content, has NOTHING to do with meaning and its evaluation, e.g. as tonality. Life experience makes this almost impossible to accept, but it is really so. That is why Shannon and Weaver wrote: "In particular, information must not be confused with meaning." (A Mathematical Theory of Communication, 1949)

If meaning and sentiment depend on the subjective relationship aspect, what relationship does a monitoring and listening tool have with the sender of a piece of content? None. And what context can the tool set? Only an objectified one, i.e. a meaningless one.

How the measurement works

The meaninglessness can be nicely illustrated by the following table. It shows the tonality peaks in "VADER" (thanks, Dr. M!). VADER is a popular open-source sentiment analysis algorithm developed at Georgia Tech (Hutto/Gilbert), published in 2014 and maintained since.

In VADER, each word is assigned a fixed tonality value. In this rule-based "old school" approach, the assignment happens in advance. If VADER were a machine-learning algorithm, the weights would instead be derived from training data, but the principle, a fixed "sentiment" per word, would remain the same. The machine recognizes words and adds up the negative and positive word weights wired to them. Finally a balance is struck and, voilà, a post of 300 words has more positive than negative word weight.
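The principle, a fixed weight per word summed into a balance, fits in a few lines of Python. The word weights below are invented for illustration; they are NOT VADER's real lexicon values:

```python
# Minimal sketch of a rule-based, lexicon-driven sentiment scorer.
# The weights are invented for illustration, not VADER's actual values
# (VADER's real lexicon holds thousands of rated entries).
LEXICON = {
    "love": 3.2, "great": 3.1, "win": 2.8,
    "fraud": -2.9, "scandal": -2.6, "condemn": -2.4,
}

def score(text: str) -> float:
    """Sum the fixed per-word weights found in the text; return the balance."""
    words = (w.strip(".,!?;:") for w in text.lower().split())
    return sum(LEXICON.get(w, 0.0) for w in words)

print(score("we love this great win"))           # positive balance (~9.1)
print(score("fraud scandal, analysts condemn"))  # negative balance (~-7.9)
```

Words not in the lexicon simply count as zero, which is why long, mostly "unknown" texts drift toward a neutral balance.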

Here are the 15 most negative and the 15 most positive of VADER's roughly 8,000 English lexicon entries ("aml" = "all my love").

And now a case study

According to the logic of the list, words like "rapist", "condemn" and "punishment" are undoubtedly associated with a rather negative tonality. So if VADER finds these three words in a text, it books a "negative word weight" for each of them.

Apply this procedure to the following fictitious headline and the sentiment analysis will, by its own logic, rate it as "strongly negative": "Rapist sentenced to life". Is that really negative? Or is it, after all, positive?
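In code, the headline case looks like this; the mini-lexicon and its weights are invented for illustration, not VADER's actual values:

```python
# Illustrative mini-lexicon; weights are invented, not VADER's real ones.
LEXICON = {"rapist": -3.0, "sentenced": -1.5}

def score(text: str) -> float:
    """Sum fixed per-word weights, as a rule-based sentiment tool would."""
    return sum(LEXICON.get(w.strip(".,"), 0.0) for w in text.lower().split())

headline = "Rapist sentenced to life"
print(score(headline))  # -4.5: "strongly negative" by word weight,
# even though most readers would call the news itself good.
```

The balance is decided entirely by the words present, never by what the event means to a reader.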

Only tools answer this question unambiguously. We will see where this leads in the practical test.

"Sentiment" in a practical test: the search for an extreme example, the worst case for "sentiment"

Whether a KPI works well or is misleading is proven at its extremes. Maximum positive, maximum negative: HERE a KPI must show whether it measures correctly. So in the following, let us assume an extreme scenario of negative reporting and test the measurement of sentiment. For this we need a communicative worst case. Let us look for:

  • a company that certainly has no positive news and ONLY NEGATIVE NEWS to show.
    For that, it would have to be in (at least) an acute crisis situation that overshadows everything, really everything in the media response. A fraud scandal, for example, or an insolvency.
  • a company that generates MASSIVE NEGATIVE NEWS.
    So the crisis must, for example, have political significance. But high coverage also needs tabloid ingredients: celebrities, crime, executive boards, organized crime, manhunts, escape, Interpol, South Sea islands... What we are looking for is a worldwide scandal, along with the associated tabloid media (portals, clickbait sites, Bild, Facebook, etc.), also worldwide.
  • a company with LONG-lasting NEGATIVE NEWS.
    Permanent crisis, in other words, preferably for months. Could it be worse? Hardly.

THAT, then, would be the ideal negative run to determine the negative maximum that a "KPI" (key performance indicator) called sentiment can ever reach. And yes, this worst-case scenario exists. It is called Wirecard.

Worst case in practice: Wirecard's "sentiment"

Wirecard in ruins: with its insolvency filing, straight from the DAX 30 into prison. Wirecard stands for the largest, most comprehensive, longest-running and most publicly staged fraud in European economic history, with reporting to match. An ideal case to put "sentiment measurement" to the worst-case test.

So we look into the data of one of the market-leading monitoring tools. Logically, its marketing now also invokes "artificial intelligence": sentiment can now be analyzed even better and integrated into even fancier diagrams.

We exclude the last remnants of even hypothetically possible positivity for Wirecard by analyzing only the mentions of the CEO, Dr. Markus Braun. Now, at the latest, we should expect the absolute MAXIMUM of negativity, the sentiment Armageddon, impossible to undercut, THE negative benchmark of them all.

Will it be 100% negative sentiment? Or a value in that direction? 90%, maybe 80%? Is 70% fair? But wait, "irony cannot be detected", so let's go lower still. Say 60%, no, 50% for the tone of the media echo on the Wirecard CEO.

A negative benchmark of minus 50% in such a clear-cut case: is that really asking too much?

Here is the result for Wirecard at the peak of its crisis.

  • Wirecard, including its CEO, scored a negative high of 29% negative mentions at the height of the crisis, May to July 2020.
  • At the same time there were 70% "neutral mentions". That was in the week around July 20.
  • The average for May to July 2020, the peak of the crisis, was 14% negative mentions.
  • 14% negative tone in a (rightly) globally scandalized, prolonged and dramatic corporate crisis?

What does sentiment measurement actually make a statement about?

As always: a metric makes a statement about what is measured and how.

In the case of sentiment, what is measured is the frequency of words. Each word carries a weight that generates tonality ratings, while the subjective (interpretive) contexts that would constitute an actual "sentiment" in the first place are missing. This is why sentiment analysis is irrelevant, and at the same time so hotly debated (with subjective contexts competing for interpretive authority).

Conclusion: sentiment KPIs cannot generate valid statements for management.
And even if they were valid: a KPI that reaches a maximum of minus 29% in a worst-case scenario is unsuitable for data-driven management. It does not fluctuate enough, and it measures fuzzily.

Sentiment data does not belong in the executive suite, but can generate value in the engine room

However, our sentiment story has a good ending. If you measure wrongly but consistently, you can still create value with the data generated this way, because in time-series analysis, strong deviations from the normal value (even of a "wrong" KPI) are always interesting. That means: strong swings in the sentiment data are ALWAYS worth a closer look, a manual check of the content and, above all, of the context.

The tool providers (and we, too) use bots: automated functions that identify test cases based on deviations in the sentiment data, embed them in alert emails, and "fire" them off to humans, who take a closer look at the content around the conspicuous mentions and make an assessment. And they do so in a way that follows the axioms of communication: WITH context, WITH subjectivity.
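A minimal sketch of such a deviation alert, assuming a weekly series of net-sentiment values; the function name, window size and threshold are illustrative choices, not any vendor's actual API:

```python
from statistics import mean, stdev

def sentiment_alerts(series: list[float], window: int = 8,
                     z_threshold: float = 2.0) -> list[int]:
    """Flag indices whose value deviates strongly from the preceding window.

    Returns the positions worth a manual, context-aware human review.
    Window size and z-score threshold are illustrative assumptions.
    """
    alerts = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and abs(series[i] - mu) / sigma > z_threshold:
            alerts.append(i)
    return alerts

# Ten calm weeks, then a sharp negative swing worth a human look:
weekly = [0.02, 0.03, 0.01, 0.02, 0.04, 0.02, 0.03, 0.02, 0.03, 0.02, -0.25]
print(sentiment_alerts(weekly))  # [10]
```

The bot only points; the assessment of whether the swing means anything stays with a person who can supply the context.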

So the nonsensical key figure "sentiment" finds a home after all: as a truffle pig that sends questions up to communications professionals from the engine room.

Sentiment analysis facilitates and accelerates internal quality assurance, but the assessment itself remains a job for people, for those who know the company's interests.