TextBlob vs VADER Evaluation

By Alastair Beeson

Introduction

Two of the most commonly used Python libraries for NLP and sentiment analysis are TextBlob and VADER. Their functionality is similar, but they differ in ways that make each better suited to specific use cases.

TextBlob

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

TextBlob's documentation can be found here: https://textblob.readthedocs.io/en/dev/

Pros:

Very easy to get set up.

Polarity and subjectivity are separate, distinct properties

You get a single polarity score back, with no further processing needed

Has a subjectivity function

This could be really useful for projects like recreating the language of a famous author or determining who wrote a specific text

Has part-of-speech tagging, noun phrase extraction, and translation

Much more functionality beyond sentiment analysis

Cons:

Lacks an intensity function

Struggles with slang and emojis, so it is not the best for social media data like tweets

Slightly less accurate than VADER on average

Runs slower than VADER

Requires converting text into a TextBlob object before it can be analyzed

VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

VADER's repository can be found here: https://github.com/cjhutto/vaderSentiment

Pros:

Tuned for modern social media

Understands non-text input like emojis

Understands slang

VADER returns 4 different scores: positive, neutral, negative and compound

This level of granularity could be helpful for more sophisticated projects

Has an intensity function

Roughly 2x as fast as TextBlob

Cons:

In the majority of use cases, compound is the only score that matters

Requires a little engineering to isolate the compound score, which can be a hassle

Lacks a subjectivity function

Subjectivity is not mentioned in any of VADER's documentation

To get subjectivity you have to hard code it with 1 - abs(compound)
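The two workarounds above, isolating the compound score and hard coding a subjectivity value, might look roughly like this. The function names and the example dict are hypothetical; the dict only mimics the shape of what VADER's polarity_scores() returns, and the 1 - abs(compound) formula is the workaround described in this article, not a feature of VADER.

```python
def compound_only(scores):
    """Pick the compound value out of a VADER-style score dict."""
    return scores["compound"]


def subjectivity_proxy(compound):
    """Apply the 1 - abs(compound) workaround described above.

    This is a hand-rolled heuristic, not something VADER itself provides.
    """
    return 1 - abs(compound)


# Hypothetical scores, shaped like the dict VADER's polarity_scores() returns.
example_scores = {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.75}

compound = compound_only(example_scores)
print(compound)                      # 0.75
print(subjectivity_proxy(compound))  # 0.25
```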

Conclusions

The obvious conclusion from the pros and cons is that VADER is better for informal text like social media posts with slang and emojis, while TextBlob is better for formal text like books or papers, or for projects that extend beyond sentiment analysis.

TextBlob is a better choice for plug-and-play projects where you want to classify data quickly. You can get a polarity score easily without any data engineering. However, TextBlob lacks an intensity function, and it can be slower than its competitors since it handles more than just sentiment analysis.

VADER had an accuracy edge on the data I used, tweets, but it lacks a subjectivity function. It takes a little data engineering to get the right output and the kinds of scores you want. For pure speed and sentiment analysis, though, this may be the way to go.

On average, the accuracy difference between VADER and TextBlob is around 7% in VADER's favor, both in my testing and in results reported by others online.
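The procedure behind that figure is not spelled out here, so the following is only a sketch of one common way to measure accuracy: map each score to a label and compare it with hand-labelled data. The 0.05 cutoffs follow the thresholds suggested for the compound score in VADER's README; the function names and the tiny score and label lists are assumptions invented to show the mechanics.

```python
def label_from_score(score, threshold=0.05):
    """Turn a score in [-1, 1] into a positive/negative/neutral label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def accuracy(predicted, gold):
    """Return the fraction of predictions that match the gold labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Made-up labels and scores, purely illustrative (not the tweets used here).
gold_labels = ["positive", "negative", "neutral", "positive"]
library_scores = [0.62, -0.40, 0.01, -0.10]

predicted = [label_from_score(s) for s in library_scores]
print(accuracy(predicted, gold_labels))  # 0.75
```

Running the same comparison once per library on the same labelled data gives two accuracy figures whose difference can be reported.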

Other NLP libraries like Flair were not evaluated, as they are significantly slower, though they might be more accurate and powerful. This may be a future project idea.

Another strategy to get the best sentiment score is to combine VADER and TextBlob into an averaged composite score. This is a simple form of ensemble learning. Rather than declaring one library better than the other for a specific purpose, combining multiple models can often lead to an even more accurate result.
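A minimal sketch of such a composite score is shown below. It relies on VADER's compound score and TextBlob's polarity both lying on a -1 to 1 scale, so they can be averaged directly. The function name and the vader_weight parameter are hypothetical; an equal weighting is simply an assumed default, not a tuned value.

```python
def composite_score(vader_compound, textblob_polarity, vader_weight=0.5):
    """Weighted average of a VADER compound score and a TextBlob polarity.

    Both inputs are expected in [-1, 1]; vader_weight=0.5 gives a plain mean.
    """
    if not 0.0 <= vader_weight <= 1.0:
        raise ValueError("vader_weight must be between 0 and 1")
    return vader_weight * vader_compound + (1.0 - vader_weight) * textblob_polarity


# Hypothetical scores for one piece of text.
print(composite_score(0.5, 0.25))  # 0.375
```

The weight could be shifted toward whichever library performs better on a given kind of text, but that would need to be checked against labelled data.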