TextBlob vs VADER Evaluation
By Alastair Beeson
Introduction
Two of the most commonly used Python libraries for NLP and sentiment analysis are TextBlob and VADER. Their functionality is similar, but they differ in ways that make each better suited to specific use cases.
TextBlob
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
TextBlob's website can be found here: https://textblob.readthedocs.io/en/dev/
Pros:
Very easy to get set up.
Polarity and subjectivity are separate, distinct properties
Polarity comes back as a single score, so no post-processing is needed
Has a subjectivity function
This could be really useful for projects like recreating the language of a famous author or determining who wrote a specific text
Has part-of-speech tagging, noun-phrase extraction, translation
Much more functionality beyond sentiment analysis
Cons:
Lacks an intensity function
Struggles with slang and emojis, so it is not the best choice for social media data like tweets
Slightly less accurate than Vader on average
Runs slower than Vader
Requires converting text into a TextBlob object before it can be analyzed
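As a minimal sketch of the points above, the following shows how polarity and subjectivity can be read from a TextBlob object. The `TextBlob(text).sentiment` property is part of TextBlob's documented API. The helper names `textblob_sentiment` and `polarity_label`, and the default threshold of zero, are assumptions made for illustration only.

```python
from typing import Tuple


def textblob_sentiment(text: str) -> Tuple[float, float]:
    """Return (polarity, subjectivity) for a piece of text using TextBlob."""
    # Imported inside the function so the pure helper below can be used
    # even when TextBlob is not installed.
    from textblob import TextBlob

    blob = TextBlob(text)  # the text must first be wrapped in a TextBlob object
    # polarity lies in [-1.0, 1.0]; subjectivity lies in [0.0, 1.0]
    return blob.sentiment.polarity, blob.sentiment.subjectivity


def polarity_label(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score to a coarse label.

    Hypothetical helper: the threshold is an assumption, not a TextBlob default.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"
```

With TextBlob installed, `polarity, subjectivity = textblob_sentiment("some text")` gives both values in one call, and `polarity_label(polarity)` turns the first into a class label.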
VADER
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
VADER's repository can be found here: https://github.com/cjhutto/vaderSentiment
Pros:
Tuned for modern social media
Understands non-text like emojis
Understands slang
VADER returns 4 different scores: positive, neutral, negative and compound
This level of granularity could be helpful for more sophisticated projects
Has an intensity function
2x as fast as Textblob
Cons:
In the majority of use cases, compound is the only score that matters
Requires a little engineering to isolate the compound score which can be a hassle
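Lacks a subjectivity function
Subjectivity is not mentioned in any of VADER's documentation
To get a subjectivity-like value you have to compute it yourself, for example as 1 - abs(compound); this is a rough workaround derived from polarity alone, not a true subjectivity measure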
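The "little engineering" mentioned above might look like the sketch below. `SentimentIntensityAnalyzer.polarity_scores` is VADER's documented entry point and returns a dictionary with the keys `neg`, `neu`, `pos` and `compound`. The 0.05 cutoff follows the thresholds suggested in the VADER README. The helper names and the subjectivity proxy are assumptions taken from the workaround described in the list, not part of VADER itself.

```python
from typing import Dict


def vader_scores(text: str) -> Dict[str, float]:
    """Return VADER's four scores: 'neg', 'neu', 'pos' and 'compound'."""
    # Imported inside the function so the pure helpers below can be used
    # even when vaderSentiment is not installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    # For many texts, create the analyzer once and reuse it instead.
    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)


def compound_label(scores: Dict[str, float], cutoff: float = 0.05) -> str:
    """Isolate the compound score and map it to a label."""
    compound = scores["compound"]
    if compound >= cutoff:
        return "positive"
    if compound <= -cutoff:
        return "negative"
    return "neutral"


def subjectivity_proxy(scores: Dict[str, float]) -> float:
    """Hypothetical stand-in for subjectivity: 1 - abs(compound).

    This is the workaround described in the text, not a VADER feature.
    """
    return 1.0 - abs(scores["compound"])
```

With vaderSentiment installed, `compound_label(vader_scores("some text"))` reduces the four scores to a single label.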
Conclusions
The obvious conclusion from the pros and cons is that VADER is better for informal text like social media posts with slang and emojis, while TextBlob is better for formal text like books or papers, or for projects that extend beyond simple sentiment analysis.
TextBlob is a better choice for plug-and-play projects where you want to classify data quickly. You can get a polarity score easily without any data engineering. However, TextBlob lacks an intensity function and can be slower than its competitors, since it handles more than just sentiment analysis.
VADER has an accuracy edge on the data I used (tweets), but it lacks a subjectivity function. It takes a little data engineering to get the right output and the kinds of scores you want. For pure speed and sentiment analysis, though, this may be the way to go.
On average, the accuracy difference between VADER and TextBlob is around 7% in VADER's favor, both in my testing and in results reported by others online.
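The evaluation procedure behind that figure is not described here, so the following is only a sketch of one way such an accuracy gap could be measured: compare each library's predicted labels against hand-labeled data. The function name `accuracy` and the label format are assumptions, and no data from the original testing is reproduced.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty dataset")
    # Booleans count as 1 (match) or 0 (mismatch) when summed.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Given label lists from each library for the same texts, the gap would be `accuracy(vader_labels, gold_labels) - accuracy(textblob_labels, gold_labels)`, where all three names are placeholders for your own data.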
Other NLP libraries such as Flair were not evaluated because they are significantly slower, though they might be more accurate and powerful. This may be a future project idea.
Another strategy to get the best sentiment score is to combine VADER and TextBlob and produce an averaged composite score from the two. This is a simple form of ensemble learning. Rather than declaring one library better than the other for a specific purpose, combining multiple models can often lead to an even more accurate result.
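A minimal sketch of that composite score is shown below. It relies on the fact that TextBlob's polarity and VADER's compound score both fall in the range -1 to 1, so they can be averaged directly. The function name `ensemble_score` and the `vader_weight` parameter are assumptions for illustration, and an equal weighting is only a starting point, not a tuned choice.

```python
def ensemble_score(
    textblob_polarity: float,
    vader_compound: float,
    vader_weight: float = 0.5,
) -> float:
    """Weighted average of a TextBlob polarity and a VADER compound score.

    Both inputs are expected in [-1.0, 1.0], so the result is too.
    vader_weight=0.5 gives a plain average of the two scores.
    """
    if not 0.0 <= vader_weight <= 1.0:
        raise ValueError("vader_weight must be between 0 and 1")
    return vader_weight * vader_compound + (1.0 - vader_weight) * textblob_polarity
```

Raising `vader_weight` for social media text, or lowering it for formal text, is one way to reflect the strengths of each library in the combined score.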