
When I started learning about Artificial Intelligence, the hottest topic was analysing the sentiment of unstructured data like blogs and tweets. I tried implementing a module for it and failed miserably, for lack of skill, time and, more importantly, libraries! Now that I know a bit of coding and there are libraries lying around on GitHub, I decided to give it another shot. I couldn’t believe how easy it is! So here goes…

You will need to set up a couple of things before we can get started with coding.

Creating a Twitter App

Sign in to the Twitter developer site and create a new application. Once it is created, generate its consumer key/secret and access token/secret, and copy them into a file named twitter4j.properties at the root of your project:

debug=true
oauth.consumerKey=<api-key-for-your-app>
oauth.consumerSecret=<api-secret-for-your-app>
oauth.accessToken=<access-token>
oauth.accessTokenSecret=<access-token-secret>
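
Twitter4J picks this file up automatically from the project root or classpath. Before going any further, you can check that the keys actually work with a quick sanity test like the one below (an optional snippet of my own, not needed later; verifyCredentials() throws a TwitterException when authentication fails):

import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;

public class CredentialCheck {

	public static void main(String[] args) throws TwitterException {
		// getInstance() reads the OAuth keys from twitter4j.properties;
		// verifyCredentials() fails if any of them is wrong
		Twitter twitter = new TwitterFactory().getInstance();
		System.out.println("Authenticated as @"
				+ twitter.verifyCredentials().getScreenName());
	}
}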

Setting up Twitter4J

Download the Twitter4J jars and add them to your build path. The TweetManager class below searches for a topic and pages through the results, collecting the text of every matching tweet:

import java.util.ArrayList;
import java.util.List;

import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;

public class TweetManager {

	// Searches Twitter for the given topic and returns the text of every
	// tweet found, paging through the results until none are left.
	public static ArrayList<String> getTweets(String topic) {

		// The factory reads the OAuth keys from twitter4j.properties
		Twitter twitter = new TwitterFactory().getInstance();
		ArrayList<String> tweetList = new ArrayList<String>();
		try {
			Query query = new Query(topic);
			QueryResult result;
			do {
				result = twitter.search(query);
				List<Status> tweets = result.getTweets();
				for (Status tweet : tweets) {
					tweetList.add(tweet.getText());
				}
				// nextQuery() returns null once the last page has been fetched
			} while ((query = result.nextQuery()) != null);
		} catch (TwitterException te) {
			te.printStackTrace();
			System.out.println("Failed to search tweets: " + te.getMessage());
		}
		return tweetList;
	}
}

Setting up Stanford’s Core NLP

Download Stanford CoreNLP along with its models and add them to your build path. The pipeline reads its configuration from a properties file; create one named MyPropFile.properties with this single line, which selects the annotators to run:

annotators = tokenize, ssplit, parse, sentiment

The NLP class below loads that file and scores a tweet:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

public class NLP {
	static StanfordCoreNLP pipeline;

	public static void init() {
		// Loads the annotator list from the properties file created above
		pipeline = new StanfordCoreNLP("MyPropFile.properties");
	}

	// Returns the sentiment class of the longest sentence in the tweet,
	// treating it as the tweet's main clause.
	public static int findSentiment(String tweet) {

		int mainSentiment = 0;
		if (tweet != null && tweet.length() > 0) {
			int longest = 0;
			Annotation annotation = pipeline.process(tweet);
			for (CoreMap sentence : annotation
					.get(CoreAnnotations.SentencesAnnotation.class)) {
				// Note: newer CoreNLP versions renamed this class to
				// SentimentCoreAnnotations.SentimentAnnotatedTree
				Tree tree = sentence
						.get(SentimentCoreAnnotations.AnnotatedTree.class);
				int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
				String partText = sentence.toString();
				if (partText.length() > longest) {
					mainSentiment = sentiment;
					longest = partText.length();
				}
			}
		}
		return mainSentiment;
	}
}
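
The score returned is one of five classes from the Stanford sentiment model: 0 is very negative, 1 negative, 2 neutral, 3 positive and 4 very positive. If you would rather print labels than raw numbers, a tiny helper like this (my own addition; the rest of the code does not need it) does the job:

public class SentimentLabel {

	// Maps the 0-4 class predicted by the Stanford sentiment model
	// to a human-readable label.
	public static String describe(int sentiment) {
		switch (sentiment) {
		case 0: return "Very negative";
		case 1: return "Negative";
		case 2: return "Neutral";
		case 3: return "Positive";
		case 4: return "Very positive";
		default: return "Unknown";
		}
	}
}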

Putting it all together

Finally, a small main class ties everything together:

import java.util.ArrayList;

public class WhatToThink {

	public static void main(String[] args) {
		String topic = "ICCT20WC";
		// Fetch the tweets first, then spin up the NLP pipeline
		// (loading the parsing and sentiment models takes a while)
		ArrayList<String> tweets = TweetManager.getTweets(topic);
		NLP.init();
		for (String tweet : tweets) {
			System.out.println(tweet + " : " + NLP.findSentiment(tweet));
		}
	}
}

And that’s it. If you run this, you will initially see some noisy log output from Twitter4J while it queries for tweets. After that, you will see a list of tweets and their sentiment scores in this format:

<Tweet> : <Sentiment-score>

There is a way to get much better results than what we get now: cleaning up the tweets before sending them to the sentiment analyzer, since most tweets inherently contain useless data such as usernames, links, hashtags, etc. There are libraries built for exactly this kind of clean-up; you can try one out if you want.
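
As a quick illustration, even a few regexes go a long way (this is just a sketch of mine; a dedicated library handles far more edge cases):

public class TweetCleaner {

	// Strips the most common Twitter noise before sentiment analysis.
	public static String clean(String tweet) {
		return tweet
				.replaceAll("@\\w+", "")          // remove @usernames
				.replaceAll("https?://\\S+", "")  // remove links
				.replaceAll("#", "")              // keep hashtag text, drop the '#'
				.replaceAll("\\s+", " ")          // collapse extra whitespace
				.trim();
	}
}

You would then pass clean(tweet) instead of the raw text to NLP.findSentiment().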

By the way, I am not uploading my project to GitHub because it is around 400 MB, which is quite a lot, and also because I don’t want anyone stealing my keys :)

Hope this helped you!