In this project, we will write an application that downloads tweets from Twitter. We continue learning C#, building on our Yahoo Finance data downloader. Twitter has a REST API that lets us search for tweets, users, and timelines, or even post new messages. We will use an incredible C# Twitter library called Tweetinvi, which has everything you need to start building your program. There are alternatives, but this was the most accessible and complete one we found. To use this program, you need a Twitter developer account and your own credentials.

Application Structure

The program is split up to keep the Twitter interactions, file management, and application logic separate. The code lives in four files: Program.cs, FileManagement.cs, Twitter.cs and Tweet.cs.

  • Program.cs - Loop over Twitter usernames, download their tweets, and send them to files to be saved.
  • FileManagement.cs - Load usernames and append new tweets to the end of tweet files.
  • Twitter.cs - Log in to the Twitter API, download tweets, and manage the rate limit constraints.
  • Tweet.cs - Format the downloaded tweets.

This allows you to change the application logic, the location where you save files, or even the Twitter library without affecting other parts of your code. The program starts by logging into Twitter with the SetCredentials function. This requires four keys from the Twitter developer website.

Twitter.SetCredentials(accessToken: "xxxxx", accessTokenSecret: "xxxxx", consumerKey: "xxxxx", consumerSecret: "xxxxx");

We retrieve the Twitter usernames from a file, twitterUsernames.csv, which contains the list of usernames to download. For this project we collected a list of 3000 financial and news accounts, scraped from Twitter lists and search results. We also estimate the next best time to update each user's tweets based on how frequently they tweet.

var usernames = FileManagement.GetUsernames();
var nextUpdateTime = FileManagement.GetNextUpdateTime();

The tweet downloading and rate limiting are managed entirely by the Twitter class; the GetTimeline function either downloads all the historical tweets possible or only downloads updates.

var tweetList = Twitter.GetTimeline(userName, lastTweet, ref tweetCount);

The freshly downloaded tweets are serialized to JSON by Json.NET and then written to a file, one per Twitter username.

Optimizing Tweet Downloads

We want our program to download the maximum number of historical tweets possible per user and then recheck accounts for new tweets. Twitter limits API requests to 300 per 15 minutes and allows access to a maximum of 3200 historical tweets per user. Each request can download up to 200 tweets. To make productive use of our requests, we continually calculate the average gap between a user's latest tweets and schedule the next check far enough ahead that we're confident there will be at least one new tweet. The following code takes the tweets read from the file and calculates the average gap between them:

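// Requires: using System; using System.Collections.Generic; using System.Linq;
public static TimeSpan GetAverageTimeSpan(List<Tweet> tweets)
{
     // No history yet: fall back to a default interval of 500 seconds.
     if (tweets.Count == 0)
     {
         return TimeSpan.FromSeconds(500);
     }
     else
     {
         // Collect the timestamp of every saved tweet.
         List<DateTime> dates = new List<DateTime>();
         foreach (var line in tweets)
         {
             dates.Add(line.Time);
         }
         // Spread the time between the oldest and newest tweet over the tweet count.
         var difference = dates.Max().Subtract(dates.Min());
         var averageTimes = TimeSpan.FromMilliseconds(difference.TotalMilliseconds / dates.Count);
         return averageTimes;
     }
}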
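
As a rough illustration of how the average gap could drive the recheck schedule, here is a minimal sketch. NextUpdateTime is a hypothetical helper, not part of the original project: it assumes the next check should happen one average gap after the newest saved tweet, and never in the past.

```csharp
using System;

public static class UpdateScheduler
{
    // Hypothetical helper (not from the original project): estimate when to
    // recheck an account from its newest tweet time and average tweet gap.
    public static DateTime NextUpdateTime(DateTime newestTweet, TimeSpan averageGap, DateTime now)
    {
        // One average gap after the newest tweet, a new tweet is expected.
        DateTime candidate = newestTweet + averageGap;

        // If that moment has already passed, the account is due right away.
        return candidate > now ? candidate : now;
    }
}
```

For example, an account whose newest tweet is 10 minutes old and whose average gap is 30 minutes would be rechecked in 20 minutes; an account that is already overdue would be rechecked immediately.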

Downloading Tweets

When downloading tweets, we first check whether we have already downloaded tweets for this user. If we have historical tweets for this user, we only download the updates. Twitter's API supports both cases through two parameters, since_id and max_id. Each tweet in the tweetosphere has a unique ID number. To download updates, we request every tweet since an ID (since_id), which means "download all tweets since the last tweet we got".

public static long LastSavedTweetID(List<Tweet> getTweets)
{
     // The newest saved tweet is assumed to be the first entry in the list.
     var lastLine = getTweets.First();
     long lastTweetID = lastLine.Id;
     return lastTweetID;
}

If we don't have any historical tweets for this user, the program will download all historical tweets possible. With each request, we'll attempt to download the last 200 tweets, and the max_id specifies the ID of the most recent tweet we want in this request. 

public static List<Tweet> GetTimeline(string userName, List<Tweet> getTweets, ref int tweetcount)
{
   List<Tweet> tweets;
   if (getTweets.Count == 0)
   {
         // No saved tweets: download the full available history.
         Console.WriteLine(" First time downloading " + userName + ", creating new file.");
         tweets = TweetsDownload(true, userName, getTweets, ref tweetcount);
   }
   else
   {
         // Saved tweets exist: download only the updates.
         tweets = TweetsDownload(false, userName, getTweets, ref tweetcount);
   }
   return tweets;
}
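
To make the max_id paging concrete, here is a small simulation under stated assumptions. It does not call Twitter or Tweetinvi: FetchPage is a hypothetical stand-in for one timeline request against an in-memory list of tweet IDs, and PagingSketch is an illustrative name. The page size of 200 and the 3200-tweet cap come from the limits described above. Because max_id is inclusive, each request asks for IDs just below the oldest one already received.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    public const int PageSize = 200;    // tweets per request
    public const int HistoryCap = 3200; // maximum reachable history

    // Hypothetical stand-in for one timeline request: returns up to PageSize
    // IDs that are <= maxId, newest first. A real request would hit the API.
    public static List<long> FetchPage(List<long> timeline, long maxId)
    {
        return timeline.Where(id => id <= maxId)
                       .OrderByDescending(id => id)
                       .Take(PageSize)
                       .ToList();
    }

    // Walk backwards through the timeline and count the requests used.
    public static List<long> DownloadAll(List<long> timeline, out int requests)
    {
        var collected = new List<long>();
        long maxId = long.MaxValue;
        requests = 0;

        while (collected.Count < HistoryCap)
        {
            var page = FetchPage(timeline, maxId);
            requests++;
            collected.AddRange(page);

            // A short page means the oldest available tweet has been reached.
            if (page.Count < PageSize) break;

            // max_id is inclusive, so step just below the oldest ID received.
            maxId = page.Last() - 1;
        }
        return collected;
    }
}
```

In this simulation a full 3200-tweet history takes 16 requests, so roughly 18 first-time accounts fit into one window of 300 requests.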

Encoding and Saving Tweets

Each tweet comes in its own format, containing a lot of information (ID, language, message, date, etc.). We save a personalized subset of this information in the Tweet class:

/// Create a new tweet from an original Tweetinvi object
public Tweet(Tweetinvi.Core.Interfaces.ITweet original)
{
    this.Id = original.Id;
    // Strip commas from the message text.
    this.Text = original.Text.Replace(",", "");
    this.Time = original.CreatedAt;
    this.Retweets = original.RetweetCount;
    this.Favourites = original.FavouriteCount;
    this.User = original.Creator.Name;
    this.Followers = original.Creator.FollowersCount;
}

The newly encoded tweets are added to a list, which is then written to the user's "username.txt" file.

// Encode each tweet and add it to a list
public static List<string> Serializer(List<Tweet> tweetList)
{
    var encodedList = new List<string>();
    foreach (var line in tweetList)
    {
       var encodedTweet = Tweet.Serializer(line);
       encodedList.Add(encodedTweet);
    }
    return encodedList;
}

// Open and write to the file only if there are new tweets
FileManagement.Writer(encodedList, file);
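
The encode-then-append step can be sketched in a self-contained form. This is an illustration, not the project's code: it uses System.Text.Json from the .NET base library as a stand-in for Json.NET so the block has no external dependency, and TweetRecord and TweetFile are hypothetical names. Each tweet becomes one JSON document per line, and the file is left untouched when there is nothing new.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

// Hypothetical record type standing in for the article's Tweet class.
public class TweetRecord
{
    public long Id { get; set; }
    public string Text { get; set; }
    public DateTime Time { get; set; }
}

public static class TweetFile
{
    // Encode each tweet as one JSON document per line.
    public static List<string> Encode(IEnumerable<TweetRecord> tweets)
    {
        return tweets.Select(t => JsonSerializer.Serialize(t)).ToList();
    }

    // Append the encoded lines; skip the file entirely when nothing is new.
    public static void Append(string path, List<string> encodedList)
    {
        if (encodedList.Count == 0) return;
        File.AppendAllLines(path, encodedList);
    }
}
```

Keeping one JSON document per line means the saved file can later be read back line by line without parsing it as a whole.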

API Restrictions Management

Finally, we should rate limit the requests we make to the API. The WaitForAPIReady function makes the program sleep until new requests are available.

/// Check if API is ready for new request
private static int WaitForAPIReady()
{
      int count = 0;
      do
      {
          // Start of the 15-minute rate limit window.
          DateTime current_time = DateTime.Now;
          current_time = current_time.AddMinutes(-15);

          // Count the requests made inside the window.
          count = (from time in timeStamps
                   where time > current_time
                   select time).Count();
          if (count > 290)
          {
              Console.WriteLine(" Twitter downloading limit reached. Waiting...");
              Thread.Sleep(50000); // wait 50 seconds before checking again
          }
      } while (count > 290);
      // Requests still available in the current window.
      return (300 - count);
}
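
The function above reads a timeStamps collection that is not shown in the article. One way such a sliding window could be maintained is sketched below, under assumptions: RequestWindow is a hypothetical class, the current time is passed in so the behavior is easy to check, and requests are assumed to be recorded in chronological order.

```csharp
using System;
using System.Collections.Generic;

public class RequestWindow
{
    private readonly Queue<DateTime> _timeStamps = new Queue<DateTime>();
    private readonly TimeSpan _window;
    private readonly int _limit;

    public RequestWindow(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    // Call once per API request so it counts against the window.
    // Assumes calls arrive in chronological order.
    public void Record(DateTime now)
    {
        _timeStamps.Enqueue(now);
    }

    // Number of requests still available at the given moment.
    public int Remaining(DateTime now)
    {
        // Drop timestamps that have aged out of the window.
        while (_timeStamps.Count > 0 && _timeStamps.Peek() <= now - _window)
        {
            _timeStamps.Dequeue();
        }
        return Math.Max(0, _limit - _timeStamps.Count);
    }
}
```

Pruning old entries on each check keeps the collection bounded by the request limit, rather than letting it grow for as long as the program runs.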

That briefly explains how we handled Twitter's API limitations using Tweetinvi. The downloader is built; now the fun part begins: Which accounts shall we scan? What can we do with the downloaded data? It would be fun to see an algorithm that uses Twitter sentiment data to make investing decisions!

[Twitter API Used in Article Deprecated]