Writing the Code in Small Parts: Part 1, The Basic App

Build a Twitter Analytics App


So finally, we get to the code.

But not too fast.

We’ll go slow, building our code in stages. After each stage, we’ll have a mini code review, after which we’ll fix the code, have another review, and so on. This is for two reasons:

  • It’ll closely mirror how you would work in a real job, although real code reviews won’t be for such tiny segments.
  • You’ll learn how to build large, complex apps, starting from tiny files with just a few lines of code.

To start with, remember our challenge:

1 Search Twitter for a term, say Python.

2 Find the top trends on Twitter.

3 Print 100 tweets from the Twitter streaming data.

We’ll start with that.

I hope you have your own solution. If not, have a look at mine. Get the code from here.

The first thing you need to do is rename local_config_skel.py to local_config.py, and update the security tokens from your own Twitter App (I hope you created a Twitter developer account, and a Twitter dev app).
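If you are writing your own version from scratch, a local_config.py can be as small as the sketch below. The variable names are my guesses and may not match the ones in local_config_skel.py, and the values are placeholders; swap in the four credentials that your Twitter dev app shows you.

```python
# local_config.py: keep this file out of version control.
# Variable names are illustrative; use whatever local_config_skel.py uses.
consumer_key = "YOUR-CONSUMER-KEY"
consumer_secret = "YOUR-CONSUMER-SECRET"
access_token = "YOUR-ACCESS-TOKEN"
access_token_secret = "YOUR-ACCESS-TOKEN-SECRET"
```

Keeping the secrets in their own file means the main script can be shared without leaking your tokens.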

Next, we’re going to use a virtual environment to run our code. This prevents the “works on my machine” problem discussed in the last section. It is optional, but I recommend it. It’ll also save you from hunting for the right version of each library. Get requirements.txt from here:

Now, every time you want to work on this project, make sure you activate the virtual environment first. This will ensure you are always using the same version of each library, and more importantly, if you move to a different machine, or share the code with someone, it will work for them too.

Note: I strongly recommend you use something like IPython QtConsole to run this code. That’s because the tweets will be in different languages, and most shells won’t be able to print them. Windows certainly won’t, and will throw Unicode errors. Linux might as well; although it is possible to configure Linux shells to print Unicode, I haven’t done so myself. Stick to IPython QtConsole.

Okay, to the code. We’ll look at the whole code first, and then go line by line.

Ouch. That’s ugly. The only possible response to that code is to fall down in agony, screaming “My eyes! God, my eyes!”

But let’s go over it anyway:

We import tweepy and the file that contains our password.

This code sets up our access. Like many APIs, you have to authenticate yourself before use. This code has been taken from Tweepy’s docs, so there is nothing to explain.

We are searching for Python. BTW, if you are unsure about something, it might be easier to look at Twitter’s official docs, as Tweepy isn’t well documented. So if you don’t know what q means in the code above, you can check the search function in Twitter.

It tells us q is the search term. So obvious (not).

A small problem with Twitter’s API is that it is not clear how to only return a few results. The link I gave has confusing instructions. In Tweepy, doing .items(5) works, though I had to search a lot to find this.

The search returns an iterator, and we loop over it, printing each search result.
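To see what limiting an iterator does without touching the network, here is a small sketch. fake_search is a made-up stand-in for the Tweepy search, and itertools.islice plays the role that .items(5) plays in Tweepy; neither is part of the real API, so treat this as an illustration of the idea only.

```python
from itertools import islice
from types import SimpleNamespace


def fake_search(term):
    """Hypothetical stand-in for a Twitter search: yields results forever."""
    count = 0
    while True:
        count += 1
        # Each fake result has a .text attribute holding the tweet text.
        yield SimpleNamespace(text=f"Tweet {count} about {term}")


def first_results(term, limit):
    """Return the text of the first `limit` results for `term`."""
    search_results = islice(fake_search(term), limit)
    return [result.text for result in search_results]


for text in first_results("Python", 5):
    print(text)
```

Without the limit, the loop would never end, which is why capping the number of items matters.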

The trends_place function returns the trends. The argument 1 is the WOEID (Where On Earth ID) for the whole world, so we get global trends. Again, the result comes back as JSON: a list, and we loop over the ["trends"] part of its first element and print each trend.

To understand this better, try printing the whole of t2, then t2[0]["trends"], then t2[0]["trends"][0], and you’ll see what the returned JSON looks like. This is important, as we will be messing around with JSON a lot: that’s what Twitter usually returns, huge and messy JSON lists.
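If you don’t have your tokens set up yet, the sketch below mimics that shape with a hand-written sample. The trend names are invented, and the real response carries more fields per trend than shown here; the point is only the list, then dictionary, then list nesting.

```python
import json

# Hand-written sample: a list holding one dictionary whose "trends" key
# is itself a list of trend dictionaries. The values are made up.
raw = """
[
  {
    "trends": [
      {"name": "#Python", "query": "%23Python"},
      {"name": "PyCon", "query": "PyCon"}
    ],
    "locations": [{"name": "Worldwide", "woeid": 1}]
  }
]
"""

trends_response = json.loads(raw)


def trend_names(response):
    """Collect the name of every trend in the first list entry."""
    return [trend["name"] for trend in response[0]["trends"]]


print(trends_response[0]["trends"][0])  # one trend dictionary
print(trend_names(trends_response))     # just the names
```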

Code Review

Okay, do I need to say it? Ugly.

In the previous section we talked about using a coding standard. I’ll use PEP 8, which is Python’s more or less official style guide (major libraries like Qt and unittest don’t follow it strictly). Even better, we’ll use autopep8, which fixes minor formatting problems automatically: things like spacing, indentation, etc.

The -i flag means make the changes in place, in the file itself (normally, autopep8 just prints the fixed code on screen, and expects you to pipe it to a different file). -a enables aggressive mode, which isn’t as bad as it sounds: it allows autopep8 to make a few fixes beyond pure whitespace changes. The spacing fixes are the most visible ones, though. For example, in the code above,

will become

Two extra spaces, but it’s a lot easier to read.

Right, I’ll just give you the cleaned up code now, and let you judge if it’s much easier to read:

We now have search_results instead of tt, and result instead of t. This makes it clearer what we are doing. Put the two versions side by side, and see the difference good variable naming makes.

The Streaming API

The streaming API continuously reads data from Twitter. You get a live, real-time feed, but it contains less than 10% of all tweets. Any more, and you need to pay. However, even this fraction is more than enough for us to handle.

Let’s look at the code:

I’ll only go over the new code.

You need to define a class that inherits from Tweepy’s StreamListener. There are two methods you must have. The first is on_data(), which tells Tweepy what to do when a new tweet is available: in our code, we load the JSON data using the built-in json library and print the text part of the tweet. The other is on_error(), where we just print the error status if we get an error.
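Here is a rough sketch of those two methods that runs without Tweepy installed. PrintingListener and extract_text are names I made up, and the class deliberately does not inherit from StreamListener; in the real app it would. The sample message and the 420 status are hand-written test values.

```python
import json


def extract_text(data):
    """Return the tweet text from one raw JSON message, or None if absent."""
    message = json.loads(data)
    # Not every message on the stream is a tweet, so "text" may be missing.
    return message.get("text")


class PrintingListener:
    """Stand-in with the two methods described above.

    In the real app this class would inherit from Tweepy's StreamListener;
    the base class is omitted so the sketch runs without Tweepy installed.
    """

    def on_data(self, data):
        text = extract_text(data)
        if text is not None:
            print(text)

    def on_error(self, status):
        print(status)


listener = PrintingListener()
listener.on_data('{"text": "Hello from the stream", "lang": "en"}')
listener.on_error(420)
```

Pulling the JSON handling into its own small function makes it easy to try out on sample strings before pointing it at the live stream.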

We create a Twitter stream with our authentication data and the class we created, and then we call the sample() function to get a sample of Tweets.

If you run this code, you might only get 1-2 tweets, or even none. We’ll come back to that. First, we fix our code. Here is the improved version:

Okay, before we go ahead, look at the 2 versions of the code side by side, by opening the code in separate tabs in your web browser.

For twit1.py: messy and clean versions

For Twit2.py: messy and clean versions

Next Part: Our streaming app should continue printing tweets till it is killed with Ctrl C. So why does it stop after just a few (or even none) tweets? We’ll fix that.
