Build a Reddit Bot Series
So we are going to build a simple Reddit Bot that will do two things:
1. It will monitor a particular subreddit for new posts, and when someone posts “I love Python”, it will reply “Me too!”.
2. It will also monitor all comments to recent posts, and if it finds one that says “I hate Python”, it will post a link to /r/learnpython and ask the commenter to ask a question there.
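As a rough preview of the decision the bot has to make, here is a minimal sketch in plain Python 3 that never touches Reddit. The function name choose_reply, the case-insensitive matching, and the exact wording of the second reply are assumptions made for illustration, not code from the series.

```python
def choose_reply(text, is_comment):
    """Return the reply the bot would post, or None if it should stay quiet.

    Hypothetical helper: posts are checked for "I love Python",
    comments are checked for "I hate Python".
    """
    lowered = text.lower()  # assumption: match regardless of case
    if not is_comment and "i love python" in lowered:
        return "Me too!"
    if is_comment and "i hate python" in lowered:
        # Assumed wording; the goal is only to point at /r/learnpython.
        return "Sorry to hear that. Try asking a question at /r/learnpython."
    return None


print(choose_reply("I love Python", is_comment=False))
```

Keeping this decision in a plain function means it can be tested without sending a single request to Reddit.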
Only a basic knowledge of Python is required, as building bots is fairly easy.
In part one, we will see how we can read data from Reddit using the Reddit API. You can follow along with the video or text (or both). The source code is available at GitHub:
A software bot is a program that can interact with websites autonomously. Bots can be as simple or as complex as you want them to be.
The bot runs in the background and monitors a website. When it sees a change (like a post on Reddit), it can reply to it, upvote, or do any other task it was programmed to do.
There are many ways to monitor websites. You can use web scraping tools like urllib, BeautifulSoup, or anything similar. There is a slight problem with this, though. Bots can make thousands of requests a second, and this can overload servers, so most big websites ban bots. Ignore this at your own risk. I have been banned from Google for hours, and had my Gmail locked till I entered a dozen captchas, my mobile number and the name of my first cat.
If you want to do this properly, stick to any rules the website has.
Reddit provides an API, and unlike some websites, it’s actually quite easy to use. It’s based on REST and JSON, so in theory it doesn’t require any fancy setup.
The important thing is to follow the rules they set. Two of the most important ones are:
- You can’t make more than 1 request every 2 seconds (or 30 a minute)
- You must not lie about your user agent
Read the rest here.
The user agent is the string that identifies your client (normally your browser) to the server. The default user agents of libraries like Python’s urllib are severely restricted by Reddit to prevent abuse. Reddit recommends you use your own unique, descriptive user agent, and that’s what we’ll do.
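To make the first rule concrete, here is a minimal sketch of a rate limiter in Python 3 that leaves at least 2 seconds between requests. The class name RateLimiter and its clock and sleep parameters are my own illustration, not part of any Reddit library, and a real bot would call wait() just before every request.

```python
import time


class RateLimiter:
    """Hypothetical helper that spaces calls at least min_interval seconds apart."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None     # time of the previous request, if any

    def wait(self):
        """Block until a request is allowed; return how long we slept."""
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited


limiter = RateLimiter(min_interval=2.0)
limiter.wait()  # the first call returns immediately
```

The clock and sleep functions are passed in only so the behaviour can be checked with a fake clock instead of really waiting.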
Using the API
The API is quite easy to use, like I said. You make a REST request, and this can be done via urllib2 (as long as you set the user agent properly). This is how you would do it. I have put two links below. Open both in a new tab:
The first is how a human would see it. The second is how your code sees it. As you can see, getting the JSON is fairly easy.
The problem with this approach is that you still have to rate limit your requests yourself. You also have to parse the JSON yourself. JSON is easy to parse in Python, since it maps naturally onto Python dictionaries and lists, but if you actually look at the JSON, there is a lot of data to wade through.
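To give a feel for the do-it-yourself route, here is a rough Python 3 sketch (so urllib.request rather than urllib2). It builds a request with a custom user agent and pulls the titles out of a small hand-written sample shaped like a Reddit listing. The URL, the helper names and the sample data are all assumptions for illustration, and nothing is actually sent over the network.

```python
import json
import urllib.request

USER_AGENT = "PyEng Bot 0.1"

# Tiny hand-written sample in the shape of a Reddit listing (not real data).
SAMPLE = """{"kind": "Listing", "data": {"children": [
  {"kind": "t3", "data": {"title": "I love Python", "score": 10}},
  {"kind": "t3", "data": {"title": "Help with lists", "score": 3}}
]}}"""


def build_request(url, user_agent=USER_AGENT):
    """Create a request object that carries our own user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def extract_titles(listing_json):
    """Parse listing JSON and return the title of each post."""
    listing = json.loads(listing_json)
    return [child["data"]["title"] for child in listing["data"]["children"]]


# Assumed example URL; passing this to urllib.request.urlopen() would fetch it.
request = build_request("https://www.reddit.com/r/learnpython/.json")
titles = extract_titles(SAMPLE)
print(titles)
```

Even in this tiny sample you have to dig through two levels of nesting to reach a title, which is the kind of chore a wrapper library takes off your hands.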
PRAW is a library that fixes many of these problems for you. It limits how many requests you make, and makes it easy to extract the data from the JSON. Install it with:
pip install praw
Let’s go over the code now. Download it at GitHub.
import praw
We import praw.
user_agent = ("PyEng Bot 0.1")
Remember I said the Reddit rules say you have to have a specific user agent? I’m choosing the name PyEng Bot. The number at the end is the version. This is recommended, because once your code is out there, people might abuse it. If someone spams Reddit with your code, Reddit will ban that user agent.
In that case, you just move the version up. Not ideal, but you have to accept that your code may be misused by spammers.
r = praw.Reddit(user_agent = user_agent)
subreddit = r.get_subreddit("learnpython")
We create a Reddit instance, and get the subreddit learnpython.
Now, if you look on the subreddit, you can see that there is a hot tab. This does not indicate the temperature there is high or that there are racy swimsuit models. It means the most popular posts. That’s what we are going to read now. The function to do so is get_hot().
for submission in subreddit.get_hot(limit = 5):
We get the top 5 hot submissions. At this stage, you can inspect a submission to see which attributes and functions are available (you can do that at any stage, or look at PRAW’s documentation).
Seeing a snipped list:
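One way to get such a list, assuming nothing beyond Python’s built-in dir(), is sketched below in Python 3. It uses a stand-in object instead of a live PRAW submission so it runs offline. The FakeSubmission class and the public_names helper are hypothetical, and a real submission has far more attributes than these three.

```python
class FakeSubmission:
    """Hypothetical stand-in for a PRAW submission, so this runs offline."""

    def __init__(self, title, selftext, score):
        self.title = title
        self.selftext = selftext
        self.score = score


def public_names(obj):
    """List attribute and method names that don't start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


submission = FakeSubmission("I love Python", "It is great.", 42)
names = public_names(submission)
print(names)  # ['score', 'selftext', 'title']
```

Filtering out the underscore names hides Python’s internal machinery and leaves only the parts you are likely to care about.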
I’ll point out a few important ones. title is the title, as it appears on Reddit’s main page. selftext is the optional text you can put on posts; most posts don’t have it. learnpython is unusual in that most posts do have text (usually the poster asking their question), which is why I’ve chosen it. score is the net score, that is, upvotes minus downvotes (both of which are also available).
These are the three we will print:
for submission in subreddit.get_hot(limit = 5):
    print "Title: ", submission.title
    print "Text: ", submission.selftext
    print "Score: ", submission.score
That’s it. Run the script, and open Reddit in a browser at the same time. Check that you are getting the right results.
Next time we will look at how to send a reply to a post on Reddit.