Build a Reddit Bot Part 1

Build a Reddit Bot Series

Part 1: Read posts from Reddit

Part 2: Reply to posts

Part 3: Automate our Bot

Introduction

We are going to build a simple Reddit bot that does two things:

1. It will monitor a particular subreddit for new posts, and when someone posts “I love Python”, it will reply “Me too!”.

2. It will also monitor all comments on recent posts, and if it finds one that says “I hate Python”, it will post a link to /r/learnpython and suggest the commenter ask a question there.

Prerequisite knowledge

Only a basic knowledge of Python is required, as building bots is fairly easy.

Part 1

In part one, we will see how to read data from Reddit using the Reddit API. You can follow along with the video or text (or both). The source code is available on GitHub.

 

Software bot

A software bot is a program that can interact with websites autonomously. Bots can be as simple or as complex as you want them to be.

The bot runs in the background and monitors a website. When it sees a change (like a new post on Reddit), it can reply to it, upvote it, or do any other task it was programmed to do.

Monitoring websites

There are many ways to monitor websites. You can use web scraping tools like urllib, Beautiful Soup or anything similar. There is a slight problem with this, though. Bots can make thousands of requests a second, and this can overload servers, so most big websites ban bots. Ignore this at your own risk. I have been banned from Google for hours, and had my Gmail locked until I entered a dozen CAPTCHAs, my mobile number and the name of my first cat.

If you want to do this properly, stick to any rules the website has.

Reddit API

Reddit provides an API, and unlike some websites, it’s actually quite easy to use. It’s based on REST and JSON, so in theory it doesn’t require any fancy setup.

http://www.reddit.com/dev/api

The important thing is to follow the rules they set. Two of the most important ones are:

  • You can’t make more than 1 request every 2 seconds (or 30 a minute)
  • You must not lie about your user agent

Read the rest here.
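If you were talking to the API by hand, the first rule could be enforced with a small helper like the one below. This is only a sketch: the RateLimiter class and its parameter names are made up for illustration, and the 2 second interval is taken from the rule above.

```python
import time


class RateLimiter:
    """Make sure at least `min_interval` seconds pass between requests."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_call = None   # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            elapsed = now - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last_call = now + delay
        return delay
```

You would call wait() right before every request. The first call returns immediately, and later calls sleep only for whatever is left of the 2 seconds.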

The user agent is what identifies your browser (or your script) to the website. The default user agents of libraries like Python’s urllib are severely restricted by Reddit to prevent abuse. Reddit recommends you use your own unique user agent, and that’s what we’ll do.
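As a rough illustration of setting your own user agent with the standard library, here is a minimal Python 3 sketch. The user agent string (in particular the version number) and the build_request helper are placeholders of mine. Nothing is sent over the network here; the request is only built.

```python
import urllib.request

# Placeholder user agent: pick your own name, the version number is made up.
USER_AGENT = "PyEng Bot 0.1"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries our own User-Agent header (not sent yet)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("https://www.reddit.com/r/learnpython/hot/.json")
print(request.get_header("User-agent"))
```

Passing this request to urllib.request.urlopen() would then perform the actual call, subject to the rate limits above.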

Using the API

As I said, the API is quite easy to use. You make a REST request, which can be done with urllib2 (urllib.request in Python 3), as long as you set the user agent properly. To see what that looks like, open both of the links below in a new tab:

http://www.reddit.com/r/learnPython/

http://www.reddit.com/r/learnPython/hot/.json

The first is how a human would see it. The second is how your code sees it. As you can see, getting the JSON is fairly easy.

The problem with this approach is that you still have to rate limit your requests yourself. You also have to parse the JSON yourself. JSON is easy to parse in Python, as it maps directly onto Python dictionaries and lists, but if you actually look at the JSON, there is a lot of data to dig through.
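To give a feel for the manual parsing, here is a small sketch that works on a made-up and heavily trimmed sample shaped like a Reddit listing (the real response has many more fields per post). The extract_posts helper is my own name, not part of any library.

```python
import json

# Made-up sample with the same nesting as a Reddit listing response.
SAMPLE_LISTING = """
{
  "kind": "Listing",
  "data": {
    "children": [
      {"kind": "t3", "data": {"title": "First post", "selftext": "Hello", "score": 12}},
      {"kind": "t3", "data": {"title": "Second post", "selftext": "", "score": 3}}
    ]
  }
}
"""


def extract_posts(listing_json):
    """Return (title, selftext, score) tuples from a listing JSON string."""
    listing = json.loads(listing_json)
    posts = []
    for child in listing["data"]["children"]:
        post = child["data"]  # the post's own fields live one level down
        posts.append((post["title"], post["selftext"], post["score"]))
    return posts


for title, selftext, score in extract_posts(SAMPLE_LISTING):
    print(title, "|", score)
```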

Introducing PRAW

PRAW is a library that fixes many of these problems for you. It rate limits your requests and makes it easy to extract data from the JSON. Install it with pip (pip install praw).

Let’s go over the code now. Download it at GitHub.

We import praw.

Remember I said the Reddit rules say you have to have a specific user agent? I’m choosing the name PyEng Bot. The number at the end is the version. This is recommended, because once your code is out there, people might abuse it. If someone spams Reddit with your code, Reddit will ban that user agent.

In that case, you just move the version up. Not ideal, but you have to accept that your code may be misused by spammers.

We create a Reddit instance, and get the subreddit learnpython.
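Here is a sketch of this step under some assumptions. It targets PRAW 4 or later, where you pass the client id and secret of an app registered with Reddit and fetch a subreddit with reddit.subreddit(). Older PRAW releases used get_subreddit() instead. The version number in the user agent and the get_learnpython helper are placeholders of mine.

```python
# Placeholder user agent: the version number at the end is made up.
USER_AGENT = "PyEng Bot 0.1"


def get_learnpython(client_id, client_secret, user_agent=USER_AGENT):
    """Return a read-only handle on /r/learnpython (assumes PRAW 4 or later)."""
    import praw  # imported here so this file loads even if PRAW is missing

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )
    return reddit.subreddit("learnpython")
```

You would call it as get_learnpython("your-client-id", "your-client-secret") with the credentials of your own app.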

Now, if you look on the subreddit, you can see that there is a hot tab. This does not indicate the temperature there is high or that there are racy swimsuit models. It lists the most popular posts. That’s what we are going to read now. The function to do so is get_hot() (in PRAW 4 and later, the same listing is subreddit.hot()).

We get the top 5 hot submissions. At this stage, you can inspect a submission to see which functions are available (you can do that at any stage, or look at PRAW’s documentation).
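One common way to look inside an object is Python’s built-in dir(). The sketch below uses a stand-in object with made-up values instead of a real submission so that it runs offline, and public_names is a helper name I chose. A real submission would show far more names.

```python
from types import SimpleNamespace


def public_names(obj):
    """List the attribute and method names of obj that don't start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in for a submission; the values are made up.
fake_submission = SimpleNamespace(title="Example", selftext="", score=1)
print(public_names(fake_submission))
```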

Seeing a snipped list:

I’ll point out a few important ones. title is the title, as it appears on Reddit’s main page. selftext is the optional text you can put on posts; most posts don’t have any. learnpython is unusual in that most posts do have text (usually the poster asking their question), which is why I’ve chosen it. score is the net score, upvotes minus downvotes (both of which are also available).

These are the three we will print:
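A sketch of the printing step, written with print() as a function so it runs on Python 3. The describe and print_hot helpers are my own names, and print_hot assumes PRAW 4 or later, where the hot listing is subreddit.hot(limit=...). The stand-in object at the bottom has made-up values and lets you try the formatting without touching Reddit.

```python
from types import SimpleNamespace

SEPARATOR = "-" * 33


def describe(submission):
    """Format the title, text and score of a submission-like object."""
    return (
        f"Title: {submission.title}\n"
        f"Text: {submission.selftext}\n"
        f"Score: {submission.score}\n"
        f"{SEPARATOR}"
    )


def print_hot(subreddit, limit=5):
    """Print the hottest posts of a subreddit (assumes PRAW 4+ style .hot())."""
    for submission in subreddit.hot(limit=limit):
        print(describe(submission))


# Offline stand-in for a submission.
sample = SimpleNamespace(title="Example post", selftext="Some text", score=42)
print(describe(sample))
```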

That’s it. Run the script, and open Reddit in a browser at the same time. Check that you are getting the right results.

Next time

Next time we will look at how to send a reply to a post on Reddit.

 

PS: Interested in leveling up your Python and getting a great job? Check out the Python Apprenticeship Program.

Leave a Reply

  1. Running the script returns the following error:

    File "bot_read.py", line 11
    print "Title: ", submission.title
    ^
    SyntaxError: Missing parentheses in call to 'print'

  2. Thank you for this amazing post. I am enjoying learning Python because of such build yourself tutorials. Watching your screen show the output and not an error is an amazing confidence booster. Keep up the awesome work.

  3. Hey man. This bot has helped me massively. I am so close to completing a project I have been working on. I have been trying to find out how to extract the comments from submissions and this has helped.

    However when I edited the code I got this error:

    posts_replied_to.append(submission.id)
    AttributeError: 'filter' object has no attribute 'append'

    I can’t see why it is suddenly not working, when it worked before

    • you’re welcome.

      The error most likely means posts_replied_to has not been initialised correctly. Put a break point right before that line and check what it’s been set to.

      • Thanks I read somewhere that it was a python 2 vs python 3 problem with the use of the term “append”

        I solved it with this code instead:

        with open("posts_replied_to.txt", "a") as myfile:
            myfile.write(submission.id + "\n")

        Thanks again