Build a Reddit Bot Part 1

Build a Reddit Bot Series

Part 1: Read posts from reddit

Part 2: Reply to posts

Part 3: Automate our Bot

Part 4: Marvin the Depressed Bot

Introduction

So we are going to build a simple Reddit Bot that will do two things:

  1. It will monitor a particular subreddit for new posts, and when someone posts “I love Python”, it will reply “Me too!”.
  2. It will also monitor all comments to recent posts, and if it finds one that says “I hate Python”, it will post a link to /r/learnpython and ask the commenter to ask a question there.

Prerequisite knowledge

Only a basic knowledge of Python is required, as building bots is fairly easy.

Part 1

In part one, we will see how we can read data from Reddit using the Reddit API. The source code is available on GitHub:

 

 

Software bot

A software bot is a program that can interact with websites autonomously. Bots can be as simple or as complex as you want them to be.

The bot runs in the background and monitors a website. When it sees a change (like a post on Reddit), it can reply to it, upvote, or do any other task it was programmed to do.

Monitoring websites

There are many ways to monitor websites. You can use web scraping tools like urllib, BeautifulSoup or anything similar. There is a slight problem with this, though. Bots can make thousands of requests a second, and this can overload servers. So most big websites ban bots. Ignore this at your own risk. I have been banned from Google for hours, and had my Gmail locked till I entered a dozen captchas, my mobile number and the name of my first cat.

If you want to do this properly, stick to any rules the website has.

Reddit API

Reddit provides an API, and unlike some websites, it’s actually quite easy to use. It’s based on REST and JSON, so in theory it doesn’t require any fancy setup.

http://www.reddit.com/dev/api

The important thing is to follow the rules they set. Two of the most important ones are:

  • You can’t make more than 1 request every 2 seconds (or 30 a minute)
  • You must not lie about your user agent

Read the rest here.

The user agent is a string sent with every request that identifies the client making it, such as your browser. The default user agents of libraries like Python’s urllib are severely restricted by Reddit to prevent abuse. Reddit recommends you use your own special user agent, and that’s what we’ll do.

Using the API

The API is quite easy to use, like I said. You make a REST request, and this can be done via urllib2 (urllib.request in Python 3), as long as you set the user agent properly. To see what that looks like, I have put two links below. Open both in a new tab:

http://www.reddit.com/r/learnPython/

http://www.reddit.com/r/learnPython/hot/.json

The first is how a human would see it. The second is how your code sees it. As you can see, getting the JSON is fairly easy.

The problem with this approach is that you still have to make sure you rate limit your requests. You also have to parse the JSON yourself. JSON is easy to parse in Python, as it maps directly onto Python dictionaries and lists, but if you actually look at the JSON, there is a lot of data to dig through.
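To make the do-it-yourself route concrete, here is a minimal sketch using Python 3’s urllib.request. The user agent string, the two second gap and the helper names (build_request, fetch_json, extract_titles) are my own choices for illustration, and the listing layout (data, then children, then data again) is assumed from what the .json link above returns.

```python
import json
import time
import urllib.request

# Hypothetical user agent string; pick your own and be honest about it.
USER_AGENT = "PyEng Bot 0.1"
MIN_INTERVAL = 2.0  # seconds between requests, per the rule quoted earlier

_last_request = None  # time of the previous request, None until the first one


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries our own user agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(url):
    """Fetch a Reddit .json URL, sleeping first if we asked too recently."""
    global _last_request
    if _last_request is not None:
        wait = MIN_INTERVAL - (time.monotonic() - _last_request)
        if wait > 0:
            time.sleep(wait)
    _last_request = time.monotonic()
    with urllib.request.urlopen(build_request(url)) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_titles(listing):
    """Pull the post titles out of a parsed listing dictionary."""
    return [child["data"]["title"] for child in listing["data"]["children"]]


# Usage (makes a real network request, so it is left commented out):
# listing = fetch_json("http://www.reddit.com/r/learnPython/hot/.json")
# print(extract_titles(listing))
```

Even this small version has to track timing and dig through nested dictionaries by hand, which is the chore a dedicated library takes off your hands.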

Introducing Praw

[Update Dec 2016: Reddit and Praw now force you to use OAuth. I’ve updated the article to use that]

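Praw is a library that fixes many of these problems for you. It rate limits your requests for you, and makes it easy to extract data from the JSON. Install it by running pip install praw from your system’s command line (not from inside the Python interpreter).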

You need to do some setup first.

Create Reddit App

Go to: https://www.reddit.com/prefs/apps/

And select Create App:

Give it a name, and choose script as the app type, since only script apps can log in with a username and password. You have to choose a redirect URI (for some stupid reason, stupid because I’m building a bot, not a webapp, but whatever). I chose http://127.0.0.1

You will now get a client_id (red box below) and a secret (blue box below). Note them down, but keep them secret.

Now, you need to update your praw.ini file to remember these settings. Otherwise, you’ll have to put them in your script, and that’s dangerous (as others might see them).

This page describes how to change praw.ini files: https://praw.readthedocs.io/en/v4.0.0/getting_started/configuration/prawini.html

You will find the file in your Python install folder, under Lib\Site-Packages\praw\praw.ini

Update: As Bryce points out in the comments:

I don’t recommend modifying the package-level praw.ini as those changes will be overwritten every time the package is updated. Instead praw.ini should be placed in the directory that the program is run from (often the same directory as the file).

Other options are specified here: https://praw.readthedocs.io/en/latest/getting_started/configuration/prawini.html#praw-ini-files

I recommend following Bryce’s advice.

Add the values we noted down:

client_id and client_secret are what you wrote down. username and password are your Reddit account details (optional if you only want read-only access).

There is a new field: user_agent.

Remember I said the Reddit rules say you have to have a specific user agent? I’m choosing the name PyEng Bot. The number at the end is the version. This is recommended, because once your code is out there, people might abuse it. If someone spams Reddit with your code, Reddit will ban that user agent.

In that case, you just move the version up. Not ideal, but you have to accept that your code may be misused by spammers.
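As a rough illustration of what the finished section might look like, the sketch below holds a sample [bot1] section as a string and uses Python’s configparser to check that nothing essential is missing. The placeholder values, the version number 0.1 and the missing_settings helper are assumptions of mine, not values from the original listing.

```python
import configparser

# Placeholder values: substitute the ones from your own Reddit app page.
SAMPLE_INI = """
[bot1]
client_id=YOUR_CLIENT_ID
client_secret=YOUR_CLIENT_SECRET
username=YOUR_REDDIT_USERNAME
password=YOUR_REDDIT_PASSWORD
user_agent=PyEng Bot 0.1
"""

# The settings this article treats as non-optional.
REQUIRED = ("client_id", "client_secret", "user_agent")


def missing_settings(ini_text, section="bot1"):
    """Return the required keys that the given section lacks."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return list(REQUIRED)
    return [key for key in REQUIRED if not parser.has_option(section, key)]


# An empty list means every required setting is present.
print(missing_settings(SAMPLE_INI))
```

To check your real file instead, read its contents into a string and pass that to missing_settings.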

Let’s go over the code now. Download it from GitHub.

We import praw.

We create a Reddit instance using the values we saved under bot1.

Then we get the subreddit learnpython.
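Those three steps can be sketched as follows. This assumes Praw 4 or later and a praw.ini with a [bot1] section; the get_subreddit wrapper is a hypothetical helper of mine, added only so the steps are easy to reuse.

```python
def get_subreddit(site_name="bot1", name="learnpython"):
    """Create a Reddit instance from a praw.ini section and return a subreddit."""
    # Imported inside the function so this file still loads where Praw is absent.
    import praw

    # Reads client_id, client_secret and so on from the named praw.ini section.
    reddit = praw.Reddit(site_name)
    # Praw 4+ spelling; older versions used get_subreddit() instead.
    return reddit.subreddit(name)


# Usage (needs Praw installed and valid credentials, so it is commented out):
# subreddit = get_subreddit()
```

Note that the same variable has to be used on both lines: if you name the instance reddit, call reddit.subreddit(), not r.subreddit().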

Now, if you look on the subreddit, you can see that there is a hot tab. This does not indicate the temperature there is high or that there are racy swimsuit models. It means the most popular posts. That’s what we are going to read now. The function to do so is hot() (it was called get_hot() before Praw 4).

We get the top 5 hot submissions. At this stage, you can inspect a submission, for example with Python’s built-in dir(), to see which attributes and functions are available (you can do that at any stage, or look at Praw’s documentation).
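One way to do that inspection is with Python’s built-in dir(). The sketch below uses a stand-in object rather than a real submission so it runs without credentials; public_names is a hypothetical helper, and whether the original listing used dir() is my assumption.

```python
from types import SimpleNamespace


def public_names(obj):
    """List attribute and method names, skipping underscore-prefixed internals."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in object for illustration; with Praw you would pass a real submission.
fake_submission = SimpleNamespace(title="Hello", selftext="", score=1)
print(public_names(fake_submission))
```

On a real submission the list is much longer, which is why only a snippet of it is worth reading through.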

Seeing a snipped list:

I’ll point out a few important ones. title is the title, as it appears on Reddit’s main page. selftext is the optional text you can put on posts; most posts don’t have it. learnpython is unusual in that most posts do have text (usually the poster asking their question), which is why I’ve chosen it. score is the net score, upvotes minus downvotes (both of which are also available).

These are the three we will print:
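A minimal sketch of the printing loop, assuming Praw 4 or later (where the listing method is hot(), formerly get_hot()) and Python 3’s print() function. The helper names and the separator line are mine, not taken from the original script.

```python
def format_submission(submission):
    """Build the three-line summary for one submission."""
    return "Title: {}\nText: {}\nScore: {}".format(
        submission.title, submission.selftext, submission.score
    )


def print_hot(subreddit, limit=5):
    """Print title, text and score for the top hot submissions."""
    for submission in subreddit.hot(limit=limit):
        print(format_submission(submission))
        print("---------------------------------")


# Usage (needs a real Praw subreddit object, so it is commented out):
# import praw
# print_hot(praw.Reddit("bot1").subreddit("learnpython"))
```

If you see SyntaxError: Missing parentheses in call to 'print', you are running Python 2 style print statements under Python 3; the print() calls above avoid that.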

That’s it. Run the script, and open Reddit in a browser at the same time. Check that you are getting the right results.

Next time

Next time we will look at how to send a reply to a post on Reddit.

 

47 thoughts on “Build a Reddit Bot Part 1”

  1. Running the script returns the following error:

    File “bot_read.py”, line 11
    print “Title: “, submission.title
    ^
    SyntaxError: Missing parenthesis in call to ‘print’

  2. Thank you for this amazing post. I am enjoying learning Python because of such build yourself tutorials. Watching your screen show the output and not an error is an amazing confidence booster. Keep up the awesome work.

  3. Hey man. This bot has helped me massively. I am so close to completing a project I have been working on. I have been trying to find out how to extract the comments from submissions and this has helped.

    However when I edited the code I got this error:

    posts_replied_to.append(submission.id)
    AttributeError: ‘filter’ object has no attribute ‘append’

    I can’t see why it is suddenly not working, when it worked before

    1. you’re welcome.

      The error most likely means posts_replied_to has not been initialised correctly. Put a break point right before that line and check what it’s been set to.

      1. Thanks I read somewhere that it was a python 2 vs python 3 problem with the use of the term “append”

        I solved it with this code instead:

        with open("posts_replied_to.txt", "a") as myfile:
            myfile.write(submission.id + "\n")

        Thanks again

  4. Hello there, and thank-you for this awesome tutorial! Sadly though, when I run this script, it returns the following:

    Traceback (most recent call last):
    File “untitled0.py”, line 6, in
    r = praw.Reddit(user_agent = user_agent)
    File “/anaconda/lib/python3.5/site-packages/praw/reddit.py”, line 114, in __init__
    raise ClientException(required_message.format(attribute))
    praw.exceptions.ClientException: Required configuration setting ‘client_id’ missing.
    This setting can be provided in a praw.ini file, as a keyword argument to the Reddit class constructor, or as an environment variable.

    Any help would be greatly appreciated. Thank-you again!

    1. Jeff, it seems praw has been updated (to keep up with Reddit’s move to OAuth).

      I’ll have to update the article. Thanks for letting me know, I’ll get back to you soon.

  5. Thanks for the post! Everything worked for me except that apparently reddit.get_subreddit() has been deprecated in favor of reddit.subreddit()

  6. hi. first of all, this is a great tutorial. but i’m having a problem. when i try to install praw i get this error: File “”, line 1 pip install praw SyntaxError: invalid syntax. i’ve been stuck on this for a while now and would really appriciate your help.

      1. the command is pip install praw and the error says File””, line 1 pip install praw ^ SyntaxError: invalid syntax
        I am trying to install it through the python command line.

  7. when I use python script.py, I get:

    prawcore.exceptions.OAuthException: unauthorized_client error processing request (Only script apps may use password auth)

  8. Great tutorial! One suggestion, however, pertains to this line:

    > You will find the file in your Python install folder, under Lib\Site-Packages\praw\praw.ini

    I don’t recommend modifying the package-level praw.ini as those changes will be overwritten every time the package is updated. Instead praw.ini should be placed in the directory that the program is run from (often the same directory as the file).

    Other options are specified here: https://praw.readthedocs.io/en/latest/getting_started/configuration/prawini.html#praw-ini-files

    One other comment is that for PRAW4 the following line:

    subreddit = r.get_subreddit("learnpython")

    should now be:

    subreddit = r.subreddit("learnpython")

  9. Hi, for whatever reason I’m getting an SSL error when I try to run the script. I’ve currently got Python 2.7.13.

    Here’s the error:
    File “/home//.local/lib/python2.7/site-packages/prawcore/requestor.py”, line 48, in request
    raise RequestException(exc, args, kwargs)
    prawcore.exceptions.RequestException: error with request Can’t connect to HTTPS URL because the SSL module is not available.

    I’ve tried to reinstall the SSL module using “pip install ssl”, but that doesn’t work because, according to the error message, it’s “already built in”.

    I’ve been googling for hours but to no avail, would really appreciate some help with this. No one else seems to be in my situation…

    Great tutorial otherwise!
    -Hugh

  10. I am getting error –> NoSectionError: No section: ‘bot1’

    I have updated my praw.ini file as follows:

    [bot1]
    client_id: 6LuNIgq******Q
    client_secret: xtbrrx********DGiv4GxFE
    username: **********
    password: **********
    user_agent: python_bot 0.1

      1. You have set the user agent. Is there a typo, or are you not setting the user agent correctly? What value are you using?

  11. I noticed you have…

    reddit = praw.Reddit('bot1')
    subreddit = r.subreddit("learnpython")

    …in the code example. Should be…

    r = praw.Reddit('bot1')
    subreddit = r.subreddit("learnpython")

    …or…

    reddit = praw.Reddit('bot1')
    subreddit = reddit.subreddit("learnpython")

    1. Thanks for pointing that out! I had updated the code, but didnt update the article (at least, not properly).

      cheers!

  12. Hello I have installed praw successfully in my C:/Python folder and updated the praw.ini file to include the bot info. When i run my script (located in a different dir, my cygwin dir ) I get the following error. Any ideas?

    Traceback (most recent call last):
    File “first_script.py”, line 2, in
    import praw
    ImportError: No module named praw

    1. Is Python on the path? How did you install Python? I recommend Anaconda Python, as it comes with a lot of libraries, and also adds itself to the path for you.

      1. Hey you must forgive me, I am very new to these environment variables. I can execute python files from my python directory but not in my cygwin directory, does that mean I need to add my home cygwin directory to the path? any clarification or a point in the right direction would be super helpful. I am sure I am overlooking something small.

        1. Another P.S lol I am able to run my hello world python program from that directory. I went ahead and added the cygwin directory to the path as well, I am still experiencing the same issue.

      2. Sorry for all of the replies/spam. It looks like my problem is rooted with cygwin, I have successfully ran it from the IDLE shell.

  13. Hi, I’ve used your code but I keep getting a long error message that ends in “prawcore.exceptions.OAuthException: invalid_grant error processing request”. Apparently it’s something to do with the praw.ini file but I can’t make it work. Do you recognise the error and how can I fix it?

      1. I feel stupid – I had the bot’s username rather than my Reddit account’s. I put that in and it worked fine, thank you

  14. You say to put the praw.ini file in the program’s folder and not to use the file in the package directory. If I put the file in the new folder, don’t I have to point the script to use the new .ini file?

    1. Nevermind, I found out that it checks for the praw.ini file in the script’s folder first and only uses the one in the package directory if it did not find it anywhere else.
