Hacker Read

NicoJuicy · 2021-01-11 15:54:38+00:00

I crawled reddit across several topics.

It's supported through their api.

toyg | karma 35240 | avg karma 2.97 · | 2020-04-09 16:13:49+00:00

Probably with the official Reddit API? There are several libraries for it.

diminoten | karma 1949 | avg karma 0.55 · | 2019-10-13 22:17:15+00:00

Yeah no need to scrape Reddit, their content is accessible via their API.

gitgud | karma 5259 | avg karma 2.23 · | 2018-08-21 03:26:27

This is such a cool idea! Does it scrape reddit, or use the API?

TazeTSchnitzel | karma 26116 | avg karma 2.81 · | 2012-10-27 00:02:59+00:00

Isn't that the non-public internal reddit API? They have proper APIs for other things, I think.

marginalia_nu | karma 21123 | avg karma 4.08 · | 2023-02-03 17:16:32

Data is available for at least the first three of those (reddit isn't "officially" available but there are 3rd party dumps).

Why not load it into OpenSearch or some such?

constantly | karma 1089 | avg karma 3.57 · | 2023-06-14 13:13:51

Not to worry, you can still search google/etc., which scrapes Reddit, or Reddit itself through its native search. No need to use an API explicitly to access content like that.

pixelatedindex | karma 508 | avg karma 2.08 · | 2023-06-13 11:55:01

Reddit's APIs are not user-based from what I can tell.

bshipp | karma 1738 | avg karma 4.26 · | 2021-12-16 18:39:26

But it's not really necessary for Reddit. Their API is fairly robust and there are numerous options for scraping the site.

O_H_E | karma 1534 | avg karma 1.84 · | 2019-01-27 04:30:48+00:00

Reddit has a public API; that's how all these unofficial clients work. Don't know for sure if this site is using it. (It probably is.)

qwerty456127 | karma 8748 | avg karma 1.93 · | 2019-12-08 22:12:22+00:00

By the way, isn't there an open-source desktop-native client for reading (and searching through) Reddit?

nandemo | karma 4159 | avg karma 2.37 · | 2017-06-07 08:09:40+00:00

This doesn't address your question directly, but are you aware that Reddit provides an API? Why not use it instead of scraping?

tumult | karma 4430 | avg karma 4.94 · | 2023-05-31 14:53:14

Reddit is a website. Just make normal browser requests. You don't have to use their "sanctioned" "API."

whb07 | karma 764 | avg karma 1.23 · | 2020-03-02 13:06:32+00:00

reddit has a JSON api you can access with your credentials.

rezashirazian | karma 1033 | avg karma 3.12 · | 2016-09-16 15:03:59+00:00

Not particularly. One thing that helps, which most people don't know about reddit, is that adding .json to the end of a URL returns the content of that page in JSON format.

for example: reddit.com/r/funny.json

This makes crawling/fetching content from reddit much simpler than old-school web crawling.
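A minimal sketch of the .json trick in Python, using only the standard library. The helper names (`to_json_url`, `extract_titles`, `fetch_listing`) and the User-Agent string are made up for illustration. The listing layout (`data` → `children` → `data` → `title`) is what reddit's listing responses look like, but treat it as an assumption and check a real response first. Reddit may also rate-limit or refuse unauthenticated requests, so the fetch step might not work as-is.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def to_json_url(page_url: str) -> str:
    """Turn a reddit page URL into its .json variant (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Drop a trailing slash so "/r/funny/" becomes "/r/funny.json".
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def extract_titles(listing: dict) -> list:
    """Collect post titles from a parsed listing (assumed layout, see above)."""
    return [
        child["data"]["title"]
        for child in listing["data"]["children"]
        if child.get("kind") == "t3"  # "t3" marks a link/post in a listing
    ]


def fetch_listing(page_url: str, user_agent: str = "json-suffix-demo/0.1") -> dict:
    """Fetch and parse the JSON version of a page. Performs a network request."""
    request = Request(to_json_url(page_url), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return json.load(response)
```

Usage would be something like `extract_titles(fetch_listing("https://www.reddit.com/r/funny"))`, with a descriptive User-Agent of your own and within whatever the site's terms allow.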

citricsquid | karma 14884 | avg karma 6.97 · | 2012-06-22 20:37:13+00:00

They are fine with it as long as you abide by their terms. They have a subreddit dedicated to reddit development and the reddit API, which includes discussion of scraping: http://www.reddit.com/r/redditdev

masklinn | karma 65147 | avg karma 3.36 · | 2009-09-07 16:56:42+00:00

There's an open-source Reddit client out there, if yc provided some kind of API (you don't want to do page scraping on the iphone) it would probably be pretty easy to adapt it.

stuck_in_the_ma | karma 24 | avg karma 2.4 · | 2018-07-31 01:58:59

I built redditsearch.io -- It uses the pushshift API for the back-end. It was thrown together and barely works but hey, I'm just one guy maintaining this as a labor of love. :)

jwcrux | karma 2268 | avg karma 7.32 · | 2014-08-11 16:59:10+00:00

Why not use PRAW? It's a very mature, useful library built on the Reddit API.

ok123456 | karma 2762 | avg karma 2.22 · | 2023-04-19 10:47:48

The reddit api was never very good. It's easier just to scrape the site.