Hacker Read

I know. GP's solution was to point them to a URL which you control, with a suggestion that FN owners would go as far as to scrape these.

Edit: they are obviously already filtering the topics themselves.




Thank you for mentioning this. It's so convenient to manage that kind of filtering through a service. Works for all platforms too. I can finally break my reddit addiction again.

Isn't that solved by using the more specific filter site:reddit.com?

Seems like reddit has some thoughts on how to handle the huge pile of information that is 'out there'. As an ardent reddit lurker, this was the most needed feature of reddit.

If this solves the problem where their results are useless unless you add "reddit", it's the best thing they could be doing.

Absolutely, I loved seeing spikes in my traffic and going to the reddit/HN thread that caused it. I guess you can still figure it out manually using a search engine, but like you said, the smaller weird ones will probably slip under the radar.

Reddit does have some subreddits that deal with this. And Nuuton will have that functionality once it gets out of ALPHA (in about 6 months).

Yeah, that was a key design decision from the start. All posts and comments made on the platform are indexable by search engines.

The https://reddit.com/r/datahoarder community is working on it

My current solution for this is to just prepend `site:reddit.com` to Google searches. A Google search for `site:reddit.com best miter saw` has a lot of relevant results.

Marketers/SEO people are starting to infiltrate this as well, but since they can't control and SEO the content on Reddit nearly as much, this still works pretty well for now.
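
For anyone who wants to script the `site:` trick, here is a minimal sketch that only builds the search URL. The function name `site_search_url` and the `site` parameter are made up for illustration; the only thing taken from the comment above is the `site:reddit.com <query>` form.

```python
from urllib.parse import urlencode


def site_search_url(query: str, site: str = "reddit.com") -> str:
    """Build a Google search URL restricted to one site (hypothetical helper)."""
    # The site: operator goes in front of the query, as in the comment above.
    full_query = f"site:{site} {query}"
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": full_query})


print(site_search_url("best miter saw"))
# https://www.google.com/search?q=site%3Areddit.com+best+miter+saw
```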


I crawled reddit for several topics.

It's supported through their api.


Great idea! I immediately thought "Why didn't I think of that!?"

With regard to the privacy concerns of Research mode, there may be a solution. For sites like Reddit, it should be possible to build a bloom filter. Have the metafruit server actively spider Reddit for new, popular threads and add them to a bloom filter. The plugin would download the bloom filter from the metafruit server at some regular interval. That way, checking whether any particular URL has an associated conversation is just a local operation. Plus, it's faster than pinging an API, and burns less of the target API's resources.

That would also provide a way to monetize, by giving out the metafruit bloom filter to subscribers only. Or perhaps the free plugin can update its bloom filter once a day, but subscribers can update once per hour.
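
To make the bloom filter idea concrete, here is a rough sketch of what the local lookup could look like. It is not metafruit's implementation: the `BloomFilter` class, the SHA-256 double hashing, and the 1% false-positive target are all assumptions chosen for illustration. The server would fill the filter while spidering and ship the `bits` bytes to the plugin.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; sizing and hashing scheme are assumptions."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = max(8, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Server side: add every URL that has a discussion thread.
threads = BloomFilter(expected_items=1000)
threads.add("https://example.com/some-article")

# Plugin side: a purely local check, no request leaves the browser.
print("https://example.com/some-article" in threads)  # True
```

A hit only means "probably discussed", so the plugin would still need to confirm with the real API before showing anything, but a miss never needs a network call, which is where the privacy gain comes from.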


I'd like to be able to enter a URL and see every place on the internet that people have talked about it.

Reddit is pretty well indexed.


Just append .json to any Reddit URL and you'll get a full dump of that page. We'll see if they get rid of this feature as well. Way easier than scraping.
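
A small sketch of the `.json` trick, limited to building the URL. The helper name `reddit_json_url` is hypothetical, and dropping the trailing slash before appending is my own assumption about the cleanest form. Actually fetching the result is left out on purpose.

```python
from urllib.parse import urlsplit, urlunsplit


def reddit_json_url(page_url: str) -> str:
    """Turn a Reddit page URL into its .json variant (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Drop any trailing slash, then append .json to the path only,
    # so an existing query string stays where it belongs.
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


print(reddit_json_url("https://www.reddit.com/r/datahoarder/"))
# https://www.reddit.com/r/datahoarder.json
```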

Something like what keybase does with e.g. reddit should work.

How do you ensure that Google is able to index it? Do you sniff the user-agent and serve it a pre-rendered view of the discussion thread?

You can already simply click a domain on reddit and it shows you all the posts that linked it.

I wonder to what extent this is an early step toward walling off Reddit's content from third-party search engines. They probably recognize that without a functional search they'd take a big hit in such a scenario, but with search under their own control they can better influence how people end up seeing various content.
