
But doesn't that just mean that closing duplicates is irrelevant? You're already getting a never-ending stream of the same basic low-quality submissions. So the two possibilities are: filter them, or go away.



It would only be useful to me if it removed duplicates. Otherwise, what's the point? You just end up seeing the same thing twice.

It would be nice if 5x duplicates didn't show up in the RSS feed.

By deduplication, do you mean that you want articles on the same subject to be filtered out or do you just want the same article not to be displayed multiple times?

> By deduplication, do you mean that you want articles on the same subject to be filtered out or do you just want the same article not to be displayed multiple times?

I just want the same article not displayed multiple times. For example, if you subscribe to multiple feeds from a newspaper, some articles which fit more than one category will appear multiple times.

Articles on the same subject would be handled by 'grouping', according to what I wrote above.
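
As a sketch of that per-article dedup (not any particular reader's implementation), assuming the feed entries are already parsed into plain dicts with optional guid/link fields:

    # Drop entries that point at the same article, keyed on GUID when present,
    # otherwise on a normalized link. Assumes entries are dicts with optional
    # "guid" and "link" keys; the field names are illustrative.
    from urllib.parse import urlsplit, urlunsplit

    def canonical_key(entry: dict) -> str:
        guid = entry.get("guid")
        if guid:
            return guid
        # Fall back to the link with query string and fragment stripped,
        # so tracking parameters don't defeat the comparison.
        parts = urlsplit(entry.get("link", ""))
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

    def dedupe_entries(entries: list[dict]) -> list[dict]:
        seen: set[str] = set()
        unique = []
        for entry in entries:
            key = canonical_key(entry)
            if key not in seen:
                seen.add(key)
                unique.append(entry)
        return unique

Grouping same-subject articles would then be a separate, fuzzier pass (title similarity or the like) on top of this exact-identity check.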


Meh. It's just a method of filtration.

When you can only tackle so many things, there is zero point in having a bug tracker that will only inflate. With little in the way of automatic triage to determine priority, it is a borderline impossible task to wade through them.

Allowing duplicates is fine; the important stuff comes up again whilst the unimportant bits die off.


One of the major problems is removing duplicate or near-duplicate content like images, text, etc.

During search, we do remove duplicates. It's not a bad idea, though, and I'll see how we can support it.
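
For the near-duplicate text case, a common baseline is word shingles compared with Jaccard similarity; a minimal sketch, with the threshold picked purely for illustration (images would need something different, such as perceptual hashing):

    # Flag two documents as near-duplicates when their word-shingle sets
    # overlap heavily (Jaccard similarity above a threshold).
    def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

    def jaccard(a: set, b: set) -> float:
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    def near_duplicate(doc1: str, doc2: str, threshold: float = 0.8) -> bool:
        return jaccard(shingles(doc1), shingles(doc2)) >= threshold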

While I'm inclined to agree, the duplicates need to be easily identifiable, and preferably filterable by quality, for bulk downloads.

I also don't understand why they're doing it. On my long-running full history, duplicates cost me a factor of 2 in space, and presumably the same in search time (though it's still perceptually instant).

Without questioning this line of thought, it seems like deduplicating by lowercasing and perhaps removing dots is a good choice, but stripping +suffixes seems likely to generate more user annoyance than it prevents. If I filter based on those suffixes and you send me mail and strip the suffix, I'm going to be pissed.
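
A sketch of the normalization being debated, with dot removal treated as a Gmail-style assumption (not all providers ignore dots) and the contested +suffix stripping left as an explicit opt-in:

    # Normalize an email address for deduplication. Dot removal in the local
    # part is a Gmail-style convention and not universally valid; +suffix
    # stripping is optional because, as noted above, it can break user filters.
    def normalize_email(address: str, strip_plus_suffix: bool = False) -> str:
        local, _, domain = address.strip().lower().rpartition("@")
        if strip_plus_suffix:
            local = local.split("+", 1)[0]
        local = local.replace(".", "")
        return f"{local}@{domain}"

With strip_plus_suffix=False, "John.Doe+news@Example.com" becomes "johndoe+news@example.com"; only the opt-in collapses it to "johndoe@example.com".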

No, networks are dumb. They do not detect duplicates; that would require vast storage. Unless you have a TCP (e.g. HTTP) proxy on the route, you will be the one filtering the duplicates.
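
If the filtering does land on your side of the wire, the simplest version is to hash each payload as it arrives and drop repeats; a minimal in-memory sketch (the growing hash set is exactly the storage cost the network itself won't pay):

    import hashlib

    # Drop payloads whose content hash has already been seen. Keeping every
    # hash around is precisely the state the network refuses to hold, which
    # is why this lives at the endpoint or at a proxy.
    class DuplicateFilter:
        def __init__(self) -> None:
            self._seen: set[str] = set()

        def is_new(self, payload: bytes) -> bool:
            digest = hashlib.sha256(payload).hexdigest()
            if digest in self._seen:
                return False
            self._seen.add(digest)
            return True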

I'm seeing a fair few duplicates in the results; I probably need to work on the algorithm for filtering these out.

We're still testing this out with real data, but it looks like it's actually quite useful to have duplicates in the database. The key is how to return results to someone coming along later. We're working on that now.
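
One way to keep duplicates in the database while sparing later readers is to collapse on a dedup key at read time; a rough sketch, with the record shape and field names invented for illustration:

    from itertools import groupby
    from operator import itemgetter

    # Keep all raw records, but when serving results, return only the most
    # recent record per dedup key. The "key" and "updated_at" fields are
    # illustrative, not a real schema.
    def latest_per_key(records: list[dict]) -> list[dict]:
        ordered = sorted(records, key=itemgetter("key", "updated_at"))
        return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter("key"))]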

We committed upfront to not letting the site become overrun with "Yahoo Answers" style duplicated/low quality stuff. We'd much rather delete useless stuff than get an extra page view or two.


With a single stream, everything is contaminated.

And then many of the categories simply aren't worth anything, sorted or not.


+100

Which is exactly why static analysis tools that force you to do something need to be shot. Static analysis tools that inform you about a possible duplicate are totally fine. Give me an option to disable that particular instance.
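
As an illustration of per-instance opt-out, many linters accept inline suppression comments; with pylint's duplicate-code check it would look roughly like this (how well inline suppression works for that particular check has varied across pylint versions, so treat this as the shape of the idea rather than a guarantee; the function is a made-up placeholder):

    # pylint: disable=duplicate-code
    # Hypothetical module: the pragma above asks pylint not to report this
    # file's intentional similarity to code elsewhere; other checks still run.

    def load_config(path: str) -> dict:
        # Intentionally mirrors a sibling loader; the duplication is
        # acknowledged instead of being contorted away to satisfy the tool.
        with open(path, encoding="utf-8") as handle:
            return dict(line.strip().split("=", 1) for line in handle if "=" in line)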

Coincidentally, micro-services do away with such problems in many cases because the code is "separate", so analyzers and sticklers don't find the "duplicates" and you can write beautifully simple code. Unfortunately, it then has the opposite problem of leading to things like this Netflix architecture https://res.infoq.com/presentations/netflix-chaos-microservi... but for something simple like a personal blog (yes, I exaggerate - slightly).

In the end, I think the only solution is to have the right people and stay small enough to keep the right culture. That probably goes against all the company's metrics and growth goals, of course.


1. Why isn't there a duplicate filter that catches this?

2. Seems like a partial consequence of the higher churn rate modifications.


The "removing duplicates" section is the kind of thing that has me worried about technologies like this.

Are you saying this is a bad thing? Personally, I think it's a good thing that the duplicate detector is easy to get around. It allows submissions to get multiple chances.

I would recommend a user-curated feature that marks 'overlaps' as duplicates, which then get sent to a human to moderate, similar to user-curated flagging of spam or inappropriate material.
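
A rough sketch of that flag-then-moderate flow, with the names and the escalation threshold invented for illustration:

    from collections import defaultdict
    from dataclasses import dataclass, field

    # Users flag a pair of items as overlapping duplicates; once a pair has
    # enough independent flags, it is queued for a human moderator. The
    # threshold and types are illustrative only.
    @dataclass
    class DuplicateFlags:
        escalation_threshold: int = 3
        _flaggers: dict[tuple[str, str], set[str]] = field(default_factory=lambda: defaultdict(set))
        moderation_queue: list[tuple[str, str]] = field(default_factory=list)

        def flag(self, item_a: str, item_b: str, user: str) -> None:
            pair = tuple(sorted((item_a, item_b)))
            self._flaggers[pair].add(user)
            if len(self._flaggers[pair]) == self.escalation_threshold:
                self.moderation_queue.append(pair)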
