Hacker Read

Thanks! It wouldn't be so bad if there weren't so many duplicates :)



Thanks :) Duplicate entries are not allowed, but I'm still trying to figure out the best way to handle them. If you come across one, simply flag the most recently entered copy so I can delete it manually. SICP also got added a couple of times.

I'm seeing a fair few duplicates in the results, probably need to work on the algorithm for filtering these out.

This is great! My only complaint is that it comes up with lots of near duplicates. The images look like they are different frames, but the quotes they reference are the same.

You have a lot of duplicate entries there.

Yeah, I agree, that is a good idea to help clean up duplication. Thanks for the comments; let me know what else you can think of.

It shouldn't allow duplicates, I'll work on the rest.

Looks very cool! I noticed a couple of things, however. There seem to be a lot of duplicates, which clutter the search results. Also, the first thing I looked for was available, but the correct result didn't appear until the second page.

During search, we do remove duplicates. It's not a bad idea though and I'll see how we can support it

I just modified the way the queries run; duplicates should be MUCH less common now.

Looks cool, you have a lot of duplicates, though.

    $ wc -l < sorted_unique_cf.txt
    7385121

    $ uniq sorted_unique_cf.txt | wc -l
    4287625
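
A note for anyone repeating this count: `uniq` only collapses adjacent identical lines, so the second number is the true distinct count only if the file is already sorted (the filename suggests it is, but that is an assumption). Below is a minimal sketch that does not depend on input order. The helper names `count_unique` and `list_dupes` are made up for illustration and are not part of any tool mentioned in the thread.

```shell
# Hypothetical helpers; the names are illustrative only.

# count_unique FILE: number of distinct lines, regardless of input order.
count_unique() {
    # sort -u sorts and drops repeated lines in one pass;
    # tr strips the padding some wc implementations add.
    sort -u -- "$1" | wc -l | tr -d ' '
}

# list_dupes FILE: print every line that occurs more than once,
# prefixed by how many times it occurs.
list_dupes() {
    # uniq -c needs sorted input so that repeats are adjacent.
    sort -- "$1" | uniq -c | awk '$1 > 1'
}
```

Usage would look like `count_unique sorted_unique_cf.txt`. If that result differs from the plain `uniq | wc -l` figure, the file was probably not fully sorted.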

Yeah, I agree. I didn't expect such a quick influx of new bits. I definitely need a way to detect duplicates.

Nice! Haha, thanks for checking it out! I noticed an error in my Duplication template, so I think I was creating too many per user. You are most likely among the first 5 users ;p

I had duplicates; it's actually only 4,287,625 (still a lot, though).

Fixed the duplicates: https://github.com/pirate/sites-using-cloudflare/raw/master/...


Very interesting concept. I signed up; we'll see where it goes.

One thing to note, though - on the left I'm seeing a number of duplicated entries, with the duplicates immediately after the original. I'm using Firefox, if that matters.


Huh, there are a ton of duplicates in the data set... I would have expected that it would be worthwhile to remove those. Maybe multiple descriptions of the same thing help, but some of the duplicates have duplicated descriptions as well. Maybe deduplication happens after this step?

http://laion-aesthetic.datasette.io/laion-aesthetic-6pls/ima...


I just pruned all the duplicates right now. Will work on the today, last week, last month sections tonight.

We're still testing this out with real data, but it looks like it's actually quite useful to have duplicates in the database. The key is how to return results to someone coming along later. We're working on that now.

We committed upfront to not letting the site become overrun with "Yahoo Answers" style duplicated/low quality stuff. We'd much rather delete useless stuff than get an extra page view or two.


It would be nice if 5x duplicates didn't show up in the RSS feed.

Awesome concept! Thanks for building this.

One possible issue - There needs to be some kind of search for finding similar "VimBits". I wanted to add some of my favorites, but I have no way of knowing whether they're there already without reading through everything on the site.

You'll probably get a massive amount of duplication unless you can implement some kind of "suggested duplicates" on the create page, like Stack Overflow.

