We're still testing this out with real data, but it looks like it's actually quite useful to have duplicates in the database. The key is how to return results to someone coming along later. We're working on that now.
We committed upfront to not letting the site become overrun with "Yahoo Answers" style duplicated/low quality stuff. We'd much rather delete useless stuff than get an extra page view or two.
Huh, there's a ton of duplicates in the data set... I would have expected that it would be worthwhile to remove those. Maybe multiple descriptions of the same thing help, but some of the duplicates have duplicated descriptions as well. Maybe deduplication happens after this step?
Looks very cool! A couple of things I noticed, however. There seem to be a lot of duplicates, which clutter the search results. Also, the first thing I looked for was available, but the correct result didn't appear until the second page.
Thanks :)
Duplicate entries are not allowed, but I'm still trying to figure out the best way to handle them. If you come across one, simply flag the most recently entered copy so I can manually delete it. SICP also got added a couple of times.
Very interesting concept. I signed up; we'll see where it goes.
One thing to note, though - on the left I'm seeing a number of duplicated entries, with the duplicates immediately after the original. I'm using Firefox, if that matters.
I notice a surprising number of duplicates. E.g. if I sort by aesthetic, there’s the same 500x500 Tuscan village painting multiple times on the first page of results.
Presumably it wouldn’t be so hard to hash the images and filter out repeats. Is the idea to keep the duplicates to preserve the description mappings?
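To make the hashing suggestion concrete, here is a minimal sketch, assuming each record is a pair of raw image bytes and a description (that record shape and the name `dedupe_images` are made up for illustration, not taken from the project). It collapses byte-identical images into one entry while keeping every distinct description, so the description mappings survive. Note that an exact hash like SHA-256 only catches byte-for-byte copies; resized or re-encoded versions of the same painting would need a perceptual hash instead.

```python
import hashlib


def dedupe_images(records):
    """Collapse byte-identical images, keeping all distinct descriptions.

    `records` is an iterable of (image_bytes, description) pairs.
    This shape is an assumption for the sketch, not the real schema.
    """
    merged = {}  # digest -> entry; dicts preserve insertion order
    for image_bytes, description in records:
        digest = hashlib.sha256(image_bytes).hexdigest()
        entry = merged.setdefault(
            digest, {"image": image_bytes, "descriptions": []}
        )
        # Skip descriptions that are themselves duplicated.
        if description not in entry["descriptions"]:
            entry["descriptions"].append(description)
    return list(merged.values())
```

Sorting or ranking would then run over the merged entries, so each image appears once with all of its descriptions attached.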
Duplicate removal generally happens after the initial count is received. That's why you can get major discrepancies between the count and the actual number of results.
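A toy sketch of why the two numbers drift apart, assuming hits are dicts with a `url` field and that the count is taken before filtering (both are assumptions for illustration, not a description of any particular search engine):

```python
def search(raw_hits, key=lambda hit: hit["url"]):
    """Return (estimated_count, deduplicated_results).

    The count is taken from the raw hits before duplicates are
    dropped, so it can be larger than len(results).
    """
    estimated_count = len(raw_hits)  # reported up front
    seen = set()
    results = []
    for hit in raw_hits:
        k = key(hit)
        if k in seen:
            continue  # duplicate: dropped only after counting
        seen.add(k)
        results.append(hit)
    return estimated_count, results
```

With three raw hits of which two share a URL, this reports a count of 3 but returns only 2 results.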
Cool idea, but there seem to be a few duplicates for Java. When I search 'Java', I see 3-4 duplicate results; only the first one has any companies listed, and the rest seem to be empty.