Thank you for this tool! I think it's a cool idea. It's also private, since it doesn't send any requests.
What I would love is the actual predicted number of points (I don't mind that the error would be quite large from the title alone, as long as the model isn't overfit).
I used the same dataset for a small project during my CS master's. It was a really fun challenge, and it taught me a lot.
Most notably, it taught me that it was incredibly hard to make significant progress past the simplest, most naive approach: "Take the average rating a user gives, take the average rating a movie gets, multiply" (ratings normalized to be between 0 and 1).
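For the curious, here's a minimal sketch of that baseline in Python; the data and column names are made up for illustration:

```python
# Naive baseline: predict(user, movie) = avg(user's ratings) * avg(movie's ratings),
# with ratings normalized to [0, 1]. Data and column names are hypothetical.
import pandas as pd

ratings = pd.DataFrame({
    "user":   [1, 1, 2, 2, 3],
    "movie":  ["A", "B", "A", "C", "B"],
    "rating": [5, 3, 4, 2, 1],
})
ratings["rating"] = (ratings["rating"] - 1) / 4  # normalize 1-5 stars to [0, 1]

user_avg = ratings.groupby("user")["rating"].mean()
movie_avg = ratings.groupby("movie")["rating"].mean()

def predict(user, movie):
    # Multiply the user's average rating by the movie's average rating.
    return user_avg[user] * movie_avg[movie]

print(predict(1, "A"))
```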
Just using this method gave us 95% of the accuracy of our final method. If I remember my calculation right, our method was ~90% as accurate as the prize-winning result.
It's actually very easy to measure. The audience here is close to the 1%, so with a 2% error margin, your use case is already discarded in the grand scheme when measuring behavior for the larger population.
I found that surprisingly useful. I'd be interested in developing that scale a little further, since it's kind of weird that everything has different weights. Take point 7, for example ("The current code base can't be used for anything", scored 1-20): what's the difference between, say, a 15 and a 16? It seems very subjective, so maybe everything could go on a scale of 1 to 4 to make it easier to choose, and then you'd apply some multiplicative weighting for each issue...
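Something like this, purely as a hypothetical sketch (the issue names and weights here are invented, not taken from the actual scale):

```python
# Hypothetical sketch: rate each issue on a coarse 1-4 severity scale, then
# apply a fixed per-issue multiplicative weight, instead of asking raters to
# pick a point on a 1-20 scale directly.
issues = {
    "codebase_unusable": {"severity": 3, "weight": 5.0},  # replaces the 1-20 item
    "missing_tests":     {"severity": 2, "weight": 2.0},
    "poor_naming":       {"severity": 4, "weight": 1.0},
}

total = sum(v["severity"] * v["weight"] for v in issues.values())
print(total)  # 3*5 + 2*2 + 4*1 = 23.0
```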
Relevancy - I'd guess it's difficult to measure, due to the large dataset, the large number of possible search terms, and the large number of possible results.
It is closer to fair sampling than you think, since the average internet user is more likely to install toolbars than the technogeek niches we hang out in. Also, it's more useful for relative changes, like drops and increases, than for absolute percentages.
Also, it's one of the few samples we have, unfortunately.
Does anyone have details on how they converted a multidimensional problem (size of file + user-perceived quality) into a binary win/loss score? It felt like that part of the article jumped from "draw an oval" to "draw the rest of the fucking owl" real quick.
Also, how they evaluated user-perceived quality doesn't seem to be elaborated; that itself was an area of active research last I looked.
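Absent details in the article, here's one guess at how you could collapse the two axes into a win/loss. The Pareto-dominance rule and the weight are my invention, not anything from the article:

```python
# Hypothetical: collapse (file size, perceived quality) into win/loss.
# Declare a win on Pareto dominance; otherwise trade the axes off with an
# arbitrary weight. Nothing here is the article's actual method.
def outcome(a, b, quality_weight=0.7):
    # a, b: (size_mb, quality) pairs; smaller size and higher quality are better.
    size_a, qual_a = a
    size_b, qual_b = b
    if size_a <= size_b and qual_a >= qual_b and a != b:
        return "win"   # a is at least as good on both axes, strictly better somewhere
    if size_b <= size_a and qual_b >= qual_a and a != b:
        return "loss"  # b Pareto-dominates a
    # Mixed case: scalarize with a made-up weight.
    score = lambda size, qual: quality_weight * qual - (1 - quality_weight) * size / 100.0
    return "win" if score(size_a, qual_a) >= score(size_b, qual_b) else "loss"

print(outcome((4.0, 0.92), (5.0, 0.90)))  # "win": smaller file AND higher quality
```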
This is interesting, but in practice the quality of results is paramount, and how does something like searx compare against DDG… or Google? I wonder what kind of metric could be defined to even plot that.
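One candidate (my suggestion, nothing official): collect graded human relevance judgments for the same queries on each engine and plot NDCG per engine. A minimal sketch:

```python
# NDCG@k over graded relevance judgments (0-3) for one query on one engine.
# You'd average this over many queries and plot one curve per search engine.
import math

def ndcg(relevances, k=10):
    # DCG discounts each result's judged relevance by its rank.
    dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))
    # Normalize by the DCG of the ideal (best-possible) ordering.
    ideal = sum(rel / math.log2(rank + 2)
                for rank, rel in enumerate(sorted(relevances, reverse=True)[:k]))
    return dcg / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 0, 1]))  # one query's score, judgments in rank order
```

The hard (and expensive) part is the judging itself, not the math, which is probably why nobody publishes such a plot.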
52% correct seems awfully high - I'm guessing it's inflated by how easy the questions on StackOverflow are. In my experience, at least in my domain, I'll be lucky if it can generate an answer that is more than 5 lines of code without a single error. Still incredibly useful - you just have to handhold it and watch it very carefully when working through a solution.
I found that to be one of the more interesting metrics shown - but I have to agree with everyone else in wondering how it's calculated. That said, anecdotally, it seems to jibe with experience, but who knows.