Hey HN!
I'm one of the two developers behind Hackerhunt. As much as I love Hacker News and its ranking algorithm for front-page news, it has a downside for Show HN submissions. A lot of cool and useful stuff people have actually made themselves gets lost in /shownew without a real chance of reaching the right audience. That's where the idea of a curated and categorised list, à la Product Hunt, was born.
This is a very early proof of concept and any suggestions on how to make it better are welcome!
Yes, we're analyzing the titles. They're vectorized and fed into an LSTM net that was trained on a manually tagged training set.
The set is not very big yet but yields good enough results for the initial proof of concept.
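To give a rough idea of what "vectorized" means here, a minimal sketch of turning a title into a fixed-length sequence of integer ids that a sequence model could consume. The tokenizer, the vocabulary scheme, and the padding length are illustrative assumptions, not the production pipeline.

```python
import re

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary words


def tokenize(title):
    # Lowercase and keep alphanumeric runs (hypothetical tokenizer).
    return re.findall(r"[a-z0-9]+", title.lower())


def build_vocab(titles):
    # Give each distinct word an integer id, in order of first appearance.
    vocab = {}
    for title in titles:
        for word in tokenize(title):
            vocab.setdefault(word, len(vocab) + 2)
    return vocab


def vectorize(title, vocab, max_len=8):
    # Map words to ids, truncate to max_len, then right-pad with PAD.
    ids = [vocab.get(word, UNK) for word in tokenize(title)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


titles = ["Show HN: A tiny Redis client", "Show HN: Redis dashboard"]
vocab = build_vocab(titles)
print(vectorize("Show HN: Redis for cats", vocab))
```

Words that never appeared in the tagged set all collapse to the same unknown id, which is one reason a small training set limits accuracy.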
The samples were selected randomly at the beginning. Tags were thought up on the go, trying to generalize into broader categories.
After the initial tagging, we counted the number of samples for each category and grouped similar underrepresented tags together. We also added more samples for the smaller categories by filtering for titles that matched specific keywords and then hand-picking from those.
We initially tried training the classifier only on GitHub-based samples, using the user-given tags from there. Although we grouped the tag base into a reasonable number of distinct categories, the way GitHub users tag their projects turned out to be too inconsistent and often unrelated to the titles, so manual tagging seemed the better option for getting decent results fast enough.
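The counting and grouping step could be sketched roughly like this. The threshold, the tag names, and the grouping map are made up for illustration; in practice the grouping and the final selection were done by hand.

```python
from collections import Counter


def merge_rare_tags(labels, merge_map, min_count):
    # Count samples per tag, then remap tags below min_count to a broader group.
    counts = Counter(labels)
    return [
        merge_map.get(tag, tag) if counts[tag] < min_count else tag
        for tag in labels
    ]


def keyword_candidates(titles, keywords):
    # Pre-filter untagged titles for manual review (hand-picking happens after).
    return [t for t in titles if any(k in t.lower() for k in keywords)]


labels = ["web", "web", "web", "ios", "android", "web", "ios"]
merge_map = {"ios": "mobile", "android": "mobile"}  # hypothetical grouping
merged = merge_rare_tags(labels, merge_map, min_count=3)
print(Counter(merged))

untagged = ["Show HN: An Android launcher", "Show HN: A static site generator"]
print(keyword_candidates(untagged, ["android", "ios"]))
```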
If you have any more specific questions, feel free to drop me an email at arturs@finch.io
For those of us in industries at the fringes (but still of interest) to HN, want to have an 'Other' category? :-) We make scientific/genetic software - and I'm not sure where we might have fit in your sidebar.
Getting to the front page via 'Show HN' was very helpful to us. It'd be nice (for others too) to be able to both replicate that success, and soften the blow when you get a grand total of 2 upvotes.
This is very useful. As a frequent HN user, I find myself strolling down the Show HN tab quite often, and the current UI doesn't let you go more than 3 pages deep (posts ~5-6 days old).
Actually, you can only be fined by the EU if you have a dominant position in the market you are using to promote your service. The goal is to avoid monopolies jumping from one market to the other (which is particularly easy to do in tech). "Good" monopolistic companies may benefit the customer in the short term, but proper competition benefits everyone in the long term.
I know it was just a joke, but it really doesn't apply to this case.
Hey, thanks! It's a bit of deep learning magic combined with a few days of manual labor tagging the training set :)
The classifier itself is an LSTM and runs on TensorFlow. As said, this is a proof of concept so we'll try to improve it over time.
I have a similar classification project and I used word2vec to build embeddings from 1GB+ of text, then I just do vector similarity between the article and the topics.
The vector of an article can be obtained by summing the vectors of its words (minus stop words). For a topic you just sum up 5-10 of the topic keywords. You don't need to exhaustively list all the topic keywords because word2vec automatically maps them in close vicinity.
This system has the advantage that you don't need a training dataset. It's unsupervised learning coupled with a small amount of supervised topic pointers.
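A minimal sketch of that idea, with made-up 3-dimensional vectors standing in for real word2vec embeddings. Cosine similarity is an assumption for "vector similarity", and the topic names, keywords, and stop-word list are hypothetical.

```python
import math

# Toy embeddings standing in for vectors learned by word2vec.
EMBEDDINGS = {
    "python": [0.9, 0.1, 0.0],
    "compiler": [0.8, 0.2, 0.1],
    "library": [0.7, 0.3, 0.0],
    "guitar": [0.0, 0.1, 0.9],
    "chords": [0.1, 0.0, 0.8],
    "song": [0.0, 0.2, 0.9],
}
STOP_WORDS = {"a", "the", "for", "of", "and"}

# A handful of keywords per topic acts as the "topic pointer".
TOPICS = {
    "programming": ["python", "compiler", "library"],
    "music": ["guitar", "chords", "song"],
}


def sum_vector(words):
    # Sum the vectors of all known, non-stop words.
    total = [0.0, 0.0, 0.0]
    for word in words:
        if word in STOP_WORDS or word not in EMBEDDINGS:
            continue
        total = [t + x for t, x in zip(total, EMBEDDINGS[word])]
    return total


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text):
    # Pick the topic whose summed keyword vector is closest to the article's.
    article = sum_vector(text.lower().split())
    return max(TOPICS, key=lambda t: cosine(article, sum_vector(TOPICS[t])))


print(classify("a python library for the compiler"))
print(classify("the song of guitar chords"))
```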
Sorry, but did you really have to use HH? I'm German and that only sets off my right-wing extremist alarm bells, because they have used it for "Heil Hitler" (fuck them and him) for generations.
That's a personal problem. You could try reprogramming your brain to associate it with Helly Hansen or Hansestadt Hamburg (where HH is the license plate).
If anything, you should be grateful to see other people use "HH" for things completely and utterly unrelated to extremists.
I'm German, too. Lived for 10 years in Hamburg. So my association with HH is mainly with the car licence plate. If the site were called Hackerhunt88, then that would be a different thing ;)
Well, it's not really practical to never reuse some initials lest some of the 180+ countries in the world (or worse, some particular sub-group within a country/culture) have some history with them in a totally unrelated context.
I find these kinds of post-factum concerns funny as well (same with banning Mein Kampf etc today) -- it was 1933-1945 when the German people should have been really concerned about Hitler, not 2017. The new threats of today don't need Mein Kampf or the freedom to use "HH" initials -- they can make their own stuff even if they care for the Nazis, and usually they have their own, 2017 names and agendas to use anyway. So all it does is reduce some historical guilt.
How are the topics assigned? Two Show HN posts I have posted recently (that are Open-Source and Javascript) don't have the relevant tags attached and I can't see a way of attaching them.
The tags are added fully automatically and there is no option to add tags manually for now. That being said, we really want to make the tagging algorithm better, so hit me up at human@finch.io with your case.
Initially we used all Show HN posts pointing to GitHub repos as training data, together with the tags available on github.com, but that returned pretty noisy data.
Nice work! I just noticed a bug, though: trying to go to the next page of system software sorted by votes doesn't work. Instead of going to the next page, the first page is reloaded with /NaN appended to the URL, as such:
Yes, Hacker Hunt indexes all Show HN submissions, including those that never make it to the main page. Actually - that was the whole reason to make Hacker Hunt happen.
Really cool site. I recently asked HN[1] why the free alternative to PH died and found this postmortem thread[2].
I think it really boils down to making sure that people coming to your site find something new and creative all the time, to help turn lurkers or one-time visitors into repeat visitors. I think PH does that quite well with their podcasts, daily digests, Twitter updates (they feel forced, but they do work), etc. Also, you're building a community site, so if the traffic dies in a month, keep at it; it looks like sites like these take many years to gain that traction. Basically I think you have something really great going here, just make sure to focus on bringing the visitors back and you will definitely have a winner!
If you can nail down the categorization, get some more historical stuff, and maximize that newsletter or just suggestions, this should be a great utility.
1. Do you have a way to receive bug reports other than HN? :)
2. After doing a search, the left category menu disappears, and stays disappeared even after clicking the HH "home" link at top left. This is true for FF and Chromium latest-ish on LinuxMint.
There are two possible bugs here:
a) whether you actually want the left menu to disappear, and b) what your intent is for clicking the top-left "HH".
- Go to HH.
- Search for something. Results appear as you type, nice. No indication from browser that a new page is loading; guessing no load by design. But left menu disappears.
- Manually erase search bar. Menu back.
- Type out a search again, menu disappears.
- Click "HH" at top left. Browser indicates a page is loading, but the search is not erased and (therefore?) the menu is still missing.
- Re-enter HH either by typing the URL into the browser location and clicking "make it so", or by clicking in from another site (like HN). Search field is empty, therefore the left menu is available.
EDIT: This was going to be a separate bug, but I think it's related to the above.
Scrollbar behavior is buggy.
- Clear site cookies. ("It's the only way to be sure.")
- Don't click anything, just move the mouse around and scroll, with mousewheel or dragging scrollbar. Scrollbar intact, entire page scrolls.
- Click in search field, don't type anything. Scrollbar disappears, mousewheel scrolling has no effect, regardless of where the mouse hovers. Entire page jumps slightly to right, appearing to "chase" the disappeared scrollbar.
- Type something in search field that gets results. Scrollbar returns, top of scrollbar is even with bottom of search field, page does not jump back; I'm guessing this is "your" scrollbar rather than the browser's scrollbar. Mousewheel only has effect if mouse is hovered below the search field, in the region where the scrollbar exists.
- Click on any non-active area outside the search field. Search field jumps left very slightly. Scrollbar is back to full length (browser's scrollbar?), but there are now two separate scrolling areas:
  - Hover mouse at or above search field level. The entire original front page, including the missing menu and the default "Today" list of sites, scrolls up into the area from viewport top to bottom level of search field (which also scrolls up and away with the rest of the page). Search results do not scroll.
  - Hover mouse below the search field, mousewheel scrolls the search results, phantom page at top of viewport does not scroll.
  - Drag the scrollbar, the "top" scroll area scrolls.
Are there/will there be categories for general products? I like seeing the new business ideas that come through every now and then that a developer worked on.
That's really cool! Thanks for making this website, it's absolutely useful, and it might save a lot of projects. I can tell because I have posted on Show HN myself and my submission never made it past /shownew. Probably because it was not that interesting to HN's audience, but I can imagine how many really cool projects end up buried in there.
Those stories would get onto /show and also the front page if they got more upvotes, so it is indeed a curation problem. If you're willing to do the work of rescuing good submissions that the rest of us missed, that's great! I wonder if we could integrate that back into HN somehow.