Show HN: Hackerhunt – categorised curation of Show HN submissions (hackerhunt.co)
335 points by degif | karma 318 | avg karma 4.24 | 2017-06-28 07:05:48+00:00 | 69 comments




Hey HN! I'm one of the two developers behind Hackerhunt. As much as I love Hacker News and its ranking algorithm for the front page, it has a downside for Show HN submissions: a lot of cool and useful stuff that people have actually made themselves gets lost in /shownew without a real chance to reach the right audience. That's how the idea of a curated and categorised list, à la Product Hunt, was born.

This is a very early proof of concept and any suggestions on how to make it better are welcome!


This is great. Is search broken, or is it just me?

Yep, it was a bit buggy. Fixed. Thanks!

If no content is available on a paginated page, it outputs some of your code. Check this link: https://hackerhunt.co/topic/blockchain/trending/200000

It's not a bug, it's a feature!

It's a pretty neat way to handle empty conversions IMO.

Just remove the unnecessary semi-colon at the end of the code ;-)


That had to be done asap! Semicolon removed!

I think that's a joke. Actual code output probably wouldn't feature syntax highlighting.

Does it use NLP (NER, etc) to categorize the news into topics?

Yes, we're analyzing the titles. They're vectorized and fed into an LSTM net, which was trained on a manually tagged training set. The set is not very big yet, but it yields good enough results for the initial proof of concept.
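The comment doesn't say exactly how titles are vectorized. A minimal sketch of one common preprocessing step, assuming a simple lowercase word tokenizer and an integer vocabulary (the helpers `tokenize`, `build_vocab` and `vectorize_title` are hypothetical), turns each title into a fixed-length sequence of token ids that an LSTM's embedding layer could consume:

```python
import re

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary tokens


def tokenize(title):
    """Lowercase a title and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", title.lower())


def build_vocab(titles, min_count=1):
    """Assign an integer id to every token seen at least min_count times."""
    counts = {}
    for title in titles:
        for tok in tokenize(title):
            counts[tok] = counts.get(tok, 0) + 1
    vocab = {}
    for tok in sorted(counts):  # sorted so ids are deterministic
        if counts[tok] >= min_count:
            vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def vectorize_title(title, vocab, max_len=12):
    """Map a title to exactly max_len ids, truncating or right-padding with PAD."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(title)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


titles = ["Show HN: A Rust library for parsing JSON",
          "Show HN: Blockchain explorer written in Go"]
vocab = build_vocab(titles)
print(vectorize_title("Show HN: A Go library", vocab, max_len=8))
```

Padding to a fixed length keeps batches rectangular for the network; a real TensorFlow pipeline would more likely rely on the framework's own text preprocessing utilities than on hand-rolled helpers like these.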

How did you go about selecting a sample for manual tagging, and how did you decide what tags to end up using?

The samples were selected randomly at first. Tags were thought up on the go, trying to generalize into broader categories. After the initial tagging, we counted the number of samples in each category and grouped similar underrepresented tags together. We then added extra samples for the smaller categories by filtering titles that matched specific keywords and hand-picking from those.
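The rebalancing steps described above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: the sample titles, the `min_count` threshold, the `MERGE_INTO` mapping and the keyword lists are all assumptions.

```python
from collections import Counter

# Hypothetical manual tags on a random sample of titles.
tagged = [
    ("Show HN: A Rust web framework", "web"),
    ("Show HN: Static site generator in Go", "web"),
    ("Show HN: Tiny Lisp interpreter", "languages"),
    ("Show HN: My Haskell compiler", "compilers"),
    ("Show HN: Ethereum wallet", "blockchain"),
]

# Assumed grouping of similar, underrepresented tags into broader categories.
MERGE_INTO = {"compilers": "languages"}


def merge_small_tags(samples, min_count=2):
    """Fold tags with fewer than min_count samples into their broader category, if one is defined."""
    counts = Counter(tag for _, tag in samples)
    return [
        (title, MERGE_INTO.get(tag, tag) if counts[tag] < min_count else tag)
        for title, tag in samples
    ]


def keyword_candidates(untagged_titles, keywords):
    """Pre-filter untagged titles by keyword so a human can hand-pick extra samples."""
    kws = [k.lower() for k in keywords]
    return [t for t in untagged_titles if any(k in t.lower() for k in kws)]


merged = merge_small_tags(tagged)
print(Counter(tag for _, tag in merged))
```

After merging, "blockchain" still has a single sample; that is the kind of small category the keyword filter is meant to help with, and its candidates would still go through a manual pass before being added.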

We initially tried training the classifier only on GitHub-based samples, using the user-given tags from there. Although we grouped the tags into a reasonable number of distinct categories, the way GitHub users tag their projects turned out to be too inconsistent and often unrelated to the titles, so manual tagging seemed the better option for getting decent results quickly.
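Grouping GitHub's free-form tags into a smaller set of categories amounts to a lookup table. The sketch below uses a hypothetical `TAG_GROUPS` table and an assumed "exactly one category" rule to show one way the inconsistency could surface: repos whose tags map to several categories, or to none.

```python
# Hypothetical mapping from raw GitHub topic tags to broader categories.
TAG_GROUPS = {
    "react": "web", "vue": "web", "css": "web",
    "tensorflow": "machine-learning", "deep-learning": "machine-learning",
    "bitcoin": "blockchain", "ethereum": "blockchain",
}


def categories_for(repo_tags):
    """Return the set of categories a repo's user-given tags map to."""
    return {TAG_GROUPS[t.lower()] for t in repo_tags if t.lower() in TAG_GROUPS}


def is_usable_label(repo_tags):
    """Assumed rule: keep a repo as a training label only if its tags map to exactly one category."""
    return len(categories_for(repo_tags)) == 1


print(categories_for(["React", "CSS", "awesome"]))  # {'web'}
print(is_usable_label(["tensorflow", "react"]))     # False: two categories
print(is_usable_label(["hacktoberfest"]))           # False: no known category
```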

If you have any more specific questions, feel free to drop me an email at arturs@finch.io


For those of us in industries on the fringes of HN (but still of interest to it), how about an 'Other' category? :-) We make scientific/genetic software, and I'm not sure where we would have fit in your sidebar.

Getting to the front page via 'Show HN' was very helpful to us. It'd be nice (for others too) to be able to both replicate that success, and soften the blow when you get a grand total of 2 upvotes.


For v2, can you make a "no-tech" category? I find that those can be the best threads, especially for the non-techy users on HN. Nice work!

GJ :o

This is very useful. As a frequent HN user, I find myself strolling down the Show HN tab quite often, and the current UI doesn't let you go more than 3 pages deep (~5-6 days old posts).

Yeah, it'd be nice to have the archive available.

I like how Hackerhunt is the most popular item on Hackerhunt:

http://i.imgur.com/68OeJ94.jpg


You might get fined by the EU for this practice ;)

[1] http://europa.eu/rapid/press-release_IP-17-1784_en.htm


Actually, you can only be fined by the EU if you have a dominant position in the market you are using to promote your service. The goal is to avoid monopolies jumping from one market to the other (which is particularly easy to do in tech). "Good" monopolistic companies may benefit the customer in the short term, but proper competition benefits everyone in the long term.

I know it was just a joke, but it really doesn't apply to this case.


I know that, and I'm happy that there are still entities in the world that can influence Google and other megacorps.

I wouldn't have replied seriously to a joke before, but I've become somewhat aware of the danger of sarcasm after recent global events...

I'm now feeling the need to properly contextualize hyperbolic jokes, so that people reading don't take it literally.


Hi! Just thought I'd report a bug - when you search, and try to click the comments icon to go to the HN post, it goes to https://news.ycombinator.com/item?id=undefined

This is great. You have a small (albeit fun) bug when hitting next page though: http://i.imgur.com/eqTVBPy.png :)

On that fine line between a bug and an easter egg

Looks great. Can you tell us how you built it? I'm most interested in automatic categorization of submitted articles.

Hey, thanks! It's a bit of deep learning magic combined with a few days of manual labor tagging the training set :) The classifier itself is an LSTM and runs on TensorFlow. As said, this is a proof of concept so we'll try to improve it over time.

I have a similar classification project, and I used word2vec to build embeddings from 1GB+ of text; then I just compute the vector similarity between the article and the topics.

The vector of an article can be obtained by summing the vectors of its words (minus stop words). For a topic you just sum up 5-10 of the topic keywords. You don't need to exhaustively list all the topic keywords because word2vec automatically maps them in close vicinity.

This system has the advantage that you don't need a training dataset. It's unsupervised learning coupled with a small amount of supervised topic pointers.
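The approach described above can be sketched as follows. The three-dimensional vectors are made-up stand-ins for real word2vec embeddings (which would be learned from a large corpus, for example with gensim), and the stop-word list, topic keywords and helper names are assumptions:

```python
import math

# Toy stand-ins for word2vec vectors; real ones would have ~100-300 dimensions.
EMBEDDINGS = {
    "bitcoin":  [0.9, 0.1, 0.0],
    "ethereum": [0.8, 0.2, 0.1],
    "wallet":   [0.7, 0.0, 0.2],
    "neural":   [0.0, 0.9, 0.1],
    "network":  [0.1, 0.8, 0.2],
    "training": [0.0, 0.7, 0.3],
}
STOP_WORDS = {"a", "an", "the", "for", "with", "show", "hn"}


def text_vector(words):
    """Sum the vectors of known, non-stop words (unknown words are skipped)."""
    total = [0.0, 0.0, 0.0]
    for w in words:
        w = w.lower()
        if w in STOP_WORDS or w not in EMBEDDINGS:
            continue
        total = [a + b for a, b in zip(total, EMBEDDINGS[w])]
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


# Each topic is pointed at by a handful of keywords, summed into one vector.
TOPICS = {
    "blockchain": text_vector(["bitcoin", "ethereum"]),
    "machine-learning": text_vector(["neural", "network"]),
}


def best_topic(title, min_score=0.0):
    """Return the most similar topic, or None if nothing in the title is known."""
    vec = text_vector(title.replace(":", " ").split())
    scores = {t: cosine(vec, tv) for t, tv in TOPICS.items()}
    topic = max(scores, key=scores.get)
    return topic if scores[topic] > min_score else None


print(best_topic("Show HN: A wallet for Bitcoin"))
print(best_topic("Show HN: Training a neural network"))
```

"wallet" is not among the blockchain topic keywords, yet it still pulls the first title toward that topic because its (toy) vector sits near "bitcoin"; that is the property that lets a short keyword list stand in for the whole topic.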


Really like the design and agree that the idea serves a purpose. Must be really discouraging to brave the Show HN and have it fall flat.

I wondered about showing the list for today, and perhaps some other recent options in a slimmer format, either beside or below it?

Nice work though!


Sorry, but did you really have to use HH? I'm German, and that only rings my right-wing-extremist alarm bells, because they have used it for "Heil Hitler" (fuck them and him) for generations.

That's a personal problem. You could try reprogramming your brain to associate it with Helly Hansen or Hansestadt Hamburg (where HH is the license plate).

If anything, you should be grateful to see other people use "HH" for things completely and utterly unrelated to extremists.


I'm German, too. Lived for 10 years in Hamburg. So my association with HH is mainly with the car licence plate. If the site were called Hackerhunt88, that would be a different thing ;)

Well, it's not really practical to never reuse some initials lest one of the 180+ countries in the world (or worse, some particular subgroup within a country/culture) has some history with them in a totally unrelated context.

I find this kind of post-factum concern funny as well (same with banning Mein Kampf etc. today): 1933-1945 was when the German people should have been really concerned about Hitler, not 2017. The new threats of today don't need Mein Kampf or the freedom to use "HH" initials; they can make their own stuff even if they care for the Nazis, and usually they have their own 2017 names and agendas to use anyway. So all it does is reduce some historical guilt.


We detached this subthread from https://news.ycombinator.com/item?id=14652276 and marked it off-topic.

How are the topics assigned? Two Show HN posts I made recently (both open source and JavaScript) don't have the relevant tags attached, and I can't see a way of attaching them.

The tags are added fully automatically, and there is no option to add tags manually for now. That being said, we really want to make the tagging algorithm better, so hit me up at human@finch.io with your case.

done

I created a small program to categorize titles by subreddit[1] to learn scikit-learn, TensorFlow, etc.

I basically used Reddit's BigQuery data for the dataset (it's huge!). My algorithm and code are here[2].
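A bag-of-words Naive Bayes classifier is a common first baseline for this kind of title-to-subreddit task. The sketch below is a generic illustration with made-up training titles and a hypothetical `NaiveBayesTitles` class; it is not the linked code, which may use a different model entirely.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase word tokens for a title."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTitles:
    """Multinomial Naive Bayes over title words, with add-one (Laplace) smoothing."""

    def fit(self, titles, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)
        for title, label in zip(titles, labels):
            self.word_counts[label].update(tokens(title))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, title):
        n_docs = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log P(label) + sum over words of log P(word | label)
            score = math.log(n / n_docs)
            for w in tokens(title):
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best


# Made-up training data; a real run would use titles pulled from BigQuery.
titles = ["Python list comprehension tips", "Best python web framework",
          "New GPU benchmarks are out", "GPU prices keep rising"]
labels = ["learnpython", "learnpython", "hardware", "hardware"]
model = NaiveBayesTitles().fit(titles, labels)
print(model.predict("python tips for beginners"))
```

scikit-learn's `MultinomialNB` implements the same model and would be the practical choice at BigQuery scale; the hand-written version only makes the counting explicit.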

[1] https://www.youtube.com/watch?v=gudnFNBXc58

[2] https://www.reddit.com/r/learnmachinelearning/comments/6hqd6...


Initially we used all Show HN posts pointing to GitHub repos as training data, along with the tags available on github.com, but that returned pretty noisy data.

I'm confused. Is this list automated or curated? "Curated by machine" seems like an oxymoron.

You should rethink the "HH" logo, though (it's a neo-Nazi identification sign).

They should be more worried about other companies using this abbreviation (e.g. hellyhansen.com)...

HexHex...

Well done! I'd just add a way of suggesting tags for the submitted projects to make them more useful.

This is very nice! Keep it up!

Nice work! I just noticed a bug though, trying to go to the next page of system software that is sorted by votes doesn't work. Instead of going to the next page, the first page is reloaded with /NaN appended to the URL, as such:

https://hackerhunt.co/topic/system/votes/NaN


Thanks for the notice! Bug eliminated!
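The `/NaN` suffix suggests the next-page link was computed from a page number that was never parsed (in JavaScript, `Number(undefined) + 1` is `NaN`); that diagnosis is a guess. A minimal sketch of a guarded link builder, assuming the `/topic/<topic>/<sort>[/<page>]` URL shape seen in the links in this thread and using hypothetical helpers `current_page` and `next_page_url`:

```python
from urllib.parse import urlsplit


def current_page(path):
    """Read the page number from /topic/<topic>/<sort>[/<page>], defaulting to 1."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 4 and parts[3].isdigit() and int(parts[3]) >= 1:
        return int(parts[3])
    return 1


def next_page_url(url):
    """Build the next-page URL; never emits a non-numeric page segment."""
    split = urlsplit(url)
    parts = [p for p in split.path.split("/") if p]
    if len(parts) < 3 or parts[0] != "topic":
        raise ValueError(f"not a topic listing: {url}")
    topic, sort = parts[1], parts[2]
    page = current_page(split.path)
    return f"{split.scheme}://{split.netloc}/topic/{topic}/{sort}/{page + 1}"


print(next_page_url("https://hackerhunt.co/topic/system/votes"))
```

Falling back to page 1 for a missing or malformed segment means a bad link degrades to an ordinary listing instead of producing `NaN` in the URL.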

https://hackerhunt.co/topic/development/trending

if(!res.statusCode===500){ TODO

};

Hee-hee :D


The UI on mobile looks great, really clean job!

I have a question, does it also index submissions which never make it to the main show page?

Also, shameless plug: I am hosting an event inspired by Show HN in Hyderabad, India (showhyd.com).


Yes, Hacker Hunt indexes all Show HN submissions, including those that never make it to the main page. Actually, that was the whole reason to make Hacker Hunt happen.

Are you aware of the thread of threads?

http://news.ycombinator.com/item?id=2158116


Really cool site. I recently asked HN[1] why the free alternative to PH died and found this postmortem thread[2].

I think it really boils down to making sure that people coming to your site find something new and creative all the time, to help turn lurkers or one-time visitors into repeat visitors. I think PH does that quite well with their podcasts, daily digests, Twitter updates (which feel forced, but do work), etc. Also, you're building a community site, so if the traffic dies in a month, keep at it; it looks like sites like these take many years to gain traction. Basically, I think you have something really great going here; just make sure to focus on bringing the visitors back and you will definitely have a winner!

[1] https://news.ycombinator.com/item?id=14584527

[2] https://news.ycombinator.com/item?id=11233967


I too liked the idea. Basically, it all comes down to giving visitors an incentive to keep coming back.

If you can nail down the categorization, get some more historical stuff, and make the most of that newsletter (or just suggestions), this should be a great utility.

Can we get RSS feeds as well for categories :)

Nice!

A feature I've been missing on Hacker News, which perhaps you'd be willing to add, is a community-written TL;DR for each link.

I.e. apart from the title and link, a short (200 chars or so) description and TL;DR.


Very cool.

Two bug reports:

1. Do you have a way to receive bug reports other than HN? :)

2. After doing a search, the left category menu disappears and stays hidden, even after clicking the HH "home" link at top left. This is true for Firefox and Chromium latest-ish on Linux Mint.

There are two possible bugs here: a) do you actually want the left menu to disappear, and b) what your intent is for clicking the top left "HH".

- Go to HH.

- Search for something. Results appear as you type, nice. No indication from browser that a new page is loading; guessing no load by design. But left menu disappears.

- Manually erase search bar. Menu back.

- Type out a search again, menu disappears.

- Click "HH" at top left. Browser indicates a page is loading, but the search is not erased and (therefore?) the menu is still missing.

- Re-enter HH either by typing the URL into the browser location and clicking "make it so", or by clicking in from another site (like HN). Search field is empty, therefore the left menu is available.

EDIT: This was going to be a separate bug, but I think it's related to above.

Scrollbar behavior is buggy.

- Clear site cookies. ("It's the only way to be sure.")

- Don't click anything, just move the mouse around and scroll, with mousewheel or dragging scrollbar. Scrollbar intact, entire page scrolls.

- Click in search field, don't type anything. Scrollbar disappears, mousewheel scrolling has no effect, regardless of where the mouse hovers. Entire page jumps slightly to right, appearing to "chase" the disappeared scrollbar.

- Type something in search field that gets results. Scrollbar returns, top of scrollbar is even with bottom of search field, page does not jump back; I'm guessing this is "your" scrollbar rather than the browser's. Mousewheel only has an effect if the mouse is hovered below the search field, in the region where the scrollbar exists.

- Click on any non-active area outside the search field. Search field jumps left very slightly. Scrollbar is back to full length (browser's scrollbar?), but there are now two separate scrolling areas:

  - Hover mouse at or above search field level. The entire original front page, including the missing menu and the default "Today" list of sites, scrolls up into the area from viewport top to bottom level of search field (which also scrolls up and away with the rest of the page). Search results do not scroll.

  - Hover mouse below the search field, mousewheel scrolls the search results, phantom page at top of viewport does not scroll.

  - Drag the scrollbar, the "top" scroll area scrolls.


meta, cool

Are there/will there be categories for general products? I like seeing the new business ideas that come through every now and then that a developer worked on.

Gives me memories of Yahoo!, in the best way possible.

Minor issue and I understand why it happens, but the name of the submission here ("google.com") isn't terribly helpful :)

http://imgur.com/a/LLZhI


That's really cool! Thanks for making this website; it's really useful and might save a lot of projects. I can tell because I have posted on Show HN myself, and my submission never made it past /shownew. Probably it was not that interesting to HN's audience, but I can imagine how many really cool projects end up buried in there.

Very cool. Can I get RSS / Atom for specific topics?

Cool work, missing a "Why Cryptocurrency is Bad" section though

Show HN used to be one of the awesome things about HN - a real community showcase. It's not like that any more.

As HackerHunt says, /shownew is just a place where awesome new stuff is hidden.


Those stories would get onto /show and also the front page if they got more upvotes, so it is indeed a curation problem. If you're willing to do the work of rescuing good submissions that the rest of us missed, that's great! I wonder if we could integrate that back into HN somehow.
