Hacker Read

		Who is horse_js? (whoishorsejs.com)
		206 points by juniusfree | karma 234 | avg karma 6.88 2019-01-19 07:01:23+00:00 | 87 comments

craftinator | karma 662 | avg karma 0.48 2019-01-19 10:03:52+00:00 | [–] similar comments

bump

Epskampie | karma 738 | avg karma 3.45 2019-01-19 10:11:58+00:00 | [–] similar comments

Very fun read, it contains just the right amount of detail to stay entertaining without getting bogged down in minutiae.

skilled | karma 18165 | avg karma 5.67 2019-01-19 10:12:03+00:00 | [–] similar comments

Neat concept, wonderful execution, and beautiful presentation. Now if only the entire web could follow suit.

ivanche | karma 2432 | avg karma 3.48 2019-01-19 11:03:59+00:00 | [–] similar comments

I have to politely disagree. This could've been made as a static page. Instead, with JavaScript off you see just a totally blank page. Thank God that most of the web doesn't follow suit.

tastroder | karma 1361 | avg karma 2.01 2019-01-19 11:26:10+00:00 | [–] similar comments

I would add to that the unnecessary scrolling of this presentation style. With a classical layout you can get far more information above the fold and allow your reader to skip stuff ("ah, yeah, I see what they did there") without having to constantly interact with the keyboard / mouse.

michaelmcmillan | karma 1005 | avg karma 2.75 2019-01-19 10:42:26+00:00 | [–] similar comments

This is journalism!

pickpuck | karma 139 | avg karma 1.9 2019-01-19 13:10:28+00:00 | [–] similar comments

This is native advertising.

But it does have the tone of a type of “data journalism” that we should see more often.

I would appreciate a site that treats all news the way 538 treats politics.

Confiks | karma 2215 | avg karma 6.09 2019-01-19 10:55:18 | [–] similar comments

I was scrolling through the whole article raging for an ethics paragraph, but I guess they handled that pretty well.

With a nifty and, I think, necessary touch of themselves still being in the dark: I very much doubt that the data they gathered can really reveal the author's identity, and the result they arrived at (Tom Dale) seems to largely originate from the "quotes one person far more than others" metric.

You could almost consider it an anti-metric: which intently pseudonymous author would dare to retweet their own nym? However, to counter this analysis you'd have to blend in with the average Twitter user in your niche, so it then comes down to a psychological game of "what would horse_js do?".

cwmma | karma 1588 | avg karma 2.9 2019-01-19 13:13:58+00:00 | [–] similar comments

The Ember part is a giveaway too: since it's a framework with a small but passionate base, you'd expect the person behind it to care about Ember. That being said, the reasoning doesn't rule out wycats.

defertoreptar | karma 1294 | avg karma 3.34 2019-01-19 13:58:45+00:00 | [–] similar comments

> I was scrolling through the whole article raging for an ethics paragraph, but I guess they handled that pretty well.

What good is an "ethics paragraph" anyway? Isn't it like prefacing an offensive statement with "no offense, but"? It's one thing if you have a disclaimer to protect the user, since the user is making the decision. It's another thing to make a disclaimer that just lectures the user about privacy, but does nothing to protect the doxxed. That just seems like a lame attempt to make the site less liable, like when people post copyright content with the disclaimer "I do not own this."

lclarkmichalek | karma 2692 | avg karma 4.01 2019-01-19 10:56:29 | [–] similar comments

I'm clearly a terrible person, but I read the reveal and immediately thought "But X isn't funny enough to be horse_js!"

aboutruby | karma 1129 | avg karma 2.47 2019-01-19 12:00:02+00:00 | [–] similar comments

I thought "Oh! Exactly! That is definitely his type of humor, makes so much sense". Especially the actual accounts he quotes are often very small and quite specific.

I'm very surprised they didn't find him based on the least quoted people's followers actually.

tomdale | karma 3527 | avg karma 17.64 2019-01-19 13:08:15+00:00 | [–] similar comments

Wowwwwww

muglug | karma 5273 | avg karma 5.52 2019-01-19 13:14:46+00:00 | [–] similar comments

haha, ur funny

pickpuck | karma 139 | avg karma 1.9 2019-01-19 07:21:39 | [–] similar comments

> We got permission from our suspect before we released this site and they have allowed to use their name and release the data that we had about them.

Looking forward to your response!

mothsonasloth | karma 2187 | avg karma 3.57 2019-01-19 11:13:38 | [–] similar comments

It's interesting but also worrying. This is pretty much doxxing under the guise of intellectual curiosity.

If the article explained why they wanted to identify this account then fair enough. However, you are going to end up on an ethical slippery slope where it will be used to doxx people who are controversial, trolls, political dissidents and whistleblowers.

nightfly | karma 1837 | avg karma 3.16 2019-01-19 11:24:53+00:00 | [–] similar comments

None of the techniques they used to come to their conclusion were groundbreaking or even surprising.

mothsonasloth | karma 2187 | avg karma 3.57 2019-01-19 16:41:43 | [–] similar comments

To you or me, but for others it's an education on how they can be compromised or how they can compromise others.

lolc | karma 4481 | avg karma 2.32 2019-01-19 11:52:02+00:00 | [–] similar comments

I don't see the problem. We didn't need this demonstration to know that people can be unmasked with these methods. They had consent from the "doxxed" person and explicitly addressed the ethical issues.

What more do you want? It's not like we can take away these dangers by not talking about them openly.

gumoro | karma 163 | avg karma 4.08 2019-01-19 11:25:46+00:00 | [–] similar comments

Bug report: time series analysis chart fails to display properly on Firefox, all points stay on x=0, I see "Unexpected value NaN parsing cx attribute" x250 in the console. Works fine on Chrome.

ricardobeat | karma 18682 | avg karma 2.77 2019-01-19 11:38:40+00:00 | [–] similar comments

Breaks in Safari too.

thibautg | karma 1486 | avg karma 8.54 2019-01-19 11:50:53+00:00 | [–] similar comments

The layout is completely broken in Edge too. They must have forgotten that it does not use the Chromium engine yet.

Too bad for a (fun and clever) Microsoft advertising.

thibautg | karma 1486 | avg karma 8.54 2019-01-19 11:56:41+00:00 | [–] similar comments

It seems that the date format is not correctly recognized by moment.js. There is a warning in the console.

_underfl0w_ | karma 706 | avg karma 2.03 2019-01-19 15:32:49+00:00 | [–] similar comments

This must be them prepping for that new Internet Explorer-flavored Chrome or whatever.

aembleton | karma 3344 | avg karma 1.52 2019-01-20 10:25:45+00:00 | [–] similar comments

I actually found it easier to read in Firefox, as there was only one axis to read from. The correlation was clearer.

patrickaljord | karma 11514 | avg karma 5.47 2019-01-19 11:36:31+00:00 | [–] similar comments

In case you didn't notice, this is clever advertising for Microsoft Azure. Both authors work for Microsoft. I could count 3 mentions of Azure, two direct links to Azure products, 1 quote from a Microsoft researcher, 1 quote from a Microsoft dev advocate and 1 embedded Bing maps. You've just been played by Microsoft marketing. Also, Tom Dale works for Microsoft himself so it's just one big family story.

gandutraveler | karma 396 | avg karma 1.74 2019-01-19 11:53:40+00:00 | [–] similar comments

Marketing for EmberJS perhaps ? :P

FlorianRappl | karma 443 | avg karma 2.23 2019-01-19 12:20:16+00:00 | [–] similar comments

Noticed it, but I like(d) the ad / package / presentation.

Personally, I think this is "good" marketing.

pickpuck | karma 139 | avg karma 1.9 2019-01-19 13:16:07+00:00 | [–] similar comments

Kudos to you for noticing, but I think they should add some kind of disclosure.

the_duke | karma 16666 | avg karma 6.89 2019-01-19 14:50:02+00:00 | [–] similar comments

This is the kind of PR (rather than marketing) that I despise, because it is deceiving.

madeofpalk | karma 21444 | avg karma 3.75 2019-01-19 14:57:14+00:00 | [–] similar comments

I mean, it's annoying that it wasn't clearly stated that it was a MS article, but from the beginning I thought it was fairly clear that it was an advertisement for something, and all the Microsoft references were a pretty strong hint it was for them.

maroonblazer | karma 2670 | avg karma 2.98 2019-01-19 15:09:10 | [–] similar comments

Do we know for a fact that this was a marketing tactic conceived and developed by the Marketing group at Microsoft? Versus simply these three individuals, of their own accord, wanting to show off the tools in a fun(?) way?

Nullabillity | karma 4768 | avg karma 1.8 2019-01-19 15:56:44 | [–] similar comments

Does it matter? If you're showing off something that depends on your company's product then you're advertising for them, regardless of whether you're paid for it directly.

maroonblazer | karma 2670 | avg karma 2.98 2019-01-20 17:19:46 | [–] similar comments

Yes, it does matter. I'd question the intentions of a marketing department very differently than I would 3 developers who appreciate a given set of tools and want to share them with other developers.

diegoperini | karma 1229 | avg karma 2.07 2019-01-19 12:30:22+00:00 | [–] similar comments

Marketing for Twitter? (Bad sarcasm)

forgotmyacc | karma 93 | avg karma 4.65 2019-01-19 12:31:39+00:00 | [–] similar comments

Fuckkkk, the ads are evolving. South Park is right!! I didn't realize it.

widforss | karma 1005 | avg karma 3.67 2019-01-19 12:46:40+00:00 | [–] similar comments

Compulsory XKCD link: https://xkcd.com/810/

the_duke | karma 16666 | avg karma 6.89 2019-01-19 12:52:59+00:00 | [–] similar comments

Tom Dale working for MS makes it likely that this was a reverse-engineering effort.

As in, they knew who they were looking for from the start, and just worked with the data to find the known conclusion.

Also, there's no actual machine learning in this really, except calling out to a hosted language processing service...

jlborxes | karma 203 | avg karma 12.69 2019-01-19 13:07:00+00:00 | [–] similar comments

A nice case of parallel construction.

derangedHorse | karma 892 | avg karma 1.82 2019-01-19 14:11:27+00:00 | [–] similar comments

Technically any project showcasing a specific technology could be considered advertising though.

ape4 | karma 4305 | avg karma 1.73 2019-01-19 14:28:32+00:00 | [–] similar comments

Ah, who is whoishorsejs.com

fibers | karma 88 | avg karma 0.77 2019-01-19 14:44:11+00:00 | [–] similar comments

This is really telling because the twitter api already tells you the source of a tweet anyway (android/tweetdeck/hootsuite). It's like they didn't even try.

madeofpalk | karma 21444 | avg karma 3.75 2019-01-19 14:45:57 | [–] similar comments

What's the relevance of this? The article points this out, no?

chrismeller | karma 1967 | avg karma 2.75 2019-01-19 15:55:45 | [–] similar comments

That’s awfully cynical. Other than wondering exactly why they used those Azure services (which makes sense now that I know they work for Microsoft), nothing else about Microsoft stood out for me when I read through.

Edit: I’m not sure how that was worthy of a downvote. Not everything is a conspiracy, but thanks for valuing contradictory opinions.

enraged_camel | karma 16714 | avg karma 2.78 2019-01-19 17:40:46+00:00 | [–] similar comments

>>That’s awfully cynical.

Perhaps so, but that doesn't make it not true.

http://www.paulgraham.com/submarine.html

coldtea | karma 86593 | avg karma 2.38 2019-01-19 18:51:22 | [–] similar comments

>That’s awfully cynical.

The world is awfully exploitative and cunning. The world of business doubly so.

hardwaresofton | karma 11918 | avg karma 2.9 2019-01-19 16:32:17 | [–] similar comments

Am I the only one that skimmed the story so fast that I didn't notice the MS associations?

I could tell it wasn't going to be a super deep dive or deeply technical just from a look at layout/style of the post -- but it wasn't supposed to be super deeply technical right? Kind of just like a fun post.

They were extremely helpful in making the "conclusion" bits on the end of every section very obvious though (I aim to write like that as well, to save readers time), so I basically only read those, and skipped to the bottom where they unveiled and clicked the button...

Lobosque | karma 14 | avg karma 1.27 2019-01-19 17:15:32+00:00 | [–] similar comments

The article is fun and well written, and it showcases how the marketed tools/services can actually be used to accomplish something interesting. As long as the content is useful, should I care if it was written as a way to advertise a product or not? This is so much better than door-slamming ads in one's face.

failrate | karma 1871 | avg karma 2.78 2019-01-19 18:06:30 | [–] similar comments

Thank goodness. I was worried that we were going to see another doxxing like the one that drove why the lucky stiff away.

aw3c2 | karma 11791 | avg karma 3.66 2019-01-19 11:37:37+00:00 | [–] similar comments

> Finally. Some Machine Learning.

> We ran all of @horse_js tweets from the last 2 years through Azure Cognitive Services Text Analytics service. This service identifies keywords in phrases.

How was that necessary in comparison to a simple "split by whitespace, count occurrences"? :P
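
For reference, the "split by whitespace, count occurrences" baseline could look roughly like the sketch below. The function name and the sample tweets are made up for illustration; none of this comes from the article.

```python
from collections import Counter


def keyword_counts(tweets, top_n=5):
    """Count whitespace-separated tokens across all tweets, case-insensitively."""
    counts = Counter()
    for text in tweets:
        counts.update(text.lower().split())
    return counts.most_common(top_n)


# Hypothetical sample tweets, not taken from the real account.
sample = [
    "ember is a framework",
    "a framework is a choice",
    "ember ember ember",
]
top = keyword_counts(sample, top_n=2)
print(top)
```

Note that a stop word like "a" still ranks near the top here, which is the part a keyword-extraction service would filter out.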

rootlocus | karma 2736 | avg karma 2.61 2019-01-19 12:01:36 | [–] similar comments

Sounds better in a job interview.

a_bonobo | karma 6053 | avg karma 4.44 2019-01-19 12:39:32 | [–] similar comments

That service does more NLP-level stuff: remove stop words (a, the, an, etc.), stem words (eating becomes eat), keep only words that represent the core message of the text. I think that's about it.
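
A toy version of that preprocessing, to show what the steps mean rather than how the hosted service actually does it. The stop-word list and the suffix rules below are assumptions chosen for the example; a real pipeline would use a proper stemmer or lemmatizer.

```python
import re

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "and", "of", "to"}


def crude_stem(word):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, split into word tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9_']+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


result = normalize("The horse is eating an apple")
print(result)
```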

hhjinks | karma 752 | avg karma 2.51 2019-01-19 13:06:50+00:00 | [–] similar comments

Neat, that's actually something I'll find useful in one of my personal projects. Guess I'll have to check it out.

kamaln7 | karma 135 | avg karma 2.76 2019-01-19 13:29:37+00:00 | [–] similar comments

We used Lucene (open source) in our information retrieval course and tokenizing (w/ removing stop words etc.) is one of the things it does. If you just want to experiment, that's also another option to look at if you like!

9dev | karma 4458 | avg karma 3.43 2019-01-19 16:41:03+00:00 | [–] similar comments

That's stuff I can hack together in 20 minutes using JS or PHP... Why on earth would you need a remote API for that?

sdrothrock | karma 3403 | avg karma 3.56 2019-01-19 11:51:27 | [–] similar comments

This wasn't very rigorous at all, but it was a moderately fun read because it really made it clear how "large amounts of data" with some simple visualization can help people make some mostly-educated guesses.

I was most surprised at them glossing over the activity patterns with guesses based on their assumption of the target's sleeping patterns -- their guess of a time zone would have been stymied by someone who liked to sleep early/late or had an unusual work schedule, but there was no mention of that or their reasoning.

That all made sense given patrickaljord's comment about it all being one big Microsoft ad, though.

tetha | karma 3815 | avg karma 3.07 2019-01-19 12:53:39+00:00 | [–] similar comments

This can also be a pretty powerful troubleshooting technique, which is why I consider having something like Grafana or Prometheus around extremely valuable.

Looking for anomalies at the same time, or in sequence easily turns "What is going on?" into "Alright, why is there so much more stuff coming into the system, and why is that increased ingress causing increased memory usage per event?"

derangedHorse | karma 892 | avg karma 1.82 2019-01-19 14:09:50 | [–] similar comments

I think it's more unlikely that it would be irregular sleep patterns but you're right, it could've been mentioned.

stephenr | karma 7998 | avg karma 1.42 2019-01-19 14:15:04+00:00 | [–] similar comments

> their guess of a time zone would have been stymied by someone who liked to sleep early/late or had an unusual work schedule, but there was no mention of that or their reasoning.

As you said - this only makes sense because it's a Microsoft ad and they could have arguably known the answer to start with anyway.

Given that this is focused on someone who's at least interested in software development, they made some pretty specific assumptions.

There's no way anyone (unless they somehow literally think there is nothing outside America) would just discount the concept of it being someone in another country, and/or with non-'regular' work hours.

But this is Microsoft after all. The imagination of a brick. I suspect if you asked about remote work, they'd think you want to work on remote desktop.

urvader | karma 568 | avg karma 9.47 2019-01-19 11:58:31+00:00 | [–] similar comments

Just use elasticsearch/kibana and you will save yourselves from lots of Azure costs. Find keywords and group by device and location. Simple as that.

sbarre | karma 10170 | avg karma 4.57 2019-01-19 12:09:20 | [–] similar comments

> Just use elasticsearch/kibana

Are you volunteering all the time to set that up for me? ;-)

saagarjha | karma 56017 | avg karma 2.29 2019-01-19 12:01:33 | [–] similar comments

> @horse_js lives in either the Central or Eastern time zone. Their activity dwindles sharply in the evening and disappears between ~11 PM - 12 AM CST and reappears at ~8 AM - 9 AM CST because they are likely asleep.

As someone commenting at 4 AM, this might not be a great assumption to make ;)

audiolion | karma 55 | avg karma 1.02 2019-01-19 13:26:42 | [–] similar comments

do you comment every day with consistency around 4am? the point is with enough data you establish a pattern and ignore outliers. the time series graph is indicative of an EST/CST sleep schedule.
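
The "establish a pattern and ignore outliers" idea can be sketched as an hour-of-day histogram plus a search for the quietest stretch. This is only an illustration with synthetic timestamps; the 8-hour window and the function names are assumptions, not the article's method.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_histogram(timestamps):
    """Count events per UTC hour of day (0-23). Expects timezone-aware datetimes."""
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_window(hist, width=8):
    """Start hour of the circular `width`-hour window containing the fewest events."""
    def total(start):
        return sum(hist[(start + i) % 24] for i in range(width))
    return min(range(24), key=total)


# Synthetic week of activity: one post every hour except 05:00-12:59 UTC.
sample = [
    datetime(2019, 1, day, hour, 0, tzinfo=timezone.utc)
    for day in range(1, 8)
    for hour in range(24)
    if not 5 <= hour <= 12
]
start = quietest_window(hourly_histogram(sample))
print(start)  # 05:00 UTC is 23:00 CST (UTC-6)
```

A few stray 4 AM posts would barely move the window, but someone who is consistently nocturnal would shift it, which is the objection raised above.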

saagarjha | karma 56017 | avg karma 2.29 2019-01-19 13:32:06+00:00 | [–] similar comments

More consistently than I'd certainly like. I've been told by concerned friends that I have issues with my sleep schedule; the most recent was something along the lines of "why is it that when I check Hacker News in the morning I keep finding your comments made three hours ago".

unao | karma 40 | avg karma 2.67 2019-01-19 12:02:44+00:00 | [–] similar comments

The article was quite enjoyable to read though full of MS marketing.

When I saw the part with Azure Cognitive Services Text Analytics, I burst out laughing. Earlier, they quoted: Half of the time when companies say they need "AI" what they really need is a SELECT clause with GROUP BY.

Their motivation of using AI is even below that threshold.

Now awaiting some horse_js comment about this absurdity.

vijaybritto | karma 844 | avg karma 1.38 2019-01-19 12:03:16+00:00 | [–] similar comments

This is very bad. They spoiled it for everyone. That privacy disclaimer at the end is of no use really. Also Bing maps? That's the first time I saw that.

XCSme | karma 2007 | avg karma 0.9 2019-01-19 13:46:17+00:00 | [–] similar comments

It's made by Microsoft employees, that's why all the advertisements for Azure and other Microsoft products.

jypepin | karma 2308 | avg karma 3.86 2019-01-19 12:03:45 | [–] similar comments

That was a nice, fun read, and a simple way to show how sometimes, simple data analysis and common sense trumps everything else :)

Also, great website design. Simple and clean!

LeanderK | karma 2281 | avg karma 2.49 2019-01-19 12:45:12 | [–] similar comments

So why was this a statistics problem?

eggie5 | karma 402 | avg karma 1.23 2019-01-19 13:57:43+00:00 | [–] similar comments

cluster horse_js and Tom Dale's tweets in an embedding space and you can confirm your hypothesis.

mcintyre1994 | karma 5178 | avg karma 1.92 2019-01-19 14:18:34+00:00 | [–] similar comments

> The API is rate limited, so we created a set of Node.js Azure Functions that ran on timers. These functions would request as many tweets as they could before they were rate limited, wait for the timeout interval specified by the API docs, then resume processing where they left off.

How does this work? You pay for your function's run time in serverless, so surely you wouldn't want to just have the function sleep for x minutes or however long it gets rate limited. I can see a way to do it using a service bus queue (push the message with a delay of x minutes, have the function set up to run on messages on that queue) but they specifically said timers. Does Azure let you programmatically set the timer for a function from inside that function (e.g. "Run me again in 3 minutes")?

raudabaugh | karma 3 | avg karma 0.6 2019-01-19 14:58:26+00:00 | [–] similar comments

Azure Functions can be configured to run on fixed schedules via timer triggers (https://docs.microsoft.com/en-us/azure/azure-functions/funct...), so I’m guessing they set theirs to run every API timeout interval + max amount of time they could request tweets before getting rate limited. Their Cosmos DB instance could then be set up to track how many tweets they had gotten through on each function run.
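
A rough simulation of that guess, in plain Python with no Azure or Twitter calls: each scheduled run spends a fixed request budget, saves a cursor, and exits, and the next run picks up from the saved cursor. The `state` dict stands in for whatever they persisted in Cosmos DB, and `fake_fetch_page` is a hypothetical paged API; both are assumptions for illustration only.

```python
DATA = list(range(10))  # stand-in for the full tweet history


def fake_fetch_page(cursor, page_size=3):
    """Hypothetical paged API: returns (items, next_cursor); next_cursor is None at the end."""
    start = cursor or 0
    page = DATA[start:start + page_size]
    nxt = start + page_size
    return page, (nxt if nxt < len(DATA) else None)


def run_once(fetch_page, state, max_requests):
    """One timer-triggered run: fetch until the request budget is spent, then checkpoint."""
    if state.get("done"):
        return state
    cursor = state.get("cursor")
    for _ in range(max_requests):
        page, cursor = fetch_page(cursor)
        state.setdefault("items", []).extend(page)
        state["cursor"] = cursor  # checkpoint so the next run resumes from here
        if cursor is None:  # no more pages
            state["done"] = True
            break
    return state


# Each loop iteration plays the role of one scheduled invocation.
state = {}
runs = 0
while not state.get("done"):
    run_once(fake_fetch_page, state, max_requests=2)
    runs += 1
print(runs, state["items"])
```

With this shape nothing sleeps inside the function, so you would only pay for the time spent actually making requests.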

xg15 | karma 16670 | avg karma 3.55 2019-01-19 14:23:55+00:00 | [–] similar comments

> "But I already know who @horse_js is, and it's not [...]!"
> Perhaps. The data here is not 100% conclusive. There are some critical assumptions holding up our conclusion and [...] has never confirmed (or denied) our findings.

Perhaps the horse lives to tweet another day...

Ironically this highlights one of the main problems with how machine learning is used.

On a very high level, I think you can sum up machine learning algorithms as finding patterns in enormous heaps of noisy data ("training"), then trying to apply the discovered patterns to novel data and using the result to guess the answer to a question you posed ("predicting").

The keyword being guess here. Unlike algorithms not based on learning, there is no guarantee that the answer is correct, because you usually don't know if the training data you supplied was sufficient or if the learned patterns were the ones you need. If you knew, you could just hard code the patterns directly and get rid of the whole learning overhead altogether.

Researchers know and communicate this. However, in the press, "AI" seems to be seen as almost the exact opposite: Not only can those fantasy AI systems answer questions about fuzzy human concepts with the precision of a computer, their answers are even better than the human ones - which is why the things we need to worry about are ethics discussions and humanity becoming obsolete...

This could be funny if it were just restricted to science fiction and public discussion, but it becomes problematic when "AI" systems are used to make life-changing decisions like setting insurance premiums or declaring persons suspicious to law enforcement.

_underfl0w_ | karma 706 | avg karma 2.03 2019-01-19 15:30:38+00:00 | [–] similar comments

> it becomes problematic when "AI" systems are used to make life-changing decisions like setting insurance premiums or declaring persons suspicious to law enforcement.

Hasn't the latter already happened? I'm without link/source, but I seem to recall reading about there being tests of using a humanoid-looking AI-driven "attendant" at a border somewhere that would judge people based on looks/temperament and try to guess if they were lying about what's in their luggage.

hobofan | karma 6345 | avg karma 2.86 2019-01-19 15:58:10+00:00 | [–] similar comments

Yes, discriminating machine learning is commonplace, but often hard to uncover, as it's not often obvious to the users of automated systems how those values are constructed.

Luckily some critical parts like insurance calculation are regulated (in some parts of the world) to require explainable algorithms to prevent this kind of discrimination, so it's not as bleak as it's often made out to be. Of course it's also important that it stays that way.

CorvusCrypto | karma 369 | avg karma 2.18 2019-01-19 15:46:16+00:00 | [–] similar comments

I don't see that as a major problem. In fact most life-changing decisions are already based on probability. Insurance companies already do risk analysis and whether the algorithm uses ML or basic statistics, there is a threshold level of confidence used.

I'd also argue it's how our brains work. Many times as we come to a decision we are going off of confidence, not true correctness. In the case of declaring suspicious persons, well, by definition they are suspects based on confidence, not on truth. Even in court we determine verdicts based on human probabilistic confidence that comes from the evidence.

manaatemandate | karma 1 | avg karma 1.0 2019-01-19 15:53:18+00:00 | [–] similar comments

I am so tired of Microsoft meddling in the developer space. I wish they just crawled away and made themselves irrelevant. If not for the money and how they shove their products down people's throats, nobody would even consider using them. Internet? They had to taint it with IE6. Operating systems? The dreadful Windows 10 malware OS. Then lure developers with their software, which if you have money you can produce a ton of, and just crush everything that is good in IT.

nailer | karma 21705 | avg karma 2.02 2019-01-19 16:09:19+00:00 | [–] similar comments

Nice try, Angelina Fabbro

fartcannon | karma 4153 | avg karma 2.62 2019-01-19 16:50:24+00:00 | [–] similar comments

This is marketing. Someone should flag it.

zemo | karma 2944 | avg karma 4.18 2019-01-19 18:11:55 | [–] similar comments

not knowing is half the fun.

pepijndevos | karma 1510 | avg karma 3.59 2019-01-19 19:00:01+00:00 | [–] similar comments

Tweet from Tom Dale himself https://twitter.com/tomdale/status/1086675110801625089

ireallyknow | karma 1 | avg karma 1.0 2019-01-19 23:20:50+00:00 | [–] similar comments

lon ingram
