People working in DevRel often aggregate developer-oriented content and gain popularity that way; "swyx" is one example. I'm not taking a dump on his work, but you can see the GitHub influencer effect over there.
The "github star" claim links to the source (it's some github program where you can nominate people to be accepted into some promotion campaign). Saying self proclaimed makes him sound pretentious, it's actually awarded by github.
You can be factual and still sound pretentious and cringey. Like the medical doctors who insist on being called “doctor”, to the point of smugly “correcting” strangers in a social setting.
I don’t know this user and won’t assume his intentions, but I can see how having “I’m a GitHub star [star emoji]” as the first sentence on the profile is doing him a disservice: it makes it seem like it’s the most impressive thing he’s achieved and diminishes everything else.
To reiterate, I don’t know you and don’t assume your intentions—and thus do not judge them. I’m also not familiar with your work, but I have no doubt it’s more relevant than whatever that “star award” is.
In other words: it makes zero difference to me what you write in your bio though I can see how its previous wording took away from what’s important. I was conveying to the parent comment my understanding of the comment they were replying to.
Apologies for making you feel judged; that was not the point. Quite the contrary: I wanted to underline that, since I don’t know your intentions, it does not make sense to criticise how you choose to present yourself.
yup no hard feelings. felt defensive heheh. i guess as my career has gone on i've accumulated other stuff but early on the github star thing really did feel like a big deal + if i wasnt gonna plug it on my github readme where else
swyx is on hn and a legit great writer. He's influenced my thinking in many areas.
I've never seen his github account before but I expect that people following him there are doing so because of the content he's putting out. His blog has been on the HN front page many times, and he has a book about developer career building.
My github account isn't as pimped out as his, but marketing yourself isn't toxic, it's smart.
Agreed that marketing yourself is not toxic. I follow "swyx" on Twitter and find his insight valuable, and so do a lot of my peers. Btw, looks like his Github profile has not been updated for some time - he's no longer Head of DX at Airbyte and is now an independent consultant. https://www.swyx.io/about
appreciate it but also whoa this literally just happened and its freaky how up to date you are. consulting is temporary (check out https://www.trychroma.com/ if you are exploring LangChain/OpenAI apps and need an embeddings database) and i'm working on an ai infra startup idea on the side with a couple cofounders.
i honestly dont even view my github readme as "marketing yourself". most pple dont even go to an individual's profile in the first place, but if you do its kinda like a cute little myspace thing where you can let people know you as a human being and be a little quirky. i certainly dont hold myself out as an authority on writing the best software in the world and hey if 40k stars on the react-typescript stuff doesnt count i'm alright with that
yeah i also am surprised that people use the follow feature for my work even tho i dont run a popular oss project.
well idk what "github influencer" even means but fwiw i am not "people living just from github". ive never taken a dime of github sponsor money. as far as github is concerned i just put my stuff up for free and the github stars program gets me an early look into new features so i can give them feedback. (eg i helped with Hey GitHub before the big launch at GH Universe).
obviously i'll happily ambassador github to anyone who will listen but who isnt already on github here
Yeah. Several years ago extremely clueless recruiters used to email people heaps. Lots of people were complaining about getting tonnes of spam from them. :(
Had to change my Location (or some similar obvious field) in my GitHub profile to "Recruiters FUCK OFF" before they took the hint. ;)
Thankfully, GitHub introduced some other way to signal if you are/aren't interested in getting a job (toggle switch?) not long after, which seemed to work.
I think it would be tough (a good thing) because how often do people go to someone's root github page, even if they have a good repo? Not to say it never happens, but github is really about the repo, not the person (again a good thing) so it would be harder for an individual to become "influential". Hopefully nobody gets any ideas.
Taylor Otwell lol.. He has some pretty dope cars in his garage and is doing well.
I follow him on GitHub, and pay for some of his products. I have been heavily influenced by his coding style and the tools he uses. His code just looks so tight and perfect. He writes his stuff so open-ended and reusable that he basically writes a method once and then reuses it across numerous projects.
> In spam detection, we often use heuristics in conjunction with machine learning to identify spammers.
Heuristics can only be used to identify suspected spammers. Not everyone who behaves like a spammer is a spammer, it could be e.g. a random user with privacy settings on, or someone who didn’t update their bio in a while and it got affected by link rot, etc.
Even if a group of low activity accounts stars the same projects, it could be that the account owners just discuss these projects elsewhere.
The article notes this, and like any spam detection method, it has a degree of false positives, but it seems very low (less than a percent according to the article). I'm sure an official implementation of this could take more internal, non-public factors into account, like IP addresses and clustering of account creation times, to make it even more accurate and drastically reduce the amount of spam users.
The claim I saw in the article is 98% precision, which doesn't actually tell us the predictive value without the base rate, and the base rate seems to be all over the place.
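To make that concrete with a toy calculation (the recall and false-positive rates below are made-up assumptions, not figures from the article): the same classifier gives wildly different precision depending on how common fake accounts are in the population you run it on.

    # Toy illustration: fixed true/false positive *rates*, varying base rate.
    # TPR and FPR are assumed values for the sake of the example.
    def precision(tpr, fpr, base_rate):
        tp = tpr * base_rate          # fraction of accounts that are fake and flagged
        fp = fpr * (1 - base_rate)    # fraction that are genuine but flagged anyway
        return tp / (tp + fp)

    TPR = 0.95   # assumed recall of the detector
    FPR = 0.001  # assumed false positive rate on genuine accounts

    for base in (0.10, 0.01, 0.001):  # share of accounts that are actually fake
        print(f"base rate {base:>6.3f} -> precision {precision(TPR, FPR, base):.3f}")
    # base rate  0.100 -> precision 0.991
    # base rate  0.010 -> precision 0.906
    # base rate  0.001 -> precision 0.487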
Goodhart's law: if you rely on a social signal to tell you what's good, you'll break that signal.
Very soon, the domain of bullshit will extend to actual text. We'll be able to buy HN comments by the thousand -- expertly wordsmithed, lucid AI comments -- and you can get them to say "this GitHub repo is the best", or "this startup is the real deal". Won't that be fun?
(I ninja-edited my comment in the first minute; the parent might have responded to a less clear version, since they posted at +3 minutes. I added "AI" in a revision).
If you want to, you can always set 'delay' in your profile to the number of minutes (up to 10) that you would like your comments to be visible only to you. This puts the stealth back in stealth editing. https://news.ycombinator.com/newsfaq.html
I rely heavily on this because it's somehow only after the comment is 'real' (i.e. staring back at me from a real HN thread) that I notice most of the edits I want to make.
Reddit better hold their IPO soon or they'll get caught up in this. Pretty soon there will be dozens of different GPT/LLM-powered Reddit spam bots on Github. Some of them no doubt for political trolling. [1]
Phone, then ID-based verification is a stopgap, but IDV services will have to spin up to support the mass volume of verifying all humans.
[1] I kind of want to do this from an innocent / artistic perspective myself. Perhaps a bot that responds with a bunch of rhetorical questions or onomatopoeia. Then I'd scale it to the point people start noticing and feeling weirded out by it. "Is this the new Gen Alpha lingo?" Alas, I have too many other AI projects.
If people see AI-generated comments on HN they should flag them and let us know at hn@ycombinator.com. HN is for humans to converse, and bots have never been allowed.
Of course it's not always easy to say what's AI-generated or not. But if an account is making a habit of it, it still seems possible to tell.
> Very soon, the domain of bullshit will extend to actual text. We'll be able to buy HN comments by the thousand -- expertly wordsmithed, lucid AI comments -- and you can get them to say "this GitHub repo is the best", or "this startup is the real deal". Won't that be fun?
Definitely already the case, you really think Rust and SQLite would get more than a couple of upvotes otherwise? :D
Content-based auto moderation has been shitty since its inception. I don’t like that GPT will cause the biggest flood of shit mankind has ever seen, but I am happy that it will kill these flawed ideas about policing.
The obvious problem is we don’t have any great alternatives. We have captcha, and we can look at behavior and source data (IP), and of course everyone’s favorite fingerprinting. To make matters worse: abuse, spam and fraud prevention lives in the same security-by-obscurity paradigm that cyber security lived in for decades before “we” collectively gave up on it, and decided that openness is better. People would laugh at you to suggest abuse tech should be open (“you’d just help the spammers”).
I tried to find whether academia has taken a stab at these problems but came up pretty much empty-handed. Hopefully I’m just bad at searching. I truly don’t get why people aren’t looking at these issues seriously and systematically.
In the medium term, I’m worried that we’ll not address the systemic threats, and continue to throw ID checks, heuristics and ML at the wall, enjoying the short-lived successes when some classifier works for a month before it’s defeated. The reason this is concerning is that we will be neck deep in crap (think SEO blogspam and recipe sites, but for everything), which will be disorienting for long enough to erode a lot of trust that we could really use right now.
The closest example I know of is the Korean internet. It is nigh impossible to get an account on major websites without an SSN and a phone number. Despite this, there are still countless bots and scammers that use hacked or leaked personal data. So I’m not sure it would be that effective.
I am thinking more like webauthn - but where I own a key pair, I go to the post office with my passport, they give me a nonce, I prove it's my key pair, and then they attest that the public key is definitely me. I can then use that attestation as my "username", and any challenge-response includes the public key, so they know that only I could be signing up.
I am very aware of "designing a security system they themselves cannot break" and the difficulties of key management etc.
Would be interested in knowing more from smarter people
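Very roughly, the flow I have in mind looks something like this (a minimal sketch using Ed25519 via the Python cryptography library; the post-office attestation step is hand-waved, and everything here is illustrative rather than a concrete proposal):

    import os
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # 1. I generate and keep the private key; the public key is what the post
    #    office would attest against my passport and what I'd use as a "username".
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    # 2. A site hands out a random nonce as a challenge at sign-up or login.
    nonce = os.urandom(32)

    # 3. I sign the nonce; the site verifies against the attested public key,
    #    proving I control the key without revealing anything else about me.
    signature = private_key.sign(nonce)
    public_key.verify(signature, nonce)  # raises InvalidSignature if it doesn't match
    print("challenge answered by the attested key")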
something like 2 billion people have a phone with a secure enclave capable of this in their pockets today - and they use it every day for logins, payments and paying at the car park.
We have the penetration
(Afaik smartphone penetration is around 4.5-5 BN, and something like 50%+ have secure enclaves, but honestly I don't follow that deeply so would defer to more knowledgeable people)
That’s not your identity, it’s an access token protected by an advanced lock screen (which is greatly useful, but not the same). If you lose your device, the way you get back into your accounts is your de-facto identity—usually it ranges between the email you used during signup to your govt id.
There isn’t a widely deployed public key network with keys that represent a person, afaik. PGP is the closest maybe?
> something like 2 billion people have a phone with a secure enclave capable of this in their pockets today - and they use it every day for logins, payments and paying at the car park.
They don't own a key pair. They carry one around, which is owned by google or some other entity?
Because the only way it'd work is if it was mandatory (because of point 2); it'd then be extended to porn sites to protect the children. That means politicians' browsing history on pornhub would also be recorded and inevitably leaked when they get hacked.
If spam was your only problem, now we have two: spam and identity theft. Selling/obtaining identity information becomes very profitable, and those working at the post office must guard access like a bank vault.
The paradigm of fixed identity information as proof is pretty obviously doomed. Just like how the 1970s concept of username/password as proof of identity is on its way out. Or credit card numbers alone being used to validate transactions.
All of those notions are pre-internet ways of proving identity. In a world where we're all rarely more than an arm's length from a globally connected computer, they're on the way out.
But, and I understand the argument, that is a problem for IRL society / government to solve.
If someone walks up to me in the voting booth and says "vote for X or I will kill you", that's a crime. If they do it in the pub, it's probably a crime. If they do it online, the police don't have enough manpower to deal with the situation.
We should change that.
Every time some fuckwit tweets "you and your kids are going to get raped to death and I know where you live" because some woman dares suggest some political change, I would like to see jail time.
And if we do that then I can understand your argument, but I would then say it is not valid - in a society that protects free speech.
It might get to be that way some day, but for now there is recourse. France is (in)famous for it, and they are currently making use of it.
And this is important because a "fair democratic society" that doesn't need people to be able to protest is, as history has shown many times, only a temporary affair. The best way to keep it is to not give the government the tools a worse government could use to suppress dissent.
I expect that's where we're heading. But then, as somebody who writes online mostly under my own name, maybe I'm just biased. Come on in, the water's fine!
I think there are cases for anonymous/pseudonymous speech, but I think that's going to have to shift away from disposable identities. Newspapers, for example, have been providing selective anonymity for hundreds of years, so I think there's a model to follow: trusted people/organizations who validate the quality of a non-public identity.
So a place like HN, for example, could promise that each pseudonymous account is connected to a unique human via some sort of government ID with challenge/response capability. Or you could end up with third-party ID providers that provide a similar service that goes beyond mere identity, like the Twitter Verified program scaled up.
Disposable identities have always been a struggle. E.g., look at Reddit's very popular Am I the Asshole, where people widely believe a lot of the content is creative writing exercises. But keeping up a fake identity over the long term was a lot of work. Not anymore, though!
> The obvious problem is we don’t have any great alternatives.
Of course we do. The rise of digital finance services has led to the creation of a number of services that offer the identity verification necessary for KYC. All such services offer APIs, so adding an identity verification requirement to your forum is trivial.
Of course, if it isn't obvious, I'm only half joking.
Maybe even push that a level higher and have org-to-org vouching as well (so it can scale and reputation propagates across social bubbles). Bootstrapping remains somewhat of an issue.
One somewhat popular solution for bootstrapping is to allow people to buy in, paired with quickly banning those members in cases of rule violation. It's by no means perfect, but it puts a real price on abuse and thus reduces it a lot
I've mentioned a "market of lemons" elsewhere in this thread. One such market is the market for malware and stolen credit card details. One result of the market being broken: serious criminals restrict themselves to very small (company like) social circles and invite only forums. One signal of trust that remained very long: a very short ICQ number. You don't want to burn such a handle with a bad trade, so trust was given upfront.
How would you imagine that applying here? If fake accounts are at least as convincing as real ones, then it seems like trust networks would be quickly prone to corruption as the fake accounts gain enough of a foothold to start recommending each other.
On a network started by 2-3-10 people, the first new members would need to be vouched for by a percentage of those to get in - and so on.
If someone down the line does some BS activity, the accounts that vouched for it have their reputation on the line.
A whole tree of the person who did the BS and 1-2 layers of vouching above gets put under scrutiny, gets a big red warning label in their UI presence (e.g. under their avatar/name), and loses privileges. It could even just get immediately deleted.
And since I said "identity based", you would need to provide a real-world ID to get in, on top of others vouching for you. It can be made so you wouldn't be able to get a fake account any easier than you can get a fake passport.
Next keyword: market of lemons. If you can't rely on said signals anymore, you must treat every item the same (untrusted), which drives the legitimate players out of the market. We have a lot of lemon markets; we can probably infer from them what the social result will be.
You can do it already. It's a normal order for a copywriter, nobody will bat an eye when you post an offer. It costs cents/dollars per 1000 words instead of fraction of a cent, but that's not exactly outside of reach of a funded startup.
This system is already essentially broken. Either you worked at a large business that only gives out dates of employment and job title by policy or you are in complete control of who the hiring company talks to.
The first time you don’t get a job because of a reference you gave you learn a lesson. If it ever happens again, it’s on you.
What really is the alternative? At least where I live, a multi-year gap in your CV is going to set off more red flags than an honest "It didn't work out between us".
Don’t give them your boss’s name. Give them a coworker’s name. Give them a friend’s name and have them lie for you.
If a company is proactively contacting people you don’t give them contact information for, that’s not requiring references — which is the process I (and the comment I replied to) was talking about. If a company knows where you’ve worked, they can contact them if they want.
Then you’re fucked if they check and the reference is bad and they care. Either you take your chances, leave it as a gap in your resume, or you make something up.
In the past, I’ve extended the time I was at either the company before/after and then leave the one in the middle off. Smaller gap is easier to explain and you just need a coworker at the one you stretched to cover for you - or have it be somebody who wasn’t there during the time you added. You can also just say you did the “freelance” thing and then talk about whatever you want.
I’ve also just been 100% honest and said, “I didn’t like this job and left on bad terms. I’d rather you not contact them.”
Just have to read the situation and make your best guess as to what is going to get you the job.
We'll be back to the 1990s "software agents" craze, take two: needing AI-driven agents that seek out, index, and evaluate content on our behalf, and that negotiate with each other for recommendations, with the currency being trust based on how "your" agent evaluated prior results.
I'm hoping to put an AI between me and my e-mail inbox this weekend (I had ChatGPT write most of the code; it's not much); not fully automated, but evaluating, summarising and categorising. I might extend that to e.g. give me an "algorithm" for my Mastodon timeline (despite all of the people insisting on reverse chronological, I'm at a few hundred people I follow and already can't keep up), and to a number of other sites I visit. For most of these things latency does not matter, so e.g. putting them through llama.cpp rather than something faster is fine, and precision isn't critical (I won't trust it to automatically reply or reject anything, just to prioritise and categorise, where missteps won't have any critical impact).
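For the curious, the shape of it is roughly this (a sketch rather than my actual code; it assumes a local llama.cpp server exposing its OpenAI-compatible endpoint on localhost:8080, and the IMAP host, credentials and category labels are placeholders):

    import imaplib, email, requests

    imap = imaplib.IMAP4_SSL("imap.example.com")      # placeholder host
    imap.login("me@example.com", "app-password")      # placeholder credentials
    imap.select("INBOX")
    _, ids = imap.search(None, "UNSEEN")

    for msg_id in ids[0].split()[:20]:
        _, data = imap.fetch(msg_id, "(BODY[HEADER.FIELDS (SUBJECT)])")
        subject = email.message_from_bytes(data[0][1])["Subject"] or ""
        resp = requests.post(
            "http://localhost:8080/v1/chat/completions",   # assumed local llama.cpp server
            json={"messages": [{
                "role": "user",
                "content": "Categorise this email subject as one of "
                           "[urgent, personal, newsletter, notification, other]: " + subject}]},
            timeout=120,
        )
        label = resp.json()["choices"][0]["message"]["content"].strip()
        print(f"{label:>12}  {subject}")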
> We'll be able to buy HN comments by the thousand -- expertly wordsmithed, lucid AI comments
You're forgetting the millions of additional comments that will be written by humans to trick the AI into promoting their content.
Even worse, currently if you ask ChatGPT to write you some code, it will make up an API endpoint that doesn't exist and then make up a URL that doesn't exist where you can register for an API key. People are already registering these domains and parking fake sites on them to scam people. ChatGPT is creating a huge market for creating fake companies to match the fake information it's generating.
The biggest risk may not be people using AI-generated comments to promote their own repos, but rather registering new repos to match the fake ones that the AI is already promoting.
I feel like you’re overstating this as a long-term issue. Sure, it’s a problem now, but realistically, how long before code hallucinations are patched out?
The black box nature of the model means this isn't something you can really 'patch out'. It's a byproduct of the way the system processes data - they'll get less frequent with targeted fine tuning and improved model power, but there's no easy solve.
this is clearly untrue. it’s an input, a black box, then an output. openai have 100% control over the output. they may not be able to directly control what comes out of the black box, but a) they can tune the model, and they undoubtedly will, and b) they can control what comes after the black box. they can—for example—simply block urls
They don’t have control over the output. They created something that creates something else. They can only tweak what they created, not whatever was created by what they created.
E.g., if I create a great paintbrush which creates amazing spatter designs on the wall when it is used just so, then, beyond a point, I have no way to control the spatter designs - I can only influence the designs to some extent.
Assuming those hallucinations are a thing to be patched out and not the core part of a system that works by essentially sampling a probability distribution for the most likely following word.
evidently, they can hard-code exceptions into it. this idea that it's entirely a black box that they have no control over is really strange and incorrect and feels to me like little more than contrarianism to my comment
Folks, doesn't it seem a little harsh to pile downvotes onto this comment? It's an interesting objection stimulating meaningful conversation for us all to learn from.
If you disagree or have proof of the opposite, just say so and don't vote up. There's no reason to get so emotional that we try to hide it from the community by spamming it down into oblivion.
Most people don’t understand the technology and maths at play in these systems. That’s normal, as is using familiar words that make that feel less awful. If you have a genuine interest in understanding how and why errant generated content emerges, it will take some study. There isn’t (in my opinion) a quick helpful answer.
I genuinely want to understand whether there’s a meaningful difference between non-hallucinatory and hallucinatory content generation other than “real world correctness”.
I’m far from an expert, but as I understand it the reference point isn’t so much the “real world” as it is the training data: a hallucination is when the model generates a strongly weighted association that isn’t in the data, and perhaps shouldn’t exist at all. I’d prefer a word like “superstition”; it seems more relatable.
By hallucinating they’re trying to imply that it didn’t just get something wrong but instead dreamed up an alternate world where what you want existed, and then described that.
Or another way to look at it, it gave an answer that looks right enough that you can’t immediately tell it is wrong.
this isn't a good explanation. these LLMs are essentially statistical models. when they "hallucinate", they're not "imagining" or "dreaming", they're simply producing a string of results that your prompt combined with its training corpus implies to be likely
Now is the time to cultivate friendships and to make networks that persist online, and are verified via irl meetups / contacts. People who pull that off now will be in much, much better shape in the future. GPT's output is apparent to a discerning eye right now, but according to the power law, it won't take much "novel" input to train upon to make that discernment useless. Then, the only internet community that could be dependably reliable would be your group of irl verified people.
Agreed. It's very difficult now to build communities that have lasting impact, because everyone's saturated with info as-is. Contributions to niche communities now rely on a societal "outsider" status, which means there's basically a couple of people that contribute heavily and very few onlookers. Everything else is either gamified or comes from video games / gambling.
On the bright side, it's THE time to cultivate close friendships and to seek like-minded people. The entire phenomenon of popular attention hugging a community to death does not exist any longer. You can now have OG members persisting with notions for a long time and building a shared mythos with a small group of friends, because information is now more accessible than ever.
Obviously, most people aren't part of these communities. The people that are "drifting" alone are given to wasting their time on charismatic attention-seekers that talk a big game (twitch/e-celebs) but deliver nothing of value. So there's also room in the market for charismatic folk with some technical expertise to rally people to their cause, but only very briefly. This is because the number of people half-committing and then jumping ship is likely the highest it's ever been. Also, platforms have now resorted to paying people to stay on their platform (youtube / tiktok / sponsorships / twitch boosting streamers / etc.) to combat occasional ennui, ironically exacerbating the issue.
Most tight, close-knit groups originate from shared mythos. These can be family, proximity, "same school year", "same college", "friend of best friend", etc. Online, you can find people that are interested in some niche topic (or elaboration of some popular topic to an absurd degree) and engage with them. Small newsletters are also a good way to get people talking. What most people don't do is return attention, aka reciprocate positively. This could also mean you'd have to write about unrelated things or maybe try to build a "business relationship" that would then progress if you invest some time and hope for the best.
It's a really bad time to try and get the attention of someone more famous / notable than you, though. Sure, you can go on $platform and talk to them, but it's really not the same when they have a gorillion other messages. Same goes for people in large communities that are a "guy" there, known for something. Extremely high-return investments but you're likely going to fail.
Some people try to start youtube channels / info streams and then entice people to join their forum / server. While this does seem to work, it only brings in quality people AFTER the community is fully formed and rigorous laws are in place. The initial stragglers are usually the recently excommunicated looking to try their hand at the same shit somewhere else.
If you really put some effort into a topic and blog about it, you're likely to get some high-quality responses even if you only pose a question to someone that's partly interested. I've found this to be a really great way to separate the folks that are actually interested from those that aren't. You'll usually get people around your own level this way and IME this is the best approach.
It takes a lot of effort to make people clock in regularly to your online circle, and it's better to establish digital / irl face-to-face contact after a good interaction. It builds trust and because we're wired to judge people from their facial reactions rather than text, it also sobers conversation / tempers over potentially divisive topics. Works well with cerebral / "deep" people. Doesn't work with people that only come online to blow steam / enact a persona, so it's a good filter.
TL;DR: Touch grass (digitally), make friends (digitally)
Stop making up laws. You'll do much more good dismantling existing ones. And non-social signals like # of commits, # of pull requests cannot be faked? We need signals among the noise.
Sometimes signals are noise we just need to calibrate.
I mean, there have always been shills. What's changing now is the cost of shilling is dropping from dollars per comment to fractions of a cent. Troll farms used to be a lot of work to put together, but soon they'll be aaS.
Those of us who are careful internet readers have spent years developing good heuristics to use textual clues to tell us about the person behind the text. Are they smart? Are they sincere? Are they honest? Are they commenting in good faith? Those skills will soon be obsolete.
The folks at OpenAI, who are nominally on a mission to make sure AI "benefits all of humanity", have condemned us to a life sentence of fending off high-volume, high-quality bullshit. Bullshit that they are actively working to make harder to detect. And I think the first victims of that will be internet forums where text is the main signal, places like this and Reddit.
"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
That's my issue with stars already. One repo having more stars than another doesn't mean it's better in any way. It might just mean it's been promoted more.
That's how record labels can simply decide what's going to be the next summer hit. They pick a song and promote the hell out of it. It's not the summer hit because it was somehow better, just more promoted.
You can buy Twitter followers, Instagram followers, YouTube views, Amazon reviews, Reddit upvotes, Reddit comments, and Yelp reviews - so what's so shocking about GitHub stars?
After that post on HN months ago[1] where users discovered OAuth permissions for unrelated things being used/abused to star projects without their knowledge, this news of buying stars didn't come as a surprise.
It's unfortunate as I've seen stars used as a metric of trustworthiness in general user discussions.
GitHub is fully aware of these schemes - would they consider something like a "confirmed" star count that subtracts the suspicious/fake number? Or is that too much of a slippery slope?
GitHub gradually removes these users as they catch up to them, so not helpful to have extra steps. I have a couple of repos which were briefly popular, so when a new user stars it today, and I see 1000s of other stars, it's suspicious and I get a peek into their world.
There are obvious numeric usernames, but also fake orgs with repos for the users to fork and interact with, and a few account takeovers (i.e. someone had signed up for GitHub in 2015 to make a free wedding website, abandoned it, and the account fell into spammer hands). These used to be easier to report.
>GitHub gradually removes these users as they catch up to them
With collateral damage too, I presume [1]. I guess I've been the victim of some automated system. They have banned my account without warning or explanation and they've been ignoring my support tickets for about 2 months!
> They have banned my account without warning or explanation and they've been ignoring my support tickets for about 2 months!
Which is especially ridiculous if this was due to a false positive spam detection as real spammers will not bother with chasing support when new accounts can be created easily.
How did you find out the name of the company behind GitHub24, though? If I go to their website I cannot see it, and I cannot even find anything if I search for the company name.
I was also surprised when I saw it. A GbR is a German "Gesellschaft bürgerlichen Rechts" which does not need to be formally incorporated and offers no limited liability. The name needs to include the names of all partners, so we can deduce it is being run by two persons. I am quite surprised they do this without liability protection. Upon googling, I found only a playlist on YouTube which has this name and contains one explainer video about signing up a company with German tax authorities.
If they are indeed based in Germany, they're required to have an Impressum / imprint on their home-page, without it, they risk being fined.
Show HN: there are maybe dozens of those posted every day, but they rarely hit the front page.
A Reddit ad is great to kick off star growth, but unless you have something interesting to many people, don’t expect more than 50 stars on the first day, and then a plateau to a star every few days.
Most GH stars I’ve got came from somebody mentioning my project in a comment in some heated discussion on HN. So I guess drama sells?
Is it just me, or does the fact that Dagster has one of their competitors, Mage.ai, listed here as a repo with around 15% fake stars seem like an odd coincidence?
If you’re going to accuse a competitor of fraud, writing a blog post showing your work seems like the safest way to do it. People lie with statistics all the time, of course.
[Blogpost author here]
We ran the numbers for Prefect and several other repos in our space and they came out clean. As we note in the article, while some repos game the system, from what we can tell the number of abusers is actually fairly small.
> we track our own GitHub star count along with that of other projects. So when we spotted some new open-source projects suddenly racking up hundreds of stars a week, we were impressed. In some cases, it looked a bit too good to be true, and the patterns seemed off
If their competitor has fake-looking star counts, I'd expect them to be the ones best equipped and most likely to suspect it.
It’s possible that was the impetus of the blog post. Maybe they suspect Mage.ai of astroturfing GitHub stars and investigate it as above. They then publish a blog post that:
1. Indicates the astroturfing without actually specifically calling them out
2. Does so in a way where others can verify their work and use it on other repos
3. Uses their product to do so
> Yet [GitHub stars] influence serious, high stakes decisions, including which projects get used by enterprises, which startups get funded, and which companies talented professionals join.
Really? I honestly just don't believe this... if I were to believe this, I think I'd have to conclude the world is just too broken to bother rescuing.
The list in the article, though, was carefully selected to presume competent people doing the decision-making. I totally believe many people use that star count for something... but an "enterprise"? Someone investing a non-trivial amount of money? A specifically-"talented professional"? I just find that really difficult to believe. I've sold software to enterprise, I've worked with a number of venture capital funds, and I know a ton of actually-talented professionals... I dare say most of them consider GitHub's social features to be a joke.
The enterprises I've dealt with cared almost exclusively about stuff like license choices, support contract options, and "invoice billing" ;P. The vetting process I've dealt with at VCs was intense, having worked both sides of that situation; and I know multiple people who have worked data science jobs at such firms to try to better select investments. As for a "talented professional", I can pretty much guarantee they are going to look at your codebase, not the number of stars it has, while they evaluate any number of more reasonable things to judge an opportunity on (commute, pay, management style, etc.). A key property of competent deciders is that they aren't using trivial metrics.
One of my stock interview questions asks people how they evaluate 3rd-party dependencies for use in a production environment. So many interviewees respond with GitHub stars as their main or only criterion. It depresses me every time.
It depresses me too, but what else can you do? I check what the docs look like, but if I'm to depend on a thing I'd rather choose something popular than unpopular. GitHub stars, Hackage downloads, StackShare... what else can one check?
That's a very interesting question. There are so many things you can look at. How is the documentation? Who are the primary maintainers? How are they funded? What are their motivations? Are the primary maintainers active on Stack Overflow, Reddit, Discord, etc...? How many contributors are there? How does their Github issues page look? What about the Github discussion page? How many maintainers are there total? How many downloads per week on NPM (for JS libraries)? From all of these things - how long do you expect this library to be maintained? And that's just the initial qualification research, nothing about how it will impact the actual code-base.
What did I miss? What's the best answer you've ever heard? How do you evaluate 3rd party dependencies?
You overlooked what I consider to be the first thing you should check — when was the repository last committed to. There are countless projects that rank high on every other metric, but are essentially abandonware.
Yeah good point... definitely something I would have checked, forgot to put it in the list. I'm baffled people have trouble coming up with more than "number of stars" for this.
Of course there can be libraries that are more or less "finished", so the last commit/frequency of commits isn't on its own a deciding factor, but in proper context/holistically it is definitely an important metric!
FWIW, I am not baffled by that, as the vast majority of programmers are not "talented professionals" (which is the specific category of potential employee I was balking at, along with enterprises and venture capital firms). So like, you ask your question, they say "star count", and you don't have to really continue the interview.
(When I was in high school, I used to work for a pre-Internet company that helped people pre-filter interview candidates for ads posted in classified sections of newspapers and what they did was have questions like this that could be asked by people well before they reached your calendar for an interview.)
However, some language ecosystems are more OK with "finished" software than others. It hasn't had a commit in 4 years because none were necessary. Needing constant updates is a sign the local ecosystem is driven by churn over quality.
I don't really think this generalization holds. TeX is one of the very few widely used pieces of software that's considered complete, more or less everything else is either getting updated or superseded by other things.
Clojure, Elixir, and Lisp (especially Clojure) all have slower acceptable churn rates than other language ecosystems. If it works sensibly (both in terms of being fully debugged and ergonomics) and the underlying system hasn't had significant changes, what good does a commit within the past six months do beyond signaling to the GitHub meta game?
Insights -> contributors, and the number of active maintainers based on the entire commit history of the project and the frequency of commits. Also, the network page, which shows the number of active forks. Also, PRs, and how they are handled.
Contributors is the most informative page for me. So many projects are a one-man show basically all the time. I don't mind that, it means passion, but it also means it can disappear at any moment depending on circumstances.
I also look into issue details to see how maintainers communicate with community members that do due diligence before asking for help.
Stars only mean something because of the people who do. They're the ones leading the herd. If you're just going off the social signals, then you're just monitoring where the herd is going.
Yep, this one is the headline item for me. Look at the code and, if it has further dependencies of its own, look at the code for those too.
The main question I'm asking myself while looking at the code is: if I had to fork this thing and maintain it myself, how would I feel about it? Because sometimes that happens.
I'd add support to that list. When it breaks, can I cut a contract and get an expert available to diagnose the problem within a few hours? Production outages are not the time for self-help and digging around in other people's code bases.
So, ask yourself for a moment: what is it you are actually caring about?
I'd like the project to not introduce security vulnerabilities or bugs into my code. I thereby care what language it was written in, what libraries they use, what their testing and QA/CI process is, and whether it is being used by any "critical" projects (like, if that library is embedded in Chrome, you have to bet there are tons of people like me every day trying to hack it).
As part of that, I care about whether the project takes a cavalier attitude towards contributions: if I see a number of pull requests from random "contributors" being casually accepted, that is going to be a major major red flag; if possible, I want to see a core team doing most of the development and integration (and not merely most of the "review", as I see in some projects where the people in charge feel above doing work).
I definitely care that the project is being maintained and that there are people paying attention to issues, and it needs to have a culture of taking bug reports seriously... nothing is more dangerous than a project that tries to pretend they are responsive using bots to "automatically close" issues: I'd rather see bugs open for years than worry a critical issue was reported and subsequently lost.
I am certainly curious how work on the project is funded and whether I can trust that its license is going to hold constant over time: I don't want to end up relying on a dependency that is really the pet project of a small startup that is either going to disappear next year or will decide to redirect development to a closed-source fork. I'd thereby also prefer the project be run by a core committee of participants from multiple companies.
I honestly can't imagine caring two shits about how many stars a project had on GitHub... hell: what if the project isn't even on GitHub? What then? Do you just give up and decide it sucks? A world where everyone feels any incentive at all to put their code on a centralized platform is one where we have all failed as stewards of the future of software :(.
Activity on other sites related to finance/coding is similar (seekingalpha likes, for example) and I've gotten organic inbound requests for work periodically scraping such info into... Excel.
I have a half-written article about this, but I didn't have any good notion about quantifying the problem so this article is very welcome info to me.
My own angle is that copilot has shifted the incentives around this practice, maybe substantially. Businesses want to get (free tiers of) their paid SaaS endpoints into copilot suggestions - it's a great funnel!
I'd guess that github is as likely as not to become an SEO spam battlefield (like the rest of the web).
I wrote a tiny tool which calculates the "brightness" score of a github repo based on calculating the total star count of the people who starred your repo. It will automatically detect these kinds of scams (assuming that it's mostly low star bots giving the stars).
Edit: I love clustering, I really do, but I think that techniques like the one I am using are far superior to unsupervised learning for trying to detect fake accounts in this context.
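The gist, heavily simplified (this is a sketch of the idea rather than the tool itself; it uses public GitHub REST endpoints, the token/owner/repo values are placeholders, and pagination is omitted for brevity):

    import requests

    TOKEN, OWNER, REPO = "ghp_...", "some-owner", "some-repo"   # placeholders
    H = {"Authorization": f"token {TOKEN}", "Accept": "application/vnd.github+json"}

    # Who starred the target repo?
    stargazers = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/stargazers",
        headers=H, params={"per_page": 100}).json()

    # "Brightness": sum the stars earned by each stargazer's own repos, so that
    # stars from throwaway accounts with no footprint count for very little.
    brightness = 0
    for user in stargazers:
        repos = requests.get(
            f"https://api.github.com/users/{user['login']}/repos",
            headers=H, params={"per_page": 100}).json()
        brightness += sum(r["stargazers_count"] for r in repos)

    print(f"brightness of {OWNER}/{REPO}: {brightness}")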
It is worth noting that it is trivial to buy fake stars for a project you are not affiliated with. The reason someone might do this would be to "test" the purchasing of fake stars without risking contaminating their own project.
The projects with suspicious stars were still >80% nonfake stars. That to me suggests that most of the fake stars have been classified as nonfake. There isn't much psychological value in boosting your star count by just 25%.
Depends on when the fake stars were created. If they are early in a project's life cycle, they may be used to get attention on the project, and once it has awareness, fake stars are no longer necessary.
While evaluating an OSS project, a key indicator is community activity.
GitHub stars are a weak community activity indicator. Firstly, as shown in the article, they can be gamed. Also, starring is a very low-threshold action, so it does not indicate whether the person who starred the project will actually use it.
I think two great community activity indicators are GitHub issues and Slack/Discord/Discourse comments.
One key thing with GitHub issues, in my opinion, is that if the issues are mostly from the core team, it is not a great sign. You want a large mix of issues from customers or users, not from the team. This is a good indicator of whether the project is solving a real problem or not. The same goes for the Slack comments: they should have both volume and freshness.
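If you want to eyeball the issue mix quickly, GitHub's issue objects carry an author_association field (OWNER / MEMBER / COLLABORATOR / CONTRIBUTOR / NONE), which is a rough proxy for core team vs. outside users. A small sketch (owner/repo are placeholders, pagination omitted):

    import requests

    OWNER, REPO = "some-owner", "some-repo"    # placeholders
    issues = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/issues",
        params={"state": "all", "per_page": 100},
        headers={"Accept": "application/vnd.github+json"},
    ).json()

    core = {"OWNER", "MEMBER", "COLLABORATOR"}
    real_issues = [i for i in issues if "pull_request" not in i]   # the endpoint mixes in PRs
    outside = sum(1 for i in real_issues if i["author_association"] not in core)
    print(f"{outside}/{len(real_issues)} recent issues were opened outside the core team")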
You have a point. I have often seen OSS projects with no revenue being funded on the basis of GitHub stars, even though all the other parameters show that the project's health is not that great.
Maybe not as cheap as you may think. I think GitHub takes a small cut, plus you may need to declare the donation as income on your taxes.
Also, if you get "smart" and donate from multiple cards, I would think it is a trivial task for GitHub to determine it is a scam. The CC address would match your address for the funds you receive.
> GitHub Sponsors does not charge any fees for sponsorships from personal accounts, so 100% of these sponsorships go to the sponsored developer or organization. The 10% fee for sponsorships from organizations is waived during the beta. For more information, see "About billing for GitHub Sponsors."
GitHub sponsors has been out of beta for a long time, they take 10% of the donations if the code is under an organization which is very common for OSS projects. Of course one of the ways to get around it is to sponsor the lead developer, which is sometimes available as an option. Or just sponsor the developer some other way which doesn't go through Microsoft such as Liberapay or Opencollective.
Pretty sure those who game their repo are motivated by investment in the associated startup. I think you are right that community activity is a high-fidelity indicator, and a smart investor in OSS startups should definitely not only lurk in the community but, if possible, actually have the resources to kick the project's tires as well.
In a very strange way (but reflective of the economic regime), a startup that fakes stars vs a straight-arrow startup that doesn't is demonstrating a key element for success in business, which seems to require a significant element of bullshitting and outright deceiving. The mantra has been that "grow grow grow" is the only guideline for success. Inflating your stars is just rookie-hour practice for bigger, better growth b.s. down the line.
My ex-employer used GitHub stars in their job descriptions and during recruitment pitches. They regularly encouraged employees to go and star the firm's repos on GitHub. In all-hands meetings, GitHub stars were one of the items they reported: "we've surpassed X in GitHub stars" (applause).
(The firm X, however, is a more well-known name than my ex-employer was).
A while ago, I listened to a Freakonomics episode where it was discussed that businesses use proxies to both boost their image and to cover up their incompetency. The example was that a lot of businesses chose fancy names starting with A (like, AAA plumbers), so that they get listed first in business directories. These firms were later proven to be very incompetent and/or even fraudulent.
do you mind elaborating on this? I am using Reddit to advertise some of my projects because it seems like a relevant crowd to advertise to, but I am curious to hear how it would be perceived.
only vaguely related - but I've been recently trying out dagster and I'm pretty impressed so far. I've run large-scale data processing from Hadoop onwards and was expecting the usual crumminess whenever you hit an edge case.
Instead I found a system that seems to be thoughtfully designed and, crucially, easy to debug.
I didn't know people used stars to make decisions. For me it is more like HN karma points. I use the issue history/PR history to get an idea of how good or bad a project is.
I'm surprised that Github stars are valuable enough to buy. Personally I never look at the star count because even if they were legit, they don't really tell me anything more useful than I get from looking at other things in the repo.
I tend to check the age difference between the earliest and latest commits because that lets me be sure it's not a project that someone spent a couple weeks coding up, dropped on github, and then forgot about. I'll also check the issues on there. I'm looking for more closed issues than open ones, but I'll also quickly scan over them to get a rough idea of how many are truly meaningful issues. I also get signals from the readme and docs. It's not a hard pass if there are issues with those, but it certainly helps my opinion if they exist and are both clear and detailed.
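Most of that can be pulled from the public API if you want to script a first pass; a rough sketch (owner/repo are placeholders, and note that open_issues_count also counts open PRs):

    import requests

    OWNER, REPO = "some-owner", "some-repo"   # placeholders
    H = {"Accept": "application/vnd.github+json"}

    meta = requests.get(f"https://api.github.com/repos/{OWNER}/{REPO}", headers=H).json()
    print("created:", meta["created_at"], "| last push:", meta["pushed_at"])

    # One way to count closed issues is the search API.
    closed = requests.get(
        "https://api.github.com/search/issues",
        params={"q": f"repo:{OWNER}/{REPO} type:issue state:closed"},
        headers=H,
    ).json()["total_count"]
    print("open issues (incl. PRs):", meta["open_issues_count"], "| closed issues:", closed)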
I mean based on the number of repos they identified buying stars and prices advertised, the revenue just doesn’t make sense. The sellers have made like, hundreds of dollars at most. How much effort have they invested for this meager return?
I find stars helpful when I'm evaluating several different repos to choose a particular tool for a job.
If one of the repos has many more stars, I weigh that strongly when choosing. Freshness of commits is definitely important, but for me the fact that many other people starred the repo shows that there are eyeballs and activity.
I'll admit I've used them. In particular, I've used paperswithcode to find implementations of ML models. There are often a number of implementations of the same model, and the quality is highly variable. I've used stars (which paperswithcode displays) as a pre-screen. Spoiler alert: the highest-starred implementations are not always the best. But it still helps to triage, as a proxy for how well used it is.
You are likely not important enough to scam. The first people I can imagine this being shown to are VCs in pitch decks who are only going to see this on a powerpoint and not actually on github. Very unlikely the VC will check github itself to verify the number, and if they do, even less likely they'll verify that the stars are real.
You're the kind that checks everything. Even if you had something valuable, a scammer wouldn't waste their time with you when there are easier fish to bait.
Interesting, I just use them to keep track of interesting projects (edit: not the number of stars as a proxy; stars are basically my bookmarks). People treat them as internet points?
I really wish GitHub would have some sort of flag for "stale" projects. I use your methods too (issues, dates, etc.), and I'm usually disappointed when search results bring up ghost projects. However, in a few instances, I found a project that was similar to an issue I was working on that went one step beyond where I was, and even though it was a ghost project, it helped. But in general, these projects don't help. I'm also disappointed that I'm thinking, "Hmmm, maybe LLMs can help..."
Why is stale a bad thing? It could be something that was created to serve a purpose, developed to the point that it was feature complete for that purpose, and now requires no more development yet continues to do its purpose without modifications.
It's almost like you are thinking of it as an expiration date and the software has spoiled.
"Stale" and "done" are different states. Stale is when bugs are known but not fixed, dependencies old and unsupported, build instructions do not work any more on modern versions of OSes and other environments.
All software is subject to shifting environments over time that will eventually render it obsolete. How fast this happens really depends on the ecosystem—it's a function of the abstraction level and context in which it runs. C or Go code that compiles to a standalone binary will be less susceptible to this, higher level Ruby or Node code that depends on a lot of peer libraries moving in lockstep will be more susceptible. Newer languages that have some notion of backwards compatibility baked into their charter like Elixir or Rust are somewhere in between.
well, the original dev did release the code as open source. you are free to take their lead and continue on with modifications in your own source or even as a fork if you feel so strongly about it needing to be maintained to that level.
Yes, I certainly could. This comment chain started with "why is stale a bad thing". It's bad because I have to do that.
There might be a maintained fork/separate project that does what I want that I would like to find instead. Or maybe I was just searching to save myself 30 minutes on a one time task and I'm not up for adopting an abandoned project.
Because many languages have breaking changes in the interpreter. For example, it is almost impossible to review old Python projects: you have to change so much that in many cases it is easier to rewrite.
Rust and other compiled languages that have backward and forward compatibility in mind do much better.
But in that case it should have a note saying it's finished or in maintenance mode (e.g. https://github.com/sirupsen/logrus); include references to replacements, offer paid support if you really need it or still use it, keep an eye on issues, and update dependencies.
Else, ask for a new maintainer. While code can be considered done (especially if no new features are added), it should never go unmaintained. If it's actually used a lot of course.
I have one project on GitHub that I use all the time as part of a script and only push changes when the Python API breaks it. It is essentially “finished” and usually just needs a quick compile against the new Python version whenever I upgrade the distro. I haven’t even had to touch it for at least as long as GitHub has required SSH keys, so by all accounts this would be an abandoned project.
Now that I think about it — it is a python wrapper around a boost library and neither of those have made backwards incompatible changes in a long time which is quite suspicious.
Boost libs circa Ubuntu 14 or 16.04 had a JSON parser that allowed comments, while the newer Boost in Ubuntu 20.04 (and I think already in 18.04) had "updated" it so that it didn't allow comments any more.
Just a small anecdote of Boost changing behavior that broke some of my stuff.
I kind of expect that I’ll have to do some work at upgrade time, but it’s been a while. Usually Python is the culprit; I can only remember Boost breaking something once, and that was a different project. The maintainer was quite nice in trying to help me figure it out, but I don’t think I ever got it working the same again.
Metrics based on issues / commit activity are certainly higher fidelity.
As you indicate though, they require more effort to adjudicate. Are issues from core team members? Are commits meaningful? Is community activity meaningful? I wish GitHub would allow us to parse things like this more easily.
My use of star count is generally a binary indicator. 1k+ is probably a legit project and below is probably still early. Beyond that, it's probably too noisy.
Closed issues don't mean anything though... a lot of maintainers bulk close hundreds of issues as "nofix", "no activity after 3 months", and so on. Just sweeping them under the rug. And many of them pride themselves on the 0 open issues like it means something. Any software in the world can have 0 issues if the maintainers play this game.
So unless you are really well versed in the project and spent some time following it, stars actually might be a better indicator of the project quality and reputation.
> a lot of maintainers bulk close hundreds of issues as "nofix", "no activity after 3 months", and so on
God, I hate this. Every time I have an issue with something, look it up on the issue tracker and find the exact issue I'm having autoclosed as "stale" by a fucking bot because the author didn't reply "this is still an issue" once every 24 hours, it instantly makes my blood boil and I avoid using the software in question as much as possible in the future. Nothing screams "I care more about github numbers than my users or the quality of my software" more than this.
I don't think GP said anything about making demands. They said they avoid using that piece of software and that is not a demand on the software's author.
If you read my comment carefully you'll notice that I at no point demanded that the developers actually fix the issue.
The problem here is simply closing issues that are not fixed because they're "stale"; there is no reason to do this unless you're obsessed with keeping the number of open issues low to deceive people into believing no issues exist. Keeping issues open does not take any effort.
I can be upset with people lying to me even if I don't pay them, and there is nothing wrong with avoiding projects that engage in such behavior and warning others about them.
> I tend to check the age difference between the earliest and latest commits because that lets me be sure it's not a project that someone spent a couple weeks coding up
I doubt anyone would do this, but commit date can be arbitrarily changed.
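To illustrate: git takes the author and committer timestamps from environment variables when they're set, so nothing stops someone from backdating an entire history. A minimal sketch (the date and message are made up):

    import os
    import subprocess

    # Git reads these variables for the commit's author and committer
    # timestamps, so the resulting commit will claim to be from 2013.
    fake_date = "2013-01-01T12:00:00"
    env = dict(os.environ,
               GIT_AUTHOR_DATE=fake_date,
               GIT_COMMITTER_DATE=fake_date)

    # --allow-empty just avoids needing staged changes for the demo.
    subprocess.run(
        ["git", "commit", "--allow-empty", "-m", "backdated commit"],
        env=env, check=True)

The same thing works for the author date with plain `git commit --date=...`, so "the earliest commit is N years old" is a weak signal on its own.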
I have moved all my repositories to sourcehut. They are generally mirrored by a github repository consisting of a single README file explaining the new location for the project, and my reasons for the migration.
However, given that sourcehut eschews the use of such "social metrics" (at some level I agree with the principle behind that; on the other hand, I do appreciate the value of being able to give visibility to good projects), I usually mention in my README: "If you like the project and wish to promote it, feel free to star this github page".
I'm sure github probably wouldn't like this use-case, but the stars would certainly be genuine, even if possibly quite dodgy-looking.
I’m conflicted about this. Sourcehut, Codeberg, etc. are great. But having everything I’m looking for on GitHub is extremely convenient. I use the “Add to List” function extensively for bookmarking.
Yes, this is why I didn't want to migrate without leaving a trace on github. The redirecting README on github is a good compromise, I think.
Having said that, it may be worth thinking about the price we may be paying as a community for this convenience. MS GitHub is clearly already past the "embrace" phase, and well into the "extend" phase.
Rabbit trail: I accidentally right-clicked on their home icon and it brought up their branding page with license agreements for their IP. Really neat idea.
This sort of gamification exists only because there are too many green engineers who only care about their salaries; they mimic what people successfully recruited by FAANG (etc.) did, and so do other companies. Then this purity spirals into taking the entire field down because there's no one around to educate the new newbies. Facebook was IMO a step in the right direction because it was a "general" social network: you could post anything. Imagine if FB had released some sort of "extension" that allowed you to share anything via a template of sorts, instead of having to type out everything in the same old text post. It would have been meta enough (sorry) to not spiral very quickly.
Leaving the arena is the only viable option. Software projects that aren't dependent on github drive their own vehicle; everyone else is on a crowded bus.
I wrote on this topic a while ago; experimenting, I found out you can basically change repo names and keep the stars. This wouldn't work if you use the repo as an issue tracker or PR tracker, since the history would all be broken, but if it's pretty much just the code, it's easy to swap the star count between two repos.
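For reference, a rename is just an update call on the repository in the GitHub REST API, and stars (and watchers) carry over, with a redirect left at the old name. A rough sketch, assuming a personal access token with repo scope; the owner, names and token below are placeholders:

    import requests

    def rename_repo(owner, repo, new_name, token):
        # PATCH /repos/{owner}/{repo} with a new "name" renames the repo;
        # star counts survive and the old URL redirects to the new one.
        r = requests.patch(
            f"https://api.github.com/repos/{owner}/{repo}",
            json={"name": new_name},
            headers={"Accept": "application/vnd.github+json",
                     "Authorization": f"Bearer {token}"})
        r.raise_for_status()
        return r.json()["full_name"]

    # hypothetical usage:
    # rename_repo("someuser", "old-project", "new-project", token="ghp_...")

Swapping the counts between two repos is then just a sequence of renames, since the old name frees up once the first rename goes through.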
Sounds like they take it more seriously than Google takes likes on YouTube. A competitor had a video that rapidly got over 100k likes, but if you looked at the total time played, each view averaged out to just a couple of seconds on a video over 10 minutes long. Reported it, but nothing came of it. (No, not something we regularly do. I think it may be the only video I've ever reported; I just want a fair playing field.)
youtube competitor. that's just funny to me. kind of even comes across as petty. you took however much time to investigate a competitor's average view time and then cried to daddy about the perceived "advantage" instead of taking that time to improve your competing product.
No, we had someone show up out of the blue, with no established presence in the space, with a video with hundreds of thousands of views. I was curious how they went viral so fast.
Overall, it's bad for everyone if someone can create fraudulent views: us, other companies, and most importantly, consumers.
> taking that time to improve your competing product to make it better.
Took less than 3 minutes to do the math and send the report. I'm a fast developer, but I can't improve our product that fast :-)
> if you looked at the total time played, each view averaged to just a couple of seconds on a video over 10 minutes.
That makes no sense to me. Speaking as someone who has been using YouTube Data API v3 and YouTube Analytics API v2 for many years, estimated minutes watched of a video shouldn’t be public info. So how can you “look at the total time played” on a competitor’s video?
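To make that concrete: to the best of my knowledge, the public Data API only exposes aggregate counters on a video, while watch-time metrics such as estimated minutes watched live in the Analytics API and require the channel owner's authorization. A sketch of what anyone with an API key can retrieve; the key and video ID are placeholders:

    import requests

    def public_video_stats(video_id, api_key):
        # videos.list with part=statistics returns counters such as
        # viewCount, likeCount and commentCount; there is no public
        # field for watch time.
        r = requests.get(
            "https://www.googleapis.com/youtube/v3/videos",
            params={"part": "statistics", "id": video_id, "key": api_key})
        r.raise_for_status()
        return r.json()["items"][0]["statistics"]

    # hypothetical usage:
    # print(public_video_stats("VIDEO_ID", api_key="YOUR_API_KEY"))

So unless the uploader shared their analytics, I don't see where a public "total time played" number would come from.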
This is a great article. I've developed the same tactics for other projects but was never able to find the proper vernacular; it really helps with figuring out how to organize and present information.
I wonder if this is also part of general OSINT or (ISC)² training - everything this article showed about following breadcrumb trails and working the operation in reverse (e.g. pay a company to do the work, see how it turns out, evaluate the results, see if you can find other work similar or akin to it).
Things like this are part of why I cringe when I see supply chain analysis/security companies include “popularity” in their criticality metrics: the relationship between public popularity signals (like GitHub stars) and criticality is weak, at best.
In my experience, it's actually a great signal. That's why so many people rely on it. The distribution of GitHub stars is an extreme power law.[1] Stargazer thresholds are used to make decisions about including projects, for purposes ranging from dependency management to package-manager maintainers deciding to list software by name.[2]
Selection suitability and criticality are different metrics. The former is what Homebrew uses, as a way to lessen maintainer load and prevent inclusion in Homebrew becoming its own quality signal. The latter is what I’ve seen supply chain companies provide: an implication that a project is somehow critical or essential to the overall ecosystem because it has so-and-so many stars.
That first use is not unreasonable, in my opinion. The second one is questionable, at best.
Maybe our code forges don't need to be social media platforms. These 'stars' have pretty dubious value and rarely correlate with code quality or importance (core libraries generally get less attention than apps or tools). There's also a heavy language skew: JavaScript and Python libraries and programs get way more thumbs-ups even when they're technically no better than the alternatives.