Tracking the Fake GitHub Star Black Market (dagster.io)
489 points by kaeruct | 2023-03-18 | 293 comments




Is there even such a thing as a github influencer (people living just from github)?

People working in DevRel often aggregate developer-oriented content and gain popularity that way; "swyx" is one example. I'm not taking a dump on his work, but you can see the GitHub influencer effect over there.

Never heard of swyx.

Self-proclaimed GitHub star. But still only 5000 followers, and projects max out at 8000 stars.

I don’t know what I had expected but I think it was bigger numbers than that.

https://github.com/sw-yx


The "github star" claim links to the source (it's some github program where you can nominate people to be accepted into some promotion campaign). Saying self proclaimed makes him sound pretentious, it's actually awarded by github.

You can be factual and still sound pretentious and cringey. Like the medical doctors who insist on being called "doctor", to the point of smugly "correcting" strangers in a social setting.

I don’t know this user and won’t assume his intentions, but I can see how having “I’m a GitHub star [star emoji]” as the first sentence on the profile is doing him a disservice: it makes it seem like it’s the most impressive thing he’s achieved and diminishes everything else.


FWIW a smug doctor usually corrects people by saying they are a physician.

Maybe in English, but in my native tongue there’s no word for physician.

Also, I meant in the sense that you call someone “mister McSmug” and they reply almost angrily with “doctor McSmug”.


fixed. i wrote that when i was still trying to be approachable and cutesy. now i dont need it lol.

I love the edit in GH. So much.

Thank you for the work you do and for how much you have contributed to people learning over the years. <3


To reiterate, I don’t know you and don’t assume your intentions—and thus do not judge them. I’m also not familiar with your work but I have no doubt it’s more relevant than whatever “star award”.

In other words: it makes zero difference to me what you write in your bio though I can see how its previous wording took away from what’s important. I was conveying to the parent comment my understanding of the comment they were replying to.

Apologies for making you feel judged, that was not the point. Quite the contrary: I wanted to underline that, not knowing your intentions, it does not make sense to criticise how you choose to present yourself.


yup no hard feelings. felt defensive heheh. i guess as my career has gone on i've accumulated other stuff but early on the github star thing really did feel like a big deal + if i wasnt gonna plug it on my github readme where else

swyx is on HN and a legit great writer. He's influenced my thinking in many areas.

I've never seen his GitHub account before, but I expect that people following him there are doing so because of the content he's putting out. His blog has been on the HN front page many times, and he has a book about developer career building.

My github account isn't as pimped out as his, but marketing yourself isn't toxic, it's smart.


Agreed that marketing yourself is not toxic. I follow "swyx" on Twitter and find his insight valuable, and so do a lot of my peers. Btw, looks like his Github profile has not been updated for some time - he's no longer Head of DX at Airbyte and is now an independent consultant. https://www.swyx.io/about

appreciate it but also whoa this literally just happened and its freaky how up to date you are. consulting is temporary (check out https://www.trychroma.com/ if you are exploring LangChain/OpenAI apps and need an embeddings database) and i'm working on an ai infra startup idea on the side with a couple cofounders.

Congrats! I'll be watching :)

love and appreciate your work as well adam (everyone check out Corecursive https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... )

i honestly dont even view my github readme as "marketing yourself". most pple dont even go to an individual's profile in the first place, but if you do its kinda like a cute little myspace thing where you can let people know you as a human being and be a little quirky. i certainly dont hold myself out as an authority on writing the best software in the world and hey if 40k stars on the react-typescript stuff doesnt count i'm alright with that


I didn't even know Shawn had a popular GitHub, though he has written about the meta-creator ceiling before: https://www.swyx.io/meta-creator-ceiling

yeah i also am surprised that people use the follow feature for my work even tho i dont run a popular oss project.

well idk what "github influencer" even means but fwiw i am not "people living just from github". ive never taken a dime of github sponsor money. as far as github is concerned i just put my stuff up for free and the github stars program gets me an early look into new features so i can give them feedback. (eg i helped with Hey GitHub before the big launch at GH Universe).

obviously i'll happily ambassador github to anyone who will listen but who isnt already on github here


There are very few people who work like this and are non-toxic.

I guess the purpose is to find a job as an evangelist or similar.

I have heard of people getting interviews from their GitHub profile.

I got my current job through GitHub.

At least that's how the 3rd party recruiter told me he found me. It's possible he was lying and thought it would impress me (it did).

My profile is more active than most, but very far from rockstar.


Yeah. Several years ago extremely clueless recruiters used to email people heaps. Lots of people were complaining about getting tonnes of spam from them. :(

Had to change my Location (or some similar obvious field) in my GitHub profile to "Recruiters FUCK OFF" before they took the hint. ;)

Thankfully, GitHub introduced some other way to signal if you are/aren't interested in getting a job (toggle switch?) not long after, which seemed to work.


I think it would be tough (a good thing) because how often do people go to someone's root github page, even if they have a good repo? Not to say it never happens, but github is really about the repo, not the person (again a good thing) so it would be harder for an individual to become "influential". Hopefully nobody gets any ideas.

There are plenty of people making a living from donations to their open source contributions.

It seems odd to call them influencers based on that.


I am going to start posting linkedin influencer style "content" on my github for clout.

Twenty pull requests every morning. That’s my plan for 2024.

https://press.stripe.com/working-in-public

The book presents similar stories.


I’ve seen a number of resumes where people convey the popularity of their personal projects by number of stars or number of downloads.

Taylor Otwell lol.. He has some pretty dope cars in his garage and is doing well.

I follow him on GitHub, and pay for some of his products. I have been heavily influenced by his coding styles, and the tools he uses. His code just looks so tight and perfect. He writes his stuff so open ended and reusable that he basically writes a method once, and then reuses it across numerous projects.

Look at this tight code: https://github.com/laravel/framework/blob/10.x/src/Illuminat...

I’d say that Adam Wathan is rapidly growing his influence as well, and is probably doing alright too.


The multiple-line comment styling is so pleasingly pathological — each descending line has a few characters less than the last.

> In spam detection, we often use heuristics in conjunction with machine learning to identify spammers.

Heuristics can only be used to identify suspected spammers. Not everyone who behaves like a spammer is a spammer, it could be e.g. a random user with privacy settings on, or someone who didn’t update their bio in a while and it got affected by link rot, etc.

Even if a group of low activity accounts stars the same projects, it could be that the account owners just discuss these projects elsewhere.


The article notes this, and like any spam detection method, it has a degree of false positives, but the rate seems very low (less than a percent according to the article). I'm sure an official implementation of this could take more internal, non-public factors into account, like IP addresses and clustering of account creation times, to make it even more accurate and drastically reduce the number of spam users.

The claim I saw in the article is 98% precision, which doesn't actually tell us the predictive value without the base rate, and the base rate seems to be all over the place.
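For intuition, here's a minimal sketch of that base-rate point (the true/false positive rates are invented for illustration, not numbers from the article):

    # Deployed precision = P(fake | flagged); it moves with the prevalence of fakes.
    def precision(tpr: float, fpr: float, base_rate: float) -> float:
        tp = tpr * base_rate          # flagged and actually fake
        fp = fpr * (1 - base_rate)    # flagged but genuine
        return tp / (tp + fp)

    for base_rate in (0.5, 0.1, 0.01):
        p = precision(tpr=0.95, fpr=0.02, base_rate=base_rate)
        print(f"base rate {base_rate:>4.0%} -> precision {p:.1%}")
    # base rate  50% -> precision 97.9%
    # base rate  10% -> precision 84.1%
    # base rate   1% -> precision 32.4%

The same classifier that looks excellent on a repo that is half fake stars produces mostly false alarms on a repo with very few.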

this shouldn't be posted with links to the actual places to buy stars.... that seems like a bad idea?

Why? You can find these websites anyway if you search for terms like "buy github stars"

What's the street value?

It’s in the article.

Goodhart's law: if you rely on a social signal to tell you what's good, you'll break that signal.

Very soon, the domain of bullshit will extend to actual text. We'll be able to buy HN comments by the thousand -- expertly wordsmithed, lucid AI comments -- and you can get them to say "this GitHub repo is the best", or "this startup is the real deal". Won't that be fun?


The scary part is that this doesn't seem too far off, with the current proliferation of large language models like the GPTs.

Parent was definitely not referring to these at all /s

(I ninja-edited my comment in the first minute; the parent might have responded to a less clear version, since they posted at +3 minutes. I added "AI" in a revision).

OK, sounds reasonable. I didn't see the edit either, was just thinking about the myriad of LLM articles on the front page recently.

You sound way too human to be an AI then

If you want to, you can always set 'delay' in your profile to the number of minutes (up to 10) that you would like your comments to be visible only to you. This puts the stealth back in stealth editing. https://news.ycombinator.com/newsfaq.html

I rely heavily on this because it's somehow only after the comment is 'real' (i.e. staring back at me from a real HN thread) that I notice most of the edits I want to make.


[flagged]

Who says this isn't already happening?

Reddit better hold their IPO soon or they'll get caught up in this. Pretty soon there will be dozens of different GPT/LLM-powered Reddit spam bots on Github. Some of them no doubt for political trolling. [1]

Phone, then ID-based verification is a stopgap, but IDV services will have to spin up to support the mass volume of verifying all humans.

[1] I kind of want to do this from an innocent / artistic perspective myself. Perhaps a bot that responds with a bunch of rhetorical questions or onomatopoeia. Then I'd scale it to the point people start noticing and feeling weirded out by it. "Is this the new Gen Alpha lingo?" Alas, I have too many other AI projects.


Anti-AI/GPT detection will soon be a multi-billion-dollar industry.

And it'll silently remove your real posts too, faster than the horrible moderation on reddit ever could!

I just tried to find a FOSS tool for converting MS Outlook .pst file to .mbox.

I first tried Google; the results are dominated by commercial crap.

Then I tried the "google reddit" trick to try and find some real people's opinions... but look at all the blatantly bullshit comments on this Reddit thread; https://www.reddit.com/r/Thunderbird/comments/ae4cdg/good_ps...

---

(If anyone is wondering, the best option for Windows is to use the 'readpst' command via WSL. It comes in the 'pst-utils' package.)
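A sketch of the same steps driven from Python inside WSL (flags per the pst-utils man page as I recall it; verify locally):

    # Install pst-utils, then convert archive.pst to mbox files.
    import os
    import subprocess

    subprocess.run(["sudo", "apt-get", "install", "-y", "pst-utils"], check=True)
    os.makedirs("mail_out", exist_ok=True)
    # readpst emits mbox by default; -r recreates the folder tree,
    # -o sets the output directory.
    subprocess.run(["readpst", "-r", "-o", "mail_out", "archive.pst"], check=True)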


So a GPT bot instead of the human commenters would make reddit more useful in the end, this is what you're saying right?

How so? The commercial organisations will be able to use a GPT bot to provide more believable comments, at greater scale, and cheaper.

I'm blind maybe, but what are the blatantly bullshit comments? The spam of PST to MBOX?

Yeah, they are almost all clearly spammy, broken-English ads for paid software.

Yes and if you look at the comment history of the posters in that thread, it is clear they are all spam accounts.

If people see AI-generated comments on HN they should flag them and let us know at hn@ycombinator.com. HN is for humans to converse, and bots have never been allowed.

Of course it's not always easy to say what's AI-generated or not. But if an account is making a habit of it, it still seems possible to tell.


Your comment is the best. It's the real deal!

This comment summarizes it best. We need more discussion like this!

[dead]

> Very soon, the domain of bullshit will extend to actual text. We'll be able to buy HN comments by the thousand -- expertly wordsmithed, lucid AI comments -- and you can get them to say "this GitHub repo is the best", or "this startup is the real deal". Won't that be fun?

Definitely already the case, you really think Rust and SQLite would get more than a couple of upvotes otherwise? :D


Then how do you explain the Go hype HN went through just before the current rust hype? Where "[ordinary tool] in Go" was the formula for upvotes.

Then again, maybe Google had some mandatory HN time for their employees, that would be enough to explain that :D


Content-based auto moderation has been shitty since its inception. I don't like that GPT will cause the biggest flood of shit mankind has ever seen, but I am happy that it will kill these flawed ideas about policing.

The obvious problem is we don’t have any great alternatives. We have captcha, and we can look at behavior and source data (IP), and of course everyone’s favorite fingerprinting. To make matters worse: abuse, spam and fraud prevention lives in the same security-by-obscurity paradigm that cyber security lived in for decades before “we” collectively gave up on it, and decided that openness is better. People would laugh at you to suggest abuse tech should be open (“you’d just help the spammers”).

I tried to find whether academia has taken a stab at these problems but came up pretty much empty handed. Hopefully I’m just bad at searching. I truly don’t get why people aren’t looking at these issues seriously and systematically.

In the medium term, I’m worried that we’ll not address the systemic threats, and continue to throw ID checks, heuristics and ML at the wall, enjoying the short lived successes when some classifier works for a month before it’s defeated. The reason this is concerning is that we will be neck deep in crap (think SEO blogspam and recipe sites but for everything) which will be disorienting for long enough to erode a lot of trust that we could really use right now.


I am unclear on why a reasonable digital ID (probably government ID card style) plus rate limits would not be effective.

I can see lots of reasons people might oppose the idea, but I am not sure why it's not a widely discussed option?

(asking honestly and openly - please don't shout!)


[dead]

Closest example I know of is the Korean internet. It is nigh impossible to get an account on major websites without an SSN and a phone number. Despite this, there are still countless bots and scammers that use hacked or leaked personal data. So I'm not sure it would be that effective.

I am thinking more like WebAuthn, but where I own a key pair: I go to the post office with my passport, they give me a nonce, I prove that it's my key pair, and then they publish that the public key is definitely me. I can then use that attestation as my "username", and any challenge-response includes the public key, so they know that only I could be signing up.

I am very aware of "designing a security system they themselves cannot break" and the difficulties of key management etc.

Would be interested in knowing more from smarter people

(probably need to build a poc - one day :-( )
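To make the challenge-response step concrete, a minimal sketch using Ed25519 via Python's cryptography package (the passport check and the publication of the attestation are hand-waved; this only shows proof of key possession):

    import os
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # User side: a key pair the person owns.
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    # Verifier side (the "post office"): issue a fresh nonce.
    nonce = os.urandom(32)

    # User side: prove possession of the key pair by signing the nonce.
    signature = private_key.sign(nonce)

    # Verifier side: check the signature against the presented public key.
    try:
        public_key.verify(signature, nonce)
        # Having also checked the passport, the verifier would now attest:
        # "this public key belongs to a verified person."
        print("key possession proven; attest public key")
    except InvalidSignature:
        print("proof failed")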


> I own a key pair

Right there… it won't work with the general population.


something like 2 billion people have a phone with a secure enclave capable of this in their pockets today - and they use it everyday for logins, payment and paying at the car park.

We have the penetration

(Afaik smartphone penetration is around 4.5-5 BN, and something like 50%+ have secure enclaves, but honestly I don't follow that deeply so would defer to more knowledgeable people)


That’s not your identity, it’s an access token protected by an advanced lock screen (which is greatly useful, but not the same). If you lose your device, the way you get back into your accounts is your de-facto identity—usually it ranges between the email you used during signup to your govt id.

There isn’t a widely deployed public key network with keys that represent a person, afaik. PGP is the closest maybe?


> something like 2 billion people have a phone with a secure enclave capable of this in their pockets today - and they use it everyday for logins, payment and paying at the car park.

They don't own a key pair. They carry one around, which is owned by google or some other entity?


Because the only way it'd work is if it was mandatory (because of point 2); it'd then be extended to porn sites to protect the children. That means politicians' browsing history on Pornhub would also be recorded and inevitably leaked when they get hacked.

If spam was your only problem before, now we have two: spam and identity theft. Selling/obtaining identity information becomes very profitable, and those working in the post office must guard access like a bank vault.

Then make it a banks job to guard the bank vaults - they need to earn that FDIC bailout money :-)

The paradigm of fixed identity information as proof is pretty obviously doomed. Just like how the 1970s concept of username/password as proof of identity is on its way out. Or credit card numbers alone being used to validate transactions.

All of those notions are pre-internet ways of proving identity. In a world where we're all rarely more than an arm's length from a globally connected computer, they're on the way out.


I am guessing that "fixed identity information" is not a key pair?

Anonymity is critical to free speech, because there exist bad actors who will resort to violence to suppress speech they don't like.

But, and I understand the argument, that is a problem for IRL society / government to solve.

If someone walks up to me in the voting booth and says "vote for X or I will kill you", that's a crime. If they do it in the pub it's probably a crime. If they do it online the police don't have enough manpower to deal with the situation.

We should change that.

Every time some fuckwit tweets "you and your kids are going to get raped to death and I know where you live" because some woman dares suggest some political change, I would like to see jail time.

And if we do that then I can understand your argument, but I would then say it is not valid - in a society that protects free speech.


Actually, there could be places where verified humans are required, and places where they are not.

That doesn't work so well when the government is one of the bad actors.

My point is that if government is a bad actor, there is no recourse. We need a fair democratic society - it's on us to build one / keep it there

It might get to be that way some day, but for now there is recourse. France is (in)famous for it, and they are currently making use of that recourse.

And this is important because a "fair democratic society" that doesn't need people to be able to protest is, as history has shown many times, only a temporary affair. The best way to keep it is to not give the government the tools a worse government could use to suppress dissent.


I'm far less worried about being intimidated into voting a certain way by someone who is avoiding the authorities online.

Much more likely is that I'll vote ignorantly because I lack information that someone withheld because they're intimidated by the authorities.


I expect that's where we're heading. But then, as somebody who writes online mostly under my own name, maybe I'm just biased. Come on in, the water's fine!

I think there are cases for anonymous/pseudonymous speech, but I think that's going to have to shift away from disposable identities. Newspapers, for example, have been providing selective anonymity for hundreds of years, so I think there's a model to follow: trusted people/organizations who validate the quality of a non-public identity.

So a place like HN, for example, could promise that each pseudonymous account is connected to a unique human via some sort of government ID with challenge/response capability. Or you could end up with third-party ID providers that provide a similar service that goes beyond mere identity, like the Twitter Verified program scaled up.

Disposable identities have always been a struggle. E.g., look at Reddit's very popular Am I the Asshole, where people widely believe a lot of the content is creative writing exercises. But keeping up a fake identity over the long term was a lot of work. Not anymore, though!


> The obvious problem is we don’t have any great alternatives.

Of course we do. The rise of digital finance services has led to the creation of a number of services that offer the identity verification necessary for KYC. All such services offer APIs, so adding an identity verification requirement to your forum is trivial.

Of course, if it isn't obvious, I'm only half joking.


>The obvious problem is we don’t have any great alternatives.

There's always identity based network of trust. Several other members vouch for new people to be included.


Maybe even push that a level higher and have org-to-org vouching as well (so it can scale and reputation propagates across social bubbles). Bootstrapping remains somewhat of an issue.

One somewhat popular solution for bootstrapping is to allow people to buy in, paired with quickly banning those members in cases of rule violation. It's by no means perfect, but it puts a real price on abuse and thus reduces it a lot

I've mentioned a "market of lemons" elsewhere in this thread. One such market is the market for malware and stolen credit card details. One result of the market being broken: serious criminals restrict themselves to very small (company like) social circles and invite only forums. One signal of trust that remained very long: a very short ICQ number. You don't want to burn such a handle with a bad trade, so trust was given upfront.

How would you imagine that applying here? If fake accounts are at least as convincing as real ones, then it seems like trust networks would be quickly prone to corruption as the fake accounts gain enough of a foothold to start recommending each other.

On a network started by 2-3-10 people, the first new members would need to be vouched for by a percentage of those to get in - and so on.

If someone down the line does some BS activity, the accounts that vouched for them have their reputation on the line.

A whole tree of the person who did the BS and 1-2 layers of vouching above gets put under review, gets a big red warning label in their UI presence (e.g. under their avatar/name), and loses privileges. It could even just get immediately deleted.

And since I said "identity based", you would need to provide a real-world ID to get in, on top of others vouching for you. It can be made so you wouldn't be able to get a fake account any more easily than you can get a fake passport.
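A toy sketch of how that vouch-and-penalize bookkeeping could look; all names, thresholds, and penalty values here are invented for illustration:

    from collections import defaultdict

    class TrustNetwork:
        def __init__(self, founders):
            self.reputation = {name: 1.0 for name in founders}
            self.vouchers = defaultdict(list)   # member -> who vouched for them

        def join(self, newcomer, vouchers, quorum=2):
            """Admit a newcomer only if enough members in good standing vouch."""
            good = [v for v in vouchers if self.reputation.get(v, 0) >= 0.5]
            if len(good) < quorum:
                raise ValueError("not enough vouchers in good standing")
            self.reputation[newcomer] = 0.5     # probationary reputation
            self.vouchers[newcomer] = good

        def punish(self, offender, penalty=1.0, decay=0.5, depth=2):
            """Penalize an offender and, more weakly, those who vouched for them."""
            self.reputation[offender] = self.reputation.get(offender, 0) - penalty
            if depth > 0:
                for v in self.vouchers.get(offender, []):
                    self.punish(v, penalty * decay, decay, depth - 1)

    net = TrustNetwork(["alice", "bob", "carol"])
    net.join("dave", ["alice", "bob"])
    net.punish("dave")     # dave spams; alice and bob take a smaller hit
    print(net.reputation)  # {'alice': 0.5, 'bob': 0.5, 'carol': 1.0, 'dave': -0.5}

The decay factor is the knob: it confines the penalty to the 1-2 layers of vouching described above instead of punishing the whole ancestry.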


Are you talking about in-person verification and vouching? Or can it be digitally mediated?

If the former, it looks quite impractical unless there are widely trusted bulk verifiers. E.g., state DMVs.

If the latter, then it all looks quite prone to corruption once bots become as convincing correspondents as the median person.


>Are you talking about in-person verification and vouching? Or can it be digitally mediated?

Yes and yes.

>If the former, it looks quite impractical unless there are widely trusted bulk verifiers. E.g., state DMVs.

It's happened already in some cases, e.g.: https://en.wikipedia.org/wiki/Real-name_system

>If the latter, then it all looks quite prone to corruption once bots become as convincing correspondents as the median person

How about a requirement to personally know the other person in what hackers in the past called "meatspace"?

Just brainstorming here, but for a cohesive forum, even of tens of thousands of people, it shouldn't be that difficult to achieve.

For something Facebook / Twitter scale it would take "bulk verifiers" that are trusted, and where you need to register in person.


Maybe we need a social network based on physical exchange of trust.

That’s mostly what the person to person phone system was.

Next keyword: market of lemons. If you can't rely on said signals anymore, you must treat every item the same (untrusted), which drives out the legitimate players from the market. We have a lot of lemon markets, we can probably infer from them what the social result will be..

You can do it already. It's a normal order for a copywriter; nobody will bat an eye when you post an offer. It costs cents/dollars per 1000 words instead of a fraction of a cent, but that's not exactly outside the reach of a funded startup.

I hope it breaks the current system of requiring references in job search as well

This system is already essentially broken. Either you worked at a large business that only gives out dates of employment and job title by policy or you are in complete control of who the hiring company talks to.

The first time you don’t get a job because of a reference you gave you learn a lesson. If it ever happens again, it’s on you.


What's the alternative, really? At least where I live, a multi-year gap in your CV is going to set off more red flags than an honest "It didn't work out between us".

Don’t give them your boss’s name. Give them a coworker’s name. Give them a friend’s name and have them lie for you.

If a company is proactively contacting people you don’t give them contact information for, that’s not requiring references — which is the process I (and the comment I replied to) was talking about. If a company knows where you’ve worked, they can contact them if they want.


What’s the solution for the latter point you mentioned?

If they proactively contact someone as part of their verification?


Then you’re fucked if they check and the reference is bad and they care. Either you take your chances, leave it as a gap in your resume, or you make something up.

In the past, I've extended the time I was at either the company before/after and then left the one in the middle off. A smaller gap is easier to explain, and you just need a coworker at the one you stretched to cover for you - or have it be somebody who wasn't there during the time you added. You can also just say you did the "freelance" thing and then talk about whatever you want.

I’ve also just been 100% honest and said, “I didn’t like this job and left on bad terms. I’d rather you not contact them.”

Just have to read the situation and make your best guess as to what is going to get you the job.


I'm sure it's already happening in the "books" threads

We'll be back to the 1990s "software agents" craze, take two: needing AI-driven agents that seek out and index and evaluate content on our behalf, and that negotiate with each other for recommendations, with the currency being trust based on how "your" agent evaluated prior results.

I'm hoping to put an AI between me and my e-mail inbox this weekend (I had ChatGPT write most of the code; it's not much); not fully automated, but evaluating and summarising and categorising. I might extend that to e.g. give me an "algorithm" for my Mastodon timeline (despite all of the people insisting on reverse chronological, I'm at a few hundred people I follow and already can't keep up), and a number of other sites I visit. For most of these things latency does not matter, so e.g. putting them through llama.cpp rather than something faster is fine, and precision isn't critical (I won't trust it to automatically reply or automatically reject anything, but prioritisation and categorisation where missteps won't have any critical impact).
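A minimal sketch of that kind of triage loop, assuming the llama-cpp-python bindings and a local GGUF model (the path, labels, and prompt here are invented for illustration):

    from llama_cpp import Llama

    llm = Llama(model_path="./models/example-model.gguf")  # hypothetical path

    LABELS = ("urgent", "personal", "newsletter", "likely-spam")

    def triage(subject: str, body: str) -> str:
        prompt = (
            "Classify the following email into exactly one of these categories: "
            f"{', '.join(LABELS)}.\n"
            f"Subject: {subject}\n"
            f"Body: {body[:2000]}\n"
            "Category:"
        )
        out = llm(prompt, max_tokens=8, temperature=0.0)
        answer = out["choices"][0]["text"].strip().lower()
        # Fall back to a safe bucket so a misstep never hides mail outright.
        return answer if answer in LABELS else "personal"

    print(triage("Meeting moved to 3pm", "Can you make the new time?"))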


> We'll be able to buy HN comments by the thousand -- expertly wordsmithed, lucid AI comments

You're forgetting the millions of additional comments that will be written by humans to trick the AI into promoting their content.

Even worse, currently if you ask ChatGPT to write you some code, it will make up an API endpoint that doesn't exist and then make up a URL that doesn't exist where you can register for an API key. People are already registering these domains and parking fake sites on them to scam people. ChatGPT is creating a huge market for creating fake companies to match the fake information it's generating.

The biggest risk may not be people using AI-generated comments to promote their own repos, but rather registering new repos to match the fake ones that the AI is already promoting.


I feel like you're overstating this as a long-term issue. Sure, it's a problem now, but realistically how long before code hallucinations are patched out?

Nobody knows.

undoubtedly not long

The black box nature of the model means this isn't something you can really 'patch out'. It's a byproduct of the way the system processes data - they'll get less frequent with targeted fine tuning and improved model power, but there's no easy solve.

this is clearly untrue. it’s an input, a black box, then an output. openai have 100% control over the output. they may not be able to directly control what comes out of the black box, but a) they can tune the model, and they undoubtedly will, and b) they can control what comes after the black box. they can—for example—simply block urls

They don’t have control over the output. They created something that creates something else. They can only tweak what they created, not whatever was created by what they created.

E.g., if I create a great paintbrush which creates amazing spatter designs on the wall when it is used just so, then, beyond a point, I have no way to control the spatter designs - I can only influence the designs to some extent.


did you read what I said?

This is true, but detecting and omitting code hallucinations is (functionally) as hard as just not hallucinating in the first place.

Assuming those hallucinations are a thing to be patched out and not the core part of a system that works by essentially sampling a probability distribution for the most likely following word.

evidently, they can hard-code exceptions into it. this idea that it's entirely a black box that they have no control over is really strange and incorrect and feels to me like little more than contrarianism to my comment

Folks, doesn't it seem a little harsh to pile downvotes onto this comment? It's an interesting objection stimulating meaningful conversation for us all to learn from.

If you disagree or have proof of the opposite, just say so and don't vote up. There's no reason to get so emotional we also try to hide it from the community by spamming it down into oblivion.


to be fair, it’s only one net downvote

An aside: what do people mean when they say “hallucinations” generally? Is it something more refined than just “wrong”?

As far as I can tell most people just use it as a shorthand for “wow that was weird” but there’s no difference as far as the model is concerned?


Most people don’t understand the technology and maths at play in these systems. That’s normal, as is using familiar words that make that feel less awful. If you have a genuine interest in understanding how and why errant generated content emerges, it will take some study. There isn’t (in my opinion) a quick helpful answer.

I genuinely want to understand whether there’s a meaningful difference between non-hallucinatory and hallucinatory content generation other than “real world correctness”.

I'm far from an expert, but as I understand it the reference point isn't so much the "real world" as it is the training data: the model generates a strongly weighted association that isn't in the data and perhaps shouldn't exist at all. I'd prefer a word like "superstition"; it seems more relatable.

Wrong is saying 2+2 is five.

Wrong is saying that the sun rises in the west.

By hallucinating they’re trying to imply that it didn’t just get something wrong but instead dreamed up an alternate world where what you want existed, and then described that.

Or another way to look at it, it gave an answer that looks right enough that you can’t immediately tell it is wrong.


this isn't a good explanation. these LLMs are essentially statistical models. when they "hallucinate", they're not "imagining" or "dreaming", they're simply producing a string of results that your prompt combined with its training corpus implies to be likely

> ChatGPT is creating a huge market for creating fake companies to match the fake information it's generating.

Does ChatGPT consistently generate the same fake data though?


I have noticed that ChatGPT will give me a consistent output when the input is identical, but I haven’t done extensive research on this.

There was one company that had to put up a “our API can’t get location data from a phone number so stop asking, GPT lied” page.

I'm constantly curious whether anyone working in the AI space is cognizant of the Tower of Babel myth.

I don't think an arms race for convincing looking bullshit is going to turn out well for our species.


That's what Product Hunt has felt like for a long time—and LinkedIn too.

This is the first time I've ever posted an XKCD link here, but I think the occasion calls for it.

https://xkcd.com/810/


Now is the time to cultivate friendships and to make networks that persist online and are verified via irl meetups / contacts. People who pull that off now will be in much, much better shape in the future. GPT's output is apparent to a discerning eye right now, but according to the power law, it won't take much "novel" input to train upon to make that discernment useless. Then, the only internet community that could be dependably reliable would be your group of irl verified people.

I would phrase it more as we're pretty much out of time to have initiated online-only relationships.

Agreed. It's very difficult now to build communities that have lasting impact, because everyone's saturated with info as-is. Contributions to niche communities now rely on a societal "outsider" status, which means there's basically a couple of people that contribute heavily and very few onlookers. Everything else is either gamified or comes from video games / gambling.

On the bright side, it's THE time to cultivate close friendships and to seek like-minded people. The entire phenomenon of popular attention hugging a community to death does not exist any longer. You can now have OG members persisting with notions for a long time and building a shared mythos with a small group of friends, because information is now more accessible than ever.

Obviously, most people aren't part of these communities. The people that are "drifting" alone are given to wasting their time on charismatic attention-seekers that talk a big game (twitch/e-celebs) but deliver nothing of value. So there's also room in the market for charismatic folk with some technical expertise to rally people to their cause, but only very briefly. This is because the number of people half-committing and then jumping ship is likely the highest it's ever been. Also, platforms have now resorted to paying people to stay on their platform (youtube / tiktok / sponsorships / twitch boosting streamers / etc.) to combat occasional ennui, ironically exacerbating the issue.


Best methods for that? Local meetups?

Most tight, close-knit groups originate from shared mythos. These can be family, proximity, "same school year", "same college", "friend of best friend", etc. Online, you can find people that are interested in some niche topic (or an elaboration of some popular topic to an absurd degree) and engage with them. Small newsletters are also a good way to get people talking. What most people don't do is return attention, aka reciprocate positively. This could also mean you'd have to write about unrelated things or maybe try to build a "business relationship" that would then progress if you invest some time and hope for the best.

It's a really bad time to try and get the attention of someone more famous / notable than you, though. Sure, you can go on $platform and talk to them, but it's really not the same when they have a gorillion other messages. Same goes for people in large communities that are a "guy" there, known for something. Extremely high-return investments but you're likely going to fail.

Some people try to start youtube channels / info streams and then entice people to join their forum / server. While this does seem to work, it only brings in quality people AFTER the community is fully formed and rigorous laws are in place. The initial stragglers are usually the recently excommunicated looking to try their hand at the same shit somewhere else.

If you really put some effort into a topic and blog about it, you're likely to get some high-quality responses even if you only pose a question to someone that's partly interested. I've found this to be a really great way to separate the folks that are actually interested from those that aren't. You'll usually get people around your own level this way and IME this is the best approach.

It takes a lot of effort to make people clock in regularly to your online circle, and it's better to establish digital / irl face-to-face contact after a good interaction. It builds trust and because we're wired to judge people from their facial reactions rather than text, it also sobers conversation / tempers over potentially divisive topics. Works well with cerebral / "deep" people. Doesn't work with people that only come online to blow steam / enact a persona, so it's a good filter.

TL;DR: Touch grass (digitally), make friends (digitally)


How do you know we aren't already there?

Stop making up laws. You'll do much more good dismantling existing ones. And non-social signals like # of commits, # of pull requests cannot be faked? We need signals among the noise.

Sometimes signals are noise we just need to calibrate.


I mean, there have always been shills. What's changing now is the cost of shilling is dropping from dollars per comment to fractions of a cent. Troll farms used to be a lot of work to put together, but soon they'll be aaS.

Those of us who are careful internet readers have spent years developing good heuristics to use textual clues to tell us about the person behind the text. Are they smart? Are they sincere? Are they honest? Are they commenting in good faith? Those skills will soon be obsolete.

The folks at OpenAI, who are nominally on a mission to make sure AI "benefits all of humanity", have condemned us to a life sentence of fending off high-volume, high-quality bullshit. Bullshit that they are actively working to make harder to detect. And I think the first victims of that will be internet forums where text is the main signal, places like this and Reddit.


Maybe more appropriately, Campbell's law:

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."


TIL: you can buy (fake) GitHub stars.

That was a bit shocking to me to learn.


/s ??

I always expected there was a market for fake stars. I am trying to get a repo naturally to 1000 stars, but I would never buy them.


Can you explain why it is "natural" to try to get your repo to have many stars in a world where stars can be bought?

most people don't know that stars are bought

Natural: promote the repo, people see it and like it. I don’t beg for them.

Unnatural: pay some bot runner to buy stars.

I prefer natural as the stars are a metric not an end goal.


Right, I was not asking what you mean by natural but rather why it is of any value.

I see. I just see it as an indicator of reach. Also some people will snap-judge a project by number of stars and are more likely to use it if it has a bunch.

> I just see it as an indicator of reach.

That just shifts the question to why is "reach" something worth wanting?

> some people will snap-judge a project by number of stars and are more likely to use it if it has a bunch.

And why do you want these users?


1. to get more feedback

2. they may just be busy users, looking for something for their job.

I take on board your point though. The stars thing isn't the biggest consideration by a long shot. Probably the smallest!


That's my issue with stars already. One repo having more stars than another doesn't mean it's better in any way. It might just mean it's been promoted more.

That's how record labels can simply decide what's going to be the next summer hit. They pick a song and promote the hell out of it. It's not the summer hit because it was somehow better, just more promoted.


You can buy Twitter followers, Instagram followers, YouTube views, Amazon reviews, Reddit upvotes, Reddit comments, and Yelp reviews - so what's so shocking about GitHub stars?

After that post on HN months ago[1] where users discovered OAuth permissions for unrelated things being used/abused to star projects without their knowledge this news of buying stars didn't come as a surprise.

It's unfortunate as I've seen stars used as a metric of trustworthiness in general user discussions.

[1] https://news.ycombinator.com/item?id=33917962


GitHub is fully aware of these; would they consider something like a "confirmed" star count that subtracts the suspicious/fake number? Or is that too much of a slippery slope?

GitHub gradually removes these users as they catch up to them, so it's not helpful to have extra steps. I have a couple of repos which were briefly popular, so when a new user stars one today and I see 1000s of other stars, it's suspicious and I get a peek into their world.

There are obvious numeric usernames, but also fake orgs with repos for the users to fork and interact with, and a few account takeovers (i.e. someone had signed up for GitHub in 2015 to make a free wedding website, abandoned it, and the account fell into spammer hands). These used to be easier to report.


>GitHub gradually removes these users as they catch up to them

With collateral damage too, I presume [1]. I guess I've been the victim of some automated system. They have banned my account without warning or explanation, and they've been ignoring my support tickets for about 2 months!

[1]: https://news.ycombinator.com/item?id=34817163


> They have banned my account without warning or explanation and they've been ignoring my support tickets for about 2 months!

Which is especially ridiculous if this was due to a false positive spam detection as real spammers will not bother with chasing support when new accounts can be created easily.


[dead]

How did you find out the name of the company behind GitHub24 though? If I go to their website I cannot see it, and I can't find anything when I search for the company name.

Perhaps they got it via payment info

I was also surprised when I saw it. A GbR is a German "Gesellschaft bürgerlichen Rechts" which does not need to be formally incorporated and offers no limited liability. The name needs to include the names of all partners, so we can deduce it is being run by two persons. I am quite surprised they do this without liability protection. Upon googling, I found only a playlist on YouTube which has this name and contains one explainer video about signing up a company with German tax authorities. If they are indeed based in Germany, they're required to have an Impressum / imprint on their home-page, without it, they risk being fined.

Just use Show HN & Reddit.

Those never worked for me.

Show HN: there are maybe dozens of those posted every day but they rarely hit the main page.

A Reddit ad is great to kick off the star growth, but unless you have something interesting to many people, don't expect more than 50 stars on the first day, followed by a plateau of a star every few days.

Most GH stars I've got came from somebody mentioning my project in a comment in some heated discussion on HN. So I guess drama sells?


Is it just me, or does the fact that Dagster has one of their competitors, Mage.ai, listed here as a repo with around 15% fake stars seem like an odd coincidence?

It shouldn't be a surprise. Why are you surprised? Do you often pursue random activities irrelevant to your life for dozens of hours?

Yes I do.

Pretty standard for anyone ND

If you're going to accuse a competitor of fraud, writing a blog post showing your work seems like the safest way to do it. People lie with statistics all the time, of course.

They don’t mention what I think is their biggest competitor: Prefect.

[Blogpost author here] We ran the numbers for Prefect and several other repos in our space and they came out clean. As we note in the article, while some repos game the system, from what we can tell the number of abusers is actually fairly small.

Or they used a more sophisticated star provider?

I mean, they explain it at the top:

> we track our own GitHub star count along with that of other projects. So when we spotted some new open-source projects suddenly racking up hundreds of stars a week, we were impressed. In some cases, it looked a bit too good to be true, and the patterns seemed off

If their competitor has fake-looking star counts, I'd expect them to be the ones best equipped and most likely to suspect it.


It’s possible that was the impetus of the blog post. Maybe they suspect Mage.ai of astroturfing GitHub stars and investigate it as above. They then publish a blog post that:

1. Indicates the astroturfing without actually specifically calling them out
2. Does so in a way where others can verify their work and use it on other repos
3. Uses their product to do so

Seems pretty brilliant to me.


> Yet [GitHub stars] influence serious, high stakes decisions, including which projects get used by enterprises, which startups get funded, and which companies talented professionals join.

Really? I honestly just don't believe this... if I were to believe this, I think I'd have to conclude the world is just too broken to bother rescuing.


More than once I've seen the number of stars used as an argument to decide whether to pull in a dependency or write our own.

People use flawed but easily consumable metrics to make almost all decisions.

It takes a lot more effort to collect multiple metrics along different axes, understand the skew/bias of them and make an informed decision.

Visibility and ease of consumption are the most important aspects of a metric if you want people to use it.


The list in the article, though, was carefully selected to presume competent people doing the decision-making. I totally believe many people use that star count for something... but an "enterprise"? someone investing a non-trivial amount of money? a specifically-"talented professional"? I just find that really difficult to believe. I've sold software to enterprise, I've worked with a number of venture capital funds, and I know a ton of actually-talented professionals... I dare say most of them consider GitHub's social features to be a joke.

The enterprises I deal with cared almost exclusively about stuff like license choices, support contract options, and "invoice billing" ;P. The vetting process I've dealt with at VCs was intense, having worked both sides of that situation; and I know multiple people who have worked data science jobs at such firms to try to better select investments. As for a "talented professional", I can pretty much guarantee they are going to look at your codebase, not the number of stars it has, while they evaluate any number of more reasonable things to judge an opportunity on (commute, pay, management style, etc.). A key property of competent deciders is that they aren't using trivial metrics.


One of my stock interview questions asks people how they evaluate 3rd-party dependencies for use in a production environment. So many interviewees respond with GitHub stars as their main or only criterion. It depresses me every time.

It depresses me too, but what else can you do? I check what the docs look like, but if I'm to depend on a thing I'd rather choose something popular than unpopular. GitHub stars, Hackage downloads, StackShare... what else can one check?

That's a very interesting question. There are so many things you can look at. How is the documentation? Who are the primary maintainers? How are they funded? What are their motivations? Are the primary maintainers active on Stack Overflow, Reddit, Discord, etc...? How many contributors are there? How does their Github issues page look? What about the Github discussion page? How many maintainers are there total? How many downloads per week on NPM (for JS libraries)? From all of these things - how long do you expect this library to be maintained? And that's just the initial qualification research, nothing about how it will impact the actual code-base.

What did I miss? What's the best answer you've ever heard? How do you evaluate 3rd party dependencies?
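Several of those signals are public; here's a rough sketch of pulling a few via the GitHub REST API (unauthenticated calls are heavily rate-limited, and some-org/some-repo is a placeholder):

    # Fetch a handful of repo-health signals from the GitHub API.
    import requests

    def repo_health(owner: str, repo: str) -> dict:
        base = f"https://api.github.com/repos/{owner}/{repo}"
        info = requests.get(base, timeout=10).json()
        contributors = requests.get(
            f"{base}/contributors", params={"per_page": 100}, timeout=10
        ).json()
        return {
            "last_push": info.get("pushed_at"),
            "open_issues": info.get("open_issues_count"),
            "license": (info.get("license") or {}).get("spdx_id"),
            "archived": info.get("archived"),
            "contributors_sampled": len(contributors),
        }

    print(repo_health("some-org", "some-repo"))  # placeholder repo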


You overlooked what I consider to be the first thing you should check — when was the repository last committed to. There are countless projects that rank high on every other metric, but are essentially abandonware.

Yeah good point... definitely something I would have checked, forgot to put it in the list. I'm baffled people have trouble coming up with more than "number of stars" for this.

Of course there can be libraries that are more or less "finished", so the last commit/frequency of commits isn't on its own a deciding factor, but in proper context/holistically it is definitely an important metric!


FWIW, I am not baffled by that, as the vast majority of programmers are not "talented professionals" (which is the specific category of potential employee I was talking about, along with enterprises and venture capital firms). So like, you ask your question, they say "star count", and you don't have to really continue the interview.

(When I was in high school, I used to work for a pre-Internet company that helped people pre-filter interview candidates for ads posted in classified sections of newspapers and what they did was have questions like this that could be asked by people well before they reached your calendar for an interview.)


However, some language ecosystems are more OK with "finished" software than others. It hasn't had a commit in 4 years because none were necessary. Needing constant updates is a sign the local ecosystem is driven by churn over quality.

I don't really think this generalization holds. TeX is one of the very few widely used pieces of software that's considered complete, more or less everything else is either getting updated or superseded by other things.

An NFA library, for example, probably doesn't need to be constantly updated.

If you avoid building on something that's constantly shifting (the web) then the need to update goes down significantly.


Clojure, Elixir, and Lisp (especially Clojure) all have slower acceptable churn rates than other language ecosystems. If it works sensibly (both in terms of being fully debugged and ergonomics) and the underlying system hasn't had significant changes, what good does a commit within the past six months do beyond signaling to the GitHub meta game?

Insights -> contributors, and number of active maintainers based on entire commit history of the project and frequency of commits. Also, network page which shows number of active forks. Also, PRs, and how are they handled.

Contributors is the most informative page for me. So many projects are a one-man show basically all the time. I don't mind that, it means passion, but it also means the project can disappear at any moment depending on circumstances.

I also look into issue details to see how maintainers communicate with community members that do due diligence before asking for help.


You missed: look at the actual code.

Stars only mean something because of the people who do. They're the ones leading the herd. If you're just going off the social signals, then you're just monitoring where the herd is going.


Yep, this one is the headline item for me. Look at the code and, if it has further dependencies of its own, look at the code for those too.

The main question I'm asking myself while looking at the code is: if I had to fork this thing and maintain it myself, how would I feel about it? Because sometimes that happens.


> How do you evaluate 3rd party dependencies?

I actually blogged my answer to that exact question recently (shameless plug):

https://philbooth.me/blog/how-to-evaluate-dependencies


I'd add support to that list. When it breaks, can I cut a contract and get an expert available to diagnose the problem within a few hours. Production outages are not the time for self help and digging around in other peoples code bases.

What kind of answer would make you happy?

I prefer to look at the recent commits, or any recent activity on the repo's issues, but I would like to know what else can be used as an indicator.


So, ask yourself for a moment: what is it you are actually caring about?

I'd like the project to not introduce security vulnerabilities or bugs into my code. I thereby care what language it was written in, what libraries they use, what their testing and QA/CI process is, and whether it is being used by any "critical" projects (like, if that library is embedded in Chrome, you have to bet there are tons of people like me every day trying to hack it).

As part of that, I care about whether the project takes a cavalier attitude towards contributions: if I see a number of pull requests from random "contributors" being casually accepted, that is going to be a major major red flag; if possible, I want to see a core team doing most of the development and integration (and not merely most of the "review", as I see in some projects where the people in charge feel above doing work).

I definitely care that the project is being maintained and that there are people paying attention to issues, and it needs to have a culture of taking bug reports seriously... nothing is more dangerous than a project that tries to pretend they are responsive using bots to "automatically close" issues: I'd rather see bugs open for years than worry a critical issue was reported and subsequently lost.

I am certainly curious how work on the project is funded and whether I can trust that its license is going to hold constant over time: I don't want to end up relying on a dependency that is really the pet project of a small startup that is either going to disappear next year or will decide to redirect development to a closed-source fork. I'd thereby also prefer the project be run by a core committee of participants from multiple companies.

I honestly can't imagine caring two shits about how many stars a project had on GitHub... hell: what if the project isn't even on GitHub? What then? Do you just give up and decide it sucks? A world where everyone feels any incentive at all to put their code on a centralized platform is one where we have all failed as stewards of the future of software :(.


Activity on other sites related to finance/coding is similar (seekingalpha likes, for example) and I've gotten organic inbound requests for work periodically scraping such info into... Excel.

I think the most interesting thing would be to run this test against the list of Launch HNs, sorted by votes, grouped by class.

I have a half-written article about this, but I didn't have any good notion about quantifying the problem so this article is very welcome info to me.

My own angle is that copilot has shifted the incentives around this practice, maybe substantially. Businesses want to get (free tiers of) their paid SaaS endpoints into copilot suggestions - it's a great funnel!

I'd guess that github is as likely as not to become an SEO spam battlefield (like the rest of the web).


> Businesses want to get (free tiers of) their paid SaaS endpoints into copilot suggestions - it's a great funnel!

That’s so brilliantly evil…

I can see the next generation of “how I got to $3m in passive income” articles being written (by ChatGPT) right now.


I wrote a tiny tool which calculates the "brightness" score of a GitHub repo based on the total star count of the people who starred your repo. It will automatically detect these kinds of scams (assuming that it's mostly low-star bots giving the stars).

https://github.com/Hellisotherpeople/Bright

Edit: I love clustering, I really do, but I think that techniques like the one I am using are far superior to unsupervised learning for trying to detect fake accounts in this context.
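
The core idea fits in a few lines. Here is a rough sketch against the GitHub REST API (my own illustration, not Bright's actual code; the function name, the two-page cap, and reading "total star count" as stars earned by each stargazer's own repos are all assumptions):

    import requests

    API = "https://api.github.com"

    def brightness(owner, repo, token, pages=2):
        # Rough sketch of the "brightness" idea, not Bright's actual code:
        # sum the stars earned by each stargazer's own repositories.
        headers = {"Authorization": f"token {token}"}
        total = 0
        for page in range(1, pages + 1):
            stargazers = requests.get(
                f"{API}/repos/{owner}/{repo}/stargazers",
                headers=headers,
                params={"per_page": 100, "page": page},
            ).json()
            if not stargazers:
                break
            for user in stargazers:
                # Throwaway bot accounts typically own zero-star repos,
                # so they contribute almost nothing to the total.
                repos = requests.get(
                    f"{API}/users/{user['login']}/repos",
                    headers=headers,
                    params={"per_page": 100},
                ).json()
                total += sum(r["stargazers_count"] for r in repos)
        return total

    # Usage (hypothetical repo):
    # print(brightness("someorg", "somerepo", token="ghp_..."))

A popular-looking repo whose stargazers contribute next to nothing to this total is exactly the pattern such a tool is meant to flag.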


It is worth noting that it is trivial to buy fake stars for a project you are not affiliated with. The reason someone might do this would be to "test" the purchasing of fake stars without risking contaminating their own project.

I once bought a friend of mine a thousand Twitter followers as a prank. He wasn't happy.

Rightly so; that wasn't a well-thought-through prank.

Am I missing something, that seems like a decent prank. It’s harmless.

No, his friend's account is flagged as a spammer now and gets less visibility.

Where did you purchase that?

I wanna say I got it through Fiverr? This was like 8-10 years ago. I don't remember exactly.

The projects with suspicious stars were still >80% non-fake stars. That suggests to me that most of the fake stars have been classified as non-fake. There isn't much psychological value in boosting your star count by just 25% (20 fake stars per 80 real ones is at most a 25% boost).

Depends on when the fake stars were created. If they come early in a project's life cycle, they may be used to get attention on the project, and once it has awareness, fake stars are no longer necessary.

> And if you enjoy this article, head on over to the Dagster repo and give us a real GitHub star!

Kind of ironic that they’re using blog articles and social media to pander for more stars on their GitHub project.


I wouldn’t describe that as ironic.

While evaluating an OSS project, a key indicator is community activity. GitHub stars are a weak community activity indicator. Firstly, as shown in the article, they can be gamed. Also, starring is a very low-threshold action, so it does not indicate whether the person who starred the project will actually use it.

I think two great community activity indicators are GitHub issues and slack/discord/discourse comments. One key thing with GitHub issues, in my opinion, is that if the issues are mostly filed by the core team, it is not a great sign. You want a large mix of issues from customers or users, not from the team; that is a good indicator of whether the project is solving a real problem. The same goes for slack comments: they should have both volume and freshness.
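
This split is straightforward to estimate, since the GitHub API attaches an author_association field to every issue. A rough sketch (the function name and the 100-issue sample are my own choices, not anything GitHub prescribes):

    import requests

    def outside_issue_share(owner, repo, token):
        # Estimate what share of a repo's recent issues were opened by
        # people outside the core team, via author_association.
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/issues",
            headers={"Authorization": f"token {token}"},
            params={"state": "all", "per_page": 100},
        )
        resp.raise_for_status()
        # The issues endpoint also returns pull requests; drop those.
        issues = [i for i in resp.json() if "pull_request" not in i]
        if not issues:
            return 0.0
        core = {"OWNER", "MEMBER", "COLLABORATOR"}
        outside = sum(1 for i in issues if i["author_association"] not in core)
        return outside / len(issues)

A high share of outside authors suggests real users hitting real problems; a tracker that is mostly the core team talking to itself is the weaker signal described above.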


I think checking whether people donate to a project is a better indicator of its value than the stars. I never paid attention to stars.

But there are OSS projects that are VC backed. They don't take donations.

That's already a very different deal then: no need to gauge repository health, since you already know there's a good chance of work suddenly ceasing.

You have a point. I have often seen OSS projects being funded on the basis of github stars with no revenue whereas all the parameters show that the project health is not that great.

Donating to yourself would be pretty cheap...

Maybe not as cheap as you may think. I think GitHub takes a small cut, plus you may need to declare the donation as income on your taxes.

Also, if you get "smart" and donate from multiple cards, I would think it is a trivial task for GitHub to determine it is a scam: the credit card billing address would match the address where you receive the funds.

Probably way too much work for this :)


They don't take a fee from what I read about it.

> https://docs.github.com/en/sponsors/sponsoring-open-source-c...

> GitHub Sponsors does not charge any fees for sponsorships from personal accounts, so 100% of these sponsorships go to the sponsored developer or organization. The 10% fee for sponsorships from organizations is waived during the beta. For more information, see "About billing for GitHub Sponsors."


GitHub Sponsors has been out of beta for a long time; they take 10% of donations if the code is under an organization, which is very common for OSS projects. Of course, one of the ways to get around it is to sponsor the lead developer, which is sometimes available as an option. Or just sponsor the developer some other way which doesn't go through Microsoft, such as Liberapay or Opencollective.

I don't think you can externally measure how much money is being donated to an OSS project, can you?

Pretty sure those who game their repo are motivated by investment into an associated startup. I think you are right that community activity is a high-fidelity indicator, and a smart investor in OSS startups should definitely not only lurk in the community but, if possible, actually have the resources to kick the project's tires as well.

In a very strange way (but one reflective of the economic regime), a startup that fakes stars, versus a straight-arrow startup that doesn't, is demonstrating a key element for success in business, which seems to require a significant element of bullshitting and outright deceiving. The mantra has been that "grow grow grow" is the only guideline for success. Inflating your stars is just rookie-hour practice for bigger, better growth b.s. down the line.


My ex-employer used GitHub stars in their job descriptions and during recruitment pitches. They regularly encouraged employees to go and star the firm's repos on GitHub. In all-hands meetings, GitHub stars were one of the items they reported: "we've surpassed X in GitHub stars" (applause).

(The firm X, however, is a more well-known name than my ex-employer was).

A while ago, I listened to a Freakonomics episode where it was discussed that businesses use proxies to both boost their image and to cover up their incompetency. The example was that a lot of businesses chose fancy names starting with A (like, AAA plumbers), so that they get listed first in business directories. These firms were later proven to be very incompetent and/or even fraudulent.

The relevant paper, also cited in the episode, was "A Business by Any Other Name": https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1667550.


They were incompetent because they didn’t have enough As. I exclusively use AAAAAAAAAAAA Plumbers

Apparently seven is the sweet spot for visual recognition at a glance, so I'd go with AAAAAAA Plumbers instead.

Aaaaaah real plumbers

Podcast episode name please

Not sure if this is it, but episode 552, "Is Google Getting Worse?", has the 'AAA Plumbers' bit in it.

https://freakonomics.com/podcast/is-google-getting-worse/


Small correction, the episode is 522

The tech version of this is SaaS companies advertising on Reddit.

do you mind elaborating on this? I am using Reddit to advertise some of my projects because it seems like a relevant crowd to advertise to, but I am curious to hear how it would be perceived.

Sure. I don't think I can generalize my PoV but I see with distrust anything that is advertised because there is so much noise.

only vaguely related, but I've recently been trying out dagster and I'm pretty impressed so far. I've run large-scale data processing from Hadoop onwards and was expecting the usual crumminess whenever you hit an edge case.

Instead I found a system that seems to be thoughtfully designed and, crucially, easy to debug.


Great post, though I was low-key hoping for a top 10 or maybe top 100 ranking of most starred juiced-up repos.

I didn't know people used stars to make decisions. For me they are more like HN karma points. I use a project's issue/PR history to get an idea of how good or bad it is.

I'm surprised that Github stars are valuable enough to buy. Personally I never look at the star count because even if they were legit, they don't really tell me anything more useful than I get from looking at other things in the repo.

I tend to check the age difference between the earliest and latest commits because that lets me be sure it's not a project that someone spent a couple weeks coding up, dropped on github, and then forgot about. I'll also check the issues on there. I'm looking for more closed issues than open ones, but I'll also quickly scan over them to get a rough idea of how many are truly meaningful issues. I also get signals from the readme and docs. It's not a hard pass if there are issues with those, but it certainly helps my opinion if they exist and are both clear and detailed.
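
Both of those checks are easy to script. A sketch under the assumption that created_at/pushed_at is close enough to the first/latest commit (imported histories can predate repo creation, so treat it as an approximation; the helper names are mine):

    import requests

    API = "https://api.github.com"

    def repo_signals(owner, repo, token):
        # Pull a repo's lifespan and closed/open issue counts
        # via the GitHub REST API.
        headers = {"Authorization": f"token {token}"}
        meta = requests.get(f"{API}/repos/{owner}/{repo}", headers=headers).json()

        def issue_count(state):
            # The search API reports a total_count without paging everything.
            return requests.get(
                f"{API}/search/issues",
                headers=headers,
                params={"q": f"repo:{owner}/{repo} type:issue state:{state}"},
            ).json()["total_count"]

        return {
            # created_at -> pushed_at approximates first-to-latest commit age.
            "created_at": meta["created_at"],
            "pushed_at": meta["pushed_at"],
            "open_issues": issue_count("open"),
            "closed_issues": issue_count("closed"),
        }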


I mean based on the number of repos they identified buying stars and prices advertised, the revenue just doesn’t make sense. The sellers have made like, hundreds of dollars at most. How much effort have they invested for this meager return?

I find stars helpful when I'm evaluating several different repos to choose a particular tool for a job.

If one of the repos has many more stars, I weigh that strongly when choosing. Freshness of commits is definitely important, but for me the fact that many other people starred the repo shows that there are eyeballs and activity.


I'll admit I've used them. In particular, I've used paperswithcode to find implementations of ML models. There are often a number of implementations of the same model, and the quality is highly variable. I've used stars (which paperswithcode displays) as a pre-screen. Spoiler alert: the highest-starred implementations are not always the best. But it still helps to triage, as a proxy for how well used an implementation is.

You are likely not important enough to scam. The first people I can imagine this being shown to are VCs in pitch decks who are only going to see this on a powerpoint and not actually on github. Very unlikely the VC will check github itself to verify the number, and if they do, even less likely they'll verify that the stars are real.

You're the kind that checks everything. Even if you had something valuable, a scammer wouldn't waste their time with you when there are easier fish to bait.


Interesting, I just use them to keep track of interesting projects (edit: not the number of stars as a proxy; stars are basically my bookmarks). People treat them as internet points?

> dropped on github, and then forgot about.

I really wish GitHub would have some sort of flag for "stale" projects. I use your methods too (issues, dates, etc.), and I'm usually disappointed when search results bring up ghost projects. However, in a few instances, I found a project that was similar to an issue I was working on that went one step beyond where I was, and even though it was a ghost project, it helped. But in general, these projects don't help. I'm also disappointed that I'm thinking, "Hmmm, maybe LLMs can help..."


Why is stale a bad thing? It could be something that was created to serve a purpose, developed to the point that it was feature complete for that purpose, and now requires no more development yet continues to do its purpose without modifications.

It's almost like you are thinking of it as an expiration date and the software has spoiled.


"Stale" and "done" are different states. Stale is when bugs are known but not fixed, dependencies old and unsupported, build instructions do not work any more on modern versions of OSes and other environments.

i think you're leaving out the state of "good enough"

All software is subject to shifting environments over time that will eventually render it obsolete. How fast this happens really depends on the ecosystem—it's a function of the abstraction level and context in which it runs. C or Go code that compiles to a standalone binary will be less susceptible to this, higher level Ruby or Node code that depends on a lot of peer libraries moving in lockstep will be more susceptible. Newer languages that have some notion of backwards compatibility baked into their charter like Elixir or Rust are somewhere in between.

well, the original dev did release the code as open source. you are free to take their lead and continue on with modifications in your own source or even as a fork if you feel so strongly about it needing to be maintained to that level.

Yes, I certainly could. This comment chain started with "why is stale a bad thing". It's bad because I have to do that.

There might be a maintained fork/separate project that does what I want that I would like to find instead. Or maybe I was just searching to save myself 30 minutes on a one time task and I'm not up for adopting an abandoned project.


Stale is bad. Asymptotically approaching stale is great.

Because many languages have breaking changes in the interpreter. For example, with old Python projects you often have to change so much that in many cases it is easier to rewrite.

Rust and other compiled languages that have backward and forward compatibility in mind do much better.


But in that case it should have a note saying it's finished or in maintenance mode (e.g. https://github.com/sirupsen/logrus); include references to replacements, offer paid support if you really need it or still use it, keep an eye on issues, and update dependencies.

Else, ask for a new maintainer. While code can be considered done (especially if no new features are added), it should never go unmaintained. If it's actually used a lot of course.


I have one project on GitHub that I use all the time as part of a script, and I only push changes when the python API breaks it. It is essentially "finished" and usually just needs a quick compile against the new python version whenever I upgrade the distro. I haven't even had to touch it for at least as long as GitHub has required ssh keys, so by all accounts this would be an abandoned project.

Now that I think about it — it is a python wrapper around a boost library and neither of those have made backwards incompatible changes in a long time which is quite suspicious.


Boost libs circa Ubuntu (14 or 16.04) had a JSON parser that allowed comments, while the newer Boost in Ubuntu 20.04 (and I think already in 18.04) had "updated" it so that it no longer allowed comments.

Just a small anecdote of Boost changing behavior that broke some of my stuff.


I kind of expect that I'll have to do some work at upgrade time, but it's been a while. Usually python is the culprit; I can only remember boost breaking something once, and that was a different project. The maintainer was quite nice in trying to help me figure it out, but I don't think I ever got it working the same again.

Displaying stars to represent traction in open source was a pitch-deck phenomenon that was highly effective during the ZIRP era.

Metrics based on issues / commit activity are certainly higher fidelity.

As you indicate, though, they require more effort to adjudicate. Are issues from core team members? Are commits meaningful? Is community activity meaningful? I wish GitHub would allow us to parse things like this more easily.

My use of star count is generally a binary indicator. 1k+ is probably a legit project and below is probably still early. Beyond that, it's probably too noisy.


Closed issues don't mean anything though... a lot of maintainers bulk-close hundreds of issues as "nofix", "no activity after 3 months", and so on. Just sweeping them under the rug. And many of them pride themselves on the 0 open issues like it means something. Any software in the world can have 0 issues if they played this game.

So unless you are really well versed in the project and spent some time following it, stars actually might be a better indicator of the project quality and reputation.


> a lot of maintainers bulk close hundered of issues as "nofix", "no activity after 3 months", and so on

God, I hate this. Every time I have an issue with something, look it up on the issue tracker and find the exact issue I'm having autoclosed as "stale" by a fucking bot because the author didn't reply "this is still an issue" once every 24 hours, it instantly makes my blood boil and I avoid using the software in question as much as possible in the future. Nothing screams "I care more about github numbers than my users or the quality of my software" more than this.


Are you paying the maintainer to use their software? If not then you don't really have right to make such demands on them.

I don't think GP said anything about making demands. They said they avoid using that piece of software and that is not a demand on the software's author.

If you read my comment carefully you'll notice that I at no point demanded that the developers actually fix the issue.

The problem here is simply closing issues that are not fixed because they're "stale"; there's no reason to do this unless you're obsessed with keeping the number of open issues low to deceive people into believing no issues exist. Keeping issues open does not take any effort.


Go back and read your comment carefully, it's literally a rant about the maintainer.

Yes, it is. What point are you trying to make here? Being a maintainer of open source software does not elevate you above criticism.

I can be upset with people lying to me even if I don't pay them and there is nothing wrong with avoiding projects engaging in such behavior and warning others about them.

>I tend to check the age difference between the earliest and latest commits because that lets me be sure it's not a project that someone spent a couple weeks coding up

I doubt anyone would do this, but commit date can be arbitrarily changed.
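
For anyone who hasn't seen this: git records whatever dates the environment hands it, so a minimal demonstration looks like this (run it in a scratch repo; the date and message are arbitrary):

    # git trusts GIT_AUTHOR_DATE / GIT_COMMITTER_DATE as-is, so commit
    # timestamps prove nothing. Run this inside a throwaway repository.
    import os
    import subprocess

    fake_date = "2015-01-01T12:00:00"
    env = dict(
        os.environ,
        GIT_AUTHOR_DATE=fake_date,     # the date shown by `git log`
        GIT_COMMITTER_DATE=fake_date,  # the other timestamp git stores
    )
    subprocess.run(
        ["git", "commit", "--allow-empty", "-m", "totally written in 2015"],
        env=env,
        check=True,
    )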


I have moved all my repositories to sourcehut. They are generally mirrored by a github repository consisting of a single README file explaining the new location for the project, and my reasons for the migration.

However, given that sourcehut eschews the use of such "social metrics" (at some level I agree with the principle behind that; on the other hand, I do appreciate the value of being able to give visibility to good projects), I usually mention in my README: "If you like the project and wish to promote it, feel free to star this github page".

I'm sure github probably wouldn't like this use-case, but the stars would certainly be genuine, even if possibly quite dodgy-looking.


I have moved repositories off github, replaced the README with a warning and the new location and archived the project.

It's still getting starred...


> It's still getting starred...

clearly you did too good a job on the README


i wonder how many PRs this README receives to fix typos

I’m conflicted about this. Sourcehut, Codeberg, etc are great. But having everything I’m looking for on GitHub is extremely convenient. I use the “Add to List” function extensively for bookmarking.

Yes, this is why I didn't want to migrate without leaving a trace on github. The redirecting README on github is a good compromise, I think.

Having said that, it may be worth thinking about the price we may be paying as a community for this convenience. MS Github is clearly already past the "embrace" phase, and well into the "extend" phase.


This is true. I’m hoping https://forgefed.org/ will be a useful way out of this conundrum.

Rabbit trail: I accidentally right-clicked on their home icon and it brought up their branding page with license agreements for their IP. Really neat idea.

This sort of gamification exists only because there are too many green engineers who only care about their salaries, and they mimic what people successfully recruited by FAANG (etc.) did, and so do other companies. Then this purity-spirals into taking the entire field down, because there's no one around to educate the new newbies. Facebook was IMO a step in the right direction because it was a "general" social network: you could post anything. Imagine if FB had released some sort of an "extension" that allowed you to share anything via a template of sorts, instead of having to type out everything in the same old text post. It would have been meta enough (sorry) to not spiral very quickly.

Leaving the arena is the only viable option. Software projects that aren't dependent on github drive their own vehicle, everyone else is on a crowded bus.


I wrote on this topic a while ago; experimenting, I found out you can basically change the repo names and keep the stars. This wouldn't work if you use the repo as an issue or PR tracker, since the history would all be broken, but if it's pretty much just the code, it's easy to swap the star count between two repos:

https://francisco.io/blog/transferring-github-stars/


do streamlit

Sounds like they take it more seriously than Google does likes on YouTube. A competitor had a video that rapidly got over 100k likes, but if you looked at the total time played, each view averaged out to just a couple of seconds on a video over 10 minutes long. Reported it, but nothing came of it. (No, not something we regularly do. I think it may be the only video I've ever reported; I just want a fair playing field.)

youtube competitor. that's just funny to me. kind of even comes across as petty. you took however much time to investigate average viewed time of a competitor and then cried to daddy about the perceived slight in "advantage" instead of taking that time to improve your competing product to make it better.

Umm…

I think you have it backwards, the other video was using fake likes to avoid having to improve their quality to get an equal number of eyeballs.


Truth rises to the occasion. All these years later, they're sitting at 2.3 stars on Google, even though they charge less, and we are sitting on 5 :-)

No, we had someone show up out of the blue, with no established presence in the space, with a video with hundreds of thousands of views. I was curious how they went so viral so fast.

Overall, it's bad for everyone if someone can create fraudulent views: us, other companies, and most importantly, consumers.

> taking that time to improve your competing product to make it better.

Took less than 3 minutes to do the math and send the report. I'm a fast developer, but I can't improve our product that fast :-)


> if you looked at the total time played, each view averaged to just a couple of seconds on a video over 10 minutes.

That makes no sense to me. Speaking as someone who has been using YouTube Data API v3 and YouTube Analytics API v2 for many years, estimated minutes watched of a video shouldn’t be public info. So how can you “look at the total time played” on a competitor’s video?


Been a few years; I don't recall the how. Maybe I'm thinking of a different platform?

This is a great article. I've developed the same tactics for other projects but was never able to grasp the proper vernacular. It really helps with figuring out how to organize and present the information.

I wonder if this is also covered in general OSINT or ISC^2 training: everything this article showed about breadcrumb trails, and the reverse operation (e.g. pay a company to do the work, see how it goes, evaluate the results, see if you can find other work similar/akin to it).


I give a GitHub star as a bookmark for the repo, so I assumed that others might be using it the same way too.

Things like this are part of why I cringe when I see supply chain analysis/security companies include “popularity” in their criticality metrics: the relationship between public popularity signals (like GitHub stars) and criticality is weak, at best.

In my experience, it's actually a great signal. That's why so many people rely on it. The distribution of GitHub stars is an extreme power law.[1] Stargazer thresholds are used by maintainers to make decisions on including projects for different purposes from dependency management to package manager maintainers deciding to list software by name.[2]

[1]: https://github.com/andrewmcwattersandco/github-statistics

[2]: https://github.com/Homebrew/brew/blob/master/docs/Acceptable...


Selection suitability and criticality are different metrics. The former is what Homebrew uses, as a way to lessen maintainer load and prevent inclusion in Homebrew becoming its own quality signal. The latter is what I’ve seen supply chain companies provide: an implication that a project is somehow critical or essential to the overall ecosystem because it has so-and-so many stars.

That first use is not unreasonable, in my opinion. The second one is questionable, at best.


Maybe our code forges don't need to be social media platforms. These 'stars' have pretty dubious value and rarely correlate with code quality or importance (core libraries generally have less attention than apps or tools). There's also a heavy language skew where JavaScript and Python libraries & programs get way more thumbs-ups even when they're technically not any better than alternatives.

The next thing in social media vending machines.

https://twitter.com/Alexey__Kovalev/status/87184200877156761...


As a note, GitHub stars are often used in pitch decks for OSS startups. VCs seem to care about that, judging from what I’ve seen around.
