Not sure why this got voted down. It was an honest question.
Gentoo said no AI contributions. Artwork is a contribution, and someone will use AI to do it.
I’m just pointing out that this will actually be hard to enforce, and I think someone will test it sooner rather than later. Artwork would be the obvious vector, and I wonder how long before someone tries it.
(I did not downvote, and can't even do so), but Gentoo does not accept art contributions of any kind, human or AI; there is simply no place for them.
But in a previous discussion[1], somebody mentioned the "gloriously bonkers" image that pkgxdev/pantry[2] generated for RabbitMQ. I recovered the image from the Web Archive: https://web.archive.org/web/20240419091549/https://gui.tea.x... . I'll refrain from explaining what's wrong with it....
> Górny wrote that it was unlikely that the project could detect these contributions, or that it would want to actively pursue finding them. The point, he said, is to make a statement that they are undesirable.
I think it does, at least until it produces better work than humans on average/median. We have enough trouble spotting bugs (including vulnerabilities) before they hit a beta or production release as it is. Even if the same diligence went into checking machine output, I think we would get worse code, because in the scenario where you wrote it yourself (I'm imagining a patch here, not a whole project), you've got a good understanding of what you just wrote: how it's meant to work, why that method works, what the intention was behind every statement. With machine output, you have to build that understanding before you can validate its work, and with the state of the art as far as I'm aware, the validation step really is necessary.
It is also a matter of the universe versus the software engineers. With the universe striving to make better idiots¹ who can all tell a machine what it should do in natural language without needing ~any skills, it would be the task of other people to mop up their potential mess. (Not saying that prompt engineering isn't a skill, just that you can get started without having acquired that skill yet.)
Open source contributions are a benefit in several ways: they look good on your CV, you feel like you're helping society, or perhaps you just want to fix a bug that you're personally affected by. I can see legitimate motives for using "AI" as seven-league boots, but the result, where someone else has to do the work for you in the end, seems immoral, even if that someone can't tell whether the person was just unskilled or whether it's machine-generated ("AI") output.
You can't automate perfect enforcement, but you can express a norm (as they have) and hold individual human contributors accountable for following it, so that when a violation is discovered, the human's other contributions can be scrutinized and the human can be penalized/monitored/whatever going forward.
People bicker over the quality of generated content vs human content, but accountability is actually the big challenge when granting agency to AI tools.
There are many humans who might produce or contribute to a work, and when they violate norms it's accepted that we can hold the person individually accountable in some way or another -- perhaps by rejecting them, replacing them, or even suing them. Much the way mature financial systems depend on many features of legal and regulatory norms that crypto can't deliver, most mature organizations depend on accountability that's not so easy to transfer to software agents.
If there are a handful of code generators and they're granted autonomy/agency without an accountable human vouching for them, what do you do when they violate your norms? Perhaps you try to debug them, but what does that process look like for a black-box instrument? How does that work when those services are hosted by third parties who themselves reject accountability and who only expose a small interface for modification?
Here, Gentoo dodges the problem entirely by effectively saying "we're not going to leave room for somebody to point to an AI tool as an excuse for errant work... if they contribute, they need to take accountability as if the work is wholly and originally theirs, by representing it as so."
Contributors may violate that policy by using AI anyway, but they can't use it as an excuse/out for other violations because they now need to stay steadfast in their denial that AI was involved.
So ultimately, this doesn't mean that generated code won't appear in Gentoo, but it makes sure contributed code is held to no lower a standard of accountability.
> a pejorative neologism for the idea that an expression of a moral viewpoint is being done disingenuously
(--Wikipedia. Apparently the term was newly coined in 2004 and used in religious circles in 2010 and 2012. Then a journalist picked it up in 2015. Interesting to see a word so new that isn't a youth thing like yeet!)
How is it disingenuous to ban this, what self-interest do they have in it?
Firstly, please let me know which rock you've been living under for the last ~8 years to not know what 'virtue signalling' means, it sounds awesome and I would like to join you under there.
Anyway, the insinuation is that they don't actually believe they can reduce the rate of AI contribution via such a statement, but that they're doing it just for the sake of showing everyone that they're on the anti-AI side of the fence, which indeed does seem to have become some sort of ideological signpost for a certain group of people.
> Firstly, please let me know which rock you've been living under for the last ~8 years to not know what 'virtue signalling' means, it sounds awesome and I would like to join you under there.
On the other hand it's been my experience that the only people using it unironically as a term are the sorts of people I don't want to talk to for any appreciable length of time, and them self-identifying by using it is extremely handy for me personally.
It's like the reverse Nigerian prince email scam, as soon as someone drops "virtue signaling" I know to immediately disengage lest I be exposed to toxic and deranged opinions that are both low effort and boring to hear litigated.
To save me from having to remember your name, please add me to your list of people who use the phrase "virtue signalling" unironically, even if I do so sparingly.
I suppose I'm one of today's lucky ten thousand. Join any time, the only requirement is learning something new every day!
> some sort of ideological signpost for a certain group of people.
That feels about as laden and insinuating as the formerly religious term from above. Do you think it's not okay to state that a method of creating works may not be used when one believes it to be (*checks the reasons given in the article*) potentially illegal under copyright, unethical in various ways (not sure how to summarize that), as well as exceedingly difficult to get quality output from? Any one of those reasons seems to me like it would be enough grounds, with the only potentially subjective one being the ethics, which would still leave two other reasons.
I don't have a strong opinion on the ban, in case it seems that way. The reasons make sense to me, so I'm not opposed, but I'm also not a fan of the selective enforcement this risks, and the world moves on (progress is hard to halt). Perhaps there's an "AI" tool that can be ethical and free of copyright claims, and then the resulting work can be judged on its own merit to resolve the quality concern. So I don't really have a solid opinion either way; it just doesn't seem bad to me to make a statement for what one believes to be right.
Edit: perhaps I should bring this back to the original question. Is this virtue signaling if (per my understanding) it is genuine? I've scarcely known open source people (Gentoo isn't corporate, afaik) to have ulterior motives, so to me the claim feels a bit... negative towards a group trying to do good, I guess?
The accusation of virtue signaling is completely non-falsifiable, at least until such time as an organization has documents leaked showing that they actually hate whatever group but are going to openly signal that they love them to increase sales, which hasn't yet happened.
Instead, what does seem to happen quite often is that the accusation of virtue signaling is deployed against a very minor and token gesture of vague social progressiveness, or some other ethical position, on the part of an entity, usually a business, and the charge is levied by people who disagree with whatever stance has been taken. It's not hard to draw a line from this back to the rhetoric around various social issues in recent memory, going back to Nixon's famous "silent majority" concept: that most people agree with the thing you, as a reactionary, think, but are "afraid" to say it. It's literally the same move: the leadership of whatever org is criticized for acting in bad faith and invoking whatever cause, because they couldn't possibly just actually believe in it; it obviously must be a ruse.
I apologize if this is more political than HN likes to get, but it's simply not possible to disentangle the phrase "virtue signaling" from specific kinds of reactionary politics. As to why it's now being deployed in a discussion about AI, I don't think the link is 100% rock solid, but it's one of those things: if one sees smoke, one isn't wrong to assume fire.
Anyway that was a lot to say it's pointless culture war bullshit, please keep your innocence.
Making a rule that is difficult to enforce isn't virtue signalling at all. It's adding a rule that is expected to be adhered to just like the other rules. It's useful to say outright what the expectations and requirements of contributions are.
Actual "virtue signalling" is espousing a position that you don't really believe in because you think it makes you look good.
Does Gentoo really need PR you think? I'm not sure there's not a corporate division, but the main website says they're a foundation. (Then again, seeing how the cashflow works in places like Mozilla, maybe I should put less stock in that.)
There's a CYA aspect in one of the three reasons mentioned (copyright concerns); I'm not sure that's necessarily bad if the thing they're banning is, in their opinion, objectionable for two further reasons as well.
The supply chain risk elephant in the room is that bad-quality LLM-derived contributions could be kept out by the same rigorous review process that keeps out bad-quality human contributions. Now where did I leave that process? I'm sure I saw it around here somewhere...
If you're on the fence about putting time into something, and they start micromanaging you in a way that you don't agree with, you're less likely to bother.
I see many saying that contributors will continue using AI/ML tools clandestinely, but I’d counter that only a malicious actor would act in this manner.
The ban covers contributions made specifically for Gentoo, a distribution made entirely with Free/Libre software, made (mostly) by volunteers. Why would any of these volunteers, who chose to develop for the Gentoo distribution specifically - presumably because of their commitment to Free/Libre software and its community - go along with using AI tools that: A) were trained on the work of thousands of developers without their consent and without regard for the outcomes of said training; and B) go against the wishes of the project maintainers and, by extension, the community, willingly contradicting the mission of the distribution and its tenets of Free/Libre software?
It sounds to me that people would do this either unknowingly, by not knowing about this particular piece of news, or maliciously, making a choice to say “I will do what I will do, and if you don’t like it I’ll take my things and go elsewhere”. I don’t accept that any adult, let alone a professional developer, would grin in the darkness of their room and say to themselves “I bet you I can sneak AI code into the Gentoo project in less than a week”. What’s the gain? Who would they be trying to impress? Let’s not even open the big vault of security problems that an AI coder would bring. What if your mundane calculator app gets an integral-solving algorithm that is the same, down to the letter, as the one used in a Microsoft product? That’s grounds for a lawsuit, if MS cared to look for it.
The former case may prompt a reconsideration from the board - if key personnel drive a hard bargain, the potential loss of significant tribal knowledge may outweigh the purity of such a blanket ban on AI. The latter case will surely bring about some reconsideration too, but in the direction of staying the course and making the ban even more stringent.
On a personal note, I use no AI products; maybe I picked them up too early but I don’t like what they produce. If I need complex structures made, I am 100% more comfortable designing them myself, as I have enough problems trying to read my own code, let alone a synthetic brain’s. If I need long, repetitive code made, I’ve had a long time to practice with VIM/Python/Bash and can reproduce hundreds of lines with tiny differences in minutes. AI evangelists may think they found the Ark, but to me, all they found was a briefcase with money in it.
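To make that concrete, here is a minimal sketch of the kind of templated generation I mean, in Python (the register names, offsets, and output file are made up purely for illustration):

    # Minimal sketch: stamp out hundreds of near-identical lines
    # from a template. All names and offsets are hypothetical.
    TEMPLATE = "#define REG_{name:<12} 0x{offset:04X}  /* {name} register */"

    registers = [(f"CHANNEL_{i}", 0x1000 + 4 * i) for i in range(256)]

    with open("registers.h", "w") as out:
        for name, offset in registers:
            out.write(TEMPLATE.format(name=name, offset=offset) + "\n")

A vim macro or a Bash loop gets you the same result; mechanical repetition doesn’t need a language model.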
> would grin in the darkness of their room and say to themselves “I bet you I can sneak AI code into the Gentoo project in less than a week”. [...] Who would they be trying to impress?
I agree with most of what you said, but as someone working in the security field, I know enough adults whom I'd suspect of doing something just for the heck of it. Not maliciously or for any particular gain, but for the hack value -- assuming it's not deemed too easy to pull off and thus not interesting to try.
Overall, of course, yeah, I agree people generally want to do good, and doubly so in a community like Gentoo.
> maybe I picked them up too early but I don’t like what they produce.
You don't use translation engines? As someone living in a country whose language I'm still learning, I can't tell you how helpful translations from DeepL are. They're not flawless, but they're at a level where I come across politely and well enough to someone like a prospective landlord or government agency, and my language skills are good enough that I can see where it got the meaning wrong (so long as it's not some subtlety, but then it tries to be formal, so that's also not often an issue). I'm sure there are more examples, but in general, I really do see their benefit even if I also see the concerns.
I agree with the use of such tools in our personal lives, as liability is not as big a problem and the stakes are lower generally.
I only object to the use of AI in places where I have to put my signature and will be held liable for whatever output the AI decides to give me.
Like that lawyer who made a whole deposition with AI and got laughed out of the room; he could’ve been held liable for the other party’s court fees in any other country, and lawyers aren’t cheap! I don’t imagine his employers were very happy.
I work on open source projects, I use ChatGPT and Copilot. I've yet to see any convincing argument why I shouldn't use advanced auto-complete to do my job faster, or how copying and pasting from StackOverflow (while using adblockers) is fine but copying and pasting from ChatGPT is evil.
I don't think it's evil, but I think it generates bad code that I have to fix all the time. It's very annoying when developers use it to effectively copy and paste code snippets that should be wrapped in a function and then the bugs that Copilot drops in have to be fixed in twelve places instead of one.
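To illustrate the difference (a hypothetical sketch, not code from any actual review):

    # Hypothetical sketch of the review problem described above.
    amount_a, amount_b, tax_rate = 10.00, 24.99, 0.21

    # Pasted inline at every call site: a bug must later be hunted
    # down and fixed in each copy.
    price_a = round(amount_a * (1 + tax_rate), 2)
    price_b = round(amount_b * (1 + tax_rate), 2)

    # Wrapped in one function: the same bug gets fixed exactly once.
    def price_with_tax(amount, rate):
        """Return the amount with tax applied, rounded to cents."""
        return round(amount * (1 + rate), 2)

    price_a = price_with_tax(amount_a, tax_rate)
    price_b = price_with_tax(amount_b, tax_rate)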
That's fine; using those tools is your prerogative, and I won't deny it or besmirch you because of it. That being said, going to a project such as Gentoo's - defined by its ideals of Free/Libre software and its commitment to users of Free/Libre software - and intentionally going against the grain by using AI is what I protested against, as do the other contributors and council members.
Using AI such as ChatGPT and Claude in contributions to Free/Libre software is not practically wrong; it's ideologically wrong. It is so, in my opinion, because the AI was produced by a company that did not ask anyone for permission when incorporating training materials (applies to all commercial AI products of today), did not respect the liberties of its contributors nor their rights to ownership of their work (applies to all commercial AI products of today), and now asks you to foot the bill for whatever slop it produces (applies to most commercial AI products of today; some are still free to use).
You can use your tools if you like them, that's fine, I'm not your boss. But on that last point, a large majority of users I've encountered, in this thread even, seem to agree that all AI-produced works need retouching (big or small retouching, though most often big) in order to be usable in any serious endeavor. Midjourney users need to touch up the fingers and the landscapes and look for artifacts of all kinds. Copilot users need to rewrite whole algorithms sometimes because of plagiarism and outright erroneous calculations (unless you don't care). ChatGPT needs to be fact-checked if you ask it to write anything significant.
I ask you this: if you have to oversee everything it produces, it costs money, and it is generally agreed to produce low-quality work, why not hire a junior employee instead? Freshers, as our fellows in India would call them. Lord knows we need more of them.
But how can you then sign a DCO or a CLA for contributing? Both of these say that you certify that you are the author (or you know that you have the legal rights to contribute the content). How do you know this for generated content?
Given Gentoo is a non-corporate, all volunteer project I think this is the salient point:
“In his RFC, he laid out three broad areas of concern: copyrights, quality, and ethics. On the copyright front, he argued that LLMs are trained on copyrighted material and the companies behind them are unconcerned with copyright violations. "In particular, there's a good risk that these tools would yield stuff we can't legally use."”
Unlike Red Hat, SuSE, and Ubuntu, which are all corporate organizations with support revenue and legal teams to defend against these issues, along with policies to prevent them, Gentoo is an all-volunteer project that doesn’t have the organizational or financial support to fight legal battles. Its risk from copyright issues is much higher.
Without AI, I struggle with dyslexia and writing difficulties. With AI, I can contribute good code, with a well-written requirements spec, test plan, unit tests, documentation, and automation. I spend about ten hours a week working on open source, where before I spent none.
When I see an anti-AI take that dismisses the potential and wrings hands over unlikely legal issues, I realize how much work is left in making open source accessible to those who struggle like I do.
I won't be contributing to Gentoo or any project that rejects AI assisted work. I'll keep working on my own accessibility projects and my own computational geometry codes.
I believe that in the end, accessibility-focused projects will win the day here. AI-assisted open source will become the norm, because it lowers the access barriers to good code.
As helpful as it may be, AI can emit verbatim training data where the training data for a particular problem is scarce. I have, unfortunately, seen it myself in an interview take-home assignment, where the solution for one sub-problem was copied 1:1 from a seemingly obscure Chinese blog and had a completely different coding style than the rest. (The rest also had numerous inconsistencies, and the solution had AI written all over it.)
For an open source project, inadvertently accepting something like that may spell legal trouble long after the original contributor has disappeared, so until the legal status of AI-generated code is clarified in court, there are good reasons to ask contributors to limit themselves or completely forego using an AI assistant.
I'd be curious about the problem and context that elicited the verbatim output. Did it have detailed descriptions of a real system in production that you were supposed to extend, including API docs and other system specific elements? Or was it a generic homework assignment?
If it's emitting verbatim solutions in my context, that is surprising. I write up custom requirements, have a conversation with two-sided critique, requirement negotiation, and gap analysis against the existing system. I then generate and critique a test plan, generate stub tests and stub implementations in line with existing code, then implement and test the stories by hand on top of that.
Nothing that "has AI written all over it" gets generated, since we're having a custom conversation and arriving at custom solutions. If it does copy public but not open code, it's tiny generic snippets or, design patterns, rather than, say, sophisticated algorithms.. not anything likely to cause damage to another org or it's market position.
I'm prepared to field a cease and desist or put down a lawsuit by yanking affected code.
And I'm only going to do this for projects that I own, or whose authors are taking a permissive stance. I'm boycotting any project taking a hard line against AI.
My projects are affected in the case of a full retroactive AI ban that makes all AI-generated code unlawful. However, due to the transformative nature of processes like mine, and of the vast majority of AI output, I really doubt that'll happen.
I'll grant that for a Linux distribution, the risk is much higher - Linux most definitely impacts market positions, and yoinking someone's public blog code could lead to rework after a cease and desist. But even there, I feel the legal risk is symbolic rather than existential or even imminent.
Legal risk won't be a driver for most projects, and might not be the driver for gentoo. It'll be prejudice against people who can't easily contribute without AI assistance. There's certainly an element of that in the gentoo discussion, with AI contributions being misrepresented, discounted and dismissed outright. That'll prove short-sighted, and drive contribution to inclusive projects with strong controls ensuring their AI use is responsible and transformative.
I wish Gentoo would choose to be an example of that. And I'm sad to say I plan to boycott it until it changes stance. But I am sure other distributions will step up here and work to mitigate the legal risks in return for the obvious accessibility benefits.
Typically, I've found that LLMs tend to emit training data when you ask them simple questions, such as "connect to a MySQL server". In one of my experiments, one emitted a StackOverflow example 1:1, including the mistakes that were made there (no error handling). Regardless of whether it is legal to use code in such fashion, more experienced coders tend to rewrite the code an LLM emits, so it's less likely to lead to actual problems down the line. I've seen a lot of less experienced people take LLM-generated code and use it as-is, which can lead to problems.
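To make the error-handling gap concrete, here's a minimal Python sketch (the host, credentials, and database are placeholders, and this is of course not the exact StackOverflow snippet in question):

    # Minimal sketch; all connection parameters are placeholders.
    import mysql.connector  # pip install mysql-connector-python

    # The pattern LLMs often emit is the bare connect call with no
    # error handling. An experienced coder would wrap the attempt:
    try:
        conn = mysql.connector.connect(
            host="localhost", user="app", password="secret", database="shop"
        )
    except mysql.connector.Error as err:
        raise SystemExit(f"Could not connect to MySQL: {err}")

    try:
        cur = conn.cursor()
        cur.execute("SELECT VERSION()")
        print(cur.fetchone())
    finally:
        conn.close()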
In the case of an open source project, one of the main problems with LLMs is that if a contributor contributes infringing material, inadvertently or not, the project will be left holding the bag for it. The contributor may disappear, because all that's known about them may be just an alias and an email address. This is especially true if a project doesn't have the backing of a large, well-funded organization that can fight legal battles on behalf of the contributors. The legal situation on Internet-trained LLM usage isn't clear, and I can totally understand a small, underfunded project not wanting to go up against large corporations. I think respecting the project maintainers' wishes in this case is the right thing to do.
I've seen several examples where an LLM was used as a tool for accessibility (vision assistance, etc) and I do think that that's going to be one of the areas where it will be most valuable to society as a whole. Unfortunately, the legal system moves slowly and it will be a few more years until this mess gets cleaned up. I also think you are doing the right thing and I wish more people would follow your example. If a particular project deems the risk of using an LLM too high, contributors should follow the wishes of the maintainers and not contribute to that project with the usage of an LLM. Depending on the country, petitioning lawmakers may also help move the needle in the desired direction.
Good. Explicit is better than implicit (or implied). Documenting the decision and making it clear is probably better for everybody -- both those who agree and those who disagree.
Lots of other projects also have reached the same decision, but the way of communicating this has not been clear. I suppose people with better wordsmithing skills than me will eventually end up with a more-or-less standardized expression to be included in the CONTRIBUTING file.
From the point of view of contributors, the situation is a little clearer. As things stand, using most generative AI tools to produce content essentially stops me from being able to sign a DCO or a CLA to contribute that content, since both say "I certify that I wrote this" and I can't claim that any longer.
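For reference, the sign-off is just the trailer that git commit -s appends to the commit message (the commit subject, name, and address below are made up):

    $ git commit -s -m "sys-apps/foo: fix build with gcc-14"
    # the -s flag appends a trailer to the commit message:
    #
    #     Signed-off-by: Jane Developer <jane.dev@example.org>

Adding that trailer certifies, per clause (a) of the DCO, that the contribution "was created in whole or in part by me and I have the right to submit it" under the indicated open source license - exactly the claim that is hard to make for generated output.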
I think this policy is yet another example of ethical overengineering.
Yes, of course, ChatGPT & Co. are very unlikely to take any real care about what they give their models to learn - especially with regard to copyright.
For a better perspective, I always try to transfer the behavior to the analog world. In this sense, someone who, for example, illegally obtains a book on Linux (say, as an ebook) would not be in the ethically correct position to contribute to Gentoo, because the knowledge on which their contribution is based would be stolen intellectual property.
I can't imagine that there are any people in this world whose knowledge was obtained exclusively through copyright-legal means. We are all in a gray area - some possibly more than others.
In my view, it is typical of engineers to simply map philosophically complicated moral issues onto technology. But this method will never come close to solving anything, because the very complicated analog moral world cannot simply be mapped to 0 or 1.
Man, slippery slope!!