> Is it a valid defense against copyright infringement to say “we don’t know where we got it, maybe someone else copied it from you first?”
I mean, in humans it's just referred to as 'experience', 'training', or 'creativity'. Unless your experience is job-only, all the code you write is based on some source you can't attribute, combined with your own mental routine of "I've been given this problem and need to emit code to solve it". In fact, you might regularly violate copyright every time you write the same 3 lines of code that solve some common language workaround or problem. Maybe the solution is for Copilot to accompany each generation with a URL containing all of the run's weights and traces, so that a court can order the URL unlocked to investigate copyright infringement.
> If someone violated the copyright of a song by sampling too much of it and released it in the public domain (or failed to claim it at all), and you take the entire sample from them, would that hold up in a legal setting? I doubt it.
In general you're not liable for this. While you will likely still have to go to court against the original copyright holder, any damages you pay can be attributed to whoever defrauded you or misrepresented ownership of that work. (I am not your lawyer)
> What if a human reads some restrictively licensed code and years later uses some idea he noticed in that code, maybe even no longer being aware from where this idea comes?
In general using the idea is fine, whether it is AI or human written. I think the major concern here is when the code is copied verbatim, or near verbatim. (AKA the produced code is not "transformative" upon the original)
> But what if the system memorizes entire functions? What if a human does so?
In both of these cases I believe it would be a copyright concern. It is not strictly defined, and it depends on the complexity of the function. If you memorized (|a| a + 1) I doubt any court would call that copying a creative work. But if you memorized the Quake fast inverse square root, it is likely protected under copyright, even if you changed the variable names and formatting.
It seems clear to me that GitHub Copilot is capable of producing code that is copyrighted and needs to be used according to the copyright owner's license. Worse still, it doesn't appear capable of knowing when it is doing that, or what the source is.
> Are you laundering 5 lines of code? Probably fair use.
The amount of copying does not matter for a copyright claim. If you copy a single character from a codebase that you could have gotten nowhere but there, and lawyers could prove it, that could go to court. This is hypothetical, but it is entirely possible. There have been copyright cases fought over single sentences, especially in the music industry.
> Are you trying to launder, or is it unintentional?
I'm not a lawyer, but in the US, copyright is a strict liability statute, which means intent does not matter.
> What happens if I create a 10 lines function character by character identical to some proprietary or GPLled piece of code, without ever looking at that code nor knowing that it existed?
Then you're in the clear, though you may need to convince a court that that's what happened. (Patents and trademarks could still be issues, but there's no copyright issue)
> that happens to people as well, ie. they see some algorithm and down the road have to write something similar and end up reproducing the exact same thing.
And if such an algorithm is copyrighted, that would be infringing! It doesn't matter if you copy on purpose or by chance.
> Or they could integrate a way to find the produced output back in the corpus if it's sufficiently close and provide a reference/attribution. Basically whatever tool a copyright lawyer would use to track down original work.
That assumes that the licenses of your code and the original code are compatible which often isn't the case.
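For what it's worth, the corpus lookup suggested in the quoted comment could be sketched roughly like this. This is only a minimal illustration, not how Copilot or any real tool works: the function names, the 5-token window, and the 0.6 threshold are all made-up assumptions, and a real system would need a far larger index and smarter normalization (renamed variables would defeat this version).

```python
import re
from collections import defaultdict

def shingles(code, k=5):
    """Return the set of k-token windows in `code`, ignoring whitespace and layout."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def build_index(corpus, k=5):
    """Map every shingle to the set of source ids that contain it.

    `corpus` is a hypothetical dict of {source_id: source_text}.
    """
    index = defaultdict(set)
    for source_id, code in corpus.items():
        for sh in shingles(code, k):
            index[sh].add(source_id)
    return index

def find_sources(generated, index, k=5, threshold=0.6):
    """Return (source_id, overlap) pairs whose overlap with `generated`
    meets the (arbitrary) threshold, best match first."""
    query = shingles(generated, k)
    if not query:
        return []
    hits = defaultdict(int)
    for sh in query:
        for source_id in index.get(sh, ()):
            hits[source_id] += 1
    scored = [(sid, n / len(query)) for sid, n in hits.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])
```

Even with a match in hand, you only learn where the code probably came from; whether you can then use it still depends on the license question above.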
They could argue that by viewing the copyrighted code/implementation, you could effectively infringe by (even subconsciously) writing the same/similar code.
There is merit to this claim if you're indeed implementing some advanced, niche algorithm, but it definitely wouldn't apply here, as all he's doing is calling HTTP APIs, a very generic and common thing to do.
> It's not illegal to come up with code that is exactly the same as an existing piece of copyrighted code.
That's not why it isn't copyright infringement to come up with code that is exactly the same as some other piece of code.
"In computer programs, concerns for efficiency may limit the possible ways to achieve a particular function, making a particular expression necessary to achieving the idea. In this case, the expression is not protected by copyright."
It's because the code in question can most likely only be written one way.
If you happen to come up with the same melody and lyrics as a pop song in a "clean room" do you think you are in the clear? We're talking about copyright! It is meant to cover artistic expression! Not utilitarian inventions.
Yeah, that part makes this whole thing smell really fishy. The researcher claims it was done through reverse engineering which would not be a violation of copyright. They'll have to show that someone had the opportunity to see their code and then copied it. Patents would arguably be an easier defense since you only need to show similarity and independence of invention doesn't matter.
> A programmer can read available but not oss licensed code and learn from it. Thats fair use. If a machine does it, is it wrong ?
You can learn from it, but if you start copying snippets or base your code on it to such an extent that it's clear your work is based on it, things start to get risky.
For comparison, people have tried to get around copyright of photos by hiring an illustrator to "draw" the photo, which doesn't work legally. This situation seems similar.
The second tweet in the thread seems badly off the mark in its understanding of copyright law.
> copyright does not only cover copying and pasting; it covers derivative works. github copilot was trained on open source code and the sum total of everything it knows was drawn from that code. there is no possible interpretation of "derivative" that does not include this
Copyright law is very complicated (remember Google vs Oracle?) and involves a lot of balancing different factors [0]. Simply saying that something is a "derivative work" doesn't establish that it's copyright infringement. An important defense against infringement claims is arguing that the work is "transformative." Obviously "transformative" is a subjective term, but one example is the Supreme Court determining that Google copying Java's APIs to a different platform is transformative [1]. There are a lot of other really interesting examples out there [2] involving things like whether parodies are fair use (yes) or whether satires are fair use (not necessarily). But one way or another, it's hard for me to believe that taking static code and using it to build a code-generating AI wouldn't meet that standard.
As I said, though, copyright law is really complicated, and I'm certainly not a lawyer. I'm sure someone out there could make an argument that Copilot is copyright infringement, but this thread isn't that argument.
Edit: Note that the other comments saying "I'm just going to wrap an entire operating system in 'AI' to do an end run around copyright" are proposing to do something that wouldn't be transformative and therefore probably wouldn't be fair use. Copyright law has a lot of shades of grey and balancing of factors that make it a lot less "hackable" than those of us who live in the world of code might imagine.
> If the AI generates all the code, but then a human debugs it and alters it, is that copyright that can be owned? Does the entire code base then become copyrightable?
I am not a lawyer and I did not research anything for this, but I'm under the impression that a derivative work of something in the public domain is itself copyrightable. If something isn't copyrightable, it's in the public domain. So, if you alter it sufficiently to create a derivative work, the altered form should be copyrightable. But the original would still be public domain. I think?
> By looking at other code? Are you violating copyright when you are writing similar code yourself?
It's certainly possible. Back when Phoenix reverse-engineered the IBM BIOS using published sources with a restrictive license, they did it by having one team read the sources and write a very detailed specification of everything that happens there, and then another team used that spec to write new code from scratch. They did it that way because if the first team were to write the code themselves, it is quite likely that the result would be legally considered derived work.
Yes. Copyright already protects what a patent granted for source code would protect.
The problem you describe exists in both scenarios, and is resolved with enforcement. You can sneakily break the law, and you run the risk of getting caught for fraud.
Under what license was the original published? If it was an open-source license, and if the copy includes your copyright notice, then there's no wrongdoing and nothing you can do.
If it was an open-source license and your copyright notice is not present, and if there is any line-by-line copying from your original to the copy, you should be able to prove that the copier simply copied and pasted your code. This kind of copying can be proven using textual matching and the mathematical improbability of an inadvertent exact copy.
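As a rough illustration of the textual matching mentioned above, here is a minimal sketch using Python's standard `difflib`. The function name and the choice to compare whitespace-stripped, non-blank lines are my own assumptions, not an established forensic method; the point is only that a long run of identical lines is easy to surface mechanically.

```python
import difflib

def longest_shared_run(original, suspect):
    """Return the longest run of consecutive identical lines shared by both texts.

    Lines are compared after stripping indentation and dropping blank lines,
    so pure reformatting does not hide a match (an assumption of this sketch).
    """
    a = [ln.strip() for ln in original.splitlines() if ln.strip()]
    b = [ln.strip() for ln in suspect.splitlines() if ln.strip()]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]
```

The longer and more idiosyncratic the shared run, the harder it is to explain as coincidence; a handful of matching boilerplate lines, on the other hand, proves very little.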
If the work was published under anything other than an open-source license, it shouldn't have been published in any way that could be copied (i.e. in source-code form). In this case, any substantial copying is a violation, but you would have a very hard time pursuing a legal remedy.
> I am not sure about html-part, but in terms of design it's copy.
If we're talking about design, not code copying, things become more difficult. Ask yourself how obvious to a practitioner of the art the layout and functions are. To pursue a legal remedy, you would have to show something unique about your design that sets it apart, and you would have to show that someone copied your specific work as opposed to copying the general design principles of similar programs.
You ask a very insightful question. Let me see where I end up running out the analogy in a certain direction.
If Danger Mouse sold a remixing tool that enabled widespread remixing of any/all albums, would DM be profiting illegally from the content of others?
In each individual case, the remix album produced would have to pass the fair use tests, and if the user produced a sufficiently close replica, they could be restrained from distributing it. But that wouldn’t implicitly be the remixing tool’s fault, unless it mechanically reproduced a complete protected work with the user completely unaware it was doing so. A dedicated user can make any tool produce a protected work, so we have to aim for the narrow window of user-oblivious in order to fault the tool.
Translating back to Copilot, this then becomes the question: can Copilot regurgitate an entire protected work for a user who then sells that work, with the user being fully unaware that they have reproduced a protected work without meeting fair use terms, such that Copilot is responsible?
Copilot requires user prompting to emit code, and seems to draw the line at around the single function boundary, so reproducing an entire codebase becomes exponentially less likely as the number of functions increases.
So if there were a weakness in Copilot’s defense, it would be in small single-function programs, at which point the parallel to another music case comes to mind: the person who generated and copyrighted every single musical phrase in Western major/minor, to prove that the law as written is not applicable when the total size and complexity of a given work falls below a certain threshold. I thus assume that Copilot is essentially protected in the single function case - it doesn’t matter if you have a protected work for (‘four’ (2 2 +) func), because that’s so simplistic that any human might reproduce it at any time unaided, and so claiming against them would fall flat when a judge applies the common sense threshold. It’s a high bar to expect a judge to recognize this analogy and understand code well enough, but I think the combination of user intent to break fair use being required for complex multi-function systems, and the protection of snippets being essentially impossible to enforce in music terms, would absolutely shield Copilot from being judged liable and owing damages.
(General disclaimer applies: I am not your lawyer, please seek legal counsel before making use of my opinion, etc.)
> How do you prove that you did the work and just happened with the same code because that's just the most straightforward way to do it?
This was a factor in Intel's case with NEC over the microcode of their 8086 clone, and the judge ruled in NEC's favour, finding that similarities dictated by functional constraints were (subjectively) not creative expressions, i.e. copyright did not apply.¹
Note this relates to copyright, does not extinguish patents, and actually relying on this precedent may require bottomless legal resources.