
Well, I am "a fan" of Copilot and I do think AI is the future, but the author has a valid point.

I don't think the fair-use violation he describes happens during training. Training an AI on anything that is publicly accessible is fair use, just as a person learning by reading or watching the same materials is.

However, that fair use ends the moment the resulting AI starts suggesting verbatim copies of code from licensed works without attribution.

So one could argue the source code is not being used in a transformative way, and that Copilot is just a more efficient method of retrieving licensed code. That misses the fact that Copilot actually is capable of writing new code. I've used it as "an autocomplete on steroids", letting it suggest maybe half a line or one line of code at a time (or trivial stuff we automate even without Copilot, like getters/setters in Java). But when actual licensed code is suggested, yes, that is IMO a license violation.

Therefore, one way of resolving this would be to pair Copilot with a tool that scans the resulting code for the presence of licensed code and then builds a list of "credits" or references. There should also be measures (perhaps during training) to penalise generation of verbatim, or extremely similar, code. Would this make Copilot less useful a tool? I'm not sure.
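The scanning idea above could work roughly like plagiarism detection: index short runs of lines from known licensed code, then check generated code for matching runs. Here's a minimal sketch of that, assuming a small in-memory corpus; the function names, the 3-line window size, and the corpus layout are all my own hypothetical choices, not anything Copilot actually does.

```python
# Hypothetical sketch: flag generated code that reproduces verbatim
# line runs from a corpus of licensed code, so output can carry "credits".

def normalize(line: str) -> str:
    # Ignore indentation/trailing whitespace so reformatting doesn't hide a copy.
    return line.strip()

def windows(code: str, k: int = 3):
    # Yield every run of k consecutive non-blank (normalized) lines.
    lines = [normalize(l) for l in code.splitlines() if normalize(l)]
    for i in range(len(lines) - k + 1):
        yield "\n".join(lines[i:i + k])

def build_index(corpus: dict[str, str], k: int = 3) -> dict[str, str]:
    # Map each k-line window of every licensed file to its source name.
    index: dict[str, str] = {}
    for source, code in corpus.items():
        for w in windows(code, k):
            index.setdefault(w, source)
    return index

def credits_for(generated: str, index: dict[str, str], k: int = 3) -> set[str]:
    # Return the licensed sources the generated code overlaps with verbatim.
    return {index[w] for w in windows(generated, k) if w in index}
```

A real system would need fuzzier matching (renamed variables, reordered lines) and a far bigger index, which is presumably why "extremely similar" code is the hard part.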

One thing that's not going to happen is putting tools like Copilot back "in the bottle". There are now similar models anyone can download (FauxPilot), and I, like many others, have found these tools speed up mundane tasks a lot. That translates into a monetary advantage for users, so there is no way this will disappear, lawsuit or no lawsuit.


