
They also have the ability to install malware on Windows or use everyone's private source code for training, but they choose not to, because private code is private. Their own code isn't an exception: Microsoft code in public GitHub repos is used for training, just like everyone else's.



Why would MS need to worry about their private code being fed into their own private AI model?

Because... it's private code. Can the company be 100% certain there are no passwords, DB keys, or other company secrets in it? Can they be certain there's no personal employee data? Internal product names? A hundred other similar concerns with proprietary IP? Regardless of how the LLM transforms it, the individual bits of data are still there.

On the other hand, if the repo is already public on GitHub, then exposing it via an LLM introduces no new security risk.
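A minimal sketch of why that certainty is hard to come by: pre-training scrubbing boils down to pattern matching (tools like gitleaks and truffleHog work this way, regex rules plus entropy heuristics), and anything the rules don't match goes into the training set anyway. The patterns and file contents below are illustrative assumptions, not anyone's actual pipeline:

  import re

  # Hypothetical rules -- real scanners such as gitleaks ship hundreds,
  # and none of them catch every secret.
  SECRET_PATTERNS = {
      "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
      "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
      "generic_secret": re.compile(r"(?i)(password|db_key|secret)\s*[:=]\s*\S+"),
  }

  def scan(text):
      """Return the names of every pattern that matches somewhere in text."""
      return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]

  print(scan('db_password = "hunter2"'))  # ['generic_secret'] -- caught
  print(scan("token = hVx93kQm71LpZw"))   # [] -- a miss; into the pile it goes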


If they believe their own assertions, such as:

- that it doesn't output training data verbatim

- that the product is highly transformative, only "learning" from training data

- that, because of the two points above, there is no copyright infringement

Well, then there's really no reason not to throw their own private code on the pile.
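For what it's worth, the first assertion is testable from the outside: feed the model the first half of a file it may have trained on and check whether it reproduces the second half verbatim. A hedged sketch, assuming an OpenAI-style completions API; the model name and file path are placeholders, not Copilot's real interface:

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  snippet = open("suspected_training_file.py").read()
  half = len(snippet) // 2
  prefix, expected = snippet[:half], snippet[half:]

  resp = client.completions.create(
      model="gpt-3.5-turbo-instruct",  # illustrative choice
      prompt=prefix,
      max_tokens=256,
      temperature=0,  # deterministic sampling favors memorized text
  )
  completion = resp.choices[0].text

  # An exact character-for-character continuation is strong evidence
  # that the snippet was memorized verbatim.
  n = min(len(completion), len(expected))
  print("verbatim continuation:", completion[:n] == expected[:n])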

