
They also have the ability to install malware on Windows or use everyone's private source code for training, but they choose not to, because private code is private. Their own code isn't an exception: Microsoft code in public GitHub repos is used for training, just like everyone else's.



Why would MS need to worry about their private code being fed into their own private AI model?

Because... it's private code. Can the company be 100% certain there are no passwords, DB keys, or other company secrets in it? Can they be certain there's no personal employee data? Internal product names? A hundred other similar concerns with proprietary IP? Regardless of how the LLM transforms it, the individual bits of data are still there.

On the other hand, if the repo is already public on GitHub, then exposing it via an LLM introduces no new security risk.
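A minimal sketch of why that certainty is hard to come by: pre-training scrubbing boils down to pattern matching (tools like gitleaks and truffleHog work this way, regex rules plus entropy heuristics), and anything the rules don't match goes into the training set anyway. The patterns and file contents below are illustrative assumptions, not anyone's actual pipeline:

  import re

  # Hypothetical rules -- real scanners such as gitleaks ship hundreds,
  # and none of them catch every secret.
  SECRET_PATTERNS = {
      "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
      "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
      "generic_secret": re.compile(r"(?i)(password|db_key|secret)\s*[:=]\s*\S+"),
  }

  def scan(text):
      """Return the names of every pattern that matches somewhere in text."""
      return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]

  print(scan('db_password = "hunter2"'))  # ['generic_secret'] -- caught
  print(scan("token = hVx93kQm71LpZw"))   # [] -- a miss; into the pile it goes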


If they believe their own assertions, such as:

- that it doesn't output training data verbatim

- that the product is highly transformative, only "learning" from training data

- that, because of the two points above, there is no copyright infringement

Well, then there's really no reason not to throw their own private code on the pile.
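For what it's worth, the first assertion is testable from the outside: feed the model the first half of a file it may have trained on and check whether it reproduces the second half verbatim. A hedged sketch, assuming an OpenAI-style completions API; the model name and file path are placeholders, not Copilot's real interface:

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  snippet = open("suspected_training_file.py").read()
  half = len(snippet) // 2
  prefix, expected = snippet[:half], snippet[half:]

  resp = client.completions.create(
      model="gpt-3.5-turbo-instruct",  # illustrative choice
      prompt=prefix,
      max_tokens=256,
      temperature=0,  # deterministic sampling favors memorized text
  )
  completion = resp.choices[0].text

  # An exact character-for-character continuation is strong evidence
  # that the snippet was memorized verbatim.
  n = min(len(completion), len(expected))
  print("verbatim continuation:", completion[:n] == expected[:n])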

