Hacker Read

jncfhnb | karma 4263 | avg karma 1.75 · 2023-09-27 20:40:24

If this ML algorithm were not open source, it would be illegal for you to release it yourself as Apache 2.

monocasa | karma 27236 | avg karma 2.94 · 2023-09-27 20:58:36

What exactly is stopping anyone from releasing a binary as Apache 2 and never releasing the source?

jncfhnb | karma 4263 | avg karma 1.75 · 2023-09-27 21:13:37

You’re welcome to fuck around and find out. Go release llama2 under Apache 2. You’re saying that’s fine right?

The answer to your question is that code stored as a binary is not different from code stored as text. Pickled models are code.
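One narrow sense in which a pickle "is code" can be shown with the standard library alone: a pickle byte stream is a sequence of opcodes for a small stack machine, and loading it can call functions. The sketch below is an illustration only. The `Deferred` class is hypothetical and not taken from any model release, and it does not address whether such a stream counts as "source".

```python
import operator
import pickle
import pickletools


class Deferred:
    """Hypothetical object whose pickled form means 'call operator.add(2, 3)'."""

    def __reduce__(self):
        # Tell pickle to rebuild this object by calling operator.add(2, 3).
        return (operator.add, (2, 3))


blob = pickle.dumps(Deferred())

# The byte stream is a list of opcodes for pickle's stack machine.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(blob)]

# Loading executes those opcodes, including the call to operator.add,
# so the result is the return value of that call, not a Deferred instance.
result = pickle.loads(blob)
```

Because loading executes whatever callable the stream names, the `pickle` documentation warns against unpickling data from untrusted sources.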

monocasa | karma 27236 | avg karma 2.94 · 2023-09-27 21:22:25

> You’re welcome to fuck around and find out. Go release llama2 under Apache 2. You’re saying that’s fine right?

You're missing my point. Obviously you can't release someone else's IP under whatever license you see fit.

You can release your own binary under Apache 2. Doing so without releasing the source doesn't make it open source despite being an open source license.

> The answer to your question is that code stored as a binary is not different from code stored as text. Pickled models are code.

I'm not saying it's not code; I'm saying it's not source.

jncfhnb | karma 4263 | avg karma 1.75 · 2023-09-28 08:16:41

It is source.

The data used to derive this model is not different from the brain and worldly observations and learnings of the engineers, which are not part of any open source materials.

camgunz | karma 5075 | avg karma 2.2 · 2023-09-28 12:54:24

What are you talking about, "the brain" of the engineers? This is bonkers. Monocasa is being excruciatingly patient with you all, but the fact is this was generated with tools and is not a source release; it's a final product, or a compiled or generated release.

jncfhnb | karma 4263 | avg karma 1.75 · 2023-09-28 18:20:53

It’s bonkers because Monocasa’s take is bonkers.

Code generated with tools is still code. This code is the source. The output of the code is the output. Monocasa is failing to understand the difference, or perhaps intentionally ignoring it. In some contexts a “compiled release” implies an output that is largely immutable for practical purposes. That is not what this is. It’s technically a binary object, but it’s a binary object you can easily unpack to get executable code that you can read and edit. It is a convenient format different from classical text code. The fact that it’s a binary is completely irrelevant. It’s akin to arguing that code that is provided in a zip file cannot be open source, both because it’s a compressed file and because it doesn’t include the compression algorithm.

With that understood, demanding the “tools” that were used to create the code is like asking for the engineers’ notebooks of design thoughts along the way. It has no bearing on your ability to use or modify it. This is not an open source project to make neural nets. This is an open source project of a neural net.

If someone releases math_funcs.py, you don’t need anything about the tools that were used to create math_funcs.py to consider it open source.

monocasa | karma 27236 | avg karma 2.94 · 2023-09-28 19:04:34

Once again, what do Mistral's engineers edit to do their job? That's the source that, when released, would constitute 'open source'.

jncfhnb | karma 4263 | avg karma 1.75 · 2023-09-28 19:27:27

So if you use tools that generate some boilerplate code as part of your project you need to include the boilerplate generator otherwise it’s not open source?

monocasa | karma 27236 | avg karma 2.94 · 2023-09-28 19:48:15

The model weights aren't boilerplate.

jncfhnb | karma 4263 | avg karma 1.75 · 2023-09-28 21:46:55

It’s all code. Is your opinion now that open source repos have to include the tools to generate only the parts you personally deem important enough?

camgunz | karma 5075 | avg karma 2.2 · 2023-09-29 01:52:07

Why are you being so obtuse? No, devs don't have to include the source to vim in their repos. They have to include the source files for their product in their repos. I'm confident this just isn't that hard to understand.

jncfhnb | karma 4263 | avg karma 1.75 · 2023-09-29 08:33:26

These are the source files! I’m going to stop responding to monocasa because I think he is being obtuse and leading me to say things that you are misinterpreting.

There is no expectation to include vim, or any tools required to create a codebase. We agree. And that’s why this repo is sufficient. Asking for the tooling that was used to make this project would be out of scope and unreasonable.

This is a repo that can be used to make predictions. It is not a repo that is used to make models.

camgunz | karma 5075 | avg karma 2.2 · 2023-09-30 05:39:38

Well but you can see that you're a little downstream right?

c++ source -> photoshop -> images

source we're asking for -> the repo you're talking about -> predictions
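The three stages in that second line can be sketched with a toy example. Everything here is hypothetical and stands in for the real thing: `train` plays the role of the training code, the JSON string plays the role of the released weights, and `predict` plays the role of inference. It only illustrates where each stage sits, not which stage deserves the name "source".

```python
import json


def train(xs, ys):
    """Hypothetical 'training code': least-squares fit of y = w * x."""
    w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return {"w": w}


def predict(weights, x):
    """Hypothetical 'inference code': apply the fitted weight."""
    return weights["w"] * x


# Stage 1: training code plus data produce weights.
weights = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Stage 2: the weights are serialized and released as an artifact.
artifact = json.dumps(weights)

# Stage 3: anyone holding the artifact can make predictions without train().
prediction = predict(json.loads(artifact), 4.0)
```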

jncfhnb | karma 4263 | avg karma 1.75 · 2023-09-30 11:49:43

The source code of the repo referred to by “open source” is the code of the repo.

You can ask all you want but that is irrelevant as to whether it is open source. If Photoshop were open source, the C++ code would need to be available. Not the tooling used to make the C++ code. The C++ code is equivalent to the model. Not the separate Python codebase that was involved in making it.

Which is some BSD PyTorch + PyTorch calling code that anyone competent in the field can implement any number of ways and is not special to this output.

camgunz | karma 5075 | avg karma 2.2 · 2023-10-02 04:02:11

> You can ask all you want but that is irrelevant as to whether it is open source.

There's a pretty good definition of open source at OSI [0], point 2 of which is (emphasis mine):

"The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed."

You can't bump the window on what "program" means. "Program" here doesn't mean "predictions", that's the output of the program. If you had a program that generated images, you wouldn't say that that program was the source code of the images. You would say that that program generates images and has source code.

This just isn't an open source release. It's freely released to the public, but it doesn't contain the source used to create or modify it.

> Which is some BSD PyTorch + PyTorch calling code that anyone competent in the field can implement any number of ways and is not special to this output.

It then seems trivial to release it.

[0]: https://opensource.org/osd/

jncfhnb | karma 4263 | avg karma 1.75 · 2023-10-02 17:07:45

> You can't bump the window on what "program" means. "Program" here doesn't mean "predictions", that's the output of the program. If you had a program that generated images, you wouldn't say that that program was the source code of the images. You would say that that program generates images and has source code

My dude… you have no idea what you’re talking about.

The pickled model is the preferred way to interact with and modify it. It is the source code. It is not like a compiled program. It is literally code.

I am NOT claiming the predictions are the program. I am saying the pickled model is. You 100% don’t need anything else to do anything more with the model.

I don’t know or care if they released their model generating code but nobody competent who understands what they are talking about cares about this.

It’s pickled because it’s big. Just imagine this as a zip containing algo.py
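A minimal sketch of the workflow being described here, assuming the released artifact is a pickled mapping of layer names to numbers. The `released` dictionary, its keys, and `predict` are hypothetical stand-ins, not the format of any actual model. It shows that the weights can be loaded, used, and edited without the training code; it does not settle whether that makes them the "preferred form" for modification.

```python
import io
import pickle

# Hypothetical stand-in for released weights: layer name -> list of floats.
released = {"linear.weight": [0.5, -1.0], "linear.bias": [0.25]}
buffer = io.BytesIO()
pickle.dump(released, buffer)

# A downstream user holds only the bytes, not the code that produced them.
weights = pickle.loads(buffer.getvalue())


def predict(weights, x):
    """Apply a single linear layer: dot(weight, x) + bias."""
    w = weights["linear.weight"]
    b = weights["linear.bias"][0]
    return sum(wi * xi for wi, xi in zip(w, x)) + b


before = predict(weights, [2.0, 1.0])

# Edit a parameter directly and predict again.
weights["linear.bias"][0] = 1.0
after = predict(weights, [2.0, 1.0])
```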
