Interesting to say it's scientific because it's falsifiable.
The objection is that it wasn't a theory, just fitting a function to data. It did "work" in that it captured some pattern: it was extremely good at generalizing/extrapolating/predicting, and it was a "model" of something in the data. But there was no operational model behind it of what was actually happening.
Newtonian mechanics has a model, beyond curve fitting.
BTW: I made this analogy between LLM and epicycles as a joke, but it's looking strangely isomorphic...
It's funny, because epicycles used to be the poster-child for ad hoc, overcomplex models with no conceptual basis... which is an exact match for LLM... but I don't recall the analogy being made. Not even in https://norvig.com/chomsky.html
That would be a valid criticism if epicycles were added to the model post-hoc to explain previous observations, and adding those epicycles to the model did not improve future predictions.
That seems a pretty devastating critique. The model is so confident it takes hardly any concrete data points that don't match to disprove it. (Note I'm speaking math in that sentence, not English.)
F = G * ((m1 * m2) / r^2), plus, yes, the concepts of distance and inertial mass.
G (and the bounds on the precision of our measurement of G) is derived by fitting from experiment. It's a regression model which happens to have R^2 extremely close to 1, and that's why we can treat it as (nearly-)always true.
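The "regression model with R^2 near 1" framing can be sketched concretely. This is a minimal illustration with synthetic data, not a real experimental procedure: the mass, distance, and noise values are made up for the example, and we "recover" a G we planted ourselves.

```python
import numpy as np

# Hypothetical lab-style measurements: for pairs of known masses at known
# separations, record the attractive force with ~1% measurement noise.
rng = np.random.default_rng(0)
G_true = 6.674e-11  # the value we pretend not to know

m1 = rng.uniform(1.0, 10.0, 50)   # kg
m2 = rng.uniform(1.0, 10.0, 50)   # kg
r = rng.uniform(0.05, 0.5, 50)    # m
x = m1 * m2 / r**2                # the single regressor, m1*m2/r^2
F = G_true * x * (1 + rng.normal(0, 0.01, 50))  # noisy "observed" forces

# Least-squares fit through the origin: F = G_hat * x.
# Fitting G is literally a one-parameter regression.
G_hat = (x @ F) / (x @ x)

# Goodness of fit of the law to the observations.
residuals = F - G_hat * x
r_squared = 1 - residuals.var() / F.var()
print(G_hat, r_squared)
```

The point being made above falls out directly: G and its error bars come from fitting, and the law earns its status as a "law" because this R^2 stays extremely close to 1 across experiments.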
It's absolutely a statement about observed behavior which we then _interpret_ (incompletely, but usefully, as Einstein showed) as an inverse square law. This is precisely a _model_ of behavior. That model can be derived from first principles or it can be purely phenomenological, and different models are useful for different intellectual tasks.
Once you have a statement which is nearly always true, you can ask _why_ it's nearly always true, and that's very useful, but "law" really _does_ just mean "statement to which we haven't found counterexamples yet".
But the model is just a model. Science is the process of building, interpreting and invalidating models, and different pieces of science live at different _points_ on this continuum. Large language models in linguistics live off to an extreme point on it, but even there, models have designed-in inductive biases (eg the attention mechanism in most LLMs) which reflect the modeller's hypotheses about the structure of the problem.
You seem, like Chomsky, to want science to be much more Platonic and profound than it actually is. That's your choice.
Most of the "predictions" you'd make using it, when discovered, were wrong. It is a deeply unpredictive formula, useless for predicting vast classes of problems (since we are largely ignorant about the masses involved, and cannot compute the dynamics beyond a few bodies anyway).
Science is explanatory, not "predictive" -- this is an antique mistake.
As for 'math models': insofar as these are science, they aren't math. They use mathematical notation as a paraphrase for English, and the English words refer to the world.
F = Gm1m2/r^2 is just a summary of "a force occurs in proportion to the product of two masses and inversely in proportion to the square of their distance"
note: force, mass, distance, etc. <- terms which describe reality and its properties; not mathematics.
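Whatever one makes of the "paraphrase for English" claim, the formula's terms do cash out against reality: plugging standard published values into F = Gm1m2/r^2 for a 1 kg test mass at the Earth's surface reproduces the familiar g ≈ 9.8 N/kg. A quick check:

```python
# Standard values (CODATA-style constants, mean Earth radius).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_earth = 5.972e24   # kg
R_earth = 6.371e6    # m

# Force on a 1 kg mass at the surface, in newtons (numerically equal to g).
F = G * M_earth * 1.0 / R_earth**2
print(F)  # ≈ 9.8
```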
All that's proven is that statistical models are a good fit for something we don't really understand. No different than using epicycles to solve the problem of planetary motion.
I've yet to read a comment about how well such models predict reality. Who is going to weigh in?
I studied economics, and I have some doubts about whether this kind of modelling works well enough to be useful. There are statistical issues, like whether you have enough data to bring in your error bars, and theoretical issues, like whether the thing you're studying shifted under you because it's being studied. Neither invalidates the model entirely, but both make it a heck of a hard task.
And in the end, what matters is whether it works or not. The Lagrangian for quantum physics is some huge formula that gets pushed as meme now and again but it works. Likewise f = ma works perfectly well for your high school rolling ball experiment.
The surprising thing is that the models generate predictions far beyond the domains they were designed for (and far beyond the original knowledge of the people making the models), and that the predictions are so mindbogglingly accurate that there seems to be Something Else going on.
See the Unreasonable Effectiveness of Mathematics link below.
I feel like that example assumes the conclusion. A better analogy would be "spinning objects theories". It's a category so broad you could always find something which works, but since it's not really thought of as a particular solution, it isn't so vulnerable to overfitting. Epicycles were bad largely because they gave a false impression of closing in on reality; every new round of additions consisted of smaller and smaller tweaks. Spinning object theories are even more broad and adaptable, but since they aren't basically one solution you don't have the same issue. When Kepler found a spinning object theory which worked, it wasn't overfitting.
The semantics for the mathematical model /are/ the theory of how it works.
M = mass, F = force etc.
That is, the model claims we are in a world of masses and forces, and so on. And this model can be disconfirmed.
You're confusing two issues here. Nowhere in the whole of science are models presented without an explanatory semantics. Almost nowhere in the whole of ML are they.
Here it is trivial to disconfirm even the author's interpretation of the NN's 'finding' of solutions, by noting it is often using edges that don't exist.
Whatever it is doing, we can immediately show it isn't traversing a graph. The author seems to have wilfully opted for the superstitious interpretation.
They’d probably argue that this same model works for that: you just update the manifold. That’s probably fair; the model isn't really powered to be predictive, just correlative.
It still embeds some of the same problems, though, mainly how an individual actually interacts with it and why their interpretation is correct.
For the purposes of this discussion, Norvig has defined a statistical model as:
"a mathematical model which is modified or trained by the input of data points."
He then illustrates how what would be considered a "scientific" model, Newton's law of gravitation, is a statistical model under his definition, but a simple one with few parameters. He contrasts this with a Markov model of a communication channel with a large vocabulary, which has many parameters. His argument, then, is that Chomsky dislikes statistical models with large numbers of parameters, as stated in the passage I quoted before.
My point was that Chomsky's concerns are, with reference to Norvig's argument, equivalent to concerns about model fitting, namely that to fit models with many more parameters than you have data, you require additional assumptions about your model structure. It is difficult (though not necessarily impossible) to learn about the system from your model, because in order to construct your model you have had to assume things about reality that you will not be verifying against observations.
In the opposite case, where you have many more data than parameters, you can fit your model with confidence, given only assumptions about your sampling (which you address by being a good experimentalist). This is what allows you to "learn about the underlying system" - you have a model that describes reality well by itself, without requiring additional assumptions about the nature of reality, so the structure of your model reflects something about the structure of reality, and you can explore your model as though you were exploring reality. Of course, sometimes it turns out the equivalence wasn't as good as we thought, but often it provides us with new directions of investigation.
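The contrast between the two regimes can be sketched with a toy experiment. This is an illustrative sketch, not anything from Norvig's article: the underlying "law" (y = 2x + 1), the noise level, and the polynomial degrees are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_and_test(n_points, degree):
    """Fit a degree-`degree` polynomial to `n_points` noisy samples of a
    simple underlying law (y = 2x + 1), then measure mean squared error
    against the true law on fresh inputs."""
    x_train = rng.uniform(-1, 1, n_points)
    y_train = 2 * x_train + 1 + rng.normal(0, 0.1, n_points)
    coeffs = np.polyfit(x_train, y_train, degree)

    x_test = rng.uniform(-1, 1, 200)
    y_test = 2 * x_test + 1
    return np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

# As many parameters as data: 10 points, degree-9 polynomial (10 params).
# The fit passes through every training point but tells us little.
overfit_err = fit_and_test(10, 9)

# Many more data than parameters: 100 points, degree-1 polynomial (2 params).
# The fitted structure reflects the structure of the underlying law.
wellfit_err = fit_and_test(100, 1)
print(overfit_err, wellfit_err)
```

In the second regime the fitted coefficients track the true slope and intercept closely, which is the sense in which exploring the model is like exploring reality; in the first, the model is mostly a record of the training noise.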
Hopefully that clarifies the equivalence between the two statements - I apologise for not making it more obvious earlier.
“I can explain how this model works” is not something you can claim about a model. You can only claim it about a tuple of (model, assumptions, data, context).
In context A, some simple linear regression might be very “explainable.” In context B, that same linear model might be totally not explainable (because the mechanism of coefficients based on the regression’s fitting procedure might be totally incompatible with other details of the situation.)
The analogy between Newtonian and quantum mechanics still holds.
The fact that you can more easily map Newtonian mechanics onto English words or pictures absolutely doesn’t make it more “explainable.”
By that logic, saying “a magic wizard did it” would be the most “explainable” model of all.
This is exactly my point. What is “complex” or “simple”? Arbitrary standards of natural language words? The field of explainable models has done no work on this. It starts from some totally arbitrary and confused idea about something both being an accurate model and an explainable model as if they are separate.
The closest academic topic to making “explainability” a serious subject would be the philosophy of language and connection to computability theory, like Kolmogorov complexity, PAC learning, VC dimension, Occam’s razor.
But these are algorithmic aspects of model complexity in the face of a specific data set, it’s absolutely not some hand wavy “oh but a person could ‘easily’ understand certain verbal acoustic vibrations about this” based on nothing.
What? It's not "coincidentally" correct, and yes, it absolutely is a correct model for phenomena at a certain scale (in fact, for the vast majority of phenomena in engineering disciplines for instance).