Recalls Gall's Law[1]: "A complex system that works is invariably found to have evolved from a simple system that worked."
Also, TFA invites a question: if handed a big ball of mud, is it riskier to start from scratch and go for something more triumphant, or try to evolve the mud gradually? I favor the former, but am quite often wrong.
[1] https://en.m.wikiquote.org/wiki/John_Gall
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. — C.A.R. Hoare, The 1980 ACM Turing Award Lecture
Yep, my first thought upon reading this was that no discussion of this subject is complete without a perusal of Gall's Systemantics (https://en.wikipedia.org/wiki/Systemantics).
> It is offered from the perspective of how not to design systems, based on system engineering failures. The primary precept of the treatise is that large complex systems are extremely difficult to design correctly despite best intentions, so care must be taken to design smaller, less-complex systems and to do so with incremental functionality based on close and continual touch with user needs and measures of effectiveness.
I am working on my first game using unity right now and I wholeheartedly agree. Almost all of my effective refactoring is turning interacting systems into standalone chunks that don’t care about the rest of the system
It’s very hard to do. I imagine my 4th game will go far smoother after I figure out what works and what doesn’t
> if handed a big ball of mud, is it riskier to start from scratch and go for something more triumphant, or try to evolve the mud gradually?
Reminiscent of Chesterton’s fence. But then, we end up in such a “complex” situation only when one thing can have multiple causes & effects — which is difficult to model correctly in a clean slate formulation.
The simplest solution seems to be to avoid making software that complex in the first place (we can exert far more control than in the physical world).
But then, if we take Peter Naur’s perspective of programming as a mode of theory building (of the domain), which is unsurprising given basic cybernetics principles such as the law of requisite variety and the good regulator theorem, the answer seems to be: unless your domain is really complex, think hard before you implement, and keep refactoring as your understanding improves (and try to pick problem formulations / frameworks / languages which make that feasible; easier said than done, of course). The key point is to keep refactoring “continuously” to match our understanding of the domain, rather than just “adding features”.
Aside: In my experience, software built on a good understanding of the domain will function well, untouched, for a long time — so long as it is suitably decoupled from the less-well-understood parts. The latter kind, though, generates constant churn, while also being an annoying fit. Really brings home the adage “A month in the laboratory can save a day in the library.”
> But then, we end up in such a “complex” situation only when one thing can have multiple causes & effects — which is difficult to model correctly in a clean slate formulation.
This is why you should keep paying the employees who have worked for the company for years, having written all of the mediocre code back when they still could not program well at all.
This lines up with a principle from the Toyota Production System (TPS) in manufacturing: reduce complexity.
In TPS, they found that a focus on reducing complexity leads to improvements in the metrics you'd want to measure: better quality, reduced costs, and customer satisfaction.
I wish this process were called 'factoring' and you had to be able to name the concept that was being isolated. Often 'refactoring' just means moving code around, or isolating code for its own sake. If a factor was properly isolated you shouldn't have to do that one again. Sometimes you choose different factors, but that's much less common.
> If a factor was properly isolated you shouldn't have to do that one again.
This assumes that later code changes don't undo/blur the factoring, which while ideal is not at all consistently the case in the real world.
Refactoring is a little too narrow of a name, because code hygiene is more than just isolating factors, but the “re” part is right because you are always aiming to remove infelicities that were actively added in previous coding.
"Factoring" is sometimes used in the Forth world, since code being factored into small words is of such eminence.
And it offers good lessons about what's worth factoring and how. Forth words that are just static answers and aliases are OK! They're lightweight, and the type signatures are informal anyway. "Doing Forth" means writing it to exactly the spec and not generalizing, so there's a kind of match between the expectations of the environment and those of its most devoted users.
On the other hand, in most modern environments the implied goal is to generalize and piling on function arguments to do so is the common weapon of choice, even when it's of questionable value.
Lately I've cottoned on to CUE as a configuration language, and its beauty lies in how generalization is achieved with a minimum of explicit branches and checks: you instead define the data specification around pattern matching and rely on a solver to find logical incoherencies.
I believe that is really the way forward for a lot of domains: Get away from defining the implementation as your starting point, define things instead in a system with provable qualities, and a lot of possibilities open up.
Big balls of mud result from a process that resembles reinforcement learning, in that modifications are made with a goal in mind and with testing to weed out changes that are not satisfactory, but without any correct, detailed theory about how the changes will achieve the goal without breaking anything.
In the situation I am thinking of, the tests that select successful modifications are, almost by definition, integration tests, because with a big ball of mud, you don't know what the proper specifications for the components are, and they don't have clear interfaces.
By 'tests' I am including live failures, which are also a feature of mudballs.
A distributed system is always much more difficult to test than a functionally-equivalent localized version. That's not, of course, a reason to give up on testing, but one must be realistic about how much faith one can put in it to make up for an inadequate use of abstraction and separation of concerns.
> ... still the only real textbook on cybernetics (and, one might add, system theory). It explains the basic principles with concrete examples, elementary mathematics and exercises for the reader. It does not require any mathematics beyond the basic high school level. Although simple, the book formulates principles at a high level of abstraction.
Not sure what cybernetics formally means, but apparently it has to do with complexity management:
> W. Ross Ashby is one of the founding fathers of both cybernetics and systems theory. He developed such fundamental ideas as the homeostat, the law of requisite variety, the principle of self-organization, and the principle of regulatory models. Many of these insights were already proposed in the 1940's and 1950's, long before the presently popular "complex adaptive systems" approach arrived at very similar conclusions. Whereas the concepts surrounding the complexity movement are often complicated and confused, Ashby's ideas are surprisingly clear and simple, yet deep and universal.
I find it really sad that cybernetics completely evaporated as a field with the closest remnant being cognitive science. I think there is a huge need for more interdisciplinary fields
A lot of it was incorporated or duplicated in feedback control theory, but mostly in the context of industry, so it didn't really feed back (heh, sorry) into other, more academic, areas. And, on the other hand, it spun off into (IMO) fluffy "second-order" cybernetics and became a kind of toy philosophy.
I find it sad too. PID controllers are great but from my POV they're barely the first step.
However, another way to look at it is, you can study and apply "Intro to Cyb" and leapfrog into the future.
> If you run an even-moderately-sophisticated web application and install client-side error reporting for Javascript errors, it’s a well-known phenomenon that you will receive a deluge of weird and incomprehensible errors from your application, many of which appear to you to be utterly nonsensical or impossible.
...
> These failures are, individually, mostly comprehensible! You can figure out which browser the report comes from, triage which extensions might be implicated, understand the interactions and identify the failure and a specific workaround. Much of the time.
> However, doing that work is, in most cases, just a colossal waste of effort; you’ll often see any individual error once or twice, and by the time you track it down and understand it, you’ll see three new ones from users in different weird predicaments. The ecosystem is just too heterogenous and fast-changing for deep understanding of individual issues to be worth it as a primary strategy.
I think this is useful even for systems (SW stacks) that are much smaller and "knowable": you start by observing, trying small things, observing more, trying different things, observing yet more, and slowly building a mental model of what is likely happening and where.
His defining criterion is whether you can permanently work around a bug (not knowing it, but knowing _of_ it) versus finding it, understanding it, and fixing it.
What a long-winded article on what has been known to scientists for decades as "emergence". Emergent properties are system-level properties that are not obvious or predictable from the properties of individual components. Observing one ant is unlikely to tell you that several of these creatures can build an anthill.
Yes, but to a lot of people that sounds like a lot of woo-woo. What this article does is explain it in a clear and persuasive way to the people in a particular field.
The fact that you didn't pick this up leads me to think you are more interested in being smart than helpful, but perhaps I am wrong about that.
Your comment was very puzzling to me, as I couldn't figure out what kind of misunderstanding about this article would prompt a comment such as this. But finally a possibility occurred to me: perhaps you think the point of this article was simply to say that there exist "systems that defy detailed understanding". It is possible that one could think that, if one went in with preconceived expectations based only on the title of the post. (But this is a very dangerous habit in general, as outside of personal blogs like this one, headlines in publications are almost never chosen by the author.)
But we all know such systems already: for instance, people! No, this post is a supplement/subsidiary to the previous one ("Computers can be understood" — BTW here's another recent blog post making the same point: https://jvns.ca/blog/debugging-attitude-matters/), carving out exceptions to the general rule, and illustrating concretely why these are exceptions (and what works instead). It is useful to the practitioner as a rule-of-thumb for having a narrow set of criteria for when to avoid aiming to understand fully (and alternative strategies for such cases). Otherwise, it's very easy to throw up one's hands and say "computers are magic; I can't possibly understand this".
(The point of the article here is obvious from even just the first or last paragraphs of the article IMO.)
I often wonder if things would be better if systems were less forgiving. I bet people would pay more attention if the browser stopped rendering on JavaScript errors or malformed HTML/CSS. This forgiveness seems to encourage a culture of sloppiness which tends to spread out. I have the displeasure of looking at quite a bit of PHP code. When I point out that they should fix the hundreds of warnings, the usual answer is “why? It works.” My answer usually is “are you sure? “.
On the other hand maybe this forgiveness allowed us to build complex systems.
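Coming back to the PHP warnings point: PHP itself can be made much less forgiving. A minimal sketch (assuming you control the application's bootstrap) that promotes every reportable warning and notice to an exception:

    <?php
    // Promote every reportable warning/notice to an exception, so code that
    // "works" with hundreds of warnings fails loudly instead of limping along.
    set_error_handler(function (int $severity, string $message, string $file, int $line) {
        // Respect error_reporting() and the @ suppression operator.
        if (!(error_reporting() & $severity)) {
            return false;
        }
        throw new ErrorException($message, 0, $severity, $file, $line);
    });

With that in place, "are you sure?" gets answered by the runtime rather than by an argument.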
This often devolves into extremely fragile systems instead. For instance, let's say you failed to load an image on your web site. Would you rather the web site still work with the image broken or just completely fail? What if that image is a tracking pixel? What if you failed to load some experimental module?
Being able to still do something useful in the face of something not going according to plan is essential to being reliable enough to trust.
Systems need to be robust against uncontrollable failures, like a cosmic ray destroying an image as it travels over the internet, because we can never prevent those.
But systems should quickly and reliably surface bugs, which are controllable failures.
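A rough sketch of that split in PHP (function names and the choice of exception classes are illustrative, not taken from any particular codebase):

    <?php
    // Uncontrollable failure: retry a transient operation a few times, then give up.
    function fetchWithRetry(callable $fetch, int $attempts = 3)
    {
        for ($i = 1; $i <= $attempts; $i++) {
            try {
                return $fetch();                 // e.g. a network read
            } catch (RuntimeException $e) {      // transient, environmental failure
                if ($i === $attempts) {
                    throw $e;                    // surface it once retries are exhausted
                }
                usleep(100000 * $i);             // crude backoff
            }
        }
    }

    // Controllable failure: a violated invariant is a bug, so don't retry or swallow it.
    function withdraw(int $balance, int $amount): int
    {
        if ($amount < 0 || $amount > $balance) {
            throw new LogicException("withdraw() called with impossible amount $amount");
        }
        return $balance - $amount;
    }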
A layer of suffering on top of that simple story is that it's not always clear what is and what is not a controllable failure. Is a logic error in a dependency of some infrastructure tooling somewhere in your stack controllable or not? Somebody somewhere could have avoided making that mistake, but it's not clear that you could.
An additional layer of suffering is that we have a habit of allowing this complexity to creep or flood into our work and telling ourselves that it's inevitable. The author writes:
> Once your system is spread across multiple nodes, we face the possibility of one node failing but not another, or the network itself dropping, reordering, and delaying messages between nodes. The vast majority of complexity in distributed systems arises from this simple possibility.
But somehow, the conclusion isn't "so we shouldn't spread the system across multiple nodes". Yo Martin, can we get the First Law of Distributed Object Design a bit louder for the people at the back?
> systems should quickly and reliably surface bugs, which are controllable failures
I was thinking, if the error exists between keyboard and chair, I want the strictest failure mode to both catch it and force me to do things right the first time.
But once the thing is up and running, I want it to be as resilient as possible. Resource corrupted? Try again. Still can't load it? At this point, in "release mode" we want a graceful fallback -- also to prevent eventual bit rot. But during development it should be a red flag of the highest order.
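In PHP terms that could look something like the following sketch (the APP_ENV check and the fallback behaviour are assumptions, not a prescription):

    <?php
    // Strict during development, graceful in release mode.
    $dev = (getenv('APP_ENV') ?: 'production') !== 'production';

    function loadResource(string $path, bool $dev): ?string
    {
        // Resource corrupted or missing? Try again a couple of times.
        for ($i = 0; $i < 3; $i++) {
            $data = @file_get_contents($path);
            if ($data !== false) {
                return $data;
            }
        }
        if ($dev) {
            // During development this is a red flag of the highest order.
            throw new RuntimeException("Failed to load resource: $path");
        }
        // Release mode: log it and let the caller fall back to a default.
        error_log("Resource $path failed to load; using fallback");
        return null;
    }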
That's an interesting distinction. I think each resource should be self-contained. Malformed HTML? HTML error. Malformed or missing image? Browser displays an image error.
The key here is that the web wasn't designed for engineers but for amateurs to slap something together sloppily in the first place.
As an aside it's curious how ridiculously forgiving HTML and JS are while CSS craps itself on a single missing semicolon. As though it were okay for the thing to be semantically and functionally malformed and malfunctioning... as long as it looks good!
I remember being very fond of xhtml. It seemed much more logical and sensible, every beginning having an end, all things in balance. I don’t really know what the argument against it is/was?
It is a valid use case of HTML to be used to splash a layer of paint on top of your business model in a situation where you’re too busy to think in a mode where the concept of “system correctness” makes any sense. It’s not a system in this context at all, it’s my flyer at a trade show, so don’t you dare come being all pedantic on me :)! (Sure, a CMS manages the markup usually nowadays. In the nineties HTML was used as the actual user-facing layer though.)
Of course, we can discuss whether that culture of busýness was ever actually constructive, but that’s a discussion for another day.
Indeed, less rigidity and higher tolerances lead to reliability, similar to what we do in the construction of buildings: a skyscraper would fall one day if it weren't for its flexibility under the effects of elements such as wind.
That's not an apt analogy. A system with tight tolerances can still be flexible, we just know more precisely how it can flex and when it will break.
A better analogy would be if your construction workers didn't have standard or prescribed bolts in their design, so they just took what was lying around and hammered and welded bits together until it seemed sturdy enough. Suffice it to say, this is not a recipe that would work to build today's skyscrapers. There is considerable design and sanity checking that goes into this stuff, which the web at every point completely lacked.
XHTML was a promising start in the right direction, but they unfortunately bungled it.
Interesting related trivia: engineers build safeguards around that flexibility. In the same way that a poorly built bridge will shake itself apart in the wind, a building without adaptive damping or the right flexibility properties could shake itself apart in the wind.
Personally, as a coder AND as a user, I want the program to flat out fail. As a user, a system that aborts on error may be a PITA to use, but I have confidence in the output it provides.
As a programmer, I like that same confidence in output AND it requires me to address the failures in some way...
It is true that the underlying technology used to write the code in the first place should be less forgiving. If you used a strictly typed, compiled language instead of PHP, you would have no choice but to fix a lot more of the errors, because it would not compile otherwise.
Once it is running on production though, things are quite different. You need the right combination of errors being well reported and gracefully handled, without aborting or breaking the rest of the functionality unnecessarily. At that point people are relying on it to get their jobs done, and they will usually find ways to work around the errors (and even the corrupt data this might result in) so they can keep meeting their deadlines while the programmers work on fixing the problem. This is much better than those same employees not being able to do their jobs or getting paid to stand around and do nothing.

I guess this attitude is largely driven by the practicalities of where I work. If the employees who rely on the code to work get behind or can't complete their work on time, our company is nailed with thousands of dollars in fines as per the contract agreements we have to accept in order to get the business/contracts to begin with, and then our customers can't bill their customers, so they are not happy.
Even a language like C# will let you get away with lots of horrendous things. Generally, unless you turn on options like "Treat Warnings as Errors", most programmers will just ignore the warnings, or wrap some statements in 'pragma' directives to disable them. I've seen people just wrap an exception handler around the entire application, or put in a giant exception filter, instead of actually fixing the problem.
Poor/Lazy developers will find ways around more stringent checks.
Are you sure? Your comment contains a minor syntax error.
Should you have been unable to submit it, or should people not be able to view it, until you correct it?
>My answer usually is “are you sure? “.
^
Line 1:
Syntax error: "“" not allowed here.
JavaScript is quite forgiving, but that's usually okay. If something doesn't work it's usually not the end of the world.
In this case everyone correctly read your second opening quotation mark as a closing quotation mark.
This allows us to focus on what you're saying (functionality.)
If we couldn't figure out why you included some typos, we would just ignore that part and focus on the rest of your comment.
When someone replies with the nitpicking style it doesn't help anyone. (In fact my first version of this comment was downvoted, before I wrote out the rest of my explanation.)
I think all the leniency in front end JS is pretty good for the same reason. It lets us communicate, and the sandboxed client environment (browser security is built assuming web sites could be malicious) means that the stakes are quite low.
Be conservative in what you do, be liberal in what you accept from others (often reworded as "Be conservative in what you send, be liberal in what you accept").
"I bet people would pay more attention if the browser stopped rendering on JavaScript errors or misformed HTML/CSS."
This was strongly suggested by those who fought for strict XHTML, but then Sam Ruby, who was leading the HTML5 effort, asked the question, "I find an image that I know my daughter will like. I send it to her. It is SVG. She wants to upload it to her Myspace page. However, the image won't render, because SVG is a form of XML, and Myspace is non-compliant. And yet, if I send her a JPEG or GIF image, she can upload that to Myspace."
The point was that we typically embed content from one page into another page, and no one believed there would ever come a day when every page on the Web would be strictly compliant. So HTML5 went in the other direction, dropping most requirements and allowing pretty much anything.
As I've written elsewhere, the fundamental problem we face is that a markup language, such as HTML, is completely unsuitable for the apps we now like to build and run over the Web. We rely on HTML to function as the GUI of TCP/IP, but it was not actually designed for that, as it was descended from SGML, and it carries with it a publishing history.

What would make more sense would be the use of a data format, such as JSON or EDN, which can then be given visual characteristics, without ever having to participate in one hierarchy or any one understanding of a DOM. Developers understandably complain that Java/Swing had 9 different layout options, the product of much experimentation, but having a variety of layout options does allow more flexibility in styles of building a GUI, with some approaches being simpler than what we get with the React/JS translation into HTML.
From a philosophical perspective, I would say this is an example of the inherent finitudes of human understanding. And I would add that such finitudes are deeply intertwined with many other basic finitudes of human existence.
I firmly believe that in theory all computer systems can be understood.
But I agree when he says it has become impractical to do so. I just don't like it personally; I got into computing because it was supposed to be the most explainable thing of all (until I worked with the cloud and it wasn't).
I highly doubt that the original engineers who designed the first microchips and wrote the first compilers, etc... relied on 'empirical' tests to understand their systems.
Yet he is absolutely correct that it can no longer be understood, and when I wonder why, I think the economic incentives of the industry might be one of the reasons.
For example, the fact that chasing crashes down the rabbit hole is "always a slow and inconsistent process" will make any managerial decision maker feel rather uneasy. This makes sense.
Imagine if the first microprocessors were made by incrementally and empirically throwing together different logic gates until they just sort of worked?
Even if you can reason about the code enough to come to a conclusion that seems like it must be true, that doesn't prove your conclusion is correct. When you figure something out about the code, whether through reason and research, or tinkering and logging/monitoring, you should embed that knowledge into the code, and use releases to production as a way to test whether you were right or not.
For example, in PHP I often find myself wondering if perhaps a class I am looking at might have subclasses that inherit from it. Since this is PHP and we have a certain amount of technical debt in the code, I cannot 100% rely on a tool to give me the answer. Instead I have to manually search through the code for subclasses and the like. If after such a search I am reasonably sure nothing is extending that class, I will change it to a "final" class in the code itself. Then I will rerun our tests and lints. If I am wrong, eventually an error or exception will be thrown, and this will be noticed. But if that doesn't happen, the next programmer who comes along and wonders if anything extends that class (probably me) will immediately find the answer in the code: the class is final. This drastically narrows down what can possibly happen, which makes it much easier to examine the code and refactor or make necessary changes.
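Concretely, the change amounts to something like this (the class name is made up for illustration):

    <?php
    // Before (nothing records the finding that no subclass exists):
    //     class InvoiceFormatter { /* ... */ }
    //
    // After a manual search turns up no "extends InvoiceFormatter" anywhere,
    // embed that knowledge directly in the code:
    final class InvoiceFormatter
    {
        // ...
    }
    // If the search missed a subclass, PHP raises a fatal error as soon as that
    // subclass is loaded, so the tests/lints (or production) surface the mistake.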
Another example: often you come across some legacy code that seems like it can no longer run (dead code). But you are not sure, so you leave the code in there for now. In harmony with this article, you might log or in some way monitor whether that path in the code ever gets executed. If, after trying out different scenarios to get it to run down that path, and after leaving the monitoring in place on production for a healthy amount of time, you come to the conclusion the code really is dead code, don't just add this to your mental model or some documentation; embed it in the code as an absolute fact by deleting the code. If this manifests as a bug, it will eventually be noticed and you can fix it then.
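The "monitor before you delete" step could be as small as this sketch (the function name and log message are hypothetical):

    <?php
    // Suspected dead code: log every hit so production traffic can confirm or
    // refute the theory before the branch is deleted outright.
    function legacyExportPath(array $order): void
    {
        error_log('suspected-dead-code: legacyExportPath() reached, order=' . ($order['id'] ?? '?'));

        // ... existing legacy logic stays untouched while we observe ...
    }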
By taking this approach you are slowly narrowing down what is possible and simplifying the code in a way that makes it an absolute fact, not just a theory or a model or a document. As you slowly remove this technical debt, you will naturally adopt rules like: all new classes must start out final, and only be changed to not be final when you need to actually extend them. Eventually you will be in a position to adopt new tools, frameworks, and languages that narrow down the possibilities even more, further embedding the mental model of what is possible directly into the code.
I suspect that systems that defy understanding demonstrate something that ought to be a corollary of the halting problem, i.e. just as you can't figure out for sure how long an arbitrary system will take to halt, or even figure out for sure whether or not it will, neither can you figure out how long it will take to figure out what's going on when an arbitrary system reaches an erroneous state, or even figure out for sure whether or not you can figure it out.
I’m not sure about this. Define your “erroneous” state as “halt”. Now the question becomes: for a system that halts, find out how it reached this state. The mathematical answer to this is simply the description of the Turing machine that produced this state. Whether you can understand this description or not isn’t relevant.