Indeed, one could even automate the process with some sort of tool that automatically copies the dependency's code, kinda like an install. I think someone has already written such a tool.
Copying someone else's code and putting it in your software repository is an excellent way to find yourself violating a wide variety of open source licenses.
This is actually one of the most common causes of GPL violations, and companies have gotten into serious trouble for it.
Either rewrite the code, or maintain it as a dependency and follow the licensing rules. Don't simply copy code from it into your project unless you have explicit reason to believe that that is OK to do.
I'm not sure if I agree - commonly-used packages will have known and documented faults, while validating something you (your organization) created can be challenging.
I recall using an unusual library a while ago where the author had embedded their own log4j-style implementation. I couldn't configure that the same way I configured other logging.
Hardly a problem today since I just write to `stderr` and have the container log aggregator push elsewhere but boy was it annoying.
Avoiding dependencies is a noble goal, and something to be valued, but this rule as stated is too simplistic.
The problem is that there are a great many things I can hack together in an afternoon to "replace" some kind of external dependency, but the quality of these hacks varies wildly. My understanding of what can or should be done in an afternoon might differ from my colleagues'.
Unfortunately, like all things in engineering, you have to carefully reason about the pros/cons, requirements, and costs. After that analysis, you can make a judgment on depend vs. build (also, buy vs. build).
Agreed. For libs that are "afternoon-y" in their scope (so, not an HTTP server or crypto), if you need to get off the fence you can use some cheap heuristics to assess the quality of a library without auditing its code. For instance, you can look at its popularity (in downloads or Github stars), its release/version history, its number of open issues, and its development activity. If I see high issue counts and many major releases with breaking changes, I'm going to avoid it. If I see 2+ years of stability with mostly minor releases, low issue counts, and high use rates, I figure it's going to probably be better than whatever half-baked solution I could scribble in an afternoon.
I wouldn't consider a high number of open issues a problem on its own. All big, popular projects with a history have a high number of open issues. There are some exceptions, which may be closing issues aggressively, but that is about the style of managing those issues, not about project health.
Over time an issue tracker inevitably becomes a collection of hard-to-reproduce bugs, incomplete patches, underspecified feature requests, random tracebacks, etc. Maintainers can choose to close everything that isn't immediately actionable, or make peace with such issues and let them live in the tracker. I personally like the style where an issue is closed only if it is fixed, contains no useful information, or is a duplicate.
A better indicator is activity and responsiveness of the maintainers in the issue tracker.
I don't really worry about something I could write in an afternoon.
I can look at the code, get a good grasp of it (hopefully), and judge the quality, the docs, and the prospects of getting updates/needing updates/being able to update it myself, pretty comfortably. In other words, the risk evaluation is incredibly straightforward.
Additionally, the risk itself is fairly low. If it goes out of date or stops working or just turns out to suck, the most I risked is an afternoon of work. Leftpad was a debacle due to its scale, but fixing Leftpad was pretty easy (I'm not recommending importing one-liners as dependencies, mind you)
-
But when it comes to stuff that isn't small, it's usually also the kind of stuff that holds the most insane amounts of risk for a project and is the hardest to evaluate.
Stuff like application frameworks, threading frameworks, massive networking libraries, etc.
The interface is _huge_. To the point that even when you try and wrap their complexity in nice packages with separation of concern and encapsulation they leak out into the rest of your code and end up being a nightmare to ever change.
Instead of spending an afternoon writing dependencies like this, spend that time investigating your "too-big-to-fail" dependencies. Try and keep a finger on their pulse, because they're the ones that will really come back to bite you if things go south.
> Additionally, the risk itself is fairly low. If it goes out of date or stops working or just turns out to suck, the most I risked is an afternoon of work.
Sometimes, the opportunity cost (time spent) is the largest term in the risk equation, but often there are other terms that might be orders of magnitude larger. For example, the risk of depending on the wrong abstraction, or becoming coupled to a hack.
What you're saying makes sense. My only point is that there's a lot more subtle judgment required in these decisions than often meets the eye.
A simple example would be an HTTP client. It’s easy to write a naive thing that makes GET requests with no request body, TLS, connection pooling, etc. Why should I use a dependency when I can write it in an afternoon? Well, I used to think that before I tried writing one :) The first draft was easy. Adding features got messy.
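To make that concrete, the easy first draft looks roughly like this - a sketch assuming Node's built-in `net` module, with no TLS, no redirects, no chunked encoding, no pooling:

    // naive HTTP GET over a raw socket - the "afternoon" version
    const net = require('net');

    function naiveGet(host, path, onDone) {
      const socket = net.connect(80, host, () => {
        socket.write(`GET ${path} HTTP/1.1\r\nHost: ${host}\r\nConnection: close\r\n\r\n`);
      });
      let response = '';
      socket.on('data', chunk => { response += chunk; });
      socket.on('end', () => onDone(response)); // headers + body, completely unparsed
    }

    naiveGet('example.com', '/', res => console.log(res));

Every feature named in that comment is a place where this sketch falls over, and that's where the mess starts.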
I had the opposite experience. All I needed was a way to do a simple GET. That's it (and that's all it still is, by the way). Instead of spending half an hour writing the code, I decided to use libcurl---that's what it's for, right?
Until I found it wasn't installed on some of our test machines (it was needed for testing, not for production, and for reasons beyond my pay grade, libcurl was not available on the machines). Then I thought, well, I could include libcurl in our vendor repo. It worked, but it was a nightmare to use. It took way too long to figure out the proper "configure" options to use for which systems, it nearly tripled the time to build it on the build servers, and even then, it was hit-or-miss.
After several years of this misery, I removed libcurl, and wrote what I should have years earlier. Using libcurl as a dependency did NOT save us any time.
> The problem lies in the fact that there are a great many things I can hack together in an afternoon to "replace" some kind of external dependency, but the quality discrepancy of these hacks is highly variant.
Perhaps it's a domain-specific thing, but when someone uses the words "hack together" I imagine it means using dependencies without really understanding what's going on in them, precisely to avoid figuring out how to code a solution properly.
Writing it yourself obviously needs to also imply doing it correctly, even if that means you must learn a bit about the right way to do it (a side benefit, though usually viewed as a downside).
This is horrible advice. There's a reason that you don't write your own hashtable implementations.
Yes, I can write a hashtable implementation in an afternoon, but it's going to have bugs that I'll spend the next year fixing, and still not achieve the performance of the pre-built version.
All that work of finding existing solutions and learning how to use them? That's part of the job.
Find a bug in the dependency? Submit a patch.
Worried about the dependency changing? Lock the version.
Too many external repos to retrieve those dependencies? Use a local cache.
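For scale, the "afternoon" hashtable is roughly this - a hypothetical sketch, string keys only, fixed bucket count, separate chaining:

    // works, but note everything it punts on: resizing, deletion,
    // hash quality, iteration order, load-factor tuning, non-string keys...
    function makeTable(buckets = 64) {
      const slots = Array.from({ length: buckets }, () => []);
      const idx = key => {
        let h = 0;
        for (let i = 0; i < key.length; i++) h = (h * 31 + key.charCodeAt(i)) | 0;
        return Math.abs(h) % buckets;
      };
      return {
        set(key, val) {
          const slot = slots[idx(key)];
          const entry = slot.find(e => e[0] === key);
          if (entry) entry[1] = val;
          else slot.push([key, val]);
        },
        get(key) {
          const entry = slots[idx(key)].find(e => e[0] === key);
          return entry && entry[1];
        }
      };
    }

Everything it punts on is exactly where the year of bug fixing comes from.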
> I can write a hashtable implementation in an afternoon, but it's going to have bugs
If it has any bugs that would surface in a year of production (while the dependency version wouldn't) then you didn't write an equivalent in an afternoon.
The advice, if it's to be useful at all, must be things that you could completely replace in the same quality, in an afternoon.
I'd extend "afternoon" to "half a week", but in general I agree with OP.
> Yes, I can write a hashtable implementation in an afternoon, but it's going to have bugs that I'll spend the next year fixing, and still not achieve the performance of the pre-built version.
The meaning of "afternoon work" should be taken as "of good enough quality": tests, structure, reasonable docs, all that. It shouldn't be the fastest thing you can type out; it should be normal code.
> and still not achieve the performance of the pre-built version.
Some losses in performance are acceptable in exchange for greater visibility and a better fit for the project. If you need non-trivial performance gains - well, those are also achieved by writing code; are you sure you can actually write such code in a few days?
> Find a bug in the dependency? Submit a patch.
That's the point. To submit a good patch, you have to internalize the system. That's easier to do if the system is yours and doesn't do much except what you need.
> Worried about the dependency changing? Lock the version.
Now you've locked yourself out of upstream bug fixes.
We do reinvent the wheel whenever we need to have an actual wheel for a device, not an abstract concept. Similarly, we write for loops, "reinventing" them for our specific purpose. Those are all different wheels, loops and needs. Don't mistake the "idea" of a hashtable with an implementation.
> Worried about the dependency changing? Lock the version.
And get p0wned a year later when some security researcher finds a vulnerability in code that you don't even use, but pulled in as part of that dependency.
> This is horrible advice. There's a reason that you don't write your own hashtable implementations.
Of course you do, and release(d) them as open source (public domain). Take Java: it has a decent HashMap, but it's node-based. It's memory-inefficient to the point that its nodes and arrays are top 3 in memory consumption. An array-based hashtable takes around 3.6 times less memory for larger tables (on 4-byte compressed pointers) and over 10 times less for smaller ones. Performance-wise it's on par or better as well (nowadays architecture is heavily driven by locality and access patterns)
Also you make your code so it can switch between both on the fly, if need be.
> Of course you do and release(d) them as open source (public domain).
How ironic though. Of course it did work a few times, but if the advice is to not use dependencies, then the better advice would be to not use dependencies that were written in an afternoon to avoid using some other dependency :)
Indeed! Although I spent more like a weekend on it (the initial release was 512 loc). It passed all standard jdk/jsr-166 Map tests[0] and then some more, incl. perf., memory consumption, and a garbage collection harness. Tests are also public domain.
Also, the release is not available as a dependency, so interested users would have to clone the repository on their own.
The part about afternoon deps would be that all their code can be read and cloned, if need be. Feel free to pick the few functions needed - I'd assume around 200-400 loc tops.
Agree-ish - but it makes me think: how about... don't keep a dependency you could replace with an afternoon of programming?
Factor, re-factor, and (most especially) DELETE should be tools in the toolbox -- but see if you need it/keep it (e.g. prototype it in, etc. first) before you re-write.
Lines of code is a passable metric for the quality of work if they are considered "spent", not "created". That is, the programmer's work is better, all else being equal, if it takes fewer lines of code. Who said so? Knuth, Wirth, Dijkstra, Perlis?..
A hundred afternoons later my small application is finally completed; now I must maintain, update, and document it all forever rather than relying on third-party components.
What I really wish is people would look closer to home; for example use some of the thousands of functions that ship with your operating system before downloading a package.
This reminds me of the Node left-pad module problem in a way. I think if something is so trivial to write, you should write it rather than using a dependency.
If it is non-trivial, I prefer the official standard libraries for a programming language. That is if a solution exists in the standard library.
I think the Go standard library with its batteries included mantra and the level of support it gets is good example of a library that should be used when a solution exists within it or by utilizing it.
If you copy the code into your project you at least need to keep track of the original authors and licensing or you're in violation of the copyright 99% of the time.
The single best way to avoid dependencies is to use a language with a large standard library.
Given the variance in standard library coverage, it’s rarely productive to argue about this topic in a language agnostic way. Using only stdlib in Go is very different from using only stdlib in JavaScript.
The best way to avoid dependencies is to use a language that is built in a way such that dependencies are worthless. Like APL, J, or kdb+/q. All of these languages are incredibly small, have almost non-existent standard libraries, and yet are designed in such a way that a large standard library becomes superfluous.
Having a large standard library speaks poorly of the composability and orthogonality of language primitives.
Because kdb+/q has so few datatypes and you can't define more, it's very easy:
q) .j.k"[2.2, 3.5]"
2.2 3.5
Serialization is also trivial:
q) .j.j 0 1 2 3
"[0,1,2,3]"
In general, sending stuff across memory boundaries (files, network, RAM, etc.) is exceedingly trivial in kdb+/q. To execute a function on a remote server, simply connect to the server and send the call across the handle. For example, to synchronously compute 1+1:
q) h:hopen `::6666
q) h(+;1;1)
2
You can send over anything you want, even the entire source code of program to be executed! This is a really flexible environment, where you can create really powerful app-engines. All members of a cluster can send code, data, and messages to any other node, async or sync.
Much like Common Lisp and SmallTalk, you can easily connect to production nodes and modify code while the service is running.
It's rare to find such a dynamic, flexible, interpreted language that also has world-class performance, often even beating hand-written C. Combined with an integrated database, you get a distributed system that can't be beat, at least performance-wise. And the craziest thing is that all of this fits in a 650 KB executable with libc as its only dependency. And all of it probably in fewer lines of code than a simple JavaScript webapp!
i'm so used to python's stdlib that whenever i go to javascript i get legit angry.
i really like writing typescript code, but holy shit, when you have to pull in libraries for even the smallest thing (lol left-pad), it gets super infuriating.
The JavaScript standard library was maybe bad in the past, but nowadays a lot of features have been added (yes, even left-pad).
Sure, if you know that you have to support old and broken browsers you have to use these dependencies to ensure correct support, but if you know that your code will run on a specific interpreter (for example a modern version of nodejs, or modern browsers) you don't have to worry about it too much.
Also, most JavaScript programmers tend to abuse dependencies, I mean even for things that are really 10 lines of code that you can write in 1 minute.
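Left-pad, for instance, has been in the language proper since ES2017 as `String.prototype.padStart`:

    '5'.padStart(3, '0');   // "005"
    'ab'.padStart(4);       // "  ab" - pads with spaces by default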
Is that really 5 minutes? (For when left-pad was relevant)
    var cache = [
      '',
      ' ',
      '  ',
      '   ',
      '    ',
      '     ',
      '      ',
      '       ',
      '        ',
      '         '
    ];

    function leftPad (str, len, ch) {
      // convert `str` to a `string`
      str = str + '';
      // `len` is the `pad`'s length now
      len = len - str.length;
      // doesn't need to pad
      if (len <= 0) return str;
      // `ch` defaults to `' '`
      if (!ch && ch !== 0) ch = ' ';
      // convert `ch` to a `string` cuz it could be a number
      ch = ch + '';
      // cache common use cases
      if (ch === ' ' && len < 10) return cache[len] + str;
      // `pad` starts with an empty string
      var pad = '';
      // loop
      while (true) {
        // add `ch` to `pad` if `len` is odd
        if (len & 1) pad += ch;
        // divide `len` by 2, ditch the remainder
        len >>= 1;
        // "double" the `ch` so this operation count grows logarithmically on `len`
        // each time `ch` is "doubled", the `len` would need to be "doubled" too
        // similar to finding a value in binary search tree, hence O(log(n))
        if (len) ch += ch;
        // `len` is 0, exit the loop
        else break;
      }
      // pad `str`!
      return pad + str;
    }
One of the points of the article is that when you write the code yourself for your purpose, you get smaller code. For example, I know I'll always pass it a string, so I don't need to type-check.
I started with the problem of "I need a function that pads a string with a character out to some length". Coding it up took under 1 minute, easily under 5 minutes.
    function leftPad (str, len, ch) {
      const neededPadding = len - str.length;
      if (neededPadding <= 0) {
        return str;
      }
      return ch.repeat(neededPadding) + str;
    }
It took me longer to write this comment.
The fact that the left-pad code has optimizations (which don't matter for the place I'm using it) and type-checks (which don't matter; my higher level unit tests would catch that mistake) is beside the point.
More generic is better. If I go into a codebase that uses standard libraries, then even if I don't know how to do something, someone on the Internet does. Your custom framework - not so much.
I don’t have to care about how the underlying libraries work. I can treat them as a black box.
> I don’t have to care about how the underlying libraries work. I can treat them as a black box.
When I wrote haskell, the majority of the libraries did just work, and I didn't have to dig into their code to find bugs often.
When I wrote javascript, hundreds of the libraries I used did not just work. I usually had to care very much about their details because they were poorly implemented, full of bugs and incorrect abstractions, and often abandoned soon after.
I agree that there's benefits in reusing some well-socialized and well-implemented generic frameworks and abstractions.
It's not worth using generic abstractions that are not well understood, buggy, and don't match your needs closely. In that case, write your own.
More generic is not always better. Above, I'm arguing that it's important for code to be easier to reason about. If a generic abstraction helps with that, cool, but it's not always going to be the case.
That’s also why I stay away from the clusterf%%% of front end development and JS if possible except for simple AWS Lambda scripts that have one dependency - AWS SDK.
Any other scripting I do with Python. Any more complicated development it’s using a language with an ecosystem with adults - C# or Go.
Almost everything I work with has bugs, so chances are I'm going to run into one. It's a lot easier for me to fix bugs when there are fewer layers and more of them are written by me. Of course, I can't write all the layers, but if they run on my service, I have to be prepared to fix them, or suffer from them being broken until a benevolent force fixes them for me. (Sometimes that happens, but usually not for the harder problems)
As long as it works and is well documented, yes. But the moment there’s a bug somewhere or the docs aren’t adequate for your needs, you are in much worse trouble.
Again my general rule - you (generic you) are no special snowflake. What are the chances that you come across a bug in a library that has had 2,176,677 downloads (in the case of Dapper), or in Entity Framework (supported by MS), that no one else has come across, found a workaround for, and posted the answer somewhere on the Internet - compared to your code, where you didn't think about a corner case?
I can count on one hand[0] the number of JS dependencies I've used in enterprise/large projects for extended periods of time where I've never needed to manually debug the library or read the source code. For enterprise software, sometimes the easiest solution is to directly hotpatch a vendored version of the dependency.
This is especially true with dependencies where bugs are fixed in major versions, but where upgrading and dealing with breaking changes would require significant code refactoring.
To drive the point home, I've been bitten by bugs in NPM itself.[1] Fixing that required reading through the source and manually swapping out one NPM's internal dependencies to a newer version.
And it doesn't matter if someone somewhere has had the same problem and posted it on the Internet unless I can find their answer online faster than I can fix the problem myself in my own library. Often this is not the case: filtering through issue trackers and trying to find the one blog post or comment that tells me how to solve the problem can be a big time sink.
[0]: Okay, maybe 2. But the point stands, it's not a rare or exceptional occurrence.
I've had some serious train wrecks because of shit NPM libraries over the years.
Probably the worst was with a decently popular library someone had brought in, which tried to do a refactor from callbacks to async/await without understanding at all how async/await worked. They'd leaked an async operation in the library code, so 'await'ing a specific function call in their API that returned a promise didn't actually await everything the call was doing, which ended up a debugging nightmare. Of course their perfectly manicured suite of 8000 tests with 110% test coverage didn't catch it either, because the number of people who can write good quality tests is shockingly low, and library-writers aren't somehow magically ahead of the pack in that regard.
JS really feels like PHP did back when I was a newbie learning that shit. In other ecosystems, the 95th percentile devs seem to write all the libraries, so everyone comes here and posts repeatedly about how great dependencies are. In JS, it's the average dev writing all the libraries, and the average dev's code is enough to make my brain bleed.
I'm a big proponent of different advice for different ecosystems. If you're doing front-end JS, the pendulum has swung so far to one side that 'NIH syndrome' is treated like it's going to lead to the fourth reich, which makes 'chill out a bit on dependencies' pretty good advice if you're looking to get a leg up in the industry. But I'm sure there's other ecosystems where the same advice will just leave you with a tangled mess while your competitors leapfrog you in productivity with a good 3rd party dependency.
I'd say take any advice in threads like these with a grain of salt unless it's given in a bit of a narrower context. Taking some one liner about software engineering in general and applying it to your specific project is probably just a coin flip as to whether it's going to improve your code or not.
This is actually a great example of why you SHOULDN'T use left pad.
I've not profiled it, but I'm going to guess that nowadays this will be faster than the current implementation on npm.
    function leftPad (str, len, ch) {
      str += '';
      len = len - str.length;
      if (len <= 0) return str;
      if (!ch && ch !== 0) ch = ' ';
      ch += '';
      return ch.repeat(len) + str;
    }
Why? because the VM is (very likely) going to do exactly what the cache would have done. It can replace `ch.repeat(len) + str;` with a presized string allocation and a memcpy of ch + str characters.
> This code checks for ch !== 0, yet the Number type in JS has no repeat method.
> Static typing would actually fix this kind of coding error.
That's not a coding error, that's addressing an oddity of JS coercion rules that a less experienced developer could easily have missed.
> if (!ch && ch !== 0) ch = ' '
That code says that if `ch` is falsey and not equal to 0, then set it to a space. The only arguable falsey value that should be excluded here is a literal `false`, but that's not a single character and is fairly ambiguous either way. I'd certainly fall on the side that a literal false should not be converted to `'false'` here.
> ch += '';
The next line converts it to a string by concatenating the empty string.
> return ch.repeat(len) + str;
So by the time it gets to this line we know ch is a string.
Static typing is great, but the bug you claim is there is not actually there.
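You can see the coercion at work with the version above:

    leftPad(7, 3, 0);      // "007" - the number 0 is deliberately let through by
                           // the `ch !== 0` guard, then `ch += ''` makes it "0"
    leftPad('ab', 4);      // "  ab" - missing `ch` defaults to ' '
    leftPad('abcdef', 3);  // "abcdef" - already long enough, returned as-is

By the time `.repeat` is called, `ch` is always a string.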
What magic would enable that? I don't think the VM is likely to cache a string of 6 spaces, it has no way of knowing that would be a common parameter, and no heuristic to determine that it's likely.
It may specialize or inline, but that's a separate matter.
JIT is the magic sauce and this a pretty regular optimization.
Step 1, inline repeat.
Step 2, remove the intermediate array allocation
Step 3, allocate a string array sized for the pad + str
Step 4, Use one of the many CPU instructions to repeatably copy the padding character and then the `str` into the same array of characters.
None of these optimizations would be out of the question for the JIT (and I'd expect them). You don't need the cache at all; it's just a waste. The only thing it saves is creating the intermediate string, which is HIGHLY likely to be optimized away with the simple code.
Yes, particularly when you know the types that will be passed in. True that typechecking in JS can be a little time-consuming, so it might take 10-15 minutes for me to write a general-purpose left pad.
Also I seem to remember that someone benchmarked the cached version and found it to be slower than the naive approach anyway. I could be mistaken there.
This is so hilariously overengineered: not failing when passed the wrong types, arbitrarily caching padding of len < 10, etc
The most egregious is the bad big-O analysis for a pointless "binary search". The loop does indeed run O(log(n)) times, but `ch += ch` still takes O(ch.length), which doubles each iteration; the copies sum to 1 + 2 + 4 + ... + n/2 ≈ n characters. It ends up being a complicated way of still taking O(n) time while creating a lot of intermediate strings.
It isn't any faster than just creating the padding with a loop or `new Array(len).fill(ch).join('')` or `ch.repeat(len)`
It's not overengineered if thousands of downstream projects are relying on it, some of which might see significant benefits from those performance optimizations.
Too bad that's a flawed benchmarking methodology. JITs are notoriously hard to correctly profile, and the benchmark lib isn't even sort of doing the right thing.
For example, it's missing warmup. The results aren't being consumed, so nothing stops the JIT from optimizing the benchmarked code away. The framework itself imposes a pretty large amount of overhead (more so than I'd expect from leftpad).
It is somewhat likely that what they are measuring isn't leftpad performance, but rather how fast the JIT ends up optimizing the benchmark code.
Yeah benchmarks can be much more difficult to get right than it might seem at first glance. Good thing they didn't try to write their own benchmarking code, otherwise they might have fallen into those traps you just mentioned.
Luckily, they didn't, and instead pulled in the `benchmark` library as a development dependency. The author of said library works on V8, and already considered all those problems and much, much more[1].
There's no portion of the code that does warmups. There's no portion of code that "blackholes" the results to keep the JIT from optimizing away the code under benchmark. There is a lot of code though... so that's... good?
You make the assumption that just because a lib is popular or widely used it is "correct" or "the best". When it comes to microbenchmarks, that's usually flawed. Very VERY few people actually get them right, benchmark.js is no exception.
That, of course, doesn't mean that benchmark.js can't be useful. For macrobenchmarks it will be roughly right. However, for something as small as leftpad, it's almost certainly not the right way to measure performance.
Okay, you win. I'm not going to read the entire source of that package just to make a point. Though I do find it strange that the author of that library would write an entire blog post on the topic and then not take his own advice in the implementation of the library he wrote.
Tell me, where in that blog post does he mention doing warm up cycles or avoiding having the JIT optimize away the method? (Hint, he doesn't mention that... so, no, he didn't actually miss his own advice.)
The article is completely consumed with getting the timing of benchmarking right. Which, to be fair, is a place where microbenchmarks often go wrong. It, however, isn't the ONLY place they go wrong.
> There's no portion of the code that does warmups.
This isn't true - Benchmark.js will repeatedly rerun benchmarks until it gathers statistically meaningful results.
> There's no portion of code that "blackholes" the results to keep the JIT from optimizing away the code under benchmark.
True, and there's actually nothing benchmark.js can do to ensure that doesn't happen in the general case but when this does happen the results are usually pretty obvious - we'd see billions of ops/sec. Incidentally the left-pad benchmarks do not suffer from this issue.
> Luckily, they didn't, and instead pulled in the the `benchmark` library as a development dependency. The author of said library works on V8, and already considered all those problems and much, much, more[1].
This is the "benchmark lib" that was mentioned in the very first sentence of the comment you replied to.
That would depend on the underlying implementation of the string object, no? (If the string is implemented as a linked list it's O(n))
Anyway, it makes no practical difference in this case, since the one they labeled "O(n)" is the naive implementation that most people would write if they implemented left-pad themselves.
I highly doubt js engines would compile a string down to a linked list. But you're right they might compile it to a circular buffer or deque which can have O(1) prepends.
Well, my point was more that if a programmer thinks '5 minutes of work', it's often 10+ minutes; so when a programmer thinks 'an afternoon', you can possibly lose a week. And then the article really doesn't work.
And yeah, I would and do write leftpad myself if it's not in the stdlib. But if there is a large library full of similar (string) functions that I might need, I would include that library. Not a singular dependency for this type of function.
The issue I have with this is a lack of specification. Left pad _what_?
Numbers or ASCII-only printing? OK, that's reasonable. Is there a desired overflow behavior?
Past that it becomes more an issue of where and why. The suddenly not-trivial example includes questions about fonts, layout, and multi-byte characters. Emoji, etc.
Incidentally, in pseudocode:
Create a valid full-space pad string (termination / etc.), then walk back from the end of the source string, overwriting the pad characters from the end toward the start, exiting either on running out of pad characters or running out of input.
A second algorithm might combine those two steps into one pass, filling the output buffer from back to front. Only for C-style strings would this be an issue, given the dynamic end point of the data structure.
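In JS terms, the first algorithm looks roughly like this - a rough sketch, with an array standing in for a C buffer (note it truncates long input rather than returning it unchanged, per the exit condition above):

    function leftPadBuf(str, len, ch) {
      const out = new Array(len).fill(ch);   // step 1: a full pad
      // step 2: overwrite from the back with the source characters
      for (let i = str.length - 1, j = len - 1; i >= 0 && j >= 0; i--, j--) {
        out[j] = str[i];
      }
      return out.join('');
    }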
That was my first thought: I've seen these projects before — they're where you find 5 slightly different implementations of similar logic, no logging or tests, failures as soon as someone uses Unicode, etc. and I get an order of magnitude performance improvement by replacing that code with an external module which has had the other 19 afternoons' worth of work it actually takes.
> and I get an order of magnitude performance improvement
Have you heard the adage about premature optimization being the root of all evil? Yes, even with the second part. What is the premature optimization here, in your opinion?
In most cases developers are creating something new - that's the state of the industry now; not too good, but it's how it is. If you were refactoring existing code - sure, find the problem, design the solution, have reasons for going from A to B. If, however, you're writing new functionality, you don't know whether you'll have problems of this kind with this code - so optimize for developability. You can remove those excessive crutches later - if and when you need to. In my experience, having them trumps looking into code and spending time figuring out what it does mere months later - your own code, that is.
The point was that when something is large enough to be “an afternoon”, it's probably more work than you're expecting and you haven't yet discovered important details. If there's something which does what you need, it's far more likely that _other people_ have invested time sanding off the rough edges which you have yet to discover.
If it's not hard to use that library you're probably better off unless it's a problem you understand very well and will see a real advantage to tackling differently. For example, if you use a library and don't like it that experience will still be useful for having clarified what exactly it is that you want to do and the rough size of what you're taking on.
Well, what if the programmer has been burned too much by incorrect scoping before?
Don't buy generalized statements like "programmers always underestimate the effort needed" or even, for that matter, "work expands to fill all the time available" (Parkinson's law). There are exceptions to them :) which, in a good team, can sometimes hold better than the laws themselves.
"This'll take an afternoon" - three weeks later......
Programmers are notorious for this.
BUT even apart from this problem ... you absolutely should use every dependency you can that will save you time.
Try to write less code, not more. When you write code you write bugs, add complexity, add scope, increase the need for testing, increase the cognitive load required to comprehend the software, and introduce the need for documentation... there's a vast array of reasons to use existing code even if you truly could estimate it and build it in an afternoon.
You also assume that you understand all the edge cases and fickle aspects of the dependency, all the weird ins and outs that the dependency author probably spent much resources understanding, fixing and bug hunting.
There's a hard fact that proves the above poster wrong: how many dependencies took only an afternoon of time in total to write? Hard to say (maybe look at the GitHub commit history), but I'd guess almost none. It didn't take the dependency author an afternoon, so why would it take you an afternoon?
Even worse .... you just lost an afternoon coding features for your core application.
Multiply this by every dependency that "you could build in an afternoon" and you'll be in Duke Nukem Forever territory.
I'd advise doing the opposite of this articles suggestion.
Find a dependency that will save you an afternoon? Grab it.
Somehow here we assume a good programmer routinely makes errors in time estimation by an order of magnitude, yet conveniently forget cases when, say, a non-trivial GPL library is embedded as a dependency in a project, and the customer is asking for the code, and the legal team is running around with their hair on fire because the company didn't plan to release the code...
But that is a completely different topic. Licensing is an issue, yes, but that is part of the upfront decision process.
In this day and age, this many years into open source licensing, if your team is not on top of that from day one, they have failed as a team.
I worked at Bell Labs from the mid 1980s to 2000, and by early 1990s (1992? 1993?) they already had a full internal team dedicated to open source licensing issues, including training and consulting. That was 27 or 28 years ago. Before some of the developers on this thread were even born.
Exactly. I'm not reinventing the wheel. I may write some convenience wrapping around Spring Security, for instance, but why would I rewrite auth-z when it's a solved problem?
> you absolutely should use every dependency you can that will save you time
> Find a dependency that will save you an afternoon? Grab it.
Agree. The point of the article, though, is that dependencies are often saving much less time than they promise - so much less that it's better to avoid them.
And when you run into a bug or design problem in a dependency of a dependency of a dependency?
It often takes less time to write some code than to understand someone else's code.
Most programmers I've worked with get lost easily when jumping through layers of other people's code. I certainly do.
Solid, well tested dependencies that solve hard problems are worthwhile. But dependencies have a cost in debuggability and maintenance, so it's worth using them with care. And often, they aren't worth the time, when compared to writing a dozen lines of code.
I get easily lost jumping through layers of code written by corporate developers in an afternoon. I generally don't have problems jumping through layers of popular, well documented and single-purpose third-party libraries.
I agree, with one caveat: if you think it'll just take an afternoon, then for the sake of this article's argument it had better actually take an afternoon!
But conceding that charitable assumption to the article, I agree with its basic premise: dependencies cost a lot of time in diffuse, non-codey ways.
There are AAA dependencies you pull into every project, but most other dependencies require a good degree of due diligence, evaluation, risk, and their own long-term maintenance.
It's not that it always tips the scales all the way to 'roll your own', but I think the cost of new dependencies is underrated.
But that analysis is part of the design process on the front end. You don't just 'take' libraries or utilities without evaluating them. And you don't just write bespoke libraries without thinking about the APIs.
So do your upfront work, by all means. It isn't an all or nothing decision.
- Dependencies break over time. They have a nonzero maintenance cost.
- They impose API boundaries on you that may not fit your existing data structures
- It's harder to change underlying bugs
- They might introduce security issues
Sure, use dependencies. But there's a reasonable position between "never write any code" and "never take on dependencies" - and NPM is one of the few ecosystems sitting at one of the extremes.
I could say all of the same things about the in house tools that the “architect” wrote three jobs ago - including the bespoke ORM, object mapper, and logging framework.
Or two jobs ago where two developers who had worked at the company for 10 and 15 years respectively were maintaining a bespoke 15 year old EHR written in PowerBuilder and depended on SQL Server 2003 - in 2016.
Every company thinks they are their own special snowflake where cross cutting concerns can’t be handled by a third party.
Funny you should bring up these as examples. I've made (1) and (3) using an existing object mapper. They were made out of limitations with JPA/Hibernate and for higher-level functionality in logging. The ORM was never sent to prod. The logging events were gold. Specific events were 'major' and filtering on a user ID in a narrow timespan could show the expected/unexpected events for the traces as a sequence diagram through all the microservice layers. Clicking on an event then searched Loggly for all logs for trace-id at time. It got to the point that non-techs were answering customer issues with it and we hardly had to check/wait for Loggly to answer.
This is pretty much spot on. Except you also missed the main cost, which is the insane amount of time it takes to learn the 85 dependencies on your project to an extent that you actually understand what your code is doing.
Every single project I go into seems to have a smorgasbord of dependencies, and when I take the time to investigate one of them I find out it's being used incorrectly by at least 50% of the team, because they don't even understand how it works at the most basic level. Which is pretty understandable, because by the time anyone gets through understanding 10 of the 85, they've probably been kicked off the team for not actually building anything.
People love to say rubbish like "write less code!", as if LoC is the only metric that matters (weren't we past that thought process by the 90s?). Which goes a long way to explaining all the fucking terrible codebases I have to work with where it's impossible to accomplish anything without reading documentation for 8 hours, when it would take 20 minutes to just read even a semi-readable piece of code that implements whatever requirements you need from the dependency.
On a C code project for a large Fortune 100 company a half dozen years ago, I encountered a pesky header include that made no sense. And that header was part of a patch that I really did not want to pick up, so I started digging into it.
Turns out that they had some constant in the code, and the developer just did a grep for that value in the source tree, and that constant already existed in an existing header file, so they just included it.
And that CONSTANT_VALUE_STRING had nothing to do with the technology that the C source was addressing. So some lazy slacker pulled in a random header file that contained the proper constant value for an unrelated technology.
The dependency on that was pure lunacy on so many levels.
And that was an internal dependency, not an external library.
So the lesson here? Not all dangerous dependencies are external.
Everyone who's used NPM in production for a not-insignificant amount of time has realized just how bad nodejs dependency hell can be. Unfortunately, webdev-du-jour has decided pulling in a hundred npm packages is better than writing a few hundred lines of code.
I keep hoping things like [1] are a joke but I'm starting to suspect they're not.
I'm sorry that my framework and bundler are using so many packages. Lemme just quickly install Android Studio and download a few gigabytes to develop and build my application. Ah yikes, I'm on a different version, need to redownload now.
At least Android Studio doesn't break when you try to deploy it a few months down the line (with package lock), with the exact same version, because a dependency of a dependency of a dependency made an unreviewed and untested "security fix" that caused a regression.
> "This'll take an afternoon" - three weeks later......
> Programmers are notorious for this.
From my experience with these personal failings, the problem usually comes from the question being phrased in a context like, "before you begin working on this, how long do you think this will take you to complete?". If there's no opportunity to scope, which requires not-insignificant work toward the solution, the estimates will always be wrong. If I understand the actual scope of the problem, which means having the architecture mostly worked out, and have a bit of experience (and luck), my estimates can be pretty close, usually eaten up by that oh-so-seductive feature creep that ruins my work-life balance.
I recently read through the (free online) 'book' on Basecamp's 'shape up' methodology; I thought the 'hill chart' describes this really well - the work needs to be in progress going up the hill discovering what it's all about, before you get to the top and can accurately assess how much 'real work' (!) there is to do, and then it's all downhill from there.
> you absolutely should use every dependency you can that will save you time.
Absolutely. As long as it does save you that time over the foreseeable lifetime of the project. Or you are deliberately incurring a technical debt because of some deadline.
On the other hand, saving an afternoon (or even a week), over the next two weeks means very little.
Essentially, it'll take you an afternoon to write and then weeks of work properly fixing the bugs and handling the edge cases. Potentially and probably, while you're trying to do something else.
We all too often forget the scope: requirements, developing, testing, to say the least.
My favorite example is NPM. While the author has a point, I tend to rely on the wisdom of the crowd. Sometimes there is a reason why a couple of million developers - in the case of NPM packages - seem to be lazy.
In my experience, we ended up copy/pasting and modifying some code and syncing it with the "superfluous" package. Good intentions, badly executed.
Leftpad was the right itch at the right time and people found better ways to deal with NPM. NPM got better after that, as well as native implementations.
This goes both ways. When was the last time someone properly scoped the maintenance effort of an external library? This goes double for external systems, like kafka or mysql. I've never seen anyone so far even get within two orders of magnitude of the real cost of operating kafka, much less an organization that accurately compared that to the cost of DIY.
The "don't reinvent the wheel" argument often acts as though using a 3rd party lib is "free", and building it yourself is costly with no benefit.
This is sometimes true, but often not. From SFTP libraries to SVG rendering libraries, there have probably been about 3-5 major dependencies of my company's project that I have had to learn and extend or fix bugs in to make them work just in the last year.
And sometimes this means using our own fork that we have to keep maintained.
I'm not saying I would have rather written these particular dependencies from scratch, but they were definitely not cost free. Nor are they all of better quality than what I would have produced had I written them from scratch.
That's the other common refrain - to "defer to the expertise of the crowd".
Don't get me wrong, many 3rd party libraries are of great quality by amazing men and women who I am very thankful for. But certainly not all of them.
There's no magic that says "every third party library is made by an expert with the highest standards".
It really depends (haha) on what it is. I needed to copy a file in npm scripts. Can't use `cp` because that fails on Windows. I looked on npm for a package to copy a file. First hit: 197 dependencies, 1170 files, 47,000 lines of JavaScript.
Taking 197 dependencies means 197 things that need updates several times a year at a minimum. Any of those updates could break my code, introduce a bug, add a vulnerability on top of the ones already in the packages. So it's not like adding more dependencies is magically free.
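For comparison, the zero-dependency version is a handful of lines (assuming Node 8.5+ for `fs.copyFileSync`):

    // copy-file.js - cross-platform file copy for npm scripts
    // usage: node copy-file.js <src> <dest>
    const fs = require('fs');
    const [src, dest] = process.argv.slice(2);
    fs.copyFileSync(src, dest);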
I’ve been working on the same medium-size (fewer than 1M LoC) codebase for about 7 years now. I feel like over the years, my estimates of how long something will take have gotten better for one reason: I’ve found the scaling factor I have to apply to my intuitive estimate that brings into the realm of reason.
So, if I think something looks like about a day’s work, I’ll actually estimate it at about 3.5 or 4 days. Thus, for a project to qualify as “just an afternoon,” I’d have to naively estimate it at under an hour.
I rarely have time to spare, but I also rarely go over by more than maybe a third.
Your multiplier may vary depending on how horrifying your codebase is. On a side project with good test coverage, my multiplier is only about 2.
I do this all the time. My head tells me "five lines, tops" -- corresponding to about 10 minutes of "programming." Add in testing, bugs, another 10-20 lines of comments and docs, we're looking at an afternoon.
Never do I give that raw 10-minute estimate to anybody, because it can be wrong by a factor of 10.
Ironically, these days with front end development I'm finding it hard to accurately scope how long it will take to incorporate 3rd-party dependencies. The docs make it seem straightforward enough, but they don't cover how to use it correctly under TypeScript instead of ES, or how to use it with Angular instead of React, or how to build it with Rollup instead of webpack, and I often spend an entire day googling obscure blog posts on how to get a dependency working in my own ecosystem.
(1) Most programs don't have that many direct dependencies, especially smaller dependencies. It's often dependencies of dependencies. In npm, adding just `jest` will make your `node_modules` directory explode.
(2) if Linus's Law is "given enough eyeballs, all bugs are shallow", then having fewer eyes on a library means more bugs.
(3) Oftentimes you think you can replace a dependency with an afternoon of programming, but it turns out, it's not quite as simple as you think.
(4) Sometimes you only use a small piece of a library to start with, but over time use more and more. If it's your own code, then you're going to be continuously refactoring, updating, rewriting it. If it's a library, then you can just start using the additional pieces as you need.
Even if this is true, the problem is you don't know what's going to take an afternoon of programming. How many times have we all said "should only take an hour" just for it to take four days?
If someone spent the time and effort building something you need, I don't see a problem with using it. It all depends on what kind of system you're building, what the security and stability guarantees are, and in general what trade-offs you want to make.
I knew which video it would be before clicking... But to your point they're often not really "store-bought" bricks though. More like bricks someone was giving away on the side of the road, you're free to use them to build your house but with no guarantees they work and the instructions are missing or incomplete. Oh and they're the wrong shaped brick but you only figure that out later.
Ugh please just don't use popups at all. I block every single one I see with custom CSS rules. GDPR cookie warnings, "please subscribe", stupid "can i help you" chat popups, everything.
When coding, be prepared to put it in the garbage bin! If you're prepared for this, you can code things more quickly, and not worry about the code slowing you down later. This works when you're unsure exactly what to build and need to iterate (agile). First build is an iteration, not arbitrary "sprints" or "increments"!
The cost of rebuilding MUST be budgeted though. If you don't have this freedom, things are bound to suck one way or another. Then, next best thing, build it as simple as you can, and put effort into making it composable and pluggable. So you retain freedom to swap out components. This is also an investment, and takes some more time and effort.
If you can't even have that, the results are bounded by those restrictions.
My rule is, don't use a dependency to implement your core business. Is JSON parsing our core business? No, so why would we ever write -- and thereby commit to supporting for its entire lifetime -- JSON parsing code? All the code you write and support should be directly tied to what you as a business decide are your fundamental value propositions. Everything else you write is just fat waiting to be cut by someone who knows how to write a business case.
To be clear, this is about the lifetime support of code. It's very, very rare that code can be written once and never touched. But that long tail of support eats up time and money, and is almost always discounted in these conversations. I don't even care that Jackson JSON parsing has years of work behind it, when I can hack together a JSON parser in a day. I care that Jackson will continue to improve their offering without any further input, while that's not true of my version.
Well, one special edge case would be where you only need to parse some extremely tiny subset of JSON (for example: you only need to parse dictionaries whose keys and values are positive integers, like {1:2,3:4}). Then, depending on how expensive the full JSON parser is, it might be worth your while just writing the limited parser yourself.
Of course, you might say, inevitably feature-creep will expand the list of things your parser needs to parse, but that's not a law of physics. Sometimes in certain limited, well-defined projects, it really is true that YAGNI.
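A minimal sketch of such a limited parser - hypothetical, handling exactly the {1:2,3:4} shape above, no nesting, no strings:

    function parseIntDict(s) {
      const m = s.trim().match(/^\{(.*)\}$/);
      if (!m) throw new Error('expected {...}');
      const out = {};
      if (m[1].trim() === '') return out;
      for (const pair of m[1].split(',')) {
        const [k, v] = pair.split(':').map(t => parseInt(t.trim(), 10));
        if (Number.isNaN(k) || Number.isNaN(v)) throw new Error('expected int:int');
        out[k] = v;
      }
      return out;
    }

    parseIntDict('{1:2, 3:4}');   // { '1': 2, '3': 4 }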
No, I won't beat them. But if it's a limited subset that I can implement with twenty lines of straightforward code, that will often be cheaper.
I've been on projects where they imported xml-parsers many times bigger than the rest of the whole codebase just to send a well formatted order number.
With all the technical debt associated with that - which is the problem. Basing your project on a dependency that allows you to easily scale and add features is a huge benefit.
This is like saying you should roll your own crypto because you only need to do a very limited sub set of crypto operations so why use something like NaCl or Tink.
Encryption is a terrible edge case. If you are forced to half-ass encryption, you should seriously question the project requirements. Bad encryption can be worse than none at all. Things won't end well if data security is treated as a detail.
YAGNI is nice, but when the PM asks why it would take two months to accept a new JSON format from a client, and the answer is that we didn't want to use an industry-standard, fully functional, vetted JSON parser and so essentially wrote our own edge-case parser - we both know how that conversation will end.
And YAGNI doesn't have anything against dependencies.
When there are new requirements, you do a quick estimate: should you add four new lines to the existing 20, or is it worth switching to an external library? Four new lines on top of the 20? Just add them. But if this keeps occurring - if requirements keep affecting this particular little parser that was supposed to be simple and static - then you should probably change your decision and use the library.
But you do that only then. Because chances are that with your approach you are going to drag along a large generic library of which you only use a tiny fraction. And that also has costs - in particular if your immediate impulse is always to add another library instead of writing things yourself.
By using a third party library you are writing twenty lines less code, so it's cheaper in that aspect.
There are probably libraries that are faster than your twenty lines of un-optimized code, so it's cheaper as far as computing resources are considered too.
The only time it could matter is when you ship the code to the client over the wire (such as in a JavaScript bundle).
You can't use a library with zero lines of code. On top of this, libraries always have development overhead outside of the code you write. E.g.: Which version should you use? Did the latest version break something? Did the old version break something on the latest compiler? Etc.
It’s cheaper in the sense that it is faster to write and maintain those 20 lines of code. Because someone has to evaluate the library, understand it well enough to actually call it and then make sure it stays up to date. And often there are a few lines of code to translate your data into a form that the library requires etc.
Plus, for every developer to come, one call to an external library usually also means 30 pages of documentation to trawl through if they ever want to change anything, 29.9 of which are completely irrelevant to your narrow use case.
That's the real cost. The size of the code means absolutely nothing.
And it's going to take more than an afternoon to evaluate these parsers. You have to look at the options, evaluate the API, evaluate if they're stable and supported, evaluate if they integrate well with your project, evaluate any dependencies they might have, etc. Then you need a plan to manage these dependencies long term.
If your needs can be solved adequately by strtok, then that's a far simpler and more maintainable solution that can be knocked out in an afternoon.
Your example is more apt than intended: that's not valid JSON, which only allows string keys. If you use a library it'll either barf now or later when they fix it, so if you're forced to work with an API like that and can't change it, a custom parser is really the only way to go.
That's not JSON, though. It's absolutely something else. Maybe a JS snippet. Maybe YAML. Definitely not JSON, though.
(Some JSON libraries do have option flags, but usually it's about whether, during deserializing into a known type, unknown fields are an error or silently ignored. Or whether C-style comments are an error or considered as whitespace.)
While acceptable, also misleading: JS only does string keys, but unlike JSON it'll convert whatever it's given into strings. Not a problem most of the time, since it'll do the same conversion for both accessing and setting, but good to be aware of if you're doing something like iterating Object.keys()
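For example, a quick sketch of that coercion:

```typescript
// JS object keys: a numeric key is coerced to a string, and the same
// coercion happens on access, so it's usually invisible.
const obj: Record<string, string> = {};
(obj as any)[42] = "answer";   // the key 42 becomes "42"
console.log(obj["42"]);        // "answer" -- same coercion on access
console.log(Object.keys(obj)); // ["42"] -- iteration sees string keys
```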
> Of course, you might say, inevitably feature-creep will expand the list of things your parser needs to parse
If you've done your parser correctly, you'll be able to replace its implementation with the new dependency, with little to no need for extra refactoring in the rest of the codebase.
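For instance, a minimal sketch of what that looks like (all names hypothetical):

```typescript
// Hide the parsing behind an interface you own; the hand-rolled
// implementation can later be swapped for a library without touching callers.
interface ConfigParser {
  parse(input: string): Record<string, string>;
}

// The afternoon version: good enough for today's narrow "key=value" format.
const handRolled: ConfigParser = {
  parse: (input) =>
    Object.fromEntries(
      input
        .split("\n")
        .filter((line) => line.trim().length > 0)
        .map((line): [string, string] => {
          const [key, ...rest] = line.split("=");
          return [key.trim(), rest.join("=").trim()];
        })
    ),
};

// Later, if requirements grow, only the implementation changes:
// const libraryBacked: ConfigParser = { parse: (s) => someLib.parse(s) };
```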
You can also apply YAGNI to 'do we need our own custom parser'?
You don't know what your requirements are. The customers haven't told you yet.
If you pick a library with a straightforward interface, especially one that isn't too opinionated, you can always drop in a custom implementation later on. Frameworks, not so much (but that cuts both ways; the people who will write libraries often love writing frameworks too)
Great rule. I was wondering, though: how do you manage updating, say, the Jackson JSON parsing package? What if you have 100 such packages and they get updated weekly with breaking changes?
If you're in a larger corporate environment, this can also be used to create some predictable labour needs: create a seasonal updating taskforce so that the business gets a more transparent view of how much labour is being sunk into maintaining these dependencies. Break it down by specific dependency if you've got one or two that you think are particularly expensive; showing after-the-fact labour numbers from one season may motivate sane in-housing for the next.
Only update dependencies when your code requires the new version, depends on a bug fix, or the update fixes a security vulnerability. Otherwise, continue using the same version.
Have good test coverage to catch bugs that may originate in dependencies and subscribe to a third-party service to track vulnerabilities in your dependencies.
Then you get packages that are 5 years out of date, which eventually have a security vulnerability, and now you have the task of upgrading and working through 5 years of (potentially) breaking changes and deprecations.
It's generally easier in the long run to keep your dependencies up to date. If a package has a new breaking change each week, that's a sign you probably shouldn't be using it for production code.
If you have a hundred direct dependencies and they all break the API on a weekly basis then: you are either at a scale where you can handle that, or you are using wrong dependencies, or you are doing something wrong.
I can understand at most 10 dependencies iterating that quickly. But only when they are your own internal dependencies, and those should definitely not break the API weekly.
There's lots of opinions on this, all with good justification. My current team leaves most dependencies unlocked and depends on good automated tests to sniff out broken dependencies. If necessary we lock dependencies to a particular version or range (e.g. <2.0.0). Once tested, we freeze for distribution.
Some people just never upgrade until they need to. That's workable, though when you do need to upgrade a package you may be spending the rest of the week working out a cascade of breaking changes.
If you only upgrade when you need to, but not necessarily to the latest versions, odds are that whatever breakage is caused by the latest nodejs/npm/etc incompatibility has already been documented in issue trackers or stackoverflow
For what reason are you updating your packages? Is there a severe security issue in that package? Or, if it works today, could you pin it to that version and wait until there is a compelling reason to update it?
Here's some reasoning: if this project were in-housed, would we detect and patch it any quicker? Would we have a dev constantly assigned to it, pushing out patches to the rest of the team, or is it the sort of software we'd write once and then wait for a compelling reason to invest more in? Whether software is in-house or outsourced, you still retain decision-making power over how much time to invest in its maintenance.
> if this project were in-housed, would we detect and patch it any quicker?
If it's a bespoke library, no one but you and hackers directly targeting you will test for security vulnerabilities. (Good thing you have a red team... right?) For widely-used libraries, the number of vulnerabilities isn't going to be much different from your own library's, but the likelihood that they're found and exploited in your system is much lower.
So no, in most cases, you would not detect and patch vulnerabilities quicker, because you probably don't see them until it's too late.
> if it works today, could you pin it to that version and wait until there is a compelling reason to update it?
If you pin versions for a long time, eventually there comes a point where you have to update something because of a critical bug or security advisory, and of course since it's a critical bug or advisory, you have to update "right now", "priority 1", "all hands on deck", "the board is involved" and everything. The fix is in version 5.1.2 of the library, but you're stuck at 2.6.5, so now you have to do three major version upgrades (with all the changes to your codebase that entails) before you can even think about upgrading to the version containing the security fix. And that's still an easy case. If the library in question is a framework like Rails or React, version upgrades of that size may be a major undertaking that takes weeks or months to prepare, execute and validate. That's very much not fun when management is pressuring you to close that vulnerability.
I think it's never a good idea to sit on ancient libraries. Put a recurring task in your team backlog to update dependencies on a schedule. It's not going to result in less work spent upgrading; in all likelihood it's more work in terms of raw hours than the update-on-security-advisory strategy, but it's much more plannable and less stressful. That doesn't mean you have to upgrade to latest-greatest immediately (you always have the freedom to hold off a particular upgrade until the new major version has had some time to mature etc.), but there should be some time reserved on your schedule for doing your updates.
For instance, I have my update-all-lib-deps reminder in my calendar on the 1st of every even month. When it comes up, I put a task in my backlog with a checklist containing every application I have to check, upgrade and deploy. Go 1.15 just came out today, so that's going to be on my desk come October. Great timing, actually, we're going to be one or two point releases into the 1.15 branch at that point, so it's going to be a safe and easy upgrade.
> don't use a dependency to implement your core business
In logic language, you're saying "If X is your core business, don't outsource X".
> Is JSON parsing our core business? No, so why would we ever write -- and thereby commit to supporting for its entire lifetime -- JSON parsing code? All the code you write and support should be directly tied to what you as a business decide are your fundamental value propositions. Everything else you write is just fat waiting to be cut by someone who knows how to write a business case.
The rest of your argument is interpreted as "If X is not your core business, don't in-house X".
These two logical implication statements are not equivalents of each other, but are converses. Casual language often conflates If, Only-If, and If-And-Only-If.
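In symbols, a small sketch of the distinction:

```latex
% P = "X is your core business",  Q = "you build X in-house"
% Stated rule:           P \Rightarrow Q
% Interpreted rule:      \neg P \Rightarrow \neg Q
% But by contraposition: (\neg P \Rightarrow \neg Q) \equiv (Q \Rightarrow P),
% i.e. the converse of the stated rule; they coincide only if P \Leftrightarrow Q.
```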
"You should spend time implementing your core business" implies that you shouldn’t spend time implementing things that aren’t your core business; otherwise the first statement is pretty useless.
If I wanted to learn more about rigorous, non-elementary logic, do you have a recommended resource? I've taken a course in intro level probability theory which covered it generally and another course that built on it lightly but nothing rigorous and I am wooed by how concise things become in a logical form.
A Tour Through Mathematical Logic. You don't have to do any proofs. If you learn Propositional Logic and First Order Logic you'll already have most of the tools to invent the rest.
I think the problem is that the individual contributor has decided to make that chunk of logic their business. This will probably not benefit the team or the organization.
Quite agree: every single line of code written requires lifetime support. Code adds up and gradually reduces productivity, so only write code for core business logic.
Besides the _lifetime support_, working on a core business feature makes us _understand_ that feature deeply.
I've seen people integrate a dependency for their core business. It helped them get started fast, but it created a blockage that required deeper understanding to overcome.
I think a JSON parser is not a good example though — takes longer than a few hours / an afternoon, to write a JSON parser, add tests, fix bugs, corner cases. More like a week, or weeks, ...
I suppose a JSON parser was just an example. Made the whole answer sound weird to me though :- ) when the blog is about afternoon-sized projects and the reply is about a weeks-, possibly months-long project.
There's a fair middle ground when the dependency itself doesn't have dependencies, and is small enough, with a permissive license, that the entirety of its code can be dropped into your project. Especially for very specific functionalities. I have used such tiny XML parsers, and I'm not affected by the fact that my copy is no longer the latest version. It's not so far from copying and pasting snippets of existing code.
> I think a thing is not a good example though — takes longer than a few hours / an afternoon, to write a thing, add tests, fix bugs, corner cases. More like a week, or weeks, ...
You're making my point for me. This is exactly what I meant by the lifetime of support you're signing up for by writing lines of code. Once you write that code, you're now in the business of supporting that code. Was that a good decision for your business?
Same with CSV. It looks easy, but it isn't. I've never seen anyone who writes their own CSV parser actually implement the features necessary to conform to the standard, like quoting and escape sequences. The end result is software that breaks when delimiters or quotes appear in user input. Honestly, I prefer xlsx spreadsheets because of that: nobody fools themselves into implementing the parser or serializer for that format. The only tiny pitfall with them is when people create spreadsheets manually in Excel and write numbers as text, but parsing strings to numbers is absolutely trivial. You have to do that with CSV anyway.
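To make the quoting point concrete, here is a rough sketch of the state an RFC 4180-style row parser has to track (and even this version still ignores newlines inside quoted fields):

```typescript
// A quoted field may contain the delimiter or an escaped quote ("").
const row = '1,"Smith, John","He said ""hi"""';

// Naive approach -- wrong: splits inside the quoted field.
console.log(row.split(","));
// [ '1', '"Smith', ' John"', '"He said ""hi"""' ]

// A conforming parser must track quote state instead:
function parseCsvRow(line: string): string[] {
  const fields: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const c = line[i];
    if (inQuotes) {
      if (c === '"' && line[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (c === '"') inQuotes = false;                         // closing quote
      else field += c;
    } else if (c === '"') inQuotes = true;                          // opening quote
    else if (c === ",") { fields.push(field); field = ""; }         // field boundary
    else field += c;
  }
  fields.push(field);
  return fields;
}

console.log(parseCsvRow(row)); // [ '1', 'Smith, John', 'He said "hi"' ]
```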
So you're saying that I should implement my own OR mapper just because my product uses a database?
And even that is not the whole story: writing everything yourself means it all ends up in your own hands. No bug fixes, patches, or improvements without spending your own engineers' time.
I've worked in such a company, and it was a mess, accompanied by dev leaders who were too proud of their code to allow any change.
I'm confused by your response. Is your core business mapping objects to databases? As in, that's what you get paid for? If not, my heuristic is that you should not be writing an ORM tool.
Shared dependencies can also reduce code, though. Do you really want 10 slightly different implementations of the same thing after you've brought in a few large dependencies?
It's a matter of judgement, but here's a few observations:
- With a little experience, you know what gets fiddly and what doesn't. Today, for instance, I needed a way to remove tags in an SVG document, which look a lot like HTML tags. I quickly found that regex is not the solution (a well-known guy on SO wrote an answer that looks like a huge warning sign). I also couldn't enumerate all the corner cases. So I found a lib that does it, along with an SO answer that turns it into a two-liner.
- Dependencies vary in quality. Some are basically like another standard lib. Boost for instance is very well used. The tough ones are where the lib seems to be "finished", where there seem to be few commits recently, but the project was once lively and functional. IIRC libev comes to mind here. And then there are the totally dead projects, where there's a load of issues open and nobody saying anything.
- Try to lock down versions. If you get a thing working with a certain version, there's no reason you need the newest new as soon as it's pushed. You can probably live with doing a scan for updates now and again.
- Your afternoon of programming needs to have a clear end. That hashmap you wrote will very likely spew out issues over the next few days. CSV parser, maybe. Bessel function, that'll work.
The old classic. I still end up using regexes when I need to clean up HTML because even though it's not the right way, it's the way that is sufficient 95% of the time.
Your dependency that you could code "in an afternoon" may handle far more corner cases than you suspect. (That may be what you meant by "fiddly".) Sure, you don't care about covering all those corner cases... but you might care about some, even some that you haven't thought about yet. And you might care about some more next month. That can make that "afternoon" take a lot longer than you expect.
>Try to lock down versions. If you get a thing working with a certain version, there's no reason you need the newest new as soon as it's pushed. You can probably live with doing a scan for updates now and again.
Agreed! It irks me a lot that I often see update bots tracking new releases; that's just begging to be exposed to regressions.
We need to find a happy medium, though. Otherwise, whenever you actually do need to update something (e.g. you add a new dependency that only works with a much newer version of one of your existing dependencies), you have a huge version gap to cover.
Specifically on the SVG filtering example, which I think is a good illustration of when to use or not use a dependency:
Writing an SVG (or at least XML) parser is a necessary task for writing a filter that doesn’t get stuck due to weirdo issues. That is way more than an afternoon of work! But once you have a parser, dropping tags you don’t want or transforming them somehow is totally an afternoonable task size. So, do use a dependency for SVG parsing, but don’t look for a special “SVG filter all” package. Just do the filtering yourself.
Totally agreeing with this but the issue is maintenance. Sure, I can find a package and copy and paste a few files and make it my own — or do it from scratch even!
But then, that's not really my core domain, so I'll most likely never touch that piece again anytime soon. A quality package tends to have that covered (with a small risk, too if it's an untrusted source).
So true. Oftentimes I will look at the source of a simple ruby gem or node package and see that it is actually really just one single file, with 100 various .yml/spec/lint/test/cloud/coverage/cov/tox/blah landmines in the repo to confuse you. In those cases, I'll copy-paste the code with attribution back at the top of my source file.
The headache is not worth all of the pomp and circumstance for some of these tiny little tools.
- Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law. -> It's gonna take longer than an afternoon. Always
- Opportunity cost: Your app needs to DO something, and spending time building dependencies steals quality from your core business proposition, even if it's a small (Haha: see previous point) dependency.
- While there are a ton of crap libraries out there, there are also a lot of good ones, which include hard lessons learned, about even simple tasks. You _could_ rewrite a very simple http client in an afternoon (telnet hostname 80\n GET / HTTP/1.0\n\n), but you'll probably have a ton of glaring flaws just waiting to bite you _badly_.
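As a sketch of that afternoon HTTP client (assuming Node.js; the point is everything it silently doesn't handle):

```typescript
import * as net from "net";

// Naive HTTP GET: no TLS, no redirects, no chunked encoding, no timeouts,
// no keep-alive, no status-code handling -- each one a future bug report.
function naiveGet(host: string, path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const sock = net.connect(80, host, () => {
      sock.write(`GET ${path} HTTP/1.0\r\nHost: ${host}\r\n\r\n`);
    });
    let raw = "";
    sock.on("data", (chunk) => (raw += chunk));
    sock.on("end", () => resolve(raw)); // headers and body, unparsed
    sock.on("error", reject);
  });
}
```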
1) As everyone else has said, we are horrible at estimating.
2) We think we might only need 1% of an external dependency, until we don't. Case in point: by this criterion, no one needs a grid/datatable library. So we write it "in an afternoon". Then we're asked for an export to Excel. Later on we're asked for pagination with large data sets. And then an export to PDF. How many afternoons did that cost you?
3) Writing new code is an excellent opportunity to create new bugs and implement bad design, because we're in a hurry (it took nowhere near just an afternoon to write) and we cut corners.
Anyway, those are just three that come to mind immediately.
This is not an argument against the "afternoon rule" specifically, but really an argument against making simple-sounding black-and-white rules for problems that are inherently complex, and thus gray areas.
> We think we might only need 1% of an external dependency
I've seen the opposite just as often. Someone thinks an external dependency can just plug in and solve the problem; then, 16 hours later, they've made the basic thing work with the external dependency plus a bunch of internal code on top of it. Now you have a pile of fragile code that breaks when the dependency changes underneath you, you have in-house code supporting a thing that might not be getting upstream support anymore, and you still have a bunch of internal code to maintain.
> Case in point: by this criterion, no one needs a grid/datatable library. So we write it "in an afternoon". Then we're asked for an export to Excel. Later on we're asked for pagination with large data sets. And then an export to PDF. How many afternoons did that cost you?
You can always migrate to a library at a later date if the scope changes. More to the point, you don't know how the scope is going to change; how do you pick a dependency based on future scope changes?
That's exactly what my point is: This is way too complex to boil down to a simple rule of thumb to apply to all cases. There are arguments and counter arguments (and real-life examples and counter examples) ad-nauseum.
It's better to say: "Hey experienced people: How would you decide whether to introduce an external dependency or roll your own if the immediate requirement at hand seems small enough to write on your own in half a day?". Then we can get into a fun conversation that will boil down to "It depends."
Developers should try to build things before they reach out and grab a package. Usually it doesn't take too long to figure out if you are making a giant hairball. By trying to implement it yourself, you get a better understanding of the problem regardless.
1) Please don't include me in that. Just say that YOU are bad at estimating. I've estimated whole projects just fine, except management doesn't want to hear the truth, or they are selling dreams to upper management. That's different.
20 afternoon dependencies -> now you have a month of work: more code to test (and write tests for), more code to support in the future, more code for a new developer to understand. Add the edge cases you aren't aware of, and you're screwed.
My basic rule: if a dependency has a small code base (<300-500 lines), I'll copy/paste that chunk of code into the repo and reference the original repo (assuming the LICENSE permits it).
This sounds like premature optimization. If the library does what you need it to do, use it. If it becomes a problem later, then optimize.
The last thing you want to do is spend a bunch of time reimplementing code when: 1) it may not matter at all, 2) you might miss important edge cases, or 3) you got everything right but you still have to maintain it forever.
If it's going to take you three days to integrate the library, maybe it's not such a good library, or maybe it's really complicated because there are a lot of edge cases. In that case, dig into the code and see if you can figure out what it's doing.
But if you think you can spend an afternoon rewriting a library that would take three days to integrate, there is a good chance you might be missing something important.
I remember starting with an overloaded library from NPM where I used some basic functionality. That worked fine for a while. When later I got lost fixing a defect in the tangle, I just ripped out the parts I needed and made a trimmed version. The interface remained the same, for the parts I was using. Nothing to adapt.
In this way I had little investment in the beginning. And once I knew what I needed, it was another small investment to clean the code.
Never having thought about it before, I realize I usually go the opposite way: by writing about a day's worth of code on something, if only to realize its true scope, and then going in search of a lib of some kind to handle the dirty bits.
I've always called this the "guy in the basement" effect.
In the beginning, the guy in the basement would have an awesome idea for a library. He would spend 27 hours a day designing it, coding it, testing it, and making it awesome.
The "guy in the basement" moves out because his partner wants to spend time with him. They start to live lives outside of the world of maintaining open source libraries. The partner asks, "Are you getting paid for all of this?" and the guy says, "No. It's open source. It's a good thing." but in saying it out loud to his partner, his priorities shift.
Over the next year, the library deteriorates. It gets forked, other people ask to maintain it, he relents, but no single person has the vision. The library becomes a mish mash of priorities and no longer has the awesome cohesion it once had.
Everyone using the library is very sad. Some are angry. An awesome replacement appears from another guy in the basement.
I'm sure I could write a basic HTTP client implementation for the current needs of my project in an afternoon. That doesn't mean that I shouldn't be using libcurl or whatever instead, though.
It's perfectly reasonable to conclude "my need is a subset of a larger but well-defined task for which there's a bunch of mature, proven and widely available libraries – I should leverage them".
A huge benefit of using dependencies in your codebase is that other members of your team (including future members) have a good chance of already knowing how the dependency works, and how to use it.
If I rewrite package Foob because it only takes me 3 hours to write (let's call mine Boof), and my colleague Mary on another team also does so, then the chances that our needs, specific implementations, and usage patterns are the same are pretty low. When I transfer to her team six months from now, I have "my" code that I know, and I also have "her" code that I don't yet understand. I've got to relearn the interface.
Or, if we see this as an opportunity to unify our needs across codebases/teams, one of our libraries gets the features of the other bolted on. In this way, we're just spending our combined time building a less-good Foob, instead of actually doing something useful.
But if instead Mary says that Foob is high-quality because she looked at the code, checked the community, etc., then I can just trust her judgment. Foob already has most of the features that I thought I ain't gonna need. I can re-use Foob in multiple projects, and the documentation for Foob is certainly better than what I would have written in an afternoon, so future team members will also pick up Foob faster than my Boofy custom implementation.
I had this experience with a particular React hook recently. We ended up with like 5 teams implementing very similar functionality, and then when developer #6 came along and tried to replace them all with a common implementation it broke 2 out of 5 use cases because of very subtle edge cases. I guess React runtime behavior is kind of high on the scale of how hard code is to reason about, so maybe this wouldn't happen in an easier codebase. But still a very instructive exercise.
I think the mistake described in this blog post needs a name. Let's call it premature dependence.
Like premature optimization, premature dependence compounds your maintenance burden over time because you predicted a problem which, if you had waited for evidence, would never have materialized.
In premature optimization, you imagine that a simple solution won't perform adequately, so you invest in building a more sophisticated solution up front. In premature dependence, you imagine that a simple implementation won't adequately address your needs, so you bring in a feature-rich dependency right away.
Both are driven by the fact that we spend the vast majority of our time, effort, and emotional energy dealing with rare cases. You don't spend hour after hour, day after day, meeting after stressful meeting talking about the database that scaled fine or the code you wrote once at the beginning of the project and never touched again. If you don't account for how the successful stuff disappears into the background, your model of reality will be skewed by rare traumatic events. (Which is kind of the point, evolutionarily speaking, since it serves as a kind of crude risk analysis, but it's not hard to do risk analysis better than your amygdala does.)
Of course, applying the rule in practice requires judgment. Some problems are inherently difficult; some explosions in complexity are predictable; you don't have to pay long-term costs on POCs and failed experiments; if you say you support ISO-XXXX format then you had better support every obscure corner case; etc. But it's good to keep in mind that the long-term cost of an extra dependency can be greater than the long-term cost of a few hundred lines of unchanging, unit-tested code.
I mean... this depends. I use Rails mostly in my work, and in recent years I've pared down the libraries I use (i.e. what goes into the Gemfile). I generally stay away from any other externalities besides this list of libraries, and I make them almost "my own." It seems to be a good balance between getting too dependency-dependent and veering off to reinvent the wheel.
Obviously each situation is unique; however, I've been burned more often by trying to clean up hand-rolled code than by using a library.
I can recall quite a few instances where engineers, even experienced ones, went down this path and ended up trying to reinvent such age-old technologies as version control systems and web servers, just to name two.
Dependencies are part of the same calculations that normal programming follows.
Easy to program vs. easy to maintain vs. easy to understand vs. easy to scale vs. easy for the CPU to do.
Dependencies are easy to program, you include them.
Dependencies are hard to maintain, in that you can't expand or change their behavior.
Dependencies can be either hard or easy to understand (compared to your own code).
Dependencies are hard to scale: something you coded yourself can avoid re-validating inputs that have already been validated, or otherwise be smarter about how it does things to avoid overhead.
(One example: a system for parsing info out of a large amount of data. Most dependencies will make you store it all in one place, on disk or in memory, where a custom solution can more easily stream it to avoid the memory or disk overhead. Take an FTP server that wants to calculate and store 5 hashes for each uploaded file: it can have 5 libraries each read the file after upload, or load the entire file back into memory after upload, or just stream the data through 5 custom-rolled hashing systems that calculate as the file is being uploaded and written, with no need to re-read it from disk. One of these will be faster and scale better.)
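A rough sketch of that streaming variant (assuming Node.js's crypto and fs modules; file names illustrative):

```typescript
import { createHash, Hash } from "crypto";
import { createWriteStream } from "fs";

// Feed each uploaded chunk to the disk write and to all hashes in one
// pass, instead of re-reading the file once per hash afterwards.
const algos = ["md5", "sha1", "sha256", "sha384", "sha512"];
const hashes: Hash[] = algos.map((a) => createHash(a));
const out = createWriteStream("upload.bin");

function onChunk(chunk: Buffer): void {
  out.write(chunk);                         // persist to disk...
  for (const h of hashes) h.update(chunk);  // ...and hash in the same pass
}

function onDone(): string[] {
  out.end();
  return hashes.map((h) => h.digest("hex")); // all 5 digests, one file read
}
```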
So half the articles about dependencies on hacker news can be broken down to one message:
Don't use a dependency when rolling it yourself is better for your use case.
The other half are about conveying why and when rolling it yourself can give you benefits for your use case.
I agree with this philosophy. I see the writer has used popular memes to convey the problem with humour. I would also add this quote from the movie "Heat":
"Don't let yourself get attached to anything you are not willing to walk out on in 30 seconds flat if you feel the heat around the corner."
Many others have mentioned the problem with estimating time cost. I think the core issue is that very few things actually take _only_ an afternoon of programming to build.
One good example of this is a general application framework like Dropwizard or Spring. These are relatively giant dependencies; and you could fairly easily do without them early-on in a project (or maybe for the life of a project). But, odds are, you'll spend an afternoon (or several) in the future refactoring to use such a framework, because xyz feature would benefit from that foundation. The judgement call of using it early or late is why we get paid the big bucks!
OP here: A lot of people are objecting, "What if you estimate wrong, and it takes more than an afternoon?" This objection is very bad.
It is not possible to add a new dependency in less than an afternoon, because you need to evaluate alternatives, test it, learn how it works, make sure it's not accidentally GPL, etc. So there are not two methods, the less-than-an-afternoon method and the more-than-an-afternoon method; there are two methods that both take at least one afternoon. If you estimate wrong and you can't write the code in an afternoon… then stop working on your handwritten version and find a dependency? But now you know what the dependency is really supposed to do and why it's non-trivial, so you're in an even better position to evaluate which ones are good.
> But now you know what the dependency is really supposed to do and why it's non-trivial, so you're in an even better position to evaluate which ones are good.
I came in here to say this. If you think you're not qualified to write the function, you're probably also equally unqualified to choose someone else's implementation of it.
There is a lot of stuff out there-- stuff which is widely used-- which is not fit for your purposes, ... perhaps not for anyone's. And there is no replacement for a bit of domain expertise.
Not a lot of people can correctly write cryptography code on the first try, but we definitely advocate for people pulling well known cryptography libraries and using them instead of building their own, for obvious reasons. Not many people are qualified to write a lot of things, but are capable of making sound dependency judgements with heuristics. The trick is to use good heuristics and to not use a library for every tiny thing.
It's my experience that people very often do not make sound dependency judgements on cryptography dependencies.
They probably do better than writing it on their own, but that isn't necessarily saying much-- and I think the difference isn't actually that great (essentially the thing they pick will often tend to have same flaws as what they would have written, because essentially we're drawing from the same distribution).
I agree that heuristics could help but not much time is spent discovering and socializing what those are, particularly to the extent that they are domain specific.
Naive heuristics can also backfire. E.g. it can be easy to mistake conscientious behaviour for flaws, and end up preferring code that has absolutely zero mitigations against an attack over code that discloses the limitations of its mitigations.
People keep bringing up crypto and I really have no idea why. Is there someone who believes they can write a crypto algo in an afternoon? If someone is that deluded, they aren't going to benefit from advice one way or the other.
> It is not possible to add a new dependency in less than afternoon because you need to evaluate alternatives, test it, learn how it works, make sure it's not accidentally GPL, etc
For a small module that would take less than an afternoon. Checking a module's license takes less time than it took to read that comment.
I probably spent almost 2 months evaluating hashmaps.
There are a dozen separate maps in the FreePascal standard library. But they all have some issue: a max key length of 255 chars, only working with orderable keys, not rehashing itself, being a treemap instead of a hashmap, or actually not working...
In the end I used a map from another library and modified it heavily. It also has a big issue: it doesn't really delete items, only keeping a tombstone that is not removed until rehashing. But the advantage is that it keeps insertion order.
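Transposed to TypeScript, the tombstone idea looks roughly like this (my sketch, not the library's code):

```typescript
// Deletion marks the slot instead of removing it; tombstones are only
// purged on rehash/compaction, which preserves insertion order cheaply.
const TOMBSTONE = Symbol("deleted");

class InsertionOrderMap<K, V> {
  private entries: Array<{ key: K; value: V | typeof TOMBSTONE }> = [];
  private index = new Map<K, number>();

  set(key: K, value: V): void {
    const i = this.index.get(key);
    if (i !== undefined) { this.entries[i].value = value; return; }
    this.index.set(key, this.entries.length);
    this.entries.push({ key, value });
  }

  get(key: K): V | undefined {
    const i = this.index.get(key);
    return i === undefined ? undefined : (this.entries[i].value as V);
  }

  delete(key: K): void {
    const i = this.index.get(key);
    if (i === undefined) return;
    this.entries[i].value = TOMBSTONE; // slot lingers until compact()
    this.index.delete(key);
  }

  // "Rehash": rebuild without tombstones, keeping insertion order.
  compact(): void {
    const live = this.entries.filter((e) => e.value !== TOMBSTONE);
    this.entries = live;
    this.index = new Map(live.map((e, i) => [e.key, i]));
  }
}
```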
I agree with the author in that you should think twice about adding a dependency. I’d be willing to invest way more than an afternoon, depending on the circumstances. Adding a dependency is also work. How much depends on your work environment, so you should think for yourself.
I'd rather go for a heuristic that's the dependency-management equivalent of "Ask for forgiveness, not permission":
1. Find the best library that is good enough to solve your current needs, double-check that you understand its capabilities and shortcomings (to internalize any known future needs), and code against it.
2. Invest the time to build your own stuff only when you start to see limitations in future iterations. By this time you'd also have a battle-tested understanding of both the dependency's shortcomings and your own requirements
In my 12 year professional career and 10 years of programming before that, I can't count how many times I thought something was an afternoon of programming and I was terribly terribly wrong.
Generally good advice to be conscious about taking dependencies, but it's also worth sanity checking that something is really "an afternoon of programming."
Agreed. One has to learn to be realistic, and account for time spent thinking, debugging and testing when coding from scratch. left-pad is an afternoon project. SQLite connector may be a week project. Full JSON parser? Let's book a month to be safe.
I agree with this in principle. There is nuance, but the only way I look at 3rd party dependencies is as liabilities, normalizing for function.
The only question for me from this point is "Which dependencies are the biggest liabilities for us?" The answer to this question depends wildly on what exactly it is that you are trying to do.
Overall, our strategy is to typically use a 3rd party dependency by default in order to quickly get a MVP rolled out. Once we have a clearer picture of how we can solve some particular problem in a concrete way, we make a decision regarding whether we should drop that dependency or keep it around. Most of the time, we will develop an interface which exposes our need for the dependency in a vendor-agnostic way. Then, we can target any arbitrary implementation against this interface and quickly swap them around as required in DI.
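As a sketch of that vendor-agnostic pattern (all names hypothetical):

```typescript
// The need ("send email") is expressed as an interface we own.
interface EmailSender {
  send(to: string, subject: string, body: string): Promise<void>;
}

// MVP: back the interface with a third-party service (SDK call elided).
class VendorEmailSender implements EmailSender {
  async send(to: string, subject: string, body: string): Promise<void> {
    // e.g. vendorSdk.messages.create({ to, subject, body })  -- hypothetical
  }
}

// Later: an in-house implementation, once the problem is well understood.
class SmtpEmailSender implements EmailSender {
  async send(to: string, subject: string, body: string): Promise<void> {
    // talk SMTP directly
  }
}

class SignupService {
  constructor(private readonly mailer: EmailSender) {}
  async register(email: string): Promise<void> {
    await this.mailer.send(email, "Welcome!", "Thanks for signing up.");
  }
}

// The swap happens in exactly one place: the DI wiring.
const service = new SignupService(new VendorEmailSender());
```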
The biggest question in these discussions is always going to be "Well... how long would it take to write our own?". Even if someone is being realistic and gives you something 2x as long as you were hoping for, you should consider all of the other factors. Keep in mind that if you write it yourself, you can probably iterate on it without much difficulty as well (i.e. custom change requests that a 3rd party would completely ignore). Conversely, if you have to maintain it and its really buggy, you can't hope someone else is going to eventually solve your problems for you.
For small dependencies it's often not the implementation that matters as much as the tests. And even if you think you can write all the tests in an afternoon, they're often born out of actual usage, which you can't replicate so easily.
I'd say if a dependency is small enough that you can write it in an afternoon, you can even more easily read its source and tests and decide whether it's high enough quality to use as-is.
This is interesting, because I had this exact issue. In one Java lib I wrote, I needed to build a URL, so I naively pulled in the Apache HTTP commons URL builder, got it done in one line, and moved on.
But some time later, I realized (well, someone reported an issue) that it would break depending on the Android version. My first reflex was to try to bundle just the part I wanted using some Gradle class-renaming plugin; it was a nightmare.
Then I just realized how stupid and lazy I had become; being able to just pull in a dependency for anything had made me forget I could just write the code.
In the end I dropped the dependency and wrote what I needed myself.
What about "do not re-invent the wheel"? If you are schedule driven, then you'll want to get the coding done ASAP. Take an action for later to reduce dependencies, and prioritize it appropriately. Also, keep a local copy of the version you pulled and reference that for builds so you are not at the mercy of some maintainer/repository that has no stake in your game.
I once worked for a startup that had a custom C++ basic library for strings, arrays, smart pointers and things like that. At some point I spent the better part of a week looking into a bug that, long story short, was caused by a bug in our custom String class. Please don't do this. Every line of code in your project that your team maintains is a liability.
Part of the reason I use Python is not because it's "brilliant"; it's because it has most of the stuff you need built in. I'm too old to be supporting my own tech debt.
Yes, it's slow, and yes, there are bits that stink of poo. But 95% of the time there is something that'll do the job, most of the time. I'm not going to put myself on the hook for supporting the 0.5% corner cases that appear in 85% of all incident reviews.
Should I need something specific, then of course I'll write it, but only after google has said no, or the libs I've found don't work.
It's a silly rule, and I imagine it stems from a young buck wanting to prove themselves. That's great, but I finish at 17:00, and I don't ever intend on staying late. I suggest everyone tries it at least once. It'll make you a better engineer, honest.
I tend to avoid dependencies, if at all possible. I wouldn't mind spending a week of programming to avoid some dependencies.
Of course, there is no "one size fits all" rule, here. If it's a "one-off" internal tool, without much impact, then it would not be worth spending much time on, and a dependency might be exactly the right thing (famous last words).
I'm pretty obsessive about quality, and like to ensure the best quality possible in all my work. The weakest link, and all that, so dependencies need to be vetted very carefully.
There's some things that just can't be done without dependencies; sometimes, crappy ones (like SDKs), but that's less frequent than you'd think.
IME those who write code instead of using a dependency are the ones who DON'T seem to care about quality. They can't be bothered to see if the problem is already solved. They can't be bothered to design a good interface and extract the code into a re-usable module. They just solve their own immediate problem, which then gets re-solved in different ways a dozen times across different projects in the company. This is what I find frustrating, and what I find 3rd-party dependencies protect against: they define a set of culturally accepted/known APIs and functionality that is consistent and doesn't have to be relearned between repositories.
In my experience, I have seen some Jurassic-scale disasters, because of poor dependency choices.
I think a lot of people just google for dependencies, and then add the first one that has a slick Web site, without thinking much about the code they are adding.
I am not a "never dependency" person, but I am anal about quality. Totally obsessed. I feel that quality is something that many, many programmers eschew, in favor of bling and buzzwords.
For me, I won't put my seal on something until I have tested it six ways to Sunday. In some cases, it may be unit tests, but, more often, it is a test harness, which can be a much more ambitious project than a few XCTests[0]. In fact, I am running into a lot of folks that don't know what a test harness is; which is jaw-dropping.
Since I do a lot of device control stuff, unit tests are not particularly useful. In order to create unit tests that would be effective, I'd need to design a huge mock, and that would not be worth it; likely introducing more problems than it solves.
An example is that I am currently developing a cross [Apple] platform Bluetooth LE Central abstraction driver[1]. This needs to have test harnesses in all the target systems (iOS/iPadOS, MacOS, WatchOS and TVOS). I actually have it driving a released app[2] (which is really just a "productized" implementation of the iOS test harness), but I do not consider the module ready for release, as I have not completed all of the test harnesses. I am in the middle of the WatchOS harness now. I may "productize" the MacOS test harness. My test harnesses are really serious bits of code. Complete, ready-to-ship apps, for the most part. Going from a test harness to a shipping app isn't a big deal.
Clarify: actually within an afternoon, not looks-like-an-afternoon (include debugging, testing, docs, API design, etc.; see Brooks's 3x). So, e.g., leftShark().
Never spend an afternoon programming something that you may have to waste future time debugging and supporting when a dependency already handles it perfectly.
I once decided to write my own UI pagination component. I thought it would be fun and, why bring in a library for something so simple? Three years later, we now use this component for all our pagination. Probably four of us know it really well by now as it turns out, there have been many bugs around it. Turns out pagination was not such a trivial problem to solve when you consider all the edge-cases and crazy PM requirements.
Every once in a while some engineer who's fixing a bug in it will ask, "Why in the world did we write this? Why don't we just use the Bootstrap pagination instead?" I ask in return, "Would it fix the bug you currently have?" Every single time the answer has been no. It's almost always centered on how the parent component is using it. I wrote it with the intention of using querystrings to store pagination state in the URL, but someone else at some point decided to store their pagination data in memory and just added a few props, and now it's doing it two different ways, etc...
I think the criminal offense here was not writing pagination from scratch. It was that I didn't extract it into a library and write some basic documentation.
> Many times, the best approach is first searching online and reading the code to a few other solutions, then writing your own with the knowledge you’ve gained from seeing how they work.
I think this is a great approach. Any general solution library will be much larger and more complex than one project's use case. Seeing how other people have solved a given problem can give a dev a nice jump start on creating a compact solution to their given problem.
For whatever reason many devs just love using dependencies and almost always underestimate the long term costs, in terms of maintenance and vulnerability risk.
I once had to write a simple utility script that was literally 5 lines of Python. All it did was loop through a list and perform a couple of simple actions. As I mentioned it to the PM, another dev jumped in and said, "Oh hey, there's Library_X that we can install that will totally handle that for you". I responded that the library could very well be great, but the script was already done, had only taken about 15 minutes to write, and would be very easy to reason about and update down the line. Reading about and integrating a whole new library to do the same thing just didn't seem useful.
In my experience this is quite typical in the python community, I think it's a consequence of the batteries-included philosophy. Many people are very averse to using primitive data structures and operations to solve their problems, so working with a python codebase is like gluing libraries together
> BRB, going to go roll my own AES -- what could possibly go wrong?
Probably not much more than picking an AES library written by someone else.
It's fairly unlikely that your custom AES would function at all unless it was functionally correct.
Your bespoke AES will likely have timing sidechannels, but so will most AES you go pick off the shelf. (in fact, if you happen to need CBC mode, virtually any AES you pick off the shelf will end up having timing side-channels, because most code that doesn't is GCM only)-- particularly because it's hard to be both high performance and side-channel free without using SIMD, and to do that you need to operate on multiple blocks at a time.
In fact, since you happen to know your environment is targeting only hardware with AES-NI (in my hypothetical), you'll just use that and you'll even be free of side-channels too.
The "abstinence only cryptography" advice was originally about inventing your own blockciphers and such, not implementing existing ones.
Okay, so you're smart, so you'll want to take your AES from some highly reviewed source with lots of smart people. You pick... say... the Linux kernel. Welp, if you picked the plain C implementation in it, congrats: you just got yourself a bunch of timing side-channels.
There are, of course, plenty of things that can go wrong in making your own implementations-- so it's not entirely misplaced to apply the advice to implementations-- but sadly you are not more likely to avoid those problems by simply picking an implementation written by someone else. If you understand the domain then you can select a good library and evaluate it, but if you understand it that well, you could likely also write your own. :(
On the plus side, a simple blockcipher isn't the sort of dependency that is likely to have a substantial on-going cost either. It won't randomly change its behaviour (we hope, at least not until haxors take over the upstream repository), its functionality is simple and well specified, etc.
So it is probably fine to use one from a library and the OP's advice doesn't apply. But all the reasons why the code you wrote would probably be broken? They probably still apply to the library.
Eh. Yes, more likely, though unless you're talking about something with openssl level ubiquity it might not do much.
For example: my post pointed out that Linux's naive AES has a huge timing sidechannel (e.g. the same bug JoeBob's would). This isn't news. It's also not fixed.
Many times I've been asked to review a cryptographic library and found that it had problems, and had had them for years. Sometimes the issues had been reported and just ignored.
In some cases reporting the issue just causes the author to take it down... creating its own problems for people who were depending on it!
At the moment I have two private outstanding bug reports for total breaks in cryptosystem library code that I just stumbled into while browsing the internet where the authors/maintainers haven't replied and it's been more than a month. After a bit longer, I'll make the reports in public, but I expect the software will continue to go uncorrected (or just be taken down in response).
One piece of advice I'd give for anyone taking a dependency: go read through its bug tracker of open bugs (and recently resolved ones) -- and their public patch queue if they have one. Also do the same for all transitive dependencies. You can gain some pretty valuable knowledge and more benefit from shared bug finding.
Of course, if you're not a subject matter expert you might not be able to judge if a report is correct or if the subject is serious-- though you will probably be able to tell if the maintainers are active/responsive.
I gave a talk once on the problem of "Selection Cryptography"-- where I argue that merely _picking_ an implementation of cryptographic code (much less the primitives to use) is an act of rolling your own cryptography that triggers similar risks to writing some which also must be managed.
A year ago I needed a min-heap to build a priority queue at work.
So first I grabbed 'heap' from npm (272k weekly downloads) and set it to work. But a few days later I realized my code was executing slower than expected because it sometimes needed to clone the data structure, and the clone instantiation would break the heap invariant in the array internals. It turned out there's been an issue open about this since early 2017.
Then I went for the 'collections' package (35k weekly downloads) and brought in its heap implementation. That worked like a charm for about six months until a bug came in that made it seem like a completely different package was breaking. After almost a whole day of debugging, it turns out that 'collections' silently shims the global Array.from function (scream emoji) without mimicking its behavior when dealing with non-Array iterables presented by the other package.
So finally I wrote my own heap -- well, I cribbed from Eloquent JavaScript [0] but I did have to briefly remember a little bit about how they're supposed to work. So while I don't totally buy the "Never..." rule in the post title, thinking more carefully about writing versus importing a dependency would have saved me a great deal of headache in this case.
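For reference, the hand-rolled binary min-heap is roughly this sketch (my reconstruction in TypeScript, in the spirit of the Eloquent JavaScript version, not the exact code):

```typescript
class MinHeap<T> {
  private items: T[] = [];
  constructor(private score: (x: T) => number) {}

  push(item: T): void {
    this.items.push(item);
    let i = this.items.length - 1;
    while (i > 0) { // bubble up while smaller than the parent
      const parent = (i - 1) >> 1;
      if (this.score(this.items[i]) >= this.score(this.items[parent])) break;
      [this.items[i], this.items[parent]] = [this.items[parent], this.items[i]];
      i = parent;
    }
  }

  pop(): T | undefined {
    const top = this.items[0];
    const last = this.items.pop();
    if (this.items.length > 0 && last !== undefined) {
      this.items[0] = last;
      let i = 0; // sift down toward the smaller child
      for (;;) {
        const l = 2 * i + 1, r = l + 1;
        let smallest = i;
        if (l < this.items.length &&
            this.score(this.items[l]) < this.score(this.items[smallest])) smallest = l;
        if (r < this.items.length &&
            this.score(this.items[r]) < this.score(this.items[smallest])) smallest = r;
        if (smallest === i) break;
        [this.items[i], this.items[smallest]] = [this.items[smallest], this.items[i]];
        i = smallest;
      }
    }
    return top;
  }
}
```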
This would be a perfect spot to solve those issues and make the packages better... Or y'know, publish your implementation and have it used by people with the same issue
Those are the reasons why I like small "single-file" libraries in C and C++ (one header, one source file): you know where the source code is, reviewing it is relatively painless, and adding it to your project is dead easy.
Yet I have seen people ignore the benefits of a 2K-line dependency (in a single compilation unit) over an otherwise equivalent 20K-line dependency (in 100 compilation units) that requires autotools. (The disadvantages of a particular lightweight dependency are a separate topic.)
You want your modules to be deep: minimum interface for maximum functionality. Dividing the size of an interface by 10 for the same functionality is a pretty sweet deal in most cases. In my opinion, that deal is often underrated.
Never take afternoon advice from a programmer about how to run a business.
Because at the end of the day, that's what team management and engineering teams are. Functional units of a business. And advice like the OP's will lead you to death by a million shed bikes.
Adding a dependency is like adopting a puppy. Sure it seems free.. but it's not. It may be the best thing to do, but go in with eyes open on the costs.
Something like left-pad, absolutely never worth it. Something like OpenSSL, basically always worth it (warts and all). Everything in the middle, evaluate carefully.
When adopting a dependency you're adopting all their development practices. Do they break backward compatibility every minor release because compatibility is for the boring people? Never address CVEs? Every additional dependency constrains your future self by a little bit. These add up.
The time it takes to build something is not even remotely the best metric for value or ease. Programming is never a sealed box: things that took one afternoon to write at first could be thousands of lines long a year from now. To me this is advice as horrible as using dependencies for everything.
On average, programmers produce software of average quality. Libraries available online are often shitty, but most probably they are still above the average quality of all software, because there is a selection bias, more people are looking at them, etc. So if average programmers replace average libs, the expected result is that the overall quality of software goes down.
Especially when it comes to JavaScript, I tend to follow "never use a dependency that you could replace in a month".
Here is something that has happened to me more times than I care to count: I have a dependency on some library. It's working well for me. I build some sort of small project with it, like the info website for my mother's business. Doesn't need to be updated very often, just needs to be there and have her contact information. A year later, I get some automated email telling me some dependency of a dependency of my dependency has some kind of obscure security issue. I'm busy. This isn't supposed to be a core project in constant development. I don't really have time to evaluate whether the problem affects me, so I go to just upgrade everything and appease the squawk box. Oh, turns out that in the time between when I used the library and when the security issue was discovered, the developers completely redesigned the whole thing, and the only version that includes the security fix is a version that isn't compatible with my config scripts anymore.
All because I didn't want to spend the time to write some stupid simple string concatenating code for writing an HTML template, I now have to spend time completely relearning this library. And probably rebuilding the build scripts, too, because nobody can just leave well enough alone.
"Oh, but if you wrote the code, you might have written a security bug, too". And I could fix it on my own, too.
When did programmers stop programming? Most of the arguments I see for ridiculous levels of package integration come from a place of devs not trusting themselves to write stupid simple code. Like left-pad. "Someone smarter than me has figured out all the edge cases." You don't need all the edge cases. You just need to learn how to do your job.
I should mention that one thing that I always do with dependencies, these days, is encapsulate them.
I use things like Dependency Inversion or Dependency Injection, or simple "glue" APIs.
This is from being burned by integrating dependencies into my projects, and painting myself into corners.
It helps to do things like swap out DB backends, or things like mapping libraries. YAGNI is important, but it's also important to encourage flexibility. Dependency encapsulation is pretty much perfect for that.
Of course, there are probably cases where it's impossible to use some dependencies without making them integral parts of the project, but it's been my experience that I can write lots of stuff, based on modules.
My work is basically a clump of dependencies; it's just that I wrote the dependencies.
I like modular architectures, with each module/layer as an independent-lifecycle project; complete with testing, documentation, and APIs. Takes a bit longer to write the modules, but the integration is quite smooth.
Does it really take people "a few hours" to integrate a dependency? For personal projects it's more like 15 minutes, and even for stuff at work it doesn't generally take longer than an hour.
Broadly I find that FOSS libraries are great when you stick with their happy path. If you have unique/specific requirements, it may make sense to build it yourself. But there's no reason you can't spend an hour trying to use something for free before re-writing from scratch.
"This HN discussion https://news.ycombinator.com/item?id=24123878 is topical for me: at this very moment, I am implementing C++ MFCC code myself, because my attempt to integrate Kaldi (on windows) was unpleasant. It already took more than an afternoon, but I learned good things! \
I'm more sympathetic to the Use-All-The-Dependencies crowd than some might suppose. It definitely isn't my way, but I see them as a fellow subclass of programmer, evolved for other environments. It is amazing what can be cobbled together in a weekend now. \
The old Knuth vs McIlroy story is relevant: http://leancrew.com/all-this/2011/12/more-shell-less-egg/
Generally, use-the-tools is correct, but sometimes you really do want a Knuth (or maybe a Carmack)."
"
Dunning-Kruger, the long and short of it: many problems appear "simple" and take an afternoon... right? For something that converts a date from ISO format to American (M-D-Y), sure, it might be simple. But for anything that listens or talks to the network (DNS? APIs? etc.), I guarantee you didn't think of all the corner cases.
Also all those afternoons add up.
Also the comment "You’ll know exactly how it works and what its shortcomings are."
Hahahahaha. no. If that was true bugs wouldn't be a thing.
So... why not just take the source of something you like and include it?
Probably, if one "thinks" they can do it in half an afternoon, you can look at the source of two or three similarly simple libraries and just clone/re-namespace/adapt the source.
At least in web development, the website is always changing, and thus so is the code. Therefore the maintenance burden favours bespoke code over libraries, as the former is easier to adapt to changing requirements.
However, ultimately the decision in reality is based on product-ownership culture, where it's more important to produce results now (velocity) rather than in a few months' time (maintainability).
It’s hard to change culture.
It works great up to the point that you have one week to deliver a feature, and that one afternoon is the difference between you delivering a half-tested or well-tested feature.
Also, as pointed by other comments, people are generally not good at estimating.
Controversial, to say the least. While I know the value of in-housing as much code as possible, I think it should only be done once (https://luord.com/2016/06/25/nih).
After that one time, I've striven to limit the in-house code to only the actual domain and business rules of the application/system.
I would actually advise the other way: if you have to spend the whole afternoon putting an algorithm together, then it's probably going to take a week once you've implemented all the tests and fixed all the bugs. But even if it really was just an afternoon, it's still not worth it. You don't want to maintain what you don't need to maintain.
Typical programmer, wholeheartedly believing they can build something in an afternoon and be done with it. I don't care how well you scoped the project; it almost always takes longer. It's naive, and all this does is cost businesses time and money. It's usually much more economical to go with a third party than to build everything in-house, once you account for the costs of new-feature dev and long-term maintenance. I have an entire business running on that premise.