Package managers like npm, Cargo and LuaRocks have made it easy to include external code in projects. I argue that we need to complement this gain in convenience with additional measures to ensure supply chain security.
Author: David Heiko Kolf, 2025-06-07
In 2021 I published version 2.6 of my Lua library "dkjson". To improve the error reporting when decoding JSON strings, I had to make significant changes to one part of the library. Unfortunately, I had no one to ask for a code review, so I only added a few more unit tests to give myself some comfort before publishing the new version.
The step that worried me most was putting the new version on LuaRocks, the main package directory for Lua. As soon as I published it, it would be used by other developers, who might not even realize that they had updated to a version released just minutes earlier.
I was mainly worried about accidentally breaking other people's programs, but it got me thinking: What is the worst that someone could do?
I assume that most developers do not run the library in a sandbox. Any malicious code introduced into it would, at least for a short time, run with the user privileges of everyone executing it. That is a lot of responsibility resting on my shoulders as the maintainer. And how are those powers secured? With a simple API key, entered on the command line and stored in plain text in my command line history. Until I cleared that history, my most powerful password was also my least securely stored one.
While I was wondering whether I could trust myself with those powers, a more troubling thought occurred to me: how many packages do I use, and can I trust all of their maintainers?
I was pretty lucky with my JSON library: it has a limited scope, and as long as no new bugs are discovered, I can consider it "done" and stop worrying about it. I am not forced to work on it continually. Other libraries, however, are not as lucky:
All those tasks keep piling up for years without you ever seeing any compensation for them. Some maintainers break under the pressure. Rather than simply abandoning their projects, they might start sabotaging their own libraries, or they become easy targets for malicious actors who offer to "help" them.
In some programming environments (for example Rust) I feel overwhelmed by the package ecosystem. There I can no longer decide, as a programmer, which packages I want to use and which maintainers I want to trust. The moment I include one package, it pulls in several other projects of its own.
I am afraid that these dependency trees make it easier to hide malicious changes. I do not think that many people can realistically keep track of all the changes to hundreds of included projects. And even if you do not trust a specific project, what can you do? You will no longer be able to use any package that depends on it.
In discussions I have often read the argument that those dependency trees are unavoidable in modern software development and that the associated risks have to be accepted. I believe that these risks are too high to accept, and I am convinced that we can adopt safer software development practices.
There are two libraries that serve as good examples for me: Lua, a powerful scripting language, and SQLite, a local database that can be embedded in programs. Both are written in C and implement non-trivial functionality in relatively small code bases without any dependencies beyond the C standard library. They achieve this by implementing many algorithms themselves, tailored specifically to their use cases.
I have sometimes read the advice "Do not reinvent the wheel" given to programmers with the meaning that if anyone has already implemented the necessary functionality in an open-source library, you have to use that library. I disagree with this interpretation. Implementing an algorithm on your own is not inventing it. And just because someone has published a library does not mean that they are an expert in the functionality it covers or that they can keep their project secure.
Both using a library and implementing something yourself have costs, and those costs have to be evaluated case by case.
For stand-alone programs it might make sense to use a library whenever possible, but when writing a library I believe it is better to write specialized code than to add a dependency and force it on every user of your library.
One way to reduce dependencies is to write the library in a Sans I/O style: functions receive raw data as input and return raw data as output, and the user forwards that data to whichever other libraries perform the actual input and output.
Another possibility is to use Dependency Injection to decouple dependencies: the library would not require the dependency directly but would instead define an interface with the required functionality. A separate package would provide an implementation of that interface by linking it against the dependency. This would also make it easier for users of the library to provide their own alternatives to the dependencies.
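As a rough illustration, here is a minimal Python sketch of the Sans I/O idea. The LineDecoder class is hypothetical and not taken from any real library: it splits a byte stream into text lines but never touches a socket or file itself, so the caller decides where the bytes come from and where the decoded lines go.

```python
class LineDecoder:
    """Hypothetical Sans I/O decoder that splits a byte stream into lines.

    It performs no I/O itself: the caller feeds in raw bytes obtained from
    any source (socket, file, test fixture) and receives decoded lines back.
    """

    def __init__(self, encoding: str = "utf-8") -> None:
        self._buffer = b""
        self._encoding = encoding

    def feed(self, data: bytes) -> list[str]:
        """Append raw bytes and return every line completed so far."""
        self._buffer += data
        # The last element is an incomplete (possibly empty) line; keep it buffered.
        *complete, self._buffer = self._buffer.split(b"\n")
        return [line.decode(self._encoding) for line in complete]


decoder = LineDecoder()
print(decoder.feed(b"hello\nwor"))  # ['hello']
print(decoder.feed(b"ld\n"))        # ['world']
```

Because the decoder only turns bytes into lines, the same code works with blocking sockets, an asynchronous framework or plain test data, without depending on any of them.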
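A minimal Python sketch of this arrangement, with all names hypothetical: the "library" defines a Compressor interface using typing.Protocol and never imports a compression package itself. A separate "adapter" implements the interface with the standard zlib module, and a user who wants no dependency at all can inject a trivial stand-in.

```python
import zlib
from typing import Protocol


class Compressor(Protocol):
    """Interface the library defines instead of importing a concrete package."""

    def compress(self, data: bytes) -> bytes: ...
    def decompress(self, data: bytes) -> bytes: ...


# --- The library: depends only on the interface above. ---
def pack_records(records: list[str], compressor: Compressor) -> bytes:
    """Join records with newlines and compress them with the injected implementation."""
    return compressor.compress("\n".join(records).encode("utf-8"))


def unpack_records(blob: bytes, compressor: Compressor) -> list[str]:
    """Reverse pack_records using the same kind of injected implementation."""
    return compressor.decompress(blob).decode("utf-8").split("\n")


# --- A separate adapter package: links the interface to a real dependency. ---
class ZlibCompressor:
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decompress(self, data: bytes) -> bytes:
        return zlib.decompress(data)


# --- A user's own alternative: no compression dependency at all. ---
class NoCompression:
    def compress(self, data: bytes) -> bytes:
        return data

    def decompress(self, data: bytes) -> bytes:
        return data


blob = pack_records(["alpha", "beta"], ZlibCompressor())
print(unpack_records(blob, ZlibCompressor()))  # ['alpha', 'beta']
```

The library itself never names zlib; only the adapter does, so users who choose NoCompression or their own implementation never pull it in.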
The following points require community effort and cannot be decided by individual developers. They have to be settled in the standard library of a programming language and in the implementation of the common package manager and its directory.
Once a library becomes popular enough that large parts of the ecosystem depend on it, I question whether it should remain the responsibility of a single maintainer. Wouldn't it be better if a foundation took over responsibility and the library were published in a namespace that clearly signals its reliability? This could lighten the burden on the original programmer.
Declaring one library the standard for a certain task can also help to avoid duplicated code. Without such a standard, several libraries that end up in the same project might pull in different dependencies for the same functionality. This can lead to even more bloat than if each of them had implemented those functions itself, because almost every library includes more functionality than a given situation needs.
A package directory should list not just the direct dependencies of a given package but also all of its indirect dependencies, so that users can easily make an informed choice about the resulting complexity.
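Computing the full list is cheap once the directory knows each package's direct dependencies. A minimal Python sketch, assuming the dependency data is available as a simple mapping; the package names below are made up for illustration.

```python
from collections import deque


def all_dependencies(package: str, direct: dict[str, list[str]]) -> set[str]:
    """Return every direct and indirect dependency of `package`.

    `direct` maps each package name to the names it depends on directly.
    A breadth-first walk collects the transitive closure; the `seen` set
    keeps dependency cycles from looping forever.
    """
    seen: set[str] = set()
    queue = deque(direct.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(direct.get(dep, []))
    # Do not list the package as its own dependency if a cycle leads back to it.
    seen.discard(package)
    return seen


# Hypothetical dependency data, for illustration only.
graph = {
    "web-framework": ["http", "templates"],
    "http": ["tls", "url"],
    "tls": ["crypto"],
    "templates": ["url"],
}
print(graph["web-framework"])                            # ['http', 'templates']
print(sorted(all_dependencies("web-framework", graph)))  # ['crypto', 'http', 'templates', 'tls', 'url']
```

A directory page could show both lists side by side; in this made-up example, two direct dependencies expand to five packages in total.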
Package managers should contain the source code for the packages and make it easy to view all the source code changes between different versions.
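For packages made of text files, such a review view can be as simple as one unified diff per file. A minimal Python sketch using the standard difflib module, assuming each version is given as a mapping from file path to file contents; the file names and version labels are hypothetical.

```python
import difflib


def diff_versions(old: dict[str, str], new: dict[str, str],
                  old_label: str, new_label: str) -> str:
    """Return a unified diff of every file that differs between two versions.

    Files present in only one version are compared against an empty file,
    so newly added (or deleted) files show up in full. Assumes text files
    that end with a newline; otherwise diff lines would run together.
    """
    chunks: list[str] = []
    for path in sorted(old.keys() | new.keys()):
        chunks.extend(difflib.unified_diff(
            old.get(path, "").splitlines(keepends=True),
            new.get(path, "").splitlines(keepends=True),
            fromfile=f"{old_label}/{path}",
            tofile=f"{new_label}/{path}",
        ))
    return "".join(chunks)


# Hypothetical contents of two releases of a small Lua package.
v1 = {"json.lua": "local M = {}\nM.version = '1.0'\nreturn M\n"}
v2 = {
    "json.lua": "local M = {}\nM.version = '1.1'\nreturn M\n",
    "helper.lua": "print('loaded')\n",
}
print(diff_versions(v1, v2, "1.0", "1.1"))
```

The newly added helper.lua appears in full, which is exactly the kind of change a reviewer should notice.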
When a maintainer publishes a new version, the directory would treat it like a "pull request", and a second person (either from a foundation or another known member of the project) would have to confirm that version. This confirmation only needs to be a cursory check that the library still seems to serve its original, advertised purpose. It would help in cases where a developer's machine has been compromised and the API key stolen.
In a worst-case scenario, the package directory itself might be compromised. Code signing can reduce the impact of such an event: dependencies would be specified not just by their name and version but also by a secure hash of the current version and the fingerprint of the maintainer's public key. Subsequent versions would need to carry a valid signature created with the maintainer's private key.
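The rule itself fits in a few lines. A minimal Python sketch of such a release gate, assuming a hypothetical directory that records who uploaded a version and which trusted reviewers confirmed it; none of these names come from a real package directory.

```python
from dataclasses import dataclass, field


@dataclass
class PendingRelease:
    """A new version that has been uploaded but not yet made available."""
    package: str
    version: str
    publisher: str
    approvals: set[str] = field(default_factory=set)

    def approve(self, reviewer: str, trusted_reviewers: set[str]) -> None:
        """Record a confirmation from a trusted person other than the publisher."""
        if reviewer == self.publisher:
            raise PermissionError("publishers cannot confirm their own release")
        if reviewer not in trusted_reviewers:
            raise PermissionError(f"{reviewer} may not review {self.package}")
        self.approvals.add(reviewer)

    def is_available(self) -> bool:
        """Only confirmed versions would be served to users of the directory."""
        return len(self.approvals) > 0


release = PendingRelease("example-json", "2.7", publisher="maintainer")
print(release.is_available())  # False
release.approve("foundation-reviewer", trusted_reviewers={"foundation-reviewer"})
print(release.is_available())  # True
```

With this gate, a stolen API key lets an attacker upload a version but not make it available without a second person's confirmation.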
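The hash part of this scheme is easy to sketch with Python's standard hashlib module. Verifying the maintainer's signature would additionally require a public-key signature library, which the standard library does not provide, so in this minimal sketch the key fingerprint is only recorded, not checked. The PinnedDependency record and its field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedDependency:
    name: str
    version: str
    sha256: str           # hex digest recorded when the dependency was first added
    key_fingerprint: str  # maintainer's public key; not checked in this sketch


def verify_download(dep: PinnedDependency, archive: bytes) -> None:
    """Refuse an archive whose content does not match the pinned hash."""
    actual = hashlib.sha256(archive).hexdigest()
    if actual != dep.sha256:
        raise ValueError(
            f"{dep.name} {dep.version}: expected sha256 {dep.sha256}, got {actual}"
        )


# Hypothetical archive bytes; the pin is computed from them only to keep
# the example self-contained.
archive = b"contents of example-json-2.6.tar.gz"
dep = PinnedDependency("example-json", "2.6",
                       hashlib.sha256(archive).hexdigest(), "hypothetical-fingerprint")
verify_download(dep, archive)  # passes silently
try:
    verify_download(dep, archive + b"!")  # tampered copy, e.g. from a compromised directory
except ValueError as err:
    print("rejected:", err)
```

Because the pin lives in the user's project rather than in the directory, a compromised directory cannot silently swap the contents of a version that is already pinned.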
Sometimes it makes sense to release not a single library but a group of libraries, from which users can pick the functionality a specific use case needs. If those libraries share similar names but there is no protected namespace, other actors can easily release similarly named libraries, which then receive undue trust.
Without namespaces, it is also likely that the simplest name for a piece of functionality is taken by the earliest published library, giving it the air of the "standard" implementation even though it might contain serious flaws. Better implementations published later would have to use more obscure names. With namespaces, it is easier to tell which library is just a personal hobby project and which is an actual authoritative standard implementation.
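A protected namespace essentially comes down to one ownership check at publish time. A minimal Python sketch of a hypothetical registry (not the API of LuaRocks, crates.io or any other real directory), where package names have the form "namespace/name" and only the owners of a namespace may publish under it:

```python
class Registry:
    """Hypothetical package directory with owner-protected namespaces."""

    def __init__(self) -> None:
        self._owners: dict[str, set[str]] = {}  # namespace -> accounts allowed to publish
        self._publishers: dict[str, str] = {}   # "namespace/name" -> publishing account

    def claim_namespace(self, namespace: str, account: str) -> None:
        """Give a new namespace to its first claimant; taken namespaces are refused."""
        if namespace in self._owners:
            raise PermissionError(f"namespace {namespace!r} is already taken")
        self._owners[namespace] = {account}

    def publish(self, full_name: str, account: str) -> None:
        """Accept a package only from an owner of its namespace."""
        namespace, sep, name = full_name.partition("/")
        if not sep or not namespace or not name:
            raise ValueError("package names must look like 'namespace/name'")
        if account not in self._owners.get(namespace, set()):
            raise PermissionError(f"{account} cannot publish in {namespace!r}")
        self._publishers[full_name] = account


registry = Registry()
registry.claim_namespace("alice", "alice")
registry.publish("alice/json-core", "alice")
try:
    registry.publish("alice/json-extra", "mallory")  # look-alike package is refused
except PermissionError as err:
    print(err)  # mallory cannot publish in 'alice'
```

An imitator could still claim a namespace of their own, but the package would then appear as "mallory/json-extra", visibly separate from the original group.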
Here is an idea that I have not yet tested myself: would it make sense to put the downloaded source code of all your project's dependencies into source control? That way, your source control history would show not just changed version numbers but also the actual code changes those new versions introduced. The repository might grow dramatically, but wouldn't that be a more honest reflection of your project's actual complexity? It could also help when the online package directory is unreachable.
After writing the above paragraph, I noticed that the Debian project actually includes all dependencies of the Rust compiler in its source repository, though I do not know whether this is used for security reviews.
I am convinced that software security can be dramatically improved when attention is paid to supply chain issues. If, however, these topics are ignored or downplayed, I fear they could cause even more vulnerabilities than, for example, code written in memory-unsafe programming languages.