Feature Request: consistency between what is published on NPM and the source code published on public repos


(Vincenzo Ferrari) #1

What’s the feature?
When a new package is published on the NPM registry, no one can guarantee that its code is the same as the code published on GitHub (or on another public repo).

So, I’ve created a Proof of Concept, called SNPM, in which, instead of pushing their code directly to NPM, users push it to a public repo (like GitHub) and then ask NPM to fetch it.
This procedure is validated using checksums of what is being uploaded.

What problem is the feature intended to solve?
Consistency between NPM packages and their source code published on public repos, like Github.

Is the absence of this feature blocking you or your team? If so, how?
No, but it would improve security and help prevent situations like these: (on Github) eslint/eslint-scope/issues/39 and (on Twitter) npmjs/status/1017517577038450693

Is this feature similar to an existing feature in another tool?
(on Github) npm/npm/issues/19539

Is this a feature you’re prepared to implement, with support from the npm CLI team?
Yes, it’d be nice.

I’ve already implemented a Proof of Concept: SNPM
I’ve also explained the concept with a detailed article: Secure NPM

I hereby add another proposal: npm verify

What’s that?
A command to verify the package checksums.

How?
It does exactly what NPM should do in an SNPM context (as defined in this article): download the sources from GitHub, build them, calculate the checksum and verify it.
That way, even the end user can verify the package effortlessly by typing:

$ npm verify

or

$ npm verify <package_name>
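A minimal sketch of that flow as a POSIX shell function. `npm_verify` is a hypothetical helper, not an npm command. It assumes the release is tagged `v<version>` in the repository, that `npm ci` followed by npm's normal pack step reproduces the published build (packages with a separate build script would need that step added), and that the registry metadata includes `dist.integrity`, which `npm view` can print.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed "npm verify" flow; not an existing npm command.
set -u

# Print the first "integrity" value found in JSON read from stdin.
json_integrity() {
  grep -o '"integrity": *"[^"]*"' | head -n 1 | sed 's/.*: *"\([^"]*\)"$/\1/'
}

# npm_verify PKG VERSION REPO_URL
# The body runs in a subshell so the EXIT trap always removes the temp checkout.
npm_verify() (
  pkg="$1" version="$2" repo="$3"
  workdir="$(mktemp -d)" || exit 2
  trap 'rm -rf "$workdir"' EXIT

  # 1. Integrity of the tarball the registry is serving.
  published="$(npm view "${pkg}@${version}" dist.integrity)" || exit 2

  # 2. Fetch the sources. Assumption: the release is tagged "v<version>".
  git clone --quiet --depth 1 --branch "v${version}" "$repo" "$workdir" || exit 2

  # 3. Install dependencies and compute the packed tarball's integrity
  #    without writing it. Assumption: this reproduces the published build.
  rebuilt="$(cd "$workdir" && npm ci >/dev/null 2>&1 \
    && npm pack --dry-run --json 2>/dev/null | json_integrity)" || exit 2

  # 4. Compare the two checksums.
  if [ -n "$published" ] && [ "$published" = "$rebuilt" ]; then
    echo "OK ${pkg}@${version} ${published}"
  else
    echo "MISMATCH ${pkg}@${version}: registry=${published:-none} rebuilt=${rebuilt:-none}" >&2
    exit 1
  fi
)
```

Usage, with placeholder names: `npm_verify pkg 1.2.3 https://github.com/usr/pkg.git`. Exit status 0 means the checksums matched, 1 a mismatch, and 2 a failure somewhere along the way.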

How does it sound?


(Vincenzo Ferrari) #2

Is the absence of this feature blocking you or your team? If so, how?
No, but it would improve security and help prevent situations like these: https://github.com/eslint/eslint-scope/issues/39 and https://twitter.com/npmjs/status/1017517577038450693


(Jordan Harband) #3

The source code isn’t necessarily on Github or even available anywhere.

You can use unpkg.com to inspect the code that’s actually published; is this not sufficient?


(Todd Kennedy) #4

This would make publishing private packages extremely difficult: some people might not be using GitHub (or git at all). It would also generate a lot more work on the registry end of things.


(Kat Marchán) #5

You can do this with the latest versions of npm as long as the package was published with a recent version and isn’t doing anything funny with its build!

npm supports reproducible builds, so you can do checksum-level verification by running:

$ npm pack --dry-run --json <registry-pkg> | grep integrity

And then repeat the command with github:usr/pkg#semver:1.2.3.

The checksums should match. If they don’t, I encourage y’all to talk to your maintainers about publishing with the latest npm version and fixing any build shenanigans that are stopping this from working.
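The two-command check above can be wrapped in a small shell helper. This is only a sketch: `integrity_of` and `same_tarball` are hypothetical names, and the `grep`/`sed` extraction relies on the `"integrity": "..."` key that `npm pack --json` prints instead of using a real JSON parser.

```shell
#!/bin/sh
# Compare the tarball integrity npm computes for two package specs,
# e.g. a registry version and the matching git tag.
set -u

# Print the integrity of the tarball npm would build for a spec, without writing it.
integrity_of() {
  npm pack --dry-run --json "$1" 2>/dev/null \
    | grep -o '"integrity": *"[^"]*"' | head -n 1 \
    | sed 's/.*: *"\([^"]*\)"$/\1/'
}

# Succeed only if both specs produce the same, non-empty integrity.
same_tarball() (
  a="$(integrity_of "$1")"
  b="$(integrity_of "$2")"
  echo "$1: ${a:-<none>}"
  echo "$2: ${b:-<none>}"
  [ -n "$a" ] && [ "$a" = "$b" ]
)
```

With the placeholders used above: `same_tarball pkg@1.2.3 'github:usr/pkg#semver:1.2.3' && echo match`. A missing integrity on either side counts as a mismatch rather than a pass.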

One thing you could do from here: make an RFC that describes a new command, or an npm audit subcommand that lets you automate this process for individual packages. It won’t be possible to verify all your deps and transitive deps until more packages are published with a repro-build-capable version of npm, so it would just end up breaking most of the time. But I think an individual checker would be a very good step in this direction and can eventually become a whole-tree verifier (and can probably be the thing we use to do deep whole-tree package signature verification in the future, once that feature lands). Something like npm audit check-integrity or such? Or just npm verify-tree <pkg> (eventually just npm verify-tree).


(Vincenzo Ferrari) #6

First of all, thanks to everyone for joining this discussion!
I do really appreciate it! :smile:

@ljharb: if the source code is not available, then the package should be marked as “closed” or “proprietary”.
Here, I’m concerned with open source software.
But in general, I want to be sure that the packages I download from NPM are safe, and one of the best ways to do that is to guarantee consistency between the source code and the binaries.
I didn’t know about unpkg (thanks for sharing), but I think this job should be done by NPM during the publishing phase, not afterwards by the user for a specific version of the package (a preventive strategy rather than a curative one).

@toddself: as I said above, private packages should be marked as “closed” and could follow a different publishing path.
Yes, it would require extra work for the registry, but I think it’s worth it.

@zkat: I don’t understand what the integrity property refers to.
I tried with facebook/react and it returned two different checksums:

$ npm pack --dry-run --json react | grep integrity

    "integrity": "sha512-7eocFH2ryezvBVXJbptblDSuLAQa8nOSDdAYtv/CHTG0btXuC1axHhVV6W8KVdWMNq7cF/w9Z/xVuoEK6IzXhQ==",

$ npm pack --dry-run --json github:facebook/react#semver:16.4.1 | grep integrity

    "integrity": "sha512-3GEs0giKp6E0Oh/Y9ZC60CmYgUPnp7voH9fbjWsvXtYFb4EWtgQub0ADSq0sJR0BbHc4FThLLtzlcFaFXIorwg==",

Maybe I’m making a mistake here, but I don’t understand which files that checksum was calculated over.
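For context on what is being hashed: the `integrity` value that `npm pack --json` prints is a Subresource Integrity (SRI) string, `sha512-` followed by the base64-encoded SHA-512 digest of the packed `.tgz` file itself. The registry tarball and the one npm builds from the GitHub checkout therefore only match if the two tarballs are byte-for-byte identical; a different file set or different file contents change the digest. A minimal sketch that computes the same format for any file with OpenSSL (`sri_sha512` is a hypothetical name):

```shell
#!/bin/sh
# Print an SRI string ("sha512-" + base64 of the raw SHA-512 digest) for a file.
set -eu

sri_sha512() {
  # -binary emits the raw digest; "base64 -A" keeps the encoding on one line.
  printf 'sha512-%s\n' "$(openssl dgst -sha512 -binary < "$1" | openssl base64 -A)"
}
```

Running `sri_sha512` on a tarball written by `npm pack` (e.g. `react-16.4.1.tgz`) should print the same string as the `integrity` field npm reports for that tarball.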

The main purpose of npm verify is to check the consistency between the source code and the binaries, not to check the entire tree deps (so, yes, it would be used only for new packages).
Anyway, I think we’re missing the real point here: NPM should prevent open source packages from being published without an integrity check against the source code (stored online, not on the file system of the author’s computer), especially packages that are the result of a build/compilation/transpilation process.

However I’ll make an RFC for npm verify :muscle:


(Kat Marchán) #7

@wilk it doesn’t work for React partly because they have a monorepo, partly because it’s unclear their build process produces reproducible tarballs.

If it does produce reproducible tarballs, you can just repeat the npm pack command at the right level with the expected publish settings for the react package, and check against that.


(Kat Marchán) #8

Here’s a working example:

➜ npm pack 'zkat/pacote#v8.1.6' --dry-run --json | grep integrity
    "integrity": "sha512-wTOOfpaAQNEQNtPEx92x9Y9kRWVu45v583XT8x2oEV2xRB74+xdqMZIeGW4uFvAyZdmSBtye+wKdyyLaT8pcmw==",

➜ npm pack pacote@8.1.6 --dry-run --json | grep integrity
    "integrity": "sha512-wTOOfpaAQNEQNtPEx92x9Y9kRWVu45v583XT8x2oEV2xRB74+xdqMZIeGW4uFvAyZdmSBtye+wKdyyLaT8pcmw==",

(Vincenzo Ferrari) #9

Ok, got it, thanks.

However, this does not solve the underlying issue: authors can still publish code that differs from what the community expects to find in the public repo.
I mean, it’s good to have tools that let you check whether a specific version is exactly what you find on GitHub (even if they don’t work all the time), but every published version should be verified by NPM first.

Let’s see this issue from another point of view: what does NPM provide to avoid a situation like this one -> https://hackernoon.com/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5 ?


(Jordan Harband) #10

Since any package authored with babel, or coffeescript, or typescript has published code (that’s almost always gitignored), and thus doesn’t match the repo, how would you handle this case?

Many packages also npmignore some files. I feel like the common case is that the npm package doesn’t exactly match the git repo.
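One way to see that gap for a particular package is to compare the files npm would publish with the files git tracks. A rough sketch, run from a package root: it assumes the `"path"` entries in the `files` array of `npm pack --dry-run --json` output, and uses `comm`, which needs both lists sorted the same way. The function names are hypothetical.

```shell
#!/bin/sh
# Contrast what npm would publish with what git tracks (run from a package root).
set -u

# Paths listed in the "files" array of `npm pack --dry-run --json`.
packed_files() {
  npm pack --dry-run --json 2>/dev/null \
    | grep -o '"path": *"[^"]*"' \
    | sed 's/.*: *"\([^"]*\)"$/\1/' | LC_ALL=C sort -u
}

tracked_files() {
  git ls-files | LC_ALL=C sort -u
}

# Subshell body so the temp directory is always cleaned up.
report_drift() (
  tmp="$(mktemp -d)" || exit 2
  trap 'rm -rf "$tmp"' EXIT
  tracked_files > "$tmp/tracked"
  packed_files > "$tmp/packed"
  echo "published but not tracked by git (e.g. build output):"
  LC_ALL=C comm -13 "$tmp/tracked" "$tmp/packed"
  echo "tracked by git but not published (e.g. .npmignore or \"files\"):"
  LC_ALL=C comm -23 "$tmp/tracked" "$tmp/packed"
)
```

For a Babel or TypeScript package, the first list would typically hold the compiled output and the second the original sources, which is exactly the mismatch described above.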


(Vincenzo Ferrari) #11

@ljharb an author can publish two kinds of packages: private (or closed, proprietary) and public (or open).
For private packages (those whose source code is not hosted in a public repo), NPM should warn consumers who download them, so the responsibility rests with those consumers.
For public packages, NPM should attest the consistency between the source code and what’s published on the registry.

Now, for public packages, there are three scenarios:

  1. an author may mirror the whole source code on the NPM registry
  2. an author may publish part of the source code (npmignore, for instance)
  3. an author may publish an artifact as a result of a building process (typescript, babel, minification, etc)

As you can see, it always comes down to a build process:

  1. build(sources) -> sources: this is the first case. The build produces the exact copy of the source code (mirroring, actually no-build)
  2. build(sources) -> partial sources: this is the second case. The build produces an artifact composed of part of the sources.
  3. build(sources) -> artifact: this is the third case. The build produces an artifact that has nothing in common with the sources.

In every case, NPM should attest that the artifact published on the registry comes from those sources.
Currently, NPM trusts the author, but this should change out of respect for consumers.

Is that THE solution? Not really.
This is just an improvement that pushes open source packages to be more compliant with the open source manifesto.


(Vincenzo Ferrari) #12

It seems Cargo does something like SNPM: https://doc.rust-lang.org/cargo/reference/publishing.html#github-permissions


(Artem Varaksa) #13

No, crates.io just uses GitHub as its only authorization method right now.

Thus it requires the read:org GitHub permission if you want to publish a crate as a team owner, otherwise, as per your link:

you will never be able to add a team as an owner, or publish a crate as a team owner

The only thing I’m aware of is that Cargo checks if the local git repository is not dirty, but it does not check if it is pushed to the remote.
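For comparison, an npm author could approximate both checks (a clean working tree, and HEAD already pushed) with a pre-publish guard. A hedged sketch with a hypothetical function name: it assumes the current branch has an upstream configured, and it compares against the local remote-tracking ref, so the result can be stale unless you run `git fetch` first.

```shell
#!/bin/sh
# Refuse to publish from a dirty tree or from commits missing on the remote.
set -u

ensure_clean_and_pushed() {
  # Any output from --porcelain means modified, staged or untracked files.
  if [ -n "$(git status --porcelain)" ]; then
    echo "refusing to publish: uncommitted changes in working tree" >&2
    return 1
  fi
  # @{u} is the upstream of the current branch; this fails if none is set.
  if ! git rev-parse --verify --quiet '@{u}' >/dev/null; then
    echo "refusing to publish: no upstream branch configured" >&2
    return 1
  fi
  # Commits reachable from HEAD but not from the upstream ref.
  if [ -n "$(git rev-list '@{u}..HEAD')" ]; then
    echo "refusing to publish: HEAD has commits that are not pushed" >&2
    return 1
  fi
}
```

npm runs a `prepublishOnly` script only on `npm publish`, so one option is `"prepublishOnly": "sh -c '. ./scripts/publish-guard.sh && ensure_clean_and_pushed'"`, where the script path is a placeholder. This only protects authors from their own mistakes; it does not give consumers the registry-side guarantee discussed in this thread.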


(Vincenzo Ferrari) #14

You’re right, my bad, sorry.


(Aleksei Gurianov) #15

Docker Hub has similar concept called AUTOMATED BUILD. https://docs.docker.com/docker-hub/builds/
Though monorepos make following this concept really hard.
But I think we should still explore this direction.
The NPM team could encourage the community to use automated builds by introducing warning messages and reports about non-automated packages in the dependency tree.


(Bali Bebas!) #16

Here’s a practical example:

npm pack --dry-run --json . | grep integrity
npm pack --dry-run --json package | grep integrity

Where . is a directory representing the contents of the gzipped tarball uploaded to NPM and package is the name of a package in the NPM registry. If you create a git tag before releasing with npm, the archive built from that tag should produce the same SHA-512 digest as the .tgz archive npm publishes.

This is useful as it can be used to check if what’s installed is, in fact, the untampered latest release:

https://registry.npmjs.org/package/latest

It also gives you a deterministic method for comparing hashes before, during and after a release.
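The registry URL above returns the metadata for the latest version, including `dist.integrity`. A rough sketch that compares it with the tarball npm would build from the current directory. `check_latest` is a hypothetical helper; it assumes `curl` is installed, an unscoped package name, and a working copy checked out at the released tag, and it greps the JSON rather than parsing it.

```shell
#!/bin/sh
# Compare the local package's tarball integrity with the registry's latest release.
set -u

# Print the first "integrity" value found in JSON read from stdin.
json_integrity() {
  grep -o '"integrity": *"[^"]*"' | head -n 1 | sed 's/.*: *"\([^"]*\)"$/\1/'
}

# check_latest NAME  (run from the package root)
check_latest() (
  remote="$(curl -fsS "https://registry.npmjs.org/$1/latest" | json_integrity)"
  built="$(npm pack --dry-run --json . 2>/dev/null | json_integrity)"
  echo "registry: ${remote:-<none>}"
  echo "local:    ${built:-<none>}"
  [ -n "$remote" ] && [ "$remote" = "$built" ]
)
```

If the local checkout is not at the latest release, a mismatch is expected and says nothing about tampering.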