Let's get rid of root-owned files in the cache!

You know that feeling when you go chasing what you think is a relatively small bug, and it turns out to be a whole nest?

For v6.10.1, we addressed an issue with EISDIR errors when running as root. And, a fix in pacote@9.5.2 addresses getting root-owned files in the cache when installing.

But I like sweeping fixes and tests where possible, and we have this nice big suite of integration tests.

On the isaacs/prevent-root-owned-cache-files branch, I’ve been working on this problem.

It’s a work-in-progress branch, and it’ll get rebased and cleaned up later. The commit history is a bit of a mess at the moment. But in a nutshell, this does the following:

  • Update all the tests so that they use a known cache folder, instead of creating and removing their own cache folder. (Somewhat like the common.pkg that made parallel tests possible.)
  • Fixed a few random cases (log files and metrics) that I knew wouldn’t be going through pacote.
  • Added a npm run sudotest script so that you can run all the tests as root. (If this fails, you may need to so sudo killall -9 node to stop any mock registries that your user account can’t kill directly.) (Took some work to make the tests all pass in this mode, because some rely on scripts being run, so they needed --unsafe-perm set.)
  • If the test is being run as uid 0, and there’s a SUDO_UID and SUDO_GID in the environment, then it scans that folder on teardown, and verifies that nothing in it is owned by uid 0.

Holy moly. A lot of stuff fails.

test/tap/access.js
test/tap/adduser-always-auth.js
test/tap/adduser-legacy-auth.js
test/tap/adduser-oauth.js
test/tap/adduser-saml.js
test/tap/all-package-metadata-cache-stream-unit.js
test/tap/all-package-metadata-entry-stream-unit.js
test/tap/all-package-metadata-update-stream-unit.js
test/tap/all-package-metadata-write-stream-unit.js
test/tap/cache-shasum-fork.js
test/tap/dist-tag.js
test/tap/doctor.js
test/tap/hook.js
test/tap/link.js
test/tap/logout-scoped.js
test/tap/logout.js
test/tap/optional-metadep-rollback-collision.js
test/tap/org.js
test/tap/owner.js
test/tap/prepublish-only.js
test/tap/publish-access-scoped.js
test/tap/publish-access-unscoped.js
test/tap/publish-invalid-semver-tag.js
test/tap/publish-scoped.js
test/tap/publish.js
test/tap/search.all-package-search.js
test/tap/search.js
test/tap/sorted-package-json.js
test/tap/team.js
test/tap/unpack-foreign-tarball.js
test/tap/update-examples.js
test/tap/url-dependencies.js

It looks like anything using npm-registry-fetch drops root-owned files in the cache when run as root.

I have to put this down until at least Monday or Tuesday. If anyone wants to help me dig in, it’d be great. My guess is that there’s a spot where we’re loading up npm-registry-fetch and need to be passing in a uid and gid.

I find it handy to run npm run sudotest -- -b -s sudofails.txt || sudo killall -9 node to work through the tests and just re-run failures.

Sound like fun? This is probably the most common issue being reported on here these days, so let’s squash it once and for all.

(It’d also be good to get pacote and npm-registry-fetch themselves up to 100% test coverage.)

If you’re interested in helping, or just have clues or debugging to share, let me know. You can send a PR against https://github.com/npm/cli/tree/isaacs/prevent-root-owned-cache-files and I’ll pull those in as well. Everyone who pitches in gets an honorable mention in the 6.10.2 release notes, as well as the gratitude of a few million users :smiley:

Figured out the root cause of a bunch of these. Maybe all of them. Plenty of tedious typing to be done, but it should be straightforward enough to plow through in time for the 6.10.2.

Explanation here: https://github.com/npm/cli/commit/fe25213feaa1fd31c35ae5a005e5c03faeabf77c

Help still wanted. Now you’ve got a pointer :slight_smile:

1 Like

Note: I think that passing uid/gid arguments around through all the lib/whatever.js -> libnpmwhatever -> npm-registry-fetch -> make-fetch-happen -> cacache calls, and inferring from the SUDO_* environs, is not ideal.

Rather than take a uid/gid option, cacache should infer the uid and gid from the ownership of any folder and file it creates, and try to maintain that.

The patch referenced in my previous message shows why it’s breaking down, and is a record of my own investigation as to why the presence of uid/gid arguments wasn’t solving the problem. But the better solution is to DTRT all the time, regardless of whether SUDO_UID is set, and remove yet another baton that must be carefully passed all the way down the line.

Ok! cacache@12 fixes this problem. But it’s a breaking change. make-fetch-happen@5.0.0 pulls that in. npm-registry-fetch@4.0.0 pulls that in. I have a lot more semver-major updates to push, and then this’ll be mostly done.

I’ve got all the tests passing now, though, once they have updated versions of everything (patched in place just to make sure it was working before I go off to semver-major-update the world.)

Tomorrow I’ll power through those updates, and then 6.10.2 should have no root owned files in the cache ever. It’s just a typing problem now.

1 Like

Alright, all the updates are published, and it’s running sudo tests on Travis-CI. https://travis-ci.com/npm/cli/jobs/216611827

Once this is green, I’ll restructure the commit history so it’s not such a mess, and push out a prerelease on npm@next that we can announce in the mailing list this week. Next week, I’ll cut the real release and move it over to npm@latest.

The #bugs category is about to get a lot less repetitive, I hope.

2 Likes

This has landed in Release: 6.10.2-next.0 and I can no longer reproduce these errors myself. Neither can travis-ci.

I’m closing this.