npm randomly hangs on install


(Jordie23) #1

What I Wanted to Do

Run npm install and have everything install.

What Happened Instead

When it doesn’t work, npm seems to do one of two things:

  • Primary issue: It progresses part way, then hangs and never progresses further. We wanted to see whether it would time out at some point; one time we let it run for 11 hours before having to stop it.
  • Secondary (possibly related?) issue: Complains that the cache file is malformed JSON and dies.

Reproduction Steps

It happens randomly on npm install. We are using private packages, so I can’t just post everything here, but I can post a redacted log if that helps.


Regarding the hanging:

It’s hard to reliably replicate the problem, but when it does happen it completely stalls our work. Whenever it happens we check the npm status page, and there are no reported issues.

We have tried with verbose logging and even --loglevel silly, but as it doesn’t output any information until whatever it is doing is finished, we never know what exactly it is hanging on. The last line is always the last successful operation. The last line usually appears to be a successful extraction, e.g.

sill extract has-flag@3.0.0 extracted to /var/lib/....../node_modules/@babel/highlight/node_modules/has-flag (2ms)

It’s not usually the same package, so it appears to be hanging at different points. Is there something else we can look at in this output?
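One way to at least bound the hang and automatically record the last log line is to wrap the install in a hard timeout. This is a minimal sketch, not something npm provides: `runWithTimeout` is a hypothetical helper, and the 10-minute budget in the usage comment is an assumed value. It assumes npm writes its log output to stderr.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a command, kill it once `timeoutMs` has elapsed,
// and report the last non-empty line it wrote to stderr.
function runWithTimeout(command, args, timeoutMs) {
  const result = spawnSync(command, args, {
    encoding: 'utf8',
    timeout: timeoutMs,
    killSignal: 'SIGKILL',
    maxBuffer: 64 * 1024 * 1024, // silly-level logs can be large
  });
  // spawnSync reports a timeout through result.error with code ETIMEDOUT.
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  const lines = (result.stderr || '')
    .split('\n')
    .filter((line) => line.trim() !== '');
  return {
    timedOut: timedOut,
    status: result.status,
    lastLine: lines.length > 0 ? lines[lines.length - 1] : null,
  };
}

// Usage (assumed 10-minute budget):
// const r = runWithTimeout('npm', ['install', '--loglevel', 'silly'], 10 * 60 * 1000);
// if (r.timedOut) console.error('npm install stalled after:', r.lastLine);
```

This only tells us where the output stopped, not why, but on CI it turns an indefinite hang into a failure that can be retried, and collecting `lastLine` across several stalled runs may show whether a pattern exists.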

Regarding the JSON error:

When I check the cache file on the system, it appears the JSON blob is suddenly cut off. E.g. the end of the file just goes ....."eslint-config-airbnb":"^12.0.0","eslint-config-google":"^0.6.0","eslint-plugin-impo then EOF, with nothing more after that. So either the JSON is not validated before being cached, or writing to the file is cut off abruptly somehow?

While experimenting with the responses from the registry, I noticed that when fetching package information for one of our private npm packages, the CDN (Cloudflare) omits the Content-Length header when it doesn’t have a cached version and includes it when Cloudflare reports a HIT. Not sure if this affects how the cache is stored? The response is 5.97 MB uncompressed and about 286.09 kB compressed (the number reported when there is a Content-Length header, which is correct). The malformed cache file was around 380 kB.
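To illustrate why the missing header might matter: with a Content-Length header a client can tell whether it received the whole body, and without one it cannot check by length alone. The sketch below is only an illustration of that idea, not how npm handles responses. `checkBodyLength` is a hypothetical function, and it assumes header names are lowercased (as in Node's `res.headers`) and that `bytesReceived` is counted before decompression, since Content-Length describes the compressed bytes on the wire.

```javascript
// Hypothetical check: compare the bytes actually received with the
// Content-Length header, when the server sent one.
function checkBodyLength(headers, bytesReceived) {
  const raw = headers['content-length'];
  if (raw === undefined) {
    // No header (e.g. a CDN cache miss): length cannot be verified this way.
    return { verifiable: false, complete: null };
  }
  const expected = Number.parseInt(raw, 10);
  if (Number.isNaN(expected)) {
    return { verifiable: false, complete: null };
  }
  return { verifiable: true, complete: bytesReceived === expected };
}

// Usage:
// checkBodyLength({ 'content-length': '1000' }, 640)
//   reports a verifiable but incomplete body.
```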

We have tried npm cache clear --force when there is a JSON error, and that resolves it barring any further hanging, but it’s not a proper solution, especially for our CI system.

Should the JSON cache be validated before saving? Or, if it is invalid on reading, could it be deleted and ignored, with npm retrying directly from the registry instead of dying?
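To make the suggestion concrete, here is a rough sketch of the behaviour I have in mind. It is not npm's actual cache code: `readJsonCache`, `writeJsonCacheAtomically`, and the `refetch` callback are hypothetical names, and the cache is assumed to be one plain JSON file per entry.

```javascript
const fs = require('fs');

// Hypothetical: write to a temporary file and rename it into place, so an
// interrupted write never leaves a half-written file at `cachePath`.
function writeJsonCacheAtomically(cachePath, value) {
  const tmpPath = cachePath + '.' + process.pid + '.tmp';
  fs.writeFileSync(tmpPath, JSON.stringify(value));
  fs.renameSync(tmpPath, cachePath);
}

// Hypothetical: return the cached value, or if the file cannot be read or
// parsed, discard it, call `refetch()` and cache the fresh result.
function readJsonCache(cachePath, refetch) {
  try {
    return JSON.parse(fs.readFileSync(cachePath, 'utf8'));
  } catch (err) {
    // Missing, truncated or malformed entry: remove it instead of dying.
    try {
      fs.unlinkSync(cachePath);
    } catch (_) {
      // The file was already gone; nothing to clean up.
    }
    const fresh = refetch();
    writeJsonCacheAtomically(cachePath, fresh);
    return fresh;
  }
}

// Usage (fetchFromRegistry is a placeholder for a real registry request):
// const doc = readJsonCache('/tmp/example-packument.json', () => fetchFromRegistry());
```

With something like this, a cut-off file such as the one above would cost one extra registry request rather than failing the whole install.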

Platform Info

It happens for our team randomly on our laptops (macOS) and also on our CI system (Ubuntu 16.04). All npm versions are 6.4.1, on Node 8.

$ npm --versions

 My local:
{ npm: '6.4.1',
  ares: '1.10.1-DEV',
  cldr: '32.0',
  http_parser: '2.8.0',
  icu: '60.1',
  modules: '57',
  nghttp2: '1.25.0',
  node: '8.11.1',
  openssl: '1.0.2o',
  tz: '2017c',
  unicode: '10.0',
  uv: '1.19.1',
  v8: '6.2.414.50',
  zlib: '1.2.11' }

Our CI node:

{ npm: '6.4.1',
  ares: '1.10.1-DEV',
  cldr: '32.0',
  http_parser: '2.8.0',
  icu: '60.1',
  modules: '57',
  napi: '3',
  nghttp2: '1.32.0',
  node: '8.12.0',
  openssl: '1.0.2p',
  tz: '2017c',
  unicode: '10.0',
  uv: '1.19.2',
  v8: '6.2.414.66',
  zlib: '1.2.11' }
$ node -p process.platform
local: darwin
CI: linux

We are located in Australia. Our CI system and some developers are in Sydney and hit Cloudflare’s SYD edge; other developers are in Melbourne and hit Cloudflare’s MEL edge cache.

The problem is happening on 3 completely different networks and ISPs. All are high speed connections. We are not using any proxies on any connection.

Let me know if I can provide any further information to help.

(Jordie23) #2

Is there any debugging I can do to see what is hanging exactly? Or what was the last URL accessed by npm?

(Kat Marchán) #3

You can usually use -ddd or --loglevel silly if npm is hanging to see what the last thing it did was. If it hangs like this, it’s usually a network problem. If it hangs longer than 30s, idk what it might be.