SSL handshake failures on registry.npmjs.org


(Gavin Aiken) #1

We’re seeing intermittent SSL handshake errors on registry.npmjs.org. This primarily affects npm and jspm on our CI build server, but we have been able to reproduce the SSL error with a simple curl command and a node script. Only about 1 in 20 requests fails, but that is enough to break most builds, given how many npm and jspm commands each build runs.

There is no proxy involved; our servers have direct outbound access. We are using node v10.15.0 and npm 6.7.0.

Here are some sample commands with the error we see when the SSL handshake fails.

# npm install --cache /tmp/empty-cache express
npm ERR! code EPROTO
npm ERR! errno EPROTO
npm ERR! request to https://registry.npmjs.org/content-type failed, reason: write EPROTO 140504976476032:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/record/rec_layer_s3.c:1407:SSL alert number 40
npm ERR! 

npm ERR! A complete log of this run can be found in:
npm ERR!     /tmp/empty-cache/_logs/2019-01-29T20_34_38_620Z-debug.log
$ jspm install -y
... some output deleted...
warn Error on lookup for npm:aurelia-ui-virtualization
     Error: write EPROTO 140049876600704:error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/record/rec_layer_s3.c:1407:SSL alert number 40

         at WriteWrap.afterWrite [as oncomplete] (net.js:788:14)

err  Error looking up npm:aurelia-ui-virtualization.

Here’s a simple node script which illustrates the problem:

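const https = require('https');

// Request one small package document and report which registry IP answered.
const req = https.get('https://registry.npmjs.org/indexof', (res) => {
  console.log('statusCode:', res.statusCode, 'ip:', res.socket.remoteAddress);
  res.resume(); // drain the response so the socket is released
});

// A failed TLS handshake surfaces here as an EPROTO error.
req.on('error', (e) => {
  console.error(e);
});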

The error when that fails:

{ Error: write EPROTO 140413770143616:error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/s23_clnt.c:802:
 
    at _errnoException (util.js:992:11)
    at WriteWrap.afterWrite [as oncomplete] (net.js:864:14) code: 'EPROTO', errno: 'EPROTO', syscall: 'write' }

And here’s a curl command which also shows the error:

# curl -v https://registry.npmjs.org
* About to connect() to registry.npmjs.org port 443 (#0)
*   Trying 104.16.24.35...
* Connected to registry.npmjs.org (104.16.24.35) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* NSS error -12286 (SSL_ERROR_NO_CYPHER_OVERLAP)
* Cannot communicate securely with peer: no common encryption algorithm(s).
* Closing connection 0
curl: (35) Cannot communicate securely with peer: no common encryption algorithm(s).

Note that in all cases (the npm and jspm commands, the test script, and curl), roughly 19 out of 20 attempts succeed. This looks like one or more bad servers in the farm that serves registry.npmjs.org, which we only hit occasionally.
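For anyone who wants to put a number on the failure rate, the test script can be wrapped in a loop. The following is only a sketch: the helper names `probeOnce` and `tally` are made up for illustration, and it assumes a fresh connection per request (`agent: false`) so that every attempt performs its own TLS handshake.

```javascript
const https = require('https');

// Make one request and resolve with its outcome instead of throwing,
// so a failed handshake is counted rather than aborting the run.
// (probeOnce is a hypothetical helper, not part of any library.)
function probeOnce(url) {
  return new Promise((resolve) => {
    let ip = 'unknown';
    // agent: false opens a new connection (and TLS handshake) per request.
    const req = https.get(url, { agent: false }, (res) => {
      res.resume(); // discard the body so the socket is released
      resolve({ ok: true, ip: res.socket.remoteAddress || ip });
    });
    // Remember the peer address at TCP connect time, because it may no
    // longer be available on the socket after an error.
    req.on('socket', (socket) => {
      socket.on('connect', () => { ip = socket.remoteAddress || ip; });
    });
    // Any request error counts as a failure, not only EPROTO.
    req.on('error', (e) => resolve({ ok: false, ip, code: e.code }));
  });
}

// Group results by remote IP and count totals and failures.
function tally(results) {
  const byIp = new Map();
  for (const { ok, ip } of results) {
    const entry = byIp.get(ip) || { total: 0, failed: 0 };
    entry.total += 1;
    if (!ok) entry.failed += 1;
    byIp.set(ip, entry);
  }
  return byIp;
}

async function main(attempts) {
  const results = [];
  for (let i = 0; i < attempts; i++) {
    results.push(await probeOnce('https://registry.npmjs.org/indexof'));
  }
  for (const [ip, { total, failed }] of tally(results)) {
    console.log(`${ip}: ${failed}/${total} failed`);
  }
}

// The number of attempts comes from the command line; with no argument
// the script does nothing.
const attempts = Number(process.argv[2]) || 0;
if (attempts > 0) main(attempts).catch(console.error);
```

Saved under any file name (say `probe.js`) and run as `node probe.js 200`, it prints one line per remote IP with the number of failed requests.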


(Gavin Aiken) #2

After more debugging and digging we found a workaround that avoids the issue. registry.npmjs.org currently resolves to a list of 12 different IP addresses, presumably load balancers, and the problem appears to lie with one or more servers behind just one of those twelve: 104.16.24.35. Even when we hit that IP exclusively, the SSL handshake does not always fail, but it fails more often. So presumably only some of the servers behind that load balancer are bad, not all of them.
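A rough way to repeat this per-IP comparison from node is to resolve the name and then connect to each address directly while still presenting the real hostname for SNI and the Host header. This is a sketch under assumptions, not the exact procedure used here: `optionsForIp` and `probeIp` are hypothetical helpers, and a single DNS answer may not contain every address the name can resolve to.

```javascript
const dns = require('dns');
const https = require('https');
const { promisify } = require('util');

const HOST = 'registry.npmjs.org';
const resolve4 = promisify(dns.resolve4);

// Build request options that connect to one specific IP while still
// sending the real hostname for SNI, certificate checks and the Host header.
function optionsForIp(ip, path) {
  return {
    host: ip,                // connect to this address directly
    servername: HOST,        // hostname sent in the TLS handshake (SNI)
    headers: { Host: HOST }, // HTTP Host header expected by the registry
    path,
    agent: false,            // new connection, so a new handshake each time
  };
}

// Resolve to true when the request gets any HTTP response, false on any
// request error (the handshake failure shows up as EPROTO).
function probeIp(ip, path) {
  return new Promise((resolve) => {
    const req = https.get(optionsForIp(ip, path), (res) => {
      res.resume(); // discard the body so the socket is released
      resolve(true);
    });
    req.on('error', () => resolve(false));
  });
}

async function main(attempts) {
  const ips = await resolve4(HOST);
  for (const ip of ips) {
    let failed = 0;
    for (let i = 0; i < attempts; i++) {
      if (!(await probeIp(ip, '/indexof'))) failed += 1;
    }
    console.log(`${ip}: ${failed}/${attempts} failed`);
  }
}

// The number of attempts per IP comes from the command line; with no
// argument the script does nothing.
const attempts = Number(process.argv[2]) || 0;
if (attempts > 0) main(attempts).catch(console.error);
```

Run with an attempt count, for example `node per-ip.js 50` (the file name is arbitrary). An address whose failure count stands out from the rest is the one to avoid.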

If we avoid hitting that IP, we see no SSL handshake issues. So we have implemented a DNS hack that prevents our servers from seeing that IP as a possible resolution of the name registry.npmjs.org. If others are seeing the problem, you could try the same thing until npm resolves their server issues, either with hosts files or with a DNS hack like ours.
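For node scripts only, a similar effect can be approximated without touching DNS by passing a custom `lookup` function that drops the suspect address. To be clear, this is a different technique from the DNS hack described above and it does nothing for the npm or jspm commands themselves. `filteredLookup` and `pickAllowed` are hypothetical names, and the sketch assumes the system resolver returns more than one address for the name.

```javascript
const dns = require('dns');
const https = require('https');

const HOST = 'registry.npmjs.org';
// The address singled out in this thread.
const BLOCKED = new Set(['104.16.24.35']);

// Keep only the resolved entries whose address is not blocked.
function pickAllowed(addresses, blocked) {
  return addresses.filter((entry) => !blocked.has(entry.address));
}

// Stand-in for dns.lookup that never hands back a blocked address.
// It supports both the single-address and the { all: true } calling styles.
function filteredLookup(hostname, options, callback) {
  if (typeof options === 'function') {
    callback = options;
    options = {};
  }
  const wantAll = Boolean(options.all);
  // Always ask for every address so there is something to filter.
  dns.lookup(hostname, { ...options, all: true }, (err, addresses) => {
    if (err) return callback(err);
    const allowed = pickAllowed(addresses, BLOCKED);
    if (allowed.length === 0) {
      return callback(new Error(`no allowed address for ${hostname}`));
    }
    if (wantAll) return callback(null, allowed);
    callback(null, allowed[0].address, allowed[0].family);
  });
}

// Package name from the command line; with no argument nothing is requested.
const pkg = process.argv[2];
if (pkg) {
  const req = https.get(`https://${HOST}/${pkg}`, { lookup: filteredLookup }, (res) => {
    console.log('statusCode:', res.statusCode, 'ip:', res.socket.remoteAddress);
    res.resume(); // discard the body so the socket is released
  });
  req.on('error', (e) => console.error(e));
}
```

Run as `node filtered.js indexof` (the file name is arbitrary). The printed IP should never be the blocked one.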


(Gavin Aiken) #3

Update: it looks like the problem has been fixed. We are no longer seeing the issue, even on the one IP we singled out as troublesome. Presumably someone at npm fixed a bad server!

