Scoped packages have inappropriate cache expiry, causing inconsistencies due to caching

registry

(Rhys Arkins) #1

What I Wanted to Do

Get consistent results for scoped package versions from the npm registry.

What Happened Instead

Recently published scoped packages can sometimes “disappear” temporarily soon after they first appear, leading to npm install errors for projects that have already updated to the latest.

Reproduction Steps

It cannot be reproduced simply. I see this happening a lot as part of a dependency update bot/service I run over thousands of repositories. It only happens for scoped dependencies and not non-scoped.

Details

My service creates PRs when it detects new versions of npm packages available for a repository, and it also uses an npm follower to get them pretty quickly. Sometimes the new version is absent from the registry response in a subsequent check (e.g. an hour later) and that results in either:

  1. The previous PR being closed automatically, as it seems no longer applicable, OR
  2. A new “rollback” PR being necessary if the previous PR was already merged

It’s almost always scoped packages that this happens for, which makes it highly unlikely that it’s a problem with my service/cache. If my client had a mistaken cache then you would expect to see 90%+ of the roll back PRs being non-scoped as a proportion of total, whereas they are 99% scoped.

An example of this happening on 2018-07-16 was with the @babel packages. In one example I looked at, at 18:03 UTC @babel/* versions 7.0.0-beta.54 were detected, and at 19:39 UTC a result was returned showing 7.0.0-beta.53 as “latest”. At 20:10 UTC I triggered another check where 7.0.0-beta.54 was returned again.

This is causing big problems for npm users, because subsequent npm install commands can fail. Having it happen to scoped packages is particularly unfortunate because these are often “internal” packages and users do want to merge them immediately - sometimes automatically - after tests pass, so that downstream packages can use them immediately. Holding back updates of scoped packages for ~2 hours to work around this problem is not feasible.

I don’t expect this to be an easy one to be solved but I think it’s imperative considering that private scoped modules is npm’s main revenue stream and it seems to not be working reliably. Maybe certain npmjs caches are not updating correctly so it’s based on chance?

Platform Info

This is not using the npm CLI tool - queries are done directly using registry REST API.


(Kat Marchán) #2

Can you be more specific about which endpoints you’re using, and how you’re using them? I’m assuming you’re using a registry follower to get notified of changes, and then you do a request to https://registry.npmjs.org/some-pkg?


(Rhys Arkins) #3

Yes, the registry follower is what initially leads us to detect/update dependencies fairly quickly - scoped or not. Otherwise the process is running hourly in the background at all times of the day.

Whether the process is triggered by a registry follower or not, the lookup approach is the same - we perform a REST query to https://registry.npmjs.org/some-pkg. So in the case of @babel/cli it would be https://registry.npmjs.org/@babel%2Fcli, for example. All servers are located in the AWS Oregon region, so geography shouldn’t be making any impact.


(Kat Marchán) #4

Can you put together a script that shows what you’re doing and be more specific about which part fails? It might be helpful to clarify questions about what exactly your expectations are in re this syncing.


(Rhys Arkins) #5

Can I start with a more clear timeline for now, hopefully that’s enough to help describe it more clearly? I’ll use @babel/cli as an example:

  1. 2018-07-16T17:59Z - 7.0.0-beta.54 published (according to npmjs record)
  2. 2018-08-03T18:03Z - npm registry queried, 7.0.0-beta.54 returned as “latest”
  3. 2018-08-03T19:39Z - npm registry queried, 7.0.0-beta.53 returned as “latest”
  4. 2018-08-03T20:10Z - npm registry queried, 7.0.0-beta.54 returned as “latest”

Obviously it’s step 3 that’s the problem. I don’t know if the entire npm registry somehow “rolled back” to an old version, or if it’s a problem with some of your backend servers sometimes not “following” correctly and I’m being unlucky to hit the “some” occasionally.

The system is quite complex and decoupled so it’s quite possible that plenty of times step 2 could instead return the old version, but in those cases that doesn’t cause any errors - the upgrade will just be picked up on the next scheduled run.

If the problem is now clear, I’m happy to take suggestions if there’s something I can log in future to assist debugging it on the backend.


(Rhys Arkins) #6

I’ve added additional debugging to my app to try to help troubleshoot this issue. Here is an example using @types/node. 10.5.8 showed up (correctly) as latest, then it disappeared, then came back again.

According to the registry@types/node v10.5.8 was published at 2018-08-11T01:15:34.524Z and we received this soon after:

"package": "@types\/node",
"latest": "10.5.8",
"headers": {
	"date": "Sat, 11 Aug 2018 01:46:55 GMT",
	"content-type": "application\/json; charset=UTF-8",
	"set-cookie": [
	  "__cfduid=dd572030e03fbbbd83ba34dc87d4d140d1533952015; expires=Sun, 11-Aug-19 01:46:55 GMT; path=\/; domain=.registry.npmjs.org; HttpOnly"
	],
	"cache-control": "public, max-age=14400",
	"cf-cache-status": "HIT",
	"cf-ray": "44870801d84c79ff-SEA",
	"etag": "W\/\"f2e287a3fb26a2e72deedbec13526bbb\"",
	"expect-ct": "max-age=604800, report-uri=\"https:\/\/report-uri.cloudflare.com\/cdn-cgi\/beacon\/expect-ct\"",
	"last-modified": "Sat, 11 Aug 2018 01:15:50 GMT",
	"vary": "Accept-Encoding",
	"server": "cloudflare",
	"content-encoding": "gzip",
	"age": "6731"
},
"msg": "npmjs res.headers for scoped package",
"time": "2018-08-11T03:39:06.589Z"

The last-modified header corresponds with the registry’s publish time for v10.5.8. An age of 6731 seconds is about 112 minutes, which would be about 01:27 GMT, i.e. about 12 minutes after publish.

Skip ahead an hour, and npmjs has “reverted”/disappeared the new version:

"package": "@types\/node",
"latest": "10.5.7",
"headers": {
	"date": "Sat, 11 Aug 2018 01:14:32 GMT",
	"content-type": "application\/json; charset=UTF-8",
	"set-cookie": [
	  "__cfduid=d43bbafa7c8456f405ab94d5d30d401981533950072; expires=Sun, 11-Aug-19 01:14:32 GMT; path=\/; domain=.registry.npmjs.org; HttpOnly"
	],
	"cache-control": "public, max-age=14400",
	"cf-cache-status": "HIT",
	"cf-ray": "4486d892aaaa2a55-SEA",
	"etag": "\"7671afc73c8b186fb2cfc08723065c2a\"",
	"expect-ct": "max-age=604800, report-uri=\"https:\/\/report-uri.cloudflare.com\/cdn-cgi\/beacon\/expect-ct\"",
	"last-modified": "Tue, 07 Aug 2018 00:31:30 GMT",
	"vary": "Accept-Encoding",
	"server": "cloudflare",
	"content-encoding": "gzip",
	"age": "12617"
},
"msg": "npmjs res.headers for scoped package",
"time": "2018-08-11T04:44:49.336Z"

The last-published-date here reverts back to 7th August 00:31, which is a little strange because the publish date for 10.5.7 is 2018-08-06T22:04:36.827Z, but it’s pretty close. The age of 12617 seconds though corresponds to the publish date of 10.5.8, if you deduct it from the time field recorded.

One hour later and it’s fixed again:

"package": "@types\/node",
"latest": "10.5.8",
"headers": {
	"date": "Sat, 11 Aug 2018 05:45:57 GMT",
	"content-type": "application\/json; charset=UTF-8",
	"set-cookie": [
	  "__cfduid=db04294f52a361f116acd553da892398f1533966357; expires=Sun, 11-Aug-19 05:45:57 GMT; path=\/; domain=.registry.npmjs.org; HttpOnly"
	],
	"cache-control": "public, max-age=14400",
	"cf-cache-status": "HIT",
	"cf-ray": "448866236d96bcc2-SEA",
	"etag": "\"f2e287a3fb26a2e72deedbec13526bbb\"",
	"expect-ct": "max-age=604800, report-uri=\"https:\/\/report-uri.cloudflare.com\/cdn-cgi\/beacon\/expect-ct\"",
	"last-modified": "Sat, 11 Aug 2018 01:15:50 GMT",
	"vary": "Accept-Encoding",
	"server": "cloudflare",
	"content-encoding": "gzip",
	"age": "574"
},
"msg": "npmjs res.headers for scoped package",
"time": "2018-08-11T05:55:31.012Z"

The last-modified-date is correct again, however the age field seems to have reverted. The etag reverts back to the same however it is now a strong etag, not a weak one.

In summary between these 3 captures:

  1. latest version has reverted from 10.5.8 to 10.5.7 and then fixed again to 10.5.8
  2. last-modified reverted from “Sat, 11 Aug 2018 01:15:50 GMT” to “Tue, 07 Aug 2018 00:31:30 GMT” and then back again
  3. The first etag is weak while the third one matches but is strong: "etag": "W\/\"f2e287a3fb26a2e72deedbec13526bbb\"" and "etag": "\"f2e287a3fb26a2e72deedbec13526bbb\"".

(Rhys Arkins) #7

On 13th August I saw one case that looked like the same problem described above but was for a non-scoped package. Because I only added lengthy debugging for scoped packages, I can’t confirm for sure that it was the same problem, but it seemed to be. Every other case in the meantime has been for scoped packages however.


(Rhys Arkins) #8

I just noticed something that may be the root cause here.

❯ curl -I https://registry.npmjs.org/@babel%2Fcore
HTTP/2 200
date: Thu, 16 Aug 2018 19:46:38 GMT
content-type: application/json; charset=UTF-8
content-length: 68764
set-cookie: __cfduid=d7edb749929f07ceb5430cb687a8b7c571534448798; expires=Fri, 16-Aug-19 19:46:38 GMT; path=/; domain=.registry.npmjs.org; HttpOnly
cf-cache-status: HIT
cache-control: public, max-age=14400
accept-ranges: bytes
cf-ray: 44b66880f99b3c9b-CPH
etag: "08a0defe41956f6047ea9ffe5f2927cc"
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
last-modified: Thu, 09 Aug 2018 20:10:08 GMT
vary: Accept-Encoding
server: cloudflare

What catches my eye is this:

cache-control: public, max-age=14400

Notice the difference when a non-scoped package is queried:

❯ curl -I https://registry.npmjs.org/babel
HTTP/2 200
date: Thu, 16 Aug 2018 19:47:28 GMT
content-type: application/json; charset=UTF-8
content-length: 249178
set-cookie: __cfduid=dedd20e8a503364ff80c6e6b719c675b91534448847; expires=Fri, 16-Aug-19 19:47:27 GMT; path=/; domain=.registry.npmjs.org; HttpOnly
accept-ranges: bytes
cache-control: max-age=300
cf-cache-status: HIT
cf-ray: 44b669b3e86b3cf5-CPH
etag: "dffa6c00f74dd795004a6c754a143a0a"
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
last-modified: Sat, 26 May 2018 01:41:28 GMT
vary: accept-encoding, accept
server: cloudflare

I wonder if the root cause is that npmjs is returning a 4 hour cache time for scoped packages but only 5 minutes for non-scoped. This would explain why any RFC-compliant proxies or clients could be returning “old” results after a package is published.


npm view/npm install can't find latest package version
(Kat Marchán) #9

this can probably at least partially be fixed CLI-side, because we don’t remove cache entries for packuments when we publish a package to the registry, iirc. So if you install a package, then you publish it, your local cache won’t even bother with a 304 check.


(Rhys Arkins) #10

I have renamed this issue because I think the original one was probably wrong.

What I think now is the root cause is the 14400 second cache-control max-age value returned for scoped packages. As per my understanding, this instructs clients that they can keep reusing any existing registry response for up to 4 hours without revalidating, which naturally is going to cause a lot of problems for RFC-compliant clients or proxies.

For my client/app that queries the registry directly using an RFC-compliant client, I have simply disabled all caching for scoped packages to work around this. Because my app had multiple machines/caches it meant sometimes it found a cached result and sometimes not, and this is why it looked like versions were “disappearing” for up to 4 hours after first detected, any time the query landed on a machine that already had a cached (old) result. I am pretty sure npmjs/cloudflare are still returning “new” results and not caching internally for the 4 hours.

However for anyone in an enterprise setting using any npm client behind an RFC-compliant proxy, the proxy may be caching and returning old scoped package values for hours longer than you expect, and there may be no workaround until the cache time of scoped packages is reduced to a lower value like 300s for non-scoped.


(Rhys Arkins) #11

The npm registry now returns 300s max-age for Cache-Control for scoped packages too, so I consider this to be resolved.


(system) #12

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.