A couple of years back as the US presidential campaign was ramping up, the Trump camp did something stupid. I know, we’re all shocked but bear with me because it’s an important part of the narrative of this post. One of their developers embedded this code in the campaign’s donation website:
Until now. I woke up on the other side of the world to most people this morning and my Twitters had gone nuts overnight with this story:
— Scott Helme (@Scott_Helme) February 11, 2018
One site with a cryptominer is one thing (although the fact it was on the UK’s Information Commissioner’s Office is noteworthy in and of itself), but it was much, much more than that. It was the US Courts too. And the UK’s National Health Service. Even my own state government down here had been hit. In fact, more than 4k impacted sites were quickly identified and they spanned all sorts of different industries. However, it wasn’t the sites themselves that had been compromised, rather a script they had a dependency on:
— Scott Helme (@Scott_Helme) February 11, 2018
This is Texthelp and they exist to “help everyone read, write and communicate with clarity in class, at work and in life”. They create assistive technologies, one of which is a product called Browsealoud which does this:
Our innovative support software adds speech, reading, and translation to websites facilitating access and participation for people with Dyslexia, Low Literacy, English as a Second Language, and those with mild visual impairments.
This short video makes the use case pretty clear:
As Texthelp points out on their site, there are a bunch of regulatory requirements around accessibility which government sites in particular need to play nice with. The value proposition of Browsealoud is that it makes integration dead simple; just copy and paste this one script:
And now we’re back to the Trump problem except it’s no longer hypothetical, it’s real. That script – the one at http://www.browsealoud.com/plus/scripts/ba.js – was maliciously modified to inject a cryptominer and by virtue of it being embedded directly into thousands of sites around the world, the malicious script cascaded down to users of those sites. (Incidentally, at the time of writing that script is offline, consequently breaking every site dependent on it and, one would imagine, possibly leaving them in breach of their accessibility requirements.) Here’s what the modified script looked like:
De-obfuscated, that first snippet of code looks like this:
And there’s your problem – the file at https://coinhive.com/lib/coinhive.min.js is being embedded directly into the site. (Incidentally, Coinhive is a quasi-legitimate service to “Monetize Your Business With Your Users’ CPU Power”; there doesn’t appear to have been any direct involvement from them in this case.)
Now, onto solutions and ultimately onto the paradox referred to in the title. We have a very robust, well-proven defence for this in subresource integrity (SRI). We’ve had this for ages and Scott pumped out a piece in response to this incident explaining precisely how to use it. If you look at the source code of this blog you can see it used courtesy of the “integrity” attribute when I embed Report URI JS:
<script src="https://cdn.report-uri.com/libs/report-uri-js/1.0.1/report-uri-js.min.js" integrity="sha256-Cng8gUe98XCqh5hc8nAM3y5I1iQHBjzOl8X3/iAd4jE=" crossorigin="anonymous"></script>
If – for whatever reason – that library is modified upstream of my website, the sha256 hash of the file will be different to the one specified above and the browser simply won’t run it. It stops attacks like the one today dead. We’ve also got awesome support for it across the major browsers and yes, Edge is behind the curve here but that’ll hit in the next version:
In Scott’s blog post, he also points out that we have content security policies (CSP) which provide another layer of defence. A good policy would have stopped the cryptominer from being loaded from coinhive.com in the first place as it wouldn’t have appeared as a white-listed script source. In short, we have the technology to fix this so why did things blow up so spectacularly today? This is where it gets a bit tricky…
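Let’s compare the two scripts I’ve just mentioned, those being Report URI JS and Browsealoud. Here are the respective paths they’re embedded from:
https://cdn.report-uri.com/libs/report-uri-js/1.0.1/report-uri-js.min.js
http://www.browsealoud.com/plus/scripts/ba.js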
We will never modify Report URI JS 1.0.1 from its current state. It is, in perpetuity, locked in to that version number. You can safely use an integrity attribute on your script tag because if ever we want to change the implementation, we’ll simply rev the version. If you want fixes or features in version 1.0.2 then you’ll need to update your own script source and, in turn, the value of the integrity attribute. All of which means this:
Versioned external libraries can easily be protected with SRI because the contents of that specific version will never change.
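To make that concrete, here’s a minimal Python sketch of what the integrity check amounts to: hash the bytes, base64 the digest and compare the result with the pinned value. The script contents are made-up stand-ins rather than the real Report URI JS file, the helper names are my own, and a real browser does more than this (it copes with several hashes in one attribute, for example), so treat it as an illustration only.

```python
import base64
import hashlib


def sri_value(content: bytes, algorithm: str = "sha256") -> str:
    """Return an SRI-style string, e.g. 'sha256-<base64 digest>', for the given bytes."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI only defines sha256, sha384 and sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(content: bytes, integrity: str) -> bool:
    """Simplified version of the browser's check: recompute and compare."""
    algorithm, _, _ = integrity.partition("-")
    return sri_value(content, algorithm) == integrity


# Hypothetical stand-in for the bytes of one pinned library version.
pinned_script = b"console.log('library v1.0.1');"
integrity = sri_value(pinned_script)

# Hypothetical upstream tampering: the same file with extra code appended.
tampered_script = pinned_script + b"/* injected miner loader */"

print(matches_integrity(pinned_script, integrity))    # unchanged bytes pass
print(matches_integrity(tampered_script, integrity))  # modified bytes fail
```

Because the pinned value is derived from the exact bytes, any upstream change, malicious or not, produces a mismatch. That’s exactly why it only works when the file behind a given URL never changes.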
Now, onto Browsealoud and you’ll note there’s no version number when their script is referenced. But whilst this is embedded in precisely the same way as Report URI JS, it’s a different philosophy because rather than being a static library, Browsealoud is a service. Refer back to the comment at the start of the file I showed earlier:
/* [Warning] Do not copy or self host this file, you will not be supported */
At some point in the future, Texthelp may decide to change the Browsealoud implementation. They may make a bug fix to that file. They might change the API endpoints the library calls. They could change the branding. They might add a new feature. They could decide to do anything and by virtue of their subscribers simply embedding the JS directly into their website and effectively saying “ok, over to you guys, implement the service however you like”, they can do anything. And someone did – they put a cryptominer in the file. Which means this:
Non-versioned external libraries can’t be protected with SRI if there’s an expectation that the service providing them may change them in the future.
And that’s the paradox. So how do we fix it? Well firstly, we need to do a bit of threat modelling.
If you drill down into the source code of this blog, you’ll notice a script is dynamically injected into the head of the page which looks like this:
<script src="http://troyhunt.disqus.com/embed.js" data-timestamp="1518392252947"></script>
Wait – isn’t this exactly the same story as with Browsealoud?! Yes, it is, and I’m opening visitors to this blog up to a very similar (but ultimately different) risk. If someone pwns that Disqus script, they could add their own arbitrary JS to my site. The threat modelling aspect of this, however, is that I already know this is a risk, for all the same reasons that a whole bunch of other people who hadn’t thought about it until today now know it’s a risk. The decision I’ve made has been a conscious one; there is enough value in the Disqus service and a low enough impact on a personal blog were it to be compromised that on balance, it’s an acceptable risk.
However, the bit where my embedding Disqus is ultimately different to the way the other sites were embedding Browsealoud is that I also have a CSP on this blog. I blogged about that only 11 days ago and, as you’ll read in that post, I faced some barriers to get it in place. But now that it’s there, it would stop this attack dead because coinhive.com is not an allowable script source. Yes, the Disqus script could still be modified by the attacker and their arbitrary JS would run in my visitors’ browsers because I don’t have SRI, but no, it wouldn’t be able to pull down the cryptominer. A robust CSP is an awesome defence and because I’m also reporting any violations, I’d know immediately if someone did manage to modify that Disqus script. Compare that to today’s situation where some folks responsible for government sites had absolutely no idea what was going on:
Some government site operators are denying being impacted by the cryptojacking today, despite still having the references to the infected file on their site…
— Scott Helme (@Scott_Helme) February 11, 2018
This is why CSPs and reporting are so invaluable as they bring visibility you never would have had before. (Incidentally, even though today’s version of Edge can’t do SRI, it can block and report when a CSP is violated so this defence is extra important for the Microsoft browser.) I know I’m waxing lyrical about CSPs and reporting here, but the technology is genuinely that good and it’s why I joined Report URI in the first place!
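Here’s a rough Python sketch of the two halves of that defence: a policy that only allows named script sources, and a handler that pulls the interesting fields out of a violation report. The policy, the hostnames and the reporting endpoint are all hypothetical (this is not the actual policy on this blog), and the source matching is heavily simplified compared with what a browser really does.

```python
import json
from urllib.parse import urlparse

# Hypothetical policy: scripts may only come from the site itself and one CDN.
POLICY = {
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://cdn.example.com"],
    "report-uri": ["https://reports.example.com/csp"],  # made-up endpoint
}


def build_header(policy: dict) -> str:
    """Serialise the policy into a Content-Security-Policy header value."""
    return "; ".join(f"{name} {' '.join(values)}" for name, values in policy.items())


def script_allowed(url: str, policy: dict, page_origin: str) -> bool:
    """Very simplified origin match; real CSP matching has many more rules."""
    parsed = urlparse(url)
    origin = f"{parsed.scheme}://{parsed.netloc}"
    sources = policy.get("script-src", policy.get("default-src", []))
    for source in sources:
        if source == "'self'" and origin == page_origin:
            return True
        if source == origin:
            return True
    return False


def summarise_report(report_body: str) -> tuple:
    """Extract what was blocked, and by which directive, from a report body."""
    report = json.loads(report_body).get("csp-report", {})
    return report.get("blocked-uri"), report.get("violated-directive")


# Illustrative report body; browsers differ in how much of the URL they include.
sample_report = json.dumps({
    "csp-report": {
        "document-uri": "https://www.example.com/",
        "violated-directive": "script-src",
        "blocked-uri": "https://coinhive.com",
    }
})

print(build_header(POLICY))
print(script_allowed("https://coinhive.com/lib/coinhive.min.js", POLICY, "https://www.example.com"))
print(summarise_report(sample_report))
```

The point of the second half is visibility: even if nobody is watching the site itself, a report naming an unexpected blocked source is a pretty loud signal that a dependency has been tampered with.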
Now, getting back to that threat modelling, I would argue that government websites are not the type of site you want to allow this to happen with. They should be using SRI and they should be only allowing trusted versions to run. This requires both the support of the service (Browsealoud) not to arbitrarily modify scripts that subscribers are dependent on and the appropriate processes on behalf of the dev teams. For example, by locking yourself into a discrete version in this fashion you’re not going to automatically get any software updates. But think of what we’re really saying here – that an external service shouldn’t be able to modify active content that executes in your visitors’ browsers without your explicit say-so. That sounds very reasonable in this situation and what’s more, it’s something that we should be doing anyway. Have a read of Using Components with Known Vulnerabilities within OWASP’s Top 10 Web Application Security Risks:
If you’re serious about this stuff (as governments should be), then this needs to feature in your software management program. There are resources mentioned above to help you do this – retire.js is a perfect example as it relates to client-side libraries. And yes, this takes work:
Is the tl;dr that good security takes some planning? If so, yes, I agree 😀
— Troy Hunt (@troyhunt) February 12, 2018
But there are also things we can do to help organisations hosting scripts to help their users “fall into the pit of success”, so to speak. For example, follow Cloudflare’s lead and when you provide code snippets for embedding tags, give them the SRI version:
I’d like to see them go further and default to the SRI version (as we do with Report URI JS) or further highlight its value. When I teach people about SRI in my workshops or talk about it at conferences, the vast majority of people don’t know what it is so we need to help educate further on that front. Regardless, Cloudflare’s approach is a much better approach than Pastebin’s:
That’s to embed the code sample with the cryptominer from earlier on and as you can see, there’s no SRI on the script tag. If someone modifies that script upstream of the site it’s being embedded in, it’ll simply run whatever is in the file. When I embedded it above, I elected to drop it into the page via the iframe option and I have a frame-src directive in my CSP to allow pastebin.com. That’s a pretty good middle ground of bringing in external content without introducing an unnecessary level of risk, but I’d still love to see that integrity attribute in Pastebin’s sample code.
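If you want a quick feel for how exposed a page is, a small audit script can list every external script tag that has no integrity attribute. This is a minimal Python sketch using only the standard library; the class and function names are my own, the sample page is stitched together from the two tags shown earlier in this post plus a made-up local script, and it only inspects static HTML, so anything injected dynamically at runtime won’t show up.

```python
from html.parser import HTMLParser


class ScriptAudit(HTMLParser):
    """Collect external <script> sources that carry no integrity attribute."""

    def __init__(self):
        super().__init__()
        self.unprotected = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # Inline and same-site relative scripts are skipped; only remote ones count.
        is_remote = bool(src) and src.startswith(("http://", "https://", "//"))
        if is_remote and not attributes.get("integrity"):
            self.unprotected.append(src)


def find_unprotected_scripts(html: str) -> list:
    audit = ScriptAudit()
    audit.feed(html)
    return audit.unprotected


# Sample page: one SRI-protected tag, one unprotected tag, one local script.
page = """
<html><head>
<script src="https://cdn.report-uri.com/libs/report-uri-js/1.0.1/report-uri-js.min.js" integrity="sha256-Cng8gUe98XCqh5hc8nAM3y5I1iQHBjzOl8X3/iAd4jE=" crossorigin="anonymous"></script>
<script src="http://troyhunt.disqus.com/embed.js" data-timestamp="1518392252947"></script>
<script src="/js/site.js"></script>
</head><body></body></html>
"""

print(find_unprotected_scripts(page))
```

Every entry in that list is a dependency where you’re trusting whoever controls the file to never serve anything nasty, which may well be an acceptable risk, but it should be a conscious one.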
Then there’s the counter-argument that you should just serve these libraries yourself and not be dependent on a CDN. Besides the point of that not working when we’re talking about services like Browsealoud and Disqus, that also presents all sorts of other problems, particularly around cost and performance. My first big traffic spike on Have I Been Pwned (HIBP) came just days after launching it when I observed the following over a 24-hour period:
I realised, for example, that I’d served up 15GB of jQuery alone – that’s minified and HTTP compressed too. Crikey.
These days, a big day would result in me serving close to half a terabyte of data which could easily come from a public CDN. This is not data I need to pay for. It’s also not data my visitors need to load from a single origin at potentially high latency and they wouldn’t need to load it at all if they’d already been served that file from another site using the same CDN. There are many, many good reasons for using a globally distributed CDN to serve content and with a combination of SRI and CSP, we can do this without wearing the risks of what we saw happen earlier today. Last thing on that front – I’d also argue that it’s one thing to use a CDN hosted by Cloudflare or Google and quite another to use one provided by an organisation that before today, most people had never even heard of.
Frankly, I think we all got off a bit lightly from today’s event. This was a very rudimentary and opportunistic attack. It was also highly visible and happened at one of the quietest periods of the week. Imagine for a moment if that really clever thought piece from last month about harvesting credit cards had become reality instead. Do read that – it’s enormously thought provoking – and it’s hard not to conclude that we totally dodged the proverbial bullet today. Question is, will it be enough to drive change in the way sites are creating dependencies on external scripts?
Finally, if you’d like to see a demo of precisely how the browser handles SRI when the script has been modified upstream, check out this talk from NDC Oslo last year (embedded at 7:06 where the SRI bit begins, runs for about 11 mins):