Security ain’t simple, and it never will be

Every few months or so, we get a message from a customer that sounds like this:

“I am looking to integrate JWT to my app. I found this tutorial and trying to follow it in my code. I am now trying to encrypt the signature with an RSA public key and decrypt it later with my private key to compare the hashes, but for some reasons my encryption results are always different.”
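
For what it’s worth, the technical root of the confusion is this: a JWT signature is produced with the private key and verified with the public key, and there is no “encrypt the signature, decrypt it, compare the hashes” step anywhere in the scheme; the ever-changing “encryption results” the customer sees are simply what randomised RSA encryption padding is supposed to produce. Here is a minimal sketch of the intended flow, assuming the PyJWT and cryptography Python libraries (the customer’s actual stack is unknown, and the claim below is made up for illustration):

```python
# A minimal sketch of how an RS256-signed JWT is meant to work.
# Assumes PyJWT and cryptography:  pip install "pyjwt[crypto]"
import jwt
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# A throwaway key pair, generated here only so the sketch is self-contained.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
private_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

# Issuer side: the token is SIGNED with the PRIVATE key...
token = jwt.encode({"sub": "user-42"}, private_pem, algorithm="RS256")

# ...and the verifier CHECKS the signature with the PUBLIC key.
# Nothing is ever encrypted and then decrypted to "compare the hashes".
claims = jwt.decode(token, public_pem, algorithms=["RS256"])
print(claims)  # {'sub': 'user-42'}
```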

If you don’t follow what’s happening here, and I think most of my readers don’t, here’s the gist of it.

First, one guy publishes a tutorial that explains to the townsfolk the general process of building a space rocket. Just take some titanium for the body, solder on a guidance system (shouldn’t be that much harder than soldering that SatNav chip to your Arduino board), get some rocket fuel – just be careful, it is a bit super-deadly – and in a few months, tops, you’ll be able to check for yourself whether the Great Wall can really be seen from space.

This gets Mick, an honest town lad, interested (he was a bit into rockets himself back in Y7), and he decides to launch a space travel business, using that tutorial as a guide for building his own rocket. Mick decides to replace the titanium with aluminium, as it is cheaper, but his aluminium doesn’t hold its shape as per the instructions because the feathering is too heavy for it. Frustrated, he decides to get rid of some of the feathering.

Meanwhile, the town is getting interested in the project, and Mick’s bookings are growing steadily.

* * *

When my friend got her first car, her mum said to her: “I’m super happy for you, darling. Could you please promise me that you will always bear in mind one important thing: it may not always feel like it, but you are about to take care of a 3-tonne killing machine. Please be careful.”

My friend recalls these words every time she turns the key.

We need to grow up. We need to understand that security is serious. We need to bear in mind that by integrating security into a product we are taking care, well, not of a killing machine, but of something of a very similar scale. Taking it lightly is extremely dangerous.

And I think Mick is as much of a victim here as his customers are. Tutorials like the one mentioned at the beginning of this post make complex things look simple. They make high-risk systems appear risk-free. They say: ah, look at this funny thing here, it is called security, and even you can do it. Go ahead!

I have actually been a Mick numerous times myself. I love doing things with my hands and consider myself a capable DIY’er – something of an orange or even green belt. And yet, dozens of times I have let YouTube DIY videos delude me into thinking that a job is not as complex as I thought it was. Hey, just look how easy it was for that young couple to build a patio. Surely it can’t be that hard?

The outcome? I don’t want to talk about it.

And that’s why I stopped writing any manuals, guidance, to-dos, instructions, or whitepapers on security topics unless I am absolutely certain that the audience is capable of following them. Even when I do, I warn my readers, boldly and unambiguously, that the job they are looking to embark on requires excellent technical competence. Security engineering is one of the largest surfaces for dropped washers, and by handing out directions irresponsibly you are playing your own part in creating the future chaos.

So, let’s reiterate it one last time:


WARNING:

Security is complex and can be dangerous if approached irresponsibly. Please, do not make it look simple.


Picture credit: FDA

Check Your Backups, Now

Last week, a number of services hosted in Google Cloud suffered a dramatic outage. Following a maintenance glitch, services like YouTube, Shopify, Snapchat, and thousands of others became unavailable or very slow to respond. Overall, the services were down for more than four hours before the platform’s availability was finally restored.

The curious thing about this incident was not the outage itself (sweet happens), but the circumstances behind it that made it last that long. Cloud service providers, as a rule, aim for the highest levels of availability, which are carved into their SLAs. So how could one of the leading global computing platforms be taken down for more than four hours? Happily, Google is very good at debriefing its failures, so we can have a sneak peek at what actually happened behind the scenes.

It all started with a few computing nodes which needed to undergo routine maintenance and thus had to be temporarily removed from the cloud – a common day-to-day activity. And then something went wrong. Due to a glitch in the internal task scheduler, many more worker nodes were mistakenly dismissed – drastically reducing the total throughput of the platform and causing a Chertsey-style gridlock.

Ironically, Google did everything right, exceptionally right. They had considered that risk at the design stage. They had a smart recovery mechanism in place that should have kicked in to recover from the glitch and provide the necessary continuity. The problem was that the recovery mechanism itself was supposed to be run by the faulty scheduler. And, being a system management task with a lower priority than the affected production services, it was pushed far back in the execution queue. Since the queue was miles long by that time, the recovery service in the choking cloud never got its time slice.
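
To make the failure mode a bit more tangible, here is a deliberately simplified toy model of that kind of priority starvation – not Google’s scheduler, just a plain Python priority queue in which a low-priority recovery task sits behind an ever-growing pile of urgent production work and never gets served:

```python
# A toy model of priority starvation -- NOT Google's scheduler, just an
# illustration of how a low-priority recovery task can be starved when
# higher-priority work keeps flooding the same queue.
import heapq
import itertools

queue = []                   # min-heap ordered by priority (lower = sooner)
counter = itertools.count()  # tie-breaker so equal priorities pop in FIFO order

def submit(priority, name):
    heapq.heappush(queue, (priority, next(counter), name))

# The recovery job is filed as low-priority system-management work...
submit(priority=9, name="recovery-service")

# ...while the degraded cluster keeps generating urgent production tasks.
for i in range(10_000):
    submit(priority=1, name=f"production-task-{i}")

# Run a limited number of time slices, as a choking cluster would.
for _ in range(5_000):
    priority, _, name = heapq.heappop(queue)
    if name == "recovery-service":
        print("recovery ran")
        break
else:
    print("recovery-service never got a time slice")  # this is what prints
```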

Any lessons we can learn from this incident? There are plenty; the deeper your knowledge of cloud infrastructure, the more conclusions you can draw. A security architect can draw at least the following two:

1. Backing up systems is a process, not a one-off task. Your backup routine might have worked at the time you set it up, but things break, media dies, and passwords change. Don’t take chances: go and test your backups now – emulate a disaster, pull that cord, and see if your arrangements are capable of providing continuity. Don’t be tempted just to check the scripts – try the actual process in the field (see the sketch after this list). Put this check on your schedule and make it a routine.

2. When designing a backup or recovery system, take extra care to minimize its dependencies on the system being recovered. It is worth remembering that modern digital environments are very complex, and you might need to be quite imaginative to recognise all possible interdependencies. The recovery system should live in its own world, with its own operating environment, connectivity, and power supply.
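
To make the first point concrete, here is a minimal sketch of what a scheduled restore drill might look like: actually unpack the latest backup into a scratch directory and compare checksums against the live copy. The paths and the tar-based backup format are assumptions made up for the illustration, and comparing against live data that keeps changing is a simplification – the point is that the real restore path gets exercised on a schedule rather than just eyeballed.

```python
# A minimal restore-drill sketch: unpack the latest backup into a scratch
# directory and compare file checksums against the live copy. Paths and the
# tar-format backup are illustrative assumptions, not a prescription.
import hashlib
import pathlib
import tarfile
import tempfile

BACKUP_ARCHIVE = pathlib.Path("/backups/latest.tar.gz")  # hypothetical path
LIVE_DATA_DIR = pathlib.Path("/srv/app/data")            # hypothetical path

def file_digest(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def restore_drill() -> bool:
    with tempfile.TemporaryDirectory() as scratch:
        # Actually run the restore path, not just a listing of the archive.
        with tarfile.open(BACKUP_ARCHIVE) as archive:
            archive.extractall(scratch)
        restored_root = pathlib.Path(scratch)

        mismatches = []
        for live in LIVE_DATA_DIR.rglob("*"):
            if not live.is_file():
                continue
            restored = restored_root / live.relative_to(LIVE_DATA_DIR)
            if not restored.is_file() or file_digest(restored) != file_digest(live):
                mismatches.append(str(live))

        print("Restore drill FAILED for:", mismatches) if mismatches \
            else print("Restore drill passed")
        return not mismatches

if __name__ == "__main__":
    restore_drill()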

It is very easy to fall into this trap, because an untested backup gives us the illusory peace of mind we crave. We know that the system is there for us, and we sleep well at night. We know that should a bad thing happen, it will lend us its shoulder. We only realise it is not going to when it’s too late to do anything about it.

Just as I was writing this, my friend called me with a story. She went on an overseas trip and, while there, wanted to Skype home. Skype, however, having noticed her IP address was unusual, applied extra security and sent her a verification e-mail. It would all have ended there, had her Skype account not been bound to a very old e-mail account at an ISP that was blocked in that country for political reasons – so she couldn’t get to her inbox to confirm her identity. Luckily it was just Skype, and luckily she knew about VPNs – but things might have become far more complicated with a different, life-critical service.

So, really, you never know how a cow might catch a hare, as the saying goes. There are way too many factors that may kick in unexpectedly, and, worst of all, unknown unknowns are among them. Still, by applying the above two approaches wisely and persistently, you can reduce the risks to a negligible level, which is well worth the effort.

Picture credit: danielcheong1974

Facepalm

Facebook, yet again, shows that it prefers to learn from its own mistakes rather than someone else’s. This time, it’s about storing passwords in plain text: a textbook piece of security negligence, stepped on at different times by Equifax, Adobe, and Sony.
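
For contrast, the textbook alternative has been around for decades: store only a random salt and a slow, salted hash of each password, and compare hashes at login. A minimal sketch using Python’s standard-library scrypt (the parameters are illustrative, not a tuning recommendation):

```python
# A minimal sketch of NOT storing passwords in plain text: keep only a
# random salt and a slow, salted hash (scrypt, from the standard library).
# The cost parameters below are illustrative, not a recommendation.
import hashlib
import os
import secrets

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest  # store these -- never the password itself

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return secrets.compare_digest(candidate, digest)  # constant-time compare

salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("letmein", salt, digest)
```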

Needless to say, this doesn’t help in building confidence in the social network. We entrust them with our most personal information, and they don’t give a damn about keeping it safe.

“We have found no evidence to date that anyone internally abused or improperly accessed them,” said Pedro Canahuati, Facebook’s vice president of engineering, security, and privacy. Given all the recent breaches in this company’s security, I can’t help translating this into human language as “we didn’t bother to put any access-control audit mechanisms in place, so whoever may have seen your passwords, there is no (and cannot be any) real evidence of it.”

Just a couple of days ago I was asked to send money via Facebook’s payment service. In the middle of the process I realised it was not possible to make the payment – which would have been a one-off for me – without letting Facebook remember either my card or PayPal details. I stopped, closed the Facebook tab, and paid another way. Glad I did.

Picture credit: Alex E. Proimos

The Greatest Backdoor

The greatest backdoor of all time might be running right before your eyes.

Earlier today we were quite surprised to discover that our Windows build server had rebooted after installing another set of automatic updates. This looked weird, as automated reboots without an administrator’s approval have never been part of our security policy. Still, given that we had just upgraded our Windows Server from 2012 to 2016, we assumed it was a misconfiguration issue and embarked on correcting it.

Surprisingly, disabling automated restarts in Windows Server 2016 turned out not to be an easy task. Believe it or not, unlike Server 2012, Server 2016 offers no direct setting to disable the reboots. You have to employ awkward workarounds, like always having someone logged in, to stop your server from rebooting. Otherwise, it will reboot automatically every time yet another batch of updates is downloaded and installed.

This looks very worrying. Many server administrators quite reasonably prefer to be in control of reboots of their servers to harmonise them with their working hours, system load, backup and maintenance schedules, and myriad other factors. A mission-critical server that reboots out of the blue in the middle of the night may (and will) lead to all sorts of problems – from a local DoS after failing to complete the restart, to a gaping hole in the company’s network if a third-party IPS fails to co-operate with the updated version of some Windows component.
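
Short of a proper off switch, about the best an administrator can do is at least see the reboot coming. Here is a minimal monitoring sketch, assuming the registry keys commonly treated as pending-reboot indicators (which keys matter can vary between environments and update mechanisms, so treat this as a starting point rather than a guarantee):

```python
# A minimal sketch for spotting a pending Windows Update reboot before it
# happens, by checking registry keys commonly used as reboot indicators.
# This only detects the queued reboot; it does not prevent it.
import winreg

PENDING_REBOOT_KEYS = [
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired",
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending",
]

def reboot_pending() -> bool:
    for subkey in PENDING_REBOOT_KEYS:
        try:
            # If the key exists at all, a reboot has been queued.
            winreg.CloseKey(winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey))
            return True
        except FileNotFoundError:
            continue
    return False

if __name__ == "__main__":
    print("Reboot pending!" if reboot_pending() else "No reboot queued.")
```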

From a more distant perspective, by removing the ability to disable automated reboots, Microsoft has acquired a gigantic ‘power switch’ which it can use to force thousands of servers across the world into rebooting, simply by sending them a specific ‘update’ package. This puts the owners of those servers in the uncomfortable position of hostages. Even if we do believe in the good intentions of the Redmond company, how can we be sure that someone won’t break into their update delivery environment one day and use the legitimate update procedure to send a deadly restart command to every Windows server out there?

Image credit: pngtree.com