The Dropped Washer Effect

One of these buildings can melt your car. Can you identify the culprit?

Have you ever come across a situation where something utterly negligible and minor became the cause of a major disruption or even an accident? Such as a small crack in an underground water pipe, dripping inconspicuously for a couple of years and eventually causing a landslide once a critical mass of water had accumulated? Or a seemingly ordinary glass building capable of focusing sunlight so tightly that it melts the bodywork of cars parked nearby?

If so, chances are high that you observed an example of the Dropped Washer effect. Named after a Boeing 737 accident in Okinawa, Japan, the dropped washer effect describes large-scale adverse events that happen because of a cause of incomparably lower significance. The unfortunate Boeing ended up burning out completely because of a missing slat mechanism washer, 0.625 inches wide, that the engineering crew had forgotten to replace after the aircraft’s last service.

One characteristic of potential dropped-washer features that makes them particularly insidious is their zero perceived value for the business. Offering no added opportunities and presenting no apparent risks for the product, they often do not even exist in the minds of the product stakeholders. This important peculiarity makes it all too easy for them to slip past every safety measure employed in modern production flows – from risk assessment to quality control.

Happily, in many cases there are techniques that can help increase our chances of spotting and eliminating dropped washers in our projects.

Check out my new paper here.

Picture credit: Reuters

The Greatest Backdoor

The greatest backdoor of all time might be running right before your eyes.

Earlier today we were quite surprised to discover that our Windows build server had rebooted after installing another set of automatic updates. This looked weird, as automated reboots without an administrator’s approval have never been part of our security policy. Still, given that we had just upgraded our Windows Server from 2012 to 2016, we took it for a misconfiguration issue and embarked on correcting it.

Surprisingly, disabling automated restarts in Windows Server 2016 turned out to be no easy task. Believe it or not, unlike in Server 2012, there is no direct setting in Server 2016 to disable the reboots. You have to employ awkward workarounds, like always having someone logged in, to stop your server from rebooting. Otherwise, it will always reboot automatically, every time yet another batch of updates is downloaded and installed.
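For reference, one of the documented knobs in this area is the NoAutoRebootWithLoggedOnUsers policy value – which, true to its name, only suppresses the reboot while somebody is logged on (hence, presumably, the workaround above). Here is a minimal sketch of setting it programmatically, assuming Python is available on the server and the script runs with administrator rights:

    import winreg

    # Documented Windows Update policy key; the values under it mirror
    # the 'Configure Automatic Updates' group policy settings.
    AU_KEY = r"SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU"

    # Open (or create) the policy key with write access.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, AU_KEY, 0,
                            winreg.KEY_SET_VALUE) as key:
        # 1 = don't reboot automatically while a user is logged on.
        # Note: this merely postpones the reboot for logged-on sessions;
        # an unattended server will still restart on its own.
        winreg.SetValueEx(key, "NoAutoRebootWithLoggedOnUsers", 0,
                          winreg.REG_DWORD, 1)

The same value can, of course, be set via Group Policy or a plain .reg file; the point stands that even with it in place, an unattended server remains at the mercy of the update scheduler.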

This looks very worrying. Many server administrators quite reasonably prefer to stay in control of their servers’ reboots, to harmonise them with working hours, system load, backup and maintenance schedules, and myriad other factors. A mission-critical server that reboots out of the blue in the middle of the night may (and eventually will) lead to all sorts of problems – from a local DoS after failing to complete the restart, to a gaping hole in the company’s network if a third-party IPS fails to co-operate with the updated version of some Windows component.

From a more distant perspective, by removing the possibility of disabling automated reboots, Microsoft has acquired a gigantic ‘power switch’ that it can use to force thousands of servers across the world into rebooting, simply by sending them a specific ‘update’ package. This puts the owners of those servers in the uncomfortable position of hostages. Even if we do believe in the good intentions of the Redmond company, how can we be sure that someone won’t break into its update delivery environment one day and use the legitimate update procedure to send a deadly restart command to every Windows server out there?

Image credit: pngtree.com

Skill vs Technology: a zero-sum game?

Last week I came across two peculiar stories dedicated to the role played by technology in the evolution of the civil aviation industry. The stories were barely related to each other at first glance – had I come across them at different points in time, I would probably never have spotted the deep connection between them. Luckily, I was still thinking about the first story when I bumped into the second one, and the immediate realisation of the scale of the apparent trend made quite an impression on me.

That first story was about the role of technology in the crash of Air France transatlantic flight 447 back in 2009. The primary conclusion from the investigation that the article elaborates on is that the pilots had become so used to flying with the assistance of the autopilot that they were completely lost when faced with the need to fly the aircraft manually. They had no understanding of the situation whatsoever, as they lacked the hands-on feel for flying the aircraft at cruise altitudes – something normally handled by the autopilot. On top of that, the autopilot, designed with intelligence and pilot-friendliness in mind, didn’t warn the pilots that the aircraft was approaching a complete stall, having interpreted the implausibly sharp drop in airspeed as an indicator of a probable false alarm.

Confused and lost, the pilots applied several corrective actions to get the aircraft back on course. Unfortunately, due to their lack of situational awareness, those actions proved fatal. The A330 lost its airspeed and crashed into the ocean, killing all 228 people on board. Ironically, the investigation showed that had the pilots not intervened, AF447 would have continued at its cruise altitude as it should have, even with the autopilot switched off.

The second story I read was of a far more positive kind, describing the prospective relocation of the London City Airport air traffic control tower from the airfield itself to a small place called Swanwick, Hampshire, some 80 miles away. Specifically, twelve HD screens and a high-bandwidth communication link are going to replace the existing watch tower, and are claimed to provide far better insight into aircraft landings and take-offs at the airport, along with a number of augmented-reality perks. The experience of LCY is then expected to be picked up by other airports around the country, effectively turning air traffic control tower operations into an outsourced business.

What impressed me most about these two articles was that, despite being barely related per se, they essentially tell the same story – the story of skills typically attributed to humans being taken over by technology. It’s just that the first article tells us the end of that story, while the second one is at its very beginning.

Just as advances in technology such as the glass cockpit and way-too-smart autopilots led to pilots losing their grip on manual flying, switching to an augmented HD view of the runway will inevitably lead to air traffic control operators losing their basic skills, like tuning binoculars or assessing meteorological conditions by a dozen nearly subconscious cues. The trained sharpness of their eyes, now supported by HD zoom, will most certainly diminish. Sooner or later, the operators will be unable to manage the runway efficiently without the assistance of the technology.

And this is the challenge we are going to face in the near future. The more of the human activities typically referred to as ‘acquired skills’ are taken over by technology and automation, the less capable of exercising those skills we will be ourselves. If a muscle isn’t constantly trained, it wastes away. If a musician stops playing regularly, she eventually loses her ability to improvise. If a cook stops dedicating as much time to cooking, his food loses its character, despite being cooked from the same quality ingredients in the same proportions.

And that’s not necessarily bad. As technology inevitably makes its way into our lives, taking over those of our skills it can perform better than we do, there is no reason not to embrace it – but we should embrace it thoughtfully, realising the consequences of losing our grip on those skills. Remember that we have lost a great deal of our skills to the past already. Your great-grandfather was very likely rather good at fox hunting, your grandad probably performed much better than you at fly fishing, and certainly a far greater proportion of the population could ride a horse two centuries ago than can today. Those skills were taken away from us by technology and general obsolescence – but do we really need them today?

What we do need, though, is a clear understanding of the consequences of sharing with technology the activities we are used to performing ourselves, and to be prepared to observe a steep decline in the quality of our own hand skills as technology gradually takes them over. Understanding that problem alone, and taking our imperfect human nature as it is, will most certainly help us manage the risks around technological advances more efficiently.

(pictured: a prototype of the digital control room at NATS in Swanwick, credit: NATS)

Detective story. Almost a classic.

When we are away, our house is looked after by security cameras. Whenever a camera detects motion in its view, it captures a set of photos and sends them to a dedicated mailbox. This setup adds to our peace of mind about the safety of the house while we are away, and comes with a nice bonus: random shots of the cat wandering around the house.

Our last trip added a bit of action to the scheme. On the morning of the second day I woke up to find ~200 camera e-mails in my inbox (the cat’s portraits typically account for 5-8). “Gotcha!”, I rubbed my hands. But I had been too quick. All the 200+ photos, apart from the 2-3 that actually captured the cat, were quite boring and very similar to each other: an empty room and some blurred spots in the centre. And no sign of burglars.


And that was only the beginning. Hour after hour, camera e-mails continued to come in, one a minute. Finally, I gave up and went back to business as usual. This decision proved tactically correct, as every morning from then on I woke up to find yet another 200-300 new camera e-mails in my inbox. Every morning I opened 2-3 of them at random, observed the empty room and the spots, and proceeded with my business. At the time, I paid no attention to the fact that all those messages only came in during night time in the camera’s time zone – a fact that turned out to be of big significance.

I only managed to get back to this avalanche of alerts well after I returned home. My findings turned out to be quite amusing.

A flying insect had found itself shelter in one of the rooms monitored by a camera. When the lights went low in the evening, the camera switched from daytime to infrared mode, which turned on a dim reddish backlight. Apparently, the bug was attracted to this backlight and began to flutter around the camera. The camera detected the bug’s motion and, in full accordance with its setup, activated the shutter and dispatched the pictures where instructed. During daylight the camera turned back into a simple piece of furniture, the insect lost interest in it, and the flow of e-mails stopped for a while – only for the cycle to start over at dusk.

But that’s not the end of the story. To send out the photos, the cameras use a dedicated e-mail address at my hosting account. To prevent this e-mail account from being abused by spammers, the number of messages that can be sent through it is capped at 300 per day. The bug was apparently in darn good shape, as it managed to consume the whole message allowance well before noon – after which the mail server stopped accepting further messages from the cameras until the start of the next day. This meant that had the hypothetical burglars planned their dark affairs for the afternoon, they could have avoided the scrutiny of the cameras and made off without being noticed – all due to some tiny bug in the system (*).

The morals of this fable are:

(1) no matter how good at risk assessment you are, there will always be an unaccounted-for bug whose fluttering will reduce all your mitigations to a joke;

(2) sometimes the measures you expect to protect you (I’m speaking of my outgoing e-mail cap) may turn against you;

(3) (the most important of all!) leave your cats much less food than you normally do when you go away, so they have an incentive to hunt down any nonsense fluttering around your cameras!

(*) They actually couldn’t – you don’t think that some levitating invertebrate would just knock my whole security system down, do you?

Good News

One English translation of Victor Hugo’s words On résiste à l’invasion des armées; on ne résiste pas à l’invasion des idées reads as No army can stop an idea whose time has come. In our case, the army is even going to help promote such an idea instead of resisting it.

The Atlantic Council is set to host a discussion long awaited by me and by a solid crowd of experts in information security, business continuity and cyber risk management. The Cyber Risk Wednesday: Software Liability discussion will take place on November 30th in Washington, DC.

The discussion will be dedicated to the difficult question of increasing software vendors’ liability for defects in their products, and of ways to balance it against economic factors. Taking into account the extent to which software, in a variety of forms, permeates the inmost aspects of our lives (such as a smart house running a hot tub for you), as well as the extent to which we trust software to manage our lives for us (letting it run driverless cars and smart traffic systems), the question of liability is vital – primarily as a trigger for vendors to employ proper quality assurance and quality control processes. That’s why I wholly welcome the Atlantic Council’s initiative, and truly hope that it will help raise awareness of the problem and give a push to a wide public discussion of it.

On security perimeters

It is my humble opinion that the ‘security perimeter’ concept, used widely by security professionals and laymen alike, does more harm than good. There are many reasons why, but the main one is that the concept gives a false sense of security.

If you ask an average person what a security perimeter is, they will probably tell you something like ‘it is a warm and cosy place where I can relax and have my cake while everyone outside is coping with the storm.’

The problem is that it is not entirely so. Contrary to popular belief, security risks don’t go away when you are inside the perimeter. Instead, they transform – they change their sources, targets and shapes – but they are still there, waiting for the right moment to strike. What is particularly bad is that those risks are often overlooked by security staff, who concentrate only on the risks posed by the hostile outside environment (the storm) – and not on those of the ‘safe’ environment inside the perimeter (say, a stray cherry stone in the cake that might cause the man to choke to death).

The chaos at JFK is a good illustration of this point (well, not for its participants). For sure, the area of the supposed shooting was viewed by security people as belonging to the security perimeter (an extremely well-protected one at that – I bet it’s nearly impossible to get into the area even with a fake gun, let alone a real one). They probably believed that as long as the borders of the perimeter were protected up to eleven, they didn’t need to care about anything happening inside it. Indeed, they might have done a great job of protecting the passengers from gunfire, but they overlooked an entirely different type of risk – one which, happily, didn’t cause any casualties.

That’s why any security perimeter (in the sense of a ‘straightforward defence facility’) should be viewed not as a security perimeter, but rather as a transition point from one security setting to another. In no way is the inner setting inherently more secure than the outer one – and sometimes it can even be more dangerous (imagine there’s no one inside to help the choking man deal with the stone). Thinking this way will help to build a clearer picture of the variety of risks targeting each particular security setting, and to come up with appropriate countermeasures.

Cloud services as a security risk assessment instrument

One of the hidden gems of cloud computing platforms (how many more of them are out there?) is the possibility of performing quite an accurate quantitative assessment of risks to security systems.

The strength of a big share of information security measures rests on the computational complexity of attacks on the underlying security algorithms. To name a few: you need to factor a 2048-bit integer to crack an RSA key, to get through an average of 2^127 tries to recover a 128-bit AES encryption key, to iterate over 20 million dictionary passwords to find the one matching a hash, and so on – I’m sure you get the idea. All of these tasks require enormous amounts of time and computational resources, and the unavailability of those to the vast majority of potential attackers is the cornerstone of modern data security practices. This hasn’t changed much for the last several decades – yet something around it has.
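To get a feel for the scale involved, here is a quick back-of-the-envelope sketch; the guessing rate is a deliberately generous assumption of mine, not a measurement of any particular hardware:

    # Back-of-the-envelope feasibility check: how long would brute force
    # take at an assumed (and very generous) aggregate guessing rate?

    GUESSES_PER_SECOND = 1e12       # assumed rate of a large cracking cluster
    SECONDS_PER_YEAR = 3600 * 24 * 365

    aes_tries = 2 ** 127            # average tries to recover an AES-128 key
    dict_tries = 20_000_000         # dictionary attack on a password hash

    print(f"AES-128:    {aes_tries / GUESSES_PER_SECOND / SECONDS_PER_YEAR:.1e} years")
    print(f"Dictionary: {dict_tries / GUESSES_PER_SECOND:.1e} seconds")

The AES key stays out of reach under any remotely realistic assumptions (some 10^18 years), while the dictionary attack is over in a blink – and it is precisely the attacks in that second, affordable category for which pricing becomes interesting.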

In ye goode olde days, a security architect had to rely on really vague recommendations when deciding which security parameters to employ in a system – recommendations that often sounded more like voodoo predictions than a well-defined, formally justified methodology. The ones from NIST, for example, literally say: ‘if you want your data to be secure up until 2030, protect it with 128-bit AES’. Hmm, okay. And what are the chances of my data being cracked by 2025? By 2035? What if the data I encrypt is really valuable – would it be worthwhile for the attacker to bend over backwards and try to crack the key well before 2030? What price would they have to pay to do that, and what are the chances they’d succeed?

The rise of cloud computing platforms has brought a great deal of certainty to the table. With commercial cloud platforms available, one can estimate the cost of breaking a computation-dependent security scheme unbelievably accurately. Back in 2012, the cost to a potential attacker of breaking a scheme could hardly be estimated at all. It was believed that the NSA probably had the power to break 1024-bit RSA, and that a big enough hacker group could probably break SHA-1 with little effort. Probably.

Everything is different today. Knowing the durability of the security system they need to deploy or maintain, and being aware of the computational effort needed to break it, a security architect can estimate the ceiling of the price an attacker would have to pay to conduct a successful attack on the system – in dollars and cents.

To obtain that estimate, the security architect would create a scalable cloud application that emulates the attack – e.g. by iterating over those 20 million passwords in a distributed manner. Afterwards, they would work closely with the cloud service provider to figure out the price of running that application in the cloud, which will be a function of the system’s security parameters and the amount of time needed to conduct the attack. Having built the price function, they would be able to make a justified and informed decision about the security parameters to employ, balancing the attack’s duration and cost against whatever benefits the attacker would gain from a successful attack. This is a huge step forward in the field of security risk assessment, as it allows the strengths and weaknesses of a security system to be described in well-defined ‘I know’ terms rather than ‘I feel’ ones, and the system to be viewed from a business-friendly ‘profit and loss’ perspective instead of the enigmatic ‘vulnerabilities and their exploitation’ one.
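As a minimal sketch of what such a price function might look like – with the per-instance hash rate and the hourly instance price being hypothetical placeholders to be replaced with your own benchmark figures and your provider’s actual price list:

    # Sketch of a cloud attack cost estimator for a distributed
    # dictionary attack. All rates and prices below are hypothetical.

    def attack_cost(candidates: int,
                    hashes_per_sec_per_instance: float,
                    price_per_instance_hour: float,
                    instances: int = 100) -> tuple[float, float]:
        """Return (wall-clock hours, total cost in dollars) of the attack."""
        total_rate = hashes_per_sec_per_instance * instances
        hours = candidates / total_rate / 3600
        cost = hours * instances * price_per_instance_hour
        return hours, cost

    # Example: 20 million dictionary passwords against a slow KDF, at an
    # assumed 10,000 derivations/sec per instance and $1.00 per instance-hour.
    hours, dollars = attack_cost(20_000_000, 10_000, 1.00)
    print(f"~{hours:.4f} hours across 100 instances, ~${dollars:.2f} total")

Note that the total cost barely depends on the number of instances – parallelism only shortens the wall-clock time – so what the defender’s parameters really buy is per-guess work, and the attacker trades money for speed at an almost constant exchange rate.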

It is worth mentioning that a good security architect would then keep monitoring any changes around the cost of breaking the system, including changes in the cloud service providers’ SLAs and price schedules, and be prepared to make the necessary amendments to the risk figures and the security plan. With computation prices going down all the time, reviewing the risks periodically is vital to guaranteeing the continued security of the system.