Cloud services as a security risk assessment instrument

One of the hidden gems of cloud computing platforms (how many more of them are out there?) is the possibility of performing a quite accurate quantitative assessment of the risks to security systems.

The strength of a big share of information security measures rests on the computational complexity of attacks on the underlying security algorithms. To name a few: you need to factor some 2048-bit integer to crack an RSA key, you need to get through an average of 2^127 tries to recover an AES-128 encryption key, you need to iterate over 20 million dictionary passwords to find the one matching a stolen hash, and so on – I’m sure you’ve got the idea. All of these tasks require enormous amounts of time and computational resources, and the unavailability of those to the vast majority of potential attackers is the cornerstone of modern data security practice. This hasn’t changed much for the last several decades – yet, something around it has.
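For scale, here is a back-of-the-envelope calculation for the AES case; the testing rate below is an arbitrary, deliberately generous assumption:

```python
# Back-of-the-envelope: expected brute-force time for an AES-128 key.
# The rate is an assumed figure, far beyond any machine known to exist.
expected_tries = 2 ** 127               # on average, half of the 2^128 keyspace
rate = 10 ** 12                         # assumed keys tested per second
seconds_per_year = 365.25 * 24 * 3600

years = expected_tries / rate / seconds_per_year
print(f"{years:.2e} years")             # about 5.4e18 years
```

Even at a trillion keys per second, the expected effort exceeds the age of the universe many times over.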

In ye goode olde days, a security architect had to rely on some really vague recommendations when deciding which security parameters to employ in the system, recommendations that often sounded more like voodoo predictions than a well-defined, formally justified methodology. The ones from NIST, for example, literally say: ‘if you want your data to be secure up until 2030, protect it with 128-bit AES’. Hmm, okay. And what are the chances of my data being cracked by 2025? By 2035? What if the data I encrypt is really valuable – would it be worthwhile for the attacker to go out of their way and try to crack the key well before 2030? What price would they have to pay to do that, and what are their chances of success?

The rise of cloud computing platforms brought a great deal of certainty to the table. With commercial cloud platforms available, one can estimate the cost of breaking a computation-dependent security scheme remarkably accurately. Back in 2012, the cost of breaking a scheme by a potential attacker could hardly be estimated at all. It was believed that the NSA probably had the power to break 1024-bit RSA, and that a big enough hacker group could probably break SHA-1 with little effort. Probably.

Everything is different today. Knowing the durability of the security system they need to deploy or maintain, and being aware of the computational effort needed to break it, a security architect can estimate the ceiling of the price an attacker would have to pay for a successful attack on the system – in dollars and cents.

To obtain that estimate, the security architect would create a scalable cloud application that emulates the attack – e.g. by iterating over those 20 million passwords in a distributed manner. Afterwards, they would work closely with the cloud service provider to figure out the price of running that application in the cloud, which will be a function of the system’s security parameters and the amount of time needed to conduct the attack. Having built the price function, they would be able to make a justified and informed decision about which security parameters to employ, balancing the attack’s duration and cost against whatever benefit the attacker would gain from a successful attack. This is a huge step forward for security risk assessment, as it allows the strengths and weaknesses of a security system to be described in well-defined ‘I know’ terms rather than ‘I feel’ ones, and the system to be viewed from a business-friendly ‘profit and loss’ perspective rather than an enigmatic ‘vulnerabilities and their exploitation’ one.
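To illustrate, here is a minimal sketch of such a price function for the 20-million-password example. All of the figures (the hash rate, the instance count and the hourly price) are assumptions chosen for illustration, not real benchmarks or provider prices:

```python
# Sketch of an attack price function: the cost of iterating a password
# dictionary on rented cloud instances. All figures are illustrative
# assumptions, not real provider prices or hashing benchmarks.

def attack_cost(passwords: int, hashes_per_sec: float,
                price_per_hour: float, instances: int = 1) -> tuple[float, float]:
    """Return (hours, dollars) needed to try every password in the dictionary."""
    total_seconds = passwords / (hashes_per_sec * instances)
    hours = total_seconds / 3600
    dollars = hours * instances * price_per_hour   # billed per instance-hour
    return hours, dollars

# Example: 20 million candidates against a slow hash (assume 50 hashes/sec
# per instance), on 100 instances at an assumed $1.50 per instance-hour.
hours, dollars = attack_cost(20_000_000, 50, 1.50, instances=100)
print(f"{hours:.1f} hours, ${dollars:,.2f}")       # ~1.1 hours, ~$166.67
```

Swapping in the provider’s real price list and a measured hash rate turns this toy into the ‘dollars and cents’ ceiling discussed above.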

It is worth mentioning that a good security architect would then monitor any changes affecting the cost of breaking the system, including changes in the cloud service providers’ SLAs and price schedules, and be prepared to amend the risk figures and the security plan accordingly. With computation prices going down all the time, reviewing the risks periodically is vital to keep the system continuously secure.

Why your mobile phone is NOT a second authentication factor

Mobile phones are often employed as the second factor in various two-factor authentication schemes. A widely used authentication scenario involves a text, call or other kind of notification sent to your mobile phone by the service you are accessing, which authenticates you by your ability to confirm its contents. The problem is that, despite claiming to support two-factor authentication, a lot of Internet services design or set it up improperly, and end up providing not security, but a false sense of it.

Let’s recall what two-factor authentication (2FA) is. In contrast to traditional single-secret authentication schemes, such as password-based ones, with 2FA you combine two different pieces of evidence to prove your identity, so that an attacker gaining access to one piece of evidence can’t take over your identity without also having access to the other. This is supposed to significantly reduce the risk of your account being hacked, as the attacker now needs two different pieces of evidence (such as your password and your fingerprint) to gain access to your account.

A lot of people are confused by the terminology 2FA evangelists use to explain the nature of the scheme. They often classify the authentication components into something you have, something you know and something you are, and demand that the two pieces of evidence you present to authenticate yourself fall into two different categories. This classification is not entirely correct and slightly mixes things up. Strictly speaking, under certain conditions you can successfully use two something you know‘s as two authentication factors; conversely, the mere presence of a something you have and a something you know together doesn’t guarantee the security of the overall scheme.

A much more important (and correct) requirement is that the two authentication factors be independent of each other. Neither factor, when cracked by the attacker, should give her even a tiny bit of information about the other factor. If a 2FA scheme manages to satisfy this condition, it can be good (subject to the implementation details and the exact authentication methods used); if it doesn’t, it definitely isn’t.

A very common problem with using 2FA on a mobile phone (in-app or in-browser) is that the two factors chosen by the services are not entirely independent of each other. A typical phone usage scenario involves an e-mail app which is always open and authenticated; a number of social network apps with the user signed in; an Internet browser with a bunch of open sessions. In most cases, access to the e-mail app alone will be enough to gain access to any services you use that are bound to your e-mail address. If your phone gets stolen, the services which are set up to use your mobile phone number as a second authentication factor, as well as a ‘recovery’ password reset point, will, when requested, text their one-time access codes… correct, straight into the hands of the thief.

This way, the typical something you have and something you know factors, when used exclusively on one device, blur into one big something you have. Any ‘second’ factor employed by a mobile app or service, unless it works via a communication channel entirely external to your mobile phone, merely extends the first factor and adds nothing to the overall account security.
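To make this concrete, here is roughly how a time-based one-time password (TOTP, RFC 6238), the kind computed by authenticator apps, is derived. A minimal sketch; the base32 secret is a made-up example:

```python
import base64, hashlib, hmac, struct, time

def totp(secret_b32: str, digits: int = 6, period: int = 30) -> str:
    """Time-based one-time password (RFC 6238) over HMAC-SHA1."""
    key = base64.b32decode(secret_b32)
    counter = int(time.time()) // period        # 30-second time step
    msg = struct.pack(">Q", counter)            # 8-byte big-endian counter
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

print(totp("JBSWY3DPEHPK3PXP"))                 # made-up example secret
```

The cryptography here is perfectly sound, and yet it changes nothing about the argument above: if the shared secret lives on the same phone as your signed-in e-mail app and browser sessions, both ‘factors’ are stolen together with the device.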

Notwithstanding the above, your phone still can be used as a proper 2FA factor. The main idea is that you should not be able to authenticate yourself with your phone alone, whatever combination of services your phone offers. There must be some other, external and independent factor involved. A variety of options are available here, from using your desktop computer or a different mobile phone to provide the second factor, up to sending in your fingerprints or a retina scan. If the authentication can be performed with the sole use of your phone, it is never true 2FA.

Access wrongs

I was given this receipt at one of the shops in Gloucester. The owner of this shop puts her network at risk by conducting all her business activities, however small, from under the superuser account. It may take only a dozen seconds for an intermediate-level opportunist hacker to set up a backdoor while their friend distracts the cashier – and then hack into the shop computer and arrange a whopping 99%-off sale for themselves and their friends.

[Image: the receipt]
Negligence towards proper access rights management is one of the most common causes of security breaches. While other data protection instruments, such as encryption, backup routines or firewalls, are ‘touchable’ in some sense, access control is ‘imperceptible’ and requires a lot of care to set up properly. At the same time, improperly configured access control can easily jeopardise every other security instrument, as it lets the attacker walk in through the front door instead of squeezing through the window leaf or tearing down a (however well-protected) side wall.

The need-to-know principle, or the principle of least privilege, remains the most effective rule for setting up access rights correctly. In short: only give users access to the information and functions they need to perform their duties – and forbid everything else.
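As a toy illustration of deny-by-default in code (the roles, permissions and decorator below are all made up for the example):

```python
from functools import wraps

# Hypothetical roles and the operations each one genuinely needs.
PERMISSIONS = {
    "cashier": {"scan_item", "take_payment"},
    "manager": {"scan_item", "take_payment", "apply_discount", "refund"},
}

def requires(permission: str):
    """Allow the call only if the current role was explicitly granted it."""
    def decorator(func):
        @wraps(func)
        def wrapper(role: str, *args, **kwargs):
            if permission not in PERMISSIONS.get(role, set()):
                raise PermissionError(f"{role!r} may not {permission}")
            return func(role, *args, **kwargs)
        return wrapper
    return decorator

@requires("apply_discount")
def apply_discount(role: str, percent: int) -> None:
    print(f"discount of {percent}% applied")

apply_discount("manager", 10)       # granted: managers need discounts
try:
    apply_discount("cashier", 99)   # denied: cashiers were never granted it
except PermissionError as e:
    print(e)
```

Running the whole shop from under the superuser account is the exact opposite of this: every role implicitly holds every permission.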

My pet, my flaw

I’ve just found out that my cat’s vet surgery inserts their patients’ names into the transaction identifiers when taking direct debit payments. For example, here’s the whole direct debit transaction ID as shown on my bank statement:

DE06287259MYCROFT

where ‘MYCROFT’ would be my cat’s name.

Taking into account that many, many, many banks use their customers’ pets’ names as one of their security questions, this might pose a real risk. A dishonest bank worker – or, indeed, anyone who could intercept the statements en route to your house – might use that information to gain access to your bank account.

Moreover, the transaction info also says that the payment was taken on behalf of a nationwide vet alliance and not my petite village surgery. This makes me think that a large number of other small vet surgeries across the country use the same umbrella company to take direct debit payments on their behalf, putting their customers’ assets at risk.

Once upon a time on the twenty-ninth

I had been thinking for quite some time about which topic to start my blog with, but a topic has suddenly come up by itself.

Yesterday we came across a sudden issue in our component library. Due to a three-year-old typo in a low-level piece of code, the library turned out to be, so to say, not entirely leap-year-friendly. Once every four years, on the 29th of February, the typo came into effect by altering the behaviour of the containing function. The function started producing wrong results at 00:00 on February 29 and kept doing so until 23:59:59, returning to normal with the first second of March (all times UTC). The most unpleasant part was that the typo propagated up to a higher-level piece of the API, blocking a good share of the product’s functionality. As a result, our day started with a manifestation of (totally understandably) angry customers at all our support channels, sharing their dissatisfaction and demanding a solution.
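We won’t reproduce the actual offending line here, but to show how small such a typo can be, here is a made-up specimen of the same family of bugs:

```python
def is_leap_year(year: int) -> bool:
    # Hypothetical specimen, not our actual code. The typo: '== 0' where
    # '!= 0' was intended, which quietly breaks ordinary leap years.
    return year % 4 == 0 and (year % 100 == 0 or year % 400 == 0)   # buggy

def is_leap_year_fixed(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(is_leap_year(2016), is_leap_year_fixed(2016))   # False True
```

A single character, invisible for four years at a stretch, and then suddenly very visible indeed.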

To make a long story short, that was followed by a fairly busy day and most of the night. Thanks to the selfless efforts of our team, we managed to execute the emergency procedures and come up with a temporary, and then a permanent, solution for our customers. Now that the customers can relax and sleep well, we can catch our breath and draw some initial conclusions.

The first conclusion is that however unlikely an issue is, it still can happen. Yesterday’s issue was caused by a combination of different factors. The typo shouldn’t have been there. Even if it was, it should’ve been caught by the QA routine. Even when it wasn’t caught by QA, there was a fuse that was supposed to prevent the error from affecting any higher-level components. The fuse, alas, didn’t work either.

This was topped up by the absence of our primary build person from the company premises due to their day off, and by the fact that the 29th of February fell on a Monday this year. Had it fallen on a Tuesday or any other weekday, we’d have discovered the problem much earlier, as our US people would still have been at work when the problem started exposing itself.

Therefore, be prepared. Prepare an emergency plan, and check and update it regularly. Be prepared for the bad. Be prepared for the worst you can imagine – and for even worse than that. Don’t expect bad and good things to trade off at some ‘average failure’ – assume all the worst things will happen at once.

Second, create backup functions. By concentrating a particular business function in the hands of one person or department, you are taking on a huge risk of losing that function if that person or department becomes unavailable. There is no need to imagine disastrous pictures of a PM run over by a bus or a department catching fire – a broken car, a poorly child, an Internet cable accidentally cut by a gardener, or something as simple as the responsible person’s day off, as it was in our case, will be quite enough to lose vital time. As we encourage the members of our team to share their knowledge and skills with each other (encourage isn’t quite the right word – here at EldoS we are all passionate about sharing our knowledge and learning new things, so basically all we do is stay out of the way), we managed to find a competent replacement for the build person quickly, and to launch the build process once the broken functionality was fixed.

If there is no way to back up a particular function, try to create a contingency plan that offers a temporary solution until the function is restored.

Aim for the organisation to be capable of performing most of its critical functions even under a severe shortfall of available personnel. You never know when a problem will happen or which of the functions will be unavailable.

Third, communicate. There is nothing worse than uncertainty for a customer facing a problem with your product. Tell your customers everything you know about the problem, in as much detail as possible. Let them know the estimated time scales for a fix or solution to become available. Tell them what kind of consequences to expect. Don’t try to hide anything: it will most likely become evident anyway, and you will lose your customers’ trust.

Create a prioritised backlog of customers affected by the issue, based on the scale, criticality and urgency of the problem for each of them. Handle those in critical situations individually. Think whether you can create a bespoke solution for them quicker. Sometimes a dumb and cumbersome workaround – like the advice to move the computer clock a day ahead in our case – might prove a viable temporary solution for some of your customers until the proper update is prepared and deployed.

Fourth, don’t stop once all your customers are up and running again. Treat every fault as an opportunity to review and improve your processes and procedures. Don’t just search for similar issues and fix them; ask yourself: are there any flaws in the way you create your product that could have triggered the issue? Are you and your customers totally happy with the response times? Are they happy with the form in which the fix was provided? Is there anything you can do to prevent anything similar from happening in the future, to decrease the scale of the impact, or to speed up the delivery of the fix?
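For our particular issue, one concrete answer is a regression test that pins the behaviour at the exact boundary timestamps. A sketch, where check_date is a hypothetical stand-in for the affected function:

```python
import unittest
from datetime import datetime, timezone

def check_date(ts: datetime) -> bool:
    """Hypothetical stand-in for the affected library function."""
    return True   # the real implementation lives in the library

class LeapDayRegression(unittest.TestCase):
    # Probe the exact boundaries where the old bug switched on and off.
    BOUNDARIES = [
        datetime(2016, 2, 28, 23, 59, 59, tzinfo=timezone.utc),  # last good second
        datetime(2016, 2, 29, 0, 0, 0, tzinfo=timezone.utc),     # bug switched on
        datetime(2016, 2, 29, 23, 59, 59, tzinfo=timezone.utc),  # last bad second
        datetime(2016, 3, 1, 0, 0, 0, tzinfo=timezone.utc),      # back to normal
    ]

    def test_leap_day_boundaries(self):
        for ts in self.BOUNDARIES:
            with self.subTest(ts=ts):
                self.assertTrue(check_date(ts))

if __name__ == "__main__":
    unittest.main()
```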

Bad things do happen, and often, despite our constant efforts to prevent them, we can’t really do anything about that. However, once a bad thing has happened, the best and most reasonable thing we can do (apart from dealing with the consequences, of course) is to learn from it, and to use the new experience to improve our processes – ending up with a much better product and customer experience.