Monthly Archives: December 2011

Several years ago I worked at Microsoft and during that time I was involved in interviewing several hundred candidates. For those not familiar with the MS interview process (it may have changed) it is broken up into a number of one hour interviews where each interviewer is probing the candidate for specific qualities (not skills – qualities; a person can be taught C# but they can’t easily be taught to work well on a team). My role in the interview was typically to gauge how effective they were at actually engineering a solution – and to be specific here, the focus was not on their proficiency with a particular language; I actually asked for pseudo-code because I didn’t want the candidate caught up in worrying about syntax problems (the IDE/compiler does that for you anyway). What I was looking for was how the candidate made sure they understood the problem, and then what thought they put into designing a solution.

The problem I gave candidates to solve was the following:

  • Given a string of arbitrary characters, find every sequence that was a palindrome (a sequence of characters that was the same regardless of being read left->right or right->left)
  • Do NOT mark nested palindromes (i.e. one palindrome contained within another). For example, aeraweabbadfrar should mark abba and rar, but NOT bb (nested within abba)
  • Overlapping, but not nested, palindromes should be identified. For example, aeraweabbabdf should mark both abba and bab
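
To make the three rules above concrete, here is one possible sketch in Python. It is an illustration under assumptions, not the answer expected on the whiteboard: the function name, the (start index, substring) return shape, exact character comparison, and the minimum length of 2 are all choices the spec above leaves open. It expands around every possible center to get the longest palindrome there, then drops any span that sits inside another.

```python
def find_palindromes(text, min_length=2):
    """Return (start_index, substring) for every palindrome not nested in another.

    Assumptions (not part of the original spec): characters are compared
    exactly, and a palindrome needs at least `min_length` characters.
    """
    n = len(text)
    candidates = []
    # 2n - 1 centers: each character (odd length) and each gap (even length).
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and text[left] == text[right]:
            left -= 1
            right += 1
        start, end = left + 1, right  # half-open span of the longest match
        if end - start >= min_length:
            candidates.append((start, end))

    # Sort by start, longest first; a span is nested when an earlier span
    # in this order already reaches at least as far to the right.
    candidates.sort(key=lambda span: (span[0], -span[1]))
    result = []
    furthest_end = 0
    for start, end in candidates:
        if end > furthest_end:
            result.append((start, text[start:end]))
            furthest_end = end
    return result
```

Calling find_palindromes("aeraweabbadfrar") reports abba and rar but not bb, and find_palindromes("aeraweabbabdf") reports both abba and bab, matching the two examples in the problem statement.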

Within 5 seconds of giving the problem I knew immediately when certain people were going to fail, because they picked up a marker and immediately started writing pseudo code on my whiteboard. They were not going to arrive at anything resembling a working algorithm. For starters, what I provided was an incomplete spec definition. Without getting into implementation specifics (like if this is C/C++ is the string null terminated or is a length param passed), here are some questions that should be asked:

  • Should upper and lower case characters be considered equivalent or different?
  • Are we talking only about basic Latin alphas? Do numbers count? Do extended Latin characters? Is ô equivalent to ö or are they different? Hell, are we even restricted to Latin chars?
  • How should the palindromes be marked? I.e. print to screen, returned in array, written to file, etc?
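
The answers to the first two questions change what "the same character" means. As a hedged illustration, the hypothetical helper below (comparison_key is a made-up name, and both flags are assumptions about what the interviewer might answer) uses Python's standard unicodedata module to show how case and diacritics could be made equivalent or kept distinct.

```python
import unicodedata


def comparison_key(ch, ignore_case=True, ignore_diacritics=True):
    """Map a character to the form used when comparing it with another.

    Both flags are assumptions; the real answer depends on the spec.
    """
    if ignore_diacritics:
        # NFD splits a letter from its combining marks; drop the marks.
        decomposed = unicodedata.normalize("NFD", ch)
        ch = "".join(c for c in decomposed if not unicodedata.combining(c))
    if ignore_case:
        ch = ch.casefold()
    return ch


# With diacritics ignored, o-circumflex and o-diaeresis compare equal.
same = comparison_key("\u00f4") == comparison_key("\u00f6")
# With diacritics kept, they stay different.
different = comparison_key("\u00f4", ignore_diacritics=False) != comparison_key(
    "\u00f6", ignore_diacritics=False
)
```

A palindrome check would then compare comparison_key(a) with comparison_key(b) instead of comparing the raw characters.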

The point of leaving out some details was to see if the person would just make assumptions, or if they would press me for a full problem definition. Now there is all sorts of psychology at play during an interview that could cause a person to exhibit behavior not indicative of their reaction under normal work conditions, so this alone isn’t a make or break. However, asking questions to fully spec out a problem definition was a characteristic behavior of a person who was first really thinking about the problem. The other characteristic behavior was writing out a whole lot of sample inputs, covering a variety of input cases, to understand what sort of rules they needed to include in their logic BEFORE they started writing code. Potential variants in input included:

  • even vs. odd length string
  • even vs. odd length palindromes within string
  • no palindromes
  • one mega palindrome
  • many separated palindromes (i.e. palindromes with non-palindrome characters separating them in the string)
  • nested palindromes
  • single and multi overlapping palindromes (e.g. cattacat has both cattac and tacat – so only a single overlap. cattacata has cattac, tacat, and ata overlapping in a series)
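
The variants above translate naturally into a table of sample inputs and expected results. The sketch below pairs such a table with a deliberately slow brute-force checker that can act as a reference when testing a smarter solution. The function name, the minimum palindrome length of 2, and the sample strings other than the cattacat ones are assumptions made for illustration.

```python
def brute_force_palindromes(text, min_length=2):
    """Reference oracle: check every substring, then drop nested ones.

    Assumes exact character comparison and a minimum length of 2,
    neither of which the original spec pins down.
    """
    n = len(text)
    spans = [
        (start, end)
        for start in range(n)
        for end in range(start + min_length, n + 1)
        if text[start:end] == text[start:end][::-1]
    ]

    def is_nested(span):
        # Nested means fully contained in some other palindrome span.
        return any(
            other != span and other[0] <= span[0] and span[1] <= other[1]
            for other in spans
        )

    return [text[s:e] for (s, e) in spans if not is_nested((s, e))]


SAMPLE_INPUTS = {
    "abcd": [],                                  # even-length string, no palindromes
    "racecar": ["racecar"],                      # odd-length string, one mega palindrome
    "abbaxyzy": ["abba", "yzy"],                 # even and odd palindromes, separated
    "aeraweabbadfrar": ["abba", "rar"],          # nested bb must not be marked
    "aeraweabbabdf": ["abba", "bab"],            # overlapping, both marked
    "cattacat": ["cattac", "tacat"],             # single overlap
    "cattacata": ["cattac", "tacat", "ata"],     # overlaps in a series
}

for sample, expected in SAMPLE_INPUTS.items():
    assert brute_force_palindromes(sample) == expected, sample
```

Writing the table first forces the same thinking the whiteboard exercise was probing for: each row is a rule the eventual algorithm has to respect.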

And so forth. If a person did not at the very least spend a good deal of time brainstorming variants of input before they even started writing a solution, they were guaranteed to arrive at a solution that failed with at least some of the above variants. In my entire time in Redmond, interviewing hundreds of candidates, exactly 3 arrived at a working, pretty much perfect algorithm. A couple more created mostly working algorithms that maybe missed one or two specific variants, or had some simple off-by-one errors when traversing a string that a couple of minutes with an actual debugger would have found pretty easily – I didn’t hold that against people. The vast, vast majority produced crap, and by and large they were the ones that started trying to whiteboard code immediately.

I don’t know who to attribute the quote to, but I’ve heard it in a number of places: “Crappy programmers spend 90% of their time writing code, 10% of their time thinking; Great programmers do the exact opposite.” It pretty accurately captures the few who did well in the interview and the multitude that did poorly. I also have a different manner of thinking about the same situation – making a distinction between Software Engineering and Software Writing. People who write software largely go about it the same way I go about this blog post – they have some general notion they want to convey, and start typing until they manage to convey it. At the end it isn’t going to be particularly well structured, and could benefit from a whole lot of editing. People who engineer software go about it completely differently – in the extreme, they go about it the way the folks writing shuttle software do. Engineering is a whole lot of time spent fully understanding the problem the code tries to solve, understanding all of the conditions the code will be used in, creating blueprints for the code, documenting everything so that QA and other devs can understand things easily, and only a very small amount of time actually writing code. Engineering is a whole lot of process, and it’s a fair bit of unfun work. That’s why coders by and large prefer to just sit down and write code, and really hate when companies try to dump a bunch of process on them.

A good engineering process pays for itself though. For starters, it’s pointless to write code if it is to solve a problem or serve a function that isn’t well understood. Money is just being thrown away in that scenario (for example, in a past job I shot down a proposal by a dev team to expose error logs to end users to help debug issues when engaging support. For any number of reasons, including security, that isn’t a freaking solution, but more specifically the two real problems that needed to be solved were that enough customers were seeing issues with the product that this feature was proposed in the first place, and that the code wasn’t well instrumented enough to provide support with the error details to address the issue. Giving the customer server error logs doesn’t solve either problem but it does create new ones. I really wish I had the power to fire folks on the spot when they propose stuff like that). On top of that though, ad hoc code tends to be less reusable, more error prone, of lower general quality, less secure, harder to test, lower performing, and shorter lived (because of the preceding points). You may get initial code faster if the dev just starts writing, but you also get your dinner quicker if the chef doesn’t spend time preparing it. In either case, just because it is faster doesn’t mean it is worth paying for, especially as the speed is an illusion. In the end all of the drawbacks to ad hoc code – the lower quality, increased number of errors, etc. – are going to cause the dev to spend much more time revisiting that code rather than working on other things. Test passes are going to take longer. The customer is going to be less happy with the finished product, so the next version is going to revisit things from the past version instead of forging new ground.

So my plea to companies is to really focus on building a great engineering process and then enforcing it. You will produce products faster, your customers will like them more, and for folks like me it makes it a whole lot easier to bake security into development (and thus make your products more secure). It’s a really tall order to get devs to engineer with security in mind when they follow no other engineering process. It’s damn easy when devs already have coding standards they follow that just need to be updated for security, when the design process is already very analytical and just needs a slight nudge to include security specific concerns, when QA is already given ample documentation to really test something and can be shown how to include a handful of security tests as well, and so forth. For companies that want to produce secure software the first question they should ask is whether they have a culture of software engineering or software writing. If it’s the latter, they have a lot more work ahead of them.

~ Joshbw

The W3C published a draft for a Content Security Policy standard, which I think is generally good news. Hopefully a full standard will get all of the browser vendors on board to finally support it (come on MS – you have otherwise tried to be at the front of the pack for security features; you created HttpOnly, implemented a reflected XSS filter, etc. Why so slow on CSP?).

That said, I question how much meaningful impact this will really have, as it is an opt-in security feature with fairly high implementation overhead for a lot of sites. It would have been very unpopular, but I think the right thing for the W3C to have done would have been to make same origin CSP the default for all sites declared as an HTML 5 document (<!DOCTYPE html>), and require the declaration to use HTML 5 specific features. Cross origin scripts, iframe content, etc. could still be explicitly enabled via header directive as specified by CSP now, but essentially the site would have to opt out of the most secure configuration rather than opt in as they do now. The incentive for devs to actually code with CSP in mind would be access to HTML 5 features – want to have access to local storage, file system API, video, etc.? Then separate your script from your markup. (I think the same opt-out approach should be taken with X-Frame-Options. By default you aren’t frame-able unless you specifically attest that you are).
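
For readers who have not seen what opting in looks like today, here is a minimal sketch using Python's standard http.server module. The header names Content-Security-Policy and X-Frame-Options and the values shown are real; the handler name, the page body, and the choice of a same-origin-only policy are assumptions for illustration, and the draft discussed here may spell some details differently.

```python
from http.server import BaseHTTPRequestHandler

# A same-origin-only policy: scripts, frames, images, etc. may load only
# from the site's own origin, and inline script is blocked as a result.
SECURITY_HEADERS = {
    "Content-Security-Policy": "default-src 'self'",
    "X-Frame-Options": "DENY",  # refuse to be framed by any site
}


class SameOriginHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that opts every response into the headers above."""

    def do_GET(self):
        body = b"<!DOCTYPE html><title>demo</title><p>No inline script here.</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        for name, value in SECURITY_HEADERS.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass the handler to http.server.HTTPServer, for example HTTPServer(("127.0.0.1", 8000), SameOriginHandler).serve_forever(). Under the opt-out model argued for above, a site would get the equivalent of these headers without sending anything and would instead have to send a directive to loosen them.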

As mentioned, this would be very unpopular if implemented, at least initially. Five years later it would just be the way things are done, nobody would care, and the change would have done more to blunt XSS than all of the OWASP awareness, dev training, and encoding libraries combined. Buffer overflows haven’t declined because people are better at writing C/C++ – they have declined because more code is written in languages that don’t enable buffer overflows by design (though C# does have the lovely “unsafe” keyword for people hellbent on pointing a gun at their feet). If we want to see the same change for web vulnerabilities we have to similarly change the actual technologies in use to ones that make it difficult for devs to introduce vulnerabilities – mandating rather than suggesting a separation of markup and script is a step in the right direction.

~ Joshbw