Wednesday, 2 November 2016

EXE: 10 years later

I've recently returned from Vienna where our CCS 2006 paper on EXE, co-authored with Vijay, Peter, David and Dawson, has received the ACM CCS Test of Time Award, which "recognizes works that had significant impact on the field, as seen from a ten-year perspective."

Besides contemplating the fact that I'm old enough now to have papers that are a decade old, I was also reflecting on the various decisions taken while building EXE, as I see them ten years later.

The CCS'06 paper on EXE was one of the first papers on modern symbolic execution—"modern" in the sense that symbolic execution was originally introduced in the 70s, but at a time and in a form that was not ready for primetime.  EXE was the direct successor of EGT, a system in which we introduced the main ingredients of dynamic symbolic execution (DSE), but in a simplified form (and which was developed concurrently with the DART system, which introduced a different flavour of DSE). EXE is also the direct predecessor of KLEE, a better-known system (as unlike EXE was open-sourced and then used and extended by many research and industry groups), which inherits many of EXE's design decisions, but drops many others.

EXE benefitted from the ongoing revolution in SAT solving, which is for sure one of DSE"s main enablers.  But it also introduced a second key ingredient, mixed concrete-symbolic execution (also introduced by the DART paper mentioned above, but in a different form), a really simple idea, but with a significant impact on generality and scalability.  Mixed concrete-symbolic execution simply means that if all the operands of an instruction are concrete, the instruction is executed as in the original program (and in the case of EXE, essentially in its unmodified form). This makes it easy to interact with uninstrumented code, such as closed libraries and the OS, handle difficult cases such as floating-point arithmetic by continuing execution with partially concrete inputs, and run symbolically only the code that depends on the symbolic inputs, without having to extract it explicitly from a larger system (which is what enabled us to construct malicious disk images with EXE in a scalable fashion, while instrumenting an entire user-mode version of Linux).

The other core strength of EXE was its co-designed constraint solver STP.  At the time STP was developed, there was nothing resembling it.  Our experience with a previous state-of-the-art solver in EGT had been disappointing, both in terms of the performance and accuracy required in a DSE context, but also in terms of generic software reliability.  While nowadays there are many other competing solvers, some of which are much more capable than STP in terms of the theories they can handle, STP still seems to outperform most of them for standard DSE usage.  In other words, STP still stands to its original design: support the minimum needed by symbolic execution, but efficiently. To achieve this, David and Vijay recognized from the beginning the advantage of eager translation to SAT, which allowed STP to exploit the recent advances in SAT solving technology.  But perhaps the even more important reason for which STP made such a big impact in EXE is a reflection of the way in which it was developed: in our regular meetings, Dawson, Peter and I would try to understand how STP operates, while Vijay and David would try to understand the constraint queries representing the latest bottleneck in EXE (or as we put it in the paper, our approach was to "start simple, measure bottlenecks on real workloads and tune to exactly these cases").  As a result, I think our collaboration worked extremely well, with very little wasted time on both sides, something that I've tried to emulate in subsequent collaborations.

While STP was instrumental in EXE's scalability on the constraint solving side, we also recognized the need for targeted constraint solving optimizations at the level of the symbolic execution itself. Symbolic execution uses constraint solving in a widely different way from other program analysis techniques such as bounded-model checking or static program verification: it issues a huge amount of queries, but which can be easily decomposed in independent sub-queries, and which share many similarities among themselves.  That's why two such targeted optimizations introduced by EXE, constraint solving independence and constraint caching, have survived in KLEE, with the former at least being present in pretty much all other DSE systems.

There are other aspects of EXE which made it into its successor KLEE, such as having search heuristics as a first-order concept, expressing all constraints in terms of the initial symbolic input bytes, having models that are written (and executed) as actual code, and the simple user interface and output format.  But there are also several aspects that didn't make it into KLEE.  

One example is the pointer tracking performed by EXE, based on techniques used by Safe C compilers, which allowed it to quickly determine the referent object during a pointer dereference. While in theory this is the right thing to do, the many corner cases introduced while interacting with uninstrumented code led us to adopt a much simpler but less efficient solution in KLEE, in which we scan the entire list of memory blocks to find referent objects. Perhaps a hybrid solution here would work best, in which pointer tracking would only be adopted for the unproblematic cases. 

Another example is the static instrumentation done by EXE, as opposed to KLEE's interpreter-like dynamic instrumentation. Dynamic instrumentation has many advantages, but there are tradeoffs involved.  For instance, the source-to-source transformation made some debugging tasks easier and it also made it possible to run native code at full speed (by comparison, consider that a large concrete memcpy can easily become a performance issue in KLEE). This being said, EXE's static instrumentation was tightly coupled with its forking mechanism based on UNIX's fork, which was more problematic.  The fork-based design did have some neat advantages, such as free copy-on-write sharing, and the straightforward ability to explore multiple paths concurrently.  But it also prevented other optimizations, such as a finer-grained sharing of data across paths.  The main problem though was the pain involved in debugging a complex multi-process application, despite the utilities we developed to help with this.  

The final lesson, I think, is one of persistence.  While I consider EXE one of the most beautiful papers I've written so far, its impact would not have been the same if we just stopped working on this topic after CCS 2006 (and it is very tempting to abandon existing lines of research, as publishing "follow-up" research, regardless of its value, is usually harder).  Despite some tough times initially, I am really glad we didn't.