Additional Publications

Abstract   Hierarchical decomposition is a fundamental principle that encourages the organization of program elements into nested scopes of access, instead of treating all as ``global.'' This paper offers a foundational study of heap decomposition inference, the problem of statically extracting the decomposition hierarchy latent in the runtimes of object-oriented programs, henceforth revealing the compositional nature of the heap. The centerpiece of the paper is Cypress, a sound, precise, and scalable constraint-based ownership type inference coupled with a novel application of linear programming over integers. All constraints in Cypress are linear, and the precision of decomposition -- placing objects to scopes as non-global as possible -- can be reduced to a linear programming problem. Cypress has been implemented as an open-source tool that can decompose real-world Java applications of more than 100K LOC and up to 6000 statically distinct instantiations.
Abstract   This paper introduces Green Streams, a novel solution to address a critical but often overlooked property of data-intensive software: energy efficiency. Green Streams is built around two key insights into data-intensive software. First, energy consumption of data-intensive software is strongly correlated to data volume and data processing, both of which are naturally abstracted in the stream programming paradigm; Second, energy efficiency can be improved if the data processing components of a stream program coordinate in a "balanced" way, much like an assembly line that runs most efficiently when participating workers coordinate their pace. Green Streams adopts a standard stream programming model, and applies Dynamic Voltage and Frequency Scaling (DVFS) to coordinate the pace of data processing among components, ultimately achieving energy efficiency without degrading performance in a parallel processing environment. At the core of Green Streams is a novel constraint-based inference to abstract the intrinsic relationships of data flow rates inside a stream program, which uses linear programming to minimize the frequencies -- hence the energy consumption -- for processing components while still maintaining the maximum output data flow rate. The core algorithm of Green Streams is formalized, and its optimality is established. The effectiveness of Green Streams is evaluated on top of the StreamIt framework, and preliminary results show the approach can save CPU energy by an average of 28% with a 7% performance improvement.
Abstract   Atomicity enforcement in a multi-threaded application can be critical to the application’s safety. In this paper, we take the challenge of enforcing atomicity in a multilingual application, which is developed in multiple programming languages. Specifically, we describe the design and implementation of JATO, which enforces the atomicity of a native method when a Java application invokes the native method through the Java Native Interface (JNI). JATO relies on a constraint-based system, which generates constraints from both Java and native code based on how Java objects are accessed by threads. Constraints are then solved to infer a set of Java objects that need to be locked in native methods to enforce the atomicity of the native method invocation. We also propose a number of optimizations that soundly improve the performance. Evaluation through JATO’s prototype implementation demonstrates it enforces native-method atomicity with reasonable run-time overhead.
  • Michael Cohen, Haitao Steve Zhu, Senem Ezgi Emgin, Yu David Liu, "Energy Types," OOPSLA, 2012.
Abstract   This paper presents a novel type system to promote and facilitate energy-aware programming. Energy Types is built upon a key insight into today’s energy-efficient systems and applications: despite the popular perception that energy and power can only be described in joules and watts, real-world energy management is often based on discrete phases and modes, which in turn can be reasoned about by type systems very effectively. A phase characterizes a distinct pattern of program workload, and a mode represents an energy state the program is expected to execute in. This paper describes a programming model where phases and modes can be intuitively specified by programmers or inferred by the compiler as type information. It demonstrates how a type-based approach to reasoning about phases and modes can help promote energy efficiency. The soundness of our type system and the invariants related to inter-phase and inter-mode interactions are rigorously proved. Energy Types is implemented as the core of a prototyped object-oriented language ET for smartphone programming. Preliminary studies show ET can lead to significant energy savings for Android Apps.
Abstract   In multithreaded object-oriented programs, locality refers to the dynamic scoping relation among threads and objects in a potentially shared-memory context. This paper calls for type-theoretic investigations into two forms of locality -- thread locality and aggregate locality -- and proposes a unified type inference algorithm to reason about them over unannotated real-world Java programs.
Abstract   This paper addresses energy consumption in multi-threaded programs. In particular, it demonstrates why synchronizations -- a fundamental fabric of multi-core software -- may lead to unnecessary energy consumption, and proposes a pattern-based compilation technique to improve energy efficiency. The key insight is that energy efficiency may be improved by adjusting the relative speed of individual threads participating in a synchronization, and different synchronization patterns can offer clues on how adjustments should be made.
Abstract   This paper describes an operational semantics for futures, with the primary target on energy efficiency. The work in progress is built around an insight that different threads can coordinate by running at different "paces," so that the time for synchronization and the resulting wasteful energy consumption can be reduced. We exploit several inherent characteristics of futures to determine how the paces of involving threads can be coordinated. The semantics is inspired by recent advances in computer architectures, where the frequencies of CPU cores can be adjusted dynamically. The work is a first-step toward a direction where variant frequencies are directly modeled as an essential semantic feature in concurrent programming languages.
Abstract   This paper proposes an energy-efficient approach for programming languages that support work stealing. The key insight is that thieves and victims in the work stealing algorithm can coordinate their execution paces for more energy efficiency, through dynamic adjustment of CPU frequencies.
Abstract   Staging is a powerful language construct that allows a program at one stage of evaluation to manipulate and specialize a program to be executed at a later stage. We propose a new staged language calculus, <ML>, which extends the programmability of staged languages in two directions. First, <ML> supports dynamic type specialization: types can be dynamically constructed, abstracted, and passed as arguments, while preserving decidable typechecking via a System F-sub style semantics combined with a restricted form of Lambda-omega-style runtime type construction. With dynamic type specialization the data structure layout of a program can be optimized via staging. Second, <ML> works in a context where different stages of computation are executed in different process spaces, a property we term staged process separation. Programs at different stages can directly communicate program data in <ML> via a built-in serialization discipline. The language <ML> is endowed with a metatheory including type preservation, type safety, and decidability as demonstrated constructively by a sound type checking algorithm. While our language design is general, we are particularly interested in future applications of staging in resource-constrained and embedded systems: these systems have limited space for code and data, as well as limited CPU time, and specializing code for the particular deployment at hand can improve efficiency in all of these dimensions. The combination of dynamic type specialization and staging across processes greatly increases the utility of staged programming in these domains. We illustrate this via wireless sensor network programming examples.
Abstract   Cyber-physical systems are a coordinated combination of computational and physical elements. This position paper calls for the design of a unified object model that blends the boundary of the two. The unified object model has the benefits of bringing classic software engineering technologies and tools -- such as UML -- to the new application domain of cyber-physical systems, and further equipping programs written for these systems with the traditional strengths of object-oriented languages, such as encapsulation, code reuse and customization, and strong guarantees for avoiding run-time errors.
  • Aditya Kulkarni, Yu David Liu, Scott Smith, "Task Types for Pervasive Atomicity," Proceedings of the 25th ACM Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA 2010). Reno, Nevada, USA, October 2010.
Abstract   Atomic regions are an important concept in correct concurrent programming: since atomic regions can be viewed as having executed in a single step, atomicity greatly reduces the number of possible interleavings the programmer needs to consider. This paper describes a method for building atomicity into a programming language in an organic fashion. We take the view that atomicity holds for whole threads by default, and a division into smaller atomic regions occurs only at points where an explicit need for sharing is needed and declared. A corollary of this view is every line of code is part of some atomic region.We define a polymorphic type system, Task Types, to enforce most of the desired atomicity properties statically.We show the reasonableness of our type system by proving that type soundness, isolation invariance, and atomicity enforcement properties hold at run time. We also present initial results of a Task Types implementation built on Java.
Abstract   Today's cloud computing platforms are typically "opaque": Amazon EC2 users only receive virtual units of CPU and memory, and physical details of the platform are hidden. Such opacity prevents programs from online optimizations and deployment adjustment, and is penalizing the very applications cloud computing attempts to attract: high-performance software. On the other extreme, a completely transparent design of clouds would lead to severe security and reliability concerns. In this position paper, we take the middle-of-the-road approach, proposing a language model for well-defined programmable interactions between the cloud platform and the client program. Our proposed ideas draw from the previous work of Classages, where first-class interactions can be dynamically established and terminated for two parties with mutually satisfiable contracts. In this paper, we justify how our design can help cloud programs to fine-tune performance, and how it may impact on other important issues of cloud computing, such as security, scheduling, and pricing.
Abstract   Staging is a powerful language construct that allows a program at one stage to manipulate and specialize a program at the next. We propose <ML> as a new staged calculus designed with novel features for staged programming in modern computing platforms such as embedded systems. A distinguishing feature of <ML> is a model of process separation, whereby different stages of computation are executed in different process spaces. Our language also supports dynamic type specialization via type abstraction, dynamic type construction, and a limited form of type dependence. <ML> is endowed with a largely standard metatheory, including type preservation and type safety results. We discuss the utility of our language via code examples from the domain of wireless sensor network programming.
  • Yu David Liu, Scott Smith. Pedigree Types. In Proceedings of the 2008 International Workshop on Aliasing, Confinement and Ownership in Object-Oriented Programming (IWACO). Paphos, Cyprus. July 2008.
Abstract   Pedigree Types are an intuitive ownership type system requiring minimal programmer annotations. Reusing the vocabulary of human genealogy, Pedigree Types programmers can qualify any object reference with a pedigree -- a child, sibling, parent, grandparent, etc -- to indicate what relationship the object being referred to has with the referant on the standard ownership tree, following the owners-as-dominators convention. Such a qualifier serves as a heap shape constraint that must hold at run time and is enforced statically. Pedigree child captures the intention of encapsulation, i.e. ownership: the modified object reference is ensured not to escape the boundary of its parent. Among existing ownership type systems, Pedigree Types are closest to Universe Types. The former can be viewed as extending the latter with a more general form of pedigree modifiers, so that the relationship between any pair of objects on the aforementioned ownership tree can be named and -- more importantly -- inferred. We use a constraint-based type system which is proved sound via subject reduction. Other technical originalities include a polymorphic treatment of pedigrees not explicitly specified by programmers, and use of linear diophantine equations in type constraints to enforce the hierarchy.
Abstract   This paper introduces a new language model, Coqa, for deeply embedding concurrent programming into objects. Every program written in our language has built-in the desirable behaviors of atomicity, mutual exclusion, and race freedom. Such a design inverts the default mode of e.g. Java, where programmers may write programs highly vulnerable to surprising runtime errors if they do not take advantage of synchronized blocks and other advanced concurrency features. Coqa takes advantage of basic object-oriented notions to express important concurrent programming primitives, resulting in a powerful but concise model that includes thread creation, shared object access and open nesting. A key property of our model is the notion of quantized atomicity: all concurrent program executions can be viewed as being divided into quantum regions of atomic execution, greatly reducing the number of interleavings to consider. The language design draws from different schools of concurrent language design, especially Actors, and constructs a unified design close to mainstream object-oriented programming practice. We justify our approach both from a theoretical basis by showing that a formal representation, CoqaKernel, has provable quantized atomicity properties, and by implementing CoqaJava, a Java extension incorporating all of the Coqa features.
Abstract   Software deployment is a complex process, and industrial-strength frameworks such as .NET, Java, and CORBA all provide explicit support for component deployment. However, these frameworks are not built around fundamental principles as much as they are engineering efforts closely tied to particulars of the respective systems. Here we aim to elucidate the fundamental principles of software deployment, in a platform-independent manner. Issues that need to be addressed include deployment unit design, when, where and how to wire components together, versioning, version dependencies, and hot-deployment of components. We define the application buildbox as the place where software is developed and deployed, and define a formal Labeled Transition System (LTS) on the buildbox with transitions for deployment operations that include build, install, ship, and update. We establish formal properties of the LTS, including the fact that if a component is shipped with a certain version dependency, then at run time that dependency must be satisfied with a compatible version. Our treatment of deployment is both platform- and vendor-independent, and we show how it models the core mechanisms of the industrial-strength deployment frameworks.
Abstract   This paper presents Classages, a novel interaction-centric object-oriented language. Classes and objects in Classages are fully encapsulated, with explicit interfaces for all interactions they might be involved in. The design of Classages touches upon a wide range of language design topics, including encapsulation, object relationship representation, and object confinement. An encoding of Java's OO model in Classages is provided, showing how standard paradigms are supported. A prototype Classages compiler is described.
Abstract  Module systems are well known as a means for giving clear interfaces for the static linking of code. This paper shows how adding explicit interfaces to modules for 1) dynamic linking and 2) cross-computation communication can increase the declarative nature of modules, and build a stronger foundation for language-based security and version control. We term these new modules Assemblages. We additionally develop a sound constraint-based type system particularly suited to a module system supporting bounded type parametricity, cross-module type recursion, and polymorphic type binding during dynamic linking and cross-computation communication.
Abstract In this paper we describe Ensemble, a proposed language framework for sensor network programming. Our goal is to provide a programming framework to scientists and engineers that will allow them to directly code sensor network applications, without the need for expertise in low-level device programming. The key concepts in Ensemble are high-level communication protocol connectors, and the ability for systems programmers to define new communication protocols as metaprotocol extensions.
Abstract This paper proposes a component programming language that supports an integrated notion of both compile-time and run-time component. The centerpiece of this paper is the static, compile time notion of assembly, complementing our previous work on the dynamic, runtime notion of cell. An assembly is a declarative, stateless piece of code that facilitates code combination. It offers explicit typed interfaces to outsiders, called linkers, which can be used to link smaller assemblies into bigger, compound assemblies. Each assembly may in turn be loaded at run-time, producing a cell in the runtime environment. A cell is a dynamic, stateful component that interacts with other cells via explicit runtime interfaces. Thus, the static assemblies and the dynamic cells are fully integrated. 
Abstract  This paper defines a security infrastructure for access control at the component level of programming language design. Distributed components are an ideal place to define and enforce significant security policies, because components are large entities that often define the political boundaries of computation. Also, rather than building a security infrastructure from scratch, we build on a standard one, the SDSI/SPKI security infrastructure.