Hui Lu

Assistant Professor
Department of Computer Science
SUNY Binghamton

I am looking for self-motivated students to work with. If you are interested, please feel free to contact me.

Research Interests

My research interests broadly span operating systems/hypervisors, distributed systems, and performance analysis and optimization, especially in the context of virtualization-based cloud infrastructures. In particular, my research centers on file and storage systems, and involves building fair, reliable, and efficient storage systems to empower virtualized clouds.

Improving Fairness, Efficiency and Reliability of Cloud Storage Systems

Figure 1: My three main efforts (vFair, BASS and StorM) in multi-layered cloud storage systems.

To provide high resource density and low cost of ownership, modern cloud computing relies on a sophisticated multi-tenancy architecture, consisting of a mix of layered software and hardware that supports seemingly limitless, horizontally scalable computing capabilities. Despite the significant benefits brought by multi-tenancy, compelling challenges arise in platform efficiency, fairness and isolation of resource sharing, and in-cloud data security and reliability. My Ph.D. research makes three main contributions to address these challenges.

Fairness and Isolation In multi-tenancy clouds, storage resources are commonly shared among VMs from multiple tenants. Effective storage resource management, however, turns out to be challenging, as VM workloads exhibit varied I/O patterns and diverse loads. Many schedulers have been proposed that proportionally share the available storage I/O resources in a work-conserving manner. However, after re-examining the state-of-the-art schedulers in virtualized environments, I observed that the work-conserving property allows synchronous I/O requests (e.g., from web or database servers with low I/O concurrency) to be arbitrarily delayed by interleaved batches of asynchronous I/O requests (e.g., from big data processing applications with high I/O concurrency), causing significant breaches of fairness and isolation in resource sharing between these two popular classes of cloud workloads.

To this end, I proposed an advanced storage resource scheduling framework, vFair, which achieves a high degree of fairness and isolation in I/O resource sharing among competing VMs, regardless of their I/O workloads and patterns. Specifically, to enable fine-grained I/O resource allocation, I proposed a novel service-time-based allocation model, which takes per-I/O cost into consideration (instead of simply counting I/O requests as most schedulers do). Further, I employed a two-level scheduling architecture that arbitrates I/O resources across VMs so that each receives its individual fair share (guided by the fine-grained allocation model). I developed a Xen-based prototype of vFair and evaluated it with a wide range of storage workloads. The evaluation showed that vFair significantly improves fairness (e.g., the normalized proportional share ratio increases by up to 33 times for I/O-intensive applications) while keeping storage systems highly utilized (nearly 100%).
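The core idea can be conveyed with a small sketch. The snippet below is not the vFair implementation; it is a minimal, hypothetical example of service-time-based accounting, in which each dispatched request is charged an estimated per-I/O cost (here a crude seek-plus-transfer model with made-up constants), scaled by the VM's weight, rather than one fixed unit per request.

```c
/*
 * Minimal sketch (not the vFair code) of service-time based accounting
 * for proportional I/O sharing.  Constants in the cost model are
 * illustrative assumptions only.
 */
#include <stddef.h>
#include <stdio.h>

struct vm_state {
    double weight;    /* configured fair-share weight                  */
    double vtime;     /* cumulative charged service time / weight      */
    int    pending;   /* queued requests                               */
};

/* Hypothetical per-I/O cost model: a fixed seek penalty for random I/O
 * plus a transfer term proportional to request size (microseconds).   */
static double io_cost_us(size_t bytes, int is_seq)
{
    return (is_seq ? 0.0 : 4000.0) + (double)bytes / 128.0;
}

/* Select the backlogged VM with the smallest virtual time. */
static int pick_next_vm(const struct vm_state *vms, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (vms[i].pending && (best < 0 || vms[i].vtime < vms[best].vtime))
            best = i;
    return best;
}

/* Charge the dispatched VM its estimated per-I/O cost, scaled by its
 * weight, instead of one unit per request as count-based schedulers do. */
static void charge(struct vm_state *vm, size_t bytes, int is_seq)
{
    vm->vtime += io_cost_us(bytes, is_seq) / vm->weight;
    vm->pending--;
}

int main(void)
{
    struct vm_state vms[2] = {
        { .weight = 1.0, .pending = 1 },   /* VM0: small random (sync) I/O  */
        { .weight = 1.0, .pending = 1 },   /* VM1: large sequential batches */
    };

    int v = pick_next_vm(vms, 2);
    charge(&vms[v], 4096, 0);              /* 4 KB random read   */
    v = pick_next_vm(vms, 2);
    charge(&vms[v], 1 << 20, 1);           /* 1 MB sequential write */

    printf("vtime: VM0=%.1f us, VM1=%.1f us\n", vms[0].vtime, vms[1].vtime);
    return 0;
}
```

Under such accounting, a VM issuing a few small synchronous requests is no longer charged as much as one issuing large asynchronous batches, which is what allows low-concurrency workloads to retain their fair share.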

Addressability Mismatch Diving deeper into multi-tenancy clouds, I found that cloud storage systems (e.g., block storage and object storage) are commonly multi-layered, mixing software and hardware layers as illustrated in Figure 1. I measured that, compared to local storage, these multi-layered cloud storage systems tend to impose higher I/O overheads due to their longer I/O paths. To mitigate or hide such overheads, cloud storage relies heavily on the VM/container-side system cache (which buffers volume data in memory pages); even so, I still observed severe performance issues for write I/O requests. After thoroughly investigating the behavior of cloud block storage, I found that the addressability gap between the storage and network layers plays a major role in degrading write I/O performance, and in turn the overall performance of many popular cloud applications.

To overcome this challenge, I proposed a novel byte-addressable storage system, BASS, to bridge the addressability gap between the storage and network layers in cloud block storage. BASS re-designs the storage stack -- common to all block-based file systems -- to make key data structures and functions aware of variable-length I/O data rather than fixed, block-sized units, thereby seamlessly exposing byte-addressability to existing block-based file systems (e.g., Linux Ext3/4 and Windows NTFS). Further, byte-addressability makes new read/write policies feasible -- I proposed a highly efficient non-blocking approach to improve write performance. Experiments showed that BASS achieves significant I/O performance improvement (up to 10x) and network bandwidth savings (up to 94%) for writes, while incurring no overhead for other operations such as reads.
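To illustrate the addressability gap itself, the following sketch (hypothetical types, not BASS code) contrasts a block-addressed write, which must be rounded out to full blocks and therefore implies a read-modify-write over the network for small or unaligned updates, with a byte-addressed write that ships only the dirty byte range.

```c
/*
 * Sketch of block-addressed vs byte-addressed write descriptors.
 * Types and constants are illustrative assumptions, not the BASS API.
 */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 4096u

struct io_range {
    uint64_t offset;   /* byte offset within the volume */
    uint64_t len;      /* number of bytes to transfer   */
};

/* Block-addressed path: round the write out to full blocks, which forces a
 * blocking read-modify-write for any partial or unaligned update.          */
static struct io_range block_write(uint64_t off, uint64_t len)
{
    uint64_t start = off / BLOCK_SIZE * BLOCK_SIZE;
    uint64_t end   = (off + len + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
    return (struct io_range){ .offset = start, .len = end - start };
}

/* Byte-addressed path: ship exactly the modified byte range. */
static struct io_range byte_write(uint64_t off, uint64_t len)
{
    return (struct io_range){ .offset = off, .len = len };
}

int main(void)
{
    uint64_t off = 4096 * 7 + 200, len = 100;   /* small unaligned update */
    struct io_range b = block_write(off, len);
    struct io_range v = byte_write(off, len);

    printf("block-addressed: transfer %llu bytes\n", (unsigned long long)b.len);
    printf("byte-addressed:  transfer %llu bytes\n", (unsigned long long)v.len);
    return 0;
}
```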

Data Security/Reliability The multi-tenancy cloud architecture (in Figure 1) faces another major challenge -- data security and reliability. Existing cloud architectures provide all customers with the same, non-customizable set of resources and services, which deprives tenants of any control over their in-cloud data. Concerns over the security and privacy of server-side resources have made consumers hesitant to move to public clouds. Sensitive files and proprietary code stored in the cloud may be leaked, and since tenants are at the mercy of the services offered by cloud providers, they cannot further enhance the security and reliability of their data on their own.

To address this challenge, I designed, developed, and evaluated a novel storage middle-box platform, StorM, for deploying tenant-defined security/reliability services. In this platform, the service logic of a middle-box (e.g., access logging, encryption, or replication) is defined by the tenant, while the creation and operation of the middle-box are handled by the cloud provider. While prototyping StorM on top of OpenStack, I addressed three practical challenges: network splicing (i.e., steering storage traffic across different networks while maintaining strong isolation), platform efficiency (i.e., mitigating or hiding the I/O latency caused by data rerouting and processing), and the semantic gap (i.e., reconstructing high-level I/O accesses from low-level data packets). I also ported three tenant-customized security/reliability services -- a storage access monitor, encryption/decryption, and data replication -- to demonstrate the efficacy of StorM.
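The sketch below conveys the middle-box idea in miniature; the interface and names are hypothetical, not the StorM API. The provider-side plumbing intercepts each reconstructed I/O access and invokes a tenant-supplied hook (here, access logging plus a toy transform standing in for real encryption) before forwarding the request to the backing store.

```c
/*
 * Minimal sketch of a tenant-defined storage middle-box: the provider runs
 * the interception/forwarding plumbing, the tenant supplies the service
 * logic.  All names here are hypothetical, not the StorM API.
 */
#include <stddef.h>
#include <stdio.h>

struct io_access {
    const char *op;              /* "read" or "write"                    */
    unsigned long long offset;   /* byte offset in the volume            */
    size_t len;
    unsigned char *data;         /* payload for writes, may be rewritten */
};

/* Tenant-supplied service logic. */
typedef void (*tenant_hook_t)(struct io_access *acc);

/* Example tenant hook: access logging plus a toy transform that stands in
 * for real encryption (placeholder only, not actual crypto).             */
static void log_and_scramble(struct io_access *acc)
{
    fprintf(stderr, "[tenant] %s off=%llu len=%zu\n",
            acc->op, acc->offset, acc->len);
    for (size_t i = 0; acc->data && i < acc->len; i++)
        acc->data[i] ^= 0x5a;
}

/* Provider-side forwarding: in a real middle-box this would re-issue the
 * (possibly transformed) I/O over the storage network; here we just print. */
static void forward_to_backend(struct io_access *acc)
{
    printf("forwarded %s of %zu bytes at offset %llu\n",
           acc->op, acc->len, acc->offset);
}

static void handle_io(struct io_access *acc, tenant_hook_t hook)
{
    hook(acc);
    forward_to_backend(acc);
}

int main(void)
{
    unsigned char buf[16] = "sensitive data";
    struct io_access w = { .op = "write", .offset = 8192,
                           .len = sizeof buf, .data = buf };
    handle_io(&w, log_and_scramble);
    return 0;
}
```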

Selected Publications

BASS: Improving I/O Performance for Cloud Block Storage via Byte-Addressable Storage Stack.
In Proc. 7th ACM Symposium on Cloud Computing (SOCC'16).

StorM: Enabling Tenant-Defined Cloud Storage Middle-Box Services.
In Proc. 46th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'16).

vFair: Latency-Aware Fair Storage Scheduling via Per-IO Cost-Based Differentiation.
In Proc. 6th ACM Symposium on Cloud Computing (SOCC'15).

Other Research in Cloud Computing

I have also collaborated closely with industry on other practical research problems in cloud computing:

Multi-VM Live Migration Extensive experiments with a variety of multi-tier cloud applications showed that different VM migration strategies have non-trivial performance impacts on a multi-tier application. Upon deeper investigation, I found that the root cause is the interdependence among the functional components of the multi-tier application. This observation motivated vHaul, a system that coordinates multi-VM migration to approximate optimal scheduling. The evaluation showed that vHaul significantly reduces application-level service latency during migration, by up to 70%.

Software-Defined Networking I designed and implemented HybNET, an intelligent SDN management framework for hybrid networks consisting of both SDN and legacy switches. The framework leverages network virtualization principles to create paths between nodes using VLANs (for legacy switches) and switch slices (for OpenFlow switches). This management framework provides full compatibility between legacy and SDN switches without compromising the advantages and flexibility of SDN-enabled switches.
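The sketch below, using a hypothetical data model rather than the HybNET code, shows the basic idea: a single logical path is realized as VLAN configuration on legacy hops and as flow rules in a per-tenant slice on OpenFlow hops.

```c
/*
 * Sketch of path management across a hybrid network: legacy switches along
 * the path get a VLAN, OpenFlow switches get a flow entry in their slice.
 * Data model and names are illustrative assumptions, not the HybNET code.
 */
#include <stdio.h>

enum sw_kind { SW_LEGACY, SW_OPENFLOW };

struct sw_hop {
    const char  *name;
    enum sw_kind kind;
    int          in_port, out_port;
};

/* Install one logical path: the same end-to-end connectivity is expressed
 * as VLAN membership on legacy hops and match/forward rules on OpenFlow hops. */
static void install_path(const struct sw_hop *hops, int n, int vlan_id)
{
    for (int i = 0; i < n; i++) {
        if (hops[i].kind == SW_LEGACY)
            printf("%s: add VLAN %d on ports %d,%d\n",
                   hops[i].name, vlan_id, hops[i].in_port, hops[i].out_port);
        else
            printf("%s: flow rule in_port=%d,vlan=%d -> output:%d\n",
                   hops[i].name, hops[i].in_port, vlan_id, hops[i].out_port);
    }
}

int main(void)
{
    struct sw_hop path[] = {
        { "tor-1",  SW_LEGACY,   1, 48 },   /* legacy top-of-rack switch */
        { "core-1", SW_OPENFLOW, 3, 17 },   /* OpenFlow core switch      */
        { "tor-2",  SW_LEGACY,  48,  2 },
    };
    install_path(path, 3, 100);
    return 0;
}
```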

Performance Optimization In early work on virtualization, I conducted extensive performance characterization and analysis using popular industrial consolidation workloads on advanced multi-core systems. I identified bottlenecks and proposed optimizations that improve both the performance and scalability of virtualization software (e.g., Xen and KVM) on multi-core systems. One of these optimizations, which improves the efficiency of the Xen hypervisor's vCPU scheduler, has been widely adopted in the community (Schedule Rate Limiting).

Selected Publications

vHaul: Towards Optimal Scheduling of Live Multi-VM Migration for Multi-tier Applications.
In Proc. 8th IEEE International Conference on Cloud Computing (CLOUD'15, Applications Track).

HybNET: Network Manager for a Hybrid Network Infrastructure.
In Proc. 13th ACM/IFIP/USENIX International Middleware Conference (Middleware'13, Industry Track).

Virtualization challenges: a view from server consolidation perspective.
In Proc. 8th ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments (VEE'12).

For a complete list of my publications, please see my publications page.