Current Projects


HOPS is the first open platform-as-a-service distribution of Hadoop v2. HOPS provides support for virtualized Hadoop on different platforms: it can currently be deployed and managed on AWS EC2, OpenStack, and bare metal. HOPS also provides a highly available, more scalable distribution of HDFS, where the NameNode is replaced by a highly available, replicated in-memory database. Our HDFS distribution supports much larger amounts of metadata, and that metadata is now customizable. HOPS also supports many different data-intensive computing platforms, such as MapReduce and Spark, through YARN. Stratosphere support will come soon. Read more here.


Apache Flink is a platform for efficient, distributed, general-purpose data processing. It features powerful programming abstractions in Java and Scala, a high-performance runtime, and automatic program optimization. It has native support for iterations, incremental iterations, and programs consisting of large DAGs of operations. Flink Streaming is an extension of the core Flink API for high-throughput, low-latency data stream processing. The system can connect to and process data streams from many data sources, such as RabbitMQ, Flume, Twitter, and ZeroMQ, as well as from any user-defined data source. Read more here.

Karamel is an orchestration engine for Chef Solo that enables the deployment of arbitrarily large distributed systems on both virtualized platforms, e.g., AWS, and bare-metal hosts. A distributed system is defined in YAML as a set of node groups that each implement a number of Chef recipes, where the Chef cookbooks are hosted on GitHub. Karamel orchestrates the execution of Chef recipes using a set of ordering rules defined in a YAML file (Karamelfile) in each cookbook. For each recipe, the Karamelfile can define a set of dependent (possibly external) recipes that should be executed before it. At the system level, the set of Karamelfiles defines a directed acyclic graph (DAG) of service dependencies. Karamel system definitions are very compact. We leverage Berkshelf to transparently download and install transitive cookbook dependencies, so large systems can be defined in a few lines of code. Finally, the Karamel runtime builds and manages the execution of the DAG of Chef recipes by first launching the virtual machines or configuring the bare-metal boxes and then executing recipes with Chef Solo. The Karamel runtime executes the node setup steps using JClouds or SSH. Karamel transparently handles faults by retrying, as virtual machine creation or configuration is not always reliable or timely. Read more here.
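
The ordering idea described above can be illustrated with a small sketch. This is not Karamel's implementation or its file format: the recipe names and the `depends_on` mapping are hypothetical, and Python's standard `graphlib` module stands in for the Karamel runtime's scheduler.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical recipe dependencies: each recipe maps to the set of recipes
# that must run before it. The names are illustrative, not real cookbooks.
depends_on = {
    "db::install": set(),
    "db::start": {"db::install"},
    "storage::master": {"db::start"},
    "storage::worker": {"storage::master"},
}

def execution_order(deps):
    """Return one valid run order for the recipes; raise CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(depends_on)
print(" -> ".join(order))
```

A dependency cycle makes `execution_order` raise `CycleError`, which is why the set of dependencies has to form a DAG before any recipe can be scheduled.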

The iSocial ITN aspires to bring a transformational change in Online Social Service provision, pushing the state-of-the-art from centralized services towards totally decentralized systems that will pervade our environment and seamlessly integrate with future Internet and media services. Online Social Networking (OSN) decentralization can address privacy considerations and improve service scalability, performance and fault-tolerance in the presence of an expanding base of users and applications. The project will pursue the vision of a decentralized Ubiquitous Social Networking Layer and the development of a novel distributed computing substrate that provides Decentralized Online Social Networking (DOSN) services and supports the seamless development and deployment of new social applications and services, in the absence of central management and control. Read more here.


BiobankCloud is an EU-funded FP7 project that is developing a cloud-computing platform as a service (PaaS) for the storage, analysis and inter-connection of biobank data. Our platform will provide security, storage, data-intensive computing tools, bioinformatics workflows, and support for allowing biobanks to share data with one another, all within the existing regulatory frameworks for the storage and usage of biobank data. Read more here.


The CLOMMUNITY project aims to address the obstacles that communities of citizens face in bootstrapping, running and expanding community-owned networks that provide community services organised as community clouds. This requires solving research challenges in two areas: self-managing and scalable decentralized infrastructure services for the management and aggregation of a large number of widespread, low-cost, unreliable networking, storage and home computing resources; and distributed platform services that support and facilitate the design and operation of elastic, resilient and scalable service overlays and user-oriented services built over these underlying services, providing a good quality of experience at the lowest economic and environmental cost. Read more here.


The goal of this project is to develop an End-to-End information-centric Cloud (E2E-Cloud) for data-intensive services and applications. The E2E-Cloud is a distributed and federated cloud infrastructure that meets the challenge of scale by aggregating, provisioning and managing computational, storage and networking resources from multiple centers and providers. Like some current data-center clouds, it manages computation and storage in an integrated fashion for efficiency, but adds wide-scale distribution. Some of the challenges addressed can be seen as overcoming the limitations of current clouds, many of which are far more important in a distributed cloud than in a single data-center cloud. Read more here.


Completed Projects


Kompics is a message-passing component model for building distributed systems by putting together protocols programmed as event-driven components. Systems built with Kompics leverage multi-core machines out of the box and can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic debugging and reproducible performance evaluation of unmodified Kompics distributed systems. Read more here.


Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks when data is distributed and replicated automatically by the principle of consistent hashing. CATS is a distributed key-value store that uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing, all key properties for modern cloud storage middleware. Read more here.
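
For readers unfamiliar with the setting, the sketch below shows how consistent hashing assigns a key to a group of replicas and how large a majority quorum of that group is. It does not implement CATS or its consistent-quorum algorithm; the `Ring` class, the node names, and the replication degree of 3 are assumptions made for illustration.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Map a node id or key to a position on the hash ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")

class Ring:
    """Hypothetical consistent-hashing ring with successor-list replication."""

    def __init__(self, nodes, replication=3):
        self.replication = replication
        self.points = sorted((ring_position(n), n) for n in nodes)

    def replica_group(self, key):
        """Return the distinct nodes that follow the key clockwise on the ring."""
        start = bisect.bisect_right(self.points, (ring_position(key), ""))
        count = min(self.replication, len(self.points))
        return [self.points[(start + i) % len(self.points)][1] for i in range(count)]

    def majority(self):
        """Smallest number of replicas that forms a majority of the group."""
        return min(self.replication, len(self.points)) // 2 + 1

ring = Ring(["n1", "n2", "n3", "n4", "n5"])
group = ring.replica_group("user:42")
print(group, "majority:", ring.majority())
```

Any two majorities of the same replica group overlap in at least one node. When nodes join, fail, or are partitioned, different nodes can disagree about which group is responsible for a key, and that is the situation the paragraph above says consistent quorums are designed to handle.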


ElastMan is an elasticity controller for elastic cloud-based services. ElastMan combines feedforward and feedback control. Feedforward control is used to respond to spikes in the workload by quickly resizing the service to meet SLOs at minimal cost. Feedback control is used to correct modeling errors and to handle diurnal workload changes. To address nonlinearities, our design of ElastMan leverages the near-linear scalability of elastic cloud services in order to build a scale-independent model of the service. Read more here.
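
The division of labour between the two controllers can be sketched as follows. This is not ElastMan's controller or model: the per-server capacity, the latency SLO, the spike threshold, and the proportional gain are all hypothetical values chosen for illustration.

```python
import math

def feedforward(request_rate, per_server_capacity):
    """Model-based estimate: servers needed if each server handles a fixed rate."""
    return max(1, math.ceil(request_rate / per_server_capacity))

def feedback(current_servers, measured_latency_ms, slo_latency_ms, gain=0.5):
    """Proportional correction driven by the relative latency error."""
    error = (measured_latency_ms - slo_latency_ms) / slo_latency_ms
    return max(1, round(current_servers * (1 + gain * error)))

def decide(current_servers, request_rate, prev_rate, latency_ms, *,
           per_server_capacity=1000.0, slo_latency_ms=10.0, spike_ratio=1.5):
    """Use feedforward on a workload spike and feedback otherwise."""
    if prev_rate > 0 and request_rate / prev_rate >= spike_ratio:
        return feedforward(request_rate, per_server_capacity)
    return feedback(current_servers, latency_ms, slo_latency_ms)

# Workload triples between two control periods: resize directly from the model.
spike_size = decide(4, 9000.0, 3000.0, 12.0)
# Workload is steady but latency is above the SLO: nudge the size upwards.
steady_size = decide(10, 3100.0, 3000.0, 12.0)
print(spike_size, steady_size)
```

Expressing the model per server, rather than for the whole service, is one simple way to keep it independent of the current scale, in the spirit of the scale-independent model mentioned above.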


PonIC is an initial implementation of an integration of Pig and Stratosphere. The current prototype supports a subset of the most common Pig operations, and it can be easily extended to support the complete set of Pig Latin statements. Stratosphere has desirable properties that significantly simplify plan generation. We argue that Pig can benefit greatly from using Stratosphere as the back-end system and gain performance without any loss of expressiveness. Read more here.