International Industry-Academia Workshop on Cloud Reliability and Resilience

7-8 November 2016, Berlin, Germany

With the increasing adoption of, and reliance on, cloud platforms and services, cloud computing is undeniably becoming a utility, like water, energy, transportation, or telecommunications. This status gives public providers the responsibility to develop highly reliable platforms and services.

Nonetheless, a study by Gartner found that 47% of all documented cloud problems were caused by service outages, which lasted anywhere from 40 minutes to five days. Another study, by the Ponemon Institute, found that outages cost US$690,204 on average. Compounding the problem, the increasing use of commodity hardware to build data centers will further lower the reliability of existing cloud computing platforms.

Thus, developing new strategies, techniques, and methods to evaluate and increase the reliability and resilience of cloud platforms from a software perspective is essential.

This workshop aims to bring together industry, academia, and regulators to identify, on the one hand, the most relevant requirements in the field of cloud reliability and resilience and, on the other, existing state-of-the-art solutions. We invite engineers, scientists, and experts to discuss and contribute to the creation of a new generation of highly reliable cloud platforms.

The workshop focuses on the following topics:

Challenges of data center reliability
Methods and algorithms for failure prediction
Damage detection and problem diagnosis
Automated repair and recovery of cloud systems
Disaster recovery in cloud computing
Fault injection as an approach to reliability
Evaluation of cloud platform reliability
Cloud reliability metrics and benchmarks
Service Level Agreements (SLAs) and reliability
Quality of Service in the cloud
Standards, regulations, and legislation

General Chairs

Henrik Abramowicz, EIT Digital
Jorge Cardoso, Huawei ERC

Steering Committee

Dr. Götz Reinhäckel, Head of Cloud Engineering, T-Systems International, Germany.
Dr. Jeff Voas, US National Institute of Standards and Technology (NIST), US.
Prof. Paulo Esteves Veríssimo, University of Luxembourg, Luxembourg.
Michel Drescher, Cloud Computing Standards Specialist, University of Oxford, UK.
Valentina Salapura, Chief Architect, Resiliency and Business Continuity, IBM, US.

Invited Speakers

Building Blocks for Site Reliability at Google, Sebastian Kirsch, Google, Switzerland.
Breaking Azure for Fun and Profit, Pavel Michailov, Microsoft, US.
Using Event-driven Automation and Workflows for Auto-remediation, Dmitri Zimine, Brocade, US.
High Availability and Disaster Recovery in OpenStack: From humble beginnings to enterprise reliability, Florian Haas, Hastexo, Austria.
A Tale of Ice and Fire, or: The Cloud and The Standards, Michel Drescher, University of Oxford, UK.
I’m No Hero: Full Stack Reliability at LinkedIn, Todd Palino, LinkedIn, US.
Resilient Cloud Storage – The Consistency View, Neeraj Suri, TU Darmstadt, Germany.
A Cloud is Not Enough, Reliable Delivery Matters More, Ajay Gulati, ZeroStack, US.
Dependable Storage and Computing using Multiple Cloud Providers, Alysson Neves Bessani, University of Lisbon, Portugal.
Cloud Based Fault Injection for Anomaly Detection, Craig Sheridan, flexiOPS, UK.

Location

EIT Digital Berlin Co-Location Centre
