Presentation - Système d’Exploitation, systèmes Répartis, de l’Intergiciel à l’Architecture (SEPIA)

Most of the research work conducted in the SEPIA group addresses the issue of resource management in datacenters. In the following, we distinguish strategies which specifically target energy optimization in the datacenter (consumption and thermal effects) from works which address the improvement of virtualized datacenter consolidation (which is more general, but may also target energy savings). In a third category, we present works that focus on the improvement of operating system support (at the level of a single server) in such environments.

Datacenter (energy) optimization

On the datacenter environment side, one research area is related to the cooling infrastructure and its optimization. The challenge is to model the datacenter from a thermal perspective, taking into account the workload and application profiles, the power model and the cooling model. Such a model can then be used to study thermal imbalances and to build strategies relying on physical machine placement or task scheduling.
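As a rough illustration of this kind of strategy, the sketch below places a task on the server whose predicted outlet temperature stays lowest. The server data and the linear thermal model are purely illustrative assumptions, not the models used in our works.

```python
# Minimal sketch of thermal-aware task placement.
# All server data and the linear outlet-temperature model are illustrative
# assumptions, not the thermal model actually used in the cited works.

servers = {
    # name: current CPU load (0..1) and inlet temperature (°C)
    "srv-1": {"load": 0.60, "inlet_temp": 24.0},
    "srv-2": {"load": 0.20, "inlet_temp": 27.0},
    "srv-3": {"load": 0.45, "inlet_temp": 22.5},
}

def predicted_outlet_temp(inlet_temp, load, k=12.0):
    """Toy linear model: outlet temperature rises with CPU load."""
    return inlet_temp + k * load

def place_task(task_load):
    """Pick the server whose predicted outlet temperature stays lowest
    after accepting the task, to limit hot spots."""
    best, best_temp = None, float("inf")
    for name, s in servers.items():
        new_load = s["load"] + task_load
        if new_load > 1.0:          # not enough CPU capacity left
            continue
        temp = predicted_outlet_temp(s["inlet_temp"], new_load)
        if temp < best_temp:
            best, best_temp = name, temp
    return best, best_temp

print(place_task(0.30))  # with the toy data above: srv-3 is the coolest choice
```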

A second line of work at the infrastructure level is related to power provisioning with renewable energy. Integrating renewable energy to limit the CO2 emissions of datacenters has attracted a lot of attention over the last decade. The investigated approach is to influence scheduling in the datacenter according to the availability of renewable energy.
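For illustration, the following sketch defers flexible jobs to the hours with the highest forecast renewable power. The forecast values, job list and greedy policy are illustrative assumptions, not a scheduler from our publications.

```python
# Sketch: schedule deferrable jobs in the hours where the forecast
# renewable power is highest. Forecast values and job list are made up.

renewable_forecast_kw = {8: 5.0, 9: 12.0, 10: 30.0, 11: 42.0, 12: 45.0,
                         13: 40.0, 14: 28.0, 15: 15.0}

jobs = [
    {"name": "batch-analytics", "power_kw": 20.0, "deadline_hour": 14},
    {"name": "backup",          "power_kw": 10.0, "deadline_hour": 15},
]

def schedule(jobs, forecast):
    available = dict(forecast)
    plan = {}
    for job in jobs:
        # Candidate hours before the job's deadline, greenest first.
        candidates = sorted(
            (h for h in available if h <= job["deadline_hour"]),
            key=lambda h: available[h], reverse=True)
        for h in candidates:
            if available[h] >= job["power_kw"]:
                plan[job["name"]] = h
                available[h] -= job["power_kw"]
                break
        else:
            plan[job["name"]] = None   # would have to run on grid energy
    return plan

print(schedule(jobs, renewable_forecast_kw))
```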

One of the issues with servers in datacenters is that their power consumption is not proportional to the load, i.e. some idle power is consumed even under low load. We investigate a novel approach for building datacenters with heterogeneous machines carefully chosen for their performance and energy-efficiency ratios (Big, Medium, Little). A scheduler can then exploit this infrastructure by migrating applications and switching machines on or off, so that the energy consumed by the datacenter becomes proportional to the load.
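A toy sketch of such a provisioning decision is given below: it greedily keeps powered on the most energy-efficient machines whose capacity covers the current load. The machine classes, capacities and power figures are illustrative assumptions.

```python
# Sketch: keep powered on the cheapest (in watts) subset of Big/Medium/Little
# machines whose capacity covers the current load. Figures are illustrative.

machine_classes = [
    # (class, capacity in abstract load units, power in watts, count available)
    ("little", 1.0,   22, 8),
    ("medium", 4.0,   75, 4),
    ("big",    10.0, 200, 2),
]

def provision(load):
    """Greedy choice by energy efficiency (capacity per watt)."""
    by_efficiency = sorted(machine_classes,
                           key=lambda m: m[1] / m[2], reverse=True)
    plan, remaining = [], load
    for name, cap, watts, count in by_efficiency:
        while remaining > 0 and count > 0:
            plan.append(name)
            remaining -= cap
            count -= 1
    return plan

print(provision(7.0))   # machines to keep on; the others can be switched off
```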

Consolidation of virtual machines, management of Cloud infrastructures

In virtualized datacenters, many optimization techniques target server consolidation, i.e. packing virtual machines (VMs) onto as few servers as possible in order to power off (suspend) unused servers. Several improvements to server consolidation can then be introduced.
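As an illustration of the basic consolidation step, the following sketch packs VMs onto servers with a first-fit-decreasing heuristic (one of many possible packing heuristics); servers left empty can then be suspended. The capacity and VM sizes are illustrative.

```python
# Sketch: first-fit-decreasing consolidation of VMs onto servers;
# servers left empty can be suspended. Sizes are illustrative.

server_capacity = 16          # e.g. CPU cores per server (assumed uniform)
vms = {"vm-a": 8, "vm-b": 6, "vm-c": 5, "vm-d": 4, "vm-e": 3, "vm-f": 2}

def consolidate(vms, capacity):
    servers = []              # each server is a dict of hosted VMs
    for name, size in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        for srv in servers:
            if sum(srv.values()) + size <= capacity:
                srv[name] = size
                break
        else:
            servers.append({name: size})   # open (power on) a new server
    return servers

placement = consolidate(vms, server_capacity)
print(len(placement), "servers needed:", placement)
```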

One avenue of improvement to consolidation is to adapt the architecture of the hosted applications in order to enable further consolidation, e.g. by changing the number of VMs or the VM sizes.

Another approach is to study heuristics which help improve VM placement according to multiple criteria.
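A minimal sketch of such a multi-criteria heuristic is shown below: candidate hosts are ranked by a weighted score combining residual CPU, residual memory and power state. The criteria, weights and host data are illustrative assumptions, not a heuristic from our publications.

```python
# Sketch: rank candidate hosts for a VM with a weighted score combining
# several criteria. Weights and host data are illustrative assumptions.

hosts = {
    "host-1": {"free_cpu": 4,  "free_mem_gb": 8,  "powered_on": True},
    "host-2": {"free_cpu": 12, "free_mem_gb": 32, "powered_on": False},
    "host-3": {"free_cpu": 6,  "free_mem_gb": 16, "powered_on": True},
}

WEIGHTS = {"cpu": 0.4, "mem": 0.4, "power": 0.2}

def score(host, vm_cpu, vm_mem_gb):
    if host["free_cpu"] < vm_cpu or host["free_mem_gb"] < vm_mem_gb:
        return None                       # infeasible placement
    # Prefer hosts that stay tightly packed (small leftover) and are
    # already powered on (avoid waking up a suspended server).
    leftover_cpu = host["free_cpu"] - vm_cpu
    leftover_mem = host["free_mem_gb"] - vm_mem_gb
    return (WEIGHTS["cpu"] * 1.0 / (1 + leftover_cpu)
            + WEIGHTS["mem"] * 1.0 / (1 + leftover_mem)
            + WEIGHTS["power"] * (1.0 if host["powered_on"] else 0.0))

def best_host(vm_cpu, vm_mem_gb):
    ranked = {n: score(h, vm_cpu, vm_mem_gb) for n, h in hosts.items()}
    feasible = {n: s for n, s in ranked.items() if s is not None}
    return max(feasible, key=feasible.get) if feasible else None

print(best_host(vm_cpu=4, vm_mem_gb=8))   # with the toy data: host-1
```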

In cloud environments, the main limitation to consolidation is the lack of memory: memory gets saturated well before the CPU. We propose to exploit remote memory available elsewhere in the datacenter so that this constraint can be relaxed and consolidation improved.
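The sketch below illustrates the idea with a simple admission test that lets a VM borrow part of its memory from a remote pool when local DRAM is exhausted. The capacities and the cap on the remote fraction are illustrative assumptions.

```python
# Sketch: admission test for a VM that allows part of its memory to be
# served by a remote-memory pool when local DRAM is exhausted.
# Capacities and the 50% remote-fraction cap are illustrative assumptions.

LOCAL_MEM_GB = 64          # local DRAM left on the candidate server
REMOTE_POOL_GB = 256       # memory reachable through the datacenter fabric
MAX_REMOTE_FRACTION = 0.5  # never place more than half of a VM remotely

def can_host(vm_mem_gb, local_free_gb, remote_free_gb):
    """Return (fits, local_gb, remote_gb) for the candidate placement."""
    local_part = min(vm_mem_gb, local_free_gb)
    remote_part = vm_mem_gb - local_part
    if remote_part == 0:
        return True, local_part, 0
    if (remote_part <= remote_free_gb
            and remote_part / vm_mem_gb <= MAX_REMOTE_FRACTION):
        return True, local_part, remote_part
    return False, 0, 0

# A 96 GB VM no longer fits in local memory alone, but can be admitted
# by borrowing 32 GB from the remote pool.
print(can_host(96, LOCAL_MEM_GB, REMOTE_POOL_GB))
```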

OS support improvement for Cloud and HPC

A significant part of our research targets the improvement of the operating system and virtualization support used in datacenters.

An important cause of inefficiency in today’s virtualized environments is that scheduling decisions are made at two levels: the hypervisor and the guest operating systems. One way to improve this is to establish a collaboration between these two levels.
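The toy model below illustrates one possible form of such collaboration: the guest publishes per-vCPU hints in a shared area and the hypervisor scheduler uses them to avoid preempting vCPUs that hold guest-level locks. This is a made-up in-process model for illustration, not an actual paravirtual interface.

```python
# Toy illustration of hypervisor/guest scheduling collaboration: the guest
# publishes a per-vCPU hint ("in critical section") in a shared area, and the
# hypervisor scheduler avoids preempting those vCPUs first.

class SharedHintPage:
    """Stands in for a page shared between guest and hypervisor."""
    def __init__(self, nr_vcpus):
        self.in_critical_section = [False] * nr_vcpus

class GuestKernel:
    def __init__(self, hints):
        self.hints = hints
    def enter_critical_section(self, vcpu):
        self.hints.in_critical_section[vcpu] = True
    def leave_critical_section(self, vcpu):
        self.hints.in_critical_section[vcpu] = False

class HypervisorScheduler:
    def __init__(self, hints):
        self.hints = hints
    def pick_vcpu_to_preempt(self, running_vcpus):
        # Prefer preempting a vCPU that is NOT holding a guest-level lock,
        # to avoid the classic lock-holder-preemption problem.
        for vcpu in running_vcpus:
            if not self.hints.in_critical_section[vcpu]:
                return vcpu
        return running_vcpus[0]   # no better choice: preempt anyway

hints = SharedHintPage(nr_vcpus=4)
guest, hv = GuestKernel(hints), HypervisorScheduler(hints)
guest.enter_critical_section(0)
print(hv.pick_vcpu_to_preempt([0, 1, 2]))   # -> 1, vCPU 0 is protected
```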

One important issue for virtualization systems is to enforce performance predictability. The performance of a guest operating system should not be perturbed by the fact that it executes in a virtual machine (VM). However, unpredictability can be observed for many reasons: cache contention, hypervisor-level contention, hardware heterogeneity. We address these sources of unpredictability.

Another important issue in virtualization is to take into account evolutions of the hardware, especially the shift towards NUMA architectures. In NUMA architectures, the major challenge comes from the fact that the hypervisor regularly reconfigures the placement of VM resources (vCPUs, pages) over the NUMA topology, whereas guest operating systems are not designed to handle NUMA topology changes at runtime. This issue can be addressed at different levels (application level, guest operating system level, hypervisor level).
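At the guest level, for instance, an agent could periodically re-read the virtual NUMA topology exposed by Linux and react to changes. The sketch below only detects such changes (using standard sysfs paths, with the reaction left out) and is an illustrative assumption rather than our actual mechanism.

```python
# Sketch: a guest-level agent that periodically re-reads the (virtual) NUMA
# topology exposed by Linux under /sys and reports changes, so that upper
# layers could rebalance threads and memory.

import glob
import os
import time

def read_numa_topology():
    """Return {node_id: set(cpu_ids)} from sysfs (empty off Linux)."""
    topology = {}
    for node_dir in glob.glob("/sys/devices/system/node/node[0-9]*"):
        node_id = int(os.path.basename(node_dir)[len("node"):])
        cpus = {int(os.path.basename(p)[len("cpu"):])
                for p in glob.glob(os.path.join(node_dir, "cpu[0-9]*"))}
        topology[node_id] = cpus
    return topology

def watch(interval_s=5.0, rounds=3):
    previous = read_numa_topology()
    for _ in range(rounds):
        time.sleep(interval_s)
        current = read_numa_topology()
        if current != previous:
            print("NUMA topology changed:", previous, "->", current)
            # a real agent would migrate threads / remap memory here
            previous = current

if __name__ == "__main__":
    print("initial topology:", read_numa_topology())
    watch(interval_s=1.0, rounds=2)
```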

One critical issue in virtualized systems is to have accurate monitoring tools regarding the behavior of VMs. This is a prerequisite for implementing efficient resource management policies. The challenge is to implement monitoring tools which are not intrusive with respect to the guest operating system or to performance, and which provide accurate monitoring.
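For instance, metrics can be collected from the hypervisor side without installing any agent in the guest. The sketch below uses the standard libvirt Python bindings for this purpose, as an illustration rather than a description of our actual tools; the connection URI and the counters actually reported depend on the setup.

```python
# Sketch of hypervisor-side ("out of the guest") VM monitoring with the
# libvirt Python bindings: metrics are read from the host without installing
# any agent in the guest.

import libvirt

def collect_vm_metrics(uri="qemu:///system"):
    conn = libvirt.open(uri)
    metrics = {}
    try:
        for dom in conn.listAllDomains():
            if not dom.isActive():
                continue
            cpu = dom.getCPUStats(True)[0]        # aggregated CPU stats (ns)
            mem = dom.memoryStats()               # balloon / RSS counters (KiB)
            metrics[dom.name()] = {
                "cpu_time_ns": cpu.get("cpu_time"),
                "rss_kib": mem.get("rss"),
                "actual_mem_kib": mem.get("actual"),
            }
    finally:
        conn.close()
    return metrics

if __name__ == "__main__":
    for vm, stats in collect_vm_metrics().items():
        print(vm, stats)
```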