The second reason HPC workloads are slow to migrate to the cloud is related to cloud networks. Most cloud service providers designed their network infrastructure for the independent compute resources common in IT services data centers. For example, your corporate file servers do not need a high speed network between them as long as each has good connectivity to the client systems. The same is true of web servers, database servers, and most other IT workloads.
HPC systems rely heavily on high speed, low latency network connections between the individual servers for optimal performance. This is because a single job is spread across processors in many systems, and each process regularly needs data held in the memory of the others. They use a library that implements MPI (Message Passing Interface) to share information between processes. The faster this information can be shared, the higher the performance of the overall system.
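To see why latency matters as much as bandwidth here, consider a rough sketch of a common simplified cost model for a single message: a fixed latency plus the time to push the bytes through the link. The function name and the figures below are hypothetical, chosen only for illustration, and are not measurements of any real network.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate the time to deliver one message.

    Simplified model (an assumption, not a measurement):
    fixed per-message latency plus size divided by bandwidth.
    """
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical link: same bandwidth in both cases, only latency differs.
BANDWIDTH = 12.5e9   # bytes per second, illustrative value
SMALL_MESSAGE = 8    # bytes, e.g. one double-precision number

low_latency = transfer_time(SMALL_MESSAGE, 1e-6, BANDWIDTH)
high_latency = transfer_time(SMALL_MESSAGE, 50e-6, BANDWIDTH)

print(f"low latency link:  {low_latency:.3e} s per message")
print(f"high latency link: {high_latency:.3e} s per message")
```

For small messages the latency term dominates, so under this model adding bandwidth does little to help a tightly coupled job that exchanges many small messages.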
HPC systems use networks that are non-blocking, meaning that every system can reach every other system in the network at full network bandwidth, even when all of them are communicating at once. They also use extremely low latency networks, keeping the delay in sending a packet from one system to another as low as possible.
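One way to put a number on how far a network is from non-blocking is the ratio of the bandwidth the hosts on a switch can offer to the bandwidth the switch has toward the rest of the network. The sketch below assumes that definition of blocking factor; the function names and port counts are hypothetical examples, not a description of any provider's network.

```python
def blocking_factor(num_hosts, host_gbps, num_uplinks, uplink_gbps):
    """Ratio of host-facing bandwidth to uplink bandwidth on one switch.

    1.0 means non-blocking under this definition; larger values mean
    hosts compete for uplink capacity when they all talk off-switch.
    """
    return (num_hosts * host_gbps) / (num_uplinks * uplink_gbps)


def worst_case_host_gbps(num_hosts, host_gbps, num_uplinks, uplink_gbps):
    """Bandwidth each host gets off-switch if all hosts send at once."""
    fair_share = (num_uplinks * uplink_gbps) / num_hosts
    return min(host_gbps, fair_share)


# Hypothetical rack switch: 32 hosts at 25 Gbps, 4 uplinks at 100 Gbps.
oversubscribed = blocking_factor(32, 25, 4, 100)
# Hypothetical non-blocking layout: 16 hosts at 25 Gbps, same uplinks.
balanced = blocking_factor(16, 25, 4, 100)

print(oversubscribed, balanced)
print(worst_case_host_gbps(32, 25, 4, 100))
```

In the first layout each host sees its full link speed only while traffic stays inside the rack, which is the kind of imbalance that hurts jobs spanning many racks.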
In cloud based systems there is usually a high blocking factor between racks and a low one within a rack. The result is a very unbalanced network with increased latency for high performance workloads, in some cases so poor that HPC applications will not execute to completion. In recent months some cloud providers have made efforts to redesign their network infrastructure to support HPC applications, but there is more work to be done.