Alessio Pagliari, Quentin Jacquemart, Guillaume Urvoy-Keller
The 2017 ACM Internet Measurement Conference in London, 1-3 November 2017
Bandwidth-hungry multi-cloud applications are becoming ubiquitous in situations where data streams need to be aggregated close to the edge of the network. For example, data originating from geographically distributed IoT devices can be combined into a big data stream corresponding to the devices’ general location [4]. The PrEstoCloud project [5] investigates such scenarios, with the objective of building a proactive framework able to automatically adapt the architecture to changes within these data streams. In such scenarios, computation tasks, distributed over different data centers, need to exchange high-rate traffic with predefined SLAs. This requires the application management middleware to be aware of the available bandwidth between the different data centers.
Measuring bandwidth in such scenarios has received little attention in the literature, apart from [1,2], which discuss the accuracy of measurements within a single provider, or between AWS and Azure. Unfortunately, iperf, used in [1,2], relies on bulk transfers that are undesirably costly in pay-as-you-go IaaS offerings. A comparison of state-of-the-art tools for measuring capacity and available bandwidth concludes that Pathload [6] is the most efficient and accurate [3]. We therefore chose it for our measurements.
The convergence of Pathload depends on trends in the time series of inter-packet delay, which it uses to decide whether it should probe for higher or lower bandwidth. Detecting such trends is possible with full control over the hardware, which is unusual in virtualized environments. Moreover, we lack ground truth because cloud providers do not disclose information on their internal network characteristics. To investigate these issues, we used Mininet to assess the performance of Pathload in a virtualized and controlled environment. We then performed bandwidth measurements using Pathload in a public cloud environment, alongside TCP and UDP measurements with iperf. We observed that Mininet affects the accuracy of Pathload. We intend to capture and analyze the measurement traffic to better understand the nature of the noise induced by virtualization. Field trials were performed using several pairs of AWS data centers on different continents. We observed that Pathload results were often similar to those of iperf.
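The trend-based decision described above can be illustrated with a minimal sketch. It is not the Pathload implementation: the two statistics are modeled on the pairwise comparison and pairwise difference tests described for Pathload [6], while the thresholds, the function names, the "either test" rule, and the plain bisection step are assumptions chosen for illustration.

```python
from typing import List, Tuple


def pct(delays: List[float]) -> float:
    """Pairwise comparison statistic: fraction of consecutive samples that increase."""
    pairs = list(zip(delays, delays[1:]))
    if not pairs:
        raise ValueError("need at least two delay samples")
    return sum(1 for a, b in pairs if b > a) / len(pairs)


def pdt(delays: List[float]) -> float:
    """Pairwise difference statistic: net change divided by total absolute variation."""
    if len(delays) < 2:
        raise ValueError("need at least two delay samples")
    total = sum(abs(b - a) for a, b in zip(delays, delays[1:]))
    return 0.0 if total == 0 else (delays[-1] - delays[0]) / total


def has_increasing_trend(delays: List[float],
                         pct_threshold: float = 0.66,   # illustrative assumption
                         pdt_threshold: float = 0.55) -> bool:  # illustrative assumption
    """Report a trend when either statistic exceeds its (assumed) threshold."""
    return pct(delays) > pct_threshold or pdt(delays) > pdt_threshold


def next_rate(low: float, high: float, probe_rate: float,
              increasing: bool) -> Tuple[float, float, float]:
    """Bisection step: an increasing delay trend suggests the probe rate exceeds
    the available bandwidth, so the upper bound is lowered; otherwise the lower
    bound is raised. Returns the new bounds and the next rate to probe."""
    if increasing:
        high = probe_rate
    else:
        low = probe_rate
    return low, high, (low + high) / 2
```

Timing noise introduced by virtualization perturbs the delay samples fed to such tests, so a noisy series can cross a threshold, or fail to, independently of the actual load on the path. This is one plausible way in which the environment can affect convergence.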
An additional difficulty arises from the heavy use of multipath in public data centers, with load balancing done at the connection level. Our next challenge is to estimate the relation between the virtual and physical paths, to determine the extent to which Pathload measurements constitute a good predictor for application-level transfers on different ports.
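To see why measurements on one port may not predict transfers on another, consider a hypothetical sketch of connection-level load balancing that hashes the flow 5-tuple to select a path. The actual mechanisms used by cloud providers are not disclosed; the hash function, the function name pick_path, the addresses, and the number of paths below are all assumptions.

```python
import hashlib


def pick_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              proto: str, n_paths: int) -> int:
    """Map a flow 5-tuple to one of n_paths physical paths (hypothetical scheme).

    All packets of one connection share the 5-tuple and therefore the path;
    a connection on another port may be hashed onto a different path.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_paths


# Example with made-up addresses: the same endpoints, probed on many
# destination ports, spread over the available paths.
paths_used = {pick_path("10.0.0.1", "10.0.1.1", 40000, port, "udp", 4)
              for port in range(5000, 6000)}
```

Under such a scheme, a measurement flow and an application flow between the same two virtual machines can traverse different physical links, which is why the relation between virtual and physical paths matters for prediction.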
[1] V. Persico et al. Measuring network throughput in the cloud: the case of Amazon EC2. Computer Networks, 93:408–422, 2015.
[2] V. Persico et al. On the performance of the wide-area networks interconnecting public-cloud datacenters around the globe. Computer Networks, 112:67–83, 2017.
[3] Guerrero et al. On the applicability of available bandwidth estimation techniques and tools. Computer Communications, 33(1):11–22, 2010.
[4] Pu et al. Low latency geo-distributed data analytics. ACM SIGCOMM CCR, 45(4):421–434, 2015.
[5] Verginadis et al. Proactive Cloud Resources Management at the Edge for Efficient Real-Time Big Data Processing. CLOSER 2017.
[6] Jain et al. Pathload: A measurement tool for end-to-end available bandwidth. PAM 2002.