A machine learning framework for TCP round-trip time estimation
 Bruno Astuto Arouche Nunes^{1},
 Kerry Veenstra^{1},
 William Ballenthin^{2},
 Stephanie Lukin^{3} and
 Katia Obraczka^{1}
https://doi.org/10.1186/16871499201447
© Nunes et al.; licensee Springer. 2014
Received: 5 June 2013
Accepted: 12 March 2014
Published: 26 March 2014
Abstract
In this paper, we explore a novel approach to end-to-end round-trip time (RTT) estimation using a machine-learning technique known as the experts framework. In our proposal, each of several ‘experts’ guesses a fixed value. The weighted average of these guesses estimates the RTT, with the weights updated after every RTT measurement based on the difference between the estimated and actual RTT.
Through extensive simulations, we show that the proposed machine-learning algorithm adapts very quickly to changes in the RTT. Our results show a considerable reduction in the number of retransmitted packets and an increase in goodput, especially in more heavily congested scenarios. We corroborate our results through ‘live’ experiments using an implementation of the proposed algorithm in the Linux kernel. These experiments confirm the higher RTT estimation accuracy of the machine learning approach, which yields over 40% improvement when compared against both the standard transmission control protocol (TCP) estimator and the well-known Eifel RTT estimator. To the best of our knowledge, our work is the first attempt to use online learning algorithms to predict network performance and, given the promising results reported here, creates the opportunity of applying online learning to estimate other important network variables.
1 Introduction
Latency is an important parameter when designing, managing, and evaluating computer networks, their protocols, and applications. One metric that is commonly used to capture network latency is the end-to-end round-trip time (RTT), which measures the time between data transmission and the receipt of a positive acknowledgment. Depending on how the RTT is measured (e.g., at which layer of the protocol stack), besides the time it takes for the data to be serviced by the network, the RTT also accounts for the ‘service time’ at the communication end points. In some cases, RTT measurement can be done implicitly by using existing messages; however, in several instances, explicit ‘probe’ messages have to be used. Such explicit measurement techniques can render the RTT estimation process quite expensive in terms of their communication and computational burden.
Several network applications and protocols use the RTT to estimate network load or congestion and therefore need to measure it frequently. The transmission control protocol, TCP, is one of the best known examples. TCP bases its error, flow, and congestion control functions on the estimated RTT instead of relying on feedback from the network. This pure end-to-end approach to network control is consistent with the original design philosophy of the Internet, which keeps only the bare minimum functionality in the network core, pushing everything else to the edges. Overlays such as content distribution networks (CDNs) (e.g., Akamai [1]) and peer-to-peer networks also make use of RTT as a ‘network proximity’ metric, e.g., to help decide where to redirect client requests. There has also been increasing interest in network proximity information from applications that run on mobile devices (e.g., smart phones) in order to improve the user’s experience.
In this paper, we propose a novel RTT estimation technique that uses a machine-learning-based approach called the experts framework [2]. As described in detail in Section 3, the experts framework^{a} uses ‘online’ learning, where the learning process happens in trials. At every trial, a number of experts contribute to an overall prediction, which is compared to the actual value of the RTT (e.g., obtained by measurement). The algorithm uses the prediction error to refine the weights of each expert’s contribution to the prediction; the updated weights are used in the next iteration of the algorithm. We contend that by employing the proposed prediction technique, network applications and protocols that make use of the RTT do not have to measure it as frequently.
As an example application for the proposed RTT estimation approach, we use it to predict TCP’s RTT. Through extensive simulations and live experiments, we show that our machine learning technique can adapt to changes in the RTT faster and thus predict its value more accurately than the current exponentially weighted moving average (EWMA) technique employed by most versions of TCP. As described in Section 2, TCP uses the RTT estimates to compute its retransmission timeout (RTO) timer [3], which is one of the main timers involved in TCP’s error and congestion control. When the RTO expires, the TCP sender considers the corresponding packet to be lost and therefore retransmits it. TCP relies on RTT predictions and measurements in order to set the RTO value properly. In TCP, the RTT is defined as the time interval between the moment a packet leaves the sender and the reception, at the sender, of a positive acknowledgment for that packet.
If the RTO is too long, it can lead to long idle waits before the sender reacts to the presumably lost packet. On the other hand, if the RTO is too aggressive (too short), it might expire too often, leading to unnecessary retransmissions. Setting the RTO appropriately is therefore critical for TCP’s performance.
We can split the problem of setting TCP’s RTO into two parts. The first part is how to predict the RTT of the next packet to be transmitted, and the second is how the predicted RTT can be used to compute the RTO. In this paper, we focus on the first part of the problem, i.e., the prediction of the RTT; the second part of the problem, i.e., setting the RTO, is the focus of future work.
To estimate the RTT, we propose a new approach based on machine learning, which will be described in detail in Section 3. Our experimental results show that RTT predictions using the proposed technique are considerably more accurate when compared to TCP’s original RTT estimation algorithm and the well-known Eifel [4] timer. We then evaluate how this increased accuracy affects network performance. We do so by running network simulations as well as live experiments. For the latter, we have implemented both our machine learning approach and the Eifel mechanism in the Linux kernel.
The remainder of this paper is organized as follows. Section 2 presents related work, including a brief overview of TCP’s original RTT estimation technique. In Section 3, we describe our RTT prediction algorithm. Section 4 presents our experimental methodology, where we describe the scenarios conceived for our simulation studies, discuss simulation parameters, and define metrics for performing evaluations. Sections 5 and 6, respectively, present our results from both simulation and live experiments. Finally, Section 7 concludes the paper and highlights directions for future work.
2 Related work
In this section, we present a brief overview of previous work on RTT estimation and later discuss some relevant machine learning applications.
2.1 TCP RTT estimation
Most current variants of TCP also implement Karn’s algorithm [5], which ignores the SampleRTT values corresponding to retransmissions. Another consideration is TCP’s clock granularity. In several TCP implementations, and also in the simulation tool used in this work, the clock advances in increments of ticks, commonly set to 500 ms, and the RTO is bounded by RTO_{min} = 2·ticks and RTO_{max} = 64·ticks^{b}.
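For reference, the EWMA-style estimator that the names SampleRTT and EstimatedRTT refer to can be sketched as follows. This is a minimal illustration, not the simulator's or any kernel's code: the gains 1/8 and 1/4, the first-sample initialization, and the RTO = EstimatedRTT + 4·variation rule are the commonly used standard values and are assumptions here, since this text does not state them. The class and method names are hypothetical. Only the Karn rule and the tick bounds come from the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8        # assumed gain for EstimatedRTT (standard value, not from this text)
BETA = 1 / 4         # assumed gain for the RTT variation (standard value)
RTO_MIN_TICKS = 2    # RTO_min = 2 ticks, as stated above
RTO_MAX_TICKS = 64   # RTO_max = 64 ticks, as stated above


@dataclass
class EwmaRttEstimator:
    estimated_rtt: Optional[float] = None  # EstimatedRTT, in ticks
    rtt_var: float = 0.0                   # smoothed RTT variation, in ticks

    def update(self, sample_rtt: float, retransmitted: bool = False) -> Optional[float]:
        """Feed one SampleRTT (in ticks) and return the current EstimatedRTT."""
        if retransmitted:
            # Karn's algorithm: samples from retransmissions are ignored.
            return self.estimated_rtt
        if self.estimated_rtt is None:
            # The first valid sample initializes the state (assumed rule).
            self.estimated_rtt = sample_rtt
            self.rtt_var = sample_rtt / 2
        else:
            deviation = abs(self.estimated_rtt - sample_rtt)
            self.rtt_var = (1 - BETA) * self.rtt_var + BETA * deviation
            self.estimated_rtt = (1 - ALPHA) * self.estimated_rtt + ALPHA * sample_rtt
        return self.estimated_rtt

    def rto(self) -> float:
        """RTO in ticks, clamped to [RTO_MIN_TICKS, RTO_MAX_TICKS]."""
        if self.estimated_rtt is None:
            # Assumption: stay conservative before any sample is available.
            return float(RTO_MAX_TICKS)
        raw = self.estimated_rtt + 4 * self.rtt_var
        return min(max(raw, RTO_MIN_TICKS), RTO_MAX_TICKS)
```

Because the variation term uses an absolute deviation, a sudden drop in SampleRTT raises it just as a sudden increase does, which is the behavior Eifel's proponents criticize.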
In prior work, a number of approaches have been proposed to estimate TCP’s RTT. Trace-driven simulations reported in [6] to evaluate different RTT estimation algorithms show that the performance of the estimators is dominated by their minimum values and is not influenced by the RTT sample rate. This last conclusion was challenged by the Eifel estimation mechanism [4], one of the most cited alternatives to TCP’s original RTT estimator; Eifel can be used to estimate the RTT and set the RTO. Eifel’s proponents identify several problems with TCP’s original RTT estimation algorithm, including the observation that a sudden decrease in RTT causes RTTVAR, and consequently the RTO, to increase unexpectedly^{c}. As will become clear from the experimental results presented in Subsection 6.3, our approach is able to follow abrupt changes in the RTT quite closely and outperforms both TCP’s original RTT estimator and Eifel’s.
Another notable adaptive TCP RTT estimator was proposed in [7]. It uses the ratio of previous and current bandwidth to adjust the RTT. In [8], a TCP retransmission timeout algorithm based on recursive weighted median filtering is proposed. Their simulation results show that for Internet traffic with heavy-tailed statistics, their method yields tighter RTT bounds than TCP’s original RTT computation algorithm. Leung et al. present work that focuses on changing RTO computation and retransmission policies, rather than improving RTT predictions [9].
2.2 Selected machine learning applications
Machine learning has been used in a number of other applications. Helmbold et al. use an experts framework algorithm to predict a hard-disk drive’s idle time and decide when to attempt to save energy by spinning down the disk [10]. For this problem, the cost of making a bad decision (i.e., when spinning the disk down and back up costs more than simply leaving it on) is very well defined since the decision of spinning down the disk does not affect the length of the next idle time.
Unlike the spin-down example, when predicting the RTT, every prediction causes the next RTT to be set to a different value and that influences every event that happens thereafter. Thus, the problem of defining the cost of a bad RTT prediction is not as straightforward. Our solution to this problem is discussed in Section 3.
Moreover, in the spin-down cost problem, the traces used in the evaluation were ‘offline’ traces, i.e., traces captured from live runs and later on used as input to the algorithms being evaluated. In the case of the spin-down problem, there is no problem in using offline traces since the algorithm’s estimations do not influence the outcome of the next measurement. In the case of TCP RTT estimation, however, as previously discussed, since the RTT estimations influence TCP timers and these timers affect the outcome of the next RTT measurement, offline traces can be used to set and tune parameters of the algorithm, but they are not suitable for evaluating the performance of the system. The work presented in [8, 9, 11, 12] on RTT estimation compares their solution against TCP’s original RTT estimation algorithm, but they base their evaluation on offline traces.
Machine learning techniques have not been commonly employed to address network performance issues. The work described in [13] is a notable exception and proposes the use of the stochastic estimator learning algorithm (SELA) to address the call admission control (CAC) problem in the context of asynchronous transfer mode (ATM) networks. Their goal was to predict in ‘real time’ if a call request should be accepted or not for various types of traffic sources. Simulation results show the statistical gain exhibited by the proposed approach compared to other CAC schemes. Another SELA-based approach, this time applied to QoS routing in hierarchical ATM networks, was proposed in [14]. In this approach, learning algorithms operating at various network switches determine how the traffic should be routed based on current network conditions. In the context of WiMax networks, a cross-layer design approach between the transport and physical layers for TCP throughput adaptation was introduced in [15]. The proposed approach uses adaptive coding and modulation (ACM) schemes.
Mirza et al. propose a throughput estimation tool based on support vector regression modeling [16]. It predicts throughput based on multiple real-valued input features. However, to the best of our knowledge, to date, no attempt to use online learning algorithms to predict network conditions has been reported.
3 Proposed approach
In this section, we present the fixed-share experts algorithm as a generic solution for online prediction. Later, we describe its application to the problem of predicting the RTT of a TCP connection.
3.1 The fixed-share experts algorithm
The weight w_{t,i} should be interpreted as a measurement of the confidence in the quality of the i-th expert’s prediction at the start of trial t. In the initialization of the algorithm, we set ${w}_{1,i}=\frac{1}{N},\forall i\in \{1,\dots ,N\}$. After computing the loss at trial t, the algorithm updates the experts’ weights by multiplying the weight of the i-th expert by ${e}^{-\eta {L}_{i,t}({x}_{i},{y}_{t})}$. The learning rate η determines how fast the updates take effect, dictating how rapidly the weights of misleading experts are reduced.
After updating the weights, the algorithm also ‘shares’ some of the weight of each expert among other experts. Thus, an expert who is performing poorly and had its weight severely compromised can quickly regain influence in the master prediction once it starts predicting well again. The amount of sharing can be changed by using the α parameter, called the sharing rate. This allows the algorithm to adapt to ‘bursty’ behavior. Indeed, in Section 5, we use, among others, bursty traffic scenarios to evaluate the performance of our algorithm.
Algorithm 1 summarizes the steps involved in our fixed-share experts algorithm. The first line in the algorithm summarizes the algorithm’s parameters, namely: (1) the learning rate η, which we define as a positive real number, and (2) the sharing rate α, a real number between zero and one that dictates the percentage of the weights shared at every trial. In our approach, expert weights are initialized uniformly, which is what is described in the ‘Initialization’ step of Algorithm 1. The basic four steps of our fixed-share experts algorithm are also summarized in Algorithm 1 (and in Figure 1). In step 1, the prediction, defined as the weighted average over the individual predictions x_{i} of every i-th expert, is computed. The loss function is computed in step 2 and is used in step 3 to penalize and decrease the weights of the experts that are not performing well, and in step 4, experts share their weight based on the sharing rate α.
In [2], the basic version of the experts framework is presented along with bounds for different loss functions. The algorithm is also analyzed for different prediction functions, including the weighted averaging we use. The implementation described in this paper, with the intermediate pool variable, costs O(1) time per expert per trial.
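The four steps described above can be sketched as a single update function. This is a hedged illustration rather than the authors' Algorithm 1: the function name, the pluggable `loss` argument, and in particular the sharing rule (each expert keeps a fraction 1 - α of its weight and receives an equal 1/N share of the pooled remainder) are assumptions. The defaults η = 2 and α = 0.08 are the values reported later in the paper.

```python
import math


def fixed_share_step(weights, experts, y_t, loss, eta=2.0, alpha=0.08):
    """Run one trial and return (prediction, weights for the next trial).

    weights : current expert weights w_{t,i}
    experts : fixed expert guesses x_i
    y_t     : value measured at this trial
    loss    : callable loss(x_i, y_t) -> non-negative float
    """
    n = len(experts)
    total = sum(weights)

    # Step 1: the prediction is the weighted average of the experts' guesses.
    y_hat = sum(w * x for w, x in zip(weights, experts)) / total

    # Steps 2 and 3: compute each expert's loss and apply the exponential update.
    updated = [w * math.exp(-eta * loss(x, y_t)) for w, x in zip(weights, experts)]

    # Step 4: pool a fraction alpha of the total weight and share it equally
    # (assumed rule). Computing the pool once keeps the cost at O(1) per expert.
    pool = alpha * sum(updated)
    shared = [(1 - alpha) * w + pool / n for w in updated]
    return y_hat, shared
```

With uniform initial weights, the first prediction is simply the mean of the experts' guesses; after one update, the expert closest to the measurement carries the largest weight, while sharing keeps every weight strictly positive.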
3.2 Applying experts to TCP’s RTT prediction
To apply the proposed algorithm to TCP’s RTT-prediction problem, the experts’ predictions x_{i} shown in Algorithm 1 serve as predictions for the next RTT measured. y_{t} is the RTT value at the present trial, equivalent to the SampleRTT in the original TCP RTT estimator. $\hat{y}_{t}$ is the output of the algorithm, or in other words, the RTT prediction itself, equivalent to the EstimatedRTT in the original TCP RTT predictor, mentioned in Section 2.
Algorithm 1 Fixed-share experts algorithm
We want the loss function L_{i,t}(x_{i},y_{t}) to reflect the real cost of making wrong predictions. In our implementation (see Algorithm 1), the loss function has different penalties for overshooting and undershooting the RTT estimate because they have different impacts on the system’s behavior and performance. An underestimate of the RTT will result in an RTO computation that is less than the next measured RTT, causing unnecessary timeouts and retransmissions. We thus employ the following policies: if the measured RTT y_{t} is higher than the expert’s prediction x_{i}, then this expert is contributing to a spurious timeout and should be penalized more than other experts that overshoot within a given threshold. Big overshooters are more severely penalized. The challenge here is identifying the appropriate cost for mispredicting the RTT. The cost could be simply the difference between prediction and measurement, or a factor thereof. Exploring other loss functions for the TCP RTT estimation problem is the subject of future work.
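One possible shape for such an asymmetric loss is sketched below. The exact functional form and the constant are assumptions chosen only to satisfy the stated policies; they are not taken from Algorithm 1. Here, any undershoot incurs a flat penalty proportional to the measured RTT, while an overshoot costs the squared error, so small overshoots are cheaper than an undershoot and large overshoots are more expensive.

```python
def asymmetric_loss(x_i, y_t, under_factor=2.0):
    """Illustrative loss for expert guess x_i against measured RTT y_t (in ticks).

    under_factor is a hypothetical tuning constant, not a value from the paper.
    """
    if x_i < y_t:
        # Undershoot: the expert would contribute to a spurious timeout.
        return under_factor * y_t
    # Overshoot (or exact hit): the cost grows quadratically with the error.
    return (x_i - y_t) ** 2
```

With this form, the implicit threshold is an overshoot of sqrt(under_factor · y_t) ticks: below it, overshooting is penalized less than undershooting, and above it, more.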
Setting the value x_{i} of the experts is referred to as setting the experts’ spacing. To space the experts is to determine the experts’ values and their distribution within the prediction domain. When predicting RTTs, the experts should be spaced between RTT_{min} and RTT_{max}, defined in some TCP implementations (including the one in the network simulator we used in our experimental evaluation) to be 1 and 128 ticks, respectively. Based on observations of several RTT datasets, we concluded that the majority of the RTT measurements are concentrated in the lower part of this interval. For that reason, we found that spacing the experts exponentially in that interval, instead of uniformly (or linearly), leads to better predictions. The exponential function used in our implementation of the fixed-share experts approach is ${x}_{i}={\text{RTT}}_{\text{min}}+{\text{RTT}}_{\text{max}}\cdot {2}^{\frac{(i-N)}{4}}$. The $\frac{1}{4}$ multiplicative factor in the exponent of the spacing function was experimentally chosen to smooth out its growth. This increases the difference between the experts and generates diversity among them, which increases predictions’ granularity and accuracy.
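The exponential spacing can be illustrated with a short sketch. It assumes that the exponent of the spacing function is (i - N)/4 with i running from 1 to N, and it uses N = 100, RTT_min = 1 tick, and RTT_max = 128 ticks; the function name is hypothetical.

```python
def expert_values(n=100, rtt_min=1.0, rtt_max=128.0):
    """Return expert guesses x_1..x_N spaced exponentially (in ticks).

    Assumes x_i = rtt_min + rtt_max * 2 ** ((i - n) / 4).
    """
    return [rtt_min + rtt_max * 2 ** ((i - n) / 4) for i in range(1, n + 1)]


xs = expert_values()
# Count how many experts fall in the low end of the range.
low_end = sum(1 for x in xs if x < 10.0)
```

Under these assumptions, the large majority of the experts fall below 10 ticks, which matches the observation that most RTT measurements are concentrated in the lower part of the interval.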
Another consideration is that the algorithm, as stated in Algorithm 1, will continually reduce the experts’ weights towards zero. Thus, in order to avoid underflow issues in our implementation, we periodically rescale the weights. Different versions of the multiplicative weight algorithmic family, including a mixing past approach that also mixes weights from past trials, are discussed in [17]. However, the mixing past approach incurs higher space and time costs and thus was not considered in our work. Another sharing scheme known as variable-sharing was also considered in preliminary experiments, where the amount of weight shared with each expert depended on the experts’ individual losses. However, the additional complexity and cost of this scheme outweighed its benefits in terms of RTT estimation accuracy.
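A minimal sketch of such a rescaling step follows. The trigger (total weight dropping below a floor) and the floor value are assumptions, since the text does not say when or how the rescaling is performed. Dividing every weight by the same total leaves the weighted-average prediction unchanged.

```python
def rescale_weights(weights, floor=1e-100):
    """Renormalize weights to sum to 1 once their total gets too small.

    floor is a hypothetical threshold, not a value from the paper.
    """
    total = sum(weights)
    if total == 0.0:
        # Nothing left to rescale; the caller has to reinitialize instead.
        raise ValueError("all weights have underflowed to zero")
    if total < floor:
        return [w / total for w in weights]
    return list(weights)
```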
4 Experimental methodology
Table 1 Simulation parameters for each simulation scenario

| Parameter | Scenario I | Scenario II | Scenario III | Scenario IV | Scenario V |
| --- | --- | --- | --- | --- | --- |
| Routing | AODV | AODV | AODV | AODV | Fixed |
| Mobility | RWP | RWP | None | RWP | n/a |
| MAC/PHY | 802.11b | 802.11b | 802.11b | 802.11b | 802.3 |
| Speed [min, max] (m/s) | [1, 50] | [1, 50] | 0 | [1, 10], [20, 30], [40, 50] | n/a |
| Pause (s) | 0 | 0 | 0 | 0 | n/a |
| Area (m × m) | 1,500 × 1,000 | 1,500 × 1,000 | 1,500 × 1,000 | 1,500 × 1,000 | n/a |
| Duration (min) | 25 | 25 | 25 | 90 | 90 |
| Number of nodes | 20 | 10 | 20 | 20 | 8 |
In Section 5, we present results for a total of five simulation scenarios, which are summarized, along with their parameters, in Table 1. The goal of having a variety of scenarios is to subject the proposed RTT estimation approach to a wide range of network conditions. Scenarios I and II represent ad hoc mobile networks composed of 20 and 10 nodes, respectively. These two scenarios differ from each other only in terms of node density. Scenario III is a static wireless network composed of 20 nodes uniformly distributed over the simulated network area. Scenario IV is also a 20-node mobile network, but the traffic pattern is different from scenarios I and II. In scenario IV, we also vary the mobility of the network by varying the nodes’ average speed. Finally, scenario V is a wired network composed of eight nodes, four on each side of a bottleneck link.
It is important to highlight that the mobile scenarios employed in our evaluation were used to evaluate the proposed RTT estimation technique under conditions that cause high variability and randomness in the network and thus in the RTT values. Following the RWP mobility regime, nodes move randomly causing data paths to break and new ones to be created which then results in high variability of the RTT. Our goal was then to ensure that our RTT estimator is able to adjust to high variability conditions and yield accurate RTT estimates.
We present our results in Section 5 using a number of performance metrics defined as follows:

Mean error on RTT prediction: absolute difference between the RTT prediction $\hat{y}_{t}$ and the measured RTT y_{t} at trial t, averaged over all trials, defined as: $\frac{1}{T}\sum_{t=1}^{T}|\hat{y}_{t}-y_{t}|$. In the simulations, RTT is measured in ‘ticks’ of 500 ms; a tick is equal to 4 ms in the live experiments described in Section 6. The mean RTT prediction error metric is thus also given in ‘ticks’.

Average congestion window size (cwnd): the size in bytes of the congestion window at the TCP sender, averaged over all prediction trials, defined as: $\frac{1}{T}\sum_{t=1}^{T}\text{cwnd}_{t}$, where cwnd_{t} is the congestion window size measured at trial t.

Delivery ratio: ratio between total data packets received by the destination and data packets sent at the source, computed for every flow and averaged over all flows.

Percentage of retransmitted packets: ratio between retransmitted data packets and total data packets sent at every flow, averaged over all flows.

Goodput: total number of useful (data) packets received at the application layer divided by the total duration of the flow and averaged over all flows, giving a rate in packets per second.
Next, we discuss the impact of the algorithm’s parameters on the accuracy of the proposed RTT prediction algorithm and justify the values chosen for these parameters in our simulation evaluations.
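The metric definitions above translate directly into code. The following sketch uses hypothetical function names and assumes per-flow counters are already available; it only restates the definitions and is not the evaluation code used for the results.

```python
def mean_prediction_error(predictions, measurements):
    """Mean absolute difference between predicted and measured RTTs (in ticks)."""
    if not predictions or len(predictions) != len(measurements):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - y) for p, y in zip(predictions, measurements)) / len(predictions)


def delivery_ratio(packets_received, packets_sent):
    """Data packets received at the destination over data packets sent, for one flow."""
    return packets_received / packets_sent


def retransmission_percentage(packets_retransmitted, packets_sent):
    """Retransmitted data packets as a percentage of data packets sent, for one flow."""
    return 100.0 * packets_retransmitted / packets_sent


def goodput(packets_delivered, flow_duration_s):
    """Useful packets delivered to the application per second, for one flow."""
    return packets_delivered / flow_duration_s


def average_over_flows(per_flow_values):
    """Average a per-flow metric over all flows."""
    return sum(per_flow_values) / len(per_flow_values)
```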
4.1 Impact of experts framework parameters (N,η,α)
We experimented with several combinations of the fixed-share algorithm parameters. The number N of experts affects the granularity over the range of values the RTT can assume. In our experiments, increasing N beyond 100 had no major impact on the prediction accuracy.
The learning rate η is responsible for how fast the experts are penalized for a given loss. We want to avoid values of η that are too low since they increase the algorithm’s convergence time; conversely, if η is too high, it forces the experts’ weights toward zero too quickly. If this is the case, then as weight rescaling kicks in, the algorithm assigns similar weights to all experts, making the algorithm’s master prediction fluctuate undesirably around the mean value of the experts’ guesses. We chose a learning rate in the interval 1.7 < η < 2.5 as it provides good prediction results for all the scenarios tested.
When sharing is not enabled, i.e., α = 0, the outcome of the algorithm is given only by decreasing exponential updates, making it harder for the algorithm to follow abrupt changes in the RTT measurements: experts that experience prolonged poor performance lack influence because their weights have become too depreciated. In this case, it takes more trials for these experts to regain importance in the master prediction. With full sharing enabled (α = 1), similar weights are assigned to every expert, and the master prediction fluctuates close to a mean value among the experts’ guesses.
We present in more detail the simulation scenarios in the next section, highlighting the goals for every evaluation and discussing obtained results. Following our observations, in all results reported hereafter concerning our proposed approach, we use N = 100, η = 2 and α = 0.08.
5 Simulation results
In this section, we discuss simulation results obtained by applying the fixed-share experts framework to estimate TCP’s RTT and help set TCP’s RTO timer. We compare our results against TCP’s original RTT and RTO computation algorithm by Jacobson [3]. We evaluate the RTT prediction quality and how it impacts the previously defined performance metrics. We consider different scenarios by varying network density, mobility, and traffic load. Both wireless and wired networks are used in our evaluation.
In all graphs presented below, each data point is computed as the average over 24 simulation runs with a confidence level of 90%.
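As an illustration of how a data point and its interval might be derived from 24 runs, the sketch below computes the sample mean and the half-width of a two-sided 90% confidence interval. The use of a Student-t interval is an assumption, as the text does not say how the intervals were obtained; the hard-coded critical value 1.714 applies only to 24 samples (23 degrees of freedom).

```python
import math
import statistics

# Two-sided 90% Student-t critical value for 23 degrees of freedom (24 runs).
T_CRIT_90_DF23 = 1.714


def mean_with_ci(samples, t_crit=T_CRIT_90_DF23):
    """Return (mean, half_width) of the confidence interval for the mean.

    The default t_crit is only valid for exactly 24 samples.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width
```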
5.1 Scenario I - mobile scenario (20 nodes)
First, we considered a mobile ad hoc network (MANET) composed of 20 nodes. The goal of this scenario is to evaluate the performance of the RTT prediction algorithms when routes in the network change widely. In other words, we want to show the algorithm’s response to RTT fluctuations. For that reason, we also varied the number of TCP flows during the simulations to change network load and congestion levels. We evaluated scenarios with the number of concurrent flows equal to 3, 7, 17, 34, 68, 100, and 130. Although flows were evenly distributed among nodes, they started at random times during the experiments and their sizes varied from 1,000 to 100,000 packets.
5.2 Scenario II - mobile scenario (10 nodes)
In this scenario, we subject the network to more variable conditions in order to produce higher RTT variations. We accomplish that by running the same scenario, but now decreasing the density of the network, which includes only ten mobile nodes. The objective is to cause more frequent route changes, which would result in more frequent and wider RTT variations. For example, in scenario I, the mean RTT variances for 34 and 100 flows are 0.4763 and 0.5501 ticks, respectively. For scenario II, with only ten nodes, the mean RTT variances for the same congestion scenarios are 0.7878 and 0.9315 ticks, respectively. Except for the reduced number of nodes, all the other parameters for this scenario are the same as before.
5.3 Scenario III - stationary network
Our goal in this experiment is to isolate the effect of traffic load on the performance of the proposed RTT estimator. Therefore, we factor out node mobility and consider a wireless ad hoc network where all nodes are stationary. We varied traffic load the same way we did for scenario I.
5.4 Scenario IV - bursty traffic
In this scenario, we subject the network to bursty traffic loads as a way to evaluate the performance of the proposed RTT estimation strategy as traffic load fluctuates. The results shown here reflect the simulation of a network with 20 mobile nodes in which every node starts a TCP flow of 1,000 packets every 200 s; this happens throughout the 90 min of simulation. Thus, nodes would transmit for a while and then remain silent until the next cycle of 200 s. We also vary the speed of the nodes between (1, 10), (20, 30), and (40, 50) m/s, which yields average speeds of approximately 5, 25, and 45 m/s, respectively.
It is also worth noting the interesting behavior present in the plots for all the reported network performance metrics, where for lower speeds, these metrics reflect better network performance (i.e., lower number of retransmitted packets, higher goodput, and higher delivery ratio), since the routing paths do not change as frequently. This incurs fewer losses and lower routing overhead. For the average speed of 25 m/s, the situation changes and the metrics reflect the worst performance. However, when further increasing the average speed to 45 m/s, the network metrics start to improve again. This behavior is consistent with the results presented in [21], which show that, when topology changes happen at packet delivery time scales, network capacity can improve when nodes are mobile rather than stationary.
5.5 Scenario V - wired network
6 Linux implementation and experiments
In this section, we present our implementation of the fixed-share experts algorithm for the Linux kernel and report on the experiments we conducted and their results.
6.1 Fixed-point arithmetic
The simulation results reported in Section 5 refer to the implementation of the fixed-share experts algorithm (as described in Section 3) on the QualNet network simulator. This implementation uses real numbers. Thus, a straightforward Linux implementation would use floating point arithmetic [22] along with floating point functions of the gcc compiler’s libc library [23]. Unfortunately, while floating point numbers support both a wide range of values and high precision, the Linux operating system lacks support for floating point manipulation in the kernel^{d}.
In our Linux implementation, we used a sign-magnitude representation in which a 33rd bit records the sign. (An alternate implementation of the data type would reduce the integer part to 15 bits so that the resulting type would fit entirely within a single 32-bit processor register.)
Our fixed-point numeric type has the following characteristics: range = -65535.99998 to +65535.99998 and precision = 0.000015. We consider these characteristics adequate for the range of numeric values expected.
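The stated range and precision are consistent with a 32-bit magnitude split into 16 integer and 16 fractional bits, plus a separate sign. The sketch below models such a type in Python to show where the two figures come from. The 16/16 split is inferred from the stated characteristics, and the helper names and saturation behavior are assumptions; this is not the kernel code.

```python
FRAC_BITS = 16                 # assumed number of fractional bits
ONE = 1 << FRAC_BITS           # fixed-point representation of 1.0
MAX_MAG = (1 << 32) - 1        # largest 32-bit magnitude


def to_fixed(value):
    """Convert a float to (is_negative, magnitude), saturating at MAX_MAG."""
    magnitude = min(int(round(abs(value) * ONE)), MAX_MAG)
    return (value < 0, magnitude)


def to_float(fx):
    """Convert (is_negative, magnitude) back to a float."""
    is_negative, magnitude = fx
    return (-magnitude if is_negative else magnitude) / ONE


def fx_mul(a, b):
    """Multiply two fixed-point values using integer arithmetic only."""
    is_negative = a[0] != b[0]                       # signs combine by XOR
    magnitude = min((a[1] * b[1]) >> FRAC_BITS, MAX_MAG)
    return (is_negative, magnitude)


largest = MAX_MAG / ONE        # about 65535.99998
step = 1 / ONE                 # about 0.000015
```

Keeping the sign outside the 32-bit magnitude is what allows the full 16 integer bits; folding the sign into the word, as in the alternate implementation mentioned above, would leave 15 integer bits.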
6.2 Linux implementation
We implemented both the fixed-share experts algorithm and the Eifel algorithm [4] in the Linux kernel version 2.6.28.3. We modified the function tcp_rtt_estimator() to return the RTT as evaluated by either of the RTT prediction algorithms. Our implementation of the Eifel algorithm, to the best of our knowledge, is faithful to the algorithm described in [4] for predicting the RTT and setting the RTO.
Our Linux implementation of the fixed-share experts algorithm differs from our QualNet implementation of the algorithm in two areas. First, the implementation scales RTT measurements. A measured RTT of 1 tick in the simulator means 500 ms, while a measured RTT of 1 tick in the Linux implementation means 4 ms. Consequently, our Linux implementation scales RTT measurements from the operating system by $\frac{1}{125}$ before passing them to the fixed-share experts algorithm, and it scales RTT predictions from the experts algorithm by 125 before returning them to the operating system. Such scaling prevents the implemented algorithm from misinterpreting the greater precision of the Linux RTT measurements as larger prediction errors.
The second difference in the Linux implementation of the algorithm is inspired by the algorithm’s response to large and abrupt reductions in the measured RTTs. Large RTT reductions cause the weights of formerly correct experts to experience greater losses, in extreme cases immediately underflowing to 0. In these cases, if the weights of the newly correct experts have already decayed to zero, then all experts’ weights will be zero simultaneously, and the machine learning algorithm will be unable to make a prediction. Normally, the fixed-sharing feature of the experts algorithm helps increase the weights of newly correct experts, but sharing cannot compensate for this situation since sharing a total weight of zero among the experts has no effect on the experts’ individual weights. To compensate for this occasional situation, we modified the algorithm in the Linux implementation to detect the case and to reinitialize the experts’ weights with values from a uniform distribution whose mean matches the most recently measured RTT. This change to the algorithm does not affect the simulation results because the simulated RTT changes were insufficient to cause all experts’ weights to go to zero.
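The factor of 125 is simply the ratio of the two tick lengths, 500 ms / 4 ms. A small sketch of the scaling wrapper follows; the function and parameter names are hypothetical, and `predictor` stands in for the experts algorithm operating in simulator-sized ticks.

```python
SIM_TICK_MS = 500      # tick length in the simulator
LINUX_TICK_MS = 4      # tick length in the Linux implementation
SCALE = SIM_TICK_MS // LINUX_TICK_MS   # 125


def predict_in_linux_ticks(measured_linux_ticks, predictor):
    """Scale a Linux RTT measurement down, predict, and scale the result back up."""
    scaled_measurement = measured_linux_ticks / SCALE   # now in 500 ms units
    scaled_prediction = predictor(scaled_measurement)   # experts work in 500 ms units
    return scaled_prediction * SCALE                    # back to 4 ms ticks
```

For example, a measured RTT of 250 Linux ticks (1 s) is presented to the predictor as 2 simulator-sized ticks.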
6.3 Experimental results
We acquired data from live file transfer runs using our modified TCP kernel modules that implement the Eifel and the experts algorithms. Data collection happened over 30 file transfers of a 16-MB file. To help filter out the effects of gradual network changes, we interleaved the transfers controlled by our experts approach, the Eifel retransmission timer, and Jacobson’s algorithm. In total, there were 10 runs of each algorithm for each of the three conceived scenarios.
The live experiments used a different set of scenarios than the simulations. In Scenario 1, the source of the file transfer was a Linux machine containing the modified modules for the experts and Eifel algorithms and the original kernel code and TCP timer. This machine was connected to the wired campus network at the University of California, Santa Cruz. The destination was another Linux machine connected to the Internet, physically located in the state of Utah in the USA. Scenario 2 was similar to Scenario 1, except that the source was now connected wirelessly to an 802.11 access point, which was connected to the Internet through the UCSC campus network. Scenario 3 was a full wireless scenario, where both source and destination were connected to the same 802.11 access point. All the measurements were collected at the source of the file transfer.
Prediction error, cwnd, and number of retransmissions averaged over 10 runs of the same experiment for Scenario 1

Metric    Eifel            Jacobson         Experts
Error     11.21 (1.02)     8.19 (0.97)      5.10 (0.61)
cwnd      61.55 (10.42)    69.82 (6.32)     74.87 (9.73)
rexmits   26.40 (13.83)    31.12 (17.72)    13.02 (8.24)
Prediction error, cwnd, and number of retransmissions averaged over 10 runs of the same experiment for Scenario 2

Metric    Eifel             Jacobson          Experts
Error     114.52 (9.15)     74.11 (6.23)      67.64 (4.64)
cwnd      55.38 (8.34)      66.71 (5.35)      74.89 (7.03)
rexmits   314.20 (36.25)    367.70 (42.10)    250.80 (20.07)
Prediction error, cwnd, and number of retransmissions averaged over 10 runs of the same experiment for Scenario 3

Metric    Eifel             Jacobson          Experts
Error     298.24 (21.41)    199.23 (12.32)    131.65 (7.58)
cwnd      18.91 (6.68)      31.08 (7.38)      38.11 (5.91)
rexmits   204.62 (32.43)    363.21 (39.86)    159.61 (24.18)
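To compare the estimators across scenarios, the relative reduction achieved by the experts approach can be computed directly from the mean values in the three tables. The short script below is only a reader's aid: the dictionary layout and the helper name `reduction_pct` are hypothetical, and this section does not state how the paper's aggregate improvement figure was derived, so the per-scenario numbers it prints should not be assumed to reproduce that figure.

```python
# Mean values copied from the three scenario tables, in the order
# (Eifel, Jacobson, Experts). The values in parentheses in the tables are omitted.
RESULTS = {
    "Scenario 1": {"error": (11.21, 8.19, 5.10), "rexmits": (26.40, 31.12, 13.02)},
    "Scenario 2": {"error": (114.52, 74.11, 67.64), "rexmits": (314.20, 367.70, 250.80)},
    "Scenario 3": {"error": (298.24, 199.23, 131.65), "rexmits": (204.62, 363.21, 159.61)},
}

def reduction_pct(baseline, experts):
    """Relative reduction of the experts' value with respect to a baseline, in percent."""
    return 100.0 * (baseline - experts) / baseline

for scenario, metrics in RESULTS.items():
    for metric, (eifel, jacobson, experts) in metrics.items():
        print(
            f"{scenario} {metric}: "
            f"{reduction_pct(eifel, experts):.1f}% lower than Eifel, "
            f"{reduction_pct(jacobson, experts):.1f}% lower than Jacobson"
        )
```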
6.3.1 RTO computation
7 Conclusions
In the present work, we proposed a novel approach to end-to-end RTT estimation using a machine learning technique known as the fixed-share experts framework. We employ our approach as an alternative to TCP’s RTT estimator and show that it yields higher accuracy in predicting the RTT than the standard algorithm used in most TCP implementations. The proposed machine learning algorithm is able to adapt very quickly to changes in the RTT. Our simulation results show a considerable reduction in the number of retransmitted packets, while increasing goodput, particularly in more heavily congested scenarios. We corroborate our results by running ‘live’ experiments on a Linux implementation of our algorithm. These experiments confirm the higher accuracy of the machine learning approach, with more than 40% improvement, not only over the standard TCP predictor but also over another well-known solution, the Eifel retransmission timer [4]. Nevertheless, for this particular application, further work is needed to learn how to take better advantage of the improved estimates and to change the way the RTO timer is set.
Moreover, determining the appropriate loss function for RTT prediction when setting retransmission timers is not trivial. Further work is needed to understand the cost of making wrong decisions in the RTT prediction problem in the context of TCP. Finally, we believe our work opens the possibility of applying online learning algorithms to predict other important network variables.
Endnotes
^{a} In this paper, we use the terms experts framework, experts FW, experts fixed sharing, fixed-share experts, or simply the experts algorithm, interchangeably, referring to the proposed machine learning algorithm.
^{b} These values were used in our simulations; however, on real implementations, they can vary. That was the case for the TCP implementation on the Linux distribution used in our experiments, and we comment on that in Section 6.
^{c} The solution proposed by Eifel for this problem (not the algorithm itself) made it into recent TCP kernel implementations and was used in our experiments reported in Section 6.
^{d} Within the Linux kernel, one can surround inline floating point code with the Linux macros kernel_fpu_begin and kernel_fpu_end, but the code must avoid function calls and must avoid using any routines of the libc library.
Declarations
Acknowledgements
Financial support was granted by the CAPES Foundation, Ministry of Education of Brazil, Caixa Postal 250, Brasilia - DF 70040-020, Brazil. This work was also partially supported by NSF grant CCF-091694 and a US Army-ARO MURI grant.
References
1. Akamai Technologies, Inc. Last accessed Dec. 1st 2013. http://www.akamai.com
2. Herbster M, Warmuth MK: Tracking the best expert. Mach. Learn 1998, 32(2):151-178. 10.1023/A:1007424614876
3. Jacobson V: Congestion avoidance and control. SIGCOMM Comput. Commun. Rev 1995, 25: 157-187. 10.1145/205447.205462
4. Ludwig R, Sklower K: The Eifel retransmission timer. SIGCOMM Comput. Commun. Rev 2000, 30(3):17-27. 10.1145/382179.383014
5. Karn P, Partridge C: Improving round-trip time estimates in reliable transport protocols. ACM Trans. Comput. Syst 2001, 9: 27.
6. Allman M, Paxson V: On estimating end-to-end network path properties. In SIGCOMM ’99: Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication. New York: ACM; 1999:263-274.
7. Lou W, Huang C: Adaptive timer-based TCP control algorithm for wireless system. In Wireless Networks, Communications and Mobile Computing, IEEE International Conference on, Volume 2. Maui, HI, USA; 2005:935-939.
8. Ma L, Arce G, Barner K: TCP retransmission timeout algorithm using weighted medians. Signal Process. Lett. IEEE 2004, 11(6):569-572. 10.1109/LSP.2004.827957
9. Leung K, Klein T, Mooney T, Haner T: Methods to improve TCP throughput in wireless networks with high delay variability [3G network example]. In Vehicular Technology Conference, 2004. VTC2004-Fall. 2004 IEEE 60th, Volume 4. Los Angeles, CA, USA; 2004:3015-3019.
10. Helmbold DP, Long DDE, Sherrod B: A dynamic disk spin-down technique for mobile computing. In MobiCom ’96: Proceedings of the 2nd Annual International Conference on Mobile Computing and Networking. New York: ACM; 1996:130-142.
11. Haeri M, Rad A: TCP retransmission timer adjustment mechanism using model-based RTT predictor. In Control Conference, 5th IEEE Asian, Volume 1. Melbourne, Australia; 2004:686-693.
12. Ngwenya D, Hancke G: Estimation of SRTT using techniques from the practice of SPC and change detection algorithms. In AFRICON, 2004. IEEE 7th AFRICON Conference in Africa, Volume 1. Gaborone, Botswana; 2004:397-402.
13. Atlasis AF, Loukas NH, Vasilakos AV: The use of learning algorithms in ATM networks call admission control problem: a methodology. Comput. Netw 2000, 34(3):341-353. 10.1016/S1389-1286(00)00090-6
14. Vasilakos A, Saltouros M, Atlassis AF, Pedrycz W: Optimizing QoS routing in hierarchical ATM networks using computational intelligence techniques. Syst. Man Cybernet. Part C: Appl. Rev. IEEE Trans 2003, 33(3):297-312. 10.1109/TSMCC.2003.817354
15. Anastasopoulos M, Petraki D, Kannan R, Vasilakos A: TCP throughput adaptation in WiMax networks using replicator dynamics. Syst. Man Cybernet. Part B: Cybernet. IEEE Trans 2010, 40(3):647-655.
16. Mirza M, Sommers M, Barford P, Zhu X: A machine learning approach to TCP throughput prediction. SIGMETRICS Perform. Eval. Rev 2007, 35: 97-108. 10.1145/1269899.1254894
17. Bousquet O, Warmuth MK: Tracking a small set of experts by mixing past posteriors. J. Mach. Learn. Res 2003, 3: 363-396.
18. QualNet. 6100 Center Drive, Suite 1250, Los Angeles, CA 90045. Last accessed Dec. 1st 2013. http://www.scalablenetworks.com
19. Perkins C, Royer E: Ad-hoc On-Demand Distance Vector Routing. In Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications. New Orleans, LA, USA; 1997:90-100.
20. Group IW: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specification. IEEE Std. 802.11, 345 E. 47th St, New York, NY 10017, USA: IEEE Computer Society; 1997.
21. Grossglauser M, Tse D: Mobility increases the capacity of ad hoc wireless networks. Netw. IEEE/ACM Trans 2002, 10(4):477-486. 10.1109/TNET.2002.801403
22. IEEE Computer Society Standards Committee Working group of the Microprocessor Standards Subcommittee, American National Standards Institute: IEEE Standard for Binary Floating-point Arithmetic. ANSI/IEEE Std 754-1985. 345 E. 47th St, New York, NY 10017, USA: IEEE Computer Society; 1985.
23. The GNU C Library. 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA; 2009. Last accessed Dec. 1st 2013. http://www.gnu.org/software/libc/manual
24. Omondi AR: Computer Arithmetic Systems: Algorithms, Architecture, and Implementation. Prentice Hall International (UK) Limited; 1994.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.