
Logistic regression based in-service assessment of mobile web browsing service quality acceptability

Abstract

In this paper, we present a logistic regression model applied to the assessment of users’ quality of experience with the web browsing service over a mobile network. In this regard, we chose the Average-Time-to-Connect-TCP network service quality parameter as the independent predictor, obtained by passive monitoring of live traffic data captured by a passive probe on the mobile network Gn interface and related to detailed records of the Transmission Control Protocol. In parallel with the in-service measurement of the selected network parameter, we conducted simultaneous subjective tests of the quality of experience acceptability to users, specifically for the web browsing service. It was found that the model provided correct acceptability classification in 84.5% of cases, while reducing the chosen independent predictor by 100 ms increased the odds of service acceptability by a factor of 1.65. Based on the obtained results, the applied logistic regression model provides a satisfactory estimation of the acceptability of the web browsing quality of experience.

Introduction

Web browsing is among the dominant cellular network applications and is expected to grow by 39% annually over the coming 6 years [1]. Growing user demand for reliable data delivery comes along with the expectation of adequate Quality-of-Experience (QoE), making the latter the most important user decision criterion in selecting a specific service provider. Consequently, network operators are keen to ensure the best possible QoE level, for which the conditio sine qua non is their ability to reliably and accurately assess the achieved customer satisfaction with their services, such as web browsing.

The majority of existing QoE estimation models are based on the mean opinion score (MOS) testing, but MOS-based monitoring of web browsing QoE in particular requires fairly complex metrics [2,3,4].

Therefore, it has become crucial to relate the subjective QoE to measurable technical service quality parameter(s) that can be in-service monitored in the operator environment, thus enabling objective and real-time QoE estimation [3].

Moreover, network operators are mostly interested in testing users’ acceptability of provided web services [5,6,7], i.e., the “binary measure to locate the threshold of minimum acceptable quality that fulfills user quality expectations and needs for certain application or system” [7]. Consequently, in recent years, a number of QoE models based on acceptability have been proposed, especially for video signal delivery [8,9,10], as well as for interactive data services [11].

Specifically, the ITU-T Recommendation G.1030 provides some experimental results regarding users’ perception in relation to web browsing response time, as well as some guidelines for QoE estimation [2]. The corresponding experiments were conducted to evaluate the suitability of the developed network emulator system for QoE estimation [3] and to validate the ITU-T Recommendation G.1030. The obtained results show a logarithmic dependency between the QoE and the page load time for a simple web page.

Furthermore, in contrast to the studies which are mostly based on direct user feedback [2,3,4], the QoE is sometimes estimated from passive network tests [12, 13]. Specifically, the relationship between the QoE and the Quality-of-Service (QoS) for web browsing services was analyzed based on HyperText Transfer Protocol (HTTP)/Transmission Control Protocol (TCP) traces collected in the network, where the cancellation rate of HTTP requests was used for QoE estimation, but without any validation of the achieved results by simultaneous real-life subjective QoE testing.

Moreover, though in some studies subjective user ratings are combined with network-level information, the experimental findings relating the recorded TCP and HTTP traces to web browsing service QoE are reported only graphically and are not backed by any analytical model [14].

Therefore, in this paper, we address the aforementioned challenges by developing a QoE acceptability predictive model for the web browsing service over a mobile network, based on network parameters that are in-service measurable by passive monitoring of live traffic data and practically implementable by mobile operators.

As linear regression is not appropriate for modeling acceptability-based QoE, where the outcome variable (the acceptability) is binary, logistic regression is the method of our choice. To the best of our knowledge, no attempt has so far been made to assess the acceptability of the mobile web browsing service by means of logistic regression.

In this regard, in previous work [15], the extent of the relationship between the in-service measured live traffic data parameters and the web browsing user QoE in the mobile network was analyzed using Spearman’s rank-order correlation. Taking into account both the strength and the direction of the relationship, the parameters Average-Time-to-Get-1st-Data and Average-Time-to-Connect-TCP exhibited the strongest relationship with the web browsing QoE, evaluated by means of the ordinary 5-point Likert scale (with ratings: excellent, good, fair, poor, bad).

Therefore, in this paper, we followed that indication by applying logistic regression to the selected parameter, to assess the users’ acceptability of the quality level experienced with the web browsing service in particular.

However, in contrast to other investigations, which mostly use ordinal logistic regression (in compliance with test data in which both the independent variables and the dependent one are categorical, with the numerical MOS rating converted to categorical data), here we put the accent on today’s network operators’ main QoE imperative for web browsing in particular: obtaining an objective, binary-type customer QoE rating, the acceptability. We model it by binary logistic regression applied to the Average-Time-to-Connect-TCP parameter, which we found most relevant in this sense.

The rest of this paper is organized as follows: In Section 2, we review the basics of the logistic regression to be used in the QoE acceptability prediction model. The test setup and tools that we used for conducting the experiment are also described in Section 2, while we present the test results and the analysis of the experimental data in Section 3. Conclusions are drawn in Section 4.

Methods

Before analyzing the acquired data by means of logistic regression, we review the concepts of the model and then apply it for the QoE prediction.

Logistic regression

Regression is mostly used as a means to predict a random variable from a number of mutually independent random variables and a constant.

Specifically, logistic regression is used for predicting the probability that a certain observation will be sorted into one of two categories of a dichotomous dependent random variable, based on one or more independent random variables, which can be continuous or categorical. In many aspects, logistic regression is similar to linear regression, the exception being the dependent variable type: in contrast to linear regression, it does not provide an estimated value of the dependent variable, but the probability that it will belong to a certain category, given the values of the independent variables.

Among the three types of logistic regression, namely binary, ordinal, and nominal, the first one is used when the dependent variable is binary, i.e., takes one out of two categories. Moreover, if a dependent random variable can take more categories, then the ordinal logistic regression or the nominal one is to be used for ordered and unordered categories, respectively.

However, as already mentioned in Section 1, though ordinal logistic regression has been used most frequently (even after properly converting the MOS scoring), our focus here is on QoE acceptability, so we consider the binomial logistic regression, commonly referred to simply as logistic regression.

Essentially, it is a supervised machine-learning classification algorithm used to predict the conditional probability:

$$ \varPi \left({x}_i\right)=\Pr \left(Y=1|{X}_i={x}_i\right);i=1,2,\dots, N $$
(1)

that a certain individual observation belongs to one out of two categories, i.e., that the corresponding dichotomous dependent random variable Y takes one out of two possible values (1 or 0), conditioned by one or more (N) continuous or categorical mutually independent random variables Xi taking their corresponding values xi [16,17,18].

Let us assume the simple linear form of the logit transform (from now on just logit) of Π(xi), specifically for a single value xi = x [16]:

$$ \mathrm{logit}\left[\varPi (x)\right]=\ln \left(\mathrm{odds}\right)=\ln \left(\frac{\varPi (x)}{1-\varPi (x)}\right)=\upalpha +\upbeta x $$
(2)

where the odds are defined as the ratio of the probability Π(x) that the event (outcome of interest) will occur for a particular value x of the random variable X, and the probability 1 − Π(x) that the event will not occur, while β is the slope coefficient, and the constant α is referred to as the intercept.

From (1) and (2), Π(x) can be expressed as:

$$ \varPi (x)=\Pr \left(Y=1|X=x\right)=\frac{e^{\alpha +\beta x}}{1+{e}^{\alpha +\beta x}} $$
(3)

where the iterative maximum likelihood (ML) method is used for estimating the corresponding α and β values, by testing the null hypothesis that these do not make the logistic regression accurate enough. In this case, a small significance (represented by the p value) indicates strong evidence to reject the null hypothesis.
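As an illustration of the ML estimation described above, the following minimal sketch fits Eq. (3) by Newton-Raphson iterations. The data are synthetic (the raw measurements are not reproduced here); the coefficient values 4.746 and −0.005 reported in Section 3.2 are used only to generate plausible observations:

```python
import numpy as np

# Sketch: iterative ML (Newton-Raphson) estimation of alpha and beta in
# Eq. (3), on synthetic data standing in for the real measurements.
rng = np.random.default_rng(0)
x = rng.uniform(200.0, 2000.0, 60)                    # predictor values (ms)
p_true = 1.0 / (1.0 + np.exp(-(4.746 - 0.005 * x)))   # plausible true model
y = rng.binomial(1, p_true)                           # 1 = "acceptable"

X = np.column_stack([np.ones_like(x), x])             # design matrix [1, x]
theta = np.zeros(2)                                   # (alpha, beta) start
for _ in range(25):                                   # Newton-Raphson loop
    eta = np.clip(X @ theta, -30.0, 30.0)             # linear predictor
    p = 1.0 / (1.0 + np.exp(-eta))                    # current Pi(x)
    W = p * (1.0 - p)                                 # IRLS weights
    grad = X.T @ (y - p)                              # score vector
    hess = X.T @ (X * W[:, None])                     # observed information
    theta += np.linalg.solve(hess, grad)              # Newton step
alpha, beta = theta                                   # intercept and slope
```

With well-behaved data, the loop converges in a handful of iterations; standard errors for the Wald test then follow from the inverse of the final Hessian.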

In order to use the binomial logistic regression in practice, the following main assumptions need to be fulfilled [18]:

  1. Logistic regression requires the observed dependent random variable Y to be dichotomous and a function of one or more mutually independent and non-collinear predicting random variables (predictors).

  2. The logit transform must be a linear function of continuous predicting random variables.

  3. Each test observation must be independent from the others, and all test categories should be mutually exclusive and exhaustive.

  4. Data must not exhibit significant outliers, high leverage points, or highly influential points; otherwise, the reliability of the estimates may degrade significantly.

Test setup

The test setup is presented in Fig. 1. As can be seen, the experiment was carried out on a live network. The test configuration included the client and the gateway, the latter connected to the live High Speed Packet Access Evolved (HSPA+ Rev.8) mobile network (providing up to 42 Mb/s with 64QAM in downlink and 11.5 Mb/s with 16QAM in uplink), which in turn is connected to the internet. The gateway ran on Linux OS, while the NetEm [19] enhancement of the Linux traffic control (TC) facilities enabled introducing packet delay and packet loss into the experiment. We chose the test point to be at the Gn interface, where the actually used Oracle Performance Intelligence Center (PIC) [20] with a passive probe captured the traffic data, Fig. 1.

Fig. 1 Test system

Each test participant took part in experiments using the client operating on Windows 8 PC. The client device was connected to the gateway via 100 Mbps Ethernet full duplex link. We used the NetEm network emulator on Ubuntu OS of the gateway to vary the network conditions by adding delay and packet loss. The Huawei E3272 LTE USB modem was used for testing, while being managed by the embedded Connection Manager software, which allowed setting the preferred access network.

We set HSPA+ as the preferred access network in the experiment. The client system was connected to the internet through the mobile network via the gateway. On both laptops, automatic software updates were disabled. The participants in the experiment used the Mozilla Firefox 35.0.1 web browser. The HTTP and TCP extended Detailed Records (xDR) from the data captured on the Gn interface were made available by using the ProTrace application on the Oracle PIC platform. This way, we defined and activated new statistical sessions which generated the in-service parameters’ values aggregated over 5-min intervals. The parameters were defined from the HTTP and TCP xDRs for the Mobile Station International Subscriber Directory Number (MSISDN) [15] of the test SIM card.

The experiment was conducted with ten users, five female and five male, aged between 12 and 45 years. All participants used the internet at least 1 h a day, usually via WiFi access, switching to mobile internet access only when WiFi was unavailable.

We investigated the relationship between the QoE and in-service parameters through the following test scenario:

Each participant tested web browsing six times under different network conditions, determined by the NetEm (adding delay or packet loss during the experiment).

Duration of a single test was limited to 5 min, while the participants accessed web pages of their choice and simply answered whether the technical quality of web browsing service was acceptable or not, with “yes” or “no”, respectively.

Following that, we ran statistics sessions on the Oracle PIC platform, processing the collected values of the relevant in-service parameter measured on the Gn interface, Average-Time-to-Connect-TCP, which is the average time between SYN and ACK in the TCP three-way handshake sequence, needed to establish the TCP connections within a 5-min interval [1].
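The aggregation of this parameter can be sketched as follows; the helper `avg_time_to_connect_ms` and the handshake records are hypothetical illustrations of the definition above, not the actual Oracle PIC statistics session:

```python
from datetime import datetime, timedelta

# Hypothetical illustration: averaging SYN -> ACK handshake times over a
# 5-min aggregation window, as the Average-Time-to-Connect-TCP parameter
# is defined. Record layout and helper name are assumptions.
def avg_time_to_connect_ms(handshakes, window_start, window=timedelta(minutes=5)):
    """Average SYN->ACK time (ms) over connections opened in the window."""
    deltas = [
        (ack - syn).total_seconds() * 1000.0
        for syn, ack in handshakes
        if window_start <= syn < window_start + window
    ]
    return sum(deltas) / len(deltas) if deltas else None

t0 = datetime(2016, 1, 1, 12, 0, 0)
handshakes = [
    (t0 + timedelta(seconds=1), t0 + timedelta(seconds=1, milliseconds=250)),
    (t0 + timedelta(seconds=90), t0 + timedelta(seconds=90, milliseconds=750)),
]
avg = avg_time_to_connect_ms(handshakes, t0)  # (250 + 750) / 2 = 500.0 ms
```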

Test tools

We used the Oracle PIC as monitoring and data gathering system that helps service providers to manage their assets, encompassing network performance, QoS, and customer analysis [20]. The PIC uses passive probes to capture traffic data and forward probe data units (PDU) to the Integrated xDR Platform (IXP). The IXP stores these traffic data and correlates them into detailed records. The PIC provides applications that mine the detailed records to provide value-added services such as network performance analysis, call tracing, and reporting [21]. For the purpose of this research, we used the HTTP and TCP sessions on the Gn interface of the mobile network, defining parameters, and statistics sessions by using the ProTraq application [21].

Furthermore, we used the NetEm as enhancement of the Linux traffic control facilities that allows adding delay, loss, duplication, and other impairments as well, to the packets outgoing from the selected network interface. NetEm is built using the existing QoS and the differentiated services (DiffServ) facilities in the Linux kernel [19].

Discussion and results

We analyzed the fields of the data records collected from HTTP and TCP sessions on the Gn interface and selected the ones used to define the in-service parameters in Oracle PIC [15]. In this regard, some in-service parameters from the HTTP and TCP xDRs, based on the data captured by the extensive testing that we performed on the Gn interface, are presented in the Appendix, while Fig. 2 shows the exemplary relevant TCP record time intervals.

Fig. 2 Characteristic times for TCP session record on Gn interface

Now, the task is to find out which of the in-service measured parameters most strongly influences the QoE acceptability in particular, so that it can be selected as the logistic regression predicting variable.

In this respect, we consider the correlation to be the best indication, and we therefore calculated it for the various parameters, as presented in Table 1.

Table 1 Spearman correlation coefficient between QoE and in-service tested parameters [15]

Due to the monotonic relationship and evident strong (negative) correlation (Spearman correlation coefficient rs = − 0.791, significance value p < 0.01) between the in-service measured parameter Average-Time-to-Connect-TCP and the users’ acceptability of the service quality level [15], we consider the former as the independent predicting random variable X. Its scatter plot with regard to the QoE ratings is presented in Fig. 3, while Table 2 presents its in-service obtained test values, selected from the overall data table in the Appendix.

Fig. 3 Scatter plot of QoE vs. Average-Time-to-Connect-TCP [15]

Table 2 Independent predicting variable Average-Time-to-Connect-TCP (ms), in-service measured at the Gn interface

As can be seen in Fig. 3 and Table 2, just a few sporadic peak values of the Average-Time-to-Connect-TCP were measured (e.g., only about 10% of them larger than 1.5 s). Observing bottom-up through the protocol stack, various reasons for this could be considered, among them possibly excessive retransmissions of Hybrid Automatic Repeat-reQuest (HARQ) protocol data units (e.g., due to bit errors at the physical layer). These could produce additional delays which propagate up the stack, causing TCP three-way handshake time-outs (such as the retransmission timeout (RTO)), which imply even further delays of the TCP connection setup time.

Verifying the logistic regression assumptions

The first assumption for the logistic regression can be considered to hold in this case, as it can be seen in Table 2 that the observed dependent random variable Y is obviously dichotomous and a function of just a single continuous predicting random variable (implying that, in this case, multicollinearity among predictors is not an issue).

Regarding the second assumption, we used the Box-Tidwell (1962) procedure [22] to test whether the logit transform is a linear function of the predictor, effectively by adding the non-linear interaction term X lnX of the original predictor X as a second variable and testing the null hypothesis that adding it does not improve the prediction. As can be seen in the corresponding table presented in Section 3.2, we found the logit linearity condition to hold, with just minor non-linearity possible.

Further, complying with the third assumption, we performed each test independently of the others, with all test categories being mutually exclusive and exhaustive.

Moreover, the data exhibited quite balanced behavior with no significant outliers and no leverage or influential points.

Consequently, we can justifiably expect that the conducted logistic regression procedure finally provided valid conditional probability Π(x) that the dependent random variable Y takes one out of two possible values (1 or 0), conditioned by a single (in our case) predicting random variable X taking the value x.

Test cases and estimated logistic regression parameters

We consider a case (sample) to be a repeatable single test made by a single participant. A number of recommended values exist for the required minimum number of samples (cases), ranging from 15 to 50, but we adopted 60 samples per independent random variable [18], as the ML-based logistic regression estimation degrades significantly when test cases are scarce.

So, the counts of cases included/missing in the analysis are given in Table 3 (in accordance with Table 2), while Table 4 presents how the outcome random variable Y is encoded.

Table 3 Independent random variable X test cases
Table 4 Dependent random variable Y encoding

The logistic regression coefficients for the model with independent random variable Average-Time-to-Connect-TCP are estimated to take the values of α = 4.746, β = − 0.005, while their properties—the standard error (S.E.), the Wald Chi-square (χ2) test value [18] for D.F. degrees of freedom, and the significance expressed by the p value—are presented in Tables 5 and 6.

Table 5 Estimated logistic regression intercept and its properties
Table 6 Estimated logistic regression slope and its properties

So, as we can see from the above tables, the Wald test [20] evaluates the independent random variable Average-Time-to-Connect-TCP as statistically significant in the model, as the p value is found to be very low: p < 0.001.

Moreover, as mentioned in Section 2, we tested the linearity assumption determining the validity of the logistic regression by applying the Box-Tidwell (1962) procedure [22]. Accordingly, we tested the null hypothesis that adding the new variable:

$$ Avg\_ Time\_ To\_ Connect\_ TCP\times \ln \left( Avg\_ Time\_ To\_ Connect\_ TCP\right) $$

into the regression would not improve the prediction.

As can be seen from Table 7, the test provides no grounds to reject this null hypothesis, so we found the logit linearity condition to hold, with just minor non-linearity possible.

Table 7 Testing linearity assumption
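The Box-Tidwell check above can be sketched as follows; the data are synthetic, and `fit_logit` is an illustrative Newton-Raphson re-implementation, not the statistical package actually used for Table 7:

```python
import math
import numpy as np

# Sketch of the Box-Tidwell check: refit the model with the extra term
# x*ln(x) and inspect the Wald p-value of its coefficient. A high p-value
# gives no grounds to reject logit linearity. Data are synthetic.
def fit_logit(X, y, iters=25):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):                            # Newton-Raphson ML fit
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ theta, -30.0, 30.0)))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        theta += np.linalg.solve(hess, X.T @ (y - p))
    return theta, np.linalg.inv(hess)                 # estimates, covariance

rng = np.random.default_rng(1)
x = rng.uniform(200.0, 2000.0, 60)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(4.746 - 0.005 * x))))

X = np.column_stack([np.ones_like(x), x, x * np.log(x)])  # add x*ln(x)
theta, cov = fit_logit(X, y)
z = theta[2] / math.sqrt(cov[2, 2])                   # Wald z, interaction term
p_value = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided p-value
```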

Intercept-only model and its extension by prediction

Returning to (2), let us first consider the model without the independent random variable Average-Time-to-Connect-TCP taken into account, i.e., when:

$$ \varPi (x)=\Pr \left(Y=1|X=x\right)=\Pr \left(Y=1\right)=\varPi $$
(4)

which effectively modifies (2) into:

$$ \mathrm{logit}\varPi =\ln (odds)=\ln \left(\frac{\varPi }{1-\varPi}\right)=\upalpha $$
(5)

Accordingly, the next two tables present the outputs related to the model that includes only the intercept value α. Such an incomplete model’s predictions depend purely on which category occurred most frequently in the data set, in accordance with (4)/(5): the model simply predicts that the service is acceptable, as in the majority of test cases (38 out of 58) the participants answered “yes”. So, applying this “best guess” strategy, one would be right 65.5% of the time (Table 8).

Table 8 Classification without the independent predictor

Accordingly, the estimated statistics for this special case is presented in Table 9.

Table 9 The intercept-only model attributes

As the exponential of α given in Table 9, the odds are estimated to be 1.9, which conforms to the ratio 38/20 of the counts of test cases in which the service was found acceptable and not acceptable, respectively.
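These intercept-only figures can be verified directly from the reported counts (38 “yes” vs. 20 “no” answers):

```python
import math

# Verifying the intercept-only model from the reported counts.
yes, no = 38, 20              # test cases rated acceptable / not acceptable
odds = yes / no               # estimated odds = 1.9, as in Table 9
alpha0 = math.log(odds)       # intercept of the intercept-only model, Eq. (5)
pi0 = odds / (1.0 + odds)     # Pr(Y = 1) = 38/58, the 65.5% "best guess"
```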

Now let us turn to the analysis of logistic regression with included independent random variable X, i.e., the Average-Time-to-Connect-TCP as a predictor.

The likelihood ratio (LR) test is used to judge the null hypothesis that including the Average-Time-to-Connect-TCP random variable into the model does not significantly increase the ability to predict the decisions made by the subjects. This essentially implies testing the ratio:

$$ G=-2\ln \left(\frac{L_0}{L_1}\right) $$
(6)

of the likelihoods L0 and L1 of the test data under the null value of the parameter of interest (β = 0) and under its maximum likelihood estimate (MLE), respectively. Under the hypothesis that β = 0, the statistic G follows the Chi-square distribution with 1 degree of freedom [17].

The according test results are presented in Table 10.

Table 10 Likelihood ratio test statistics

From Table 10, it can be seen that the model Chi-square value of 35.817 with 1 degree of freedom yields p < 0.00001, so we justifiably reject the null hypothesis. The results of this test thus indicate that including the Average-Time-to-Connect-TCP random variable in the model statistically significantly increases the ability to predict the acceptability of the service to the users.
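The reported significance can be reproduced from the model Chi-square value in Table 10; the helper `chi2_1_sf` is a self-contained sketch of the Chi-square (1 d.f.) survival function:

```python
import math

# p-value of the likelihood ratio statistic G under Chi-square with 1 d.f.
# For 1 degree of freedom, the survival function reduces to
# P(chi2_1 > g) = erfc(sqrt(g / 2)), so no statistics package is needed.
def chi2_1_sf(g):
    return math.erfc(math.sqrt(g / 2.0))

G = 35.817            # model Chi-square value from Table 10
p = chi2_1_sf(G)      # well below the 0.00001 threshold quoted above
```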

Testing the logistic regression model goodness of fit

Furthermore, the adequacy of the model can be assessed by means of the Hosmer and Lemeshow goodness-of-fit test, which actually evaluates the inadequacy of the model in predicting categorical outcomes, i.e., tests the hypothesis that the observed data are significantly different from the values predicted by the model.

The test essentially partitions the n observations into g approximately equal-size groups (deciles), so that the first group contains approximately n/10 observations with the smallest estimated probabilities, and the last group approximately n/10 observations with the largest estimated probabilities [17].

The statistic is:

$$ \hat{C}=\sum \limits_{k=1}^g\frac{{\left({O}_{1k}-{E}_{1k}\right)}^2}{{E}_{1k}\left(1-{\upxi}_k\right)};\kern1em {E}_{1k}={s}_k{\upxi}_k $$
(7)

where O1k is the count of observations with Y = 1 (out of sk observations in total) in the kth group, and E1k is the expected count of the event in the kth group, whereas ξk is the average predicted event probability for the kth group.

The statistic of (7) approximately follows the χ2 distribution with g − 2 = 8 degrees of freedom (for g = 10 groups). A small p value (< 0.05) implies that the model fits the data poorly.
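A minimal sketch of the statistic in Eq. (7), applied to synthetic predicted probabilities (the per-case probabilities of the experiment are not reproduced here):

```python
import numpy as np

# Sketch of the Hosmer-Lemeshow statistic of Eq. (7): sort cases by predicted
# probability, split into g groups, and accumulate (O - E)^2 / (E * (1 - xi)).
def hosmer_lemeshow(p_hat, y, g=10):
    order = np.argsort(p_hat)
    C = 0.0
    for idx in np.array_split(order, g):  # approximately equal-size groups
        s_k = len(idx)                    # group size
        xi_k = p_hat[idx].mean()          # average predicted probability
        O_1k = y[idx].sum()               # observed events (Y = 1)
        E_1k = s_k * xi_k                 # expected events, E_1k = s_k * xi_k
        C += (O_1k - E_1k) ** 2 / (E_1k * (1.0 - xi_k))
    return C                              # compare to chi2 with g - 2 d.f.

rng = np.random.default_rng(2)
p_hat = rng.uniform(0.05, 0.95, 58)       # synthetic fitted probabilities
y = rng.binomial(1, p_hat)                # well calibrated by construction
C_hat = hosmer_lemeshow(p_hat, y)         # small C_hat => adequate fit
```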

The 2 × g contingency table presents the observed and expected counts of the event Y = 1. The entries resulting from our test data are given in Table 11.

Table 11 Contingency table for Hosmer and Lemeshow test

Finally, as can be seen in Table 12, the obtained high p value indicates no significant inadequacy of the fitted model, which implies that the model is to be considered adequate.

Table 12 Hosmer and Lemeshow test

Category prediction

Furthermore, as the logistic regression estimates the probability of the event that the service is acceptable to users, we adopt the typical decision threshold of 0.5: if the estimated probability is greater than or equal to 0.5, the event is classified as one that will happen; otherwise, it is classified as one that will not happen [23].

Accordingly, the observed and predicted classifications are presented in Table 13.

Table 13 Classification; observed vs. predicted

As already pointed out in Section 3.3 (Table 8), where the classification includes just the intercept constant, 65.5% of cases overall could be correctly classified by simply considering all cases as choosing “yes” for acceptability.

However, with the independent random variable included in the model, the so-called Percentage-Accuracy-in-Classification (PAC) [18] rises to 84.5%, as seen in Table 13, which is a significant improvement over the classification without the predictor variable.

Another classification feature is the sensitivity: the percentage of cases with the target category that were correctly predicted by the model, i.e., those in which the quality of service was evaluated as acceptable (“yes”). As presented in Table 13, 86.8% of the test cases in which participants rated the service acceptable were also classified by the model as acceptable.

Conversely, the specificity is the percentage of cases without the target category [21] that were correctly classified by the model, i.e., when the service was not rated as acceptable. In our study, the specificity was found to be 80.0%, meaning that 80% of the cases in which participants did not rate the service as acceptable were correctly classified by the model, Table 13.

The positive predictive value is the percentage of correctly classified cases exhibiting the target characteristic, relative to the total count of cases predicted to have the target characteristic. In this case, a simple calculation provides:

$$ 33/\left(33+4\right)=89.19\% $$

meaning that, out of all cases predicting the service acceptable, for 89.19% of them, the prediction is correct.

The negative predictive value is the percentage of correctly classified cases without the target characteristic, relative to the total count of cases predicted as not having the target characteristic. In this case, it is:

$$ 16/\left(16+5\right)=76.19\%, $$

meaning that out of all cases predicting the service not acceptable, for 76.19% of them, the prediction is correct.
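All the classification figures quoted above follow directly from the confusion-matrix counts that can be inferred from Table 13:

```python
# Confusion-matrix counts inferred from Table 13: 33 correctly predicted
# "acceptable", 5 missed "acceptable", 16 correctly predicted "not
# acceptable", 4 wrongly predicted "acceptable".
tp, fn, tn, fp = 33, 5, 16, 4

pac         = (tp + tn) / (tp + fn + tn + fp)   # 49/58, about 84.5%
sensitivity = tp / (tp + fn)                    # 33/38, about 86.8%
specificity = tn / (tn + fp)                    # 16/20 = 80.0%
ppv         = tp / (tp + fp)                    # 33/37, about 89.19%
npv         = tn / (tn + fn)                    # 16/21, about 76.19%
```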

Finally, by substituting the values of α = 4.746, β = − 0.005 from Section 3.2 into the regression Eq. (2), the latter can be rewritten as:

$$ \mathrm{logit}\left[\varPi (x)\right]=\ln \left(\mathrm{odds}\right)=4.746-0.005x $$
(8)

where x = Average-Time-to-Connect-TCP, while (3) turns into:

$$ \varPi (x)=\frac{{e}^{4.746-0.005x}}{1+{e}^{4.746-0.005x}} $$
(9)

The conditional probability Π(x) that the quality of the web browsing service will be acceptable, given a measured Average-Time-to-Connect-TCP of x milliseconds, is the target logistic regression test outcome, plotted in Fig. 4.

Fig. 4 Probability Π(x) of acceptable web service, conditioned by Average-Time-to-Connect-TCP

As can be seen in Fig. 4, the transition between acceptability and non-acceptability is rather steep, as the curve exhibits a threshold effect around the predictor value of 1 s.

As an example, let us calculate how much the odds would be affected by reducing the Average-Time-to-Connect-TCP parameter by 100 ms. In this regard, we simply substitute the increment as follows:

$$ \ln \left[\mathrm{odds}\left(x+100\right)\right]=4.746-0.005\bullet \left(x+100\right) $$
(10)
$$ \ln \left[\mathrm{odds}(x)\right]-\ln \left[\mathrm{odds}\left(x+100\right)\right]=0.5 $$
(11)
$$ \frac{\mathrm{odds}(x)}{\mathrm{odds}\left(x+100\right)}={e}^{0.5}=1.65 $$
(12)

According to (12), reducing the Average-Time-to-Connect-TCP parameter by 100 ms increases the chance of success, i.e., the chance that the service is acceptable, by a factor of 1.65.
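A short numerical check of the fitted model and of the odds-ratio result above, using the estimates α = 4.746 and β = −0.005 reported in Section 3.2:

```python
import math

# Numerical check of the fitted model, using the estimates alpha = 4.746
# and beta = -0.005 from Section 3.2.
alpha, beta = 4.746, -0.005

def pi(x_ms):
    """Pi(x): probability of acceptable quality for a given predictor (ms)."""
    e = math.exp(alpha + beta * x_ms)
    return e / (1.0 + e)

threshold_ms = -alpha / beta          # Pi = 0.5 at about 949 ms, cf. Fig. 4
odds_ratio = math.exp(-beta * 100.0)  # e^0.5, about 1.65 per 100 ms reduction
```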

Conclusions

We proposed a simple logistic regression model with a single independent predicting variable, the Average-Time-to-Connect-TCP network parameter, derived from live traffic data captured by a passive probe on the Gn interface of the mobile network, to estimate the users’ quality of experience acceptability of the web browsing service.

In parallel, we conducted simultaneous subjective tests of service quality acceptability with a number of participants, to finally correlate the obtained ratings with the detailed records of the TCP protocol.

The model was found to provide a correct estimation of the experienced service quality acceptability, with high statistical significance (Chi-square value above 35, p value below 0.0005) and correct classification in 84.5% of cases.

More specifically, the sensitivity and specificity were found to be equal to 86.8% and 80%, respectively, while the positive and negative prediction values were evaluated to be equal to 89.19% and 76.19%, respectively.

Reducing the network service parameter selected as the predicting variable by 100 ms was found to increase the chance of service acceptability by a factor of 1.65.

We plan to extend the application range of the proposed approach and enhance its ability to predict real-life acceptability of the service quality experience by involving more experimental scenarios and wider user populations, as well as other aspects such as context and an extended set of parameters. Moreover, a full-scale measurement campaign would go beyond the resource-limited preliminary tests reported here, also in the 4G/5G environment, where this approach and analysis still apply.

Availability of data and materials

Not applicable.

Abbreviations

DiffServ:

Differentiated services

HSPA+:

High Speed Packet Access Evolved

HTTP:

Hypertext Transfer Protocol

IXP:

Integrated xDR Platform

LR:

Likelihood ratio

ML:

Maximum likelihood

MLE:

Maximum likelihood estimate

MOS:

Mean opinion score

MSISDN:

Mobile Station International Subscriber Directory Number

OS:

Operating system

PAC:

Percentage-Accuracy-in-Classification

PDU:

Probe data units

PIC:

Performance Intelligence Center

PC:

Personal computer

QoE:

Quality-of-Experience

QoS:

Quality-of-Service

TC:

Traffic control

TCP:

Transmission Control Protocol

xDR:

Extended Detailed Records

References

  1. Ericsson mobility report on the pulse of the networked society, November 2016

  2. ITU-T Recommendation G.1030, Estimating end-to-end performance in IP networks for data applications, 2005

  3. E. Ibarrola, F. Liberal, I. Taboada, R. Ortega, "Web QoE evaluation in multi-agent networks: validation of ITU-T G.1030," 2009 Fifth International Conference on Autonomic and Autonomous Systems, Valencia, 2009, pp. 289-294

  4. P. Reichl, B. Tuffin, R. Schatz, "Logarithmic laws in service quality perception: where microeconomics meets psychophysics and quality of experience," Telecommun. Syst. 52(2), 587-600 (2013)

  5. International Telecommunication Union, "Vocabulary and effects of transmission parameters on customer opinion of transmission quality, amendment 2," ITU-T Recommendation P.10/G.100, 2006

  6. P. Spachos, W. Li, M. Chignell, A. Leon-Garcia, L. Zucherman, J. Jiang, "Acceptability and quality of experience in over the top video," 2015 IEEE International Conference on Communication Workshop (ICCW), London, 2015, pp. 1693-1698

  7. E. Biersack, C. Callegari, M. Matijašević, Data Traffic Monitoring and Analysis, Vol. 7754, Ed. 1, Springer-Verlag Berlin Heidelberg, 2013

  8. W. Song, D. Tjondronegoro, I. Himawan, "Acceptability-based QoE management for user-centric mobile video delivery: a field study evaluation," MM 2014 – Proceedings of the 2014 ACM Conference on Multimedia. https://doi.org/10.1145/2647868.2654923

  9. F. Agboma, A. Liotta, "Quality of experience management in mobile content delivery systems," Telecommun. Syst. 49(1), 85-98 (2012)

  10. W. Song, D. Tjondronegoro, "Acceptability-based QoE models for mobile video," IEEE Trans. Multimedia 16(3), 738-750 (2014)

  11. R. Schatz, S. Egger, A. Platzer, "Poor, good enough or even better? Bridging the gap between acceptability and QoE of mobile broadband data services," 2011 IEEE International Conference on Communications (ICC), Kyoto, 2011, pp. 1-6

  12. S. Khirman, P. Henriksen, "Relationship between Quality-of-Service and Quality-of-Experience for public internet service," Passive and Active Network Measurement Workshop, March 2002

  13. D. Collange, J.L. Costeux, "Passive estimation of quality of experience," J. Univ. Comput. Sci. 14(5), 625-641 (2008)

  14. R. Schatz, S. Egger, "Vienna surfing – assessing mobile broadband quality in the field," Proceedings of the 1st ACM SIGCOMM Workshop on Measurements Up the Stack (W-MUST), ACM, 2011

  15. S. Isak-Zatega, V. Lipovac, "In-service assessment of mobile services QoE from network parameters," 2016 24th International Conference on Software, Telecommunications and Computer Networks (SoftCOM), Split, 2016, pp. 1-7

  16. J. Peng, T.-S. So, "Logistic regression analysis and reporting: a primer," Understanding Statistics: Statistical Issues in Psychology and Education, 2002, pp. 31-70

  17. D.W. Hosmer, S. Lemeshow, Applied Logistic Regression, Second Edition, Wiley-Interscience, John Wiley & Sons, Inc., 2000

  18. Laerd Statistics, "Binomial logistic regression using SPSS Statistics," Statistical tutorials and software guides, 2005, retrieved from https://statistics.laerd.com/

  19. S. Hemminger, "Network emulation with NetEm," Linux Conf Au, 2005

  20. Oracle and/or its affiliates, Oracle Communications Performance Intelligence Center, Oracle data sheet, 2013

  21. Oracle, Oracle® Communications Performance Intelligence Center ProTrace User's Guide, Release 10.1.5, E56987 Revision 1, 2015

  22. G.E.P. Box, P.W. Tidwell, "Transformation of the independent variables," Technometrics 4, 531-550 (1962)

  23. J. Fox, Applied Regression, Linear Models, and Related Methods, SAGE Publications, 1997


Funding

Not applicable.

Author information

Affiliations

Authors

Contributions

SIZ performed in-service passive monitoring of traffic data on the mobile network Gn interface, as well as simultaneous tests of the web browsing service quality acceptability to users, which define the independent and dependent variables of the model, respectively. AL finalized the logistic regression model, to accommodate testing the network service quality acceptability based on the chosen network parameter, and finalized the text. VL set up general guidelines for this work, specifically with regard to the analysis of test results and model verification. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Adriana Lipovac.

Ethics declarations

Competing interests

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

Table 14 In-service measured test data (time units, ms)
Table 15 In-service measured test data (time units, ms)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Isak-Zatega, S., Lipovac, A. & Lipovac, V. Logistic regression based in-service assessment of mobile web browsing service quality acceptability. J Wireless Com Network 2020, 96 (2020). https://doi.org/10.1186/s13638-020-01708-2


Keywords

  • Quality-of-experience
  • In-service network monitoring
  • Regression