
A method of assessment of LTE coverage holes

EURASIP Journal on Wireless Communications and Networking 2016, 2016:236

https://doi.org/10.1186/s13638-016-0733-y

Received: 2 June 2016

Accepted: 17 September 2016

Published: 3 October 2016

Abstract

Analyzing coverage holes in mobile networks is still an important problem that needs to be addressed, particularly in recently deployed Long-Term Evolution (LTE) networks. However, each type of coverage hole has to be handled according to the effect it has on the users. In particular, coverage holes are characterized by causing either abnormal disconnections or, when an underlying Radio Access Technology (RAT) is available to maintain the connection released by the LTE network, inter-RAT handovers that reduce the service performance. Therefore, this paper proposes an approach to detect cells with coverage holes and diagnose their type and severity. Furthermore, the proposed method is capable of analyzing the impact that each coverage hole has on the users in both LTE and the underlying RAT at the same time. To that end, it performs an inter-technology follow-up of the users that leave LTE to continue their services in the underlying RAT and then quantifies the effect of the coverage hole by means of a new inter-technology indicator estimated from the mobile traces of both RATs. The proposed system has been validated using data from a live LTE network and its co-located 3G network, showing its effectiveness in detecting coverage holes and diagnosing their type.

Keywords

Self-organizing networks; Troubleshooting; Self-healing; Root cause analysis; Diagnosis; Self-optimization

1 Introduction

In the context of mobile networks, it is important to ensure that end-user services are properly provided, i.e., that they are neither interrupted nor ended abnormally. With that aim, maintenance and troubleshooting tasks should be properly performed in the shortest possible time. However, the relentless growth in the intricacy of mobile networks leads to an increase in the cost and complexity of maintenance tasks. In order to face this, the Third-Generation Partnership Project (3GPP) has standardized the concept of Self-Organizing Networks (SON) [1]. SON aims to fully automate operational tasks in mobile networks while reducing operational expenditures (OPEX) and capital expenditures (CAPEX). Therefore, SON is being welcomed by both mobile operators and the scientific community, which are focused on improving and automating the traditional processes that take place in mobile networks. As a result, several references can be found in the field of automation, such as [2–8]. SON functions can be grouped into three categories: self-configuration, self-optimization, and self-healing [1]. Within this paradigm, the troubleshooting process is fully automated by self-healing systems [9], [10], whose aim is to detect the problematic cells, diagnose their problem, and provide recovery or compensation actions.

Detection and assessment of coverage holes are related both to fault detection and diagnosis within self-healing [10] and to coverage optimization [11]. Traditionally, coverage holes have been detected through drive tests, which are time consuming and expensive. For this reason, particular attention is being given to detecting coverage holes automatically through the already standardized mobile traces [12] and the Minimization of Drive Tests (MDT) feature [13–16], which allow operators to automatically store both user measurements and signaling messages. Some existing studies propose different methods to detect coverage holes. An approach to improve the accuracy of coverage hole prediction based on a spatial Bayesian framework is presented in [17]. In addition, another technique to detect real coverage holes by means of radio environment maps is provided in [18]. However, according to [19], an LTE coverage hole has a different impact on the network depending on whether or not it is covered by an underlying radio access technology (uRAT), so the way of compensating or optimizing it will be different. Since LTE networks are continually growing, not only the detection of coverage holes but also the quantification of their impact would be helpful to determine how each one should be addressed.

The main difference between the proposed solution and the approaches available in the literature is that the presented diagnosis system not only detects cells with coverage holes (as in the previous references) but also diagnoses their type and severity. This is achieved through the use of both traditional statistical indicators (obtained from the Operation, Administration and Maintenance (OAM) system) and the user information obtained from the mobile traces gathered in both LTE and the co-existing uRAT. Note that it is not possible to discern the impact that a specific fault has on the whole mobile system if only the information of the LTE network is analyzed. Therefore, the contributions made in this paper to tackle this problem are the following:
  • A method to combine data gathered from LTE with data from the co-existing uRAT is proposed. In particular, the mobile traces are used to analyze the performance of the users when they leave LTE to continue their session in the co-existing uRAT. Through this method, each user is tracked across technologies, yielding its inter-technology event flow.

  • Another key contribution is an inter-technology metric that estimates the proportion of active time that users spend on LTE. It is calculated at user level based on the proposed inter-technology event flow and then aggregated at cell level to determine the overall impact.

  • The proposed metric has been used to design the detection and diagnosis phases of a self-healing system. The main benefit is its ability to identify coverage holes and classify them according to their impact on LTE and the co-existing uRAT. As a result, experts can design remedial actions tailored to the specific impact of each case.

2 Problem formulation

A coverage hole is a region where the received signal level of the serving cell and of every neighbor is below the level required to maintain the service with a minimum quality and robust radio performance. In particular, coverage holes may be caused by attenuation due to physical obstacles (such as new buildings and hills), unsuitable antenna parameters, a hardware fault, or inadequate RF planning [13].

In the constant struggle against coverage holes, operators have to use theoretical propagation models in the coverage planning of LTE networks, manually analyze performance indicators, and extensively collect user measurements through traditional drive tests. It is, therefore, a time-consuming task that has a great impact on CAPEX and OPEX. Consequently, 3GPP has standardized the automation of the detection and optimization of coverage holes as one of the most important use cases of SON. Furthermore, in order to automate the collection of user measurements and monitor the quality and coverage of the radio interface, 3GPP has also standardized the automatic collection of mobile traces through the trace functionality and MDT. As a result, data reported by users may be used to automatically detect and analyze coverage holes in LTE networks, avoiding costly drive tests. When automating the analysis of coverage holes, it is important to note that their symptoms and effects differ depending on whether or not the LTE network is capable of maintaining the service in LTE and, if not, whether the affected area is covered by any uRAT. As a result, when users enter a coverage hole within the LTE technology, they will suffer call drops or radio link failures, or they will be redirected to an underlying RAT if the coverage hole is covered by a legacy system (e.g., a 2G/3G RAT) [13]. From the end user's point of view, each of these consequences has a negative impact on performance, but the particular impact is different:
  • Call drop (Fig. 1): It happens when a connection is unexpectedly released before the service requested by the user can be completed. That is, the call drop occurs while the service is in progress, when the user's packets are not scheduled either because of a lack of available resources or because the connection quality in terms of SINR is below a threshold. This situation has the worst impact on users because they entirely lose their connection, resulting in customer dissatisfaction. Cells affected by LTE coverage holes without uRAT are characterized by a high call drop rate.
    Fig. 1

    LTE call drop and inter-RAT HO from LTE to 3G system

  • Radio link failure (RLF): The connection is momentarily lost due to the bad quality of the air interface during a specific time interval. Unlike a call drop, during an RLF the connection is not released despite the low SINR; instead, it is recovered by either the serving cell or an LTE neighbor through the re-establishment procedure [20]. As a consequence, the user experiences service and audio gaps from the moment the RLF occurs until the connection is successfully re-established in an LTE cell. In this scenario, the LTE network is capable of autonomously recovering the connection, maintaining the service in LTE.

  • Handover to legacy system (Fig. 1): In this scenario, the LTE connections suffering bad quality are transferred to a neighbor of the legacy system (e.g., 2G, 3G) through the inter-RAT handover (iRAT HO) procedure. In particular, an iRAT HO may be triggered by the B2 event [20], that is, when the LTE serving cell becomes worse than threshold 1 (Th1_B2) and the inter-RAT neighbor becomes better than threshold 2 (Th2_B2). Formally, the B2 event for an iRAT HO is expressed by the following conditions:

    Entering condition 1:
    $$ {M_{s}}+\text{Hyst}<\text{Th}_{1} $$
    (1)
    Entering condition 2:
    $$ {M_{n}}+O_{\text{fn}}-\text{Hyst}>\text{Th}_{2} $$
    (2)
    Leaving condition 1:
    $$ {M_{s}}-\text{Hyst}>\text{Th}_{1} $$
    (3)
    Leaving condition 2:
    $$ {M_{n}}+O_{\text{fn}}+\text{Hyst}<\text{Th}_{2} $$
    (4)

    where \(M_{s}\) is the measurement result of the serving cell s, which can be either the Reference Signal Received Power (RSRP) or the Reference Signal Received Quality (RSRQ); \(M_{n}\) is the measurement result of the inter-RAT neighbor cell (e.g., the Received Signal Code Power (RSCP) in the case of a 3G neighbor); Hyst is the hysteresis parameter for the B2 event; \(O_{\text{fn}}\) is the frequency-specific offset of the frequency of the inter-RAT neighboring cell; and \(\text{Th}_{1}\) and \(\text{Th}_{2}\) are the thresholds of this event for the serving cell and the target cell, respectively.

    An example of an iRAT HO from LTE to 3G is presented in Fig. 1. When a user fulfills entering conditions 1 and 2 for a specific time interval, configured through the Time To Trigger (TTT) parameter, the eNodeB launches the iRAT HO procedure in order to transfer the connection to the target Radio Network Controller (RNC) and its NodeB (NB) in 3G. To that end, the Mobility Management Entity (MME) and the Serving Gateway (S-GW), along with the Serving GPRS Support Node (SGSN), are in charge of executing the iRAT HO. As a result of iRAT HOs, the requested services are maintained by the legacy systems, avoiding unexpected user disconnections. However, this has a negative impact on the user experience, since the service performance is reduced (e.g., lower speed or higher latency). Cells affected by coverage holes with uRAT will have a high number of iRAT HOs.
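To make the interplay between Eqs. (1)-(4) and the TTT concrete, the following is a minimal Python sketch. The names `B2Config`, `b2_entering`, `b2_leaving`, and `first_b2_trigger` are hypothetical, and the TTT handling is a simplification assumed here for illustration (the entering conditions must hold at every reported sample, and any violation restarts the timer); it is not the complete UE behavior specified by 3GPP.

```python
from dataclasses import dataclass

@dataclass
class B2Config:
    th1: float    # Th1: serving-cell threshold (e.g., RSRP in dBm)
    th2: float    # Th2: inter-RAT neighbor threshold (e.g., RSCP in dBm)
    hyst: float   # Hyst: hysteresis in dB
    o_fn: float   # O_fn: frequency-specific offset of the neighbor frequency, dB
    ttt_ms: int   # TTT: time to trigger, in ms

def b2_entering(ms: float, mn: float, cfg: B2Config) -> bool:
    """Entering conditions 1 AND 2, Eqs. (1) and (2)."""
    return ms + cfg.hyst < cfg.th1 and mn + cfg.o_fn - cfg.hyst > cfg.th2

def b2_leaving(ms: float, mn: float, cfg: B2Config) -> bool:
    """Leaving condition 1 OR 2, Eqs. (3) and (4)."""
    return ms - cfg.hyst > cfg.th1 or mn + cfg.o_fn + cfg.hyst < cfg.th2

def first_b2_trigger(samples, cfg: B2Config):
    """Return the time (ms) at which B2 fires, or None if it never fires.

    samples: (t_ms, ms, mn) tuples in chronological order.
    Simplified TTT (assumption): the entering conditions must hold at every
    sample for at least cfg.ttt_ms; any sample that breaks them resets the timer.
    """
    start = None
    for t, ms, mn in samples:
        if b2_entering(ms, mn, cfg):
            if start is None:
                start = t                  # conditions first met: start TTT
            if t - start >= cfg.ttt_ms:
                return t                   # held for TTT: iRAT HO is triggered
        else:
            start = None                   # conditions broken: reset TTT
    return None
```

For instance, with hypothetical settings Th1 = -110 dBm and Hyst = 1 dB, condition (1) is only met once the serving RSRP falls below -111 dBm; if the 3G RSCP never satisfies condition (2), `first_b2_trigger` returns `None`, which corresponds to the call drop scenario of an LTE coverage hole without uRAT.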

Consequently, areas with LTE coverage holes present a high number of customer complaints regarding frequent call drops, service gaps, or downgraded performance.

3 System model

3.1 Framework

The proposed method aims to automatically detect cells with bad performance due to coverage holes and diagnose them in order to determine the type and severity of each coverage hole. In particular, the framework proposed in this paper follows the scheme shown in Fig. 2.
  • Data collection: The cells are monitored by means of different metrics, such as configuration management (CM) parameters, performance management (PM) parameters, performance indicators (PIs), and mobile traces. The first of these, CM, represents the current configuration of network elements (e.g., the maximum transmit power). PM counts the number of times a specific event or procedure has taken place (e.g., the number of dropped calls). Regarding the PIs, these metrics are calculated by combining several PM measurements, obtaining statistics at cell level (e.g., the call drop ratio). Finally, the mobile traces consist of the measurements and information reported by the user equipment (UE), along with the signaling messages exchanged between the network elements, including the UE. In a network, the Operations Support System (OSS) is in charge of collecting all these metrics from the network elements (Fig. 1).
    Fig. 2

    System model

  • Threshold estimation: To identify whether or not an indicator of a cell is degraded, the normal performance of the cell should be characterized to determine the reference conditions. Then, a threshold is defined for each metric, so that the indicator is considered degraded if it crosses that threshold (i.e., if it is above the threshold for indicators that increase when there is a fault, or below it for indicators that decrease). There are different methods to automatically design these thresholds from a historical dataset built from the metrics and indicators provided by the OSS of both the LTE network and its uRAT (e.g., 3G) over a period of time. In particular, those historical datasets are composed of the specific values of each indicator for each cell, without any label or information about the status of the cell or its degree of deterioration. As a result, the thresholds need to be estimated through unsupervised methods, since the data are unlabeled. In this paper, for simplicity, the percentile-based discretization (PBD) method [21] is used. This unsupervised method is based on the assumption that, in a mature network, only a low percentage, X %, of the data has anomalous values. Then, for each indicator, the threshold is fixed at the percentile that leaves X % of the values on the degraded side, i.e., the Xth percentile for indicators that decrease with a fault and the (100 − X)th percentile for those that increase. Note that these thresholds are estimated from real values gathered from the network (Fig. 2), which allows operators to particularize the method for each network, for each cell, and even for different periods of time (weekday/weekend, busy hour, ...).

  • Detection and diagnosis system: The LTE cells are analyzed by the detection system to identify those cells with insufficient coverage. Then, the selected cells are analyzed in depth in order to classify the coverage hole and determine its degree of severity (Table 1).
    Table 1

    Detection and diagnosis rules

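    | CH type | BCR (detection) | IRAT HO (diagnosis) | HOSR (diagnosis) | Ret (diagnosis) | ATOL (diagnosis) |
    |---|---|---|---|---|---|
    | LTE CH without uRAT | \(> \text{Thr}_{\text{BCR}}\) | \(< \text{Thr}_{\text{IRAT}}\) | \(< \text{Thr}_{\text{HOSR}}\) | \(< \text{Thr}_{\text{Ret}}\) | not used |
    | Severe LTE CH with uRAT | \(> \text{Thr}_{\text{BCR}}\) | \(> \text{Thr}_{\text{IRAT}}\) | \(> \text{Thr}_{\text{HOSR}}\) | \(> \text{Thr}_{\text{Ret}}\) | \(< \text{Thr}_{\text{ATOL}_{L}}\) |
    | Optimized LTE CH with uRAT | \(> \text{Thr}_{\text{BCR}}\) | \(> \text{Thr}_{\text{IRAT}}\) | \(> \text{Thr}_{\text{HOSR}}\) | \(> \text{Thr}_{\text{Ret}}\) | \(> \text{Thr}_{\text{ATOL}_{H}}\) |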

3.2 System indicators

In order to assess the performance of the cells and identify coverage holes, some metrics are required. In particular, the inputs of the proposed system are the set of metrics described hereafter.

3.2.1 Indicators based on cell-level information

  • E-RAB Retainability (Ret) [22]: it represents the ability of the network to provide a service without abnormal disconnections, i.e., releases that have an impact on the end user, where E-RAB denotes the E-UTRAN Radio Access Bearer. It is calculated as the percentage of normally terminated connections over the total connections. Note that E-RAB Retainability gives a first indication of areas lacking LTE coverage that are not covered by any uRAT.

  • Number of bad coverage reports (BCR): when a user starts experiencing poor RF conditions in LTE, it sends an event-triggered measurement report (i.e., A2 event) to its serving cell indicating that the coverage (i.e., RSRP) is below the threshold (Th_A2). This PI counts the number of RSRP reports that fulfill the A2 event, so the worse the RF conditions, the higher the BCR.

  • Handover success rate (HOSR): it shows the percentage of handovers that are successfully executed.

  • Inter-RAT HO rate (IRAT HO): it indicates the percentage of the normal disconnections in LTE that have been redirected to any underlying RAT. This indicator is extremely important to identify those LTE coverage holes with uRAT.
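As an illustration of how these cell-level PIs relate to raw data, the following Python sketch counts BCR per cell from serving-RSRP reports using the A2 entering condition in its 3GPP form (serving measurement plus hysteresis below the threshold), and derives Ret, HOSR, and IRAT HO from counters. The input format, the function names, and the counter names (`erab_normal_releases`, `ho_attempts`, and so on) are hypothetical; real PM counter names are vendor specific.

```python
from collections import Counter

def a2_fulfilled(rsrp_dbm: float, th_a2: float, hyst: float = 0.0) -> bool:
    """A2 entering condition: serving measurement plus hysteresis below Th_A2."""
    return rsrp_dbm + hyst < th_a2

def bad_coverage_reports(reports, th_a2: float, hyst: float = 0.0) -> Counter:
    """BCR per cell: number of serving-RSRP reports that fulfill the A2 condition.

    reports: iterable of (cell_id, serving_rsrp_dbm) pairs (hypothetical format).
    """
    bcr = Counter()
    for cell_id, rsrp in reports:
        if a2_fulfilled(rsrp, th_a2, hyst):
            bcr[cell_id] += 1
    return bcr

def cell_pis(c: dict) -> dict:
    """Ret, HOSR and IRAT HO (all in %) from hypothetical PM counters of one cell."""
    return {
        # Normally terminated E-RABs over all terminated E-RABs.
        "Ret": 100.0 * c["erab_normal_releases"] / c["erab_releases"],
        # Successful handovers over handover attempts.
        "HOSR": 100.0 * c["ho_successes"] / c["ho_attempts"],
        # Share of normal LTE releases that were redirections to a uRAT.
        "IRAT_HO": 100.0 * c["irat_ho"] / c["erab_normal_releases"],
    }
```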

3.2.2 Indicator based on user-level information

In addition to the previous cell-based PIs, this paper presents a new inter-technology PI based on mobile traces, named Average active Time On LTE (ATOL). This indicator focuses on the connections that leave LTE by means of inter-RAT handovers. The aim is to analyze the performance of those connections that the LTE network redirects to other RATs. Therefore, this methodology requires data from both the serving and the target mobile systems.

Figure 3 presents a block diagram with all the structures involved in the analysis of IRAT HOs from LTE to 3G. First, the required information from the serving and the target mobile networks is collected and stored in their OSSs, considering both mobile traces and cell-level data (such as CM and PM). Then, these data are processed to extract the events reported in the mobile traces and, based on them, the proposed indicators are calculated. Finally, the obtained values are shown in a visualization tool. Note that this method requires the activation of mobile traces in the equipment of the involved RATs, located in the same geographical area and during the same period of time. Furthermore, since this methodology monitors user connections across different RATs, the availability of the International Mobile Equipment Identity (IMEI) or the International Mobile Subscriber Identity (IMSI) associated with those connections is an essential requirement. For example, in an LTE network, traces from the MME must be collected in order to extract the IMEI or IMSI of the connections recorded in the mobile traces.
Fig. 3

Flow diagram illustrating the solution and all the structures that relate to the proposed system

The process of calculating the proposed ATOL indicator is characterized by being a bottom-up method (i.e., the user-level information is aggregated at cell level) and can be summarized in three phases.

  • Firstly, generation of the inter-technology event flow: those connections that the LTE network redirects to another RAT are tracked, and their inter-technology event flow is generated from the information reported in the mobile traces. Figure 4 shows a flow diagram of how the inter-technology event flow can be obtained. Essentially, this consists of constructing a chronological event flow that organizes all the events belonging to the same UE connection, identified through its IMEI, considering the information of both the serving and the target network. After that, all events associated with each flow are sorted in time, so that the beginning and the end of each connection can be determined. In particular, the start time (Tstart) is the time at which the first event of a connection is received and, similarly, the end time (Tend) corresponds to the instant of the last event. It should be noted that only those event flows of the serving network whose termination reason indicates a handover to an underlying RAT (3G or 2G) are considered. This guarantees that the analysis focuses on connections that have changed to another technology. The next step (Join event flows with the same IMEI in Fig. 4) consists of matching the event flow of each connection in the serving network with its corresponding event flow in the target network. Once the inter-technology event flow is obtained (as represented in Fig. 5), each part of the event flow can be determined. First, the connection of the user is set up and configured through the Radio Resource Control (RRC) protocol [20]. Second, during the LTE connection, the eNodeB releases the user's connection and redirects it to the 3G network through the IRAT HO procedure if the user reports good 3G signal levels while the LTE measurements are degraded.
Finally, the connection of the user is set up in the 3G network, where the user's data are sent and received until the connection is released.
    Fig. 4

    Method for generating the inter-technology event flow

    Fig. 5

    Event flow of an IRAT HO between LTE and 3G

  • Secondly, calculation of the user-level ATOL: it is defined as the percentage of time that a user is on LTE relative to the total duration of its connection (taking into account the duration both in LTE and in the uRAT). Formally, this indicator can be calculated by the following equation:
    $$ \text{ATOL}_{\text{user}}=\frac{\text{duration}_{\text{LTE}}}{\text{duration}_{\text{total}}} $$
    (5)

    where \(\text{duration}_{\text{total}}\) is the total duration of the inter-technology event flow and \(\text{duration}_{\text{LTE}}\) represents the duration of the connection in the LTE network, i.e., the time interval between the beginning of the connection in LTE (Tstart_LTE) and the time of IRAT (see Fig. 5). In particular, the time of IRAT (T_IRAT) is estimated as the midpoint between the beginning of the uRAT event flow (Tstart_3G) and the end of the LTE event flow (Tend_LTE), since the two sub-event flows may overlap due to the preceding signaling; that is, \(T_{\text{IRAT}}=(T_{\text{start\_3G}}+T_{\text{end\_LTE}})/2\) and \(\text{duration}_{\text{LTE}}=T_{\text{IRAT}}-T_{\text{start\_LTE}}\).

  • Lastly, calculation of the high-level ATOL: the individual ATOL indicators are aggregated in order to obtain the ATOL indicator at cell level by means of the following average:
    $$ \text{ATOL}_{\text{average}}=\frac{\sum_{i=1}^{\text{NumTrackedIRATs}}\text{duration}_{\text{LTE}_{i}}}{\sum_{i=1}^{\text{NumTrackedIRATs}}\text{duration}_{\text{total}_{i}}} $$
    (6)

    where NumTrackedIRATs represents the total number of IRATs that have been tracked in the analyzed cell, \(\text{duration}_{\text{LTE}_{i}}\) represents the duration of connection i in the LTE network, and \(\text{duration}_{\text{total}_{i}}\) represents the total duration of connection i.

    The benefit of the proposed metric is that it focuses exclusively on those users that actually perform the IRAT HO to the uRAT, so the conclusions obtained from the ATOL average provide specific information about the particular users that were affected by the coverage hole and were therefore redirected to the underlying RAT.
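The three phases above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the event-record layout (keys `imei`, `rat`, `t`, `event`, `cell`), the name `IRAT_HO` for the LTE termination event, and all function names are hypothetical, and it assumes a single connection per IMEI and RAT within the analyzed trace window.

```python
from collections import defaultdict

def build_flows(events):
    """Phase 1a: group raw trace events into chronological per-(IMEI, RAT) flows.

    Assumption: one connection per IMEI and RAT in the analyzed window.
    """
    flows = defaultdict(list)
    for ev in events:
        flows[(ev["imei"], ev["rat"])].append(ev)
    for flow in flows.values():
        flow.sort(key=lambda ev: ev["t"])   # temporal ordering of each flow
    return flows

def tracked_irats(events, irat_event="IRAT_HO"):
    """Phase 1b: join LTE flows ending in an IRAT HO with the 3G flow of the same IMEI."""
    flows = build_flows(events)
    records = []
    for (imei, rat), lte in flows.items():
        if rat != "LTE":
            continue
        umts = flows.get((imei, "3G"))
        # Keep only LTE flows whose termination reason is an IRAT HO and
        # that have a matching 3G flow for the same IMEI.
        if lte[-1]["event"] != irat_event or not umts:
            continue
        t_start_lte, t_end_lte = lte[0]["t"], lte[-1]["t"]
        t_start_3g, t_end_3g = umts[0]["t"], umts[-1]["t"]
        # T_IRAT: midpoint between Tstart_3G and Tend_LTE (sub-flows may overlap).
        t_irat = (t_start_3g + t_end_lte) / 2
        records.append({
            "cell": lte[-1]["cell"],                       # LTE cell that performed the IRAT HO
            "d_lte": t_irat - t_start_lte,                 # duration_LTE
            "d_total": max(t_end_3g, t_end_lte) - t_start_lte,  # duration_total
        })
    return records

def atol_user(rec):
    """Phase 2, Eq. (5): fraction of the connection spent on LTE."""
    return rec["d_lte"] / rec["d_total"]

def atol_cell(records):
    """Phase 3, Eq. (6): summed LTE durations over summed total durations, per cell."""
    num, den = defaultdict(float), defaultdict(float)
    for rec in records:
        num[rec["cell"]] += rec["d_lte"]
        den[rec["cell"]] += rec["d_total"]
    return {cell: num[cell] / den[cell] for cell in den}
```

Because Eq. (6) divides the summed LTE durations by the summed total durations, it weights each tracked user by the length of its connection; it is not the plain mean of the per-user values given by Eq. (5).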

4 Detection and diagnosis systems

The proposed method consists of two steps: detecting cells with coverage problems and then diagnosing whether the cause of this bad performance is a coverage hole, determining its characteristics and type, which is essential for a successful recovery. These phases are performed by applying the IF-THEN rules of Table 1 to the PIs defined in Section 3.2.

4.1 Detection of coverage hole

As stated before, within a coverage hole, the LTE signals received from the serving and neighboring LTE cells are below the threshold required to properly maintain the service of the users. As a result, these areas lacking coverage are characterized by many users with poor RF conditions; namely, the signal levels received by those users are below the threshold of the A2 event (Th_A2), as shown in Fig. 6. This means that the number of bad coverage reports in cells with weak coverage is much higher than in cells whose coverage is well optimized. Therefore, any cell whose BCR is below the specified threshold is considered normal, while BCR values above the threshold may indicate that the cell has coverage problems (see Table 2). Only those cells detected as problematic are passed to the diagnosis stage.
Fig. 6

Call drop due to an LTE coverage hole (TCH)

Table 2

Detection and diagnosis thresholds

| Threshold | Value |
|---|---|
| \(\text{Thr}_{\text{BCR}}\) | 80 |
| \(\text{Thr}_{\text{IRAT}}\) | 1.93 % |
| \(\text{Thr}_{\text{HOSR}}\) | 95.45 % |
| \(\text{Thr}_{\text{Ret}}\) | 99.5 % |
| \(\text{Thr}_{\text{ATOL}_{L}}\) | 20 % |
| \(\text{Thr}_{\text{ATOL}_{H}}\) | 80 % |

4.2 Diagnosis of coverage holes

According to their impact on the performance of the cell, three types of LTE coverage hole can be identified through the rules presented in Table 1.

(1) LTE coverage hole without any uRAT coverage (total coverage hole, TCH): this is the most severe kind of coverage hole, where there is no other mobile network in a particular area to continue providing the service. Figure 6 shows an example of this scenario, where a user leaves LTE coverage because the received RSRP falls below the lowest threshold that allows the LTE service to be maintained (Th_LTEDrop). In this scenario, although condition 1 is fulfilled (Th1_B2 is Th1 − Hyst), the iRAT HO is not triggered because there is no other RAT that can maintain the service; that is, the measured RSCP is not above the threshold (Th2_B2 is Th2 − O_fn + Hyst) needed to fulfill condition 2. As a result, the user connection drops. This type of LTE coverage hole causes not only a high number of dropped connections but also HO failures. This means that retainability and HOSR will be degraded (i.e., they are below their thresholds), whereas the percentage of IRAT HO stays below its threshold, indicating that this coverage hole does not increase the number of IRAT HOs because there is no other RAT in that place to continue the service. Thus, it might be concluded that this lack of coverage is caused by the high propagation loss in that particular area, for instance attenuation by physical obstacles, which produces a deep gap in the whole mobile network. As a result, the operator should design a compensation action for this kind of coverage hole, following its own strategy, until the problem is solved by the deployment of a new cell.
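The effective thresholds Th1_B2 and Th2_B2 quoted above follow directly from rearranging entering conditions (1) and (2), writing them here as \(\text{Th}_{1,\text{B2}}\) and \(\text{Th}_{2,\text{B2}}\):

$$ M_{s}+\text{Hyst}<\text{Th}_{1} \iff M_{s}<\text{Th}_{1}-\text{Hyst}=\text{Th}_{1,\text{B2}} $$

$$ M_{n}+O_{\text{fn}}-\text{Hyst}>\text{Th}_{2} \iff M_{n}>\text{Th}_{2}-O_{\text{fn}}+\text{Hyst}=\text{Th}_{2,\text{B2}} $$

For instance, with hypothetical settings Th1 = -110 dBm, Th2 = -100 dBm, Hyst = 1 dB, and O_fn = 0 dB, the serving RSRP must fall below -111 dBm and the 3G RSCP must exceed -99 dBm for the B2 event to be entered; in a TCH, only the first inequality holds.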

(2) LTE coverage hole with uRAT coverage (severe coverage hole, SCH): the typical phenomenon in this situation is that the LTE network redirects the connections to the uRAT, since it is not capable of maintaining the service due to poor channel quality. An example of this situation is presented in Fig. 7. In this case, the user receives the LTE signals in poor conditions (below Th_A2 and Th1_B2), so it searches for a neighboring cell to perform a handover. Since it is located in an LTE coverage hole, there is no LTE neighbor to continue the service; instead, a 3G cell is received with enough signal level during the required time interval (denoted by TTT), so the iRAT HO is triggered before the connection drops. Therefore, instead of having many dropped connections, cells suffering this kind of problem have a high percentage of IRAT HOs (Table 2). Namely, this kind of coverage hole will not be identified if only the retainability is analyzed, since the most impacted indicator is the IRAT HO. Due to a severe LTE coverage hole, the LTE signal level drops significantly in a very short time, causing the great majority of the affected users to leave the LTE network quickly. Thus, in this situation, the IRAT HO is above its threshold and the ATOL average is very small (lower than the minimum accepted value given by the \(\text{Thr}_{\text{ATOL}_{L}}\) threshold, Table 2), meaning that the users leave LTE very soon and are therefore served by 3G longer than by LTE. This kind of coverage hole may be solved by deploying a small cell to provide LTE service, and it also requires compensation actions while the new cell is being deployed in order to improve the user experience in this area.
Fig. 7

iRAT HO due to a severe LTE coverage hole with 3G coverage (SCH)

(3) Optimized LTE coverage hole with inter-RAT coverage (optimized coverage hole, OCH): in this situation, the connections suffering bad RF conditions due to the LTE coverage hole are properly redirected to the underlying RAT, as in the previous case. The main difference between this and the previous scenario is the percentage of time the user is served by the LTE cell. In particular, when a cell is optimized, the effects of the coverage hole are reduced, so the user is kept in LTE longer than in the case of SCH. Thanks to that, the negative effect of the coverage hole on the user experience is reduced, since the user can enjoy the LTE benefits during almost all of its connection (Fig. 8). It is important to note that if the coverage hole were not correctly optimized, the iRAT HO would be delayed so long that the connection of the user would be dropped. As a result, the cell would present a high concentration of call drops, reducing its E-RAB retainability, and a low percentage of iRAT HOs, so it would be considered a TCH. Thus, in OCH cells, both the HOSR and the E-RAB retainability present good performance, i.e., they are above their thresholds (Table 2). The high values of BCR and iRAT HO (i.e., above their thresholds) indicate that this kind of cell has areas with poor LTE signal levels covered by a uRAT, since the number of users that leave LTE is greater than expected. By means of the ATOL average, it is possible to identify the impact that the iRAT HO has on the user experience. In particular, in this scenario, unlike SCH, the IRAT HO is performed at the end of the user's connection (Fig. 8), so users are served by LTE most of the time (i.e., the ATOL average is above the \(\text{Thr}_{\text{ATOL}_{H}}\) threshold, see Table 2). This means that the LTE coverage hole is fully optimized, so in this situation the impact that this problematic area has on the user experience is minimal.
This is because, on the one hand, an optimized LTE coverage hole does not give rise to call drops and, on the other hand, the user is served by the LTE cell most of the time, enjoying a high quality of service during almost all of its connection. This indicates that this kind of poor LTE coverage area is properly managed, so it is not strictly necessary to perform compensating actions while the LTE coverage hole is being resolved. It is important to highlight that standard KPIs obtained from PM counters are cell-level statistics, so they are not adequate to evaluate the specific performance of the affected users before and after the iRAT HO. Furthermore, these indicators are specific to each domain (either LTE or uRAT), so they do not provide combined information about user performance in LTE and the uRAT simultaneously. For this reason, it is necessary to follow the user from LTE to 3G and then evaluate the performance of only the affected users by means of mobile traces. Note that the ATOL average is a single metric that provides information about both technologies while focusing exclusively on the affected users, that is, on the users that move from LTE to the uRAT and whose performance is therefore downgraded.
Fig. 8

iRAT HO due to an optimized LTE coverage hole with 3G coverage (OCH)
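The IF-THEN rules of Table 1, with the threshold values of Table 2 as defaults, can be expressed as the following minimal Python sketch. The function and field names are hypothetical, and returning `unknown` for a detected cell that matches no rule (e.g., an ATOL average between the two ATOL thresholds) is an assumption, since Table 1 does not define that case. IRAT HO, HOSR, Ret, and ATOL are all expressed in %.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Thresholds:
    """Defaults taken from Table 2 (IRAT HO, HOSR, Ret and ATOL in %)."""
    bcr: float = 80.0
    irat: float = 1.93
    hosr: float = 95.45
    ret: float = 99.5
    atol_low: float = 20.0
    atol_high: float = 80.0

def diagnose(bcr: float, irat: float, hosr: float, ret: float,
             atol: Optional[float] = None, thr: Thresholds = Thresholds()) -> str:
    # Detection: only cells whose BCR exceeds its threshold are diagnosed.
    if bcr <= thr.bcr:
        return "normal"
    # TCH: no uRAT, so drops and HO failures rise while IRAT HO stays low.
    if irat < thr.irat and hosr < thr.hosr and ret < thr.ret:
        return "TCH"
    # With uRAT: IRAT HO is high while HOSR and Ret remain healthy;
    # ATOL then separates severe from optimized coverage holes.
    if irat > thr.irat and hosr > thr.hosr and ret > thr.ret and atol is not None:
        if atol < thr.atol_low:
            return "SCH"   # users leave LTE early in their connection
        if atol > thr.atol_high:
            return "OCH"   # users leave LTE near the end of their connection
    return "unknown"       # no rule of Table 1 applies (assumed fallback)
```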

5 Results and discussion

5.1 Characteristics of the live network

The proposed method has been validated in a live LTE network. The considered urban area is a section of the whole network consisting of 120 LTE cells located in the same area as all the 3G cells of a particular Radio Network Controller (RNC); thus, the LTE and 3G coverage areas overlap. The trial was carried out during the busy hours to analyze the coverage of the network. In particular, the cell-level indicators were stored on an hourly basis, while the user-level indicators were stored every 15 min during the busy hours and subsequently aggregated at hour level. Given that this method aims to identify coverage problems that are permanent over time, cells can be monitored in any time period. However, the ATOL average is estimated from the group of users that actually performed an IRAT HO, which may be small. Thus, in order to ensure a good diagnosis, the tracing period should be long enough. This can also be achieved by tracing the users during the busy hours, which provide more samples in a shorter time.

For these indicators, the thresholds have been estimated through the PBD method since the PI values have been collected without any information about the presence of faults. That is, the training data include both faulty and faultless data. For the PBD, the 20th percentile is used for those PIs that decrease when there is a fault and the 80th percentile is used for those PIs whose value increases due to a fault. The thresholds obtained with PBD are presented in Table 2. Furthermore, in this study, ATOL is considered to be low when it is below 20 % and high when it is above 80 %.
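A minimal sketch of the percentile-based thresholding described above, assuming NumPy and its default linear-interpolation `np.percentile`. The PI names and their degradation directions in the hypothetical `PI_DIRECTION` table are assumptions inferred from the discussion of Figs. 9 and 10, not a copy of Table 2. ATOL is left out because its 20 % and 80 % bounds are fixed in this study rather than estimated.

```python
import numpy as np

# Assumed degradation direction of each PI: "down" means the PI drops when a
# fault is present (20th percentile), "up" means it rises (80th percentile).
PI_DIRECTION = {
    "e_rab_retainability": "down",
    "hosr": "down",
    "bcr": "up",
    "irat_ho": "up",
}


def pbd_thresholds(training, low_pct=20.0, high_pct=80.0):
    """Percentile-based thresholds from unlabeled data (faulty + faultless).

    `training` maps a PI name to the values observed for that PI."""
    thresholds = {}
    for pi, values in training.items():
        pct = low_pct if PI_DIRECTION[pi] == "down" else high_pct
        thresholds[pi] = float(np.percentile(values, pct))
    return thresholds


def is_degraded(pi, value, thresholds):
    """True when `value` lies on the faulty side of the PI's threshold."""
    if PI_DIRECTION[pi] == "down":
        return value < thresholds[pi]
    return value > thresholds[pi]


th = pbd_thresholds({"hosr": [1, 2, 3, 4, 5], "bcr": [1, 2, 3, 4, 5]})
print({k: round(v, 3) for k, v in th.items()})  # {'hosr': 1.8, 'bcr': 4.2}
```

Because the training data mix faulty and faultless cells, the percentiles only separate the worst-performing tail of each PI; they are not class boundaries learned from labels.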

5.2 Performance evaluation

To demonstrate how this approach can be applied in practice, seven different cells have been selected. These cells were manually diagnosed by troubleshooting experts, who identified that they performed badly due to the existence of coverage holes. Furthermore, all the required cell-level and user-level LTE information is available for these cells, along with the user-level information from the 3G network underlying them. After analyzing these cells with the proposed methodology, it was found that four of them had a total coverage hole with no other radio coverage, two had a severe LTE coverage hole with 3G coverage, and the last one had a coverage hole that is already optimized.

Figures 9 and 10 show the average of each PI for the analyzed cells along with the normal cells (N) of the network, i.e., those cells whose PIs were not degraded. These figures also include the threshold designed for each PI. Note that the ATOL average is not calculated for the cells with TCH, since this indicator is not relevant for diagnosing TCH (see Table 2). As can be observed, all the cells selected by the detection phase have a high number of bad coverage reports (Fig. 9) compared to the normal cells.
Fig. 9

Average of BCR and E-RAB retainability for the analyzed cells

Fig. 10

Average of HOSR, IRAT HO, and ATOL average

Cells with TCH have degraded E-RAB retainability and HOSR, while their number of IRAT HOs is low (Fig. 10), i.e., their IRAT HO value is below the threshold and very similar to that of normal cells. This means that a coverage hole in a TCH cell does not significantly increase the number of IRAT HOs compared to the percentage of IRAT HOs that typically takes place in normal cells. This kind of coverage hole has a great impact, since the users cannot change to a better cell, so they stay in their serving cell until their connections end abnormally. Thus, most of the serving RSRP measurements reported by the users of this kind of cell are very degraded. In Fig. 11, it can be observed that more than 90 % of the serving RSRP measurements reported in TCH cells are below −100 dBm.
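The statistic quoted above, the share of serving RSRP reports below −100 dBm, is the empirical CDF of the reported RSRP evaluated at −100 dBm. A minimal sketch, assuming the serving RSRP reports of a cell are available as a plain list of dBm values; the function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fraction_below(rsrp_dbm: Sequence[float], threshold_dbm: float = -100.0) -> float:
    """Empirical CDF at threshold_dbm: share of serving RSRP reports below it."""
    if not rsrp_dbm:
        raise ValueError("no RSRP reports")
    below = sum(1 for r in rsrp_dbm if r < threshold_dbm)
    return below / len(rsrp_dbm)


def empirical_cdf(rsrp_dbm: Sequence[float]) -> List[Tuple[float, float]]:
    """Sorted (RSRP, cumulative share) points for plotting a step CDF."""
    xs = sorted(rsrp_dbm)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


reports = [-112.0, -108.5, -104.0, -101.0, -97.0]  # hypothetical samples
print(fraction_below(reports))  # 0.8
```

For a TCH cell, the text above implies that `fraction_below` applied to its reports would exceed 0.9.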
Fig. 11

Cumulative distribution function (CDF) of the serving RSRP measurements reported by the users of each type of cell

However, in those cells whose coverage hole is covered by a 3G network, the number of drops (Fig. 9) and HO failures (Fig. 10) remains low, whereas the number of IRAT HOs increases considerably. Furthermore, the high values of BCR (i.e., greater than its defined threshold) indicate that those cells present coverage problems. Note that the BCR metric is a counter, so the higher the BCR value, the greater the number of users under poor RF conditions; however, it does not determine the degree of severity. Thus, these two situations (OCH and SCH) require analyzing the behavior of the IRAT HOs through the proposed ATOL average metric, in order to identify what happens to the majority of the users that leave LTE. In particular, cells with SCH present a low ATOL average, meaning that most of their users leave LTE just after establishing the connection. Conversely, the cell with OCH has a high ATOL average, so this cell is capable of maintaining the LTE service most of the time. Note that normal cells do not show a prevailing behavior, meaning that there is no dominant problem. In SCH and OCH, the percentage of serving RSRP below −100 dBm is also greater than in the normal situation (see Fig. 11), but not as high as in TCH, because the users change their connection to a 3G cell before reaching such poor radio conditions. Note that the serving RSRP CDF curves give a general idea of the RSRP values reported by all the users in those cells but, unlike the proposed ATOL average metric, they are not focused exclusively on the users that actually perform the IRAT HO to 3G. Thus, they cannot determine the performance of the users that leave LTE, which is the key to distinguishing between SCH and OCH and to assessing the user experience.
Furthermore, the separation between the serving RSRP CDF curves of SCH and OCH is not large enough to set a reliable threshold that generically differentiates these two scenarios.
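The diagnosis reasoning of this section can be condensed into a few rules. The sketch below is a simplified reading of this section only, not the paper's full rule set in Table 2: the boolean inputs are assumed to come from comparing each PI against its PBD threshold, and the ATOL average is compared against the fixed 20 % and 80 % bounds given in Section 5.1. The function `diagnose` and its parameters are hypothetical, and cells that match no rule are reported as undetermined.

```python
from typing import Optional

ATOL_LOW_PCT = 20.0   # "low" ATOL average bound used in this study
ATOL_HIGH_PCT = 80.0  # "high" ATOL average bound used in this study


def diagnose(bcr_high: bool, retainability_degraded: bool,
             hosr_degraded: bool, irat_ho_high: bool,
             atol_avg: Optional[float]) -> str:
    """Simplified coverage-hole diagnosis read from Section 5.2 (not Table 2).

    Boolean inputs state whether each PI crossed its threshold on the faulty
    side; atol_avg is the ATOL average in percent, or None if not computed.
    """
    if not bcr_high:
        return "no coverage hole detected"
    # Total coverage hole: more drops and HO failures, no fallback to 3G.
    if retainability_degraded and hosr_degraded and not irat_ho_high:
        return "TCH"
    # Hole covered by 3G: few drops and HO failures, many IRAT HOs.
    if (irat_ho_high and not retainability_degraded and not hosr_degraded
            and atol_avg is not None):
        if atol_avg < ATOL_LOW_PCT:
            return "SCH"  # users leave LTE just after connecting
        if atol_avg > ATOL_HIGH_PCT:
            return "OCH"  # LTE keeps the service most of the time
    return "undetermined"


print(diagnose(True, False, False, True, 12.0))  # SCH
```

The ATOL average is only consulted once the cell-level PIs point to a hole covered by 3G, which matches the observation that BCR alone flags the problem but cannot separate SCH from OCH.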

6 Conclusions

The detection and diagnosis phases of a self-healing system have been presented to identify different kinds of coverage holes depending on their impact. To that end, a new inter-technology PI based on mobile traces has been introduced. This PI makes it possible to analyze the behavior of the users that leave LTE by means of IRAT HOs. The proposed system has been evaluated with real data collected from a live LTE network and its co-located 3G network, showing its effectiveness in detecting and diagnosing coverage holes.

Declarations

Acknowledgements

This work has been partially funded by Optimi-Ericsson, Junta de Andalucía (Agencia IDEA, Consejería de Ciencia, Innovación y Empresa, ref.59288; and Proyecto de Investigación de Excelencia P12-TIC-2905) and ERDF.

Competing interests

The authors declare that they have no competing interests.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

(1)
Universidad de Málaga
(2)
Ericsson

References

  1. 3rd Generation Partnership Project, Universal Mobile Telecommunications System (UMTS); LTE; Telecommunication management; Self-Organizing Networks (SON); Concepts and requirements. (3GPP TS 32.500 v. 12.1.0 Release 12) (2015-01).
  2. MA Khan, H Tembine, AV Vasilakos, Game dynamics and cost of learning in heterogeneous 4G networks. IEEE J. Sel. Areas Commun. 30, 198–213 (2012).
  3. PBF Duarte, ZM Fadlullah, AV Vasilakos, N Kato, On the partially overlapped channel assignment on wireless mesh network backbone: A game theoretic approach. IEEE J. Sel. Areas Commun. 30, 119–127 (2012).
  4. A Attar, H Tang, AV Vasilakos, FR Yu, VCM Leung, A survey of security challenges in cognitive radio networks: Solutions and future research directions. Proc. IEEE 100, 3172–3186 (2012).
  5. D López-Pérez, X Chu, AV Vasilakos, H Claussen, On distributed and coordinated resource allocation for interference mitigation in self-organizing LTE networks. IEEE/ACM Trans. Networking 21, 1145–1158 (2013).
  6. M Youssef, M Ibrahim, M Abdelatif, L Chen, AV Vasilakos, Routing metrics of cognitive radio networks: a survey. IEEE Commun. Surv. Tutor. 16, 92–109 (2014).
  7. D López-Pérez, X Chu, AV Vasilakos, H Claussen, Power minimization based resource allocation for interference mitigation in OFDMA femtocell networks. IEEE J. Sel. Areas Commun. 32, 333–344 (2014).
  8. K Lin, W Wang, X Wang, W Ji, J Wan, QoE-driven spectrum assignment for 5G wireless networks using SDR. IEEE Wirel. Commun. 22, 48–55 (2015).
  9. 3rd Generation Partnership Project, Technical Specification Group Services and System Aspects; Telecommunication management; Self-Organizing Networks (SON); Self-healing concepts and requirements. (3GPP TS 32.541 V12.0.0 Release 12) (2014-10).
  10. R Barco, P Lazaro, P Munoz, A unified framework for self-healing in wireless networks. IEEE Commun. Mag. 50(12), 134–142 (2012). doi:10.1109/MCOM.2012.6384463.
  11. A Zoha, A Saeed, A Imran, MA Imran, A Abu-Dayya, A learning-based approach for autonomous outage detection and coverage optimization. Trans. Emerging Tel. Tech. 27, 439–450 (2016). doi:10.1002/ett.2971.
  12. 3rd Generation Partnership Project, Digital cellular telecommunications system (Phase 2+) (GSM); Universal Mobile Telecommunications System (UMTS); LTE; Telecommunication management; Subscriber and equipment trace; Trace concepts and requirements. (3GPP TS 32.421 version 13.0.0 Release 13) (2016-03).
  13. 3rd Generation Partnership Project, Universal Mobile Telecommunications System (UMTS); Universal Terrestrial Radio Access (UTRA) and Evolved Universal Terrestrial Radio Access (E-UTRA); Radio measurement collection for Minimization of Drive Tests (MDT); Overall description; Stage 2. (3GPP TS 37.320 version 13.1.0 Release 13) (2016-04).
  14. 3rd Generation Partnership Project, Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Relay architectures for E-UTRA (LTE-Advanced). (3GPP TR 36.806 V9.0.0 Release 9) (2010-03).
  15. J Johansson, WA Hapsari, S Kelley, G Bodog, Minimization of drive tests in 3GPP release 11. IEEE Commun. Mag. 50(11), 36–43 (2012).
  16. WA Hapsari, A Umesh, M Iwamura, M Tomala, B Gyula, B Sébire, Minimization of drive tests solution in 3GPP. IEEE Commun. Mag. 50(6), 28–36 (2012).
  17. B Sayrac, J Riihijärvi, P Mähönen, SB Jemaa, E Moulines, S Grimoud, Improving coverage estimation for cellular networks with spatial Bayesian prediction based on measurements, in Proceedings of the 2012 ACM SIGCOMM Workshop on Cellular Networks: Operations, Challenges, and Future Design (CellNet ’12) (ACM, New York, 2012), pp. 43–48. doi:10.1145/2342468.2342479.
  18. A Galindo-Serrano, B Sayrac, SB Jemaa, J Riihijärvi, P Mähönen, Automated coverage hole detection for cellular networks using radio environment maps, in 9th International Workshop on Wireless Network Measurements (WiOpt) (IEEE, 2013), pp. 35–40.
  19. 3rd Generation Partnership Project, Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS); LTE; Telecommunication management; Self-Organizing Networks (SON) Policy Network Resource Model (NRM) Integration Reference Point (IRP); Requirements. (3GPP TS 32.521 version 11.1.0 Release 11) (2013-02).
  20. 3rd Generation Partnership Project, Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Radio Resource Control (RRC); Protocol specification (Release 13). (3GPP TS 36.331 V13.2.0) (2016-06).
  21. RM Khanafer, B Solana, J Triola, R Barco, L Moltsen, Z Altman, P Lázaro, Automated diagnosis for UMTS networks using Bayesian network approach. IEEE Trans. Veh. Technol. 57(4), 2451–2461 (2008).
  22. 3rd Generation Partnership Project, Universal Mobile Telecommunications System (UMTS); LTE; Telecommunication management; Key Performance Indicators (KPI) for Evolved Universal Terrestrial Radio Access Network (E-UTRAN); Requirements. (3GPP TS 32.451 version 13.0.0 Release 13) (2016-02).

Copyright

© The Author(s) 2016