To assess and validate the proposed monitoring framework implementation, the testing process described below follows a top-down approach. In particular, it is based on the execution of specific experiments monitored by the monitoring platform, where an experiment is an emulation of a given network deployment, characterized by parameters such as the bandwidth consumed.
As a result, the purpose of the performance evaluation process presented here is to characterize the monitoring platform itself in terms of several performance parameters: the resource consumption of the monitoring platform components, and the latency and packet loss experienced by the deployed experiments, these last two being obtained for a given throughput in the system. These performance parameters are explained in detail in Sect. 5.2.
Thus, the idea of this performance evaluation process is not to present how the monitoring platform is capable of monitoring a given 5G service and to show the results obtained for each monitored metric (which is already presented in Sect. 4), but rather to carry out a load testing process that checks whether the designed and implemented monitoring platform is capable of handling a given amount of monitoring data (in terms of throughput and number of topics running in the platform, i.e., a certain number of metrics managed by the monitoring platform simultaneously), emulating real 5G use cases with a synthetic data rate and a specific number of topics running in the platform. This would eventually help to size the monitoring platform (in terms of number of servers, their hardware requirements, throughput to be handled, etc.) for a given deployment in 5G or Beyond 5G scenarios.
In this way, this performance evaluation process starts with single-broker experiments to characterize the platform in terms of several performance parameters and finishes with multi-broker experiments to check the impact of having the two brokering levels described in Sect. 3.
It has to be mentioned that, although the results are component-sensitive, since specific components with specific requirements and specific values of design parameters have been used, the testing procedure itself is not component-sensitive: it is a general-purpose methodology that can be applied to other types of components.
System assumptions
The definition of the system under test (SUT) parameters is related to the 5G EVE multi-site platform’s operation, where a set of network deployments derived from the different use cases defined in the project may be running simultaneously at a specific time, sharing all the computing and network resources provided by both the 5G EVE platform and the site facilities.
As a first approach to the evaluation, the following assumptions were made:
- The monitoring platform must be prepared to deal with extreme conditions, such as the simultaneous execution of a considerable number of use cases. As the 5G EVE project initially proposes the validation of six specific use cases [8], the execution of a deployment from each use case at the same time can be taken as the worst-case scenario to validate, resulting in six simultaneous deployments (i.e., experiments) to be handled by the monitoring platform.
- Each experiment can define a different number of metrics and KPIs to be collected and monitored during the execution of the use case, depending on the vertical’s needs. For this evaluation process, as these metrics can be extracted from different sources (e.g., UEs, VNFs, PNFs), and each source may have several related metrics or KPIs, it can be assumed that each experiment will require the monitoring of an average of 20 parameters. Furthermore, as each monitored parameter has a topic assigned for the transport and delivery of its corresponding collected data, each experiment will create 20 topics in the monitoring platform on average. As a result, the maximum number of topicsFootnote 2 created in the platform would be \(20\times 6=120\) in this case.
- The size and the publication rate of the messages containing the values of each metric or KPI managed by the monitoring platform depend on the nature of the data transported. As a result, four different alternatives have been considered for the tests:
  - 100 B and 1 KB messages for data traffic (i.e., numeric or string values), representing 80% of all the monitoring traffic (40% for each case). The publication rate for both options is set to 1000 \(messages/s\).
  - 100 KB and 1 MB messages for multimedia traffic (i.e., images or video frames), representing the remaining 20% (10% for each case). The publication rate for both cases is lower than the one for data traffic, as the received throughput almost never reached that value due to the message size: 10 \(messages/s\) for 100 KB messages and 1 \(message/s\) for 1 MB messages.
The percentages have been selected assuming that most of the data will be small-sized messages, but also considering that there may be larger messages, mainly related to multimedia data. The figures selected for each kind of message result in a concurrent publication rate of approximately 102.4 Mbps per experiment (see the sketch after this list).
- Another important parameter related to the publishers is the message batch size, which controls the number of messages to collect before sending them to the Multi-Broker Cluster, and which was set to 1 after validating that higher values of this parameter yield worse results in terms of latency.
- The selected values of publication rate for each message size are also consistent with the subsequent estimation of the disk size for each broker node, computed as \(D = s \times r \times t \times f/b\), where \(s\) is the message size, \(r\) is the publication rate, \(t\) is the retention time (at least 2 weeks, as discussed in Sect. 3), and \(f\) and \(b\) are the replication factor and the number of brokers in the system, respectively, with typically \(f=b-1\). This leads to a value slightly below 100 TB, which is an estimation of the expected amount of data handled in the project (see the sketch after this list).
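The arithmetic behind these assumptions can be reproduced with a short calculation. The sketch below derives the per-experiment publication rate and the project-wide data estimate from the topic mix above; the number of brokers and the replication factor are illustrative assumptions, not values fixed by the platform design.

```python
# Sketch of the arithmetic behind the system assumptions in Sect. 5.1.

# Topic mix per experiment: 20 topics split 40%/40%/10%/10% across message sizes.
traffic_mix = [
    # (message size [bytes], publication rate [messages/s], topics per experiment)
    (100,       1000, 8),   # 100 B data traffic
    (1_000,     1000, 8),   # 1 KB data traffic
    (100_000,     10, 2),   # 100 KB multimedia traffic
    (1_000_000,    1, 2),   # 1 MB multimedia traffic
]

# Per-experiment publication rate.
bytes_per_s = sum(size * rate * topics for size, rate, topics in traffic_mix)
print(f"Per-experiment rate: {bytes_per_s * 8 / 1e6:.1f} Mbps")   # ~102.4 Mbps

# Total data retained by the platform: 6 simultaneous experiments over 2 weeks.
experiments = 6
retention_s = 14 * 24 * 3600
total_bytes = bytes_per_s * experiments * retention_s
print(f"Retained data: {total_bytes / 1e12:.1f} TB")              # slightly below 100 TB

# Per-broker disk estimate D = s * r * t * f / b, here aggregated over the mix,
# with illustrative values b = 3 brokers and f = b - 1 replicas.
brokers, replication = 3, 2
print(f"Disk per broker: {total_bytes * replication / brokers / 1e12:.1f} TB")
```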
Testbed setup
The testbed used for the evaluation of the architecture consists of a set of Ubuntu Server 16.04 virtual machines (VMs) [32] deployed in a server located in the 5G EVE Spanish site facility, 5TONIC, using Proxmox [33] as the virtualization environment and K3s (a lightweight Kubernetes distribution) [34] to orchestrate the containerized componentsFootnote 3 deployed in each VM. This server is equipped with 40 Intel(R) Xeon(R) E5-2630 v4 CPUs at 2.20 GHz and 128 GB of RAM. The distribution of components in each VM can be seen in Fig. 4.
The proposed scenario intends to simulate the monitoring and data collection process of the metrics and KPIs related to a set of network deployments. The components deployed in each VM are the following:
- Kubernetes Worker node VMs: each Kubernetes Worker emulates a site, including Data Shippers that publish monitoring data in a Site Broker, based on Apache Kafka, which replicates the data toward the Data Collection Manager placed in the Kubernetes Master node. The Data Shipper role is played by two components:
  - Sangrenel [36], a Kafka cluster load testing tool that allows configuring parameters such as the message/batch sizing and other settings. It writes messages to a specific topic and dumps, every second, the input message rate (used for calculating the input/output (I/O) message rate, i.e., the received throughput divided by the publication rate) and the batch write latency (i.e., the time spent until an ACK message is received from the broker), which are some of the performance parameters under study.
  - A Python-based Timestamp generator [37], used exclusively in multi-broker experiments. It sends messages with embedded timestamps that are eventually received by a Latency calculator component, based on Node.jsFootnote 4 [39], which takes the timestamps and calculates the so-called broker latency, i.e., the time spent between the publication of data and its reception by an entity subscribed to the Site Broker (a simplified sketch of this mechanism is given after this list). In fact, this component can be associated with the KPI Validation Framework tools, as it calculates the latency (KPI) based on timestamps (metric).
- Kubernetes Master node VM: in this VM, the Data Collection Manager, Data Collection and Storage, and Data Visualization components from Fig. 2 have been implemented with a solution based on Apache Kafka and the Elastic Stack. A ZooKeeper [40] instance is also running to coordinate the Kafka cluster, and another instance of the Latency calculator is deployed here to calculate the end-to-end latency KPI, i.e., the time spent between the publication of data in a given site and its reception by an entity subscribed to the Data Collection Manager (hence after the data have been replicated from the Site Broker).
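To make the timestamp-based latency calculation more concrete, the following is a minimal sketch of the mechanism in Python using the kafka-python library; the actual Latency calculator is Node.js-based, and the topic name and broker address are illustrative assumptions. Note that this measurement also assumes that the clocks of the involved nodes are synchronized (e.g., via NTP).

```python
# Minimal sketch of the timestamp-based latency measurement (illustrative only;
# the real Latency calculator is implemented in Node.js).
import json
import time

from kafka import KafkaConsumer, KafkaProducer

TOPIC = "latency-probe"           # hypothetical topic name
SITE_BROKER = "site-broker:9092"  # hypothetical Site Broker address

def publish_timestamps(n: int = 100, period_s: float = 1.0) -> None:
    """Timestamp generator: publish messages carrying their emission time."""
    producer = KafkaProducer(
        bootstrap_servers=SITE_BROKER,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for _ in range(n):
        producer.send(TOPIC, {"sent_at": time.time()})
        time.sleep(period_s)
    producer.flush()

def latency_calculator() -> None:
    """Subscribe to the broker and compute the latency KPI from the timestamps.

    Pointing the consumer at the Site Broker yields the broker latency; pointing
    it at the Data Collection Manager yields the end-to-end latency.
    """
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=SITE_BROKER,
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for message in consumer:
        latency_ms = (time.time() - message.value["sent_at"]) * 1000
        print(f"latency: {latency_ms:.1f} ms")
```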
For monitoring the resource consumption of each container (focusing on the CPU consumption), Docker [41] native tools (e.g., docker stats) have been used.
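As an illustration of how these per-container CPU figures can also be gathered programmatically, the sketch below uses the Docker SDK for Python to compute the same CPU percentage reported by docker stats; the container name is a hypothetical example.

```python
# Sketch: compute a container's CPU usage (%) as reported by `docker stats`,
# using the Docker SDK for Python (pip install docker).
import docker

def cpu_percent(container_name: str = "kafka") -> float:  # hypothetical name
    client = docker.from_env()
    stats = client.containers.get(container_name).stats(stream=False)

    # Deltas between the current and the previous sampling point.
    precpu = stats["precpu_stats"]
    cpu_delta = (stats["cpu_stats"]["cpu_usage"]["total_usage"]
                 - precpu.get("cpu_usage", {}).get("total_usage", 0))
    system_delta = (stats["cpu_stats"]["system_cpu_usage"]
                    - precpu.get("system_cpu_usage", 0))
    online_cpus = stats["cpu_stats"].get("online_cpus", 1)

    # Same convention as `docker stats`: >100% means more than one vCPU in use.
    return (cpu_delta / system_delta) * online_cpus * 100 if system_delta else 0.0

if __name__ == "__main__":
    print(f"{cpu_percent():.1f}%")
```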
Single-broker experiments
For these experiments, only one Kafka broker is required, so the testbed depicted in Fig. 4 can be simplified to a single Kubernetes Worker node with just a Sangrenel container directly connected to that Kafka broker, represented by the dark blue line that connects both components in the testbed diagram.
Experiments with one topic
To start with the performance analysis of the monitoring platform, experiments with only one topic were performed, checking that the system operates correctly and consistently for each message size and publication rate proposed in Sect. 5.1 without resource limits, and also with the objective of defining the minimum set of computing resources (RAM and vCPU) for the most critical components of the architecture.
In this set of tests, some of the assumptions from the system characterization were confirmed, e.g., the poor results for multimedia traffic when its publication rate is 1000 \(messages/s\), where the I/O message rate falls from 1 (obtained when the reduced publication rate is used) to 1/4 in the best-case scenario, or that the optimal value for the message batch size parameter is 1 for all types of traffic, as increasing its order of magnitude causes the same increase in the order of magnitude of the latency. For example, for a 100 B message size, the batch write latency goes from 0.8 ms with a message batch size of 1 to 500 ms when the message batch size is 1000.
Apart from that, it was also observed that the resource consumption of the most critical components of the platform, which are Kafka, Logstash and Elasticsearch, is CPU intensive, with the RAM working as buffer and cache before persisting data to disk. This fact facilitates the sizing of these components, as the RAM can be fixed to a specific value (in this case, 2 GB of RAM is enough to work properly during the testing process), whereas the CPU is the only variable term.
In terms of CPU, for a single-topic experiment, Logstash is the most critical component, with a consumption that ranges from 100 to 200%, requiring 4 vCPUs in order not to degrade the performance. In contrast, the CPU consumption of Kafka and Elasticsearch stays below 100% for all types of traffic, so 1 vCPU for each of them should be enough to cover single-topic experiments. However, in multi-topic experiments, which are studied next, Kafka becomes the most critical component, with a noticeable increase in its CPU consumption, whereas Logstash and Elasticsearch approximately maintain the same consumption profile.
Experiments with multiple topics
In multi-topic experiments, the distribution of performance parameter values between topics of the same type (i.e., topics that handle the same type of data, message size and publication rate) in a given experiment is expected to be uniform under general conditions, where no topic has higher priority than the others.
This assumption is confirmed in Fig. 5 for the batch write latency analysis in one experiment with multiple topics, according to the per-network deployment topic distribution described in Sect. 5.1. As a result, this confirmed assumption is used in subsequent tests to accumulate and average the values obtained from performance parameters in topics of the same type, as if they were a single topic, which simplifies the performance analysis (see the sketch below). Moreover, in Fig. 5, it can also be observed that latency is higher for larger message traffic, and the deviation of the results also increases, as seen in the wider 95% confidence interval estimated for multimedia traffic, for example. This reflects that smaller messages result in better and more precise latency values.
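The aggregation applied from here on is straightforward: per-topic samples of the same traffic type are pooled and summarized by their mean and a 95% confidence interval, as sketched below; the sample values are placeholders, not measured data.

```python
# Sketch: aggregate per-topic latency samples of the same traffic type and
# report their mean and 95% confidence interval (placeholder values only).
import math
from statistics import mean, stdev

def aggregate(samples_per_topic: list[list[float]]) -> tuple[float, float]:
    """Pool the samples from topics of the same type as if they were one topic."""
    pooled = [value for topic_samples in samples_per_topic for value in topic_samples]
    # Normal approximation of the 95% confidence interval half-width.
    half_width = 1.96 * stdev(pooled) / math.sqrt(len(pooled))
    return mean(pooled), half_width

# Example with hypothetical batch write latencies (ms) from three 100 B topics.
latencies = [[0.80, 0.90, 0.85], [0.82, 0.88, 0.91], [0.79, 0.86, 0.90]]
m, hw = aggregate(latencies)
print(f"batch write latency: {m:.2f} ms +/- {hw:.2f} ms (95% CI)")
```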
The remaining tests related to multi-topic experiments aim at evaluating two design parameters that cause variations in the monitoring platform’s workload: (1) the number of topics created and running in the system as concurrent processes, due to the execution of simultaneous deployments, and (2) the total throughput received by the monitoring system, calculated as the sum of the input message rates received from each topic.
However, a variation in either of these design parameters may cause different effects in the system in terms of CPU consumption or performance that must be characterized, also checking whether the superposition property applies when both parameters are modified simultaneously. To do this, the study was divided into two parts:
1. A first analysis where one of the design parameters is modified, while the other one stays fixed.
2. A final test including the modification of both parameters at the same time, checking whether the superposition of individual effects is present.
Part (1) is presented in Fig. 6, where the CPU consumption and the batch write latency related to 100 B aggregated data trafficFootnote 5 are evaluated for different examples of experiments:
- On the left side, the number of experiments is fixed to 1, whereas the total throughput is modified, using the theoretical input message rate as the upper limit (i.e., 102.4 Mbps) and dividing it by values between 1 and 6.
- On the right side, the number of experiments is variable, ranging from 1 to 6, but the total throughput for all deployments is kept constant, which is achieved by dividing the aforementioned message rate by the number of experiments deployed.
In both cases, it is observed that the batch write latency does not vary when one of the design parameters is modified, and the same holds for the I/O message rate, which tends to 1. However, in the first case, as the total throughput becomes higher, the Kafka CPU consumption increases with a trend that seems exponential, whereas in the second case the CPU consumption also remains constant on average.
As a result, while the total throughput affects the Kafka CPU consumption with an exponential tendency, the number of network deployments (i.e., the number of topics in the system) does not seem to influence the system performance, as long as the total throughput is preserved when the number of topics increases, taking care to specify the publication rate correctly so as not to exceed the system limits. However, this only holds while the system is not saturated. When saturation occurs, the effect is similar to the one shown in Fig. 7, related to part (2) of the aforementioned study.
Here, when the number of network deployments increases, the total throughput also increases, and in fact message loss appears from two deployed experiments onwards, as the I/O message rate is nearly 0.8 (so 20% of the messages are lost), falling to less than 0.4 when four experiments are deployed simultaneously, a value that remains constant even if more experiments are deployed (these experiments have not been included in Fig. 7 in order to present the saturation process in more detail).
The growth of the Kafka CPU consumption also stops due to this saturation state, and the latency starts to present variations, as it is calculated based on the messages that are eventually received.
In fact, these results are quite aligned with the outcomes of [42], where it was reported that Kafka throughput depends linearly on the number of topics, reaching a hard limit at some specific point. According to that study, when there is only one Kafka replica, the limit is reached at around 15,000–20,000 \(messages/s\), a value close to these results, as one experiment in our testbed implies around 16,000 \(messages/s\) and a second deployment causes a loss of performance, since that limit, which in our case should lie between 16,000 and 32,000 messages per second, is exceeded.
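This comparison can be reproduced directly from the per-experiment topic mix of Sect. 5.1, as in the short check below.

```python
# Sketch: per-experiment message rate, compared against the per-replica Kafka
# hard limit (15,000-20,000 messages/s) reported in [42].
topic_mix = {
    "100B":  (8, 1000),  # (topics per experiment, messages/s per topic)
    "1KB":   (8, 1000),
    "100KB": (2, 10),
    "1MB":   (2, 1),
}

per_experiment = sum(topics * rate for topics, rate in topic_mix.values())
print(per_experiment)      # ~16,000 messages/s: one experiment sits near the limit
print(2 * per_experiment)  # ~32,000 messages/s: a second experiment exceeds it
```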
This resource saturation effect must also be taken into account when introducing these CPU-bound components in Edge environments, where the physical and virtual resources allocated to execute these workloads are quite limited. Thus, apart from the theoretical limit imposed by the technology itself, the amount of resources can also have an impact on performance if the platform is sized wrongly, provoking a loss of performance even before reaching the hard limit.
To reflect the impact on performance caused by the limitation on computing resources (i.e., vCPU allocation in the Kafka container), Fig. 8 presents the evaluation of both the batch write latency (top subplots) and the I/O message rate (bottom subplots), for 100 B data traffic, in two situations:
- First of all, assuming that a full experiment is being executed in the platform (i.e., a total throughput of 102.4 Mbps is received by Kafka), the number of vCPUs assigned to the Kafka container was varied from 1 to 6 (the two graphs on the left in Fig. 8), checking that, from 5 vCPUs onwards, the values obtained for the performance parameters become reasonably good and stable.
- However, in a scenario where the Site Broker is placed at the Edge, a high resource allocation cannot be guaranteed. For this reason, a new set of tests was carried out in which the allocation was fixed to 1 vCPU while varying the throughput received by Kafka (the two graphs on the right in Fig. 8). The values used for the throughput range between 100% and 10% of the throughput related to one experiment (i.e., 102.4 Mbps). The results reflect that, although the latency does not improve when a lower throughput is received, this is not the case for the I/O message rate, which improves every time the throughput is reduced, reaching a value of 1 when the throughput is reduced to 10%.
Consequently, to move to an Edge environment, it is crucial to limit not only the resource allocation but also the throughput received by the monitoring platform, in order to avoid packet loss. This should not be a problem in Edge environments, assuming that most use cases deployed in this kind of scenario will prioritize the ability to support a large number of connections over guaranteeing a certain value of latency or bandwidth, as happens in IoT, for example. Therefore, the higher latency values, compared to the ideal scenario in which there are no resource consumption problems (approximately 70 ms vs. 10 ms), should not be a problem as long as the throughput is kept at a reasonable value. In this case, this limit can be set to 10 Mbps.
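Such a cap can be enforced on the publisher side. The sketch below shows a minimal token-bucket throttle limited to 10 Mbps, the value stated above; the class itself is an illustrative assumption and not part of the 5G EVE implementation.

```python
# Sketch: token-bucket throttle that keeps a publisher below a throughput cap
# (e.g., 10 Mbps toward an Edge-hosted Site Broker). Illustrative only.
import time

class ThroughputLimiter:
    def __init__(self, limit_bps: float = 10e6):
        self.rate = limit_bps / 8   # budget in bytes per second
        self.tokens = self.rate     # start with one second of budget
        self.last = time.monotonic()

    def acquire(self, message_bytes: int) -> None:
        """Block until sending `message_bytes` more bytes stays within the cap."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= message_bytes:
                self.tokens -= message_bytes
                return
            time.sleep((message_bytes - self.tokens) / self.rate)

# Usage: call limiter.acquire(len(payload)) before each producer.send(...).
limiter = ThroughputLimiter(limit_bps=10e6)
```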
System scalability validation
To avoid the saturation effect presented in Figs. 7 and 8, the direct solution is to build mechanisms and processes that allow system scalability, mainly oriented to the application of horizontal and/or vertical scaling processes depending on the current status of the platform.
For this evaluation process, a preliminary vertical scaling system for the central component of this monitoring platform is proposed (i.e., no new instances are added, but the computing resources attached to the available instance are increased or decreased depending on the workload). It is based on the results obtained in the previous tests, used as training data to refine the different cases that can occur in terms of resource consumption (mainly related to CPU) and performance (mainly based on the batch write latency and the I/O message rate), and the conditions related to each case that trigger the scaling process.
Figure 9 presents an example of vertical scaling for one experiment deployed in the platform. In this case, the Kafka container is scaled by increasing its vCPU assignment until the system is able to handle the received workload without saturating, a decision that depends on different parameters, such as the current CPU consumption, the delay to compute a KPI or some other performance variable.
Note that, in this case, for illustrative purposes, an upscale is only triggered when a vCPU is fully occupied for a relatively long period of time, resulting in a relatively high convergence time (around one minute) for the I/O message rate, but more “agile” schemes could easily be implemented if needed.
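The scaling rule can be summarized by a simple control loop like the one sketched below, which samples the container CPU (for instance, with the cpu_percent helper sketched earlier) and grants one extra vCPU when it has been saturated for a sustained period; the thresholds, the sampling period and the container name are illustrative assumptions, not the exact values behind Fig. 9.

```python
# Sketch of the vertical scaling loop: add one vCPU to the Kafka container when
# its CPU has been (almost) fully occupied for a sustained period.
import time
from typing import Callable

import docker

CONTAINER = "kafka"          # hypothetical container name
PERIOD_S = 5                 # sampling period
SUSTAINED_SAMPLES = 12       # ~1 minute of sustained saturation
SATURATION_PCT = 95.0        # % of each currently allocated vCPU

def scale_up(container, current_vcpus: int) -> int:
    """Grant one extra vCPU by enlarging the container's CFS quota."""
    new_vcpus = current_vcpus + 1
    container.update(cpu_period=100_000, cpu_quota=new_vcpus * 100_000)
    return new_vcpus

def control_loop(sample_cpu: Callable[[], float], vcpus: int = 1) -> None:
    container = docker.from_env().containers.get(CONTAINER)
    saturated = 0
    while True:
        # sample_cpu() returns the docker-stats-style percentage (100% per vCPU).
        saturated = saturated + 1 if sample_cpu() >= SATURATION_PCT * vcpus else 0
        if saturated >= SUSTAINED_SAMPLES:
            vcpus = scale_up(container, vcpus)
            saturated = 0
        time.sleep(PERIOD_S)

# Usage (reusing the earlier CPU sampling sketch):
# control_loop(lambda: cpu_percent(CONTAINER))
```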
Multi-broker experiments
Finally, the scalability of the fully distributed, multi-site platform, as built in the testbed already presented in Fig. 4, will be evaluated in terms of the performance parameters presented in Sect. 5.2 and the CPU consumption of the Data Collection Manager’s Kafka broker, whose computing resources will not be limited. On the other hand, the Site Brokers will be limited to 1 vCPU, the value already used in the tests presented in Fig. 8.
In this case, the meaning of an experiment will be slightly different: each experiment instantiated in the multi-broker tests will be executed in a particular Kubernetes Worker node (so, for six experiments, six Kubernetes Worker nodes will be required), sending monitoring data to the corresponding Site Broker at 10% of the total throughput calculated in Sect. 5.1 (i.e., 10.24 Mbps), which is the throughput hard limit to avoid saturation, as shown in Fig. 8.
Impact on latency
The first performance parameter to be evaluated is the latency, in the different variants defined in Sect. 5.2: the batch write latency, the broker latency and the end-to-end latency. The values obtained during the execution of one to six experiments, for 100 B data traffic, can be seen in Fig. 10. Here, an effect similar to the one obtained in Fig. 5 can be observed: the results obtained in each site are similar for each case, so that performance data can also be aggregated in subsequent analyses.
Moreover, the same tendency in latency values as observed in Fig. 8 can also be seen here: the latency does not vary even though the total throughput received by the monitoring platform increases due to the creation of new experiments.
Furthermore, the resultsFootnote 6 obtained for each type of latency are consistent with their definitions: it is expected that the batch write latency (the darker color for each case) gives the lowest value (approx. 70–80 ms), as it only implies the reception of the ACK from the Site Broker. The next one is the broker latency (the color of “intermediate” darkness in the graph), in which the Site Broker also has to deliver the data to a subscriber, but it can be checked that this does not have a great impact on latency, as it increases to nearly 120 ms in the worst case. Finally, the highest latency value (approx. 2.5–2.6 s) is obtained for the end-to-end latency (the lighter color in the graph), due to the replication operation performed between each Site Broker and the Data Collection Manager and also the delivery to the corresponding subscriber. This value is acceptable in Edge environments for the aforementioned reasons.
Impact on CPU consumption and packet loss ratio
Finally, the impact on the I/O message rate in the multi-broker experiments is the same as that experienced in single-broker experiments with CPU limitation (reflected in Fig. 8), where the packet loss increases with the total throughput received in the platform. This effect can be seen in Fig. 11, where the performance results from different brokers have been aggregated based on the results obtained in Sect. 5.4.1.
It can be observed that the I/O message rate falls to nearly 0.65 when the six experiments are executed concurrently, meaning a total received throughput of around 60 Mbps. This result, compared to the case observed in Fig. 8 with a single broker with 1 vCPU receiving 65.54 Mbps (where the I/O message rate was less than 0.3), implies that distributing the total throughput among several Site Brokers improves the results.
Moreover, the CPU consumption of the Data Collection Manager’s Kafka broker also increases with each experiment, but at a lower rate, reaching 110% of vCPU consumption for six experiments. Consequently, although the core of the monitoring platform is intended to be executed in environments without computing resource limits, this final result may allow the deployment of some components of this core (e.g., the Data Collection Manager) at the Edge, as long as the total throughput, again, does not exceed the specific limit that causes saturation (60 Mbps in this case).