Deployment and management of SDR cloud computing resources: problem definition and fundamental limits

Software-defined radio (SDR) describes radio transceivers implemented in software that executes on general-purpose hardware. SDR combined with cloud computing technology will reshape the wireless access infrastructure, enabling computing resource sharing and centralized digital-signal processing (DSP). SDR clouds have different constraints than general-purpose grids or clouds: real-time response to user session requests and real-time execution of the corresponding DSP chains. This article addresses the SDR cloud computing resource management problem. We show that the maximum traffic load that a single resource allocator (RA) can handle is limited. It is a function of the RA complexity and the call setup delay and user blocking probability constraints. We derive the RA capacity analytically and provide numerical examples. The analysis demonstrates the fundamental tradeoffs between short call setup delays (few processors) and low blocking probability (many processors). The simulation results demonstrate the feasibility of a distributed resource management and the necessity of adapting the processor assignment to RAs according to the given traffic load distribution. These results provide new insights and guidelines for designing data centers and distributed resource management methods for SDR clouds.


Introduction
Wireless communications technology continuously improves and already facilitates the provisioning of a wide variety of advanced communications services at competitive prices. Whereas current systems provide data rates of a few mega-bits per second (Mbps), 4G systems will offer up to 100 Mbps per user. A few seconds may be necessary today before a connection is established between the user equipment and the network. Long term evolution (LTE) and LTE-Advanced (LTE-A) promise connection establishment times of less than 50 and 10 ms, respectively, [1,2].
Base stations are the wireless access points of cellular communications systems. They comprise antennas and analog and digital signal processing resources for implementing radio transmitters and receivers. The network operator deploys base station resources, that is, wireless transceivers, as a function of the expected peak load.
http://jwcn.eurasipjournals.com/content/2013/1/59
SDR describes wireless transceivers that implement a significant part of the physical layer digital signal processing (DSP) in software that executes on general-purpose hardware [5]. SDR applications or waveforms define the transceiver functionality of future radio equipment. This facilitates dynamic reconfigurations of radio transceivers, changing their transmission modes through changes in the software.
Cloud computing provides IT services to clients without reference to the infrastructure that hosts the services [6]. The cloud is a generic platform for different business types, from small-scale to very-large scale. The upfront cost is minimal as the infrastructure is provided by the cloud operator, who rents resources to cloud clients. The elasticity of clouds permits business growth without long-term planning and resource preallocations. A pay-per-use business model on top of a virtualized computing resource pool enables resource sharing and on-demand resource provisioning. Computing resources (hardware and software) can then be dynamically allocated and efficiently used, ensuring faster amortization (CAPEX) and better scalability as well as savings in power consumption, security, maintenance and software licensing, among others (OPEX).
The SDR cloud provides essentially the same benefits as a general purpose cloud. It inherits the resource-as-a-service and pay-per-use business concepts: computing power (infrastructure as a service, IaaS), system software (platform as a service, PaaS), and applications (software as a service, SaaS) will be provided on demand and without knowledge of the physical location and types of CPUs, discs, software repositories, and so forth.
A single data center is shared between several radio operators and thousands of end users. (Some 100,000 user sessions may be active at the same time in a city of one million or more inhabitants.) Virtualization is employed for ensuring secure and fair resource sharing, where one radio operator, the SDR cloud client, is not aware of others using the same physical machines. Different business models or agreements are possible. A minimum set of resources may be guaranteed to each radio operator, for instance. The remaining or unused resources can then be shared, fairly or competitively, as a function of the market, environment, or policy, among others. This requires a flexible yet efficient computing resource management framework as a basis for the SDR cloud business. Such a framework, in other words, plays an essential role in the deployment and operation of SDR clouds. In particular, it needs to ensure real-time resource allocation and execution in dynamic environments with different resource and service constraints. This is the topic of this article.
This article elaborates a relation between the wireless communications system requirements or constraints and the SDR cloud computing resource management before deriving optimal solutions for the high-level resource provisioning. Each service request requires loading the corresponding transceiver waveform. Real-time resource provisioning and hard real-time execution need to be ensured for seamless service provisioning. The SDR cloud resource allocator (RA) will therefore determine the mapping of waveforms to the available computing resources on demand and under stringent timing and resource constraints. We show that the maximum traffic load that a single RA can handle is limited. It is a function of the complexity of the resource allocation algorithm, the call setup delay, and the user rejection or blocking probability. The radio access technology specifies the maximum call setup delay, whereas the radio operator determines a blocking probability target. We introduce a general execution time model for characterizing the complexity of different resource allocation algorithms and derive expressions for the average call setup delay and maximum traffic load. The results show that SDR cloud data centers can be efficiently managed in a distributed way. They provide guidelines for designing data centers and distributed resource management methods for SDR clouds.
The rest of the article is organized as follows: After providing some background on computing resource management methods and algorithms (Section 2), we identify the problem (Section 3) and elaborate an RA complexity model (Section 4). In the central part of the article, we define and solve an optimization problem for assigning computing resources to an RA as a function of the environmental parameters (Section 5). We finally apply our solution for managing the resources associated with a single radio cell (Section 6) and multiple cells (Sections 7 and 8) under different wireless communications traffic characteristics.

Background
Massively parallel computing architectures will dominate the high-performance computing landscape. A platform with a large number of parallel processors is more suitable for executing many applications than a single powerful processor [7]. The high and heterogeneous computing demands of SDR applications, in particular, are executed more efficiently on a multiprocessing execution environment [8,9]. Empirical studies have shown that scheduling hard real-time tasks on many-core processors is challenging [10,11]. Sophisticated resource allocation algorithms are consequently necessary for managing the real-time computing demands and the limited computing resources.
Distributed computing has a long research record. The multiprocessor mapping and scheduling problem, in particular, has been extensively investigated in the heterogeneous computing context [12-14]. Heterogeneous computing refers to a coordinated use of distributed and heterogeneous computing resources [15]. It is similar to grid computing [16] or metacomputing [17].
It is well known that the computing resource allocation problem is NP-complete, in general [18]. Heuristic approaches were therefore proposed, presenting a polynomial relation between the problem size and the computing complexity. Grid or cloud computing RAs dispatch computing jobs or independent tasks for their distributed execution. Grid computing workloads exhibit little intra-job parallelism, the average job completion time is several hours, and typical job inter-arrival times are in the order of seconds or minutes [19]. Many grid or cloud workloads are data-intensive [20].
Grids and clouds are accessed via the internet, which is relatively slow and has unpredictable delays. They were originally built for providing very high computing power for scientific or popular applications with no stringent real-time constraints. Rather than ensuring real-time allocation and execution, grid or cloud RAs therefore follow other objectives. Doulamis et al. [21], for example, discuss the fair sharing of CPU rates and allocate resources to users as a function of resource availabilities, user demands, and socio-economic values. Lui et al. [22] focus on the joint resource allocation of computing and network resources in federated computing and network systems. They present various resource allocation schemes that can provide performance and reliability guarantees for modern distributed computing applications. Entezari-Maleki and Movaghar [23] develop a probabilistic task scheduling method for minimizing the mean response time of grid jobs.
The SDR cloud concept has been recently introduced [4] and merges three fundamental technologies: centralized baseband processing, automatic computing resource allocation and virtualization. Related work addresses centralized baseband processing [24,25] and offline, that is, design-time resource allocation [26]. We focus on automatic computing resource allocation, enabling runtime resource management and seamless real-time execution. Each wireless communications service request needs to be served in real time, providing sufficient computing resources for the continuous real-time data processing. Two general approaches exist for scheduling real-time tasks on multiprocessor platforms. Tasks can be statically assigned prior to execution or migrate between processors during execution. The former can be achieved through partitioned scheduling, where an application is partitioned among the processing elements (mapping) before being locally scheduled. The latter approach is typically associated with global or dynamic scheduling. The contention for the global scheduling queue and non-negligible migration overheads among processing elements can result in significant scheduling overheads in practice [10]. The migration cost limits the number of cores that a global scheduler can manage [10,11]. Non-preemptive static partitioned scheduling, on the other hand, is pertinent to high performance many-core and multiprocessor platforms. It facilitates implementation and introduces low run-time resource overheads [27].
A constant execution period and practically deterministic and regular execution patterns characterize SDR applications. The real-time constraints of the DSP processing chains can then be given as minimum throughput and maximum latency constraints and static schedulers can be employed [8]. The mapping and scheduling can thus be calculated only once for each waveform as part of the session establishment process. The SDR cloud resource management performance is then limited by the RA's execution time per invocation (user session request) and the session arrival rate. The derivation of this limit is the objective of this article.

Problem formulation
Wireless subscribers access communications services anywhere, anytime, and under different circumstances. Measurements have shown that the average user establishes seven or eight voice sessions per day with a mean duration of 90 s [28]. Data users realize a larger number of shorter sessions. The number of concurrent sessions in a large city may range between 10,000 and 120,000 as a function of place and time.
The SDR cloud RA needs to be able to handle the spatial and temporal variety in the traffic load. A single data center ideally executes all waveforms and centrally manages all session requests. The corresponding RA then needs to be able to dispatch thousands of requests per second.
Modern wireless communications standards, however, impose restrictions on the maximum session establishment time $t_s^{\max}$. The call setup delay $t_s$ is the transition time from a dormant (camping or idle [1]) state to the transmission or reception state. Each session establishment here consists of allocating sufficient computing resources to the corresponding transceiver waveform. The shorter the call setup time, the better the always-connected illusion. LTE-A therefore establishes 10 ms as the target call setup delay. Wireless operators moreover define a maximum blocking probability target $p_b^{\max}$, which should be satisfied in the mean. The blocking probability $p_b$ denotes the probability of a user session request being rejected due to insufficient computing resources. Wireless communications systems need to be dimensioned accordingly.
The session establishment time and blocking constraints determine the RA capacity in terms of manageable users. The number of users that can be concurrently served is directly proportional to the available processing resources. More processors ensure a lower $p_b$, whereas fewer processors ensure a shorter $t_s$. The objective of this article is analyzing the relation between the RA capacity and the call setup time and blocking probability. We identify fundamental SDR cloud management limits and indicate possible SDR cloud data center design and management solutions. Table 1 summarizes the parameters that are used throughout the paper.

RA complexity model
The algorithmic complexity of any RA is a function of the number of tasks m and the number of processing elements n. We model it as
$$t_{RA}(n, m) = F \cdot n^{\alpha} \cdot m^{\beta}. \tag{1}$$
Parameters α and β specify the complexity order of an RA. The same expression also serves as a general execution time model of an RA implementation. Parameters F, α, and β can be found by measuring the RA execution time for different n and m and then performing model fitting. Although other models may be more accurate for certain RA algorithms, (1) is simple and general.
Without loss of generality, we suggest a simple and well-known algorithm for providing numerical examples for the analysis performed in this article. The g-mapping, or greedy mapping [8], is a baseline mapping algorithm. It maps one process after another, choosing the processor that leads to the minimum mapping cost. Cost metrics are therefore computed based on a suitable cost function. The cost function we suggest manages the limited processing and interprocessor bandwidth resources and, accordingly, distributes the processing load while minimizing the data flows between processors [8].
The algorithm is implemented in C and available as open source code [29]. Measuring the execution time of our implementation as a function of n and m and performing non-linear least-squares model fitting leads to $F = 3.98 \cdot 10^{-9}$, α = 2.94 and β = 1.04. We can thus approximate the execution time model of the g-mapping algorithm as
$$t_{RA}(n, m) \approx 3.98 \cdot 10^{-9} \cdot n^{3} \cdot m. \tag{2}$$
The g-mapping execution time thus increases linearly with the number of waveform modules m and with the number of processors n cubed.
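The fitted model can be evaluated directly to get a feel for how quickly the allocation time grows with the number of processors. The following is a minimal sketch that uses the fitted values quoted above; the function name `t_ra` is hypothetical, and treating the result as seconds is an assumption, since the unit of F is not stated in this section.

```python
# Fitted g-mapping parameters quoted in the text. Interpreting the model
# output as seconds is an assumption of this sketch.
F, ALPHA, BETA = 3.98e-9, 2.94, 1.04


def t_ra(n: int, m: int = 10) -> float:
    """Execution time model t_RA(n, m) = F * n**alpha * m**beta."""
    return F * n**ALPHA * m**BETA


if __name__ == "__main__":
    for n in (25, 50, 100, 200):
        print(f"n={n:4d}  t_RA={t_ra(n) * 1e3:9.3f} ms")
    # Doubling n multiplies the time by 2**alpha, i.e. close to cubic growth.
    print("growth factor when doubling n:", t_ra(200) / t_ra(100))
```

Because α is close to 3, doubling the number of managed processors raises the modeled allocation time by nearly a factor of eight, which is why the delay constraint caps the number of processors per RA.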

Resource provisioning
Throughout this section, we will use the previously derived execution time model (2). The analysis is also valid for other RA complexity models provided that the complexity increases with the number of processors.

Optimization problem
We analyze the relation between the call setup delay, the blocking probability, and the RA capacity. To this aim, we derive the optimal number of resources for processing user signals as a function of the environmental conditions and constraints. The wireless communications traffic model is a stochastic birth-death process. Session requests arrive according to a Poisson process, so the time between consecutive session establishments follows an exponential distribution with a mean of 1/λ. That is, λ corresponds to the average number of new user session requests per second. The session duration follows an exponential distribution, where 1/μ corresponds to the average session duration in seconds. The traffic load is then ρ = λ/μ Erlangs.
We assume that a single RA needs to handle ρ Erlangs of traffic. The objective is then determining the optimal number of processors n that satisfies the system constraints $t_s^{\max}$ and $p_b^{\max}$. This value is obtained as the solution to an optimization problem maximizing the objective function f(n) of (3), where U(n) is the average number of users that can be served with n processors. Function f(n) weighs the benefit (average number of served users) against the cost (allocated resources per Erlang). Parameter θ weights the importance of one term with respect to the other. Equation (3) allows minimizing the number of allocated resources n (θ = 1) or maximizing the average number of served users U(n) (θ = 0). Applying Little's law, we can express U(n) as
$$U(n) = \rho \, \bigl(1 - p_b(n)\bigr). \tag{4}$$
The optimization problem can then be formulated as follows:
$$\max_{n \in \mathbb{N}} f(n) \quad \text{subject to} \quad t_s(n) \le t_s^{\max}, \quad p_b(n) \le p_b^{\max}. \tag{5}$$
Before solving this problem, we first need to model the call setup delay $t_s(n)$ and blocking probability $p_b(n)$ constraints.

Constraints
The session establishment process can be modeled as a double-queuing process: New users enter an infinite queue whose service time is the execution time of the RA, that is, $t_{RA}(n, m)$. They leave this allocation queue and enter a second multi-server queue of size c. The service time of the active sessions queue is exponentially distributed with an average of 1/μ, which corresponds to the average session duration.
This model can be represented by a two-dimensional state transition diagram, where state probability $p_{i,j}$ indicates the probability that there are i users waiting in the allocation queue while j users have active sessions. The model can be simplified if we consider that the mapping time is much shorter than the average session duration, that is, $t_{RA}(n, m) \ll 1/\mu$. This allows separating the two queues. Following Kendall's representation, we model the allocation queue as an infinite length M/D/1 queue and the active sessions queue as a blocking and finite-size M/M/c/c queue with no wait states. For simplifying the mathematical analysis, here we consider waveforms of m = 10 tasks and analyze $t_{RA}$ as a function of n.

Call setup delay
User session requests are random and independent from one another. The random session requests lead to random session establishment times. The call setup delay constraint (5) will thus be satisfied on average despite the deterministic mapping time $t_{RA}(n)$. Applying the Poisson arrivals see time averages (PASTA [30]) property, we know that $t_s(n)$ follows a Poisson distribution. According to the M/D/1 model, the average call setup delay then becomes
$$t_s(n) = t_{RA}(n) \, \frac{2 - \lambda t_{RA}(n)}{2 \, \bigl(1 - \lambda t_{RA}(n)\bigr)}. \tag{6}$$
This function is monotonically increasing with n (Figure 3) for $\lambda t_{RA}(n) < 1$. The system becomes unstable and the average waiting time infinite beyond that point. Figure 3 shows that the call setup delay limits the maximum number of processors that can be managed. For $t_s(n)$ = 100 ms and λ = 10 user arrivals per second, for example, up to 150 processors can be managed with the g-mapping RA, but fewer than 100 processors for λ = 100. Figure 3, moreover, shows that a low $t_s(n)$ significantly limits the RA capacity.
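The delay model can be explored numerically. The sketch below assumes the standard M/D/1 mean sojourn time (mean waiting time plus the deterministic service time) and the fitted g-mapping parameters, with the model output taken to be in seconds; the function names are hypothetical.

```python
F, ALPHA, BETA = 3.98e-9, 2.94, 1.04  # fitted values from the text; seconds assumed


def t_ra(n, m=10):
    """RA execution time model t_RA(n, m) = F * n**alpha * m**beta."""
    return F * n**ALPHA * m**BETA


def call_setup_delay(n, lam, m=10):
    """Mean M/D/1 sojourn time: queueing delay plus deterministic service."""
    t = t_ra(n, m)
    utilization = lam * t
    if utilization >= 1.0:
        return float("inf")  # allocation queue is unstable
    return t + lam * t * t / (2.0 * (1.0 - utilization))


def max_processors(t_s_max, lam, m=10, n_limit=10_000):
    """Largest n whose mean call setup delay stays within t_s_max."""
    n = 0
    while n < n_limit and call_setup_delay(n + 1, lam, m) <= t_s_max:
        n += 1
    return n


if __name__ == "__main__":
    for lam in (10.0, 100.0):
        print(f"lambda={lam:6.1f}/s  n_max={max_processors(0.1, lam)}")
```

The linear scan is adequate here because the delay grows monotonically with n while the queue is stable; the values it prints depend on the assumed unit and need not coincide with numbers read off the figures.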

Blocking probability
The blocking probability is the probability of the active sessions queue occupying all available resources. When this happens, a new user is rejected due to insufficient computing capacity. Parameter c therefore represents the maximum number of waveforms that can be loaded to n processors. This number is difficult to characterize since it depends on many factors, including the computing capacity of each processor, the interprocessor communication network, the waveforms' computing characteristics, and the performance of the RA algorithm.
For an analytical treatment the capacity of the queue needs to be abstracted. We propose defining c = c(n), the maximum number of users that n processors can accept. Without loss of generality, we assume the linear model c(n) = n/k. Parameter k is a real positive value and indicates the share of a single processor's capacity that is needed for executing a waveform. For k > 1, more than one processor is required for processing a single-user digital transceiver. For k = 1.8, for instance, one waveform requires 180 % of the processing resources of a single processor for real-time execution. Note that U(n), which provides the average number of users that can be loaded to n processors, depends on the traffic load and blocking probability, whereas c(n) essentially depends on the processor capacity, waveform characteristics, and RA algorithm efficiency. The blocking probability of the M/M/c/c queue with c = c(n) is then
$$p_b(n) = B\bigl(\rho, c(n)\bigr), \tag{7}$$
where B(ρ, c) is the Erlang-B function [30] for ρ Erlangs and c servers. Figure 4 indicates the evolution of the blocking probability as a function of n for different k.
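The blocking probability and the resulting carried traffic can be computed with the standard Erlang-B recursion. The sketch below is illustrative only: the function names are hypothetical, and rounding n/k down to whole waveforms is an assumption made here so that the integer recursion applies.

```python
import math


def erlang_b(rho: float, c: int) -> float:
    """Erlang-B blocking probability for rho Erlangs and c servers."""
    b = 1.0  # B(rho, 0) = 1: with no servers every request is blocked
    for i in range(1, c + 1):
        b = rho * b / (i + rho * b)
    return b


def blocking_probability(n: int, rho: float, k: float) -> float:
    """p_b(n) for the linear capacity model n/k (floored: an assumption)."""
    return erlang_b(rho, math.floor(n / k))


def served_users(n: int, rho: float, k: float) -> float:
    """Average number of served users, rho * (1 - p_b(n)), by Little's law."""
    return rho * (1.0 - blocking_probability(n, rho, k))


if __name__ == "__main__":
    rho, k = 20.0, 2.0
    for n in (40, 50, 60, 70):
        print(n, round(blocking_probability(n, rho, k), 4),
              round(served_users(n, rho, k), 2))
```

The recursion avoids the factorials of the closed-form expression and stays numerically stable for large server counts.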

Solution
The objective function (3) is strictly concave because the blocking probability (7) is strictly convex [31]. This ensures that the optimization problem has a unique solution. Figure 5 plots the objective function f (n) for different weights θ.
The solution is trivial for θ = 0 or θ = 1, because the objective functions and the constraints are monotonic with n over the entire range of processors. More precisely, $p_b(n)$ decreases and $t_s(n)$ increases with n. We therefore define $n_{\min}$ as the minimum number of processors that satisfies the blocking probability constraint $p_b^{\max}$ and $n_{\max}$ as the maximum number of processors that meets the call setup delay limit $t_s^{\max}$. That is, $n_{\min}$ and $n_{\max}$ limit the number of processors to a range that provides the desired quality. They satisfy
$$n_{\min} = \min\{n : p_b(n) \le p_b^{\max}\}, \qquad n_{\max} = \max\{n : t_s(n) \le t_s^{\max}\}.$$
We can say that $n = n_{\min}$ processors minimize the number of resources, whereas $n = n_{\max}$ processors minimize the blocking probability, or maximize the average number of concurrently served users, while still satisfying the call setup delay constraint. If, however, $n_{\min}$ exceeds $n_{\max}$, the two constraints cannot both be satisfied and the problem becomes unsolvable. The solutions that minimize the blocking probability (θ = 0) and the allocated processors (θ = 1) then become
$$n^*\big|_{\theta=0} = n_{\max}, \qquad n^*\big|_{\theta=1} = n_{\min}.$$
We need to solve the problem numerically for arbitrary θ. The first option is using numerical optimization. Integer optimization problems are very complex to solve, though. We therefore relax the integer nature of the optimization variable n and use a convex solver for finding a non-integer solution. We then evaluate the objective function for the two closest integers, choosing the maximum that satisfies the constraints.
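The two boundary solutions can be found by a direct search over n. The following sketch is a simplified illustration under stated assumptions: the fitted g-mapping model with outputs in seconds, the M/D/1 mean sojourn time for the delay, and n/k rounded down to whole waveforms. The function names are hypothetical.

```python
import math

F, ALPHA, BETA = 3.98e-9, 2.94, 1.04  # fitted values from the text; seconds assumed


def call_setup_delay(n, lam, m=10):
    """Mean M/D/1 sojourn time for the modeled RA execution time."""
    t = F * n**ALPHA * m**BETA
    if lam * t >= 1.0:
        return float("inf")
    return t + lam * t * t / (2.0 * (1.0 - lam * t))


def erlang_b(rho, c):
    """Erlang-B blocking probability via the standard recursion."""
    b = 1.0
    for i in range(1, c + 1):
        b = rho * b / (i + rho * b)
    return b


def feasible_range(rho, mu, k, t_s_max, p_b_max, m=10, n_limit=1000):
    """Return (n_min, n_max); the problem is solvable only if n_min <= n_max."""
    lam = rho * mu
    candidates = range(1, n_limit + 1)
    n_min = next((n for n in candidates
                  if erlang_b(rho, math.floor(n / k)) <= p_b_max), None)
    n_max = max((n for n in candidates
                 if call_setup_delay(n, lam, m) <= t_s_max), default=None)
    return n_min, n_max


if __name__ == "__main__":
    # 20 Erlangs, 40 s sessions, k = 2, 50 ms delay limit, assumed 2% blocking target.
    print(feasible_range(20.0, 1 / 40, 2.0, 0.05, 0.02))
```

Any n between the two returned values satisfies both constraints; when the first value exceeds the second, the traffic load is beyond what a single RA can handle.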
The Erlang-B function B(ρ, c) is defined for natural c. The extended Erlang-B formula is a continuous representation of the Erlang-B function based on the incomplete Gamma function. Computing this function numerically, however, requires numerical integration. We rather propose using the recursive method
$$B(\rho, i) = \frac{\rho \, B(\rho, i - 1)}{i + \rho \, B(\rho, i - 1)},$$
where i is a real positive number. If we are able to obtain B(ρ, z) for a real number z < 1, then we can compute B(ρ, i) for any i. Various approximations for B(ρ, z) have been published based on parabolic interpolations. We used the expression of [32] for the numerical examples that follow.
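The recursion from a fractional starting point can be sketched as follows. Note that this example does not use the parabolic approximation of [32], which is not reproduced here. As a stand-in, it obtains B(ρ, z) for 0 ≤ z < 1 by numerically integrating the integral form of the extended Erlang-B formula, 1/B(ρ, z) = ∫ e^(-u) (1 + u/ρ)^z du over u ≥ 0, with Simpson's rule. The function names, the step count, and the truncation point are choices of this sketch.

```python
import math


def erlang_b_fraction(rho: float, z: float, steps: int = 4000,
                      upper: float = 60.0) -> float:
    """B(rho, z) for 0 <= z < 1 via Simpson integration of the integral form."""
    h = upper / steps  # steps must be even; e**-60 makes the cut tail negligible
    total = 0.0
    for j in range(steps + 1):
        u = j * h
        weight = 1 if j in (0, steps) else (4 if j % 2 else 2)
        total += weight * math.exp(-u) * (1.0 + u / rho) ** z
    return 3.0 / (h * total)  # the integral is (h / 3) * total, and B = 1 / integral


def erlang_b_real(rho: float, c: float) -> float:
    """Erlang-B for a real number of servers c >= 0 using the recursion."""
    whole = math.floor(c)
    z = c - whole
    b = erlang_b_fraction(rho, z)
    for i in range(1, whole + 1):
        x = z + i  # server count reached after this recursion step
        b = rho * b / (x + rho * b)
    return b


if __name__ == "__main__":
    for c in (10.0, 10.25, 10.5, 10.75, 11.0):
        print(c, erlang_b_real(10.0, c))
```

For integer c the fractional part is zero, the integral evaluates to one, and the routine reduces to the usual integer recursion, which gives a simple consistency check.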

Numerical examples
More than one processor is typically needed for executing a modern waveform consisting of 10 or more tasks [8].
The numerical examples therefore consider c(n) = n/2 allocatable users, m = 10 waveform tasks, and 1/μ = 40 s average data session duration. We use the interior-point numerical algorithm for solving problem (5) and obtaining a non-integer solution (Figure 6). Assigning $n^*$ processors to the RA maximizes the system efficiency f(n). The optimal number of processors $n^*$ is a function of θ. The curve corresponding to θ = 0 represents the solution that minimizes the blocking probability ($n^* = n_{\max}$) while meeting the call setup delay constraint. The curve corresponding to θ = 1, at the other extreme, indicates the solution that minimizes the use of processing resources ($n^* = n_{\min}$) while satisfying the blocking probability constraint. The intersection of the two curves provides the maximum system capacity $\rho_{\max}$. Parameter $n_{\min}$ becomes larger than $n_{\max}$ beyond that point and the problem has no solution.
The system capacity is almost 50 Erlangs for a call setup delay constraint of 50 ms, which corresponds to the LTE standard specification (Figure 6a,b). LTE-A indicates call setup times of 10 ms, reducing the RA capacity to some 25 Erlangs in this case (Figure 6c,d). The capacity can be improved by using more powerful processors. Assuming c(n) = n/1.5, for example, leads to $\rho_{\max}$ = 35 Erlangs for the LTE-A case (Figure 6e,f).

RA capacity
The previous section has indicated that the RA capacity $\rho_{\max}$ is finite. Here we analytically derive this limit. The manageable number of processors is obtained from the tolerable execution time. The blocking probability then determines the RA capacity.

RA execution time limit
The tolerable RA execution time $t_{RA}^{\max}$ is a function of the call setup delay constraint and the user arrival rate. It is obtained assuming that the average call setup delay (6) is equal to the call setup delay constraint $t_s^{\max}$:
$$t_{RA}^{\max} = \frac{1 + \lambda t_s^{\max} - \sqrt{1 + (\lambda t_s^{\max})^2}}{\lambda}. \tag{13}$$
Equation (13) can be simplified when $t_s^{\max}$ is either considerably smaller or considerably larger than the user inter-arrival time:
$$t_{RA}^{\max} \approx \begin{cases} \lambda^{-1}, & t_s^{\max} \gg \lambda^{-1}, \\ t_s^{\max}, & t_s^{\max} \ll \lambda^{-1}. \end{cases}$$
When $t_s^{\max} \gg \lambda^{-1}$, the capacity is limited by the stability of the M/D/1 mapping queue (see (6)). The call setup delay is then dominated by the time the user needs to wait before being served rather than the RA execution time itself. On the other hand, when $t_s^{\max} \ll \lambda^{-1}$ the capacity is limited by the call setup delay constraint. This is the case with modern communications standards, such as LTE and LTE-A, where the call setup delay is dominated by the RA execution time.
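The tolerable execution time and its two limiting cases can be verified with a short derivation. The steps below assume that the average call setup delay is the standard M/D/1 mean sojourn time with deterministic service time $t_{RA}$; the shorthand $x = \lambda t_s^{\max}$ is introduced here only for brevity.

```latex
\begin{align*}
t_s^{\max} &= t_{RA}\,\frac{2 - \lambda t_{RA}}{2\,(1 - \lambda t_{RA})}
\;\Longrightarrow\;
\lambda t_{RA}^{2} - 2\,(1 + x)\,t_{RA} + 2\,t_s^{\max} = 0,
\qquad x = \lambda t_s^{\max},\\
t_{RA}^{\max} &= \frac{(1 + x) - \sqrt{(1 + x)^{2} - 2x}}{\lambda}
= \frac{1 + x - \sqrt{1 + x^{2}}}{\lambda},\\
x \gg 1:&\quad \sqrt{1 + x^{2}} \approx x + \tfrac{1}{2x}
\;\Rightarrow\; t_{RA}^{\max} \approx \lambda^{-1},\\
x \ll 1:&\quad \sqrt{1 + x^{2}} \approx 1 + \tfrac{x^{2}}{2}
\;\Rightarrow\; t_{RA}^{\max} \approx t_s^{\max}.
\end{align*}
```

Only the root with the minus sign is admissible, because it is the one that keeps $\lambda t_{RA} < 1$ and hence the allocation queue stable.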

Processor limit
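The maximum number of processors that an RA can manage is a function of $t_{RA}^{\max}$ and follows from inverting Equation (1):
$$n_{\max} = \left\lfloor \left( \frac{t_{RA}^{\max}}{F \, m^{\beta}} \right)^{1/\alpha} \right\rceil.$$
The expression $\lfloor \cdot \rceil$ indicates rounding to the closest integer value.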
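The processor limit can be computed in two steps: first the tolerable execution time, then the inversion of the execution time model. The sketch below assumes the closed-form M/D/1 inversion for the tolerable time, the fitted g-mapping parameters, and model outputs in seconds; the function names are hypothetical.

```python
import math

F, ALPHA, BETA = 3.98e-9, 2.94, 1.04  # fitted values from the text; seconds assumed


def t_ra_max(t_s_max: float, lam: float) -> float:
    """Tolerable RA execution time for delay limit t_s_max and arrival rate lam."""
    x = lam * t_s_max
    return (1.0 + x - math.sqrt(1.0 + x * x)) / lam


def processor_limit(t_s_max: float, lam: float, m: int = 10) -> int:
    """Invert t_RA = F * n**alpha * m**beta and round to the closest integer."""
    t = t_ra_max(t_s_max, lam)
    return round((t / (F * m**BETA)) ** (1.0 / ALPHA))


if __name__ == "__main__":
    # 50 ms and 10 ms delay limits at 1.25 session requests per second.
    for t_s in (0.05, 0.01):
        print(t_s, processor_limit(t_s, 1.25))
```

Rounding to the closest integer follows the text; a conservative design would round down so that the delay limit is never exceeded.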

Traffic limit
Considering the blocking probability constraint, the maximum traffic load $\rho_{\max}$ that the RA can manage is then the solution to
$$B\bigl(\rho_{\max}, c(n_{\max})\bigr) = p_b^{\max},$$
where $n_{\max}$ depends on the arrival rate $\lambda = \rho_{\max} \mu$ through $t_{RA}^{\max}$. The capacity in Erlangs is thus a function of the average user session duration 1/μ. The expression can be simplified when $t_s^{\max}$ [...] digitally processing the signals of a single user. The figure assumes the approximation $t_s^{\max} \ll \lambda^{-1}$ and, thus, is the solution to (17). It shows that a single RA can handle up to 200 Erlangs, assuming legacy cellular communications standards, which are characterized by loose call setup delay constraints. However, the capacity drops well below 80 Erlangs for the emerging LTE and LTE-A standards, which establish maximum session establishment delays of 50 and 10 ms.
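The capacity limit can be approximated numerically by bisecting on the traffic load. This is a sketch under several assumptions that are not taken from the text: model outputs in seconds, the closed-form M/D/1 inversion for the tolerable execution time, rounding n and n/k down to integers, and a 2 % blocking target in the usage example. The function names are hypothetical.

```python
import math

F, ALPHA, BETA = 3.98e-9, 2.94, 1.04  # fitted values from the text; seconds assumed


def erlang_b(rho, c):
    """Erlang-B blocking probability via the standard recursion."""
    b = 1.0
    for i in range(1, c + 1):
        b = rho * b / (i + rho * b)
    return b


def is_feasible(rho, t_s_max, mu, k, p_b_max, m=10):
    """True if some processor count meets both constraints for this load."""
    lam = rho * mu
    x = lam * t_s_max
    t = t_s_max if lam == 0 else (1.0 + x - math.sqrt(1.0 + x * x)) / lam
    n = math.floor((t / (F * m**BETA)) ** (1.0 / ALPHA))  # delay-limited n_max
    return erlang_b(rho, math.floor(n / k)) <= p_b_max


def ra_capacity(t_s_max, mu, k, p_b_max, m=10, tol=1e-3):
    """Largest traffic load (Erlangs) that a single RA can handle."""
    lo, hi = 0.0, 1000.0  # search interval in Erlangs: a choice of this sketch
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid, t_s_max, mu, k, p_b_max, m):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for t_s in (0.05, 0.01):
        print(t_s, round(ra_capacity(t_s, 1 / 40, 2.0, 0.02), 1))
```

Bisection works because feasibility is monotonic in the load: a higher load both shrinks the delay-limited processor count and raises the blocking probability.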

Distributed resource management
SDR clouds will provide wireless communications services to very wide service areas and will, consequently, need to manage huge traffic loads. Incoming user session requests will then need to be assigned to different RAs. Each RA will absorb only a portion of the total traffic demand, managing part of the data center resources. The assignment of processors to RAs should adapt to the traffic load distribution while satisfying the constraints of (5).
Here we assume a reduced SDR cloud model, where a data center of N = 100 processors serves two radio cells with a total traffic load of 40 Erlangs. The maximum call setup delay is 10 ms and the target blocking probability 5%. The capacity limit of a single RA is 35 Erlangs for c(n) = n/1.5. We thus need at least two RAs, one per radio cell. The problem then consists of splitting the N processors between the two RAs in such a way that all constraints are satisfied. Problem (5) thus extends to
$$\max_{\{n_1, n_2\}} f(n_1) + f(n_2) \tag{18}$$
subject to $n_1 + n_2 \le N$ and to the call setup delay and blocking probability constraints of (5) for each RA. Parameters $n_1$ and $n_2$ represent the number of processors allocated to RA1 and RA2. Figure 8 shows the optimal solution for RA1. The plot of $n_2^*$ is symmetrical about $\rho_1/\rho = 0.5$. The traffic of each cell is $\rho_1$ and $\rho_2 = \rho - \rho_1$, where ρ = 40 Erlangs and $0 \le \rho_1/\rho \le 1$.
For θ = 0 all processors will be employed for maximizing the sum of U(n) (see (3)). The processors are distributed between the RAs depending on the slope of the Erlang-B function. For $0.1 \le \rho_1/\rho \le 0.3$ and $0.7 \le \rho_1/\rho \le 0.9$ more processors are assigned to the cell with higher traffic load. This is different for $0.3 \le \rho_1/\rho \le 0.7$, because assigning more processors to the cell with lower service demand decreases the overall blocking probability. For $\rho_1/\rho \le 0.1$ or $\rho_1/\rho \ge 0.9$, the traffic of one or the other cell exceeds the corresponding RA capacity and the problem becomes infeasible. Additional RAs need to be deployed for such traffic distributions.
When θ = 1, the number of processors is directly proportional to the traffic load, because the slope of the objective function is constant with n. For 0 < θ < 1, the resources are allocated as a function of the performance increment in relation to the amount of allocated resources. The number of allocated processors linearly increases with $\rho_1/\rho$ and $\rho_2/\rho$, respectively, until reaching the maximum number of processors $n_{\max}$ that still meets the session establishment delay constraint.
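For the θ = 0 case, the split between two RAs can be found by exhaustive search, since the objective then reduces to the sum of served users. The sketch below is a simplified illustration, not the solver used for the figures: it assumes model outputs in seconds, the closed-form M/D/1 inversion for the per-RA processor limit, and n/k rounded down to whole waveforms. The function names and the `share` parameter (the fraction of the total load carried by the first cell) are hypothetical.

```python
import math

F, ALPHA, BETA = 3.98e-9, 2.94, 1.04  # fitted values from the text; seconds assumed


def erlang_b(rho, c):
    """Erlang-B blocking probability via the standard recursion."""
    b = 1.0
    for i in range(1, c + 1):
        b = rho * b / (i + rho * b)
    return b


def n_max(t_s_max, lam, m=10):
    """Largest n meeting the delay limit (closed-form inversion, rounded down)."""
    x = lam * t_s_max
    t = t_s_max if lam == 0 else (1.0 + x - math.sqrt(1.0 + x * x)) / lam
    return math.floor((t / (F * m**BETA)) ** (1.0 / ALPHA))


def best_split(N, rho, share, mu, k, t_s_max, p_b_max):
    """Return (served_users, (n1, n2)) maximizing served users, or None."""
    loads = (share * rho, (1.0 - share) * rho)
    limits = [n_max(t_s_max, load * mu) for load in loads]
    best = None
    for n1 in range(1, N):
        # Each RA gets its part of the N processors, capped by its delay limit.
        alloc = (min(n1, limits[0]), min(N - n1, limits[1]))
        blocks = [erlang_b(load, math.floor(n / k))
                  for load, n in zip(loads, alloc)]
        if max(blocks) > p_b_max:
            continue  # blocking constraint violated for at least one RA
        served = sum(load * (1.0 - b) for load, b in zip(loads, blocks))
        if best is None or served > best[0]:
            best = (served, alloc)
    return best


if __name__ == "__main__":
    for share in (0.2, 0.35, 0.5):
        print(share, best_split(100, 40.0, share, 1 / 40, 1.5, 0.01, 0.05))
```

The search is linear in N for two RAs, but the number of combinations grows quickly when more RAs share the data center, which is the scalability concern raised for the general problem.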

Simulation results
We simulate a non-homogeneous traffic demand, where the user session initiation and termination is modeled as a Poisson arrival and departure process. The user arrival rate is 4 times the departure rate, simulating an unstable situation for better analyzing the performance of the [...] Each processor has a capacity of 12 giga-operations per second (GOPS). The waveforms offer 64, 128, 384, and 1024 kbps data rate services, which are solicited with a probability of 0.5, 0.2, 0.2, and 0.1, respectively. The four waveform models are those from [4], requiring between 50 % and 75 % of a processor's capacity (k < 1). The users follow a two-dimensional Gaussian distribution, centered and with a variance of 0.25 relative to the service area.
The blocking probability constraint is dropped in order to enable a fair evaluation between the three strategies. The optimal strategy then maximizes the number of served users (θ = 0). The session initialization constraint is 50 ms and the average session duration 40 s. Figure 9 shows the accumulated number of accepted user session requests over time. The adaptive strategy assigns processors to RAs according to the traffic demand and optimization parameters, resulting in 2-33 processors assigned to each RA. It accepts considerably more users than both variants of the static strategy. This indicates the performance improvement of adapting the processor assignment to the actual user distribution.

Conclusions
SDR clouds provide an alternative concept for designing and managing wireless communications infrastructure. Higher resource efficiencies are achievable by merging the digital signal processing resources of today's base stations into data centers and employing cloud computing technology. The limited resources need to be properly managed, though. This article has addressed the SDR cloud computing resource management problem. Defining the concept of a RA that manages a subset of computing resources facilitates separating the signal processing algorithms design from the infrastructure and enables using resources on a pay-per-use basis.
Based on the call setup delay and the blocking probability constraints, we have defined the RA processor allocation problem as a constrained convex optimization problem. The feasibility region provides the maximum traffic capacity that a single RA can manage. The results have shown that modern cellular communications standards, such as LTE and LTE-Advanced, considerably limit the RA capacity. Assuming that two processors or more are required to process a transceiver processing chain in real time, less than 50 Erlangs of traffic can be handled by a single RA employing a greedy mapping algorithm. A distributed resource management is therefore necessary.
The data center processors need to be distributed among several RAs subject to the call setup delay and blocking probability constraints. The simulation results moreover indicate that the number of accepted users is severely degraded if the processor distribution is not adapted to the traffic distribution. Our solution is optimal, but does not scale well with the problem size. More precisely, the complexity of problem (18) grows exponentially with the number of RAs. Future research will therefore develop sub-optimal solutions for dynamically recalculating the assignment of processors to RAs and so adapt to varying traffic loads in very large-scale computing systems. Different traffic patterns and empirical data sets will also be considered.