As already mentioned, we now convert the long-term optimization in (16) into a stability problem, hinging on stochastic Lyapunov optimization [42]. The first step is to define suitable state variables, known as *virtual queues*, whose stability guarantees the satisfaction of the long-term constraints. More specifically, to deal with the long-term constraints in (*a*), we introduce *K* virtual queues that evolve as follows:

$$\begin{aligned} Z_{k}(t+1) = \max \Big \{0, Z_{k}(t) + \epsilon _k\left( Q^{\text {tot}}_{k}(t+1)-Q_k^{\text {avg}} \right) \Big \}, \end{aligned}$$

(17)

\(k=1,\ldots ,K\), where \(\{\epsilon _k\}_{k=1}^K\) are positive step sizes used to control the convergence speed of the algorithm. A virtual queue is a mathematical construct that tracks the system behavior in terms of constraint violations: intuitively, if a virtual queue grows too fast, the associated constraint is being violated and the system is not stable. Formally, this translates into the *mean rate stability* of the queues^{Footnote 2}, which is equivalent to satisfying the constraints (*a*) in (16) [42]. To this aim, we first define the Lyapunov function \(\mathcal {L}(t)=\mathcal {L}(\varvec{\Theta }(t)) = \frac{1}{2} \sum _{k=1}^K Z^2_{k}(t)\), where \(\varvec{\Theta }(t)=\{Z_{k}(t)\}_{k=1}^K\), and then the *drift-plus-penalty* function given by [42]:
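As an illustration, the update in (17) can be sketched as follows (Python with illustrative variable names; `eps`, `q_tot_next`, and `q_avg` stand for \(\epsilon_k\), \(Q^{\text{tot}}_{k}(t+1)\), and \(Q_k^{\text{avg}}\)):

```python
def update_virtual_queues(z, q_tot_next, q_avg, eps):
    """Virtual-queue update (17): Z_k(t+1) = max{0, Z_k(t) + eps_k*(Q_tot_k(t+1) - Q_avg_k)}."""
    return [max(0.0, zk + ek * (qt - qa))
            for zk, ek, qt, qa in zip(z, eps, q_tot_next, q_avg)]

# Queue 1 grows (its total backlog exceeds the target Q_avg),
# while queue 2 drains toward zero.
z = update_virtual_queues(z=[0.0, 2.0], q_tot_next=[5.0, 1.0],
                          q_avg=[3.0, 4.0], eps=[0.5, 0.5])
# z == [1.0, 0.5]
```

A growing backlog \(Z_k(t)\) thus signals a persistent violation of the average-latency target of user *k*.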

$$\begin{aligned} \Delta ^p(t) = {\mathbb {E}}\Big \{\mathcal {L}(t+1)-\mathcal {L}(t) + V\cdot e_{\sigma }^{\text {tot}}(t)\big |\; \varvec{\Theta }(t)\Big \}. \end{aligned}$$

(18)

The drift-plus-penalty function is the conditional expected change of \(\mathcal {L}(t)\) over successive slots, plus a penalty term given by the objective function of (16), weighted by a parameter *V*. Then, following stochastic optimization arguments as in [42], we proceed by minimizing an upper bound of the drift-plus-penalty function in (18) in a stochastic fashion. After some algebraic manipulations (similar to the ones used in [13]), we obtain the following per-slot problem at each time *t*:
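Although the manipulations are standard, it may help to recall the shape of the bound being minimized (a sketch following the general framework of [42]; \(C\) is a finite constant collecting the bounded second-order terms):

```latex
\Delta^p(t) \;\le\; C \;+\; \sum_{k=1}^K Z_{k}(t)\,\epsilon_k\,
{\mathbb{E}}\Big\{ Q^{\text{tot}}_{k}(t+1)-Q_k^{\text{avg}} \,\Big|\, \varvec{\Theta}(t)\Big\}
\;+\; V\,{\mathbb{E}}\Big\{ e_{\sigma}^{\text{tot}}(t) \,\Big|\, \varvec{\Theta}(t)\Big\}.
```

Dropping \(C\) and the conditional expectations, i.e., minimizing opportunistically given the observed channels and arrivals at time *t*, yields the per-slot problem below.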

$$\begin{aligned}&\min _{\varvec{\Psi }(t)\in \mathcal {\widetilde{X}}(t)}\;\; \sum \nolimits _{k=1}^K \Big [\big ( Q_{k}^r(t)-Q^l_{k}(t)-Z_{k}(t)\big )\tau \overline{R}_{k}(t)\nonumber \\&+\big (c_k Q_{k}^a(t)-Q^r_{k}(t)-Z_{k}(t)\big )\tau f_{k}(t)J_k -\big (Q^a_{k}(t)+Z_{k}(t)\big )\tau \underline{R}_{k}(t)\Big ] +V \cdot e_{\sigma }^{\textrm{tot}}(t) \end{aligned}$$

(19)

where \(\mathcal {\widetilde{X}}(t)\) is the instantaneous feasible set, as defined in (16), with the following modifications: (i) constraint (*e*) becomes \(0\le \overline{p}_k(t)\le \widetilde{P}_k(t)I_a(t)\), where \(\widetilde{P}_{k}(t)=\min (P_k,\overline{P}_{k}(t))\), with \(\overline{P}_{k}(t)\) denoting the minimum power needed to empty the local queue \(Q^l_k(t)\) at time *t*; (ii) constraint (*f*) becomes \(0\le \underline{p}_k(t)\le \underline{P}_k(t)I_a(t)\), where \(\underline{P}_k(t)\) is the minimum power needed to empty the downlink queue \(Q_k^a(t)\) at time *t*; (iii) constraint (*h*) becomes \(0\le f_{k}(t)\le Q^r_{k}(t)/\tau J_{k}\). It is worth emphasizing that, as opposed to (16), problem (19) does not involve any expectation: it depends only on the current values of the channel and task parameters, as well as on the (virtual and real) queue states. Because of the structure of the set \(\mathcal {\widetilde{X}}(t)\), (19) is a mixed-integer nonlinear optimization problem, which can be very hard to solve. Nevertheless, in the sequel, we show how (19) can be split into sub-problems that admit low-complexity solution procedures for the optimal RIS parameters (i.e., the phase shifts of its elements), the AP’s beamformer, the uplink and downlink radio resources (i.e., powers, sleep mode and duty cycle), and the computation resources at the ES (i.e., CPU clock frequencies).

### Dynamic radio resource allocation and RISs optimization

The radio resource allocation problem aims at optimizing the AP duty cycle variable \(I_a(t)\), the beamforming vector \(\textbf{w}(t)\), the uplink and downlink transmission powers \(\{\overline{p}_{k}(t)\}_{k=1}^K\), \(\{\underline{p}_{k}(t)\}_{k=1}^K\), respectively, and the RIS reflectivity parameters \(\{{\varvec{v}}_{i}(t)\}_{i=1}^I\). From (3) and (5), it is clear that the presence of RISs couples uplink and downlink resource allocation, since transmission rates are affected by RISs in both directions. From (19), (15), (3), (5), and (16), the radio resource allocation problem reads as:

$$\begin{aligned} &\min _{\varvec{\Gamma }(t)} \;\;-\sum _{k=1}^K\overline{U}_k(t) \log _2\left( 1+\overline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t)) \overline{p}_{k}(t)\right) \nonumber \\&\quad \; -\sum _{k=1}^K\underline{U}_k(t) \log _2\left( 1+\underline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t)) \underline{p}_{k}(t)\right) + V \bigg [\sum _{k=1}^K\left( \sigma \tau \overline{p}_{k}(t)+\left( 1-\sigma \right) \tau \underline{p}_{k}(t)\right) \nonumber \\&\qquad \;+(1-\sigma )I_a(t)\tau p_a^{\text {on}}+(1-\sigma )(1-I_a(t))\tau p_a^{\text {s}}+(1-\sigma )\tau \sum _{i=1}^I p^{r}(b_i)\sum _{l=1}^{N_i} |v_{i,l}(t)|^2 \bigg ] \nonumber \\& \text {subject to} \quad I_a(t)\in \{0,1\}; \quad \textbf{w}(t)\in \mathcal {C};\quad 0\le \overline{p}_{k}(t)\le \widetilde{P}_{k}(t)I_a(t)\quad \forall k; \nonumber \\&\qquad \qquad \quad v_{i,l}(t)\in \mathcal {S}_i, \quad |v_{i,l}(t)|^2\le I_a(t) \;\;\forall i,l;\nonumber \\&\qquad \qquad \quad 0\le \underline{p}_{k}(t)\le \underline{P}_{k}(t), \;\; \forall k;\quad \displaystyle \sum _{k=1}^K \underline{p}_{k}(t)\le P_a \,I_a(t); \end{aligned}$$

(20)

where \(\varvec{\Gamma }(t) = [{\varvec{v}}(t), \{\overline{p}_{k}(t)\}_{k=1}^K,\{\underline{p}_{k}(t)\}_{k=1}^K,I_a(t),\textbf{w}(t)]\), and

$$\begin{aligned}\overline{U}_k(t)=(Q_{k}^l(t)-Q^r_{k}(t)+Z_{k}(t)) \overline{B}_k\tau , \end{aligned}$$

(21)

$$\begin{aligned}\underline{U}_k(t)=(Q_{k}^a(t)+Z_{k}(t)) \underline{B}_k\tau . \end{aligned}$$

(22)

Problem (20) is non-convex due to the discrete nature of the phase shifts, the beamforming vector, and the active state variable of the AP (i.e., \(I_a(t)\)); further non-convexity comes from the coupling among variables induced by the presence of RISs and the beamforming at the AP. In principle, the global optimum of (20) can be found through an exhaustive search over all possible combinations of \(\{{\varvec{v}}_{i}(t)\}_{i=1}^I\), \(\textbf{w}(t)\), and \(I_a(t)\), evaluating the optimal uplink and downlink powers, and selecting the set of variables that yields the lowest value of the objective function in (20). However, the complexity of this approach grows exponentially with the number *I* of RISs, the maximum number \(N=\max _i N_i\) of RIS elements, and the maximum cardinality \(S=\max _i |\mathcal {S}_i|\) of the sets \(\mathcal {S}_i\) in (2). Since, in the dynamic context considered in this paper, resource allocation must take place within a very short time, we follow an alternative (albeit simplified) optimization strategy. In particular, we distinguish between two different cases.

\({\underline{\hbox {Case} \,1: I_a(t)=0}.}\) In this case, problem (20) is trivial, since the AP is in sleep state (thus neither receiving nor transmitting), and so are the UE and the RISs. Thus, the only feasible solution reads as:

$$\begin{aligned}\overline{p}_k(t)=0, \;\;\forall k,\quad \underline{p}_k(t)=0, \;\;\forall k,\quad {\varvec{v}}_i(t)=0, \;\;\forall i. \end{aligned}$$

(23)

In this case, the objective function of (20) boils down to:

$$\begin{aligned} \Omega (I_a(t)=0)=V(1-\sigma )\tau p_a^{\text {s}}. \end{aligned}$$

(24)

The value in (24) must be compared with the value of the objective function obtained in the following second case.

\({\underline{\hbox {Case} \,2: I_a(t)=1}.}\) In this case, the AP is available for transmission and/or reception, so that a solution is needed to select the uplink and downlink radio resources and the RIS reflectivity coefficients. In particular, problem (20) translates into the following simplified sub-problem:

$$\begin{aligned} & \min _{\varvec{\Psi }^r(t)} \;\;-\sum _{k=1}^K\overline{U}_k(t) \log _2\left( 1+\overline{\alpha }_k({\varvec{v}}(t),{\textbf{w}}(t)) \overline{p}_{k}(t)\right) \\ & \;\,\qquad \;-\sum _{k=1}^K\underline{U}_k(t) \log _2\left( 1+\underline{\alpha }_k({\varvec{v}}(t),{\textbf{w}}(t)) \underline{p}_{k}(t)\right) \\&\;\,\qquad \;+ V \left [\sum _{k=1}^K\left( \sigma \tau \overline{p}_{k}(t) +\left( 1-\sigma \right) \tau \underline{p}_{k}(t)\right) +(1-\sigma )\tau p_a^{{\rm on}} \right. \\ & \left.\;\,\qquad \; +(1-\sigma )\tau \sum _{i=1}^I p^{r}(b_i)\sum _{l=1}^{N_i} |v_{i,l}(t)|^2\right ] \\&\qquad \quad \quad {\text {subject\,to}} \quad 0\le \overline{p}_{k}(t)\le \widetilde{P}_{k}(t)\;\;\forall k;\quad v_{i,l}(t)\in \mathcal {S}_i \;\;\forall i,l;\quad {\textbf{w}}(t)\in \mathcal {C}; \\ & \;\,\qquad \qquad \qquad \qquad \quad 0\le \underline{p}_{k}(t)\le \underline{P}_{k}(t), \;\; \forall k;\quad \displaystyle \sum _{k=1}^K \underline{p}_{k}(t)\le P_a. \end{aligned}$$

(25)

To solve (25), we propose a greedy method that first optimizes (25) with respect to the RIS reflectivity parameters \(\{{\varvec{v}}_{i}(t)\}_{i=1}^I\) and the AP’s beamforming vector \(\textbf{w}(t)\), and then selects the uplink and downlink powers. Indeed, for a fixed RIS configuration and AP beamformer (i.e., for given values of \({\varvec{v}}(t)\) and \(\textbf{w}(t)\)), (25) becomes strictly convex and decouples over uplink and downlink, admitting a simple closed-form solution for \(\{\overline{p}_{k}(t)\}_{k=1}^K\), and a water-filling-like expression for \(\{\underline{p}_{k}(t)\}_{k=1}^K\). The details of the three optimization steps (i.e., RISs/AP’s beamforming, uplink, and downlink) are given next.

#### RISs and AP’s beamforming optimization

To optimize (20) with respect to the RISs configuration and the AP’s beamforming, we notice that, for any value of \(\overline{p}_{k}(t)\), if \(\overline{U}_{k}(t)>0\), the *k*-th component of the first objective term in (25) is minimized by increasing the normalized channel coefficients \(\overline{\alpha }_{k}({\varvec{v}}(t),\textbf{w}(t))\). A similar argument applies to the *k*-th component of the second objective term in (25), which is minimized by increasing the normalized channel coefficient \(\underline{\alpha }_{k}({\varvec{v}}(t),\textbf{w}(t))\). Thus, letting \(\mathcal {U}(t)=\{k\,|\,\overline{U}_{k}(t)>0\}\), we exploit the following surrogate optimization function:

$$\begin{aligned} \Delta ^R({\varvec{v}}(t),\textbf{w}(t))\,=\,&-\sum _{k\in \mathcal {U}(t)}\overline{U}_{k}(t)\overline{\alpha }_{k}({\varvec{v}}(t),\textbf{w}(t))-\sum _{k=1}^K\underline{U}_{k}(t)\underline{\alpha }_{k}({\varvec{v}}(t),\textbf{w}(t)) \nonumber \\&+ V (1-\sigma )\tau \sum _{i=1}^I p^{r}(b_i)\sum _{l=1}^{N_i} |v_{i,l}(t)|^2, \end{aligned}$$

(26)

which represents a linear combination of the RIS energy term in (25), weighted by the Lyapunov parameter *V*, and (negative) RIS-dependent uplink and downlink channel coefficients in (4) and (6), weighted by the terms \(\overline{U}_{k}(t)\) and \(\underline{U}_{k}(t)\) in (21)-(22), which depend on the communication, computing, and virtual queues’ states.

Intuitively, by minimizing (26), the RISs are optimized to favor uplink and/or downlink communications (depending on the status of the cumulative parameters \(\overline{U}_{k}(t)\) and \(\underline{U}_{k}(t)\) for each user *k*), with a penalty on the energy spent for such an improvement in communication performance. This is equivalent to a dynamic scheduling of the RIS resources to serve the users over uplink and/or downlink communications, depending on the status of the queues (i.e., \(\overline{U}_{k}(t)\) and \(\underline{U}_{k}(t)\)) that quantify the system congestion. In other words, time plays the role of a further degree of freedom for the scheduling of the RISs, which are dynamically assigned by the proposed Lyapunov stochastic optimization procedure to serve uplink or downlink communications of different users. To the best of our knowledge, this queue-based dynamic control of RIS reconfiguration has never been proposed in the literature. Finally, increasing the value of *V*, the minimization of (26) leads to sparser solutions for the vector \({\varvec{v}}(t)\), since it might be unnecessary to switch on all the reflecting elements to satisfy the average latency constraint in (16).

The steps of the proposed greedy method are illustrated in Algorithm 1, which proceeds as follows. Fixing a beamforming vector \(\textbf{w}(t)\in \mathcal {C}\), the method greedily optimizes the reflectivity vector \(\bar{{\varvec{v}}}_{i}\) (initialized at zero) of each RIS *i*, iteratively selecting the coefficient \(v_{i,l}\in \mathcal {S}_i\) that minimizes (26), with all the other parameters of RIS *i* (i.e., \(\bar{{\varvec{v}}}_{i,-l}\)) and of the other RISs (i.e., \(\bar{{\varvec{v}}}_{-i}\)) held fixed. For each \(\textbf{w}(t)\), this approach requires \(O(S\overline{N})\) evaluations of (26), with \(\overline{N}=\sum _{i=1}^I N_i\), and leads to a non-increasing behavior of (26) as more RIS reflecting elements are added and optimized. Moreover, since (26) is greedily optimized by filling the vectors \(\{{\varvec{v}}_i(t)\}_{i=1}^I\) one element at a time, starting from the zero vector (cf. Algorithm 1), in the first stages of Algorithm 1 the vector \({\varvec{v}}(t)\) is composed of almost all zeros (i.e., it is highly sparse), and the computation of (26) is very light (cf. (4) and (6)). Finally, the procedure is repeated for all possible beamforming vectors \(\textbf{w}(t)\in \mathcal {C}\), in order to find the pair \(({\varvec{v}}(t),\textbf{w}(t))\) that greedily minimizes the objective in (26), for an overall cost of \(O(S\overline{N}|\mathcal {C}|)\) evaluations.
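The greedy selection can be sketched as follows (illustrative Python; `surrogate` stands for the objective in (26) and is supplied by the caller, all RIS elements are flattened into a single vector, and the toy objective in the usage example is not from the paper):

```python
def greedy_ris(surrogate, num_elements, phase_set, codebook):
    """Greedy minimization of surrogate(v, w) over RIS coefficients v and
    AP beamformers w, one reflecting element at a time (cf. Algorithm 1)."""
    best_v, best_w, best_val = None, None, float("inf")
    for w in codebook:                   # exhaustive over the beamforming codebook C
        v = [0j] * num_elements          # start from the all-zero (sparse) vector
        for l in range(num_elements):    # fill one reflecting element at a time
            for s in phase_set:          # candidate coefficients in S_i
                trial = list(v)
                trial[l] = s
                if surrogate(trial, w) < surrogate(v, w):
                    v = trial            # keep only strictly decreasing moves
        val = surrogate(v, w)
        if val < best_val:
            best_v, best_w, best_val = v, w, val
    return best_v, best_w, best_val

# Toy surrogate (not from the paper): rewards large real parts, penalizes power.
def toy(v, w):
    return sum(0.1 * abs(x) ** 2 - complex(x).real for x in v)

v, w, val = greedy_ris(toy, 2, [0, 1, -1], ["w0"])
```

For each `w`, the inner loops perform \(O(S\overline{N})\) candidate evaluations, mirroring the complexity discussed above, and the kept objective value never increases.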

*Block optimization of RISs* Even if the complexity of the greedy procedure in Algorithm 1 is sufficiently low, in practical scenarios one might still desire an even faster procedure. To this aim, we can divide the \(N_i\) modules of RIS *i* into \(N_b\) blocks, where the elements of each block are phase-shifted in the same way. Then, proceeding as in Algorithm 1, each block of RIS *i* is greedily optimized by selecting the phase shift coefficient (equal for each element of the block) that leads to the largest decrease of the surrogate objective in (26). Assuming for simplicity that the number of blocks is the same for all RISs, the complexity of Algorithm 1 is reduced by a factor \(\overline{N}/I N_b\), at the price of an overall performance loss. This complexity-performance trade-off will be numerically assessed in Sect. 4.

#### Uplink radio resource allocation

Once the RIS configuration \({\varvec{v}}(t)\) and the AP’s beamformer \(\textbf{w}(t)\) have been fixed through Algorithm 1, from (25), the uplink radio resource allocation decouples from downlink, and reads as:

$$\begin{aligned} \min _{\{\overline{p}_{k}(t)\}_{k=1}^K} &-\sum _{k=1}^K\overline{U}_k(t) \log _2\left( 1+\overline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t)) \overline{p}_{k}(t)\right) + V\sigma \tau \sum _{k=1}^K \overline{p}_{k}(t) \nonumber \\&\quad \quad \text {subject to} \quad \quad 0\le \overline{p}_{k}(t)\le \widetilde{P}_{k}(t),\quad \forall k. \end{aligned}$$

(27)

Problem (27) is convex, with an additive strictly convex objective that decouples over the users. Imposing the Karush–Kuhn–Tucker (KKT) conditions of (27), it is easy to see that the problem admits a closed-form solution for the optimal \(\{ \overline{p}_{k}(t)\}_{k=1}^K\). In particular, the set \(\mathcal {U}(t)=\{k\,|\,\overline{U}_{k}(t)>0\}\) previously used in (26) takes the role of the set of transmitting users. Indeed, from a rapid inspection of (27), it is clear that user *k* does not transmit (i.e., \(\overline{p}_{k}(t)=0\)) if \(\overline{U}_{k}(t)\le 0\), since both terms of the objective function in (27) are then monotone non-decreasing in \(\overline{p}_{k}(t)\). Thus, we obtain the following closed-form solution for the optimal uplink powers:

$$\begin{aligned} \overline{p}_{k}(t)={\left\{ \begin{array}{ll} \displaystyle \left[ \frac{\overline{U}_{k}(t)}{V\sigma \tau \log 2}-\frac{1}{\overline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t))}\right] _0^{\widetilde{P}_{k}(t)},&{} \hbox {if}\, k\in \mathcal {U}(t);\\ 0,&{} \hbox {if}\, k\notin \mathcal {U}(t). \end{array}\right. } \end{aligned}$$

(28)

As expected, for all *k*, the transmission powers at time *t* in (28) are affected by the RIS-dependent uplink channel coefficient \(\overline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t))\), and the status of the communication, computation, and virtual queues embedded into \(\overline{U}_{k}(t)\) (cf. (21)).
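A minimal sketch of this closed-form rule, derived from the KKT conditions of (27) (illustrative names; the clipping \([\cdot]_0^{\widetilde{P}_{k}(t)}\) becomes a min/max):

```python
from math import log

def uplink_powers(U, alpha, P_max, V, sigma, tau):
    """Closed-form uplink powers from the KKT conditions of (27):
    user k transmits only if U_k > 0, with power clipped to [0, P_max_k]."""
    powers = []
    for u, a, pm in zip(U, alpha, P_max):
        if u <= 0:
            powers.append(0.0)          # k not in U(t): stay silent
        else:
            p = u / (V * sigma * tau * log(2)) - 1.0 / a
            powers.append(min(max(p, 0.0), pm))
    return powers
```

Note how a larger backlog term \(\overline{U}_{k}(t)\) or a weaker effective channel \(\overline{\alpha}_k\) pushes the allocated power up, until the per-user cap is hit.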

#### Downlink radio resource allocation

Once the RISs configuration \({\varvec{v}}(t)\) and the AP’s beamformer have been fixed, the radio resource allocation problem optimizes the downlink transmission powers \(\{\underline{p}_{k}(t)\}_{k=1}^K\). From (25), we obtain:

$$\begin{aligned} & \min _{\{\underline{p}_{k}(t)\}_{k=1}^K} -\sum _{k=1}^K\underline{U}_k(t) \log _2\left( 1+\underline{\alpha }_k({\varvec{v}}(t),{\textbf{w}}(t)) \underline{p}_{k}(t)\right) + V (1-\sigma )\tau \left( \sum _{k=1}^K\underline{p}_k(t)+p_a^{{\rm on}}\right) \\ & {\text {subject\,to}}\quad 0\le \underline{p}_{k}(t)\le \underline{P}_{k}(t), \;\;\forall k;\quad \displaystyle \sum _{k=1}^K \underline{p}_{k}(t)\le P_a. \end{aligned}$$

(29)

Problem (29) is convex, and its solution can be found very efficiently imposing the KKT conditions. In particular, the Lagrangian associated with (29) reads as:

$$\begin{aligned} L=&-\sum _{k=1}^K\underline{U}_k(t) \log _2\left( 1+\underline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t)) \underline{p}_{k}(t)\right) + V(1-\sigma )\tau \left( \sum _{k=1}^K\underline{p}_k(t)+p_a^{\text {on}}\right) \nonumber \\&-\sum _{k=1}^K\beta _k\underline{p}_k(t)+\sum \nolimits _{k=1}^K \gamma _k(\underline{p}_k(t)-\underline{P}_k(t))+\nu \left( \sum \nolimits _{k=1}^K\underline{p}_k(t)-P_a\right) . \end{aligned}$$

(30)

Then, the KKT conditions are given by:

$$\begin{aligned} & i)\; \frac{\partial L}{\partial \underline{p}_k}= -\frac{\underline{U}_k(t)\underline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t))}{\log (2)\left( 1+\underline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t))\underline{p}_k(t)\right) }+V(1-\sigma )\tau -\beta _k+\gamma _k+\nu =0, \;\;\forall k; \nonumber \\ & ii)\;\beta _k\ge 0;\quad \underline{p}_k(t)\ge 0;\quad \beta _k\underline{p}_k(t)=0,\quad \forall k;\nonumber \\& iii)\;\gamma _k\ge 0;\quad \underline{p}_k(t)\le \underline{P}_k(t);\quad \gamma _k\left( \underline{p}_k(t)-\underline{P}_k(t)\!\right) =0,\; \forall k;\nonumber \\& iv)\;\nu \ge 0;\;\; \sum \nolimits _{k=1}^K\underline{p}_k(t)\le P_a;\;\;\nu \left( \sum \nolimits _{k=1}^K\underline{p}_k(t)-P_a\right) =0. \end{aligned}$$

(31)

Now, let us consider two cases. First of all, if we assume that \(\sum _{k=1}^K\underline{p}_k(t)<P_a\), we have \(\nu =0\) due to condition *iv*) in (31). Then, from condition *i*), the optimal solution is:

$$\begin{aligned} \underline{p}_{k}(t)=\left[ \frac{\underline{U}_k(t)}{V(1-\sigma )\tau \log 2}-\frac{1}{\underline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t))} \right] _0^{\underline{P}_k(t)} \;\;\forall k. \end{aligned}$$

(32)

This means that, evaluating (32) for all *k*, if \(\sum _{k=1}^K\underline{p}_k(t)\le P_a\), then (32) is also the global optimal solution of (29), since it satisfies all the KKT conditions. In the second case, given (32), if \(\sum _{k=1}^K\underline{p}_k(t)> P_a\), we must have \(\nu >0\), and the optimal solution of (29) is found by imposing \(\sum _{k=1}^K\underline{p}_k(t)= P_a\) due to condition *iv*) in (31). In this case, from condition *i*) in (31), the solution of (29) admits a water-filling-like structure [43] (whose practical implementation requires at most *K* iterations). More specifically, the optimal powers read as:

$$\begin{aligned} \underline{p}_{k}(t)=\left[ \frac{\underline{U}_k(t)}{[V(1-\sigma )\tau +\nu ]\log 2}-\frac{1}{\underline{\alpha }_k({\varvec{v}}(t),\textbf{w}(t))} \right] _0^{\underline{P}_k(t)} \;\;\forall k, \end{aligned}$$

(33)

where \(\nu\) is the Lagrange multiplier chosen to satisfy the power budget constraint with equality, i.e., \(\sum _{k=1}^K \underline{p}_{k}(t)= P_a\). The overall procedure is summarized in Algorithm 2 and is very efficient: if the closed-form solution in (32) is such that \(\sum _{k=1}^K \underline{p}_{k}(t)\le P_a\), the procedure stops and the water-filling solution in (33) is not needed.
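A sketch of this two-stage logic, derived from the KKT conditions in (31) (assumed names; here the multiplier \(\nu\) is found by simple bisection rather than the at-most-*K*-step water-filling procedure mentioned above, which is equally valid since every power is non-increasing in \(\nu\)):

```python
from math import log

def downlink_powers(U, alpha, P_max, P_a, V, sigma, tau, iters=60):
    """Downlink allocation for (29): try the nu = 0 closed form first;
    if the AP power budget P_a is exceeded, bisect on nu as in (33)."""
    def p_of(nu):
        out = []
        for u, a, pm in zip(U, alpha, P_max):
            if u <= 0:
                out.append(0.0)
            else:
                p = u / ((V * (1 - sigma) * tau + nu) * log(2)) - 1.0 / a
                out.append(min(max(p, 0.0), pm))
        return out

    p = p_of(0.0)
    if sum(p) <= P_a:
        return p                      # KKT conditions already met with nu = 0
    lo, hi = 0.0, 1.0
    while sum(p_of(hi)) > P_a:        # powers decrease in nu: find an upper bracket
        hi *= 2.0
    for _ in range(iters):            # bisection on the multiplier nu
        mid = 0.5 * (lo + hi)
        if sum(p_of(mid)) > P_a:
            lo = mid
        else:
            hi = mid
    return p_of(hi)
```

When the budget is tight, the returned powers satisfy \(\sum_k \underline{p}_k \approx P_a\) up to the bisection tolerance.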

*Overall procedure for radio resource allocation* Algorithm 1, (28), and Algorithm 2 together provide the proposed solution to problem (25), i.e., the solution of problem (20) when the AP is active (\(I_a(t)=1\)). To decide the AP state variable \(I_a(t)\), we need to compare the value of the objective function of (20) in the active case with the one achieved in the sleep state, i.e., (24). Denoting by \({\varvec{v}}^{\text {on}}(t)\), \(\textbf{w}^{\text {on}}(t)\), \(\{\underline{p}_k^{\text {on}}\}_{k=1}^K\), and \(\{\overline{p}_k^{\text {on}}\}_{k=1}^K\) the solution obtained with \(I_a(t)=1\) (through Algorithm 1, (28), and Algorithm 2), the objective of (20) reads as:

$$\begin{aligned} &\Omega (I_a(t)=1)=-\sum _{k=1}^K\overline{U}_k(t) \log _2\left( 1+\overline{\alpha }_k({\varvec{v}}^{\text {on}}(t),\textbf{w}^{\text {on}}(t)) \overline{p}_{k}^{\text {on}}(t)\right) \nonumber \\&\quad -\sum _{k=1}^K\underline{U}_k(t) \log _2\left( 1+\underline{\alpha }_k({\varvec{v}}^{\text {on}}(t),\textbf{w}^{\text {on}}(t)) \underline{p}_{k}^{\text {on}}(t)\right) + V\tau \left[ \sum _{k=1}^K\left( \sigma \overline{p}_{k}^{\text {on}}(t)+(1-\sigma )\underline{p}_{k}^{\text {on}}(t)\right) \right. \nonumber \\&\left. \quad +\, (1-\sigma )\!\!\left( \! p_a^{\text {on}}+\sum _{i=1}^I p^{r}(b_i)\sum _{l=1}^{N_i} |v_{i,l}^{\text {on}}(t)|^2\right) \right] . \end{aligned}$$

(34)

Then, the final solution of (20) is found by comparing (24) and (34). Indeed, if \(\Omega (I_a(t)=0)\le \Omega (I_a(t)=1)\), the solution is given by (23). Otherwise, the solution is given by Algorithm 1, (28), and Algorithm 2. The overall procedure for dynamic radio resource allocation is described in Algorithm 3.

### Dynamic allocation of computing resources

The computing resource allocation problem optimizes the CPU frequencies \(\{f_{k}(t)\}_{k=1}^K\) assigned by the server to the devices, and the overall ES frequency \(f_c(t)\). From (19), letting \(Y_{k}(t)=(-c_kQ_k^a(t)+Q_{k}^r(t)+Z_{k}(t))J_k\tau\), we obtain

$$\begin{aligned} &\min _{\{f_{k}(t)\}_{k=1}^K,\,f_c(t)} -\sum _{k=1}^K Y_{k}(t) f_{k}(t) + V(1-\sigma )\tau \gamma _s(f_c(t))^3\nonumber \\&\quad \quad \text {subject to} \quad 0\le f_{k}(t)\le \frac{Q^r_{k}(t)}{\tau J_{k}},\;\;\forall k;\nonumber \\&\qquad \qquad \qquad \quad \displaystyle \sum _{k=1}^K f_{k}(t)\le f_c(t); \qquad f_c(t)\in \mathcal {F}. \end{aligned}$$

(35)

The CPU frequency \(f_c(t)\) in (35) is assumed to belong to a fixed discrete set \(\mathcal {F}\). Thus, for a given \(f_c(t)\in \mathcal {F}\), problem (35) is linear in \(\{f_{k}(t)\}_{k=1}^K\), and can be solved using the simple procedure in Algorithm 4. Intuitively, Algorithm 4 assigns the largest portions of \(f_c(t)\) to the devices with the largest values of \(Y_{k}(t)\), and requires at most *K* steps. Also, letting \(\mathcal {C}(t)=\{k\,|\, Y_{k}(t)>0\}\), it is clear that the ES assigns a nonzero CPU frequency only to the devices belonging to \(\mathcal {C}(t)\). Finally, denoting by \(\{f_{k}(f_c(t))\}_{k=1}^K\) the optimal frequencies assigned to the users for a given \(f_c(t)\in \mathcal {F}\) (using Algorithm 4), the optimal ES frequency \(f_c(t)\) is given by:

$$\begin{aligned} f_c(t)= \arg \min _{f_c\in \mathcal {F}} \;\;-\sum _{k\in \mathcal {C}(t)} Y_{k}(t) f_{k}(f_c)+ V (1-\sigma )\tau \gamma _s(f_c)^3. \end{aligned}$$

(36)

The variables \(f_c(t)\) and \(\{f_{k}(f_c(t))\}_{k=1}^K\) represent the global optimal solution of (35) at time *t*. The worst case number of scalar operations needed by this procedure is \(O(K|\mathcal {F}|)\), which is affordable in many practical scenarios.
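The combination of the greedy per-user assignment (Algorithm 4) and the discrete search in (36) can be sketched as follows (illustrative Python; `f_max_user` stands for the per-user caps \(Q^r_{k}(t)/\tau J_{k}\), and all function names are ours, not the paper's):

```python
def allocate_cpu(Y, f_max_user, F, V, sigma, tau, gamma_s):
    """Sketch of Algorithm 4 plus the search in (36): for each candidate f_c,
    serve users in decreasing order of Y_k (only Y_k > 0), each capped at
    f_max_user[k]; then pick the f_c in F minimizing the objective of (36)."""
    def assign(f_c):
        f = [0.0] * len(Y)
        budget = f_c
        for k in sorted(range(len(Y)), key=lambda k: -Y[k]):
            if Y[k] <= 0 or budget <= 0:
                break                    # remaining users have Y_k <= 0 or no budget
            f[k] = min(f_max_user[k], budget)
            budget -= f[k]
        return f

    def cost(f_c, f):
        # Objective of (36): foregone queue drain plus cubic ES energy.
        return -sum(y * fk for y, fk in zip(Y, f)) + V * (1 - sigma) * tau * gamma_s * f_c ** 3

    return min(((f_c, assign(f_c)) for f_c in F), key=lambda t: cost(*t))
```

The worst-case cost is \(O(K|\mathcal{F}|)\) scalar operations, matching the complexity stated above (up to the sorting of the \(Y_k\), which can be done once).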

### Overall algorithmic solution

The overall procedure for the proposed resource allocation strategy for RIS-empowered dynamic mobile edge computing is summarized in Algorithm 5, which explains how the previously introduced Algorithms 1-4 are intertwined. In particular, the first step of Algorithm 5 aims at allocating radio resources exploiting Algorithm 3, which embeds Algorithms 1 and 2 (cf. Algorithm 3). Then, the second step of Algorithm 5 hinges on Algorithm 4 to allocate the computing resources of the ES. The first two steps of Algorithm 5 (i.e., the resource allocation) are run by the ES, which is the only entity assumed to have the overall knowledge of the system in terms of, e.g., channels, queues, etc. Also, since steps 1 and 2 of Algorithm 5 involve decoupled optimizations over radio and computing variables, respectively, they can also be computed in parallel to make the implementation more efficient.

Clearly, Algorithm 5 builds on the previously derived joint optimization of communication, computation, and RISs parameters in Sects. 3.1 and 3.2. The method is fully dynamic and optimizes variables on-the-fly via closed-form expressions or low-complexity procedures (which do not require asymptotic convergence of iterative algorithms), based on instantaneous realizations and observations of all involved random variables (i.e., wireless channels and data arrivals), as well as of the physical and virtual queue states. Algorithm 5 is run at the ES, and the optimized variables are then sent to the UE, AP, and RISs, within the portion \(\tau _s\) of the time slot dedicated to resource optimization and signaling. Distributed implementations that limit the exchange of state information can also be envisioned, but are beyond the scope of this paper.

Interestingly, the algorithm comes with theoretical guarantees in terms of stability and performance. In particular, the following proposition holds.

### Proposition 1

Suppose that the channel gains \(\{\overline{{\varvec{h}}}_{k}^a(t)\}_k\), \(\{\overline{{\varvec{h}}}_{k,i}(t)\}_{k,i}\), \(\{\overline{{\varvec{G}}}_{k,i}^a(t)\}_{k,i}\), \(\{\underline{{\varvec{h}}}_{k}^a(t)\}_k\), \(\{\underline{{\varvec{h}}}_{k,i}(t)\}_{k,i}\), \(\{\underline{{\varvec{G}}}_{k,i}^a(t)\}_{k,i}\), and the data arrivals \(\{A_k(t)\}_{k}\) are i.i.d. over time. Let (16) be feasible, and let \({\mathbb {E}}\{\mathcal {L}(\varvec{\Theta }(0))\}<\infty\). Then, Algorithm 5 guarantees that all physical and virtual queues are mean-rate stable and that the network objective \(e_{\sigma }^{\text {tot}}(t)\) satisfies:

$$\begin{aligned} \limsup \limits _{T\rightarrow \infty }\;\;\frac{1}{T}\sum _{t=1}^{T}{\mathbb {E}}\{e_{\sigma }^{\text {tot}}(t)\}\,\le \, e_{\sigma }^{\text {tot,opt}} + \frac{\zeta +C}{V}, \end{aligned}$$

(37)

where \(e_{\sigma }^{\text {tot,opt}}\) is the infimum time-average energy achievable by any policy that meets the required constraints, and *C* and \(\zeta\) are finite positive constants.

### Proof

The claim follows from the fact that the control policy given by Algorithm 5 is a *C*-additive approximation [42, p. 59], which admits inexact solutions (with bounded error) of the drift-plus-penalty method in (19) at each time *t*. In fact, the solution of Algorithm 5 generally differs from that of (19), because the greedy Algorithm 1 (embedded into Step 1 of Algorithm 5) is not guaranteed to find the optimal RIS configuration for a given set of channel configurations and data arrivals. However, since the objective and the feasible set of (19) are bounded, for any given value of the (real and virtual) queues at time *t*, the (expected conditional) difference between the objective values achieved by an exhaustive search procedure (which attains the optimum) and by the proposed Algorithm 1 is always upper-bounded by a finite constant *C*. This proves that Algorithm 5 is a *C*-additive approximation of the drift-plus-penalty method in (19). Finally, following the same approach used in [42, p. 61], since all the functions in (19) are bounded over the feasible set for all *t*, there exists a finite constant \(\zeta >0\) such that the main claim of the Proposition follows from [42, Theorem 4.8]. \(\square\)

Proposition 1 guarantees that Algorithm 5 stabilizes the system, while asymptotically approaching the optimal solution of (16) as *V* increases [42, Th. 4.8]. In practical scenarios with finite *V*, the higher *V* is, the more importance is given to the energy consumption rather than to the virtual queue backlogs, thus pushing the solution closer to optimality while still guaranteeing the stability of the system. In Sect. 4, we will numerically assess the performance of the proposed resource allocation strategy.