Authenticated secret key generation in delay-constrained wireless systems

With the emergence of 5G low-latency applications, such as haptics and V2X, low-complexity and low-latency security mechanisms are needed. Promising lightweight mechanisms include physical unclonable functions (PUF) and secret key generation (SKG) at the physical layer, as considered in this paper. In this framework, we propose (i) a zero round trip time (0-RTT) resumption authentication protocol combining PUF and SKG processes, (ii) a novel authenticated encryption (AE) using SKG, and (iii) pipelining of the AE SKG and the encrypted data transfer in order to reduce latency. Implementing the pipelining at PHY, we investigate a parallel SKG approach for multi-carrier systems, where a subset of the subcarriers are used for SKG and the rest for data transmission. The optimal solution to this PHY resource allocation problem is identified under security, power, and delay constraints, by formulating the subcarrier scheduling as a subset-sum 0−1 knapsack optimization. A heuristic algorithm of linear complexity is proposed and shown to incur negligible loss with respect to the optimal dynamic programming solution. All of the proposed mechanisms have the potential to pave the way for a new breed of latency aware security protocols.


Introduction
Many standard cryptographic schemes, particularly those in the realm of public key encryption (PKE), are computationally intensive, incurring considerable overheads and can rapidly drain the battery of power-constrained devices [1,2], notably in Internet of Things (IoT) applications [3].For example, a 3GPP report on the security of ultra-reliable low-latency communication (URLLC) systems notes that authentication for URLLC is still an open problem [4].Additionally, traditional public key generation schemes are not quantum secure-in that when sufficiently capable quantum computers will be available, they will be able to break current known PKE schemes-unless the key sizes increase to impractical lengths.
In the past years, physical layer security (PLS) [5][6][7][8][9] has been studied as a possible alternative to classic, complexity-based, cryptography.As an example, signal properties as in [10] can be exploited to generate opportunities for confidential data transmission [11,12].Notably, PLS is explicitly mentioned as a 6G enabling technology in the first white paper on 6G [13]: "The strongest security protection may be achieved at the physical layer." In this work, we propose to move some of the security core functions down to the physical layer, exploiting both the communication radio channel and the hardware, as unique entropy sources.
Since the wireless channel is reciprocal, time-variant and random in nature, it offers a valid, inherently secure source that may be used in a key agreement protocol between two communicating parties.The principle of secret key generation (SKG) from correlated observations was first studied in [14] and [15].A straightforward SKG approach can be built by exploiting the reciprocity of the wireless fading coefficients between two terminals within the channel coherence time [16], and this paper builds upon this mechanism.This is pertinent to many forthcoming B5G applications that will require a strong, but nevertheless, lightweight security key agreement; in this direction, PLS may offer such a solution or complement existing algorithms.With respect to authentication, physical unclonable functions (PUFs), firstly introduced in [17] (based on the idea of physical one-way functions [18]), [19] could also enhance authentication and key agreement in demanding scenarios, including (but not limited to) device to device and tactile Internet.We note that others also point to using physical layer security to reduce the resource overhead in URLLC [20].
A further advantage of PLS is that it is information-theoretic secure [21], i.e., it is not open to attack by future quantum computers and it requires lower computation costs as will be explored later in this paper.In this work, we will discuss how SKG from shared randomness [22] is a promising alternative to PKE for key agreement.However, unauthenticated key generation is vulnerable to man in the middle (MiM) attacks.In this sense, PUFs can be used in conjunction with SKG to provide authenticated encryption (AE).As summarized in [19], the employment of PUFs can decrease the computational cost and have a high impact on reducing the authentication latency in constrained devices.
In this study, we introduce the joint use of PUF authentication and SKG in a zero round trip time (0-RTT) [23,24] approach, allowing to build quick authentication mechanisms with forward security.Further, we develop an AE primitive [25][26][27] based on standard SKG schemes.To investigate a fast implementation of the AE SKG, we propose a pipelined (parallel) scheduling method for optimal resource allocation at the physical layer (PHY) (i.e., by optimal allocation of the subcarriers in 5G resource blocks).
Next, we extend the analysis to account for statistical delay quality of service (QoS) guarantees, a pertinent scenario in B5G.The support of different QoS guarantee levels is a challenging task.In fact, in time-varying channels, such as in wireless networks, determining the exact delay bound depending on the users' requirements is impossible.However, a practical approach, namely the effective capacity [28], can provide statistical QoS guarantees and can give delay bounds with a small violation probability.In our work, we employ the effective capacity as the metric of interest and investigate how the proposed pipelined AE SKG scheme performs in a delay-constrained scenario.
The system model introduced in this work assumes that a block fading additive white Gaussian noise (BF-AWGN) channel is used with multiple orthogonal subcarriers.In our parallel scheme, a subset of the subcarriers is used for SKG and the rest for encrypted data transfer.The findings of this paper are supported by numerical results, and the efficiency of the proposed parallel scheme is shown to be greater or similar to the efficiency • When a relaxed QoS delay constraint is in place; • When a stringent QoS delay constraint is in place.
A roadmap of the paper's contributions is shown in Fig. 1.The paper is organized as follows: related work is discussed in Section 2 followed by a brief summary of the methods used within this paper in Section 3, and then the general system model is introduced in Section 4. The use of PUF authentication is illustrated in Section 4.1, the baseline SKG in Section 4.2; next, in Sections 4.3 and 4.4, we present an AE scheme using SKG and a resumption scheme to build a 0-RTT protocol.Subsequently, we evaluate the optimal power and subcarrier allocation at PHY considering both the long-term average rate in Section 5 and the effective rate in Section 6.In Section 7, the efficiency of the proposed approach is evaluated against that of a sequential approach, while conclusions are presented in Section 8.

Related work
This paper assumes the use of PUF-based authentication with SKG.PUFs are hardware entities based on the physically unclonable variations that occur during the production process of silicon.These unique and unpredictable variations allow the extraction of uniformly distributed binary sequences.Due to their unclonability and simplicity, PUFs are Fig. 1 Roadmap of contributions seen as lightweight security primitives that can offer alternatives to today's authentication mechanisms.Furthermore, employing PUFs can eliminate the need of non-volatile memory, which reduces cost and complexity [29].Common ways of extracting secret bit sequences are through measuring delays on wires and gates or observing the power up behavior of a silicon.
On the other hand, due to the nature of propagation in a shared free-space, wireless communication remains vulnerable to different types of attacks.Passive attacks such as eavesdropping or traffic analysis can be performed by anyone in the vicinity of the communicating parties; to ensure confidentiality, data encryption is vital for communication security.The required keys can be agreed at PHY using SKG.In this case, all pilot exchanges need to take place over the coherence time of the channel1 , during which Alice and Bob can observe highly correlated channel states that can be used to generate a shared secret key between them.SKG has been implemented and studied for different applications such as vehicular communications [42,43], underwater communications [44], optical fiber [45], visible light communication [46], and more as summarized in [47].The key conclusion from these studies is that SKG shows promise as an important alternative to current key agreement schemes.
Widely used sources of shared randomness used for SKG are the received signal strength (RSS) and the full channel state information (CSI) [48].In either case, it is important to build a suitable pre-processing unit to decorrelate the signals in the time/frequency and space domains.As an example, some recent works have shown that the widely adopted assumption [49] that a distance equal to half of the wavelength (which at 2.4 GHz is approximately 6 cm [50]) is enough for two channels to decorrelate may not hold in reality [40].Other works show that the mobility can highly increase the entropy of the generated key [51,52] while an important issue with the RSS-based schemes is that they are open to predictable channel attacks [40,53].These important issues need to be explicitly accounted for in actual implementations, but fall outside the scope of this paper.

Methods
The methods used and introduced in this paper rely upon a range of basic primitives, which in combination provide the full PUF and AE SKG solution.Each of these primitives is introduced below together with a summary of the methods used to analyze and optimize the solution.
Authentication: Before establishing a shared secret key, Alice and Bob must be sure they are communicating with a trusted party.To achieve this, we assume the usage of a PLS method, more specifically PUF authentication.As discussed in Section 2 by eliminating the need of non-volatile memory, the usage of PUFs could greatly reduce the complexity compared to existing authentication alternatives.Secret key generation: To ensure that their communication is private, after authenticating each other, Alice and Bob have to encrypt/decrypt the data.For this work, we assume the use of symmetric encryption where the same key is used for both operations.In order to obtain a shared key, we propose to use SKG which consists of three standard steps: (i) advantage distillation, (ii) information reconciliation, and (iii) privacy amplification; each of these steps is explained in more detail in Section 4.2.

Re-authentication:
We present a re-authentication approach that exploits the use of resumption secrets as used in 0-RTT protocols.Instead of performing full authentication before sending data encrypted with a new key, we propose a new method which allows Alice (Bob) to authenticate subsequent keys using a lightweight scheme anchored by the initial authentication process.Authenticated encryption SKG: To eliminate the possibility of tampering attacks, we build on the SKG process to introduce a new AE SKG method.AE can simultaneously guarantee confidentiality and message integrity.In our AE SKG method, side information and encrypted data transfer are pipelined.Pipelined transmission: In our proposal, the key generation is pipelined with the encrypted data transfer, i.e., side information and data encrypted with the key that corresponds to the side information are transmitted over the same 5G resource block(s).

Joint PHY/MAC delay analysis:
To analyze the system under statistical QoS delay constraints, we use the theory of effective capacity [28] and analyze the scheme's effective rate.Optimization methods: Finally, to optimize the pipelined transmission, we take into consideration practical wireless aspects such as the impact of imperfect CSI measurements and formulate two optimization problems to find the optimal resource allocation for Alice and Bob.To solve these problems, we employ tools such as combinatorial optimization, dynamic programming, order statistics, and convex optimization.

Threat model
In this paper, we assume a commonly used adversarial model with an active man-inthe-middle attacker (Eve) and a pair of legitimate users (Alice and Bob).For simplicity, we assume a rich Rayleigh multipath environment where the adversary is more than a few wavelengths away from each of the legitimate parties.This forms the basis of our hypothesis that the measurements of Alice and Bob are uncorrelated to the Eve's measurements.

Notation
Random variables are denoted in italic font, e.g., x, and vectors and matrices are denoted with lower and upper case bold characters, e.g., x and X, respectively.Functions are printed in a fixed-width teletype font, e.g., F. All sets of vectors are given with calligraphic font X , and the elements within a set are given in curly brackets, e.g., {x, y}, the cardinality of a vector or set is defined by vertical lines, e.g., |x| or |X | .Concatenation and bit-wise XORing are represented as [ x||y] and x ⊕ y, respectively.We use H to denote entropy, I mutual information, E expectation, and C the set of complex numbers.

Node authentication using PUFs and SKG
In this section, we present a joint physical layer SKG and PUF authentication scheme.To the best of our knowledge, this is the first work that proposes the utilization of the two schemes in conjunction.As discussed in Section 2, many PUF authentication protocols have been proposed in the literature, with even a few commercially available [54,55].
We do not look into developing a new PUF architecture or a new PUF authentication protocol; instead, we look at combining existing PUF mechanisms with SKG.In addition, we develop an AE scheme that can prevent tampering attacks.To further develop our hybrid cryptosystem, we propose a resumption type of authentication protocol, inspired by the 0-RTT authentication mode in the transport layer security (TLS) 1.3 protocol.The resumption protocol is important as it significantly reduces the use of the PUF to the initial authentication, thus, overcoming the limitation of a PUFs' challenge response space [34,56].

Node authentication using PUFs
As discussed in Section 3, for security against MiM attacks, the SKG needs to be protected through authentication.While existing techniques, such as the extensible authentication protocol-transport layer security (EAP-TLS), could be used as the authentication mechanism, these are computationally intensive and can lead to significant latency [57,58].This leads to the motivation to seek lightweight authentication mechanisms that can be used in conjunction with SKG.Such a mechanism that is achieving note within the research community uses a PUF.The concept of a PUF was first introduced in [17]; its idea is to utilize the fact that every integrated circuit differs to others due to manufacturing variability [59,60] and cannot be cloned [61].Having these characteristics, a PUF can be used in a challenge-response scheme, where a challenge can refer to a delay at a specific gate, power-on state, etc.
A typical PUF-based authentication protocol consists of two main phases, namely enrolment phase and authentication phase [62][63][64][65][66].During the enrolment phase, each node runs a set of challenges on its PUF and characterizes the variance of the measurement noise in order to generate side information.Next, a verifier creates and stores a database of all challenge-response pairs (CRPs) for each node's PUF within its network.A CRP pair in essence consists of an authentication key and related side information.Within the database, each CRP is associated with the ID of the corresponding node.
Later, during the authentication phase, a node sends its ID to the verifier requesting to start a communication.Receiving the request, the verifier checks if the received ID exists in its database.If it does, the verifier chooses a random challenge that corresponds to this ID and sends it to the node.The node computes the response by running the challenge on its PUF and sends it to the verifier.However, the PUF measurements at the node are never exactly the same due to measurement noise; therefore, the verifier uses the new PUF measurement and the side information stored during the enrollment to re-generate the authentication key.Finally, the verifier compares the re-generated key to the one in the CRP, and if they are identical, the authentication of the node is successful.In order to prevent replay attacks, once used, a CRP is deleted from the verifier database.
In summary, the motivation for using a PUF authentication scheme in conjunction with SKG is to exclude all of the computationally intensive operations required by EAP-TLS, which use modulo arithmetic in large fields.Measurements performed on current public key operations within EAP-TLS on common devices (such as IoT) give average authentication and key generation times of approximately 160 ms in static environments and this can reach up to 336 ms in high mobility conditions [67].
On the other hand, PUF authentication protocols have very low computational overhead and require overall authentication times that can be less than 10 ms [63,68].Furthermore, our key generation scheme, proposed in Section 4.2, requires just a hashing operation and (syndrome) decoding.Hashing mechanisms such as SHA256 performed on an IoT device require less than 0.3ms [68,69].Regarding the decoding, if we assume the usage of standard LDPC or BCH error correcting mechanisms, even in the worst-case scenario with calculations carried out as software operations, the computation is trivial compared to the hashing and requires less computational overhead [70].

SKG procedure
The SKG system model is shown in Fig. 2.This assumes that two legitimate parties, Alice and Bob, wish to establish a symmetric secret key using the wireless fading coefficients as a source of shared randomness.Throughout our work, a rich Rayleigh multipath environment is assumed, such that the fading coefficients rapidly decorrelate over short distances [16].Furthermore, Alice and Bob communicate over a BF-AWGN channel that comprises N orthogonal subcarriers.The fading coefficients h =[ h 1 , . . ., h N ] are assumed to be independent and identically distributed (i.i.d), complex circularly symmetric zero-mean Gaussian random variables h j ∼ CN (0, σ 2 ), j = 1, . . ., N.Although in actual multicarrier systems neighboring subcarriers will typically experience correlated fading, in the present work, this effect is neglected as its impact on SKG has been treated in numerous contributions in the past [71][72][73] and will not enhance the problem formulation in the following sections.
The SKG procedure encompasses three phases: advantage distillation, information reconciliation, and privacy amplification [14,15] as described below: 1. Advantage distillation: This phase takes place during the coherence time of the channel.The legitimate nodes sequentially exchange constant probe signals with power P on all subcarriers 2 , to obtain estimates of their reciprocal CSI.We note in passing that the pilot exchange phase can be made robust with respect to injection type of attacks (that fall in the general category of MiM) as analyzed in [22,74].Commonly, the received signal strength (RSS) has been used as the source of shared randomness for generating the shared key, but it is possible to use the full CSI [75].At the end of this phase, Alice and Bob obtain observation vectors , respectively, so that: 2 An explanation of the optimality of this choice under different attack scenarios is discussed in [22].

Fig. 2 Secret key generation between Alice and Bob
where z A and z B denote zero-mean, unit variance circularly symmetric complex AWGN random vectors, such that (z A , z B ) ∼ CN (0, I 2N ).On the other hand, Eve observes Due to the rich Rayleigh multipath environment, Eve's channel measurement h E is assumed uncorrelated to h and z E denotes a zero-mean, unit variance circularly symmetric complex AWGN random vector z E ∼ CN (0, I N ).

Information reconciliation:
At the beginning of this phase, the observations x A,j , x B,j are quantized to binary vectors3 r A,j , r B,j j = 1, . . ., N [76][77][78], so that Alice and Bob distill respectively.Due to the presence of noise, r A and r B will differ.To reconcile discrepancies in the quantizer local outputs, side information needs to be exchanged via a public channel.Using the principles of Slepian-Wolf decoding, the distilled binary vectors can be expressed as where e A , e B are error vectors that represent the distance from the common observed (codeword) vector d at Alice and Bob, respectively.Numerous practical information reconciliation approaches using standard forward error correction codes (e.g., LDPC, BCH) have been proposed [16,75].As an example, if a block encoder is used, then the error vectors can be recovered from the syndromes s A and s B of r A and r B , respectively.Alice transmits her corresponding syndrome to Bob so that he can reconcile r B to r A .It has been shown that the length of the syndrome [15].This has been numerically evaluated for different scenarios and coding techniques [77,[79][80][81].Following that, the achievable SKG rate is upper bounded by I(x A ; x B |x E ).

Privacy amplification:
The secret key is generated by passing r A through a one-way collision resistant compression function i.e., by hashing.Note that this final step of privacy amplification is executed locally without any further information exchange.The need for privacy amplification arises in order to suppress the entropy revealed due to the public transmission of the syndrome s A .Privacy amplification produces a key of length strictly shorter than |r A |, at least by |s A |.At the same time, the goal is for the key to be uniform, i.e., to have maximum entropy.In brief, privacy amplification reduces the overall output entropy while at the same time increases the entropy per bit-compared to the input.
The privacy amplification is typically performed by applying either cryptographic hash functions such as those built using the Merkle-Damgard construction or universal hash functions and has been proven to be secure, in an information theoretic sense, through the leftover hash lemma [82].As an example [40,83] use a 2-universal hash family to achieve privacy amplification.Summarizing, the maximum key size after privacy amplification is: where H(x A ) represents the entropy of the measurement, I(x A ; x E ) represents the mutual information between Alice's and Eve's observations, H(x A |x B ) represents the entropy revealed during information reconciliation, and r 0 > 0 is an extra security parameter that ensures uncertainty on the key at Eve's side.For details and estimation of these parameters in a practical scenario, please see [84].As shown in this section, the SKG procedure requires only a few simple operations such as quantization, syndrome calculation, and hashing.In future work, we will examine the real possibilities of implementing such a mechanism in practical systems.

AE using SKG
To develop a hybrid cryptosystem that can withstand tampering attacks, SKG can be introduced in standard AE schemes in conjunction with standard block ciphers in counter mode (to reduce latency), e.g., AES GCM.As a sketch of such a primitive, let us assume a system with three parties: Alice who wishes to transmit a secret message m with size |m|, to Bob with confidentiality and integrity, and Eve that can act as a passive and active attacker.The following algorithms are employed: • The SKG scheme denoted by G : C → K × S, accepting as input the fading coefficients (modeled as complex numbers),and generating as outputs binary vectors k and s A in the key and syndrome spaces, of sizes |k| and |s A |, respectively, where k ∈ K denotes the key obtained from h after privacy amplification and s A is Alice's syndrome.• A symmetric encryption algorithm, e.g., AES GCM, denoted by Es : K × M → C T where C T denotes the ciphertext space with corresponding decryption for m ∈ M, c ∈ C T .• A pair of message authentication code (MAC) algorithms, e.g., in HMAC mode, denoted by Sign : K × M → T , with a corresponding verification algorithm A hybrid crypto-PLS system for AE SKG can be built as follows: 1.The SKG procedure is launched between Alice and Bob generating a key and a syndrome G(h) = (k, s A ). 2. Alice breaks her key into two parts k = {k e , k i } and uses the first to encrypt the message as c = Es(k e , m).Subsequently, using the second part of the key, she signs the ciphertext using the signing algorithm t = Sign(k i , c) and transmits to Bob the extended ciphertext [s A c t], as it is depicted in Fig. 3. 3. Bob checks first the integrity of the received ciphertext as follows: from s A and his own observation he evaluates k = {k e , k i } and computes Ver(k i , c, t).The integrity test will fail if any part of the extended ciphertext was modified, including the syndrome (that is sent as plaintext); for example, if the syndrome was modified during the transmission, then Bob would not have evaluated the correct key and the integrity test would have failed.4. If the integrity test is successful then Bob decrypts m=Ds(k e , c).

Resumption protocol
In Section 4.1 we discussed that using PUF authentication can greatly reduce the computational overhead of a system.Authentication of new keys is required at the start of communication and at each key renegotiation.However, the number of challenges that can be applied to a single PUF is limited.Due to that, we present a solution that is inspired by the 0-RTT authentication mode introduced in the 1.3 version of the TLS [23].The use of 0-RTT obviates the need of performing a challenge for every re-authentication through the use of a resumption secret r s , thus reducing latency.Another strong motivation for using this mechanism is that it is forward secure in the scenario we are using here [24].We first briefly describe the TLS 0-RTT mechanism before describing a similarly inspired 0-RTT mechanism applied to the information reconciliation phase of our SKG mechanism.The TLS 1.3 0−RTT handshake works as follows: in the very first connection between client and server, a regular TLS handshake is used.During this step, the server sends to the client a look-up identifier k l for a corresponding entry in session caches or it sends a session ticket.Then, both parties derive a resumption secret r s using their shared key and the parameters of the session.Finally, the client stores the resumption secret r s and uses it when reconnecting to the same server which also retrieves it during the re-connection.
If session tickets are used, the server encrypts the resumption secret using long-term symmetric encryption key, called a session ticket encryption key (STEK), resulting in a session ticket.The session ticket is then stored by the client and included in subsequent connections, allowing the server to retrieve the resumption secret.Using this approach, the same STEK is used for many sessions and clients.On one hand, this property highly reduces the required storage of the server; however, on the other hand, it makes it vulnerable to replay attacks and not forward secure.Due to these vulnerabilities, in this work, we focus on the session cache mechanism described next.
When using session caches, the server stores all resumption secrets and issues a unique look-up identifier k l for each client.When a client tries to reconnect to that server, it includes its look-up identifier k l in the 0-RTT message, which allows the server to retrieve the resumption secret r s .Storing a unique resumption secret r s for each client requires server storage for each client but it provides forward security and resilience against replay attacks, when combined with a key generation mechanisms such as Diffie Hellman (or the SKG used in this paper) which are important goals for security protocols [24].In our physical layer 0-RTT, given that a node identifier state would be required for linklayer purposes, the session cache places little comparative load and thus is the mechanism proposed here for (re-)authentication.
The physical layer resumption protocol modifies the information reconciliation phase of Section 3.1 following initial authentication to provide a re-authentication mechanism between Alice and Bob.At the first establishment of communication, we assume initial authentication is established, such as the mechanism shown in Section 4.During that, Alice sends to Bob a look-up identifier k l .Then, both derive a resumption secret r s that is identified by k l .Note, r s and the session key have the same length |k|.Then, referring to the notation and steps in Sections 4, 4.2, and 4.3: 1. Advantage distillation phase is carried out as before (see Section 4.2), where both parties obtain channel observations and obtain the vectors r A and r B . 2. During the information reconciliation phase, both Alice and Bob exclusive or the resumption secret r s with their observations r A and r B and obtain syndromes s A and s B with which both parties can carry out reconciliation to obtain the same shared value which is now r A ⊕ r s .3. The privacy amplification step in Section 4.2 is carried out as before, but now,0 the hashing takes place on r A ⊕ r s to produce the final shared key k that is a result of both the shared wireless randomness and the resumption secret.
Note that the key k can only be obtained if both the physical layer generated key and the resumption key are valid and this method can be shown to be forward secure [24].

Pipelined SKG and encrypted data transfer
As explained in Section 4, if Alice and Bob follow the standard sequential SKG process, they can exchange encrypted data only after both of them have distilled the key at the end of the privacy amplification step.In this Section, we propose a method to pipeline the SKG and encrypted data transfer.Alice can unilaterally extract the secret key from her observation and use it to encrypt data transmitted in the same "extended" message that contains the side information (see Fig. 3).Subsequently, using the side information, Bob can distill the same key k and decrypt the received data in one single step.
We have discussed in Section 4.2 how Alice and Bob can distill secret keys from estimates of the fading coefficients in their wireless link and in Section 4.3 how these can be used to develop an AE SKG primitive.At the same time CSI estimates are prerequisite in order to optimally allocate power across the subcarriers and achieve high data rates 4 .As a result, a question that naturally arises is whether the CSI estimates (obtained at the end of the pilot exchange phase) should be used towards the generation of secret keys or towards the reliable data transfer and, furthermore, whether the SKG and the data transfer can be inter-woven using the AE SKG principle.
In this paper, we are interested in answering this question and shed light into whether following the exchange of pilots Alice should transmit reconciliation information on all subcarriers, so that she and Bob can generate (potentially) a long sequence of key bits or, alternatively, perform information reconciliation only over a subset of the subcarriers and transmit encrypted data over the rest, exploiting the idea of the AE SKG primitive.Note here that the data can be already encrypted with the key generated at Alice, the sender of the side information, so that the proposed pipelining does not require storing keys for future use.We will call the former approach a sequential scheme, while we will refer to the latter as a parallel scheme.The two will be compared in terms of their efficiency with respect to the achievable data rates.
A simplified version of this problem, where the reconciliation rate is roughly approximated to the SKG rate, was investigated in [85].In this study, it was shown that in order to maximize the data rates in the parallel approach, Alice and Bob should use the strongest subcarriers-in terms of SNR-for data transmission and the worst for SKG.Under this simplified formulation, the optimal power allocation for the data transfer has been shown to be a modified water-filling solution.
Here, we explicitly account for the rate of transmitting reconciliation information and differentiate it from the SKG rate.We confirm whether the policy of using the strongest subcarriers for data transmission and not for reconciliation is still optimal when the full optimization problem is considered, including the communication cost for reconciliation.
As discussed in Section 4.2, our physical layer system model assumes Alice and Bob exchange data over a Rayleigh BF-AWGN channel with N orthogonal subcarriers.Without loss of generality, the variance of the AWGN in all links is assumed to be unity.During channel probing, constant pilots are sent across all subcarriers [16,86] with power P. Using the observations (1), Alice estimates the channel coefficients as for j = 1, . . ., N where hj denotes an estimation error that can be assumed to be Gaussian, hj ∼ CN 0, σ 2 e [87].Under this model, the following rate is achievable on the jth subcarrier from Alice to Bob when the transmit power during data transmission is p j [87]: where we use ĝi = i,e P+1 , to denote the estimated channel gains.As a result, the channel capacity C = N j=1 R j under the short-term power constraint: , where the water level λ is estimated from the constraint (14).In the following, the estimated channel gains ĝj are-without loss of generality-assumed ordered in descending order, so that: As mentioned above, the advantage distillation phase of the SKG process consists of the two-way exchange of pilot signals during the coherence time of the channel to obtain r A,j , r B,j , j = 1, . . ., N. On the other hand, the CSI estimation phase can be used to estimate the reciprocal channel gains in order to optimize data transmission using the water-filling algorithm.In the former case, the shared parameter is used for generating symmetric keys, in the latter for deriving the optimal power allocation.In the parallel approach, the idea is to inter-weave the two procedures and investigate whether a joint encrypted data transfer and key generation scheme as in the AE SKG in Section 4.3 could bear any advantages with respect to the system efficiency.While in the sequential approach the CSI across all subcarriers will be treated as a source of shared randomness between Alice and Bob, in the parallel approach, it plays a dual role.

Parallel approach
In the parallel approach, after the channel estimation phase, the legitimate users decide on which subcarrier to send the reconciliation information (e.g., the syndromes as discussed in Section 4.2) and on which data (i.e., the SKG process here is not performed on all of the subcarriers).The total capacity has now to be distributed between data and reconciliation information bearing subcarriers.As a result, the overall set of orthogonal subcarriers comprises two subsets: a subset D that is used for encrypted data transmission with cardinality |D| = D and a subset D with cardinality | D| = N − D used for reconciliation such that D ∪ D = {1, . . ., N}.Over D, the achievable sum data transfer rate, denoted by C D , is given by while on the subset D, Alice and Bob exchange reconciliation information at rate As stated in Section 4.2, the fading coefficients are assumed to be zero-mean circularly symmetric complex Gaussian random variables.Using the theory of order statistics, the distribution of the ordered channel gains of the SKG subcarriers, j ∈ D, can be expressed as [88]: where σ 2 is the variance of the channel gains.As a result of ordering the subcarriers, the variance of each of the subcarriers is now given by: Thus, we can now write the SKG rate as (note that the noise variances are here normalized to unity for simplicity) [16,86]: The minimum rate necessary for reconciliation was discussed in Section 4.2.Here, alternatively, we employ a practical design approach in which the rate of the encoder used is explicitly taken into account.Note that in a rate k n block encoder, the side information is n − k bits long, i.e., the rate of syndrome to output key bits after privacy amplification is n−k k .However, in each key session, a 0-RTT look-up identifier of length k is also sent.Therefore, we define the parameter κ = n−k k + 1 = n k , i.e., the inverse of the encoder rate, that reflects the ratio of the reconciliation and 0-RTT transmission rate to the SKG rate.For example, for a rate k n = 1 2 encoder, κ = 2, etc.Based on this discussion, we capture the minimum requirement for the reconciliation rate through the following expression: Furthermore, to identify the necessary key rate, we note that depending on the exact choices of the cryptographic suites to be employed, it is possible to reuse the same key for the encryption of multiple blocks of data, e.g., as in the AES GCM, that is being considered for employment in the security protocols for URLLC systems [4].In practical systems, a single key of length 128 to 256 bits can be used to encrypt up to gigabytes of data.As a result, we will assume that for a particular application it is possible to identify the ratio of key to data bits, which in the following we will denote by β.Specifically, we assume that the following security constraint should be met where, depending on the application, the necessary minimum value of β can be identified.We note in passing that the case β = 1 would correspond to a one-time-pad, i.e., the generated keys could be simply x-ored with the data to achieve perfect secrecy without the need of any cryptographic suites.
Accounting for the reconciliation rate and security constraints in ( 21) and ( 22), we formulate the following maximization problem: max p j ,j∈D j∈D R j (23) s.t.( 14), ( 21), (22) (22) can be integrated with (21) to the combined constraint The optimization problem at hand is a mixed-integer convex optimization problem with unknowns both the sets D, D, as well as the power allocation policy p j , j ∈ {1, . . ., N}.These problems are typically NP hard and addressed with the use of branch and bound algorithms and heuristics.
Algorithm 1 Heuristic Greedy Algorithm for ( 27)-( 28) 1: procedure HEURISTIC(start, end, R j ) 2: if N j=1 R j x j ≤ C 1+κβ then 6: x j ← 1; j ← j + 1 7: else do x j ← 0; j ← j + 1 8: end while 10: end procedure In this work, we propose a simple heuristic to make the problem more tractable by reducing the number of free variables.In the proposed approach, we assume that the constraint ( 24) is satisfied with equality.The only power allocation that allows this is the water-filling approach that uniquely determines the power allocation p j and also requires that the constraint ( 14) is also satisfied with equality.Thus, if we follow that approach, we determine the power allocation vector uniquely and can combine the remaining constraints ( 24) and ( 25) into a single one as: The new optimization problem can be re-written as max The problem in (27,28) is a subset-sum problem from the family of 0 − 1 knapsack problems that is known to be NP hard [89].However, these type of problems are solvable optimally using dynamic programming techniques in pseudo-polynomial time [89,90].Furthermore, it is known that greedy heuristic approaches are bounded away from the optimal solution by half [91].
We propose a simple greedy heuristic algorithm of linear complexity, as follows 5 .The data subcarriers are selected starting from the best-in terms of SNR-until ( 28) is not satisfied.Once this situation occurs, the last subcarrier added to set D is removed and the next one is added.This continues either to the last index N or until (28) is satisfied with equality.The algorithm is described in Algorithm 1.
The efficiency of the proposed parallel method-measured as the ratio of the long-term data rate versus the average capacity-is evaluated as: This efficiency quantifies the expected back-off in terms of data rates when part of the resources (power and frequency) are used to enable the generation of secret keys at the physical layer.In future work, we will compare the efficiency achieved to that of actual approaches currently used in 5G by accounting for the actual delays incurred due to the PKE key agreement operations [20].

Sequential approach
In the sequential approach, encrypted data transfer and secret key generation are two separate events; first, the secret keys are generated over the whole set of subcarriers, leading to a sum SKG rate given as To estimate the efficiency of the scheme, we further need to identify the necessary resources for the exchange of the reconciliation information.We can obtain an estimate of the number of transmission frames that will be required for the transmission of the syndromes, as the expected value of the reconciliation rate (i.e., its long-term value) The average number of frames needed for reconciliation is then computed as: where x denotes the smallest integer that is larger than x.
The average number of the frames that can be sent while respecting the secrecy constraint is: where x denotes the largest integer that is smaller than x.The efficiency of the sequential method is then calculated as: 6 Effective data rate taking into account statistical delay QoS requirements In the previous section, we investigated the optimal power and subcarrier allocations strategy of Alice and Bob in order to maximize their long-term average data rate and proposed a greedy heuristic algorithm of linear complexity.Here, we extend our work from Section 5 by taking into account delay requirements.In detail, we investigate the optimal resource allocation for Alice and Bob, when their communication has to satisfy specific delay constraints.To this end, we use the theory of effective capacity [28] which gives a limit for the maximum arrival rate under delay-bounds with a specified violation probability.
We study the effective data rate for the proposed pipelined SKG and encrypted data transfer scheme; the effective rate is a data-link layer metric that captures the impact of statistical delay QoS constraints on the transmission rates.As background, we refer to [92] which showed that the probability of a steady-state queue length process Q(t) exceeding a certain queue-overflow threshold x converges to a random variable Q(∞) as: where θ indicates the asymptotic exponential decay rate of the overflow probability.For a large threshold x, (34) can be represented as Pr[ Q(∞) > x] ≈ e −θx .Furthermore, the delay-outage probability can be approximated by [28]: where D max is the maximum tolerable delay, Pr[ Q(∞) > 0] is the probability of a nonempty buffer, which can be estimated from the ratio of the constant arrival rate to the averaged service rate, and ζ is the upper bound for the constant arrival rate when the statistical delay metrics are satisfied.
Using the delay exponent (θ) and the probability of non-empty buffer, the effective capacity, that denotes the maximum arrival rate, can be formulated as [28]: where denotes the time-accumulated service process, and s[ i] , i = 1, 2, ... denotes the discrete-time stationary and ergodic stochastic service process.Therefore, the delay exponent θ indicates how strict the delay requirements are, i.e., θ → 0 corresponds to looser delay requirements, while θ → ∞ implies exceptionally stringent delay constraints.Assuming a Rayleigh block fading system, with frame duration T f and total bandwidth B, we have s[ i] = T f B Ri , with Ri representing the instantaneous service rate achieved during the duration of the ith frame.In the context of the investigated data and reconciliation information transfer, Ri , is given by: where F is the equivalent frame duration, i.e., the total number of subcarriers used for data transmission, so that for the parallel approach, we have F = |D| while for the sequential approach, F = N(L + M)L −1 .Under this formulation and assuming that Gärtner-Ellis theorem [93,94] is satisfied, the effective data rate6 E C (θ) is given as: We set α = θT f B ln (2) .By inserting (37) into (38), we get: Assuming i.i.d.channel gains, by using the distributive property of the mathematical expectation, (39) becomes [95]: We further manipulate by using the log-product rule to obtain: Similarly, the effective syndrome rate can be written as: where the size of F here is |N − D|.
Using that, we now reformulate the maximization problem given in ( 23) by adding a delay constraint.The reformulated problem can be expressed as follows: s.t.( 14), (25), where E opt C (θ) represents the maximum achievable effective capacity for both key and data transmission for a given value of θ over N subcarriers: In the proposed approach, we assume that the constraint ( 44) is satisfied with equality.The optimization problem in (43) can be evaluated as two sub-optimization problems: (i) finding the optimal long-term power allocation from ( 14) and ( 45) and (ii) finding the optimal subcarrier allocation that satisfies (25).We solve the first problem that gives the optimal power allocation using convex optimization tools.Next, as in Section 5. we use two methods to solve subcarrier allocation problem, i.e., by formulating a subset-sum 0 -1 knapsack optimization problem or through a variation of Algorithm 1.The efficiency of both methods is compared numerically to the sequential method in Section 7. Now, following the same steps as in ( 39)-( 41) and using the fact that maximizing E C (θ) is equivalent to minimizing −E C (θ) (this is due to log(•) being a monotonically increasing concave function for any θ > 0), we formulate the following minimization problem: s.t.(14).
where F = N in this case as the full set of subcarriers is concerned.We form the Lagrangian function L as: By differentiating (47) w.r.t.p i and setting the derivative equal to zero [96], we get: Solving (48) gives the optimal power allocation policy: where g 0 = Nλ α is the cutoff value which can be found from the power constraint.By inserting p * i in E C (θ), we obtain the expression for E opt C (θ): When θ → 0, the optimal power allocation is equivalent to water-filling, and when θ → ∞, the optimal power allocation transforms to total channel inversion.Now, fixing the power allocation as in (49), we can easily find the optimal subcarrier allocation that satisfies (25).As in Section 5, to do that, we first formulate a subset-sum 0 -1 knapsack optimization problem that we solve using the standard dynamic programming approach.Furthermore, we evaluate the performance of the heuristic algorithm presented in Algorithm 1.

Results and discussion
In this section, we provide numerical evaluations of the efficiency that can be achieved with the presented methods (i.e., sequential and parallel) for different values of the main parameters.With respect to the parallel approach, we provide numerical results of the optimal dynamic programming solution of the subset-sum 0 − 1 knapsack problem, as well as of the greedy heuristic approach presented in Algorithm 1.For the case of the longterm average data rate C D (16), we compare the two methods through their efficiencies, i.e., η sequential and η parallel given in (33) and (29), respectively.Next, to compare the two methods in the case of effective data rate, we evaluate E C,D (θ) given in (41).For better illustration of each case, they are separated into different subsections.

Numerical results for the case long-term average C D
Figure 4a and b show the efficiency of the methods for N = 12 and N = 64, respectively, while κ = 2 and P = 10.We note that the proposed heuristic algorithm has a near-optimal performance (almost indistinguishable from the red curves achieved with dynamic programming).Due to this fact (which was tested across all scenarios that follow), only the heuristic approach is shown in subsequent figures for clarity in the graphs.
We see that when there are a small number of subcarriers (N=12, typical for NB-IoT) and small β, the efficiency of both the parallel η parallel and the sequential η sequential approaches are very close to unity, a trend that holds for increasing N.With increasing β, due to the fact that more frames are needed for reconciliation in the sequential approach (i.e., M increases), regardless of the total number of subcarriers, the parallel method proves more efficient than the sequential.While the efficiency of the sequential and parallel methods coincide almost until around β = 0.01 for N = 12, for N = 64, the crossing point of the curves moves to the left and the efficiency of the two methods coincide until around β = 0.001.This trend was found to be consistent across many values of N, only two of which are shown here for compactness of presentation.Next, in Fig. 5, the efficiency of the parallel η parallel and the sequential η sequential methods are shown for two different values of κ ∈ {2, 3} for SNR = 10 dB and N = 24.It is straightforward to see that they both follow similar trends and when κ increases the efficiency decreases.On the other hand, regardless of the value of κ, they both perform identically until around β = 0.001.
Finally, in Fig. 6, focusing on the parallel method, the average size of set D is shown for different values of σ 2 e and SNR levels (Fig. 6a) and κ (Fig. 6b), for N = 24.As expected, in Fig. 6a, we see when the SNR increases the size of the set increases, too.This is due to the fact that more power is used on any single subcarrier and consequently a higher reconciliation rate can be sustained.Regarding the estimation error σ 2 e of the CSI, it only slightly affects the performance at high SNR levels.Hence, more subcarriers have to be used for reconciliation, and fewer for data.The SNR level in Fig. 6b is set to 10 dB.The  figure shows that when increasing κ, the size of set D decreases.This result can be easily predicted from inequality (21), meaning, when κ increases, more reconciliation data has to be sent; hence, fewer subcarriers can be used for data.In both Fig. 6a and b when β increases, the size of set D decreases; this effect is a consequence of constraint (28) as the data rate is decreasing with β.

Numerical results for the case of effective data rate
Inspired by the good performance of Algorithm 1, in the case where long-term average rate is the metric of interest, here, we continue our investigation with a variation of Algorithm 1, with the following differences: at lines 3 and 5 instead of (26), we use the constraint (25); the power allocation is fixed as in (49).The performance of our system is again compared with a sequential method, and the metric of interest here is the effective data rate.The comparison is performed by taking into account the following parameters: signal-to-noise ratio (SNR), number of subcarriers N, ratio of the reconciliation and 0−RTT transmission rate to the SKG rate κ, delay exponent θ, and the ratio of key bits to data bits β.
In Fig. 7, we give a three-dimensional plot showing the dependence of the achievable effective data rate E C,D (θ) on β and θ. Figure 7a and b compare the parallel heuristic approach and the sequential approach for high SNR levels, whereas Fig. 7c and d compare their performance for low SNR level.In Fig. 7a and c, we have N = 12 while in Fig. 7b and  d, the total number of subcarriers is N = 64.All graphs compare the performance of the heuristic parallel approach and the sequential approach for κ = 2.
As discussed in Section 6, when the delay exponent θ increases, the optimal power allocation transforms from waterfilling to total channel inversion.Consequently, the rate achieved on all subcarriers converges to the same value; hence, when we a have small number of subcarriers (such as N = 12) and small values of β, then using a single subcarrier for reconciliation data will use more capacity than needed and most of the rate on this subcarrier is wasted.Devoting a whole subcarrier for sending the reconciliation data for the case of N = 12 and β = 0.0001 is almost equivalent of losing 1/12 of the achievable rate.This can be seen for in Fig. 7a and c, where N = 12.When the SNR is high (See Fig. 7a), as discussed, this effect is mostly noticeable for large values of θ and small values of β 7 , whereas for small values of β and θ, both algorithms perform nearly identically.A similar trend can be seen at the low SNR regime in Fig. 7c.However, at a low SNR, the sequential approach has a lower effective data rate.This happens because at high SNR levels, each reconciliation frame will contain more information and hence more data frames will follow.Therefore, at the low SNR regime, the reconciliation information received will decrease; hence, less data can be sent afterwards.This does not affect the parallel approach.However, in both scenarios high SNR Fig. 7a and low SNR Fig. 7c, when β increases regardless of the value of θ, the parallel approach always achieves higher effective data rate E C,D (θ).
In the next case, when the total number of subcarriers is N = 64, illustrated in Fig. 7b  and d, we see that the penalty of devoting a high part of the achievable effective capacity E opt C (θ) to reconciliation disappears and the heuristic parallel approach always achieves higher or identical effective data rate E C,D (θ) compared to the sequential approach.This trend repeats for high and low SNR levels as given in Fig. 7b and d, respectively.Now, we take a closer look and transform some specific cases from the 3D plots to twodimensional graphs.In Fig. 8, we see the achieved effective data rate E C,D (θ) given in  (41), for different values of N and θ while the SNR =5 dB and κ = 2. Figure 8a gives the achieved effective rate on set D for N = 12 and θ = 0.0001 (relaxed delay constraint).Similarly to the case of long-term average value of C D , we see that for small values of β, the sequential approach achieves slightly higher effective data rate.As before, the increase of β results in more reconciliation frames M required in the sequential case.This effect is not seen in the parallel case and for high values of β it performs better.
Figure 8b illustrates the case when N = 12 and θ = 100 (very stringent delay constraint).Similarly, as in Fig. 7, we can see that for small values of β, the sequential approach performs better than the parallel.As discussed, the efficiency loss is caused by the fact that the devoted part of the total achievable effective capacity E opt C (θ) to reconciliation (syndrome communication) is more than what is required.However, a higher β leads to an increase in the reconciliation information that needs to be sent, and the rate of the subcarriers in set D will be fully or almost fully utilized and the parallel approach shows better performance for these values.
In the next two, Fig. 8c and d, we show the performance of the two algorithms for higher value of N = 64.It is easy to see that regardless of the value of θ and β, both algorithms perform identical or the parallel is better.In the previous case of N = 12, increasing θ might reduce the effectiveness of the parallel approach; however, when N = 64, increasing θ does not incur such a penalty and the parallel is either identical to the sequential or outperforms it.
Another interesting fact from Fig. 8 is that looking at the parallel approach, it can easily be seen that in all cases, the heuristic approach almost always performs as well as the optimal knapsack solution.The case of small values of θ is similar to the one when we work with long-term average rate and choosing the best subcarriers for data transmission works as well as the optimal Knapsack solution.Interestingly, Algorithm 1 works well for high values of θ, too.This can be explained by the fact that when θ increases, the rate on all of the subcarriers becomes similar, and switching the subcarriers in set D does not incur high penalty.

Conclusions
In this work, we discussed the possibility of using SKG in conjunction with PUF authentication protocols, illustrating this can greatly reduce the authentication and key generation latency compared to traditional mechanisms.Furthermore, we presented an AE scheme using SKG and a resumption protocol which further contribute to the system's security and latency reduction, respectively.
In addition, we explored the possibility of pipelining encrypted data transfer and SKG in a Rayleigh BF-AWGN environment.We investigated the maximization of the data transfer rate in parallel to performing SKG.We took into account imperfect CSI measurements and the effect of order statistics on the channel variance.Two scenarios were differentiated in our study: (i) the optimal data transfer rate was found under power and security constraints, represented by the system parameters β and κ, which represent the minimum ratio of SKG rate to data rate and the maximum ratio of SKG rate to reconciliation rate and (ii) by adding a delay constraint, represented by parameter θ, to the security and power constraint, we found the optimal effective data rate.
To finalize our study, we illustrated through numerical comparisons the efficiency of the proposed parallel method, in which SKG and data transfer are inter-weaved to a sequential method where the two operations are done separately.The results of the two scenarios showed that in most of the cases, the performance of both methods, parallel and sequential, is either equal or the parallel performs better.As the possible advantage of using the sequential is small and only applies in particular scenarios, we recommend the parallel scheme as a universal mechanism for general protocol design, when latency is an issue.Furthermore, a significant result is that although the optimal subcarrier scheduling is an NP hard 0 − 1 knapsack problem, it can be solved in linear time using a simple heuristic algorithm with virtually no loss in performance.

29 Fig. 3
Fig. 3 Pipelined SKG and encrypted data transfer between Alice and Bob the well known water-filling power allocation policy p j = 1 λ − 1 ĝj +

Fig. 6 a
Fig. 6 a Size of set D for different SNR levels and σ 2 e when N = 24.b Size of set D for different values of κ when N = 24

Fig. 7 a
Fig.7a Effective data rate achieved by the parallel heuristic approach and the sequential approach when N = 12, SNR= 10 dB, and κ = 2. b Effective data rate achieved by the parallel heuristic approach and the sequential approach when N = 64, SNR= 10 dB, and κ = 2. c Effective data rate achieved by the parallel heuristic approach and the sequential approach when N = 12, SNR= 0.2 dB, and κ = 2. d Effective data rate achieved by the parallel heuristic approach and the sequential approach when N = 64, SNR= 0.2 dB, and κ = 2