Pulse shaping design for OFDM systems

Spectrally contained OFDM-based waveforms are considered key enablers for a flexible air interface design to support a broad range of services and frequencies as envisaged for 5G mobile systems. By allowing for the flexible configuration of physical layer parameters in response to diverse requirements, these waveforms enable the in-band coexistence of different services. One candidate from this category of waveforms is pulse-shaped OFDM, which follows the idea of subcarrier filtering while fully maintaining the compatibility with CP-OFDM. In this paper, we provide an overview of pulse shaping methods in OFDM systems and propose a new pulse shape design method with arbitrary length constraint and good time-frequency localization property. Based on the pulse design, we discuss different receiver realizations and present a criterion for pulse shape evaluation. In addition, the parameterizations of OFDM systems that address the diverse requirements of the services envisaged for 5G systems are described. Link and system performance results for selected scenarios show that a proper design of the OFDM numerologies and pulse shapes can substantially improve the performance under time and frequency distortions. Furthermore, pulse-shaped OFDM is able to support asynchronous transmissions and reduce the signal sensitivity to Doppler distortions, rendering it beneficial for various applications in the context of vehicular communications and the Internet-of-things.


Introduction
The next generation of mobile systems, the fifth generation (5G), is envisaged to accommodate a large variety of new scenarios and use cases, which impose diverse requirements on the system. More specifically, the three main services, enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine type communication (mMTC), impose different requirements on the 5G air interface [1,2], yielding new technical challenges.
As one of the key components, waveform design is considered a fundamental building block for enabling a flexible air interface design. Recently, 3GPP has conducted comprehensive discussions on waveform design for 5G new radio (NR). According to the latest agreements [3,4], orthogonal frequency-division multiplexing (OFDM)- and discrete Fourier transform spread OFDM (DFTs-OFDM)-based waveforms, including filtering and windowing for spectral containment, are the most promising candidates for the 5G eMBB service. In order to provide flexibility on the physical layer, recent research has focused on enhancements of the OFDM waveform with respect to its supported numerologies and on additional filtering components; for an overview, refer to [5][6][7]. Generally speaking, the new waveform proposals fall into two main categories: subcarrier-wise filtering, comprising filter bank multicarrier (FBMC) [6], windowed OFDM [8], and pulse-shaped OFDM (P-OFDM) [9], among others, and sub-band-wise filtering, comprising universal filtered (UF)-OFDM [10] and filtered OFDM [11], among others. FBMC in particular has received a lot of attention in research during the past years [12], thanks to its favorable properties of not requiring a cyclic prefix (CP) and attaining very steep filter slopes, which facilitate an excellent isolation of the signal power in the frequency domain. However, these favorable properties are "bought" by a relaxed orthogonality (in fact, strict orthogonality holds for the real-valued signal field only), which requires a redesign of several algorithms developed for conventional OFDM systems. For this reason, it was hard for FBMC to become commonly accepted as a mature candidate for 5G, which set off the research on CP-OFDM-compatible waveforms with filtering, aiming to maintain as many of the favorable properties of FBMC as possible [5]. Table 1 reviews the transmit waveforms specified by existing mobile system standards, spanning 2G to 5G communication systems.
As flexibility and forward compatibility are considered vital properties for the future 5G air interface design, we propose here the flexible pulse-shaped OFDM waveform with configurable numerology sets and pulse shapes. It is compatible with current state-of-the-art OFDM-based communication systems and multi-antenna technologies, while the option for pulse shape design enables radio coexistence and improved robustness to time-frequency distortions.
Pulse-shaped OFDM is closely related to windowed OFDM and filtered multitone (FMT) [6]. It exploits the pulse shape as an additional degree of freedom for multicarrier modulation systems. The principle of OFDM with pulse shaping was introduced in [13]. One of the main criteria for pulse shape design is the time-frequency localization (TFL), which has been identified in [14] as an important target to achieve low out-of-band emissions and low interference in doubly dispersive channels. Furthermore, [15][16][17] investigated the pulse shape optimization framework considering realistic channel knowledge. However, the resulting pulse shapes usually span several successive symbols, rendering them not well suited for selected scenarios such as short-block low-latency transmission or fast time division duplex (TDD) uplink-downlink switching. To address this practical issue, [18,19] proposed analytic solutions for the design of short pulse shapes for specific settings.
This paper is dedicated to the design and application of pulse-shaped OFDM in future mobile radio systems. We show that through a proper pulse shape design, it is possible to substantially improve the robustness to time-frequency distortions, to provide better spectral containment, and thus to enable flexible physical layer (PHY) configurations for selected sub-bands within a given system bandwidth, which can be tailored to particular service-specific requirements. Specifically, the contributions of this paper are as follows: a comprehensive overview of pulse shaping methods in OFDM systems is provided, followed by a new pulse shape design method with arbitrary length constraint, which maintains orthogonality while providing good time-frequency localization. Based on the designed pulse, we also discuss different receiver realizations and provide a criterion for evaluating pulse shapes. In addition, we describe suitable parameterizations for the pulse shape design to address the requirements of the diverse services envisaged for the 5G system. Finally, the implementation complexity of pulse-shaped OFDM systems is analyzed.
The paper is organized as follows: Section 2 introduces the system model with pulse-shaped OFDM and state-of-the-art OFDM systems. Section 3 gives the principles for OFDM pulse shape design and some practical methods. Section 4 evaluates the pulse shape examples designed in Section 3. Section 5 discusses the parameterization of pulse-shaped OFDM for the new services and challenges envisaged in future mobile systems, and Section 6 addresses the practical implementation and system impacts. Some application examples are illustrated in Section 7. Finally, Section 8 draws the conclusions.

OFDM system and pulse shaping
In this section, a generic OFDM system model with pulse shaping is introduced. The state-of-the-art OFDM systems (including CP-OFDM, windowed OFDM, and time-frequency-localized OFDM [14]) can each be considered a pulse-shaped OFDM system. Their design methodology and the system impact on the OFDM numerology are briefly discussed.

System model
The transmit signal $s(t)$ of an OFDM-based multicarrier system can be generally represented as follows [13,15]:

$$s(t) = \sum_{n=-\infty}^{\infty} \sum_{m=1}^{M_A} a_{m,n}\, g_{m,n}(t), \tag{1}$$

where $a_{m,n}$ is the information-bearing symbol on the $m$th subcarrier of the $n$th symbol and $M_A$ is the number of active subcarriers. The transmit filter bank $g_{m,n}(t)$ is a time-frequency shifted version of the transmit pulse shape (also known as prototype filter) $g(t)$, i.e.,

$$g_{m,n}(t) = g(t - nT)\, e^{j 2\pi m F (t - nT)}, \tag{2}$$

with symbol period $T$ and subcarrier spacing $F$. Note that subband-based filtering can be used on top of $s(t)$ with a band-pass filter, in order to further suppress the out-of-band (OOB) leakage. At the receiver side, the demodulated symbol $\tilde{a}_{m,n}$ is obtained by correlating the received signal $r(t)$ with the receive filter $\gamma_{m,n}(t)$:

$$\tilde{a}_{m,n} = \langle r, \gamma_{m,n} \rangle = \int_{-\infty}^{\infty} r(t)\, \gamma^{*}_{m,n}(t)\, dt, \tag{3}$$

where $(\cdot)^{*}$ denotes the complex conjugate operation and $\gamma_{m,n}(t)$ is a time-frequency shifted version of the receive pulse $\gamma(t)$:

$$\gamma_{m,n}(t) = \gamma(t - nT)\, e^{j 2\pi m F (t - nT)}. \tag{4}$$

In short, a generic OFDM-based system with pulse shaping can be presented by the following steps: the transmit signal is first synthesized using (1), passed through the propagation channel, and then analyzed at the receiver through (3).
If the pulses employed at the transmitter and the receiver are the same, i.e., $g(t) = \gamma(t)$, the approach is matched filtering [7]. Alternatively, different pulses can be used at the transmitter and the receiver, i.e., $g(t) \neq \gamma(t)$, yielding mis-matched filtering. Generally, matched filtering aims at maximizing the signal-to-noise ratio (SNR) in the additive white Gaussian noise (AWGN) channel, while mis-matched filtering allows for balancing the inter-symbol interference (ISI) and inter-carrier interference (ICI) experienced in doubly dispersive channels against the effect of noise enhancement.
Different from conventional CP-OFDM where the pulse shape is fixed to the rectangular pulse, pulse-shaped OFDM follows the idea of fully maintaining the signal structure of CP-OFDM but allowing for the use of flexible pulse shapes to balance the localization of the signal power in time and frequency domain. The prototype filter pair g(t) and γ (t), together with the numerology parameters T and F, are the central design parameters for pulse-shaped OFDM system.
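As a minimal sketch of the synthesis and analysis steps in (1) and (3), the following Python code samples the signal at rate 1/T_s and uses plain loops rather than an efficient filter bank. The function names, the toy numerology (M = 8, N = 10) and the rectangular pulse pair are assumptions made for illustration only; the receive pulse simply skips the CP, which corresponds to a mis-matched pair in the sense described above.

```python
import numpy as np

def synthesize(a, g, N, M):
    """Sampled transmit signal: sum over n, m of a[m, n] * g_{m,n}[l]."""
    M_A, n_sym = a.shape
    k = np.arange(len(g))
    s = np.zeros((n_sym - 1) * N + len(g), dtype=complex)
    for n in range(n_sym):
        for m in range(M_A):
            # g_{m,n}[l] = g[l - nN] * exp(j*2*pi*m*(l - nN)/M)
            s[n * N:n * N + len(g)] += a[m, n] * g * np.exp(2j * np.pi * m * k / M)
    return s

def analyze(r, gamma, N, M, M_A, n_sym):
    """Demodulated symbols: correlate r with the shifted receive pulses."""
    k = np.arange(len(gamma))
    a_hat = np.zeros((M_A, n_sym), dtype=complex)
    for n in range(n_sym):
        seg = r[n * N:n * N + len(gamma)]
        for m in range(M_A):
            gamma_mn = gamma * np.exp(2j * np.pi * m * k / M)
            a_hat[m, n] = np.sum(seg * np.conj(gamma_mn))
    return a_hat

# Hypothetical toy numerology: FFT size M = 8, CP of 2 samples, N = 10 (TF = 1.25).
M, cp = 8, 2
N = M + cp
g = np.ones(N) / np.sqrt(M)                                      # rectangular transmit pulse
gamma = np.concatenate([np.zeros(cp), np.ones(M)]) / np.sqrt(M)  # receive pulse skips the CP

rng = np.random.default_rng(0)
a = (rng.choice([-1, 1], (6, 4)) + 1j * rng.choice([-1, 1], (6, 4))) / np.sqrt(2)
a_hat = analyze(synthesize(a, g, N, M), gamma, N, M, *a.shape)   # ideal channel: r = s
```

Over an ideal channel, this pulse pair returns the transmitted QPSK symbols, since the receive pulse covers exactly one period of every subcarrier.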
A useful representation of numerology design is a lattice that contains the coordinates in the time-frequency plane. Assume the symbol period is $T = N T_s$ and the subcarrier spacing is set to $F = 1/(M T_s)$, where $T_s$ is the sampling period and $M, N \in \mathbb{N}$ denote the fast Fourier transform (FFT) size and the number of samples constituting one symbol period, respectively. Figure 1 depicts the rectangular lattice representation for OFDM. The metric $1/(TF)$ can be considered as the data symbol density in the rectangular sampling lattice, and it is proportional to the spectral efficiency.
In this paper, we choose the numerology $T$ and $F$ such that $TF = N/M > 1$ holds [15]. Under this condition, orthogonality can be guaranteed for the signal space, yielding full compatibility with the current techniques developed for OFDM.
Pulse-shaped OFDM allows the pulse shape to extend beyond the symbol period, so that successively transmitted symbols overlap or partially overlap. The overlap is characterized by the overlapping factor $K$, which is defined as the ratio of the filter length $L_g$ and the symbol period, i.e., $K = L_g/T$. The factor $K$ can be set to any rational number in pulse-shaped OFDM.
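The lattice quantities above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the sample values (FFT size 2048 at a 30.72 MHz sampling rate with 144 or 512 CP samples) are assumed LTE-like settings chosen because they reproduce the products TF ≈ 1.07 and TF = 1.25.

```python
from fractions import Fraction

def lattice_numerology(M, N, fs, Lg=None):
    """Return T, F, TF, symbol density 1/(TF) and overlapping factor K.

    M: FFT size, N: samples per symbol period, fs: sampling rate 1/T_s,
    Lg: pulse length in samples (defaults to one symbol period, K = 1).
    """
    Ts = 1.0 / fs
    T = N * Ts                     # symbol period
    F = 1.0 / (M * Ts)             # subcarrier spacing
    TF = Fraction(N, M)            # exact lattice product N/M
    K = Fraction(Lg if Lg is not None else N, N)
    return {"T": T, "F": F, "TF": TF, "density": 1 / TF, "K": K}

# Assumed LTE-like examples (normal and extended CP overhead).
normal = lattice_numerology(M=2048, N=2048 + 144, fs=30.72e6)
extended = lattice_numerology(M=2048, N=2048 + 512, fs=30.72e6)
print(float(normal["TF"]), float(extended["TF"]))
```

Both settings satisfy TF > 1, and the density 1/(TF) directly shows the spectral efficiency given up for the CP or the pulse tails.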

Transceiver of pulse-shaped OFDM
As a typical uniform filter bank system, the overall transceiver structure of the pulse-shaped OFDM system is given in Fig. 2. The pulse shaping can be efficiently realized by a polyphase network (PPN) [20] for arbitrary overlapping factor K. For short pulse shapes where K ≈ 1, the PPN structure can be simplified to the "CP addition"/"CP removing"/"zero-padding" and "windowing" operations, etc. For K > 1, PPN implementation can be considered as a realization of an "overlap-add" procedure.
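The "overlap-add" view of the transmitter can be sketched as follows. This is not the polyphase network of [20] itself but a functionally equivalent toy version under stated assumptions: one IFFT per symbol, periodic extension of the IFFT output to the pulse length, multiplication by the pulse, and overlap-add at multiples of N. The function name and parameters are hypothetical.

```python
import numpy as np

def pulse_shaped_tx(a, g, N, M):
    """IFFT + pulse weighting + overlap-add transmitter (illustrative sketch).

    a: (M_A, n_sym) symbols on the first M_A subcarriers,
    g: transmit pulse of arbitrary length L_g (K = L_g / N may exceed 1).
    """
    M_A, n_sym = a.shape
    Lg = len(g)
    s = np.zeros((n_sym - 1) * N + Lg, dtype=complex)
    idx = np.arange(Lg) % M                 # periodic extension of the IFFT output
    for n in range(n_sym):
        bins = np.zeros(M, dtype=complex)
        bins[:M_A] = a[:, n]
        x = M * np.fft.ifft(bins)           # x[k] = sum_m a[m, n] * exp(j*2*pi*m*k/M)
        s[n * N:n * N + Lg] += g * x[idx]   # weight by the pulse and overlap-add
    return s
```

For K ≈ 1 the periodic extension reduces to adding a CP and the pulse weighting to a windowing step, which matches the simplification described above; for K > 1 consecutive symbols genuinely overlap in the output buffer.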

State-of-the-art OFDM systems and numerology design
Numerology design for multicarrier systems, including the determination of symbol period T and subcarrier spacing F, is an essential part in the system design. Its design needs a comprehensive consideration of many aspects, such as spectrum efficiency or propagation channel characteristics. In this section, we will briefly introduce the numerology design of the state-of-the-art OFDM-based systems; a detailed overview on the waveform candidates under discussion for 5G is provided in Appendix 1. All those waveform candidates can be considered as special cases in the pulse shaped OFDM framework.

CP-OFDM
The derivation of the OFDM numerology w.r.t. $(T, F)$ can be carried out by the following steps:
• Set the CP length $T_{cp}$ according to the channel characteristics, i.e., at least longer than the maximum channel excess delay $\tau_{\max}$.
• Determine the minimal subcarrier spacing $F$ such that the signal-to-interference ratio (SIR) for the maximum Doppler frequency ($\nu_{\max}$) is above the minimum SIR requirement ($\mathrm{SIR}_{\min}$) for supporting the highest modulation requirement in the system.
• Determine the approximate values of $T$ and $F$ based on the above two steps.
• Quantize $T$ and $F$ according to the sampling rate and sub-frame numerology.
The above steps are based on the premise that CP-OFDM can support reliable transmission without ISI and ICI if the maximum excess delay of the channel is smaller than the CP length. It is a pragmatic approach since the robustness of CP-OFDM is pronounced in the time domain rather than in the frequency domain.
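The four steps can be sketched numerically as below. Two ingredients are assumptions that do not come from the text: the SIR-versus-Doppler relation uses a commonly quoted small-Doppler approximation, SIR ≈ 12/(2πν_max/F)², and the quantization step simply picks the largest power-of-two FFT size that keeps F above its minimum. The function name and example values are hypothetical.

```python
import math

def cp_ofdm_numerology(tau_max, nu_max, sir_min_db, fs):
    """Pragmatic CP-OFDM numerology sketch (all modelling choices are assumptions)."""
    # Step 1: CP at least as long as the maximum excess delay.
    n_cp = math.ceil(tau_max * fs)
    # Step 2: minimal subcarrier spacing for the SIR target.
    # ASSUMPTION: SIR ~ 12 / (2*pi*nu_max/F)^2, a small-Doppler approximation.
    sir_min = 10 ** (sir_min_db / 10)
    f_min = 2 * math.pi * nu_max * math.sqrt(sir_min / 12)
    # Steps 3-4: quantize to the sampling grid.
    # ASSUMPTION: FFT size is the largest power of two with fs/M >= f_min.
    M = 2 ** math.floor(math.log2(fs / f_min))
    N = M + n_cp
    return {"M": M, "N": N, "n_cp": n_cp, "F": fs / M, "T": N / fs, "f_min": f_min}

num = cp_ofdm_numerology(tau_max=4.69e-6, nu_max=500.0, sir_min_db=30.0, fs=30.72e6)
print(num["M"], num["n_cp"], num["F"])
```

With these example inputs the sketch selects a 30 kHz subcarrier spacing; a different SIR model or quantization rule would of course shift the result.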

W-OFDM
Windowed OFDM (W-OFDM) was originally introduced as an enhancement to CP-OFDM for reducing the OOB emission. Recently, 3GPP RAN1 agreed that windowing is one of the favored approaches for achieving spectral confinement.
Essentially, W-OFDM is a pulse-shaped OFDM system in which a window with smoothed edges is used instead of a rectangular one (as used in CP-OFDM) to effectively reduce the side lobes. Some overlap of the window tails of succeeding symbols is allowed. From a link performance perspective, W-OFDM thus trades off robustness in the time domain (due to the relaxation of the CP) for improved robustness in the frequency domain. We will show later that for the typical operational range of mobile systems, a properly designed W-OFDM system can outperform its CP-OFDM counterpart and better fulfill time-frequency (TF) localization requirements.

TF-localized OFDM
Given the same spectral efficiency, it has been shown that the link level performance can be improved over conventional CP-OFDM and its pragmatic $(T, F)$ numerology design [14,15]. One solution is TF-localized OFDM, which aims at minimizing the distortion resulting from time-frequency dispersive channels [14]. The numerology design of this waveform comprises the following steps:
• Determine the ratio of $T$ and $F$: In order to reduce ISI and ICI, the numerology $T$ and $F$ of TF-localized OFDM should be chosen in correspondence to the characteristic parameters of the doubly dispersive channel. Specifically, for the given maximal time delay $\tau_{\max}$ and maximal Doppler spread $\nu_{\max}$, the choice of $T$ and $F$ should satisfy [14]

$$\frac{T}{F} = \frac{\tau_{\max}}{\nu_{\max}}.$$

• Determine the product $TF$: For a fair comparison with CP-OFDM, the product $TF$ should be set to the same value as in CP-OFDM, reflecting the relative CP overhead or spectral efficiency loss. Combined with the ratio of $T$ and $F$ specified above, the parameters $T$ and $F$ can easily be obtained. Otherwise, if $TF$ is not specified, the numerology needs to be determined as follows: first, initialize $TF$ with some pre-defined number; then, calculate the SIR using TF-localized pulses for the maximum excess delay and Doppler frequency; finally, adapt the numerology $T$ and $F$ to guarantee that the resulting SIR can support the transmission using the highest modulation format.
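As a small worked example of these two steps, assume the ratio condition from [14] reads T/F = τ_max/ν_max and the product TF is prescribed. Then T and F follow in closed form as T = sqrt(TF · τ_max/ν_max) and F = sqrt(TF · ν_max/τ_max). The function name and the channel values below are hypothetical.

```python
import math

def tf_localized_numerology(tau_max, nu_max, tf_product):
    """Solve T/F = tau_max/nu_max and T*F = tf_product for (T, F)."""
    ratio = tau_max / nu_max
    T = math.sqrt(tf_product * ratio)   # symbol period in seconds
    F = math.sqrt(tf_product / ratio)   # subcarrier spacing in Hz
    return T, F

# Hypothetical channel: 5 us maximum delay, 500 Hz maximum Doppler, TF = 1.25.
T, F = tf_localized_numerology(tau_max=5e-6, nu_max=500.0, tf_product=1.25)
print(f"T = {T * 1e6:.1f} us, F = {F / 1e3:.2f} kHz")
```

In practice, the resulting pair would still have to be quantized to the sampling grid, as in the CP-OFDM case.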

OFDM pulse shape design and proposed methods
Future mobile communication systems are envisioned to support the coexistence of multiple services with diverse requirements. The PHY setting, including the waveform configuration, is thus anticipated to be adapted to the requirements of each service. For example, URLLC requires low latency, for which a comparatively short pulse is favorable. The narrowband internet-of-things (NB-IoT) service targets good coverage extension and allows for a long pulse design. Machine-type communication (MTC) with mobility may require the pulse design to be robust to asynchronicity and Doppler spread. In this section, we discuss several pulse shape design approaches and outline their features and applications.

Pulse shape categorization
In OFDM systems with pulse shaping, the ISI and ICI are determined by the transmit pulse $g(t)$ and the receive pulse $\gamma(t)$. In this paper, we use the pulse shape categorization according to the correlation property [7]:
• Orthogonal pulse design is the pulse shaping scheme where the perfect reconstruction condition is fulfilled (details given in Section 3.2.1) and matched filtering is employed.
• Bi-orthogonal pulse design is the pulse shaping scheme where the perfect reconstruction condition is fulfilled and mis-matched filtering is employed.
• Non-orthogonal pulse design is the pulse shaping scheme where the perfect reconstruction condition is not fulfilled.

Design criteria
Depending on the specific criteria for pulse-shaped OFDM systems, pulse shapes are constructed to satisfy diverse requirements. Herein, we discuss several commonly applied conditions that the pulse shape design needs to fulfill.

Length constraint
Length constraint is the primary design criterion for OFDM pulse shapes. Since many use cases in the eMBB and URLLC context require short processing latency and stringent timing for framed transmission, short pulse lengths comparable to one OFDM symbol duration (i.e., $K \approx 1$) are favorable here. In other scenarios such as mMTC or NB-IoT, though, latency constraints may be relaxed, such that pulse lengths of several symbols ($K \geq 2$) can be allowed if they provide clear benefits addressing the needs of the corresponding service.

Near-perfect reconstruction condition
Assuming an ideal channel where $r(t) = s(t)$, the perfect reconstruction (PR) condition holds if $\langle g_{m',n'}, \gamma_{m,n} \rangle = \delta_{mm'}\,\delta_{nn'}$, where $g_{m,n}$ and $\gamma_{m,n}$ follow the definitions in (2) and (4), respectively. Since a certain level of self-interference can be tolerated for the reliable transmission of modulated signals in practice, we redefine the orthogonal and bi-orthogonal conditions in this paper as a near-perfect reconstruction condition by slightly relaxing the conventional PR condition to allow for minor cross-correlation, i.e.,

$$\langle g_{m',n'}, \gamma_{m,n} \rangle \approx \delta_{mm'}\,\delta_{nn'}, \tag{7}$$

where the tolerated residual cross-correlation is determined according to the error vector magnitude (EVM) and signal-to-interference-plus-noise ratio (SINR) requirements, as detailed in Appendix 2. Condition (7) is also called the bi-orthogonality condition when $g(t) \neq \gamma(t)$, since it is a prerequisite in bi-orthogonal frequency-division multiplexing (BFDM) systems for reconstructing $\tilde{a}_{m,n}$ from $r(t)$ [6]. If matched filtering is employed, i.e., $g(t) = \gamma(t)$, (7) reduces to the orthogonality condition.
In contrast to the orthogonal transceiver pulse, for which the SNR in the AWGN channel is maximized, BFDM has the potential to further reduce ISI and ICI in dispersive channels at the cost of noise enhancement. It has been shown in [14,15] that a necessary condition to achieve perfect reconstruction (either orthogonal or bi-orthogonal) is $TF \geq 1$. Larger values of $TF$ lead to a larger spectral efficiency loss but provide more degrees of freedom for the orthogonal pulse design.

Time-frequency localization
The ISI and ICI can be reduced if the pulse shapes at the transmitter and receiver are jointly TF localized. The classical way to measure the time-frequency localization (TFL) of a filter involves the Heisenberg uncertainty parameter [7,14]. Filters with good TFL properties have a Heisenberg parameter $\xi$ closer to 1. Assuming the center of gravity of $g(t)$ is at $(0, 0)$, the "width" of $g$ in the time and frequency domain is often measured using the second-order moments defined as

$$\sigma_t = \left( \int_{\mathbb{R}} t^2\, |g(t)|^2\, dt \right)^{1/2}, \qquad \sigma_f = \left( \int_{\mathbb{R}} f^2\, |G(f)|^2\, df \right)^{1/2},$$

where $G(f)$ is the Fourier transform of $g(t)$. Then, the Heisenberg uncertainty parameter $\xi$ is given by

$$\xi = \frac{\|g\|^2}{4\pi\, \sigma_t\, \sigma_f} \leq 1,$$

where equality holds if and only if $g(t)$ is a Gaussian function, rendering such a filter optimal in terms of TFL [14]. Note that the joint TFL of the transceiver pulses considering channel dispersion is related to the TF concentration properties of both the transmit and the receive pulse. The work in [15] has asserted that excellent TFL characteristics can be simultaneously achieved by pulse pairs.
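The Heisenberg parameter can be estimated numerically from samples of a pulse. The sketch below assumes the common definition ξ = ‖g‖² / (4π σ_t σ_f), with σ_t and σ_f the square roots of the second-order moments of |g(t)|² and |G(f)|², and approximates the integrals by Riemann sums and an FFT. The function name, grid and the raised-cosine (Hann) comparison pulse are illustrative choices.

```python
import numpy as np

def heisenberg_parameter(g, dt):
    """Estimate xi = ||g||^2 / (4*pi*sigma_t*sigma_f) from samples of g(t).

    Assumes the pulse is centered on the sampling grid (center of gravity at 0).
    """
    n = len(g)
    t = (np.arange(n) - (n - 1) / 2) * dt
    f = np.fft.fftfreq(n, d=dt)               # frequencies in FFT order
    df = 1.0 / (n * dt)
    G = np.fft.fft(g) * dt                    # only |G(f)| is used below
    energy = np.sum(np.abs(g) ** 2) * dt
    sigma_t = np.sqrt(np.sum(t ** 2 * np.abs(g) ** 2) * dt)
    sigma_f = np.sqrt(np.sum(f ** 2 * np.abs(G) ** 2) * df)
    return energy / (4 * np.pi * sigma_t * sigma_f)

dt, n, alpha = 1 / 64, 2048, 1.0
t = (np.arange(n) - (n - 1) / 2) * dt
gauss = (2 * alpha) ** 0.25 * np.exp(-np.pi * alpha * t ** 2)
hann = np.where(np.abs(t) <= 2.0, np.cos(np.pi * t / 4.0) ** 2, 0.0)  # support of 4 time units

xi_gauss = heisenberg_parameter(gauss, dt)
xi_hann = heisenberg_parameter(hann, dt)
```

The Gaussian reaches the bound ξ = 1 up to numerical precision, while the time-limited raised-cosine pulse stays slightly below it. A rectangular pulse would score far worse, because its spectrum decays too slowly for σ_f to remain small.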

SIR/SINR optimization
In wireless communication systems, the essential goal is to transmit signals reliably in practical channels. Hence, the above criteria can be slightly relaxed to increase the design degree of freedom, as long as the link performance with pulse shaping is optimized relative to certain dispersive channels of interest.
One common criterion is SIR or SINR optimization, namely, the transceiver pulses are chosen to optimize the SIR/SINR under certain dispersive channels. The resulting pulse shapes may not exactly satisfy perfect reconstruction condition, but offer better ISI/ICI robustness in dispersive channels compared to orthogonal or biorthogonal design. In the following, we detail this optimization problem for both continuous and discrete channel models.
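Continuous model: Assuming a doubly dispersive fading channel satisfying the wide-sense stationary uncorrelated scattering (WSSUS) property, its scattering function (channel statistics) can be described as [15]

$$C_H(\tau, \nu) = \int_{\mathbb{R}} \mathrm{E}\{ h(t, \tau)\, h^{*}(t - \Delta t, \tau) \}\, e^{-j 2\pi \nu \Delta t}\, d\Delta t,$$

where $h(t, \tau)$ is the time-varying impulse response at time instance $t$ and delay $\tau$, $\nu$ is the Doppler frequency, and $H$ indicates the random linear time-varying channel. The SINR involving pulse shaping is represented by

$$\mathrm{SINR} = \frac{\iint C_H(\tau, \nu)\, |A_{g,\gamma}(\tau, \nu)|^2\, d\tau\, d\nu}{\sigma_n^2 + \sum_{(m,n) \neq (0,0)} \iint C_H(\tau, \nu)\, |A_{g,\gamma}(\tau + nT, \nu + mF)|^2\, d\tau\, d\nu}, \tag{10}$$

where $A_{g,\gamma}(\tau, \nu)$ is the cross ambiguity function of the transceiver pulse pair $g(t)$, $\gamma(t)$ [15] and $\sigma_n^2$ is the noise variance. The energy of the transmit symbols is assumed to be normalized to one. If the noise variance $\sigma_n^2$ is omitted in (10), it reduces to the SIR metric.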
Discrete model: Based on the sampling period $T_s$, assume the discrete dispersive channel to have $P$ paths, where the $p$th path is characterized by the path delay $\tau_p$, the Doppler frequency shift $\nu_p$, and the complex channel gain $\eta_p(t)$. Let the $P$ channel gains be stacked into one vector $\boldsymbol{\eta}(t) = [\eta_1(t), \ldots, \eta_P(t)]^T$. Under the assumption of the WSSUS property applied to the discrete model, the channel correlation function $\mathbf{R}_H = \mathrm{E}\{\boldsymbol{\eta}(t)\, \boldsymbol{\eta}(t)^H\}$ is a diagonal matrix whose diagonal elements indicate the power of the individual path gains, with $(\cdot)^H$ denoting the Hermitian operation. Assuming the transmit and receive filters are discretized as well and their power is normalized to one, both filters can be represented by the vectors $\mathbf{g} \in \mathbb{R}^{L_g \times 1}$ and $\boldsymbol{\gamma} \in \mathbb{R}^{L_\gamma \times 1}$ containing the discrete filter coefficients, where $L_g$ and $L_\gamma$ denote the lengths of the transmit and receive filters, respectively. Using the above discrete expressions, the discrete model of the SINR is given by

$$\mathrm{SINR}^{D}_{g,\gamma} = \frac{\boldsymbol{\gamma}^H \mathbf{G}_{0,0}\, \mathbf{R}_H\, \mathbf{G}_{0,0}^H\, \boldsymbol{\gamma}}{\sigma_n^2 + \sum_{(m,n) \neq (0,0)} \boldsymbol{\gamma}^H \mathbf{G}_{m,n}\, \mathbf{R}_H\, \mathbf{G}_{m,n}^H\, \boldsymbol{\gamma}}, \tag{11}$$

where the energy of the transmit symbols is assumed to be normalized to one. $\mathbf{G}_{m,n}$ is a matrix constructed from the filter vector $\mathbf{g}_{m,n}$, which is created analogously to its continuous counterpart (2) and represents the filter used for subcarrier position $m$ and symbol position $n$. Each column of $\mathbf{G}_{m,n}$ represents the filter vector $\mathbf{g}_{m,n}$ transmitted through one of the $P$ channel taps, i.e., the vector in the $p$th column is shifted by the corresponding path delay $\tau_p$ and modulated by the Doppler frequency $\nu_p$. For symbols preceding or succeeding the symbol of interest, i.e., for $n \neq 0$, an additional time shift of $n$ times the symbol duration $N$ has to be considered. If the noise variance $\sigma_n^2$ is dropped, (11) reduces to the SIR metric.
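The discrete model can be illustrated with a short sketch that builds the matrices G_{m,n} column by column and evaluates the ratio of useful power to noise plus interference. It assumes the quadratic-form reading of (11), i.e., γ^H G_{0,0} R_H G_{0,0}^H γ in the numerator and the same form summed over all (m, n) ≠ (0, 0) in the denominator. Function names, the common buffer used to align transmit and receive pulses, and the toy CP-OFDM example are assumptions for illustration.

```python
import numpy as np

def filter_matrix(g, m, n, M, N, delays, dopplers, L_buf, origin):
    """G_{m,n}: column p is g_{m,n} delayed by delays[p] samples and
    modulated by dopplers[p] (in cycles per sample)."""
    k = np.arange(len(g))
    l = np.arange(L_buf)
    cols = []
    for tau, nu in zip(delays, dopplers):
        x = np.zeros(L_buf, dtype=complex)
        start = origin + n * N + tau
        x[start:start + len(g)] = g * np.exp(2j * np.pi * m * k / M)
        cols.append(x * np.exp(2j * np.pi * nu * l))
    return np.stack(cols, axis=1)

def discrete_sinr(g, gamma, M, N, delays, dopplers, path_power, noise_var, n_span=1):
    """SINR for symbol (0, 0); neighbours n = -n_span..n_span, m = 0..M-1."""
    origin = n_span * N
    L_buf = 2 * n_span * N + max(len(g), len(gamma)) + max(delays)
    gam = np.zeros(L_buf, dtype=complex)
    gam[origin:origin + len(gamma)] = gamma
    R_H = np.diag(path_power)
    signal, interference = 0.0, 0.0
    for n in range(-n_span, n_span + 1):
        for m in range(M):
            G = filter_matrix(g, m, n, M, N, delays, dopplers, L_buf, origin)
            q = G.conj().T @ gam                   # q = G^H gamma
            power = np.real(q.conj() @ R_H @ q)    # gamma^H G R_H G^H gamma
            if (m, n) == (0, 0):
                signal = power
            else:
                interference += power
    return signal / (noise_var * np.real(np.vdot(gam, gam)) + interference)

# Toy CP-OFDM check: M = 8, CP = 2 samples, two equal-power static paths.
M, cp = 8, 2
N = M + cp
g = np.ones(N) / np.sqrt(N)
gamma = np.concatenate([np.zeros(cp), np.ones(M)]) / np.sqrt(M)
sinr_within_cp = discrete_sinr(g, gamma, M, N, [0, 2], [0.0, 0.0], [0.5, 0.5], 0.01)
sinr_beyond_cp = discrete_sinr(g, gamma, M, N, [0, 4], [0.0, 0.0], [0.5, 0.5], 0.01)
```

As long as the second path stays within the CP, the interference terms vanish and the SINR is limited by noise alone; once the delay exceeds the CP, both ISI and ICI appear and the SINR drops.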

Design methods
Taking the abovementioned design criteria into consideration, the ultimate goal of pulse design is to have short pulses with maximal spectral efficiency, optimal time-frequency localization, minimized interference, and best SINR performance for arbitrary channels. Nevertheless, not all the requirements can be fulfilled simultaneously in reality, either due to contradictory conditions or practical constraints. Alternatively, in this section, we propose two approaches to design the transceiver pulse shapes for practical pulse-shaped OFDM systems, where both can respect an arbitrarily given length constraint.
1. Orthogonal design without channel statistics: For the case that the system has no reliable knowledge of the channel statistics for pulse optimization, we seek to apply an (almost) orthogonal transceiver pulse pair with good time-frequency localization.
2. Bi-orthogonal design with channel statistics: For the case that the system has reliable knowledge of the channel statistics (e.g., the scattering function) for pulse optimization, the transceiver pulses are designed to achieve optimal link level performance (w.r.t. SIR/SINR) given such channel knowledge.

Orthogonal pulse design without channel statistics
As introduced before, the orthogonal pulse design employs matched filtering at the transceiver in order to achieve the maximum SNR for AWGN channels. In the absence of channel statistics, we suggest to use such orthogonal pulse design with good TFL characteristic. The TFL property is of vital importance in the pulse design since it affects the vulnerability to ISI/ICI in doubly dispersive channels. Note that TF > 1 is assumed here to have sufficient degrees of freedom for the pulse design. In the following, a universal approach for producing orthogonal pulses with constrained length as well as good TFL will be proposed. Before detailing the proposed method, we first review the orthogonal pulse generation in the literature, which provides a basis for our proposal. The classical approach in [14,15] consists of the following steps.
• Select an initial well-localized pulse, e.g., a Gaussian pulse with a decaying factor α.
• Construct an orthogonal system based on $g^{(\alpha)}_{\mathrm{gauss}}$: Orthogonalization can be carried out according to [14], or an efficient numerical solution for the orthogonalization can be obtained by matrix factorization methods [21,22]. It is proven in [14] that by appropriately dilating or shrinking $g^{(\alpha)}_{\mathrm{gauss}}$, i.e., adjusting $\alpha$, one can easily generate the optimal TFL pulses to match different channel dispersion properties.
The resulting orthogonal pulse $g^{(\alpha)}_{\perp}$ is usually unconstrained in its temporal length, resulting in a large overlapping factor. As elaborated above, this is not desired in many use cases from the eMBB and URLLC context, where a time-constrained short pulse is preferred. In order to generate such a pulse from $g^{(\alpha)}_{\perp}$ given the desired filter duration $D_{\mathrm{req}} = KT$ with $K \geq 1$, the simplest approach is to directly perform soft or hard truncation on $g^{(\alpha)}_{\perp}$. However, this approach leads to non-orthogonality and degrades the TFL properties.
For generating orthogonal prototype filters with a fixed length close to the symbol duration (i.e., $K \sim 1$), Pinchon et al. have derived two explicit expressions to compute the filter coefficients for two different optimization criteria: minimizing the OOB energy and optimizing the TFL [19]. Using the discretization illustrated in Fig. 1, the derivation requires the condition $N_0 = M_0 + 1$, where $N_0 = N/\gcd(N, M)$ and $M_0 = M/\gcd(N, M)$. This constraint makes the extension to more general cases not straightforward.
We propose a method that aims at generating orthogonal pulses with arbitrary length constraint and maintaining good TFL property and orthogonality [23]. Given an initial well-localized pulse, by repeatedly performing orthogonalization and truncation, the overall process will converge under a given convergence criterion. A design example is described below. Details of the algorithm are described in Algorithm 1 which involves several essential steps.
• Initialize the pulse $g^{(0)}$: We choose a Gaussian pulse $g^{(\alpha)}_{\mathrm{gauss}}(t) = (2\alpha)^{1/4} e^{-\pi \alpha t^2}$ as the initial pulse $g^{(0)}$ due to its optimal TFL [14]. This step is similar to the first step of the abovementioned standard method but described in a discrete manner. The factor $\alpha$ determines the TFL of $g^{(0)}$. To minimize ISI and ICI, it is suggested to choose $\alpha \approx \nu_{\max}/\tau_{\max}$ [14]. In general, $\alpha$ can be adjusted to match different channel conditions.
• Orthogonalize $g^{(n-1)}[l]$ using the standard method.
• Truncation is applied using a truncation window $g_W$. The width of the window $L_W$ corresponds to the desired pulse length. Common windows include rectangular (RECT), raised-cosine (RC($\beta$)), and root raised-cosine (RRC($\beta$)) windows, where $\beta$ is the roll-off factor. For $\beta \to 0$, RC($\beta$) and RRC($\beta$) converge to RECT.
• Orthogonalization and truncation are applied iteratively until the convergence criterion is met. The coefficient $\varepsilon$ used in this criterion can be interpreted as a tradeoff between orthogonality and TFL. A small $\varepsilon$ leads to a higher number of iterations and improved orthogonality; a large $\varepsilon$ leads to pulses with better TFL. Here, $\varepsilon$ is set to $10^{-4}$.
Either a fixed window $g_W$ or an iteration-varying window can be used in the algorithm.
To illustrate the algorithm procedure, we first discuss the relationship between orthogonality and the number of iterations for a specific example in which the orthogonality is measured by SIR. The essential parameter settings are listed in Table 2. As depicted in Fig. 3, it is obvious that by increasing the number of iterations, i.e., setting a small ε, the orthogonality of g can be improved. Moreover, if taking the convergence time into consideration, confining the number of iterations to less than ten is reasonable as well, as the SIR is already more than 80 dB after the first few iterations.  Figure 4 presents the time and frequency impulse responses for the initial Gaussian pulse, optimized pulse after the first iteration, and the final result, which indicates how the number of iterations influences the time and frequency localization properties for the obtained pulse shapes.
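The iterate-orthogonalize-truncate idea can be sketched in a few lines. This is not Algorithm 1 itself but a toy version under explicit assumptions: the pulse lives on a cyclic grid of n_sym symbol periods, orthogonalization is done by a symmetric (Löwdin-type) orthogonalization of all time-frequency shifts via the Gram matrix, the truncation window is rectangular, and the stopping rule compares successive pulses against ε. All function names and parameter values are hypothetical.

```python
import numpy as np

def gabor_matrix(g, M, N):
    """Columns: cyclic shifts g_{m,n}[l] = g[l - nN] * exp(j*2*pi*m*(l - nN)/M)."""
    L = len(g)
    l = np.arange(L)
    cols = [np.roll(g * np.exp(2j * np.pi * m * l / M), n * N)
            for n in range(L // N) for m in range(M)]
    return np.stack(cols, axis=1)

def orthogonalize(g, M, N):
    """Symmetric orthogonalization G R^{-1/2}; column 0 is the new prototype."""
    G = gabor_matrix(g, M, N)
    w, V = np.linalg.eigh(G.conj().T @ G)          # Gram matrix R = G^H G
    R_inv_sqrt = (V / np.sqrt(w)) @ V.conj().T
    return np.real(G @ R_inv_sqrt[:, 0])

def sir_db(g, M, N):
    """Orthogonality measure: power on (0, 0) over power leaked to all other shifts."""
    c = np.abs(gabor_matrix(g, M, N).conj().T @ g) ** 2
    return 10 * np.log10(c[0] / max(c[1:].sum(), 1e-300))

def design_pulse(M, N, K, alpha=1.0, n_sym=4, eps=1e-4, max_iter=50):
    L = N * n_sym
    assert L % M == 0, "cyclic grid length must be a multiple of M"
    t = (np.arange(L) - (L - 1) / 2) / np.sqrt(M * N)   # normalized time axis
    g = np.exp(-np.pi * alpha * t ** 2)                 # Gaussian initial pulse
    g /= np.linalg.norm(g)
    Lw = int(round(K * N))                              # desired pulse length
    window = np.zeros(L)
    start = (L - Lw) // 2
    window[start:start + Lw] = 1.0                      # rectangular truncation
    for _ in range(max_iter):
        g_new = orthogonalize(g, M, N) * window
        g_new /= np.linalg.norm(g_new)
        converged = np.linalg.norm(g_new - g) < eps     # assumed stopping rule
        g = g_new
        if converged:
            break
    return g

g_short = design_pulse(M=8, N=10, K=1)
print(round(sir_db(g_short, 8, 10)))
```

For K = 1 the truncated pulses of neighbouring symbols do not overlap, so in this toy setting the iteration settles after very few passes on a pulse that is orthogonal up to numerical precision. For K > 1 the behaviour depends on the window and on ε, and this sketch makes no claim about the convergence speed reported in the paper.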

Bi-orthogonal design with channel statistics
Bi-orthogonal pulse design allows using different pulses at the transceiver sides to maximize the link performance. It employs mis-matched filtering to balance the robustness against ISI/ICI in doubly dispersive channels with the noise enhancement. In general, bi-orthogonal design capitalizes on more degrees of freedom compared to the orthogonal design, which may lead to better performance in practice, especially in self-interference-limited scenarios.
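Given that the channel statistics are available, there are two common approaches for bi-orthogonal design: first, fixing the transmit filter and designing the optimal receive filter, and second, joint transmit and receive pulse design. SINR is applied as a typical measure for the design optimization.
Optimized receiver filter design: With regard to the pre-determined transmit pulse $\mathbf{g}$, we now derive the optimized receive pulse $\boldsymbol{\gamma}$ that maximizes the link performance, taking the SINR as the optimization measure. $\mathrm{SINR}^{D}_{g,\gamma}$ in (11) can be reformulated as

$$\mathrm{SINR}^{D}_{g,\gamma} = \frac{\boldsymbol{\gamma}^H \mathbf{A}\, \boldsymbol{\gamma}}{\boldsymbol{\gamma}^H \mathbf{B}\, \boldsymbol{\gamma}}, \tag{16}$$

where $\mathbf{A}$ and $\mathbf{B}$ are Hermitian matrices given respectively by

$$\mathbf{A} = \mathbf{G}_{0,0}\, \mathbf{R}_H\, \mathbf{G}_{0,0}^H, \qquad \mathbf{B} = \sigma_n^2\, \mathbf{I} + \sum_{(m,n) \neq (0,0)} \mathbf{G}_{m,n}\, \mathbf{R}_H\, \mathbf{G}_{m,n}^H.$$

Note that (16) is a generalized Rayleigh quotient, which is associated with the generalized eigenvalue problem $\mathbf{A} \boldsymbol{\gamma} = \zeta\, \mathbf{B} \boldsymbol{\gamma}$ [16,17]. The maximum SINR $\zeta_{\max}$ corresponds to the maximum generalized eigenvalue of $\mathbf{A}$ and $\mathbf{B}$ and is attained when the receive filter $\boldsymbol{\gamma}$ is chosen as the corresponding generalized eigenvector $\boldsymbol{\gamma}_{\max}$, i.e., $\mathbf{A} \boldsymbol{\gamma}_{\max} = \zeta_{\max}\, \mathbf{B} \boldsymbol{\gamma}_{\max}$. The detailed implementation is presented in Algorithm 2.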
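The generalized eigenvalue step can be sketched with NumPy alone by whitening with a Cholesky factor of B. This is not Algorithm 2 but a minimal illustration; the function names are hypothetical, and the random Hermitian matrices merely stand in for the A and B that would be built from the transmit pulse and the channel statistics.

```python
import numpy as np

def max_sinr_receiver(A, B):
    """Solve A*gamma = zeta*B*gamma for the largest zeta (B Hermitian positive definite)."""
    Lc = np.linalg.cholesky(B)                  # B = Lc Lc^H
    Li = np.linalg.inv(Lc)
    w, V = np.linalg.eigh(Li @ A @ Li.conj().T) # ordinary Hermitian problem, ascending order
    gamma = Li.conj().T @ V[:, -1]              # map back: gamma = Lc^{-H} v
    return gamma / np.linalg.norm(gamma), w[-1]

def rayleigh_quotient(gamma, A, B):
    return np.real(np.vdot(gamma, A @ gamma) / np.vdot(gamma, B @ gamma))

# Stand-in matrices (assumption): A positive semidefinite, B positive definite.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
Y = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
A = X @ X.conj().T
B = Y @ Y.conj().T + 0.1 * np.eye(6)

gamma_max, zeta_max = max_sinr_receiver(A, B)
```

Any other receive filter yields a Rayleigh quotient no larger than zeta_max, which is why the dominant generalized eigenvector is the max-SINR choice for a fixed transmit pulse.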

Joint transmitter and receiver design
Considering the joint optimization of the transmit and receive filters w.r.t. the provided channel statistics for WSSUS channels, [16] showed that the primal problem is nonconvex. An efficient alternating algorithm has been proposed to achieve a local optimum. Its detailed implementation is listed in Algorithm 3. In general, this algorithm calculates the transmit and receive pulses alternatingly until the overall process converges.

SINR evaluation of pulse design based on receiver realizations
In Section 3, we have introduced two exemplary methods for designing a pair of transmit and receive pulse shapes. In practical communication systems, one may encounter a fixed transmit pulse that cannot be changed further, so that only the receive pulse can be subject to optimization. Taking this aspect into account, we propose in this section different solutions for the receiver design that depend on the usage of statistical channel knowledge and evaluate the pulse design using the SINR contour as measure.

Evaluation metric: SINR contour
For any doubly dispersive channel, the achievable SINR of a given transceiver pulse pair can be computed by (10) or (11). For the SINR contour plot, the channel scattering function needs to be known a priori. In practice, however, accurate channel statistics are usually not available; only channel characteristics such as the maximum delay $\tau_{\max}$ and the maximum Doppler frequency $\nu_{\max}$ are known. Without further specification, we assume in this section that the "default" support region of the underspread WSSUS channel is an origin-centered rectangle [13], whose side lengths are equal to $2\tau_{\max}$ and $2\nu_{\max}$, respectively. The diagonal entries of the channel correlation function $\mathbf{R}_H$ are set to be equal.

SINR evaluation based on receiver realizations
Given a transmit pulse optimized according to the orthogonal or bi-orthogonal method, two receive pulse designs are considered here: the so-called naive receiver, which is designed without channel information, and the max-SINR receiver, which takes channel information into account in the design procedure.

Naive receiver without channel knowledge
Transmit pulse based on orthogonal design
Provided that the transmit pulse is optimized by the orthogonal method, the naive receiver refers to a receive pulse that adopts the same (symmetric) shape as the transmit pulse generated by Algorithm 1, i.e., γ(t) = g(t). We provide several design examples in this section. A necessary condition for generating orthogonal pulses is TF > 1. On the other hand, a larger TF leads to a smaller spectral efficiency. As a compromise, TF is set slightly larger than 1. We choose TF = 1.07 and TF = 1.25 (the same as the normal/extended CP overhead in LTE) and α = 1. Table 3 lists the key parameters in Algorithm 1. Figure 5 illustrates the pulse shapes for the overlapping factor set to K = 1.07. The solid and dashed lines indicate the optimized pulses in this paper and in [19], respectively. Both results are close to the pulse shapes used in windowed OFDM. For the case of TF = 1.25 (Fig. 5b), the optimized pulse in this paper converges to the analytically derived pulse shape with the optimal TFL in [19]. Given the transmit pulse with K = 1, the proposed pulse shapes are depicted in Fig. 6. For TF = 1.25, $g_2^{K=1}(t)$ coincides with the pulse proposed in [19], which aims at minimizing the OOB leakage.
This transceiver pulse design method is suitable to the scenario requiring good frequency localization, i.e., one  gauss with α 1 as the initial pulse, as in this case, we only need to consider suppressing ICI when designing the short pulses [24].
Allowing for long pulses, exemplary orthogonal design results with K = 4 are given in Fig. 7. For comparison with CP-OFDM, the SIR contours of the pulse pairs $g_1^{K=4}/\gamma_1^{K=4}$ (dashed) and $g_{\mathrm{rect}}/g_{\mathrm{rect}}$ (solid), as well as $g_2^{K=4}/\gamma_2^{K=4}$ (dashed) and $g_{\mathrm{rect}}/g_{\mathrm{rect}}$ (solid), are depicted in Fig. 8. The number on each contour line indicates the lowest achievable SINR level that a pulse pair can support within the closed region. Compared with CP-OFDM, the proposed design is considerably more robust against time synchronization errors while maintaining similar support in the frequency domain, which could potentially enable timing advance (TA)-free transmission in the uplink or support downlink multi-point transmission with large coverage. In particular, the proposed design for TF = 1.25 supports a similar T-F contour region for high-order modulation (e.g., 64 QAM) while achieving an overall larger T-F contour support for lower-order modulation (e.g., QPSK and 16 QAM). Thus, the TF = 1.25 multicarrier waveform is more robust in challenging dispersive scenarios, such as high-speed vehicular transmission, and can achieve high reliability.
Transmit pulse based on bi-orthogonal design
For the bi-orthogonal design, the naive receiver adopts a pulse shape that is mismatched to the transmit pulse, without exploiting channel knowledge. To exemplify the receiver realization in this case, the transmit pulse is fixed to one of two options: the conventional rectangular pulse $g_{\mathrm{RECT}}$ and the raised-cosine (RC) shaped pulse $g_{\mathrm{RC}}$, which is commonly used in W-OFDM systems. We remark that other transmit pulses obtained from the bi-orthogonal design are also applicable.
For performance evaluation, $g_{\mathrm{RC}}$ is generated by convolving a window w of length $N_0$ with a rectangular window of length N. According to [6], any pulse shape satisfying $\sum_{i=0}^{N_0-1} w_i = 1$ can be selected as the window. Unless otherwise specified, we choose w as a Hanning window and set $N_0 = N_{\mathrm{CP}}/2$ with $N_{\mathrm{CP}} = N - M$. All essential parameters are listed in Table 4. Note that the noise power is normalized to the average transmit signal power, which is assumed to be equal to one. Figure 9 shows the RC transmit pulse combined with the rectangular receive pulse $\gamma = \gamma_{\mathrm{RECT}}$ and the raised-cosine receive pulse $\gamma = \gamma_{\mathrm{RC}}$, respectively. The ratio N/M is set to 1.1. The SINR contours of the pairs $g_{\mathrm{RECT}}/\gamma_{\mathrm{RECT}}$ (solid) and $g_{\mathrm{RC}}/\gamma_{\mathrm{RECT}}$ (dashed) are depicted in Fig. 10. The x-axis denotes the delay τ normalized to the symbol period T in the time domain, while the y-axis represents the Doppler shift ν normalized to the subcarrier spacing F in the frequency domain. It can be observed that in the noise-limited scenario, given a rectangular receive pulse, $g_{\mathrm{RC}}$ is more robust to asynchronicity in the time domain while supporting a dispersion similar to that of $g_{\mathrm{RECT}}$ in the frequency domain. For the interference-limited scenario, i.e., a noise variance of −31 dB in Fig. 10a, $g_{\mathrm{RECT}}/\gamma_{\mathrm{RECT}}$ and $g_{\mathrm{RC}}/\gamma_{\mathrm{RECT}}$ have similar regions in the contour plot at the 28 dB SINR level, which is needed, e.g., to support 256 QAM on the physical downlink shared channel in LTE.
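The construction of the RC-shaped pulse by convolution can be sketched as below. This is a minimal illustration, assuming a Hanning window with its two zero end taps dropped so that all $N_0$ taps are nonzero; the value M = 1000 and the function name are hypothetical, while N/M = 1.1 and $N_0 = (N - M)/2$ follow the setting described above.

```python
import numpy as np

def rc_pulse(M, N):
    """Raised-cosine-like pulse: window w (sum = 1) convolved with a rectangle.

    M: number of subcarriers, N: symbol period in samples (N > M).
    """
    n_cp = N - M
    n0 = n_cp // 2
    # Hanning window of n0 nonzero taps (zero endpoints removed: a choice of this sketch)
    w = np.hanning(n0 + 2)[1:-1]
    w = w / w.sum()              # enforce sum_i w_i = 1
    rect = np.ones(N)
    g = np.convolve(w, rect)     # length N + n0 - 1, flat top equal to 1
    return g, w

M, N = 1000, 1100                # N/M = 1.1; M is a hypothetical example value
g, w = rc_pulse(M, N)
g_unit = g / np.linalg.norm(g)   # unit-energy version of the pulse
```

Because the window taps sum to one, the flat part of the pulse keeps the amplitude of the rectangular pulse, and only the edges are smoothed.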

Max-SINR receiver with channel statistical knowledge
With the assumption of a rectangular-shaped channel scattering function, we evaluate the performance with transmit pulse from both orthogonal and bi-orthogonal design and its corresponding max-SINR receive pulse.

Transmit pulse based on orthogonal design
Choosing the transmit pulse $g_1^{K=1.07}$ shown in Fig. 5a, we evaluate its SINR operational range w.r.t. double dispersion and compare it with $g_{\mathrm{cpofdm}}$. The receive pulse is calculated by Algorithm 2. The main simulation parameters have the same settings as in Table 3.
As observed in Fig. 11, compared with $g_{\mathrm{cpofdm}}$, $g_1^{K=1.07}$ and its respective max-SINR receive pulse are more robust to time dispersion in high-noise-power regions, i.e., for noise variances of −25, −22, and −19 dB. When $\sigma_n^2$ is −31 dB, the performance of $g_1^{K=1.07}$ at the 28 dB level is worse than that of $g_{\mathrm{cpofdm}}$, making it an undesirable choice for enabling 256 QAM in this case.

Transmit pulse based on bi-orthogonal design
We analyze in this section the SINR contours of $g_{\mathrm{RECT}}$ and $g_{\mathrm{RC}}$ with their corresponding receive pulses calculated by Algorithm 2 according to the channel statistics. The noise power levels are set the same as in Fig. 11, and the parameter settings are given in Table 3.
As depicted in Fig. 12, given the max-SINR receiver, $g_{\mathrm{RC}}$ outperforms $g_{\mathrm{RECT}}$ w.r.t. robustness to timing misalignment while maintaining comparable robustness to frequency dispersion. Moreover, comparing Figs. 12 and 10, the optimized receiver is more robust against frequency misalignment than the naive one, especially when the time shift is close to zero.

Joint transmitter and receiver design with channel statistical knowledge
In this section, we provide several transceiver pulse pairs optimized according to Algorithm 3, for both time-invariant and time-varying channels. The detailed simulation parameter settings are presented in Table 5, in which two extreme noise power levels are selected. Figure 13a, b depicts the computed pulse shapes for low- and high-noise-power levels, respectively, in time-invariant channels, where the normalized maximum frequency shift is $\nu_{\max}/F = 0$ and the normalized maximum time delay is $\tau_{\max}/T = 10\%$. An interesting observation is that for the case of the low-noise-power level, the proposed pulses converge to the pulses used in conventional CP-OFDM. This result makes sense since CP-OFDM is known to be optimal in the high SNR scenario with low Doppler spreads. For the case of high noise power, Fig. 13b shows that the transceiver pulses are close to a matched pulse pair. An intuitive interpretation of this result is that, since the SNR loss due to transceiver mismatch becomes dominant in such a noise-limited region, matched filtering is desirable.

Table 5 entries: normalized maximum time delay in Fig. 13: 10%; normalized maximum Doppler shift in Fig. 13: 0; normalized maximum time delay in Fig. 14: 5%; normalized maximum Doppler shift in Fig. 14: ≈ 1.6%

Fig. 13 a, b Pulse shapes designed for time-invariant channel

Propagation channels are commonly time-variant in practical communication systems. To evaluate the performance in this case, we select $\tau_{\max}/T = 5\%$ and $\nu_{\max}/F \approx 1.6\%$ by assuming that an object moves at a relatively high velocity in a medium delay spread environment, as characterized, for example, in the extended vehicular A (EVA) channel model [25]. Figure 14 illustrates the derived pulse shapes for both low- and high-noise-power levels. The optimized pulse pair for a doubly dispersive channel in the high SNR region is close to rectangular-shaped. However, due to the frequency shifts, both g and γ have some irregular shaping at the filter head and tail, which are visible as "steps" in the figure.
For the channel with a high-noise-power level, Fig. 14b shows that g and γ are nearly matched, which can be explained analogously to Fig. 13b.
Ideally, pulse shape optimization aims at fulfilling the orthogonal condition, achieving good TFL and SIR/SINR performance. In reality, pulse shapes need to be properly designed according to the system requirements and available resources and channel information. Several exemplary design methods have been addressed in detail in this section.

Air interface PHY design based on P-OFDM
According to the current 3GPP agreement, a new waveform may be applied to newly emerging services (e.g., URLLC, MTC) other than eMBB in 5G NR systems.
A new air interface design needs to provide means to adapt the physical layer parameters according to requirements for the different services and different frequency bands envisaged for 5G operation [26]. In the following, we elaborate on how the flexibility of pulse-shaped OFDM can be used to provide different PHY configurations through different parameterizations. Our focus here is on two parameters of pulse-shaped OFDM, namely, pulse shape design and numerology design.
Considering short packet transmission in URLLC services or TDD systems, short pulses are desirable to enable low-latency transmission of packets spread over very few symbols and fast switching between uplink and downlink. Long symbols, on the contrary, would yield long transition times due to their symbol tails. Under these circumstances, pulse-shaped OFDM with a small overlapping factor K should be chosen, extending the symbol duration by at most half the symbol interval, i.e., K ∈ [1, 1.5]. If K ≈ 1 is chosen, the solution reduces to W-OFDM. For the numerology design in these cases, e.g., subcarrier spacing, symbol interval, and symbol overhead, the methodology for OFDM/W-OFDM systems described in Section 2.3 can be adopted, followed by an optimization of the designed pulse shape. It should be noted that a larger CP length improves the spectral containment, similar to increasing the symbol length characterized by the overlapping factor K, as indicated in Table 8. Hence, balancing CP length against symbol length may be useful in some scenarios to find the optimum solution.
For the MTC service and frequency-division duplex (FDD) systems, long pulses should be chosen to offer more room for robustness against time-frequency distortions, since the requirements on time localization of the transmit symbols are less stringent here. In such cases, the overlapping factor of pulse-shaped OFDM can be chosen large, i.e., up to K = 4. This parameter setting provides a good TFL property, which makes the system robust against distortions caused by time-asynchronous transmission. Such distortions can be introduced by the random movement of devices with sporadic transmission of short data bursts only, which is a typical MTC scenario. Thus, pulse-shaped OFDM becomes an enabler for asynchronous multiple access (e.g., frequency/space division multiple access), facilitating grant-free and timing-advance-free communication; for details, refer to [9]. The numerology should be designed according to service requirements and channel characteristics, followed by further adjustment of the applied pulse.

Implementation and complexity
Using the specification in Fig. 1 for symbol period $T = N T_s$ and subcarrier spacing $F = 1/(M T_s)$, the transmit and receive signals can be efficiently synthesized and analyzed using a PPN implementation (e.g., Fig. 2). For a detailed realization of the PPN structure, please refer to [20]. Recalling the definition of the overlapping factor K, the implementation of the state-of-the-art single and multicarrier waveforms can be unified with the PPN structure, as shown in Table 6. We remark that, alternatively, a system featuring multi-rate multi-pulse shaping synthesis and analysis could also benefit from the implementation with frequency sampled filter banks [27,28].
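As a small worked example of these definitions, the product TF follows directly as N/M. The sketch below assumes LTE-like values that are not taken from Fig. 1: M = 2048, a symbol period of N = 2048 + 144 samples (normal CP length of most LTE symbols), and 15 kHz subcarrier spacing; the function name is hypothetical.

```python
def numerology(M, N, subcarrier_spacing_hz):
    """Return (Ts, T, TF) for T = N*Ts and F = 1/(M*Ts)."""
    Ts = 1.0 / (M * subcarrier_spacing_hz)  # sampling period from F = 1/(M*Ts)
    T = N * Ts                              # symbol period
    TF = T * subcarrier_spacing_hz          # equals N/M
    return Ts, T, TF

# LTE-like example values (assumptions of this sketch)
Ts, T, TF = numerology(M=2048, N=2048 + 144, subcarrier_spacing_hz=15e3)
print(f"Ts = {Ts * 1e9:.2f} ns, T = {T * 1e6:.2f} us, TF = {TF:.4f}")
```

With these values, TF is approximately 1.07, which corresponds to the normal CP overhead considered in this paper.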
Furthermore, we exemplify the complexity comparison as follows. Assuming a symmetric transceiver pulse design, namely, g(t) = γ(t), M = 2048, and TF = 1.07, the numbers of operations, including complex multiplications and additions, for implementing the different waveforms are summarized in Table 7. As seen from the table, the overall complexity overhead introduced by the PPN-based implementation for pulse-shaped OFDM is minor compared to CP-OFDM. Taking the whole PHY-layer baseband processing into account, where multi-rate sampling and conversion, MIMO processing, coding, and decoding are considered, the complexity overhead for the modulator and demodulator part due to the PPN implementation is rather marginal.

Spectrum confinement and coexistence
Conventional CP-OFDM suffers from strong OOB leakage in its power spectral density (PSD) due to the slow frequency decay of the rectangular pulse. In practice, one can adopt subband-wise low-pass filtering to shape and fit the transmit signal to the spectral mask, as long as the shaping does not lead to a considerable EVM loss [11]. Alternatively, subcarrier filtering can also improve the spectral containment. In the following, we evaluate the PSD with both an ideal power amplifier model and the Rapp model. For properly designed pulse shapes, the spectral containment of pulse-shaped OFDM surpasses that of CP-OFDM. If the degrees of freedom for constructing the localized pulse shape are high, e.g., for an overlapping factor K = 4 (see Fig. 15a, b), the resulting PSD of pulse-shaped OFDM is satisfactory even without any additional spectral mask filtering, i.e., incurring no EVM loss. For a small overlapping factor, e.g., K = 1.07 ≈ 1 (see Fig. 15c, d), the spectral containment becomes slightly worse; however, a satisfactory PSD can still be achieved, so that only a small number of guard subcarriers is needed for spectral coexistence.
Assuming −50 dBc/Hz as the required spectral leakage level, the required number of guard subcarriers based on the above PSD results is summarized in Table 8. The results are based on the LTE setting of 15 kHz subcarrier spacing for 20 MHz bandwidth.
For the evaluation of the spectral containment, the non-linearity of the RF unit should be considered. To model the non-linearity of a power amplifier (PA), we use the Rapp model with smoothness factor equal to 3 and 8.3 dB   Fig. 16a, b, the PSD performance (before and after the PA) of OFDM, pulse-shaped OFDM, and OFDM with subband filtering are shown, respectively. The product TF is set to 1.07 for the first two waveforms, and K = 1.07 is used for pulse-shaped OFDM, while OFDM with subband filtering employs a half-symbol-length FIR filter. We observe that pulse-shaped OFDM achieves spectral containment comparable to that of OFDM with subband filtering, both significantly outperforming conventional CP-OFDM. When the PA non-linear effects are taken into account, pulse-shaped OFDM still offers OOB emission performance similar to that of OFDM with subband filtering, which is slightly better than that of OFDM systems.
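For reference, the AM/AM characteristic of the Rapp model can be sketched as follows. This is a generic illustration of the well-known Rapp amplitude compression with smoothness factor p = 3, not the simulation setup of this paper. The saturation amplitude of one, the helper `apply_backoff`, and the reading of the 8.3 dB value as a mean-power back-off relative to saturation are assumptions introduced here for illustration.

```python
import numpy as np

def rapp_pa(x, p=3.0, a_sat=1.0):
    """Rapp model (AM/AM only): compress the amplitude, keep the phase."""
    r = np.abs(x)
    gain = 1.0 / (1.0 + (r / a_sat) ** (2 * p)) ** (1.0 / (2 * p))
    return x * gain

def apply_backoff(x, backoff_db, a_sat=1.0):
    """Scale x so that its mean power lies backoff_db below a_sat**2.

    Hypothetical helper; the back-off definition is an assumption.
    """
    p_avg = np.mean(np.abs(x) ** 2)
    target = a_sat ** 2 * 10 ** (-backoff_db / 10)
    return x * np.sqrt(target / p_avg)

# Complex Gaussian test signal as a stand-in for an OFDM waveform
rng = np.random.default_rng(1)
x = rng.standard_normal(10000) + 1j * rng.standard_normal(10000)
x_in = apply_backoff(x, backoff_db=8.3)   # assumed interpretation of 8.3 dB
y = rapp_pa(x_in, p=3.0)
```

The PSD before and after the PA could then be estimated from `x_in` and `y`, e.g., with an averaged periodogram.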
For a more aggressive spectrum usage requiring minimum guard subcarrier overhead, additional subband-wise filtering can also be applied to the pulse-shaped OFDM signal. However, the trade-off between EVM, OOB leakage, and particularly the linearity of the RF unit (cost and power efficiency) at both the base station (BS) and user equipment (UE) sides should be carefully reviewed.

Application examples
In this section, we provide some applications of pulse-shaped OFDM and evaluate the link performance in the respective scenarios.

Uplink timing advance (TA)-free access
Considering uplink transmission, radio propagation latency causes timing misalignment of the uplink signals at the base station unless a closed-loop TA adjustment is performed. For example, if the cell radius is 1732 m, the TA misalignment could be in the range of 0 to 13 μs. In the case of massive machine connections, each UE only sporadically needs to send a small data packet, followed by a long period of silence. The TA adjustment procedure run for each link would impose a huge overhead to  Fig. 17, showing two UEs transmitting to one BS with different timing offsets in a spatial division multiple access (SDMA) manner. From the SINR contour in Fig. 8, we observe that pulse-shaped OFDM with a long pulse (K = 4) can support a large timing offset, rendering it suitable for uplink TA-free (or relaxed TA) transmissions. Such a pulse-shaped OFDM system is particularly useful in combination with nonorthogonal multiple access schemes like SDMA if the base station can hardly achieve full synchronization with each user in the uplink at reasonable complexity [9]. We apply the pulse shape depicted in Fig. 7a with overlapping factor K = 4. Detailed simulation assumptions are given in Table 9. The energy of the symbols is assumed to be normalized to one. The simulation results shown in Figs. 18 and 19 confirm the advantages of pulse-shaped OFDM over CP-OFDM, exhibiting substantial link performance gains of 3 to 5 dB.

HST/V2X with high mobility
High-mobility scenarios are becoming increasingly important for future wireless communications. For example, high-speed train (HST) has already been considered in LTE as an important new use case for the MBB service. For 5G NR systems, the vehicular-to-anything (V2X) service will enable safe driving and cooperative autonomous driving. The HST and V2X scenarios are illustrated in Fig. 20.
For the PHY configuration based on pulse-shaped OFDM, we need to derive a reasonable product TF for the pulse-shaped design. The choice of this parameter depends strongly on the propagation channels and service requirements. In this scenario, as highly mobile objects are involved, the channels are often characterized as "doubly dispersive." Based on the modeling reports [29][30][31], the maximum path delay and Doppler shift are summarized in Fig. 21. From the channel modeling, we consider that the (T, F) lattice is best adjusted to a subcarrier spacing between 60 and 75 kHz for an isotropic design with TF = 1.25 for guaranteed performance in many scenarios, especially in extremely high velocity cases (for reference, LTE uses 15 kHz with TF = 1.07, and IEEE 802.11p uses 156 kHz with TF = 1.25). We apply the pulse shape depicted in Fig. 7b with overlapping factor K = 4. Detailed simulation parameters are given in Tables 10 and 11 for link performance evaluation. From the BLER performance depicted in Figs. 22 and 23 (solid: ideal channel estimation; dashed: least-squares (LS)-based channel estimation), we see a performance gain of about 1 to 3 dB for pulse-shaped OFDM due to the well-localized pulse shape design.

Conclusions
This paper has summarized the pulse design methods for OFDM systems and provided a new design method taking into consideration an arbitrary length constraint, orthogonality, and good time-frequency localization. We have also addressed different approaches for receiver realizations and provided a criterion for the evaluation of the pulse design, namely the SINR contour. To meet diverse requirements envisaged for future communication systems, physical layer configuration based on pulse-shaped OFDM has been addressed with suitable parameterizations in pulse design. Practical issues like implementation and complexity are also analyzed for pulse-shaped OFDM systems.
The flexibility of the pulse-shaped OFDM multicarrier waveform is attributed both to its different numerology settings and to its transceiver pulse shapes. The numerology configuration mainly aims at defining the time-frequency operational range, while the design of pulse shapes further refines the time-frequency localization according to the system (or service) requirements. We have shown that exploiting pulse shaping as an additional degree of freedom in OFDM system design can be used beneficially to improve the system's robustness against time and frequency distortions and the spectrum coexistence capabilities, facilitating efficient fragmented spectrum access and machine-type communications.

Endnotes
1 Pulse shape and prototype filter are used interchangeably throughout the paper.
2 We assume the energy of both transceiver pulses g(t) and γ(t) are normalized to one.
3 For simplifying the analysis, causality of the system is firstly ignored, and thus CP-OFDM is modeled as half-prefixed and half-suffixed OFDM.

and the spectral efficiency is proportional to $1/TF = (T - T_{\mathrm{cp}})/T$. Note that applying the transmit and receive pulses $g_{\mathrm{cpofdm}}(t)$ and $\gamma_{\mathrm{cpofdm}}(t)$ is equivalent to the "CP addition" and "CP removal" operations in CP-OFDM technology. In an AWGN channel, due to the discrepancy of transmit and receive pulses, namely, $g_{\mathrm{cpofdm}} \neq \gamma_{\mathrm{cpofdm}}$, there is a mismatching SNR loss following the Cauchy-Schwarz inequality. Using the common setting in LTE systems with 7 or 25% CP overhead, the mismatching SNR loss is about 0.3 dB for TF = 1.07 and about 1 dB for TF = 1.25.
2. ZP-OFDM is also a special case of pulse-shaped OFDM, where the transmit pulse $g_{\mathrm{zpofdm}}(t)$ is a rectangular pulse of length $T - T_{\mathrm{zp}}$ and the receive filter $\gamma_{\mathrm{zpofdm}}(t)$ is also rectangular shaped with length T. The overlapping factor is K = 1.
3. Windowed-OFDM can also be considered as a special case within the pulse-shaped OFDM framework, with overlapping factor 1 < K < 2 (usually K is slightly larger than 1). The pulse shape can be flexibly adjusted.
4. Filtered multitone (FMT) is a pulse-shaped OFDM system where the pulse shapes do not overlap in the frequency domain [7]. The pulse shape and length are not specified. Different from FMT, pulse-shaped OFDM allows for overlapping filters in the time domain and/or in the frequency domain.
5. DFTs-OFDM is a special case of pulse-shaped OFDM where a single carrier modulation is used (M = 1). The pulse shaping is carried out with a circular convolution, which corresponds to periodically time-varying filters. The transmit pulse g(t) can be considered as the Dirichlet sinc function. The k-th DFT spreading block is upsampled with $N_{\mathrm{IDFT}}/N_{\mathrm{DFTs},k}$, where $N_{\mathrm{IDFT}}$ and $N_{\mathrm{DFTs},k}$ are the number of subcarriers of the IDFT block and the size of the DFT spreading block, respectively.
6. Zero-tail DFT-spread OFDM (ZT-DFTs-OFDM) [32] is an extended single carrier modulation (M = 1) based on "DFTs-OFDM," where the transmit pulse g(t) can also be considered as an $N_{\mathrm{zp}}$-expanded Dirichlet sinc function. Similar to DFTs-OFDM, the upsampling ratio for the k-th DFT spreading block is N FFT /N DFTs,k + N zp .
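The mismatching SNR loss quoted for CP-OFDM in the list above can be checked numerically. The sketch below assumes unit-energy rectangular pulses, a transmit pulse of length T, and a centered receive pulse of length $T - T_{\mathrm{cp}}$ (half-prefixed, half-suffixed model); the discretization with M = 10000 samples and the function name are hypothetical. Under these assumptions, the squared inner product of the two pulses equals 1/TF, so the loss is $10\log_{10}(TF)$ dB.

```python
import numpy as np

def cp_ofdm_mismatch_loss_db(TF, M=10000):
    """SNR loss (dB) from using a shorter rectangular receive pulse.

    g has N = TF*M samples (length T); gamma has M samples (length T - T_cp),
    centered inside g. Both pulses are normalized to unit energy.
    """
    N = int(round(TF * M))
    g = np.ones(N) / np.sqrt(N)
    gamma = np.zeros(N)
    start = (N - M) // 2
    gamma[start:start + M] = 1.0 / np.sqrt(M)
    corr = np.abs(np.vdot(gamma, g)) ** 2   # |<g, gamma>|^2 <= 1 (Cauchy-Schwarz)
    return -10.0 * np.log10(corr)

loss_107 = cp_ofdm_mismatch_loss_db(1.07)
loss_125 = cp_ofdm_mismatch_loss_db(1.25)
print(f"TF = 1.07: {loss_107:.2f} dB, TF = 1.25: {loss_125:.2f} dB")
```

The resulting values are consistent with the figures of about 0.3 dB and about 1 dB stated for TF = 1.07 and TF = 1.25, respectively.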

EVM requirements for mobile communications
In [25], EVM indicates a measurement of the difference between the ideal and measured symbols after equalization. Following its definition, relationship between the required EVM and SIR (in linear scale) is given by The limit of the EVM of each E-UTRA carrier for different modulation schemes on Physical downlink shared channel (PDSCH) [25] along with the associated minimum SINR are summarized in the second and the third columns of Table 12, respectively.
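As an illustration of how an EVM limit translates into a minimum SINR, the following sketch assumes the commonly used relationship SIR = 1/EVM² in linear scale, with EVM expressed as a fraction; this relationship is an assumption of the sketch and should be checked against the definition in [25]. The function name is hypothetical, and the input of 8% is only an example value, not an entry taken from Table 12.

```python
import math

def evm_to_min_sinr_db(evm_percent):
    """Minimum SINR in dB for a given EVM limit, assuming SIR = 1/EVM^2."""
    evm = evm_percent / 100.0
    return -20.0 * math.log10(evm)

# Example value (illustrative only)
sinr_db = evm_to_min_sinr_db(8.0)
print(f"EVM = 8% -> minimum SINR of about {sinr_db:.1f} dB")
```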