Rate-splitting multiple access for downlink communication systems: bridging, generalizing, and outperforming SDMA and NOMA

Space-division multiple access (SDMA) utilizes linear precoding to separate users in the spatial domain and relies on fully treating any residual multi-user interference as noise. Non-orthogonal multiple access (NOMA) uses linearly precoded superposition coding with successive interference cancellation (SIC) to superpose users in the power domain and relies on user grouping and ordering to force some users to fully decode and cancel interference created by other users. In this paper, we argue that to efficiently cope with the high throughput, heterogeneity of quality of service (QoS), and massive connectivity requirements of future multi-antenna wireless networks, multiple access design needs to depart from those two extreme interference management strategies, namely fully treating interference as noise (as in SDMA) and fully decoding interference (as in NOMA). Considering a multiple-input single-output broadcast channel, we develop a novel multiple access framework, called rate-splitting multiple access (RSMA). RSMA is a more general and more powerful multiple access for downlink multi-antenna systems that contains SDMA and NOMA as special cases. RSMA relies on linearly precoded rate-splitting with SIC to decode part of the interference and treat the remaining part of the interference as noise. This capability of RSMA to partially decode interference and partially treat interference as noise makes it possible to softly bridge the two extremes of fully decoding interference and treating interference as noise and provides room for rate and QoS enhancements and complexity reduction. The three multiple access schemes are compared, and extensive numerical results show that RSMA provides a smooth transition between SDMA and NOMA and outperforms them both in a wide range of network loads (underloaded and overloaded regimes) and user deployments (with a diversity of channel directions, channel strengths, and qualities of channel state information at the transmitter). Moreover, RSMA provides rate and QoS enhancements over NOMA at a lower computational complexity for the transmit scheduler and the receivers (number of SIC layers).


Introduction
With the dramatic upsurge in the number of devices expected in 5G and beyond, wireless networks will be operated in a variety of regimes ranging from underloaded to overloaded (where the number of scheduled devices is smaller and larger than the number of transmit antennas, respectively). This has spurred interest in re-thinking multiple access for the downlink of communication systems. In this paper, we propose a new multiple access called rate-splitting multiple access (RSMA). In order to fully assess the novelty of the proposed multiple access paradigm and the design philosophy, we first review the state of the art of two major multiple accesses, namely non-orthogonal multiple access (NOMA) [1], also called Multi-User Superposition Transmission (MUST) in 3GPP LTE Rel-13 [2], and space-division multiple access (SDMA). We identify their benefits and limitations and make critical observations, before motivating the introduction of the novel and more powerful RSMA.

SDMA and NOMA: the extremes
Contrary to orthogonal multiple access (OMA) that schedules users or groups of users in orthogonal dimensions, e.g., time (TDMA) and frequency (FDMA), NOMA superposes users in the same time-frequency resource via the power domain or the code domain, leading to power-domain NOMA (e.g., [1]) and code-domain NOMA (e.g., sparse code multiple access (SCMA) [3]). Power-domain NOMA relies on superposition coding (SC) at the transmitter and successive interference cancellation (SIC) at the receivers (denoted in short as SC-SIC) [1, 4-6]. Such a strategy is motivated by the well-known result that SC-SIC achieves the capacity region of the single-input single-output (SISO) (Gaussian) broadcast channel (BC) [7,8]. It is also well known that the capacity region of the SISO BC is larger than the rate region achieved by OMA (e.g., TDMA) when users experience a disparity of channel strengths [8]. On the other hand, when users exhibit the same channel strengths, OMA based on TDMA is sufficient to achieve the capacity region [8].
The benefit of a single-antenna NOMA using SC-SIC is therefore to be able, despite the presence of a single transmit antenna in a SISO BC, to cope with an overloaded regime in a spectrally efficient manner where multiple users experience potentially very different channel strengths/path losses (e.g., cell-center users and cell-edge users) on the same time/frequency resource.
The limitation of a single-antenna NOMA lies in its complexity as the number of users grows. Indeed, for a K-user SISO BC, the strongest user needs to decode using SIC the K − 1 messages of all co-scheduled users and therefore peel off K − 1 layers before accessing its intended stream. Though SIC of a small number of layers should be feasible in practice, the complexity and the likelihood of error propagation quickly become significant for a large number of users. This calls for ways to decrease the number of SIC layers at each user. One could divide users into small groups of users with disparate channels, apply SC-SIC in each group, and schedule groups on orthogonal resources (using OMA), but that may lead to some performance loss and latency increase.
In today's wireless networks, access points are often equipped with more than one antenna. This spatial dimension opens the door to another well-known type of multiple access, namely SDMA. SDMA superposes users in the same time-frequency resource and separates users via a proper use of the spatial dimensions. Contrary to the SISO BC, the multi-antenna BC is non-degraded, i.e., users cannot be ordered based on their channel strengths in general settings. This is the reason why SC-SIC is not capacity-achieving, and the complex dirty paper coding (DPC) is the only strategy that achieves the capacity region of the multiple-input single-output (MISO) (Gaussian) BC with perfect channel state information at the transmitter (CSIT) [9]. DPC, rather than performing interference cancellation at the receivers as in SC-SIC, can be viewed as a form of enhanced interference cancellation at the transmitter and relies on perfect CSIT to do so. Due to the high computational burden of DPC, linear precoding is often considered the most attractive alternative to simplify the transmitter design [10]. Interestingly, in a MISO BC, multi-user linear precoding (MU-LP), whether in closed form or obtained through numerical optimization, though suboptimal, is often very useful when users experience relatively similar channel strengths or long-term signal-to-noise ratio (SNR) and have semi-orthogonal to orthogonal channels [11]. SDMA is therefore commonly implemented using MU-LP. The linear precoders create different beams, with each beam being allocated a fraction of the total transmit power. Hence, similarly to NOMA, SDMA can also be viewed as a superposition of users in the power domain, though users are separated at the transmitter side by spatial beamformers rather than by the use of SIC at the receivers.
SDMA based on MU-LP is a well-established multiple access that is nowadays the basic principle behind numerous techniques in 4G and 5G such as multiuser multiple-input multiple-output (MU-MIMO), coordinated multipoint (CoMP) coordinated beamforming, network MIMO, millimeter-wave MIMO, and massive MIMO.
The benefit of SDMA using MU-LP is therefore to reap all spatial multiplexing benefits of a MISO BC with perfect CSIT with a low precoder and receiver complexity.
The limitations of SDMA are threefold. First, it is suited to the underloaded regime, and the performance of MU-LP in the overloaded regime quickly drops as it requires more transmit antennas than users to be able to efficiently manage the multi-user interference. When the MISO BC becomes overloaded, the current and popular approach for the transmitter is to schedule groups of users over orthogonal dimensions (e.g., time/frequency) and perform linear precoding in each group, which may increase latency and decrease QoS depending on the application.
Second, its performance is sensitive to the user channel orthogonality and strengths and requires the scheduler to pair semi-orthogonal users with similar channel strengths together. The complexity of the scheduler can quickly increase when an exhaustive search is performed, though low-complexity (suboptimal) scheduling and user-pairing algorithms exist [10].
Third, it is optimal from a degrees of freedom (DoF), also known as spatial multiplexing gain, perspective in the perfect CSIT setting but not in the presence of imperfect CSIT [12]. The problem of SDMA design in the presence of imperfect CSIT has been to strive to apply a framework motivated by perfect CSIT to scenarios with imperfect CSIT, not to design a framework motivated by imperfect CSIT from the beginning [12]. This leads to the well-known severe performance loss of MU-LP in the presence of imperfect CSIT [13].
In view of SC-SIC benefits in a SISO BC, attempts have been made to study multi-antenna NOMA. Two lines of research have emerged that both rely on linearly precoded SC-SIC.
The first strategy, which we simply denote as "SC-SIC," is a direct application of SC-SIC to the MISO BC by degrading the multi-antenna broadcast channel. It consists in ordering users based on their effective scalar channel strengths (after precoding) and forcing receivers to decode messages (and cancel interference) in a successive manner. This is advocated and exemplified for instance in [14][15][16][17]. This NOMA strategy converts the multi-antenna non-degraded channel into an effective single-antenna degraded channel, as at least one receiver ends up decoding all messages. While such a strategy can cope with the deployment of users experiencing aligned channels and different path loss conditions, it comes at the expense of sacrificing and annihilating all spatial multiplexing gains in general settings. By forcing one receiver to decode all streams, the sum DoF is reduced to unity. This is the same DoF as that achieved by TDMA/single-user beamforming (or OMA). This is significantly smaller than the sum DoF achieved by DPC and MU-LP in a MISO BC with perfect CSIT, which is the minimum of the number of transmit antennas and the number of users. Moreover, this loss in multiplexing gain comes with a significant increase in receiver complexity due to the multi-layer SIC compared to the treat-interference-as-noise strategy of MU-LP. As a remedy to recover the DoF loss, we could envision a dynamic switching between NOMA and SDMA, reminiscent of the dynamic switching between SU-MIMO and MU-MIMO in 4G [18]. One would dynamically choose the best option between NOMA and SDMA as a function of the channel states. A particular instance of this approach is taken in [19], where a dynamic switching between SC-SIC and zero-forcing beamforming (ZFBF) was investigated.
The second strategy, which we denote as "SC-SIC per group," consists in grouping K users into G groups. Users within each group are served using SC-SIC, and users across groups are served using SDMA so as to mitigate the inter-group interference. Examples of such a strategy can be found in [1, 20-24]. This strategy can therefore be seen as a combination of SDMA and NOMA where the multi-antenna system is effectively decomposed into G hopefully non-interfering single-antenna NOMA channels. For this "SC-SIC per group" approach to perform at its best, users within each group need to have their channels aligned and users across groups need to be orthogonal.
Similarly to SDMA, multi-antenna NOMA designs also rely on accurate CSIT. In the practical scenario of imperfect CSIT, NOMA design relies on the same above two strategies but optimizes the precoder so as to cope with CSIT imperfection and resulting extra multi-user interference. As an example, the MISO BC channel is again degraded in [17] and precoder optimization with imperfect CSIT is studied.
The benefit of multi-antenna NOMA, similarly to the single-antenna NOMA, is the potential to cope with an overloaded regime where multiple users experience different channel strengths/path losses and/or are closely aligned with each other.
The limitations of multi-antenna NOMA are fourfold. First, the use of SC-SIC in NOMA is fundamentally motivated by a degraded BC in which users can be ordered based on their channel strengths. This is the key property of the SISO BC that enables SC-SIC to achieve its capacity region. Unfortunately, motivated by the promising gains of SC-SIC in a SISO BC, the multi-antenna NOMA literature strives to apply SC-SIC to a non-degraded MISO BC. This forces a non-degraded BC to be degraded and therefore leads to an inefficient use of the spatial dimensions in general settings, resulting in a DoF loss.
Second, NOMA is not suited for general user deployments since degrading a MISO BC is efficient when users are sufficiently aligned with each other and exhibit a disparity of channel strengths, not in general settings.
Third, multi-antenna NOMA comes with an increase in complexity at both the transmitter and the receivers. Indeed, a multi-layer SIC is needed at the receivers, similarly to the single-antenna NOMA. However, in addition, since there exists no natural order for the users' channels in multi-antenna NOMA (because we deal with vectors rather than scalars), the precoders, the groups, and the decoding orders have to be jointly optimized by the scheduler at the transmitter. Taking as an example the application of NOMA based on "SC-SIC" to a three-user MISO BC, we need to optimize three precoders, one for each user, along with the six possible decoding orders. Increasing the number of users makes the number of possible decoding orders grow as K!, i.e., at least exponentially. "SC-SIC per group" divides users into multiple groups, but that approach leads to a joint design of user ordering and user grouping. To decrease the complexity in user ordering and user grouping, multi-antenna NOMA (SC-SIC and SC-SIC per group) forces users belonging to the same group to share the same precoder (beamforming vector) [1]. Unfortunately, such a restriction can only further hurt the overall performance since it shrinks the overall optimization space.
Fourth, multi-antenna NOMA is subject to the same drawback as SDMA in the presence of imperfect CSIT, namely its design is not motivated by any fundamental limits of a MISO BC with imperfect CSIT.
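The growth in the number of decoding orders mentioned for "SC-SIC" can be made concrete with a short enumeration. The following is only an illustrative sketch: the function name `decoding_orders` is hypothetical and not part of the paper, and users are simply labeled 1 to K.

```python
from itertools import permutations
from math import factorial


def decoding_orders(num_users):
    """Return every possible SIC decoding order for users 1..num_users."""
    return list(permutations(range(1, num_users + 1)))


# Three users give the six decoding orders mentioned in the text.
for order in decoding_orders(3):
    print(order)

# The count equals K!, which quickly becomes impractical to search exhaustively.
for num_users in (2, 3, 4, 6, 8):
    print(num_users, factorial(num_users))
```

Each of these orders would additionally require its own precoder optimization, which is why the joint scheduler design becomes costly.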
The key is to recognize that the limitations and drawbacks of SDMA and NOMA originate from the fact that those two multiple accesses fundamentally rely on two extreme interference management strategies, namely fully treat interference as noise and fully decode interference. Indeed, while NOMA relies on some users to fully decode and cancel interference created by other users, SDMA relies on fully treating any residual multi-user interference as noise. In the presence of imperfect CSIT, CSIT inaccuracy results in an additional multi-user interference that is treated as noise by both NOMA (SC-SIC per group) and SDMA.

RSMA: bridging the extremes
In contrast, with RSMA, we take a different route and depart from the SDMA and NOMA literature and those two extremes of fully decoding interference and treating interference as noise. We introduce a more general and powerful multiple access framework based on linearly precoded rate splitting (RS) at the transmitter and SIC at the receivers. This makes it possible to decode part of the interference and treat the remaining part of the interference as noise [12]. This capability of RSMA to partially decode interference and partially treat interference as noise softly bridges the two extreme strategies of fully treating interference as noise and fully decoding interference. This contrasts sharply with SDMA and NOMA that exclusively rely on the two extremes or a combination thereof.
In order to partially decode interference and partially treat interference as noise, RS splits messages into common and private messages and relies on a superimposed transmission of common messages decoded by multiple users and private messages decoded by their corresponding users (and treated as noise by co-scheduled users). Users rely on SIC to first decode the common messages before accessing the private messages. By adjusting the message split and the power allocation to the common and private messages, RS has the ability to softly bridge the two extremes of fully treating interference as noise and fully decoding interference.
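To make the common/private split more tangible, the sketch below evaluates the rates of a one-layer RS scheme with a single common stream for fixed precoders. It is a minimal illustration under stated assumptions, not the paper's general framework: it assumes unit noise variance, a common precoder `p_c` decoded first by every user while all private streams are treated as noise, perfect SIC of the common stream, and a hypothetical `common_share` vector that divides the common rate among users. All function and variable names are illustrative.

```python
from math import log2


def inner(h, p):
    """Compute h^H p for complex vectors stored as Python lists."""
    return sum(a.conjugate() * b for a, b in zip(h, p))


def one_layer_rs_rates(H, p_c, P_private, common_share, noise_var=1.0):
    """Per-user rates of one-layer RS for fixed precoders (assumed model).

    H[k] is the channel of user k, p_c the common precoder, P_private[k] the
    private precoder of user k, and common_share[k] the fraction of the
    common rate allocated to user k (fractions should sum to one).
    """
    num_users = len(H)
    common_sinrs = []
    private_rates = []
    for k in range(num_users):
        private_powers = [abs(inner(H[k], P_private[j])) ** 2
                          for j in range(num_users)]
        # Common stream: decoded first, all private streams act as noise.
        common_sinrs.append(abs(inner(H[k], p_c)) ** 2
                            / (sum(private_powers) + noise_var))
        # Private stream: decoded after SIC, other private streams act as noise.
        interference = sum(private_powers) - private_powers[k]
        private_rates.append(
            log2(1 + private_powers[k] / (interference + noise_var)))
    # Every user must decode the common stream, so the weakest user limits it.
    common_rate = log2(1 + min(common_sinrs))
    return [common_share[k] * common_rate + private_rates[k]
            for k in range(num_users)]


H = [[1.0, 0.0], [0.0, 1.0]]
P_private = [[1.0, 0.0], [0.0, 1.0]]
# No power on the common stream: only private streams remain, as in MU-LP.
print(one_layer_rs_rates(H, [0.0, 0.0], P_private, [0.5, 0.5]))
# Some power on the common stream: part of each message is decoded by both users.
print(one_layer_rs_rates(H, [1.0, 1.0], P_private, [0.5, 0.5]))
```

Switching off the common stream leaves only private streams treated as noise by the other user, which is one end of the range that the message split and power allocation can span.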
The idea of RS dates back to Carleial's work and the Han and Kobayashi (HK) scheme for the two-user single-antenna interference channel (IC) [25]. However, the use of RS as the building block of RSMA is motivated by recent works that have shown the benefit of RS in multi-antenna BC and the recent progress on characterizing the fundamental limits of a multi-antenna BC (and IC) with imperfect CSIT. Hence, importantly, in contrast with the conventional RS (HK scheme) used for the two-user SISO IC, we here use RS in a different setup, namely (1) in a BC and (2) with multiple antennas. The use and benefits of RS in a multi-antenna BC only appeared in the last few years.
The capacity region of the K-user MISO BC with imperfect CSIT remains an open problem. As an alternative, recent progress has been made to characterize the DoF region of the underloaded and overloaded MISO BC with imperfect CSIT. In [26], a novel information theoretic upper bound on the sum DoF of the K-user underloaded MISO BC with imperfect CSIT was derived. Interestingly, this sum DoF coincides with the sum DoF achieved by a linearly precoded RS strategy at the transmitter with SIC at the receivers [27,28]. RS (with SIC) is therefore optimum to achieve the sum DoF of the K-user underloaded MISO BC with imperfect CSIT, in contrast with MU-LP that is clearly suboptimum (and so is SC-SIC since it achieves a sum DoF of unity) [28]. It turns out that RS with a flexible power allocation is not only optimum for the sum DoF but for the entire DoF region of an underloaded MISO BC with imperfect CSIT [29]. The DoF benefits of RS in imperfect CSIT settings were also shown in more complicated underloaded networks with multiple transmitters in [30] and multi-antenna receivers [31]. Considering user fairness, the optimum symmetric DoF (or max-min DoF), i.e., the DoF that can be achieved by all users simultaneously, of the underloaded MISO BC with imperfect CSIT with MU-LP and RS was studied in [32]. The symmetric DoF of RS was shown to outperform that of MU-LP. Finally, moving to the overloaded MISO BC with heterogeneous CSIT qualities, a multi-layer power partitioning strategy that superimposes degraded symbols on top of linearly precoded rate-split symbols was shown in [33] to achieve the optimal DoF region.
The benefits of RS have also appeared in multi-antenna settings with perfect CSIT. In an overloaded multigroup multicast setting with perfect CSIT, considering again fairness, the symmetric DoF achieved by RS, MU-LP, and degraded NOMA transmissions (where receivers decode messages and cancel interference in a successive manner as in SC-SIC) was studied in [34]. It was shown that RS here again outperforms both MU-LP and SC-SIC.
The DoF metric is insightful to identify the multiplexing gains of the MISO BC at high SNR but fails to capture the diversity of channel strengths among users. This limitation is countered by the generalized DoF (GDoF) framework, which inherits the tractability of the DoF framework while capturing the diversity in channel strengths [35]. In [36,37], the GDoF of an underloaded MISO BC with imperfect CSIT is studied, and here again, RS is used as part of the achievability scheme.
The DoF (GDoF) superiority of RS over MU-LP and SC-SIC in all those multi-antenna settings (with perfect and imperfect CSIT) comes from the ability of RS to better handle the multi-user interference by evolving in a regime in between the extremes of fully treating it as noise and fully decoding it.
Importantly, the rate enhancements of RS over MU-LP, as predicted by the DoF analysis, are reflected in the finite SNR regime as shown in a number of recent works. In [38], the finite SNR rate of RS in a MISO BC in the presence of quantized feedback was analyzed, and it was shown that RS benefits from a CSI feedback overhead reduction compared to MU-LP. Using optimization methods, the precoder design of RS at finite SNR was investigated in [28] for the sum rate and rate region maximization with imperfect CSIT, in [32] for max-min fair transmission with imperfect CSIT, and in [34] for multi-group multi-cast with perfect CSIT. Moreover, the benefit of RS over MU-LP in the finite SNR regime was shown in massive MIMO [39], millimeter-wave systems [40], and multi-antenna deployments subject to hardware impairments [41]. Finally, the performance benefits of the power-partitioning strategy relying on RS in the overloaded MISO BC with heterogeneous CSIT were confirmed using simulations at finite SNR in the presence of a diversity of channel strengths [33]. In particular, in contrast to the RS used in [12, 28, 29, 32-34, 38, 40, 41] that relies on a single common message, [39] (as well as [30]) showed the benefits in the finite SNR regime of a multi-layer (hierarchical) RS relying on multiple common messages decoded by various groups of users.
In this paper, in view of the limitations of SDMA and NOMA and the above literature on RS in multi-antenna BC, we design a novel multiple access, called rate-splitting multiple access (RSMA), for downlink communication systems. RSMA is a much more attractive solution (performance and complexity-wise) that retains the benefits of SDMA and NOMA but tackles all the aforementioned limitations of SDMA and NOMA. Considering a MISO BC, we make the following contributions.
First, we show that RSMA is a more general class/framework of multi-user transmission that encompasses SDMA and NOMA as special cases. RSMA is shown to reduce to SDMA if channels are of similar strengths and sufficiently orthogonal with each other and to NOMA if channels exhibit sufficiently diverse strengths and are sufficiently aligned with each other. This is the first paper to explicitly recognize that SDMA and NOMA are both subsets of a more general transmission framework based on RS.
Second, we provide a general framework of multilayer RS design that encompasses existing RS schemes as special cases. In particular, the single-layer RS of [28, 29, 32-34, 38, 40, 41] and the multi-layer (hierarchical and topological) RS of [30,39] are special instances of the generalized RS strategy developed here. Moreover, the use of RS was primarily motivated by multi-antenna deployments subject to multi-user interference due to imperfect CSIT in those works. The benefit of RS in the presence of perfect CSIT and/or a diversity of channel strengths in a multi-antenna setup, as considered in this paper, is less investigated. RS was shown in [34] to boost the performance of overloaded multi-group multi-cast. However, no attempt has been made so far to identify the benefit of RS in multi-antenna BC with perfect CSIT and/or a diversity of channel strengths.
Third, we show that the rate performance (rate region, weighted sum-rate with and without QoS constraints) of RSMA is always equal to or larger than that of SDMA and NOMA. Considering a MISO BC with perfect CSIT and no QoS constraints, RSMA performance comes closer to the optimal DPC region than SDMA and NOMA. In scenarios with QoS constraints or imperfect CSIT, RSMA always outperforms SDMA and NOMA. Since it is motivated by fundamental DoF analysis, RSMA is also optimal from a DoF perspective in both perfect and imperfect CSIT and therefore optimally exploits the spatial dimensions and the availability of CSIT, in contrast with SDMA and NOMA that are suboptimal.
Fourth, we show that RSMA is much more robust than SDMA and NOMA to user deployments, CSIT inaccuracy, and network load. It can operate in a wide range of practical deployments involving scenarios where the user channels are neither orthogonal nor aligned and exhibit similar strengths or a diversity of strengths, where the CSI is perfectly or imperfectly known to the transmitter, and where the network load can vary between the underloaded and the overloaded regimes. In particular, in the overloaded regime, the RSMA framework is shown to be particularly suited to cope with a variety of device capabilities, e.g., high-end devices along with cheap Internet-of-Things (IoT)/Machine-Type Communications (MTC) devices. Indeed, the RS framework can be used to pack the IoT/MTC traffic in the common message, while still delivering high-quality service to high-end devices.
Fifth, we show that the performance gain can come with a lower computational complexity than NOMA for both the transmit scheduler and the receivers. In contrast to NOMA that requires complicated user grouping and ordering and potential dynamic switching (between SDMA, SC-SIC, and SC-SIC per group) at the transmit scheduler and multiple layers of SIC at the receivers, a simple one-layer RS that does not require any user ordering, grouping, or dynamic switching at the transmit scheduler and needs only a single layer of SIC at the receivers still significantly outperforms NOMA. In contrast to SDMA, RSMA is less sensitive to user pairing and therefore does not require complex user scheduling and pairing. However, RSMA comes with a slightly higher encoding complexity than SDMA and NOMA due to the encoding of the common streams on top of the private streams.
Sixth, though SC-SIC is optimal to achieve the capacity region of SISO BC, we show that a single-layer RS is a low-complexity alternative that only requires a single layer of SIC at each receiver and achieves close to SC-SIC (with multi-layer SIC) performance in a SISO BC deployment.
As a takeaway message, we note that the ability of a wireless network architecture to partially decode interference and partially treat interference as noise can lead to enhanced throughput and QoS, increased robustness, and lowered complexity compared to alternatives that are forced to operate in the extreme regimes of fully treating interference as noise and fully decoding interference.
It is also worth making the analogy with other types of channels where the ability to bridge the extremes of treating interference as noise and fully decoding interference has appeared. Considering a two-user SISO IC, interference is fully decoded in the strong interference regime and is treated as noise in the weak interference regime. Between those two extremes, interference is neither strong enough to be fully decoded nor weak enough to be treated as noise. The best known strategy for the two-user SISO IC is obtained using RS (so-called HK scheme). RS in this context is well known to be superior to strategies relying on fully treating interference as noise, fully decoding interference, or orthogonalization (TDMA, FDMA) [25,35]. Limiting ourselves to those extreme strategies is suboptimal [25,35].
The rest of the paper is organized as follows. The system model is described in Section 2. The existing multiple accesses are specified in Section 3. In Section 4, the proposed RSMA and its low-complexity structures are described and compared with existing multiple accesses. The corresponding weighted sum rate (WSR) problems are formulated, and the weighted MMSE (WMMSE) approach to solve the problem is discussed. Numerical results are illustrated in Section 5, followed by conclusions and future works in Section 6.
Notations: The boldface uppercase and lowercase letters are used to represent matrices and vectors. The superscripts $(\cdot)^T$ and $(\cdot)^H$ denote transpose and conjugate-transpose operators, respectively. $\mathrm{tr}(\cdot)$ and $\mathrm{diag}(\cdot)$ are the trace and diagonal entries, respectively. $|\cdot|$ is the absolute value, and $\|\cdot\|$ is the Euclidean norm. $\mathbb{E}\{\cdot\}$ refers to the statistical expectation. $\mathbb{C}$ denotes the complex space. $\mathbf{I}$ and $\mathbf{0}$ stand for an identity matrix and an all-zero vector, respectively, with appropriate dimensions. $\mathcal{CN}(\delta, \sigma^2)$ represents a complex Gaussian distribution with mean $\delta$ and variance $\sigma^2$. $|\mathcal{A}|$ is the cardinality of the set $\mathcal{A}$.

System model
Consider a system where a base station (BS) equipped with $N_t$ antennas serves $K$ single-antenna users. The users are indexed by the set $\mathcal{K} = \{1, \ldots, K\}$. Let $\mathbf{x} \in \mathbb{C}^{N_t \times 1}$ denote the signal vector transmitted in a given channel use. It is subject to the power constraint $\mathbb{E}\{\|\mathbf{x}\|^2\} \le P_t$. The signal received at user-$k$ is
$$y_k = \mathbf{h}_k^H \mathbf{x} + n_k, \quad \forall k \in \mathcal{K}, \tag{1}$$
where $\mathbf{h}_k \in \mathbb{C}^{N_t \times 1}$ is the channel between the BS and user-$k$, and $n_k \sim \mathcal{CN}(0, \sigma_{n,k}^2)$ is the additive white Gaussian noise (AWGN) at the receiver. Without loss of generality, we assume the noise variances are equal to one for all users. The transmit SNR is then equal to the total transmit power $P_t$. We assume the CSI of users is perfectly known at the BS in the following model. The imperfect CSIT scenario will be discussed in the proposed algorithm and the numerical results. Channel state information at the receivers (CSIR) is assumed to be perfect.
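As a small numerical companion to the system model, the sketch below draws one received sample for a single user, assuming the usual MISO form in which the received signal is the inner product of the conjugated channel with the transmit vector plus circularly symmetric complex Gaussian noise. The function name `received_signal` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def received_signal(h_k, x, noise_var=1.0, rng=None):
    """Return one sample of h_k^H x + n_k with n_k ~ CN(0, noise_var)."""
    rng = rng if rng is not None else random
    # h_k^H x: conjugate the channel entries before the inner product.
    signal = sum(h.conjugate() * xi for h, xi in zip(h_k, x))
    # A CN(0, var) sample has independent real and imaginary parts,
    # each with variance var / 2.
    scale = (noise_var / 2) ** 0.5
    noise = complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))
    return signal + noise


h_k = [1 + 1j, 2 + 0j]   # example channel with N_t = 2 antennas
x = [1 + 0j, 1j]         # example transmit vector
print(received_signal(h_k, x, noise_var=0.0))                  # noiseless part only
print(received_signal(h_k, x, noise_var=1.0, rng=random.Random(0)))
```

Setting `noise_var` to zero isolates the noiseless term, which is convenient for checking the conjugation convention.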
In this work, we are interested in beamforming designs for signal $\mathbf{x}$ at the BS. Specifically, the objective of beamforming designs is to maximize the WSR of users subject to a power constraint of the BS and QoS constraints of each user. We first state and compare two baseline multi-antenna multiple accesses, namely SDMA and NOMA. Then, RSMA is explained. The WSR problem of each strategy will be formulated, and the algorithm adopted to solve the corresponding problem will be stated in the following sections.

SDMA and NOMA
In this section, we describe two baseline multiple accesses. The messages $W_1, \ldots, W_K$ intended for users 1 to $K$, respectively, are independently encoded into $K$ data streams $\mathbf{s} = [s_1, \ldots, s_K]^T$. Symbols are mapped to the transmit antennas through a precoding matrix denoted by $\mathbf{P} = [\mathbf{p}_1, \ldots, \mathbf{p}_K]$, where $\mathbf{p}_k \in \mathbb{C}^{N_t \times 1}$ is the precoder for user-$k$. The superposed signal is $\mathbf{x} = \mathbf{P}\mathbf{s} = \sum_{k \in \mathcal{K}} \mathbf{p}_k s_k$. Assuming that $\mathbb{E}\{\mathbf{s}\mathbf{s}^H\} = \mathbf{I}$, the transmit power is constrained by $\mathrm{tr}(\mathbf{P}\mathbf{P}^H) \le P_t$.
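The linear superposition of precoded streams and the trace power constraint can be checked with a few lines of code. This is a minimal sketch in which the precoding matrix is stored as a list of precoder columns; the helper names `superpose` and `transmit_power` are illustrative and not taken from the paper.

```python
def superpose(P, s):
    """Form the transmit vector as the sum over users of p_k * s_k.

    P is a list of precoders (P[k] is the length-N_t precoder of user k)
    and s is the list of data symbols, one per user.
    """
    num_antennas = len(P[0])
    return [sum(P[k][n] * s[k] for k in range(len(P)))
            for n in range(num_antennas)]


def transmit_power(P):
    """Compute tr(P P^H), which equals the sum of squared precoder norms."""
    return sum(abs(entry) ** 2 for p_k in P for entry in p_k)


P = [[1.0, 0.0], [0.0, 1j]]   # two users, two transmit antennas
s = [1 + 0j, -1 + 0j]         # one unit-power symbol per user
x = superpose(P, s)
print(x)
print(transmit_power(P) <= 2.0)   # power check against an example budget of 2
```

Because the symbols are assumed uncorrelated with unit power, the average transmit power depends only on the precoders, which is why the constraint is written on the precoding matrix alone.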

SDMA
SDMA based on MU-LP is a well-established multiple access. Each user only decodes its desired message by treating interference as noise. The signal-to-interference-plus-noise ratio (SINR) at user-$k$ is given by
$$\gamma_k = \frac{|\mathbf{h}_k^H \mathbf{p}_k|^2}{\sum_{j \ne k, j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{p}_j|^2 + 1}. \tag{2}$$
For a given weight vector $\mathbf{u} = [u_1, \ldots, u_K]$, the WSR achieved by MU-LP is
$$R_{\text{MU-LP}}(\mathbf{u}) = \max_{\mathbf{P}} \sum_{k \in \mathcal{K}} u_k R_k \quad \text{s.t.} \quad \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \le P_t, \quad R_k \ge R_k^{th}, \ \forall k \in \mathcal{K}, \tag{3}$$
where $R_k = \log_2(1 + \gamma_k)$ is the achievable rate of user-$k$. $u_k$ is a non-negative constant which allows resource allocation to prioritize different users. $R_k^{th}$ accounts for any potential individual rate constraint for user-$k$. It ensures the QoS of each user. The WMMSE algorithm proposed in [42] is adopted to solve problem (3). The main idea of the WMMSE algorithm is to reformulate the WSR problem into its equivalent WMMSE problem and solve it using the alternating optimization (AO) approach. The rate region of the MU-LP strategy is approximated by solving problem (3) for different rate weight vectors $\mathbf{u}$. The resulting rate region $\mathcal{R}_{\text{MU-LP}}$ is the convex hull enclosing the resulting rate points. In general, the solution to problem (3) would provide the optimal MU-LP beamforming strategy for any channel deployment (in between aligned and orthogonal channels and with similar or diverse channel strengths).
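For fixed precoders, the MU-LP rates and the weighted sum rate can be evaluated directly. The sketch below assumes the usual MU-LP SINR, i.e., the desired power $|\mathbf{h}_k^H \mathbf{p}_k|^2$ divided by the sum of the interfering powers $|\mathbf{h}_k^H \mathbf{p}_j|^2$, $j \ne k$, plus unit noise variance. It only evaluates a given precoder choice; it does not perform the WMMSE optimization, and all function names are hypothetical.

```python
from math import log2


def inner(h, p):
    """Compute h^H p for complex vectors stored as Python lists."""
    return sum(a.conjugate() * b for a, b in zip(h, p))


def mu_lp_rates(H, P, noise_var=1.0):
    """Rate of each user when all other streams are treated as noise."""
    rates = []
    for k, h_k in enumerate(H):
        desired = abs(inner(h_k, P[k])) ** 2
        interference = sum(abs(inner(h_k, P[j])) ** 2
                           for j in range(len(P)) if j != k)
        rates.append(log2(1 + desired / (interference + noise_var)))
    return rates


def weighted_sum_rate(rates, u):
    """Weighted sum of per-user rates for a weight vector u."""
    return sum(w * r for w, r in zip(u, rates))


# Orthogonal channels: no inter-user interference.
H_orth = [[1.0, 0.0], [0.0, 1.0]]
P = [[1.0, 0.0], [0.0, 1.0]]
print(mu_lp_rates(H_orth, P))  # [1.0, 1.0]

# Aligned channels with matched precoders: each user sees the other stream at full strength.
H_aligned = [[1.0, 0.0], [1.0, 0.0]]
P_aligned = [[1.0, 0.0], [1.0, 0.0]]
print(weighted_sum_rate(mu_lp_rates(H_aligned, P_aligned), [1.0, 1.0]))
```

The two cases illustrate why MU-LP favors semi-orthogonal users: with aligned channels, the interference term in the denominator is as large as the desired term.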

NOMA
NOMA relies on superposition coding at the transmitter and successive interference cancellation at the receiver. As discussed in the introduction, the two main strategies in multi-antenna NOMA are the SC-SIC and SC-SIC per group. SC-SIC can be treated as a special case of SC-SIC per group where there is only one group of users.

SC-SIC
In SC-SIC, the precoders and decoding orders have to be optimized jointly. The decoding order is vital to the rate obtained at each user. To maximize the WSR, all possible decoding orders of users are required to be considered. Let $\pi$ denote one of the decoding orders, such that the message of user-$\pi(k)$ is decoded before the message of user-$\pi(j)$, $\forall k \le j$. The messages of user-$\pi(k)$, $\forall k \le i$, are decoded at user-$\pi(i)$ using SIC. The SINR experienced at user-$\pi(i)$ to decode the message of user-$\pi(k)$, $k \le i$, is given by
$$\gamma_{\pi(i) \to \pi(k)} = \frac{|\mathbf{h}_{\pi(i)}^H \mathbf{p}_{\pi(k)}|^2}{\sum_{j > k, j \in \mathcal{K}} |\mathbf{h}_{\pi(i)}^H \mathbf{p}_{\pi(j)}|^2 + 1}. \tag{4}$$
For a given weight vector $\mathbf{u} = [u_1, \ldots, u_K]$ and a fixed decoding order $\pi$, the WSR achieved by SC-SIC is
$$R_{\text{SC-SIC}}(\pi) = \max_{\mathbf{P}} \sum_{k \in \mathcal{K}} u_{\pi(k)} R_{\pi(k)} \quad \text{s.t.} \quad \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \le P_t, \quad R_{\pi(k)} \ge R_{\pi(k)}^{th}, \ \forall k \in \mathcal{K}, \tag{5}$$
where $R_{\pi(k)} = \min_{i \ge k, i \in \mathcal{K}} \{\log_2(1 + \gamma_{\pi(i) \to \pi(k)})\}$. In [14], problem (5) with equal weights is solved by an approximation technique, the minorization-maximization algorithm (MMA). To keep a single and unified approach to solve the WSR problem of different beamforming strategies, we still use the WMMSE algorithm to solve it. By approximating the rate region with a set of rate weights, the rate region $\mathcal{R}_{\text{SC-SIC}}(\pi)$ with a certain decoding order $\pi$ is attained. To achieve the rate region of SC-SIC, all decoding orders should be considered. The largest achievable rate region of SC-SIC is defined as the convex hull of the union over all decoding orders, $\mathcal{R}_{\text{SC-SIC}} = \mathrm{conv}\left(\bigcup_{\pi} \mathcal{R}_{\text{SC-SIC}}(\pi)\right)$.
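The role of the decoding order and of the minimum over receivers can be illustrated by evaluating SC-SIC rates for fixed precoders. The sketch assumes unit noise variance and the standard SC-SIC SINR in which, when the message at position k of the order is decoded, only the messages at later positions remain as interference. The function name `sc_sic_rates` and the 0-based indexing are illustrative choices, and no precoder optimization is performed.

```python
from math import log2


def inner(h, p):
    """Compute h^H p for complex vectors stored as Python lists."""
    return sum(a.conjugate() * b for a, b in zip(h, p))


def sc_sic_rates(H, P, order, noise_var=1.0):
    """Per-user SC-SIC rates for fixed precoders and a fixed decoding order.

    order[k] is the (0-based) user whose message is decoded at step k.
    The user at position i decodes the messages at positions 0..i.
    """
    num_users = len(order)
    rates = {}
    for k in range(num_users):
        target = order[k]
        sinrs = []
        # Every user at position i >= k must be able to decode this message.
        for i in range(k, num_users):
            rx = order[i]
            desired = abs(inner(H[rx], P[target])) ** 2
            # Messages at earlier positions were cancelled; later ones remain.
            residual = sum(abs(inner(H[rx], P[order[j]])) ** 2
                           for j in range(k + 1, num_users))
            sinrs.append(desired / (residual + noise_var))
        rates[target] = log2(1 + min(sinrs))
    return rates


# Single-antenna example: user 0 is strong, user 1 is weak.
H = [[2.0], [1.0]]
P = [[1.0], [1.0]]
print(sc_sic_rates(H, P, order=[1, 0]))  # weak user's message decoded first
print(sc_sic_rates(H, P, order=[0, 1]))  # reversed order for comparison
```

Comparing the two orders on the same precoders shows how strongly the achieved rates depend on the order, which is why all orders must be examined to obtain the full region.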

SC-SIC per group
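Assume the $K$ users are divided into $G$ groups indexed by $\mathcal{G} = \{1, \ldots, G\}$. Each group $g \in \mathcal{G}$ contains a subset of users $\mathcal{K}_g$. The user groups satisfy the following conditions: $\mathcal{K}_g \cap \mathcal{K}_{g'} = \emptyset$ if $g \neq g'$, and $\sum_{g \in \mathcal{G}} |\mathcal{K}_g| = K$. Let $\pi_g$ denote one of the decoding orders of the users in $\mathcal{K}_g$, such that the message of user-$\pi_g(k)$ is decoded before the message of user-$\pi_g(j)$, $\forall k \leq j$. The messages of user-$\pi_g(k)$, $\forall k \leq i$, are decoded at user-$\pi_g(i)$ using SIC. The SINR experienced at user-$\pi_g(i)$ to decode the message of user-$\pi_g(k)$, $k \leq i$, is given by

$$\gamma_{\pi_g(i) \to \pi_g(k)} = \frac{|\mathbf{h}_{\pi_g(i)}^H \mathbf{p}_{\pi_g(k)}|^2}{\sum_{j > k,\, j \in \mathcal{K}_g} |\mathbf{h}_{\pi_g(i)}^H \mathbf{p}_{\pi_g(j)}|^2 + I_{\pi_g(i)} + 1},$$

where $I_{\pi_g(i)} = \sum_{g' \in \mathcal{G},\, g' \neq g} \sum_{j \in \mathcal{K}_{g'}} |\mathbf{h}_{\pi_g(i)}^H \mathbf{p}_j|^2$ is the inter-group interference suffered at user-$\pi_g(i)$. For a given weight vector $\mathbf{u} = [u_1, \ldots, u_K]$, a fixed grouping method $\mathcal{G}$, and a fixed decoding order $\pi = \{\pi_1, \ldots, \pi_G\}$, the WSR achieved by SC-SIC per group is

$$\max_{\mathbf{P}} \sum_{g \in \mathcal{G}} \sum_{k \in \mathcal{K}_g} u_{\pi_g(k)} R_{\pi_g(k)} \quad \text{s.t.} \quad R_{\pi_g(k)} \geq R_{\pi_g(k)}^{th},\ \forall k \in \mathcal{K}_g,\ \forall g \in \mathcal{G}, \quad \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \leq P_t,$$

where $R_{\pi_g(k)} = \min_{i \geq k,\, i \in \mathcal{K}_g} \{\log_2(1 + \gamma_{\pi_g(i) \to \pi_g(k)})\}$. Similarly to the SC-SIC strategy, the problem can be solved by using the WMMSE algorithm. To maximize the WSR, all possible grouping methods and decoding orders should be considered.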

Remark 1:
As described in the introduction, it is common in the multi-antenna NOMA literature (SC-SIC and SC-SIC per group) to force users belonging to the same group to share the same precoder, so as to decrease the complexity in user ordering and user grouping. Note that, in the system model described for both SC-SIC and SC-SIC per group, we consider the most general framework where each message is precoded by its own precoder. We do not constrain symbols to be superimposed on the same precoder, as this would further reduce the performance of the NOMA strategies. Hence, the performance obtained with NOMA in this work can be seen as the best possible performance achieved by NOMA.

Methods: proposed rate-splitting multiple access
In this section, we first introduce the idea of RS through a two-user example (K = 2) and a three-user example (K = 3). Then, we propose the generalized framework of RS and specify two low-complexity RS strategies. We further compare RSMA with SDMA and NOMA in terms of fundamental structure and complexity. Finally, we discuss the general optimization framework to solve the WSR problem.

Two-user example
We first consider a two-user example. There are two messages $W_1$ and $W_2$ intended for user-1 and user-2, respectively. The message of each user is split into two parts, $\{W_1^{12}, W_1^{1}\}$ for user-1 and $\{W_2^{12}, W_2^{2}\}$ for user-2. The messages $W_1^{12}$ and $W_2^{12}$ are encoded together into a common stream $s_{12}$ using a codebook shared by both users. Hence, $s_{12}$ is a common stream required to be decoded by both users. The messages $W_1^{1}$ and $W_2^{2}$ are encoded into the private stream $s_1$ for user-1 and $s_2$ for user-2, respectively. The overall vector of data streams to be transmitted based on RS is $\mathbf{s} = [s_{12}, s_1, s_2]^T$. The data streams are linearly precoded via the precoder $\mathbf{P} = [\mathbf{p}_{12}, \mathbf{p}_1, \mathbf{p}_2]$, where $\mathbf{p}_{12} \in \mathbb{C}^{N_t \times 1}$ is the precoder for the common stream $s_{12}$. The resulting transmit signal is $\mathbf{x} = \mathbf{P}\mathbf{s} = \mathbf{p}_{12} s_{12} + \mathbf{p}_1 s_1 + \mathbf{p}_2 s_2$.
We assume that $\mathbb{E}\{\mathbf{s}\mathbf{s}^H\} = \mathbf{I}$, so that the total transmit power is constrained by $\mathrm{tr}(\mathbf{P}\mathbf{P}^H) \leq P_t$.
At user sides, both user-1 and user-2 first decode the data stream $s_{12}$ by treating the interference from $s_1$ and $s_2$ as noise. Therefore, each user decodes part of the message of the other interfering user encoded in $s_{12}$, and the interference is partially decoded at each user. The SINR of the common stream at user-$k$ is

$$\gamma_k^{12} = \frac{|\mathbf{h}_k^H \mathbf{p}_{12}|^2}{|\mathbf{h}_k^H \mathbf{p}_1|^2 + |\mathbf{h}_k^H \mathbf{p}_2|^2 + 1}.$$

Once $s_{12}$ is successfully decoded, its contribution to the original received signal $y_k$ is subtracted. After that, user-$k$ decodes its private stream $s_k$ by treating the private stream of user-$j$ ($j \neq k$) as noise. The two-user transmission model using RS is shown in Fig. 1. The SINR of decoding the private stream $s_k$ at user-$k$ is

$$\gamma_k = \frac{|\mathbf{h}_k^H \mathbf{p}_k|^2}{|\mathbf{h}_k^H \mathbf{p}_j|^2 + 1}, \quad j \neq k.$$

Fig. 1 Two-user transmission model using RS

The corresponding achievable rates of user-$k$ for the streams $s_{12}$ and $s_k$ are $R_k^{12} = \log_2(1 + \gamma_k^{12})$ and $R_k = \log_2(1 + \gamma_k)$. To ensure that $s_{12}$ is successfully decoded by both users, the achievable common rate shall not exceed $R_{12} = \min\{R_1^{12}, R_2^{12}\}$. All boundary points for the two-user RS rate region can be obtained by assuming that $R_{12}$ is shared between users such that $C_k^{12}$ is the $k$th user's portion of the common rate with $C_1^{12} + C_2^{12} = R_{12}$. Following the two-user RS structure described above, the total achievable rate of user-$k$ is $R_{k,tot} = C_k^{12} + R_k$. For a given pair of weights $\mathbf{u} = [u_1, u_2]$, the WSR achieved by the two-user RS approach is

$$R_{\mathrm{RS}_2}(\mathbf{u}) = \max_{\mathbf{P}, \mathbf{c}} \; u_1 R_{1,tot} + u_2 R_{2,tot} \quad \text{s.t.} \quad C_1^{12} + C_2^{12} \leq R_{12}, \quad R_{k,tot} \geq R_k^{th},\ k \in \{1, 2\}, \quad \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \leq P_t, \quad \mathbf{c} \geq \mathbf{0}, \tag{10}$$

where $\mathbf{c} = [C_1^{12}, C_2^{12}]$ is the common rate vector required to be optimized in order to maximize the WSR. For a fixed pair of weights, problem (10) can be solved using the WMMSE approach in [28], except that we have perfect CSIT here. By calculating $R_{\mathrm{RS}_2}(\mathbf{u})$ for a set of different rate weights $\mathbf{u}$, we obtain the rate region.
In contrast to MU-LP and SC-SIC, the RS scheme described above offers a more flexible formulation. In particular, instead of hard switching between MU-LP and SC-SIC, it allows both to operate simultaneously if necessary, and hence smoothly bridges the two. In the extreme of treating multi-user interference as noise, RS boils down to MU-LP by simply allocating no power to the common stream $s_{12}$. In the other extreme of fully decoding interference, RS boils down to SC-SIC by forcing one user, say user-1, to fully decode the message of the other user, say user-2. This is achieved by allocating no power to $s_2$, encoding $W_1$ into $s_1$ and encoding $W_2$ into $s_{12}$, such that $\mathbf{x} = \mathbf{p}_{12} s_{12} + \mathbf{p}_1 s_1$. User-1 and user-2 decode $s_{12}$ by treating $s_1$ as noise, and user-1 decodes $s_1$ after canceling $s_{12}$. A physical-layer multicasting strategy is obtained by encoding both $W_1$ and $W_2$ into $s_{12}$ and allocating no power to $s_1$ and $s_2$.
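The two extremes described above can be checked numerically with a minimal sketch of the two-user RS rate expressions. It assumes unit noise variance by default, and `two_user_rs_rates` is a hypothetical helper name. The precoders are fixed by hand rather than optimized, so the numbers only illustrate the structure.

```python
import numpy as np

def two_user_rs_rates(h1, h2, p12, p1, p2, noise_var=1.0):
    """Return (R_12, [R_1, R_2]) for two-user rate-splitting.

    R_12 is the common rate (minimum over both users); R_k is the
    private rate of user k after the common stream has been removed.
    """
    def gain(h, p):
        return abs(np.vdot(h, p)) ** 2     # |h^H p|^2 (vdot conjugates h)

    common, private = [], []
    for h, p_own, p_other in ((h1, p1, p2), (h2, p2, p1)):
        g12, g_own, g_other = gain(h, p12), gain(h, p_own), gain(h, p_other)
        # Common stream: both private streams are treated as noise.
        common.append(np.log2(1 + g12 / (g_own + g_other + noise_var)))
        # Private stream: only the other user's private stream remains.
        private.append(np.log2(1 + g_own / (g_other + noise_var)))
    return min(common), private

z = np.zeros(2)

# Extreme 1: no power on the common stream, so RS behaves like MU-LP.
R12_a, Rp_a = two_user_rs_rates(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                                z, np.array([2.0, 0.0]), np.array([0.0, 2.0]))

# Extreme 2: no power on s_2, W_2 carried entirely by s_12 (SC-SIC-like).
R12_b, Rp_b = two_user_rs_rates(np.array([1.0, 0.0]), np.array([0.5, 0.0]),
                                np.array([2.0, 0.0]), np.array([1.0, 0.0]), z)
```

In the second case the whole common rate would be assigned to user-2, so its total rate is `R12_b` while user-1 obtains `Rp_b[0]`.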
Remark 2 : It should be noted that while the RS transmit signal model resembles a broadcasting system with unicast (private) streams and a multi-cast stream, the role of the common message is fundamentally different. The common message in a unicast-multi-cast system carries public information intended as a whole to all users in the system, while the common message s 12 in RS encapsulates parts of private messages, and is not entirely required by all users, although decoded by the two users for interference mitigation purposes [12].

Remark 3 :
A general framework is adopted where potentially each user can split its message into common and private parts. Note however that depending on the objective function, it is sometimes not needed for all users to split their messages. For instance, for sum-rate maximization subject to no individual rate constraint, it is sufficient to have only one user to split its message [28]. However, when it comes to satisfying some fairness (WSR, QoS constraint, max-min fairness), splitting the message of multiple users appears necessary [28,32,34].

Three-user example
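We further consider a three-user example. Different from the two-user case, the message of user-1 is split into $W_1^{123}$, $W_1^{12}$, $W_1^{13}$, and $W_1^{1}$, and the messages of user-2 and user-3 are split analogously. Messages with the same superscript are encoded together into one stream: $W_1^{123}$, $W_2^{123}$, and $W_3^{123}$ are encoded into the common stream $s_{123}$ intended for all three users, $W_1^{12}$ and $W_2^{12}$ are encoded into $s_{12}$, and $W_1^{13}$ and $W_3^{13}$ are encoded into $s_{13}$. $s_{12}$ is the partial common stream intended for user-1 and user-2. Hence, user-1 and user-2 will decode $s_{12}$ while user-3 will decode its intended streams by treating $s_{12}$ as noise. Similarly, we obtain $s_{23}$ partially encoded for user-2 and user-3. $W_1^{1}$, $W_2^{2}$, and $W_3^{3}$ are encoded into the private streams $s_1$, $s_2$, and $s_3$, respectively.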
The vector of data streams to be transmitted is $\mathbf{s} = [s_{123}, s_{12}, s_{13}, s_{23}, s_1, s_2, s_3]^T$. After linear precoding using the precoder $\mathbf{P} = [\mathbf{p}_{123}, \mathbf{p}_{12}, \mathbf{p}_{13}, \mathbf{p}_{23}, \mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3]$, the signals are superposed and broadcast. The decoding procedure when K = 3 is more complex than that in the two-user example. The main difference lies in decoding the partial common streams shared by two users. Define the streams to be decoded by $l$ users as $l$-order streams. The 2-order streams to be decoded at user-1 are $s_{12}$ and $s_{13}$. The 2-order streams to be decoded at user-2 and user-3 are $s_{12}$ and $s_{23}$, and $s_{13}$ and $s_{23}$, respectively. As the 1-order and 2-order streams to be decoded at different users are not the same, we take user-1 as an example. The decoding procedure is the same for other users. User-1 decodes four streams $s_{123}$, $s_{12}$, $s_{13}$, and $s_1$ based on SIC while treating other streams as noise. The decoding procedure starts from the 3-order stream (common stream) and progresses downwards to the 1-order stream (private stream). Specifically, user-1 first decodes $s_{123}$ and subtracts its contribution from the received signal. The SINR of the stream $s_{123}$ at user-1 is

$$\gamma_1^{123} = \frac{|\mathbf{h}_1^H \mathbf{p}_{123}|^2}{|\mathbf{h}_1^H \mathbf{p}_{12}|^2 + |\mathbf{h}_1^H \mathbf{p}_{13}|^2 + |\mathbf{h}_1^H \mathbf{p}_{23}|^2 + \sum_{k=1}^{3} |\mathbf{h}_1^H \mathbf{p}_k|^2 + 1}.$$

After that, user-1 decodes the two streams $s_{12}$ and $s_{13}$ and treats the interference of $s_{23}$ as noise. Both decoding orders, $s_{12}$ followed by $s_{13}$ and $s_{13}$ followed by $s_{12}$, should be considered in order to maximize the WSR. Let $\pi_l$ denote one of the decoding orders of the $l$-order streams. There is only one 1-order stream and one 3-order stream to be decoded at each user. Therefore, only one decoding order exists for both $\pi_1$ and $\pi_3$. In contrast, each user is required to decode two 2-order streams. Denote by $s_{\pi_{2,k}(i)}$ the $i$th data stream to be decoded at user-$k$ based on the decoding order $\pi_2$. One instance of $\pi_2$ is $12 \to 13 \to 23$, where $s_{12}$ is decoded before $s_{13}$ and $s_{13}$ is decoded before $s_{23}$ at all users.
Since only the data streams $s_{12}$ and $s_{13}$ are decoded at user-1, the decoding order at user-1 based on $\pi_2$ is $\pi_{2,1} = 12 \to 13$. Hence, $s_{\pi_{2,1}(1)} = s_{12}$ and $s_{\pi_{2,1}(2)} = s_{13}$. The data stream $s_{\pi_{2,1}(1)}$ is decoded before $s_{\pi_{2,1}(2)}$. The SINRs of decoding the streams $s_{\pi_{2,1}(1)}$ and $s_{\pi_{2,1}(2)}$ at user-1 are

$$\gamma_1^{\pi_{2,1}(1)} = \frac{|\mathbf{h}_1^H \mathbf{p}_{\pi_{2,1}(1)}|^2}{|\mathbf{h}_1^H \mathbf{p}_{\pi_{2,1}(2)}|^2 + |\mathbf{h}_1^H \mathbf{p}_{23}|^2 + \sum_{k=1}^{3} |\mathbf{h}_1^H \mathbf{p}_k|^2 + 1}, \qquad \gamma_1^{\pi_{2,1}(2)} = \frac{|\mathbf{h}_1^H \mathbf{p}_{\pi_{2,1}(2)}|^2}{|\mathbf{h}_1^H \mathbf{p}_{23}|^2 + \sum_{k=1}^{3} |\mathbf{h}_1^H \mathbf{p}_k|^2 + 1}.$$

User-1 finally decodes $s_1$ by treating the other data streams as noise. The three-user RS transmission model with the decoding order $\pi_2 = 12 \to 13 \to 23$ is shown in Fig. 2. The SINR of decoding $s_1$ at user-1 is

$$\gamma_1 = \frac{|\mathbf{h}_1^H \mathbf{p}_1|^2}{|\mathbf{h}_1^H \mathbf{p}_{23}|^2 + |\mathbf{h}_1^H \mathbf{p}_2|^2 + |\mathbf{h}_1^H \mathbf{p}_3|^2 + 1}.$$

The corresponding rate of each data stream is calculated in the same way as in the two-user example. To ensure that $s_{123}$ is successfully decoded by all users, the achievable common rate shall not exceed $R_{123} = \min\{R_1^{123}, R_2^{123}, R_3^{123}\}$. To ensure that $s_{12}$ is successfully decoded by user-1 and user-2, the achievable common rate shall not exceed $R_{12} = \min\{R_1^{12}, R_2^{12}\}$. Similarly, we have $R_{13} = \min\{R_1^{13}, R_3^{13}\}$ and $R_{23} = \min\{R_2^{23}, R_3^{23}\}$. All boundary points for the three-user RS rate region can be obtained by assuming that $R_{123}$, $R_{12}$, $R_{13}$, and $R_{23}$ are shared by the corresponding group of users. Denoting the portion of the common rate allocated to user-$k$ for the message $s_{123}$ as $C_k^{123}$, we have $C_1^{123} + C_2^{123} + C_3^{123} = R_{123}$. Similarly, we have $C_1^{12} + C_2^{12} = R_{12}$, $C_1^{13} + C_3^{13} = R_{13}$, and $C_2^{23} + C_3^{23} = R_{23}$. Following the three-user RS structure described above, the total achievable rate of each user is $R_{1,tot} = C_1^{123} + C_1^{12} + C_1^{13} + R_1$, $R_{2,tot} = C_2^{123} + C_2^{12} + C_2^{23} + R_2$, and $R_{3,tot} = C_3^{123} + C_3^{13} + C_3^{23} + R_3$. For a given weight vector $\mathbf{u} = [u_1, u_2, u_3]$ and a fixed decoding order $\pi = [\pi_1, \pi_2, \pi_3]$, the WSR achieved by the three-user RS approach is

$$R_{\mathrm{RS}_3}(\mathbf{u}, \pi) = \max_{\mathbf{P}, \mathbf{c}} \sum_{k=1}^{3} u_k R_{k,tot} \quad \text{s.t.} \quad \begin{cases} C_1^{123} + C_2^{123} + C_3^{123} \leq R_{123}, \\ C_1^{12} + C_2^{12} \leq R_{12}, \quad C_1^{13} + C_3^{13} \leq R_{13}, \quad C_2^{23} + C_3^{23} \leq R_{23}, \\ R_{k,tot} \geq R_k^{th},\ k \in \{1, 2, 3\}, \\ \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \leq P_t, \quad \mathbf{c} \geq \mathbf{0}, \end{cases} \tag{15}$$

where $\mathbf{c} = [C_1^{123}, C_2^{123}, C_3^{123}, C_1^{12}, C_2^{12}, C_1^{13}, C_3^{13}, C_2^{23}, C_3^{23}]$ is the common rate vector required to be optimized in order to maximize the WSR. By calculating $R_{\mathrm{RS}_3}(\mathbf{u}, \pi)$ for a set of different rate weights $\mathbf{u}$, we obtain the rate region $R_{\mathrm{RS}_3}(\pi)$ of a certain decoding order $\pi$.
The rate region of the three-user RS is obtained as the convex hull of the union over all decoding orders, $R_{\mathrm{RS}_3} = \mathrm{conv}\left(\bigcup_{\pi} R_{\mathrm{RS}_3}(\pi)\right)$. Similar to the two-user case, SC-SIC and MU-LP are again easily identified as special sub-strategies of RS by switching off some of the streams. Problem (15) is non-convex and non-trivial. We propose an optimization algorithm in Section 4.7 to solve it based on the WMMSE approach.
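The bookkeeping of 2-order decoding orders in the three-user example can be sketched as follows. Streams are represented by tuples of 1-based user indices, which is a representation chosen only for this illustration, and `user_order` is a hypothetical helper name.

```python
from itertools import permutations

def user_order(pi_l, user):
    """Restrict a global order of l-order streams to the streams a user decodes."""
    return [stream for stream in pi_l if user in stream]

# The three 2-order streams s_12, s_13, s_23 of the three-user example.
streams_2 = [(1, 2), (1, 3), (2, 3)]

# Every candidate global decoding order pi_2 (all must be tried to maximize the WSR).
all_orders = list(permutations(streams_2))

# The instance discussed in the text: 12 -> 13 -> 23.
pi_2 = ((1, 2), (1, 3), (2, 3))
pi_2_user1 = user_order(pi_2, 1)   # 12 -> 13
pi_2_user2 = user_order(pi_2, 2)   # 12 -> 23
pi_2_user3 = user_order(pi_2, 3)   # 13 -> 23
```

There are six candidate orders for the 2-order streams, and each of them induces one per-user order at every user.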

Generalized rate-splitting
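We further propose a generalized RS framework for $K$ users. The users are indexed by the set $\mathcal{K} = \{1, \ldots, K\}$. For any subset $\mathcal{A} \subseteq \mathcal{K}$ of the users, the BS transmits a data stream $s_{\mathcal{A}}$ to be decoded by the users in $\mathcal{A}$ and treated as noise by the other users. $s_{\mathcal{A}}$ loads messages of all the users in the subset $\mathcal{A}$: the message intended for user-$k$ is split into parts $W_k^{\mathcal{A}'}$, one for each subset $\mathcal{A}' \subseteq \mathcal{K}$ containing $k$, and the parts $\{W_{k'}^{\mathcal{A}} \mid k' \in \mathcal{A}\}$ with the same superscript are encoded together into $s_{\mathcal{A}}$. The stream order defined in Section 4.2 is applied to the generalized RS. The stream order of the data stream $s_{\mathcal{A}}$ is $|\mathcal{A}|$. For a given $l \in \mathcal{K}$, there are $\binom{K}{l}$ distinct $l$-order streams. For example, we have only one $K$-order stream (the traditional common stream) while we have $K$ 1-order streams (private streams). Define $\mathbf{s}_l \in \mathbb{C}^{\binom{K}{l} \times 1}$ as the $l$-order data stream vector formed by all $l$-order streams in $\{s_{\mathcal{A}'} \mid \mathcal{A}' \subseteq \mathcal{K}, |\mathcal{A}'| = l\}$. Note that when $l = K$, there is a single $K$-order stream, and $\mathbf{s}_K$ reduces to $s_{\mathcal{K}}$. For example, when $K = 3$, the 3-order stream vector is $\mathbf{s}_3 = s_{123}$. The 1-order and the 2-order stream vectors are $\mathbf{s}_1 = [s_1, s_2, s_3]^T$ and $\mathbf{s}_2 = [s_{12}, s_{13}, s_{23}]^T$, respectively. The data streams are linearly precoded via the precoding matrix $\mathbf{P}_l$ formed by $\{\mathbf{p}_{\mathcal{A}'} \mid \mathcal{A}' \subseteq \mathcal{K}, |\mathcal{A}'| = l\}$. The precoded streams are superposed and the resulting transmit signal is

$$\mathbf{x} = \sum_{l=1}^{K} \mathbf{P}_l \mathbf{s}_l = \sum_{l=1}^{K} \sum_{\mathcal{A}' \subseteq \mathcal{K},\, |\mathcal{A}'| = l} \mathbf{p}_{\mathcal{A}'} s_{\mathcal{A}'}.$$

At user sides, each user is required to decode its intended streams based on SIC. The decoding procedure starts from the $K$-order stream and then goes down to the 1-order stream. A given user is involved in multiple $l$-order streams, with the exception of the $K$-order and 1-order streams. Let $\pi_l$ denote one of the decoding orders of the $l$-order data streams $\mathbf{s}_l$ for all users. User-$k$ belongs to $\binom{K-1}{l-1}$ subsets of size $l$, so the $l$-order stream vector to be decoded at user-$k$ based on a certain decoding order $\pi_l$ is $\mathbf{s}_{\pi_{l,k}} = [s_{\pi_{l,k}(1)}, \ldots, s_{\pi_{l,k}(\binom{K-1}{l-1})}]^T$. The SINR of user-$k$ to decode the $l$-order stream $s_{\pi_{l,k}(i)}$ with a certain decoding order $\pi_l$ is

$$\gamma_k^{\pi_{l,k}(i)} = \frac{|\mathbf{h}_k^H \mathbf{p}_{\pi_{l,k}(i)}|^2}{I_{\pi_{l,k}(i)} + 1},$$

where

$$I_{\pi_{l,k}(i)} = \sum_{j > i} |\mathbf{h}_k^H \mathbf{p}_{\pi_{l,k}(j)}|^2 + \sum_{l'=1}^{l-1} \sum_{j=1}^{\binom{K-1}{l'-1}} |\mathbf{h}_k^H \mathbf{p}_{\pi_{l',k}(j)}|^2 + \sum_{\mathcal{A}' \subseteq \mathcal{K},\, k \notin \mathcal{A}'} |\mathbf{h}_k^H \mathbf{p}_{\mathcal{A}'}|^2$$

is the interference at user-$k$ to decode $s_{\pi_{l,k}(i)}$. The first term is the interference from the remaining non-decoded $l$-order streams in $\mathbf{s}_{\pi_{l,k}}$. The second term is the interference from the lower order streams $\mathbf{s}_{\pi_{l',k}}$, $\forall l' < l$, to be decoded at user-$k$. The third term is the interference from the streams that are not intended for user-$k$. The corresponding achievable rate of user-$k$ for the data stream $s_{\pi_{l,k}(i)}$ is $R_k^{\pi_{l,k}(i)} = \log_2(1 + \gamma_k^{\pi_{l,k}(i)})$. To ensure that the streams shared by two or more users are successfully decoded by all intended users, the rate of the stream $s_{\mathcal{A}}$ shall not exceed the achievable rate of any user in the subset $\mathcal{A}$:

$$R_{\mathcal{A}} = \min_{k'} \{R_{k'}^{\mathcal{A}} \mid k' \in \mathcal{A}\}.$$

For a given $l \in \mathcal{K}$, the $l$-order streams to be decoded at different users are different. $s_{\mathcal{A}}$ is decoded at user-$k$ ($k \in \mathcal{A}$) based on the decoding order $\pi_{|\mathcal{A}|,k}$. $R_{\mathcal{A}}$ becomes the rate of receiving the stream $s_{\mathcal{A}}$ at all users in the user group $\mathcal{A}$ with a certain decoding order $\pi_{|\mathcal{A}|}$. All boundary points for the $K$-user RS rate region can be obtained by assuming that $R_{\mathcal{A}}$ is shared by all users in the user group $\mathcal{A}$. Denoting the portion of the common rate allocated to user-$k$ for the stream $s_{\mathcal{A}}$ as $C_k^{\mathcal{A}}$, we have $\sum_{k' \in \mathcal{A}} C_{k'}^{\mathcal{A}} = R_{\mathcal{A}}$. Following the RS structure described above, the total achievable rate of user-$k$ is

$$R_{k,tot} = \sum_{\mathcal{A}' \subseteq \mathcal{K},\, |\mathcal{A}'| \geq 2,\, k \in \mathcal{A}'} C_k^{\mathcal{A}'} + R_k,$$

where $R_k$ is the rate of the 1-order stream $s_k$. It is intended for user-$k$ only, so no common rate sharing is required for $R_k$. For a given weight vector $\mathbf{u} = [u_1, \ldots, u_K]$ and a certain decoding order $\pi = \{\pi_1, \ldots, \pi_K\}$, the WSR achieved by RS is

$$R_{\mathrm{RS}}(\mathbf{u}, \pi) = \max_{\mathbf{P}, \mathbf{c}} \sum_{k \in \mathcal{K}} u_k R_{k,tot} \quad \text{s.t.} \quad \begin{cases} \sum_{k' \in \mathcal{A}} C_{k'}^{\mathcal{A}} \leq R_{\mathcal{A}},\ \forall \mathcal{A} \subseteq \mathcal{K},\ |\mathcal{A}| \geq 2, \\ R_{k,tot} \geq R_k^{th},\ \forall k \in \mathcal{K}, \\ \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \leq P_t, \quad \mathbf{c} \geq \mathbf{0}, \end{cases} \tag{20}$$

where $\mathbf{P} = [\mathbf{P}_1, \ldots, \mathbf{P}_K]$ is the precoding matrix of all order streams and $\mathbf{c}$ is the common rate vector formed by $\{C_k^{\mathcal{A}} \mid \mathcal{A} \subseteq \mathcal{K}, k \in \mathcal{A}\}$. For a fixed weight vector, problem (20) can be solved using the WMMSE approach discussed in Section 4.7 by establishing rate-WMMSE relationships for all data streams. By calculating $R_{\mathrm{RS}}(\mathbf{u}, \pi)$ for a set of different rate weights $\mathbf{u}$, we obtain the rate region $R_{\mathrm{RS}}(\pi)$ of a certain decoding order $\pi$. To achieve the rate region, all decoding orders should be considered. The rate region of RS is defined as the convex hull of the union over all decoding orders, $R_{\mathrm{RS}} = \mathrm{conv}\left(\bigcup_{\pi} R_{\mathrm{RS}}(\pi)\right)$.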
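To give a feel for how the number of streams grows in the generalized framework, the sketch below enumerates the streams of every order for a given number of users. Streams are represented by tuples of 1-based user indices, and the helper names `streams_by_order` and `streams_decoded_by` are hypothetical.

```python
from itertools import combinations
from math import comb

def streams_by_order(K):
    """Map each stream order l to the list of user subsets A with |A| = l."""
    users = range(1, K + 1)
    return {l: list(combinations(users, l)) for l in range(1, K + 1)}

def streams_decoded_by(K, k):
    """Streams user k must decode, from the K-order stream down to its private stream."""
    mine = [A for subsets in streams_by_order(K).values() for A in subsets if k in A]
    return sorted(mine, key=len, reverse=True)

K = 3
by_order = streams_by_order(K)
total_streams = sum(len(v) for v in by_order.values())   # 2**K - 1 streams overall
user1_streams = streams_decoded_by(K, 1)                 # 2**(K - 1) streams per user
```

For K = 3 this reproduces the seven streams of the three-user example and the four streams decoded at user-1. Since each user decodes $2^{K-1}$ streams, the number of SIC layers grows exponentially with $K$.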

Structured and low-complexity rate-splitting
The generalized RS described in Section 4.3 is able to provide more room for rate and QoS enhancements at the expense of more layers of SIC at receivers. Hence, though the generalized RS framework is very general and can be used to identify the best possible performance, its implementation can be complex due to the large number of SIC layers and common messages involved. To overcome the problem, we introduce two low-complexity RS strategies for K users, 1-layer RS and 2-layer hierarchical RS (HRS). Those two RS strategies require the implementation of one and two layers of SIC at each receiver, respectively.

1-layer RS
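Instead of transmitting all order streams, 1-layer RS transmits the $K$-order common stream and the 1-order private streams. Only one SIC is required at each receiver. The message of each user is split into two parts, $W_k^{\mathcal{K}}$ and $W_k^{k}$, $\forall k \in \mathcal{K}$. The messages $W_1^{\mathcal{K}}, \ldots, W_K^{\mathcal{K}}$ are jointly encoded into the $K$-order stream $s_{\mathcal{K}}$ intended to be decoded by all users. $W_k^{k}$ is encoded into $s_k$ to be decoded by user-$k$ only. The overall vector of data streams to be transmitted based on 1-layer RS is $\mathbf{s} = [s_{\mathcal{K}}, s_1, \ldots, s_K]^T$. The data streams are linearly precoded via the precoder $\mathbf{P} = [\mathbf{p}_{\mathcal{K}}, \mathbf{p}_1, \ldots, \mathbf{p}_K]$. The resulting transmit signal is $\mathbf{x} = \mathbf{P}\mathbf{s} = \mathbf{p}_{\mathcal{K}} s_{\mathcal{K}} + \sum_{k \in \mathcal{K}} \mathbf{p}_k s_k$. Figure 3 shows a 1-layer RS model. Readers are referred to Fig. 1 in [12] for a detailed illustration of the 1-layer RS architecture.
At user sides, all users first decode the data stream $s_{\mathcal{K}}$ by treating the interference from $s_1, \ldots, s_K$ as noise. The SINR of the $K$-order stream at user-$k$ is

$$\gamma_k^{\mathcal{K}} = \frac{|\mathbf{h}_k^H \mathbf{p}_{\mathcal{K}}|^2}{\sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{p}_j|^2 + 1}.$$

Once $s_{\mathcal{K}}$ is successfully decoded, its contribution to the original received signal $y_k$ is subtracted. After that, user-$k$ decodes its private stream $s_k$ by treating the 1-order private streams of other users as noise. The SINR of decoding the private stream $s_k$ at user-$k$ is

$$\gamma_k = \frac{|\mathbf{h}_k^H \mathbf{p}_k|^2}{\sum_{j \neq k,\, j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{p}_j|^2 + 1}.$$

The corresponding achievable rates of user-$k$ for the streams $s_{\mathcal{K}}$ and $s_k$ are $R_k^{\mathcal{K}} = \log_2(1 + \gamma_k^{\mathcal{K}})$ and $R_k = \log_2(1 + \gamma_k)$. To ensure that $s_{\mathcal{K}}$ is successfully decoded by all users, the achievable common rate shall not exceed $R_{\mathcal{K}} = \min\{R_1^{\mathcal{K}}, \ldots, R_K^{\mathcal{K}}\}$. $R_{\mathcal{K}}$ is shared among users such that $C_k^{\mathcal{K}}$ is the $k$th user's portion of the common rate with $\sum_{k \in \mathcal{K}} C_k^{\mathcal{K}} = R_{\mathcal{K}}$. Following the two-user RS structure described above, the total achievable rate of user-$k$ is $R_{k,tot} = C_k^{\mathcal{K}} + R_k$.

Fig. 3 One-layer RS model of K users. The common stream $s_{\mathcal{K}}$ is shared by all the users

For a given weight vector $\mathbf{u} = [u_1, \ldots, u_K]$, the WSR achieved by the $K$-user 1-layer RS approach is

$$\max_{\mathbf{P}, \mathbf{c}} \sum_{k \in \mathcal{K}} u_k R_{k,tot} \quad \text{s.t.} \quad \sum_{k \in \mathcal{K}} C_k^{\mathcal{K}} \leq R_{\mathcal{K}}, \quad R_{k,tot} \geq R_k^{th},\ \forall k \in \mathcal{K}, \quad \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \leq P_t, \quad \mathbf{c} \geq \mathbf{0}, \tag{24}$$

where $\mathbf{c} = [C_1^{\mathcal{K}}, \ldots, C_K^{\mathcal{K}}]$ is the common rate vector. For a given weight vector, problem (24) can be solved using the WMMSE approach in [28].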
In contrast to NOMA, this 1-layer RS does not require any user ordering or grouping at the transmitter side since all users decode the common message (using single layer of SIC) before accessing their respective private messages. We also note that the 1-layer RS is a sub-scheme of the generalized RS and is a super-scheme of MU-LP (since by not allocating any power to the common message, the 1-layer RS boils down to MU-LP). However, for K > 2, SC-SIC and SC-SIC per group are not sub-schemes of 1-layer RS (even though they were subschemes of the generalized RS). This explains why, in [12], the authors already contrasted 1-layer RS and NOMA and expressed that the two strategies cannot be treated as extensions or subsets of each other. This 1-layer RS appeared in many scenarios subject to imperfect CSIT in [28, 29, 32-34, 38, 40, 41].

2-layer HRS
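The $K$ users are divided into $G$ groups $\mathcal{G} = \{1, \ldots, G\}$ with $|\mathcal{K}_g|$ users in group $g \in \mathcal{G}$. The user groups satisfy the same conditions as in Section 3.2.2. Besides the $K$-order stream and the 1-order streams, 2-layer HRS also allows the transmission of a $|\mathcal{K}_g|$-order stream intended for the users in $\mathcal{K}_g$. The overall vector of data streams to be transmitted based on 2-layer HRS is $\mathbf{s} = [s_{\mathcal{K}}, s_{\mathcal{K}_1}, \ldots, s_{\mathcal{K}_G}, s_1, \ldots, s_K]^T$. The data streams are linearly precoded via the precoder $\mathbf{P} = [\mathbf{p}_{\mathcal{K}}, \mathbf{p}_{\mathcal{K}_1}, \ldots, \mathbf{p}_{\mathcal{K}_G}, \mathbf{p}_1, \ldots, \mathbf{p}_K]$. Each user is required to decode three streams $s_{\mathcal{K}}$, $s_{\mathcal{K}_g}$, and $s_k$. We assume $k \in \mathcal{K}_g$. The data stream $s_{\mathcal{K}}$ is decoded first by treating the interference from all other streams as noise. The SINR of the $K$-order stream at user-$k$ is

$$\gamma_k^{\mathcal{K}} = \frac{|\mathbf{h}_k^H \mathbf{p}_{\mathcal{K}}|^2}{\sum_{g' \in \mathcal{G}} |\mathbf{h}_k^H \mathbf{p}_{\mathcal{K}_{g'}}|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{p}_j|^2 + 1}.$$

Once $s_{\mathcal{K}}$ is successfully decoded, its contribution to the original received signal $y_k$ is subtracted. After that, user-$k$ decodes its group common stream $s_{\mathcal{K}_g}$ by treating the other group common streams and the 1-order private streams as noise. The SINR of decoding the $|\mathcal{K}_g|$-order stream $s_{\mathcal{K}_g}$ at user-$k$ is

$$\gamma_k^{\mathcal{K}_g} = \frac{|\mathbf{h}_k^H \mathbf{p}_{\mathcal{K}_g}|^2}{\sum_{g' \in \mathcal{G},\, g' \neq g} |\mathbf{h}_k^H \mathbf{p}_{\mathcal{K}_{g'}}|^2 + \sum_{j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{p}_j|^2 + 1}.$$

After removing its contribution to the received signal, user-$k$ decodes its private stream $s_k$. The SINR of decoding the private stream $s_k$ at user-$k$ is

$$\gamma_k = \frac{|\mathbf{h}_k^H \mathbf{p}_k|^2}{\sum_{g' \in \mathcal{G},\, g' \neq g} |\mathbf{h}_k^H \mathbf{p}_{\mathcal{K}_{g'}}|^2 + \sum_{j \neq k,\, j \in \mathcal{K}} |\mathbf{h}_k^H \mathbf{p}_j|^2 + 1}.$$

The corresponding achievable rates of user-$k$ for the streams $s_{\mathcal{K}}$, $s_{\mathcal{K}_g}$, and $s_k$ are $R_k^{\mathcal{K}} = \log_2(1 + \gamma_k^{\mathcal{K}})$, $R_k^{\mathcal{K}_g} = \log_2(1 + \gamma_k^{\mathcal{K}_g})$, and $R_k = \log_2(1 + \gamma_k)$. The achievable common rates of $s_{\mathcal{K}}$ and $s_{\mathcal{K}_g}$ shall not exceed $R_{\mathcal{K}} = \min_{k \in \mathcal{K}} R_k^{\mathcal{K}}$ and $R_{\mathcal{K}_g} = \min_{k \in \mathcal{K}_g} R_k^{\mathcal{K}_g}$, respectively. $R_{\mathcal{K}}$ is shared among users such that $C_k^{\mathcal{K}}$ is the $k$th user's portion of the common rate with $\sum_{k \in \mathcal{K}} C_k^{\mathcal{K}} = R_{\mathcal{K}}$. $R_{\mathcal{K}_g}$ is shared among the users in the group $\mathcal{K}_g$ such that $C_k^{\mathcal{K}_g}$ is the $k$th user's portion of the common rate with $\sum_{k \in \mathcal{K}_g} C_k^{\mathcal{K}_g} = R_{\mathcal{K}_g}$. Following the two-user RS structure described above, the total achievable rate of user-$k$ is $R_{k,tot} = C_k^{\mathcal{K}} + C_k^{\mathcal{K}_g} + R_k$. For a given weight vector $\mathbf{u} = [u_1, \ldots, u_K]$, the WSR achieved by the $K$-user 2-layer HRS approach is

$$\max_{\mathbf{P}, \mathbf{c}} \sum_{k \in \mathcal{K}} u_k R_{k,tot} \quad \text{s.t.} \quad \begin{cases} \sum_{k \in \mathcal{K}} C_k^{\mathcal{K}} \leq R_{\mathcal{K}}, \quad \sum_{k \in \mathcal{K}_g} C_k^{\mathcal{K}_g} \leq R_{\mathcal{K}_g},\ \forall g \in \mathcal{G}, \\ R_{k,tot} \geq R_k^{th},\ \forall k \in \mathcal{K}, \\ \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \leq P_t, \quad \mathbf{c} \geq \mathbf{0}, \end{cases} \tag{28}$$

where $\mathbf{c}$ is the common rate vector formed by $\{C_k^{\mathcal{K}}, C_k^{\mathcal{K}_g} \mid k \in \mathcal{K}_g, g \in \mathcal{G}\}$. For a given weight vector, problem (28) can be solved by simply modifying the WMMSE approach discussed in Section 4.7.
Compared with SC-SIC per group, where $|\mathcal{K}_g| - 1$ layers of SIC are required at user sides, 2-layer HRS only requires two layers of SIC at each user. Moreover, the user ordering issue in SC-SIC per group does not exist in 2-layer HRS. The streams of a higher stream order will always be decoded before the streams of a lower stream order. One-layer RS is the simplest architecture since only one SIC is needed at each user, and it is a sub-scheme of the 2-layer HRS. We also note that we can obtain a 1-layer RS per group from the 2-layer HRS by not allocating any power to $s_{\mathcal{K}}$. Note that SC-SIC and SC-SIC per group are not necessarily sub-schemes of the 2-layer HRS. The 2-layer HRS strategy was first introduced in [39] in the massive MIMO context.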
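The three decoding stages of 2-layer HRS can be sketched as below for one user. The sketch assumes unit noise variance by default and 0-based user and group indices, and `hrs_sinrs` is a hypothetical helper name. Precoders are fixed by hand and not optimized.

```python
import numpy as np

def hrs_sinrs(H, p_common, P_group, P_private, groups, k, noise_var=1.0):
    """SINRs at user k for (K-order stream, its group stream, its private stream).

    H: (Nt, K) channels; P_group: (Nt, G) group precoders;
    P_private: (Nt, K) private precoders; groups: list of lists of user indices.
    """
    h = H[:, k]
    gain = lambda p: abs(np.vdot(h, p)) ** 2            # |h_k^H p|^2
    g = next(i for i, members in enumerate(groups) if k in members)
    grp = [gain(P_group[:, i]) for i in range(len(groups))]
    prv = [gain(P_private[:, j]) for j in range(H.shape[1])]
    other_grp = sum(grp) - grp[g]                       # other groups' common streams
    other_prv = sum(prv) - prv[k]                       # other users' private streams
    # Stage 1: K-order stream, everything else is noise.
    sinr_common = gain(p_common) / (sum(grp) + sum(prv) + noise_var)
    # Stage 2: own group stream, after removing the K-order stream.
    sinr_group = grp[g] / (other_grp + sum(prv) + noise_var)
    # Stage 3: private stream, after removing the own group stream as well.
    sinr_private = prv[k] / (other_grp + other_prv + noise_var)
    return sinr_common, sinr_group, sinr_private

# Toy example: four users with orthogonal channels, two groups of two.
H = np.eye(4, dtype=complex)
groups = [[0, 1], [2, 3]]
p_common = np.ones(4, dtype=complex)
P_group = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=complex)
P_private = np.eye(4, dtype=complex)
sinrs = hrs_sinrs(H, p_common, P_group, P_private, groups, k=0)
```

Each stage removes one stream from the denominator, which is why only two SIC layers are needed regardless of the group size.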

Encompassing existing NOMA and SDMA
A comparison of NOMA, SDMA, and RSMA is shown in Table 1. Compared with NOMA and SDMA, the most important characteristic of RSMA is that it partially decodes interference and partially treats interference as noise through the split into common and private messages. This capability enables RSMA to maintain a good performance for all user deployment scenarios and all network loads, as will become clear from the numerical results of Section 5.
Let us further discuss how the proposed framework of generalized RS in Section 4.3 contrasts and encompasses NOMA, SDMA, and RS strategies. We first compare the four-user MIMO-NOMA scheme illustrated in Fig. 5 of [1] with the four-user 2-layer HRS strategy illustrated in Fig. 4. In Fig. 5 of [1], user-1 and user-2 are superposed in the same beam. User-3 and user-4 share another beam. The users are decoded based on SC-SIC within each beam. As for the four-user 2-layer HRS strategy in Fig. 4, the encoded streams are precoded and transmitted jointly to users. Suppose that the common message $s_{12}$ is encoded from the message of user-2 only and decoded by both user-1 and user-2, and that the common message $s_{34}$ is encoded from the message of user-4 only and decoded by both user-3 and user-4. If we further set the precoders $\mathbf{p}_{12}$ and $\mathbf{p}_1$ to be equal, the precoders $\mathbf{p}_{34}$ and $\mathbf{p}_3$ to be equal, and the precoders of all other streams to 0, then the proposed RS scheme reduces to the scheme illustrated in Fig. 5 of [1]. Similarly, the $K$-user RS model can be reduced to the $K$-user MIMO-NOMA scheme. Therefore, the MIMO-NOMA scheme proposed in [1] is a particular case of our RS framework.
In view of the above discussions, it should now be clear that SDMA and the multi-antenna NOMA strategies discussed in the introduction (relying on SC-SIC and SC-SIC per group) are all special instances of the generalized RS framework. In the proposed generalized K-user RS model, if we set P l = 0, ∀l ∈ {2, · · · , K}, only 1-order streams (private streams) are transmitted. Each user only decodes its intended private stream by treating others as noise. Problem (20) is then reduced to the SDMA problem (3). If the message of each user is encoded into one stream of distinct stream order, problem (20) is equivalent to the SC-SIC problem (5). By keeping 1-order and K-order streams, we have the 1-layer RS strategy whose performance benefit in the presence of imperfect CSIT was highlighted in various scenarios in [28, 29, 32-34, 38, 40, 41]. There is only one common data stream to be transmitted and decoded by all users before each user decodes its private stream. By keeping 1-order, K-order, and l-order streams, where l is selected from {2, · · · , K − 1}, the problem becomes the 2-layer HRS originally proposed in [39] with two layers of common messages to be transmitted. Another example of such a multi-layer RS has also appeared in the topological RS for MISO networks of [30]. Therefore, the formulated K-user RS problem is a more general problem. It encompasses SDMA, NOMA, and existing RS methods as special cases.
Though the current work focuses on the MISO BC, the RS framework can be extended to multi-antenna users and the general MIMO BC [31] as well as to a general network scenario with multiple transmitters [30]. Nevertheless, the optimization of the precoders in those scenarios remains an interesting topic for future research. The application of this RS framework to relay networks is also worth exploring. Preliminary ideas have appeared in [43], though joint encoding of the split common messages is not taken into account.

Complexity of RSMA
We further discuss the complexity of RSMA by comparing it with NOMA and SDMA. A qualitative comparison of NOMA, SDMA, and RSMA is shown in Table 2. In Table 2, RS refers to the generalized RS of Section 4.3.
As mentioned in the introduction, the complexity of NOMA in the multi-antenna setup is increasing significantly at both the transmitter and the receivers. The optimal decoding order of NOMA is no longer fixed based on the channel gain as in the SISO BC. To maximize the WSR, the decoding order should be optimized together with precoders at the transmitter. Moreover, SC-SIC is suitable for aligned users with large channel gain difference. A proper user scheduling algorithm increases the scheduler complexity. At user sides, K − 1 layers of SIC are required at each user for a K-user SC-SIC system. Increasing the number of users leads to a dramatic increase of the scheduler and receiver complexity and is subject to more error propagation in the SICs.
SC-SIC per group reduces the complexity at user sides. Only $K/G - 1$ layers of SIC are required at each user if we uniformly group the $K$ users into $G$ groups. However, the complexity at the transmitter increases with the number of user groups. A joint design of user ordering and user grouping for all groups is necessary in order to maximize the WSR. For example, for a 4-user system, if we divide the users into two groups with two users in each group, we should consider three different user grouping methods and four different decoding orders for each grouping method.
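The counts in the 4-user example can be reproduced by brute-force enumeration. The sketch below is only meant to illustrate how the search space of SC-SIC per group is built, and the helper names `equal_size_groupings` and `decoding_orders` are hypothetical.

```python
from itertools import combinations, permutations, product

def equal_size_groupings(users, size):
    """All ways to partition `users` into unordered groups of `size` users."""
    users = list(users)
    if not users:
        return [[]]
    first, rest = users[0], users[1:]
    result = []
    # Fix the group of the first user to avoid counting the same partition twice.
    for mates in combinations(rest, size - 1):
        group = (first,) + mates
        remaining = [u for u in rest if u not in mates]
        for tail in equal_size_groupings(remaining, size):
            result.append([group] + tail)
    return result

def decoding_orders(grouping):
    """All joint decoding orders: one permutation per group."""
    return list(product(*(permutations(g) for g in grouping)))

groupings = equal_size_groupings([1, 2, 3, 4], 2)
orders_per_grouping = [len(decoding_orders(g)) for g in groupings]
```

The WSR problem has to be solved once per (grouping, decoding order) pair, i.e., 3 × 4 = 12 times in this example, and the count grows quickly with the number of users.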
The complexity of MU-LP is much reduced as it does not require any SIC at user sides. However, as MU-LP is more suitable for users with (semi-)orthogonal channels and similar channel strengths, the transmitter requires accurate CSIT and user scheduling should be carefully designed for interference coordination. The scheduler complexity at the transmitter is still high.
Comparing with NOMA and SDMA, RSMA is able to balance the performance and complexity better. All forms of RS are suitable for users with any channel gain difference and any channel angle in between, though a multi-layer RS would have more flexibility. Considering the generalized RS, the decoding order of multiple streams with the same stream order should be optimized together with the precoders when there are multiple streams of the same stream order intended for each user (e.g., each user decodes two 2-order streams in the 3-user example of Section 4.2.). But its special case, 1-layer RS, simplifies both the scheduler and receiver design, and it is still able to achieve a good performance in all user deployment scenarios. One-layer RS requires only one SIC at each user. It does not rely on user grouping and user ordering for user scheduling. Therefore, the complexity of the scheduler is much simplified. The cost of RSMA comes with a slightly higher encoding complexity since private and common streams need to be encoded. For the 1-layer RS in a K-user MISO BC, K + 1 streams need to be encoded in contrast to K streams for NOMA and SDMA.

Optimization of RS
The WMMSE approach proposed in [42] is extended to solve the problem. The WMMSE algorithm to solve the sum rate maximization problem with 1-layer RS (discussed in Section 4.4.1) is proposed in [28]. We further extend it to solve the generalized RS problem (20). To simplify the explanation, we focus on the 3-user problem (15). It can be easily extended to solve the K-user generalized RS problem.
It can be easily shown that by minimizing (36a) with respect to u and g, respectively, we obtain the MMSE solutions u MMSE and g MMSE formed by the corresponding MMSE weights and equalizers. They satisfy the KKT optimality conditions of (36) for P. Therefore, according to the rate-WMMSE relationship (35) and the common rate transformation c = −x, problem (36) can be transformed to problem (15). For any point (x*, P*, u*, g*) satisfying the KKT optimality conditions of (36), the solution given by (c* = −x*, P*) satisfies the KKT optimality conditions of (15). The WSR problem (15) is thus transformed into the WMMSE problem (36). Problem (36) is still non-convex for the joint optimization of (x, P, u, g). We have derived that when (x, P, u) are fixed, the optimal equalizer is the MMSE equalizer g MMSE. When (x, P, g) are fixed, the optimal weight is the MMSE weight u MMSE. When (u, g) are fixed, (x, P) is coupled in the optimization problem (36) and a closed-form solution cannot be derived. However, the problem is then a convex quadratically constrained quadratic program (QCQP), which can be solved using interior-point methods. These properties motivate us to use AO to solve the problem. In the nth iteration of the AO algorithm, the equalizers and weights are first updated using the precoders obtained in the (n − 1)th iteration, (u, g) = (u MMSE (P [n−1] ), g MMSE (P [n−1] )). With the updated (u, g), (x, P) can then be updated by solving problem (36). (u, g) and (x, P) are iteratively updated until the WSR converges. The details of the AO algorithm are shown in Algorithm 1, where WSR [n] is the WSR calculated based on the updated (x, P) in the nth iteration and $\epsilon$ is the tolerance of the algorithm. The AO algorithm is guaranteed to converge, as the WSR is non-decreasing in each iteration and bounded above for a given power constraint. When considering imperfect CSIT, we follow the robust approach proposed in [28] for 1-layer RS with imperfect CSIT.
The precoders are optimized based on the available channel estimate to maximize a conditional averaged weighted sum rate (AWSR) metric, computed using partial CSIT knowledge. The stochastic AWSR problem is transformed into a deterministic counterpart using the sample average approximation (SAA) method. Then, the rate-WMMSE relationship is applied to transform the AWSR problem into a block-wise convex form, which is solved using an AO algorithm. The robust approach for 1-layer RS in [28] can be easily extended to solve the $K$-user generalized RS problem based on our proposed Algorithm 1, and is therefore not detailed here.
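The link between the MMSE equalizer and the achievable rate that underlies the WMMSE reformulation can be checked numerically for a single stream. The sketch below uses a generic scalar model, $y = \mathbf{h}^H \mathbf{p}\, s + n$ with unit-power $s$ and with the interference and noise lumped into one power term. The helper names and variables are illustrative assumptions and are not taken from the numbered equations referenced above.

```python
import numpy as np

def mse(g, h, p, interf_plus_noise):
    """E|g*y - s|^2 for y = (h^H p) s + n, E|s|^2 = 1, E|n|^2 = interf_plus_noise."""
    a = np.vdot(h, p)                       # effective channel h^H p
    T = abs(a) ** 2 + interf_plus_noise     # total received power
    return abs(g) ** 2 * T - 2 * np.real(g * a) + 1

def mmse_equalizer(h, p, interf_plus_noise):
    """Scalar equalizer minimizing the MSE above."""
    a = np.vdot(h, p)
    T = abs(a) ** 2 + interf_plus_noise
    return np.conj(a) / T

h = np.array([1.0, 1.0j])
p = np.array([1.0, 0.0])
interf_plus_noise = 1.0

g_opt = mmse_equalizer(h, p, interf_plus_noise)
eps_mmse = mse(g_opt, h, p, interf_plus_noise)           # minimum MSE
sinr = abs(np.vdot(h, p)) ** 2 / interf_plus_noise
rate_from_sinr = np.log2(1 + sinr)
rate_from_mmse = -np.log2(eps_mmse)                      # rate-MMSE relationship
```

Both rate expressions coincide because the minimum MSE equals $1/(1 + \mathrm{SINR})$. This is what allows the rate maximization to be recast in terms of (weighted) MSEs that are convex in each block of variables.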

Results and discussion
In this section, we evaluate the performance of SDMA, NOMA, and RSMA in a wide range of network loads (underloaded and overloaded regimes) and user deployments (with a diversity of channel directions, channel strengths, and qualities of channel state information at the transmitter). We first illustrate the rate region of different strategies in the two-user case followed by the WSR comparisons of the three-user, four-user, and ten-user cases.

Underloaded two-user deployment with perfect CSIT
When K = 2, the rate region of all strategies can be explicitly compared in a two-dimensional figure. As mentioned earlier, the rate region is the set of all achievable points. Its boundary is calculated by varying the weights assigned to users. In this work, the weight of user-1 is fixed to $u_1 = 1$. The weight of user-2 is varied as $u_2 = 10^{[-3, -1, -0.95, \cdots, 0.95, 1, 3]}$, which is the same as in [42]. To investigate the largest achievable rate region, the individual rate constraints are set to 0 in all strategies, $R_k^{th} = 0$, $\forall k \in \{1, 2\}$.
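The weight grid described above can be generated as in the following sketch. It assumes that the dots in the exponent list denote a uniform step of 0.05 between −1 and 1, as suggested by the listed values.

```python
import numpy as np

# Exponents: -3, then -1 to 1 in steps of 0.05 (41 values), then 3.
exponents = np.concatenate(([-3.0], np.linspace(-1.0, 1.0, 41), [3.0]))

u1 = 1.0                      # weight of user-1 is fixed
u2_grid = 10.0 ** exponents   # weights of user-2

# One WSR problem is solved per pair (u1, u2); each solution gives a boundary point.
weight_pairs = [(u1, u2) for u2 in u2_grid]
```

The two extreme exponents make one user dominate the objective, so the corresponding solutions approach the end points of the rate region boundary.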
In the perfect CSIT scenario, the capacity region is achieved by DPC. Therefore, we compare the rate regions of different beamforming strategies with the DPC region. The DPC region is generated using the algorithm in [44].
Since the WSR problems for all beamforming strategies described earlier are non-convex, the initialization of $\mathbf{P}$ is vital to the final result. It has been observed in [28] that maximum ratio transmission (MRT) combined with singular value decomposition (SVD) provides good overall performance over various channel realizations. It is used in this work for the precoder initialization of RS. The precoder $\mathbf{p}_k$ for the private message is initialized as [...]

Random channel realizations
We first consider scenarios in which the channel of each user $\mathbf{h}_k$ has independent and identically distributed (i.i.d.) complex Gaussian entries with a certain variance, i.e., $\mathcal{CN}(0, \sigma_k^2)$. The BS is equipped with two or four antennas ($N_t = 2, 4$) and serves two single-antenna users. Figure 5 shows the average rate regions of different strategies over 100 random channel realizations when $\sigma_1^2 = 1$, $\sigma_2^2 = 1$, and $N_t = 4$. SNRs are 10 and 20 dB, respectively. When the number of transmit antennas is larger than the number of users, MU-LP achieves a good performance. The generated precoders of the users tend to be more orthogonal as the number of transmit antennas increases. In contrast, the average rate region achieved by SC-SIC is small. When $\sigma_1^2 = 1$ and $\sigma_2^2 = 1$, there is no disparity of average channel strengths, and SC-SIC is not able to achieve a good performance in such a scenario. As the SC-SIC strategy is motivated by leveraging the channel strength difference among users, it achieves a good performance when the channels are degraded. Specifically, when the channels of users are close to alignment, SC-SIC works better than MU-LP if the users have asymmetric channel strengths. However, for the general non-degraded MISO-BC, SC-SIC often yields a performance loss [19]. The simulation results when $\sigma_1^2 = 1$, $\sigma_2^2 = 0.09$, and $N_t = 2$ are illustrated in Fig. 6. The average channel gain difference between the users increases to 5 dB, and the number of transmit antennas reduces to two. In such a scenario, the rate region gap between RS and MU-LP increases while the rate region gap between RS and SC-SIC decreases. This shows that SC-SIC is more suited to scenarios where the users experience a large disparity in channel strengths. In both Figs. 5 and 6, the rate region gaps among different strategies increase with SNR. RS achieves a larger rate region than SC-SIC and MU-LP, and it is closer to the capacity region achieved by DPC.

Specific channel realizations
In order to gain better insight into the benefits of RS over MU-LP and SC-SIC, we investigate the influence of user angle and channel strength on the performance. When $N_t = 4$, the channels of the users are realized as [...]. In the above channel realizations, $\gamma$ and $\theta$ are control variables. $\gamma$ controls the channel strength of user-2. If $\gamma = 1$, user-1 and user-2 have equal channel strength. If $\gamma = 0.3$, user-2 suffers from an additional 5 dB path loss compared to user-1. $\theta$ controls the angle between the channels of user-1 and user-2. It varies from 0 to $\pi/2$. If $\theta = 0$, the channel of user-1 is aligned with that of user-2. If $\theta = \pi/2$, the channels of user-1 and user-2 are orthogonal to each other. In the following results, $\gamma = 1, 0.3$, which corresponds to 0 dB and 5 dB channel strength difference, respectively. For each $\gamma$, $\theta$ takes values in $\{\pi/9, 2\pi/9, \pi/3, 4\pi/9\}$. Intuitively, when $\theta$ is less than $\pi/9$, the channels of the users are sufficiently aligned and SC-SIC performs well. When $\theta$ is larger than $4\pi/9$, the channels of the users are sufficiently orthogonal to each other and MU-LP is more suitable. Therefore, we consider angles within the range $[\pi/9, 4\pi/9]$. SNR is fixed to 20 dB. When $N_t = 2$, the channels of user-1 and user-2 are realized as $\mathbf{h}_1 = [1, 1]^H$ and $\mathbf{h}_2 = \gamma \times [1, e^{j\theta}]^H$, respectively. The same values of $\gamma$ and $\theta$ are adopted for $N_t = 2$ as for $N_t = 4$ (see note 13).

Fig. 7 Achievable rate region comparison of different strategies in underloaded two-user deployment with perfect CSIT, $\gamma = 1$ and $N_t = 4$, SNR = 20 dB. (a) $\theta = \pi/9$. (b) $\theta = 2\pi/9$. (c) $\theta = \pi/3$. (d) $\theta = 4\pi/9$

Figure 7 shows the results when $\gamma = 1$ and $N_t = 4$. In all subfigures, the rate region achieved by RS is equal to or larger than that of SC-SIC and MU-LP. When $\gamma = 1$ and $\theta = \pi/9$, the channels of user-1 and user-2 almost coincide. RS exhibits a clear rate region improvement over SC-SIC and MU-LP. SC-SIC cannot achieve a good performance due to the equal channel gain, while the performance of MU-LP is poor when the user channels are closely aligned with each other. As $\theta$ increases, the gap between the rate regions of RS and MU-LP reduces, since MU-LP performs better when the channels of the users are more orthogonal to each other, while the gap between the rate regions of MU-LP and SC-SIC increases. The rate regions of RS and MU-LP tend to the capacity region achieved by DPC as $\theta$ increases. As shown in Fig. 7d, when the channels of the users are sufficiently orthogonal to each other, the rate regions of DPC, RS, and MU-LP are almost identical. In such an orthogonal scenario, RS reduces to MU-LP.

Figure 8 shows the results when $\gamma = 1$ and $N_t = 2$. In all subfigures, RS outperforms MU-LP and SC-SIC. Compared with the results for $N_t = 4$, the rate region gap between RS and MU-LP is enlarged when $N_t = 2$. When the number of transmit antennas decreases, it becomes more difficult for MU-LP to design orthogonal precoders for the users. MU-LP is more suited to underloaded scenarios ($N_t > K$). In both Figs. 7 and 8, the rate region of SC-SIC is the worst due to the equal channel gain. In contrast, RS performs well for any angle between user channels.

Figure 9 shows the rate region comparison of DPC, RS, SC-SIC, and MU-LP transmission schemes with a 5 dB channel strength difference between the two users, i.e., $\gamma = 0.3$ and $N_t = 4$. RS and SC-SIC are much closer to the DPC region in the setting of Fig. 9 than in that of Fig. 7 because of the 5 dB channel strength difference. Figure 9b, c are interesting as SC-SIC and MU-LP each outperform the other over one part of the rate region, and the boundaries of the two schemes cross in each of these subfigures. The rate region of RS is equal to or larger than the convex hull of the rate regions of SC-SIC and MU-LP.

Figure 10 shows the rate region comparison when $\gamma = 0.3$ and $N_t = 2$. Comparing Fig. 10 with Fig. 9, SC-SIC achieves a relatively better performance when the number of transmit antennas reduces. The rate regions of RS and SC-SIC overlap, and they almost achieve the capacity region when $\theta = \pi/9$. However, as $\theta$ increases, the rate region gap between RS and SC-SIC increases despite the 5 dB channel gain difference. Both SC-SIC and RS rely on one SIC when there are two users in the system. Though the receiver complexity of SC-SIC and RS is the same, RS achieves an explicit performance gain over SC-SIC in most investigated scenarios. Compared with MU-LP and SC-SIC, RS is suited to any channel angle and channel gain difference.

More results of underloaded two-user deployments with perfect CSIT are given in Appendix 1. We further illustrate the rate regions of different strategies when SNR is 10 dB. Comparing the corresponding figures for 10 dB and 20 dB, we conclude that as SNR increases, the gaps among the rate regions of different schemes increase, with RS exhibiting further performance benefits. In all investigated scenarios, RS achieves a rate region equal to or larger than that of MU-LP and SC-SIC.
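The $N_t = 2$ channel realizations above can be reproduced numerically to see how $\theta$ and $\gamma$ shape the deployment. The following is a minimal sketch; the helper names `two_user_channels` and `alignment` are hypothetical, and the normalized correlation is used here only as an illustrative measure of how aligned the two channels are.

```python
import numpy as np

def two_user_channels(gamma, theta):
    """h1 = [1, 1]^H and h2 = gamma * [1, exp(j*theta)]^H for N_t = 2."""
    h1 = np.conj(np.array([1.0, 1.0], dtype=complex))
    h2 = gamma * np.conj(np.array([1.0, np.exp(1j * theta)]))
    return h1, h2

def alignment(h1, h2):
    """Normalized correlation |h1^H h2| / (||h1|| ||h2||); 1 means aligned."""
    return abs(np.vdot(h1, h2)) / (np.linalg.norm(h1) * np.linalg.norm(h2))

# Sweep over the values of gamma and theta used in the two-user results.
thetas = [np.pi / 9, 2 * np.pi / 9, np.pi / 3, 4 * np.pi / 9]
for gamma in (1.0, 0.3):
    for theta in thetas:
        h1, h2 = two_user_channels(gamma, theta)
        gain_ratio = np.linalg.norm(h2) ** 2 / np.linalg.norm(h1) ** 2
        print(f"gamma={gamma}, theta={theta:.3f}, "
              f"alignment={alignment(h1, h2):.3f}, gain ratio={gain_ratio:.2f}")
```

For this two-antenna model the correlation evaluates to $|\cos(\theta/2)|$, so the channels remain partially aligned even at the largest angle considered, which is consistent with the remark in note 13 that the channels are less orthogonal for $N_t = 2$ than for $N_t = 4$.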

Underloaded two-user deployment with imperfect CSIT
Next, we investigate the rate region of different transmission schemes in the presence of imperfect CSIT. We assume the users are able to estimate the channel perfectly while the instantaneous channel estimated at the BS is imperfect. We assume the estimated channel of user-[...], respectively. The precoders are initialized and designed using the estimated channels $\hat{\mathbf{h}}_1$ and $\hat{\mathbf{h}}_2$ and the same methods as stated in the perfect CSIT scenarios. One thousand different channel error samples are generated for each user. Each point in the rate region is the average rate (see note 14) over the generated 1000 channels. SNR is fixed to 20 dB. Figures 11 and 12 show the results when $\gamma = 1$ and $\gamma = 0.3$, respectively. Similarly to the results with perfect CSIT, the gap between the rate regions of RS and MU-LP reduces as $\theta$ increases in both figures. When $\theta = 4\pi/9$, the channels of the two users are sufficiently orthogonal, and the rate regions of RS and MU-LP are almost identical. SC-SIC achieves a good performance when the channels of the users are sufficiently aligned with enough channel gain difference, as shown in Fig. 12a.
Comparing Figs. 11 and 7, the rate region gap between RS and MU-LP increases with imperfect CSIT due to the residual interference introduced. The interference nulling in MU-LP is distorted and yields residual interference at the receiver, which jeopardizes the achievable rate. In contrast, the rate region gap between RS and SC-SIC slightly reduces with imperfect CSIT, as observed by comparing Fig. 12 with Fig. 9. SC-SIC is less sensitive to CSIT inaccuracy than MU-LP. However, the rate region gap between RS and SC-SIC is still obvious. In comparison, RS is more flexible and robust to multi-user interference originating from the imperfect CSIT, as evidenced by the recent literature on RS with imperfect CSIT [27][28][29][30][31][32][33][38][39][40][41]. With RS, the amount of interference decoded by both users (through the presence of the common stream) is adjusted dynamically to the channel conditions (channel directions and strengths) and CSIT inaccuracy. More results of underloaded two-user deployments with imperfect CSIT are given in Appendix 2. The rate regions of different strategies for varied SNR, $N_t$, and $\gamma$ are illustrated. We further show that the performance of RS is stable over a wide range of parameters, namely number of transmit antennas, user deployments, and CSIT inaccuracy. RS achieves equal or better performance than MU-LP and SC-SIC in all simulated channels.

Underloaded three-user deployment with perfect CSIT
When $K = 3$, the rate region of each strategy is a three-dimensional surface, and the gaps among the rate regions of different strategies are difficult to display. As each point of the rate region is derived by solving the WSR problem with a fixed weight vector $\mathbf{u}$, the WSRs instead of the rate regions of different transmission strategies are compared in the three-user case. Two RS schemes are investigated in three-user deployments. RS refers to the generalized RS strategy of Section 4.2 and 1-layer RS refers to the low-complexity RS strategy of Section 4.4.1. We compare the WSR of RS, 1-layer RS, DPC, SC-SIC, and MU-LP.

The beamforming initialization of the different strategies is extended from the methods adopted in the two-user case. There are three streams of distinct stream orders in RS (1/2/3-order streams). The precoders of the streams are initialized differently. The transmit power $P_t$ is divided into three parts $\alpha_1 P_t$, $\alpha_2 P_t$, and $\alpha_3 P_t$ for streams of the three distinct stream orders, where $\alpha_1, \alpha_2, \alpha_3 \in [0, 1]$ and $\alpha_1 + \alpha_2 + \alpha_3 = 1$. The precoder $\mathbf{p}_k, \forall k \in \{1, 2, 3\}$ of the 1-order stream (private stream) $s_k$ is initialized as [...] is the allocated power. The precoders $\mathbf{p}_{12}$, $\mathbf{p}_{13}$, and $\mathbf{p}_{23}$ of the 2-order streams are initialized as $\mathbf{p}_{12} = p_{12}\mathbf{u}_{12}$, $\mathbf{p}_{13} = p_{13}\mathbf{u}_{13}$, and $\mathbf{p}_{23} = p_{23}\mathbf{u}_{23}$, respectively, where $p_{12} = p_{13} = p_{23} = \frac{\alpha_2 P_t}{3}$ and $\mathbf{u}_{12}$ [...] $[\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3]$. The beamforming initialization of 1-layer RS is similar to that of RS except that only $\mathbf{p}_{123}$ and $\mathbf{p}_k, \forall k \in \{1, 2, 3\}$ are present. By setting $\alpha_2 = 0$, the initialization of RS is applied to 1-layer RS. To ensure a fair comparison, the precoders of MU-LP are initialized based on MRT. For SC-SIC, the precoder of the user decoded first $\mathbf{p}_{\pi(1)}$ is initialized as $\mathbf{p}_{\pi(1)} = p_{\pi(1)}\mathbf{u}_{\pi(1)}$, where $p_{\pi(1)} = \alpha_3 P_t$ and $\mathbf{u}_{\pi(1)}$ is the largest left singular vector of the channel matrix $\mathbf{H}_{123} = [\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3]$. The precoder of the user decoded second $\mathbf{p}_{\pi(2)}$ is initialized as $\mathbf{p}_{\pi(2)} = p_{\pi(2)}\mathbf{u}_{\pi(2)}$, where $p_{\pi(2)} = \alpha_2 P_t$ and $\mathbf{u}_{\pi(2)}$ is the largest left singular vector of the channel matrix $\mathbf{H}_{\pi(23)} = [\mathbf{h}_{\pi(2)}, \mathbf{h}_{\pi(3)}]$. The precoder of the user decoded last is initialized based on MRT.
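The MRT and SVD based initialization of the three-user RS precoders can be sketched as below. This is an illustrative sketch under stated assumptions rather than the exact initialization used for the results: the amplitude of each precoder is taken as the square root of its allocated power so that the total power equals $P_t$, the power $\alpha_1 P_t$ is split equally among the private streams, and the function names and numerical values are hypothetical.

```python
import numpy as np
from itertools import combinations

def dominant_left_singular_vector(H_sub):
    """Left singular vector associated with the largest singular value."""
    U, _, _ = np.linalg.svd(H_sub, full_matrices=False)
    return U[:, 0]

def init_three_user_rs(H, P_t, alpha):
    """H has shape (N_t, 3) with columns h_1, h_2, h_3.
    alpha = (alpha_1, alpha_2, alpha_3) are the power fractions of the
    1-order, 2-order, and 3-order streams (assumed to sum to one)."""
    a1, a2, a3 = alpha
    K = H.shape[1]
    precoders = {}
    # 1-order (private) streams: MRT direction, equal power split (assumption).
    for k in range(K):
        h = H[:, k]
        precoders[(k + 1,)] = np.sqrt(a1 * P_t / K) * h / np.linalg.norm(h)
    # 2-order streams: dominant left singular vector of [h_i, h_j].
    pairs = list(combinations(range(K), 2))
    for i, j in pairs:
        u = dominant_left_singular_vector(H[:, [i, j]])
        precoders[(i + 1, j + 1)] = np.sqrt(a2 * P_t / len(pairs)) * u
    # 3-order stream: dominant left singular vector of [h_1, h_2, h_3].
    u_all = dominant_left_singular_vector(H)
    precoders[tuple(range(1, K + 1))] = np.sqrt(a3 * P_t) * u_all
    return precoders

# Hypothetical example: N_t = 4, three users, P_t = 100 (20 dB, unit noise).
rng = np.random.default_rng(1)
H = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
precoders = init_three_user_rs(H, 100.0, (0.4, 0.3, 0.3))
total_power = sum(np.linalg.norm(p) ** 2 for p in precoders.values())

# Setting alpha_2 = 0 leaves only the private and 3-order streams (1-layer RS).
one_layer = init_three_user_rs(H, 100.0, (0.6, 0.0, 0.4))
```

The dictionary keys list the users that decode each stream, so `(1, 2)` denotes the 2-order stream intended for user-1 and user-2.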
We first consider an underloaded scenario. The BS is equipped with four transmit antennas ($N_t = 4$) and serves three single-antenna users in all simulations. The individual rate constraint is set to 0, $R_k^{th} = 0, \forall k \in \{1, 2, 3\}$. The channels of the users are realized as [...]. $\gamma_1$, $\gamma_2$ and $\theta_1$, $\theta_2$ are control variables as discussed in the two-user case. For a given set of $\gamma_1$ and $\gamma_2$, $\theta_1$ takes values in $\{\pi/9, 2\pi/9, \pi/3, 4\pi/9\}$ and $\theta_2 = 2\theta_1$. When $\theta_1 = \pi/9$ and $\theta_2 = 2\pi/9$, the channels of user-1 and user-2, as well as those of user-2 and user-3, are sufficiently aligned. When $\theta_1 = 4\pi/9$ and $\theta_2 = 8\pi/9$, the channels of user-1 and user-2, as well as those of user-2 and user-3, are sufficiently orthogonal. We consider SNRs within the range 0 to 30 dB. We assume the sum of the weights allocated to the users is equal to one, i.e., $u_1 + u_2 + u_3 = 1$.

Figures 13 and 14 show the results when the weight vectors are $\mathbf{u} = [0.2, 0.3, 0.5]$ and $\mathbf{u} = [0.4, 0.3, 0.3]$, respectively. In both figures, $\gamma_1 = 1$ and $\gamma_2 = 0.3$. There is a 5 dB channel gain difference between user-1 and user-3 as well as between user-2 and user-3. In all scenarios and SNRs, RS outperforms MU-LP and SC-SIC. Compared with Fig. 14, the WSR improvement of RS is more explicit in Fig. 13. This implies that RS provides better enhancement of system throughput and user fairness. The performance of SC-SIC is the worst in most subfigures. This is due to the underloaded user deployments where $N_t > K$. One of the three users is required to decode all the messages, and all the spatial multiplexing gains are sacrificed. Therefore, the sum DoF of SC-SIC is reduced to 1, resulting in the deteriorated performance of SC-SIC in underloaded scenarios.

Fig. 13 Weighted sum rate versus SNR comparison of different strategies for underloaded three-user deployment with perfect CSIT, $\gamma_1 = 1$, $\gamma_2 = 0.3$, $u_1 = 0.2$, $u_2 = 0.3$, $u_3 = 0.5$, $N_t = 4$, $R_k^{th} = 0$, $k \in \{1, 2, 3\}$. (a) $\theta_1 = \pi/9$, $\theta_2 = 2\pi/9$. (b) $\theta_1 = 2\pi/9$, $\theta_2 = 4\pi/9$. (c) $\theta_1 = \pi/3$, $\theta_2 = 2\pi/3$. (d) $\theta_1 = 4\pi/9$, $\theta_2 = 8\pi/9$

Fig. 14 Weighted sum rate versus SNR comparison of different strategies for underloaded three-user deployment with perfect CSIT [...]. (a) $\theta_1 = \pi/9$, $\theta_2 = 2\pi/9$. (b) $\theta_1 = 2\pi/9$, $\theta_2 = 4\pi/9$. (c) $\theta_1 = \pi/3$, $\theta_2 = 2\pi/3$. (d) $\theta_1 = 4\pi/9$, $\theta_2 = 8\pi/9$

In comparison, the performance of MU-LP is better than that of SC-SIC except in Fig. 14a. MU-LP is more likely to serve the users with higher weights and channel gains by turning off the users with low weights and channel gains when there are no individual rate constraints. It cannot deal efficiently with user fairness when a higher weight is allocated to the user with weaker channel strength. In contrast, SC-SIC works better when user fairness is considered. The WSR achieved by low-complexity 1-layer RS is equal to or larger than that of MU-LP and SC-SIC in most subfigures. Compared with SC-SIC and MU-LP, 1-layer RS is more robust to different user deployments, and only a single SIC is required at each user. Moreover, the WSR of 1-layer RS approaches that of RS in all user deployments. Considering the trade-off between performance and complexity, 1-layer RS is a good alternative to RS.

In all three-user deployments of SC-SIC, the decoding order is required to be optimized together with the precoder. To investigate the influence of different decoding orders, we compare the WSRs of SC-SIC using different decoding orders when $u_1 = 0.2$, $u_2 = 0.3$, and $u_3 = 0.5$. In Fig. 15, the WSRs of six different decoding orders are illustrated for the case where there is a 5 dB channel gain difference between user-1/2 and user-3. When $\gamma_1 = 1$ and $\gamma_2 = 0.3$, it is typical to decode the message of user-3 first as the channel gain of user-3 is the worst. However, we notice that the optimal decoding order in Fig. 15 is order 3, in which user-1 is decoded first. This is due to the smallest weight being allocated to user-1, $u_1 = 0.2$. It implies that the weights assigned to the users affect the optimal decoding order. The scheduler complexity of SC-SIC therefore becomes extremely high when the optimal decoding order has to be found. In contrast, 1-layer RS has a much lower scheduling complexity and does not rely on any user ordering at the transmitter. Moreover, it only requires a single SIC at each receiver.

Fig. 15 Weighted sum rate versus SNR comparison of different decoding orders of SC-SIC for underloaded three-user deployment with perfect CSIT, $\gamma_1 = 1$, $\gamma_2 = 0.3$, $u_1 = 0.2$, $u_2 = 0.3$, $u_3 = 0.5$, $N_t = 4$, $R_k^{th} = 0$, $k \in \{1, 2, 3\}$. (a) $\theta_1 = \pi/9$, $\theta_2 = 2\pi/9$. (b) $\theta_1 = 2\pi/9$, $\theta_2 = 4\pi/9$. (c) $\theta_1 = \pi/3$, $\theta_2 = 2\pi/3$. (d) $\theta_1 = 4\pi/9$, $\theta_2 = 8\pi/9$

More results of underloaded three-user deployments with perfect CSIT and imperfect CSIT are given in Appendices 3 and 5, respectively. The WSRs of different strategies for varied SNR, $N_t$, $\gamma_1$, $\gamma_2$, and $\mathbf{u}$ are illustrated. In all figures, RS outperforms SC-SIC and MU-LP. Though the scheduler and receiver complexity of 1-layer RS is low, it achieves equal or better performance than SC-SIC and MU-LP in most figures with perfect CSIT and all figures with imperfect CSIT. All forms of RS are robust to a wide range of CSIT inaccuracy, channel gain difference, and channel angles among users.

Two transmit antenna deployment
We first consider an overloaded scenario where the BS is equipped with two antennas ($N_t = 2$) and serves three single-antenna users. The channel realizations and beamforming initialization follow the methods used in the underloaded three-user deployment. The channels of the users are realized as $\mathbf{h}_1 = [1, 1]^H$, $\mathbf{h}_2 = \gamma_1 \times [1, e^{j\theta_1}]^H$, and $\mathbf{h}_3 = \gamma_2 \times [1, e^{j\theta_2}]^H$. In overloaded scenarios, to guarantee some QoS, we add individual rate constraints to the users, as the system otherwise has a tendency to turn off some users. In all simulations of the two transmit antenna deployment, we assume the rate threshold of each user is equal, $R_1^{th} = R_2^{th} = R_3^{th}$. Since the BS is able to serve users with higher QoS requirements as SNR increases, the rate threshold is assumed to increase with SNR. The rate threshold increases as $\mathbf{r}_{th} = [0.02, 0.08, 0.19, 0.3, 0.4, 0.4, 0.4]$ bit/s/Hz for SNR $= [0, 5, 10, 15, 20, 25, 30]$ dB.
We compare the performance of RS, 1-layer RS, SC-SIC, MU-LP, and SC-SIC per group in the overloaded three-user deployment. In SC-SIC per group, we consider a fixed grouping method. We assume user-1 is in group 1 while user-2 and user-3 are in group 2. The decoding order is optimized together with the precoder. The beamforming initialization of SC-SIC per group is different from that of SC-SIC. In group 1, the precoder of user-1 is initialized based on MRT. In group 2, the precoder of the user decoded first $\mathbf{p}_{\pi(1)}$ is initialized as $\mathbf{p}_{\pi(1)} = p_{\pi(1)}\mathbf{u}_{\pi(1)}$ [...]

Fig. 16 [...] bit/s/Hz. (a) $\theta_1 = \pi/9$, $\theta_2 = 2\pi/9$. (b) $\theta_1 = 2\pi/9$, $\theta_2 = 4\pi/9$. (c) $\theta_1 = \pi/3$, $\theta_2 = 2\pi/3$. (d) $\theta_1 = 4\pi/9$, $\theta_2 = 8\pi/9$

[...] constraints are not zero and $N_t < K$, MU-LP cannot coordinate the multi-user interference coming from all the users served simultaneously. When the angles between the channels are large enough (subfigures c and d of Fig. 16), the WSR of SC-SIC per group is better than that of SC-SIC. This is due to its ability to combine treating interference as noise (to tackle inter-group interference) with decoding interference (to tackle intra-group interference). However, as the angles between the channels decrease, the performance of SC-SIC becomes better while that of SC-SIC per group becomes worse. Whether SC-SIC outperforms SC-SIC per group depends on SNR and user deployments. To ensure the WSR of the NOMA system is maximized, a joint optimization of NOMA strategies based on switching between SC-SIC and SC-SIC per group, on top of deciding the user grouping and user ordering, is required. Such a switching method has high scheduler and receiver complexity, while its achieved performance is still lower than that of the simple 1-layer RS in most user deployments.

Single transmit antenna deployment
In a SISO BC, there is no need to split the messages into common and private parts since the capacity region is achieved by SC-SIC. Nevertheless, in view of the benefit of 1-layer RS in the MISO BC, we may wonder whether RS can be of any help in a SISO BC, especially when it comes to reducing the complexity of the receivers and the number of SIC layers needed. We therefore compare the performance of 1-layer RS with SC-SIC in a 3-user SISO BC. We note that SC-SIC requires two layers of SIC while 1-layer RS requires a single SIC for all users. The channel of each user $h_k$ has an i.i.d. complex Gaussian entry with a certain variance, i.e., $\mathcal{CN}(0, \sigma_k^2)$. Figure 17 shows the average WSRs of different strategies over ten random channel realizations when $\sigma_1^2 = 1$, $\sigma_2^2 = 0.3$, and $\sigma_3^2 = 0.1$. 1-layer RS is able to achieve a performance very close to that of SC-SIC. Compared with SC-SIC, the complexity of 1-layer RS is much reduced. There is no ordering issue at the BS, and only one SIC is required at each user. Jointly considering the performance and complexity of the system, 1-layer RS is an attractive alternative to SC-SIC.
More results of overloaded three-user deployments with perfect CSIT and imperfect CSIT are given in Appendices 4 and 6, respectively. The WSRs of different strategies for varied SNR, N t , γ 1 , γ 2 , and u are illustrated. We further show that RS exhibits a clear WSR gain over SC-SIC, SC-SIC per group, and MU-LP in all simulated channels and weights. One-layer RS outperforms SC-SIC, SC-SIC per group and MU-LP in most simulated scenarios. It is more robust and achieves a nearly equivalent WSR to that of RS in all user deployments. We also show that 1-layer RS achieves near optimal performance in various channel conditions of SISO BC.

Overloaded four-user deployment with perfect CSIT
We further investigate the four-user system model shown in Fig. 4, where user-1 and user-2 are in group 1 while user-3 and user-4 are in group 2. We compare 2-layer HRS, 1-layer RS per group, 1-layer RS, SC-SIC per group, and MU-LP. In 2-layer HRS, the intra-group interference is mitigated using the intra-group common streams $s_{12}$ and $s_{34}$, and the inter-group interference is mitigated using the inter-group common stream $s_{1234}$. One-layer RS and 1-layer RS per group are two special cases of 2-layer HRS. In 1-layer RS, all users are treated as a single group. Only the 4-order common stream $s_{1234}$ and the 1-order private streams are active, and no power is allocated to $s_{12}$ and $s_{34}$. In contrast, 1-layer RS per group only allocates power to the intra-group common streams $s_{12}$ and $s_{34}$ and the 1-order private streams. No power is allocated to the inter-group common stream $s_{1234}$. Users within each group are served using RS and users across groups are served using SDMA so as to mitigate the inter-group interference.
We consider an overloaded scenario. The BS is equipped with two antennas and serves four single-antenna users. The channels of the users are realized as [...]. $\gamma_1, \gamma_2, \gamma_3$ and $\theta_1, \theta_2, \theta_3$ are control variables. $\theta_1$ is the channel angle between user-1 and user-2. It is denoted as the intra-group angle of group 1. $\theta_2$ is the channel angle between user-1 and user-3. $\theta_2 - \theta_1$ is the channel angle between user-2 and user-3, denoted as the inter-group angle. $\theta_3$ is the channel angle between user-1 and user-4. $\theta_3 - \theta_2$ is the channel angle between user-3 and user-4. It is the intra-group angle of group 2. In the following, we assume the intra-group angle of group 1 is the same as that of group 2, so that $\theta_3 = \theta_1 + \theta_2$. In each figure, the intra-group angle is varied as $\theta_1 \in \{0, \pi/18, \pi/9, \pi/6\}$. The individual rate constraint is set to $\mathbf{r}_{th} = [0.03, 0.1, 0.2, 0.3, 0.4, 0.4, 0.4]$ bit/s/Hz for SNR $= [0, 5, 10, 15, 20, 25, 30]$ dB. The weights of the users are assumed to be equal, i.e., $u_1 = u_2 = u_3 = u_4 = 0.25$. We also assume the channel gain difference within each group is equal. The channel gain of user-3 is equal to that of user-1 ($\gamma_2 = 1$), and the channel gain of user-4 is equal to that of user-2 ($\gamma_3 = \gamma_1$).

Figures 18 and 19 show the results when $\gamma_1 = 0.3$. The inter-group angles are $\pi/9$ and $\pi/3$, respectively. The WSR achieved by 2-layer HRS is equal to that of 1-layer RS in both figures, which means that 2-layer HRS reduces to 1-layer RS in these user deployments. Two-layer HRS and 1-layer RS outperform all other schemes. The inter-group and intra-group interference can be jointly mitigated by a single layer of common message. As the inter-group angle increases, the WSR gap between 2-layer HRS and 1-layer RS per group reduces. The inter-group interference can be coordinated by SDMA when the inter-group angle is sufficiently large. One-layer RS per group has the same WSR as SC-SIC per group in both figures. It reduces to SC-SIC per group because SC-SIC is more suitable when the intra-group angle is sufficiently small and the channel gain difference between users within each group is sufficiently large.
More results of overloaded four-user deployments with perfect CSIT are given in Appendix 7. The WSRs of different strategies when there is no channel gain difference (γ 1 = 1) are illustrated. We further show that 2-layer HRS, 1-layer RS, and 1-layer RS per group achieve equal or better performance than SC-SIC per group and MU-LP in all simulated channel conditions.

Overloaded ten-user deployment with perfect CSIT
We further consider an extremely overloaded scenario subject to QoS constraints. The BS is equipped with two antennas ($N_t = 2$) and serves ten users. The channel of each user $\mathbf{h}_k$ has i.i.d. complex Gaussian entries with a certain variance, i.e., $\mathcal{CN}(0, \sigma_k^2)$. The rate of each user is averaged over the 10 randomly generated channels. We compare 1-layer RS, MU-LP, multicast, and SC-SIC with a certain decoding order. There are $10!$ different decoding orders of SC-SIC in the ten-user case, so finding the optimal decoding order of SC-SIC is intractable. In the following simulations, only the decoding order based on the ascending channel gain is considered for the WSR calculation of SC-SIC. It is the optimal decoding order in the SISO BC. Multicast can be regarded as a special case of 1-layer RS in which only the 10-order stream is transmitted to all users. The weight of each user is assumed to be equal to 1.

Figure 20 shows the WSRs of different strategies when $\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_{10}^2 = 1$ and $\mathbf{r}_{th} = [0.01, 0.03, 0.05, 0.1, 0.1, 0.1, 0.1]$ bit/s/Hz. The WSR achieved by the multicast scheme is the worst. In such an overloaded user deployment, the spectral efficiency of multicast is low as it is difficult for a single beamformer to satisfy all users. Under the rate constraint $\mathbf{r}_{th}$, the WSR of SC-SIC is better than that of MU-LP while the slopes of the WSRs are the same for large SNRs. This implies that SC-SIC and MU-LP achieve the same DoF of 1. In contrast, 1-layer RS shows an obvious WSR improvement over all other strategies and exhibits a DoF of 2. This highlights that RS exploits the maximum DoF of the considered deployment (which is limited to two, given the two transmit antennas).

To further investigate the reason behind the results, we focus on one random channel realization. The rates achieved by all strategies when SNR = 30 dB are compared in Fig. 21. The optimized common rate vector of 1-layer RS is $\mathbf{c} = [0, 0.1, 0.1, 0.1, 0, 0.1, 0.1, 0.1, 0.1, 0.1]$ bit/s/Hz. No common rate is allocated to user-1 and user-5. However, in Fig. 21, we observe that the rates allocated to user-1 and user-5 are the highest. This implies that RS uses the common message to pack the messages of eight users and uses the two transmit antennas to deliver private messages to user-1 and user-5. RS thus achieves a sum-DoF of 2 in the overloaded regime. In contrast, MU-LP and SC-SIC allocate most of the power to a single user. The rate achieved by user-5 when using MU-LP and the rate achieved by user-10 when using SC-SIC are much higher than those of the other users in Fig. 21. The DoFs achieved by MU-LP and SC-SIC are limited to 1 in such circumstances.
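The DoF values quoted above are read from the high-SNR slope of the WSR curves, since the DoF is the prelog factor of the rate (see note 3). A minimal sketch of this slope estimate is given below. The rate curves are synthetic placeholders of the form $d \log_2(1 + \mathrm{SNR})$ and are not simulation results; the function name `estimate_dof` is hypothetical.

```python
import numpy as np

def estimate_dof(snr_db, rate):
    """Estimate the DoF as the slope of rate (bit/s/Hz) versus log2(SNR),
    using the two highest SNR points."""
    snr_db = np.asarray(snr_db, dtype=float)
    rate = np.asarray(rate, dtype=float)
    d_rate = rate[-1] - rate[-2]
    d_log2_snr = (snr_db[-1] - snr_db[-2]) / 10.0 * np.log2(10.0)
    return d_rate / d_log2_snr

# SNR grid used in the results; the rate curves below are synthetic.
snr_db = np.array([0, 5, 10, 15, 20, 25, 30], dtype=float)
snr = 10.0 ** (snr_db / 10.0)
rate_two_streams = 2.0 * np.log2(1.0 + snr)  # placeholder curve with DoF 2
rate_one_stream = np.log2(1.0 + snr)         # placeholder curve with DoF 1

dof_two = estimate_dof(snr_db, rate_two_streams)
dof_one = estimate_dof(snr_db, rate_one_stream)
```

Applied to simulated WSR curves, the same estimate would distinguish a slope of roughly two (as reported for 1-layer RS) from a slope of roughly one (as reported for MU-LP and SC-SIC under rate constraints).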
Note that the results here show the usefulness of the RS framework for massive IoT or MTC services. Those devices are typically cheap. In the example above, user-1 and user-5 could be high-end devices, for which RS would be implemented. Those devices would therefore perform SIC. All other devices could be IoT or MTC devices, which would need to implement neither RS nor SIC, but simply decode the common message. Hence, the RS framework can be used to pack the IoT/MTC traffic in the common message.

Fig. 19 Weighted sum rate versus SNR comparison of different strategies for overloaded four-user deployment with perfect CSIT, $\gamma_1 = 0.3$, [...]
More results of overloaded ten-user deployments with perfect CSIT are given in Appendix 8. We further show that when the rate threshold of each user is 0, MU-LP is able to achieve a DoF of 2. However, as the rate threshold increases, MU-LP cannot coordinate the inter-user interference and its achieved DoF drops to 1. In the extremely overloaded scenario, the WSR gap between RS and SC-SIC is still large. SC-SIC makes inefficient use of the transmit antennas and achieves a DoF of 1.

Conclusions
To conclude, we propose a new multiple access scheme called rate-splitting multiple access (RSMA). We compare the proposed RSMA with SDMA and NOMA by solving the problem of maximizing the WSR in MISO-BC systems with QoS constraints. Both perfect and imperfect CSIT are investigated. WMMSE and its modified algorithms are adopted to solve the respective optimization problems. We show that SDMA and NOMA are subject to many limitations, including high system complexity and a lack of robustness to user deployments, network load, and CSIT inaccuracy. We propose a general multiple access framework based on rate splitting (RS), where the common symbols decoded by different groups of users are transmitted on top of private symbols decoded by the corresponding users only. Thanks to its ability to partially decode interference and partially treat interference as noise, RSMA softly bridges and outperforms SDMA and NOMA in any user deployment, CSIT inaccuracy, and network load. The simplified RS forms, such as 1-layer RS and 2-layer HRS, show great potential to reduce the scheduler and receiver complexity while maintaining good and robust performance in any user deployment, CSIT inaccuracy, and network load. In particular, we show that 1-layer RS is an attractive alternative to SC-SIC in a SISO BC deployment due to its near-optimal performance and very low complexity. Therefore, RSMA is a more general and powerful multiple access scheme for downlink multi-antenna systems that encompasses SDMA and NOMA as special cases. RSMA has the potential to change the design of the physical layer and MAC layer of next-generation communication systems by unifying existing approaches and relying on a superposed transmission of common and private messages.
Many interesting problems are left for future research, including among others the role played by RSMA in achieving the fundamental limits of broadcast, interference, and relay channels in the presence of imperfect CSIT and disparity of channel strengths, optimization (robust design, sum-rate maximization, max-min fairness, QoS constraints) of RSMA, performance analysis of RSMA, RSMA design for multi-user/massive/millimeter-wave/multi-cell/network MIMO, modulation and coding for RSMA, RSMA with multi-carrier transmissions, RSMA with linear versus nonlinear precoding, resource allocation and cross-layer design of RSMA, security provisioning in RSMA, RSMA design for cellular and satellite communication networks, prototyping and experimentation of RSMA, and standardization issues (link/system-level evaluations, receiver implementation, transmission schemes/modes, CSI feedback mechanisms, and downlink and uplink signaling) of RSMA.

1 In the sequel, power-domain NOMA will be referred to simply as NOMA.
2 Recall that SU-MIMO in LTE Rel. 8 was designed with minimum mean square error-SIC (MMSE-SIC) in mind [45].
3 The DoF characterizes the number of interference-free streams that can be transmitted or, equivalently, the prelog factor of the rate at high SNR.
4 This can be easily seen since, for the receiver forced to decode all streams, the model reduces to a multiple access channel (MAC) with a single-antenna receiver, which has a sum-DoF of 1. This was discussed at length in [34].
5 Recall that this spatial multiplexing gain is the main driver for using multiple antennas in a multi-user setup and for the introduction of MU-MIMO in 4G [18].
6 "Common" is sometimes referred to as "public."
7 This also contrasts with NOMA, for which the usefulness of SC-SIC in a BC has been known for several decades [7,8].
8 Note that in the specific case where we have finite precision CSIT, the sum DoF collapses to 1 [26], and RS, SC-SIC, and TDMA all achieve the same optimal DoF.
9 It is worth noting that Rate-Splitting Multiple Access (RSMA) also exists in the uplink for the SISO Multiple Access Channel [46]. Though they share the same name and the splitting of the messages, they have different motivations and structures.
10 As already explained in [12], RS can also be seen as a form of non-orthogonal multi-user transmission. Indeed, in its simplest form, the common message in RS can be seen as a non-orthogonal layer added onto the private layers.
11 This benefit of RS was briefly pointed out in [39].
12 Note that OMA (single-user beamforming) is a subset of MU-LP and is obtained by allocating power exclusively to $s_1$ or $s_2$.
13 Note that for a given $\theta$, the users' directions of arrival (DoA) are the same for the $N_t = 2$ and $N_t = 4$ scenarios, while the channels are more orthogonal when $N_t = 4$ than when $N_t = 2$.
14 The readers are referred to [28] for a rigorous discussion about the notion of average rate.

Underloaded three-user deployment with imperfect CSIT
We consider the imperfect CSIT scenarios. The channel model in the two-user deployment with imperfect CSIT is extended here. The estimated channels of user-1, user-2, and user-3 are initialized using Eq. (38). For the given channel estimate at the BS, the channel realization is $\mathbf{h}_k = \hat{\mathbf{h}}_k + \tilde{\mathbf{h}}_k$, $\forall k \in \{1, 2, 3\}$, where $\hat{\mathbf{h}}_k$ is the estimated channel and $\tilde{\mathbf{h}}_k$ is the estimation error of user-k. $\tilde{\mathbf{h}}_k$ has i.i.d. complex Gaussian entries drawn from $\mathcal{CN}(0, \sigma_{e,k}^2)$. The error variances of user-1, user-2, and user-3 are $\sigma_{e,1}^2 = P_t^{-0.6}$, $\sigma_{e,2}^2 = \gamma_1 P_t^{-0.6}$, and $\sigma_{e,3}^2 = \gamma_2 P_t^{-0.6}$, respectively. The precoders are initialized and designed using the estimated channels $\hat{\mathbf{h}}_1$, $\hat{\mathbf{h}}_2$, and $\hat{\mathbf{h}}_3$ and the same methods as stated in the perfect CSIT scenarios. One thousand different channel error samples are generated for each user. Each point in the rate region is the average rate over the generated 1000 channels.
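The sampling step described above can be sketched as follows. This is a minimal illustration rather than the simulation code used for the results: the function names, the fixed seed, and the assumption that the transmit power is given as a linear value (not in dB) are all introduced here for illustration. The error variance is split evenly between real and imaginary parts so that each entry is circularly symmetric.

```python
import math
import random


def error_variances(p_t, gamma1, gamma2, alpha=0.6):
    """Error variances of user-1, user-2, user-3: P_t^-alpha scaled by 1, gamma1, gamma2.

    p_t is assumed to be the transmit power in linear scale (hypothetical convention).
    """
    base = p_t ** (-alpha)
    return [base, gamma1 * base, gamma2 * base]


def sample_error(n_t, sigma2_e, rng):
    """Draw one error vector with i.i.d. CN(0, sigma2_e) entries."""
    std = math.sqrt(sigma2_e / 2.0)  # half of the variance per real dimension
    return [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n_t)]


def channel_realizations(h_hat, sigma2_e, n_samples=1000, seed=0):
    """Return n_samples realizations h = h_hat + error for one user."""
    rng = random.Random(seed)
    realizations = []
    for _ in range(n_samples):
        err = sample_error(len(h_hat), sigma2_e, rng)
        realizations.append([est + e for est, e in zip(h_hat, err)])
    return realizations


# Example: 20 dB transmit power, hypothetical gamma values and channel estimate.
variances = error_variances(p_t=10 ** (20 / 10), gamma1=1.0, gamma2=0.3)
user1_channels = channel_realizations([1 + 0j, 0.5 - 0.5j], variances[0])
```

The rate of a given precoder would then be evaluated on each realization and averaged over the 1000 samples to obtain one point of the rate region.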

Fig. 60
Weighted sum rate versus SNR comparison of different strategies for overloaded ten-user deployment with perfect CSIT, $\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_{10}^2 = 1$, $N_t = 2$, SNR = 30 dB, $r_{th}$ = [0, 0.001, 0.004, 0.01, 0.03, 0.06, 0.1] bit/s/Hz

... achieves equal or better WSR. One-layer RS per group is more general than SC-SIC per group. It allows interference to be partially decoded and partially treated as noise within each user group. When there is a sufficient channel gain difference between users within each group and a sufficient inter-group angle, the WSR of SC-SIC per group approaches the WSR of RS, as seen by comparing Figs. 59 and 19.

Appendix 8
Overloaded ten-user deployment with perfect CSIT
Figure 60 shows the simulation results when $\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_{10}^2 = 1$ and $r_{th}$ = [0, 0.001, 0.004, 0.01, 0.03, 0.06, 0.1] bit/s/Hz. Compared with Fig. 21, the rate threshold at each SNR is reduced in Fig. 60. The WSR achieved by MU-LP approaches that of RS when the SNR is 0 or 5 dB in Fig. 60. This is because the rate threshold is set to 0 when the SNR is 0 dB or 5 dB. When the rate threshold is 0, MU-LP can deliver two interference-free streams since there are two transmit antennas. It achieves a DoF of 2, while SC-SIC is always limited to a DoF of 1. Figure 61 shows the simulation results when $\sigma_1^2 = 1, \sigma_2^2 = 0.9, \ldots, \sigma_{10}^2 = 0.1$. The rate threshold is the same as in Fig. 60. In this extremely overloaded scenario, the WSR gap between RS and SC-SIC is still large despite the diversity in channel strengths. Here again, SC-SIC makes inefficient use of the transmit antennas and achieves a DoF of 1. In contrast, 1-layer RS, with a low scheduler and receiver complexity, achieves good performance across all network loads.
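The statement that two transmit antennas allow two interference-free streams can be illustrated with a small zero-forcing sketch. Note that zero-forcing is used here only as a simple stand-in: the MU-LP precoders in this paper are optimized with WMMSE, not zero-forcing. The function names and the example channels are hypothetical, and the received signal is assumed to follow the convention $y_k = \mathbf{h}_k^H \mathbf{x}$.

```python
import math


def inner(h, p):
    """Effective scalar channel h^H p seen by a single-antenna receiver."""
    return sum(hi.conjugate() * pi for hi, pi in zip(h, p))


def null_direction(h):
    """Unit-norm vector v with h^H v = 0 for a two-antenna channel h = [a, b]."""
    a, b = h
    v = [b.conjugate(), -a.conjugate()]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def zf_precoders(h1, h2):
    """Zero-forcing precoders: each user's stream is nulled at the other user."""
    p1 = null_direction(h2)  # stream 1 causes no interference at user 2
    p2 = null_direction(h1)  # stream 2 causes no interference at user 1
    return p1, p2


# Hypothetical two-user channels with N_t = 2.
h1 = [1 + 1j, 2 + 0j]
h2 = [0.5 + 0j, -1j]
p1, p2 = zf_precoders(h1, h2)
leak_at_user1 = abs(inner(h1, p2))  # interference power amplitude at user 1
leak_at_user2 = abs(inner(h2, p1))  # interference power amplitude at user 2
```

With linearly independent channels, both leakage terms vanish while the desired terms `inner(h1, p1)` and `inner(h2, p2)` stay non-zero, so each of the two served users sees an interference-free stream. A receiver forced to decode every stream, as in SC-SIC, cannot exploit this spatial separation.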