This article studies the problem of limited feedback design for heterogeneous multiuser (MU) transmissions over time- and frequency-selective (doubly selective) multiple-input multiple-output downlink channels. Under a doubly selective propagation condition, a basis expansion model (BEM) is deployed as a fitting parametric model for capturing the time-variation of the MU downlink channels and for reducing the number of channel parameters. The resulting dimension reduction in the time-variant channel representation, in turn, translates into a reduced feedback load of channel state information (CSI) to the base station (BS). To produce limited feedback information, vector quantization of the BEM coefficients is performed at mobile terminals under the assumption that perfect BEM coefficient estimation has been established by existing algorithms. Then, the output indices of the quantized BEM coefficient vectors are sent to the BS via error-free, zero-latency feedback links. To assess the feasibility of using the BEM-based limited feedback design in a MU network with an arbitrary number of active users, the resultant sum-rate performance of the network is evaluated by employing block-diagonalization precoding and greedy scheduling at the BS. The numerical results show that the BEM-based limited feedback scheme significantly alleviates the detrimental effect of outdated CSI feedback, which is likely to occur when the conventional block-fading assumption is used in MU transmissions over (fast) time-varying channels.
Keywords: limited feedback; greedy scheduling; block-diagonalization precoding; vector quantization; basis expansion model (BEM); time- and frequency-selective channels; heterogeneous multiuser network
Besides the well-known time, frequency and code divisions in wireless communications, spatial separation has recently been recognized as a new signal dimension for further system performance enhancement, especially in multiuser (MU) transmissions. In the so-called spatial division multiple access (SDMA), the use of multi-antenna arrays allows the base station (BS) to simultaneously transmit multiple data streams to multiple users by exploiting the new signal dimension [1,2]. Among several MU transmission techniques, it is well known that dirty paper coding (DPC) is an optimal MU encoding strategy, whose performance achieves the capacity limit of MU broadcast channels. However, the optimal performance of DPC comes at the cost of impractically high complexity (in a large user pool). As an alternative low-complexity linear technique, block-diagonalization (BD) precoding [4,5] is a suboptimal MU encoding scheme with a realizable implementation.
In the literature, most existing precoding techniques [2,4-6] assume MU downlink channels to be homogeneous and time-invariant within a transmission block/burst (i.e., the block-fading assumption). However, in a MU network with rapidly moving nodes (e.g., users in cars/trains in long-term evolution (LTE) systems), the resultant time-selectivity of the channel impulse response (CIR) introduces a large number of channel parameters. This induces a very high channel state information (CSI) feedback load for precoding and scheduling processes with consideration of time-varying channels at the BS. In addition, the presence of time-selective channels gives rise to the problem of outdated CSI feedback, which could severely degrade the system performance. To deal with such channels, [8,9] proposed a minimum mean squared error-based beamforming algorithm for homogeneous MU transmissions over multiple-input single-output, spatially correlated, frequency-flat, time-selective channels. Specifically, the existing technique uses full feedback of channel distribution information and an iterative beamforming process to provide stable MU transmissions over the channels.
Unlike [8,9], this paper is concerned with limited CSI feedback design for BD precoding and greedy scheduling over spatially uncorrelated, doubly selective, multiple-input multiple-output downlink channels with heterogeneous users (i.e., mobile terminals with different numbers of receive antennas and different receiver noise powers). Over the doubly selective channels, a basis expansion model (BEM) [10,11] is used as a fitting parametric model for capturing the time-variation of the channels and for reducing the number of the channel parameters. Specifically, to generate limited feedback information, vector quantization (VQ) of the BEM coefficients is performed at mobile terminals under the assumption that perfect BEM coefficient estimation has been established by existing algorithms. Then, the output indices of the quantized BEM coefficient vectors are sent to the BS via feedback links. To investigate the performance of the limited feedback scheme in a MU network with an arbitrary number of mobile terminals, BD precoding and greedy scheduling are deployed accordingly in the MU network.
The rest of the paper is organized as follows. Section 2 delineates the system and channel models. The suggested BEM-based limited feedback for BD-based heterogeneous MU transmissions is presented in Section 3. Simulation results and relevant discussions are located in Section 4. Finally, Section 5 provides some concluding remarks.
Notations: (X)^T and (X)^H denote the transpose and conjugate transpose (Hermitian operator) of the matrix X, respectively. stands for expectation operator. tr(X), |X|, and ||X|| denote the trace, determinant and Frobenius norm of the matrix X, respectively.
2. System Formulation
A. Transmitted signal model
Consider a heterogeneous MU MIMO LTE downlink channel, where the BS is equipped with Nt transmit antennas, and different terminals have different numbers of receive antennas and different signal-to-noise ratios (SNRs). Orthogonal frequency division multiplexing (OFDM) modulation with N-point fast Fourier transform (FFT) is employed for the downlink multi-carrier transmission. After inverse FFT (IFFT) and cyclic prefix (CP) insertion, the transmitted baseband signal of the mth OFDM symbol at the pth transmit antenna can be written as
B. Doubly selective channel model
In this article, for each pair of the pth transmit antenna (at BS) and the ru th receive antenna of the uth user (having Ru Rx-antennas), the lth (time-variant) channel tap gain that includes the effect of transmit-receive filters and doubly selective propagation is denoted by , where n and m denote the time and OFDM symbol indices, respectively. In the considered downlink channels, a BEM  is employed to capture the time-variation of the channels. With the aid of the BEM, the lth time-variant channel tap gain between the pth transmit antenna and the ruth receive antenna of the uth user at the nth time instance in the mth OFDM symbol can be represented as 
where Ns = N + Ng denotes the OFDM symbol length after CP insertion and n = 0,..., Ns - 1. The mobile users' speeds are assumed to be unchanged within M OFDM symbols (in a duration of a number of LTE frames). L denotes the channel length. stand for the qth basis function values of the used BEM. are the BEM coefficients of the channel fitting. Q is the number of basis functions used in the basis expansion modeling.
In the simulation section of this article, the time-variant multipath channels are first generated by the modified Jakes model , and then fitted (approximated) by the DPS-BEM , i.e., using a linear combination of Q basis functions as shown in (2).
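The fitting step described above can be illustrated with a minimal sketch, under loudly stated assumptions: the block length (256 samples), the maximum normalized Doppler frequency (0.01), the number of basis functions (12) and the sum-of-sinusoids tap generator are toy choices, and the generator is only a stand-in for the modified Jakes model used in the article. The function names `dps_basis`, `fit_bem` and `toy_tap` are hypothetical. The discrete prolate spheroidal (DPS) sequences are computed here as the leading eigenvectors of the band-limiting sinc kernel.

```python
import numpy as np

def dps_basis(n_samples, nu_max, n_basis):
    """First n_basis DPS sequences of length n_samples, band-limited to
    [-nu_max, nu_max] in normalized frequency (cycles per sample)."""
    d = np.arange(n_samples)[:, None] - np.arange(n_samples)[None, :]
    # Kernel entries sin(2*pi*nu_max*d) / (pi*d), with 2*nu_max on the diagonal.
    kernel = 2.0 * nu_max * np.sinc(2.0 * nu_max * d)
    _, eigvecs = np.linalg.eigh(kernel)          # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_basis]         # keep the most concentrated ones

def fit_bem(tap, basis):
    """Least-squares BEM coefficients; the basis columns are orthonormal,
    so the fit reduces to a projection."""
    return basis.T @ tap

def toy_tap(n_samples, nu_max, n_paths=16, seed=0):
    """Toy time-variant tap gain: a sum of complex sinusoids whose Doppler
    shifts lie inside [-nu_max, nu_max] (assumption, not the modified Jakes model)."""
    rng = np.random.default_rng(seed)
    freqs = nu_max * np.cos(rng.uniform(0.0, 2.0 * np.pi, n_paths))
    phases = rng.uniform(0.0, 2.0 * np.pi, n_paths)
    n = np.arange(n_samples)
    return np.exp(1j * (2.0 * np.pi * np.outer(n, freqs) + phases)).sum(axis=1) / np.sqrt(n_paths)

basis = dps_basis(256, 0.01, 12)
tap = toy_tap(256, 0.01)
coeffs = fit_bem(tap, basis)                     # 12 numbers instead of 256 samples
nmse = np.sum(np.abs(tap - basis @ coeffs) ** 2) / np.sum(np.abs(tap) ** 2)
print(f"{coeffs.size} BEM coefficients, fitting NMSE = {nmse:.2e}")
```

In this toy setting, 256 time samples of one tap are summarized by 12 coefficients, which is the kind of dimension reduction the BEM is used for in the article.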
C. Received signal model
where is the additive white Gaussian noise with variance at the uth user. It is assumed that different terminals may experience different receiver noise powers in the considered heterogeneous MU system.
After performing FFT at users, the kth subcarrier in the mth OFDM symbol at the ruth receive antenna of the uth user can be determined by
In the considered MU network with an arbitrary number of users, precoding and scheduling are performed for each subcarrier in each OFDM symbol in a LTE frame. For the sake of notational simplicity, unless otherwise indicated, the indices of OFDM symbol m and subcarrier k can be omitted in the subsequent formulations. As a result, the kth received subcarriers at the uth user can be represented as
In (5), the ICI can be negligible since its power is much smaller than that of the subcarrier of interest under the considered LTE system settings  with a normalized Doppler frequency below 0.1. In particular, as shown in , the ICI power PICI is upper bounded by
where is the Doppler frequency of the channel, v is the mobile speed of terminals, c = 3×10^8 m/s is the speed of light, fc denotes the carrier frequency, Ts stands for the OFDM symbol duration and . For instance, under the LTE system settings  with N = 128, fc = 2 GHz, fs = N/Ts = 1.92 MHz and the mobile terminal speed of v = 400 km/h, the resulting ICI power is upper bounded by . Simulation results of the ICI-to-signal power ratio over the range of normalized Doppler frequencies smaller than 0.1 in Figure 1 are in good agreement with the upper bound of (6).
Figure 1. ICI-to-signal ratio versus normalized Doppler frequency.
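As a quick sanity check of the operating range discussed above, the following sketch computes the normalized Doppler frequency fDTs for the quoted settings (N = 128, fs = 1.92 MHz, fc = 2 GHz). It assumes the standard maximum Doppler shift fD = v·fc/c and Ts = N/fs; the function name `normalized_doppler` is hypothetical.

```python
C_LIGHT = 3e8  # speed of light in m/s, as quoted in the text

def normalized_doppler(speed_kmh, fc_hz=2e9, n_fft=128, fs_hz=1.92e6):
    """Normalized Doppler frequency fD*Ts, assuming fD = v*fc/c and Ts = N/fs."""
    v = speed_kmh / 3.6              # km/h -> m/s
    f_d = v * fc_hz / C_LIGHT        # maximum Doppler shift in Hz
    t_s = n_fft / fs_hz              # OFDM symbol duration (without CP) in s
    return f_d * t_s

for speed in (10, 200, 400):
    print(f"v = {speed:3d} km/h -> fD*Ts = {normalized_doppler(speed):.4f}")
```

Under these assumptions, even 400 km/h gives a normalized Doppler frequency of roughly 0.05, which stays below the 0.1 threshold mentioned above.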
3. Multiuser MIMO Transmission with BEM-Based Limited Feedback
In this section, a limited feedback design over time- and frequency-selective (doubly selective) channels is suggested to reduce the CSI feedback load and to alleviate the detrimental effect of outdated CSI feedback (which is likely to occur when the block-fading assumption is used in MU transmissions). More specifically, a BEM [10,11] is used as a fitting parametric model of the doubly selective channels. The use of a BEM considerably reduces the number of parameters needed to represent the time-variant channel.
Unlike [5,6], which use BD precoding in MU transmission for a fixed number of homogeneous users, this section applies BD precoding and greedy scheduling to a MU network with an arbitrary number of heterogeneous users (supporting various types of terminals with different numbers of receive antennas and different SNRs).
A. BEM-based limited feedback
In the limited feedback design, it is assumed that BEM coefficient estimation has been established at users (using existing algorithms [10,11,15]). VQ of the available BEM coefficient estimates is then performed using a predetermined Linde-Buzo-Gray (LBG) codebook. Owing to the possibly large number of BEM coefficients Q (e.g., in the presence of high mobile user speeds), a partition of the BEM coefficient vector helps to reduce the codebook pre-generation complexity and the codebook's cardinality under a required VQ distortion level. In particular, the partition can be expressed by
In the LBG codebook generation, for each resolvable path l, the LBG algorithm (using 10^5 training BEM coefficient vectors) is employed to pre-generate the following codebook:
As illustrated in Figure 2, it is numerically shown that the distributions of a BEM coefficient (e.g., ) in (2) under different mobile speeds are different. As a result, to attain low distortion in the VQ of the BEM coefficients, an LBG codebook G should be pre-generated for each possible target mobile speed using the LBG algorithm with training vectors of BEM coefficients corresponding to that speed. Then, for each mobile terminal with a known speed, the LBG codebook G with target speed closest to the known speed should be deployed accordingly for the VQ of BEM coefficients.
Figure 2. Histogram of amplitudes of a DPS-BEM coefficient under different mobile speeds and Q = 18.
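The codebook pre-generation and the quantization step can be sketched as follows. This is a minimal illustration of the LBG splitting procedure, not the article's implementation: the training set is synthetic complex Gaussian data standing in for BEM coefficient subvectors at one target speed, complex subvectors are stacked into real vectors, and the names `lbg_codebook`, `nearest_codeword` and `to_real` as well as the sizes (5000 training vectors, V = 2, B = 4 bits) are assumptions made for the example.

```python
import numpy as np

def to_real(vectors):
    """Stack real and imaginary parts so complex subvectors become real vectors."""
    return np.concatenate([vectors.real, vectors.imag], axis=-1)

def nearest_codeword(vectors, codebook):
    """Index of the closest codeword (squared Euclidean distance) for each vector."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def lbg_codebook(training, n_bits, n_iter=20, eps=1e-3):
    """Train a codebook with 2**n_bits codewords by LBG splitting."""
    codebook = training.mean(axis=0, keepdims=True)
    for _ in range(n_bits):
        # Split every codeword into a slightly perturbed pair.
        codebook = np.concatenate([codebook + eps, codebook - eps])
        for _ in range(n_iter):
            idx = nearest_codeword(training, codebook)
            for j in range(len(codebook)):
                members = training[idx == j]
                if len(members) > 0:         # empty cells keep their old codeword
                    codebook[j] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
# Synthetic stand-in for BEM coefficient subvectors of length V = 2.
training = to_real(rng.normal(size=(5000, 2)) + 1j * rng.normal(size=(5000, 2)))
codebook = lbg_codebook(training, n_bits=4)

# A terminal feeds back only the index; the BS looks the codeword up.
indices = nearest_codeword(training[:5], codebook)
reconstructed = codebook[indices]
print("feedback indices:", indices)
```

With one such codebook per target speed, a terminal would pick the codebook whose target speed is closest to its own speed estimate before running the nearest-codeword search.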
In practice, it is very difficult to estimate the actual speeds of mobile terminals exactly. In addition, the memory capacity in the receiver of each mobile terminal limits the number of pre-generated LBG codebooks G, corresponding to different target speeds, that can be pre-stored in the mobile terminal. Therefore, a mismatch between the actual mobile speed and the target speed of the used LBG codebook always exists in the VQ of BEM coefficients at mobile terminals. The effect of this speed mismatch on the performance of the considered MU network will be numerically investigated in Section 4.
Then, the indices of the quantized subvectors of the BEM coefficients are sent to the BS via error-free feedback links. Based on the knowledge of the feedback indices, the BS can determine the quantized versions of the BEM coefficients as follows:
where x = 1,..., Q/V and l = 0,..., L - 1.
As shown in (15), the channel response at each subcarrier in each OFDM symbol in the current LTE time slots can be determined using the quantized versions of the BEM coefficients . As aforementioned, these BEM coefficients are assumed to be perfectly estimated by existing BEM-based channel estimation algorithms [10,11] using pilot signals from the previous LTE time slots.
After having the BEM-based limited feedback information at the BS, the quantized versions of the user channel responses are naively treated as perfect CSI in the BD precoding and greedy scheduling processes as presented in the next subsections.
The use of a BEM significantly reduces the complexity of the quantization process of time-variant CSI. In particular, the number of doubly selective CIR parameters at each mobile terminal is RuNtLMN over the duration of M OFDM symbols, where Ru, Nt, L, and N denote the number of receive antennas, the number of transmit antennas, the channel length and the FFT size, respectively. As shown in Table 1, the BEM reduces this large number RuNtLMN to RuNtLQ, where Q is the number of basis functions (used in the BEM), which is much smaller than MN. As a result, using a BEM significantly reduces the CSI feedback load from each mobile terminal to the BS. The downside on both the BS and the terminals is the extra memory required to store the related basis function values. In this work, it is assumed that perfect estimation of BEM coefficients has been established. In particular, the estimation of BEM coefficients for current OFDM symbols can be performed by existing techniques [10,11,15] within the duration of previous OFDM symbols.
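To make the size of this reduction concrete, the short sketch below evaluates the two parameter counts and the per-antenna feedback load BΣ = NtL(Q/V)B for the simulation settings reported in Section 4 (Nt = 4, L = 5, M = 140, N = 128, Q = 18, V = 2). The function names are hypothetical, and Ru = 1 and B = 10 are chosen only as an example.

```python
def csi_parameter_counts(r_u, n_t, n_taps, n_symbols, n_fft, n_basis):
    """Number of CIR parameters without and with the BEM."""
    full = r_u * n_t * n_taps * n_symbols * n_fft   # Ru*Nt*L*M*N
    bem = r_u * n_t * n_taps * n_basis              # Ru*Nt*L*Q
    return full, bem

def feedback_bits_per_rx_antenna(n_t, n_taps, n_basis, subvector_len, bits_per_index):
    """Total feedback bits per receive antenna, Nt*L*(Q/V)*B."""
    return n_t * n_taps * (n_basis // subvector_len) * bits_per_index

full, bem = csi_parameter_counts(r_u=1, n_t=4, n_taps=5, n_symbols=140, n_fft=128, n_basis=18)
bits = feedback_bits_per_rx_antenna(n_t=4, n_taps=5, n_basis=18, subvector_len=2, bits_per_index=10)
print(f"CIR parameters: {full} without BEM, {bem} with BEM")
print(f"feedback load: {bits} bits per receive antenna for B = 10")
```

For a single-antenna terminal this corresponds to 358400 time-variant CIR parameters per LTE frame versus 360 BEM coefficients.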
Table 1. Implementation steps of the BEM-based limited feedback
B. BD precoding
With the quantized versions of CSI , this subsection applies the BD precoding process to the considered heterogeneous MU system (supporting various types of users with different numbers of receive antennas and different SNRs). In BD precoding , the inter-user interference in (5) can be eliminated by pre-multiplying Ru data subcarrier streams of the uth user with precoding matrices. Specifically, let and be the transmitted symbol vector and precoding matrix corresponding to subcarrier k in the mth OFDM symbol of the uth user, respectively. For a set of U (selected/scheduled) users, the transmitted subcarriers at the BS are
Using (5) and (16), the received subcarriers of the uth selected/scheduled user can be represented in a vector form as
To obtain (18), one can form the following matrix:
where u = 1,..., U.
To attain the zero-forcing constraint in (18), Wk,m,u must lie in a null space of under the following dimension condition . To obtain a basis set in the null space, the singular value decomposition of is determined as follows:
As shown in Appendix A, one can use (20) to deduce
With limited CSI feedback, the BS naively treats as perfect CSI and the condition (18) becomes for all u ≠ u' and 1 ≤ u, u' ≤ U. As a result, the BD precoding matrix Wk,m,u of the uth user can be chosen as follows:
In the absence of inter-user interference (after BD precoding), the resultant received signal at the uth user is
Let us define which are positive semidefinite matrices. With the quantized versions of CSI available at the BS, the achievable rate of the uth user can be determined in the BS's precoding/scheduling computation as follows:
The matrix Ck,m,u can be determined to maximize the system sum-rate or equivalently minimize -rk,m,Σ under the sum-power constraint. More specifically, the problem of finding Ck,m,u can be formulated as follows:
With the affine constraints and convex objective function, the above problem can be efficiently solved by using the Karush-Kuhn-Tucker (KKT) optimality conditions . For the sake of simplified computations, let be the effective channel for the uth user after precoding. The solution of Ck,m,u under the KKT conditions can be obtained using the eigen-decomposition of the related matrices. In particular, one can perform the following eigen-decomposition where Eu is a diagonal matrix containing the eigenvalues and . Then, it is straightforward to deduce that . As shown in Appendix B, the solution of Ck,m,u can be determined as follows:
where (x)+ = max(x, 0).
Applying the sum-power constraint and the trace property tr(ABC) = tr(BCA) to (27), the water-level γ can be determined by
Given a set of selected users, the above BD precoding process attempts to eliminate the inter-user interference and maximize the system sum-rate. As aforementioned, the feasibility of the suggested BEM-based limited feedback scheme will be investigated in a MU network with an arbitrary number of active users. In particular, the limited feedback links provide CSI to not only precoding but also scheduling at the BS. Under the use of sum-rate performance metric, scheduling is to perform user selection with a reasonable complexity for maximizing the system sum-rate. The considered scheduling technique will be addressed in detail in the next subsection.
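The null-space step of the BD precoding described in this subsection can be sketched numerically as follows. This is a minimal illustration for one subcarrier, assuming random complex Gaussian channel matrices in place of the quantized CSI; the function name `bd_null_space_precoders` and the tolerance are hypothetical, and the subsequent power loading is not included.

```python
import numpy as np

def bd_null_space_precoders(channels, tol=1e-10):
    """For each user, return an orthonormal basis of the null space of the
    stacked channels of all other users (one matrix of shape Nt x dim per user)."""
    n_t = channels[0].shape[1]
    precoders = []
    for u in range(len(channels)):
        others = [h for j, h in enumerate(channels) if j != u]
        if not others:                        # single user: no interference constraint
            precoders.append(np.eye(n_t, dtype=complex))
            continue
        stacked = np.vstack(others)
        _, s, vh = np.linalg.svd(stacked)     # full SVD: vh is Nt x Nt
        rank = int((s > tol).sum())
        precoders.append(vh[rank:].conj().T)  # right singular vectors of the null space
    return precoders

rng = np.random.default_rng(1)
n_t = 4
rx_antennas = [1, 2]                          # heterogeneous users: Ru = 1 and Ru = 2
channels = [rng.normal(size=(r, n_t)) + 1j * rng.normal(size=(r, n_t)) for r in rx_antennas]
precoders = bd_null_space_precoders(channels)

for u, w in enumerate(precoders):
    for j, h in enumerate(channels):
        if j != u:
            print(f"leakage from user {u} to user {j}: {np.linalg.norm(h @ w):.1e}")
```

The printed leakage norms are at the level of numerical precision, which is the zero-forcing condition; with quantized CSI at the BS, the true channels would generally leave some residual inter-user interference.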
C. Greedy scheduling
Given a precoding technique, the purpose of scheduling (user selection) is to find a set of users among all active users that maximizes the system sum-rate. A simple optimal method for user selection is exhaustive search, but its complexity becomes impractically high when the number of users is large. To avoid the impractical implementation of exhaustive search, greedy scheduling is considered herein. After performing the aforementioned BD precoding technique on a given user set (i.e., a set of users' indices), the resulting sum-rate can be determined at the BS (for scheduling/user selection) as follows:
where Pk,m,u and Ck,m,u are determined by (23) and (27), respectively.
Then, the detailed implementation of the greedy scheduling for Ua active users can be described in the following steps:
1) Initialization: is the set of all active users' indices. is the set of selected users, initially assigned to a null set. v = 0 stands for the number of selected users, initially set to zero. R0 = 0 is the system sum-rate of selected users, initially set to zero.
2) Repetition:
• Let u* be the index of the user selected in the current iteration. Specifically, the index u* can be determined as follows:
• v = v + 1
• If ξmax < Rv-1, go to Step 3; otherwise do:
- Rv = ξmax
• Go to Step 2 (Repetition).
3) Stop the user selection process.
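The steps above can be sketched as a generic greedy loop. In this minimal illustration the sum-rate evaluation is passed in as a callable, and a toy rate function (per-user weights minus a crowding penalty) stands in for the BD-precoded sum-rate computed at the BS; the names `greedy_schedule` and `toy_sum_rate` and all numerical values are assumptions.

```python
def greedy_schedule(active_users, sum_rate):
    """Add, one at a time, the user that maximizes the sum-rate of the selected
    set; stop when the best candidate would lower the sum-rate."""
    remaining = list(active_users)
    selected = []
    current_rate = 0.0
    while remaining:
        best_user = max(remaining, key=lambda u: sum_rate(selected + [u]))
        best_rate = sum_rate(selected + [best_user])
        if best_rate < current_rate:          # Step 3: stop the user selection
            break
        selected.append(best_user)
        remaining.remove(best_user)
        current_rate = best_rate
    return selected, current_rate

# Toy stand-in for the sum-rate after BD precoding (hypothetical values).
WEIGHTS = {0: 3.0, 1: 2.0, 2: 0.5, 3: 0.1}

def toy_sum_rate(user_set):
    return sum(WEIGHTS[u] for u in user_set) - 0.4 * len(user_set) ** 2

selected, rate = greedy_schedule(WEIGHTS.keys(), toy_sum_rate)
print("selected users:", selected, "sum-rate:", round(rate, 2))
```

With Ua active users, this loop evaluates the sum-rate at most Ua + (Ua - 1) + ... + 1 times, instead of once per subset as in an exhaustive search.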
4. Simulation Results and Discussions
Following the 3GPP-LTE system settings , the BD-based heterogeneous MU transmission using the suggested BEM-based limited feedback scheme over doubly selective MIMO-OFDM downlink channels is simulated as follows. With the number of channel tap gains L = 5 and the exponentially decaying power-delay profile , the time-variant multipath channels are first generated by the modified Jakes' model , and then fitted (approximated) by the DPS-BEM  using Q basis functions. More specifically, the realization of doubly selective channels is generated by using the exponentially decaying power-delay profile of . The discrete time indices n and l denote sampling at rate fs = 1.92 MHz. The root mean square delay spread TD of the power delay profile can be determined by TD = 4/fs ≈ 2.1 μs . The autocorrelation function for every channel tap is given by which results in the classical Jakes' spectrum where J0(·) is the zeroth-order Bessel function of the first kind, v and c denote the speeds of terminals and light, respectively. As a result, the coherence bandwidth Bc can be approximated as Bc ≈ 1/TD = 0.48 MHz [, Sec. 3.3.2].
Unless otherwise stated, the considered heterogeneous MU network has Ua = 4 active users with mobile speeds of 200 km/h and Q = 18, where Ua/2 users are equipped with a single receive antenna (Ru = 1 for u = 1,..., Ua/2) and the remaining users have two receive antennas (Ru = 2 for u = Ua/2 + 1,..., Ua). The BS is equipped with four transmit antennas (Nt = 4). Following the frame format in the 3GPP-LTE system settings , one LTE frame consists of 20 time slots, each of which contains seven OFDM symbols (i.e., 140 OFDM symbols in one LTE frame) in the simulated LTE transmission. In addition, a 128-point FFT and a carrier frequency of fc = 2 GHz are used for the simulated multicarrier transmissions. The CP length of each OFDM symbol is set to 10 samples . Unless otherwise indicated, the average transmit power constraint is PΣ = 10 and receiver noise variance . In the figures illustrating the simulation results, each plotted point of the sum-rate performance is obtained by averaging over 500 independent channel realizations.
Figure 3 shows the sum-rate performance of the BD-based MU transmission with the BEM-based limited feedback versus the number of active users. For comparison, the sum-rate performance of DPC (curve a) is also provided by using an iterative algorithm in . As observed in Figure 3, the BEM-based limited feedback scheme (curve c) offers a significant sum-rate gain relative to the case of using full feedback of CSI but assuming the channels to be time-invariant (curve d) within one LTE frame (i.e., the block-fading assumption). Furthermore, the sum-rate performance of the BEM-based limited feedback scheme with B = 10 bits is only slightly lower than that of the ideal case where the BS uses full feedback of perfect time-variant CSI (curve b). As can be seen from curve d, over time-varying channels, i.e., in LTE systems with mobile users, the detrimental effect of outdated CSI feedback incurs a considerable sum-rate loss when the block-fading assumption is used in precoding and scheduling.
Figure 3. Sum-rate performance of DPC and BD precoding with different CSI feedback schemes versus number of active users.
Figure 4 presents the sum-rate performance of the BD-based heterogeneous MU transmission versus the total number of feedback bits BΣ for each receive antenna. Under the use of Q = 18 DPS-BEM coefficients and V = 2 for a mobile user speed of 200 km/h, the resulting total number of feedback bits for one receive antenna is BΣ = NtL(Q/V)B = 180B. As can be seen from curve b, the BEM-based limited feedback scheme using BΣ = 180 bits can provide a better sum-rate performance than the case of using the block-fading assumption in the BD-based MU transmission (curve c).
Figure 4. Sum-rate performance of BD precoding and greedy scheduling with different CSI feedback schemes versus total number of feedback bits.
In Figure 5, the sum-rate results of the MU transmission under different CSI feedback schemes versus normalized Doppler frequency fDTs are plotted. As can be seen, the BEM-based limited feedback (curve b) provides a stable sum-rate performance with robustness against a wide range of user speeds. As observed, the use of the block-fading assumption (curve c) incurs a significant sum-rate loss (due to the detrimental effect of outdated CSI feedback) when fDTs > 0.0012 (i.e., the mobile user speeds are higher than 10 km/h).
Figure 5. Sum-rate performance of BD-based heterogeneous MU transmissions versus mobile speed.
Figure 6 shows the sum-rate performance of the BD-based MU transmission versus number of DPS basis functions Q (used for fitting the considered time-variant channels). As can be observed, using Q = 18 DPS basis functions for each time-variant channel tap gain allows the BEM-based limited feedback scheme (curve b) to provide a sum-rate performance comparable to that of the ideal case where the BS uses full feedback of perfect time-variant CSI (curve a). With the use of B = 10 bits and Q = 4 DPS basis functions, the BEM-based limited feedback scheme (curve b) outperforms the case of using the block-fading assumption (curve c).
Figure 6. Sum-rate performance of BD-based heterogeneous MU transmissions versus number of the used basis functions.
Figure 7 shows the sum-rate performance of homogeneous (by using zero-forcing precoding ) and heterogeneous (by using BD precoding) MU transmissions versus number of transmit antennas. In this figure, the considered homogeneous MU system has four (active) single-antenna users (Ru = 1, u = 1,..., 4), and the considered heterogeneous one consists of four active users where Ru = 1, u = 1,..., 4. Under these considered system settings, at the cost of higher complexity, the BD-based heterogeneous MU transmission provides higher system sum-rate than the ZF-based homogeneous MU transmission.
Figure 7. Sum-rate performance of homogeneous and heterogeneous MU transmissions versus number of transmit antennas.
To pre-generate an LBG codebook G to be used at a given mobile speed, a set of 10^5 training BEM coefficient vectors corresponding to the target speed is employed by the LBG algorithm. Under an ideal scenario, each mobile terminal is assumed to know its actual mobile speed exactly and uses the corresponding LBG codebook for the VQ process of BEM coefficient vectors. However, in practice, each mobile terminal may have only an estimated value of its mobile speed and chooses the LBG codebook with the target speed closest to the estimated speed value.
To investigate the robustness of the LBG-based CSI quantization against the aforementioned scenario of user speed mismatch, Figure 8 shows the sum-rate performance of the BD-based MU transmissions as the actual user speed values are uniformly distributed in the range [v - δ, v + δ] where v is the target speed of the used LBG codebook and δ refers to the speed mismatch level. In this figure, there are four heterogeneous users with different receiver noise powers . As can be seen, given a LBG codebook dedicated to a target mobile speed (i.e., v = 100 km/h), using the pre-generated LBG codebook for mobile terminals with actual speed values uniformly distributed around the target speed value v only incurs a slight sum-rate loss (the values of curve b slightly decrease as δ increases from 0 to 40 km/h).
Figure 8. Sum-rate performance as the actual user speed value is uniformly distributed around the target speed value of the used LBG codebook G.
This article introduced a BEM-based limited feedback design for MU transmissions over doubly selective MIMO downlink channels. By employing a BEM to capture the channel's time-variation, the resulting feedback load of BEM coefficients is significantly smaller than that of CIR or CFR. Over time-varying channels, the BEM-based limited feedback helps to reduce the detrimental effect of outdated CSI feedback (which arises when the block-fading assumption is used in MU transmissions), and to provide stable sum-rate performance for heterogeneous users with a wide range of mobile speeds.
The authors declare that they have no competing interests.
A. Proof of (21)
From (20), the following result can be obtained:
Therefore, one can deduce the following:
Based on (34), one can obtain the following:
then one can have
B. Derivations of the water-filling solution in (27)
Based on (26), the Lagrangian function  of the convex problem can be determined by:
where positive semidefinite matrices are the slack variables  to guarantee that Ck,m,u are positive semidefinite. The real non-negative γ is a slack variable associated with the sum-power constraint.
For this convex optimization problem, the KKT conditions are necessary and sufficient for the optimal solutions of Ck,m,u. With the complementary slackness and KKT conditions , one can have the following:
From the above, one can obtain
It is straightforward to obtain
From (40) and (41), the following result can be deduced:
The solution of Ck,m,u under the KKT conditions (39) can be obtained by using the eigen-decomposition of the related matrices. In particular, one can perform the following eigen-decomposition: where Eu is a diagonal matrix containing the eigenvalues and . Then, it is straightforward to deduce that . Also, one can assume that where Fu is a diagonal matrix of eigenvalues . As a result, the solution of Ck,m,u can be determined as follows:
The positive semidefiniteness of Ck,m,u implies that all eigenvalues of Ck,m,u are non-negative and therefore
Plugging (43) into the KKT conditions (39) and using the property tr(ABC) = tr(BCA), one can have the following:
Since each of the diagonal matrices contains non-negative elements on its diagonal, one can, therefore, deduce that
where u = 1,..., U and ru = 1,..., Ru.
where (x)+ = max(x,0).
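A minimal numerical sketch of a water-filling power allocation of the (x)+ form is given below. It is written in a generic parametrization that may differ from the exact expressions in (27) and (28): each data stream i has a gain g_i (assumed here to be an eigenvalue of the effective channel divided by the receiver noise power of the corresponding user), the water level mu is found by a search over the number of active streams, and the function name `water_filling` is hypothetical.

```python
import numpy as np

def water_filling(gains, total_power):
    """Maximize sum(log(1 + g_i * p_i)) subject to sum(p_i) = total_power, p_i >= 0.
    Returns the powers p_i = max(mu - 1/g_i, 0) and the water level mu.
    All gains are assumed to be strictly positive."""
    g = np.asarray(gains, dtype=float)
    order = np.argsort(-g)                    # strongest streams first
    inv = 1.0 / g[order]
    for k in range(len(g), 0, -1):            # try k active streams
        mu = (total_power + inv[:k].sum()) / k
        if mu > inv[k - 1]:                   # weakest active stream gets positive power
            break
    powers = np.zeros_like(g)
    powers[order[:k]] = mu - inv[:k]
    return powers, mu

# Streams of all scheduled users share one sum-power constraint.
powers, mu = water_filling([2.0, 1.0, 0.1], total_power=1.0)
print("powers:", powers, "water level:", mu)
```

In this example the weakest stream receives no power, and the two active streams satisfy p_i + 1/g_i = mu, which is the equal-water-level property implied by the KKT conditions.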
The study presented in this article was partly supported by the NSERC CRD and Prompt Grants with InterDigital Canada.
MHM Costa, Writing on dirty paper. IEEE Trans Inf Theory 29(3), 439–441 (1983).
QH Spencer, AL Swindlehurst, M Haardt, Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels. IEEE Trans Signal Process 52(2), 461–471 (2004).
YR Zheng, C Xiao, Simulation models with correct statistical properties for Rayleigh fading channels. IEEE Trans Commun 51(6), 920–928 (2003).
Y Linde, A Buzo, RM Gray, An algorithm for vector quantizer design. IEEE Trans Commun 28(1), 84–95 (1980).
N Jindal, W Rhee, S Vishwanath, SA Jafar, A Goldsmith, Sum power iterative water-filling for multi-antenna Gaussian broadcast channels. IEEE Trans Inf Theory 51(4), 1570–1580 (2005).