Abstract
Optimal design of wireless systems in the presence of fading involves the instantaneous allocation of resources such as power and frequency with the ultimate goal of maximizing long term system properties such as ergodic capacities and average power consumptions. This yields a distinctive problem structure where long term average variables are determined by the expectation of a not necessarily concave functional of the resource allocation functions. Despite their lack of concavity it can be proven that these problems have null duality gap under mild conditions permitting their solution in the dual domain. This affords a significant reduction in complexity due to the simpler structure of the dual function. The article discusses the problem simplifications that arise by working in the dual domain and reviews algorithms that can determine optimal operating points with relatively lightweight computations. Throughout the article concepts are illustrated with the optimal design of a frequency division broadcast channel.
Introduction
Operating variables of a wireless system can be separated in two types. Resource allocation variables p(h) determine instantaneous allocation of resources like frequencies and transmitted powers as a function of the fading coefficient h. Average variables x capture system’s performance over a large period of time and are related to instantaneous resource allocations via ergodic averages. A generic representation of the relationship between instantaneous and average variables is
where f_{1}(h,p(h)) is a vector function that maps channel h and resource allocation p(h) to instantaneous performance f_{1}(h,p(h)). The system’s design goal is to select resource allocations p(h) to maximize ergodic variables x in some sense.
An example of a relationship having the form in (1) is a code division multiple access channel in which case h denotes the vector of channel coefficients, p(h) the instantaneous transmitted power, f_{1}(hp(h)) the instantaneous communication rate determined by the signal to interference plus noise ratio, and x the ergodic rates determined by the expectation of the instantaneous rates. The design goal is to allocate instantaneous power p(h) subject to a power constraint so as to maximize a utility of the ergodic rate vector x. This interplay of instantaneous actions to optimize long term performance is pervasive in wireless systems. A brief list of examples includes optimization of orthogonal frequency division multiplexing [1], beamforming [2,3], cognitive radio [4,5], random access [6,7], communication with imperfect channel state information (CSI) [8,9], and various flavors of wireless network optimization [1018].
In many cases of interest the functions f_{1}(h,p(h)) are nonconcave and as a consequence finding the resource allocation distribution
p^{∗}(h) that maximizes x requires solution of a nonconvex optimization problem. This is further complicated
by the fact that since fading channels h take on a continuum of values there is an infinite number of p^{∗}(h) variables to be determined. A simple escape to this problem is to allow for time
sharing in order to make the range of
In this article, we review a general methodology that can be used to solve optimal
resource allocation problems in wireless communications and networking without resorting
to time sharing
[19,20]. The fundamental observation is that the range of
We also discuss a stochastic optimization algorithm to determine optimal dual variables that can operate without knowledge of the channel probability distribution (Section “Dual descent algorithms”). This algorithm is known to almost surely converge to optimal operating points in an ergodic sense (Theorem 5). Throughout the article concepts are illustrated with the optimal design of a frequency division broadcast channel (Section “Frequency division broadcast channel” in “Optimal wireless system design”, Section “Frequency division broadcast channel” in “Recovery of optimal primal variables”, and Section “Frequency division broadcast channel” in “Dual descent algorithms”).
One of the best known resource allocation problems in wireless communications concerns the distribution of power on a block fading channel using capacityachieving codes. The solution to this problem is easy to derive and is well known to reduce to waterfilling across the fading gain, e.g., [21, p. 245]. Since this article can be considered as an attempt to generalize this solution methodology to general wireless communication and networking problems it is instructive to close this introduction by reviewing the derivation of the waterfilling solution. This is pursued in the following section.
Power allocation in a pointtopoint channel
Consider a transmitter having access to perfect CSI h that it uses to select a transmitted power p(h) to convey information to a receiver. Using a capacity achieving code the instantaneous
channel rate for fading realization h is
In most cases the fading channel h takes on a continuum of values. Therefore, solving (2) requires the determination
of a power allocation function
To solve (2) we work in the dual domain. To work in the dual domain we need to introduce
the Lagrangian, the dual function, and the dual problem. Introduce then the nonnegative
dual variable
The dual function is defined as the maximum value of the Lagrangian across all functions
The dual problem corresponds to the minimization of the dual function with respect to all nonnegative multipliers λ,
Since the objective in (2) is concave with respect to variables p(h) and the constraint is linear in p(h) the optimization problem in (2) is convex and as such it has null duality gap in the sense that P=D.
An entity that is important for the upcoming discussion is the primal Lagrangian maximizer
function
Using the definition of the Lagrangian maximizer function we can write the dual function
as
Computing values p(h,λ) of the Lagrangian maximizer function p(λ) is easy. To see that this is true rewrite the Lagrangian in (3) so that there is only one expectation
With the Lagrangian written in this form we can see that the maximization of
That the equality in (8) is true is a consequence of the fact that the expectation operator is linear and that there are no constraints coupling the selection of values p(h_{1}) and p(h_{2}) for different channel realizations h_{1}≠h_{2}. Functional values p(h) in both sides of (8) are required to be nonnegative but other than that we can select p(h_{1}) and p(h_{2}) independently of each other as indicated in the right hand side of (8).
Since the right hand side of (8) states that to maximize the Lagrangian we can select functional values p(h) independently of each other, values p(h,λ) of the Lagrangian maximizer function p(λ) defined in (6) are given by
The similarity between (6) and (9) is deceiving as the latter is a much easier problem to solve that involves a single variable. To find the Lagrangian maximizer value p(h,λ) operating from (9) it suffices to solve for the null of the derivative with respect to p(h). Doing this yields the Lagrangian maximizer
where the operator [x]^{ + }:=max(x,0) denotes projection onto nonnegative reals, which is needed because of the constraint p(h)≥0.
Of particular interest is the Lagrangian maximizer function p(λ^{∗}) corresponding to the optimal Lagrange multiplier
Indeed, since D is given by a Lagrangian maximization it must equal or exceed the values of the Lagrangian
for any function p, and for the optimal function p^{∗}in particular. Considering the explicit Lagrangian expression in (3) we can write
Observe that since p^{∗}is the optimal power allocation function it must satisfy the power constraint implying
that it must be
where the equality is true because P and p^{∗}are the otpimal value and arguments of the primal optimization problem (2). Combining the bounds in (11) and (13) yields
Using the equivalence P=Dof primal and dual optimum values it follows that the inequalities in (13) must hold
as equalities. The equality
The important consequence of (15) follows from the fact that the Lagrangian maximizer function p(λ^{∗}) is easy to compute using the decomposition in (9). By extension, if the optimal multiplier λ^{∗} is available, computation of the optimal power allocation function p^{∗}also becomes easy. Indeed, making λ=λ^{∗}in (9) we can determine values p^{∗}(h) of p^{∗} as
which are explicitly given by (10) with λ=λ^{∗}.
To complete the problem solution we still need to determine the optimal multiplier λ^{∗}. We show a method for doing so in Section “Dual descent algorithms”, but it is important to recognize that this cannot be a difficult problem because the dual function is singledimensional and convex—dual functions of maximization problems are always convex. This has to be contrasted with the infinite dimensionality of the primal problem. By working in the dual domain we reduce the problem of determining the infinite dimensional optimal power allocation function to the determination of the one dimensional optimal Lagrange multiplier λ^{∗}.
Recapping the derivation of the optimal power allocation p^{∗}, we see that there are three conditions that make (18) simple:
(C1) Since the optimization problem is convex it is equivalent to its Lagrangian dual implying that optimal primal variables can be determined as Lagrangian maximizers associated with the optimal multiplier λ^{∗}[cf. (11)–(15)].
(C2) Due to the separable structure of the Lagrangian, determination of the optimal power allocation function is carried out through the solution of perfading state subproblems [cf. (6)–(9)].
(C3) Because there is a finite number of constraints the dual function is finite dimensional even though there is an infinite number of primal variables.
Most optimization problems in wireless systems are separable in the sense of (C2) and have a finite number of constraints as [cf. (C3)] but are typically not convex [cf. (C1)].
To illustrate this latter point consider a simple variation of (2) where instead of
using capacity achieving codes we use adaptive modulation and coding (AMC) relying
on a set of L communication modes. The lth mode supports a rate α_{l}and is used when the signal to noise ratio (SNR) at the receiver end is between β_{l−1} and β_{l}. Letting γ be the received SNR and
The corresponding optimal power allocation problem subject to average power constraint q_{0} can now be formulated as
Similar to (2) the dual function of the optimization problem in (18) is one dimensional and its Lagrangian is separable in the sense that we can find Lagrangian maximizers by performing perfading state maximizations. Alas, the problem is not convex because the AMC rate function C(γ) is not concave, in fact, it is not even continuous.
Since it does not satisfy (C1), solving (18) in the dual domain is not possible in principle. Nevertheless the condition that allows determination of p^{∗}(h) as the Lagrangian maximizer p(h,λ^{∗}) [cf. (16)] is not the convexity of the primal problem but the lack of duality gap. We’ll see in Section “Duality in wireless systems optimization” that this problem does have null duality gap as long as the probability distribution of the channel h contains no points of strictly positive probability. Thus, the solution methodology in (3)(16) can be applied to solve (18) despite the discontinuous AMC rate function C(γ). This is actually a quite generic property that holds true for optimization problems where nonconcave functions appear inside expectations. We introduce this generic problem formulation in the next section.
Optimal wireless system design
Let us return to the relationship in (1) where h denotes the random fading state that ranges on a continuous space
It is convenient to think of f_{1}(h,p(h)) as a family of functions indexed by the parameter h with p(h) denoting the corresponding variables. Notice that there is one vector p(h) per fading state h, which translates into an infinite number of resource allocation variables if h takes on an infinite number of values. Consequently, it is adequate to refer to the
set
Instantaneous resource allocations p(h) are further constrained to a given bounded set
Variables p(h) determine system performance over short periods of times. As such, they are of interest to the system designer but transparent to the end user except to the extent that they determine the value of ergodic variables x. Therefore, we adopt as our design objective the maximization of a concave utility function f_{0}(x) of the ergodic average x. Putting these preliminaries together we write the following program as an abstract formulation of optimal resource allocation problems in wireless systems
where we added further constraints on the set of ergodic averages x. These constraints are in the form of a bounded convex set inclusion
For future reference define x^{∗}and p^{∗} as the arguments that solve (20)
The configuration pair (x^{∗},p^{∗}) attains the optimum value P=f_{0}(x^{∗}) and satisfies the constraints
To write the Lagrangian we introduce a nonnegative Lagrange multiplier
The corresponding dual function is the maximum of the Lagrangian with respect to primal
variables
The dual problem is defined as the minimum value of the dual function over all nonnegative dual variables
and the optimal dual variables are defined as the arguments that achieve the minimum in (24)
Notice that the optimal dual argument Λ^{∗}is a set as in the case of the primal optimal arguments because there may be more than one vector that achieves the minimum in (24). As we do with the optimal primal variables (x^{∗},p^{∗}), we use Λ^{∗} to denote the set of optimal dual variables and an arbitrary element of this set. A particular example of this generic problem formulation is presented next.
Frequency division broadcast channel
A common access point (AP) administers a power budget q_{0}to communicate with a group of J nodes. The physical layer uses frequency division so that at most one terminal can be active at any given point in time. The goal is to design an algorithm that allocates power and frequency to maximize a given ergodic rate utility metric while ensuring that rates are at least r_{min} and not more than r_{max}.
Denote as h_{i} the channel to terminal i and define the vector h:=[h_{1}…,h_{J}]^{T}grouping all channel realizations. In any time slot the AP observes the realization
h of the fading channel and decides on suitable power allocations p_{i}(h) and frequency assignments α_{i}(h). Frequency assignments α_{i}(h) are indicator variables α_{i}(h)∈{0,1} that take value α_{i}(h)=1 when information is transmitted to node i and α_{i}(h)=0 otherwise. If α_{i}(h)=1 communication towards i ensues at power p_{i}(h) resulting in a communication rate C(h_{i}p_{i}(h)/N_{0}) determined by the SNR h_{i}p_{i}(h)/N_{0}. The specific form of the function C(h_{i}p_{i}(h)/N_{0}) mapping channels and powers to transmission rates depends on the type of modulation
and codes used. One possibility is to select capacity achieving codes leading to
Regardless of the specific form of C(h_{i}p_{i}(h)/N_{0}) we can write the ergodic rate of terminal i as
The factor C(h_{i}p_{i}(h)/N_{0}) is the instantaneous rate achieved if information is conveyed. The factor α_{i}(h) indicates wether this information is indeed conveyed or not. The expectation weights the instantaneous capacity across fading states and is equivalent to the consideration of an infinite horizon time average.
Similarly, p_{i}(h) denotes the power allocated for communication with node i, but this communication is attempted only if α_{i}(h)=1. Thus, the instantaneous power utilized to communicate with i for channel realization h is α_{i}(h)p_{i}(h). The total instantaneous power is the sum of these products for all i and the long term average power consumption can be approximated as the expectation
that according to the problem statement cannot exceed the budget q_{0}.
To avoid collisions between communication attempts the indicator variables α_{i}(h) are restricted so that at most one of them is 1. Define the vector α(h):=[α_{1}(h),…,α_{J}(h)]^{T} corresponding to values of the function
We can now express the frequency exclusion constraints as
We still need to model the restriction that the achieved capacity r_{i} needs to be between r_{min} and r_{max} but this is easily modeled as the constraint r_{min}≤r_{i}≤r_{max}. Defining the vector r=[r_{1},…,r_{J}]^{T} this constraint can be written as
We finally introduce a monotonic nondecreasing utility function U(r_{i}) to measure the value of rate r_{i}and formally state our design goal as the maximization of the aggregate utility
where we relaxed the rate expression in (26) to an inequality constraint, which we can do without loss of optimality because the utility U(r_{i}) is monotonic nondecreasing.
The problem formulation in (30) is of the form in (20). The ergodic rates r in (30) are represented by the ergodic variables x in (20) whereas the power and frequency allocation functions p and α of (30) correspond to the resource allocation function pof (20). The set
To write the Lagrangian corresponding to this optimization problem introduce multipliers λ:=[λ_{1},…,λ_{J}]^{T} associated with the capacity constraints and μassociated with the power constraint. The Lagrangian is then given by
The dual function, dual problem, and optimal dual arguments are defined as in (22)–(25)
by replacing
Duality in wireless systems optimization
For any optimization problem the dual minimum D provides an upper bound for the primal optimum value P. This is easy to see by combining the definitions of the dual function in (23) and the Lagrangian in (22) to write
Because the dual function value g(Λ) is obtained by maximizing the right hand side of (32), evaluating this expression for arbitrary primal variables yields an upper bound on g(Λ). Using a pair (x^{∗},p^{∗}) of optimal primal arguments as this arbitrary selection yields the inequality
Since the pair x^{∗},p^{∗} is optimal, it is feasible, which means that we must have
The inequality in (34) is true for any Λand therefore true for Λ=Λ^{∗}in particular. It then follows that the dual optimum D upper bounds the primal optimum P,
as we had claimed. The difference DP is called the duality gap and provides an indication of the loss of optimality incurred by working in the dual domain.
For the problem in (20) the duality gap is null as long as the channel probability distribution m_{h}(h) contains no point of positive probability as we claim in the following theorem which is a simple generalization of a similar result in [20].
Theorem 1
Let P denote the optimum value of the primal problem (20) and D that of its dual in (24) and assume there exists a strictly feasible point (x_{0},p_{0}) that satisfies the constraints in (20) with strict inequality. If the channel probability distribution m_{h}(h) contains no point of positive probability the duality gap is null, i.e.,
The condition on the channel distribution not having points of positive probability is a mild requirement satisfied by practical fading channel models including Rayleigh, Rice, and Nakagami. The existence of a strictly feasible point (x_{0},p_{0}) is a standard constraint qualification requirement which is also not stringent in practice.
In order to prove Theorem 1 we take a detour in Section “Lyapunov’s convexity theorem” to define atomic and nonatomic measures along with the presentation of Lyapunov’s Convexity Theorem. The proof itself is presented in Section “Proof of Theorem 1”. The implications of Theorem 1 are discussed in Sections “Recovery of optimal primal variables” and “Dual descent algorithms”.
Lyapunov’s convexity theorem
The proof of Theorem 1 uses a theorem by Lyapunov concerning the range of nonatomic measures [22]. Measures assign strictly positive values to sets of a Borel field. When all points have zero measure the measure is called nonatomic as we formally define next.
Definition 1 (Nonatomic measure)
Let w be a measure defined on the Borel field
Familiar measures are probability related, e.g., the probability of a set for a given
channel distribution. To build intuition on the notion of nonatomic measure consider
a random variable X taking values in [0,1] and [2,3]. The probability of landing in each of these intervals
is 1/2 and X is uniformly distributed inside each of them; see Figure
1. The space
Figure 1. Nonatomic measure. The random variable X is uniformly distributed in
Note that, except for the factor 2, the value of w_{X}(E) represents the contribution of the set E to the expected value of X and that when E is the whole space
To contrast this with an example of an atomic measure consider a random variable Y landing equiprobably in [0,1] or 5/2; see Figure
2. In this case, the measure
Figure 2. Atomic measure. The random variable Y lands with equal probability in Y=5/2 and uniformly in the interval [0,1]. The measure
Theorem 2 (Lyapunov’s convexity theorem [[22]]).
Consider nonatomic measures w_{1},…,w_{n}on the Borel field
The difference between the distributions of X and Y is that Y contains a point of strictly positive probability, i.e., an atom. This implies presence of delta functions in the probability density function of Y . Or, in a rather cleaner statement the cumulative distribution function (cdf) of X is continuous whereas the cdf of Y is not.
Lyapunov’s convexity theorem introduced next refers to the range of values taken by (vector) nonatomic measures.
Returning to the probability measures defined in terms of the probability distributions of the random variables X and Y , Theorem 2 asserts that the range of w_{X}(E), i.e., the set of all possible values taken by w_{X} is convex. In fact, it is not difficult to verify that the range of w_{X}is the convex interval [0,3] as shown in Figure 1. Theorem 2 does not claim anything about w_{Y}. In this case, it is easy to see that the range of w_{Y} is the (nonconvex) union of the intervals [0,1/2] and [5/2,3]; see Figure 2.
Proof of Theorem 1
To establish zero duality gap we will consider a perturbed version of (20) obtained
by perturbing the constraints used to define the Lagrangian in (22). The perturbation
function P(Δ) assigns to each (perturbation) parameter set
The perturbed problem in (38) can be interpreted as a modified version of (20), where we allow the constraints to be violated by Δamounts. To prove that the duality gap is zero, it suffices to show that P(Δ) is a concave function of Δ; see, e.g., [23].
Let
The roadblock to establish concavity of the perturbation functions is the constraint
The set
Theorem 3.
Let
Before proving Theorem 3 let us apply it to complete the proof of Theorem 1. For doing
that consider two arbitrary points
If
Further define the ergodic limit convex combination x_{α}:=αx + (1−α)x′. We first show that the pair (x_{α},p_{α}) is feasible for the problem for perturbation Δ_{α}.
The convex combination satisfies the constraint
We are left to show that the pair (x_{α},p_{α}) satisfies the constraint
Perform a convex combination of the inequalities in (43) and use the definitions of
x_{α}:=αx + (1−α)x′ and
Combining (42) and (44) it follows that
The utility yield of this feasible pair is f_{0}(x_{δ}) which we can bound as
Since (x,p) is optimal for perturbation Δ we have f_{0}(x)=P(Δ) and, likewise, f_{0}(x′)=P(Δ′). Further noting that the optimal yield P(Δ_{α}) for perturbation Δ_{α} must exceed the yield f_{0}(x_{α}) of feasible point x_{α}we conclude that
The expression in (46) implies that P(Δ) is a concave function of the perturbation vector Δ. The duality gap is therefore null as stated in (36).
We proceed now to the proof of Theorem 3.
Proof.
(Theorem 3) Consider two arbitrary points
To prove that
For this we will use Theorem 2 (Lyapunov’s convexity theorem). Consider the space
of all possible channel realizations
where the integrals are over the set E with respect to the channel distribution m_{h}(h). A vector of channel realizations
Two particular sets that are important for this proof are the empty set, E=∅, and the entire space
For E=∅, or any other zeromeasure set for that matter, we have w(∅)=0.
The measure w(E) is nonatomic. This follows from the fact that the channel distribution contains
no points of positive probability combined with the requirement that the resource
allocation values p(h) belong to the bounded sets
Being w(E) nonatomic it follows from Theorem 2 that the range of wis convex. Hence, the vector
belongs to the range of possible measures. Therefore, there must exist a set E_{0} such that
The analogous relation holds for the entries of w(E_{0}) corresponding to p′, i.e., (52) is valid if p(h) is replaced by p′(h) but this fact is inconsequential for this proof.
Consider now the complement set
We mimick the reasoning leading from (51) to (52), but now we restrict the focus of
(53) to the second entries of
Define now power distributions p_{α}(h) coinciding with p(h) for channel realization h∈E_{0}and with p′(h) when
The resource distribution p_{α}satisfies the set constraint
Let us now ponder the ergodic limit
Using (52) and (54), the average link capacities for power allocation p_{α}(h) can be expressed in terms of p(h), p′(h) as
The first equality in (56) holds becauset the space
Comparing (56) with (48) we see that the power allocation p_{α} yields ergodic limit y_{α}. Therefore
Recovery of optimal primal variables
Having null duality gap means that we can work in the dual domain without loss of optimality. In particular, instead of finding the primal maximum P we can find the dual minimum D, which we know are the same according to Theorem 1. A not so simple matter is how to recover an optimal primal pair (x^{∗},p^{∗}) given an optimal dual vector Λ^{∗}. Observe that recovering optimal variables is more important than knowing the maximum yield because optimal variables are needed for system implementation. In this section we study the recovery of optimum primal variables (x^{∗},p^{∗}) from a given optimal multiplier Λ^{∗}.
Start with an arbitrary, not necessarily optimal, multiplier Λ and define the primal Lagrangian maximizer set as
The elements of (x(Λ),p(Λ)) yield the maximum possible Lagrangian value for given dual variable. Comparing the definition of the dual function in (23) and that of the Lagrangian maximizers in (57) it follows that we can write the dual function g(Λ) as
Particular important pairs of Lagrangian maximizers are those associated with a given optimal dual variable Λ^{∗}. As we show in the following theorem, these variables are related with the optimal primal variables (x^{∗},p^{∗}).
Theorem 4.
For an optimization problem of the form in (20) let
Proof
Reinterpret (x^{∗},p^{∗}) as a particular pair of optimal primal arguments. Start by noting that the value of the dual function g(Λ^{∗}) can be upper bounded by the Lagrangian evaluated at (x,p)=(x^{∗},p^{∗})
Indeed, the inequality would be true for any
Consider now the Lagrangian
Since the pair (x^{∗},p^{∗}) is an optimal argument of (20) we must have
Combining (60) and (62) and using the definitions g(Λ^{∗}) and P=f_{0}(x^{∗}) yields
But since according to Theorem 1 the duality gap is null D=P and the inequalities must hold hold as equalities. In particular, we have
which according to (58) means the pair (p^{∗},Λ^{∗}) is a Lagrangian maximizer associated with Λ=Λ^{∗}. Since this is true for any pair of optimal variables (p^{∗},Λ^{∗}) it follows that the set (p^{∗},Λ^{∗}) is included in the set of Lagrangian maximizers
According to (59) optimal arguments
In general, problems in optimal wireless networking are such that the Lagrangian maximizer resource allocation functions p(Λ^{∗}) are unique to within a set of zero measure. The ergodic limit Lagrangian maximizers x(Λ^{∗}), however, are not unique in many cases. This is more a nuisance than a fundamental problem. Algorithms that find primal optimal operating points regardless of the characteristics of the set of Lagrangian maximizers are studied in Section “Dual descent algorithms”.
Separability
To determine the primal Lagrangian maximizers in (57) it is convenient to reorder terms in the definition in (22) to write
We say that in (22) the Lagrangian is grouped by dual variables because each dual
variable appears in only one summand. The expression in (65) is grouped by primal
variables because each summand contains a single primal variable if we interpret
Writing the Lagrangian as in (65) simplifies computation of Lagrangian maximizers. Since there are no constraints coupling the selection of optimal x and p in (65) we can separate the determination of the pair (x(Λ),p(Λ)) in (57) into the determination of the ergodic limits
and the resource allocation function
The computation of the resource allocation function in (67) can be further separated.
The set
The absence of coupling constraints in the Lagrangian maximization, which permits
separation into the ergodic limit maximization in (66) and the perfading maximizations
in (68), is the fundamental difference between (57) and the primal problem in (20).
In the latter, the selection of optimal x^{∗} and p^{∗} as well as the selection of p(h_{1}) and p(h_{2}) for different channel realizations h_{1}≠h_{2} are coupled by the constraint
The decomposition in (68) is akin to the decomposition in (16) for the particular case of a point to point channel with capacity achieving codes. It implies that to determine the optimal power allocation p^{∗} it is convenient to first determine the optimal multiplier Λ^{∗}. We then proceed to exploit the lack of duality gap and the separable structure of the Lagrangian to compute values p^{∗}(h) of the optimal resource allocation independently of each other. The computation of optimal ergodic averages is also separated as stated in (66). This separation reduces computational complexity because of the reduced dimensionality of (66) and (68) with respect to that of (20).
For the separation in (66) and (68) to be possible we just need to have a nonatomic channel distribution and ensure existence of a strictly feasible point as stated in the hypotheses of Theorem 1. These two properties are true for most optimal wireless communication and networking problems. A particular example is discussed in the following section.
Frequency division broadcast channel
Consider the optimal frequency division broadcast channel problem in (30) whose Lagrangian
This rearrangement is equivalent to the generic transformation leading from (22) to (65). As we observed after (65) the computation of Lagrangian maximizing ergodic limits and resource allocation functions can be separated as can the computation of resource allocation values corresponding to different fading channel realizations. This is the case in this particular example. In fact, there is more separability to be exploited in (69).
With regards to primal ergodic variables rnotice that each Lagrangian maximizing rate r_{i}(λ,μ) depends only on λ_{i} and that we can compute each r_{i}(λ,μ)=r_{i}(λ_{i}) separately as
This can be easily computed as U(r_{i}) is a one dimensional concave function. As a particular case consider the identity utility U(r_{i})=r_{i}. Since the Lagrangian becomes a linear function of r_{i}, the maximum occurs at either r_{max}or r_{min} depending on the sign of 1−λ_{i}. When λ_{i}=1 the Lagrangian becomes independent of c_{i}. In this case any value in the interval [r_{min},r_{max}] is a Lagrangian maximizer. Putting these observations together we have
Notice that the Lagrangian maximizer r_{i}(λ_{i}) is not unique if λ_{i}=1. Therefore, if
As we observed in going from (67) to (68) determination of the optimal power and frequency allocation functions requires maximization of the terms inside the expectation. These implies solution of the optimization problems
where we opened up the constraint
The maximization in (72) can be further simplified. Begin by noting that irrespectively of the value of α(h) the best possible power allocation p_{i}(h,λ,μ) of terminal i is the one that maximizes its potential contribution to the sum in (72), i.e.,
If α_{i}(h)=1 this contribution is added to the sum in (72). If it is multiplied by α_{i}(h)=0 it is not added to the sum in (72). Either way p_{i}(h,λ,μ) as given by (73) is the optimal power allocation for terminal i.
To determine the frequency allocation α(h,λ,μ) define the discriminants
which we can use to rewrite (72) as
Since at most one α_{i}(h)=1 in (75), the best we can do is to select the terminal with the largest discriminant when that discriminant is positive. If all discriminants are negative the best we can do is to make α_{i}(h)=0 for all i.
The Lagrangian maximizers p_{i}(h,λ,μ) and α(h,λ,μ) in (73) and (75) are almost surely unique for all values of Λand μ. In particular, optimal allocations
Dual descent algorithms
Determining optimal dual variables Λ^{∗} is easier than determining optimal primal pairs (p^{∗},x^{∗}) because there are a finite number of multipliers and the dual function is convex. If the dual function is convex descent algorithms are guaranteed to converge towards the optimum value, which implies we just need to determine descent directions for the dual function.
Descent directions for the dual function can be constructed from the constraint slacks associated with the Lagrangian maximizers. To do so, consider a given Λ and use the definition of the Lagrangian maximizer pair (x(Λ),p(Λ)) in (57) to write the dual function as
Further consider the Lagrangian
Subtracting (76) from (77) yields
Defining the vector
and recalling the multiplier definitions
If the dual function g(Λ) is differentiable, the expression in (80) implies that
Since the inner product of
Having a descent direction available we can introduce a time index t and a stepsize ε_{t}to define the dual subgradient descent algorithm as the one with iterates λ(t) obtained through recursive application of
This algorithm is known to converge to optimal dual variables if the stepsize vanishes at a nonsummable rate and to approach Λ^{∗} if the stepsize is constant; see e.g., [20], Section 6.
A problem in implementing (82) is that computing the subgradient component
Stochastic subgradient descent
Consider a given channel realization h and given multiplier Λ and define the vector
This definition is made such that the expected value of s_{1}(h,Λ) with respect to the channel distribution is the subgradient component
Formally, (84) implies that s(h,Λ) is a stochastic subgradient of the dual function. Intutively, (84) implies that
s(h,Λ) is an average descent direction of the dual function because its expectation is
a descent direction. Thus, if we draw independent channel realizations h(t) and replace
The advantage of this substitution is that to compute the stochastic subgradient s(h,Λ), we do not need to evaluate an expectation as in the case of the subgradient
(S2) Dual stochastic subgradient. With the Lagrangian maximizers determined by (85) compute the stochastic subgradient
(S3) Dual iteration. With stochastic subgradients as in (86) and given step size ε descend in the dual domain along the direction −s(t) [cf. (82)]
The core of the dual stochastic subgradient descent algorithm is the dual iteration (S3). The purpose of the primal iteration (S1) is to compute the stochastic subgradients in (S2) that are needed to implement the dual descent update in (S3). We can think of the primal variables x(Λ(t)) and p(h(t),Λ(t)) as a byproduct of the descent implementation.
Convergence properties depend on whether constant or time varying step sizes are used.
If the stepsizes ε_{t} form a nonsummable but square summable series, i.e.,
If λ(t) approaches or converges to Λ^{∗}it follows as a consequence of Theorem 4 that an optimal primal pair (x^{∗},p^{∗}) can be computed from the Lagrangian maximizers if the latter are unique. Observe that this does not require a separate computation because the Lagrangian maximizers are computed in the primal iteration (S1). One may question that at time t we do not compute the Lagrangian maximizer function p(Λ(t)) but just the single value p(h(t),Λ(t)). However, h(t) is the channel realization at time t which means that p(h(t),Λ(t)) is the value we need to compute to adapt to the current channel realization.
This permits reinterpretation of (S1)–(S3) as a policy to determine wireless systems’ operating points. At time t we observe current channel realization h(t) and determine resource allocation p(h(t),Λ(t)) which we proceed to implement in the current time slot. In this case the core of the algorithm is the primal iteration (S1) and the dual variable λ(t) is an internal state that determines the operating point. Steps (S2) and (S3) are implemented to update this internal state so that as time progresses λ(t) approaches Λ^{∗}and the policy becomes optimal because it chooses the best possible resource allocation adapted to the current channel realization h(t).
The reinterpretation of (S1)–(S3) as a policy to determine resource allocations p(t)=p(h(t),Λ(t)) associated with observed channel realizations h(t) motivates a redefinition of the concept of solution to the wireless optimization
problem in (20). In principle, solving (20) entails finding the optimal resource allocation
p^{∗} and the optimal ergodic average x^{∗} such that the problem constraints are satisfied and
To be more specific consider the channel stochastic process
Indeed, since
Writing the constraint
Definition 2
Consider the channel stochastic process
(iii) Almost sure optimality. The yield of the ergodic limit of
If the stochastic process
Theorem 5 (Ergodic stochastic optimization [[19]])
Consider the optimization problem in (20) as well as processes
The sequences
Variables p^{∗} and x^{∗} optimal in the sense of (21) are not computed by (S1)–(S3). Rather, (89) implies that, asymptotically, (S1)–(S3) is drawing resource allocation realizations p(t)=p(h(t),Λ(t)) and variables x(t):=x(Λ(t)) that are close to optimal as per Definition 2. The important point here is that having a procedure to generate stochastic processes close to optimal in the sense of Definition 2 is sufficient for practical implementation.
An example application of the dual stochastic subgradient descent algorithm (S1)–(S3) is discussed in the next section.
Frequency division broadcast channel
To implement dual stochastic descent for the frequency division broadcast channel we need to specify the primal iteration (S1) and the dual iteration (S2). To specify the primal iteration (S1) we need to compute Lagrangian maximizers for which it suffices to recall the expressions in Section “Frequency division broadcast channel” of “Recovery of optimal primal variables”. For the ergodic rate r_{i} we make λ_{i}=λ_{i}(t) in (70) to conclude that the primal iterate r_{i}(t)=r_{i}(λ_{i}(t)) is
For the power allocations p_{i}(t)=p_{i}(h(t),λ(t),μ(t)) and the frequency assignments α(t)=α(h(t),λ(t),μ(t)) we need to set the multipliers to λ=λ(t) and μ=μ(t) and also set the value of the channel to its current state h=h(t). This substitution in (73) yields the power allocation
To determine the frequency assignments α(t) we first substitute λ=λ(t), μ=μ(t), and h=h(t) in (74) to compute the discriminants d_{i}(t)=d_{i}(h(t),λ(t),μ(t))
from where we conclude that the frequency assignment α(t)=α(h(t),λ(t),μ(t)) is given by the solution of [cf. (75)]
Recall that since at most one α_{i}(h)=1 in (96), the optimal frequency allocation is to make α_{i}(h)=1 for the terminal with the largest discriminant when that discriminant is positive. If all discriminants are negative we make α_{i}(h)=0 for all i.
The ESO algorithm for optimal resource allocation in broadcast channels is completed with an iteration in the dual domain [cf. (83) and (87)]
As per Theorem 5 iterative application of (93)–(97) yields sequences r_{i}(t), α_{i}(t) and p_{i}(t) such that: (i) The sum utility for the ergodic limits of r_{i}(t) is almost surely within a small constant of optimal; (ii) The power constraint in
(27) and the rate constraints in (26) are almost surely satisfied in an ergodic sense.
This result is true despite the presence of the nonconvex integer constraint
Numerical results
The dual stochastic subgradient descent algorithm for optimal resource allocation
in frequency division broadcast channels defined by (93)–(97) is simulated for a system
with J=16 nodes. Three AMC modes corresponding to capacities 1, 2 and 3 bits/s/Hz are used
with transitions at SINR 1, 3 and 7. Fading channels are generated as i.i.d. Rayleigh
with average powers 1 for the first four nodes, i.e., j=1,…,4, and 2, 3 and 4 for subsequent groups of 4 nodes. Noise power is N_{0}=1 and average power available is q_{0}=3. Rate of packet acceptance is constrained to be 0≤r_{i}(t)≤2 bits/s/Hz. The optimality criteria is proportional fair scheduling, i.e.,
Figure
3 shows evolution of dual variables λ_{i}(t) and corresponding rates r_{i}(t) for representative nodes i=1 with average channels
Figure 3. Primal and dual iterates in dual stochastic gradient descent. Evolution of dual variables λ_{i}(t) and rates r_{i}(t) for representative nodes with average channels
Figure 4. Optimal frequency division broadcast channel. Objective value
Figure 5. Power and capacity constraints. Feasibility as time grows is corroborated for the power constraint in (27) (top) and rate constraints in (26) (bottom). For the rate constraint we show the maximum and minimum value of constraint violation.
Figure 6. Power allocations. Power allocated as a function of channel realization is shown for channels with average
power
Conclusions
This article reviews recent results which state that problems of the form in (20) in which nonconcave functions appear inside expectations have null duality gap as long as the probability distribution of the fading coefficient h contains no points of strictly positive probability. Lack of duality gap permits solution in the dual domain leading to a substantial reduction in the computational cost of determining optimal operating points of the wireless system. Working in the dual domain leads to a solution methodology that can be interpreted as a generalization of the derivation of the waterfilling power allocation in point to point channels reviewed in Section “Power allocation in a pointtopoint channel”.
Specifically, the problem of determining the optimal resource allocation function p^{∗} in (20) is challenging due to its infinite dimensionality and lack of convexity. However, in the dual domain we need to determine the optimal multiplier Λ^{∗}that minimizes the dual function in (23). This is simpler because the dual function is convex and finite dimensional. Once we have found an optimal dual variable we can determine optimal operating points as Lagrangian maximizers. In doing so we can exploit the separable structure of the Lagrangian to decompose the optimization problem into the per fading state subproblems in (68). We emphasize that solving the optimization programs in (68) is not necessarily easy if the dimensionality of h is large. Nevertheless solving (68) is always simpler than solving (20) and in some cases plain simple. Lack of duality gap and Lagrangian separability are further exploited to propose the dual stochastic subgradient descent algorithm (S1)–(S3) which converges to an optimal operating point with probability 1 in an ergodic sense.
There are three key points that permit the development of the solution methodology
outlined in the previous paragraph: Nonatomic fading distribution. A nonatomic fading distribution leads to the lack of duality gap. The fact that P=D, i.e., that primal and dual optimal values are the same, is what allows us to work
in the dual domain without loss of optimality. In formal terms, lack of duality gap
is the tool that we used to recover the optimal primal variables (x^{∗},p^{∗}) from the optimal dual variable Λ^{∗}by determining the primal Lagrangian maximizers for
Nonatomic fading distributions, Lagrangian separability, and having a finite number of constraints are properties that appear in many, indeed most, problems in optimal design of wireless systems. In such cases the methodology described in this article can be applied to their solution.
Further reading
The use of dual problems as a shortcut to solve optimization problems in communications has a rich history [2528]; see also [29] for a comprehensive treatment. Lack of duality gap in nonconvex optimization problems has also been observed in the context of asymmetric digital subscriber lines [30,31]. In network optimization lack of duality gap leads to the optimality of layered architectures which renders the complexity of wireless networking essentially identical to the complexity of physical layer optimization [3235]. For the use of techniques discussed here in the solution of specific problems we refer the reader to [3642]. For further details on dual stochastic sub gradient descent, the literature on convergence of subgradient descent algorithms [4345], and stochastic subgradient descent [4649] is of interest.
Competing interests
The author declares that he has no competing interests.
Acknowledgements
Work in this article is supported by the Army Research Office grant W911NF1010388 and the National Science Foundation award CAREER CCF0952867. Part of the results in this article were derived while the author was at the University of Minnesota. The work presented here has benefited from discussions with Yichuan Hu, Dr. Nikolaos Gatsis, and Prof. Georgios B. Giannakis. The Associate Editor, Dr. Deniz Gunduz, provided valuable corrections to a draft version of this article.
References

X Wang, GB Giannakis, Resource allocation for wireless multiuser OFDM networks. IEEE Trans. Inf. Theory 57(7), 4359–4372 (2011)

V Ntranos, N Sidiropoulos, L Tassiulas, On multicast beamforming for minimum outage. IEEE Trans. Wirel. Commun 8(6), 3172–3181 (2009)

ND Sidiropoulos, TN Davidson, ZQ Luo, Transmit beamforming for physicallayer multicasting. IEEE Trans. Signal Process 54(6), 2239–2251 (2006)

JA Bazerque, GB Giannakis, Distributed scheduling and resource allocation for cognitive OFDMA radios. Mobile Nets. Apps 13(5), 452–462 (2008). Publisher Full Text

Z Quan, S Cui, AH Sayed, Optimal linear cooperation for spectrum sensing in cognitive radio networks. IEEE J. Sel. Topics Signal Process 2, 28–40 (2008)

Y Hu, A Ribeiro, Optimal wireless networks based on local channel state information. IEEE Trans. Signal Process 60(9), 4913–4929 (September 2012)

Y Hu, A Ribeiro, Adaptive distributed algorithms for optimal random access channels. IEEE Trans. Wirel. Commun 10(8), 2703–2715 (2011)

Y Hu, A Ribeiro, A Optimal wireless multiuser channels with imperfect channel state information. Proc. Int. Conf. Acoustics Speech Signal Process, vol. 1, ((Kyoto Japan, 2012), pp), . 1–4

Y Hu, A Ribeiro, Optimal transmission over a fading channel with imperfect channel state information. Global Telecommun. Conf., vol. 1, ((Houston, TX, 2011), pp), . 1–5

L Chen, SH Low, M Chiang, JC Doyle, Crosslayer congestion control, routing and scheduling design in ad hoc wireless networks. Proc. IEEE INFOCOM, vol. 1, ((Barcelona, Spain, 23–29 April 2005), pp), . 1–13

M Chiang, SH Low, RA Calderbank, JC Doyle, Layering as optimization decomposition. Proc IEEE 95, 255–312 (2007)

A Eryilmaz, R Srikant, Joint congestion control, routing, and MAC for stability and fairness in wireless networks. IEEE J. Sel. Areas Commun 24(8), 1514–1524 (2006)

L Georgiadis, MJ Neely, Resource allocation and crosslayer control in wireless networks. Found Trends Netw 1, 1–144 (2006)

JW Lee, RR Mazumdar, NB Shroff, Opportunistic power scheduling for dynamic multiserver wireless systems. IEEE Trans. Wirel. Commun 5(6), 1506–1515 (2006)

X Lin, NB Shroff, R Srikant, A tutorial on crosslayer optimization in wireless networks. IEEE J. Sel. Areas Commun 24(8), 1452–1463 (2006)

MJ Neely, E Modiano, CE Rohrs, Dynamic power allocation and routing for timevarying wireless networks, IEEE J. Sel. Areas Commun 23, 89–103 (2005)

X Wang, K Kar, Crosslayer rate optimization for proportional fairness in multihop wireless networks with random access. IEEE J. Sel. Areas Commun 24(8), 1548–1559 (2006)

Y Yi, S Shakkottai, Hopbyhop congestion control over a wireless multihop network. IEEE/ACM Trans. Netw. 15(133–144), 1548–1559 (2007)

A Ribeiro, Ergodic stochastic optimization algorithms for wireless communication and networking. IEEE Trans. Signal Process 58(12), 6369–6386 (2010)

A Ribeiro, G Giannakis, Separation principles in wireless networking. IEEE Trans. Inf. Theory 56(9), 4488–4505 (2010)

S Boyd, L Vandenberghe, in Convex Optimization (Cambridge University Press, Cambridge, 2004)

AA Lyapunov, Complètement, Sur les, Fonctionsvecteur, URSS, Additives. Bull. Acad. Sci. Sèr. Math 4, 465–478 (1940)

RT Rockafellar, in Convex Analysis (Princeton University Press, Princeton, NJ, 1970)

NZ Shor, in Minimization Methods for NonDifferentiable Functions (Springer, Berlin, 1985)