# Energy-Efficient Digital Predistortion With Lookup Table Training Using Analog Cartesian Feedback

SungWon Chung, Student Member, IEEE, Jack W. Holloway, and Joel L. Dawson, Member, IEEE

Abstract-We demonstrate energy-efficient low-complexity adaptive linearization for wideband handset power amplifiers (PAs). Due to power overhead and complexity, traditional wideband linearization techniques such as adaptive digital predistortion (DPD) thus far have not been used for wideband handset transmitters. Our energy-efficient lookup table training strategy resulted in a training energy of 1.83 nJ/entry for a 5-MHz bandwidth WiMAX orthogonal frequency division multiple access (OFDMA) transmission, which represents more than 40× improvement over state-of-the-art DPD implementations. Our experimental prototype transmitter achieves a maximum of 9.9-dB improvement of adjacent channel leakage power at 5.15-MHz offset with 22.0-dBm channel power in the 5-MHz bandwidth WiMAX-OFDMA transmission. This linearity improvement offers 26.5% savings in PA power consumption by reducing power backoff.

Index Terms—Adaptive linearization, adaptive predistortion, Cartesian feedback, digital predistortion (DPD), lookup table (LUT), power amplifier (PA) linearization, wideband handset PA, wideband predistortion, WiMAX.

# I. INTRODUCTION

Power amplifiers (PAs) have suffered from poor power efficiency in mobile wireless applications such as wideband code division multiple access (WCDMA), WiMAX, and wireless local area networks (WLANs). While process scaling has been improving the power efficiency and speed of digital circuitry, it has not been of benefit to PAs. As radios are evolving to digitally intensive architectures, the PA is expected to be the last power-hungry analog block in portable wireless transmitters. Thus, PA power-efficiency enhancement is of great importance in wireless mobile handset design. PA linearization is a traditional way of improving the poor PA power efficiency. In this paper, we present an energy-efficient low-complexity adaptive PA linearization technique specially suited to wideband handset PAs (typically transmitting on the order of 1-W output power).

Manuscript received April 8, 2008; revised June 24, 2008. First published September 12, 2008; current version published October 8, 2008. This work was supported in part by the Focus Center Research Program (FCRP) Focus Center for Circuit and System Solutions (C2S2) under Contract 2003-CT-888, and by the Korea Science and Engineering Foundation (KOSEF) under Grant D00248.

- S. Chung and J. L. Dawson are with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: sungwon@ieee.org; jldawson@mtl.mit.edu).
- J. W. Holloway was with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. He is now with the United States Marine Corps, Corpus Christi, TX 78419 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMTT.2008.2003139

Fig. 1 illustrates adaptive digital predistortion (DPD), which is a classical technique for wideband PA linearization. Saleh and Salz [1] introduced a DPD technique to linearize a PA for particular signals. Nagata [2] generalized the technique to arbitrary signals by using a Cartesian lookup table (LUT) predistorter. For current wideband mobile standards, Nagata's system would need gigabytes of memory. This memory size requires a great deal of expensive silicon area. In addition, the corresponding long training time makes Nagata's technique a difficult fit for handset applications. Eun and Powers [3] proposed a DPD technique using a Volterra polynomial predistorter. The memory requirement for this polynomial predistorter is far smaller than for LUT predistorters, but the amount of digital signal processing is much larger, rendering this technique unsuitable for wideband handsets. Cavers [4] proposed a complex gain LUT, which greatly reduced the size of the LUT and the training time. However, as the bandwidth of the transmit signal gets wider, the feedback loop delay mismatch becomes increasingly detrimental to the convergence of the LUT adaptation algorithm. Recently, Woo et al. [5] and Kim et al. [6] suggested wideband DPD techniques using feedback [digital feedback predistortion (DFBPD)], which lessens the impact of feedback loop delay mismatch.

What all of these DPD techniques have in common is that they are far better suited to base-station high power amplifiers (HPAs) than to wideband handset PAs. First, the power consumption and complexity of the feedback path hinder the application of DPD techniques to fully integrated wideband handset transmitters. In particular, high-resolution ADCs in the feedback path may nullify the power savings achieved by the adaptive linearization of wideband handset PAs. Second, with 1-D LUT predistorters such as in [4], multiple coordinate rotation digital computers (CORDICs), high-speed digital multipliers for the correction of modulator imperfections, and precise demodulator calibration are required. This power overhead and complexity, inconsequential for base stations, matter a great deal in handsets. Thus, traditional DPD techniques are difficult to apply to wideband handsets.

This paper presents an energy-efficient low-complexity digital predistortion technique for wideband handset PAs. In order to efficiently train the Cartesian LUT, the proposed digital predistortion technique uses traditional analog Cartesian feedback to train each LUT entry. After this offline LUT training is completed, open-loop predistortion is performed using a compact Cartesian LUT. The overall benefit of the proposed technique is adaptive linearization of wideband handset PAs with: 1) energy-efficient LUT training using analog Cartesian feedback and 2) energy-efficient predistortion without CORDICs and modulator correction. The proposed technique builds on the adaptive linearization technique developed in [7] and [8] while avoiding the transmission of LUT training symbols and reducing LUT training time by a factor of more than 100. The reduction of LUT training time is enabled by the combined results of LUT size reduction, specially ordered training symbols, and surface acoustic wave (SAW) filter bypassing during the training phase. Since handset PAs do not show severe memory effects [9], the proposed digital predistortion technique does not compensate for PA memory effects in order to obtain power savings and simplicity at the expense of a slight linearity degradation.

Fig. 1. Classical DPD, which needs CORDICs, modulator correction, and time alignment for wideband linearization.

Fig. 2. Classical analog Cartesian feedback, which suffers from bandwidth limitation due to feedback loop delay.

This paper is organized as follows. The two contributions of the proposed low-power digital predistortion technique are separately described. Section II describes energy-efficient LUT training using analog Cartesian feedback. Section III describes energy-efficient predistortion using a compact Cartesian LUT. Measured results from an experimental prototype are presented in Section IV.

# II. ENERGY-EFFICIENT LUT TRAINING

We present a new simple method using analog Cartesian feedback in order to reduce the number of iterations to train each LUT entry.

## A. Reducing Iterations Using Analog Cartesian Feedback

Fig. 2 shows analog Cartesian feedback [10], which can simply linearize the PA using feedback loop gain. However, a finite delay existing in the feedback loop sets a fundamental bandwidth limitation so that analog Cartesian feedback cannot be applied to wideband systems.

We use analog Cartesian feedback only for LUT training, as seen in Fig. 3(a). Thus, the fundamental bandwidth limitation of analog Cartesian feedback is no longer a barrier to wideband linearization. After the completion of LUT training, the analog Cartesian feedback is turned off so that we perform open-loop predistortion in order to linearize wideband signals, as seen in Fig. 3(b).

Fig. 4 compares the proposed LUT training method with DPD and DFBPD. The DPD technique uses an adaptation algorithm such as recursive least squares (RLS), which converges slowly, so many iterations of LUT training are required. The adaptation algorithm needs the difference between data symbols and the corresponding symbols at the PA output, so precise time alignment to cancel out the delay in the feedback loop is required, as seen in Fig. 4(a). Without the time alignment, the adaptation process fails to converge. The DFBPD technique reduces the long convergence time and loop delay sensitivity of DPD by simplifying the adaptation algorithm of DPD into digital feedback, as seen in Fig. 4(b).

Fig. 3. Proposed digital predistortion technique in two steps. (a) LUT training using analog feedback. (b) Predistortion with the feedback turned off.

The architectural difference of the proposed LUT training method from DPD and DFBPD is that the LUT is separated from the feedback loop, as seen in Fig. 4(c). This separation brings two advantages to the proposed method. First, multiple iterations to train each LUT entry become unnecessary. Therefore, the energy efficiency of the LUT training phase is improved. Second, time alignment between the feedforward and feedback paths is not necessary.

The detailed operation of LUT training using analog Cartesian feedback is as follows. To fill the LUT, training symbols to characterize the PA are transmitted. The analog loop filter provides both loop gain and feedback compensation. The feedback causes corresponding predistorted symbols to appear at the loop filter output. Two ADCs sample the I and Q components of the predistorted symbols. During this LUT training phase, to prevent the transmission of training symbols through an antenna, an RF switch connects the PA output to a  $50\text{-}\Omega$  dummy load. SAW filters before the PA are bypassed in order to reduce loop delay, permitting a unity-gain bandwidth over 10 MHz. For a dominant-pole compensated feedback loop, a 10-MHz unity-gain frequency implies that distortion products up to 100 kHz will be suppressed by at least 40 dB.
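The 40-dB figure follows from the single-pole roll-off of a dominant-pole compensated loop: between the pole and the unity-gain frequency, the loop gain falls at 20 dB/decade, so its magnitude at a frequency f is approximately the ratio of the unity-gain frequency to f. A minimal sketch of that arithmetic is given below; the function name `suppression_db` is illustrative and not from the paper, and the approximation is assumed to hold only well above the dominant pole.

```python
import math


def suppression_db(unity_gain_hz: float, freq_hz: float) -> float:
    """Approximate loop gain (dB) of a single-pole loop at freq_hz.

    Assumes freq_hz lies between the dominant pole and the unity-gain
    frequency, where |L(j*2*pi*f)| ~= unity_gain_hz / freq_hz.
    """
    return 20.0 * math.log10(unity_gain_hz / freq_hz)


if __name__ == "__main__":
    # 10-MHz unity-gain frequency, distortion product at 100 kHz
    print(f"{suppression_db(10e6, 100e3):.1f} dB")
```

Distortion products below 100 kHz see a larger loop gain, which is why the text states the suppression as "at least" 40 dB.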



Fig. 4. Comparison of feedback structures with: (a) complex-gain predistortion (DPD) [4], which needs many iterations to converge LUT, (b) DFBPD [5], which is faster than DPD in convergence, and (c) proposed digital predistortion technique, which places the LUT outside of the feedback loop, and thus, no iteration is necessary to train each LUT entry.

Training symbols are designed to minimize the Cartesian distance between consecutive training symbols in order to reduce the overall LUT training time. This causes the individual training waveforms on the I and Q channels to resemble a monotonically increasing or decreasing staircase (see, e.g., d(t) in Fig. 5). During training, we also avoid simultaneous transitions on I and Q, which would otherwise increase the distance between consecutive symbols by a factor of  $\sqrt{2}$ . Following these two guidelines results in training signals with a low bandwidth at a given training symbol rate. The impact is a  $2\times$  reduction in training time compared to randomly ordered training symbols.
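One ordering consistent with these two guidelines is a serpentine (boustrophedon) raster scan of the I/Q grid. The sketch below assumes that the 256-entry LUT is a 16 × 16 uniform grid, which the text does not state; `serpentine_order` and `max_step` are illustrative names, and the authors' actual symbol sequence may differ.

```python
def serpentine_order(n_levels: int = 16):
    """Return (i, q) grid indices in a snake scan.

    I sweeps up on even rows and down on odd rows, while Q advances by
    one level at the end of each row, so I and Q never change together.
    """
    order = []
    for q in range(n_levels):
        if q % 2 == 0:
            i_values = range(n_levels)
        else:
            i_values = range(n_levels - 1, -1, -1)
        for i in i_values:
            order.append((i, q))
    return order


def max_step(order):
    """Largest Euclidean distance, in grid steps, between consecutive symbols."""
    return max(
        ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
        for a, b in zip(order, order[1:])
    )


if __name__ == "__main__":
    order = serpentine_order(16)
    print(len(order), max_step(order))
```

With this ordering, every transition moves exactly one grid step on exactly one of the two channels, so the I waveform is an up-and-down staircase and the Q waveform is a slowly rising staircase.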

To renew the LUT, the transmitter must stop transmitting and enter the training phase. The renewal nevertheless need not interrupt ongoing communication, because LUT training takes less than a millisecond and most communication protocols implement enough buffering for an error control method such as automatic repeat request (ARQ). For example, WCDMA and WLAN implement a 200-ms ARQ timeout, and WiMAX implements a 50-ms ARQ timeout. How often the LUT needs a renewal depends on the PA characteristics and the environmental conditions (PA topology, aging effects, process technology, temperature fluctuation, and so on).

## B. LUT Training Time, Power, and Energy

Table I summarizes the simulation results of LUT training time, power, and energy requirements. We compare the proposed LUT training method with DPD and DFBPD. The simulation is carried out for WiMAX-OFDMA transmission with 5- and 20-MHz bandwidth using 64 quadrature amplitude modulation (QAM) subchannel modulation. In order to achieve the adjacent channel interference (ACI) of -60 dBc, the decision on the LUT bitwidth and size is made based on the quantization analysis given in [5] and [11]. A loop delay misalignment of 0.1 symbol time is assumed for DPD and DFBPD.

1) Time: Minimum training time using the proposed method is determined by the bandwidth of analog Cartesian feedback. With our experimental prototype, the 3-dB bandwidth of analog Cartesian feedback is 1 MHz. For each of the 256 entries in the Cartesian LUT, seven time constants for the analog Cartesian feedback were allowed for settling to get higher than 10-bit resolution. Corresponding overall LUT training time is 285.2  $\mu$ s. However, at the maximum PA output that could be linearized, the closed-loop bandwidth decreases. Therefore, with the additional margin determined by experimental measurement from our prototype transmitter, 570.4  $\mu$ s was allowed for the LUT training.
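The training-time figures above can be reproduced from the first-order settling model. The sketch below assumes a single-pole closed-loop response, so that the time constant is 1/(2π·BW); the factor of two used for the allocated time is inferred from the ratio of the two reported figures rather than stated by the authors, and the function name is illustrative.

```python
import math


def lut_training_time_s(bw_3db_hz: float, n_entries: int, n_tau: float) -> float:
    """Total settling time for n_entries symbols, n_tau time constants each."""
    tau = 1.0 / (2.0 * math.pi * bw_3db_hz)  # first-order time constant
    return n_entries * n_tau * tau


t_min = lut_training_time_s(1e6, 256, 7)  # 1-MHz loop, 256 entries, 7 tau
t_alloc = 2.0 * t_min                     # assumed 2x margin (inferred)
settling_bits = 7 / math.log(2)           # exp(-7) expressed as 2**(-bits)

if __name__ == "__main__":
    print(f"minimum   : {t_min * 1e6:.1f} us")
    print(f"allocated : {t_alloc * 1e6:.1f} us")
    print(f"settling  : {settling_bits:.2f} bits")
```

Seven time constants leave a residual of exp(-7), slightly better than one part in 2^10, which is consistent with the "higher than 10-bit resolution" requirement.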

The minimum training time of DPD and DFBPD depends on the number of iterations to train each LUT entry. In simulation, to train each LUT entry, DPD and DFBPD require at least 22 and 12 iterations, respectively. DPD and DFBPD perform online training, where transmitted symbols for ongoing communications are used to train the LUT. On the other hand, the proposed LUT training method is an offline scheme using predetermined training symbols. Because the amplitudes of the transmitted symbols in DPD and DFBPD are therefore not uniformly distributed, there is a more than 3× difference between the minimum and average iterations, as seen in Table I.

Each iteration in DPD and DFBPD needs one ADC sampling, which requires at least one DAC conversion cycle for the comparison of the feedforward and feedback signals. To create WiMAX-OFDMA signals with 5-MHz channel bandwidth, a 5.6-MHz DAC clock rate is required by the IEEE 802.16e standard. To compensate fifth-order nonlinearity, 28.0 MS/s is a theoretical minimum DAC clock rate, but higher than 56.0 MS/s is needed for a realizable DAC reconstruction filter. Minimum LUT training time for 5-MHz channel bandwidth takes 178.5 and 97.4  $\mu$ s for DPD and DFBPD, respectively, showing 3× and 6× faster training time compared to the proposed LUT training method. For 20-MHz channel bandwidth, DPD and DFBPD provide further improvement in the minimum training time because the ADC sampling can go faster.
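These DPD and DFBPD training times can be checked with the relation (entries × iterations per entry) / sampling rate. The sketch below is only an arithmetic check under the assumption that one iteration costs one ADC sample at the DAC clock rate; it reproduces the Table I times when the average iteration counts from the table (78.1 and 42.6) are used. The function name is illustrative.

```python
def training_time_s(n_entries: int, iterations_per_entry: float,
                    sample_rate_hz: float) -> float:
    """LUT training time when each iteration costs one ADC sample."""
    return n_entries * iterations_per_entry / sample_rate_hz


# (entries, average iterations per entry) taken from Table I
METHODS = {"DPD": (128, 78.1), "DFBPD": (128, 42.6)}
PROPOSED_TIME_S = 570.4e-6  # allocated training time of the proposed method

if __name__ == "__main__":
    for rate_hz in (56.0e6, 224.0e6):
        for name, (entries, iters) in METHODS.items():
            t = training_time_s(entries, iters, rate_hz)
            print(f"{name:6s} at {rate_hz / 1e6:5.1f} MS/s: "
                  f"{t * 1e6:6.1f} us, {PROPOSED_TIME_S / t:4.1f}x faster")
```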

2) Power: LUT training power is estimated from the required ADC power consumption, which is determined by sampling rate and resolution. ADC power consumption was projected from [12] in 90-nm CMOS process technology. The ADC sampling rate for LUT training is determined not by the communication channel bandwidth, but by the designers' choice in trading off training time against ADC power consumption. When minimizing training time, the ADC sampling rate for DPD and DFBPD is the same as the 56.0-MS/s transmitter DAC clock rate that we have discussed. However, a 0.4-MS/s ADC sampling rate is used for the proposed LUT training method. This 0.4 MS/s is the training symbol rate, which is determined by the bandwidth of the analog Cartesian feedback in our experimental prototype.

Fig. 5. Feedback delay  $\delta$  and associated signals at the LUT training phase. If necessary, it is possible to improve the signal-to-noise ratio (SNR) of the LUT by employing multiple sampling as shown, and then averaging those samples.

TABLE I
COMPARISON OF LUT TRAINING TIME, POWER, AND ENERGY FOR
WiMAX-OFDMA TRANSMISSION ACHIEVING -60-dBc ACI

|                       | DPD            | DFBPD         | This work    |
|-----------------------|----------------|---------------|--------------|
| LUT size              | 128 entries    | 128 entries   | 256 entries  |
| LUT bitwidth          | 9 bit          | 8 bit         | 10 bit       |
| ADC resolution        | 10 bit         | 10 bit        | 10 bit       |
| Avg. iterations       | 78.1           | 42.6          | 1            |
| Min. iterations       | 22             | 12            | 1            |
| Delay alignment       | required       | required      | -            |
| Modulator calibration | required       | required      | -            |
| 5-MHz bandwidth       |                |               |              |
| ADC sampling rate     | 56.0 MS/s      | 56.0 MS/s     | 0.4 MS/s     |
| Training power        | 102.4 mW       | 102.4 mW      | 0.8 mW       |
| Training time         | 178.5 µs       | 97.4 µs       | 570.4 µs     |
| 20-MHz bandwidth      |                |               |              |
| ADC sampling rate     | 224.0 MS/s     | 224.0 MS/s    | 0.4 MS/s     |
| Training power        | 409.5 mW       | 409.5 mW      | 0.8 mW       |
| Training time         | 44.6 µs        | 24.3 µs       | 570.4 µs     |
| Training energy       | 142.8 nJ/entry | 77.9 nJ/entry | 1.8 nJ/entry |

LUT training power reduction by the proposed LUT training method is 124× and 499× for 5- and 20-MHz bandwidth transmission, respectively, with the minimum training time. We have omitted the consideration of the digital power. This omission actually favors DPD and DFBPD because our method does not require complex digital circuits such as CORDICs, modulator correction units, time alignment, and a DSP in order to manage LUT adaptation. In addition, because our system does the training offline, we are able to use a much lower ADC sampling rate. We provide more detailed discussion of digital power consumption in Section III-B.
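The 124× and 499× figures are consistent with ADC power scaling linearly with sampling rate at a fixed resolution and figure of merit, which is the model formalized in (1). The sketch below is a check under two assumptions that are not stated explicitly in the text: all three methods use the same 10-bit ADC technology, and the unrounded training symbol rate is the 256 entries divided by the 570.4-µs training time (about 0.45 MS/s, quoted as 0.4 MS/s).

```python
N_ENTRIES = 256
TRAINING_TIME_S = 570.4e-6

# Assumed unrounded training symbol rate of the proposed method (S/s)
symbol_rate_hz = N_ENTRIES / TRAINING_TIME_S


def adc_power_ratio(fast_rate_hz: float, slow_rate_hz: float) -> float:
    """Power ratio of two ADCs with equal ENOB and FOM: only S_R differs."""
    return fast_rate_hz / slow_rate_hz


ratio_5mhz = adc_power_ratio(56.0e6, symbol_rate_hz)
ratio_20mhz = adc_power_ratio(224.0e6, symbol_rate_hz)

if __name__ == "__main__":
    print(f"symbol rate: {symbol_rate_hz / 1e6:.3f} MS/s")
    print(f"5 MHz : {ratio_5mhz:.1f}x")
    print(f"20 MHz: {ratio_20mhz:.1f}x")
```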

3) Energy: We compare the energy efficiency of the LUT training methods by calculating the necessary training energy per LUT entry. A simple expression for the training energy per LUT entry is obtained as follows.

First, we rewrite ADC power consumption in terms of universal ADC figure-of-merit  ${\rm FOM_{ADC}}$  as

$$P_{\text{ADC}} = S_R \cdot 2^{\text{ENOB}} \cdot \text{FOM}_{\text{ADC}} \tag{1}$$

where  $S_R$  is the ADC sampling rate and ENOB is the effective number of bits available from ADC, which represents ADC resolution. The strength of using  ${\rm FOM_{ADC}}$  is that it allows designers to estimate ADC power consumption from the published literature once they know the required ADC conversion steps and sampling rate.

Second, we write the overall LUT training time as

$$T_{\text{LUT}} = T_{\text{entry}} \cdot N_{\text{entry}} = \frac{I_{\text{entry}}}{S_R} \cdot N_{\text{entry}} \tag{2}$$

where  $T_{\rm entry}$  is the training time per entry,  $N_{\rm entry}$  is the number of LUT entries, and  $I_{\rm entry}$  is the number of iterations to train each LUT entry.

LUT training energy per entry can be obtained by using (1) and (2) as

$$\begin{split} E_{\text{entry}} &= \frac{T_{\text{LUT}} \cdot P_{\text{ADC}}}{N_{\text{entry}}} \\ &= T_{\text{entry}} \cdot S_R \cdot 2^{\text{ENOB}} \cdot \text{FOM}_{\text{ADC}} \\ &= I_{\text{entry}} \cdot 2^{\text{ENOB}} \cdot \text{FOM}_{\text{ADC}}. \end{split} \tag{3}$$

Equation (3) clearly explains the primary advantage of our method over DPD and DFBPD. For an equivalent ADC and ENOB, the training energy per entry is proportional to the number of iterations, so the great advantage of this method of training is that it reduces the number of iterations to obtain each LUT entry.

LUT training energy per entry is 1.83 nJ/entry, providing  $78 \times$  and  $43 \times$  reduction compared to DPD and DFBPD, respectively. This improvement comes from the significant reduction in the number of iterations to train each LUT entry.
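The per-entry energies can be cross-checked directly from the Table I entries using the first line of (3), energy = power × time / entries. The sketch below uses only the rounded table values, so the result for the proposed method comes out near 1.8 nJ rather than exactly 1.83 nJ; the function name is illustrative.

```python
def energy_per_entry_nj(power_mw: float, time_us: float, n_entries: int) -> float:
    """Training energy per LUT entry; mW times us gives nJ."""
    return power_mw * time_us / n_entries


# (training power in mW, training time in us, LUT entries), 5-MHz case, Table I
TABLE_I_5MHZ = {
    "DPD": (102.4, 178.5, 128),
    "DFBPD": (102.4, 97.4, 128),
    "This work": (0.8, 570.4, 256),
}

energies = {name: energy_per_entry_nj(*row) for name, row in TABLE_I_5MHZ.items()}

if __name__ == "__main__":
    for name, e in energies.items():
        print(f"{name:9s}: {e:6.1f} nJ/entry")
```

By the last line of (3), with the same ADC resolution and figure of merit the energy ratio reduces to the ratio of iterations per entry. The average iteration counts in Table I, 78.1 and 42.6 against 1, match the quoted 78× and 43× reductions.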

## C. Impact of Cartesian Feedback Delay

Fig. 5 illustrates the training process. The ADCs do not begin sampling until a time interval  $\delta$  after a new training symbol has appeared at the input. If this delay is not considered in the LUT training, the necessary resolution of the LUT entry cannot be achieved. Implementing the Cartesian LUT predistorter with this delay taken into account is simple and does not need any calibration because the amount of the delay is determined primarily by the loop filter. As illustrated in Fig. 5, multiple sampling by a factor of N can increase feedback noise immunity and provide a tradeoff between the training time and the LUT SNR (see the discussion on averaging in Section II-D).

It is worth taking a closer look at the delay between the feedback input and loop filter output. For a Cartesian feedback loop with a dominant-pole compensated loop filter, it is straightforward to see the transfer function relating the input to the loop filter output, given by a first-order response

$$P(s) = \frac{L_e(s)}{1 + L_e(s)Af}D(s) \tag{4}$$

where D(s) is the feedback input, A is the forward gain from modulator, preamplifier, and PA, f is the feedback attenuation, and  $L_e(s)$  is the effective loop filter transfer function. Since the narrow bandwidth of training symbols assures that feedback loop gain is very large, we identify

$$L_e(s) \simeq \frac{L(s)}{\cos \phi} \tag{5}$$

where the loop filter transfer function L(s) is given by

$$L(s) = \frac{k}{1 + s/\omega_c} \tag{6}$$

and  $\phi$  is the phase shift introduced by the AM–PM characteristics of the PA, which cannot be regulated by phase alignment techniques for synchronous downconversion. Therefore, from the following relation:

$$P(s) \simeq \frac{1}{Af} \cdot \frac{1}{1 + s / \left(\frac{k A f \omega_c}{\cos \phi}\right)} D(s) \tag{7}$$

when the forward gain A decreases due to PA gain compression for a large amplitude signal, we see the delay between feedback input and loop filter output will increase because of closed-loop bandwidth reduction. The time constant of the first-order response from the Cartesian feedback input to the loop filter output is given by

$$\tau = \frac{\cos \phi}{k A f \omega_c} \tag{8}$$



Fig. 6. Increase of feedback delay due to the PA gain compression [dashed line is the analytical result from (8)].

and is compared with the measured results, as seen in Fig. 6. The impact of the feedback delay variation on overall LUT training time can be made negligible by allowing additional time to wait for feedback settling.
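The step from (4)-(6) to (7) and (8) can be written out explicitly. The following is a sketch under the stated assumption that the loop gain is very large, i.e.,  $kAf/\cos\phi \gg 1$:

$$\begin{split} \frac{P(s)}{D(s)} &= \frac{L_e(s)}{1 + L_e(s) A f} \simeq \frac{\dfrac{k}{\cos\phi \, (1 + s/\omega_c)}}{1 + \dfrac{k A f}{\cos\phi \, (1 + s/\omega_c)}} = \frac{k/\cos\phi}{1 + \dfrac{k A f}{\cos\phi} + \dfrac{s}{\omega_c}} \\ &= \frac{k/\cos\phi}{1 + k A f/\cos\phi} \cdot \frac{1}{1 + \dfrac{s}{\omega_c \left(1 + k A f/\cos\phi\right)}} \simeq \frac{1}{Af} \cdot \frac{1}{1 + s \cdot \dfrac{\cos\phi}{k A f \omega_c}} \end{split}$$

The closed-loop pole therefore sits at  $k A f \omega_c / \cos\phi$, and its reciprocal is the time constant in (8). Because  $\tau$  is inversely proportional to the forward gain A, a drop in A under gain compression lengthens the settling time in the same proportion.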

## D. Averaging to Increase Noise Immunity

Averaging can be exploited to increase the noise immunity of the system. For WiMAX and WCDMA handsets, the analog loop filter output needs to be digitized with 10- and 12-bit resolution, respectively. These resolutions may not be available at the loop filter output because of noise. To enhance the noise immunity of LUT training process, we have the option of exploiting digital averaging.

Two noise sources, i.e., band-limited thermal noise and quantization noise, may limit the resolution of LUT entries. First, the thermal noise is generated from the analog demodulator in the feedback path. The bandwidth of the demodulator noise is limited by closed-loop feedback. Second, the quantization noise is introduced by the ADC, which samples the loop filter output.

Averaging decreases the statistical variance of samples taken at loop filter output for a training symbol. Thus, averaging increases the LUT entry SNR by reducing both the band-limited thermal noise power and quantization noise power.<sup>1</sup>
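The effect of averaging on a quantized, noisy sample can be illustrated numerically. The sketch below is a toy Monte Carlo model, not the authors' simulation: it assumes an ideal 8-bit uniform quantizer over [-1, 1], Gaussian noise with a standard deviation of one LSB acting as dither, and illustrative function names. With noise present, 16× averaging should shrink the error standard deviation by roughly a factor of four; with no noise at all, every sample of a symbol quantizes to the same code and averaging gives no benefit.

```python
import random
import statistics

LSB = 2.0 / 2 ** 8  # assumed 8-bit quantizer spanning [-1, 1]


def quantize(x: float, lsb: float = LSB) -> float:
    """Ideal uniform quantizer (round to the nearest code)."""
    return lsb * round(x / lsb)


def averaged_error_std(n_avg: int, noise_std: float,
                       trials: int = 2000, seed: int = 0) -> float:
    """Std. dev. of (mean of n_avg quantized noisy samples - true value)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        true_value = rng.uniform(-1.0, 1.0)
        samples = [quantize(true_value + rng.gauss(0.0, noise_std))
                   for _ in range(n_avg)]
        errors.append(statistics.fmean(samples) - true_value)
    return statistics.pstdev(errors)


if __name__ == "__main__":
    for noise in (LSB, 0.0):
        gain = averaged_error_std(1, noise) / averaged_error_std(16, noise)
        print(f"noise = {noise / LSB:.0f} LSB: error reduced {gain:.2f}x by 16x averaging")
```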

# III. ENERGY-EFFICIENT PREDISTORTION

In addition to the advantage of lower LUT training energy per entry, the use of a Cartesian LUT [2] also reduces the power required to predistort data symbols during transmission. A Cartesian LUT eliminates the need for CORDICs and modulator calibration. 2-D spline interpolation reduces the size of the Cartesian LUT, bringing it down to that of a 1-D complex gain table. Although there are many predistortion techniques compensating for memory effects such as [13]–[16], our method does not compensate for memory effects. We made this choice because the relatively low-power PAs in handsets do not exhibit strong memory effects, allowing us to trade off memory effects compensation for simplicity and low power dissipation.

<sup>1</sup>Note that, without dithering, averaging cannot reduce the quantization noise power.



Fig. 7. Trained Cartesian LUT, which compensates modulator imperfections, as well as predistorts baseband symbols.

Fig. 3(b) illustrates our simple and energy-efficient predistortion method, which is performed open loop. After the compact Cartesian LUT training is completed, the entire feedback path is turned off. The  $50-\Omega$  dummy load, which was used for LUT training to prevent the transmission of training symbols, is disconnected from the PA output. The RF switch steers the PA output to the antenna. Finally, transmission symbols are predistorted by a Cartesian LUT predistorter and radiated through an antenna.

## A. Removing CORDIC and Modulator Correction

The compact Cartesian LUT brings two benefits over a complex gain LUT [4] and an AM/AM-AM/PM LUT [17], which are: 1) CORDICs are not necessary and 2) modulator correction is unnecessary. Thus, the complexity and power consumption of digital signal processing for predistortion are greatly reduced.

The proposed predistortion method does not need CORDICs because a Cartesian LUT gets an IQ pair as an input index to address a LUT entry rather than getting a magnitude of the IQ pair. In addition, a complex-number multiplier, which is used to adjust baseband symbols based on LUT output in the DPD technique, is not necessary.
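The indexing difference can be made concrete with a small sketch of a Cartesian LUT lookup. Several things here are assumptions for illustration only: the LUT is taken to be a 16 × 16 uniform grid over [-1, 1], bilinear interpolation stands in for the 2-D spline interpolation used in the paper, an identity table stands in for trained values, and all names are hypothetical. The point is that the lookup uses the I and Q values directly, with no magnitude computation and no complex multiplication.

```python
def make_identity_lut(n: int = 16):
    """n x n table of (I, Q) outputs on a uniform grid over [-1, 1].

    The identity mapping is a placeholder for trained predistorted values.
    """
    step = 2.0 / (n - 1)
    return [[(-1.0 + i * step, -1.0 + q * step) for q in range(n)]
            for i in range(n)]


def _locate(x: float, n: int):
    """Map x in [-1, 1] to (lower grid index, fractional offset)."""
    pos = (min(max(x, -1.0), 1.0) + 1.0) * (n - 1) / 2.0
    idx = min(int(pos), n - 2)
    return idx, pos - idx


def predistort(lut, i_in: float, q_in: float):
    """Look up the predistorted (I, Q) pair with bilinear interpolation."""
    n = len(lut)
    a, fa = _locate(i_in, n)
    b, fb = _locate(q_in, n)
    out = []
    for c in range(2):  # c = 0 for the I output, c = 1 for the Q output
        v00, v10 = lut[a][b][c], lut[a + 1][b][c]
        v01, v11 = lut[a][b + 1][c], lut[a + 1][b + 1][c]
        out.append((1 - fa) * (1 - fb) * v00 + fa * (1 - fb) * v10
                   + (1 - fa) * fb * v01 + fa * fb * v11)
    return tuple(out)


if __name__ == "__main__":
    print(predistort(make_identity_lut(), 0.3, -0.7))
```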

Modulator correction is also not necessary with the proposed predistortion method. Fig. 7 shows the contents of the Cartesian LUT trained by analog Cartesian feedback. The asymmetric contents of the Cartesian LUT represent the necessary compensation for the upconversion mixer imperfections of dc offset, gain mismatch, and phase mismatch.

## B. Estimated Power Consumption for Digital Predistortion

Table II shows the estimated power consumption of the proposed predistortion method for sending information with 5-MHz bandwidth WiMAX-OFDMA transmission.

TABLE II
ESTIMATION OF POWER CONSUMPTION FOR THE PROPOSED PREDISTORTION
METHOD FOR 5-MHz WiMAX-OFDMA TRANSMISSION

|                                     | DPD     | DFBPD   | This work |
|-------------------------------------|---------|---------|-----------|
| Power Consumption                   |         |         |           |
| LUT (P <sub>L</sub> )               | 1.2 mW  | 1.1 mW  | 4.2 mW    |
| Interpolation (P <sub>I</sub> )     | 1.3 mW  | 1.3 mW  | 2.9 mW    |
| Upsampling (P <sub>U</sub> )        | -       | -       | 9.1 mW    |
| CORDIC (P <sub>C</sub> )            | 19.1 mW | 19.1 mW | -         |
| Complex mult. (P <sub>X</sub> )     | 2.3 mW  | -       | -         |
| Mixer correction (P <sub>M</sub> )  | 1.2 mW  | 1.2 mW  | -         |
| Filter correction (P <sub>F</sub> ) | 18.6 mW | 18.6 mW | -         |
| DAC (P <sub>D</sub> )               | 0.9 mW  | 0.9 mW  | 5.7 mW    |
| Overall Power                       | 44.5 mW | 42.1 mW | 22.0 mW   |

For the estimation of power consumption, all digital circuit blocks are analyzed in terms of: 1) the number of digital adders; 2) the number of digital multipliers; 3) the operating clock rate; and 4) the bit width of input and output. Using measured results from published literature as a reference, we then estimated the power consumption of each block. Power consumption for the LUT static RAM (SRAM)  $(P_L)$ , LUT interpolator  $(P_I)$ , LUT upsampler  $(P_U)$ , CORDIC  $(P_C)$ , mixer mismatch correction unit  $(P_M)$ , reconstruction filter mismatch correction unit  $(P_F)$ , complex multiplier  $(P_X)$ , and DAC  $(P_D)$  are estimated as follows:

$$\begin{split} P_L &= 2 \times P_{\text{ref},s} \cdot \frac{R \cdot N_i}{1 \text{ GHz}} \cdot \frac{W_l}{32 \text{ bit}} \cdot \frac{W_s}{256} \\ P_I &= 2 \times \left( P_{\text{ref},m} \cdot \frac{R \cdot N_m}{1 \text{ GHz}} + P_{\text{ref},a} \cdot \frac{R \cdot N_a}{1 \text{ GHz}} \right) \\ P_U &= 2 \times \left( P_{\text{ref},m} \cdot \frac{R_m \cdot N_m}{1 \text{ GHz}} + P_{\text{ref},a} \cdot \frac{R_a \cdot N_a}{1 \text{ GHz}} \right) \\ P_C &= P_{\text{ref},a} \cdot (R_a / 1 \text{ GHz}) \cdot N_a \cdot N_p \\ P_{M,X} &= P_{\text{ref},m} \cdot (R / 1 \text{ GHz}) \cdot N_m \\ P_F &= 2 \times P_{\text{ref},m} \cdot (R / 1 \text{ GHz}) \cdot N_m \\ P_D &= 2 \times P_{\text{ref},d} \cdot (R_d / 600 \text{ MS/s}) \end{split} \tag{9}$$

where  $N_i$  is the number of LUT accesses per interpolation point,  $W_l$  is the word length of the LUT SRAM,  $W_s$  is the number of LUT entries,  $N_m$  is the number of multipliers,  $N_a$  is the number of adders,  $N_p$  is the number of pipeline stages, R is the operating clock rate,  $R_m$  is the multiplier operating clock rate,  $R_a$  is the adder operating clock rate,  $R_d$  is the DAC operating clock rate, and the factor 2 implies that two units are required. Reference power consumptions are from [18]–[21] and projected to 90-nm CMOS technology using the general  $P = fCV^2$  rule, which represents power consumption P as proportional to the frequency f, capacitance C, and the square of supply voltage V. The reference power consumptions are given as follows:  $P_{\text{ref},s} = 37.70 \text{ mW}$  for SRAMs,  $P_{\text{ref},m} = 10.40 \text{ mW}$  for multipliers,  $P_{\text{ref},a} = 0.95 \text{ mW}$  for adders, and  $P_{\text{ref},d} = 4.80 \text{ mW}$  for differential DACs. The assumed CORDIC architecture is from [22], which is very efficient. The architecture of the reconstruction filter correction unit is from [23].

The proposed predistortion method is estimated to consume 22.0 mW, providing 50.6% and 47.7% power savings compared to the conventional DPD and DFBPD, respectively. We applied additional upsampling after the LUT to eliminate the need of mismatch correction for high-Q reconstruction filters. Most of the power savings is achieved by removing CORDICs and modulator correction units.
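The savings percentages can be checked against the Table II entries. The sketch below sums the per-block values as printed and compares them with the reported totals; the sums differ from the reported totals by 0.1 mW, which is presumably rounding of the individual entries. The savings are computed from the reported totals, and the function name is illustrative.

```python
# Per-block power in mW as printed in Table II (blank entries omitted)
TABLE_II_MW = {
    "DPD": [1.2, 1.3, 19.1, 2.3, 1.2, 18.6, 0.9],
    "DFBPD": [1.1, 1.3, 19.1, 1.2, 18.6, 0.9],
    "This work": [4.2, 2.9, 9.1, 5.7],
}
REPORTED_TOTAL_MW = {"DPD": 44.5, "DFBPD": 42.1, "This work": 22.0}


def saving_percent(baseline_mw: float, proposed_mw: float) -> float:
    """Power saving of the proposed method relative to a baseline."""
    return 100.0 * (1.0 - proposed_mw / baseline_mw)


if __name__ == "__main__":
    for name, blocks in TABLE_II_MW.items():
        print(f"{name:9s}: sum {sum(blocks):5.1f} mW, "
              f"reported {REPORTED_TOTAL_MW[name]:5.1f} mW")
    for name in ("DPD", "DFBPD"):
        s = saving_percent(REPORTED_TOTAL_MW[name], REPORTED_TOTAL_MW["This work"])
        print(f"saving vs {name}: {s:.1f}%")
```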

# IV. MEASURED RESULTS

Fig. 8 shows a discrete-component WiMAX-OFDMA prototype transmitter, which implements the proposed digital predistortion technique. Analog Cartesian feedback was implemented as fully differential. With the 1-W Mini-Circuits ZHL-0812-HLN PA, the analog Cartesian feedback provided more than 20-dB linearity improvement when the PA was driven close to saturation. The analog Cartesian feedback has 35-dB closed-loop gain and 90-dB open-loop gain, thereby representing 55-dB gain reduction for linearity improvement. An Agilent DSO80000 8-bit oscilloscope and a Tektronix AFG3102 14-bit arbitrary waveform generator were used as the ADC and DAC, respectively. Upconversion mixing and downconversion mixing were performed by an AD8340 and an LT5517 evaluation board, respectively. The digital averaging and spline interpolation were realized in MATLAB software.

Fig. 8. Prototype WiMAX-OFDMA transmitter implementing the proposed digital predistortion technique.

In order to get the 10-bit resolution of the LUT entry using an 8-bit oscilloscope,  $16\times$  averaging is applied. We acquire all 16 samples consecutively from one training symbol. The settling error  $\Delta$  in LUT entries is reduced approximately to  $\Delta/16$  after  $16\times$  averaging. The training time is minimized with this low training symbol rate because getting multiple samples p(t) for a training symbol d(t) does not require a settling time between each sample, as seen in Fig. 5.

## A. Wideband Linearization

Fig. 9 shows the measured spectrum of PA output delivering 5-MHz bandwidth WiMAX-OFDMA signals with 22.0-dBm channel power. The maximum linearity improvement is 9.9 dB at 5.15-MHz offset. Adjacent channel power ratio (ACPR) at all corner frequencies is improved, but the noise floor is degraded by 2.3 dB. Measured error vector magnitude (EVM) improvement for the 5-MHz bandwidth WiMAX-OFDMA signals is from 7.81% to 5.88%. The EVM improvement comes from the compensation for the dispersion of frequency-domain WiMAX-OFDMA constellations.

Fig. 10 shows the measured spectrum for 5-MHz bandwidth QAM-16 signals with 26.5-dBm channel power. The maximum linearity improvement is 16.7 dB at -3.30-MHz offset. Due to imperfect demodulator calibration and weak PA memory effects, the linearity improvement at +3.30-MHz offset is 11.3 dB. Measured EVM for the 5-MHz bandwidth QAM-16 signals improves from 4.49% to 1.32%. This EVM improvement comes from the compensation for the compression and modulator imperfections of time-domain QAM-16 constellations.

Fig. 9. Measured spectrum for 802.16e WiMAX-OFDMA transmission with 22.0-dBm channel power, 5-MHz channel bandwidth, and 8.5-dB PAPR.

Fig. 10. Measured spectrum for QAM-16 transmission with 26.5-dBm channel power, 5-MHz channel bandwidth, and 4.7-dB PAPR.

Fig. 11. Impact of antenna impedance variation on linearization performance: VSWR = 1.4 (dashed line is before predistortion and solid line is after predistortion).

Fig. 12. Impact of antenna impedance variation on linearization performance: VSWR = 3.0 (dashed line is before predistortion and solid line is after predistortion).

# B. Impact of Varying Antenna Impedance

The impact of varying antenna impedance on the PA output spectrum is important to the proposed digital predistortion technique because the LUT is trained with a 50- $\Omega$  dummy load, but predistortion is performed with an antenna load. Furthermore, antenna impedance variation due to human interaction and environmental change affects the linearization performance.

Figs. 11 and 12 show the variation of ACPR caused by the variation of PA load impedance angle with the voltage standing-wave ratio (VSWR) of 1.4 and 3.0. To obtain arbitrary antenna impedance variation, we used a variable stub tuner. Because a variable stub tuner is more narrowband than an antenna would be, we are forced to use a more narrowband signal than is traditionally used for WiMAX. As a compromise, we impose WiMAX modulation on a 200-kHz channel bandwidth signal and report the results as shown. ACPR measured at $\pm 200$-kHz offset was always improved by predistortion, although the improvement decreases as VSWR worsens. However, ACPR measured at $\pm 300$-kHz offset was improved by predistortion only within a load impedance angle range of $210^{\circ}$ for the VSWR of 3.0. In order to guarantee the ACPR improvement, a restriction on the antenna impedance angle is necessary. As shown in [24], the handset antenna impedance can be designed to have impedance angle variation less than $180^{\circ}$.

Fig. 13. Transmitter power consumption for WiMAX-OFDMA transmission with 22.0-dBm channel power, 5-MHz channel bandwidth, and 8.5-dB PAPR.

Fig. 14. Transmitter power consumption for QAM-16 transmission with 26.0-dBm channel power, 5-MHz channel bandwidth, and 4.7-dB PAPR.
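To give a sense of how severe the two mismatch conditions in Figs. 11 and 12 are, the VSWR values can be converted to reflection-coefficient magnitude with the standard relation |Γ| = (VSWR − 1)/(VSWR + 1). The sketch below is illustrative only, and the function names are hypothetical. It gives |Γ| of about 0.17 for a VSWR of 1.4 and 0.5 for a VSWR of 3.0, so a quarter of the incident power is reflected in the latter case.

```python
import math

def reflection_magnitude(vswr):
    """Magnitude of the reflection coefficient for a given VSWR."""
    return (vswr - 1.0) / (vswr + 1.0)

def mismatch_loss_db(vswr):
    """Power lost to reflection, in dB, assuming a lossless line."""
    gamma = reflection_magnitude(vswr)
    return -10.0 * math.log10(1.0 - gamma ** 2)

for vswr in (1.4, 3.0):
    gamma = reflection_magnitude(vswr)
    print(f"VSWR {vswr}: |Gamma| = {gamma:.3f}, "
          f"reflected power = {100.0 * gamma ** 2:.1f}%, "
          f"mismatch loss = {mismatch_loss_db(vswr):.2f} dB")
```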

# C. Power Efficiency of PA and Transmitter

The power consumption of the PA, preamplifier, modulator, and demodulator was measured from the prototype. The power overhead for digital predistortion is estimated to be 22.0 mW in 90-nm CMOS technology, as we have described in Section III-B. In order to satisfy the WiMAX noise floor requirement, the preamplifier is bypassed during WiMAX-OFDMA transmission. During the training phase, however, the preamplifier is required to provide enough loop gain for feedback linearization.

Figs. 13 and 14 display the transmitter power consumption for 5-MHz bandwidth 22.0-dBm WiMAX-OFDMA and 5-MHz bandwidth 26.0-dBm QAM-16 transmission. PA power savings of 26.5% and 41.5% are measured during WiMAX-OFDMA and QAM-16 transmission, allowing 2.52- and 3.90-W reductions in transmitter power consumption, respectively. The measured peak-to-average power ratio (PAPR) of the WiMAX-OFDMA and QAM-16 signals is 8.5 and 4.7 dB, respectively.
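The channel powers and PAPR values above can be related to the 1-W rating of the PA with a short calculation. The sketch below is an illustrative estimate, not a measurement: it assumes the peak envelope power is simply the average channel power plus the PAPR, and the helper names are hypothetical. Under that assumption, both test signals reach peaks of roughly 1.1 to 1.2 W, that is, close to the rated output of the PA.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10.0 ** (p_dbm / 10.0) / 1000.0

def peak_power_dbm(avg_dbm, papr_db):
    """Estimated peak power: average power plus PAPR (assumption)."""
    return avg_dbm + papr_db

# (average channel power in dBm, PAPR in dB), as reported in the text
signals = {"WiMAX-OFDMA": (22.0, 8.5), "QAM-16": (26.0, 4.7)}

for name, (avg_dbm, papr_db) in signals.items():
    peak_dbm = peak_power_dbm(avg_dbm, papr_db)
    print(f"{name}: average {dbm_to_watts(avg_dbm) * 1e3:.0f} mW, "
          f"estimated peak {dbm_to_watts(peak_dbm):.2f} W")
```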

These power savings are achieved by reducing the PA power backoff. We reduced PA power supply voltage, and then trained the Cartesian LUT on the PA with the reduced power supply voltage. The net overall effect is to achieve the same output power, only on a PA that is being run with much less backoff.

The result is a modest performance improvement in EVM and ACPR, but a significant improvement in power dissipation.

# V. CONCLUSION

We have demonstrated a digital predistortion technique that provides energy-efficient and low-complexity adaptive linearization for wideband handset PAs. The proposed LUT training method greatly reduces the power consumption of the feedback path by allowing the use of low-speed low-SNR ADCs. In addition, the convergence problem and the need for precise time alignment for feedback loop delay are eliminated. The proposed predistortion method greatly simplifies digital signal processing. The use of a compact Cartesian LUT eliminates the need for Cartesian-to-polar conversion and modulator correction. For wideband handsets such as those for WiMAX/WCDMA/WLAN, the power savings from this simplified architecture are significant.

# ACKNOWLEDGMENT

The authors would like to thank M. H. Perrott, SiTime, Sunnyvale, CA, and A. P. Chandrakasan, Massachusetts Institute of Technology (MIT), Cambridge, for helpful discussions. The authors also wish to thank S. Dakshinamurthy and D. K. Shaeffer, both with Beceem Communications Inc., Santa Clara, CA, for valuable comments.

# REFERENCES

- [1] A. A. M. Saleh and J. Salz, "Adaptive linearization of power amplifiers in digital radio systems," *Bell Syst. Tech. J.*, vol. 62, no. 4, pp. 1019–1033, Apr. 1983.
- [2] Y. Nagata, "Linear amplification technique for digital mobile communications," in *Proc. IEEE Veh. Technol. Conf.*, May 1989, vol. 1, pp. 159–164.
- [3] C. Eun and E. J. Powers, "A new Volterra predistorter based on the indirect learning architecture," *IEEE Trans. Signal Process.*, vol. 45, no. 1, pp. 223–227, Jan. 1997.
- [4] J. K. Cavers, "Amplifier linearization using a digital predistorter with fast adaptation and low memory requirements," *IEEE Trans. Veh. Technol.*, vol. 39, no. 4, pp. 372–382, Nov. 1990.
- [5] Y. Y. Woo, J. Kim, J. Yi, S. Hong, I. Kim, J. Moon, and B. Kim, "Adaptive digital feedback predistortion technique for linearizing power amplifiers," *IEEE Trans. Microw. Theory Tech.*, vol. 55, no. 5, pp. 932–940, May 2007.
- [6] J. Kim, Y. Y. Woo, J. Moon, and B. Kim, "A new wideband adaptive digital predistortion technique employing feedback linearization," *IEEE Trans. Microw. Theory Tech.*, vol. 56, no. 2, pp. 385–392, Feb. 2008.
- [7] J. L. Dawson, "Feedback linearization of RF power amplifiers," Ph.D. dissertation, Dept. Elect. Eng., Stanford Univ., Stanford, CA, 2003.
- [8] S. Chung, J. W. Holloway, and J. L. Dawson, "Open-loop digital predistortion using Cartesian feedback for adaptive RF power amplifier linearization," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2007, pp. 1449–1452.
- [9] H. Ku, M. D. McKinley, and J. S. Kenney, "Quantifying memory effects in RF power amplifiers," *IEEE Trans. Microw. Theory Tech.*, vol. 50, no. 12, pp. 2843–2849, Dec. 2002.
- [10] J. L. Dawson and T. H. Lee, "Automatic phase alignment for a fully integrated Cartesian feedback power amplifier system," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2269–2279, Dec. 2003.
- [11] L. Sundström, M. Faulkner, and M. Johansson, "Quantization analysis and design of a digital predistortion linearizer for RF power amplifiers," *IEEE Trans. Veh. Technol.*, vol. 45, no. 4, pp. 707–719, Nov. 1996.
- [12] D. Huber, R. Chandler, and A. Abidi, "A 10b 160 MS/s 84 mW 1 V subranging ADC in 90 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, Feb. 2007, pp. 454–455.

- [13] T. Liu, S. Boumaiza, and F. M. Ghannouchi, "Augmented Hammerstein predistorter for linearization of broad-band wireless transmitters," *IEEE Trans. Microw. Theory Tech.*, vol. 54, no. 24, pp. 1340–1349, Jun. 2006.
- [14] D. Rönnow and M. Isaksson, "Digital predistortion of radio frequency power amplifier using Kautz-Volterra model," *Electron. Lett.*, pp. 780–782, Jun. 2006.
- [15] Z. He, J. Ge, S. Geng, and G. Wang, "An improved lookup table predistortion technique for HPA with memory effects in OFDM systems," *IEEE Trans. Broadcast.*, vol. 52, no. 1, pp. 87–91, Mar. 2006.
- [16] P. L. Gilabert, A. Cesari, G. Montoro, E. Bertran, and J. M. Dilhac, "Multilookup table FPGA implementation of an adaptive digital predistorter for linearizing RF power amplifiers with memory effects," *IEEE Trans. Microw. Theory Tech.*, vol. 56, no. 2, pp. 372–384, Feb. 2008.
- [17] M. Faulkner and M. Johansson, "Adaptive linearization using predistortion—Experimental results," *IEEE Trans. Veh. Technol.*, vol. 43, no. 2, pp. 323–332, May 1994.
- [18] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, R. Uchida, Y. Nakazawa, and T. Saito, "Per-bit sense amplifier scheme for 1 GHz SRAM macro in sub-100 nm CMOS technology," in *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, Feb. 2004, pp. 502–503.
- [19] B. R. Zeydel, V. G. Oklobdzija, S. Mathew, R. K. Krishnamurthy, and S. Borkar, "A 90 nm 1 GHz 22 mW 16 × 16-bit 2's complement multiplier for wireless baseband," in *IEEE VLSI Circuits Symp. Tech. Dig.*, Jun. 2003, pp. 235–236.
- [20] S. Kao, R. Zlatanovici, and B. Nikolic, "A 240 ps 64b carry-lookahead adder in 90 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Tech. Dig.*, Feb. 2006, pp. 1745–1746.
- [21] N. Ghittori, A. Vigna, P. Malcovati, S. D'Amico, and A. Baschirotto, "A 1.2-V, 600-MS/s, 2.4-mW DAC for WLAN 802.11 and 802.16 wireless transmitters," in *Proc. Eur. Solid-State Circuits Conf.*, Sep. 2006, pp. 404–407.
- [22] C.-S. Wu, A.-Y. Wu, and C.-H. Lin, "A high-performance/low-latency vector rotational CORDIC architecture based on extended elementary angle set and Trellis-based searching schemes," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 50, no. 9, pp. 589–601, Sep. 2003.
- [23] J. Tuthill and A. Cantoni, "Efficient compensation for frequency-dependent errors in analog reconstruction filters used in IQ modulators," *IEEE Trans. Commun.*, vol. 53, no. 3, pp. 489–496, Mar. 2005.
- [24] K. R. Boyle, Y. Yuan, and L. P. Ligthart, "Analysis of mobile phone antenna impedance variations with user proximity," *IEEE Trans. Antennas Propag.*, vol. 55, no. 2, pp. 364–372, Feb. 2007.



SungWon Chung (S'99) received the B.S. degree from Pusan National University, Pusan, Korea, in 2002, the M.S. degree from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2005, and is currently working toward the Ph.D. degree at the Massachusetts Institute of Technology (MIT), Cambridge.

From 1995 to 2000, he was with KITEL, Seoul, Korea, where he led the development of a fault-tolerant UNIX operating system. During Summer 2008, he was a Design Engineering Intern with Intersil, Milpitas, CA, where he was involved in high-speed communication transceiver chipset design.

Mr. Chung was the recipient of a USENIX Association student research grant, the Samsung Electronics Humantech Thesis Prize Award, and the IEEE Larson Outstanding Student Paper Award.



Jack W. Holloway received the S.B. degree in applied mathematics and S.B. degree in electrical engineering and M.Eng. degree in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, in 2003 and 2004, respectively, and is currently working toward the Ph.D. degree in integrated RF circuit design at MIT.

He is currently with the Microsystems Technology Laboratories, MIT. He is also a Second Lieutenant in the United States Marine Corps, currently undergoing training as a naval aviator in Corpus Christi, TX.



Joel L. Dawson (S'96–M'04) received the S.B. degree in electrical engineering and M.Eng. degree in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, in 1996 and 1997, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA. His doctoral research concerned PA linearization techniques.

He is currently an Assistant Professor with the Department of Electrical Engineering and Computer Science, MIT. Prior to joining the MIT faculty in 2004, he spent one year with Aspendos Communications, a startup company that he cofounded. He continues to be active in the industry as both a technical and legal consultant.

Prof. Dawson was the recipient of the 2008 National Science Foundation (NSF) CAREER Award.