### ELE617E Lectures

Prof. Dr. Müştak E. Yalçın

Istanbul Technical University

mustak.yalcin@itu.edu.tr

- Pipelining and parallel processing offer two main advantages: higher speed and lower power consumption.
- When the sample speed does not need to be increased, these techniques can instead be used to lower the power consumption.

# Pipelining for Low Power

### 2. CMOS inverter: Propagation delay

Inverter propagation delay: time delay between input and output signals; figure of merit of logic speed.

Typical propagation delays: < 100 ps.

A complex logic system has 10-50 propagation delays per clock cycle.

#### Estimation of t<sub>p</sub>: use square-wave at input



Average propagation delay:

$$t_{p}=\frac{1}{2}\big(t_{PHL}+t_{PLH}\big)$$
 http://web.mit.edu/6.012/www/SP07-L13.pdf

### CMOS inverter: Propagation delay high-to-low



During early phases of discharge, NMOS is saturated and PMOS is cut-off.

Time to discharge *half* of the charge stored in C<sub>L</sub>:

$$t_{PHL} \approx \frac{\frac{1}{2} \text{ charge on } C_L \text{ at } t = 0^-}{\text{NMOS discharge current}}$$

# Pipelining and Parallel Processing for Low Power

### **CMOS inverter:** Propagation delay high-to-low (contd.)

Charge in C<sub>L</sub> at t = 0<sup>-</sup>:

$$Q_L(t = 0^-) = C_L V_{DD}$$

Discharge current (NMOS in saturation):

$$I_{Dn} = \frac{W_n}{2L_n} \mu_n C_{ox} (V_{DD} - V_{Tn})^2$$

Then:

$$t_{\text{PHL}} \approx \frac{C_{\text{L}} V_{\text{DD}}}{\frac{W_{\text{n}}}{L_{\text{n}}} \mu_{\text{n}} C_{\text{ox}} (V_{\text{DD}} - V_{\text{Tn}})^2}$$

Graphical Interpretation





### CMOS inverter: Propagation delay low-to-high

During early phases of charging, PMOS is saturated and NMOS is cut-off.

Time to charge to *half* of the final charge on C<sub>L</sub>:

$$t_{PLH} \approx \frac{\frac{1}{2} \text{charge on } C_{L} @t = \infty}{\text{PMOS charge current}}$$

Prof. Dr. Müştak E. Yalçın (İTÜ)

ELE6xxE (V: 0.1)

# Pipelining and Parallel Processing for Low Power

### CMOS inverter: Propagation delay low-to-high (contd.)

Charge in  $C_L$  at t= $\infty$ :

$$Q_L(t = \infty) = C_L V_{DD}$$

Charge Current (PMOS in saturation):

$$-I_{Dp} = \frac{W_p}{2L_p} \mu_p C_{ox} (V_{DD} + V_{Tp})^2$$

Then:

$$\mathbf{t}_{PLH} \approx \frac{\mathbf{C}_{L} \mathbf{V}_{DD}}{\frac{\mathbf{W}_{p}}{\mathbf{L}_{p}} \, \boldsymbol{\mu}_{p} \mathbf{C}_{ox} \left( \mathbf{V}_{DD} + \mathbf{V}_{Tp} \right)^{2}}$$
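The two delay formulas above can be checked numerically. The following is a minimal sketch, assuming the square-law saturation model used on these slides; every device value (load capacitance, supply, thresholds, $\mu C_{ox}$, W/L ratios) is a hypothetical number chosen for illustration, not taken from the lecture.

```python
def saturation_current(w_over_l, mu_cox, overdrive):
    """Square-law saturation current: (W / 2L) * mu * Cox * overdrive^2."""
    return 0.5 * w_over_l * mu_cox * overdrive ** 2


def half_charge_delay(c_load, v_dd, current):
    """Time to move half of the charge C_L * V_DD at a constant current."""
    return 0.5 * c_load * v_dd / current


# Hypothetical device values (assumptions, for illustration only).
C_L = 10e-15      # load capacitance in farads
V_DD = 2.5        # supply voltage in volts
V_TN = 0.5        # NMOS threshold in volts
V_TP = -0.5       # PMOS threshold in volts (negative, as in V_DD + V_Tp)
UN_COX = 100e-6   # mu_n * C_ox in A/V^2
UP_COX = 40e-6    # mu_p * C_ox in A/V^2
WL_N = 2.0        # W_n / L_n
WL_P = 4.0        # W_p / L_p

i_n = saturation_current(WL_N, UN_COX, V_DD - V_TN)
i_p = saturation_current(WL_P, UP_COX, V_DD + V_TP)

t_phl = half_charge_delay(C_L, V_DD, i_n)
t_plh = half_charge_delay(C_L, V_DD, i_p)
t_p = 0.5 * (t_phl + t_plh)  # average propagation delay

print(f"t_PHL = {t_phl * 1e12:.2f} ps")
print(f"t_PLH = {t_plh * 1e12:.2f} ps")
print(f"t_p   = {t_p * 1e12:.2f} ps")
```

With these assumed values the PMOS path is the slower one, because the lower hole mobility is only partly compensated by the wider transistor.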

#### Key dependencies of propagation delay:

- $V_{DD} \uparrow \Rightarrow t_p \downarrow$
  - Reason: $V_{DD} \uparrow \Rightarrow Q(C_L) \uparrow$, but $I_D$ grows as the square of the overdrive voltage, so the current increases faster than the charge.
  - Trade-off: $V_{DD} \uparrow \Rightarrow$ more power consumed.
- $L \downarrow \Rightarrow t_p \downarrow$
  - Reason: $L \downarrow \Rightarrow I_D \uparrow$
  - Trade-off: manufacturing cost!
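These dependencies can be illustrated with a short sweep. This is a sketch under the simplified model $t = C_L V_{DD} / (k (V_{DD} - V_t)^2)$, where $k$ stands for $(W/L)\,\mu C_{ox}$; the numbers are hypothetical, and the model ignores the fact that changing $L$ also changes the gate capacitance.

```python
def propagation_delay(c_load, v_dd, v_t, k):
    """Simplified delay model: t = C_L * V_DD / (k * (V_DD - V_t)^2)."""
    return c_load * v_dd / (k * (v_dd - v_t) ** 2)


def switching_energy(c_load, v_dd):
    """Energy drawn from the supply per charging event: C_L * V_DD^2."""
    return c_load * v_dd ** 2


# Hypothetical values (assumptions, for illustration only).
C_L = 10e-15   # farads
V_T = 0.5      # volts
K = 200e-6     # (W/L) * mu * C_ox in A/V^2

supplies = [1.0, 1.5, 2.5, 3.3, 5.0]
delays = [propagation_delay(C_L, v, V_T, K) for v in supplies]
energies = [switching_energy(C_L, v) for v in supplies]

for v, t, e in zip(supplies, delays, energies):
    print(f"V_DD = {v:.1f} V  delay = {t * 1e12:7.2f} ps  energy = {e * 1e15:7.2f} fJ")

# Halving the channel length doubles k in this model, which halves the delay.
delay_long = propagation_delay(C_L, 2.5, V_T, K)
delay_short = propagation_delay(C_L, 2.5, V_T, 2 * K)
print(f"delay ratio after halving L: {delay_short / delay_long:.2f}")
```

The sweep shows the trade-off directly: the delay falls as the supply rises, while the energy per switching event rises quadratically.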

### Power Dissipation

- Energy drawn from the power supply to charge the capacitor:

$$E_{charge} = \int V_{DD}\, i(t)\, dt = V_{DD} Q = C_L V_{DD}^2$$

- Energy stored in the capacitor:

$$E_{store} = \frac{1}{2} C_L V_{DD}^2$$

- Energy lost in the p-channel MOSFET during charging:

$$E_{diss} = E_{charge} - E_{store} = \frac{1}{2} C_L V_{DD}^2$$

- During discharge, the n-channel MOSFET dissipates an identical amount of energy.
- If the charge/discharge cycle is repeated f times per second, where f is the clock frequency, the dynamic power dissipation is:

$$P = 2E_{diss} \cdot f = C_L V_{DD}^2 f$$

In practice many gates do not change state every clock cycle which lowers the power dissipation.
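The energy bookkeeping and the effect of gates that do not switch every cycle can be sketched as follows. The `activity` parameter (the fraction of clock cycles in which the node actually completes a charge/discharge cycle) is an assumption introduced here to model the remark above; the slides do not define it. All numeric values are hypothetical.

```python
def charging_energies(c_load, v_dd):
    """Return (energy from supply, energy stored, energy dissipated in PMOS)."""
    e_charge = c_load * v_dd ** 2
    e_store = 0.5 * c_load * v_dd ** 2
    e_diss = e_charge - e_store
    return e_charge, e_store, e_diss


def dynamic_power(c_load, v_dd, f_clk, activity=1.0):
    """Dynamic power C_L * V_DD^2 * f, scaled by an assumed activity factor."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must lie in [0, 1]")
    return activity * c_load * v_dd ** 2 * f_clk


# Hypothetical values (assumptions, for illustration only).
C_L = 10e-15   # farads
V_DD = 2.5     # volts
F_CLK = 100e6  # hertz

e_charge, e_store, e_diss = charging_energies(C_L, V_DD)
p_full = dynamic_power(C_L, V_DD, F_CLK)                  # switches every cycle
p_sparse = dynamic_power(C_L, V_DD, F_CLK, activity=0.2)  # switches in 20% of cycles

print(f"E_charge = {e_charge * 1e15:.2f} fJ, E_store = {e_store * 1e15:.2f} fJ")
print(f"P (every cycle)   = {p_full * 1e6:.3f} uW")
print(f"P (20% of cycles) = {p_sparse * 1e6:.3f} uW")
```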


Jan, 2020

## Notation

• The propagation delay and power of the original filter are

$$t = \frac{C_L V_o}{k(V_o - V_t)^2}$$

and

$$P = C_{\text{total}} V_o^2 f = C_L V_o^2 \frac{1}{T_{clk}}$$

PS: $t_{PHL}$ (the smaller of the two delays) is taken as the propagation delay.

- $C_L$: the capacitance charged and discharged in a single clock cycle
- $C_{\rm total}$: the total capacitance of the circuit
- $V_o$: supply voltage
- $V_t$: threshold voltage
- $k$: a technology-dependent constant
- $f$: clock frequency, with $f = \frac{1}{T_{\rm clk}}$, where $T_{\rm clk}$ is the clock period

• We will consider an *M*-level pipelined system and an *L*-parallel system. Their propagation delays and power consumptions are $t_{pip}$, $P_{pip}$ and $t_{par}$, $P_{par}$, respectively.

- Consider an *M*-level pipelined system in which the critical path is reduced to $\frac{1}{M}$ of its original length. The capacitance charged/discharged in a single clock cycle is then reduced from $C_L$ to $\frac{C_L}{M}$.
- In the same time that was previously needed to charge/discharge $C_L$, only a fraction of it now has to be charged/discharged.
- The supply voltage can therefore be reduced to $\beta V_o$, where $0 < \beta < 1$.
- The power consumption of the pipelined filter is then

$$P_{pip} = C_{total} \beta^2 V_o^2 f = \beta^2 P$$

• How can the value of $\beta$ be determined? By examining the propagation delay.

# Pipelining for Low Power

• The propagation delay of the original filter is

$$t = \frac{C_L V_o}{k(V_o - V_t)^2}$$

• While the propagation delay of the pipeline filter is

$$t_{pip} = \frac{\frac{C_L}{M}\beta V_o}{k(\beta V_o - V_t)^2}$$

• Both filters run at the same clock speed, so $t_{pip} = t$, which gives

$$M(\beta V_o - V_t)^2 = \beta (V_o - V_t)^2$$

• Once $\beta$ is obtained, the reduction in power consumption can be computed using

$$P_{pip} = C_{\text{total}} \beta^2 V_o^2 f = \beta^2 P$$

Study: Example 3.4.1
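The condition $M(\beta V_o - V_t)^2 = \beta (V_o - V_t)^2$ is a quadratic in $\beta$, so it can be solved in closed form. The sketch below expands it to $a\beta^2 + b\beta + c = 0$ and keeps the larger root, which is the only one with $\beta V_o > V_t$ (the product of the two roots is $(V_t/V_o)^2$). The function name and the numeric inputs ($M = 2$, $V_o = 5$ V, $V_t = 0.6$ V) are assumptions chosen for illustration.

```python
import math


def supply_scaling_factor(n, v_o, v_t):
    """Solve n * (beta*v_o - v_t)**2 == beta * (v_o - v_t)**2 for the feasible beta.

    n is the factor by which the charging capacitance is reduced (M for an
    M-level pipeline). The larger root is returned because it is the only
    one for which the scaled supply beta * v_o stays above the threshold v_t.
    """
    if n < 1 or not 0 <= v_t < v_o:
        raise ValueError("require n >= 1 and 0 <= v_t < v_o")
    d = (v_o - v_t) ** 2
    a = n * v_o ** 2
    b = -(2 * n * v_o * v_t + d)
    c = n * v_t ** 2
    discriminant = b * b - 4 * a * c
    return (-b + math.sqrt(discriminant)) / (2 * a)


# Assumed illustrative values.
M = 2
V_O = 5.0   # original supply voltage in volts
V_T = 0.6   # threshold voltage in volts

beta = supply_scaling_factor(M, V_O, V_T)
power_ratio = beta ** 2  # P_pip / P

print(f"beta             = {beta:.4f}")
print(f"scaled supply    = {beta * V_O:.3f} V")
print(f"power ratio      = {power_ratio:.3f}")
```

For these inputs the supply can drop to roughly 60% of its original value, which brings the power down to roughly 36% of the original.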


# Pipelining for Low Power



Handwritten solution notes for Example 3.4.1 (only the legible entries are reproduced):

- Capacitances: adder $C_A$, multiplier $C_M = 5C_A$; in the pipelined filter the multiplier is split into sections of $3C_A$ and $2C_A$.
- Original filter: $C_L = C_M + C_A = 6C_A$
- Pipelined filter: $C_L = 3C_A$
- With $V_o = 5\,\mathrm{V}$ and $V_t = 0.6\,\mathrm{V}$:

$$M(\beta V_o - V_t)^2 = \beta (V_o - V_t)^2 \;\Rightarrow\; \beta \approx 0.6 \ \text{ or } \ \beta \approx 0.02$$

- Supply voltage of the pipelined filter: $\beta V_o \approx 3.01\,\mathrm{V}$
- Power relative to the original filter: $\beta^2 \approx 36.4\,\%$


## Parallel Processing for Low Power

- In an *L*-parallel system, the charging capacitance does not change, but the total capacitance increases by a factor of *L*.
- To maintain the same data rate, the clock period is increased to $LT_{clk}$.
- There is then more time to charge the same capacitance.
- Therefore, the supply voltage can be reduced to $\beta V_o$.
- The propagation delay of the original filter is

$$t = \frac{C_L V_o}{k(V_o - V_t)^2}$$

• The propagation delay of the parallel filter (for one of the *L* parallel copies) is

$$t_{par} = \frac{C_L \beta V_o}{k(\beta V_o - V_t)^2}$$


• The same data rate is maintained for both filters, so the parallel filter's clock period can be $L$ times longer ($t_{par} = Lt$), which gives

$$L(\beta V_o - V_t)^2 = \beta (V_o - V_t)^2$$

• Once $\beta$ is obtained, the reduction in power consumption can be computed using

$$P_{par} = LC_L \beta^2 V_o^2 \frac{f}{L} = \beta^2 C_L V_o^2 f = \beta^2 P$$

PS: $C_{\mathrm{total}} = LC_L$

Please read Example 3.4.2 in the textbook.
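To solve the condition $L(\beta V_o - V_t)^2 = \beta (V_o - V_t)^2$ for $\beta$ by hand, it can help to expand it into a standard quadratic. The following is a sketch of that algebra; the shorthand $b$ is introduced here for convenience and is not part of the notation above.

$$
\begin{aligned}
L(\beta V_o - V_t)^2 &= \beta (V_o - V_t)^2 \\
L V_o^2 \beta^2 - \left[ 2 L V_o V_t + (V_o - V_t)^2 \right] \beta + L V_t^2 &= 0 \\
\beta &= \frac{b \pm \sqrt{b^2 - 4 L^2 V_o^2 V_t^2}}{2 L V_o^2}, \qquad b = 2 L V_o V_t + (V_o - V_t)^2
\end{aligned}
$$

The product of the two roots is $(V_t / V_o)^2$, so exactly one root satisfies $\beta V_o > V_t$. That is the larger root, and it is the only one for which the scaled supply still exceeds the threshold voltage.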

## Parallel Processing for Low Power



$$
\begin{aligned}
y(2k) &= h_0 x(2k) + h_1 x(2k-1) + h_2 x(2k-2) \\
y(2k+1) &= h_0 x(2k+1) + h_1 x(2k) + h_2 x(2k-1)
\end{aligned}
$$
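A 2-parallel filter computes one even-indexed and one odd-indexed output per clock cycle. The sketch below assumes a 3-tap FIR filter with coefficients $h_0, h_1, h_2$ and zero initial conditions, and checks that the 2-parallel form produces the same output sequence as the sample-by-sample form. The function names and test data are hypothetical.

```python
def fir_direct(h, x):
    """Sequential FIR: y(n) = sum_i h[i] * x(n - i), with x(n) = 0 for n < 0."""
    return [
        sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
        for n in range(len(x))
    ]


def fir_two_parallel(h, x):
    """3-tap FIR in 2-parallel form: each iteration k yields y(2k) and y(2k+1)."""
    if len(h) != 3 or len(x) % 2 != 0:
        raise ValueError("expects 3 coefficients and an even number of samples")
    h0, h1, h2 = h

    def sample(n):
        # Samples before the start of the sequence are treated as zero.
        return x[n] if n >= 0 else 0

    y = []
    for k in range(len(x) // 2):
        y_even = h0 * sample(2 * k) + h1 * sample(2 * k - 1) + h2 * sample(2 * k - 2)
        y_odd = h0 * sample(2 * k + 1) + h1 * sample(2 * k) + h2 * sample(2 * k - 1)
        y.extend([y_even, y_odd])
    return y


# Hypothetical coefficients and input samples.
h = [1, 2, 3]
x = [1, 0, 2, -1, 3, 4]

y_seq = fir_direct(h, x)
y_par = fir_two_parallel(h, x)
print(y_seq)
print(y_par)
```

Each loop iteration corresponds to one clock cycle of the parallel hardware, so the clock can run at half the sample rate while the two output expressions are evaluated side by side.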





## Combining Pipelining and Parallel Processing for Low Power



- Pipelining reduces the capacitance to be charged/discharged in one clock period.
- Parallel processing increases the clock period available for charging/discharging the original capacitance.
- The propagation delay of the original filter is

$$t = \frac{C_L V_o}{k(V_o - V_t)^2}$$

• The propagation delay of the parallel-pipelined filter, written here as $t_{pp}$, is

$$t_{pp} = \frac{\frac{C_L}{M}\beta V_o}{k(\beta V_o - V_t)^2}$$

• The same data rate is maintained for both filters, so $t_{pp} = Lt$:

$$Lt = \frac{L C_L V_o}{k(V_o - V_t)^2} = \frac{\frac{C_L}{M}\beta V_o}{k(\beta V_o - V_t)^2}$$

• Therefore, the following equation holds

$$ML(\beta V_o - V_t)^2 = \beta (V_o - V_t)^2$$
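The combined condition depends only on the product $ML$, so pipelining and parallelism contribute to voltage scaling in the same way. The sketch below finds $\beta$ by bisection on the interval $(V_t/V_o,\, 1]$ instead of using the quadratic formula, and compares a few $(M, L)$ pairs. The function name, the iteration count, and the values $V_o = 5$ V and $V_t = 0.6$ V are assumptions for illustration.

```python
def beta_by_bisection(m, l, v_o, v_t, iterations=60):
    """Find beta in (v_t/v_o, 1] with m*l*(beta*v_o - v_t)**2 == beta*(v_o - v_t)**2."""
    if m < 1 or l < 1 or not 0 <= v_t < v_o:
        raise ValueError("require m >= 1, l >= 1 and 0 <= v_t < v_o")

    def residual(beta):
        return m * l * (beta * v_o - v_t) ** 2 - beta * (v_o - v_t) ** 2

    # residual is <= 0 at the lower end and >= 0 at beta = 1.
    lo, hi = v_t / v_o, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed illustrative values.
V_O = 5.0
V_T = 0.6

results = {}
for m, l in [(1, 1), (2, 1), (1, 2), (2, 2)]:
    results[(m, l)] = beta_by_bisection(m, l, V_O, V_T)

for (m, l), beta in results.items():
    print(f"M = {m}, L = {l}: beta = {beta:.4f}, power ratio = {beta ** 2:.3f}")
```

In this sketch a 2-level pipeline and a 2-parallel system give the same $\beta$, and using both together lowers the supply further.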