# On Optimal Ordering of Signals in Parallel Wire Bundles

Shmuel Wimer <sup>1,2</sup>, Konstantin Moiseev<sup>2</sup> and Avinoam Kolodny<sup>2</sup>

1. Intel Corporation, Israel Development Center, Haifa, Israel

2. Electrical Engineering Department, Technion, Haifa, Israel

Abstract — Optimal ordering and sizing of wires in a constrained-width interconnect bundle are studied in this paper. It is shown that among all possible orderings of signal wires, a monotonic order of the signals according to their effective driver resistance yields the smallest weighted average delay. Minimizing weighted average delay is a good approximation for MinMax delay optimization. Three variants of monotonic ordering are proven to be optimal, depending on the MCF ratio between the signals at the sides of the bundle and that of the internal wires. The monotonic order property holds for a very broad range of VLSI circuit settings arising in common design practice. A simple, yet nearoptimal, setting of wire widths within the bundle to yield the best average weighted delay is proposed. The theoretical results have been validated by numerical experiments on 65 nanometer process technology and industrial design data. In all cases the ordering optimization yielded improvement in the range of 10% in wire delay, translated to about 5% improvement in the clock cycle of a high-performance microprocessor implemented in that technology.

Index Terms— routing, wire ordering, wire spacing

## I. INTRODUCTION

**C**ROSS-capacitances between wires in interconnect structures have a major effect on circuit timing. The importance of this effect grows with technology scaling [1], [2]. In this paper, delays in a bundle of parallel wires with different drivers and loads are minimized by choosing an optimal ordering of the nets. A model for the bundle of wires is shown in Fig. 1(a). It represents a common CMOS layout configuration, where interconnect wires run in parallel between two power supply or shielding rails, such that the total width of the structure *A* is a fixed constraint. An abstraction of actual layout is made by assuming that all drivers and all receivers are located at the ends of the structure of length *L*. Real layouts can be decomposed into several such structures using effective drivers and receivers, since long segments of parallel wires are very common in industrial practice, mostly when high metal layers are concerned. The wire delays in the model of Fig. 1(a) are typically dominated by cross-capacitances between adjacent wires, since the ratio between wire thickness and wire width tends to grow with non uniform technology scaling [28]. Therefore, delays can be optimized by allocation of inter-wire spaces. In addition, wire widths can be set to optimize wire resistances. Furthermore, reordering of the wires can improve the timing, because critical wires can be put next to each other and share the largest spaces, which have the smallest cross-capacitances.

Reordering of the bundle wires is a new degree of freedom in timing optimization, which has not been explored in the past. The main result of this paper is that the signal ordering is highly beneficial and can

typically be solved independently of the wire sizing. Moreover, the optimal order can be derived directly from the parameter setting of the given problem, by positioning the wires according to the effective resistances of their drivers.

Wire order within the bundle yielding minimal delays must be monotonic in the strength of the driver. The type of the monotonic order depends on Miller coupling factors (MCF) occurring at the side signals of the bundle. Only three types of monotonic order can yield the minimal delay, regardless of the specific driver strengths. These are illustrated in Figs. 1(b, c, d). Fig. 1(b) illustrates the case of uniform MCF (e.g. when the sidewalls of the bundle are not power supplies but rather arbitrary logical signals); the corresponding optimal order is called "symmetric hill", where the signals with the weakest drivers are located at the center of the bundle, and their corresponding spaces are the largest. Fig. 1(c) illustrates the case where MCF at the sidewalls is half of the MCF between internal wires in the bundle (e.g. when sidewalls are connected to



Fig. 1. (a) Signal drivers (modeled as voltage sources with series resistances), interconnect bundle wires of length L, and receivers (modeled as load capacitances). Timing optimization is performed by reordering the signal wires and by allocating wire widths and spaces, for a given constrained channel width A. (b, c, d) present the optimal order of signals and the corresponding wire-to-wire space allocation for various ratios of MCF between extreme and internal signals.

3

power supplies such that their MCF is 1, while internal MCF of 2 is assumed). The corresponding optimal order is ascending, where the weakest driver resides on one side of the bundle and the strongest driver resides at the other side. Fig. 1(d) corresponds to a case where the MCF at the sidewall is assumed to be zero (e.g. when active shielding [29] is employed at the sidewalls), corresponding to a "symmetric valley".

Net-ordering for delay optimization has not been addressed in previous works. For a fixed order of wires, the problem of allocating widths and spaces to maximize performance in tuning of bus structures was proposed in [3] and solved in [25]. The wire sizing problem has been addressed in [4] and [5] for a single net. Sizing and spacing multiple nets with consideration of coupling capacitance has been addressed in [6] for general interconnect layouts by converting cross capacitance to effective fringe capacitance. Coupling capacitance has been addressed explicitly in the context of physical design for minimizing crosstalk noise, [7],[8] or dynamic power [9]. Some authors treated wire sizing for throughput optimization in buses using uniform wire widths and spaces [21], [26], [27]. Several variants of net-reordering have been applied for improving layout efficiency [13], and for noise reduction [8], [14] [15][16] [17]. Swapping of wires for power reduction was applied in [18]. Vittal et al. [14] have suggested without proof to reduce crosstalk noise by sorting wires in order of driver strength, which is closely related to our result in delay minimization.

#### II. PROBLEM FORMULATION

Consider a bundle of *n* signal nets  $\sigma_0, ..., \sigma_{n-1}$  between two side-walls (wires at fixed locations) as shown in Fig. 1.  $S_i$  and  $S_{i+1}$ , respectively, denote spaces to neighbors of wire  $W_i$ . The length of each wire is *L*. Each wire is driven by a driver with output resistance  $R_i$  and loaded by receiver with capacitance  $C_i$ . The sum of wire widths and spaces between the side walls is given in the following constraint (2.1), which represents the total width *A* of the available area for laying out the bundle of wires.

$$g\left(\overline{W},\overline{S}\right) = \sum_{j=0}^{n-1} W_i + \sum_{j=0}^n S_i = A$$
(2.1)

Signal delays are expressed by an Elmore model using simple approximations for wire capacitances and wire resistance. The delay of signal  $\sigma_i$  is given in [25] by

$$\Delta_i\left(\overline{W},\overline{S}\right) = a + R_i\left(kW_i + g + C_i\right) + \frac{b + eC_i}{W_i} + \left(hR_i + \frac{d}{W_i}\right)\left(\frac{1}{S_i} + \frac{1}{S_{i+1}}\right)$$
(2.2)

The coefficients a, b, d, e, k, g, h are technology dependent parameters. This model includes effects of wire resistance (inversely proportional to wire width  $W_i$ ) and effects of wire capacitance terms (area capacitance is proportional to  $W_i$ , cross-capacitances to neighboring wires are inversely proportional to spaces  $S_i$  and  $S_{i+1}$ ). Note that only nearest-neighbor wires are included, because the adjacent upper and lower metal layers are assumed to be dense, and serve as effective shields for capacitive coupling to other wires in the bundle. Although the Elmore model is a first-order approximation and it does not account for input waveform slope [31], it is widely applicable in interconnect optimization due to its high-fidelity property [<K.D. Boese, A.B. Kahng, B.A. McCoy and Robins TCAD 14 1995>]. The absolute accuracy of the model can also be improved, by using parameter fitting as described in [30]. The model is used in this work because of its simplicity in mathematical analysis, while the delay improvements are verified by SPICE simulations. Miller coupling factor (MCF) can be included in the last term to account for crosstalk effect on

delay uncertainty [10]. Typically, MCF=2 is assumed when neighbor wires switch in the opposite direction causing increased delays, while MCF=0 is assumed for same-direction switching and reduced delay. We assume first that MCF=1 for all of the signals, yielding nominal delay values.

Let  $\pi \in \Pi$  denote an ordering (permutation) of the signals in the interconnect bundle, taken from the set of all *n*! possible orders. We are seeking an ordering  $\pi^*$  of the bundle signals, which after wire width and space allocation yields the minimum objective function representing some delay characteristic. In practice, the useful delay objective function is the maximal worst slack among all signals and the optimization attempts to minimize this maximum:

$$f_1(\pi, \overline{W}, \overline{S}) = \max_{1 \le i \le n} \left\{ \Delta_i(\overline{W}, \overline{S}) - T_i \right\},$$
(2.3)

where  $T_i$  is required arrival time of the *i*-th signal. Note that we exchanged the terms of the slack for the sake of mathematical convenience. This design scenario calls for *MinMax* optimization problems. Such problems are hard to solve analytically since they are not differentiable. Maximization of average wire slack, which is equivalent to minimization of average delay or minimization of the total sum of delays, is given in (2.4):

$$f_{2}\left(\pi, \overline{W}, \overline{S}\right) = \sum_{i=0}^{n-1} \Delta_{i}\left(\pi, \overline{W}, \overline{S}\right) = \sum_{i=0}^{n-1} \left\{ a + R_{i}\left(kW_{i} + g + C_{i}\right) + \frac{b + eC_{i}}{W_{i}} + \left(hR_{i} + \frac{d}{W_{i}}\right) \left(\frac{1}{S_{i}} + \frac{1}{S_{i+1}}\right) \right\}$$
(2.4)

This objective function is mathematically convenient because it is differentiable, and it is also a useful performance metric in industrial practice [25].

Minimization of maximal slack (2.3) can be approached by introducing weights in (2.4). We define the following objective:

$$f_{3}\left(\pi, \overline{W}, \overline{S}\right) = \sum_{i=0}^{n-1} \alpha_{i} \Delta_{i}\left(\pi, \overline{W}, \overline{S}\right) = \sum_{i=0}^{n-1} \alpha_{i} \left\{ a + R_{i}\left(kW_{i} + g + C_{i}\right) + \frac{b + eC_{i}}{W_{i}} + \left(hR_{i} + \frac{d}{W_{i}}\right) \left(\frac{1}{S_{i}} + \frac{1}{S_{i+1}}\right) \right\},\tag{2.5}$$

where  $\alpha_i$  is a normalized estimation of signal criticality. The least critical signal (with maximal  $T_i$ ) will have  $\alpha = 1$ , all the others will have values larger than 1. In other words, (2.5) represents the *total sum of* wire delays weighted by signal criticality. This objective incorporates good properties of (2.3) and (2.4): on the one hand, (2.5) is a differentiable function; on the other hand, weighting delays by signal criticality makes it similar to (2.3), which is more useful in practice than (2.4).

Assume for the moment that the order  $\pi$  of the signals in the bundle is given. For minimizing (2.5) subject to (2.1) we differentiate  $f_3$  and g by all of their sizing variables:

$$\begin{cases} \frac{\partial f_{3}}{\partial W_{i}} = \alpha_{i} \left( kR_{i} - \frac{eC_{i} + b}{W_{i}^{2}} - \frac{d}{W_{i}^{2}} \left( \frac{1}{S_{i}} + \frac{1}{S_{i+1}} \right) \right), 0 \le i \le n-1; \\ \frac{\partial f_{3}}{\partial S_{i}} = -\frac{1}{S_{i}^{2}} \left[ h(\alpha_{i-1}R_{i-1} + \alpha_{i}R_{i}) + d\left( \frac{\alpha_{i-1}}{W_{i-1}} + \frac{\alpha_{i}}{W_{i}} \right) \right], 0 < i \le n-1; \\ \frac{\partial f_{3}}{\partial S_{0}} = -\frac{\alpha_{0}}{S_{0}^{2}} \left[ hR_{0} + \frac{d}{W_{0}} \right]; \\ \frac{\partial f_{3}}{\partial S_{n}} = -\frac{\alpha_{n-1}}{S_{n}^{2}} \left[ hR_{n-1} + \frac{d}{W_{n-1}} \right]; \\ \frac{\partial g}{\partial W_{i}} = 1, 0 \le i \le n-1; \\ \frac{\partial g}{\partial S_{i}} = 1, 0 \le i \le n \end{cases}$$

$$(2.6)$$

At the minimum there exists some real number  $\lambda$  (Lagrange multiplier), satisfying  $\nabla f = \lambda \nabla g$ . Rearranging and substituting yields the following:

$$\begin{split} \lambda &= \alpha_{i} \left( kR_{i} - \frac{eC_{i} + b}{W_{i}^{2}} - \frac{d}{W_{i}^{2}} \left( \frac{1}{S_{i}} + \frac{1}{S_{i+1}} \right) \right), 0 \leq i \leq n-1; \\ \lambda &= -\frac{1}{S_{i}^{2}} \left[ h(\alpha_{i-1}R_{i-1} + \alpha_{i}R_{i}) + d\left( \frac{\alpha_{i-1}}{W_{i-1}} + \frac{\alpha_{i}}{W_{i}} \right) \right], 0 < i \leq n-1; \\ \lambda &= -\frac{\alpha_{0}}{S_{0}^{2}} \left[ hR_{0} + \frac{d}{W_{0}} \right]; \\ \lambda &= -\frac{\alpha_{n}}{S_{n}^{2}} \left[ hR_{n-1} + \frac{d}{W_{n-1}} \right]; \end{split}$$

The above equations and the area constraint (2.1) impose 2n+2 algebraic equations in 2n+2 unknown variables  $\lambda, W_0 \cdots W_{n-1}, S_0 \cdots S_n$ . Solving and substituting into (2.5) produces minimal weighted total sum of signal delays for the assumed order  $\pi$ .

The order of wires affects the sum of delays primarily because every driver pair of adjacent signal is associated with a shared cross-capacitance between the wires. It makes sense to allocate large spaces to a wire driven by a weak driver or a wire with high criticality, in order to reduce the driver's load. Strong drivers and non-critical nets can cope with large cross-capacitances resulting in narrow spaces. Consequently, in order to best utilize the total area given for the wire bundle, weak drivers or highly critical nets should share the same large space. Similarly, strong drivers or non-critical nets can share a small interwire space. The space sharing idea is illustrated in Fig. 2. There, the bundle is comprised of some signals with weak drivers (W) and some with strong drivers (S). For equal criticality, the ordering in Fig. 2(b) is superior to Fig. 2(a), which is apparently the worst. Wire sizing and spacing optimization aiming at minimizing the total sum of delays will yield smaller (better) delays for configuration 2(b), in comparison with 2(a).

Consider now  $\pi \in \Pi$  as variable, and find the order for which optimal wire sizing and spacing yields minimum total weighted sum of delays, as discussed in the previous section. One needs therefore to solve the following problem:

$$\begin{cases} \min_{\pi \in \Pi} f_3(\pi, W, S); \\ \sum_{j=0}^{n-1} W_i + \sum_{j=0}^n S_i = A \end{cases}$$

In this formulation, both signal ordering and wire sizing are considered simultaneously.

#### III. OPTIMALITY OF SYMMETRIC HILL ORDER

## 3.1 Wires of uniform width

For the sake of clarity, assume first that all the wires have the same width *W* while spaces can vary among wires. Hence, wire sizing means finding the optimal *W* and allocating optimal spaces between wires. For any order  $\pi \in \Pi$ , minimizing the total sum of delays involves only n + 2 variables  $(W, S_0, \dots, S_n)$ . The following conditions are necessary for optimum:

$$\frac{\partial f}{\partial S_i} + \lambda \frac{\partial g}{\partial S_i} = 0, \ 0 \le i \le n$$
(3.1)

$$\frac{\partial f}{\partial S_i} = -\frac{d\left(\alpha_{i-1} + \alpha_i\right)}{WS_i^2} - \frac{h(\alpha_{i-1}R_{i-1} + \alpha_iR_i)}{S_i^2} = 0, \ 0 < i < n$$
(3.2)

$$\begin{cases} \frac{\partial f}{\partial S_0} = -\frac{d\alpha_0}{WS_0^2} - \frac{h\alpha_0 R_0}{S_0^2};\\ \frac{\partial f}{\partial S_n} = -\frac{d\alpha_{n-1}}{WS_n^2} - \frac{h\alpha_{n-1} R_{n-1}}{S_n^2};\\ \frac{\partial g}{\partial S_i} = 1, \ 0 \le i \le n \end{cases}$$
(3.3)

Substitution of (3.3) and (3.2) into (3.1) yields



Fig. 2 Space sharing in two interconnects channel configurations. a) Interleaved placement of strong and weak drivers, b) Sorted placement of signals according to driver strength.

$$\begin{cases} \lambda = \frac{1}{S_i^2} \left( \frac{d(\alpha_{i-1} + \alpha_i)}{W} + h(\alpha_{i-1}R_{i-1} + \alpha_iR_i) \right), & 0 < i < n; \\ \lambda = \frac{1}{S_0^2} \left( \frac{d\alpha_0}{W} + h\alpha_0R_0 \right); \\ \lambda = \frac{1}{S_n^2} \left( \frac{d\alpha_{n-1}}{W} + h\alpha_{n-1}R_{n-1} \right) \end{cases}$$
(3.4)

From (3.4) we obtain the following expressions for spaces at the optimum:

$$\begin{cases} S_{i} = \sqrt{\frac{1}{\lambda} \left( \alpha_{i} \left( \frac{d}{W} + hR_{i} \right) + \alpha_{i-1} \left( \frac{d}{W} + hR_{i-1} \right) \right)}, & 0 < i < n; \\ S_{0} = \sqrt{\frac{1}{\lambda} \alpha_{0} \left( \frac{d}{W} + hR_{0} \right)}; \\ S_{n} = \sqrt{\frac{1}{\lambda} \alpha_{n-1} \left( \frac{d}{W} + hR_{n-1} \right)} \end{cases}$$

$$(3.5)$$

These optimal spaces depend on resistances of the signal drivers, but are independent of capacitive loads. Substitution of (3.5) into (2.1) yields the following expression for  $\lambda$ 

$$\lambda = \frac{1}{A - nW} \left( \sum_{i=1}^{n-1} \sqrt{\alpha_i \left(\frac{d}{W} + hR_i\right) + \alpha_{i-1} \left(\frac{d}{W} + hR_{i-1}\right)} + \sqrt{\alpha_0 \left(\frac{d}{W} + hR_0\right)} + \sqrt{\alpha_0 \left(\frac{d}{W} + hR_0\right)} + \sqrt{\alpha_0 \left(\frac{d}{W} + hR_0\right)} \right)$$
(3.6)

Further substitution into (2.5) produces the following expression for the minimal weighted total sum of delays.

$$f_3 = f^I + f^{II} \tag{3.7}$$

where

$$f^{I} = n \left( a + \frac{b}{W} \right) \sum_{i=0}^{n-1} \alpha_{i} + \left( kW + g \right) \sum_{i=0}^{n-1} R_{i} \alpha_{i} + \frac{e}{W} \sum_{i=0}^{n-1} C_{i} \alpha_{i} + \sum_{i=0}^{n-1} R_{i} C_{i} \alpha_{i}$$

and

$$f^{II} = \frac{1}{A - nW} \left[ \sum_{i=1}^{n-2} \sqrt{\alpha_i \left(\frac{d}{W} + hR_i\right) + \alpha_{i-1} \left(\frac{d}{W} + hR_{i-1}\right)} + \sqrt{\alpha_0 \left(\frac{d}{W} + hR_0\right)} + \sqrt{\alpha_{n-1} \left(\frac{d}{W} + hR_{n-1}\right)} \right]^2 \right]^2$$

The first term  $f^{I}$  is invariant for different orders of signals. In the second term  $f^{II}$ , the indices of adjacent signals interact with each other in square root terms, thus making  $f^{II}$  dependent on the order of signals in the bundle. The physical reason for this is that cross capacitance between adjacent wires is determined by the space they share with each other. The question of what is the order  $\pi \in \Pi$  that minimizes  $f^{II}$ , is therefore important. As proven below, symmetric hill ordering, which captures the above reasoning, yields the minimum of average weighted wire delay.

Definition 3.1: effective signal resistance. Let *R* be the resistance of the driving gate, *W* be the signal wire width and  $\alpha$  be the signal criticality. The term  $\Re = \alpha (d/W + hR)$  is called effective signal resistance.

*Definition 3.2:* successive roots sum. Let  $(\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1})$  be sequences of positive real numbers. The term

 $\sqrt{\Re_0} + \sum_{i=1}^{n-1} \sqrt{\Re_{i-1} + \Re_i} + \sqrt{\Re_{n-1}}$  is called successive roots sum (SRS).

*Definition 3.3:* symmetric hill ordering. Let  $\Re_0 \leq \Re_1 \leq \cdots \leq \Re_{n-2} \leq \Re_{n-1}$  be a sequence of *n* positive real numbers increasingly ordered. Let us split it into even and odd interleaved subsequences  $\Re_0 \leq \Re_2 \cdots$  and  $\Re_1 \leq \Re_3 \leq \cdots$ . Reverse the order of numbers in the odd subsequence, thus turning it into monotonic decreasing sequence. Finally, concatenate the even and the modified (reversed) odd subsequences into one sequence. The new sequence thus obtained is said to be ordered in *symmetric hill ordering* (as it resembles climbing and descending a symmetric hill). Fig. 3 illustrates how such order is obtained.

*Property 3.1:* **pair swapping.** Let  $(\mathfrak{R}_i, \mathfrak{R}_{i+1}, \mathfrak{R}_{i+2}, \dots, \mathfrak{R}_{i+k-2}, \mathfrak{R}_{i+k-1}, \mathfrak{R}_{i+k})$ ,  $k \ge 3$  be a sequence of real positive numbers, such that  $\mathfrak{R}_{i+1} \ge \mathfrak{R}_{i+k-1}$  (called internal pair) and  $\mathfrak{R}_i \le \mathfrak{R}_{i+k}$  (called external pair). Then the inversion of subsequence  $(\mathfrak{R}_{i+1}, \mathfrak{R}_{i+2}, \dots, \mathfrak{R}_{i+k-2}, \mathfrak{R}_{i+k-1})$  into  $(\mathfrak{R}_{i+k-1}, \mathfrak{R}_{i+k-2}, \dots, \mathfrak{R}_{i+2}, \mathfrak{R}_{i+1})$  decreases the SRS of the sequence.

*Proof:* Since only neighbors of  $\Re_i, \Re_{i+1}, \Re_{i+k-1}, \Re_{i+k}$  are changed, it is sufficient to prove that  $\sqrt{\Re_i + \Re_{i+1}} + \sqrt{\Re_{i+k-1} + \Re_{i+k}} \ge \sqrt{\Re_i + \Re_{i+k-1}} + \sqrt{\Re_{i+1} + \Re_{i+k}}$ . Squaring the two sides one needs to show that  $(\Re_i + \Re_{i+1}) \times (\Re_{i+k-1} + \Re_{i+k}) \ge (\Re_i + \Re_{i+k-1}) \times (\Re_{i+1} + \Re_{i+k})$ . Expanding both sides, it is left to show that  $(\Re_i - \Re_{i+k}) \times (\Re_{i+k-1} - \Re_{i+1}) \ge 0$ , which indeed follows from the assumption on the relations of the internal and external pairs.

*Property 3.2:* **optimal insertion of maximal value**. Let  $(\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1})$  be a sequence of positive real numbers ordered as a symmetric hill. Let  $\mathfrak{R} > \max{\{\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1}\}}$ . Then the location where inserting  $\mathfrak{R}$  into the sequence minimizes the new SRS, is at the center between the two largest numbers. Hence the new sequence is also in symmetric hill order.

*Proof:* Let us insert  $\Re$  arbitrarily into the sequence between  $\Re_i$  and  $\Re_{i+1}$ , thus resulting in the quadruples  $(\Re_{i-1}, \Re_i, \Re, \Re_{i+1})$  and  $(\Re_i, \Re, \Re_{i+1}, \Re_{i+2})$  in the new sequence of n+1 numbers. If  $\Re_i$  and  $\Re_{i+1}$  were not the two center numbers of the old sequence (top of the hill), at least one of these quadruples satisfies the



Fig. 3. Construction of symmetric hill ordering: (a) Sort numbers in ascending order; (b) Split sequence into odd and even subsequences; (c) Reverse order of numbers in the even subsequence; (d) Concatenate the odd and the modified subsequences

condition of *pair swapping* property 3.1. Therefore, the SRS of the new sequence can be reduced by appropriate swapping of  $\Re$  with its left or right neighbor. If  $\Re$  is inserted before  $\Re_1$  or after  $\Re_n$ , a direct calculation shows that swapping  $\Re$  with  $\Re_1$  (or  $\Re_n$ ) decreases the resulting SRS. The only position where the pair swapping property 3.1 condition does not exist is in between the two largest numbers of the old sequence. Such insertion creates a new sequence satisfying symmetric hill order.

*Definition 3.4:* local maximum. Let  $(\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1})$  be a sequence of positive real numbers. The number  $\mathfrak{R}_j$ , is called a local maximum of  $(\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1})$  if both  $\mathfrak{R}_j \ge \mathfrak{R}_{j-1}$  and  $\mathfrak{R}_j \ge \mathfrak{R}_{j+1}$ .

Property 3.3: local maximum elimination. Let  $(\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1})$  be a sequence of positive real numbers. Let  $(\mathfrak{R}_i, \mathfrak{R}_{i+1})$  and  $(\mathfrak{R}_j, \mathfrak{R}_{j+1}, \mathfrak{R}_{j+2})$  be two disjoint subsequences, where  $\mathfrak{R}_{j+1}$  is a local maximum and  $\mathfrak{R}_i \ge \mathfrak{R}_{j+1} \ge \mathfrak{R}_{i+1}$ . Then, repositioning  $\mathfrak{R}_{j+1}$  in between  $\mathfrak{R}_i$  and  $\mathfrak{R}_{i+1}$  decreases the SRS of the sequence.

*Proof:* we need to show that  $\sqrt{\Re_i + \Re_{i+1}} + \sqrt{\Re_j + \Re_{j+1}} + \sqrt{\Re_{j+1} + \Re_{j+2}} \ge \sqrt{\Re_i + \Re_{j+1}} + \sqrt{\Re_{j+1} + \Re_{j+1}} + \sqrt{\Re_j + \Re_{j+2}}$ . Let us denote  $p = \Re_i / \Re_{j+1} \ge 1$ ,  $q = \Re_{i+1} / \Re_{j+1} \le 1$ ,  $r = \Re_j / \Re_{j+1} \le 1$ ,  $s = \Re_{j+2} / \Re_{j+1} \le 1$ . Therefore, we need to show that  $\sqrt{p+q} + \sqrt{r+1} + \sqrt{1+s} \ge \sqrt{p+1} + \sqrt{1+q} + \sqrt{r+s}$ . Rearranging, this is equivalent to:  $\sqrt{r+1} + \sqrt{1+s} - \sqrt{r+s} \ge \sqrt{p+1} + \sqrt{1+q} - \sqrt{p+q}$ . (3.8)

Since  $r \le 1$ , it follows that left hand side of (3.8) is monotonic decreasing in *s*, thus minimized for s = 1. Since  $p \ge 1$ , it follows that right hand side of (3.8) is monotonic increasing in *q*, thus maximized for q = 1. So if the inequality still holds for the minimal value of left hand side and maximal value of right hand side, (3.8) does always hold. Indeed, substitution of s = 1 and q = 1 we obtain  $\sqrt{r+1} + \sqrt{2} - \sqrt{r+1} = \sqrt{2} + \sqrt{1+q} - \sqrt{1+q}$ .

Based on the above properties, we are ready to prove the theorem of optimal signal ordering in a bundle of parallel wires.

Theorem 3.1 (optimal ordering of uniform-width wires): Let a signal bundle have arbitrary drivers, arbitrary capacitive loads, arbitrary required arrival times and uniform wire width. Let MCF be the same for all signal pairs, including the side walls. Then the symmetric hill ordering of the signals in the bundle according to their effective driver resistance yields minimum total weighted sum of delays.

*Proof:* It was shown in (3.7) that for any order of the signals, the minimized total sum of delays  $f_3$  consists of two terms  $f^{T}$  and  $f^{TT}$ . The term  $f^{TT}$  captures the delays resulting from the capacitive loads, a component that is independent of the signal order in the bundle. The term  $f^{TT}$  captures the delay contributed by the cross capacitances of the signals, a component which depends on the signal order. It is therefore sufficient to minimize  $f^{TT}$ .

Let  $\pi^* = (\Re_0, \dots, \Re_{n-1})$  be the effective driver resistance symmetric hill ordering of the bundle, and denote by  $f''(\pi^*)$  the corresponding term in the minimized total sum of delays thus obtained. We'll show by induction that for any other ordering  $\pi$  of effective driver resistances  $f''(\pi^*) \leq f''(\pi)$ .

For a bundle comprised of one or two signals the induction hypothesis trivially exists. For a bundle of three signals, the optimality of symmetric hill ordering follows from the optimal insertion property 3.2. Put the two smaller effective resistances, say  $\Re_a$  and  $\Re_b$  in the bundle first. Then, the optimal insertion property

3.2 dictates the location of  $\mathfrak{R}_c$  at the center, thus resulting in symmetric hill order. If  $\mathfrak{R}_a$  ( $\mathfrak{R}_b$ ) and  $\mathfrak{R}_c$  are placed first, a direct calculation shows that  $\mathfrak{R}_b$  ( $\mathfrak{R}_a$ ) needs to reside such that  $\mathfrak{R}_c$  is located at the center.

By the induction hypothesis, the symmetric hill order is optimal for any n-1 signals bundle. Assume on the contrary that there exists a *n* signal bundle whose optimal order  $\pi'$  is not symmetric hill. It follows from the non optimality of  $\pi^*$  that  $f^{II}(\pi^*) > f^{II}(\pi')$ .

Let  $(\mathfrak{R}_{l}, \mathfrak{R}_{x}, \mathfrak{R}_{r})$  be the center triplet of  $\pi^{*}$ , namely,  $\mathfrak{R}_{x}$  is the largest resistance. There are two possibilities: triplet  $(\mathfrak{R}_{l}, \mathfrak{R}_{x}, \mathfrak{R}_{r})$  exists or doesn't exist in  $\pi'$ .

If it exists, let us delete  $\Re_x$  from both  $\pi'$  and  $\pi^*$ , thus inducing bundles of n-1 signals  $\pi'^{n-1}$  and  $\pi^{*,n-1}$ . The first is not symmetrically hill ordered, while the second is. It follows from the induction hypothesis that  $f''(\pi^{*,n-1}) < f''(\pi'^{n-1})$ . However, the magnitude of the difference in f'' between the *n* signal bundle and its n-1 signal bundle induced by  $\Re_x$  deletion is the same for  $\pi'$  and  $\pi^*$  and equals to

 $\Delta = \sqrt{\Re_l + \Re_x} + \sqrt{\Re_r + \Re_x} - \sqrt{\Re_l + \Re_r} .$  Therefore

 $f^{II}(\pi^{*}) = f^{II}(\pi^{*,n-1}) + \Delta < f^{II}(\pi^{\prime,n-1}) + \Delta = f^{II}(\pi^{\prime}).$ 

This is a contradiction to  $f^{II}(\pi^*) > f^{II}(\pi')$  that followed from the non optimality hypothesis of  $\pi^*$ .

Consider now the case where the triplet  $(\Re_l, \Re_x, \Re_r)$  doesn't exist in  $\pi'$ . Then there are two possibilities. In the first, the triplet appears in  $\pi'$  as a subsequence  $(\Re_x, \max(\Re_l, \Re_r), \min(\Re_l, \Re_r))$ . The pair swapping property 3.1 can be applied on quadruple  $(\Re, \Re_x, \max(\Re_l, \Re_r), \min(\Re_l, \Re_r))$  and result quadruple  $(\Re, \max(\Re_l, \Re_r), \Re_x, \min(\Re_l, \Re_r))$ , hence f'' can be reduced. In the second possibility, in any order at least one of  $\Re_l$  and  $\Re_r$  is a local maximum in  $\pi'$ , say  $\Re_l$ . Then applying the local maximum elimination property 3.3 to  $\Re_l$  and moving it to be adjacent to  $\Re_x$ , will decrease f'' value of the newly created order. This again contradicts the optimality assumption of  $\pi'$ .

Notice that although wire width W is uniform, it is still a variable and should be optimally set together with the spaces  $(S_0, \dots, S_n)$  between the wires in order to minimize the total sum of delays. This is a simplification of the total sum of delays minimization problem [25], where individual wires may have different widths  $(W_1, \dots, W_n)$ .

## 3.2 Non-uniform wire widths implied by impedance matching

In the following we'll prove the optimality of the symmetric hill ordering for more general cases with non-uniform wire width. We assume that wire widths are matched to driver strengths, a common design practice in most practical VLSI designs. It is shown below that minimal total weighted sum of delays is obtained by symmetric hill ordering.

Let  $\psi(R)$  be a positive, non decreasing function of the driver resistance *R*, and let the corresponding wire width be defined by

$$W = 1/\psi(R) \tag{3.9}$$

In the former discussion of uniform wire width  $\psi(R)$  was simply a constant. The relation in (3.9) represents impedance matching, where a stronger driver (smaller *R*) is assigned a wider wire with a lower impedance. According to (3.9), the effective signal resistance becomes  $\Re = \alpha (d\psi(R) + hR)$ . Then the following theorem can be stated:

*Theorem 3.2 (optimal ordering of variable-width wires):* Let a signal bundle have arbitrary drivers, arbitrary capacitive loads and wire width inversely proportional to the corresponding driver resistance. Then the symmetric hill ordering of the signals in the bundle according to effective signal resistances yields minimum total weighted sum of delays.

*Proof:* All properties of symmetric hill order still hold since  $f^{II}$  remains an SRS.

The function  $\psi(R) = \alpha + \beta R$ , where  $\alpha$  and  $\beta$  are real positive number is admissible, providing further minimization compared to the case of uniform width. The minimum total sum of delays is obtained by first ordering the signals according to Theorem 3.2. Then a minimization of total sum of delays for that order takes place, where the wire spacing  $(S_0, \dots, S_n)$  and the parameters  $\alpha$  and  $\beta$  are the optimization variables. Notice that  $\beta = 0$  is the case of uniform wire width.

# 3.3. Symmetric Hill Order for arbitrary wire width

Assume now that wire width can vary arbitrarily. It is no longer true that symmetric hill ordering yields the minimum total sum of delays. This general case might be caused by large capacitive loads, since the optimal setting of wire width depends on the corresponding load. This in turn affects the optimal order within the bundle. Note that if wire widths are predetermined randomly but are fixed, ordering by effective driver resistance is still advantageous and the optimal order is unaffected by the capacitive loads.

What is the most general setting of wire widths such that symmetric hill order still yields minimal total weighted sum of delays? It can be derived by writing the relation between wire widths and driver resistances at minimum total sum of delays. At the minimum, equations (2.1) and (2.5) satisfy:

$$\frac{\partial f}{\partial W_i} + \lambda \frac{\partial g}{\partial W_i} = 0, 0 \le i \le n - 1.$$
(3.10)

Differentiating (2.1) and (2.5) we obtain

$$\frac{\partial f}{\partial W_i} = -\frac{\alpha_i}{W_i^2} \left( b + \frac{d}{S_i} + \frac{d}{S_{i+1}} + eC_i \right) + \alpha_i kR_i \quad \frac{\partial g}{\partial W_i} = 1$$
(3.11)

Substituting (3.11) into (3.10) yields

$$W_{i} = \sqrt{\frac{\alpha_{i}}{\lambda + k\alpha_{i}R_{i}}} \left( b + \frac{d}{S_{i}} + \frac{d}{S_{i+1}} + eC_{i} \right)$$
(3.12)

Equation (3.12) demonstrates the dependency between wire width at minimum total sum of delays and the corresponding driver resistance, spacing to adjacent wires, signal criticality and the capacitive load. Substitution of (3.12) into the expression for effective signal resistance presented in definition 3.1 yields:

$$\Re_{i} = d \sqrt{\frac{\alpha_{i} \left(\lambda + k\alpha_{i}R_{i}\right)}{b + d/S_{i} + d/S_{i+1} + eC_{i}}} + \alpha_{i}hR_{i}$$
(3.13)

Whenever  $R_i \ge R_j$  implies  $\Re_i \ge \Re_j$ , symmetric hill order according to wire driver resistances yields minimum total sum of weighted delays among all possible orders.

As can be seen from (3.12), in order to satisfy the required relation in the numerator,  $R_i \ge R_j$  should imply  $\alpha_i \ge \alpha_j$ , namely, the weaker the driver is, the more critical the signal is. For the term  $d/S_i + d/S_{i+1}$  at the denominator it has been shown in (3.5) that optimality implies that the spaces are necessarily monotonically increasing with driver resistance, which also imposes a non decreasing relation between  $\Re_i$ and  $R_i$ . The only term remaining "free" is the capacitive load at the denominator of (3.13). In order to obtain a monotonic relation in (3.13) the following condition between resistance of drivers, their corresponding capacitive loads and signal criticality weights is imposed:

Theorem 3.3 (sufficient conditions for optimality of symmetric hill order): Let a *n* signal bundle have arbitrary drivers and capacitive loads. Let  $\sigma_i, \sigma_j, 0 \le i, j \le n-1$  be any two signals and let  $(R_i, C_i, \alpha_i)$  and  $(R_j, C_j, \alpha_j)$  be their driver resistance, capacitive load and signal criticality weight, respectively. If the relation  $R_i \ge R_j$  implies  $C_i \le C_j \land \alpha_i \ge \alpha_j$ , then symmetric hill order according to driver resistances yields minimum total sum of weighted delays among all orders.

*Proof:* If follows from equation (3.13) that if  $R_i \ge R_j$  implies  $C_i \le C_j \land \alpha_i \ge \alpha_j$ , then  $\Re_i \ge \Re_j$ . We can therefore replace the "effective driver resistance" phrase in theorem 3.1 by "driver resistance" and obtain the same result.

A special case of the Theorem 3.3 occurs in real design when all signals are of same criticality, at "first order" circuit implementation. In that case if  $R_i \ge R_j \land C_i \le C_j, 0 \le i, j \le n-1$ , symmetric hill ordering is optimal and sizing optimization should be performed in this order. The true criticality of the signals due to the physical realization is then discovered. As a result the signals are assigned criticality weights according to how far is their delay from the requirement, and the signal order in the bundle is then verified to be in symmetric hill order according to (3.13). If relation (3.13) is not satisfied, the signals are reordered to satisfy symmetric hill, and wire resizing takes place again.

## IV. IMPLICATIONS OF MILLER COUPLING FACTOR

So far we ignored crosstalk effects between wires by assuming MCF=1. In order to account for worstcase wire switching, the cross capacitances should be multiplied by MCF values. Thus, the delay equation in (2.5) turns to be:

$$f_3\left(\pi, \overline{W}, \overline{S}\right) = \sum_{i=0}^{n-1} \alpha_i \Delta_i\left(\pi, \overline{W}, \overline{S}\right) = \sum_{i=0}^{n-1} \alpha_i \left\{ a + R_i\left(kW_i + g + C_i\right) + \frac{b + eC_i}{W_i} + \left(hR_i + \frac{d}{W_i}\right) \left(\frac{MCF_i}{S_i} + \frac{MCF_{i+1}}{S_{i+1}}\right) \right\},\tag{4.1}$$

where  $MCF_i$  is Miller Coupling Factor between wires i-1 and i (for side wires it is the MCF between the wire and the sidewall). In practice, worst-case crosstalk effect on delays is usually represented by  $MCF_i = 2$  for  $1 \le i \le n-1$ . If sidewall shielding wires are inactive, they don't induce Miller effect, i.e.  $MCF_0 = MCF_n = 1$ . Denoting  $MCF_{int}$  for all 0 < i < n-1 and  $MCF_{side}$  for i = 0 and i = n, (4.1) can be rewritten as follows:

$$\begin{split} f_{3}\left(\pi, \overline{W}, \overline{S}\right) &= \sum_{i=0}^{n-1} \alpha_{i} \Delta_{i}\left(\pi, \overline{W}, \overline{S}\right) = MCF_{int}\left[\sum_{i=1}^{n-2} \alpha_{i} \left\{a + R_{i}\left(kW_{i} + g + C_{i}\right) + \frac{b + eC_{i}}{W_{i}} + \left(hR_{i} + \frac{d}{W_{i}}\right)\left(\frac{1}{S_{i}} + \frac{1}{S_{i+1}}\right)\right\} + \alpha_{0} \left\{a + R_{0}\left(kW_{0} + g + C_{0}\right) + \frac{b + eC_{0}}{W_{0}} + \left(hR_{0} + \frac{d}{W_{0}}\right)\left(\frac{MCF_{side}/MCF_{int}}{S_{0}} + \frac{1}{S_{1}}\right)\right\} + \alpha_{n-1} \left\{a + R_{n-1}\left(kW_{n-1} + g + C_{n-1}\right) + \frac{b + eC_{n-1}}{W_{n-1}} + \left(hR_{n-1} + \frac{d}{W_{n-1}}\right)\left(\frac{1}{S_{n-1}} + \frac{MCF_{side}/MCF_{int}}{S_{n}}\right)\right\}\right] \end{split}$$

Decomposing  $f_3$  into order independent and dependent components, the order dependent component is the following:

$$f^{II} = \frac{MCF_{\text{int}}}{A - nW} \left[ \sum_{i=0}^{n-2} \sqrt{\Re_i + \Re_{i+1}} + \sqrt{r\Re_0} + \sqrt{r\Re_{n-1}} \right]^2, \tag{4.2}$$

where  $r = MCF_{side}/MCF_{int}$  is called MCF ratio. If worst-case crosstalk is assumed between internal wires, then r = 1/2. The following shows that the order of wires which minimizes the total weighted sum of delays is *ascending*, where wires with strongest and weakest drivers are placed oppositely near the walls and all others are sorted monotonically between them (Fig. 1(c)). Before proving optimality of the ascending order, a few more properties are in order.

*Property 4.1*: end value repositioning for MCF ratio = 1/2. Let  $(\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1})$  be a sequence of positive real numbers and  $(\mathfrak{R}_i, \mathfrak{R}_{i+1})$  a pair of successive entries. If  $\mathfrak{R}_i \leq \mathfrak{R}_0 \leq \mathfrak{R}_{i+1}$  (similarly  $\mathfrak{R}_i \leq \mathfrak{R}_{n-1} \leq \mathfrak{R}_{i+1}$ ), then repositioning  $\mathfrak{R}_0$  (similarly  $\mathfrak{R}_{n-1}$ ) in between  $\mathfrak{R}_i$  and  $\mathfrak{R}_{i+1}$  decreases the SRS of the sequence.



Fig. 5 End value repositioning



Fig. 4 Optimal insertion case: the location near the wall is better than the location between two largest values

We show next the existence of optimal insertion for the case of MCF ratio = 1/2.

*Property 4.2:* **optimal insertion for MCF ratio** =1/2. Let  $(\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1})$  be a sequence of ascending positive real numbers, let  $\mathfrak{R} > \mathfrak{R}_{n-1}$ . Then the location where inserting  $\mathfrak{R}$  into the sequence minimizes the new SRS, is between  $\mathfrak{R}_{n-1}$  and the wall. Hence the new sequence is also ascending.

*Proof:* Let us examine all n+1 possible locations for  $\Re$  insertion. It follows from the pair swapping property 3.1 that among the n-1 locations of which are not adjacent to walls, the best one is between  $\Re_{n-1}$  and  $\mathfrak{R}_{n-2}$ . If we show that positioning  $\mathfrak{R}$  between  $\mathfrak{R}_{n-1}$  and the wall yields smaller SRS, we are done. We show that  $\sqrt{\Re_{n-2}} + \Re + \sqrt{\Re + \Re_{n-1}} + \sqrt{\frac{1}{2}\Re_{n-1}} \ge \sqrt{\Re_{n-2} + \Re_{n-1}} + \sqrt{\Re_{n-1} + \Re} + \sqrt{\frac{1}{2}}\Re$ . therefore need to Denote  $p = \Re_{n-2} / \Re_{n-1} \le 1, q = \Re / \Re_{n-1} \ge 1$ . Substitution in above the inequality yields  $\sqrt{p+q} + \sqrt{\frac{1}{2}} \ge \sqrt{p+1} + \sqrt{\frac{1}{2}q}$ . Rearranging, we obtain  $\sqrt{p+q} - \sqrt{\frac{1}{2}q} \ge \sqrt{p+1} - \sqrt{\frac{1}{2}}$ . The left hand side is monotonic increasing in q, so if we prove that the inequality holds for the minimal value of q, regardless of p, the inequality will always hold. Substitution of q = 1 in the inequality yields  $\sqrt{p+1} - \sqrt{1/2} \ge \sqrt{p+1} - \sqrt{1/2}$ , which is indeed true, independent of p.

It remains to show that positioning  $\Re$  between  $\Re_0$  and the wall is inferior compared to positioning  $\Re$  between  $\Re_{n-1}$  and the wall. In terms of SRS, this translates to showing that  $\sqrt{\frac{1}{2}\Re_{n-1}} + \sqrt{\Re_0 + \Re} + \sqrt{\frac{1}{2}\Re} \ge \sqrt{\frac{1}{2}\Re} + \sqrt{\Re + \Re_{n-1}} + \sqrt{\frac{1}{2}\Re_0}$ . Denote  $p = \Re_0/\Re_{n-1} \le 1, q = \Re/\Re_{n-1} \ge 1$ . Substitution yields  $\sqrt{\frac{1}{2}} + \sqrt{p+q} \ge \sqrt{q+1} + \sqrt{\frac{1}{2}p}$ . Rearranging, we obtain  $\sqrt{\frac{1}{2}} + \sqrt{p+q} - \sqrt{\frac{1}{2}p} \ge \sqrt{q+1}$ . The left hand side is monotonic decreasing in p, which follows from  $\partial/\partial p \left(\sqrt{\frac{1}{2}} + \sqrt{p+q} - \sqrt{\frac{1}{2}p}\right) = \frac{1}{2\sqrt{p+q}} - \frac{1}{2\sqrt{2p}} \le \frac{1}{2\sqrt{p+p}} - \frac{1}{2\sqrt{2p}} = 0$ . Therefore, if the inequality holds at minimum of left hand side, it always holds. Substituting p = 0 we obtain  $\sqrt{\frac{1}{2}} + \sqrt{q} \ge \sqrt{q+1}$ , which is always true for  $q \ge 1$ .

The above properties establish the theorem below.

*Theorem 4.1 (optimal ordering with MCF ratio* = 1/2): Let a signal bundle have arbitrary drivers, arbitrary capacitive loads, wire width decreasing with the corresponding driver resistance and MCF at walls 1/2 of

MCF between wires inside the bundle. Then ascending order of the signals in the bundle according to effective signal resistances yields minimum total weighted sum of delays.

*Proof:* Let  $\pi^* = (\Re_0, \dots, \Re_{n-1})$  be sorted left to right in ascending order, and let  $f''(\pi^*)$  be the corresponding term in the total sum of delays which depends on the SRS. We'll show by induction that for any other ordering  $\pi$  of driver resistances  $f''(\pi^*) \leq f''(\pi)$ .

For a bundle comprised of one or two signals the induction hypothesis trivially exists. For a bundle of three signals, the optimality of ascending order follows from the *optimal insertion property*. By the induction hypothesis, ascending order is optimal for any n-1 signals bundle. Assume on the contrary that there exists a *n* signal bundle whose optimal order  $\pi'$  is not ascending. It follows from the non optimality contradictory hypothesis that  $f''(\pi^*) > f''(\pi')$ .

Consider the location of the successive pair  $(\mathfrak{R}_{n-2},\mathfrak{R}_{n-1}) \subset \pi^* \operatorname{in} \pi'$ . It certainly cannot occur next to the right side wall. Because if it did, then  $\mathfrak{R}_{n-1}$  can be dropped from both  $\pi^*$  and  $\pi'$ . The remaining part of  $\pi^*$ ,  $\pi^{*,n-1}$ , is ascending ordered, while the remaining part of  $\pi', \pi'^{,n-1}$ , is not. SRS in both  $\pi^{*,n-1}$  and  $\pi'^{,n-1}$  is decreased by  $\delta = \sqrt{\frac{1}{2}\mathfrak{R}_{n-1}} - \sqrt{\frac{1}{2}\mathfrak{R}_{n-2}} + \sqrt{\mathfrak{R}_{n-2} + \mathfrak{R}_{n-1}}$ . On the other hand, it follows from the induction hypothesis that  $\pi^{*,n-1}$  is an optimal order, while  $\pi'^{,n-1}$  is not, thus implying that  $f''(\pi^{*,n-1}) < f''(\pi^{*,n-1})$ . Consequently,  $f''(\pi^*) = f''(\pi^{*,n-1}) + \delta < f''(\pi'^{,n-1}) + \delta = f''(\pi')$ , thus contradicting  $f''(\pi^*) > f''(\pi')$ .

The above shows that it is impossible for an optimal ordering to have the pair  $(\mathfrak{R}_{n-2}, \mathfrak{R}_{n-1})$  residing next to the right wall (unless this is the ascending order  $\pi^*$ , which we aim to prove is optimal). We'll show next that any order  $\pi'$  claiming to be optimal must have  $\mathfrak{R}_{n-1}$  positioned next to the right wall, by showing that if this was not the case, we could always decrease the corresponding SRS by changing the position of one of the other  $\mathfrak{R}$ 's.

But having  $\Re_{n-1}$  necessarily located next to the right wall implies that  $\Re_{n-2}$  must be left adjacent to  $\Re_{n-1}$ , since if this was not the case we could apply again the property of end value repositioning and inset

 $\mathfrak{R}_{n-2}$  between  $\mathfrak{R}_{n-1}$  and its left adjacent  $\mathfrak{R}$ .

In summary, we've shown that any order aiming at minimizing SRS for MCF ratio 1/2 must have  $(\Re_{n-2}, \Re_{n-1})$  next to the right wall. Consequently, the ascending order is superior over any other ordering having  $(\Re_{n-2}, \Re_{n-1})$  next to right wall as already shown, which concludes the proof. Similar arguments hold for the case where the pair  $(\Re_{n-1}, \Re_{n-2})$  is positioned next to the left wall.

Consider now the case where the MCF occurring between the end signals and the walls is zero. This case corresponds to active shielding [29]. Therefore, the MCF ratio between the end signals and the internal ones is zero. In the following we'll show that the order of effective drivers yielding the smallest SRS is such that the two signals having weakest drivers are located near the walls, one at each side. The strongest one is located at the center. The rest are evenly and symmetrically distributed on both sides in ascending order of their effective driver strength, from the ends towards the center. Such an order is called *symmetric valley*, defined formally as follows.

*Definition:* symmetric valley ordering. Let  $\Re_0 \leq \Re_1 \leq \cdots \leq \Re_{n-2} \leq \Re_{n-1}$  be a sequence of *n* positive real numbers increasingly ordered. Assume without loss of generality that *n* is even. Let us split it into even and odd interleaved subsequences  $\Re_0 \leq \Re_2 \leq \cdots \leq \Re_{n-2}$  and  $\Re_1 \leq \Re_3 \leq \cdots \leq \Re_{n-1}$ . Reverse the order of numbers in the even subsequence, thus turning it into monotonic decreasing sequence. Finally, concatenate the odd and the modified (reversed) even subsequences into one sequence. The new sequence thus obtained is said to be ordered in *symmetric valley ordering* (as it resembles descending and climbing a symmetric valley). Fig. 7 illustrates how such an order is obtained. The following property is analogous to *optimal insertion of maximal value* 3.2 derived for *symmetric hill order*.

*Property 4.1:* **optimal insertion of minimal value.** Let  $(\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1})$  be a sequence of positive real numbers ordered as a symmetric valley. Let  $\mathfrak{R} < \min{\{\mathfrak{R}_0, \dots, \mathfrak{R}_{n-1}\}}$ . Then the location where inserting  $\mathfrak{R}$  into the sequence minimizes the new SRS, is at the center between the two smallest numbers. Hence the



Fig. 6 Proof of optimal ordering with MCF ratio = 1/2

new sequence is also in symmetric valley order.

We skip the proof of this property, as it follows similarly to the proof of *optimal insertion of maximal value* 3.2. The following theorem manifests the optimal ordering for 0 MCF ratio.

*Theorem 4.2 (optimal ordering with MCF ratio* = 0): Let a signal bundle has arbitrary drivers, arbitrary capacitive loads and wire width decreasing with the corresponding driver resistance. Let the MCF at the side walls be 0 and MCF of wire pairs inside the bundle be equal to all. Then, the symmetric valley order of the signals in the bundle according to effective driver resistances yields minimum total weighted sum of delays.

*Proof:* Like the case of MCF ratio 1, where the optimal order is symmetric hill, the pair swapping property 3.1 and local maximum elimination property 3.3 hold also for this case, since both involve comparing SRS of internal signals only. Following similar arguments as in symmetric hill order, with the aid of *optimal insertion of minimal value property* 4.1, the proof is identical to Theorem 3.1.

Theorems 3.1, 4.1 and 4.2 manifested the optimal signal ordering in a bundle for typical cases of MCF boundary conditions, where external to internal signal MCF ratios are 1, ½ and 0. The implied orders are independent of the effective drivers' strengths and are valid under very wide wire width settings applicable for most practical design scenarios. An interesting question is what happens for other MCF ratios. With some further manipulations of SRS it can be shown that:

- When the ratio of end MCF to internal MCF is equal or greater than 1, *symmetric hill* order yields minimal total weighted sum of delays, independently of effective drivers' strength.
- When the ratio of end MCF to internal MCF is equal to <sup>1</sup>/<sub>2</sub>, *ascending* order yields minimal total weighted sum of delays, independently of effective drivers' strength.
- When the ratio of end MCF to internal MCF is equal to or smaller than 0 *symmetric valley* order yields minimal total weighted sum of delays, independently of effective drivers' strength.

For all other ratios, namely, 0 < r < 1/2 and 1/2 < r < 1 the order depends on the specific values of effective drivers' strength and may be none of the above.



Fig. 7 Formation of symmetric valley ordering: (a) Sort numbers in ascending order; (b) Split sequence into odd and even subsequences; (c) Reverse order of numbers in the add subsequence; (d) Concatenate the even and the modified subsequences

## **V. CROSSTALK NOISE REDUCTION**

Instead of incorporating delay uncertainty into the delay expression by using worst-case Miller factor, we may directly consider delay uncertainty and optimize it simultaneously with the nominal (crosstalk-free) signal delay.

For calculating crosstalk noise effectively, several models have been presented in the literature, e.g. [14], [24]. For peak noise voltage  $V^{p}$  we use a simple model, given in [14]. The peak noise in wire *i* can be represented as

$$V_{i}^{p} = V_{dd} \frac{R_{i} \cdot C_{c_{i}} + R_{w_{i}} \cdot C_{c_{i}}/2}{\Delta_{i-1} + \Delta_{i} + \Delta_{i+1}}$$
(5.1)

In (5.1), the numerator represents a part of wire delay caused by coupling capacitance, and the denominator represents the sum of Elmore delays of the wire and its neighbors. Rewriting (5.1) in terms of (2.2), obtain

$$V_{i}^{p} = V_{dd} \frac{\left(d/W_{i} + hR_{i}\right) \cdot \left(1/S_{i} + 1/S_{i+1}\right)}{\Delta_{i-1} + \Delta_{i} + \Delta_{i+1}}, \quad 0 < i < n-1$$

$$V_{0}^{p} = V_{dd} \frac{\left(d/W_{0} + hR_{0}\right) \cdot 1/S_{1}}{\Delta_{0} + \Delta_{1}}$$

$$V_{n-1}^{p} = V_{dd} \frac{\left(d/W_{n-1} + hR_{n-1}\right) \cdot 1/S_{n-1}}{\Delta_{n-1} + \Delta_{n}}$$
(5.2)

For analytical modeling of the delay uncertainty caused by effects of crosstalk noise on circuit timing, we use superposition-based approximations, proposed in [22]. According to the approximation, the upper bound of delay uncertainty of wire i can be expressed as

$$\delta_{\max,i} = \Delta_i \ln\left(2V_i^p / V_{dd} + 1\right) \tag{5.3}$$

Let us introduce two new objective functions:

$$h_{\rm l} = \sum_{i=0}^{n-1} \delta_{\max,i} \tag{5.4}$$
 and

 $h_2 = \max_{i} \delta_{\max,i}, \tag{5.5}$ 

which are the total sum of delay uncertainties and the largest delay uncertainty among the wires in the bundle.

According to [23], after some simplifications (5.2) can be represented in the form

 $V_{i}^{p} \approx V_{dd} \frac{R_{i}}{R_{i-1} + R_{i} + R_{i+1}} \cdot \eta$ , where  $\eta$  represents the ratio of cross-coupling capacitance to total wire

capacitance.

As it follows from this expression, if the driver of a wire is significantly larger than the driver of its neighbors, then the wire with the smaller driver (as a crosstalk victim) will be exposed to serious noise from the wires with the larger drivers (aggressors). Therefore, crosstalk noise will be minimized, if neighboring wires have roughly equal drivers [23]. It can be achieved by ordering wires in the bundle in one of the monotonic orders discussed above.

Although we have not performed direct mathematical optimization of delay uncertainty, our experiments have shown that total delay uncertainty (5.4) is reduced by minimizing the total weighted sum of delays (2.4), and the worst delay uncertainty (5.5) is reduced by minimization the worst wire delay (2.3), using a symmetric hill order.

## VI. EXPERIMENTAL RESULTS

Numerical experiments for various problem instances were performed using 65 nanometer technology parameters. Continuous optimization has been used, and results were verified for allowed discrete sizes as required by the technology. Delay improvements were verified by SPICE simulations of several circuits before and after optimization.

In all experiments we assume uniform timing requirements, unless mentioned otherwise.

*Experiment 1.* This experiment demonstrates the benefit of wire ordering. Random problem instances using five signals were evaluated as follows: Each signal was assigned a driver randomly. The range of driver resistances was  $50 \Omega$  to  $3 K\Omega$ , and load capacitances in the range 10-200 *fF* were assigned according to driver strength to avoid excessive driver loading, such that the conditions of theorem 3.3 were always satisfied. For each problem the wire widths and spaces were optimized once to yield minimum total sum of delays, and again to yield minimum worst-wire delay (MinMax). This was repeated for all the 5!=120 possible orders. The procedure was done for eight different bundle widths A – 1.5, 2, 2.5, 3, 3.5, 7, 9.5 and  $12 \mu m$ , and five different bundle lengths L – 300, 500, 800, 1200 and  $1500 \mu m$ . The optimization impact (% improvement of best versus worst ordering, after width/space optimization, averaged over 20 random problem instances for each width and length configuration) is presented in Table 1. This experiment demonstrates that net ordering can significantly improve results of wire sizing and spacing optimization, especially when the bundle width is tightly constrained. All obtained optimal orders for total sum of delays minimization came out as symmetric hills (as expected, since theorem 3.2 is always satisfied in this example). As can be seen from the table, bundle worst wire delay (lower half cell) is more sensitive to the ordering than the total sum of delays (upper half cell).

<u>Experiment 2.</u> This experiment evaluates the benefit of ordering for a large number of wires in the bundle. The impact of net ordering on interconnect bundles containing a large number of wires was evaluated, using 15 representative interconnect bundles in 65nm technology. The number of signal wires per bundle varied from 10 to 128. The width of each bundle was determined by allocating for times the minimal width implied by the minimum design rules.. Driver resistances varied from  $50\Omega$  to  $2.5 K\Omega$ , averaging  $1.24 K\Omega$ . Exhaustive search to find the worst and best ordering is infeasible for such problems. Instead, a poor ordering has been guessed, and the corresponding signal delays were compared with results of symmetric hill ordering. The experiment confirmed that symmetric hill net ordering can improve delays by a significant percentage: After net ordering and sizing optimization, up to 18.3% in average delays were

| Bundle<br>length | Bundle width |        |        |        |        |       |        |       |  |  |
|------------------|--------------|--------|--------|--------|--------|-------|--------|-------|--|--|
|                  | 1.5 µm       | 2 µm   | 2.5 µm | 3 µm   | 3.5 µm | 7 µm  | 9.5 µm | 12 µm |  |  |
| 300 µm           | 10.14%       | 9.13%  | 8.13%  | 7.25%  | 6.62%  | 3.12% | 2.25%  | 1.97% |  |  |
|                  | 17.19%       | 14.98% | 12.7%  | 10.86% | 9.84%  | 4.6%  | 2.86%  | 2.13% |  |  |
| 500 µm           | 11.31%       | 9.5%   | 8.21%  | 7.46%  | 6.71%  | 3.32% | 2.43%  | 2.14% |  |  |
|                  | 17.24%       | 15.18% | 13.29% | 10.81% | 9.64%  | 5.13% | 3.07%  | 2.94% |  |  |
| 800 µm           | 9.82%        | 8.76%  | 7.79%  | 7.32%  | 6.5%   | 2.47% | 1.92%  | 1.05% |  |  |
|                  | 16.22%       | 14.11% | 13.08% | 11.09% | 9.98%  | 5.14% | 3.24%  | 1.83% |  |  |
| 1200 µm          | 8.78%        | 8.23%  | 7.38%  | 6.89%  | 6.35%  | 2.24% | 1.7%   | 1.1%  |  |  |
|                  | 14.18%       | 14.58% | 13%    | 11.63% | 9.84%  | 5.13% | 2.72%  | 1.51% |  |  |
| 1500 µm          | 7.63%        | 7.2%   | 6.94%  | 6.54%  | 6.12%  | 2.1%  | 1.81%  | 0.97% |  |  |
|                  | 14.13%       | 14.02% | 12.97% | 11.51% | 10.15% | 4.99% | 2.62%  | 2%    |  |  |

 TABLE 1

 AVERAGE IMPROVEMENT (BEST VS. WORST ORDERING) FOR RANDOM PROBLEM INSTANCES, IN SUM-OF-DELAYS (UPPER HALF

 CELL) AND WORST WIRE DELAY (LOWER HALF CELL)

 TABLE 2

 % IMPROVEMENT OF BEST VERSUS WORST ORDERING, AFTER WIDTH/SPACE OPTIMIZATION, FOR A SIGNAL CHANNEL WITH TWO DRIVER STRENGTHS

| No. of weak drivers | Percent of improvement in worst delay |  |  |  |  |
|---------------------|---------------------------------------|--|--|--|--|
| 1                   | 0.11%                                 |  |  |  |  |
| 2                   | 8%                                    |  |  |  |  |
| 3                   | 12.7%                                 |  |  |  |  |
| 4                   | 16.3%                                 |  |  |  |  |
| 5                   | 10.76%                                |  |  |  |  |
| 6                   | 5.25%                                 |  |  |  |  |

obtained. On average, the interconnect delay improvement in this experiment was 11.8%, which is equivalent to 5% of the clock cycle used in the given technology.

<u>Experiment 3.</u> This example demonstrates how the set of wire driver resistances influences the impact of ordering optimization. The effect of signal ordering on MinMax delay in bundles with both strong and weak drivers is shown in Table 2. A bundle of 7 signals with driver-load pairs of  $(50\Omega, 50 fF)$  or  $(3 K\Omega, 5 fF)$  is examined for various numbers of the weak drivers. Bundle width and length were A=3  $\mu m$  and L=500  $\mu m$ . As can be expected, when the numbers of strong and weak drivers were about equal, signal ordering is most effective. The worst ordering is indeed the interleaved one, described in Fig. 2(a), while the best one is clearly symmetric hill.

<u>Experiment 4.</u> This example demonstrates the influence of driver's resistances range on ordering optimization impact. The range of drivers is specified by the ratio  $R_{\text{max}}/R_{\text{min}}$ , where  $R_{\text{max}}$  and  $R_{\text{min}}$  are the largest and the smallest driver resistances in a set of wires being ordered. 19 different 7-wire sets were evaluated, with driver resistances distributed uniformly around a constant average of  $1 K\Omega$ . In these sets,  $R_{\text{max}}/R_{\text{min}}$  varied from 1 (all drivers equal) to 6.4. Bundle length is 700  $\mu m$  and width is  $3 \mu m$  in all cases. The results are presented in Fig.8. As can be seen, optimization impact increases with resistance range. Worst wire delay optimization is influenced much more than optimization of average delay. For larger



Fig.8. Influence of relative range of drivers on optimization impact



Fig. 10 A bundle with a critical wire; a) Cross section after weighted sum minimization without ordering (the critical wire is at the leftmost position); b) Cross section after weighted sum minimization with ordering (symmetric hill order according to effective signal resistance)



Fig. 9 A bundle with uniform timing requirements; a) Cross section after weighted sum minimization without ordering; b) Cross section after weighted sum of delays minimization with ordering (symmetric hill order according to driver resistances)

range of driver resistances, the increase in delay improvement saturates.

Experiment 5. This experiment demonstrates the impact of signal criticality weight. Consider a 3  $\mu m$ -wide bundle of 500  $\mu m$  length with 5 nets in it with drivers of different strengths, and all load capacitances are equal to 10 *fF*. The cross section of the bundle after sum-of-weighted-delays optimization when all nets have uniform timing requirements (all weights are 1) is shown in Fig. 9a. The wire with the largest driver (2.8  $K\Omega$ ), is allocated the largest spaces, as expected. After ordering optimization, according to symmetric hill order by driver resistances, this wire is placed in the middle of the bundle (Fig. 9b). Assume now that the net with the strongest driver (0.05  $K\Omega$ ) is the most critical and is assigned  $\alpha = 10$ . The situation after sum-of-weighted-delays optimization and ordering optimization are shown at Fig. 10a and b respectively. Now, the critical net is placed close to the middle and shares a large space with the weakest driven net. In both cases, the average weighted delay was reduced by about 8%. In the second case, the net with the strongest driver was allocated larger width in order to reduce wire resistance due to net criticality. The experiment shows that the weighted sum method takes into account simultaneously both wire driver resistance and net criticality.

<u>Experiment 6.</u> This experiment demonstrates a-priory assignment of wire widths by a heuristic which guarantees optimality of symmetric hill ordering. This is compared with the most general optimization where ordering, width and spacing are searched exhaustively. In this experiment, delays obtained by exhaustive simultaneous ordering/sizing/spacing optimization are compared with results of heuristics using symmetric hill order for total sum of delays objective. Another set of random 1600 instances was generated with the same range of drivers and the same set of bundle widths and lengths, but all load capacitances equal to 10 pF. Heuristic wire width assignment with the inverse linear width function  $W = \frac{1}{(\alpha + \beta R)}$  was applied. For each value of bundle width and length, the delay difference between the

optimal result of exhaustive search and the result of the heuristic was expressed as a fraction of the delay difference between best and worst results of the exhaustive search. On average for all these problem instances, the global minimum delay was approached as closely as 0.37%. Hence, the heuristic wire width assignment, which allows to use symmetric hill ordering instead of exhaustive search, is effective.

Experiment 7. In this experiment we demonstrate crosstalk reduction results by wire ordering. We evaluated 20 random problem instances using five signals. Each signal was assigned a driver randomly. The range of driver resistances was  $100\Omega$  to 2  $K\Omega$  and load capacitances in the range 200 fF to 10 fF were assigned accordingly to satisfy theorem 3.3. For each problem the wire widths and spaces were optimized twice: first to yield minimum total sum of delays, and second to yield minimum worst wire delay. This was done for all the 5! =120 possible order permutations. The procedure was repeated for five different bundle widths of 2, 5, 8, 12 and  $20 \,\mu m$ , and five different lengths of 300, 500, 800, 1200 and 1500  $\mu m$ . For the best and worst timing orders, the total sum of delay uncertainties and maximum delay uncertainty were calculated. The crosstalk results for total sum of delays optimization is presented in table 3 (the results for worst wire delay optimization are very similar) In each cell, the upper half cell (colored in gray) represents improvement in total sum of delay uncertainties and the lower half cell – improvement in maximum delay uncertainty. The experiment demonstrates that net ordering can significantly improve bundle noise immunity. The maximum delay uncertainty is affected more than sum of delay uncertainties.

 TABLE 3

 PERCENT OF AVERAGE IMPROVEMENT IN DELAY UNCERTAINTY (BEST VS. WORST ORDERING) FOR RANDOM PROBLEM INSTANCES,

 OBTAINED BY SUM-OF-DELAYS OPTIMIZATION (UPPER HALF-CELL – AVERAGE WIRE DELAY UNCERTAINTY, LOWER HALF-CELL – WORST WIRE DELAY UNCERTAINTY)

|                   | A= 2μm | A=5 μm | A= 8μm | Á= 12μm | A= 20µm |
|-------------------|--------|--------|--------|---------|---------|
| L=300 µm          | 21.9   | 27.1   | 28.8   | 31.3    | 38.6    |
| Ξ 500 μ           | 26.6   | 32.2   | 38.2   | 48.1    | 46.7    |
| $L = 500  \mu m$  | 22.1   | 26.9   | 28.4   | 30.6    | 32.6    |
| Ε = 500 μm        | 29.1   | 30.5   | 39.3   | 45.2    | 39.8    |
| $L = 800  \mu m$  | 22.8   | 28.6   | 28.7   | 32.5    | 33.8    |
| E ooo µm          | 28.3   | 34.7   | 38.4   | 36.6    | 44.1    |
| $L = 1200  \mu m$ | 23.5   | 27.7   | 29.2   | 34.4    | 33.4    |
| E 1200 µm         | 25.3   | 30.7   | 37.0   | 41.2    | 38.9    |
| L =1500 µm        | 24.1   | 27.6   | 29.9   | 34.4    | 29.3    |
| 2 1000 μm         | 24.8   | 30.5   | 37.2   | 37.1    | 39.9    |

## VII. CONCLUSION AND OPEN QUESTION

Reordering of wires in a constrained-width interconnect bundle has been studied. It has been shown that a monotonic order of the signals according to their effective driver resistance yields the smallest average weighted delay among all possible orderings of signal wires. The weighted delay objective was chosen in order to approximate MinMax optimization. Three variants of monotonic ordering have been found to be optimal, depending on the MCF ratio between the signals at the sides of the bundle and that of the internal wires.

The monotonic order property exists for a very broad range of VLSI circuit settings arising in common design practice. The paper proposed a simple, yet near-optimal, setting of wire widths within the bundle to yield the best average weighted delay.

The above theoretical results have been validated by numerical experiments on 65 nanometer process technology and industrial design data. In all cases the ordering optimization yielded improvement in the range of 10% in wire delay, translated to about 5% improvement in the clock cycle of high-performance microprocessor implemented in that technology.

The authors could not prove that monotonic ordering by effective signal resistance yields the smallest MinMax timing slack, as it does for the average weighted delay. It is an interesting and important question what are the problem settings which ensure that monotonic ordering would yield the smallest MinMax delay.

### REFERENCES

- [1] R. Ho, K. Mai and M. Horowitz, "The future of wires", Proceedings of the IEEE, Vol. 89, no. 4, Apr. 2001.
- [2] D. Sylvester and K. Keutzer, "Getting to the bottom of deep submicron", in Proc. ICCAD, pp. 203-211, 1998.
- [3] A. Kahng, S. Muddu, E. Sarto and R. Sharma, "Interconnect Tuning Strategies for High-Performance ICs", *Proc. Design, Automation and Testing in Europe*, Paris, February 1998.
- [4] J. Cong and K. S. Leung, "Optimal Wiresizing Under Elmore Delay Model". *IEEE Trans. on Computer-Aided Design*, vol. 14, no. 3, pp. 321-336, Mar. 1995.

- [5] S. S. Sapatnekar, "Wire Sizing as a Convex Optimization Problem: Exploring the Area Delay Tradeoff", *IEEE Transactions* on Computer-Aided Design of Integrated Circuits and Systems, Vol. 15, No. 8, pp. 1001 1011, August 1996.
- [6] J. Cong, L. He, C. K. Koh and Z. Pan, "Interconnect Sizing and Spacing with Consideration of Coupling Capacitance", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 20, no. 9, pp.1164-1169, September 2001.
- [7] M. Becer, D. Blaauw, V. Zolotov, R. Panda, I. Hajj, "Analysis of Noise Avoidance Techniques in Deep-Submicron Interconnects Using a Complete Crosstalk Noise Model", *IEEE/ACM Design Automation and Test in Europe Conference* (DATE), pp. 456-463, March 2002.
- [8] D. A. Kirkpatrick and A. L. Sangiovanni-Vincentelli. "Techniques for crosstalk avoidance in the physical design of high performance digital systems", In Proc. IEEE/ACM Int. Conf. Computer Aided Design, pp. 616--619, Nov. 1994.
- [9] R. Arunachalam, E. Acar, and S. R. Nassif, "Optimal shielding/spacing metrics for low power design", Proceedings of IEEE Computer Society Annual Symposium on VLSI, pp. 167-172, 2003.
- [10] P. Chen, D. A. Kirkpatrick and K. Keutzer, "Miller Factor for Gate-Level Coupling Delay Calculation", in Proc. ICCAD, pp.68-74, 2000.
- [11] M. Hashimoto and H. Onodera, "Increase in delay uncertainty by performance optimization", *Proc. IEEE International Symposium on Circuits and Systems (ISCAS)*, pages 379--382, 2001.
- [12] X. Bai, C. Visweswariah, P. N. Strenski, D. J. Hathaway, "Uncertainty-Aware Circuit Optimization", *The 39th Design Automation Conference (DAC)*, pp.58-63, 2002.
- [13] P. Groeneveld, "Wire ordering for detailed routing", IEEE Design and Test, vol.6, issue 6, pp. 6-17, Nov. 1989
- [14] A.Vittal, L Chen, M. Marek-Sadowska, K. Wang, S Yang, "Crosstalk in VLSI Interconnections", *IEEE Transactions on CAD Design of IC and Systems*, Vol.18, No. 12, Dec. 1999
- [15] T. Gao and C. Liu, "Minimum Crosstalk Channel Routing", TechnicalDigest Int. Conf. on CAD, pp. 692-696, 1993
- [16] J. Ma and L. He, "Formulae and Applications of Interconnect Estimation Considering Shield Insertion and Net Ordering", 2001 International Conference on Computer-Aided Design (ICCAD '01), pp. 327-332, Nov. 2001
- [17] P. Gupta and A. Kahng. "Wire Swizzling to Reduce Delay Uncertainty Due to Capacitive Coupling." *Proc. IEEE Intl. Conf. on VLSI Design*, January, p.431, 2004.
- [18] E. Macii, M. Poncino and S. Salerno, "Combining Wire Swapping and Spacing for Low-Power Deep-Submicron Buses", Proc. of the 13th ACM Great Lakes symposium on VLSI, pp. 198-202, 2003
- [19] T. Sato, Y. Cao, D. Sylvester and C. Hu, "Characterization of interconnect coupling noise using in-situ delay change curve measurements", Proc. IEEE ASIC/SoC Conf., pp. 321-325, 2000
- [20] P. Chen and K. Keutzer, "Toward true crosstalk noise analysis", Proc. ICCAD, pp. 132-137, 1999
- [21] T. Lin and L. Pillegi, "Throughput-Driven IC Communication Fabric Synthesis", Proc. ICCAD, pp. 274-279, 2002.
- [22] T. Sato, Y. Cao, K. Agarwal, D. Sylvester, and C. Hu, "Bidirectional Closed-Form Transformation Between On-Chip Coupling Noise Waveforms and Interconnect Delay-Change Curves", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, pp. 560-572, March 2003
- [23] O. Milter, A. Kolodny, "Crosstalk noise reduction in synthesized digital logic circuits", *IEEE transactions on VLSI*, pp. 1153 1158, Dec. 2003
- [24] A. Kahng, S. Muddu and D. Vidhani, "Noise and Delay Uncertainty Studies for Coupled RC Interconnects", Proc. of Twelfth Annual IEEE International ASIC/SO SOC Conference, pp. 3-8, Sept. 1999
- [25] S. Wimer, S. Michaely, K. Moiseev and A. Kolodny, "Optimal Bus Sizing in Migration of Processor Design", *IEEE Transactions on Circuits and Systems I*, vol. 53. no. 5, May 2006.
- [26] A. Naeemi, R. Venkatesan, and J. D. Meindl, "Optimal global interconnects for GSI," *IEEE Trans. Electron. Devices*, vol. 50, pp. 980-987, April 2003.
- [27] D. Pamunuwa, L. Zheng and H. Tenhunen, "Maximizing throughput over parallel wire structures in the deep submicrometer regime", *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, Volume 11 Issue 2, 2003.
- [28] M. L. Mui, K. Benerjee and A. Mehortra, "A global interconnect optimization scheme for nanometer scale VLSI with implications for latency, bandwidth, and power dissipation", *IEEE Transactions on Electron Devices*, Vol. 51, No. 2, Feb. 2004, pp. 195-203.
- [29] H. Kaul, D. Sylvester and D. Blaauw, "Active shields: a new approach to shielding global wires," Proceedings of the 12<sup>th</sup> ACM Great Lakes symposium on VLSI, 2002, pp. 112-117.
- [30] A. I. Abou-Seido, B. Nowak and C. Chu "Fitted Elmore Delay: A simple and Accurate Interconnect Delay Model", Proc. of IEEE Int. Conf. on Computer Design, pp. 422-427, 2002.
- [31] A. Kahng, K.Masuko and S.Muddu, "Analytical delay models for VLSI interconnects under ramp input", *ICCAD-96*, pp. 30-36, 1996