## 行政院國家科學委員會專題研究計畫 成果報告

# X-結構多重時脈繞線合成策略應用於奈米製程技術之研究 (I) 研究成果報告(精簡版)

| 計 | 畫 | 類 | 別 | : | 個別型                    |
|---|---|---|---|---|------------------------|
| 計 | 畫 | 編 | 號 | : | NSC 96-2221-E-343-007- |
| 執 | 行 | 期 | 間 | : | 96年08月01日至97年07月31日    |
| 執 | 行 | 單 | 位 | : | 南華大學資訊工程學系             |
|   |   |   |   |   |                        |

計畫主持人: 蔡加春

計畫參與人員:碩士班研究生-兼任助理人員:古琳正 碩士班研究生-兼任助理人員:尤志豪 大專生-兼任助理人員:張嘉益 大專生-兼任助理人員:張瑋斌 博士班研究生-兼任助理人員:郭仲傑

報告附件:出席國際會議研究心得報告及發表論文

處 理 方 式 : 本計畫涉及專利或其他智慧財產權,1年後可公開查詢

### 中華民國 97年08月10日

| 行政院國家科學委 | 員  | 會補助專題研 | F究言 | 十書成果報告 |
|----------|----|--------|-----|--------|
|          | 73 |        |     |        |

| ≫ | ****                    | < 💥 |
|---|-------------------------|-----|
| і |                         | і   |
| і | X-結構多重時脈繞線合成策略應用於奈米製程技術 | і   |
| * | 之研究(I)                  | і   |
| і |                         | ≫   |
| * | ****                    | <₩  |

計畫類別: ☑個別型計畫 □整合型計畫 計畫編號:NSC 96-2221-E-343-007-執行期間:96年8月1日至97年7月31日

計畫主持人: 蔡加春 教授 共同主持人: 計畫參與人員: 郭仲傑(博士生) 古琳正、尤志豪(碩士生)

張嘉益、張瑋斌 (大專生)

執行單位:南華大學資工系

中華民國 97 年 8 月 10 日

### X-結構多重時脈繞線合成策略應用於奈米製程技術之研究(I) X-Architecture Multiple Clock Routing Synthesis for Applications in Nanometer Process Technology(I)

計畫類別:個別型計畫 國科會計劃編號:NSC 96-2221-E-343-007 執行期限:96年8月1日至97年7月31日 主持人:南華大學資工系 蔡加春教授

#### 中文摘要

在本專案中,我們針對時脈網路提出了 X-架構 的繞線演算法,並配合單一配對點的十六種 X-繞線 圖樣定義,應用在我們的演算法中,建構一零時脈傾 斜誤差的 X-時脈數。當使用 DME 方法時,採用這些 圖樣可以簡化合併線段的選擇過程。而為了達到最小 的時脈延遲,我們提出了 X-倒置方法用於縮短每個 配對點之間的線段長度。在節省繞線資源方面,我們 更進一步提出調整線段尺寸大小來取代延長線段。將 我們的標準測試電路實驗結果和其他演算法相比 較,在時脈延遲、總線段長、功率消耗和穿孔數上, 分別可以達到 16%、0.8%、1.5%和 17.4%的改善率。

**關鍵字:**奈米製程,X-結構,時脈繞線,時脈延遲,時脈傾斜誤差。

#### 英文摘要

In this study, we propose an X-architecture routing algorithm for a clock network. With the definition of 16-pattern X-routing for a pair of points, our algorithm applies these patterns to simplify the selection of merging segments whereas using the DME approach and constructs an X-clock tree with zero skew. An X-flip is employed to shorten the wire length of each pair of points as possible for minimal clock delay. Moreover, a wire sizing is applied to remove snaking wires for saving routing resource. Experimental results on benchmarks compared with other algorithms show that our improvements in terms of clock delay, wire length, power consumption, and via cost are 16%, 0.8%, 1.5%, and 17.4%, respectively.

**Keywords**: Nanometer process, X-Architecture, Clock routing, Clock delay, Clock skew.

#### **I. INTRODUCTION**

In synchronous VLSI design, clock routing dominates chip performance. Wire delay has a first-order effect on clock routing performance due to wire lengths increase rapidly with the shrinking geometry in nanometer scales [1]. Minimal clock delay and zero skew are two major requirements for a clock routing. Many techniques were proposed to treat them carefully. The MMM (method of means and medians) algorithm [2] generates a topology by recursively partitioning a set of sinks into two equal-sized subsets downwardly and then connects a centre of mass of entire set to centers of mass of two subsets upwardly. The GMA (geometric matching algorithm) algorithm [3] geometrically matches a set of sinks and contributes 5-7% less total wire length than MMM.

Tsay [4] achieved an exact zero skew tree (ZST). The approach recursively combines pairs of sub-ZSTs at a tapping point to create an upward ZST. The location of the tapping point is determined to achieve zero skew with a condition ratio *x*. If  $0 \le x \le 1$ , the tapping point is located on a point along the wiring path between two subtrees. If x < 0 or x > 1, the tapping point is exactly on the root of one of them and a snaking wire is required for balancing clock skew. The DME (deferred merge embedding) algorithm [5] was another popular method in clock routing field. It generates a ZST and gets 8-15% wire length reduction over MMM and GMA. However, all above approaches base on Manhattan architecture and only deal with horizontal and vertical wires, such as metal layers 1 and 2 (M1 and M2), on routing space.

and 2 (M1 and M2), on routing space. Current manufacturing technologies [6] have the sufficient lithographic considerations and support routing wires with arbitrary angles, especially for diagonal wires (±45°) using extra metal layers 3 and 4 (M3 and M4). Basically, decreasing the wire length can reduce the wire delay. The X-architecture [7] wire topology is defined as the combination of diagonal, horizontal, and vertical wires and contributes improvements of 10%, 20%, 30%, and 20% in terms of chip performance, power consumption, die cost, and wire length, respectively.

Figure 1 shows the potential of saving wire length based on X-architecture. A dotted wire connecting two points,  $s_1$  and  $s_2$ , has an arbitrary angle  $\alpha$  with x-axis and its Euclidean distance is L =  $[(x_1-x_2)^2+(y_1-y_2)^2]^{1/2}$ . The wire connecting two points based on Manhattan architecture with length L<sub>M</sub>=L(sin $\alpha$ +cos $\alpha$ ) is shown in Fig. 1(a). In general, L<sub>M</sub>=4L/ $\pi \approx 1.27$ L on average. Figure 1(b) shows a wire based on X-architecture and its wire length is L<sub>x</sub>=L(0.41sin $\alpha$ +cos $\alpha$ ). Basically, L<sub>x</sub>  $\approx 1.055$ L on average. Obviously, the X-architecture contributes the shorter wire length than that of the Manhattan architecture.



The X-architecture shortens the wire length dramatically, but it may generate more vias than that of the Manhattan architecture. As shown in Fig. 1, the X-architecture requires two vias at the bending point  $P_B$  while the Manhattan architecture needs just one. Shen et al. [8] proposed a ZST clock routing algorithm based on X-architecture and associated with the balanced bipartition approach and the DME algorithm [5], named as DME-4. Liquid routing model [7] routes signal wires with diagonal, horizontal, and vertical directions in a single layer as long as possible for via minimization, but it may cause unnecessary detours. Wang and Mak [9] presented a dynamic programming approach called node via minimization (NVM) to optimize via cost for a clock routing tree based on X-architecture.

In this study, we propose a new X-based clock routing algorithm that combines the DME approach and the pattern-matching routing. For a set of clock sinks (points), the GMA concept is first adopted to determine each pair of points. With the given definition of 16 kinds of X-pattern wires for a pair of points, the algorithm constructs its parallelogram and merges its segment for centralizing all of the sinks (points). An X-flip is employed to shorten the wire length of each pair of points as possible. Moreover, a wire sizing is applied to remove all of the snaking wires for saving routing area. Experimental results on benchmarks compared with other algorithms show more improvements in clock delay, wire length, power consumption, and via cost.

The remainder of this study is organized as follows. In Section II, we first give the problem formulation. Section III introduces the FED delay model dedicated for estimating the wire delay. Section IV presents our algorithm. Experimental results are shown in Section V. Finally, we conclude this study in Section VI.

#### **II. PROBLEM FORMULATION**

Here, we define our X-architecture with the limitation of  $\pm 45^{\circ}$  routing wires and four metal layers, M1 and M2 for horizontal and vertical wires, M3 and M4 for  $\pm 45^{\circ}$  wires. Basically, the X-architecture wire connecting two points has many kinds of routing paths. These routing paths may contain one or two more bending points that need vias to connect metals in different layers. Therefore, we use the simplest X-pattern that just needs a bending point for via cost minimization. We want to formalize the X-patterns for a pair of points and let these patterns be a library as parts of our problem formulation. These X-patterns can help us to determine the best merging segment for a pair of points whereas the DME approach is applied.

Firstly, we assume that a pair of points based on X-architecture will be connected from the start point,  $s_1$ , to the end point,  $s_2$ . Then, the routing area can be tiled into four zones denoted as LT (left-top). LB (right-top), (left-bottom), RT and RB (right-bottom) respectively, as shown in Fig. 2(a). Therefore, the zone location of each start point can be obtained through their coordinates. Secondly, the zone of the start point,  $s_1$ , is tiled into four subzones denoted as SLT (sub-LT), SRT (sub-RT), SLB (sub-LB), and SRB (sub-RB), as shown in Fig. 2(b). The zone location of each end point can also

be obtained. With the relative locations of  $s_1$  and  $s_2$ , there are two kinds of X-pattern wires. As shown in Fig. 2(c), one is the upper-tiled wire denoted as *PTN\_1* and another is the lower-tiled wire denoted as *PTN\_2*. An illustration is referred in Fig. 2(c), if the start point,  $s_1$ , locates on LT of routing area and the end point,  $s_2$ , locates on SLB of  $s_1$ , their X-pattern will be *PTN\_2*. If the start point,  $s_1$ , locates on LT of routing area and the end point,  $s_2$ , locates on SLB of  $s_1$ , their X-pattern will be *PTN\_2*. If the start point,  $s_2$ , locates on SLT of routing area and the end point,  $s_2$ , locates on SLB of  $s_1$ , their X-pattern will be *PTN\_R* which is randomly chosen between *PTN\_1* and *PTN\_2*. Table I shows all the cases of X-pattern wires (i.e., 16-kind of X-patterns) for a pair of points.



Fig. 2 (a) A routing area is tiled into four zones, (b) the zone  $s_1$  is tiled into four subzones, and (c) X-pattern cases of  $s_1$ - $s_2$ .

Table I 16-kind of X-patterns for a pair of points.

| Zone location of |       | Zone location | of end point |       |  |  |  |  |  |  |
|------------------|-------|---------------|--------------|-------|--|--|--|--|--|--|
| start point      | SLT   | SRT           | SLB          | SRB   |  |  |  |  |  |  |
| LT               | PTN_R | PTN_1         | PTN_2        | PTN_R |  |  |  |  |  |  |
| RT               | PTN_1 | PTN_R         | PTN_R        | PTN_2 |  |  |  |  |  |  |
| LB               | PTN_2 | PTN_R         | PTN_R        | PTN_1 |  |  |  |  |  |  |
| RB               | PTN_R | PTN_2         | PTN_1        | PTN_R |  |  |  |  |  |  |

With the X-architecture clock routing, vias are required at bending points. NVM [9] defines two kinds of vias depending on their locations as shown in Fig. 3. An edge via, EV, is needed when we route a tree edge. A via which locates on an internal node or leaf node is defined as a node via, NV. Reducing the number of vias is also critical in X-clock routing.



Therefore, the problem of an X-architecture clock tree construction is defined as follows.

Given a set of n clock sinks,  $S = \{s_1, s_2, ..., s_n\}$ , and a set of 16-kind X-pattern wires for a pair of sinks (points), the objective is to construct a ZST based on X-architecture with zero skew and minimum values in terms of clock delay, wire length, and via cost.

#### **III. DELAY MODEL**

Elmore delay (ED) model [10] is widely used for the calculation of wire delay in the clock tree synthesis. A wire with width *w* and length *I* can be modelled as a  $\pi$ -RC circuit. Figure 4 shows a single wire and its equivalent circuit, where *r* is the sheet resistance,  $c_a$  and  $c_f$  are the unit area and fringing capacitances, respectively.



Fig. 4 (a) A single wire and (b) its equivalent  $\pi$ -RC model.

Due to the ED model has the overestimated delay, this model causes limited accuracy in wire delay. Alternately, the Fitted Elmore delay (FED) model [11] is more efficient and accurate than the ED model. The wire delay based on the FED model with a load capacitance  $C_L$  at sink *i* is denoted as

$$FED(C_{L}, l_{i}, w_{i}) = r(l_{i} / w_{i}) [0.5(Dc_{a}w_{i} + Ec_{f})l_{i} + FC_{L,i}]$$
(1)

where the coefficients *D*, *E*, and *F* are obtained by a curve fitting technique that approximates HSPICE simulation data.  $C_{load,i}$  is defined as the capacitance of a tree rooted at node *i*,

$$C_{load, i} = \begin{cases} C_{L,i} \text{ if } i \text{ is a sink } S_i \\ \sum_{i \in T(i)} [(Dc_a w_j + Ec_f) l_j / F + C_{load, j}] \text{ if } i \text{ is an internal node} \end{cases}$$
(2)

where T(i) is the set of tree edges at the downstream of node *i*. Then, the FED delay from root to sink *k* can be calculated by

$$Delay(k) = \sum_{i \in P(k)} r(l_i / w_i) \left[ 0.5(Dc_a w_i + Ec_f) l_i + FC_{load, i} \right]$$
(3)

where P(k) is the set of tree edges along the path from root to sink *k*.

#### **IV. PATTERN-MATCHING-BASED X-CLOCK**

#### **ROUTING ALGORITHM**

Given a set of sinks  $S = \{s_1, s_2, ..., s_n\}$ , we apply DME approach associated with a set of 16-kind of X-patterns for a pair of points to construct a ZST with minimal clock delay, wire length, and via cost. Our algorithm called pattern-matching X-clock routing (PMX) will be described in the following subsections.

#### A. Determine Pair of Points in GMA (DPPG)

Since GMA is a bottom-up geometric matching algorithm for the clock tree construction, it can find the pair of points which has the shortest distance from the start point to other end points. The GMA focuses primarily on path-length balancing and addresses on clock skew minimization. Thus, we firstly use GMA to determine pair of points for minimum total wire length.

#### B. Choose Proper X-Pattern (CPXP)

To apply our defined X-patterns for a pair of points determined by *DPPG* introduced in IV.A, the routing area is firstly tiled into four zones, LT, RT, LB, and RB. Then the zone location for each sink or point can be determined through its coordinate. With the given set of 16-kind of X-patterns for a pair of two points referred in Table I, we can choose the proper X-pattern wires for this pair of points, called the technique of *CPXP*.

For a pair of points,  $s_1$  and  $s_2$ , shown in Fig. 5, the start point,  $s_1$ , locates on LT of routing area and the end point,  $s_2$ , locates on SRT of  $s_1$ , their X-pattern wire will be *PTN\_1* denoted as *CPXP*( $s_1, s_2$ ) = *PTN\_1*. Then we exchange their roles to get the other proper X-pattern. The start point,  $s_2$ , locates on RT of routing area and the end point,  $s_1$ , locates on SLB of  $s_2$ , and its X-pattern will be *PTN\_1* or *PTN\_2* depending the random one, i.e., *CPXP*( $s_1, s_2$ ) = *PTN\_R*. Finally, the X-pattern for the pair of points is determined by the intersection between *CPXP*( $s_1, s_2$ ) and *CPXP*( $s_2, s_1$ ). Figure 5 illustrates the proper X-patterns for the clock tree which contains 8 sinks, such as,  $CPXP(X1,X2) \cap CPXP(X2,X1) = PTN_1 \cap PTN_R$   $= PTN_1$  and  $CPXP(X4,X6) \cap CPXP(X6,X4) =$   $PTN_R \cap PTN_2 = PTN_2$ . If  $CPXP(X_n, X_{n+1}) \cap$   $CPXP(X_{n+1}, X_n) = PTN_1 \cap PTN_2$ , we have to assign  $PTN_R$  to their intersection.



Fig. 5 The proper X-patterns for the clock tree with 8 sinks.

# C. Determine Coordinate of Tapping Point (DCTP)

The determination of a tapping point  $P_t$  for a pair of points is a key part in a ZST construction. Figure 6(a) shows the X-pattern of a pair of points, the bending point  $P_B$  is on the bend of wires where two edge vias should be inserted. Figure 6(b) is the equivalent FED model of Fig. 6(a) for determining the coordinate of tapping point  $P_t$ . The roots of subtree<sub>1</sub> and subtree<sub>2</sub> are denoted as  $s_1$  and  $s_2$ , respectively. The wire delay is defined as from  $P_t$  to a root node. We can use binary search [12] to determine the zero skew condition, x, and force two wire delays based on X-architecture,  $d_x(s_1, P_t)$  and  $d_x(s_2, P_t)$ , to be equal. For  $0 \le x \le 1$ , we can get  $P_t$  that it locates the connecting wire between two subtrees with zero skew. But, for x < 0 or x > 1, an extra snaking wire is added to balance wire delays.



Fig. 6 (a) The X-pattern for a pair of points and (b) its equivalent FED model for determining the tapping point  $P_t$ .

#### D. Wire Sizing

Adding snaking wires is the normal method for zero skew during a clock tree construction. Here, we may size the wires where are attached by snaking wires to remove the snaking wires and improve clock delay. It is noted in [13], wire sizing can release the routing resources occupied by snaking wires, but it may cause extra power consumption due to the wider wires. We should treat them carefully.

For simplicity, we discuss only the case that x<0, i.e., the zero skew condition of x is smaller than zero. It means that the clock skew is a negative value and  $d_x(s_1, P_t)$  is larger than  $d_x(s_2, P_t)$  as shown in Fig. 6(b). A widen wire can compensate the clock skew to satisfy the zero

skew constraint. Therefore,  $w_2$  can be derived through the following formula.

$$w_{2} = \frac{l_{2}(0.5Ec_{f}l_{2} + FC_{L,2})}{(l_{1}/w_{1}) \cdot (0.5Ec_{f}l_{1} + FC_{L,1}) + 0.5Dc_{a}(l_{1} - l_{2})}$$
(4)

#### E. DME-X

A traditional DME algorithm consists of bottom-up and top-down phases and always obtains the ZST construction with minimum total wire length. The bottom-up phase constructs a tree of wire segments which represent the possible placement of internal nodes for generating a ZST and the top-down phase then resolves the exact locations of all internal nodes in the ZST.

DME-4 [8] constructs a tiled octangular region (TOR) for each point in bottom-up phase illustrated in Fig. 7(a), where the radius of a TOR is the distance between its centre and boundary. The merging segment of two TORs for a pair of points is determined as shown in Fig. 7(b). Then, the top-down phase resolves exact coordinates from the merging segment to construct the ZST.



DME-4, and (c) the merging segment in DME-X.

Differently from the TOR construction of DME-4, our DME-X integrates two phases to simplify the merging-segment phase. In Fig. 7(c), we construct a parallelogram and assign two opposite acmes as  $s_1$  and  $s_2$  whose interior angels must be 45° with each other. Two X-pattern wires, *PTN\_1* and *PTN\_2*, are easily determined with the identical wire length and the coordinates of two tapping points  $P_t$  and  $P_t$  can thus be calculated by  $DCTP(s_1, r)$ S<sub>2</sub>).

#### F. Power Estimation

In our X-clock routing, the power consumption is just estimated by charging the equivalent wire and sink capacitors. The total power consumption in a clock tree is represented as follows.

$$Power = \sum_{\forall e_i} C_{load, i} F_{clk} V_{dd}^2$$
(5)

where  $F_{clk}$  and  $V_{dd}$  are the clock frequency and supplying voltage.

#### G. X-Flip

While the ZST construction is used for a pair of two points, its X-pattern may be improved by X-flip for shortening the wire length. Figure 8 illustrates the improvement after launching X-flip. As shown in Fig. 8(a), both tapping points X9 and X9' were the candidates of X1 and X2. But X9' was rejected due to it did not directly match to the proper X-pattern of X1 and X2. Tapping points X10 and X10' were in the same situation. The above two cases resulted in longer wire length. This disadvantage can be modified by X-flip to re-evaluate the distance between two sets of tapping points. After launching X-flip to shorten wire length, Fig. 8(b) shows the improvements of 4.9% and 3% in terms of total wire length and power consumption, respectively, than that of Fig. 8(a).



Figure 9 shows our algorithm, called the pattern-matching-based X-clock routing with X-flip (PMXF), which combines the above subsections. The time complexity depends on the set of *n* clock sinks and the running time of DPPG(), DCTP(), CPXP(), X-Flip(), DME-X(), and WireSizing(). Since the worst running time for the first two sub procedures is  $O(\log n)$  and the remains are O(n), therefore, the time complexity is O(nlogn) or  $O(n\log^2 n)$ .

Input: A set of S sinks and 16-kind of X-patterns for a pair of points Output: A ZST based on X-architecture with zero skew and minimal delay

begin **While**(|S| > 1)

- 1
- { (s<sub>1</sub>, s<sub>2</sub>) = **DPPG**(S); //Determine a pair of points using GMA 2 3 Pattern =  $CPXP(s_1, s_2) \cap CPXP(s_2, s_1)$ ; //Choose proper X-pattern
- $P_t = DCTP(s_1, s_2, x);$  //Find tapping point  $P_t$  of  $s_1$  and  $s_2$
- 5 If(x<0) WireSizing(s1, s2); //Adjust w2
- If(x>1) WireSizing(s<sub>2</sub>, s<sub>1</sub>); //Adjust w<sub>1</sub>
- 6 7 DME-X(s<sub>1</sub>, s<sub>2</sub>, P<sub>t</sub>, Pattern); //Construct the clock tree
- 8 9 **X-Flip** $(s_1, s_2)$ ; //Reduce wire length
- Insert(S, Pt); //Insert Pt to S

10} end

Fig. 9 The pattern-matching-based X-clock routing algorithm

#### V. EXPERIMENTAL RESULTS

Our algorithm has been implemented with C++ and performed on a PC with P4-1.7GHz and 0.13µm 512MB memory. The fabrication parameters for FED model adopted in DME-4 [8] and our experiments are shown in Table II. The IBM benchmarks [4], r1-r5, are used to evaluated our algorithm.

| Table II The adopted parameters based on 0.13µm process. |                                                        |   |             |                 |      |  |  |  |  |  |
|----------------------------------------------------------|--------------------------------------------------------|---|-------------|-----------------|------|--|--|--|--|--|
| r                                                        | $r = 0.623\Omega/\mu m$ D 1.12673×ln2 $F_{c/k}$ 100MHz |   |             |                 |      |  |  |  |  |  |
| Ca                                                       | 0.00598fF/μm                                           | Е | 1.10463×ln2 | V <sub>dd</sub> | 1.2V |  |  |  |  |  |
| Cf                                                       | 0.043fF/µm                                             | F | 1.04836×ln2 |                 |      |  |  |  |  |  |

Due to limited experimental results in literatures [8-9], Table III shows the comparison of PMXF, and DME-4 [8] in delay, wire length, and power. The ratio here is defined as PMXF/DME-4. From the table, the PMXF has better performance of 16%, 0.8%, and 1.5% on average in terms of delay, wire length, and power respectively than that of the DME-4. Figure 10 shows the X-clock tree construction of r5 based on the PMXF.

We also compare our wire length and via costs with NVM [9], even if NVM does not list the delays and other results. The NVM adopts the ED model and the unit wire resistance and capacitance  $0.075\Omega/\mu m$  and  $0.118 fF/\mu m$ . For are fair comparison, we also implement our algorithm with

the same delay model. As shown in Table IV, the PMXF performs efficiently on via minimization in terms of edge via, EV, and node via, NV, than that of the NVM. The total via cost is reduced up to 17.4% on average.

#### **VI. CONCLUSION**

X-architecture has been proven more effective in terms of wire length and clock delay than Manhattan architecture. With the above experimental results, we observe that the DME-4[8] adopts the balanced bipartition in pair-up procedure to reduce the wire length and capacitance variation of merging segments as possible. Moreover, our proposed algorithm employs the pattern-matching-based method associated with X-flip in pair-up procedure to minimize the delay and wire length and thus performs well in clock delay, power, and via cost.

In the extended works, we will integrate the considerations of buffer insertion, crosstalk reduction, and DFM (design for manufacture) with the nanometer process into our PMXF.

#### ACKNOWLEDGMENT

The authors would like to thank the National Science Council, Taiwan, Republic of China, for financially supporting under Contract No. NSC 96-2221-E-343-007.

#### REFERENCES

- [1] International Technology Roadmap for Semiconductors, 2001 Edition.Available:http://public.itrs.net/Files/2001ITR S/Interconnect.pdf
- [2] M. A. B. Jackson, A. Srinivasan, and E. S. Kuh, "Clock routing for high performance IC's," in *Proc.* ACM/IEEE DAC, 1990, pp. 573-579.
- [3] A. B. Kahng, J. Cong, and G. Robins, "High-performance clock routing based on recursive

geometric matching," in *Proc. ACM/IEEE DAC*, 1991, pp. 322-327.

- [4] R. S. Tsay, "Exact zero skew," in *Proc. IEEE ICCAD*, 1991, pp. 336-339.
- [5] T. H. Chao, Y. C. Hsu, J. M. Ho, K.D. Boese, and A.B. Kahng, "Zero Skew Clock Routing with Minimum Wire-length," *IEEE Trans. on Circ. and Syst.*, vol. 39, no. 11, pp. 799-814, 1992.
- [6] http://www.xinitiative.org/
- [7] S. L. Teig, "The X-architecture: not your father's diagonal wiring," in Proc. International Workshop on System-Level Interconnect Prediction, 2002, pp. 33-37.
- [8] W. Shen, Y. Cai, J. Hu, X. Hong, and B. Lu, "High Performance Clock Routing in X-architecture," *IEEE International Symposium On Circuits and Systems*, 2006, pp. 2081-2084.
- [9] C. H. Wang and W. K. Mak, "λ-Geometry clock tree constriction with wire length and via minimization," *IEEE International Symposium on VLSI-DAT*, 2007, pp. 124-127.
- [10] W. C. Elmore, "The transient response of damped linear network with particular regard to wideband amplifiers," *Journal Applied Physics*, vol. 19, pp. 55-63, 1948.
- [11] Arif Ishaq Abou-Seido, Brian Nowak, and Chris Chu, "Fitted Elmore Delay: a simple and accurate interconnect delay model," *IEEE Trans. on VLSI*, vol. 12, no. 7, pp. 691-696, 2004.
- [12] Jan-Ou Wu, Chia-Chun Tsai, Chung-Chieh Kuo, and Trong-Yen Lee, "Zero-skew driven for buffered RLC clock tree construction," *IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences*, vol. E90-A, no. 3, pp. 651-658, 2007.
- [13] M. A. El-Moursy and E. G. Friedman, "Optimum Wire Sizing of RLC Interconnect With Repeaters," in *Proc. IEEE Great Lakes Symposium on VLSI*, April 2003, pp. 27-32.

| Fable III Th | ne comparison | of our algorithm | and DME-4 [8 | B1 in terms of dela | v. wire lenath. | and power. |
|--------------|---------------|------------------|--------------|---------------------|-----------------|------------|
|              |               |                  |              |                     | ,,              |            |

| Bench-  | #Sink | delay (μs) |                  | wire     | e length (μm)    | power (W) |                  |  |
|---------|-------|------------|------------------|----------|------------------|-----------|------------------|--|
| mark    | #SINK | DME-4[8]   | PMXF (ratio)     | DME-4[8] | PMXF (ratio)     | DME-4[8]  | PMXF (ratio)     |  |
| r1      | 267   | 0.471340   | 0.310858 (0.659) | 1414960  | 1383347 (0.977)  | 0.074594  | 0.076785 (1.029) |  |
| r2      | 598   | 1.145970   | 0.841717 (0.734) | 2863420  | 2863408 (0.999)  | 0.180590  | 0.174153 (0.964) |  |
| r3      | 862   | 1.664930   | 1.790971 (1.075) | 3656580  | 3651790 (0.998)  | 0.254845  | 0.258602 (1.015) |  |
| r4      | 1903  | 4.631840   | 3.989911 (0.861) | 7245500  | 7221328 (0.996)  | 0.589042  | 0.583533 (0.991) |  |
| r5      | 3101  | 9.053950   | 7.881827 (0.871) | 10971100 | 10855445 (0.989) | 0.981078  | 0.909697 (0.927) |  |
| Average |       | -          | - (0.840)        | -        | - (0.992)        | -         | - (0.985)        |  |

| Bench-      |       | NVM[9] |      |           |                  | PMXF |      |                   |                  |            |           |             |
|-------------|-------|--------|------|-----------|------------------|------|------|-------------------|------------------|------------|-----------|-------------|
| #Si<br>mark | #Sink | EV     | NV   | total via | wire length (µm) | EV   | NV   | total via (ratio) | wire length (µm) | delay (µs) | power (W) | runtime (s) |
| r1          | 267   | 654    | 832  | 1486      | 1200300          | 509  | 720  | 1229 (0.827)      | 1364700 (1.137)  | 0.137993   | 0.163641  | 7.491       |
| r2          | 598   | 1542   | 1859 | 3401      | 2354000          | 1209 | 1658 | 2867 (0.843)      | 2788433 (1.185)  | 0.320785   | 0.374030  | 30.144      |
| r3          | 862   | 2232   | 2689 | 4921      | 3074900          | 1658 | 2335 | 3993 (0.811)      | 3696636 (1.202)  | 0.498202   | 0.514669  | 66.957      |
| r4          | 1903  | 4723   | 6046 | 10769     | 6145000          | 3712 | 5200 | 8912 (0.828)      | 7363705 (1.198)  | 1.614070   | 1.185089  | 790.046     |
| r5          | 3101  | 7863   | 9818 | 17681     | 9152300          | 6042 | 8504 | 14546 (0.823)     | 10854213 (1.186) | 2.095517   | 1.952519  | 1689.281    |
| Avera       | age   | -      | -    | -         | -                | -    | -    | - (0.826)         | - (1.181)        | -          | -         | -           |

Table IV The comparison of our algorithm and NVM [9] for benchmarks' results.



### 研究成果與相關論文發表

- 1. <u>Chia-Chun Tsai</u>, Chung-Chieh Kuo, Jan-Ou Wu, Trong-Yen Lee, and Rong-Shue Hsiao, "X-Clock Routing Based on Pattern Matching," *21<sup>st</sup> Annual IEEE International SOC Conference* (SOCC 2008), Sept. 17-20, 2008, Newport Beach, CA, USA.
- <u>Chia-Chun Tsai</u>, Wei-Shi Lin, Jan-Ou Wu, Chung-Chieh Kuo, and Trong-Yen Lee, "Layer Assignment Considering Manufacturability in X-Architecture Clock Tree," *IEEE 8<sup>th</sup> International Conference on Computer and Information Technology*, pp. 880-885, July 8-11, 2008, Sydney, Australia. (EI indexed)
- 3. <u>Chia-Chun Tsai</u>, Chia-Yi Chang, Wei-Bin Chang, and Guang-Ming Wu, "The Delay Effects Considering Vias in X-Clock Routing," *The 16<sup>th</sup> National Conference on Automation Technology*, June 27-28, 2008, Kaoshiung, Taiwan.
- 4. <u>Chia-Chun Tsai</u>, Kai-Wei Hong, and Trong-Yen Lee, "A Bisection-Based Power Reduction Design for CMOS Flash Analog-to-Digital Converters," *Journal of Circuits, Systems, and Computers*, Paper ID: A07-080, 2008. (EI)
- 5. Chia-Chun Tsai, Jan-Ou Wu, and Trong-Yen Lee, "GDME: Grey Relational Clustering

Applied to a Clock Tree Construction with Zero Skew and Minimal Delay," *IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences*, Vol.E91-A, No.1, pp. 365-374, Jan. 2008. (SCI, EI)

- 6. <u>Chia-Chun Tsai</u>, Kwok-Fong Kual, Trong-Yen Lee, and Rong-Shue Hsiao, "A Transmission Interface Integrated Circuit Design for ISO14443A RFID Transponders," *International SoC Design Conference*, pp. 509-512, Oct. 15-16, 2007, Seoul, Korea.
- 7. <u>Chia-Chun Tsai</u>, Jan-ou Wu, Chien-Wen Kao, and Trong-Yen Lee, "Coupling-Aware RLC-Based Clock Routing for Crosstalk Minimization," WSEAS Transactions on Circuits and Systems, Vol. 6, Issue 9, pp. 559-565, Sep. 2007. (EI)
- 8. <u>Chia-Chun Tsai</u>, Kwok-Fong Kual, and Trong-Yen Lee, "An RF Circuit Design of Transmission Interface for ISO14443A RFID Transponders," WSEAS Transactions on Circuits and Systems, Vol. 6, Issue 8, pp. 532-538, Aug. 2007. (EI)
- 9. <u>Chia-Chun Tsai</u>, Chung-Chieh Kuo, Jan-Ou Wu, Trong-Yen Lee, and Rong-Shue Hsiao, "A Topology-Based Construction for X-Architecture Clock Routing," *The 18th VLSI Design/CAD Symposium*, pp. 166-169, August 2007, Hualein, Taiwan.

### 出席國際會議研討心得報告

南華大學資工系 蔡加春 教授

國科會專題計畫補助:NSC 96-2221-E-343-007, 2007/8~2008/7

### X-結構多重時脈繞線合成策略應用於奈米製程技術之研究(I)

### X-Architecture Multiple Clock Routing Synthesis for Applications in Nanometer Process Technology(I)

## 2007 年單晶片設計國際研討會 ISOCC 2007, Seoul, Korea

單晶片設計國際研討會(ISOCC---International SoC Design Conference)是 一個高品質又專業的技術研討會,它提供給此領域的工業界、學術界及單晶片 設計與系統技術應用者等經驗交流的機會。此 ISOCC 研討會於 1992 年發源於 韓國,每年十月下旬都在韓國首都首爾市 (Seoul city) 舉行,為單晶片設計重要 的國際會議之一,也引起熱烈的迴響與好評。

ISOCC2007 研討會於 2007 年 10 月 15-16 日選在首爾市最重要的 COEX 會議中心,此 COEX 會議中心類似臺北世貿中心,它是國內外貿易重心,且是國際航空與重要城市的轉運樞紐站,又是最大型的 Mall。計有來自世界各地 10 個國家專家學者計有 85 篇論文被接受在會議上口頭發表,及 41 篇論文在會議上海報發表。





本研討會分兩整天舉行,85 篇論文分 20 sessions,分四個 meeting rooms 同時進行,41 篇論文分兩場海報展示。另有三場値得細聽的 Keynote Speeches,分別為 Samsung Electronics 執行副總裁 Nam-Sung Woo 的" SoC for Mobile Solutions: Status and Challenges", Mentor Graphics 之 CEO 暨主席 Walden C.

Rhines 的"Changing the Design Rules", Professor Bang-Sup Song, ECE Department, University of California, San Diego, 的"Digitizing TV Tuners for Media SoC"。還有

十五篇的 Invited papers, 談了各種 SoC 設計經驗與研究成果。



我在當時投稿時就選擇以海報展示與發表論文,使與會學者有更多時間來

共同參與討論。

<u>Chia-Chun Tsai</u>, Kwok-Fong Kual, Trong-Yen Lee, and Rong-Shue Hsiao, "A Transmission Interface Integrated Circuit Design for ISO14443A RFID Transponders," *International SoC Design Conference*, pp. 509-512, Oct. 15-16, 2007, Seoul, Korea.



本研討會還附有晶片設計競賽的 session,計有四十個團隊參賽,並以攤位 方式來展示其功能,所有參賽的作品均具有實務與商品化的潛力;還有一些 SoC 公司共同來參加展示,值得觀摩與學習。



此行參加研討會,認識一些來自澳洲、印度、日本、美國、韓國等國際學者,也認識不少韓國研究生,藉此了解他們研究方向與成果。



同時也抽空觀賞韓國古蹟,在這些古蹟中的建築與文字幾乎來自古老的中

## 或。



此行收穫良多,並帶回 2007 International SoC Design Conference 等相關資料,及與一些國際學者與業界交流經驗。最後感謝國科會計畫補助機票、註冊 費與生活費。

## 2008 年電腦與資訊技術國際研討會 CIT 2008, Sydney, Australia

電腦與資訊技術國際研討會(CIT---International Conference on Computer and Information Technology)是一個高品質又專業的技術研討會,它提供給此領域的業界、學術界及資訊技術系統應用者等經驗交流的機會,引起熱烈的迴響與好

評。



是第八屆 CIT 國際研討會 (CIT2008),於 2008 年 7 月 8-11 日在澳洲雪梨科技 大學 B 區第五棟大樓舉行(Building 5, Block B – CM05B, Haymarket, City Campus, University of Technology Sydney, Connor Quay Street & Ultimo Road, Haymarket, Sydney),主辦單位為 CIT Organizing Committee,協辦單位 為 IEEE Computer Society, IEEE Technical Committee on Scalable Computing, University of Technology at Sydney, Research Institute for Information and Communication Technology, Korea University, BK 21 Information Technology Division, Korea University, ARC Research Network in Enterprise Information Infrastructure (EII), Australia, Federation of Chinese Scholars in Australia (FOCSA), Australian Chinese ICT Professional Society °



本年國際研討會計有來自世界各個國家之專家學者投稿超過 550 篇論文, 但只有 150 篇論文被接受在會議上口頭發表,接受率約 27%,這些被接受論文 分佈於四個整天與四間 meeting rooms 同時舉行,包含 16 個 regular lecture sessions。另有四場 Keynotes: Prof. *David Suter* "High Dimensional Data Analysis in Computer Vision"、 Prof. *Laurence Tianruo Yang* "Ubiquitous/Pervasive Intelligence: Visions and Challenges"、Prof. *Wanlei Zhou* "Detection and Traceback of DDoS Attacks"及 Prof. *Albert Y. Zomaya* "Dynamic Mobility Management"

國內在此會議中總計發表論文達二十篇,來自台大、中正大學、中山大學、 嘉義大學、輔仁大學、雲林科大、朝陽科大、淡江大學、元智大學、實踐大學 等,南華大學只有本人參與,藉此國內外學者交流,有助於增加南華大學的曝 光率。本人擔任 VLSI and design methodology session 的 Chair,並有一篇論文 lectures 在會中發表。

<u>Chia-Chun Tsai</u>, Wei-Shi Lin, Jan-Ou Wu, Chung-Chieh Kuo, and Trong-Yen Lee, "Layer Assignment Considering Manufacturability in X-Architecture Clock Tree," *IEEE 8<sup>th</sup> International Conference on Computer and Information Technology* (CIT), pp. 880-885, July 8-11, 2008, Sydney, Australia.



此行參加研討會,與來自世界各地之國際學者相互交流,藉此了解他們研究方向與成果,並帶回大會相關資料與論文摘要及光碟片,及與一些國際學者與業界交流經驗,也認識不少各國的研究生,感受他們研究的積極精神與英文表達能力。





藉此機會參訪一些文化古蹟,雪梨百年的中央火車站、市政廳(Town Hall)、 QVB (Queen Victoria Building)及英國人登陸澳洲的地方 The Rocks,另有雄偉的 雪梨海港大橋、藝文活動氣息隆重的雪梨歌劇院、塔隆加動物園的無尾熊及袋 鼠等。



於 CIT 國際研討會結束後,順道至國際佛光會所屬之南天寺 (Nan Tien Temple)與南天大學 (Nan Tien University)籌備處拜訪,在妙友法師引領下,介紹 南天寺景物與籌備中的南天大學願景,並與南天寺住持依來法師恰談南天大學 與南華大學未來交流作初步交換意見,及與覺屏法師討論南華大學學生來此作 寒暑假短期的遊學 (Study-Tour) 可能合作的模式,含英文的學習及文化參訪,

未來可讓參與學生抵英文學分與通識涵養點數等。另外,住持依來法師告知「澳 洲大多數是公立大學,目前私立大學只有兩間,在此要辦理一間私立大學仍有 很多挑戰」。藉此行,覺屏法師與佛光義工葉穎萱小姐專程陪行至 Wollongong 與 Kiama 欣賞海岸美景,及拜彷臥龍崗大學 (University of Wollongong) 校園,該 校規模有三萬人,華人學生佔三分之一,而來自中國大陸學生佔多數,日後這 些學生留下來工作與移民至澳洲的可能性大增,值得後續觀察。



此行澳洲之機票、住宿與生活費較高,而主要經費由南華大學補助,部份 經費由國科會計畫國外差旅費餘額四千多元支付;澳洲之旅收穫豐碩,感謝所 有補助單位與熱心協助者。