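## National Science Council, Executive Yuan: Research Project Final Report

# Clock Routing Synthesized Strategies for Reducing Delay, Skew, and Crosstalk in Multi-Clock SoC (2/2): Final Research Report (Full Version)

Project type: Individual

Project number: NSC 95-2221-E-343-007-

Project period: August 1, 2006 to July 31, 2007 (ROC 95/08/01 to 96/07/31)

Institution: Department of Computer Science and Information Engineering, Nanhua University

Principal investigator: Chia-Chun Tsai (蔡加春)

Project participants: Ph.D. students (part-time assistants): 吳占鰲 (Jan-Ou Wu), 郭仲傑 (Chung-Chieh Kuo)

Master's students (part-time assistants): 林葳士, 許峰慈, 江銑紘, 劉宗明

Attachments: Reports on attending international conferences, and published papers

Disposition: This project report is open to public inquiry

August 19, 2007 (ROC 96/08/19)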

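## National Science Council, Executive Yuan: Subsidized Research Project Final Report

Clock Routing Synthesized Strategies for Reducing Delay, Skew, and Crosstalk in Multi-Clock SoC (2/2)

Project type: ☑ Individual project □ Integrated project

Project number: NSC 95-2221-E-343-007-

Project period: August 1, 2006 to July 31, 2007

Principal investigator: Prof. Chia-Chun Tsai (蔡加春)

Co-principal investigator:

Project participants: 吳占鰲, 郭仲傑, 林崴士, 許峰慈, 江銧紘, 劉宗明

Institution: Department of Computer Science and Information Engineering, Nanhua University

August 20, 2007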

### Clock Routing Synthesized Strategies for Reducing Delay, Skew, and Crosstalk in Multi-Clock SoC (2/2)

Project type: Individual project

NSC project number: NSC 95-2221-E-343-007. Project period: August 1, 2006 to July 31, 2007

Principal investigator: Prof. Chia-Chun Tsai, Department of Computer Science and Information Engineering, Nanhua University

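#### 1. Abstract (Chinese Version)

Clock delay and clock skew are two major factors in system-on-a-chip (SoC) design. An SoC is composed of multiple IPs or modules, and its clock network can be partitioned into multiple clock domains, with each sub-network covering several IPs and modules. Clocks in different domains can be switched as needed to reduce the power consumption of the chip. This clock network structure makes clock routing in SoC design complicated, so achieving minimal clock delay, minimal clock skew, and the lowest crosstalk during clock synthesis is essential. This study proposes several clock routing strategies for SoC physical design: interconnect analysis with RLC delay models, a numerical method for computing clock tapping points, a clock routing method that combines Grey theory with DME, buffer insertion in RLC clock tree construction, and crosstalk reduction in clock trees.

First, we analyze interconnects with current RLC delay models. Using the second-order transfer function of an RLC tree and a numerical delay model, together with moment matching and LU matrix decomposition, we propose an alternative delay model and apply least-squares curve fitting to obtain two empirical delay formulas for different damping factors. In experiments on single RLC sections, our delay is 15.91% more accurate in total average absolute error than the Elmore, CPC, IFN, and LW delay models; on RLC clock trees, it is 3.24% more accurate in total average absolute error than the LW and IFN models.

Next, we propose a numerical method for computing the tapping point. It converts a uniformly distributed RLC equivalent wire segment into an equivalent RC delay model, so that the tapping point of two subtrees can be found numerically, bottom-up, to produce a zero-skew merged tree. Applying this procedure recursively builds a zero-skew multi-level RLC clock tree. Using DME routing on benchmarks and comparing delay accuracy against HSPICE, the absolute errors in clock skew and clock delay are only 0.016% and 0.51%, respectively.

Furthermore, we completed GDME, which combines Grey relational analysis with DME clock routing to construct RLC clock trees. Grey relational analysis is first used to cluster the clock sinks in the SoC: the coordinates, load capacitance, intrinsic delay, and intrinsic skew of each IP's clock input determine the pairing of clock sinks. DME is then applied bottom-up and top-down to construct the RLC clock tree. On benchmarks, GDME differs from HSPICE by only 0.017% in clock skew and 0.2% in delay, and improves total wire length by 3.58% on average over other DME methods.

In addition, we propose a buffer-driven zero-skew RLC clock tree that combines the RLC delay model, the zero-skew tapping point method, and buffer insertion. Unit-size buffers cut off the upward accumulation of non-zero skew that arises as the clock routing grows, yielding a zero-skew RLC clock routing tree. On benchmarks, buffer insertion improves clock delay by up to 97%; total wire length improves by 10% and 2% compared with LTM-MMM-AWA/DME and LTM-GMA-AWA/DME, respectively; and maximum clock delay improves by 23.04% compared with IDME.

Finally, we propose a coupling-aware algorithm to reduce crosstalk in clock routing. Two RLC clock routings are first analyzed for clock skew and clock delay with and without crosstalk; a threshold on the crosstalk parameter is then combined with rerouting to complete a minimum-crosstalk clock routing. Experiments show that, with crosstalk taken into account, this method improves clock delay and clock skew by 4.4% and 20%, respectively.

**Keywords:** SoC, crosstalk, clock routing, clock delay, clock skew.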

#### Abstract (English Version)

Clock delay and clock skew are critical for SoC (system-on-a-chip) design. Since an SoC consists of a number of IPs (intellectual properties) and modules, its clock system may be partitioned into multiple domains, each covering several IPs and modules. We can turn off certain domains to reduce power consumption in idle modules. Consequently, clock routing in an SoC is complicated, and synthesizing a clock routing with minimal clock delay, clock skew, and crosstalk is critical. In this study, we propose several clock routing strategies for clock synthesis in SoC physical design: interconnect analysis of RLC delay models, numerical tapping point search, clock routing that combines Grey theory with DME, buffered RLC clock tree construction, and coupling-aware crosstalk reduction.

First, we analyze the interconnect behavior of several existing RLC delay models and propose a numerical delay model based on the second-order transfer function of an RLC tree. Combining LU decomposition with moment matching, we derive two empirical delay formulas for different damping factors, using least-squares curve fitting to obtain the time-domain responses of all branches of an RLC tree. Compared against SPICE simulation, our delay model is 15.91% more accurate in total average absolute error for a single RLC section than the Elmore, CPC, IFN, and LW delay models, and 3.24% more accurate in total average error over all sinks of an RLC interconnect tree than the LW and IFN models.

Second, we present a new numerical method for tapping point search based on a uniformly distributed RLC model. The model reduces an RLC wire to an equivalent RC-based delay model, so that the tapping point for two RLC-based subtrees can be formulated accurately and solved numerically. Working bottom-up over pairs of subtrees, the numerical formulas determine successive tapping points, each forming a new zero-skew merged tree. The procedure is applied recursively, level by level, to obtain a zero-skew multi-level RLC clock tree. We tested benchmarks with our approach combined with the DME algorithm, which runs in linear time; compared with Hspice, the average absolute errors are only 0.016% in skew ratio and 0.51% in critical delay.

Third, we combine Grey relational analysis with DME, an approach we call GDME, to construct the RLC clock tree. Grey relational analysis is first used to determine the clustering of clock sinks in the SoC. The parameters of each IP's clock sink (location, capacitive load, intrinsic delay, and intrinsic skew) are taken into account when matching each pair of sinks. The DME algorithm, with its bottom-up and top-down phases, is then applied to construct the clock tree. On benchmarks, GDME shows average absolute errors of 0.017% in skew and 0.2% in delay compared with Hspice, and improves total wire length by 3.58% on average compared with other DME methods.

Fourth, we propose a zero-skew-driven buffered RLC clock tree construction. Our approach combines the RLC delay model, exact zero skew, and buffer insertion. We insert unit-size buffers at each level of the clock tree to interrupt the upward propagation of non-zero skew, which enables reliable construction of a buffered RLC clock tree with zero skew. On benchmarks, path delay improves by up to 97% compared with no buffer insertion. Compared with the LTM-MMM-AWA/DME and LTM-GMA-AWA/DME approaches, total wire length is reduced by 10% and 2% on average, respectively; compared with the IDME method, clock delay improves by 23.04% on average.

Finally, we investigate a coupling-aware algorithm to reduce crosstalk between clock routings. We construct two-clock RLC-based routings and run experiments with and without crosstalk interaction, showing that clock delay and clock skew degrade because of crosstalk between routed interconnects. The proposed coupling-aware algorithm then reroutes the two clock routings to minimize crosstalk. Experimental results show that clock delay and clock skew improve by up to 4.4% and 20%, respectively, compared with routing that ignores crosstalk.

Keywords: SoC, Crosstalk, Clock routing, Clock delay, Clock skew.

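#### 2. Background and Objectives

Rapid advances in semiconductor processes, integrated circuit design, and design automation tools, combined with high-end 3C (computer, communication, and consumer) electronic products, have become the mainstream of the new century. The key enabling technology for these products is the system-on-a-chip (SoC). An SoC is built on system-level integration (SLI): all the individual integrated circuit components that a system requires are integrated on a single chip. These components may be IPs (intellectual property) or stand-alone circuit modules, and may even take the form of hardware, software, or firmware. Arranging and integrating these IPs and modules on a single chip is an extremely complex task, and conventional IC design flows and computer-aided design automation tools do not fully meet the needs of the SoC design flow.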

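#### (A) Multiple Clocks Affect SoC System Performance

For an SoC, clock frequency and clock stability determine overall system performance. One or more clock sources may be used to integrate and distribute the synchronous operation of all IPs and modules. However, each IP has its own native clock frequency and tolerable clock skew, and the interface modules between IPs may need additional clock signals to meet system requirements. Finding an effective method that works across the different clock sources of the IPs and interface modules, accounts for crosstalk, and yields the best clock routing with the least clock skew, so that the SoC meets its high-performance requirements, is one of the key technologies for SoC.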

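#### (B) Accuracy of RLC Interconnect Delay Models

As IC processes and chip operating frequencies keep advancing, the first consideration in evaluating a clock tree is the calculation of clock delay. As Figure 1 shows, before the 0.35 μm process generation, interconnect delay was much smaller than gate delay [1]. Designers planning a circuit considered only gate delay and ignored the effect of interconnect delay, which limited improvements in chip performance. When the clock frequency exceeds the GHz range, the first-order RC model formerly used to compute interconnect delay is no longer accurate enough; inductance effects must also be taken into account, and a second-order RLC model [2-6] is needed to evaluate and compute interconnect delay. Figure 2 shows the RLC equivalent structure of a clock tree; each sink also includes an input load capacitance CLi and an intrinsic clock delay Ti.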



Figure 1. Relationship between gate delay and interconnect delay



Figure 2. RLC equivalent structure of a clock tree
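As a reminder of why a second-order model is needed, consider the simplest case: a single lumped RLC section driving a capacitive load. This is a textbook simplification rather than the distributed tree model used in this report, and the symbols $R$, $L$, $C$, $\omega_n$, and $\zeta$ are introduced here only for illustration.

$$
H(s) = \frac{1}{1 + sRC + s^{2}LC}, \qquad \omega_n = \frac{1}{\sqrt{LC}}, \qquad \zeta = \frac{R}{2}\sqrt{\frac{C}{L}}
$$

When $L \to 0$ the section reduces to the first-order RC case $H(s) = 1/(1 + sRC)$, whose 50% delay is $RC\ln 2$. For $\zeta < 1$ the response is underdamped and oscillatory, which a first-order RC model cannot capture.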

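#### (C) Clock Tree Construction with RLC Interconnect Delay Models

Building a clock tree with minimal clock delay and clock skew is a basic requirement [7-9]; in an SoC, excessive clock skew prevents the circuit from operating correctly. Much research has therefore focused on clock tree construction [10-14]. These clock tree construction algorithms all evaluate clock delay with the Elmore RC delay model or a linear delay model, which no longer suits chip design with advanced process technologies and high operating frequencies. Evaluating clock delay and skew with a second-order RLC model while constructing the clock tree has therefore become indispensable.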

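#### (D) Crosstalk Interference and Its Solution

In the multiple clock structure of an SoC, clock routings of the same or different frequencies may interfere with one another, so the capacitive and inductive crosstalk noise between two wires, previously ignored, becomes increasingly evident under the second-order RLC model. In multi-clock signal routing on an SoC in particular, these effects increase clock delay and clock skew and desynchronize the system clock signals, which seriously affects the correctness of system operation.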

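#### (E) Key Points and Goals for Multiple Clocks in SoC

Summarizing the related work above, multi-clock routing in an SoC is considerably complex under the RLC interconnect model. Checking a completed multi-clock routing for crosstalk and applying remedial strategies remains a key technology and goal for SoC system performance.

Finally, we re-examine and verify the performance of the multi-clock tree synthesis methods under RLC interconnect delay, integrate them with the existing IC design flow, and strengthen SoC computer-aided design automation techniques to obtain the best clock routing performance.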

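#### 3. Research Methods and Results

We propose five clock routing synthesis strategies to support clock synthesis design, following the flow in Figure 3. We build empirical rules from a practical chip design flow combined with physical placement and routing: placement is first completed with Synopsys Astro, a suitable clock tree is constructed, and a better clock routing is then completed. After routing, all circuit parameters of the clock tree are extracted, and timing analysis evaluates whether the clock path delay and clock skew meet the system requirements. If any requirement is not met, the flow returns to clock planning and tree synthesis for readjustment and re-evaluation, considers inserting buffers into the clock tree and resizing them, and updates the netlist.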



Figure 3. Clock synthesis design flow for integrated circuits

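First, we analyze interconnects with current RLC delay models. Using the second-order transfer function of an RLC tree and a numerical delay model, together with moment matching and LU matrix decomposition, we propose an alternative delay model and apply least-squares curve fitting to obtain two empirical delay formulas for different damping factors. As shown in Figure 4(a), for single RLC sections our delay is 15.91% more accurate in total average absolute error than the Elmore [15], CPC [16], IFN [2], and LW [17] delay models; as shown in Figure 4(b), for RLC clock trees it is 3.24% more accurate in total average absolute error than the LW and IFN models.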
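The dependence of delay on the damping factor can be illustrated with a short sketch. It does not reproduce the report's empirical formulas, which are not given here; it assumes a single lumped RLC section with transfer function 1/(1 + sRC + s²LC) and finds the 50% delay of its step response by bisection. The function names `step_response` and `delay_50` and the component values are hypothetical.

```python
import math

def step_response(t, R, L, C):
    """Unit-step response of H(s) = 1 / (1 + s*R*C + s^2*L*C) at time t."""
    wn = 1.0 / math.sqrt(L * C)            # natural frequency (rad/s)
    zeta = 0.5 * R * math.sqrt(C / L)      # damping factor
    if zeta < 1.0:                         # underdamped: decaying oscillation
        wd = wn * math.sqrt(1.0 - zeta * zeta)
        return 1.0 - math.exp(-zeta * wn * t) * (
            math.cos(wd * t) + (zeta * wn / wd) * math.sin(wd * t))
    if zeta == 1.0:                        # critically damped
        return 1.0 - (1.0 + wn * t) * math.exp(-wn * t)
    root = wn * math.sqrt(zeta * zeta - 1.0)   # overdamped: two real poles
    s1, s2 = -zeta * wn + root, -zeta * wn - root
    return 1.0 - (s2 * math.exp(s1 * t) - s1 * math.exp(s2 * t)) / (s2 - s1)

def delay_50(R, L, C):
    """Time of the first 50% crossing of the step response, by bisection."""
    wn = 1.0 / math.sqrt(L * C)
    zeta = 0.5 * R * math.sqrt(C / L)
    if zeta < 1.0:
        # The response rises monotonically up to its first peak.
        hi = math.pi / (wn * math.sqrt(1.0 - zeta * zeta))
    else:
        # Monotonic response: grow the bracket until it passes 50%.
        hi = 1.0 / wn
        while step_response(hi, R, L, C) < 0.5:
            hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if step_response(mid, R, L, C) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    L, C = 1e-9, 1e-12                     # illustrative values: 1 nH, 1 pF
    for R in (10.0, 100.0, 1000.0):
        zeta = 0.5 * R * math.sqrt(C / L)
        print(f"R={R:7.1f}  zeta={zeta:6.3f}  "
              f"RLC delay={delay_50(R, L, C):.3e} s  "
              f"RC estimate={math.log(2) * R * C:.3e} s")
```

For strongly overdamped sections the result approaches the RC estimate RC·ln 2, while for underdamped sections the two diverge, which is why separate delay formulas per damping regime are a natural choice.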

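Next, we propose a numerical method for computing the tapping point. It converts a uniformly distributed RLC equivalent wire segment into an equivalent RC delay model, so that the tapping point of two subtrees can be found numerically, bottom-up, to produce a zero-skew merged tree. Applying this procedure recursively builds a zero-skew multi-level RLC clock tree. Using DME routing on benchmarks and comparing delay accuracy against HSPICE, Table 1 shows that the absolute errors in clock skew and clock delay are only 0.016% and 0.51%, respectively.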



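Figure 4. Comparison of absolute errors of delay models: (a) RLC sections, (b) RLC clock trees

Table 1. Absolute errors in clock skew and clock delay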

| Benchmark        | # Sinks | Ours: Delay (ps) | Ours: Skew (ps) | Ours: CPU (s) | Hspice: Delay (ps) | Hspice: Skew (ps) | Skew ratio (%) | Delay error (%) |
|------------------|---------|------------------|-----------------|---------------|--------------------|-------------------|----------------|-----------------|
| r1               | 267     | 40383.58         | 0               | 0.031         | 40408              | 7                 | 0.017          | -0.06           |
| r2               | 598     | 107174.53        | 0               | 0.125         | 107210             | 50                | 0.046          | -0.03           |
| r3               | 862     | 161620.94        | 0               | 0.156         | 161560             | 30                | 0.018          | 0.04            |
| r4               | 1903    | 473502.92        | 0               | 0.469         | 473000             | 100               | 0.021          | 0.10            |
| r5               | 3101    | 802415.55        | 0               | 1.171         | 801260             | 20                | 0.002          | 2.24            |
| Pri1             | 269     | 4883.19          | 0               | 0.047         | 4849.5             | 0.2               | 0.004          | 0.69            |
| Pri2             | 603     | 20013.75         | 0               | 0.094         | 19928              | 1                 | 0.005          | 0.43            |
| Absolute average |         | -                | -               | 0.299         | -                  | -                 | 0.016          | 0.51            |
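The tapping point idea can be sketched with the classic closed-form zero-skew merge under the Elmore RC model, in the style of the exact zero-skew method [14]. This is not the report's numerical RLC formulation, which is not given here; it only shows what balancing two subtrees means once the wire has been reduced to an RC equivalent. The function names, the unit resistance `r`, the unit capacitance `c`, and the example values are assumptions.

```python
def tapping_point(t1, c1, t2, c2, length, r, c):
    """Fraction x of the wire (measured from subtree 1) where both sides
    see equal Elmore delay. t1, t2 are subtree delays and c1, c2 subtree
    capacitances; r and c are per-unit-length wire resistance and capacitance."""
    rl = r * length
    return (t2 - t1 + rl * (c2 + 0.5 * c * length)) / (rl * (c * length + c1 + c2))

def merge(t1, c1, t2, c2, length, r, c):
    """Merge two subtrees with a wire of the given length.
    Returns (x, delay to side 1, delay to side 2, merged capacitance)."""
    x = tapping_point(t1, c1, t2, c2, length, r, c)
    if not 0.0 <= x <= 1.0:
        # The balance point falls outside the wire; a real router would
        # lengthen (snake) the wire on one side. Not handled in this sketch.
        raise ValueError("tapping point outside the wire; elongation needed")
    l1, l2 = x * length, (1.0 - x) * length
    d1 = t1 + r * l1 * (0.5 * c * l1 + c1)
    d2 = t2 + r * l2 * (0.5 * c * l2 + c2)
    return x, d1, d2, c1 + c2 + c * length

if __name__ == "__main__":
    # Illustrative values: 0.003 ohm/um, 0.02 fF/um, 1000 um wire,
    # sink loads of 30 fF and 50 fF, no intrinsic delay.
    x, d1, d2, cap = merge(0.0, 30e-15, 0.0, 50e-15, 1000.0, 0.003, 2e-17)
    print(f"x = {x:.3f}, d1 = {d1:.3e} s, d2 = {d2:.3e} s, C = {cap:.3e} F")
```

Applying `merge` bottom-up, with each merged node passing its delay and capacitance to the next level, gives the recursive zero-skew construction described above.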

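Furthermore, we completed GDME, which combines Grey relational analysis with DME clock routing to construct RLC clock trees. Grey relational analysis is first used to cluster the clock sinks in the SoC: the coordinates, load capacitance, intrinsic delay, and intrinsic skew of each IP's clock input determine the pairing of clock sinks. DME is then applied bottom-up and top-down to construct the RLC clock tree. On benchmarks, Table 2 shows that GDME differs from HSPICE by only 0.017% in clock skew and 0.2% in delay, and GDME improves total wire length by 3.58% on average over other DME methods.

Table 2. Comparison of GDME and HSPICE in clock skew and delay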

| Benchmark        | # Sinks | Ours: Cost (mm) | Ours: Delay (ps) | Ours: Skew (ps) | Hspice: Delay (ps) | Hspice: Skew (ps) | Skew ratio (%) | Delay error (%) |
|------------------|---------|-----------------|------------------|-----------------|--------------------|-------------------|----------------|-----------------|
| Pri1             | 269     | 130.7           | 4870.07          | 0               | 4831               | 0.2               | 0.004          | 0.79            |
| Pri2             | 603     | 347.9           | 18538.91         | 0               | 18459              | 0                 | 0              | 0.43            |
| r1               | 267     | 1433.9          | 42451.56         | 0               | 42486              | 5                 | 0.012          | -0.08           |
| r2               | 598     | 2902.9          | 114132.49        | 0               | 114160             | 80                | 0.070          | -0.02           |
| r3               | 862     | 3670.7          | 158598.84        | 0               | 158520             | 10                | 0.006          | 0.05            |
| r4               | 1903    | 7578.3          | 460173.06        | 0               | 459710             | 110               | 0.024          | 0.10            |
| r5               | 3101    | 10916.9         | 772744.89        | 0               | 771680             | 20                | 0.003          | 0.14            |
| Absolute average |         | -               | -                | -               | -                  | -                 | 0.017          | 0.20            |
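The sink pairing step can be illustrated with the textbook form of Grey relational analysis (Deng's relational grade). This is a generic sketch, not the report's GDME clustering procedure: the feature choice follows the text (coordinates, load capacitance, intrinsic delay), but the min-max normalization, the distinguishing coefficient `rho = 0.5`, the function names, and the sample sinks are all assumptions.

```python
def normalize(rows):
    """Min-max normalize each feature column to [0, 1]."""
    columns = list(zip(*rows))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(r) for r in zip(*scaled)]

def grey_grades(rows, ref, rho=0.5):
    """Grey relational grade of every sink against the reference sink `ref`.
    A higher grade means the sink is more similar to the reference."""
    data = normalize(rows)
    deltas = {
        i: [abs(a - b) for a, b in zip(data[ref], data[i])]
        for i in range(len(data)) if i != ref
    }
    flat = [d for ds in deltas.values() for d in ds]
    dmin, dmax = min(flat), max(flat)
    if dmax == 0.0:                      # all sinks identical to the reference
        return {i: 1.0 for i in deltas}
    return {
        i: sum((dmin + rho * dmax) / (d + rho * dmax) for d in ds) / len(ds)
        for i, ds in deltas.items()
    }

if __name__ == "__main__":
    # Hypothetical sinks: (x, y, load capacitance in fF, intrinsic delay in ps)
    sinks = [(0, 0, 30, 5), (10, 5, 32, 5), (900, 800, 60, 20), (20, 0, 30, 6)]
    grades = grey_grades(sinks, ref=0)
    partner = max(grades, key=grades.get)
    print(grades, "-> pair sink 0 with sink", partner)
```

In this example the distant, heavily loaded sink 2 receives the lowest grade, so sink 0 is paired with one of its near neighbors before the DME merging phase.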

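In addition, we propose a buffer-driven zero-skew RLC clock tree that combines the RLC delay model, the zero-skew tapping point method, and buffer insertion. Unit-size buffers cut off the upward accumulation of non-zero skew that arises as the clock routing grows, yielding a zero-skew RLC clock routing tree. On benchmarks, buffer insertion improves clock delay by up to 97%; total wire length improves by 10% and 2% compared with LTM-MMM-AWA/DME and LTM-GMA-AWA/DME, respectively; and, as Table 3 shows, maximum clock delay improves by 23.04% compared with IDME.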

Table 3. Comparison with IDME in clock delay and number of buffers

| Benchmark | # Sinks | IDME [18]: Delay (ps) | IDME [18]: # Buffers | Proposed: Delay (ps) | Proposed: # Buffers | Saving in delay | Saving in buffers |
|-----------|---------|-----------------------|----------------------|----------------------|---------------------|-----------------|-------------------|
| Pri1      | 269     | 925                   | 103x2                | 686                  | 535                 | 25.84%          | -159%             |
| Pri2      | 603     | 1171                  | 319x3                | 798                  | 1203                | 31.85%          | -26%              |
| r1        | 267     | 1013                  | 78x3                 | 935                  | 531                 | 7.70%           | -126%             |
| r2        | 598     | 1526                  | 125x4                | 1154                 | 1195                | 24.38%          | -139%             |
| r3        | 862     | 1439                  | 123x3                | 1174                 | 1723                | 18.42%          | -366%             |
| r4        | 1903    | 2005                  | 199x4                | 1495                 | 3803                | 25.44%          | -446%             |
| r5        | 3101    | 2279                  | 390x5                | 1646                 | 6199                | 27.78%          | -217%             |
| Average   |         | -                     | -                    | -                    | -                   | 23.04%          | -212%             |

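Finally, we propose a coupling-aware algorithm to reduce crosstalk in clock routing. Two RLC clock routings are first analyzed for clock skew and clock delay with and without crosstalk; a threshold on the crosstalk parameter is then combined with rerouting to complete a minimum-crosstalk clock routing. Figure 5 shows the result of routing two clocks with crosstalk taken into account. Experiments show that, with crosstalk taken into account, this method improves clock delay and clock skew by 4.4% and 20%, respectively.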



Figure 5. Minimum-crosstalk routing with 256 clock sinks
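The "threshold plus reroute" decision can be sketched as follows. The report does not define its crosstalk parameter here, so this example substitutes a simple geometric proxy: the total length over which segments of two clock nets run in parallel within a given spacing. The proxy, the function names, and the values of `max_spacing` and `limit` are all hypothetical.

```python
def _overlap(a_lo, a_hi, b_lo, b_hi):
    """Length of the overlap between intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo))

def coupled_length(seg_a, seg_b, max_spacing):
    """Parallel run length of two axis-aligned segments (x1, y1, x2, y2)
    that lie on different tracks no more than max_spacing apart."""
    ax1, ay1, ax2, ay2 = seg_a
    bx1, by1, bx2, by2 = seg_b
    if ay1 == ay2 and by1 == by2:                      # both horizontal
        if 0 < abs(ay1 - by1) <= max_spacing:
            return _overlap(min(ax1, ax2), max(ax1, ax2),
                            min(bx1, bx2), max(bx1, bx2))
    if ax1 == ax2 and bx1 == bx2:                      # both vertical
        if 0 < abs(ax1 - bx1) <= max_spacing:
            return _overlap(min(ay1, ay2), max(ay1, ay2),
                            min(by1, by2), max(by1, by2))
    return 0

def total_coupling(net_a, net_b, max_spacing):
    """Sum of parallel run lengths over all segment pairs of two nets."""
    return sum(coupled_length(a, b, max_spacing) for a in net_a for b in net_b)

def needs_reroute(net_a, net_b, max_spacing, limit):
    """Flag the pair of clock nets for rerouting when coupling exceeds limit."""
    return total_coupling(net_a, net_b, max_spacing) > limit

if __name__ == "__main__":
    clock_a = [(0, 0, 100, 0), (100, 0, 100, 80)]
    clock_b = [(50, 2, 200, 2), (102, 10, 102, 60)]
    print(total_coupling(clock_a, clock_b, max_spacing=3))
    print(needs_reroute(clock_a, clock_b, max_spacing=3, limit=80))
```

A router built on this idea would reroute or respace the flagged segments and re-evaluate delay and skew, in the spirit of the coupling-aware flow described above.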

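#### 4. Conclusions and Discussion

In this project, all five strategies we proposed for the clock synthesis routing flow have been completed. The experimental results show that they provide considerable help to the SoC clock design flow, and they can also replace or complement steps of the clock routing synthesis design flow.

Results from our research group were published as one paper each in the IEICE and JGS journals in 2007 (ROC year 96), with one more IEICE journal paper under second review. Five conference papers were published in 2006 (ROC year 95) and two in 2007.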

#### References

- 1. T. Mitsuhashi, T. Aoki, M. Murakata, and K. Yoshida, "Physical design CAD in deep sub-micron era," Proc. of European on Design Automation Conference and Exhibition, pp. 350-355, 1996.
- 2. Y. I. Ismail and E. G. Friedman, "Effects of inductance on the propagation delay and repeater insertion in VLSI circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 2, pp. 195-206, April 2000.
- 3. Y. I. Ismail and E. G. Friedman, "Optimum Repeater Insertion Based on a CMOS Delay Model for On-Chip RLC Interconnect," Proc. 11th IEEE Intern. ASIC Conference, pp. 369-373, 1998.
- 4. Y. I. Ismail, E. G. Friedman, and J. L. Neves, "Equivalent Elmore delay for RLC trees," IEEE Trans. on CAD of Integrated Circuits and Systems, pp. 83-97, Jan. 2000.
- 5. Chih-Ching Yan, Chia-Chun Tsai, and Wen-Ta Lee, "Performance Driven Based on Signal Repeater Insertion for RLC Interconnections," Proc. The 12th VLSI Design/CAD Symposium, paper B1-9, August 2001, ROC.
- 6. A. B. Kahng and S. Muddu, "An analytical delay model for RLC interconnections," IEEE Trans. Computer-Aided Design, vol. 16, pp. 1507-1514, Dec. 1997.
- 7. J. Burkis, "Clock tree synthesis for high performance ASIC," in Proc. The 4<sup>th</sup> Annual IEEE ASIC Seminar & Exhibits, pp. 9-8.1-8.3, 1991.
- 8. E. G. Friedman, Clock distribution networks in VLSI circuits and systems, IEEE Press, 1995.
- 9. A. Takahashi and Y. Kajitani, "Performance and reliability driven clock scheduling of sequential logic circuits," ASP-DAC, pp. 37-42, 1997.
- 10. Naveed Sherwani, Algorithms for VLSI Physical Design Automation, 2nd Ed., 1995, Kluwer Academic Publishers.
- 11. M. A. B. Jackson, A. Srinivasan, and E. S. Kuh, "Clock routing for high performance ICs," ACM/IEEE DAC, pp. 573-579, 1990.
- 12. A. B. Kahng, J. Cong, and G. Robins, "High-performance clock routing based on recursive geometric matching," in Proc. ACM/IEEE Design Automation Conference, pp. 322-327, 1991.
- 13. Kenneth D. Boese and Andrew B. Kahng, "Zero-Skew Clock Routing Trees With Minimum Wirelength," Proc. IEEE 5<sup>th</sup> Intl. ASIC Conference, Rochester, pp. 1.1.1-1.1.5, September 1992.
- 14. R. Tsay, "Exact zero skew," in Proc. IEEE International Conf. on CAD, pp. 336-339, Nov. 1991.
- 15. J. Rubinstein, P. Penfield, Jr., and M. A. Horowitz, "Signal delay in RC tree networks," *IEEE Trans. CAD of Integrated Circuits and Systems*, vol. 2, no. 3, pp. 202-211, July 1983.
- 16. Tai-Chen Chen, Song-Ra Pan, and Yao-Wen Chang, "Timing modeling and optimization under the transmission line model," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 12, no. 1, pp. 28-41, Jan. 2004.
- 17. Ching-An Lin and Chien-Hsien Wu, "Second-order approximations for RLC trees," *IEEE Trans. CAD of Integrated Circuits and Systems*, vol. 23, no. 7, pp. 1124-1128, July 2004.
- 18. I. M. Liu, T. L. Chou, A. Aziz, and D. F. Wong, "Zero-skew clock tree construction by simultaneous routing, wire sizing and buffer insertion," *Proc. of The International Symposium on Physical Design*, pp. 33-38, 2000.

#### Research Results and Publications

The proposed RLC delay model is based on second-order moment matching for RLC trees and derives two empirical formulas for delay calculation. Our approach combines the distributed loop matrices and decomposes them with LU decomposition to obtain the matching moments. Experimental results are published in [19, 20]. In these results, our propagation delay is consistently the closest to Hspice. Our delay model is about 15.91% more accurate in total average absolute error for a single RLC section than the existing Elmore, CPC, IFN, and LW delay models, and about 3.24% more accurate in total average error over all sinks of an RLC interconnect tree than LW and IFN. We note, however, that the delay models adopted by CPC and IFN are dedicated to transmission lines. Our delay model is therefore suitable for general RLC wires in clock synthesis, while CPC and IFN target transmission lines.

- 19. <u>Chia-Chun Tsai</u>, Jan-Ou Wu, Chung-Chieh Kuo, Trong-Yen Lee, and Rong-Shue Hsiao, "Delay Modeling Based on Second-Order Moment Matching for RLC Trees," *The 17th VLSI Design/CAD Symposium*, pp. 449-452, Aug. 2006.
- 20. <u>Chia-Chun Tsai</u>, Jan-Ou Wu, Trong-Yen Lee, and Rong-Shue Hsiao, "Delay Modeling for RLC Trees with LU Decomposition Matrice," *IEEE The Fourth International Conference on Information Technology and Applications* (ICITA), pp. 688-692, Jan. 2007.

The proposed numerical tapping point search is based on a uniformly distributed RLC model that reduces the RLC wire to an equivalent RC-based delay model, which makes the numerical search for tapping points in an RLC clock tree tractable. We have shown that numerical tapping point search combined with the DME approach can construct an exact zero-skew multi-level RLC clock tree. Experimental results for benchmarks are published in [21, 22]. Compared with Hspice, our approach combined with the DME algorithm shows average absolute errors of 0.016% in skew ratio and 0.51% in critical delay.

- 21. Jan-Ou Wu, <u>Chia-Chun Tsai</u>, Yu-Ting Hsieh, Chung-Chieh Kuo, and Trong-Yen Lee, "Exact Zero-Skew RLC Clock Tree Construction Based on Tapping Point Numerical Search," *The 17th VLSI Design/CAD Symposium*, pp. 461-464, Aug. 2006.
- 22. <u>Chia-Chun Tsai</u>, Jan-Ou Wu, Yu-Ting Hsieh, Chung-Chieh Kuo, and Trong-Yen Lee, "Tapping point Numerical-Based Search for Exact Zero-Skew RLC Clock Tree Construction," *IEEE Asia Pacific Conference on Circuits and Systems* (APCCAS), pp. 813-816, Dec. 2006.

The proposed GDME clock routing algorithm takes the parameters of each IP's clock sink in the SoC (intrinsic delay, intrinsic skew, capacitive load, and location) and combines Grey relational analysis with the DME approach to construct a zero-skew RLC clock tree. Grey relational analysis determines the clustering of clock sinks, and DME, with its bottom-up and top-down phases, constructs the clock tree. Experimental results are published in [23, 24]. GDME improves total wire length by 3.58% on average compared with other improved DME algorithms, and its average absolute errors relative to Hspice are 0.017% in skew and 0.2% in delay.

- 23. Chia-Chun Tsai, Jan-Ou Wu, Yu-Ting Hsieh, Trong-Yen Lee, and Rong-Shue Hsiao, "RLC Clock Tree Construction Based on DME Algorithms Associated with Grey Relational Cluster," *IEEE The Fourth International Conference on Information Technology and Applications* (ICITA), pp. 706-711, Jan. 2007.
- 24. Jan-Ou Wu, <u>Chia-Chun Tsai</u>, Yu-Ting Hsieh, and Trong-Yen Lee, "Grey relational clustering associated with DME algorithm for zero-skew clock tree construction in SoC," *The Journal of Grey System*, vol. 18, no. 4, pp. 287-304, Dec. 2006.

The zero-skew-driven buffered RLC clock tree construction combines the nonlinear RLC delay model with buffer insertion. Unit-size buffers are inserted at each level of the clock tree to interrupt the upward propagation of non-zero skew, enabling reliable construction of a buffered RLC clock tree with zero skew. Experimental results are published in [25, 26]. On benchmarks, path delay improves by up to 97% with buffer insertion compared to without, and wire length is reduced by 10% and 2% on average compared with LTM-MMM-AWA/DME and LTM-GMA-AWA/DME, respectively. The proposed algorithm also improves clock delay by 23.04% on average compared with IDME.

- 25. <u>Chia-Chun Tsai</u>, Jan-Ou Wu, Chung-Chieh Kuo, Trong-Yen Lee, and Wen-Ta Lee, "Zero-skew driven for RLC clock tree construction in SoC," *The 3<sup>rd</sup> International Conference on Information Technology and Applications* (ICITA), vol. 1, pp. 561-566, July 2005.
- 26. Jan-Ou Wu, <u>Chia-Chun Tsai</u>, Chung-Chieh Kuo, and Trong-Yen Lee, "Zero-Skew Driven for Buffered RLC Clock Tree Construction," *IEICE Trans. on Fundamental of Electronics, Communication and Computer Sciences*, vol. E90-A, no. 3, pp. 651-658, Mar. 2007.

Finally, we investigate crosstalk reduction for clock routings. The approach constructs two-clock RLC-based routings and runs experiments with and without crosstalk interaction. A coupling-aware algorithm then reroutes the two clock routings to minimize crosstalk. Experimental results are published in [27, 28]. They show improvements of up to 4.4% in clock delay and 20% in clock skew compared with routing without crosstalk reduction.

- 27. <u>Chia-Chun Tsai</u>, Chien-Wen Kao, Jan-Ou Wu, Trong-Yen Lee, and Rong-Shue Hsiao, "Crosstalk analysis and reduction for RLC-based clock routing," *The 16th VLSI Design/CAD Symposium*, Poster Session P1, Aug. 2005.
- 28. <u>Chia-Chun Tsai</u>, Jan-Ou Wu, Chien-Wen Kao, Trong-Yen Lee, and Rong-Shue Hsiao, "Coupling aware RLC-based clock routings for crosstalk minimization," *IEEE International Symposium Circuits and Systems* (ISCAS), pp. 21-24, May 2006.

### Report on Attending International Conferences

Prof. Chia-Chun Tsai, Department of Computer Science and Information Engineering, Nanhua University

Supported by NSC research project grants: NSC 95-2221-E-343-007, 2006/8~2007/7

NSC 94-2215-E-343-001, 2005/8~2006/7

Clock Routing Synthesized Strategies for Reducing Delay, Skew, and Crosstalk in Multi-Clock SoC (1/2-2/2)

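### ● 2007 International Conference on Information Technology and Applications (ICITA 2007)

The International Conference on Information Technology & Applications (ICITA) is a high-quality, professional technical conference that gives industry, academia, and users of information technology systems an opportunity to exchange experience, and it has been enthusiastically received. This year's conference, the fourth ICITA (ICITA 2007), was held on January 15-18, 2007, at the conference center of the International Student Building of Heilongjiang University in Harbin, Heilongjiang Province, China. Seven scholars from Taiwan attended to present papers.

Experts and scholars from 11 countries submitted papers to this year's conference, but only 150 papers were accepted for presentation. The accepted papers were presented over two and a half days in three meeting rooms, in 20 regular lecture sessions. I presented two lecture papers at the conference.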

- Chia-Chun Tsai, Jan-Ou Wu, Yu-Ting Hsieh, Trong-Yen Lee, and Rong-Shue Hsiao, "RLC Clock Tree Construction Based on DME Algorithms Associated with Grey Relational Cluster," IEEE The Fourth International Conference on Information Technology and Applications (ICITA), pp. 706-711, Jan 15-18, 2007, Harbin. (Paper ID:1615)
- Chia-Chun Tsai, Jan-Ou Wu, Chung-Chieh Kuo, Trong-Yen Lee, and Rong-Shue Hsiao, "Delay Modeling for RLC Trees with LU Decomposition Matrice," *IEEE The Fourth International Conference on Information Technology and Applications* (ICITA), pp. 688-692, Jan 15-18, 2007, Harbin. (Paper ID:1611)





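Attending this conference let me interact with international scholars from around the world and learn about their research directions and results. I brought back the conference materials, paper abstracts, and CD-ROM, exchanged experience with international scholars and industry, and met many graduate students from mainland China, whose research drive and English communication skills impressed me. I also visited Heilongjiang University, Harbin University of Science and Technology, and the site of the World War II Japanese Army Unit 731. Although living expenses well exceeded the budget, the gains were greater still. I thank the NSC project for subsidizing the airfare and registration fee.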







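### ● 2006 Asia Pacific Conference on Circuits and Systems (APCCAS 2006)

The Asia Pacific Conference on Circuits and Systems (APCCAS) is a high-quality, diverse, and professional technical conference held every two years in a different country. It gives industry, academia, and practitioners of circuit design and system technology an opportunity to exchange experience. The previous, seventh conference, APCCAS 2004, was hosted by National Cheng Kung University in Taiwan. The eighth APCCAS was held on December 4-7, 2006, at the Grand Copthorne Waterfront Hotel in Singapore, with the theme "The Ever Evolving Circuits and Systems". The top advisory chair of the conference was Prof. 劉濱達. More than ten scholars from Taiwan attended to present papers; I traveled with Prof. 蕭培墉.

A large number of papers were submitted from around the world to this year's conference, but only 500 were accepted for presentation. The accepted papers were presented over three full days in seven meeting rooms, in 51 regular lecture sessions, 22 special lecture sessions, and 1 regular poster session, which shows the richness of the program. There were also two keynote talks: Prof. Liu Rueywen of the University of Notre Dame on "MIMO Wireless Communication with (Strong) Interferences: Channel-Design for Interference-Free and Optimal Capacity", and Dr. Tetsuro Itakura of Toshiba Corporation on "Trend of Analog Circuits and Low-Voltage Design".

I presented two lecture papers at the conference and also served as a session chair.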

- <u>Chia-Chun Tsai</u>, Jan-Ou Wu, Yu-Ting Hsieh, Chung-Chieh Kuo, and Trong-Yen Lee, "Tapping Point Numerical-Based Search for Exact Zero-Skew RLC Clock Tree Construction," *IEEE Asia-Pacific Conference on Circuits and Systems*, pp. 813-816, Dec. 4-7, 2006, Singapore. (Paper ID:170)
- Chia-Chun Tsai, Jan-Ou Wu, Trong-Yen Lee, and Rong-Shue Hsiao, "Propagation Delay Minimization on RLC-Based Bus with Repeater Insertion," *IEEE Asia-Pacific Conference on Circuits and Systems*, pp. 1287-1290, Dec. 4-7, 2006, Singapore. (Paper ID:167)





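Attending this conference let me interact with international scholars from around the world and learn about their research directions and results. I brought back the conference materials, paper abstracts, and CD-ROM, exchanged experience with international scholars and industry, and visited Nanyang Technological University with Prof. 蕭培墉. The trip was very rewarding. I thank the NSC project for subsidizing the airfare and registration fee.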





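### ● 2006 International Symposium on Circuits and Systems (ISCAS 2006)

The International Symposium on Circuits and Systems (ISCAS) is a high-quality, diverse, and professional technical conference held every year in a different country. It gives industry, academia, and practitioners of circuit design and system technology an opportunity to exchange experience. ISCAS 2006 was held on May 21-24, 2006, on the Island of Kos, Greece. A large group of scholars from Taiwan attended to present papers; I traveled with Prof. 馮武雄, Prof. 蕭培墉, Prof. 張孟州, and Prof. 高文忠.

Experts and scholars from 52 countries submitted 2429 papers to this year's symposium, but only 1439 were accepted for presentation, an acceptance rate of only 59%. The accepted papers were spread over three full days in 17 tracks and 288 sessions, including 14 lectures and 10 posters, along with 20 special sessions and a demo session, which shows the richness of the program.

I presented one lecture paper (Paper ID:1523, Chia-Chun Tsai, Jan-Ou Wu, Chien-Wen Kao, Trong-Yen Lee, and Rong-Shue Hsiao, "Coupling Aware RLC-Based Clock Routings for Crosstalk Minimization") and two posters (Paper ID:1668, Chia-Chun Tsai, Huang-Chi Chou, Trong-Yen Lee, and Rong-Shue Hsiao, "A Single Chip Image Sensor Embedded Smooth Spatial Filter with A/D Conversion"; Paper ID:2205, Chun-Ying Lai, Shyh-Kang Jeng, Yao-Wen Chang, and Chia-Chun Tsai, "Inductance Extraction for General Interconnect Structures") at the symposium.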





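Attending this symposium let me interact with international scholars from around the world and learn about their research directions and results. I brought back the conference materials, paper abstracts, and CD-ROM, and exchanged experience with international scholars and industry. I thank the NSC project for subsidizing the airfare and registration fee. Although living expenses well exceeded the budget, the gains were greater still.

### ● 2005 International SoC Design Conference (ISOCC 2005)

The International SoC Design Conference (ISOCC) is a high-quality, professional technical conference that gives industry, academia, and practitioners of SoC design and system technology an opportunity to exchange experience. ISOCC originated in Korea in 1992 and is held in Seoul, the Korean capital, in late October every year. It is one of the important international conferences on SoC design and has been enthusiastically received.

ISOCC 2005 was held on October 20-21, 2005, at the COEX convention center, the most important venue in Seoul. COEX is similar to the Taipei World Trade Center: it is a hub of domestic and international trade, a transfer point for international flights and major cities, and also the largest shopping mall. Experts and scholars from 8 countries submitted 136 papers, but only 85 were accepted for oral presentation.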





The conference ran for two full days, with the 85 papers divided into 21 sessions held in parallel in four meeting rooms. There were also three keynote speeches well worth attending: "Creating Unique Values in SoC Competition" by Oh-Hyun Kwon, President of Samsung Electronics; "IC Design Challenges in an SoC ERA" by Chi-Foon Chan, President of Synopsys; and "Challenges and Innovations for Development of SoCs" by Prof. Sung-Mo Kang of UC Santa Cruz. Several invited talks covered SoC design experience and research results, including "SoC Design Foundry and a Case of Complex Multimedia SoC" by Prof. 林永隆 of National Tsing Hua University, Taiwan, and "Floorplan Design for Complex VLSI Systems" by Prof. Martin D. F. Wong of the University of Illinois, USA.









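Five scholars and research groups from Taiwan attended this conference and presented five papers. My oral presentation was scheduled for 1:50 to 2:10 p.m. on October 20, with the title "A Current-Mode CMOS Image Sensor Based on Smooth Spatial Filter". The attendees took an active part in the discussion. The session chair, Prof. Jeongjin Roh of Hanyang University, Korea, was especially interested in our chip design techniques and repeatedly asked about the underlying key technologies.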





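The conference also included a chip design contest session, in which a team led by Prof. 蔡宗漢 of National Central University, Taiwan, took part. All the entries had practical and commercial potential and were well worth observing and learning from.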





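At this conference I met international scholars from Australia, India, Japan, the United States, Korea, and other countries, as well as many Korean graduate students, and learned about their research directions and results. I also took time to visit Korean historic sites, where the architecture and writing derive almost entirely from ancient China.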







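The trip was very rewarding. I brought back the 2005 International SoC Design Conference proceedings and related materials, and exchanged experience with international scholars and industry. Finally, I thank the NSC for subsidizing the airfare and registration fee.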