A Fast Exact Gene Matching Algorithm (快速基因完全比對演算法)

Please use this permanent URL to cite or link to this document: http://nhuir.nhu.edu.tw/handle/987654321/17725

Title:	快速基因完全比對演算法
Alternative title:	A fast exact gene matching algorithm
Author:	Sun, Hsien-chih (孫顯智)
Contributor:	Department of Information Management; Yi-ching Liaw (廖怡欽)
Keywords:	Exact gene sequence matching; String matching
Date:	2013
Upload time:	2015-01-05 11:58:59 (UTC+8)
Abstract:	As the cost of DNA sequencing falls, obtaining a person's DNA sequence has become increasingly easy. Given a DNA sequence, we can check whether a specific gene segment appears in it, which supports applications such as identity recognition, paternity testing, and disease prevention and diagnosis. Existing string matching algorithms can be applied to such gene matching problems without modification, but they are slow. To speed up gene matching, Srikantha et al. proposed a fast exact gene matching algorithm in 2010 that uses down-sampling and a hash table to reduce the time complexity of matching. However, Srikantha's algorithm cannot be used when the gene segment is too short, and it performs many redundant operations. To improve the algorithm's applicability and its matching speed, this thesis proposes three improvements. The multiple continuous location-lists retrieving method makes the algorithm applicable to gene segments of all lengths, including those too short for the original algorithm. The linear location filtering method and the removal of redundant location filtering operations reduce the time complexity of gene matching. Experimental results show that the proposed method can still make effective use of the hash table when the gene segment is short, which improves both matching speed and the applicability of the algorithm. In the general case, the proposed method reduces matching time by about 38% to 95%.
Appears in collections:	[Department of Information Management] Theses and dissertations
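
The abstract mentions a hash table, location lists, and location filtering, but this record does not give the details of either Srikantha's algorithm or the thesis's improvements. The following is only a generic sketch of hash-table-based exact matching, written under stated assumptions: fixed-length k-mers as hash keys, non-overlapping k-mers of the segment used for filtering, and a plain scan for segments shorter than k. The names `build_index` and `find_exact` are hypothetical, and the down-sampling step is not modeled at all.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k substring of genome to the sorted list of its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        # Positions are appended in increasing order, so each list stays sorted.
        index[genome[pos:pos + k]].append(pos)
    return index


def find_exact(genome: str, index: Dict[str, List[int]], k: int, segment: str) -> List[int]:
    """Return all start positions where segment occurs exactly in genome."""
    if len(segment) < k:
        # Assumption: a segment too short for a table lookup falls back to a plain scan.
        return [i for i in range(len(genome) - len(segment) + 1)
                if genome.startswith(segment, i)]
    # Candidate positions come from the location list of the first k-mer.
    candidates = index.get(segment[:k], [])
    # Filter candidates using the location lists of the following non-overlapping k-mers.
    offset = k
    while offset + k <= len(segment) and candidates:
        positions = set(index.get(segment[offset:offset + k], []))
        candidates = [p for p in candidates if p + offset in positions]
        offset += k
    # Any tail shorter than k is verified directly against the genome.
    return [p for p in candidates if genome.startswith(segment[offset:], p + offset)]


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    index = build_index(genome, 3)
    print(find_exact(genome, index, 3, "ACGTT"))
```

In this sketch the short-segment case simply bypasses the table. The thesis reports that its method still uses the hash table contents in that case, so the fallback here should be read as a placeholder and not as the proposed technique.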

Files in this item:

File	Description	Size	Format	Views
101NHU05396042-001.pdf		946Kb	Adobe PDF	224
index.html		0Kb	HTML	351

All items in NHUIR are protected by original copyright.
