A Missing Value Estimation Model Based on the Gap Statistical Method and K-means Method

Please use this identifier to cite or link to this item: http://nhuir.nhu.edu.tw/handle/987654321/22494

Title:	基於間隙法與K-means分群法之遺漏值推估模式
Other Titles:	A Missing Value Estimation Model Based on the Gap Statistical Method and K-means Method
Authors:	李建逸 Lee, Yian-yi
Contributors:	Graduate Institute of Information Management; 邱宏彬 Hung-pin Chiu
Keywords:	Gap Statistical Method; K-means Method; Self-Organizing Map (SOM) Network; SOM-based Model; Data Mining; Missing Value Problem
Date:	2006
Issue Date:	2015-08-04 14:50:02 (UTC+8)
Abstract:	Data mining is an important technique for uncovering hidden knowledge in large volumes of data. Integrating data from different sources, however, often produces many missing values, and too many missing values markedly reduce the validity of data analysis results. This is the missing value problem. Cluster analysis is a common way to address it: because members of a cluster are highly similar to one another while differing from members of other clusters, the cluster a record belongs to can be used to derive a more suitable estimate for its missing values. K-means is a well-known clustering method, but for complex and varied data it is difficult to decide the number of clusters in advance. The Gap statistical method automatically estimates the best number of clusters from the distribution of the input data, which removes the need to fix the number of clusters for K-means beforehand, and it obtains good results without many iterations.
This study integrates the Gap statistical method with K-means clustering to build a general-purpose missing value estimation model. The model aims to find the most suitable estimates for missing values, so that users of data mining methods can retain as much information as possible and obtain more meaningful results.
The model is applied to a database of power generation data from the Taipower Company to verify its feasibility and effectiveness. The experimental results show that the proposed method outperforms the SOM-based estimation method.
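The abstract does not give implementation details, so the following is only a minimal Python sketch of the general idea, not the thesis's actual procedure. It assumes the widely used form of the Gap statistic (observed within-cluster dispersion compared against uniform reference data drawn over the bounding box of the input), clusters only the complete records, and fills each missing entry from the centroid nearest to the record's observed features. The function names (`kmeans`, `estimate_k_by_gap`, `impute_missing`), the farthest-first seeding, and all parameter defaults are illustrative assumptions.

```python
import numpy as np

def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean) for each row."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means with farthest-first seeding (an assumption of this
    sketch). Returns (centroids, W_k), W_k = within-cluster sum of squares."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels = _assign(X, centroids)
    return centroids, ((X - centroids[labels]) ** 2).sum()

def estimate_k_by_gap(X, k_max, rng, n_ref=10):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); falls back to k_max."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans(X, k, rng)[1])
        # Reference dispersion: uniform samples over the data's bounding box.
        ref = np.array([
            np.log(kmeans(rng.uniform(lo, hi, size=X.shape), k, rng)[1])
            for _ in range(n_ref)
        ])
        gaps.append(ref.mean() - log_w)
        s.append(ref.std() * np.sqrt(1 + 1 / n_ref))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k
    return k_max

def impute_missing(X, k_max=6, seed=0):
    """Fill NaN entries from the nearest cluster centroid.
    Returns (filled_array, chosen_k). Clusters are fit on complete rows only."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = X[~missing.any(axis=1)]
    k = estimate_k_by_gap(complete, k_max, rng)
    centroids, _ = kmeans(complete, k, rng)
    filled = X.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        obs = ~missing[i]
        # Distance to each centroid using only the observed features.
        d = ((centroids[:, obs] - X[i, obs]) ** 2).sum(axis=1)
        filled[i, missing[i]] = centroids[d.argmin(), missing[i]]
    return filled, k
```

Calling `impute_missing(X)` on a numeric array that marks missing entries as `NaN` returns a filled copy together with the number of clusters the Gap criterion selected. Observed entries are left unchanged, and `k_max` should stay well below the number of complete records so that the dispersion never reaches zero.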
Appears in Collections:	[Department of Information Management] Dissertations and Theses

Files in This Item:

File	Description	Size	Format
094NHU05396006-001.pdf		803Kb	Adobe PDF	0	View/Open
index.html		0Kb	HTML	142	View/Open