RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.
    Please use this permanent URL to cite or link to this item: http://nhuir.nhu.edu.tw/handle/987654321/22494


    Title: 基於間隙法與K-means分群法之遺漏值推估模式
    Other title: A Missing Value Estimation Model Based on the Gap Statistical Method and K-means Method
    Author: Lee, Yian-yi (李建逸)
    Contributors: Graduate Institute of Information Management; Hung-pin Chiu (邱宏彬)
    Keywords: Gap Statistical Method; K-means Method; SOM-based model (self-organizing map network); Data Mining; Missing Value
    Date: 2006
    Uploaded: 2015-08-04 14:50:02 (UTC+8)
    Abstract: Data mining is an important technique for uncovering hidden knowledge in large volumes of data. Integrating data from different sources, however, often produces many missing values, and too many missing values noticeably undermine the validity of the analysis. This is the missing value problem. Cluster analysis is a common way to address it: because members of a cluster are highly similar to one another and differ markedly from members of other clusters, the cluster a record belongs to can supply a more suitable estimate for its missing values. K-means is a well-known clustering method, but for complex and varied data it is difficult to decide the number of clusters in advance. The Gap statistical method automatically estimates the best number of clusters from the distribution of the input data, which removes the need to fix the number of clusters for K-means beforehand, and it obtains good results without many iterations.
    This study integrates the Gap statistical method with the K-means method to build a general-purpose missing value estimation model. The model seeks the most suitable estimate for each missing value, so that users retain as much information as possible when applying data mining methods and the mined results are more meaningful. The model is applied to a power generation database from the Taipower Company to verify its feasibility and effectiveness. The experimental results show that the proposed method outperforms the SOM-based estimation method.
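The pipeline the abstract describes (estimate the number of clusters with the Gap statistic, cluster with K-means, then estimate missing values from the clusters) can be sketched as below. This is a minimal illustration, not the thesis's implementation. It assumes the commonly described form of the Gap statistic, Gap(k) = E*[log W_k] - log W_k, with reference sets drawn uniformly over the bounding box of the data and the rule "smallest k with Gap(k) >= Gap(k+1) - s(k+1)". It also assumes that clustering is done on the complete records only and that a missing entry is filled with the matching coordinate of the centroid nearest on the observed coordinates. The names `kmeans`, `log_dispersion`, `estimate_k`, and `impute` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(c) / len(members) for c in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p) for p in points]


def log_dispersion(points, k, restarts=5):
    """log W_k: log of the smallest within-cluster sum of squares over a few restarts."""
    best = math.inf
    for seed in range(restarts):
        centroids, labels = kmeans(points, k, seed=seed)
        w = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        best = min(best, w)
    return math.log(max(best, 1e-12))  # guard against log(0)


def estimate_k(points, k_max=5, n_refs=10, seed=0):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1), falling back to k_max."""
    rng = random.Random(seed)
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    # Reference sets: uniform samples over the bounding box of the data.
    refs = [[tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
            for _ in range(n_refs)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = [log_dispersion(ref, k) for ref in refs]
        mean = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean - log_dispersion(points, k))
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    for i in range(k_max - 1):  # gaps[i] holds Gap(i + 1)
        if gaps[i] >= gaps[i + 1] - errs[i + 1]:
            return i + 1
    return k_max


def impute(records, k_max=5):
    """Fill None entries with the matching coordinate of the nearest centroid."""
    complete = [r for r in records if None not in r]
    k = estimate_k(complete, k_max)
    centroids, _ = kmeans(complete, k)
    filled = []
    for r in records:
        seen = [i for i, v in enumerate(r) if v is not None]
        # Distance is measured on the observed coordinates only.
        cen = min(centroids, key=lambda c: sum((r[i] - c[i]) ** 2 for i in seen))
        filled.append(tuple(cen[i] if v is None else v for i, v in enumerate(r)))
    return k, filled


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
    data += [(rng.gauss(10, 1), rng.gauss(100, 1)) for _ in range(20)]
    k, filled = impute(data + [(10.1, None)])
    print(k, filled[-1])
```

In this sketch the record `(10.1, None)` receives its second coordinate from whichever centroid lies closest to 10.1 on the first coordinate. Features on very different scales would normally be standardized before clustering; that step is omitted here for brevity.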
    Appears in collections: [Department of Information Management] Master's and Doctoral Theses

    Files in this item:

    File                      Description  Size   Format     Views
    094NHU05396006-001.pdf    -            803Kb  Adobe PDF  0
    index.html                -            0Kb    HTML       142


    All items in NHUIR are protected by original copyright.

