The k-nearest-neighbors (kNN) search problem is to find, for a query point, the k points in a given data set that are closest to it. This problem arises in many scientific and engineering applications. With the progress of information technology and the spread of Internet applications, fast kNN search techniques have become increasingly important. The performance of existing fast kNN search methods is strongly affected by properties of the data set, such as the number of data points, their dimensionality, and their distribution, so the same method may perform quite differently on different data sets. Choosing a suitable fast kNN search method for a given data set is therefore not easy: the distribution of that data set usually differs from the distributions of the data sets tested in the literature, and no fast kNN search method performs consistently under all conditions. To address this problem, this project aims to build a performance evaluation database containing many different types of data sets, which will be used to evaluate fast kNN search methods. With this database, we can understand the performance of fast kNN search methods more comprehensively and select a suitable method for a given data set more easily. Moreover, with the help of the performance evaluation database, we expect to develop a fast kNN search method that performs consistently on all kinds of data sets.
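The problem definition above can be made concrete with an exact linear scan, the usual baseline against which fast kNN methods are compared. The project does not specify an implementation, so the sketch below is only an illustration under stated assumptions: it uses Euclidean distance, keeps the current k best candidates in a bounded max-heap, and the function name `knn_linear_scan` is hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_linear_scan(data: Sequence[Point], query: Point, k: int) -> List[Tuple[float, int]]:
    """Exact kNN by brute force (hypothetical helper, Euclidean distance assumed).

    Returns up to k (distance, index) pairs sorted by increasing distance.
    Every data point is compared with the query, so one query costs
    O(n * d) distance work plus O(n log k) heap operations.
    """
    if k <= 0:
        return []
    # Max-heap of the best k candidates seen so far, stored as (-distance, index)
    # because heapq only provides a min-heap.
    heap: List[Tuple[float, int]] = []
    for i, p in enumerate(data):
        d = math.dist(p, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            # New point is closer than the current k-th nearest: replace it.
            heapq.heapreplace(heap, (-d, i))
    # Convert back to positive distances and sort nearest first.
    return sorted((-neg_d, i) for neg_d, i in heap)


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (-1.0, -1.0)]
    # The two nearest neighbors of (0.2, 0.1) are the points at indices 0 and 1.
    print(knn_linear_scan(data, (0.2, 0.1), k=2))
```

Because the linear scan is exact, its answers could serve as ground truth when checking a fast method's results on each data set in an evaluation database; fast methods typically try to avoid computing most of the n distances that this baseline always computes.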