【博士論文】学術データベース

博士論文 / A Study on Top-down Search Algorithms for m-Closest Keywords Queries Problem over Spatial Web 空間Webデータにおけるm-最近接キーワード検索問題のトップダウン解法に関する研究

著者

書誌事項

タイトル

A Study on Top-down Search Algorithms for m-Closest Keywords Queries Problem over Spatial Web

タイトル別名

空間Webデータにおけるm-最近接キーワード検索問題のトップダウン解法に関する研究

著者名

邱原

著者名

Yuan Qiu

学位授与大学

電気通信大学 (大学ID:0032) (CAT機関ID:KI000323)

取得学位

博士(工学)

学位授与番号

甲第904号

学位授与年月日

2017-03-24

注記・抄録

This thesis addresses the problem of m-closest keywords queries (mCK queries) over spatial web objects that contain descriptive texts and spatial information. The mCK query is a problem to find the optimal set of records in the sense that they are the spatially-closest records that satisfy m user-given keywords in their texts. The mCK query can be widely used in various applications to find the place of user’s interest. Generally, top-down search techniques using tree-style data structures are appropriate for finding optimal results of queries over spatial datasets. Thus in order to solve the mCK query problem, a previous study of NUS group assumed a specialized R*-tree (called bR*-tree) to store all records and proposed a top-down approach which uses an Apriori-based node-set enumeration in top-down process. However this assumption of prepared bR*-tree is not applicable to practical spatial web datasets, and the pruning ability of Apriori-based enumeration is highly dependent on the data distribution. In this thesis, we do not expect any prepared data-partitioning, but assume that we create a grid partitioning from necessary data only when an mCK query is given. Under this assumption, we propose a new search strategy termed Diameter Candidate Check (DCC), which can find a smaller node-set at an earlier stage of search so that it can reduce search space more efficiently. According to DCC search strategy, we firstly employ an implementation of DCC strategy in a nested loop search algorithm (called DCC-NL). Next, we improve the DCC-NL in a recursive way (called RDCC). RDCC can afford a more reasonable priority order of node-set enumeration. We also uses a tight lower bound to improve pruning ability in RDCC. RDCC performs well in a wide variey of data distributions, but it has still deficiency when one data-point has many query keywords and numerous node-sets are generated. Hence in order to avoid the generation of node-sets which is an unstable factor of search efficiency, we propose another different top-down search approach called Pairwise Expansion. Finally, we discuss some optimization techniques to enhance Pairwise Expansion approach. We first discuss the index structure in the Pairwise Expansion approach, and try to use an on-the-fly kd-tree to reduce building cost in the query process. Also a new lower bound and an upper bound are employed for more powerful pruning in Pairwise Expansion. We evaluate these approaches by using both real datasets and synthetic datasets for different data distributions, including 1.6 million of Flickr photo data. The result shows that DCC strategy can provide more stable search performance than the Apriori-based approach. And the Pairwise Expansion approach enhanced with lower/upper bounds, has more advantages over those algorithms having node-set generation, and is applicable for real spatial web data.

2016

各種コード

NII論文ID(NAID)

500001036856

NII著者ID(NRID)
  • 8000001130922
  • 8000001130923
本文言語コード

eng

データ提供元

機関リポジトリ / NDLデジタルコレクション

博士論文 / 電気通信大学 / 工学

博士論文 / 電気通信大学

博士論文 / 工学

関連著者

博士論文 / 大学

博士論文 / 学位