[Doctoral Dissertations] Academic Database

Doctoral dissertation / Development of Optimization Method with the Use of Genetic Algorithms for Natural Language and Related Models (遺伝的アルゴリズムを用いた自然言語とその関連モデルの最適化手法の開発)

Author

Bibliographic details

Title

Development of Optimization Method with the Use of Genetic Algorithms for Natural Language and Related Models

Alternative title

(遺伝的アルゴリズムを用いた自然言語とその関連モデルの最適化手法の開発)

Author name

PAWEL CEZARY LEMPA

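Degree-granting university

Kitami Institute of Technology (University ID: 0007) (CAT institution ID: KI000072)

Degree

Doctor of Engineering

Degree number

甲第170号 (Kō No. 170)

Date of degree conferral

2018-09-10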

Notes / Abstract

Language models are an indispensable element of Natural Language Processing (NLP) research. They are used in machine translation, speech recognition, part-of-speech tagging, handwriting recognition, syntactic parsing, information retrieval, and other tasks. In short, language models are probability distributions over sequences of words. Countless NLP solutions, algorithms, and programs apply language models to specific tasks. Unfortunately, these are often not optimized and instead rely on default, most commonly used sets of parameters. For example, many of them use numerous objective functions with different variables but without proper weights applied to them. Users usually set these variables themselves, which keeps the results from exceeding a certain mediocre level. When the number of variables is small, users can adjust them manually, but optimizing objective functions with a massive number of variables, especially multi-objective functions, is difficult and time-consuming. This was the motivation for proposing the application of Genetic Algorithms (GAs) to optimize the weighting process.

GAs are a subset of Evolutionary Algorithms (EAs), inspired by the process of natural selection. They use bio-inspired operators such as selection, crossover, and mutation to generate solutions to optimization and search problems. GAs are thus randomized heuristic search strategies that simulate the natural selection process, where the population is composed of candidate solutions. They focus on evolving a population from which strong and diverse candidates can emerge via mutation and crossover (mating). There are different types of GAs; moreover, the same type of GA can yield solutions of different quality depending on multiple variables, including the starting population, the number of generations, and the fitness function. Finding the best starting parameters and the type of GA most appropriate for a given optimization problem is the next challenge.
For that reason, I created a library that automatically applies multiple types of GAs for optimization purposes.

The library was created in the C++ language using the .NET environment. Its main goal is to be used with different secondary programs and applications without significantly interfering with the original structure of the solution. The basic functions of the library allow the use of several different kinds of GAs, such as Simple GA, Uniform Crossover GA, n-point Crossover GA, GA with sexual selection, GA with chromosome aging, and so forth. The user can freely define the starting parameters of the GA, including population size, starting population, number of generations, and type of mutation and crossover. The advanced functions of the library allow multithreaded processing for running several GAs at the same time. The basic multithreading option runs the same type of GA with different starting parameters; the advanced version allows information to be exchanged between different threads every set number of generations. When there is a large number of variables to compute, it is also possible to split mutation and crossover across several threads running at the same time.

The most important feature of the library is that it is easy to adapt to the optimization of different kinds of applications. The library runs the original program in every generation of the GA with new weights for the variables generated through natural selection. The running time is closely tied to the processing time of the original program: it depends on the type of the original solution, and processing one generation takes about as long as one run of the optimized program.

During the creation and testing of the library, numerous experiments were carried out. In preliminary experiments, the library was used to optimize the construction of mechanical elements. Later, the application was tested on natural language processing and related solutions. One part of the research was optimizing the Quantitative Learner's Motivation Model.
The goal of this experiment was to optimize the formula for predicting learning motivation by means of different weights for three values: interest, usefulness in the future, and satisfaction. For this optimization, an application in C# using the GA library was created. Data sets for the experiments were acquired from questionnaires asking about these three elements in actual university classes. The results of the experiment showed an improvement in the estimation of students' learning motivation of up to over 17 percentage points in F-score.

The final experiment aimed to optimize an implementation of Support Vector Machines (SVMs) for the problem of pattern recognition in natural language data. SVMs are a machine learning algorithm based on statistical learning theory. They are applied in a large number of real-world applications, such as text categorization and handwritten character recognition. The original program was created in C++. For this application, numerous different types of GAs were tested with different numbers of generations, weight ranges, and starting parameters. The optimization was successful, with the scale of improvement varying with the conditions mentioned above; the highest improvement exceeded 6 percentage points of recall compared to the baseline, reaching 78%. All experimental data are included in this work.
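The weight-optimization idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis library (which was written in C++ with .NET) and does not reproduce its API. Everything here is an assumption made for illustration: the weighted-sum scoring rule with a 0.5 threshold, the synthetic toy data from `make_toy_samples`, accuracy as the fitness function, and the choice of tournament selection, uniform crossover, Gaussian mutation, and elitism.

```python
import random


def make_toy_samples(n=200, seed=1):
    """Synthetic stand-in data (assumption): three features in [0, 1] per
    sample, labeled by a hidden weighted sum. Not the questionnaire data."""
    rng = random.Random(seed)
    hidden = (0.6, 0.3, 0.1)  # hypothetical "true" weights
    samples = []
    for _ in range(n):
        features = [rng.random() for _ in range(3)]
        score = sum(h * x for h, x in zip(hidden, features))
        samples.append((features, 1 if score >= 0.5 else 0))
    return samples


def fitness(weights, samples):
    """Accuracy of the thresholded weighted sum (assumed objective)."""
    correct = 0
    for features, label in samples:
        score = sum(w * x for w, x in zip(weights, features))
        if (1 if score >= 0.5 else 0) == label:
            correct += 1
    return correct / len(samples)


def uniform_crossover(a, b, rng):
    """Each gene is taken from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(weights, rng, rate=0.2, sigma=0.1):
    """Perturb each gene with probability `rate`, clamped to [0, 1]."""
    return [
        min(1.0, max(0.0, w + rng.gauss(0.0, sigma))) if rng.random() < rate else w
        for w in weights
    ]


def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def run_ga(samples, pop_size=30, generations=40, seed=0):
    """Evolve three weights; returns (best_weights, best_fitness)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(3)] for _ in range(pop_size)]
    best, best_score = None, -1.0
    for _ in range(generations):
        scores = [fitness(ind, samples) for ind in population]
        for ind, score in zip(population, scores):
            if score > best_score:
                best, best_score = list(ind), score
        next_population = [list(best)]  # elitism: keep the best so far
        while len(next_population) < pop_size:
            parent_a = tournament(population, scores, rng)
            parent_b = tournament(population, scores, rng)
            child = uniform_crossover(parent_a, parent_b, rng)
            next_population.append(mutate(child, rng))
        population = next_population
    return best, best_score


if __name__ == "__main__":
    toy = make_toy_samples()
    weights, score = run_ga(toy)
    print("best weights:", weights, "fitness:", score)
```

In a setting like the one the abstract describes, `fitness` would instead run the program being optimized with the candidate weights and return its evaluation score, which is why the cost of a run is dominated by the original program's processing time.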

Codes

NII Article ID (NAID)

500001081002

NII Author ID (NRID)

8000001203829

Text language code

eng

Data source

Institutional repository / NDL Digital Collections
