Browsing by Author "Alsamurai, Ather Abdulrahem Mohammedsaed"
Master Thesis: Text categorization based on semantic similarity with word2vector (Çankaya Üniversitesi, 2017) Alsamurai, Ather Abdulrahem Mohammedsaed

With the growth of online information, most of it in the form of text documents, it has become necessary to organize that information so that management and retrieval by search engines are easier. Organizing these documents manually is difficult, so machine-learning algorithms can be used to classify and organize them. In most cases they are faster, more accurate, and less expensive than manual classification. Most traditional machine-learning approaches rely on term frequency to determine the importance of a term within a document and neglect semantically similar words. For this reason, we propose a text classifier based on semantically similar words, using the Word2Vector model as a tool to compute the similarity between documents and capture the correct topic. We built two models in three phases: in the first phase, we applied preprocessing steps; in the second phase, we created a dictionary for the top ten categories of the Reuters-21578 dataset; and in the final phase, we trained a Word2Vector model on the English Wikipedia dataset and used it to compute the similarity between documents. In our results, the second model (the most similar predicted topic) performs better than the first model (average-based predicted topic) in all categories. Compared with other studies, our results are comparable to theirs but do not surpass them; those studies use feature selection to improve their results, whereas we use feature extraction.
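
The two prediction strategies named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy three-dimensional vectors stand in for a trained Word2Vector model, the two-category dictionary stands in for the ten-category Reuters dictionary, and the function names and scoring rules are all assumptions. "Average-based" is read here as the mean cosine similarity between document words and category terms, and "most similar" as the single highest similarity.

```python
import math

# Hypothetical toy embeddings; a real setup would load vectors from a trained model.
EMBEDDINGS = {
    "oil": [0.9, 0.1, 0.0],
    "crude": [0.8, 0.2, 0.1],
    "barrel": [0.7, 0.1, 0.2],
    "wheat": [0.1, 0.9, 0.1],
    "grain": [0.2, 0.8, 0.0],
    "harvest": [0.1, 0.7, 0.3],
}

# Hypothetical category dictionary (the thesis builds one for ten Reuters categories).
CATEGORY_TERMS = {
    "crude": ["oil", "crude"],
    "grain": ["wheat", "grain"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_similarities(doc_tokens, terms):
    """Similarities for every (document word, category term) pair in the vocabulary."""
    return [
        cosine(EMBEDDINGS[w], EMBEDDINGS[t])
        for w in doc_tokens if w in EMBEDDINGS
        for t in terms if t in EMBEDDINGS
    ]


def predict_average(doc_tokens):
    """Assumed 'average-based' rule: pick the category with the highest mean similarity."""
    scores = {}
    for category, terms in CATEGORY_TERMS.items():
        sims = pair_similarities(doc_tokens, terms)
        scores[category] = sum(sims) / len(sims) if sims else 0.0
    return max(scores, key=scores.get)


def predict_most_similar(doc_tokens):
    """Assumed 'most similar' rule: pick the category holding the single best match."""
    scores = {}
    for category, terms in CATEGORY_TERMS.items():
        sims = pair_similarities(doc_tokens, terms)
        scores[category] = max(sims) if sims else 0.0
    return max(scores, key=scores.get)
```

Calling `predict_average(["barrel", "oil"])` or `predict_most_similar(["harvest", "wheat"])` returns a category name from the dictionary. Words missing from the embedding table are skipped, which is one simple way to handle out-of-vocabulary tokens; the thesis may treat them differently.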