Improving File Security through an Optimized Auto-Classification Approach Using Learning Models

Açıkgöz, Zeliha

dc.contributor.advisor	Arslan, Recep Sinan
dc.contributor.advisor	Arslan, Serdar
dc.contributor.author	Açıkgöz, Zeliha
dc.date.accessioned	2026-04-03T15:00:58Z
dc.date.available	2026-04-03T15:00:58Z
dc.date.issued	2024
dc.description.abstract	PDF dosyalarını hedef alan kötü amaçlı yazılımlar dijital güvenlik açısından ciddi bir tehdit oluşturmaktadır. Bu çalışmada PDF dosyalarının sınıflandırılması için kapsamlı bir yöntem önerilmiştir. Çalışma kapsamında PyPDF2, PDFMiner ve PyMuPDF kütüphaneleri kullanılarak PDF'lerden 43 farklı genel ve yapısal özellik çıkarılmıştır. Çalışmada iki farklı aşama bulunmaktadır. İlk aşamada kullanılan veri seti tek sütun olacak şekilde TF-IDF, N-gram Count Vectorizer ve Word2Vec yöntemleri ile sayısallaştırılarak özellik seçimi yapılmadan model eğitimlerinde kullanılmıştır. İkinci aşamada ise metin içeren sütunlar Word2Vec ile sayısallaştırıldıktan sonra özellik seçim yöntemleri uygulanarak model eğitimlerinde kullanılmıştır. İlk aşamada yedi farklı makine öğrenmesi ve dört farklı derin öğrenme modeli uygulanmıştır. İkinci aşamada ise makine öğrenme modellerine ek özgün tasarlanmış Çok Dallı CNN modeli kullanılmıştır. Özellik seçiminde SelectKBest, Recursive Feature Elimination (RFE) ve Lasso yöntemleri uygulanmıştır. Önerilen Çok Dallı CNN mimarisi özellik seçimi yöntemlerinin sonuçlarına uygulanmıştır. Çok Dallı CNN modeli yapılan test sonucunda Lasso özellik seçimiyle 0.9982 doğruluk değeri elde edilmiştir. Makine öğrenimi modelleriyle yapılan deneyler, özellik çıkarımı olan ve olmayan veri setleri üzerinde değerlendirilmiş ve karşılaştırmalı olarak doğruluk, kesinlik, geri çağırma oranı ve F1 puanı gibi metrikler her iki aşama için de analiz edilmiştir. Çalışma, yaklaşık 30.000 PDF dosyasından oluşan kapsamlı bir veri seti üzerinde test edilmiştir. Elde edilen sonuçlar, PDF tabanlı kötü amaçlı yazılımların tespiti için etkili bir yaklaşım sağlamayı amaçlamaktadır.	tr
dc.description.abstract	Malware targeting PDF files poses a serious threat to digital security. This study proposes a comprehensive method for classifying PDF files. Using the PyPDF2, PDFMiner and PyMuPDF libraries, 43 general and structural features were extracted from PDFs. The study has two stages. In the first stage, the dataset was combined into a single column, vectorized with TF-IDF, N-gram Count Vectorizer and Word2Vec, and used for model training without feature selection. In the second stage, the columns containing text were vectorized with Word2Vec, and feature selection methods were applied before model training. Seven machine learning models and four deep learning models were applied in the first stage. In the second stage, a custom-designed Multi-Branch CNN model was used in addition to the machine learning models. SelectKBest, Recursive Feature Elimination (RFE) and Lasso were used for feature selection, and the proposed Multi-Branch CNN architecture was applied to the results of these feature selection methods. In testing, the Multi-Branch CNN model achieved an accuracy of 0.9982 with Lasso feature selection. The machine learning experiments were evaluated on datasets with and without feature extraction, and accuracy, precision, recall and F1 score were compared across both stages. The study was tested on a comprehensive dataset of approximately 30,000 PDF files. The results are intended to provide an effective approach for detecting PDF-based malware.	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.12416/16013
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=KOgdn9H3uVnWeb15j2W4h_jXA8d7px3YGpup_G00cXEmXxguEBmZ7o99TBKxhMqt
dc.language.iso	en
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr
dc.title	Improving File Security through an Optimized Auto-Classification Approach Using Learning Models	en_US
dc.title	Öğrenme Modellerini Kullanarak Optimize Edilmiş Otomatik Sınıflandırma Yaklaşımıyla Dosya Güvenliğini İyileştirme	tr
dc.type	Master Thesis
dspace.entity.type	Publication
gdc.description.department	FEN BİLİMLERİ ENSTİTÜSÜ / Bilgisayar Mühendisliği Ana Bilim Dalı
gdc.description.department	Çankaya Üniversitesi
gdc.description.endpage	80
gdc.identifier.yoktezid	993238
relation.isAuthorOfPublication.latestForDiscovery	ee02ccda-1b5e-4bba-b8b3-ece13ce2ec47
relation.isOrgUnitOfPublication.latestForDiscovery	0b9123e4-4136-493b-9ffd-be856af2cdb1
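
Note on the evaluation metrics named in the abstract: accuracy, precision, recall and F1 score can all be derived from the four confusion-matrix counts of a binary classifier. The sketch below is only an illustration of those standard definitions, not the thesis code. The function name `binary_metrics`, the label convention (1 = malicious, 0 = benign) and the toy label lists are assumptions made for this example and are not results from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumed convention (not stated in the record): 1 = malicious, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the thesis.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

For malware detection, recall is usually read alongside accuracy, since a missed malicious file (a false negative) tends to cost more than a false alarm.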
