An Uncertainty-Gated Neuro-Symbolic Framework for High-Coverage Topic Modeling and Trend Analysis in Scholarly Corpora with LLM Assistance

dc.contributor.author Demir, Onur
dc.contributor.author Saran, Murat
dc.date.accessioned 2026-06-05T08:49:35Z
dc.date.available 2026-06-05T08:49:35Z
dc.date.issued 2026
dc.description.abstract The rapid growth of scientific literature demands scalable methods that can track research evolution, yet density-based topic models such as BERTopic systematically exclude low-density documents as outliers, obscuring emerging and niche research areas. We propose a Neuro-Symbolic, Uncertainty-Gated Framework that recovers these outliers through geometric centroid reassignment and an ontological entropy gate derived from the Computer Science Ontology (CSO), routing only genuinely ambiguous cases to a local Large Language Model (Qwen2.5-14B via Ollama). A controlled ablation study demonstrates that centroid reassignment provides the largest coverage gain (+22.9 percentage points, pp), the CSO entropy gate preserves niche-topic integrity, and selective LLM routing adds an additional +5.9 pp. On 12,535 Turkish computer engineering theses (TR-CS; 2001-2025), the full pipeline raises coverage from 75.5% ± 1.2% (Bare BERTopic) to 95.7% ± 0.4% (five-seed means) while maintaining competitive coherence (NPMI = 0.112 ± 0.006) and cross-seed stability (AMI = 0.832 ± 0.015), at ~15x fewer LLM calls than a fully generative Pure-LLM baseline. Mann-Kendall trend tests on the high-coverage series identify 69 statistically significant trends (FDR q < 0.05), and cross-corpus validation on ~200K arXiv CS abstracts confirms that the architecture generalizes beyond the primary dataset. The framework offers a reproducible, cost-effective solution for monitoring scientific developments in rapidly evolving fields.
dc.identifier.doi 10.1109/ACCESS.2026.3687277
dc.identifier.issn 2169-3536
dc.identifier.scopus 2-s2.0-105037802188
dc.identifier.uri https://hdl.handle.net/20.500.12416/16136
dc.identifier.uri https://doi.org/10.1109/ACCESS.2026.3687277
dc.language.iso en
dc.publisher IEEE-Inst Electrical Electronics Engineers Inc
dc.relation.ispartof IEEE Access
dc.rights info:eu-repo/semantics/openAccess
dc.subject Computer Science Ontology (CSO)
dc.subject Large Language Models (LLMs)
dc.subject Scientometrics
dc.subject Neuro-Symbolic AI
dc.subject Topic Modeling
dc.subject Outlier Detection
dc.subject Trend Analysis
dc.title An Uncertainty-Gated Neuro-Symbolic Framework for High-Coverage Topic Modeling and Trend Analysis in Scholarly Corpora with LLM Assistance en_US
dc.type Article
dspace.entity.type Publication
gdc.author.scopusid 60615606700
gdc.author.scopusid 24722292900
gdc.author.wosid Saran, Murat/U-5382-2018
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.description.department Çankaya University
gdc.description.departmenttemp [Demir, Onur; Saran, Murat] Cankaya Univ, Dept Comp Engn, Ankara, Turkiye
gdc.description.endpage 66464
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member
gdc.description.startpage 66445
gdc.description.volume 14
gdc.description.woscitationindex Science Citation Index Expanded
gdc.identifier.wos WOS:001763003100037
gdc.index.type Scopus
gdc.index.type WoS
relation.isAuthorOfPublication.latestForDiscovery f92fb8be-a1b3-4888-abaf-40ad03004780
relation.isOrgUnitOfPublication.latestForDiscovery 0b9123e4-4136-493b-9ffd-be856af2cdb1
