An Uncertainty-Gated Neuro-Symbolic Framework for High-Coverage Topic Modeling and Trend Analysis in Scholarly Corpora with LLM Assistance

Demir, Onur; Saran, Murat

doi:10.1109/ACCESS.2026.3687277

dc.contributor.author	Demir, Onur
dc.contributor.author	Saran, Murat
dc.date.accessioned	2026-06-05T08:49:35Z
dc.date.available	2026-06-05T08:49:35Z
dc.date.issued	2026
dc.description.abstract	The rapid growth of scientific literature demands scalable methods that can track research evolution, yet density-based topic models such as BERTopic systematically exclude low-density documents as outliers, obscuring emerging and niche research areas. We propose a Neuro-Symbolic, Uncertainty-Gated Framework that recovers these outliers through geometric centroid reassignment and an ontological entropy gate derived from the Computer Science Ontology (CSO), routing only genuinely ambiguous cases to a local Large Language Model (Qwen2.5-14B via Ollama). A controlled ablation study shows that centroid reassignment provides the largest coverage gain (+22.9 percentage points (pp)), the CSO entropy gate preserves niche-topic integrity, and selective LLM routing adds a further +5.9 pp. On 12,535 Turkish computer engineering theses (TR-CS; 2001–2025), the full pipeline raises coverage from 75.5% ± 1.2% (Bare BERTopic) to 95.7% ± 0.4% (five-seed means) while maintaining competitive coherence (NPMI = 0.112 ± 0.006) and cross-seed stability (AMI = 0.832 ± 0.015), with ~15× fewer LLM calls than a fully generative Pure-LLM baseline. Mann-Kendall trend tests on the high-coverage series identify 69 statistically significant trends (FDR q < 0.05), and cross-corpus validation on ~200K arXiv CS abstracts confirms that the architecture generalizes beyond the primary dataset. The framework offers a reproducible, cost-effective solution for monitoring scientific developments in rapidly evolving fields.
dc.identifier.doi	10.1109/ACCESS.2026.3687277
dc.identifier.issn	2169-3536
dc.identifier.scopus	2-s2.0-105037802188
dc.identifier.uri	https://hdl.handle.net/20.500.12416/16136
dc.identifier.uri	https://doi.org/10.1109/ACCESS.2026.3687277
dc.language.iso	en
dc.publisher	IEEE-Inst Electrical Electronics Engineers Inc
dc.relation.ispartof	IEEE Access
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Computer Science Ontology (CSO)
dc.subject	Large Language Models (LLMs)
dc.subject	Scientometrics
dc.subject	Neuro-Symbolic AI
dc.subject	Topic Modeling
dc.subject	Outlier Detection
dc.subject	Trend Analysis
dc.title	An Uncertainty-Gated Neuro-Symbolic Framework for High-Coverage Topic Modeling and Trend Analysis in Scholarly Corpora with LLM Assistance	en_US
dc.type	Article
dspace.entity.type	Publication
gdc.author.scopusid	60615606700
gdc.author.scopusid	24722292900
gdc.author.wosid	Saran, Murat/U-5382-2018
gdc.coar.access	open access
gdc.coar.type	text::journal::journal article
gdc.description.department	Çankaya University
gdc.description.departmenttemp	[Demir, Onur; Saran, Murat] Cankaya Univ, Dept Comp Engn, Ankara, Turkiye
gdc.description.endpage	66464
gdc.description.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member
gdc.description.startpage	66445
gdc.description.volume	14
gdc.description.woscitationindex	Science Citation Index Expanded
gdc.identifier.wos	WOS:001763003100037
gdc.index.type	Scopus
gdc.index.type	WoS
relation.isAuthorOfPublication.latestForDiscovery	f92fb8be-a1b3-4888-abaf-40ad03004780
relation.isOrgUnitOfPublication.latestForDiscovery	0b9123e4-4136-493b-9ffd-be856af2cdb1
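The abstract describes a two-stage outlier-recovery step: geometric centroid reassignment, then an ontological entropy gate that decides which outliers are ambiguous enough to send to the LLM. The following is a minimal, hypothetical sketch of that idea in plain Python, not the authors' implementation. The function names, the toy concept-to-topic mapping, the choice to normalize entropy by the number of topics that received votes, and the 0.6 threshold are all illustrative assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def reassign_outlier(embedding, centroids):
    """Geometric step: return (topic_id, similarity) of the closest topic centroid."""
    best = max(centroids, key=lambda t: cosine(embedding, centroids[t]))
    return best, cosine(embedding, centroids[best])


def normalized_entropy(counts):
    """Shannon entropy of a vote Counter, scaled to [0, 1].

    Assumption: normalized by log(number of topics that received votes);
    a single voted topic (or no votes at all) gives 0.0.
    """
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts.values() if c > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(counts))


def route_document(embedding, cso_concepts, centroids, concept_to_topic,
                   entropy_threshold=0.6):
    """Return ('geometric', topic) or ('llm', None) for one outlier document.

    Hypothetical gate: each CSO concept in the document votes for the topic it
    is mapped to. A flat (high-entropy) vote distribution marks the document as
    ambiguous and routes it to the LLM; otherwise the centroid assignment is kept.
    Documents with no mapped concepts fall back to the geometric assignment.
    """
    topic, _ = reassign_outlier(embedding, centroids)
    votes = Counter(concept_to_topic[c] for c in cso_concepts if c in concept_to_topic)
    if normalized_entropy(votes) > entropy_threshold:
        return "llm", None
    return "geometric", topic


# Toy usage with two 2-D topic centroids and a hypothetical concept mapping.
centroids = {0: [1.0, 0.0], 1: [0.0, 1.0]}
concept_to_topic = {"neural networks": 0, "deep learning": 0, "cryptography": 1}
print(route_document([0.9, 0.1], ["neural networks", "deep learning"],
                     centroids, concept_to_topic))  # ('geometric', 0)
print(route_document([0.6, 0.5], ["deep learning", "cryptography"],
                     centroids, concept_to_topic))  # ('llm', None)
```

Gating on entropy rather than on raw centroid similarity alone reflects the abstract's point that only genuinely ambiguous cases should incur an LLM call. In this toy version, concepts that agree on one topic keep the cheap geometric assignment, while concepts split evenly across topics trigger routing.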

Collections

Scopus-Indexed Publications Collection
WoS-Indexed Publications Collection
