An Uncertainty-Gated Neuro-Symbolic Framework for High-Coverage Topic Modeling and Trend Analysis in Scholarly Corpora with LLM Assistance
Date
Authors
Journal Title
Journal ISSN
Volume Title
Abstract
The rapid growth of scientific literature demands scalable methods that can track research evolution, yet density-based topic models such as BERTopic systematically exclude low-density documents as outliers, obscuring emerging and niche research areas. We propose a Neuro-Symbolic, Uncertainty-Gated Framework that recovers these outliers through geometric centroid reassignment and an ontological entropy gate derived from the Computer Science Ontology (CSO), routing only genuinely ambiguous cases to a local Large Language Model (Qwen2.5-14B via Ollama). A controlled ablation study demonstrates that centroid reassignment provides the largest coverage gain (+22.9 percentage points, pp), the CSO entropy gate preserves niche-topic integrity, and selective LLM routing adds a further +5.9 pp. On 12,535 Turkish computer engineering theses (TR-CS; 2001-2025), the full pipeline raises coverage from 75.5% ± 1.2% (Bare BERTopic) to 95.7% ± 0.4% (five-seed means) while maintaining competitive coherence (NPMI = 0.112 ± 0.006) and cross-seed stability (AMI = 0.832 ± 0.015), at ~15x fewer LLM calls than a fully generative Pure-LLM baseline. Mann-Kendall trend tests on the high-coverage series identify 69 statistically significant trends (FDR q < 0.05), and cross-corpus validation on ~200K arXiv CS abstracts confirms that the architecture generalizes beyond the primary dataset. The framework offers a reproducible, cost-effective solution for monitoring scientific developments in rapidly evolving fields.
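The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the gate is computed, so the function names, the cosine-similarity criterion, the normalized Shannon entropy over per-document CSO concept weights, and both threshold values are assumptions made purely for illustration. An outlier is reassigned to its nearest topic centroid only when the match is close and the ontology signal is concentrated; otherwise it is flagged for the LLM.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def normalized_entropy(weights):
    """Shannon entropy of a weight vector, scaled to [0, 1].

    Assumption: `weights` holds non-negative scores of a document
    against a fixed list of ontology concept areas.
    """
    total = sum(weights)
    if total <= 0 or len(weights) < 2:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(weights))

def route_outlier(embedding, centroids, concept_weights,
                  sim_threshold=0.5, entropy_threshold=0.8):
    """Decide how to handle one outlier document.

    Returns ("centroid", topic_id) when the nearest centroid is close
    enough and the concept distribution is concentrated, otherwise
    ("llm", None). Both thresholds are hypothetical values.
    """
    best_topic, best_sim = max(
        ((topic, cosine(embedding, c)) for topic, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    if (best_sim >= sim_threshold
            and normalized_entropy(concept_weights) <= entropy_threshold):
        return ("centroid", best_topic)
    return ("llm", None)

# Toy usage: two topic centroids in a 2-D embedding space.
centroids = {0: [1.0, 0.0], 1: [0.0, 1.0]}
decision_clear = route_outlier([0.9, 0.1], centroids, [5, 0, 0])
decision_ambiguous = route_outlier([0.9, 0.1], centroids, [1, 1, 1])
```

In this sketch the LLM is called only when geometry or ontology fails to give a confident answer, which is the mechanism the abstract credits for the reduction in LLM calls relative to a fully generative baseline.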
Description
Keywords
Computer Science Ontology (CSO), Large Language Models (LLMs), Scientometrics, Neuro-Symbolic AI, Topic Modeling, Outlier Detection, Trend Analysis
Fields of Science
Citation
WoS Q
Scopus Q
Source
Volume
14
Issue
Start Page
66445
End Page
66464
