Browsing by Author "Yumusak, Semih"
Now showing 1 - 8 of 8
Conference Object | Citation - WoS: 0 | Citation - Scopus: 2
A Discovery and Analysis Engine for Semantic Web (Association for Computing Machinery, 2018)
Yumusak, Semih; Kamilaris, Andreas; Dogdu, Erdogan; Kodaz, Halife; Uysal, Elif; Aras, Riza Emre; Bilgisayar Mühendisliği
The Semantic Web promotes common data formats and exchange protocols on the web towards better interoperability among systems and machines. Although Semantic Web technologies are being used to semantically annotate data and resources for easier reuse, the ad hoc discovery of these data sources remains an open issue. Popular Semantic Web endpoint repositories such as SPARQLES, the Linking Open Data Project (LOD Cloud), and LODStats do not include recently published datasets and are not updated frequently by their publishers. Hence, there is a need for a web-based dynamic search engine that discovers these endpoints and datasets at frequent intervals. To address this need, a novel web meta-crawling method is proposed for discovering Linked Data sources on the Web. We implemented the method in a prototype system named SPARQL Endpoints Discovery (SpEnD). In this paper, we describe the design and implementation of SpEnD, together with an analysis and evaluation of its operation, in comparison to the aforementioned static endpoint repositories in terms of time performance, availability, and size. Findings indicate that SpEnD outperforms existing Linked Data resource discovery methods.

Article | Citation - WoS: 0
A Novel Hypercube-based Approach to Overlay Design Algorithms on Topic Distribution Networks (Gazi University, 2022)
Yumusak, Semih; Layazali, Sina; Oztoprak, Kasim; Hassanpour, Reza; Yazılım Mühendisliği
Data communication in peer-to-peer (P2P) networks requires fine-grained optimization of memory and processing to lower total energy consumption.
When publish/subscribe (Pub/Sub) systems are used as a communication tool in a P2P network, additional optimization algorithms are required to reduce complexity. The major difficulty for such networks is creating an overlay design algorithm (ODA) to define the communication patterns. Although some ODAs perform worse at high scale, others achieve better average/maximum node degrees. Based on experimentation and previous work, this study designed an algorithm called Hypercube-ODA, which reduces the average/maximum node degree for a topic-connected Pub/Sub network. The Hypercube-ODA algorithm creates the overlay network by creating random cubes within the network and assigning nodes to the cubes they belong to. In this paper, the details of the proposed Hypercube algorithm are presented and its performance is compared with existing ODAs. Results from the experiments indicate that the proposed method outperforms other ODA methods in terms of average node degree, lowering it by up to 60%.

Conference Object | Citation - WoS: 0 | Citation - Scopus: 0
Classification of Linked Data Sources Using Semantic Scoring (IEICE - Institute of Electronics, Information and Communication Engineers, 2018)
Yumusak, Semih; Dogdu, Erdogan; Kodaz, Halife; Bilgisayar Mühendisliği
Linked data sets are created using Semantic Web technologies; they are usually large, and the number of such datasets is growing. Query execution is therefore costly, and knowing the content of such datasets should help in targeted querying. Our aim in this paper is to classify linked data sets by their knowledge content. Earlier projects such as LOD Cloud, LODStats, and SPARQLES analyze linked data sources in terms of content, availability, and infrastructure. In these projects, linked data sets are classified and tagged principally using the VoID vocabulary and analyzed according to their content, availability, and infrastructure.
Although all linked data sources listed in these projects appear to be classified or tagged, there are only a limited number of studies on automated tagging and classification of newly arriving linked data sets. Here, we focus on automated classification of linked data sets using semantic scoring methods. We collected the SPARQL endpoints of 1,328 unique linked datasets from the Datahub, LOD Cloud, LODStats, SPARQLES, and SpEnD projects. We then queried textual descriptions of resources in these data sets using their rdfs:comment and rdfs:label property values. We analyzed these texts with document-analysis techniques, treating every SPARQL endpoint as a separate document. In this regard, we used the WordNet semantic relations library combined with an adapted term frequency-inverse document frequency (TF-IDF) analysis on the words and their semantic neighbours. From the WordNet database, we extracted information about comment/label objects in linked data sources using the hypernym, hyponym, homonym, meronym, region, topic, and usage semantic relations. We obtained significant results for the hypernym and topic semantic relations: we can find words that identify data sets, and these can be used in automatic classification and tagging of linked data sources. Using these words, we experimented with different classifiers and scoring methods, which improved classification accuracy.

Article | Citation - WoS: 9 | Citation - Scopus: 10
Low-Diameter Topic-Based Pub/Sub Overlay Network Construction With Minimum Maximum Node Degree (PeerJ Inc., 2021)
Yumusak, Semih; Layazali, Sina; Oztoprak, Kasim; Hassanpour, Reza; Yazılım Mühendisliği
In the construction of effective and scalable overlay networks, publish/subscribe (pub/sub) network designers prefer to keep the diameter and maximum node degree of the network low. However, existing algorithms are not capable of simultaneously decreasing the maximum node degree and the network diameter.
To address this issue in an overlay network with various topics, we present herein a heuristic algorithm, called the constant-diameter minimum-maximum degree (CD-MAX) algorithm, which decreases the maximum node degree and keeps the diameter of the overlay network at most two. The proposed algorithm, based on the greedy merge algorithm, selects the node with the minimum number of neighbors. The output of the CD-MAX algorithm is enhanced by a refinement stage, the CD-MAX-Ref algorithm, which further improves the maximum node degrees. The numerical results of the algorithm simulation indicate that the CD-MAX and CD-MAX-Ref algorithms improve the maximum node degree by up to 64% and run up to four times faster than similar algorithms.

Article
Low-Diameter Topic-Based Pub/Sub Overlay Network Construction with Minimum–Maximum Node Degree (2021)
Yumusak, Semih; Layazali, Sina; Öztoprak, Kasım; Hassanpour, Reza; Yazılım Mühendisliği
In the construction of effective and scalable overlay networks, publish/subscribe (pub/sub) network designers prefer to keep the diameter and maximum node degree of the network low. However, existing algorithms are not capable of simultaneously decreasing the maximum node degree and the network diameter. To address this issue in an overlay network with various topics, we present herein a heuristic algorithm, called the constant-diameter minimum–maximum degree (CD-MAX) algorithm, which decreases the maximum node degree and keeps the diameter of the overlay network at most two. The proposed algorithm, based on the greedy merge algorithm, selects the node with the minimum number of neighbors. The output of the CD-MAX algorithm is enhanced by a refinement stage, the CD-MAX-Ref algorithm, which further improves the maximum node degrees.
The numerical results of the algorithm simulation indicate that the CD-MAX and CD-MAX-Ref algorithms improve the maximum node degree by up to 64% and run up to four times faster than similar algorithms.

Conference Object | Citation - WoS: 8 | Citation - Scopus: 11
Sentiment Analysis for the Social Media: A Case Study for Turkish General Elections (Association for Computing Machinery, 2017)
Uysal, Elif; Yumusak, Semih; Oztoprak, Kasim; Dogdu, Erdogan; Bilgisayar Mühendisliği
The ideas expressed in social media do not always comply with natural language rules, and mood and emotion are mostly indicated by emoticons and emotion-specific keywords. There are language-independent emotion keywords (e.g., love, hate, good, bad); in addition, every language has its own particular emotion-specific keywords. These keywords can be used for polarity analysis of a particular sentence. In this study, we first created a Turkish dictionary containing emotion-specific keywords. We then used this dictionary to detect the polarity of tweets collected by querying political keywords right before the Turkish general election in 2015. The tweets were collected based on their relatedness to three main categories: political leaders, ideologies, and political parties. The polarity of these tweets is analyzed in comparison with the election results.

Conference Object | Citation - WoS: 0 | Citation - Scopus: 0
SpEnD Portal: Linked Data Discovery Using SPARQL Endpoints (IEEE, 2017)
Yumusak, Semih; Aras, Riza Emre; Uysal, Elif; Dogdu, Erdogan; Kodaz, Halife; Oztoprak, Kasim; Bilgisayar Mühendisliği
We present SpEnD, a complete SPARQL endpoint discovery and analysis portal. In a previous study, the SPARQL endpoint discovery and analysis steps of the SpEnD system were explained in detail. In the SpEnD portal, SPARQL endpoints are extracted from the web using web crawling techniques, then monitored and analyzed by systematically live-querying the endpoints.
After many sustainability improvements in the SpEnD project, the SpEnD system is now online as a portal. The SpEnD portal currently serves 1,487 SPARQL endpoints, 911 of which are found only by SpEnD when compared with other existing SPARQL endpoint repositories. In this portal, analytic results and content information are shared for every SPARQL endpoint. The endpoints stored in the repository are monitored and updated continuously.

Conference Object | Citation - WoS: 16 | Citation - Scopus: 20
SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines (IEICE - Institute of Electronics, Information and Communication Engineers, 2017)
Yumusak, Semih; Dogdu, Erdogan; Kodaz, Halife; Kamilaris, Andreas; Vandenbussche, Pierre-Yves; Bilgisayar Mühendisliği
Linked data endpoints are online query gateways to semantically annotated linked data sources. To query these data sources, the SPARQL query language is used as a standard. Although a linked data endpoint (i.e., SPARQL endpoint) is a basic web service, it provides a platform for federated online querying and data linking methods. For linked data consumers, SPARQL endpoint availability and discovery are crucial for live querying and semantic information retrieval. Current studies show that the availability of linked datasets is very low, while the locations of linked data endpoints change frequently. There are linked data repositories that collect and list the available linked data endpoints or resources. It is observed that around half of the endpoints listed in existing repositories are not accessible (temporarily or permanently offline). These endpoint URLs are shared through repository websites, such as Datahub.io; however, they are weakly maintained and revised only by their publishers. In this study, a novel meta-crawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system named SPARQL Endpoints Discovery (SpEnD).
SpEnD starts with a "search keyword" discovery process that finds keywords relevant to the linked data domain, and specifically to SPARQL endpoints. The collected search keywords are then used to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). Using this method, most of the SPARQL endpoints currently listed in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. We analyze our findings in detail in comparison to the Datahub collection.
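A discovery pipeline like the one described above ends by checking that each candidate URL actually behaves like a live SPARQL endpoint, since around half of the listed endpoints are reported offline. The sketch below is a minimal, hypothetical illustration of such a liveness probe (the function names and the probe query are assumptions for illustration, not taken from the paper), using only the Python standard library:

```python
# Illustrative sketch, not the authors' implementation: probe a candidate
# URL with a trivial SPARQL ASK query over HTTP GET and treat an HTTP 200
# response as a sign of a live endpoint.
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def is_live_endpoint(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers a SPARQL ASK query with HTTP 200."""
    query = urlencode({"query": "ASK { ?s ?p ?o }"})
    try:
        req = Request(f"{url}?{query}",
                      headers={"Accept": "application/sparql-results+json"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError, ValueError):
        # Malformed URLs, timeouts, and connection errors all count as "down".
        return False


def filter_endpoints(candidates: list[str]) -> list[str]:
    """Keep only the candidate URLs that respond like live endpoints."""
    return [url for url in candidates if is_live_endpoint(url)]
```

In practice a monitor such as the one in the SpEnD portal would run checks like this periodically and record availability over time, rather than testing each URL only once.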