Classification of Linked Data Sources Using Semantic Scoring

Yumusak, Semih; Dogdu, Erdogan; Kodaz, Halife

Files

Doğdu, Erdoğan.pdf (5.95 MB)

Date

2018

Authors

Yumusak, Semih

Dogdu, Erdogan

Kodaz, Halife

Publisher

IEICE - Institute of Electronics, Information and Communication Engineers

Organizational Units

Organizational Unit

Computer Engineering

The main goal of our department is to educate graduates who are respected in their professions; who can develop and implement solutions to comprehensive problems in professional life, individually and in teams, with professional responsibility and ethical awareness and by adapting quickly to technological change; who can carry out academic and advanced research and development in computer science and engineering; and who, with an innovative and entrepreneurial vision, can contribute to developing new technologies and improving existing ones at the national and international level.

Abstract

Linked data sets are created using Semantic Web technologies; they are usually large, and their number is growing. Query execution is therefore costly, and knowing the content of such data sets should help in targeted querying. Our aim in this paper is to classify linked data sets by their knowledge content. Earlier projects such as LOD Cloud, LODStats, and SPARQLES analyze linked data sources in terms of content, availability, and infrastructure. In these projects, linked data sets are classified and tagged principally using the VoID vocabulary. Although all linked data sources listed in these projects appear to be classified or tagged, there are only a limited number of studies on automated tagging and classification of newly arriving linked data sets. Here, we focus on automated classification of linked data sets using semantic scoring methods. We collected the SPARQL endpoints of 1,328 unique linked data sets from the Datahub, LOD Cloud, LODStats, SPARQLES, and SpEnD projects. We then queried the textual descriptions of resources in these data sets using their rdfs:comment and rdfs:label property values. We analyzed these texts with document analysis techniques, treating every SPARQL endpoint as a separate document. To this end, we used the WordNet semantic relations library combined with an adapted term frequency-inverse document frequency (tf-idf) analysis on the words and their semantic neighbours. Using the WordNet database, we extracted information about comment/label objects in linked data sources through the hypernym, hyponym, homonym, meronym, region, topic, and usage semantic relations. We obtained significant results for the hypernym and topic semantic relations: we can find words that identify data sets, and these can be used in automatic classification and tagging of linked data sources. Using these words, we experimented with different classifiers and different scoring methods, which resulted in better classification accuracy.
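The scoring idea in the abstract, treating each SPARQL endpoint as one document and weighting its label/comment words by tf-idf, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the query text, the function names, and the toy hypernym map (a hand-written stand-in for WordNet lookups) are hypothetical, and the formula is the standard tf-idf, not the adapted variant the authors describe.

```python
import math
import re
from collections import Counter

# Illustrative query text only; this sketch never contacts an endpoint.
LABEL_COMMENT_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?text WHERE {
  ?resource ?property ?text .
  FILTER (?property IN (rdfs:label, rdfs:comment))
}
LIMIT 1000
"""


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_with_hypernyms(tokens, hypernyms):
    """Append the hypernym of each token that has one in the given mapping.

    The mapping is a hypothetical stand-in for a WordNet lookup.
    """
    return tokens + [hypernyms[t] for t in tokens if t in hypernyms]


def tf_idf(documents):
    """Return standard tf-idf scores; documents maps a name to a token list."""
    counts = {name: Counter(tokens) for name, tokens in documents.items()}
    n_docs = len(counts)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    scores = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values())
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
    return scores


if __name__ == "__main__":
    # Made-up texts standing in for label/comment values of two endpoints.
    toy_hypernyms = {"protein": "molecule", "river": "stream"}
    endpoint_texts = {
        "endpoint_a": "protein data gene protein",
        "endpoint_b": "river data city city",
    }
    documents = {
        name: expand_with_hypernyms(tokenize(text), toy_hypernyms)
        for name, text in endpoint_texts.items()
    }
    for name, term_scores in tf_idf(documents).items():
        best = max(term_scores, key=term_scores.get)
        print(name, best)
```

Words that occur in every endpoint receive a score of zero, so the highest-scoring words are those that set one endpoint apart from the rest, which is what makes them candidates for tagging.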

Description

Kodaz, Halife/0000-0001-8602-4262; Yumusak, Semih/0000-0002-8878-4991; Dogdu, Erdogan/0000-0001-5987-0164

ORCID

Kodaz, Halife

Yumusak, Semih

Dogdu, Erdogan

Keywords

Linked Data, Semantic Classification, WordNet

Citation

Yumusak, Semih; Dogdu, Erdogan; Kodaz, Halife, "Classification of Linked Data Sources Using Semantic Scoring", IEICE Transactions on Information and Systems, Vol. E101D, No. 1, pp. 99-107, (2018).

WoS Q

Q4

Scopus Q

Q4

Source

15th International Semantic Web Conference (ISWC) -- OCT 17-21, 2016 -- Kobe, JAPAN

Volume

E101D

Issue

1

Start Page

99

End Page

107

URI

https://doi.org/10.1587/transinf.2017SWP0011

Collections

WoS Indexed Publications Collection
Computer Engineering Department Publication Collection
Scopus Indexed Publications Collection
