Browsing by Author "Dogdu, Erdogan"
Now showing 1 - 18 of 18
Conference Object | Citation - Scopus: 2
A Discovery and Analysis Engine for Semantic Web (Assoc Computing Machinery, 2018)
Kamilaris, Andreas; Dogdu, Erdogan; Kodaz, Halife; Uysal, Elif; Aras, Riza Emre; Yumusak, Semih
The Semantic Web promotes common data formats and exchange protocols on the web towards better interoperability among systems and machines. Although Semantic Web technologies are being used to semantically annotate data and resources for easier reuse, the ad hoc discovery of these data sources remains an open issue. Popular Semantic Web endpoint repositories such as SPARQLES, the Linking Open Data Project (LOD Cloud), and LODStats do not include recently published datasets and are not updated frequently by their publishers. Hence, there is a need for a web-based dynamic search engine that discovers these endpoints and datasets at frequent intervals. To address this need, a novel web meta-crawling method is proposed for discovering Linked Data sources on the Web. We implemented the method in a prototype system named SPARQL Endpoints Discovery (SpEnD). In this paper, we describe the design and implementation of SpEnD, together with an analysis and evaluation of its operation, in comparison to the aforementioned static endpoint repositories in terms of time performance, availability, and size. Findings indicate that SpEnD outperforms existing Linked Data resource discovery methods.

Conference Object
Perceptions, Expectations and Implementations of Big Data in Public Sector (Ieee, 2018)
Ozbayoglu, Murat; Yazici, Ali; Karakaya, Ziya; Dogdu, Erdogan
Big Data is one of the most commonly encountered buzzwords among IT professionals nowadays. Technological advancements in data acquisition, storage, telecommunications, embedded systems, and sensor technologies have resulted in huge inflows of streaming data coming from a variety of sources, ranging from financial streaming data to social media tweets, or wearable health gadgets to drone flight logs.
The processing and analysis of such data is a difficult task, but as pointed out by many IT experts, it is crucial to have a Big Data implementation plan given today's challenging industry standards. In this study, we conducted a survey among IT professionals working in the public sector and tried to address some of their implementation issues, their perception of Big Data today, and their expectations about how the industry will evolve. The results indicate that most public sector professionals are aware of the current Big Data requirements, embrace the Big Data challenge, and are optimistic about the future.

Conference Object | Citation - WoS: 40 | Citation - Scopus: 77
Malware Classification Using Deep Learning Methods (Assoc Computing Machinery, 2018)
Dogdu, Erdogan; Cakir, Bugra
Malware, short for malicious software, is growing continuously in numbers and sophistication as our digital world continues to grow. It is a very serious problem, and many efforts are devoted to malware detection in today's cybersecurity world. Many machine learning algorithms have been used for the automatic detection of malware in recent years. Most recently, deep learning is being used with better performance. Deep learning models are shown to work much better in the analysis of long sequences of system calls. In this paper, a shallow deep learning-based feature extraction method (word2vec) is used for representing any given malware based on its opcodes. The Gradient Boosting algorithm is used for the classification task. Then, k-fold cross-validation is used to validate the model performance without sacrificing a validation split.
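As an illustration of the pipeline this abstract describes (opcode embeddings fed to gradient boosting, validated with k-fold cross-validation), here is a minimal sketch using scikit-learn. The fixed random opcode vectors stand in for learned word2vec embeddings, and all samples are synthetic toy data, not the paper's corpus.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Illustrative stand-in for word2vec: a fixed random vector per opcode.
# (The paper learns these embeddings from real disassembly; these are toy values.)
rng = np.random.default_rng(42)
OPCODES = ["mov", "push", "pop", "call", "xor", "jmp", "add", "ret"]
vectors = {op: rng.normal(size=8) for op in OPCODES}

def embed(opcode_seq):
    # Represent one binary as the mean of its opcode vectors.
    return np.mean([vectors[op] for op in opcode_seq], axis=0)

# Synthetic corpus: 10 "benign-like" and 10 "malware-like" opcode sequences.
benign = [rng.choice(["mov", "push", "call", "ret"], size=30).tolist() for _ in range(10)]
malicious = [rng.choice(["xor", "jmp", "pop", "add"], size=30).tolist() for _ in range(10)]
X = np.array([embed(s) for s in benign + malicious])
y = np.array([0] * 10 + [1] * 10)

# Gradient boosting validated with k-fold cross-validation, as in the abstract.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=4)
print(f"mean 4-fold accuracy: {scores.mean():.2f}")
```

A real reproduction would instead train word2vec (e.g., gensim's implementation) on disassembled opcode sequences and embed each binary from those learned vectors.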
Evaluation results show up to 96% accuracy with limited sample data.

Article | Citation - WoS: 288 | Citation - Scopus: 368
Context-Aware Computing, Learning, and Big Data in Internet of Things: a Survey (Ieee-inst Electrical Electronics Engineers inc, 2018)
Dogdu, Erdogan; Ozbayoglu, Ahmet Murat; Sezer, Omer Berat
Internet of Things (IoT) has been growing rapidly due to recent advancements in communications and sensor technologies. Meanwhile, with this revolutionary transformation, researchers, implementers, deployers, and users are faced with many challenges. IoT is a complicated, crowded, and complex field; there are various types of devices, protocols, communication channels, architectures, middleware, and more. Standardization efforts are plenty, and this chaos will continue for quite some time. What is clear, on the other hand, is that IoT deployments are increasing with accelerating speed, and this trend will not stop in the near future. As the field grows in numbers and heterogeneity, "intelligence" becomes a focal point in IoT. Since data now becomes "big data," understanding, learning, and reasoning with big data is paramount for the future success of IoT. One of the major problems on the path to intelligent IoT is understanding "context," or making sense of the environment, situation, or status using data from sensors, and then acting accordingly in autonomous ways. This is called "context-aware computing," and it now requires both sensing and, increasingly, learning, as IoT systems get more data and better learning from this big data. In this survey, we review the field, first, from a historical perspective, covering ubiquitous and pervasive computing, ambient intelligence, and wireless sensor networks, and then move to context-aware computing studies. Finally, we review learning and big data studies related to IoT.
We also identify the open issues and provide insight into future study areas for IoT researchers.

Conference Object | Citation - WoS: 7
Phishing E-Mail Detection by Using Deep Learning Algorithms (Assoc Computing Machinery, 2018)
Hassanpour, Reza; Dogdu, Erdogan; Choupani, Roya; Goker, Onur; Nazli, Nazli

Conference Object
Classification of Linked Data Sources Using Semantic Scoring (Ieice-inst Electronics information Communication Engineers, 2018)
Dogdu, Erdogan; Kodaz, Halife; Yumusak, Semih
Linked data sets are created using semantic Web technologies; they are usually big, and the number of such datasets is growing. Query execution on them is therefore costly, and knowing the content of such datasets should help in targeted querying. Our aim in this paper is to classify linked data sets by their knowledge content. Earlier projects such as LOD Cloud, LODStats, and SPARQLES analyze linked data sources in terms of content, availability, and infrastructure. In these projects, linked data sets are classified and tagged principally using the VoID vocabulary and analyzed according to their content, availability, and infrastructure. Although all linked data sources listed in these projects appear to be classified or tagged, there are a limited number of studies on the automated tagging and classification of newly arriving linked data sets. Here, we focus on the automated classification of linked data sets using semantic scoring methods. We have collected the SPARQL endpoints of 1,328 unique linked datasets from the Datahub, LOD Cloud, LODStats, SPARQLES, and SpEnD projects. We have then queried textual descriptions of resources in these data sets using their rdfs:comment and rdfs:label property values. We analyzed these texts using document analysis techniques, treating every SPARQL endpoint as a separate document.
In this regard, we have used the WordNet semantic relations library combined with an adapted term frequency-inverse document frequency (tf-idf) analysis on the words and their semantic neighbours. From the WordNet database, we have extracted information about comment/label objects in linked data sources using the hypernym, hyponym, homonym, meronym, region, topic, and usage semantic relations. We obtained some significant results on the hypernym and topic semantic relations; we can find words that identify data sets, and this can be used in the automatic classification and tagging of linked data sources. Using these words, we experimented with different classifiers and different scoring methods, which yields better classification accuracy.

Conference Object | Citation - WoS: 56 | Citation - Scopus: 89
A Deep Neural-Network Based Stock Trading System Based on Evolutionary Optimized Technical Analysis Parameters (Elsevier Science Bv, 2017)
Ozbayoglu, Murat; Dogdu, Erdogan; Sezer, Omer Berat
In this study, we propose a stock trading system based on optimized technical analysis parameters for creating buy-sell points using genetic algorithms. The model is developed utilizing the Apache Spark big data platform. The optimized parameters are then passed to a deep MLP neural network for buy-sell-hold predictions. Dow 30 stocks are chosen for model validation. Each Dow stock is trained separately using daily close prices between 1996-2016 and tested between 2007-2016. The results indicate that optimizing the technical indicator parameters not only enhances the stock trading performance but also provides a model that might be used as an alternative to Buy and Hold and other standard technical analysis models. (c) 2017 The Authors.
Published by Elsevier B.V.

Conference Object | Citation - WoS: 1
Topic Distribution Constant Diameter Overlay Design Algorithm (TD-CD-ODA) (Ieee, 2017)
Oztoprak, Kasim; Dogdu, Erdogan; Layazali, Sina
Publish/subscribe communication systems, where nodes subscribe to many different topics of interest, are becoming increasingly common in application domains such as social networks, the Internet of Things, etc. Designing overlay networks that connect the nodes subscribed to each distinct topic is hence a fundamental problem in these systems. For scalability and efficiency, it is important to keep the maximum node degree of the overlay in the publish/subscribe system low. Ideally, one would like not only to keep the maximum node degree of the overlay low, but also to ensure that the network has low diameter. We address this problem by presenting the Topic Distribution Constant Diameter Overlay Design Algorithm (TD-CD-ODA), which achieves a minimal maximum node degree in a low-diameter setting. We have shown experimentally that the algorithm performs well on both targets in comparison to other overlay design algorithms.

Article | Citation - WoS: 16 | Citation - Scopus: 18
The Impact of Incapacitation of Multiple Critical Sensor Nodes on Wireless Sensor Network Lifetime (Ieee-inst Electrical Electronics Engineers inc, 2017)
Tavli, Bulent; Kahjogh, Behnam Ojaghi; Dogdu, Erdogan; Yildiz, Huseyin Ugur
Wireless sensor networks (WSNs) are envisioned to be utilized in many application areas, such as critical infrastructure monitoring; therefore, WSN nodes are potential targets for adversaries. Network lifetime is one of the most important performance indicators in WSNs. The possibility of reducing the network lifetime significantly by eliminating a certain subset of nodes through various attacks creates the opportunity for adversaries to hamper the performance of WSNs with a low risk of detection.
However, the extent of the reduction in network lifetime due to the elimination of a group of critical sensor nodes has never been investigated in the literature. Therefore, in this letter, we create two novel algorithms based on a linear programming framework to model and analyze the impact of critical node elimination attacks on WSNs, and explore the parameter space through numerical evaluations of the algorithms. Our results show that critical node elimination attacks can significantly shorten the network lifetime.

Conference Object | Citation - WoS: 9 | Citation - Scopus: 20
Multi-Label Classification of Text Documents Using Deep Learning (Ieee, 2020)
Mohammed, Hamza Haruna; Dogdu, Erdogan; Gorur, Abdul Kadir; Choupani, Roya
Recently, studies in the field of Natural Language Processing and its related applications continue to mount up. Machine learning is proven to be predominantly data-driven in the sense that generic model building methods are used and then tailored to specific application domains. Needless to say, this has proven to be a very effective approach in modeling the complicated data dependencies we frequently experience in practice, making very few assumptions and allowing the data to speak for themselves. Examples of these applications can be found in chemical process engineering, climate science, healthcare, and linguistic processing systems for natural languages, to name a few. Text classification is one of the important machine learning tasks used in many digital applications today, such as document filtering, search engines, document management systems, and many more. Text classification is the process of categorizing text documents into a given set of labels. Furthermore, multi-label text classification is the task of categorizing text documents into one or more labels simultaneously.
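To make the multi-label setting concrete, here is a minimal sketch using scikit-learn's MultiLabelBinarizer with a one-vs-rest logistic regression over bag-of-words features — a classic baseline, not one of the deep models compared in the paper; the documents and labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy corpus: each document can carry one or more labels simultaneously.
docs = [
    "stock market trading prices",
    "deep learning neural network training",
    "stock prediction with neural network models",
    "weather sensor temperature data",
    "sensor anomaly detection with deep learning",
]
labels = [
    {"finance"},
    {"ml"},
    {"finance", "ml"},      # two labels at once
    {"iot"},
    {"iot", "ml"},
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)              # one binary column per label
X = CountVectorizer().fit_transform(docs)  # bag-of-words (BoW) features

# One independent binary classifier per label allows any subset of labels.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
pred = clf.predict(X)
print(mlb.classes_, pred.shape)
```

The deep learning approaches the paper compares replace the BoW features and per-label linear models with embedding layers and a shared network body, but the label encoding idea is the same.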
Over the years, many methods for classifying text documents have been proposed, including the popularly known bag of words (BoW) method, support vector machines (SVM), tree induction, and label-vector embedding, to mention a few. These kinds of tools can be used in many digital applications, such as document filtering, search engines, document management systems, etc. Lately, deep learning-based approaches are getting more attention, especially in the extreme multi-label text classification case. Deep learning has proven to be one of the major solutions to many machine learning applications, especially those involving high-dimensional and unstructured data. However, it is of paramount importance in many applications to be able to reason accurately about the uncertainties associated with the predictions of the models. In this paper, we explore and compare recent deep learning-based methods for multi-label text classification. We investigate two scenarios: first, a multi-label classification model with an ordinary embedding layer, and second, one with GloVe, word2vec, and FastText as pre-trained embedding corpora for the given models. We evaluated the performance of these different neural network models in terms of multi-label evaluation metrics for the two approaches, and compare the results with previous studies.

Conference Object | Citation - WoS: 139 | Citation - Scopus: 210
Intrusion Detection Using Big Data and Deep Learning Techniques (Assoc Computing Machinery, 2019)
Dogdu, Erdogan; Faker, Osama
In this paper, Big Data and deep learning techniques are integrated to improve the performance of intrusion detection systems. Three classifiers are used to classify network traffic datasets: a Deep Feed-Forward Neural Network (DNN) and two ensemble techniques, Random Forest and Gradient Boosting Tree (GBT). To select the most relevant attributes from the datasets, we use a homogeneity metric to evaluate features.
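The abstract does not define its homogeneity metric; one simple reading, sketched below purely as an assumption, scores each feature by the share of rows holding its most frequent value, so that near-constant columns (which carry little discriminative information) can be dropped before training.

```python
import numpy as np

def homogeneity(col):
    # Fraction of rows holding the single most frequent value of this feature.
    _, counts = np.unique(col, return_counts=True)
    return counts.max() / len(col)

# Toy traffic matrix: column 1 is almost constant and would be dropped.
X = np.array([
    [1, 0, 5],
    [2, 0, 6],
    [3, 0, 7],
    [4, 1, 8],
])
scores = [float(homogeneity(X[:, j])) for j in range(X.shape[1])]
keep = [j for j, s in enumerate(scores) if s < 0.75]  # threshold is illustrative
print(scores, keep)
```

Any threshold and the exact scoring formula are hypothetical here; the paper's actual feature-selection procedure may differ.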
Two recently published datasets, UNSW NB15 and CICIDS2017, are used to evaluate the proposed method. 5-fold cross-validation is used in this work to evaluate the machine learning models. We implemented the method using the distributed computing environment Apache Spark, integrated with the Keras Deep Learning Library to implement the deep learning technique, while the ensemble techniques are implemented using the Apache Spark Machine Learning Library. The results show high accuracy with DNN for binary and multiclass classification on the UNSW NB15 dataset, with accuracies of 99.16% for binary classification and 97.01% for multiclass classification. The GBT classifier achieved the best accuracy for binary classification with the CICIDS2017 dataset at 99.99%, while for multiclass classification DNN has the highest accuracy with 99.56%.

Conference Object | Citation - WoS: 34 | Citation - Scopus: 58
Weather Data Analysis and Sensor Fault Detection Using an Extended Iot Framework With Semantics, Big Data, and Machine Learning (Ieee, 2017)
Sezer, Omer Berat; Ozbayoglu, Murat; Dogdu, Erdogan; Onal, Aras Can
In recent years, big data and Internet of Things (IoT) implementations started getting more attention. Researchers focused on developing big data analytics solutions using machine learning models. Machine learning is a rising trend in this field due to its ability to extract hidden features and patterns, even in highly complex datasets. In this study, we used our Big Data IoT Framework in a weather data analysis use case. We implemented weather clustering and sensor anomaly detection using a publicly available dataset. We provide the implementation details of each framework layer (acquisition, ETL, data processing, learning, and decision) for this particular use case. Our chosen learning model within the library is Scikit-Learn based k-means clustering.
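The learning step named here (Scikit-Learn based k-means clustering) can be sketched as follows on toy weather readings; the two-regime data and the distance threshold are invented for illustration, not the study's dataset or parameters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy weather readings: (temperature in C, relative humidity in %).
readings = np.array([
    [30.0, 20.0], [31.0, 22.0], [29.5, 18.0],   # hot and dry regime
    [5.0, 85.0], [4.0, 90.0], [6.0, 80.0],      # cold and humid regime
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(readings)

def is_anomalous(point, threshold=15.0):
    # A reading far from its nearest cluster centre is a suspect sensor value.
    point = np.asarray(point, dtype=float)
    centre = km.cluster_centers_[km.predict([point])[0]]
    return float(np.linalg.norm(point - centre)) > threshold

print(is_anomalous([30.0, 21.0]))   # near the hot/dry centre
print(is_anomalous([30.0, 88.0]))   # hot AND humid: fits neither regime
```

The clustering gives a compact model of "normal" weather regimes; distance-to-centre then serves as a simple sensor-fault signal.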
The data analysis results indicate that it is possible to extract meaningful information from a relatively complex dataset using our framework.

Conference Object | Citation - WoS: 16 | Citation - Scopus: 21
SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines (Ieice-inst Electronics information Communication Engineers, 2017)
Yumusak, Semih; Dogdu, Erdogan; Kodaz, Halife; Kamilaris, Andreas; Vandenbussche, Pierre-Yves
Linked data endpoints are online query gateways to semantically annotated linked data sources. In order to query these data sources, the SPARQL query language is used as a standard. Although a linked data endpoint (i.e., SPARQL endpoint) is a basic Web service, it provides a platform for federated online querying and data linking methods. For linked data consumers, SPARQL endpoint availability and discovery are crucial for live querying and semantic information retrieval. Current studies show that the availability of linked datasets is very low, while the locations of linked data endpoints change frequently. There are linked data repositories that collect and list the available linked data endpoints or resources. It is observed that around half of the endpoints listed in existing repositories are not accessible (temporarily or permanently offline). These endpoint URLs are shared through repository websites, such as Datahub.io; however, they are weakly maintained and revised only by their publishers. In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword" discovery process for finding relevant keywords for the linked data domain, and specifically for SPARQL endpoints. Then, the collected search keywords are utilized to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex).
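Reproducing the meta-crawling itself would require live queries to the named search engines, so this sketch only illustrates the extraction step: pulling candidate SPARQL endpoint URLs out of an already-fetched result page. The HTML fragment and the "/sparql path" heuristic are assumptions for illustration, not SpEnD's actual rules.

```python
import re

# A hypothetical search-result fragment; in SpEnD such pages come from
# meta-crawling search engines (Google, Bing, Yahoo, Yandex) with discovered keywords.
page = """
<a href="http://dbpedia.org/sparql">DBpedia SPARQL endpoint</a>
<a href="https://query.wikidata.org/sparql">Wikidata Query Service</a>
<a href="http://example.org/about">About us</a>
"""

# Heuristic: candidate SPARQL endpoints are URLs whose path ends in /sparql.
ENDPOINT_RE = re.compile(r'href="(https?://[^"]+/sparql)"')
endpoints = ENDPOINT_RE.findall(page)
print(endpoints)
```

In the full system, each candidate URL would then be probed with a trivial SPARQL query (e.g., `ASK {}`) to confirm it is a live endpoint before being added to the repository.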
By using this method, most of the currently listed SPARQL endpoints in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. We analyze our findings in detail, in comparison to the Datahub collection.

Conference Object
SpEnD Portal: Linked Data Discovery Using SPARQL Endpoints (Ieee, 2017)
Yumusak, Semih; Aras, Riza Emre; Uysal, Elif; Dogdu, Erdogan; Kodaz, Halife; Oztoprak, Kasim
We present the project SpEnD, a complete SPARQL endpoint discovery and analysis portal. In a previous study, the SPARQL endpoint discovery and analysis steps of the SpEnD system were explained in detail. In the SpEnD portal, the SPARQL endpoints are extracted from the web using web crawling techniques, and monitored and analyzed by systematically live-querying the endpoints. After many sustainability improvements in the SpEnD project, the SpEnD system is now online as a portal. The SpEnD portal currently serves 1487 SPARQL endpoints, out of which 911 endpoints are uniquely found by SpEnD alone when compared to the other existing SPARQL endpoint repositories. In this portal, the analytic results and the content information are shared for every SPARQL endpoint. The endpoints stored in the repository are monitored and updated continuously.

Conference Object | Citation - WoS: 30 | Citation - Scopus: 51
An Artificial Neural Network-Based Stock Trading System Using Technical Analysis and Big Data Framework (Assoc Computing Machinery, 2017)
Ozbayoglu, A. Murat; Dogdu, Erdogan; Sezer, Omer Berat
In this paper, a neural network-based stock price prediction and trading system using technical analysis indicators is presented. The model developed first converts the financial time series data into a series of buy-sell-hold trigger signals using the most commonly preferred technical analysis indicators.
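The signal-generation step described here can be sketched with one of the most common technical indicators, RSI. The 30/70 thresholds below are the conventional textbook defaults, not parameters used or optimized in the paper, and the price series are toy data.

```python
def rsi(prices, period=14):
    # Relative Strength Index over the trailing window (simple-average form).
    gains, losses = [], []
    for prev, cur in zip(prices, prices[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    avg_gain = sum(gains[-period:]) / period
    avg_loss = sum(losses[-period:]) / period
    if avg_loss == 0:
        return 100.0          # no losses in the window: maximally overbought
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def signal(prices, period=14, low=30.0, high=70.0):
    # Classic RSI rule: oversold -> buy, overbought -> sell, otherwise hold.
    value = rsi(prices, period)
    if value < low:
        return "buy"
    if value > high:
        return "sell"
    return "hold"

falling = [110, 108, 107, 105, 103, 100]   # steady losses: RSI near 0
rising = [100, 103, 105, 107, 108, 110]    # steady gains: RSI near 100
print(signal(falling, period=5), signal(rising, period=5))
```

In the paper's pipeline, such per-day signals (from several indicators) become the training targets/inputs for the MLP rather than being traded directly.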
Then, a Multilayer Perceptron (MLP) artificial neural network (ANN) model is trained in the learning stage on the daily stock prices between 1997 and 2007 for all of the Dow 30 stocks. The Apache Spark big data framework is used in the training stage. The trained model is then tested with data from 2007 to 2017. The results indicate that by choosing the most appropriate technical indicators, the neural network model can achieve comparable results against the Buy and Hold strategy in most of the cases. Furthermore, fine-tuning the technical indicators and/or the optimization strategy can enhance the overall trading performance.

Conference Object | Citation - WoS: 8 | Citation - Scopus: 11
Sentiment Analysis for the Social Media: a Case Study for Turkish General Elections (Assoc Computing Machinery, 2017)
Yumusak, Semih; Oztoprak, Kasim; Dogdu, Erdogan; Uysal, Elif
The ideas expressed in social media are not always compliant with natural language rules, and the mood and emotion indicators are mostly highlighted by emoticons and emotion-specific keywords. There are language-independent emotion keywords (e.g., love, hate, good, bad), and besides these, every language has its own particular emotion-specific keywords. These keywords can be used for polarity analysis of a particular sentence. In this study, we first created a Turkish dictionary containing emotion-specific keywords. Then, we used this dictionary to detect the polarity of tweets that were collected by querying political keywords right before the Turkish general election in 2015. The tweets were collected based on their relatedness to three main categories: political leaders, ideologies, and political parties. The polarity of these tweets is analyzed in comparison with the election results.

Conference Object | Citation - WoS: 3 | Citation - Scopus: 4
Improvement of General Inquirer Features With Quantity Analysis (Ieee, 2018)
Karadeniz, Talha; Dogdu, Erdogan
General Inquirer is a word-affect association vocabulary with 11896 entries.
Ranging from rectitude to expressiveness, it comes with a variety of categories. Despite the extensive content, a mapping from binary membership ("to be or not to be") to graded values ("how much") can be beneficial for word representation. In this work, we apply a window-based analysis method to obtain real-valued General Inquirer attributes. The Sentence Completion task is chosen to measure the effectiveness of the operation. After a whitening post-process, the total cosine similarity convention is followed to concentrate on embedding improvement. Results indicate that our quantity-focused variant is competitive.

Conference Object | Citation - WoS: 4 | Citation - Scopus: 12
MIS-IoT: Modular Intelligent Server Based Internet of Things Framework With Big Data and Machine Learning (Ieee, 2018)
Sezer, Omer Berat; Ozbayoglu, Murat; Dogdu, Erdogan; Onal, Aras Can
The Internet of Things world is getting bigger every day with new developments on all fronts. The new IoT world requires better handling of big data and better usage, with more intelligence integrated in all phases. Here we present the MIS-IoT (Modular Intelligent Server Based Internet of Things Framework with Big Data and Machine Learning) framework, which is "modular" and therefore open to new extensions, "intelligent" by providing machine learning and deep learning methods on "big data" coming from IoT objects, and "server-based" in a service-oriented way by offering services via standard Web protocols. We present an overview of the design and implementation details of MIS-IoT, along with a case study evaluation of the system, showing the intelligence capabilities in anomaly detection over real-time weather data.
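As a hint of how anomaly detection over streaming sensor data can work, here is a minimal rolling z-score detector. MIS-IoT's actual detection method is not specified in this abstract, so this is a generic illustrative sketch with invented data, not the framework's implementation.

```python
from collections import deque
from statistics import mean, stdev

class ZScoreDetector:
    """Flag readings far from the recent rolling mean (simple z-score rule)."""

    def __init__(self, window=20, threshold=3.0):
        self.buf = deque(maxlen=window)   # sliding window of recent readings
        self.threshold = threshold

    def observe(self, value):
        anomalous = False
        if len(self.buf) >= 3:            # need a few points for a stable stdev
            mu, sigma = mean(self.buf), stdev(self.buf)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        self.buf.append(value)
        return anomalous

# Toy temperature stream with one spike at 45.0.
det = ZScoreDetector(window=10, threshold=3.0)
stream = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 45.0, 20.2]
flags = [det.observe(v) for v in stream]
print(flags)
```

Only the 45.0 spike is flagged; the window and threshold are tuning knobs a deployment would set per sensor.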

