Browsing by Author "Doğdu, Erdoğan"
Conference Object: A Deep Neural-Network Based Stock Trading System Based on Evolutionary Optimized Technical Analysis Parameters (Elsevier Science BV, 2017). Authors: Sezer, Omer Berat; Doğdu, Erdoğan; Ozbayoglu, Murat.
In this study, we propose a stock trading system based on optimized technical analysis parameters for creating buy-sell points using genetic algorithms. The model is developed utilizing the Apache Spark big data platform. The optimized parameters are then passed to a deep MLP neural network for buy-sell-hold predictions. Dow 30 stocks are chosen for model validation. Each Dow stock is trained separately using daily close prices between 1996-2016 and tested between 2007-2016. The results indicate that optimizing the technical indicator parameters not only enhances the stock trading performance but also provides a model that might be used as an alternative to Buy and Hold and other standard technical analysis models.

Conference Object: A Discovery and Analysis Engine for Semantic Web (2018). Authors: Doğdu, Erdoğan; Kamilaris, Andreas; Kodaz, Halife; Uysal, Elif; Aras, Riza Emre.
The Semantic Web promotes common data formats and exchange protocols on the web towards better interoperability among systems and machines. Although Semantic Web technologies are being used to semantically annotate data and resources for easier reuse, the ad hoc discovery of these data sources remains an open issue. Popular Semantic Web endpoint repositories such as SPARQLES, the Linking Open Data Project (LOD Cloud), and LODStats do not include recently published datasets and are not updated frequently by the publishers. Hence, there is a need for a web-based dynamic search engine that discovers these endpoints and datasets at frequent intervals. To address this need, a novel web meta-crawling method is proposed for discovering Linked Data sources on the Web. We implemented the method in a prototype system named SPARQL Endpoints Discovery (SpEnD). In this paper, we describe the design and implementation of SpEnD, together with an analysis and evaluation of its operation, in comparison to the aforementioned static endpoint repositories in terms of time performance, availability, and size. Findings indicate that SpEnD outperforms existing Linked Data resource discovery methods.
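As a rough illustration of the evolutionary-optimization idea in the first record above, the sketch below evolves the parameters of a single RSI indicator (period plus buy/sell thresholds) against a synthetic price series. It is a minimal, self-contained toy under stated assumptions: the prices, fitness function, and GA settings are hypothetical stand-ins, and the deep MLP stage and Apache Spark infrastructure of the actual study are omitted.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic daily closing prices, a stand-in for a real Dow 30 series.
prices = 100 * np.cumprod(1 + rng.normal(0.0003, 0.01, 2000))

def rsi(p, period):
    """Simple moving-average RSI; the first `period` values are padded with 50."""
    delta = np.diff(p)
    gain, loss = np.maximum(delta, 0), np.maximum(-delta, 0)
    avg_gain = np.convolve(gain, np.ones(period) / period, mode="valid")
    avg_loss = np.convolve(loss, np.ones(period) / period, mode="valid") + 1e-9
    out = 100 - 100 / (1 + avg_gain / avg_loss)
    return np.concatenate([np.full(len(p) - len(out), 50.0), out])

def backtest(p, period, buy_th, sell_th):
    """Fitness: final equity of an all-in/all-out RSI strategy starting with 1000."""
    r = rsi(p, period)
    cash, shares = 1000.0, 0.0
    for i in range(1, len(p)):
        if r[i] < buy_th and cash > 0:        # oversold -> buy
            shares, cash = cash / p[i], 0.0
        elif r[i] > sell_th and shares > 0:   # overbought -> sell
            cash, shares = shares * p[i], 0.0
    return cash + shares * p[-1]

# Tiny genetic algorithm over (RSI period, buy threshold, sell threshold).
pop = [(int(rng.integers(5, 30)), rng.uniform(10, 40), rng.uniform(60, 90))
       for _ in range(20)]
for _ in range(30):
    pop.sort(key=lambda g: -backtest(prices, *g))   # rank by fitness
    parents, children = pop[:10], []
    for _ in range(10):                             # crossover + mutation
        a, b = rng.choice(10, 2, replace=False)
        children.append((max(2, parents[a][0]),
                         float(np.clip(parents[b][1] + rng.normal(0, 2), 5, 50)),
                         float(np.clip(parents[a][2] + rng.normal(0, 2), 55, 95))))
    pop = parents + children
pop.sort(key=lambda g: -backtest(prices, *g))
print("best (period, buy, sell):", pop[0], "final equity:", round(backtest(prices, *pop[0]), 2))
```

In the paper the signals produced by the optimized parameters feed a deep MLP for buy-sell-hold prediction; here the fitness is simply the backtested equity, which keeps the sketch short.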
Conference Object: An Artificial Neural Network-Based Stock Trading System Using Technical Analysis and Big Data Framework (Association for Computing Machinery, 2017). Authors: Sezer, Omer Berat; Doğdu, Erdoğan; Ozbayoglu, A. Murat.
In this paper, a neural network-based stock price prediction and trading system using technical analysis indicators is presented. The model first converts the financial time series data into a series of buy-sell-hold trigger signals using the most commonly preferred technical analysis indicators. Then, a Multilayer Perceptron (MLP) artificial neural network (ANN) model is trained in the learning stage on the daily stock prices between 1997 and 2007 for all of the Dow30 stocks. The Apache Spark big data framework is used in the training stage. The trained model is then tested with data from 2007 to 2017. The results indicate that by choosing the most appropriate technical indicators, the neural network model can achieve comparable results against the Buy and Hold strategy in most of the cases. Furthermore, fine-tuning the technical indicators and/or the optimization strategy can enhance the overall trading performance.

Conference Object: Classification of Linked Data Sources Using Semantic Scoring (IEICE Institute of Electronics, Information and Communication Engineers, 2018). Authors: Yumusak, Semih; Doğdu, Erdoğan; Kodaz, Halife.
Linked data sets are created using Semantic Web technologies; they are usually large, and the number of such datasets is growing. Query execution is therefore costly, and knowing the content of such datasets should help in targeted querying. Our aim in this paper is to classify linked data sets by their knowledge content. Earlier projects such as LOD Cloud, LODStats, and SPARQLES analyze linked data sources in terms of content, availability, and infrastructure. In these projects, linked data sets are classified and tagged principally using the VoID vocabulary and analyzed according to their content, availability, and infrastructure. Although all linked data sources listed in these projects appear to be classified or tagged, there are a limited number of studies on automated tagging and classification of newly arriving linked data sets. Here, we focus on automated classification of linked data sets using semantic scoring methods. We collected the SPARQL endpoints of 1,328 unique linked datasets from the Datahub, LOD Cloud, LODStats, SPARQLES, and SpEnD projects. We then queried textual descriptions of resources in these data sets using their rdfs:comment and rdfs:label property values. We analyzed these texts with document analysis techniques, treating every SPARQL endpoint as a separate document. In this regard, we used the WordNet semantic relations library combined with an adapted term frequency-inverse document frequency (tf-idf) analysis on the words and their semantic neighbours. From the WordNet database, we extracted information about comment/label objects in linked data sources using the hypernym, hyponym, homonym, meronym, region, topic, and usage semantic relations. We obtained significant results for the hypernym and topic semantic relations: we can find words that identify data sets, and this can be used in automatic classification and tagging of linked data sources. Using these words, we experimented with different classifiers and scoring methods, which yielded better classification accuracy.
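To make the semantic-scoring idea in the record above concrete, here is a minimal sketch that expands document tokens with WordNet hypernym lemmas before a tf-idf classification step. It uses NLTK's WordNet interface and scikit-learn; the example descriptions, topic labels, and the choice of logistic regression are hypothetical placeholders, not the study's actual pipeline.

```python
# Requires: pip install nltk scikit-learn, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def expand_with_hypernyms(text):
    """Append WordNet hypernym lemmas of each token to the document text."""
    tokens = text.lower().split()
    extra = []
    for tok in tokens:
        for syn in wn.synsets(tok)[:1]:          # first sense only, for brevity
            for hyper in syn.hypernyms():
                extra.extend(lemma.name() for lemma in hyper.lemmas())
    return " ".join(tokens + extra)

# Hypothetical rdfs:label / rdfs:comment snippets with hand-picked topic labels.
docs = ["gene protein expression dataset",
        "city population census statistics",
        "drug compound target interaction",
        "municipality district population register"]
labels = ["life-science", "government", "life-science", "government"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([expand_with_hypernyms(d) for d in docs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

query = expand_with_hypernyms("protein target binding data")
print(clf.predict(vectorizer.transform([query])))
```

The hypernym expansion is what lets a description mentioning "protein" share features with one mentioning "gene", which is the intuition behind scoring words together with their semantic neighbours.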
Article: Context-Aware Computing, Learning, and Big Data in Internet of Things: A Survey (IEEE, 2018). Authors: Sezer, Omer Berat; Doğdu, Erdoğan; Ozbayoglu, Ahmet Murat.
The Internet of Things (IoT) has been growing rapidly due to recent advancements in communications and sensor technologies. Meanwhile, with this revolutionary transformation, researchers, implementers, deployers, and users are faced with many challenges. IoT is a complicated, crowded, and complex field; there are various types of devices, protocols, communication channels, architectures, middleware, and more. Standardization efforts are plenty, and this chaos will continue for quite some time. What is clear, on the other hand, is that IoT deployments are increasing with accelerating speed, and this trend will not stop in the near future. As the field grows in numbers and heterogeneity, "intelligence" becomes a focal point in IoT. Since data now becomes "big data," understanding, learning, and reasoning with big data is paramount for the future success of IoT. One of the major problems on the path to intelligent IoT is understanding "context," or making sense of the environment, situation, or status using data from sensors, and then acting accordingly in autonomous ways. This is called "context-aware computing," and it now requires both sensing and, increasingly, learning, as IoT systems gather more data and learn better from this big data. In this survey, we review the field, first from a historical perspective, covering ubiquitous and pervasive computing, ambient intelligence, and wireless sensor networks, and then move to context-aware computing studies. Finally, we review learning and big data studies related to IoT. We also identify the open issues and provide insight into future study areas for IoT researchers.

Conference Object: Improvement of General Inquirer Features with Quantity Analysis (IEEE, 2018). Authors: Karadeniz, Talha; Doğdu, Erdoğan.
General Inquirer is a word-affect association vocabulary with 11,896 entries. It comes with a range of categories, from rectitude to expressiveness. Despite this extensive content, a mapping from "to be or not to be" to "how much" can be beneficial for word representation. In this work, we apply a window-based analysis method to obtain real-valued General Inquirer attributes. The Sentence Completion task is chosen to measure the effectiveness of the operation. After a whitening post-process, the total cosine similarity convention is followed to concentrate on embedding improvement. Results indicate that our quantity-focused variant merits consideration.

Conference Object: Intrusion Detection Using Big Data and Deep Learning Techniques (Association for Computing Machinery, 2019). Authors: Doğdu, Erdoğan.
In this paper, big data and deep learning techniques are integrated to improve the performance of intrusion detection systems. Three classifiers are used to classify network traffic datasets: a deep feed-forward neural network (DNN) and two ensemble techniques, Random Forest and Gradient Boosting Tree (GBT). To select the most relevant attributes from the datasets, we use a homogeneity metric to evaluate features. Two recently published datasets, UNSW-NB15 and CICIDS2017, are used to evaluate the proposed method, with 5-fold cross-validation used to evaluate the machine learning models. We implemented the method using the Apache Spark distributed computing environment, integrated with the Keras deep learning library for the deep learning technique, while the ensemble techniques are implemented using the Apache Spark machine learning library. The results show high accuracy with the DNN for binary and multiclass classification on the UNSW-NB15 dataset, with accuracies of 99.16% for binary classification and 97.01% for multiclass classification. On the CICIDS2017 dataset, the GBT classifier achieved the best binary classification accuracy at 99.99%, while the DNN had the highest multiclass accuracy at 99.56%.
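The intrusion detection record above pairs Spark's ensemble learners with a Keras DNN. The fragment below is a minimal sketch of only the Spark ML part, assuming a hypothetical flows.csv file with numeric flow features and a 0/1 label column; the homogeneity-based feature selection, the Keras DNN, and the cross-validation setup from the paper are omitted.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier, GBTClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("ids-sketch").getOrCreate()
df = spark.read.csv("flows.csv", header=True, inferSchema=True)  # hypothetical input

feature_cols = [c for c in df.columns if c != "label"]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
data = assembler.transform(df).select("features", df["label"].cast("double").alias("label"))
train, test = data.randomSplit([0.8, 0.2], seed=42)

rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
gbt = GBTClassifier(labelCol="label", featuresCol="features", maxIter=50)  # binary labels only

evaluator = MulticlassClassificationEvaluator(labelCol="label", metricName="accuracy")
for name, estimator in [("RandomForest", rf), ("GBT", gbt)]:
    model = estimator.fit(train)
    print(name, "accuracy:", evaluator.evaluate(model.transform(test)))

spark.stop()
```

Spark's GBTClassifier handles binary labels only, which is consistent with the paper reporting GBT results for the binary task and the DNN for the multiclass task.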
Conference Object: Link Prediction in Knowledge Graphs with Numeric Triples Using Clustering (IEEE, 2020). Authors: Doğdu, Erdoğan; Choupani, Roya.
Knowledge graphs (KG) include large amounts of structured data in many different domains. Knowledge or information is captured by entities and the relationships between them in a KG. One of the open problems in the knowledge graph area is "link prediction", that is, predicting new relationships or links between the existing entities in a KG. A recent approach in graph-based learning problems is "graph embedding", in which graphs are represented as low-dimensional vectors; it is then easier to make link predictions using these vector representations. We also use graph embedding for graph representations. A sub-problem of link prediction in KG is link prediction in the presence of literal values, and specifically numeric values, at the receiving end of links. This is a harder problem because the numeric literals take arbitrary values; link prediction models cannot handle such entries, because numeric entities are not embedded in the vector space. There are several studies in this area, but they are all complex approaches. In this study, we propose a novel approach for link prediction in KG in the presence of numerical values. To overcome the embedding problem of numeric values, we cluster the numerical values in a knowledge graph and then use the clusters for link prediction, which enhances the prediction rates. We evaluated our method on a part of the Freebase knowledge graph, which includes entities, relations, and numerical literals. Test results show that a considerable increase in link prediction rate can be achieved in comparison to previous studies.

Conference Object: Malware classification using deep learning methods (Association for Computing Machinery, 2018). Authors: Doğdu, Erdoğan.
Malware, short for malicious software, is growing continuously in numbers and sophistication as our digital world continues to grow. It is a very serious problem, and many efforts are devoted to malware detection in today's cybersecurity world. Many machine learning algorithms have been used for the automatic detection of malware in recent years; most recently, deep learning is being used with better performance. Deep learning models have been shown to work much better in the analysis of long sequences of system calls. In this paper, a shallow deep-learning-based feature extraction method (word2vec) is used for representing any given malware sample based on its opcodes. The Gradient Boosting algorithm is used for the classification task, and k-fold cross-validation is used to validate the model performance without sacrificing a validation split. Evaluation results show up to 96% accuracy with limited sample data.
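The malware classification record above combines word2vec opcode embeddings with gradient boosting and k-fold cross-validation. A compact sketch of that pipeline follows; the opcode sequences and labels are tiny hypothetical stand-ins for real disassembled samples, and mean-pooling the opcode vectors is one simple way (not necessarily the paper's) to turn a sequence into a fixed-length feature vector.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Tiny hypothetical opcode sequences standing in for disassembled binaries.
benign = [["push", "mov", "add", "ret"], ["mov", "cmp", "jne", "ret"],
          ["push", "pop", "mov", "ret"], ["mov", "add", "mov", "ret"],
          ["push", "call", "mov", "ret"]]
malicious = [["xor", "xor", "jmp", "call"], ["xor", "dec", "jmp", "call"],
             ["nop", "xor", "jmp", "int"], ["xor", "inc", "jmp", "int"],
             ["nop", "nop", "xor", "jmp"]]
sequences = benign + malicious
y = np.array([0] * len(benign) + [1] * len(malicious))

# Shallow word2vec embedding of opcodes (gensim >= 4 uses `vector_size`).
w2v = Word2Vec(sentences=sequences, vector_size=16, window=2, min_count=1, sg=1, seed=1)
# One fixed-length vector per sample: mean of its opcode vectors.
X = np.array([np.mean([w2v.wv[op] for op in seq], axis=0) for seq in sequences])

clf = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(clf, X, y, cv=5)      # 5-fold cross-validation
print("mean accuracy:", scores.mean())
```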
Conference Object: MIS-IoT: Modular Intelligent Server Based Internet of Things Framework with Big Data and Machine Learning (IEEE, 2018). Authors: Doğdu, Erdoğan; Sezer, Ömer Berat; Özbayoğlu, Murat.
The Internet of Things world is getting bigger every day, with new developments on all fronts. The new IoT world requires better handling of big data and better usage, with more intelligence integrated in all phases. Here we present the MIS-IoT (Modular Intelligent Server Based Internet of Things Framework with Big Data and Machine Learning) framework, which is "modular" and therefore open to new extensions, "intelligent" by providing machine learning and deep learning methods on the "big data" coming from IoT objects, and "server-based" in a service-oriented way, offering services via standard Web protocols. We present an overview of the design and implementation details of MIS-IoT along with a case study evaluation of the system, showing its intelligence capabilities in anomaly detection over real-time weather data.

Conference Object: Perceptions, Expectations and Implementations of Big Data in Public Sector (IEEE, 2018). Authors: Doğdu, Erdoğan; Özbayoğlu, Murat; Yazıcı, Ali; Karakaya, Ziya.
Big Data is one of the most commonly encountered buzzwords among IT professionals nowadays. Technological advancements in data acquisition, storage, telecommunications, embedded systems, and sensor technologies have resulted in huge inflows of streaming data from a variety of sources, ranging from financial streaming data to social media tweets, and from wearable health gadgets to drone flight logs. The processing and analysis of such data is a difficult task, but as pointed out by many IT experts, it is crucial to have a Big Data implementation plan given today's challenging industry standards. In this study, we performed a survey among IT professionals working in the public sector and tried to address some of their implementation issues, their perception of Big Data today, and their expectations about how the industry will evolve. The results indicate that most of the public sector professionals are aware of the current Big Data requirements, embrace the Big Data challenge, and are optimistic about the future.

Conference Object: Phishing e-mail detection by using deep learning algorithms (Association for Computing Machinery, 2018). Authors: Hassanpour, Reza; Doğdu, Erdoğan; Choupani, Roya; Göker, Onur.
Phishing e-mails are a class of spam e-mails that aim to collect sensitive personal information about users over the network. Since the main purpose of this behavior is mostly to harm users financially, it is vital to detect these phishing or spam e-mails immediately to prevent unauthorized access to users' vital information. Detecting phishing e-mails requires a fast and robust classification method; considering the billions of e-mails on the Internet, this classification process has to be done in a limited time. In this work, we present some early results on the classification of spam e-mail using deep learning and machine learning methods. We utilize word2vec to represent e-mails instead of the popular keyword-based or other rule-based methods. Vector representations are then fed into a neural network to create a learning model. We tested our method on an open dataset and found accuracy levels above 96% with the deep learning classification methods, in comparison to the standard machine learning algorithms.
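The phishing detection record above follows a similar recipe to the malware sketch earlier: word2vec vectors for the text, fed into a neural network. The toy e-mails below are hypothetical, and scikit-learn's MLPClassifier stands in for the paper's unspecified network architecture.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.neural_network import MLPClassifier

# Hypothetical tokenized e-mails; a real study would use a large labelled corpus.
emails = [["verify", "your", "account", "password", "now"],
          ["urgent", "click", "link", "to", "claim", "prize"],
          ["meeting", "agenda", "attached", "for", "monday"],
          ["please", "review", "the", "quarterly", "report"]]
y = [1, 1, 0, 0]   # 1 = phishing, 0 = legitimate

# word2vec representation of the e-mail text, mean-pooled per message.
w2v = Word2Vec(sentences=emails, vector_size=16, window=3, min_count=1, seed=1)
X = np.array([np.mean([w2v.wv[w] for w in mail], axis=0) for mail in emails])

# A small feed-forward network stands in for the paper's (unspecified) architecture.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)
print(net.predict(X))   # sanity check on the toy data only
```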
Conference Object: Sentiment Analysis for the Social Media: A Case Study for Turkish General Elections (Association for Computing Machinery, 2017). Authors: Uysal, Elif; Doğdu, Erdoğan; Yumusak, Semih; Oztoprak, Kasim.
The ideas expressed in social media are not always compliant with natural language rules, and mood and emotion indicators are mostly highlighted by emoticons and emotion-specific keywords. There are language-independent emotion keywords (e.g., love, hate, good, bad); in addition, every language has its own particular emotion-specific keywords. These keywords can be used for polarity analysis of a particular sentence. In this study, we first created a Turkish dictionary containing emotion-specific keywords. We then used this dictionary to detect the polarity of tweets collected by querying political keywords right before the Turkish general election in 2015. The tweets were collected based on their relatedness to three main categories: political leaders, ideologies, and political parties. The polarity of these tweets is analyzed in comparison with the election results.

Conference Object: SpEnD portal: linked data discovery using SPARQL endpoints (IEEE, 2017). Authors: Doğdu, Erdoğan; Aras, Rıza Emre; Uysal, Elif; Kodaz, Halife; Öztoprak, Kasım.
We present the SpEnD project, a complete SPARQL endpoint discovery and analysis portal. In a previous study, the SPARQL endpoint discovery and analysis steps of the SpEnD system were explained in detail. In the SpEnD portal, SPARQL endpoints are extracted from the web using web crawling techniques, then monitored and analyzed by systematically live-querying the endpoints. After many sustainability improvements in the SpEnD project, the SpEnD system is now online as a portal. The SpEnD portal currently serves 1,487 SPARQL endpoints, 911 of which are found only by SpEnD when compared to the other existing SPARQL endpoint repositories. In this portal, the analytic results and content information are shared for every SPARQL endpoint. The endpoints stored in the repository are monitored and updated continuously.

Article: The impact of incapacitation of multiple critical sensor nodes on wireless sensor network lifetime (IEEE, 2017). Authors: Yildiz, Huseyin Ugur; Doğdu, Erdoğan; Tavli, Bulent; Kahjogh, Behnam Ojaghi.
Wireless sensor networks (WSNs) are envisioned to be utilized in many application areas, such as critical infrastructure monitoring, and therefore WSN nodes are potential targets for adversaries. Network lifetime is one of the most important performance indicators in WSNs. The possibility of significantly reducing the network lifetime by eliminating a certain subset of nodes through various attacks creates an opportunity for adversaries to hamper the performance of WSNs with a low risk of detection. However, the extent of the reduction in network lifetime due to the elimination of a group of critical sensor nodes has never been investigated in the literature. Therefore, in this letter, we create two novel algorithms based on a linear programming framework to model and analyze the impact of critical node elimination attacks on WSNs, and we explore the parameter space through numerical evaluations of the algorithms. Our results show that critical node elimination attacks can significantly shorten the network lifetime.
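The sentiment analysis record above relies on a dictionary of emotion-specific keywords to assign polarity to tweets. The snippet below is a deliberately tiny illustration of that idea: the handful of Turkish keywords, the whitespace tokenizer, and the counting rule are hypothetical simplifications of the study's dictionary-based approach.

```python
# Toy dictionary-based polarity scoring; the study's Turkish dictionary is far larger.
POSITIVE = {"iyi", "güzel", "sevgi", "destek", "başarı"}
NEGATIVE = {"kötü", "nefret", "yalan", "kriz", "başarısız"}

def polarity(tweet: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 (neutral) for a tweet."""
    tokens = tweet.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)

tweets = ["parti için güzel bir destek", "bu kriz tam bir yalan", "secim bugun"]
print([polarity(t) for t in tweets])   # -> [1, -1, 0]
```

Aggregating such per-tweet scores over leaders, ideologies, and parties is what allows the comparison with the election results described in the abstract.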
Book Part: Topic distribution constant diameter overlay design algorithm (TD-CD-ODA) (IEEE, 2017). Authors: Doğdu, Erdoğan; Layazali, Sina.
Publish/subscribe communication systems, where nodes subscribe to many different topics of interest, are becoming increasingly common in application domains such as social networks, the Internet of Things, and others. Designing overlay networks that connect the nodes subscribed to each distinct topic is hence a fundamental problem in these systems. For scalability and efficiency, it is important to keep the maximum node degree of the overlay in the publish/subscribe system low. Ideally, one would like not only to keep the maximum node degree of the overlay low but also to ensure that the network has a low diameter. We address this problem by presenting the Topic Distribution Constant Diameter Overlay Design Algorithm (TD-CD-ODA), which achieves a minimal maximum node degree in a low-diameter setting. We have shown experimentally that the algorithm performs well on both targets in comparison to other overlay design algorithms.

Conference Object: Topic Distribution Constant Diameter Overlay Design Algorithm (TD-CD-ODA) (IEEE, 2017). Authors: Layazali, Sina; Doğdu, Erdoğan; Oztoprak, Kasim.
Publish/subscribe communication systems, where nodes subscribe to many different topics of interest, are becoming increasingly common in application domains such as social networks, the Internet of Things, and others. Designing overlay networks that connect the nodes subscribed to each distinct topic is hence a fundamental problem in these systems. For scalability and efficiency, it is important to keep the maximum node degree of the overlay in the publish/subscribe system low. Ideally, one would like not only to keep the maximum node degree of the overlay low but also to ensure that the network has a low diameter. We address this problem by presenting the Topic Distribution Constant Diameter Overlay Design Algorithm (TD-CD-ODA), which achieves a minimal maximum node degree in a low-diameter setting. We have shown experimentally that the algorithm performs well on both targets in comparison to other overlay design algorithms.

Conference Object: Weather data analysis and sensor fault detection using an extended IoT framework with semantics, big data, and machine learning (IEEE, 2017). Authors: Doğdu, Erdoğan; Özbayoğlu, Murat; Önal, Aras Can.
In recent years, big data and Internet of Things (IoT) implementations have started getting more attention. Researchers have focused on developing big data analytics solutions using machine learning models. Machine learning is a rising trend in this field due to its ability to extract hidden features and patterns, even in highly complex datasets. In this study, we used our Big Data IoT Framework in a weather data analysis use case. We implemented weather clustering and sensor anomaly detection using a publicly available dataset. We provide the implementation details of each framework layer (acquisition, ETL, data processing, learning, and decision) for this particular use case. The chosen learning model within the library is Scikit-Learn-based k-means clustering. The data analysis results indicate that it is possible to extract meaningful information from a relatively complex dataset using our framework.
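The weather analysis record above mentions Scikit-Learn k-means clustering together with sensor anomaly detection. The sketch below shows one simple way to combine the two: cluster the readings, then flag samples unusually far from their own cluster centre. The synthetic temperature/humidity data, the injected faults, and the distance-quantile heuristic are illustrative assumptions, not the framework's actual detection logic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical sensor readings: temperature (°C) and relative humidity (%).
X = np.column_stack([rng.normal(20, 5, 500), rng.normal(60, 10, 500)])
faulty = rng.choice(500, 10, replace=False)
X[faulty, 0] += 15            # inject implausibly high temperature readings

Xs = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xs)

# Distance of each sample to its own cluster centre; unusually distant samples
# are flagged as candidate sensor faults (a simple heuristic, for illustration).
dist = np.linalg.norm(Xs - km.cluster_centers_[km.labels_], axis=1)
flagged = np.where(dist > np.quantile(dist, 0.98))[0]
print("injected faults:", np.sort(faulty))
print("flagged samples:", flagged)
```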