Big Data Reduction and Visualization Using the K-Means Algorithm

Authors: Akyol, Hakan; Kızılduman, Hale Sema; Dökeroğlu, Tansel
Publisher: My University
Year: 2022
Keywords: Big Data; Data Reduction; Visualization; K-Means
Record date: 2024-02-14
Language: en
Type: Article
Citation: Akyol, H.; Kızılduman, H.S.; Dökeroğlu, T. (2022). "Big Data Reduction and Visualization Using the K-Means Algorithm", Ankara Science University, Researcher, Vol. 2, No. 1, pp. 40-45.
ISSN: 2717-9494
URI: https://hdl.handle.net/20.500.12416/7194
DOI: 10.55185/researcher.1135824
Rights: info:eu-repo/semantics/openAccess

Abstract

Huge amounts of data are produced every day. Even with high-performance processing approaches, efficiently visualizing data at this scale (up to terabytes) remains a major difficulty. In this study, we use the well-known K-means clustering method as a data reduction strategy that keeps the visual quality of the given large dataset as high as possible. The centroids of the dataset are used to display the distribution properties of the data in a straightforward manner. Our data comes from a recent Kaggle big data set (Click Through Rate). We display the reduced datasets using box plots and compare them with the box plots of the original data. The results show that K-means is an effective strategy for reducing large datasets so that the original data can be viewed without sacrificing the quality of its distribution information.
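
The reduction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data is synthetic rather than the Click Through Rate set, the single-feature K-means routine `kmeans_1d`, the helper `five_number_summary`, and the values of `k` and `iters` are all assumptions made for illustration, and the record does not say whether the centroids were weighted by cluster size before plotting. The sketch replaces 2000 values with 20 centroids and compares the statistics a basic box plot is built from.

```python
import random
import statistics


def kmeans_1d(values, k, iters=10, seed=0):
    """Lloyd's algorithm on one numeric column (illustrative helper).

    Returns the k centroids and the cluster sizes from the final assignment pass.
    """
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # initialise from k random data points
    counts = [0] * k
    for _ in range(iters):
        sums = [0.0] * k
        counts = [0] * k
        for v in values:
            # assign the value to its nearest centroid
            j = min(range(k), key=lambda c: abs(v - centroids[c]))
            sums[j] += v
            counts[j] += 1
        # move each centroid to the mean of its members; keep it if the cluster is empty
        centroids = [sums[j] / counts[j] if counts[j] else centroids[j] for j in range(k)]
    return centroids, counts


def five_number_summary(data):
    """Minimum, quartiles, and maximum: the basis of a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)


# Synthetic stand-in for one numeric feature (assumption: not the paper's data).
rng = random.Random(42)
original = [rng.gauss(50.0, 10.0) for _ in range(2000)]

k = 20  # hypothetical reduction target: 2000 rows -> 20 centroids
centroids, counts = kmeans_1d(original, k)

print("original:", [round(x, 2) for x in five_number_summary(original)])
print("reduced: ", [round(x, 2) for x in five_number_summary(centroids)])
```

Printing the two summaries side by side gives a rough numerical version of the box plot comparison. Because each centroid is the mean of its cluster, the centroids always lie within the range of the original values, so the reduced summary tends to understate the extremes; how closely the quartiles match depends on `k` and on the shape of the data.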