Comparison between k-means and k-medoids Algorithms for a Group-Feature based Sliding Window Clustering

Ju-Yon Yang, Junho Shim

Abstract


Demand for processing large data streams is growing rapidly as the generation and processing of large volumes of data become more common. A variety of large-scale data processing technologies are being developed to meet this demand. One that has drawn particular attention from researchers is data stream clustering with sliding windows, which may create a new set of clusters whenever the window moves. Previous sliding-window data stream clustering techniques exploit coresets, also known as group features, that summarize the data. In this paper, we identify elements of a group-feature based algorithm that can be improved, and propose an algorithm that modifies the clustering algorithm used by the original. We compare the performance of the two algorithms under different parameter values. Finally, we provide guidelines for choosing between the algorithms with regard to the parameter values and their impact on performance.
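As a rough illustration of the setting described above, the following Python sketch summarizes each incoming batch as a group feature, keeps only the most recent summaries in a sliding window, and clusters the summary centroids with a weighted k-means and a weighted k-medoids step. It is not the algorithm evaluated in the paper. The `GroupFeature` layout (count plus linear sum), the batch data, the window size of 4, k = 2, the initialization from the first two summaries, and the simple alternating medoid update are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass
import math


@dataclass
class GroupFeature:
    """Hypothetical summary of one batch: point count and per-dimension linear sum."""
    n: int
    linear_sum: tuple

    @classmethod
    def from_points(cls, pts):
        dims = len(pts[0])
        return cls(len(pts), tuple(sum(p[d] for p in pts) for d in range(dims)))

    def centroid(self):
        return tuple(s / self.n for s in self.linear_sum)


def assign(points, centers):
    """Return, for every point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def weighted_kmeans(points, weights, centers, iters=20):
    """Lloyd iterations where each summary counts with its weight."""
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j, old in enumerate(centers):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == j]
            if not members:
                updated.append(old)  # keep an empty cluster's center unchanged
                continue
            total = sum(w for _, w in members)
            updated.append(tuple(sum(p[d] * w for p, w in members) / total
                                 for d in range(len(old))))
        if updated == centers:
            break
        centers = updated
    return centers


def weighted_kmedoids(points, weights, medoids, iters=20):
    """Alternating update: each medoid is the member with the lowest weighted cost."""
    for _ in range(iters):
        labels = assign(points, medoids)
        updated = []
        for j, old in enumerate(medoids):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == j]
            if not members:
                updated.append(old)
                continue
            updated.append(min((p for p, _ in members),
                               key=lambda c: sum(w * math.dist(c, q) for q, w in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids


# Made-up stream of batches; the window keeps only the 4 newest summaries.
batches = [
    [(0, 0), (0, 2)],      # expires once the fifth batch arrives
    [(1, 1), (1, 1)],
    [(10, 10), (10, 12)],
    [(11, 11)],
    [(20, 10), (20, 12)],  # far from the other summaries
]
window = deque(maxlen=4)
for batch in batches:
    window.append(GroupFeature.from_points(batch))

points = [gf.centroid() for gf in window]
weights = [gf.n for gf in window]
kmeans_centers = weighted_kmeans(points, weights, list(points[:2]))
kmedoid_centers = weighted_kmedoids(points, weights, list(points[:2]))
print("k-means centers:  ", kmeans_centers)
print("k-medoids centers:", kmedoid_centers)
```

In this toy run the distant summary pulls the weighted mean of its cluster toward it, while the medoid stays on one of the summary centroids. This is one commonly cited difference between the two methods; it says nothing about the performance results reported in the paper.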




