Online Sparse Representation Clustering for Evolving Data Streams

Jie Chen, Shengxiang Yang*, Conor Fahy, Zhu Wang, Yinan Guo, Yingke Chen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Data stream clustering can be performed to discover the patterns underlying continuously arriving sequences of data. A number of data stream clustering algorithms for finding clusters of arbitrary shape and handling outliers, such as density-based clustering algorithms, have been proposed. However, because these algorithms construct and merge microclusters by measuring the Euclidean distances between high-dimensional data objects, they are often limited in their ability to, e.g., transfer valuable knowledge from historical landmark windows to the current landmark window and adaptively exploit evolving subspace structures. We propose an online sparse representation clustering (OSRC) method to learn an affinity matrix for evaluating the relationships among high-dimensional data objects in evolving data streams. We first introduce a low-dimensional projection (LDP) into sparse representation to adaptively reduce the potential negative influence associated with the noise and redundancy contained in high-dimensional data. Then, we take advantage of the ℓ2,1-norm optimization technique to choose the appropriate number of representative data objects and form a specific dictionary for sparse representation. The specific dictionary is integrated into sparse representation to adaptively exploit the evolving subspace structures of the high-dimensional data objects. Moreover, the data object representatives from the current landmark window can transfer valuable knowledge to the next landmark window. The experimental results based on a synthetic dataset and six benchmark datasets validate the effectiveness of the proposed method compared to that of state-of-the-art methods for data stream clustering.
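
As a minimal illustration of the ℓ2,1-norm mentioned in the abstract (not the OSRC optimization itself, whose details are not given here), the sketch below computes the norm as the sum of the Euclidean norms of a matrix's rows and lists the rows that are not entirely zero. It assumes the common row-wise convention; some papers define the norm over columns instead. The function names `l21_norm` and `select_representatives`, the tolerance `tol`, and the toy matrix `C` are hypothetical and chosen only for this example.

```python
import numpy as np

def l21_norm(C):
    """Sum of the Euclidean norms of the rows of C (row-wise convention, assumed)."""
    return float(np.sum(np.linalg.norm(C, axis=1)))

def select_representatives(C, tol=1e-8):
    """Indices of rows of C whose Euclidean norm exceeds tol (hypothetical helper)."""
    row_norms = np.linalg.norm(C, axis=1)
    return np.flatnonzero(row_norms > tol)

# Toy coefficient matrix: rows 0 and 2 carry weight, rows 1 and 3 are all zero.
C = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

print(l21_norm(C))                # 5 + 0 + 2 + 0 = 7.0
print(select_representatives(C))  # [0 2]
```

Penalizing this norm tends to drive whole rows to zero, which is why it is often used to pick a small set of representative data objects: the surviving nonzero rows identify the candidates that are kept.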

Original language: English
Pages (from-to): 525-539
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 36
Issue number: 1
Early online date: 27 Oct 2023
Publication status: Published - Jan 2025

Keywords

  • Clustering
  • Clustering algorithms
  • Data models
  • Dictionaries
  • Heuristic algorithms
  • Optimization
  • Sparse matrices
  • Streams
  • data stream
  • high-dimensional data
  • sparse representation
  • subspace structure
