TY - JOUR
T1 - Online Sparse Representation Clustering for Evolving Data Streams
AU - Chen, Jie
AU - Yang, Shengxiang
AU - Fahy, Conor
AU - Wang, Zhu
AU - Guo, Yinan
AU - Chen, Yingke
N1 - Funding information: This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61303015 and Grant 61973305, in part by the 111 Project under Grant B21014, in part by the Beijing Natural Science Foundation under Grant IS23066, and in part by the Royal Society International Exchanges 2020 Cost Share under Grant IEC\NSFC\201085.
PY - 2025/1
Y1 - 2025/1
N2 - Data stream clustering can be performed to discover the patterns underlying continuously arriving sequences of data. A number of data stream clustering algorithms for finding clusters in arbitrary shapes and handling outliers, such as density-based clustering algorithms, have been proposed. However, these algorithms are often limited in their ability to construct and merge microclusters by measuring the Euclidean distances between high-dimensional data objects, e.g., transferring valuable knowledge from historical landmark windows to the current landmark window, and exploiting evolving subspace structures adaptively. We propose an online sparse representation clustering (OSRC) method to learn an affinity matrix for evaluating the relationships among high-dimensional data objects in evolving data streams. We first introduce a low-dimensional projection (LDP) into sparse representation to adaptively reduce the potential negative influence associated with the noise and redundancy contained in high-dimensional data. Then, we take advantage of the l2,1-norm optimization technique to choose the appropriate number of representative data objects and form a specific dictionary for sparse representation. The specific dictionary is integrated into sparse representation to adaptively exploit the evolving subspace structures of the high-dimensional data objects. Moreover, the data object representatives from the current landmark window can transfer valuable knowledge to the next landmark window. The experimental results based on a synthetic dataset and six benchmark datasets validate the effectiveness of the proposed method compared to that of state-of-the-art methods for data stream clustering.
AB - Data stream clustering can be performed to discover the patterns underlying continuously arriving sequences of data. A number of data stream clustering algorithms for finding clusters in arbitrary shapes and handling outliers, such as density-based clustering algorithms, have been proposed. However, these algorithms are often limited in their ability to construct and merge microclusters by measuring the Euclidean distances between high-dimensional data objects, e.g., transferring valuable knowledge from historical landmark windows to the current landmark window, and exploiting evolving subspace structures adaptively. We propose an online sparse representation clustering (OSRC) method to learn an affinity matrix for evaluating the relationships among high-dimensional data objects in evolving data streams. We first introduce a low-dimensional projection (LDP) into sparse representation to adaptively reduce the potential negative influence associated with the noise and redundancy contained in high-dimensional data. Then, we take advantage of the l2,1-norm optimization technique to choose the appropriate number of representative data objects and form a specific dictionary for sparse representation. The specific dictionary is integrated into sparse representation to adaptively exploit the evolving subspace structures of the high-dimensional data objects. Moreover, the data object representatives from the current landmark window can transfer valuable knowledge to the next landmark window. The experimental results based on a synthetic dataset and six benchmark datasets validate the effectiveness of the proposed method compared to that of state-of-the-art methods for data stream clustering.
KW - Clustering
KW - Clustering algorithms
KW - Data models
KW - Dictionaries
KW - Heuristic algorithms
KW - Optimization
KW - Sparse matrices
KW - Streams
KW - data stream
KW - high-dimensional data
KW - sparse representation
KW - subspace structure
UR - http://www.scopus.com/inward/record.url?scp=85181799114&partnerID=8YFLogxK
U2 - 10.1109/tnnls.2023.3325556
DO - 10.1109/tnnls.2023.3325556
M3 - Article
SN - 2162-237X
VL - 36
SP - 525
EP - 539
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 1
ER -