TY - JOUR
T1 - (k, ε, δ)-Anonymization
T2 - privacy-preserving data release based on k-anonymity and differential privacy
AU - Tsou, Yao-Tung
AU - Alraja, Mansour Naser
AU - Chen, Li-Sheng
AU - Chang, Yu-Hsiang
AU - Hu, Yung-Li
AU - Huang, Yennun
AU - Yu, Chia-Mu
AU - Tsai, Pei-Yuan
N1 - Funding Information: This work was supported by the Ministry of Science and Technology, Taiwan, under grants MOST 107-2221-E-035-020-MY3 and MOST 109-2221-E-001-019-MY3, and by Academia Sinica under grant AS-KPQ-109-DSTCP. This research was also supported by The Research Council (TRC), Sultanate of Oman (Block Fund-Research Grant).
PY - 2021/9/1
Y1 - 2021/9/1
N2 - The General Data Protection Regulation came into effect on May 25, 2018, and has rapidly become a touchstone model for modern privacy law. It empowers consumers with unprecedented control over the use of their personal information. However, new guarantees of consumer privacy adversely affect data sharing and data application markets because service companies (e.g., Apple, Google, Microsoft) cannot provide immediate and optimized services through analysis of collected consumer experiences. Data de-identification technologies (e.g., k-anonymity and differential privacy) are therefore candidate solutions for protecting the privacy of shared data. Various workarounds based on existing methods such as k-anonymity and differential privacy have been proposed; however, they offer limited data utility, particularly for high-dimensional data sets (the so-called curse of dimensionality). In this paper, we propose the (k, ε, δ)-anonymization synthetic data set generation mechanism ((k, ε, δ)-anonymization for short) to protect data privacy before data sets are released for analysis. Synthetic data sets generated by (k, ε, δ)-anonymization satisfy the definitions of k-anonymity and differential privacy by applying KD-tree and random sampling mechanisms. Moreover, (k, ε, δ)-anonymization uses principal component analysis to reduce high-dimensional data sets to lower-dimensional representations for efficient computation. Finally, we establish the relationships among the parameters k, ε, and δ for k-anonymity and (ε, δ)-differential privacy and estimate the utility of (k, ε, δ)-anonymization synthetic data sets. We report a privacy analysis and a series of experiments demonstrating that (k, ε, δ)-anonymization is feasible and efficient.
AB - The General Data Protection Regulation came into effect on May 25, 2018, and has rapidly become a touchstone model for modern privacy law. It empowers consumers with unprecedented control over the use of their personal information. However, new guarantees of consumer privacy adversely affect data sharing and data application markets because service companies (e.g., Apple, Google, Microsoft) cannot provide immediate and optimized services through analysis of collected consumer experiences. Data de-identification technologies (e.g., k-anonymity and differential privacy) are therefore candidate solutions for protecting the privacy of shared data. Various workarounds based on existing methods such as k-anonymity and differential privacy have been proposed; however, they offer limited data utility, particularly for high-dimensional data sets (the so-called curse of dimensionality). In this paper, we propose the (k, ε, δ)-anonymization synthetic data set generation mechanism ((k, ε, δ)-anonymization for short) to protect data privacy before data sets are released for analysis. Synthetic data sets generated by (k, ε, δ)-anonymization satisfy the definitions of k-anonymity and differential privacy by applying KD-tree and random sampling mechanisms. Moreover, (k, ε, δ)-anonymization uses principal component analysis to reduce high-dimensional data sets to lower-dimensional representations for efficient computation. Finally, we establish the relationships among the parameters k, ε, and δ for k-anonymity and (ε, δ)-differential privacy and estimate the utility of (k, ε, δ)-anonymization synthetic data sets. We report a privacy analysis and a series of experiments demonstrating that (k, ε, δ)-anonymization is feasible and efficient.
KW - Data privacy
KW - Differential privacy
KW - k-anonymity
KW - Synthetic data set
UR - http://www.scopus.com/inward/record.url?scp=85112604234&partnerID=8YFLogxK
U2 - 10.1007/s11761-021-00324-2
DO - 10.1007/s11761-021-00324-2
M3 - Article
AN - SCOPUS:85112604234
SN - 1863-2386
VL - 15
SP - 175
EP - 185
JO - Service Oriented Computing and Applications
JF - Service Oriented Computing and Applications
IS - 3
ER -