Anonymous publication of sensitive transactional data

Handle URI:
http://hdl.handle.net/10754/561590
Title:
Anonymous publication of sensitive transactional data
Authors:
Ghinita, Gabriel; Kalnis, Panos (0000-0002-5060-1360); Tao, Yufei
Abstract:
Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximizing data utility). Existing techniques work well for fixed-schema data with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket data), which involve hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1) reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using real-life data sets, that all our methods clearly outperform the existing state of the art. Among the proposed techniques, NN-search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead. The data transformation based on Gray code sorting performs best in terms of both data utility and execution time. © 2006 IEEE.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program
Publisher:
Institute of Electrical and Electronics Engineers
Journal:
IEEE Transactions on Knowledge and Data Engineering
Issue Date:
Feb-2011
DOI:
10.1109/TKDE.2010.101
Type:
Article
ISSN:
1041-4347
Sponsors:
This paper is an extended version of [1]. The research of Yufei Tao was supported by grants GRF 1202/06, 4161/07, 4173/08, and 4169/09 from the RGC of HKSAR, and a grant with project code 2050395 from CUHK.
Appears in Collections:
Articles; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
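
Illustrative sketch (not taken from the paper): the abstract mentions "Gray encoding-based sorting" as one of the two data transformations. The minimal Python example below shows one way such an ordering could be computed, assuming each transaction is a 0/1 vector over a fixed item order and is read as a reflected Gray codeword. The names gray_rank and gray_sort and the sample rows are hypothetical, and the paper's group-formation heuristic and privacy checks are not reproduced here.

```python
from typing import List, Sequence


def gray_rank(bits: Sequence[int]) -> int:
    """Position of a 0/1 vector in reflected Gray code order.

    Assumption: the first element is the most significant bit.
    The running XOR of the Gray bits yields the binary bits of the rank.
    """
    rank = 0
    acc = 0
    for b in bits:
        acc ^= b                  # prefix XOR: Gray bit -> binary bit
        rank = (rank << 1) | acc  # append the decoded bit
    return rank


def gray_sort(transactions: Sequence[Sequence[int]]) -> List[Sequence[int]]:
    """Order transactions by their Gray code rank."""
    return sorted(transactions, key=gray_rank)


# Hypothetical sample: five transactions over four items.
rows = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
ordered = gray_sort(rows)
```

In this ordering, transactions that are adjacent in the full Gray code sequence differ in a single item, so similar baskets tend to end up near each other. How consecutive rows are then combined into anonymized groups is left out of this sketch.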

Full metadata record

DC Field | Value | Language
dc.contributor.author | Ghinita, Gabriel | en
dc.contributor.author | Kalnis, Panos | en
dc.contributor.author | Tao, Yufei | en
dc.date.accessioned | 2015-08-02T09:14:53Z | en
dc.date.available | 2015-08-02T09:14:53Z | en
dc.date.issued | 2011-02 | en
dc.identifier.issn | 1041-4347 | en
dc.identifier.doi | 10.1109/TKDE.2010.101 | en
dc.identifier.uri | http://hdl.handle.net/10754/561590 | en
dc.description.abstract | Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximizing data utility). Existing techniques work well for fixed-schema data with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket data), which involve hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1) reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using real-life data sets, that all our methods clearly outperform the existing state of the art. Among the proposed techniques, NN-search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead. The data transformation based on Gray code sorting performs best in terms of both data utility and execution time. © 2006 IEEE. | en
dc.description.sponsorship | This paper is an extended version of [1]. The research of Yufei Tao was supported by grants GRF 1202/06, 4161/07, 4173/08, and 4169/09 from the RGC of HKSAR, and a grant with project code 2050395 from CUHK. | en
dc.publisher | Institute of Electrical and Electronics Engineers | en
dc.subject | anonymity | en
dc.subject | Privacy | en
dc.subject | transactional data | en
dc.title | Anonymous publication of sensitive transactional data | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.identifier.journal | IEEE Transactions on Knowledge and Data Engineering | en
dc.contributor.institution | Department of Computer Science, Purdue University, 305 N University St., West Lafayette, IN 47907, United States | en
dc.contributor.institution | Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong | en
kaust.author | Kalnis, Panos | en