Optimal Anonymization for Transaction Publishing
Abstract
Most work on privacy-preserving data publishing has focused on anonymizing relational data. Only a few studies have addressed transaction data, and all of them are heuristic and provide no guarantee on the optimality of data utility. This paper presents an optimal algorithm that first mines the most general privacy threats in the transaction data and then finds an optimal generalization solution that eliminates all of them. Several novel techniques are proposed, including an inverse lexicographic tree with strong pruning techniques for mining privacy threats, and a cut enumeration tree with a cost-based pruning technique for searching for the optimal solution. Experiments on real-world databases show that our algorithm outperforms prior algorithms in both data utility and efficiency.
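To make the notion of mining privacy threats concrete, the following is a minimal brute-force sketch, not the algorithm of this paper. It assumes a k^m-anonymity style definition that the abstract does not state: a threat is an itemset of at most m items that occurs in at least one but fewer than k transactions. It also treats "most general" as "minimal under set inclusion" and ignores any item taxonomy. The names `support`, `minimal_threats`, `k`, and `m` are hypothetical, and the exhaustive enumeration merely stands in for the inverse lexicographic tree and its pruning.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def minimal_threats(transactions, k, m):
    """Return minimal itemsets of size <= m with 0 < support < k.

    Assumed threat definition (k^m-anonymity style); exhaustive search,
    smallest itemsets first, so every kept threat is minimal.
    """
    items = sorted(set().union(*transactions))
    threats = []
    for size in range(1, m + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # A superset of a known threat is not minimal, so skip it.
            if any(t <= candidate for t in threats):
                continue
            if 0 < support(candidate, transactions) < k:
                threats.append(candidate)
    return threats


# Toy data: four transactions over three items.
transactions = [
    frozenset({"beer", "milk"}),
    frozenset({"beer", "milk"}),
    frozenset({"beer", "wine"}),
    frozenset({"milk", "wine"}),
]
threats = minimal_threats(transactions, k=2, m=2)
```

On this toy data with k = 2 and m = 2, every single item appears in at least two transactions, so the only threats found are the pairs {beer, wine} and {milk, wine}, each of which occurs exactly once. An anonymization step would then have to generalize items until no such itemset remains.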