Yuanyuan Wang, Xing Zhang, Zhiguang Chu, et al., “An enhanced clustering-based (k, t)-anonymity algorithm for graphs,” Chinese Journal of Electronics, vol. 34, no. 1, pp. 365–372, 2025. DOI: 10.23919/cje.2023.00.276

An Enhanced Clustering-Based (k, t)-Anonymity Algorithm for Graphs

More Information
  • Author Bio:

    Wang Yuanyuan: Yuanyuan Wang was born in Zhoukou, China. She is currently pursuing the M.S. degree in computer science with Liaoning University of Technology, Jinzhou, China. Her research interest focuses on privacy protection of graph data. (Email: wyyuand@163.com)

    Zhang Xing: Xing Zhang received the Ph.D. degree in computer science and technology from Beijing University of Technology, Beijing, China. Currently, he is a Professor with Liaoning University of Technology, Jinzhou, China. He is also a professional member of China Computer Federation. His research interests include network architecture and protocols, information security, and related areas. (Email: dr.zhangxing@gmail.com)

    Chu Zhiguang: Zhiguang Chu was born in Yingkou, China. He is currently pursuing the Ph.D. degree with Beijing University of Technology, Beijing, China. He is also a Senior Engineer with Liaoning University of Technology, Jinzhou, China. His research interests include data analysis and privacy protection. (Email: chuzg@lnut.edu.cn)

    Shi Wei: Wei Shi received the M.S. degree from Liaoning University of Technology, Jinzhou, China. Currently, she is a Laboratory Technician with Liaoning University of Technology, Jinzhou, China. Her research interests include information security and privacy protection. (Email: shiwei@lnut.edu.cn)

    Li Xiang: Xiang Li received the Ph.D. degree in computer and information science from Temple University, Philadelphia, PA, USA. He is currently employed as a Lecturer with Liaoning University of Technology, Jinzhou, China. His research interests include general artificial intelligence, cognitive science, and artificial intelligence emotions. (Email: xiangliagi@lnut.edu.cn)

  • Corresponding author:

    Zhang Xing, Email: dr.zhangxing@gmail.com

  • Received Date: September 27, 2023
  • Accepted Date: January 11, 2024
  • Available Online: March 31, 2024
  • Published Date: March 31, 2024
  • Abstract: As people become increasingly reliant on the Internet, securely storing and publishing private data has become an important issue. In real life, the release of graph data can lead to privacy breaches, which is a highly challenging problem. Although current research has addressed the issue of identity disclosure, two challenges remain: first, privacy protection for large-scale datasets is not yet comprehensive; second, it is difficult to simultaneously protect the privacy of nodes, edges, and attributes in social networks. To address these issues, this paper proposes a (k, t)-graph anonymity algorithm based on enhanced clustering. The algorithm uses k-means++ clustering for k-anonymity and t-closeness to improve k-anonymity. We evaluate the privacy and efficiency of this method on two datasets and achieve good results. This research is of great significance for addressing the problem of privacy breaches that may arise from the publication of graph data.

  • Kakao Talk is the most influential chat tool in Republic of Korea, known as the “national chat tool”. According to official data, Kakao Talk has more than 47 million users in Republic of Korea, accounting for over 90% of the total population, which means that Koreans highly rely on Kakao Talk for socializing and work. In Tencent’s Q3 2022 financial report, the combined monthly active users of WeChat and Weixin reached 1.309 billion, a year-on-year increase of 3.7% and a month-on-month increase of 0.8%. The monthly active accounts of QQ mobile terminals reached 574 million, a year-on-year increase of 0.1% and a month-on-month increase of 1.0%. The rapid development of these social networking applications has brought about increasingly severe privacy breaches.

    According to Cable News Network, personal data of over 200 million Twitter users was leaked in 2023. Hackers recently released the personal data and email addresses of Twitter users for free on a forum. This massive data leak can cause anonymous Twitter accounts to become invalid, exposing the true identities of account holders, and hackers can use stolen Twitter accounts to invade other social platforms. Among the exposed 200 million user data, there are many highly influential users who may face hacker extortion.

    In this article, we aim to design a privacy protection algorithm suitable for social networks that meets the needs of large-scale sensitive attribute data. Currently, k-anonymity is widely applied in privacy protection of graph data. Sweeney [1] proposed the k-anonymity privacy protection model in 2002, which uses quasi-identifiers to counter link attacks. k-anonymity is achieved through generalization and suppression: data are described more generally and abstractly, and certain data items are withheld, so that the published data are less precise. This makes it difficult for observers to link records through quasi-identifiers, since at least k records share the same quasi-identifier values. However, k-anonymity cannot defend against homogeneity attacks and background knowledge attacks, so several refinements have been proposed. In 2007, reference [2] proposed a new criterion, l-diversity, whose core idea is to require that the sensitive attribute values in each group be well represented. However, l-diversity is difficult to achieve when the proportions of sensitive attribute values are severely imbalanced, and it does not consider the overall distribution of sensitive attributes, so it may leak more privacy under skewness attacks. In the same year, reference [3] proposed t-closeness to address the shortcomings of k-anonymity and l-diversity: it requires the distribution of a sensitive attribute in any equivalence class to be close to the attribute's distribution in the overall table. t-closeness, measured with the Earth mover's distance (EMD), can be incorporated into the general framework of privacy algorithms. Although t-closeness can prevent attribute disclosure, it cannot prevent identity disclosure, so it may be necessary to use k-anonymity and t-closeness together.

    We propose a (k, t)-graph anonymity algorithm based on enhanced clustering, which combines k-anonymity and t-closeness techniques to achieve privacy protection. Our design approach is to first use the k-means++ clustering algorithm to find the cluster centroids, balance the distribution of nodes in the cluster by computing the distance matrix and sorting it, and generalize node attributes to satisfy the privacy protection definition of k-anonymity. Finally, we use t-closeness privacy protection to protect the privacy of sensitive attributes.

    Our method is based on clustering and utilizes k-anonymity and t-closeness techniques for privacy protection. The design approach is as follows:

    Firstly, we use the k-means++ clustering algorithm to find the centroids for clustering. By computing the distance matrix and sorting it, the distribution of nodes in the cluster can be balanced, achieving privacy protection. During the clustering process, we assign nodes to the nearest centroid based on their attribute values, achieving clustering and grouping of the data.

    Secondly, to satisfy the privacy protection definition of k-anonymity, we generalize node attributes to hide individual identity information. Specifically, we generalize attribute values into a range, such as generalizing age from an exact value to a range of age groups. In this way, the attribute values of each node are hidden within its range, protecting individual privacy.
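    As a minimal sketch of this kind of attribute generalization (the ten-year bucket width here is a hypothetical choice, not one prescribed by the paper):

```python
def generalize_age(age, width=10):
    """Generalize an exact age into a fixed-width bracket, e.g. 37 -> '30-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

ages = [23, 37, 41, 45]
print([generalize_age(a) for a in ages])  # ['20-29', '30-39', '40-49', '40-49']
```

    After generalization, every node whose age falls in the same bracket becomes indistinguishable on that attribute, which is exactly what the equivalence groups of k-anonymity require.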

    Finally, to protect the privacy of sensitive attributes, we use t-closeness privacy protection. For each sensitive attribute, we calculate its distance to the target data record and divide it based on the distance threshold t. By evaluating the information value of the dataset, we can determine if the dataset meets the requirements of the t-closeness privacy protection model, thus protecting the privacy of sensitive data.

    By combining k-anonymity and t-closeness techniques, our method can protect the privacy of the dataset and improve the information utilization value of the dataset. Additionally, our method can be extended to large-scale datasets and multi-attribute datasets, providing an effective solution for data privacy protection.

    Our method has several main contributions in data privacy protection as follows:

    1) We use the k-means++ clustering algorithm and combine the concept of distance matrix to optimize the selection of centroids and balance the number of nodes in the clusters. Through this method, we improve the utilization of information and ensure that all users enjoy equal privacy protection.

    2) We comprehensively use k-anonymity and t-closeness techniques to meet the anonymity and privacy protection requirements of the dataset simultaneously. Through this method, we improve the security and reliability of data privacy protection.

    3) We hide individual identity information by gener-alizing node attributes while retaining the statistical characteristics of the data, taking into account both data privacy protection and analysis utilization. This method can effectively protect user privacy while ensuring that the analysis utilization of the data is not affected.

    4) Our method is suitable for large-scale datasets and multi-attribute datasets, providing an effective solution for data privacy protection. This method has good scalability and adaptability and can be applied to various types of datasets.

    5) Finally, we conduct experimental verification, the results show that our algorithm can effectively achieve privacy protection of large-scale graph data, prevent privacy leakage of structure and attributes, and improve the efficiency of the algorithm while ensuring privacy protection. This indicates that our method has practical relevance and can offer an effective solution for the field of data privacy protection.

    Privacy risks can be divided into identity disclosure, relationship disclosure, and content disclosure. To protect data in social networks, a large number of research papers have proposed privacy protection mechanisms, including differential privacy, graph modification techniques, clustering, and anonymization techniques, among others. Differential privacy is achieved by designing random algorithms that virtually do not affect the algorithm output regardless of whether a particular data point is present in the dataset or not. In 2016, a new workflow for differential privacy was proposed for graph topology in [4], but the study was limited to differential privacy processing of topology structure and did not address privacy protection of node attribute data. In 2018, a new differential privacy algorithm was proposed in [5], which reduced the quadratic error and could match the optimal non-privacy algorithm in some cases, but efficiency optimization remains unclear and requires further improvement. In 2020, a random matrix method for online social network (OSN) data publishing was proposed in [6], but the method was limited to processing static graph data, and further research is needed for dynamic social network data. In 2020, a novel AsgLDP framework was proposed in [7] for collecting and generating attribute social graphs in a decentralized manner under local differential privacy (LDP), but the study was limited to attribute graph processing and had limitations for other types of social network data. In 2021, a blowfish privacy (BP) mechanism was proposed in [8] to provide privacy for sequentially published graphs, but the method only applies to subgraphs and requires further experimentation. 
In 2022, a community detection method based on the local differential privacy model (LDPCD) was provided in [9] for local differential privacy protection, but the method only applies to local community detection, and further research is needed for global community detection.

    Differential privacy and k-anonymity are two different privacy protection methods. Differential privacy resists differential attacks by adding noise, while k-anonymity achieves privacy protection through de-identification. In 2012, reference [10] proposed a formal framework describing the regular and irregular cases of k-anonymous graphs and proved that n-confusion is the extension of k-anonymity. However, the study was limited to theoretical analysis and did not consider the overall distribution of sensitive attributes. In 2014, a k-core model was proposed in [11], and a new structural network anonymization technique called (k, δ)-core anonymity was introduced, which achieved privacy protection on graph structure but did not consider attribute privacy. In 2016, a privacy measurement method called (k, l)-anonymity was proposed in [12] and its resistance to active attacks was measured in real-life social graphs. In 2018, reference [13] improved the maximum common subgraph detection algorithm, maximum common subgraph detection verification (MPD-V), and proposed a new K+-isomorphism method, which reduced the anonymization cost, but effectiveness and efficiency still need to be improved. In 2019, a new privacy attribute was proposed in [14] to address the high cost of using (k, l)-anonymity as a privacy target in perturbation-based anonymization techniques. In 2020, a graph anonymization method satisfying edge-based (k, l)-anonymity was proposed in [15], but efficiency remains to be improved. In 2023, an original graph was converted into a so-called kt-secure graph by k-anonymity and t-closeness to protect the privacy of graph structure and vertex attributes in [16], but there are still issues with its application to large-scale datasets. In conclusion, although these studies focus on social network privacy protection, there is still a lack of a perfect solution to protect large-scale graph data and sensitive attributes.

    A data table satisfies k-anonymity if each record in any equivalence class is indistinguishable from at least k−1 other records in terms of quasi-identifier attributes. Therefore, the probability of correct identification under the k-anonymity model is at most 1/k.

    Definition 1 (Attribute graph) An attribute graph G can be represented as a quadruple G = (V, E, Λ, F), where V denotes the set of N nodes, E = {(v_i, v_j) | 1 ≤ i, j ≤ N, i ≠ j} denotes the set of edges, and Λ = {a_1, a_2, …, a_L} denotes the set of L categorical attributes of the nodes in V. The attributes of each node v_i ∈ V can be described by an attribute vector γ(v_i) = [a_1(v_i), a_2(v_i), …, a_L(v_i)] of length L, where a_j(v_i) denotes the observation of node v_i on attribute a_j. F = {f_1, f_2, …, f_L} denotes the set of L functions, where f_i : V → dom(a_i) maps each node v ∈ V to the value of attribute a_i in its domain dom(a_i).

    Definition 2 (k-anonymity [17]) The k-anonymity technique requires that any equivalence group contain at least k records, so that when an attacker performs a link attack, the result is associated with at least k−1 other records in the equivalence group.

    Let T denote a dataset table and QI a quasi-identifier of the dataset. For a given k, T is said to satisfy k-anonymity if the following condition holds:

    For any record r_1 ∈ T, there exist at least k−1 other records r_2, r_3, …, r_k ∈ T such that QI(r_1) = QI(r_2) = ⋯ = QI(r_k), where QI(r) denotes the projection of record r onto the quasi-identifier columns.
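    The condition above can be checked mechanically: group the records by their quasi-identifier values and verify that every group has at least k members. A minimal sketch (the table contents are illustrative only):

```python
from collections import Counter

def satisfies_k_anonymity(records, qi_indices, k):
    """True iff every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[i] for i in qi_indices) for r in records)
    return all(c >= k for c in counts.values())

table = [
    ("30-39", "F", "flu"),
    ("30-39", "F", "cold"),
    ("40-49", "M", "flu"),
]
# The ('40-49', 'M') group has only one record, so 2-anonymity fails.
print(satisfies_k_anonymity(table, qi_indices=(0, 1), k=2))  # False
```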

    Definition 3 (t-closeness [3]) An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in the class and the attribute's distribution in the whole table does not exceed a threshold t. A table has t-closeness if all of its equivalence classes have t-closeness.

    Definition 4 (SSA) SSA represents the sum of squares for the effect of Factor A, which is the sum of the squared differences between the mean of each level of Factor A and the overall mean of the data.

    The k-means++ algorithm is an optimized version of k-means clustering (KMC). In the initial stage, it selects one initial cluster center uniformly at random from the input data points. Then, for each point x in the dataset, the squared distance to the nearest already-selected center is calculated and used as that point's weight. In each iteration, the next center is sampled with probability proportional to these weights; each iteration requires computing O(n×k) distances. Eventually, k centers are selected and used as the initialization centroids for the standard k-means algorithm.
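    The seeding step described above can be sketched as follows (a simplified illustration using Euclidean squared distances; the function and variable names are our own, not from the paper):

```python
import random

def kmeans_pp_centers(points, k, seed=0):
    """k-means++ seeding: the first center is chosen uniformly at random;
    each subsequent center is sampled with probability proportional to the
    squared distance to the nearest center already chosen."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # squared distance of each point to its nearest selected center
        d2 = [min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
              for x in points]
        total = sum(d2)
        if total == 0:  # all points coincide with existing centers
            centers.append(rng.choice(points))
            continue
        r = rng.random() * total  # weighted sampling by d2
        acc = 0.0
        for x, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(x)
                break
    return centers

pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 9.9)]
print(kmeans_pp_centers(pts, 2))
```

    These k centers then initialize the standard k-means iterations, which reduces the chance of a poor local optimum compared to purely random initialization.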

    In graph data privacy preservation, the distance between sensitive attributes and global distribution is usually measured by calculating the EMD between them. By comparing the distance between sensitive attributes and global distribution, the effectiveness of privacy preserving algorithms in protecting sensitive attributes can be evaluated, where the EMD is calculated by

    \mathrm{EMD}(P,Q)=\frac{1}{m-1}\sum_{i=1}^{m}\left|\sum_{j=1}^{i}(p_j-q_j)\right| (1)

    where P = (p_1, …, p_m) and Q = (q_1, …, q_m) are the two distributions over the m ordered attribute values.
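    Under equation (1), the EMD over an ordered attribute reduces to an average of absolute cumulative differences, which is straightforward to compute (a sketch; the function name is our own):

```python
def ordered_emd(p, q):
    """EMD between two distributions over an ordered attribute with m
    values, per equation (1): 1/(m-1) * sum_i |sum_{j<=i} (p_j - q_j)|."""
    m = len(p)
    total, cum = 0.0, 0.0
    for pj, qj in zip(p, q):
        cum += pj - qj      # running sum of p_j - q_j
        total += abs(cum)
    return total / (m - 1)

# identical distributions are at distance 0; fully shifted mass costs 1
print(ordered_emd([0.5, 0.3, 0.2], [0.5, 0.3, 0.2]))  # 0.0
print(ordered_emd([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # 1.0
```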

    This paper proposes a (k, t)-graph anonymity algorithm based on enhanced clustering, named ECKT, and the specific process framework is shown in Figure 1. The algorithm aims to partition data into multiple clusters and perform attribute generalization on each cluster to ensure compliance with k-anonymity and t-closeness constraints. The algorithm is applicable in scenarios where data needs to be published, such as healthcare and financial services.

    Figure  1.  Framework of privacy-preserving model based on the ECKT algorithm.

    Specifically, Algorithm 1 uses the k-means++ algorithm to select the initial centroids for clustering and runs the standard k-means algorithm to partition the data into c clusters. To ensure balance among the users in each cluster and reduce the number of clusters that do not meet the k-user requirement, we calculate the distance between each user and each centroid and sort them to generate an increasing distance matrix. We loop through the matrix to ensure that each cluster contains at least k users. For clusters that do not meet the k-user requirement, we merge them to ensure that each cluster satisfies the k-anonymity constraint. When merging clusters, we select the clusters to merge and ensure that the merged cluster does not exceed 2k users. This process is repeated until all clusters satisfy the k-anonymity constraint.

    Algorithm 1 ECKT algorithm
    Input:
     N: Set of users;
     A: Set of attributes;
     k: k-anonymity level;
     t: t-closeness level.
    Output:
     C: Clusters ensuring the k-anonymity and t-closeness.
    1: C ← kMeansPlusPlus(N, p);
    2: for each i ∈ [1, N] do
    3:  for each j ∈ [1, C] do
    4:   Euclidean(u_i, u_j) ← manhattan_distance(u_i, u_j);
    5:  end for
    6: end for
    7: t ← 1;
    8: for each i ∈ [1, N] do
    9:  for each j ∈ [1, C] do
    10:   D(t, 1) ← Euclidean(c_i, c_j);
    11:   D(t, 2) ← i;
    12:   D(t, 3) ← j;
    13:   t ← t + 1;
    14:  end for
    15: end for
    16: D.sort(key = lambda x: x[0]);
    17: assigned ← n × 1 zero matrix;
    18: C ← p × 1 array of empty lists;
    19: for each m ∈ [D] do
    20:  u ← D(m, 2);
    21:  c ← D(m, 3);
    22:  if assigned(u) = 0 AND length(C_c) ≤ k then
    23:   C_c.append(u);
    24:   assigned(u) ← 1;
    25:  end if
    26: end for
    27: C ← generalizeAttributes(D, C);
    28: for each i ∈ [1, len(emd)] do
    29:  if emd(i, 2) < t then
    30:   L_1 ← join(emd(i, 1));
    31:  else
    32:   L_2 ← join(emd(i, 1));
    33:  end if
    34: end for
    35: C_i ← append(L_1, L_2).
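    The balanced assignment step of the algorithm can be sketched as follows: all (user, centroid) pairs are sorted by distance, and each user is greedily placed in its nearest cluster that still has room. This is an illustrative simplification, not the authors' exact implementation; the capacity bound is a parameter here:

```python
def balanced_assign(dist, cap):
    """Greedy capacity-bounded assignment over a sorted distance matrix.

    dist[u][c] is the distance from user u to centroid c; each cluster
    accepts at most `cap` users."""
    pairs = sorted((d, u, c)
                   for u, row in enumerate(dist)
                   for c, d in enumerate(row))
    clusters = [[] for _ in dist[0]]
    assigned = [False] * len(dist)
    for d, u, c in pairs:
        if not assigned[u] and len(clusters[c]) < cap:
            clusters[c].append(u)
            assigned[u] = True
    return clusters

# 4 users, 2 centroids; a capacity of 2 forces an even split
dist = [[0.1, 5.0], [0.2, 4.0], [0.3, 3.0], [4.0, 0.2]]
print(balanced_assign(dist, cap=2))  # [[0, 1], [3, 2]]
```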

    Next, we perform attribute generalization on each user in each cluster to ensure that they satisfy the k-anonymity constraint. After forming C clusters, we calculate the distance between the sensitive attribute distribution and the global distribution in each cluster and use the EMD [18] algorithm to measure their distance. Clusters with a distance less than or equal to t are considered to satisfy the t-closeness constraint. For clusters that do not satisfy the t-closeness constraint, we partition their records into clusters with similar sensitive attribute distributions to ensure that the published data satisfies both the k-anonymity and t-closeness constraints. Specifically, we perform attribute generalization on each sensitive attribute in each cluster to ensure that records in each cluster have similar sensitive attribute distributions.
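    The t-closeness check described above can be sketched as a simple partition of the clusters by their EMD to the global distribution (an illustration with our own names; any EMD function, such as one implementing equation (1), can be plugged in):

```python
def split_by_t_closeness(cluster_dists, global_dist, t, emd):
    """Partition cluster ids into those whose sensitive-attribute
    distribution is within EMD t of the global one, and the rest."""
    ok, violating = [], []
    for cid, dist in cluster_dists.items():
        (ok if emd(dist, global_dist) <= t else violating).append(cid)
    return ok, violating

# stand-in distance for the demo: total variation
tv = lambda p, q: sum(abs(a - b) for a, b in zip(p, q)) / 2
clusters = {0: [0.5, 0.5], 1: [0.9, 0.1]}
print(split_by_t_closeness(clusters, [0.5, 0.5], t=0.2, emd=tv))  # ([0], [1])
```

    Clusters on the violating list would then have their records redistributed into clusters with similar sensitive-attribute distributions, as described above.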

    In order to enhance the effectiveness of the clustering algorithm, this paper utilizes the k-means++ algorithm to select p points from the user set N as the initial clustering centers C. Subsequently, the distance between each user and each cluster center is calculated, and the distance, user index, and cluster center index are stored in the distance matrix D (Lines 2–4 in Algorithm 1). Then, the matrix D is sorted by distance (Lines 5–14 in Algorithm 1). Next, the distance threshold t and the k-means algorithm are employed to allocate data points to the cluster centers (Lines 15–24 in Algorithm 1), and the attributes of each cluster center are generalized to achieve k-anonymity. This approach protects user privacy through attribute generalization and ultimately outputs c clusters that satisfy k-anonymity. The EMD is used to calculate the distance between generalized cluster centers, which is then stored in the distance matrix emd (Line 25 in Algorithm 1). It is desired that each subset after partitioning contains sensitive data records that are within the distance threshold t of the target data records. Clusters with a distance less than or equal to t are considered to satisfy t-closeness (Lines 26–29 in Algorithm 1). Otherwise, the records of that cluster are partitioned and merged into clusters with similar distributions of sensitive attributes (Lines 30–33 in Algorithm 1).

    In order to validate the effectiveness of our algorithm, we used two real-world datasets: the Yelp dataset [19] and the Facebook dataset [20]. The Yelp dataset is a public dataset that contains over 1 million users and over 7 million reviews, including information on businesses, users, and reviews. The association between businesses and users is achieved through users’ reviews of businesses, where each review contains the IDs of the business and the user, enabling the linkage of business and user information. On the other hand, the Facebook dataset provided by Stanford University is a public social network dataset containing real user data. The dataset consists of 10 communities, each containing 1000 to 5000 users, including information such as gender, age, geographic location, interests, and social relationships between users. We conducted experimental analyses on these two datasets using different privacy protection techniques for varying numbers of users.

    We compared our proposed ECKT algorithm with three other algorithms. The first is the l-diversity equivalence class clustering (LECC) method, based on the l-diversity model. The second is the probability-based random perturbation (PRO) algorithm, which achieves data perturbation by perturbing the graph G. The third is the simple KMC algorithm, which clusters the data by partitioning the dataset into k clusters. In comparing these methods, we considered three parameters: degree of anonymization (DA), information loss (IL) [21], and the number of users in each cluster. To compute the DA for each user, we used the number of users assigned to the cluster, thus equating the user's DA to the cluster's DA.

    \mathrm{DA}=\sum_{i}\mathrm{degree}(C_{u_i}) (2)

    where \mathrm{degree}(C_{u_i}) denotes the degree of anonymity of user u_i belonging to cluster C.

    \mathrm{IL}=\frac{\mathrm{SSE}}{\mathrm{SST}} (3)

    where SSE is the within-cluster sum of squares and SST is the total sum of squares.

    \mathrm{SSE}=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{C}_i\right)^{\top}\left(x_{ij}-\bar{C}_i\right) (4)

    where k is the number of clusters, n_i is the number of users belonging to cluster C_i, and \bar{C}_i is the centroid (mean vector) of cluster C_i.

    \mathrm{SSA}=\sum_{i=1}^{k}\frac{t_i^2}{n_i}-\frac{t^2}{n} (5)

    where t_i is the sum over the users in cluster C_i, n_i is the number of users in cluster C_i, and n is the total number of users.
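    The information-loss ratio can be computed directly from these definitions (a one-dimensional sketch with hypothetical data; SST is taken as the total sum of squared deviations from the overall mean):

```python
def information_loss(clusters):
    """IL = SSE / SST: within-cluster sum of squares over the total
    sum of squares about the overall mean (1-D case)."""
    points = [x for c in clusters for x in c]
    mean = sum(points) / len(points)
    sst = sum((x - mean) ** 2 for x in points)
    sse = 0.0
    for c in clusters:
        cm = sum(c) / len(c)  # cluster centroid
        sse += sum((x - cm) ** 2 for x in c)
    return sse / sst

# tight, well-separated clusters lose very little information
print(information_loss([[1.0, 1.1], [9.0, 9.1]]))
```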

    Our experiment aims to compare the performance of the LECC algorithm [22], PRO algorithm [23], KMC algorithm, and ECKT algorithm for varying numbers of users. As both datasets contain a large amount of user data, we considered different numbers of users and observed their performance differences. The experimental results for the Yelp dataset are presented in Figure 2, while the results for the Facebook dataset are presented in Figure 3.

    Figure  2.  Experimental results for the Yelp dataset. (a) DA: degree of anonymization while changing the number of users; (b) IL: information loss while changing the number of users.
    Figure  3.  Experimental results for the Facebook dataset. (a) DA: degree of anonymization while increasing the number of users; (b) IL: information loss while increasing the number of users.

    Our first experiment aimed to observe experimental results on Yelp. Compared to the Yelp dataset, the Facebook dataset was relatively small. We set the range of user numbers to vary from 1000 to 4000, and presented the comparison results of DA and IL (as shown in Figure 2). We found that as the number of users increased, DA also increased. This is because the method we proposed provides privacy guarantees for both the structure and attributes of the network, making it superior to other methods in terms of DA, especially compared to the conventional KMC method.

    Moreover, we also observe from Figure 2 that IL and DA are in tension: as the number of users increases, IL worsens. Although our information loss is within 1% of those of LECC and KMC, our algorithm improves DA by a factor of three. Overall, our algorithm is superior to PRO and LECC, as we balance anonymity and consider all the data. Although our information loss is slightly higher than that of KMC, our algorithm is still significantly better than KMC in terms of anonymity. It should be noted that privacy protection can result in some loss of hyperedge information.

    Our second experiment aimed to observe experimental results on Facebook. We set the range of user numbers to vary from 5000 to 20000, consistent with the measurements on the Yelp dataset. Figure 3 shows the results of DA. From the figures, it is evident that our method is far superior to other methods in terms of DA, with ECKT achieving up to 10 times the anonymity level of KMC, 2.3 times that of PRO, and 1.3 times that of LECC. This is because our method implements complete privacy concepts such as k-anonymity and t-closeness.

    Although our method is slightly lower than KMC in terms of IL, when compared with other methods, the information loss can be reduced by up to 10 times. Other methods lack complete privacy concepts, resulting in lower DA and higher IL. It should be noted that our method implements complete privacy concepts and provides privacy protection for both the structure and attributes of the network, making it superior to other methods in terms of anonymity and information loss.

    Figure 4 illustrates the comparison of the execution time of the algorithms under different numbers of users. The PRO algorithm is relatively slow due to its probabilistic nature, which may require several iterations to obtain sufficiently accurate results. KMC, as an iterative algorithm, usually runs quickly, but its performance may be affected by the selection of the initial clustering centers. The LECC approach imposes more stringent privacy-preserving requirements; it relies on the k-means clustering algorithm and has a relatively slow running time. Our ECKT algorithm is slightly slower than KMC in terms of runtime; however, as the number of users increases, our algorithm has an advantage over the PRO and LECC algorithms in terms of running time. This is because we use the k-means++ optimization in the selection of initial clustering centers, which effectively reduces the running time.

    Figure  4.  Comparison of execution time of each algorithm for different number of users.

    Our main solution is to use the k-means++ method during clustering and introduce the concept of distance matrix to balance the number of nodes in each cluster, solving the uncertainty of k-anonymity privacy and improving data privacy protection by combining the t-closeness technique. Although our algorithm has effectively balanced the issues of privacy and information loss, there are still several aspects that can be further explored in future research: 1) considering multiple sensitive attributes of the graph and other privacy definitions to improve our algorithm; 2) trying other clustering methods to improve our algorithm; 3) applying our method to dynamic social network models. These improvements can further enhance the efficiency and scalability of our algorithm and promote the development of privacy protection in this field.

  • [1]
    L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557–570, 2002. DOI: 10.1142/S0218488502001648
    [2]
    A. Machanavajjhala, D. Kifer, J. Gehrke, et al., “L-diversity: Privacy beyond k-anonymity,” ACM Transactions on Knowledge Discovery from Data, vol. 1, no. 1, article no. 3, 2007.
    [3]
    N. H. Li, T. C. Li, and S. Venkatasubramanian, “T-closeness: Privacy beyond k-anonymity and l-diversity,” in Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering, Istanbul, Turkey, pp. 106–115, 2007.
    [4]
    F. Ahmed, A. X. Liu, and R. Jin, “Social graph publishing with privacy guarantees,” in Proceedings of the 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), Nara, Japan, pp. 447–456, 2016.
    [5]
    C. Borgs, J. Chayes, A. Smith, et al., “Revealing network structure, confidentially: Improved rates for node-private graphon estimation,” in Proceedings of the 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), Paris, France, pp. 533–543, 2018.
    [6]
    F. Ahmed, A. X. Liu, and R. Jin, “Publishing social network graph eigenspectrum with privacy guarantees,” IEEE Transactions on Network Science and Engineering, vol. 7, no. 2, pp. 892–906, 2020. DOI: 10.1109/TNSE.2019.2901716
    [7]
    C. K. Wei, S. L. Ji, C. C. Liu, et al., “AsgLDP: Collecting and generating decentralized attributed graphs with local differential privacy,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3239–3254, 2020. DOI: 10.1109/TIFS.2020.2985524
    [8]
    E. Chicha, B. A. Bouna, M. Nassar, et al., “A user-centric mechanism for sequentially releasing graph datasets under blowfish privacy,” ACM Transactions on Internet Technology, vol. 21, no. 1, article no. 20, 2021. DOI: 10.1145/3431501
    [9]
    Z. J. Zhang, “LDPCD: A novel method for locally differentially private community detection,” Computational Intelligence and Neuroscience, vol. 2022, article no. 4080047, 2022. DOI: 10.1155/2022/4080047
    [10]
    K. Stokes and V. Torra, “Reidentification and k-anonymity: A model for disclosure risk in graphs,” Soft Computing, vol. 16, no. 10, pp. 1657–1670, 2012. DOI: 10.1007/s00500-012-0850-4
    [11]
    R. Assam, M. Hassani, M. Brysch, et al., “(k, d)-core anonymity: Structural anonymization of massive networks,” in Proceedings of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, article no. 17, 2014.
    [12]
    R. Trujillo-Rasua and I. G. Yero, “k-metric antidimension: A privacy measure for social graphs,” Information Sciences, vol. 328, pp. 403–417, 2016. DOI: 10.1016/j.ins.2015.08.048
    [13]
    H. Rong, T. H. Ma, M. L. Tang, et al., “A novel subgraph K+- isomorphism method in social network based on graph similarity detection,” Soft Computing, vol. 22, no. 8, pp. 2583–2601, 2018. DOI: 10.1007/s00500-017-2513-y
    [14]
    S. Mauw, Y. Ramírez-Cruz, and R. Trujillo-Rasua, “Conditional adjacency anonymity in social graphs under active attacks,” Knowledge and Information Systems, vol. 61, no. 1, pp. 485–511, 2019. DOI: 10.1007/s10115-018-1283-x
    [15]
    R. Mortazavi and S. H. Erfani, “GRAM: An efficient (k, l) graph anonymization method,” Expert Systems with Applications, vol. 153, article no. 113454, 2020. DOI: 10.1016/j.eswa.2020.113454
    [16]
    W. L. Ren, K. Ghazinour, and X. Lian, “kt-safety: Graph release via k-anonymity and t-closeness,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 9, pp. 9102–9113, 2023. DOI: 10.1109/TKDE.2022.3221333
    [17]
    P. Samarati and L. Sweeney, “Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression,” Available at: https://dataprivacylab.org/dataprivacy/projects/kanonymity/paper3.pdf, 1998.
    [18]
    M. B. Jamshidi, N. Alibeigi, N. Rabbani, et al., “Artificial neural networks: A powerful tool for cognitive science,” in Proceedings of the 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, Canada, pp. 674–679, 2018.
    [19]
    “Yelp dataset challenge,” Available at: https://datacollaboratives.org/cases/yelp-dataset-challenge.html.
    [20]
    “Stanford large network dataset collection,” Available at: http://snap.stanford.edu/data/.
    [21]
    J. Domingo-Ferrer and J. M. Mateo-Sanz, “Practical data-oriented microaggregation for statistical disclosure control,” IEEE Transactions on Knowledge and data Engineering, vol. 14, no. 1, pp. 189–201, 2002. DOI: 10.1109/69.979982
    [22]
    M. Siddula, Y. S. Li, X. Z. Cheng, et al., “Anonymization in online social networks based on enhanced equi-cardinal clustering,” IEEE Transactions on Computational Social Systems, vol. 6, no. 4, pp. 809–820, 2019. DOI: 10.1109/TCSS.2019.2928324
    [23]
    F. Bonchi, A. Gionis, and T. Tassa, “Identity obfuscation in graphs through the information theoretic lens,” Information Sciences, vol. 275, pp. 232–256, 2014. DOI: 10.1016/j.ins.2014.02.035
