Anonymization is a significant problem when handling Industrial Internet of Things (IIoT) data. Machine Learning (ML) applications require decrypted data to perform tasks efficiently, which means that third parties involved in data processing may have access to sensitive information. This exposes the companies generating the data to the risk of privacy leaks, so they are hesitant to share their IIoT data with third parties.
The state of the art in addressing the anonymization problem involves various approaches such as encryption, homomorphic encryption, cryptographic techniques, and distributed/federated learning. However, these methods have limitations in terms of computational costs, explainability of ML models, and vulnerabilities to cyber-attacks. Furthermore, existing privacy preservation techniques often result in a trade-off between privacy and accuracy, where achieving high privacy protection leads to a significant loss in ML model accuracy. These challenges hinder the effective and efficient preservation of IIoT data privacy.
In this context, a research team from Kadir Has University in Turkey proposed a novel method that combines a Generative Adversarial Network (GAN) with Differential Privacy (DP) to protect sensitive data in IIoT operations. The hybrid approach aims to achieve privacy preservation with minimal accuracy loss and low additional computational costs. The GAN generates synthetic copies of sensitive data, while DP adds random noise to maintain privacy. The proposed method is tested on publicly available datasets and a realistic IIoT dataset collected from a confectionery production process.
The hybrid approach has two main components, a GAN and DP:
- GAN: They use the Conditional Tabular GAN (CTGAN) to create a synthetic copy (XG) of the original data set (XO). The GAN learns the distribution of the data and generates synthetic data with statistics similar to the original.
- DP: To enhance privacy, they add random noise from a Laplace distribution to sensitive features in the data. This technique preserves privacy while maintaining the overall probability distribution of the data.
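The DP component can be illustrated with a minimal sketch of adding Laplace noise to selected columns. This is not the authors' implementation: the column names, the `sensitivity` and `epsilon` parameters, and the choice of noise scale `sensitivity / epsilon` (the usual Laplace-mechanism convention) are assumptions made for illustration, since the summary does not state how the noise is calibrated.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exp(1) variables follows Laplace(0, 1).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def add_laplace_noise(rows, sensitive, sensitivity, epsilon, seed=None):
    """Return a copy of `rows` with Laplace noise added to the sensitive columns.

    `sensitivity` and `epsilon` are hypothetical parameters; a smaller
    epsilon gives a larger noise scale and therefore stronger perturbation.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # assumed calibration, not taken from the paper
    noisy = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows stay untouched
        for name in sensitive:
            new_row[name] = row[name] + laplace_noise(scale, rng)
        noisy.append(new_row)
    return noisy


# Hypothetical sensor readings; only "temperature" is treated as sensitive.
readings = [
    {"temperature": 71.2, "line_speed": 3.4},
    {"temperature": 69.8, "line_speed": 3.6},
]
noisy_readings = add_laplace_noise(
    readings, ["temperature"], sensitivity=1.0, epsilon=0.5, seed=0
)
```

Because the noise has zero mean, column averages over many rows stay close to the originals, while individual values are perturbed.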
The proposed approach involves the following steps:
- Creating a synthetic data set with GAN.
- Replacing sensitive features.
- Applying differential privacy by adding random noise.
The resulting data set is privacy-preserving and can be used for machine learning analysis without compromising sensitive information. The algorithm’s complexity depends on the number of sensitive features and the size of the data set. The authors emphasize that their method ensures overall privacy protection for IIoT data.
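The three steps can be sketched end to end as follows. This is one plausible reading of the procedure, not the authors' code. In particular, the GAN is replaced here by a deliberately simple stand-in (`synthesize_column`, which samples from a normal distribution fitted to each column) so that the example runs without a trained CTGAN model; the function names, column names, and the `scale` parameter are all hypothetical.

```python
import random
import statistics


def synthesize_column(values, n, rng):
    """Stand-in for the GAN: sample n values from a normal fitted to one column."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def laplace(scale, rng):
    """Draw one Laplace(0, scale) sample as a scaled difference of exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def hybrid_protect(x_o, sensitive, scale, seed=None):
    """Apply the three steps to a column-oriented data set.

    x_o maps column name -> list of floats (the original data, XO).
    Returns a new data set; x_o itself is not modified.
    """
    rng = random.Random(seed)
    n = len(next(iter(x_o.values())))

    # Step 1: create a synthetic copy XG (here with the stand-in generator).
    x_g = {name: synthesize_column(col, n, rng) for name, col in x_o.items()}

    # Step 2: replace the sensitive features with their synthetic counterparts.
    protected = {name: list(col) for name, col in x_o.items()}
    for name in sensitive:
        # Step 3: add Laplace noise to the replaced sensitive features.
        protected[name] = [v + laplace(scale, rng) for v in x_g[name]]
    return protected


# Hypothetical data: "recipe_temp" is sensitive, "ambient_temp" is not.
x_o = {
    "recipe_temp": [70.1, 71.4, 69.9, 70.6],
    "ambient_temp": [21.0, 21.3, 20.8, 21.1],
}
protected = hybrid_protect(x_o, ["recipe_temp"], scale=0.1, seed=1)
```

In this sketch, steps 2 and 3 loop over the sensitive features and over every row, which is consistent with the remark that the cost depends on the number of sensitive features and the size of the data set.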
The authors evaluated the hybrid approach for privacy-preserving data synthesis and prediction on four SCADA data sets: wind turbine, steam production, energy efficiency, and synchronous motors. The experiments used CTGAN synthetic data generation and differential privacy (DP). Accuracy was measured with the R-squared metric, and privacy preservation with six privacy metrics. The results showed that the hybrid approach achieved higher accuracy and privacy preservation than using CTGAN or DP on their own. The experiments also tested the method on data sets with hidden sensitive features and demonstrated its ability to protect such data.
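For reference, the R-squared metric compares a model's squared prediction error with the variance of the target. The sketch below is a plain implementation of the standard definition, R² = 1 − SS_res / SS_tot, and is not taken from the paper; the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions from a model trained on protected data.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]
score = r_squared(y_true, y_pred)
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean. Comparing the score of a model trained on the original data with one trained on the protected data gives one way to quantify the accuracy lost to privacy protection.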
In conclusion, the paper proposed a novel hybrid approach combining GAN and DP to address the anonymization problem in IIoT data. The method creates a synthetic data set using a GAN and applies DP by adding random noise to sensitive features. The evaluation results showed that the hybrid approach achieved higher accuracy and privacy preservation than other methods. This approach offers a promising solution for preserving sensitive data in IIoT environments while minimizing accuracy loss and computational costs.