Machine Unlearning: Teaching AI to Forget
Keywords:
Machine unlearning, Data privacy, Ethical AI, Right to be Forgotten, Model retraining, GDPR compliance

Abstract
Over the past few years, artificial intelligence (AI) and machine learning (ML) systems have become increasingly reliant on large-scale data collection. This reliance benefits model performance, but it raises serious concerns about privacy, data retention, and ethical use, particularly given regulatory instruments such as the General Data Protection Regulation (GDPR) and the “Right to be Forgotten”. Traditional machine learning models cannot selectively forget specific data once training is complete, which makes it difficult to honor user deletion requests or to comply when data must be removed for legal or ethical reasons.
Machine unlearning is an emerging field of research that develops algorithms and frameworks for removing the influence of specific training data from a model without the overhead of retraining it from scratch. Several techniques have been proposed, including exact retraining, Sharded, Isolated, Sliced, and Aggregated (SISA) training, knowledge distillation, and approximate gradient-based removal. Each technique trades off efficiency against the completeness of forgetting, as the sketch below illustrates for the SISA approach.
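To make the SISA idea concrete, the following minimal sketch partitions the training data into disjoint shards, trains one constituent model per shard, and aggregates predictions by majority vote; forgetting a point then requires retraining only the shard that contained it. The constituent model (scikit-learn’s LogisticRegression), the synthetic dataset, the shard count, and the helper names are all illustrative assumptions, not a reference implementation.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    NUM_SHARDS = 4  # illustrative choice; real systems tune this

    # Toy dataset standing in for any (X, y) training set.
    X, y = make_classification(n_samples=400, n_features=10, random_state=0)

    # Partition the training data into disjoint shards.
    shard_indices = np.array_split(np.arange(len(X)), NUM_SHARDS)

    def train_shard(idx):
        """Train one constituent model on a single shard."""
        model = LogisticRegression(max_iter=1000)
        model.fit(X[idx], y[idx])
        return model

    shards = [train_shard(idx) for idx in shard_indices]

    def predict(x):
        """Aggregate the constituent models by majority vote."""
        votes = [m.predict(x.reshape(1, -1))[0] for m in shards]
        return np.bincount(votes).argmax()

    def unlearn(sample_id):
        """Forget one training point: drop it from its shard and
        retrain only that shard's model; all other shards are untouched."""
        for s, idx in enumerate(shard_indices):
            if sample_id in idx:
                shard_indices[s] = idx[idx != sample_id]
                shards[s] = train_shard(shard_indices[s])
                return

Because only one constituent model is retrained per deletion, the cost of honoring a request scales with the shard size rather than the full dataset, which is the central efficiency argument for sharded approaches.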
Machine unlearning matters for more than regulatory compliance. It also supports ethical AI: a system can respect users’ choices about their data while ensuring that outdated or sensitive information can be removed without degrading the model as a whole. As AI continues to permeate sectors such as healthcare, finance, and social media, the ability to unlearn is essential for trust and transparency. Future work in this area will focus on scaling unlearning to deep learning architectures and on verifying that forgetting holds even under adversarial conditions.