Description

Adversarial attacks can manipulate AI models. This project studies methods to detect such attacks in order to make our models more resilient.

Problem Context

Adversarial AI exploits specific properties of AI models to manipulate them. Experiments show that almost all AI models are vulnerable to some form of adversarial AI. The use of open-source datasets and pretrained models is particularly risky, as they can contain hidden backdoors or malicious, "poisoned" data. This can cause the model to malfunction or lead to the loss of valuable, personal, or confidential information. As long as these vulnerabilities threaten the robustness of AI models, organizations are rightfully wary of using open-source datasets and pretrained models for AI in critical tasks. This prevents them from taking full advantage of the benefits of AI.

Solution

This project studies the inner workings of adversarial AI, specifically poisoning attacks. We study the behavior of an AI model while it is under attack, looking for metrics and mechanisms that allow us to detect when it has been poisoned. This way, we can make our AI models more resilient against such attacks. The detection mechanisms also enable organizations to find out to what extent their AI models are actually being attacked in practice; a simplified sketch of one such mechanism follows below.
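To illustrate what a poisoning-detection mechanism can look like, the sketch below applies activation clustering, a published detection heuristic, to synthetic stand-in data. It is not the project's actual method: the data, dimensions, and thresholds are illustrative assumptions, and in practice the activations would come from the penultimate layer of the trained model for the training samples of one class.

```python
# Illustrative sketch of activation clustering for poisoning detection.
# Assumption: synthetic activations stand in for real penultimate-layer
# activations of one class; thresholds below are arbitrary examples.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Stand-in activations: 950 clean samples plus 50 poisoned samples
# whose activations form a separate subpopulation.
clean = rng.normal(loc=0.0, scale=1.0, size=(950, 128))
poisoned = rng.normal(loc=4.0, scale=1.0, size=(50, 128))
activations = np.vstack([clean, poisoned])

# Reduce dimensionality, then split the class into two clusters.
reduced = PCA(n_components=10, random_state=0).fit_transform(activations)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

# Heuristic signals of poisoning: a well-separated, strongly
# imbalanced pair of clusters within a single class.
sizes = np.bincount(labels)
smaller_fraction = sizes.min() / sizes.sum()
separation = silhouette_score(reduced, labels)

print(f"smaller cluster fraction: {smaller_fraction:.2f}")
print(f"silhouette score:         {separation:.2f}")
if separation > 0.25 and smaller_fraction < 0.35:
    print("Class looks suspicious: possible poisoned subpopulation.")
```

Metrics such as the cluster size ratio and silhouette score give a per-class indication of whether a poisoned subpopulation may be present, which is the kind of signal this project investigates.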

Contact

  • Jip van Stijn, Scientist, e-mail: jip.vanstijn@tno.nl