The AI that your company buys (or builds) may not be as secure as you think!
As companies across the globe accelerate AI adoption, the corresponding security threats continue to grow rapidly. More importantly, the potential consequences of these threats are becoming more severe as AI is applied to critical business functions, especially to higher-stakes, more expensive problems. Apart from the regular security risks that threaten any hardware or software system, AI-powered applications face unique challenges of their own, and these keep multiplying as early-stage, inadequately tested AI innovations flood the market at a rapid pace.
Security risks in AI systems
In late 2021, Cynerio (a cybersecurity company) detected anomalous network traffic during the deployment of Aethon TUG autonomous robots in a hospital. Investigations revealed five zero-day vulnerabilities, including the risk of malicious JavaScript code injection, that could allow hackers to shut down hospital doors and elevators, conduct illegal surveillance of patients and staff, or even carry out more extensive cyberattacks on hospital networks. Cynerio termed this set of five vulnerabilities JekyllBot:5 and reported them to the US Cybersecurity & Infrastructure Security Agency (CISA). Read the full report here.
The Aethon TUG smart robots are deployed at hundreds of hospitals around the world for the transportation of medicines and supplies, among other tasks. While it has recently been reported that all five zero-day vulnerabilities have been patched, the JekyllBot:5 issue is symptomatic of a larger industry problem: the inadequate security of many AI-powered systems. Security vulnerabilities are often inherent in algorithms, data systems, hardware & software engineering workloads, machine learning models, neural network architectures, and operational pipelines, making AI systems highly vulnerable to different types of exploitation and malicious attack.
For instance, recent NLP research has indicated that even a single imperceptible encoding injection can significantly impact the performance of vulnerable models, and three such injections can completely break down most standard models. This vulnerability affects a lot of real-world language models, even those developed by well-known AI companies. Even Computer Vision networks are susceptible to attacks through subtle perturbations that may not be perceptible to human vision. Similarly, data poisoning in recommender systems is a widely-accepted threat. Overfit models and high-dimensional classifiers may be particularly vulnerable to adversarial attacks.
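To make the idea of an imperceptible encoding injection concrete, here is a minimal, self-contained sketch (plain Python, no ML libraries) showing how a zero-width Unicode character can be inserted into a string that still looks identical to a human reader while changing the exact character sequence an NLP pipeline would process. The whitespace-based tokenizer is a toy stand-in, not the attack from the cited research.

```python
# Minimal sketch of an "imperceptible" encoding injection: a zero-width Unicode
# character is inserted into a sentence, leaving it visually unchanged while
# altering the character sequence a downstream NLP model actually receives.
# The whitespace "tokenizer" below is a toy stand-in for a real subword tokenizer.

ZERO_WIDTH_SPACE = "\u200b"   # renders as nothing in most fonts

def inject(text: str, position: int, count: int = 1) -> str:
    """Insert zero-width characters at a given character offset."""
    return text[:position] + ZERO_WIDTH_SPACE * count + text[position:]

def naive_tokenize(text: str) -> list:
    """Toy tokenizer standing in for a real model tokenizer."""
    return text.split()

clean = "transfer 100 dollars to alice"
poisoned = inject(clean, position=10)   # hidden character inside the amount

print(clean == poisoned)            # False: the underlying strings differ
print(clean)                        # both lines look identical when printed
print(poisoned)
print(naive_tokenize(clean))        # ['transfer', '100', 'dollars', 'to', 'alice']
print(naive_tokenize(poisoned))     # the amount token now contains a hidden character
```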
In a 2020 IEEE report, researchers identified 78 risks in Machine Learning systems, the top 10 being the following:
- Adversarial Examples
- Data Poisoning
- Online System Manipulation
- Transfer Learning Attack
- Data Confidentiality
- Data Trustworthiness
- Reproducibility
- Overfitting
- Encoding Integrity
- Output Integrity
It has also been observed that security vulnerabilities sometimes increase when AI applications run on legacy hardware or on IoT, edge, and mobile devices. This is particularly true in APT (Advanced Persistent Threat) scenarios.
A Summary of Critical Security Threats
AI security incidents are often classified into three categories:
- Extraction attacks that focus on extracting critical information about different aspects of the AI systems
- Manipulation attacks that focus on changing the system behavior by surreptitiously introducing changes to the input data, training data, or the model itself
- System-specific attacks that focus on exploiting the (conventional) software and hardware deficiencies of the overall system
Many researchers also evaluate AI security along three dimensions: Influence (causative & exploratory attacks), Security Violation (availability & integrity attacks), and Specificity (indiscriminate & targeted attacks). While the categorization and nomenclature may differ from company to company, several of these threat categories overlap, and the terms are sometimes used interchangeably.
Here are some of the most common security threats of AI systems.
Adversarial Attacks
In an Adversarial attack, imperceptible perturbations (also called adversarial perturbations) are injected into regular input samples at inference time to produce adversarial examples. For instance, queries may be surreptitiously modified to produce erratic outputs that degrade general model performance, or specific outputs desired by the attacker that may cause large-scale damage. This is generally an exploratory attack aimed at violating availability or integrity.
Evasion attacks, like adversarial examples, are based on input manipulation: the attackers introduce new inputs or changes to the AI system, thereby producing anomalous or different outputs. Some researchers also group other types of attacks (e.g., extraction and poisoning) under the adversarial umbrella. Even the most advanced classifiers, particularly image classifiers, may be vulnerable to adversarial attacks. In general, these attacks can be white-box (attackers have full information about the target models, e.g., parameters and network layers) or black-box (attackers can only observe the model outputs and have no knowledge of the target model's internals).
Some of the common adversarial attacks include the Carlini & Wagner attack, the DeepFool attack, the Fast Gradient Sign Method (FGSM), the Jacobian-based Saliency Map Attack, Limited-Memory BFGS (Broyden-Fletcher-Goldfarb-Shanno), and the Zeroth-Order Optimization attack.
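As an illustration of how simple such an attack can be, here is a minimal FGSM sketch in PyTorch. The tiny untrained classifier and random "image" are placeholders standing in for a real victim model and input; the point is only the mechanics of the sign-scaled gradient step.

```python
# Minimal FGSM (Fast Gradient Sign Method) sketch: a small, sign-scaled gradient
# perturbation of the input can change a classifier's prediction.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy stand-in classifier
model.eval()
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28)   # stand-in for a normalized input image
y = torch.tensor([3])          # its (assumed) true label
epsilon = 0.1                  # L-infinity perturbation budget

x.requires_grad_(True)
loss = loss_fn(model(x), y)
loss.backward()

# FGSM step: move the input in the direction that increases the loss
x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()

print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())  # often differs
```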
Backdoor & Software Trojan Attacks
These attacks ensure that models perform their original tasks as usual (i.e., with no deviation from the expected model accuracies or the defined criteria) while secretly injecting malicious tasks into the models that are carried out in a stealthy manner. Such attacks are not only difficult to identify (due to their stealthy nature), but their impact is often severe. Software trojans are generally introduced during the training process through data poisoning, and the threats are actualized when inputs containing the trojan triggers are presented.
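The sketch below (NumPy only) illustrates one common way such a trigger is planted at training time: a small pixel patch is stamped onto a fraction of the training images and their labels are rewritten to an attacker-chosen target class. The dataset, patch location, poison fraction, and target class are all illustrative assumptions.

```python
# Minimal backdoor-trigger poisoning sketch: stamp a small patch on a few
# training images and relabel them to the attacker's target class.
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 28, 28)).astype(np.float32)  # stand-in training images
labels = rng.integers(0, 10, size=1000)                 # stand-in labels (10 classes)

TARGET_CLASS = 7        # class the backdoor should force
POISON_FRACTION = 0.05  # only a small fraction of the data needs to be poisoned

def stamp_trigger(img: np.ndarray) -> np.ndarray:
    """Place a bright 3x3 patch in the bottom-right corner as the trigger."""
    img = img.copy()
    img[-3:, -3:] = 1.0
    return img

poison_idx = rng.choice(len(images), size=int(POISON_FRACTION * len(images)), replace=False)
for i in poison_idx:
    images[i] = stamp_trigger(images[i])
    labels[i] = TARGET_CLASS

# A model trained on (images, labels) behaves normally on clean inputs but is
# likely to predict TARGET_CLASS whenever the trigger patch is present.
print(f"poisoned {len(poison_idx)} of {len(images)} training samples")
```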
Deep Learning introduces a new dimension to backdoor attacks. In general, the focus of AI developers is on enabling the models to learn specific tasks, not on verifying whether the models have learned anything beyond those tasks. With Deep Learning, there is always a risk that the model learns something more, and this additional knowledge may be leveraged for backdoor attacks.
Data & Model Poisoning
Data Poisoning attacks occur during model training/retraining through the injection of small amounts of ‘poisoned’ data into the training (and/or test) datasets. The goal is to modify the statistical properties of the datasets on which models are trained, thereby increasing the risk of misclassified samples (error-generic attacks) or forcing the models to yield specific outputs as desired by the attacker (error-specific attacks) during inferencing. As these attacks threaten both the integrity and availability of AI applications, they are considered causative in nature. Clean label attacks, gradient descent attacks, and label-flipping attacks are common approaches to poisoning. The risks of training data attacks are high when open-source data sources are significantly leveraged.
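Here is a minimal label-flipping sketch using scikit-learn, in which a fraction of training labels is flipped at random and the resulting drop in test accuracy is measured. The synthetic dataset, logistic regression model, and 25% flip rate are illustrative assumptions; the size of the accuracy drop depends heavily on the model and the flip rate.

```python
# Minimal label-flipping (data poisoning) sketch: flip a fraction of training
# labels at random and compare clean vs. poisoned test accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

FLIP_RATE = 0.25
rng = np.random.default_rng(0)
flip_idx = rng.choice(len(y_train), size=int(FLIP_RATE * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]   # binary labels: flip 0 <-> 1

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("clean accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```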
In Model Poisoning, also sometimes referred to as Logic Corruption, the models are directly affected through manipulation of the algorithms and neural networks (e.g., by changing the parameters or network layers). This can be indiscriminate or non-targeted poisoning (affecting the predictions or accuracies for all inputs) as well as targeted poisoning (where only specific inputs or classes are misclassified). In general, model poisoning is more severe than data poisoning, especially in the case of targeted attacks.
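To show how little access is needed once an attacker can touch the parameters, here is a minimal sketch of targeted model poisoning in PyTorch: the bias of a single output unit of a toy, untrained classification head is inflated so that a chosen class is favored for most inputs. The model, inputs, and target class are assumptions for illustration only.

```python
# Minimal targeted model-poisoning sketch: editing one parameter of a deployed
# classification head biases its predictions toward an attacker-chosen class.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(20, 10)   # stand-in for a deployed classification head
x = torch.rand(5, 20)       # a batch of benign inputs

print("before poisoning:", model(x).argmax(dim=1).tolist())

TARGET_CLASS = 2
with torch.no_grad():
    model.bias[TARGET_CLASS] += 10.0   # surreptitious edit of a single parameter

print("after poisoning: ", model(x).argmax(dim=1).tolist())  # dominated by TARGET_CLASS
```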
Hardware-based Attacks
Hardware-based attacks come in multiple forms and can compromise the defenses of many AI platforms. Here are a few of the main ones.
Hardware Trojans are introduced into the integrated circuits of physical systems through malicious chip design or fabrication. These trojans then compromise the integrity of the models by making malicious modifications that lead to incorrect model outputs for all or some of the input data. For instance, the computation block may be modified through MAC pruning. In many cases, the targeted incorrect labels may be hidden in the trojan triggers.
Fault injection is another type of attack where faults are injected into neural networks through parameters or functions. Such faults may be inadvertent (e.g., soft errors caused by damaged chips or defects in the hardware architecture) or deliberate (e.g., row-hammer attacks where DRAM is exploited by the attackers). The injected faults may end up causing large-scale problems, such as incorrect results being generated by computation blocks.
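The numeric fragility that fault injection exploits is easy to demonstrate. The sketch below flips a single bit in the IEEE-754 float32 representation of a model weight (the weight value and bit positions are arbitrary assumptions): a low mantissa bit barely matters, while a high exponent bit makes the weight explode, which is how a handful of row-hammer-style flips can wreck a network's outputs.

```python
# Minimal bit-flip sketch: flipping one bit of a float32 "weight" can change it
# from a harmless value to an enormous one, corrupting downstream computations.
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit in the IEEE-754 float32 representation of `value`."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

weight = 0.5
print(flip_bit(weight, 0))    # low mantissa bit: negligible change
print(flip_bit(weight, 30))   # high exponent bit: the weight explodes (~1.7e38)
```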
Side Channel attacks take place when attackers gain access to physical information (e.g., cache behavior, electromagnetic leaks, memory access patterns, and power consumption) from which they can infer the input data, the model architecture, and the parameters.
Model Extraction & Inversion
Model extraction attacks are aimed at extracting model information, such as architecture details (e.g., number of layers and activation functions), parameters, and hyperparameters. The goal is to reconstruct the original models as closely as possible. Equation solving, metamodel training, and substitute model training are commonly used techniques.
Model extraction may be followed by model inversion attacks. This involves reconstructing the datasets on which the models are built. Furthermore, model inversion attacks can also be leveraged for membership inference attacks (determine whether a specific input sample was included in the training data) and property inference attacks (determine whether a certain statistical property exists in the training dataset.)
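A crude membership inference attack can be surprisingly simple. The sketch below (scikit-learn) relies on the observation that an overfit classifier tends to be more confident on samples it was trained on, so thresholding the maximum predicted probability gives a weak membership signal. The dataset, random forest, and 0.9 threshold are illustrative assumptions rather than a production attack.

```python
# Minimal membership-inference sketch: use prediction confidence to guess
# whether a sample was part of the training set of an overfit model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_member, X_nonmember, y_member, _ = train_test_split(X, y, test_size=0.5, random_state=0)

# Deliberately overfit so that member/non-member confidences separate clearly.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_member, y_member)

def looks_like_member(sample: np.ndarray, threshold: float = 0.9) -> bool:
    """Guess membership from the model's confidence on a single sample."""
    return model.predict_proba(sample.reshape(1, -1)).max() >= threshold

member_rate = np.mean([looks_like_member(s) for s in X_member])
nonmember_rate = np.mean([looks_like_member(s) for s in X_nonmember])
print(f"flagged as members: training set={member_rate:.2f}  held-out set={nonmember_rate:.2f}")
```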
How secure are Prediction APIs?
APIs form the backbone of the modern software industry. At the same time, API-related security incidents are also a reality today. Security misconfigurations, leaky APIs, poor access control mechanisms, and other technical problems lead to major risks like account takeovers, data loss, remote code executions, etc. Some of the top API security vulnerabilities listed by the Open Web Application Security Project (OWASP) include Broken Function Level Authorization, Broken Object Level Authorization, Broken User Authentication, and Injection.
Prediction APIs have emerged as one of the most popular paradigms of AI consumption today, especially in the Machine Learning-as-a-Service (MLaaS) industry. However, the security of prediction APIs is a big challenge not only due to the regular API vulnerabilities, but also because they are particularly vulnerable to model extraction attacks. These attacks are aimed at targeting the confidentiality of AI applications by surreptitiously creating similar copies of the original models, thereby stealing model functionalities. Surrogate (or substitute) models are created by reverse-engineering the output (predictions) of the original models. A sequence of API queries is made with different inputs, and the resultant output data are carefully assessed to determine input-output mapping patterns. After a series of iterations, the substitute models may be able to generate almost the same predictions as the original models. These surrogate models can later be leveraged for adversarial attacks, avoidance of API usage charges, model inversion attacks, and other security incidents.
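The sketch below mimics this loop end to end, with a locally trained scikit-learn model standing in for the remote prediction API (the attacker never touches it directly, only the query function). The query distribution, surrogate architecture, and agreement metric are all illustrative assumptions.

```python
# Minimal model-extraction sketch: query a black-box "prediction API", harvest
# its labels, and train a surrogate model on the query-response pairs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
victim = LogisticRegression(max_iter=1000).fit(X, y)   # the provider's model

def prediction_api(queries: np.ndarray) -> np.ndarray:
    """Black-box stand-in: the attacker only ever sees the returned labels."""
    return victim.predict(queries)

# Attacker side: synthesize queries, collect outputs, fit a surrogate.
rng = np.random.default_rng(1)
queries = rng.normal(size=(5000, 10))
stolen_labels = prediction_api(queries)
surrogate = DecisionTreeClassifier(random_state=0).fit(queries, stolen_labels)

agreement = np.mean(surrogate.predict(X) == victim.predict(X))
print(f"surrogate agrees with the victim on {agreement:.1%} of samples")
```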
In general, AI applications that deal with structured data, leverage traditional machine learning techniques, or adopt gradient-based explanatory methods like saliency maps tend to be highly vulnerable to model extraction. Furthermore, recent research shows that even the functionalities of modern deep neural networks can be stolen. For instance, BERT-based APIs in Natural Language Processing increase the risk of model extraction – read this ICLR 2020 paper for more details.
Another critical point to note is that most AI companies deploy prediction APIs in a black-box fashion. In other words, the client organizations that consume these APIs have negligible information about the models that are deployed, how these models are developed and trained, and other techno-operational aspects of the machine learning pipeline. As a result, clients have no robust mechanism to determine the security of the machine learning software that they are consuming, and this can lead to severe damage. For instance, in the event of model poisoning attacks, the compromised predictions may lead to business losses, and even create large-scale downstream problems.
Transfer Learning has serious security risks.
Transfer learning is one of the most important innovations of recent times. It enables the transfer of knowledge (model structure, types of layers, parameters, etc.) from a pre-trained model to a new model (called a fine-tuned, or student, model). Even though the domains or learning tasks of the student models may differ from those of the pre-trained ones, transfer learning generally accelerates the development of complex models and helps to overcome the data and compute challenges of many use cases.
In practice, the pre-trained models are trained on very large datasets, and released for downstream AI development. The downstream developers then build new models by leveraging only the early layers (and sometimes, also the mid-layers) of the pre-trained models. The new model training involves two approaches: feature extraction (i.e., deriving feature extractors from the pre-trained layers), and fine-tuning (certain pre-trained parameters of the feature extractors are modified to address the specific needs of the downstream domains.)
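Here is a minimal PyTorch/torchvision sketch of the feature-extraction approach: the pre-trained layers of a ResNet-18 are frozen and only a new task-specific head is left trainable. The 5-class head is an illustrative assumption, and the pre-trained weights are downloaded on first use. From a security perspective, note that anything hidden in the frozen layers (e.g., a backdoor) is inherited by the downstream model unchanged.

```python
# Minimal transfer-learning (feature extraction) sketch: freeze the pre-trained
# backbone and train only a new classification head for the downstream task.
import torch.nn as nn
from torchvision import models

teacher = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pre-trained backbone

for param in teacher.parameters():   # freeze the pre-trained feature extractor
    param.requires_grad = False

teacher.fc = nn.Linear(teacher.fc.in_features, 5)  # new head for a 5-class downstream task

trainable = [name for name, p in teacher.named_parameters() if p.requires_grad]
print(trainable)   # only the new head ('fc.weight', 'fc.bias') will be updated
```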
Data Science teams often leverage open-source pre-trained models but seldom conduct adequate due diligence on the authors of those models, or on how those models were developed and operationalized. This leads to two main security risks:
- the risk of using malicious pre-trained models of adversarial entities
- the risk of using compromised pre-trained models of genuine AI developers
When malicious or compromised teacher models are leveraged for pre-training, the corresponding student models end up inheriting some or all the vulnerabilities of the teacher models. The rise of transfer learning has increased the risk of Backdoor and Trojaning attacks, particularly in autonomous driving and facial recognition.
Moreover, the student models may also inherit the fingerprintable behaviors of safe teacher models (called teacher model fingerprinting). An example of this is the feature map of the pre-trained models, which may, at least to some extent, be passed on to the fine-tuned models. Information about this feature map may be extracted by adversaries through specifically crafted queries to the student models while leveraging prediction APIs or MLaaS platforms. Read this paper for more details.
Most deep learning models implicitly memorize parts of their training data, and this memorization may be exploited to recover sensitive information, leading to leakage of membership information, data properties, or reconstructed feature values. Furthermore, the extensive use of transformers today increases the risk of model manipulation, thereby causing changes in the overall system behavior.
Poisoning attacks on Federated Learning systems
Federated AI has witnessed increased adoption in recent years, particularly in edge applications and in scenarios where user privacy is absolutely critical. This learning paradigm enables models to be trained over multiple datasets that reside in different local hosts, without the need for any dataset to be stored centrally or accessed directly by the AI developers. Models are sent to the individual data hosts; each data provider (participant) trains locally and sends the updated model parameters to a centralized server (the aggregator). The final (global) models are developed through the aggregation of all the local updates.
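The sketch below shows the core of this workflow in NumPy: each participant returns a locally updated parameter vector, and the aggregator averages the updates, weighted by local dataset size (a FedAvg-style scheme). The participants, dataset sizes, and simulated local training are purely illustrative assumptions.

```python
# Minimal federated-averaging sketch: participants send parameter updates and the
# aggregator combines them with a weighted average (FedAvg-style).
import numpy as np

rng = np.random.default_rng(0)
global_model = np.zeros(4)   # stand-in parameter vector

def local_update(model: np.ndarray) -> np.ndarray:
    """Simulated local training: a real participant would train on its own data."""
    return model + rng.normal(scale=0.1, size=model.shape)

participants = [("hospital_a", 1200), ("hospital_b", 300), ("edge_device_c", 50)]

updates, weights = [], []
for name, n_samples in participants:
    updates.append(local_update(global_model))
    weights.append(n_samples)

weights = np.array(weights, dtype=float) / sum(weights)
global_model = np.average(updates, axis=0, weights=weights)   # aggregation step
print(global_model)
```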
Poisoning in federated learning is linked to Byzantine threats, i.e., the misbehavior (or failure) of one or more participants in a distributed system. In most cases, there is no central authority validating the local datasets on which the models are trained, or even the updated model parameters that the participants share for aggregation. As a result, even a few malicious participants can cause massive problems, and the more malicious players get added, the more vulnerable the systems become. Microsoft's Tay chatbot is a prime example of this kind of manipulation through user-supplied training input: the underlying NLP models quickly learned offensive and racist language from malicious users, and the chatbot had to be pulled down less than 24 hours after launch.
Federated learning is vulnerable to both data poisoning and model poisoning. Note that the localized training of models on the individual hosts (e.g., smartphones & edge devices) creates a new attack surface, since some of these hosts may be compromised. Adversarial model replacement is a poisoning technique through which malicious participants introduce backdoor functionalities into the global models. Another threat is Sybil attacks, where pseudo-entities (Sybil identities) are introduced into the learning environment, enabling the development of partly or fully invalid models. Moreover, security risks may also arise from compromised central servers, poor aggregation algorithms, and high-risk communication protocols between the AI developers and any of the participants.
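Why naive averaging is so fragile can be seen in a couple of lines: a single malicious participant that scales up its update can dominate the aggregated model, which is the basic intuition behind adversarial model replacement. The update values and scaling factor below are purely illustrative.

```python
# Minimal sketch of a scaled malicious update dominating a naive average
# (the intuition behind adversarial model replacement in federated learning).
import numpy as np

honest_updates = [np.array([0.10, 0.10]), np.array([0.12, 0.09])]
malicious_update = np.array([-5.0, 5.0]) * 3   # boosted to outweigh honest peers

aggregated = np.mean(honest_updates + [malicious_update], axis=0)
print(aggregated)   # dominated by the attacker's direction, not the honest updates
```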
The security vulnerabilities of multi-GPU platforms
Complex machine learning workloads (e.g., training large neural networks) often require high amounts of compute and memory that individual GPUs may not be able to provide. Multi-GPU systems have emerged as a great solution for resource-intensive AI workloads by integrating multiple single GPUs, enabling unified memory accesses, and deploying high-speed network communication.
Recent research indicates that multi-GPU platforms, including those developed by industry pioneers, may be highly vulnerable to microarchitectural covert-channel and side-channel attacks. It is common for these platforms to simultaneously run multiple AI applications belonging to different business functions, or even different companies. These applications may covertly communicate with one another (covert channel), or may spy on other applications (side channel), often through shared memory access. Moreover, these attacks may also be able to bypass common isolation-based defense mechanisms. Note that these vulnerabilities also affect multi-CPU systems, which are often used to run traditional data science applications.
Furthermore, it has also been observed that unexpected attack channels crop up as GPU systems get augmented with new features. For instance, unified virtual memory and multi-process service are two features that improve the efficiency and programmability of GPU kernels – however, they may also create covert timing channels through the translation lookaside buffer hierarchy. Read this report for a deep dive into this example. The risks tend to get further aggravated in the case of multi-GPU systems.
Conclusion
The Asilomar AI Principles included two points on safety and risks:
- AI systems should be safe and secure throughout their operational lifetime, and verifiably so where applicable and feasible.
- Risks posed by AI systems, especially catastrophic or existential risks, must be subject to planning and mitigation efforts commensurate with their expected impact.
While the pace of AI adoption continues to increase, the above principles are only occasionally applied in practice, even by some industry leaders. Here are some of the important reasons.
- The Talent Gap: AI & Data Science companies that are entrusted with developing and implementing AI applications often lack adequate talent in cybersecurity and high-end engineering. The AI security threats will continue to be sub-optimally addressed as long as this talent gap itself remains unaddressed.
- Inherent security risks in AI technologies: Many security risks and vulnerabilities are inherent in AI technologies and innovations. For instance, over-parameterized models (which are commonly used in Deep Learning), or very wide neural networks are highly vulnerable to threats like surgically-induced data perturbations. As new AI innovations get introduced, the list of these security threats also keeps increasing.
- Over-reliance on conventional security: Companies that build or deploy AI may rely entirely on conventional security techniques and protocols, such as access controls, data encryption, firewalls, and 3rd-party Cloud infrastructure. These are necessary but not sufficient measures for safeguarding AI applications. Consequently, the systems may still fall short in terms of critical security risk mitigation.
- The limitations of Open Source: Many AI teams leverage Open-Source datasets or codebases without adequate due diligence. This is a serious gap, and may easily introduce significant security vulnerabilities into the AI development process.
- Technical debt in AI: AI applications are subject to new types of technical debt (over & above the conventional ones), such as data debt and model debt. Additionally, it is not uncommon to see AI code smells and anti-patterns remain largely unaddressed in the repositories. While their impact is generally in terms of maintainability, performance or reliability, the long-term technical debt accumulated over time may also induce certain security vulnerabilities.
- Early-stage AI ecosystem: Most tools, techniques and technologies in AI are still evolving. For instance, explainability or interpretability in Deep Learning is often a black box. When the workings of any technology cannot be accurately explained or interpreted, the limitations of that technology, including security risks, may also remain partly or entirely unknown.
Safeguarding against, and building resilience to, security attacks, breaches, and exploitation is not easy to achieve. Two measures are critically important.
The first is to ensure that both AI developers and implementers build an adequate understanding of the security vulnerabilities inherent in each AI technology, as well as of the general hardware and software security risks that affect any system. The second is to integrate modern security engineering practices into the overall AI development-deployment-monitoring cycle.
This is the first of a two-part article series. This article focuses on the critical security challenges in AI; the second will focus on techniques and solutions to address these challenges.
References
Towards a Robust and Trustworthy Machine Learning System Development: An Engineering Perspective (Xiong et al., 2022)
Spy in the GPU-box: Covert and Side Channel Attacks on Multi-GPU Systems (Dutta et al., 2022)
Security of Neural Networks from Hardware Perspective: A Survey and Beyond (Xu et al., 2021)
A survey on security and privacy of federated learning (Mothukuri et al., 2020)
Towards Security Threats of Deep Learning Systems: A Survey (He et al., 2020)