OWASP Large Language Model (LLM) Top 10 Explained

Since their inception, LLMs or large language models have rapidly integrated into various fields over the past couple of years, giving rise to a new set of security challenges in the field. LLMs like ChatGPT or GitHub’s Co-Pilot are also prone to cyber-attacks, where exploiting a single vulnerability can disrupt thousands of organizations that rely on them for their day-to-day activities.

To address this security gap, OWASP has created a list that helps identify and mitigate the most severe vulnerabilities that could deeply impact LLMs.

The OWASP Large Language Model (LLM) Top 10 List

Prompt Injection
Insecure Output Handling
Training Data Poisoning
Model Denial of Service
Supply Chain Vulnerabilities
Sensitive Information Disclosure
Insecure Plugin Design
Excessive Agency
Overreliance
Model Theft

What is the OWASP LLM Top 10?

The OWASP Large Language Model (LLM) Top 10 lists the most frequent and significant security risks in large language model applications. It aims to educate developers, designers, and organizations about potential security risks that may arise from the deployment of large language models.

OWASP Large Language Model (LLM) Top 10 Explained with Examples

LLM01: Prompt Injection

Prompt Injection is an attack in which the attacker uses crafted input prompts to manipulate the LLM to execute unintended actions or extract sensitive information.

It can be performed in two ways:

Direct Prompt Injections, or jailbreaking, is an attack in which the attackers modify or reveal the system prompts, allowing them to interact with the underlying system by exploiting the insecure functions.
Indirect Prompt Injection is a method in which LLMs accept external input, and attackers send specially crafted input prompts that allow them to manipulate the users or the systems associated with the LLM.

ℹ️ In 2022, researchers from OpenAI discovered that GPT-3 was vulnerable to prompt injection attacks. Specially crafted input prompts could cause the model to perform malicious and unintended actions.

Mitigation Suggestions

Establish Trust boundaries and treat LLM as an external user
Set up proper access control mechanisms to limit access to the backend
Separate the external user prompts from the predefined prompts

Example Attack Scenarios for Prompt Injections

An attacker inputs specially crafted prompts to trick the LLM intro into revealing confidential information about the application, such as API keys, or modifying the outputs to perform unintended actions.

LLM02: Insecure Output Handling

Insecure Output Handling is a vulnerability that occurs due to insufficient validation and sanitization of input and output and improper handling of the output generated by the LLMs before it is passed on to the applications.

This security weakness can lead to vulnerabilities like XSS, SSRF, CSRF or even remote code execution.

ℹ️ In 2024, researchers discovered that the Mintplex Labs chatbot was vulnerable to a cross-site scripting attack due to a lack of input sanitization. It granted users indirect access to functionalities with the help of malicious input prompts.

Mitigation Suggestions:

Setup Input Validation and Sanitization mechanisms.
Implement Output Encoding before directing the output to end users.
Implement proper access controls to avoid the processing of sensitive commands or prompts.

Example Attack Scenarios for Insecure Output Handling

An LLM allows the crafting of SQL queries for a database, and the attackers request a query to update the users’ table or delete all the tables. If this query is not validated and sanitized properly, it could delete all the databases associated with the LLMs.

LLM03: Training Data Poisoning

Training Data Poisoning is a weakness that occurs when attackers modify or manipulate the training data with harmful data. This causes the LLM to learn from incorrect or biased data and produce skewed predictions.

This can cause the poisoned data to be served to users or lead to issues like software exploitation, which can harm the brand’s reputation.

ℹ️ In 2023, researchers used data augmentation techniques to create trojan LLMs by adding malicious data to the training data set, allowing them to embed backdoors in various LLMs. The models generated predefined responses to specific prompts that could trigger the backdoor.

Mitigation Suggestions

Verify the source of the training data
Use strict input filters to allow specific data for training.
Monitor and Verify the data for anomalies using a data curation tool.

Beyond these preventive measures, AI penetration testing services can help simulate real-world data poisoning attacks to test your model’s resilience against malicious training data.

Example Attack Scenarios for Training Data Poisoning

A competitor or attackers create documents with biased information to feed to the targeted LLMs training data, which causes it to unintentionally generate a more biased output, which benefits the attackers.

LLM04: Model Denial of Service

Model Denial of Service (DoS) is a type of attack in which attackers cause resource-heavy operations to disrupt the availability of the LLM, slowing it down or making it unavailable to users or associated applications.

This could also lead to the model learning from this barrage of inputs and allowing the attacker to manipulate the context window(Input length) set by the LLM.

ℹ️ In 2023, OpenAI suffered “periodic outages” on its API and ChatGPT services due to a DDoS attack on its infrastructure

Mitigation Suggestions

Implement proper request throttling or rate-limiting mechanisms.
Set a strict input length limit for the context window.
Continuously monitor the utilization of resources and restrict excessive utilization.

Example Attack Scenarios for Model Denial of Service

Attackers can flood the LLM with a high volume of long inputs that can reach the limit of the defined context window, causing strain and increased resource usage, making it unresponsive.

LLM05: Supply Chain Vulnerabilities

Supply chain vulnerabilities generally occur in LLM applications when the third-party resources or libraries used in the development introduce external security risks to the applications.

Such a vulnerability can result in the application becoming un-operational and even lead to data breaches.

ℹ️ In 2022, Hugging Face, which was using a third-party NLP plugin with a critical vulnerability, was injected with malicious code by attackers allowing them to achieve remote code execution.

Mitigation Suggestions

Use Verified and secure third-party dependencies
Use Model and Code Signing
Regularly update third-party components

Example Attack Scenarios for Supply Chain Vulnerabilities

Attackers can manipulate and inject publicly available datasets with malicious inputs to generate a backdoor when it is used to retrain a model. This allows the attackers to perform unauthorized actions or even access sensitive data in that environment. Proactive AI penetration testing can help detect such supply chain–driven weaknesses early in the model lifecycle.

LLM06: Sensitive Information Disclosure

Sensitive Data Exposure is a vulnerability that can occur when the LLM reveals sensitive information about the system or the algorithms being used in its output. Attackers can access such information to gain unauthorized access to the system.

It can lead to data breaches and loss of Intellectual Property and cause legal penalties for non-compliance and privacy violations.

ℹ️ In 2022, researchers discovered that a chatbot created by a healthcare organization was exposing sensitive patient information to its users due to a lack of sanitization. This data breach exposed personal health records, leading to legal consequences for non-compliance.

Mitigation Suggestions

Implement data sanitization mechanisms.
Implement strict data access policies.
Monitor model outputs for information leaks.

Example Attack Scenarios for Sensitive Information Disclosure

Attackers can craft malicious input prompts that exploit the absence or ineffectiveness of the input validation mechanisms deployed in the LLMs to reveal the PII of other application users.

LLM07: Insecure Plugin Design

Insecure Plugin Design occurs when the LLM plugins introduce vulnerabilities into the system. These vulnerabilities can cause the LLMs to accept insecure input prompts or have improper access control mechanisms, making them easier targets for attackers to exploit.

Due to the lack of proper authentication and authorization, the plugins trust the data input via other plugins, causing data leakages, privilege escalation, or remote code execution, which leads to system failure.

ℹ️ In 2023, OpenAI’s ChatGPT faced a security vulnerability due to an insecure plugin design. A third-party plugin that enhanced the chatbot’s functionality allowed for arbitrary code execution.

Mitigation Suggestions

Follow secure coding practices for development.
Restrict plugin access to data and functions.
Plugins should use proper authentication systems to maintain access controls.

Example Attack Scenarios for Insecure Plugin Design

An insecure plugin can be targeted by attackers to input content generated by other insecure LLMs and perform any unauthorized actions as the plugin assumes the data is being input by an end user.

LLM08: Excessive Agency

Excessive Agency is a vulnerability that occurs when the LLMs are given control over crucial functions or are given excessive permissions. These actions can cause unintended damage to the data or the associated applications.

It can deeply impact the confidentiality, integrity, and availability of the associated applications and introduce various vulnerabilities.

Mitigation Suggestions

Limit the autonomy and permissions of LLMs
Implement human-in-loop controls for critical operations
Implement authorization in downstream applications

Example Attack Scenarios for Excessive Agency

Attackers can exploit a plugin developed for executing specific shell commands by feeding it instructions to perform undefined and higher privilege commands to gain unauthorized access or perform unintended functions.

LLM09: Overreliance

Overreliance in LLMs can occur when the output generated by the models is trusted by the users and the applications without any validation or confirmation on whether the generated outputs are accurate.

This overreliance on the LLMs’ outputs can lead to miscommunication or introduce security vulnerabilities due to incorrect outputs.

ℹ️ In 2023, a novice lawyer used ChatGPT to write a motion for him and generate the document. Even though he provided the model with proper data and input prompts, the model generated several fake citations, which led to a false case and the lawyer’s disbarment.

Mitigation Suggestions

Use LLM outputs as recommendations.
Review and validate LLM outputs for accuracy.
Enhance the model by fine-tuning output quality.

Example Attack Scenarios for Overreliance

A healthcare system designed to provide diagnosis and prescribing medicines, due to overreliance, can lead to incorrect and harmful treatments or procedures and can be fatal for the patients.

LLM10: Model Theft

Model Theft refers to unauthorized access to the LLM, which allows attackers to gain sensitive information about the model, such as its parameters, which can be used to replicate the model and use it for themselves.

This vulnerability leads to financial and reputational loss to organizations and creates mistrust amongst the users of the LLMs.

Mitigation Suggestions

Secure LLM with strong encryption suites.
Implement robust access control mechanisms.
Regularly monitor and audit the access logs.

Example Attack Scenarios for Model Theft

An attacker can query the model repeatedly with selected inputs, collecting sensitive information from the outputs to replicate the model and use it for themselves without access to the original one.

Final Thoughts

The OWASP LLM Top 10 lists critical security risks associated with LLMs. Identifying and mitigating these vulnerabilities helps developers and organizations ensure secure and robust models.

Practices like role-based access controls, strong cipher suites, input validation, and regular monitoring of the LLMs and their output can help protect them from various cyberattacks.