What is AI/LLM Security | AI/LLM Security

Defining AI/LLM Security

AI/LLM security is the discipline of identifying, assessing, and mitigating risks unique to systems that incorporate artificial intelligence — particularly large language models — into their architecture. It sits at the intersection of traditional application security, data security, and a set of entirely novel threat categories that emerge from the probabilistic, natural-language-driven nature of modern AI systems.

Unlike traditional software security where we protect deterministic code from well-understood exploit classes, AI security must contend with systems that:

Accept natural language as a primary input — blurring the boundary between instructions and data
Produce non-deterministic outputs — the same input can yield different results
Learn from and potentially memorize training data — creating novel data leakage vectors
Can be given autonomous agency — making decisions and taking actions with real-world consequences
Exhibit emergent behaviors — capabilities and vulnerabilities that weren’t explicitly programmed

Why AI Security Matters

1. Expanding Attack Surface

Every LLM integration adds a new class of entry point. A traditional web application might have forms, APIs, and file uploads. An LLM-powered application adds:

Natural language inputs that bypass conventional input validation
Retrieved documents that can carry injected instructions
Tool calls that bridge the LLM to backend systems
Multi-modal inputs (images, audio, PDFs) that can contain hidden payloads

2. Data Exposure at Scale

LLMs process and generate text that may contain sensitive information:

Training data memorization: Models can regurgitate snippets of training data, including PII, credentials, and proprietary code
Context window leakage: System prompts, conversation history, and RAG-retrieved documents can be exfiltrated through crafted prompts
Inference data logging: Prompts and completions flowing through APIs create new data stores that must be protected

3. Autonomous Action

Agentic AI systems can execute code, call APIs, send emails, modify databases, and interact with external services. A compromised agent doesn’t just leak data — it can take destructive action on behalf of the attacker with whatever permissions it has been granted.

4. Scale of Deployment

LLMs are being integrated into virtually every category of software — from customer support to code generation to medical diagnosis to legal research. The blast radius of a novel LLM vulnerability class is enormous because it potentially affects every system using that pattern.

5. Non-Determinism and Unpredictability

Traditional security testing assumes reproducible behavior. LLMs introduce:

Temperature-based randomness in outputs
Sensitivity to prompt phrasing and ordering
Emergent capabilities that appear at certain model scales
Behavior changes after model updates or fine-tuning

This makes comprehensive security testing significantly harder — a prompt injection that fails 99 times might succeed on the 100th attempt.

How AI Security Differs from Traditional Software Security

Dimension	Traditional Software Security	AI/LLM Security
Input nature	Structured data (forms, JSON, SQL)	Natural language, images, audio — unstructured and ambiguous
Input validation	Well-defined schemas, type checking, allowlists	No reliable way to separate instructions from data in natural language
Behavior model	Deterministic — same input produces same output	Probabilistic — outputs vary based on sampling, temperature, context
Attack taxonomy	Mature (OWASP Top 10, CWE, MITRE ATT&CK)	Emerging and evolving rapidly (OWASP Top 10 for LLMs, MITRE ATLAS)
Vulnerability discovery	Code review, SAST, DAST, fuzzing	Red-teaming, adversarial probing, manual prompt testing
Patching	Deploy code fix, vulnerability is resolved	Retraining is expensive; guardrails can often be bypassed
Trust boundaries	Clear (client/server, user/admin, internal/external)	Blurred — the LLM processes trusted and untrusted content in the same context window
Exploitation	Requires technical skill (crafting payloads, understanding protocols)	Can be done in plain English — dramatically lower barrier to entry
Defense maturity	Decades of tools, frameworks, and best practices	Early stage — most defenses are heuristic-based and incomplete
Supply chain	Libraries, packages, containers	All of the above PLUS model weights, training data, fine-tuning datasets, embedding models
Compliance frameworks	Well-established (PCI DSS, SOC 2, HIPAA)	Emerging (EU AI Act, NIST AI RMF, ISO/IEC 42001)

The Fundamental Difference: No Code/Data Separation

In traditional computing, there is a clear distinction between code (instructions the machine executes) and data (information the machine processes). SQL injection, XSS, and command injection all exploit failures to maintain this boundary — but the boundary itself exists and can be enforced.

In LLM systems, there is no inherent separation between instructions and data. The system prompt, user input, retrieved documents, and tool outputs are all processed as a single stream of tokens. The model must infer which tokens are instructions to follow and which are content to process — and this inference can be manipulated. This is why prompt injection is often called the “SQL injection of AI” — except there is no prepared statement equivalent that fully solves it.

The AI Threat Landscape

Threat Categories Overview

mindmap
  root((AI/LLM<br/>Threat Landscape))
    **Input Attacks**
      Prompt Injection
      Jailbreaking
      Adversarial Examples
      Multi-modal Exploits
    **Model Attacks**
      Data/Model Poisoning
      Model Theft / Extraction
      Backdoor Attacks
      Membership Inference
    **Output Attacks**
      Hallucination & Misinfo
      Unsafe Code Generation
      PII Leakage
      Improper Output Handling
    **Supply Chain**
      Malicious Models
      Poisoned Datasets
      Compromised Frameworks
      Namespace Hijacking
    **Agentic Risks**
      Excessive Agency
      Tool Misuse
      Confused Deputy
      Privilege Escalation
      Auto-exploit
    **Infra & Ops**
      Unbounded Consumption
      API Key Exposure
      Model Denial of Service
      Prompt/Data Logging

1. Prompt Injection

The most discussed and arguably most dangerous LLM vulnerability. An attacker crafts input that causes the model to deviate from its intended instructions.

Direct prompt injection: The user directly instructs the model to ignore its system prompt or perform unauthorized actions.
Indirect prompt injection: Malicious instructions are embedded in external content (web pages, documents, emails) that the LLM retrieves and processes.

2. Data and Model Poisoning

Corrupting the training data or fine-tuning data to introduce backdoors, biases, or malicious behaviors into the model. This can happen at the pre-training, fine-tuning, or RAG data layer.

3. Model Theft and Extraction

Extracting proprietary model weights, architecture, or training data through:

Repeated queries to reverse-engineer model behavior
Side-channel attacks on inference infrastructure
Insider threats with access to model artifacts

4. Supply Chain Attacks

Compromising upstream components: malicious models on Hugging Face, backdoored fine-tuning datasets, compromised inference frameworks, or poisoned vector database content.

5. Privacy Attacks

Extracting sensitive information that the model memorized during training:

Training data extraction (membership inference, data extraction attacks)
System prompt leakage
PII exposure in generated outputs

6. Misuse and Abuse

Using LLMs to generate harmful content at scale:

Deepfakes and synthetic media
Phishing and social engineering content
Malware generation
Disinformation campaigns

7. Agent Exploitation

Targeting LLM-based agents that have tool access:

Hijacking agent actions through prompt injection
Exploiting excessive permissions (confused deputy attacks)
Chaining multiple tool calls to achieve unauthorized objectives

The CIA Triad Applied to AI Systems

The classic CIA triad — Confidentiality, Integrity, and Availability — maps directly to AI/LLM security, but with AI-specific nuances:

Confidentiality

Traditional	AI-Specific
Protect databases and files from unauthorized access	Prevent extraction of training data, system prompts, and conversation history
Encrypt data at rest and in transit	Protect model weights as intellectual property
Access control on sensitive resources	Prevent PII leakage through generated outputs
	Ensure RAG-retrieved documents respect authorization boundaries
	Protect embedding vectors from inversion attacks

Integrity

Traditional	AI-Specific
Prevent unauthorized modification of data	Prevent data/model poisoning that corrupts model behavior
Ensure code hasn’t been tampered with	Verify model provenance and supply chain integrity
Input validation to prevent injection attacks	Defend against prompt injection that manipulates outputs
	Prevent hallucinations that present false information as fact
	Ensure RAG pipeline retrieves authentic, untampered documents

Availability

Traditional	AI-Specific
Protect against DDoS	Prevent model denial of service via resource-intensive prompts
Ensure system uptime	Protect against unbounded consumption (token exhaustion, compute abuse)
Capacity planning and scaling	Rate limiting on inference APIs
	Prevent model extraction attacks that degrade service quality
	Guard against adversarial inputs that cause inference failures

Attack Surface of LLM Applications

An LLM application has a layered attack surface. Understanding each layer is critical for building a comprehensive defense strategy.

graph TD
    USER["USER / ATTACKER"]

    USER --> L1

    subgraph L1["LAYER 1: INPUT LAYER"]
        L1_DESC["User prompts (text, images, audio, files)<br/>API requests with prompt parameters<br/>Conversation history and session context<br/>Multi-modal inputs (images with hidden text, audio commands)"]
        L1_ATK["Attacks: Direct prompt injection, jailbreaking,<br/>adversarial images, resource exhaustion via long inputs"]
    end

    L1 --> L2

    subgraph L2["LAYER 2: RETRIEVAL LAYER (RAG)"]
        L2_DESC["Document ingestion pipeline<br/>Embedding model (converts text to vectors)<br/>Vector database (stores and searches embeddings)<br/>Retrieved context (documents injected into prompt)"]
        L2_ATK["Attacks: Indirect prompt injection via documents,<br/>RAG poisoning, embedding inversion,<br/>vector DB manipulation, document metadata injection"]
    end

    L2 --> L3

    subgraph L3["LAYER 3: ORCHESTRATION / AGENTIC LAYER"]
        L3_DESC["System prompt / instruction template<br/>Tool definitions and function schemas<br/>Chain-of-thought / planning logic<br/>Memory systems (short-term and long-term)<br/>Multi-agent communication"]
        L3_ATK["Attacks: System prompt extraction, tool misuse via injection,<br/>excessive agency, confused deputy, memory poisoning,<br/>inter-agent prompt injection"]
    end

    L3 --> L4

    subgraph L4["LAYER 4: MODEL LAYER"]
        L4_DESC["Model weights and parameters<br/>Fine-tuning data and process<br/>Alignment / safety training (RLHF, Constitutional AI)<br/>Inference configuration (temperature, top-p, max tokens)"]
        L4_ATK["Attacks: Model theft/extraction, training data extraction,<br/>fine-tuning poisoning, jailbreak bypass of alignment,<br/>membership inference"]
    end

    L4 --> L5

    subgraph L5["LAYER 5: INFRASTRUCTURE / RUNTIME LAYER"]
        L5_DESC["API gateway and authentication<br/>Logging and monitoring pipeline<br/>Cloud platform (Azure, AWS, GCP)<br/>GPU/TPU compute resources<br/>Network configuration and egress controls"]
        L5_ATK["Attacks: API key theft, log injection, side-channel attacks<br/>on GPU memory, resource exhaustion / billing attacks,<br/>insufficient egress filtering"]
    end

Defense-in-Depth for AI Systems

No single layer of defense is sufficient. A robust AI security posture requires controls at every layer:

Input layer: Input filtering, content moderation, rate limiting, multi-modal scanning
Retrieval layer: Document provenance verification, embedding monitoring, access-control-aware retrieval
Orchestration layer: Least-privilege tool permissions, human-in-the-loop for sensitive actions, output filtering before tool execution
Model layer: Model provenance verification, alignment testing, red-teaming, guardrail models
Infrastructure layer: Standard cloud security controls, API authentication, logging, network segmentation, cost controls

Emerging Regulatory Landscape

AI security is increasingly shaped by regulation:

Framework	Scope	Key Requirements
EU AI Act (2024)	EU-wide, risk-based regulation	Risk classification, transparency, conformity assessments for high-risk AI
NIST AI Risk Management Framework	US voluntary framework	Governance, mapping, measuring, and managing AI risks
ISO/IEC 42001	International standard	AI management system requirements
MITRE ATLAS	Knowledge base	Adversarial threat landscape for AI systems, ATT&CK-style framework
OWASP Top 10 for LLMs	Application security guidance	Top 10 risks specific to LLM applications (covered in the next section)
Executive Order 14110 (US, 2023)	US policy	Safety and security requirements for frontier AI models

Key Takeaways

AI security is not just “infosec + AI” — it requires understanding fundamentally new threat categories that don’t exist in traditional software.
The attack surface is layered — from user input through retrieval, orchestration, model, and infrastructure. Each layer requires distinct security controls.
Natural language is the new attack vector — prompt injection is possible precisely because LLMs cannot reliably distinguish instructions from data. This is an unsolved problem.
The barrier to exploitation is lower than ever — many AI attacks can be executed in plain English, without any programming or security expertise.
Defense-in-depth is essential — no single guardrail, filter, or alignment technique is sufficient. Security must be applied at every layer of the AI application stack.
The regulatory environment is evolving rapidly — organizations deploying AI must track emerging requirements from the EU AI Act, NIST AI RMF, and sector-specific guidance.

References

OWASP. (2025). “OWASP Top 10 for Large Language Model Applications.” https://owasp.org/www-project-top-10-for-large-language-model-applications/
MITRE. (2024). “ATLAS — Adversarial Threat Landscape for AI Systems.” https://atlas.mitre.org/
NIST. (2023). “AI Risk Management Framework (AI RMF 1.0).” https://www.nist.gov/artificial-intelligence/risk-management-framework
European Commission. (2024). “EU Artificial Intelligence Act.” https://artificialintelligenceact.eu/
ISO/IEC. (2023). “ISO/IEC 42001:2023 — Artificial Intelligence Management System.” https://www.iso.org/standard/81230.html
Greshake, K. et al. (2023). “Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.” https://arxiv.org/abs/2302.12173
Perez, F. & Ribeiro, I. (2022). “Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of LLMs through a Global Scale Prompt Hacking Competition.” https://arxiv.org/abs/2311.16119
Carlini, N. et al. (2023). “Extracting Training Data from Large Language Models.” https://arxiv.org/abs/2012.07805
The White House. (2023). “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence.” https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
Anthropic. (2023). “Anthropic’s Responsible Scaling Policy.” https://www.anthropic.com/index/anthropics-responsible-scaling-policy