← Back to AI/LLM Security

What is AI/LLM Security

10 min read

Defining AI/LLM Security

AI/LLM security is the discipline of identifying, assessing, and mitigating risks unique to systems that incorporate artificial intelligence — particularly large language models — into their architecture. It sits at the intersection of traditional application security, data security, and a set of entirely novel threat categories that emerge from the probabilistic, natural-language-driven nature of modern AI systems.

Unlike traditional software security where we protect deterministic code from well-understood exploit classes, AI security must contend with systems that:

  • Accept natural language as a primary input — blurring the boundary between instructions and data
  • Produce non-deterministic outputs — the same input can yield different results
  • Learn from and potentially memorize training data — creating novel data leakage vectors
  • Can be given autonomous agency — making decisions and taking actions with real-world consequences
  • Exhibit emergent behaviors — capabilities and vulnerabilities that weren’t explicitly programmed

Why AI Security Matters

1. Expanding Attack Surface

Every LLM integration adds a new class of entry point. A traditional web application might have forms, APIs, and file uploads. An LLM-powered application adds:

  • Natural language inputs that bypass conventional input validation
  • Retrieved documents that can carry injected instructions
  • Tool calls that bridge the LLM to backend systems
  • Multi-modal inputs (images, audio, PDFs) that can contain hidden payloads

2. Data Exposure at Scale

LLMs process and generate text that may contain sensitive information:

  • Training data memorization: Models can regurgitate snippets of training data, including PII, credentials, and proprietary code
  • Context window leakage: System prompts, conversation history, and RAG-retrieved documents can be exfiltrated through crafted prompts
  • Inference data logging: Prompts and completions flowing through APIs create new data stores that must be protected

3. Autonomous Action

Agentic AI systems can execute code, call APIs, send emails, modify databases, and interact with external services. A compromised agent doesn’t just leak data — it can take destructive action on behalf of the attacker with whatever permissions it has been granted.

4. Scale of Deployment

LLMs are being integrated into virtually every category of software — from customer support to code generation to medical diagnosis to legal research. The blast radius of a novel LLM vulnerability class is enormous because it potentially affects every system using that pattern.

5. Non-Determinism and Unpredictability

Traditional security testing assumes reproducible behavior. LLMs introduce:

  • Temperature-based randomness in outputs
  • Sensitivity to prompt phrasing and ordering
  • Emergent capabilities that appear at certain model scales
  • Behavior changes after model updates or fine-tuning

This makes comprehensive security testing significantly harder — a prompt injection that fails 99 times might succeed on the 100th attempt.


How AI Security Differs from Traditional Software Security

DimensionTraditional Software SecurityAI/LLM Security
Input natureStructured data (forms, JSON, SQL)Natural language, images, audio — unstructured and ambiguous
Input validationWell-defined schemas, type checking, allowlistsNo reliable way to separate instructions from data in natural language
Behavior modelDeterministic — same input produces same outputProbabilistic — outputs vary based on sampling, temperature, context
Attack taxonomyMature (OWASP Top 10, CWE, MITRE ATT&CK)Emerging and evolving rapidly (OWASP Top 10 for LLMs, MITRE ATLAS)
Vulnerability discoveryCode review, SAST, DAST, fuzzingRed-teaming, adversarial probing, manual prompt testing
PatchingDeploy code fix, vulnerability is resolvedRetraining is expensive; guardrails can often be bypassed
Trust boundariesClear (client/server, user/admin, internal/external)Blurred — the LLM processes trusted and untrusted content in the same context window
ExploitationRequires technical skill (crafting payloads, understanding protocols)Can be done in plain English — dramatically lower barrier to entry
Defense maturityDecades of tools, frameworks, and best practicesEarly stage — most defenses are heuristic-based and incomplete
Supply chainLibraries, packages, containersAll of the above PLUS model weights, training data, fine-tuning datasets, embedding models
Compliance frameworksWell-established (PCI DSS, SOC 2, HIPAA)Emerging (EU AI Act, NIST AI RMF, ISO/IEC 42001)

The Fundamental Difference: No Code/Data Separation

In traditional computing, there is a clear distinction between code (instructions the machine executes) and data (information the machine processes). SQL injection, XSS, and command injection all exploit failures to maintain this boundary — but the boundary itself exists and can be enforced.

In LLM systems, there is no inherent separation between instructions and data. The system prompt, user input, retrieved documents, and tool outputs are all processed as a single stream of tokens. The model must infer which tokens are instructions to follow and which are content to process — and this inference can be manipulated. This is why prompt injection is often called the “SQL injection of AI” — except there is no prepared statement equivalent that fully solves it.


The AI Threat Landscape

Threat Categories Overview

mindmap
  root((AI/LLM<br/>Threat Landscape))
    **Input Attacks**
      Prompt Injection
      Jailbreaking
      Adversarial Examples
      Multi-modal Exploits
    **Model Attacks**
      Data/Model Poisoning
      Model Theft / Extraction
      Backdoor Attacks
      Membership Inference
    **Output Attacks**
      Hallucination & Misinfo
      Unsafe Code Generation
      PII Leakage
      Improper Output Handling
    **Supply Chain**
      Malicious Models
      Poisoned Datasets
      Compromised Frameworks
      Namespace Hijacking
    **Agentic Risks**
      Excessive Agency
      Tool Misuse
      Confused Deputy
      Privilege Escalation
      Auto-exploit
    **Infra & Ops**
      Unbounded Consumption
      API Key Exposure
      Model Denial of Service
      Prompt/Data Logging

1. Prompt Injection

The most discussed and arguably most dangerous LLM vulnerability. An attacker crafts input that causes the model to deviate from its intended instructions.

  • Direct prompt injection: The user directly instructs the model to ignore its system prompt or perform unauthorized actions.
  • Indirect prompt injection: Malicious instructions are embedded in external content (web pages, documents, emails) that the LLM retrieves and processes.

2. Data and Model Poisoning

Corrupting the training data or fine-tuning data to introduce backdoors, biases, or malicious behaviors into the model. This can happen at the pre-training, fine-tuning, or RAG data layer.

3. Model Theft and Extraction

Extracting proprietary model weights, architecture, or training data through:

  • Repeated queries to reverse-engineer model behavior
  • Side-channel attacks on inference infrastructure
  • Insider threats with access to model artifacts

4. Supply Chain Attacks

Compromising upstream components: malicious models on Hugging Face, backdoored fine-tuning datasets, compromised inference frameworks, or poisoned vector database content.

5. Privacy Attacks

Extracting sensitive information that the model memorized during training:

  • Training data extraction (membership inference, data extraction attacks)
  • System prompt leakage
  • PII exposure in generated outputs

6. Misuse and Abuse

Using LLMs to generate harmful content at scale:

  • Deepfakes and synthetic media
  • Phishing and social engineering content
  • Malware generation
  • Disinformation campaigns

7. Agent Exploitation

Targeting LLM-based agents that have tool access:

  • Hijacking agent actions through prompt injection
  • Exploiting excessive permissions (confused deputy attacks)
  • Chaining multiple tool calls to achieve unauthorized objectives

The CIA Triad Applied to AI Systems

The classic CIA triad — Confidentiality, Integrity, and Availability — maps directly to AI/LLM security, but with AI-specific nuances:

Confidentiality

TraditionalAI-Specific
Protect databases and files from unauthorized accessPrevent extraction of training data, system prompts, and conversation history
Encrypt data at rest and in transitProtect model weights as intellectual property
Access control on sensitive resourcesPrevent PII leakage through generated outputs
Ensure RAG-retrieved documents respect authorization boundaries
Protect embedding vectors from inversion attacks

Integrity

TraditionalAI-Specific
Prevent unauthorized modification of dataPrevent data/model poisoning that corrupts model behavior
Ensure code hasn’t been tampered withVerify model provenance and supply chain integrity
Input validation to prevent injection attacksDefend against prompt injection that manipulates outputs
Prevent hallucinations that present false information as fact
Ensure RAG pipeline retrieves authentic, untampered documents

Availability

TraditionalAI-Specific
Protect against DDoSPrevent model denial of service via resource-intensive prompts
Ensure system uptimeProtect against unbounded consumption (token exhaustion, compute abuse)
Capacity planning and scalingRate limiting on inference APIs
Prevent model extraction attacks that degrade service quality
Guard against adversarial inputs that cause inference failures

Attack Surface of LLM Applications

An LLM application has a layered attack surface. Understanding each layer is critical for building a comprehensive defense strategy.

graph TD
    USER["USER / ATTACKER"]

    USER --> L1

    subgraph L1["LAYER 1: INPUT LAYER"]
        L1_DESC["User prompts (text, images, audio, files)<br/>API requests with prompt parameters<br/>Conversation history and session context<br/>Multi-modal inputs (images with hidden text, audio commands)"]
        L1_ATK["Attacks: Direct prompt injection, jailbreaking,<br/>adversarial images, resource exhaustion via long inputs"]
    end

    L1 --> L2

    subgraph L2["LAYER 2: RETRIEVAL LAYER (RAG)"]
        L2_DESC["Document ingestion pipeline<br/>Embedding model (converts text to vectors)<br/>Vector database (stores and searches embeddings)<br/>Retrieved context (documents injected into prompt)"]
        L2_ATK["Attacks: Indirect prompt injection via documents,<br/>RAG poisoning, embedding inversion,<br/>vector DB manipulation, document metadata injection"]
    end

    L2 --> L3

    subgraph L3["LAYER 3: ORCHESTRATION / AGENTIC LAYER"]
        L3_DESC["System prompt / instruction template<br/>Tool definitions and function schemas<br/>Chain-of-thought / planning logic<br/>Memory systems (short-term and long-term)<br/>Multi-agent communication"]
        L3_ATK["Attacks: System prompt extraction, tool misuse via injection,<br/>excessive agency, confused deputy, memory poisoning,<br/>inter-agent prompt injection"]
    end

    L3 --> L4

    subgraph L4["LAYER 4: MODEL LAYER"]
        L4_DESC["Model weights and parameters<br/>Fine-tuning data and process<br/>Alignment / safety training (RLHF, Constitutional AI)<br/>Inference configuration (temperature, top-p, max tokens)"]
        L4_ATK["Attacks: Model theft/extraction, training data extraction,<br/>fine-tuning poisoning, jailbreak bypass of alignment,<br/>membership inference"]
    end

    L4 --> L5

    subgraph L5["LAYER 5: INFRASTRUCTURE / RUNTIME LAYER"]
        L5_DESC["API gateway and authentication<br/>Logging and monitoring pipeline<br/>Cloud platform (Azure, AWS, GCP)<br/>GPU/TPU compute resources<br/>Network configuration and egress controls"]
        L5_ATK["Attacks: API key theft, log injection, side-channel attacks<br/>on GPU memory, resource exhaustion / billing attacks,<br/>insufficient egress filtering"]
    end

Defense-in-Depth for AI Systems

No single layer of defense is sufficient. A robust AI security posture requires controls at every layer:

  1. Input layer: Input filtering, content moderation, rate limiting, multi-modal scanning
  2. Retrieval layer: Document provenance verification, embedding monitoring, access-control-aware retrieval
  3. Orchestration layer: Least-privilege tool permissions, human-in-the-loop for sensitive actions, output filtering before tool execution
  4. Model layer: Model provenance verification, alignment testing, red-teaming, guardrail models
  5. Infrastructure layer: Standard cloud security controls, API authentication, logging, network segmentation, cost controls

Emerging Regulatory Landscape

AI security is increasingly shaped by regulation:

FrameworkScopeKey Requirements
EU AI Act (2024)EU-wide, risk-based regulationRisk classification, transparency, conformity assessments for high-risk AI
NIST AI Risk Management FrameworkUS voluntary frameworkGovernance, mapping, measuring, and managing AI risks
ISO/IEC 42001International standardAI management system requirements
MITRE ATLASKnowledge baseAdversarial threat landscape for AI systems, ATT&CK-style framework
OWASP Top 10 for LLMsApplication security guidanceTop 10 risks specific to LLM applications (covered in the next section)
Executive Order 14110 (US, 2023)US policySafety and security requirements for frontier AI models

Key Takeaways

  1. AI security is not just “infosec + AI” — it requires understanding fundamentally new threat categories that don’t exist in traditional software.

  2. The attack surface is layered — from user input through retrieval, orchestration, model, and infrastructure. Each layer requires distinct security controls.

  3. Natural language is the new attack vector — prompt injection is possible precisely because LLMs cannot reliably distinguish instructions from data. This is an unsolved problem.

  4. The barrier to exploitation is lower than ever — many AI attacks can be executed in plain English, without any programming or security expertise.

  5. Defense-in-depth is essential — no single guardrail, filter, or alignment technique is sufficient. Security must be applied at every layer of the AI application stack.

  6. The regulatory environment is evolving rapidly — organizations deploying AI must track emerging requirements from the EU AI Act, NIST AI RMF, and sector-specific guidance.


References