Introduction
Security and compliance are the topics founders most often defer and the ones that most often block the first enterprise deal. For AI products specifically, the standard controls (encryption, RBAC, audit logs) need to be extended with new ones (prompt injection defenses, data exfiltration review, vendor data flow documentation). This guide covers what HIPAA, GDPR, and SOC 2 each require, and how LLMs change the picture.
HIPAA — Business Associate Agreements and the LLM data flow
HIPAA applies if your product handles Protected Health Information (PHI) for U.S. covered entities (providers, payers, clearinghouses) or their business associates. The core requirements are a Business Associate Agreement (BAA) with every vendor touching PHI, minimum-necessary access, encryption in transit and at rest, access audit logs, breach notification procedures, and workforce training.
For AI specifically: AWS Bedrock, Azure OpenAI, and direct Anthropic and OpenAI BAAs (for qualifying accounts) all support HIPAA workloads. Budget 4–8 weeks and $30k–$80k for first-time HIPAA readiness. Third-party assessments (HITRUST) add substantially more.
- BAA with every PHI-touching vendor, including LLM provider
- AWS Bedrock, Azure OpenAI, direct vendor BAAs are all viable
- Budget: 4–8 weeks and $30k–$80k for first-time HIPAA readiness
GDPR — lawful basis, retention, and the right to opt out
GDPR applies if your product processes the personal data of people in the EU, regardless of where your company is based. For AI, the specific angles are: a lawful basis for processing personal data with an LLM, clear retention policies covering prompts, outputs, and vector stores, the right to opt out of model-training feedback loops, and data-subject access that includes any personal data stored as model context or embeddings.
Practical implementation: document a lawful basis per processing activity, default to provider training opt-out, set retention for prompt/response logs, and build a data-subject access workflow that can surface embeddings and stored context along with more typical records (a sketch of such a workflow follows the list below).
- Document lawful basis per processing activity
- Default to training opt-out with your LLM provider
- Set retention on prompt/response logs and vector stores
- Data-subject access must include embeddings and stored context
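As a concrete illustration, here is a minimal sketch of an access-and-erasure workflow that reaches into the vector store as well as the prompt logs. The `prompt_log` and `vector_store` interfaces here are hypothetical stand-ins for whatever stores you actually run; the point is that both handlers key off the same subject identifier and that erasure covers embeddings, not just database rows.

```python
from dataclasses import dataclass

@dataclass
class AccessReport:
    user_id: str
    prompt_log_entries: list  # raw prompt/response records tied to the subject
    embedding_ids: list       # vector-store entries tied to the subject

def handle_access_request(user_id, prompt_log, vector_store):
    """Gather everything tied to a data subject, including vector-store entries."""
    entries = prompt_log.query(user_id=user_id)  # hypothetical log interface
    embedding_ids = vector_store.list_ids(filter={"user_id": user_id})
    return AccessReport(user_id, entries, embedding_ids)

def handle_erasure_request(user_id, prompt_log, vector_store):
    """Erasure must cover embeddings and cached context, not just row data."""
    prompt_log.delete(user_id=user_id)
    vector_store.delete(filter={"user_id": user_id})
```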
SOC 2 — trust services criteria meet AI controls
SOC 2 Type II is the most common enterprise SaaS gate. Expect 4–6 months of observation, Drata or Vanta implementation, a complete infosec policy set, and a formal auditor. Budget $40k–$80k all-in for a first-time audit.
AI-specific additions to standard SOC 2 controls: model access audit (who queried what), prompt injection defenses, data leakage reviews on tool outputs, and documentation of AI vendor data handling. These bolt onto the standard controls rather than replacing them; a sketch of the audit event appears after the list below.
- 4–6 months observation period for Type II
- Drata/Vanta + full policy set + formal auditor
- Budget: $40k–$80k first-time audit
- AI additions: model access audit, prompt injection defenses, leakage review
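For the "who queried what" control, one structured audit event per model call is usually enough. A minimal sketch, assuming a standard Python logger and that raw prompts are retained in a separate access-controlled store (only a hash goes into the audit stream):

```python
import datetime
import hashlib
import json
import logging

logger = logging.getLogger("model_access_audit")

def log_model_access(*, user_id, tenant_id, model, prompt, tool_names):
    """Emit one structured 'who queried what' event per model call."""
    event = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user_id": user_id,
        "tenant_id": tenant_id,
        "model": model,
        # store a hash, not the prompt itself, if raw text lives elsewhere
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "tools": tool_names,
    }
    logger.info(json.dumps(event))
```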
Prompt injection and data exfiltration
Prompt injection is the AI-specific attack surface. Any agent with tool access can be tricked into misbehaving by crafted input in retrieved content, user messages, or tool outputs. Mitigations: strict tool allowlists with parameter validation, structured output schemas, rate limits per-user and per-tool, clear separation between trusted and untrusted text in prompts, and logs you can actually search for post-incident forensics.
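A minimal sketch of the allowlist-plus-validation pattern, with hypothetical tool names; the essential property is that an unknown tool or a malformed parameter fails closed before anything executes:

```python
# Tool name -> parameter validator. Anything not listed is rejected outright.
ALLOWED_TOOLS = {
    "search_docs": lambda p: isinstance(p.get("query"), str) and len(p["query"]) < 500,
    "get_ticket": lambda p: isinstance(p.get("ticket_id"), str) and p["ticket_id"].isalnum(),
}

def dispatch_tool_call(name, params, execute):
    validator = ALLOWED_TOOLS.get(name)
    if validator is None:
        raise PermissionError(f"tool not on allowlist: {name}")
    if not validator(params):
        raise ValueError(f"invalid parameters for tool: {name}")
    return execute(name, params)  # run the tool only after both checks pass
```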
Data exfiltration risk is adjacent: an agent with web access and a tool that can make outbound HTTP requests is a prompt-injection-exfil engine. Lock down network egress, allowlist target domains, and require human approval for any agent action crossing a trust boundary.
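A sketch of the egress guard, with hypothetical domains; every outbound request made on an agent's behalf should pass through something like it:

```python
from urllib.parse import urlparse

EGRESS_ALLOWLIST = {"api.internal.example.com", "docs.example.com"}  # hypothetical domains

def guarded_fetch(url, fetch):
    """Refuse any outbound request to a host that is not explicitly allowlisted."""
    host = urlparse(url).hostname
    if host not in EGRESS_ALLOWLIST:
        raise PermissionError(f"egress blocked for host: {host}")
    return fetch(url)
```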
- Allowlist tools strictly and validate every parameter
- Separate trusted vs untrusted text clearly in prompts
- Rate-limit per-user and per-tool; log everything
- Lock down outbound network egress for tool-using agents
PII handling — redaction and access boundaries
Minimize PII flowing into LLM calls. Redact before the call, not after: strip SSNs, credit card numbers, email addresses, and other identifiers server-side before constructing the prompt. For cases where the model genuinely needs PII (patient intake, legal intake), tokenize: replace real PII with placeholders, let the model work on placeholders, swap real values back in on the client side.
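A minimal tokenize/detokenize sketch. The regexes are illustrative only; production redaction should use a vetted PII detection library rather than hand-rolled patterns:

```python
import re
import uuid

# Illustrative patterns only; real systems should use a vetted PII detector.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def tokenize(text):
    """Replace detected PII with opaque placeholders; return text plus a reverse map."""
    mapping = {}
    for label, pattern in PII_PATTERNS.items():
        def _swap(match, label=label):
            token = f"<{label}_{uuid.uuid4().hex[:8]}>"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(_swap, text)
    return text, mapping

def detokenize(text, mapping):
    """Swap real values back in after the model has produced its output."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

The mapping never leaves your trust boundary; only the placeholder text reaches the model.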
Enforce per-tenant access boundaries at the retrieval layer, not the prompt layer. A tenant should never be able to construct a prompt that retrieves another tenant's documents, even in principle.
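Concretely, the tenant filter is applied by the retrieval code from the authenticated session and is never taken from model output. A sketch against a hypothetical vector-store interface:

```python
def retrieve(vector_store, query_embedding, *, tenant_id, k=5):
    """Tenant scoping enforced server-side at query time; nothing in the
    prompt can widen or remove this filter."""
    return vector_store.search(           # hypothetical search interface
        embedding=query_embedding,
        filter={"tenant_id": tenant_id},  # from the session, not the model
        top_k=k,
    )
```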
- Redact PII before prompt construction, not after
- Tokenize when model genuinely needs PII context
- Enforce tenant boundaries at retrieval, not prompt, layer
Model governance and the audit trail
Every model interaction relevant to a regulated decision must be auditable. Log the prompt template version, the system prompt, the tool set, the model version, the raw response, and the final decision. Retain logs per your regulatory retention policy (HIPAA: 6 years minimum; many healthcare customers prefer 7).
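A minimal record shape for that log, as a sketch; the field set mirrors the list above, and the storage backend is up to you:

```python
import datetime
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ModelAuditRecord:
    """One immutable record per regulated model interaction."""
    prompt_template_version: str
    system_prompt: str
    tool_set: tuple        # names of tools exposed on this call
    model_version: str
    raw_response: str
    final_decision: str
    ts: str = field(default_factory=lambda: datetime.datetime.now(datetime.timezone.utc).isoformat())
```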
Model version changes are regulated events in some jurisdictions. Document the change, revalidate with evals, and maintain a rollback path. 'We upgraded to the latest model' is not an acceptable audit answer.
- Log prompt template version, system prompt, tools, model, response, decision
- Retain logs per regulatory policy (HIPAA: 6+ years)
- Treat model version changes as regulated events
Conclusion
Compliance is not a feature you add at the end; it is a set of architectural choices you make at the start. Plan for HIPAA with the right vendor BAAs, for GDPR with the right data handling and retention, for SOC 2 with the right documented controls, and for AI-specific threats with allowlists, validation, and audit trails. Teams that treat compliance as an architectural discipline spend a fraction of what teams that retrofit it pay.
