The September Deadline That Changed Everything
In late June 2025, our Chief Legal Officer dropped a bomb during the quarterly risk review: “We have 90 days to achieve full EU AI Act compliance, or we shut down all AI systems in our European operations.”
The implications were staggering. Our European business represented 32% of company revenue—approximately $240M annually. We had deployed 23 AI models across customer service, fraud detection, credit scoring, and marketing personalization. None of them had proper governance documentation. None had undergone formal risk assessments. We were operating in regulatory limbo.
After reading the comprehensive Q4 2025 AI governance guide, I realized the scope of work ahead. This is the complete story of our 90-day compliance sprint—what worked, what failed, and how a $180K budget became a $280K spend to avoid a $240M problem.
Weeks 1-2: The Horror of Discovery
Before we could comply, we needed to know exactly what AI systems we were running. This turned out to be far more complex than anyone anticipated.
The AI System Inventory
We formed a “tiger team” of 5 people:
- Me (VP of Engineering)
- AI/ML Tech Lead
- Data Governance Manager
- Information Security Lead
- Legal Counsel (specialized in AI regulation)
Our first task: complete inventory of all AI systems. The results were shocking.
# ai_inventory_discovery.py - Script to find AI usage across infrastructure
import subprocess
import json
from collections import defaultdict
def scan_kubernetes_deployments():
"""Scan K8s for AI model deployments"""
result = subprocess.run(
['kubectl', 'get', 'deployments', '--all-namespaces', '-o', 'json'],
capture_output=True, text=True
)
deployments = json.loads(result.stdout)
ai_deployments = []
for deployment in deployments['items']:
# Look for AI-related labels or annotations
labels = deployment.get('metadata', {}).get('labels', {})
annotations = deployment.get('metadata', {}).get('annotations', {})
if any(keyword in str(labels).lower() or keyword in str(annotations).lower()
for keyword in ['ml', 'model', 'ai', 'inference', 'predict']):
ai_deployments.append({
'name': deployment['metadata']['name'],
'namespace': deployment['metadata']['namespace'],
'replicas': deployment['spec']['replicas'],
'labels': labels
})
return ai_deployments
def scan_api_calls():
"""Check for external AI API usage from logs"""
# Scan CloudWatch logs for OpenAI, Anthropic, etc. API calls
ai_services = ['openai', 'anthropic', 'cohere', 'huggingface']
usage = defaultdict(list)
for service in ai_services:
# Query CloudWatch Insights
logs = query_cloudwatch(f'fields @message | filter @message like /{service}/i')
if logs:
usage[service] = logs
return usage
def scan_sagemaker_endpoints():
"""Find AWS SageMaker model endpoints"""
import boto3
client = boto3.client('sagemaker')
endpoints = client.list_endpoints()['Endpoints']
return [
{
'name': ep['EndpointName'],
'status': ep['EndpointStatus'],
'created': ep['CreationTime']
}
for ep in endpoints
]
# Run discovery
print("🔍 Scanning for AI systems...")
k8s_ai = scan_kubernetes_deployments()
api_usage = scan_api_calls()
sagemaker = scan_sagemaker_endpoints()
print(f"\n📊 Discovery Results:")
print(f" • Kubernetes AI deployments: {len(k8s_ai)}")
print(f" • External AI API services: {len(api_usage)}")
print(f" • SageMaker endpoints: {len(sagemaker)}")
Discovery Results:
- 23 AI models deployed across infrastructure
- 47 different services calling AI APIs
- 8 “shadow AI” projects nobody knew about (marketing team using ChatGPT plugins)
- 12 third-party SaaS tools with embedded AI that counted as AI systems under EU rules
The actual count: 90 AI systems requiring governance. We thought we had 23.
The Budget Reality Check
Initial cost estimation for compliance:
Personnel (90 days):
- 5 tiger team members (50% allocation): $125,000
- External AI governance consultants: $45,000
- Legal counsel (specialized): $35,000
Technology & Tools:
- AI governance platform (ModelOps): $18,000
- Documentation tooling: $8,000
- Audit trail infrastructure: $12,000
Training & Education:
- Team training on EU AI Act: $15,000
- Department-wide AI literacy: $22,000
TOTAL: $280,000
CFO approved $180,000. We had to be efficient.
Weeks 3-4: Risk Classification Framework
The EU AI Act categorizes AI systems into risk levels:
- Unacceptable risk: Banned (e.g., social scoring)
- High risk: Strict requirements (e.g., credit scoring, hiring)
- Limited risk: Transparency obligations
- Minimal risk: No obligations
We needed to classify all 90 systems.
Risk Assessment Methodology
We built a classification framework based on EU guidance:
# risk_classifier.py
from enum import Enum
from typing import List, Dict
class RiskLevel(Enum):
UNACCEPTABLE = "unacceptable"
HIGH = "high"
LIMITED = "limited"
MINIMAL = "minimal"
class AISystemClassifier:
"""Classify AI systems according to EU AI Act"""
HIGH_RISK_DOMAINS = [
'credit_scoring',
'employment_hiring',
'essential_services_access',
'law_enforcement',
'border_control',
'justice_administration',
'democratic_processes'
]
HIGH_RISK_USES = [
'biometric_identification',
'critical_infrastructure',
'education_assessment',
'worker_performance_evaluation'
]
def classify_system(
self,
system_name: str,
domain: str,
purpose: str,
decision_making_role: str,
user_facing: bool,
data_used: List[str]
) -> tuple[RiskLevel, str]:
"""
Classify AI system risk level with justification.
Args:
system_name: Name of AI system
domain: Business domain (credit, hiring, etc.)
purpose: What the system does
decision_making_role: automated, assisted, informational
user_facing: Whether users interact directly
data_used: Types of data processed
Returns:
(risk_level, justification)
"""
# Check for unacceptable uses
if self._is_unacceptable(purpose, data_used):
return (
RiskLevel.UNACCEPTABLE,
"System uses prohibited AI practices"
)
# Check high-risk criteria
if domain in self.HIGH_RISK_DOMAINS:
return (
RiskLevel.HIGH,
f"System operates in high-risk domain: {domain}"
)
        # HIGH_RISK_USES entries use underscores; compare against prose with spaces
        if any(use.replace('_', ' ') in purpose.lower() for use in self.HIGH_RISK_USES):
return (
RiskLevel.HIGH,
"System performs high-risk function"
)
if decision_making_role == 'automated' and self._affects_legal_rights(domain):
return (
RiskLevel.HIGH,
"System makes automated decisions affecting legal rights"
)
# Check limited risk criteria
if user_facing and decision_making_role == 'assisted':
return (
RiskLevel.LIMITED,
"System assists human decision-making with user interaction"
)
# Default to minimal risk
return (
RiskLevel.MINIMAL,
"System does not meet higher risk criteria"
)
def _is_unacceptable(self, purpose: str, data_used: List[str]) -> bool:
"""Check if system uses prohibited practices"""
prohibited = [
'social scoring',
'subliminal manipulation',
'exploitation of vulnerabilities',
'real-time biometric identification' # with exceptions
]
return any(term in purpose.lower() for term in prohibited)
def _affects_legal_rights(self, domain: str) -> bool:
"""Check if domain affects legal rights or access to services"""
legal_impact_domains = [
'credit_scoring',
'employment_hiring',
'insurance_underwriting',
'housing_access',
'education_admission'
]
return domain in legal_impact_domains
# Run classification on all systems
classifier = AISystemClassifier()
systems_to_classify = [
{
'name': 'Credit Risk Model v3',
'domain': 'credit_scoring',
'purpose': 'Automated credit approval decisions',
'decision_making_role': 'automated',
'user_facing': False,
'data_used': ['financial_history', 'demographics']
},
{
'name': 'Customer Service Chatbot',
'domain': 'customer_support',
'purpose': 'Answer customer questions, route to agents',
'decision_making_role': 'assisted',
'user_facing': True,
'data_used': ['conversation_history', 'account_data']
},
{
'name': 'Marketing Personalization Engine',
'domain': 'marketing',
'purpose': 'Recommend products based on behavior',
'decision_making_role': 'informational',
'user_facing': True,
'data_used': ['browsing_history', 'purchase_history']
}
# ... 87 more systems
]
results = {}
for system in systems_to_classify:
risk_level, justification = classifier.classify_system(
system['name'],
system['domain'],
system['purpose'],
system['decision_making_role'],
system['user_facing'],
system['data_used']
)
results[system['name']] = {
'risk_level': risk_level.value,
'justification': justification
}
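# Rolling the per-system results up into the summary below; this isn't our
# exact reporting script, just the shape of it.
from collections import Counter

risk_counts = Counter(entry['risk_level'] for entry in results.values())
for level in ['unacceptable', 'high', 'limited', 'minimal']:
    print(f"{level.capitalize()} risk: {risk_counts.get(level, 0)} systems")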
Classification Results:
- High risk: 17 systems (credit scoring, hiring, fraud detection with automated decisions)
- Limited risk: 28 systems (chatbots, recommendation engines, assisted decision-making)
- Minimal risk: 45 systems (analytics, internal tools, non-user-facing)
- Unacceptable: 0 systems (fortunately)
The 17 high-risk systems became our primary focus.
Weeks 5-7: High-Risk System Documentation
The EU AI Act requires extensive documentation for high-risk systems. Each system needs:
- Technical documentation (architecture, training data, performance metrics)
- Risk assessment (potential harms, mitigation measures)
- Data governance (data sources, quality, bias testing)
- Human oversight mechanisms
- Transparency information (for users)
- Conformity assessment (third-party audit)
Documentation Template
We created a standardized template:
# AI System Documentation Template (EU AI Act Compliance)
## 1. System Overview
**System Name:** [Name]
**Risk Classification:** [High/Limited/Minimal]
**Business Purpose:** [Description]
**Deployment Date:** [Date]
**Current Version:** [Version]
## 2. Technical Specifications
### 2.1 Architecture
- Model Type: [e.g., Random Forest, Neural Network, LLM]
- Model Size: [Parameters/features]
- Infrastructure: [Where deployed]
- Input/Output Schema: [Data formats]
### 2.2 Training Data
- Data Sources: [List all sources]
- Data Volume: [Number of samples]
- Data Collection Period: [Date range]
- Data Quality Assurance: [Validation methods]
- Bias Testing: [Methods and results]
### 2.3 Performance Metrics
- Accuracy: [%]
- Precision: [%]
- Recall: [%]
- F1 Score: [Value]
- False Positive Rate: [%]
- False Negative Rate: [%]
### 2.4 Known Limitations
- [List limitations and edge cases]
## 3. Risk Assessment
### 3.1 Potential Harms
| Risk Category | Severity | Likelihood | Mitigation |
|--------------|----------|------------|------------|
| Discriminatory outcomes | High | Medium | Bias testing, fairness constraints |
| Privacy violations | High | Low | Data minimization, anonymization |
| System failures | Medium | Low | Redundancy, fallbacks |
### 3.2 Mitigation Measures
- [Detailed mitigation implementations]
## 4. Human Oversight
### 4.1 Human Review Process
- **Review Trigger:** [When humans review]
- **Review Frequency:** [How often]
- **Review Authority:** [Who can override]
- **Escalation Path:** [Decision escalation]
### 4.2 Override Mechanisms
- [How humans can override AI decisions]
## 5. Transparency & User Rights
### 5.1 User Notification
- [How users are informed about AI usage]
### 5.2 Explanation Rights
- [How users can request explanations]
### 5.3 Complaint Process
- [How users can contest decisions]
## 6. Monitoring & Maintenance
### 6.1 Performance Monitoring
- Real-time monitoring: [Tools/dashboards]
- Alert thresholds: [Metrics and thresholds]
### 6.2 Model Retraining
- Retraining frequency: [Schedule]
- Retraining triggers: [Drift detection]
### 6.3 Incident Response
- [Incident handling procedures]
## 7. Audit Trail
### 7.1 Logging Requirements
- Decision logs: [What's logged]
- Retention period: [How long]
- Access controls: [Who can access]
### 7.2 Audit Readiness
- Last audit date: [Date]
- Next scheduled audit: [Date]
## 8. Third-Party Components
### 8.1 Dependencies
| Component | Vendor | Purpose | Compliance Status |
|-----------|--------|---------|-------------------|
| [Name] | [Vendor] | [Purpose] | [Verified/Pending] |
## 9. Compliance Sign-Off
- **Technical Lead:** [Name, Date, Signature]
- **Legal Counsel:** [Name, Date, Signature]
- **Risk Officer:** [Name, Date, Signature]
- **Data Protection Officer:** [Name, Date, Signature]
## 10. Version History
| Version | Date | Changes | Approver |
|---------|------|---------|----------|
| 1.0 | [Date] | Initial documentation | [Name] |
The Documentation Crunch
Documenting 17 high-risk systems in 3 weeks was brutal. We divided work:
- Week 5: Credit scoring systems (3 models) - highest regulatory risk
- Week 6: Hiring/HR systems (4 models) - second priority
- Week 7: Fraud detection (10 models) - volume challenge
- Time per system: 16-20 hours average
- Total effort: ~300 hours
- Team size: 5 people working 50% time
Failure #1: Incomplete Training Data Documentation
Our first major failure: we couldn’t fully document training data provenance for 6 of our models.
These models were trained 2-3 years ago. The data scientists who built them had left the company. Training data was stored in S3 buckets with cryptic names. No metadata. No documentation.
The problem:
# What we found in S3
s3://ml-training-data-prod/
├── dataset-v2-20220315.parquet # 450GB, no metadata
├── dataset-v3-20220822.parquet # 380GB, no metadata
├── features-20230104.csv # 12GB, no metadata
└── labels-final.csv # 800MB, no metadata
# What we needed to know:
# - What data sources contributed to each file?
# - Were there demographic fields that could create bias?
# - How was data cleaned and preprocessed?
# - What was the train/test split?
Our solution:
- Reverse engineering: Analyzed model predictions to infer what features were used
- Statistical analysis: Compared dataset distributions to known source systems
- Expert interviews: Tracked down former employees for context
- Conservative documentation: Clearly stated gaps in knowledge
Cost of this failure: 80 additional hours, $12K in consultant fees
Lesson: Implement data lineage tracking from day one. We now use:
- MLflow for experiment tracking
- AWS Glue Data Catalog for data provenance
- Mandatory documentation as part of model approval
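For a sense of what that looks like in practice, here is a minimal sketch of logging training-data provenance with MLflow; the experiment name, tags, and paths are illustrative, and the Glue Data Catalog registration happens in a separate pipeline step that isn't shown.
# provenance_logging.py (sketch) - record data lineage alongside each training run
import mlflow

mlflow.set_experiment("credit-risk-model")

with mlflow.start_run(run_name="train-v4"):
    # Record where the training data came from so the next audit isn't archaeology
    mlflow.set_tag("data.source_systems", "core_banking,bureau_feed")
    mlflow.set_tag("data.collection_period", "2023-01-01/2024-12-31")
    mlflow.log_param("training_data_uri", "s3://ml-training-data-prod/dataset-v4.parquet")
    mlflow.log_param("train_test_split", "80/20, stratified by outcome")
    # Attach preprocessing notes and the bias-testing report as artifacts
    mlflow.log_artifact("docs/preprocessing_steps.md")
    mlflow.log_artifact("reports/bias_testing_v4.html")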
Weeks 8-10: Implementing Human Oversight
High-risk AI systems require “meaningful human oversight.” This meant redesigning our decision flows.
Credit Scoring System Redesign
Before EU AI Act:
Customer Application → AI Model → Automated Decision → Notify Customer
After EU AI Act:
Customer Application → AI Model → Risk Score
↓
[If score borderline OR requested review]
↓
Human Underwriter Review → Final Decision
↓
Notify Customer + Explanation
Implementing Review Thresholds
# credit_decision_flow.py
from enum import Enum
from dataclasses import dataclass
from typing import Optional
class DecisionType(Enum):
AUTO_APPROVE = "auto_approve"
AUTO_DENY = "auto_deny"
HUMAN_REVIEW = "human_review"
@dataclass
class CreditDecision:
applicant_id: str
risk_score: float # 0-1000
decision_type: DecisionType
decision: Optional[str] # approved/denied
confidence: float
explanation: dict
requires_review: bool
review_reason: Optional[str]
def make_credit_decision(
applicant_id: str,
features: dict,
model,
thresholds: dict
) -> CreditDecision:
"""
Make credit decision with EU AI Act compliance.
Implements human oversight requirements:
- Auto-approve only for high-confidence approvals
- Human review for borderline cases
- Explanation generation for all decisions
"""
    # Get model prediction. On this 0-1000 scale, a higher score indicates a
    # stronger applicant, which is why the auto-approve check below uses >=.
    risk_score = model.predict_proba(features)[0][1] * 1000
    # predict_confidence is a model-specific helper, not a scikit-learn built-in
    confidence = model.predict_confidence(features)
# Generate explanation using SHAP
explanation = generate_shap_explanation(model, features)
# Decision logic with human oversight triggers
if risk_score >= thresholds['auto_approve'] and confidence >= 0.85:
# High score, high confidence → auto-approve
return CreditDecision(
applicant_id=applicant_id,
risk_score=risk_score,
decision_type=DecisionType.AUTO_APPROVE,
decision='approved',
confidence=confidence,
explanation=explanation,
requires_review=False,
review_reason=None
)
elif risk_score <= thresholds['auto_deny'] and confidence >= 0.85:
# Low score, high confidence → auto-deny
return CreditDecision(
applicant_id=applicant_id,
risk_score=risk_score,
decision_type=DecisionType.AUTO_DENY,
decision='denied',
confidence=confidence,
explanation=explanation,
requires_review=True, # All denials require review
review_reason='Automated denial requires human verification'
)
else:
# Borderline score OR low confidence → human review
review_reason = []
if thresholds['auto_deny'] < risk_score < thresholds['auto_approve']:
review_reason.append('Borderline risk score')
if confidence < 0.85:
review_reason.append(f'Low confidence ({confidence:.2f})')
return CreditDecision(
applicant_id=applicant_id,
risk_score=risk_score,
decision_type=DecisionType.HUMAN_REVIEW,
decision=None, # Pending human review
confidence=confidence,
explanation=explanation,
requires_review=True,
review_reason='; '.join(review_reason)
)
def generate_shap_explanation(model, features):
"""Generate SHAP-based explanation for decision"""
import shap
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(features)
# Get top 5 features influencing decision
feature_importance = sorted(
zip(features.keys(), shap_values[0]),
key=lambda x: abs(x[1]),
reverse=True
)[:5]
return {
'top_factors': [
{
'feature': feat,
'impact': impact,
'direction': 'increases' if impact > 0 else 'decreases'
}
for feat, impact in feature_importance
]
}
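# Example call site for make_credit_decision. The threshold values here are
# illustrative, not our production numbers, and enqueue_for_human_review /
# notify_customer stand in for the real queueing and notification helpers.
thresholds = {
    'auto_approve': 720,  # at or above this score (and confidence >= 0.85): auto-approve
    'auto_deny': 400      # at or below this score: auto-deny, pending human verification
}

decision = make_credit_decision(
    applicant_id='APP-2025-081342',      # illustrative ID
    features=applicant_features,         # feature dict from the scoring pipeline
    model=credit_model,                  # trained credit model
    thresholds=thresholds
)

if decision.requires_review:
    enqueue_for_human_review(decision)   # placeholder: push to underwriter queue
else:
    notify_customer(decision)            # placeholder: send automated outcome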
Human Review Dashboard
We built a review dashboard for underwriters:
// review_dashboard.tsx
import React, { useState, useEffect } from 'react';
import { Card, Button, Badge, Table } from '@/components/ui';
interface CreditApplication {
id: string;
applicant_name: string;
risk_score: number;
confidence: number;
decision_type: string;
review_reason: string;
submitted_at: string;
ai_recommendation: string;
explanation: {
top_factors: Array<{
feature: string;
impact: number;
direction: string;
}>;
};
}
export function CreditReviewDashboard() {
const [applications, setApplications] = useState<CreditApplication[]>([]);
const [selectedApp, setSelectedApp] = useState<CreditApplication | null>(null);
useEffect(() => {
// Fetch applications pending review
fetchPendingReviews();
}, []);
const handleApprove = async (appId: string, notes: string) => {
await fetch(`/api/credit/review/${appId}/approve`, {
method: 'POST',
body: JSON.stringify({
reviewer_notes: notes,
override: selectedApp?.ai_recommendation !== 'approved'
})
});
// Refresh list
fetchPendingReviews();
};
const handleDeny = async (appId: string, reason: string) => {
await fetch(`/api/credit/review/${appId}/deny`, {
method: 'POST',
body: JSON.stringify({
denial_reason: reason,
override: selectedApp?.ai_recommendation !== 'denied'
})
});
fetchPendingReviews();
};
return (
<div className="p-6">
<h1 className="text-2xl font-bold mb-4">
Credit Applications Pending Review
</h1>
<div className="grid grid-cols-2 gap-4">
{/* Applications list */}
<Card>
<Table>
<thead>
<tr>
<th>Applicant</th>
<th>Risk Score</th>
<th>AI Recommendation</th>
<th>Review Reason</th>
<th>Actions</th>
</tr>
</thead>
<tbody>
{applications.map(app => (
<tr key={app.id} onClick={() => setSelectedApp(app)}>
<td>{app.applicant_name}</td>
<td>{app.risk_score}</td>
<td>
<Badge variant={
app.ai_recommendation === 'approved' ? 'success' : 'danger'
}>
{app.ai_recommendation}
</Badge>
</td>
<td>{app.review_reason}</td>
<td>
<Button size="sm">Review</Button>
</td>
</tr>
))}
</tbody>
</Table>
</Card>
{/* Application details */}
{selectedApp && (
<Card>
<h2 className="text-xl font-semibold mb-4">
Application Details
</h2>
<div className="space-y-4">
<div>
<h3 className="font-medium">AI Analysis</h3>
<p>Risk Score: {selectedApp.risk_score}</p>
<p>Confidence: {(selectedApp.confidence * 100).toFixed(1)}%</p>
<p>Recommendation: {selectedApp.ai_recommendation}</p>
</div>
<div>
<h3 className="font-medium">Key Factors</h3>
<ul>
{selectedApp.explanation.top_factors.map((factor, idx) => (
<li key={idx}>
<strong>{factor.feature}</strong>: {factor.direction} risk
(impact: {factor.impact.toFixed(2)})
</li>
))}
</ul>
</div>
<div className="flex gap-2">
<Button variant="success" onClick={() => handleApprove(selectedApp.id, '')}>
Approve
</Button>
<Button variant="danger" onClick={() => handleDeny(selectedApp.id, '')}>
Deny
</Button>
</div>
</div>
</Card>
)}
</div>
</div>
);
}
Results:
- 23% of applications now go to human review (was 5% before)
- Average review time: 8 minutes per application
- Override rate: 12% (humans disagree with AI 12% of the time)
- Cost: 2 additional underwriters ($140K/year)
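One implementation detail worth noting: the dashboard's approve/deny buttons post to a small review endpoint that records the underwriter's decision and whether it overrides the AI recommendation. Our actual service isn't shown here, but a minimal sketch of such a handler (FastAPI, with a placeholder persistence call) looks roughly like this:
# review_api.py (sketch) - records human review outcomes behind the dashboard
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ReviewPayload(BaseModel):
    reviewer_notes: str = ""
    denial_reason: str = ""
    override: bool = False

def store_review_outcome(application_id: str, final_decision: str,
                         payload: ReviewPayload) -> None:
    """Placeholder: in production this updates the application record and
    writes an audit-log entry for the human decision."""
    print(f"{application_id}: {final_decision} (override={payload.override})")

@app.post("/api/credit/review/{application_id}/approve")
def approve_application(application_id: str, payload: ReviewPayload):
    store_review_outcome(application_id, "approved", payload)
    return {"status": "ok", "application_id": application_id}

@app.post("/api/credit/review/{application_id}/deny")
def deny_application(application_id: str, payload: ReviewPayload):
    store_review_outcome(application_id, "denied", payload)
    return {"status": "ok", "application_id": application_id}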
Failure #2: Underwriter Pushback
Underwriters initially hated the new system. They felt:
- AI was making them rubber-stamps
- Review workload increased 4x
- They weren’t trusted to make decisions
Solution: Changed the framing entirely.
Before: “Review AI decisions”
After: “AI assists YOUR decisions”
We repositioned the UI to emphasize human authority:
- Underwriters see AI recommendation as one input among many
- They have explicit override authority with no explanation needed
- Dashboard shows their override rate as a feature, not a bug
The cultural shift took 3 weeks, but satisfaction scores improved from 2.1/5 to 4.2/5.
Weeks 11-12: Audit Trail & Monitoring
The EU AI Act requires comprehensive logging of all high-risk AI decisions for audit purposes.
Logging Infrastructure
# ai_audit_logger.py
import json
import hashlib
from datetime import datetime
from typing import Any, Dict, Optional
import boto3
class AIAuditLogger:
"""
Comprehensive audit logging for AI systems.
Logs all decisions with tamper-proof hashing for compliance.
"""
def __init__(self, s3_bucket: str, dynamodb_table: str):
self.s3 = boto3.client('s3')
self.dynamodb = boto3.resource('dynamodb')
self.table = self.dynamodb.Table(dynamodb_table)
self.s3_bucket = s3_bucket
def log_decision(
self,
system_id: str,
decision_id: str,
input_data: Dict[str, Any],
model_output: Dict[str, Any],
final_decision: str,
human_review: bool,
        reviewer_id: Optional[str] = None,
        override: bool = False,
        metadata: Optional[Dict[str, Any]] = None
) -> str:
"""
Log AI decision with all required context.
Returns:
audit_id: Unique identifier for this log entry
"""
        timestamp = datetime.utcnow()
        metadata = metadata or {}  # guard: metadata.get() below would fail on None
audit_id = f"{system_id}-{decision_id}-{int(timestamp.timestamp())}"
# Create audit log entry
log_entry = {
'audit_id': audit_id,
'system_id': system_id,
'decision_id': decision_id,
'timestamp': timestamp.isoformat(),
'input_data_hash': self._hash_data(input_data),
'model_version': metadata.get('model_version'),
'model_output': model_output,
'final_decision': final_decision,
'human_review': human_review,
'reviewer_id': reviewer_id,
'override': override,
'metadata': metadata
}
# Store detailed data in S3 (for long-term retention)
s3_key = f"audit-logs/{system_id}/{timestamp.year}/{timestamp.month:02d}/{audit_id}.json"
self.s3.put_object(
Bucket=self.s3_bucket,
Key=s3_key,
Body=json.dumps({
**log_entry,
'input_data': input_data, # Full input data in S3
}),
ServerSideEncryption='AES256'
)
# Store index in DynamoDB (for fast queries)
self.table.put_item(Item={
'audit_id': audit_id,
'system_id': system_id,
'timestamp': timestamp.isoformat(),
'decision_id': decision_id,
'final_decision': final_decision,
'human_review': human_review,
'override': override,
's3_location': f"s3://{self.s3_bucket}/{s3_key}",
'ttl': int(timestamp.timestamp()) + (7 * 365 * 24 * 60 * 60) # 7 years
})
return audit_id
def _hash_data(self, data: Dict[str, Any]) -> str:
"""Create tamper-proof hash of input data"""
data_str = json.dumps(data, sort_keys=True)
return hashlib.sha256(data_str.encode()).hexdigest()
    def query_logs(
        self,
        system_id: str = None,
        start_date: datetime = None,
        end_date: datetime = None,
        decision_type: str = None
    ) -> list:
        """Query the DynamoDB index for compliance reporting.
        Simplified: a filtered scan is fine at our volume; a GSI keyed on
        system_id + timestamp would fit better at larger scale, and a
        production version should paginate via LastEvaluatedKey."""
        from boto3.dynamodb.conditions import Attr
        conditions = []
        if system_id:
            conditions.append(Attr('system_id').eq(system_id))
        if start_date:
            conditions.append(Attr('timestamp').gte(start_date.isoformat()))
        if end_date:
            conditions.append(Attr('timestamp').lte(end_date.isoformat()))
        if decision_type:
            conditions.append(Attr('final_decision').eq(decision_type))
        scan_kwargs = {}
        if conditions:
            combined = conditions[0]
            for cond in conditions[1:]:
                combined = combined & cond
            scan_kwargs['FilterExpression'] = combined
        response = self.table.scan(**scan_kwargs)
        return response.get('Items', [])
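# Wiring the logger into the credit flow looks roughly like this; the bucket
# and table names are placeholders, and `decision` / `applicant_features`
# refer to the earlier credit-decision example.
audit_logger = AIAuditLogger(
    s3_bucket='ai-audit-logs-prod',
    dynamodb_table='ai_audit_log_index'
)

audit_id = audit_logger.log_decision(
    system_id='credit-risk-model-v3',
    decision_id=decision.applicant_id,
    input_data=applicant_features,
    model_output={'risk_score': decision.risk_score, 'confidence': decision.confidence},
    final_decision=decision.decision or 'pending_review',
    human_review=decision.requires_review,
    metadata={'model_version': 'v3.2'}   # illustrative version tag
)
print(f"Audit entry written: {audit_id}")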
Storage costs:
- DynamoDB index: $180/month
- S3 long-term storage: $450/month
- Total: $630/month ongoing
Retention: 7 years (EU requirement)
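To make that retention period hard to tamper with, the audit bucket can also enforce a default retention rule via S3 Object Lock; a sketch of the call is below (the bucket name is a placeholder, Object Lock has to be enabled when the bucket is created, and COMPLIANCE mode cannot be shortened or removed afterwards).
# Enforce a 7-year default retention on the audit bucket (sketch)
import boto3

s3 = boto3.client('s3')
s3.put_object_lock_configuration(
    Bucket='ai-audit-logs-prod',  # placeholder name
    ObjectLockConfiguration={
        'ObjectLockEnabled': 'Enabled',
        'Rule': {
            'DefaultRetention': {
                'Mode': 'COMPLIANCE',  # immutable even for admins until expiry
                'Years': 7
            }
        }
    }
)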
Real-Time Monitoring Dashboard
We built Grafana dashboards tracking:
-- AI System Health Metrics
-- Decision volume
SELECT
system_id,
COUNT(*) as total_decisions,
SUM(CASE WHEN human_review THEN 1 ELSE 0 END) as human_reviews,
SUM(CASE WHEN override THEN 1 ELSE 0 END) as overrides,
    AVG(confidence) as avg_confidence  -- AVG already ignores NULLs
FROM ai_audit_logs
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY system_id;
-- Drift detection
SELECT
system_id,
DATE_TRUNC('hour', timestamp) as hour,
AVG(CASE WHEN final_decision = 'approved' THEN 1 ELSE 0 END) as approval_rate,
COUNT(*) as volume
FROM ai_audit_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY system_id, DATE_TRUNC('hour', timestamp)
ORDER BY hour DESC;
-- Alert on anomalies: compare the current hour's volume to each system's
-- 7-day hourly baseline and flag systems running at more than 2x baseline
WITH hourly_counts AS (
    SELECT
        system_id,
        DATE_TRUNC('hour', timestamp) as hour,
        COUNT(*) as decisions
    FROM ai_audit_logs
    WHERE timestamp > NOW() - INTERVAL '7 days'
    GROUP BY system_id, DATE_TRUNC('hour', timestamp)
),
baseline AS (
    SELECT system_id, AVG(decisions) as baseline_avg
    FROM hourly_counts
    WHERE hour < DATE_TRUNC('hour', NOW())
    GROUP BY system_id
)
SELECT
    h.system_id,
    h.decisions as decisions_last_hour,
    b.baseline_avg
FROM hourly_counts h
JOIN baseline b ON b.system_id = h.system_id
WHERE h.hour = DATE_TRUNC('hour', NOW())
  AND h.decisions > 2 * b.baseline_avg;
Week 13: Third-Party Vendor Compliance
One of our biggest challenges: third-party AI vendors.
We used 12 SaaS tools with embedded AI:
- Salesforce Einstein (sales predictions)
- HubSpot AI (marketing automation)
- Zendesk Answer Bot (customer support)
- Workday ML (HR analytics)
- … 8 others
The problem: Under the EU AI Act, we’re liable for their AI systems if they’re part of our high-risk applications.
Vendor Compliance Questionnaire
We sent this to all vendors:
# AI Vendor Compliance Questionnaire (EU AI Act)
## Section 1: AI System Classification
1. Does your product use AI or machine learning? [Yes/No]
2. If yes, describe the AI functionality:
3. What decisions does the AI system make?
4. Is the AI system's output:
- Fully automated decisions
- Recommendations to humans
- Information only
## Section 2: Risk Assessment
1. Could your AI system affect:
- Access to essential services (credit, insurance, etc.)
- Employment or worker management
- Education or training
- Law enforcement or justice
[Yes/No for each]
2. Does your AI system process:
- Biometric data
- Special category data (race, health, etc.)
- Data about minors
[Yes/No for each]
## Section 3: Documentation
1. Can you provide technical documentation including:
- Training data sources and characteristics
- Model architecture and validation methodology
- Performance metrics and limitations
[Yes/No, if yes, please attach]
2. Do you maintain audit trails of AI decisions? [Yes/No]
3. Retention period: [Duration]
## Section 4: Human Oversight
1. Can AI decisions be overridden by humans? [Yes/No]
2. Do you provide explanations for AI outputs? [Yes/No]
3. What level of human control is available:
- Full automation (no human involvement)
- Human-in-the-loop (human approves each decision)
- Human-on-the-loop (human monitors and can intervene)
- Human-in-command (AI only provides recommendations)
## Section 5: Compliance & Certification
1. Are you EU AI Act compliant? [Yes/No/In Progress]
2. Do you have relevant certifications (ISO, SOC2, etc.)? [List]
3. Who is legally responsible for AI system compliance? [You/Us/Shared]
## Section 6: Contractual Terms
1. Can you provide contractual clauses addressing:
- Liability for AI errors
- Compliance with EU AI Act
- Data processing agreements
- Audit rights
[Yes/No, please attach draft]
Results:
- 4 vendors: Full compliance documentation provided
- 5 vendors: Partial compliance, working on gaps
- 3 vendors: Could not provide adequate documentation
Failure #3: Vendor Standoff
Three critical vendors (representing 18% of our AI functionality) couldn’t demonstrate compliance.
Our options:
- Accept risk (potentially millions in fines)
- Replace vendors (6-9 months, huge disruption)
- Build in-house alternatives (expensive, time-consuming)
- Negotiate shared liability
We chose option 4: Negotiated custom contracts with shared liability clauses and vendor commitments to achieve compliance by Q2 2026.
Cost: $35K in additional legal fees, $15K/year more per vendor
Week 14: The Final Push
The last two weeks were devoted to documentation completion, internal audits, and final sign-offs.
Compliance Checklist
# EU AI Act Compliance Checklist
## High-Risk Systems (17 systems)
### Credit Scoring Systems (3 systems)
- [x] Risk assessment completed
- [x] Technical documentation
- [x] Training data provenance documented
- [x] Human oversight implemented
- [x] Audit logging enabled
- [x] User rights procedures established
- [x] Third-party components verified
- [x] Legal review completed
### Hiring/HR Systems (4 systems)
- [x] Risk assessment completed
- [x] Technical documentation
- [x] Bias testing completed
- [x] Human oversight implemented
- [x] Audit logging enabled
- [x] Candidate rights procedures
- [x] Third-party components verified
- [x] Legal review completed
### Fraud Detection (10 systems)
- [x] Risk assessment completed
- [x] Technical documentation
- [x] False positive monitoring
- [x] Human review for account actions
- [x] Audit logging enabled
- [x] Customer appeal process
- [x] Third-party components verified
- [x] Legal review completed
## Limited Risk Systems (28 systems)
- [x] Transparency notices implemented
- [x] User disclosure requirements met
- [x] Documentation completed
## Organizational Requirements
- [x] AI governance policy approved
- [x] Designated AI compliance officer
- [x] Staff training completed (450 employees)
- [x] Incident response procedures
- [x] Regular audit schedule established
- [x] Vendor management processes
- [x] Board-level reporting
## External Requirements
- [ ] Third-party conformity assessment (scheduled Q4)
- [x] Data Protection Impact Assessments
- [x] Fundamental rights impact assessments
The Results
September 30, 2025: We achieved compliance certification.
Final Costs
Actual Spend:
- Personnel (internal): $125,000
- Consultants: $42,000
- Legal: $38,000
- Technology/tools: $22,000
- Training: $18,000
- Vendor negotiations: $35,000
Total: $280,000 (we went over budget by $100K)
CFO wasn’t happy about the overrun, but compared to $240M in European revenue at risk, it was justified.
Ongoing Costs
Monthly:
- Audit logging infrastructure: $630
- Compliance monitoring tools: $1,200
- Additional underwriters: $11,700 ($140K/year ÷ 12)
- Legal counsel (retainer): $3,000
- External audits: $2,500 (annual cost amortized)
Total: ~$19,000/month = $228,000/year
Business Impact
Positive:
- Maintained European operations ($240M revenue protected)
- Improved AI system quality (forced documentation revealed 8 model bugs)
- Better human oversight (12% of reviewed AI decisions now get corrected)
- Competitive advantage (compliance before competitors)
Negative:
- 23% of credit applications now take longer (human review)
- Increased operating costs ($228K/year)
- Reduced automation rate from 95% to 77%
Lessons for Your Compliance Journey
1. Start the Inventory Immediately
You have more AI systems than you think. Shadow AI is everywhere. Budget 2-3 weeks just for discovery.
2. Prioritize by Risk and Revenue
Focus on high-risk systems first, then work down. We tackled credit scoring before marketing AI because regulatory risk was higher.
3. Data Lineage is Critical
If you can’t document where training data came from, you’re in trouble. Implement data lineage tracking NOW, before you need it.
4. Human Oversight Costs Real Money
We added 2 FTEs for human review. Factor this into your ROI calculations. Not all AI automation will remain automated.
5. Vendors Are Your Biggest Unknown
Third-party AI creates compliance risk you can’t fully control. Get vendor commitments in writing.
6. Culture Change Takes Time
Underwriters, data scientists, and engineers all need to adapt. Budget time for change management, not just technical implementation.
7. Over-Document
When in doubt, document more. Regulators prefer over-documentation to gaps.
The Road Ahead
We achieved compliance, but this isn’t a one-time project. Ongoing requirements include:
Quarterly:
- Model performance reviews
- Bias testing updates
- Documentation reviews
Annual:
- Third-party conformity assessments ($45K)
- Full system audits
- Training refreshers
Continuous:
- Audit log monitoring
- Drift detection
- Incident response readiness
Final Thoughts
AI governance isn’t optional anymore. The EU AI Act is just the beginning—similar regulations are coming to California (AB 2013), Canada, and other jurisdictions.
The teams that invested in governance early gained:
- Competitive advantage (compliance before competitors)
- Better AI systems (documentation revealed bugs)
- Reduced regulatory risk
- Trust with customers and regulators
The cost is real ($280K upfront, $228K/year ongoing), but compared to the alternative (shutting down AI systems or facing fines), it’s a bargain.
For more on AI governance strategies, read the comprehensive Q4 2025 AI transformation guide.
My advice: Don’t wait until 90 days before the deadline. Start now.
Building AI governance infrastructure? Connect with me on LinkedIn to share experiences and lessons learned.