AI-powered endpoints are expensive to run and attractive targets for abuse. Rate limiting is essential to protect your infrastructure and budget.
Why Rate Limit AI Endpoints?
- Prevent cost overruns from API abuse
- Protect against denial-of-service attacks
- Ensure fair usage among users
- Maintain service availability
In-Memory Rate Limiting
Simple rate limiting for single-server deployments:
```typescript
// lib/rate-limit.ts
// Fixed-window counter keyed by caller. Single-server only: the Map lives in
// one process's memory, so it won't coordinate limits across instances.
const rateLimitMap = new Map<string, { count: number; resetAt: number }>();

export function checkRateLimit(
  key: string,
  limit: number,
  windowMs: number
): { allowed: boolean; remaining: number } {
  const now = Date.now();
  const record = rateLimitMap.get(key);

  // First request for this key, or the previous window has expired.
  if (!record || record.resetAt < now) {
    rateLimitMap.set(key, { count: 1, resetAt: now + windowMs });
    return { allowed: true, remaining: limit - 1 };
  }

  if (record.count >= limit) {
    return { allowed: false, remaining: 0 };
  }

  record.count++;
  return { allowed: true, remaining: limit - record.count };
}
```

Using in API Routes
```typescript
// app/api/ai/generate/route.ts
import { auth } from "@/auth";
import { checkRateLimit } from "@/lib/rate-limit";

export async function POST(request: Request) {
  const session = await auth();
  if (!session?.user?.id) {
    return Response.json({ error: "Unauthorized" }, { status: 401 });
  }

  // Rate limit: 10 requests per minute per user
  const { allowed, remaining } = checkRateLimit(
    `ai:${session.user.id}`,
    10,
    60 * 1000
  );

  if (!allowed) {
    return Response.json(
      { error: "Rate limit exceeded" },
      {
        status: 429,
        headers: { "X-RateLimit-Remaining": "0" },
      }
    );
  }

  // Process AI request...
}
```

Production Considerations
- Use Redis or similar for distributed rate limiting
- Implement sliding window algorithms for smoother limits
- Add different tiers for free vs. paid users
- Include Retry-After headers in 429 responses so well-behaved clients know when to retry
What VibeCheck Detects
- VC-ABUSE-001: AI endpoints without rate limiting
- VC-ABUSE-002: Missing cost controls on expensive operations