Hooking BigShield Into Your LLM Pipeline: Save Tokens, Block Fraud
Step-by-step tutorial for integrating BigShield email validation into LLM and AI application pipelines. Includes Express middleware, async batch processing, and cost savings calculations.
Why LLM Pipelines Need Email Validation
If you are running an AI product with a free tier (or even a generous paid tier), you have a fraud problem. We have seen it across hundreds of LLM platforms: fraudsters sign up with fake emails, burn through free credits, and move on. Rinse and repeat with another fake email.
The math is brutal. A single GPT-4-class API call costs $0.01-0.06 depending on context length. A fraudster who creates 100 accounts and burns the free tier on each might consume $500-2,000 in compute before you notice. Scale that to a fraud ring operating thousands of accounts, and you are looking at tens of thousands of dollars per month in wasted tokens.
The solution is simple in concept: validate the email before you spend a single token. In practice, the integration needs to be fast (you do not want to add seconds to your signup flow), reliable (downtime means blocked signups), and smart (you need to catch fraud without blocking real users). This tutorial shows you how to wire BigShield into your LLM pipeline at every level.
Architecture Overview
There are three main integration points for email validation in an LLM pipeline:
- Signup gate: Validate the email when the user creates an account, before they get any API keys or credits
- Request middleware: Re-validate on each API request (using cached results) to catch accounts that were initially clean but later flagged
- Batch processing: Async validation for bulk user imports, waitlist processing, or periodic re-evaluation of your user base
Let's implement each one. If you have not set up BigShield yet, our zero-to-hero implementation guide covers the basics.
Step 1: Signup Gate Validation
This is the most important integration point. You want to validate the email before provisioning any resources. Here is a complete Express route handler:
    import express from 'express';

    const app = express();
    app.use(express.json());

    const BIGSHIELD_API_KEY = process.env.BIGSHIELD_API_KEY;
    const BIGSHIELD_URL = 'https://bigshield.app/api/v1/validate';

    interface BigShieldResponse {
      email: string;
      score: number; // 0-100, higher = more trustworthy
      verdict: 'pass' | 'warn' | 'fail';
      signals: Array<{
        name: string;
        score_impact: number;
        confidence: number;
        details: string;
      }>;
      cached: boolean;
      latency_ms: number;
    }

    async function validateEmail(email: string): Promise<BigShieldResponse> {
      const response = await fetch(BIGSHIELD_URL, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${BIGSHIELD_API_KEY}`,
        },
        body: JSON.stringify({ email }),
      });

      if (!response.ok) {
        throw new Error(`BigShield API error: ${response.status}`);
      }

      return response.json() as Promise<BigShieldResponse>;
    }
    // createUser, storeValidationResult, provisionApiKey, and
    // flagForManualReview are your application's own helpers.
    app.post('/api/signup', async (req, res) => {
      const { email, password, name } = req.body;

      // Validate email BEFORE creating the account
      let validation: BigShieldResponse | null = null;
      try {
        validation = await validateEmail(email);
      } catch (error) {
        // If BigShield is unreachable, fail open but flag for review below
        console.error('BigShield validation failed:', error);
      }

      if (validation?.verdict === 'fail') {
        // Score below 30: almost certainly fraudulent
        return res.status(422).json({
          error: 'This email address cannot be used for signup.',
          // Don't reveal the specific reason to avoid helping fraudsters
        });
      }

      if (validation?.verdict === 'warn') {
        // Score between 30-85: suspicious but not definitive
        // Require additional verification
        return res.status(200).json({
          requiresVerification: true,
          message: 'Please verify your email to continue.',
        });
      }

      // Verdict is 'pass' (score above 85) or validation was unavailable.
      // Account creation sits outside the validation try/catch so that a
      // database error is never mistaken for a BigShield outage.
      try {
        const user = await createUser({ email, password, name });

        if (validation) {
          // Store the BigShield score for future reference
          await storeValidationResult(user.id, validation);
        } else {
          await flagForManualReview(user.id, 'validation_unavailable');
        }

        // Provision API keys and free-tier credits
        const apiKey = await provisionApiKey(user.id);

        return res.status(201).json({
          user: { id: user.id, email },
          apiKey,
        });
      } catch (error) {
        console.error('Signup failed:', error);
        return res.status(500).json({ error: 'Signup failed. Please try again.' });
      }
    });

A few important design decisions here:
- Fail open: If BigShield is unreachable, we still create the account but flag it for review. This prevents an outage from blocking all signups.
- Vague error messages: We never tell the user why their email was rejected. Specific error messages help fraudsters tune their approach.
- Three-tier response: Pass, warn, and fail create different user experiences. Warned users get a chance to verify rather than being blocked outright.
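The three decisions above can also be kept in one small function that is easy to test apart from the Express handler. What follows is a minimal sketch under stated assumptions, not part of any BigShield SDK: the names `SignupAction` and `decideSignupAction` are hypothetical, and `null` is used here to mean "BigShield could not be reached".

```typescript
// Hypothetical helper: maps a validation outcome to a signup action.
// A null verdict stands for "validation unavailable".
type Verdict = 'pass' | 'warn' | 'fail';

type SignupAction =
  | { kind: 'reject'; status: number; message: string }
  | { kind: 'verify'; status: number; message: string }
  | { kind: 'create'; status: number; flagForReview: boolean };

function decideSignupAction(verdict: Verdict | null): SignupAction {
  if (verdict === 'fail') {
    // Vague on purpose: no hint about which signal tripped.
    return {
      kind: 'reject',
      status: 422,
      message: 'This email address cannot be used for signup.',
    };
  }
  if (verdict === 'warn') {
    // Suspicious but not definitive: ask for extra verification.
    return {
      kind: 'verify',
      status: 200,
      message: 'Please verify your email to continue.',
    };
  }
  // 'pass' creates the account; null fails open and flags it for review.
  return { kind: 'create', status: 201, flagForReview: verdict === null };
}
```

Keeping the policy in a pure function like this means a threshold or wording change can be covered by a unit test without spinning up the HTTP server.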
Step 2: Express Middleware for Per-Request Validation
Once a user has an API key, you want to continue monitoring. An account that was clean at signup might get flagged later (for example, if the email domain starts being used for spam). Here is middleware that checks a cached validation score on every request:
    import { Redis } from '@upstash/redis';

    const redis = new Redis({
      url: process.env.UPSTASH_REDIS_URL!,
      token: process.env.UPSTASH_REDIS_TOKEN!,
    });

    interface CachedValidation {
      score: number;
      verdict: string;
      validatedAt: string;
    }
    // Middleware: check email validation on every LLM request
    function bigshieldMiddleware(options: {
      blockThreshold?: number;
      cacheHours?: number;
      revalidateHours?: number;
    } = {}) {
      const {
        blockThreshold = 30,
        revalidateHours = 168, // Re-validate weekly
        // Must exceed revalidateHours: an entry that expires first is
        // never old enough to trigger the background refresh.
        cacheHours = 336,
      } = options;
      const ttlSeconds = cacheHours * 3600;

      return async (
        req: express.Request,
        res: express.Response,
        next: express.NextFunction
      ) => {
        const apiKey = req.headers['x-api-key'] as string;
        if (!apiKey) {
          return res.status(401).json({ error: 'Missing API key' });
        }

        const user = await getUserByApiKey(apiKey);
        if (!user) {
          return res.status(401).json({ error: 'Invalid API key' });
        }

        // Check cached validation result
        const cacheKey = `bigshield:validation:${user.email}`;
        let cached = await redis.get<CachedValidation>(cacheKey);

        if (!cached) {
          // No cache entry: validate now (the only path that waits on BigShield)
          try {
            const result = await validateEmail(user.email);
            cached = {
              score: result.score,
              verdict: result.verdict,
              validatedAt: new Date().toISOString(),
            };
            await redis.set(cacheKey, cached, { ex: ttlSeconds });
          } catch {
            // If validation fails, allow the request but log it
            console.warn(`BigShield validation failed for ${user.email}`);
            return next();
          }
        }

        // Check if revalidation is needed
        const validatedAt = new Date(cached.validatedAt);
        const hoursSinceValidation =
          (Date.now() - validatedAt.getTime()) / (1000 * 60 * 60);

        if (hoursSinceValidation > revalidateHours) {
          // Trigger async revalidation (don't block the request)
          revalidateAsync(user.email, cacheKey, ttlSeconds).catch(console.error);
        }

        // Block if score is too low
        if (cached.score < blockThreshold) {
          return res.status(403).json({
            error: 'Account suspended. Contact support.',
          });
        }

        // Attach user and score to the request for downstream use
        (req as any).user = user;
        (req as any).emailScore = cached.score;
        next();
      };
    }

    async function revalidateAsync(
      email: string,
      cacheKey: string,
      ttlSeconds: number
    ) {
      const result = await validateEmail(email);
      const cached: CachedValidation = {
        score: result.score,
        verdict: result.verdict,
        validatedAt: new Date().toISOString(),
      };
      await redis.set(cacheKey, cached, { ex: ttlSeconds });

      // Alert if a previously good account is now flagged
      if (result.score < 30) {
        await alertTeam(`Account ${email} dropped to score ${result.score}`);
      }
    }

    // Apply the middleware to your LLM endpoints
    app.use('/api/v1/completions', bigshieldMiddleware({ blockThreshold: 25 }));
    app.use('/api/v1/embeddings', bigshieldMiddleware({ blockThreshold: 25 }));

This middleware adds near-zero latency for cached results (a single Redis lookup). Entries older than the revalidation window are refreshed in the background, so a request waits on BigShield only when no cached result exists at all: the first request, or the first one after a long idle period.
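The caching behavior comes down to three cases, which can be written as a pure function for testing. This is a minimal sketch with hypothetical names (`CacheAction`, `cacheAction`), not BigShield or Upstash API. One thing it makes visible: the background-refresh branch is only reachable if cache entries live longer than `revalidateHours`, because an entry that has already expired arrives here as `null`.

```typescript
// Hypothetical helper: decides what the middleware should do with a cache entry.
type CacheAction = 'validate-now' | 'use-cached' | 'use-cached-and-refresh';

interface CacheEntry {
  score: number;
  validatedAt: string; // ISO 8601 timestamp
}

function cacheAction(
  entry: CacheEntry | null,
  nowMs: number,
  revalidateHours: number
): CacheAction {
  // No entry (never validated, or the cache TTL expired): validate inline.
  if (entry === null) return 'validate-now';

  const ageHours =
    (nowMs - new Date(entry.validatedAt).getTime()) / (1000 * 60 * 60);

  // Old entry: serve it immediately and refresh it in the background.
  return ageHours > revalidateHours ? 'use-cached-and-refresh' : 'use-cached';
}
```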
Step 3: Async Batch Processing
Sometimes you need to validate emails in bulk. Maybe you are importing users from another platform, processing a waitlist, or doing a periodic audit of your user base. Here is an efficient batch processing implementation using a simple queue:
    interface BatchJob {
      emails: string[];
      onComplete: (results: Map<string, BigShieldResponse>) => void;
    }

    async function validateBatch(
      emails: string[],
      options: {
        concurrency?: number;
        delayMs?: number;
        onProgress?: (completed: number, total: number) => void;
      } = {}
    ): Promise<Map<string, BigShieldResponse>> {
      const {
        concurrency = 10,
        delayMs = 50,
        onProgress,
      } = options;

      const results = new Map<string, BigShieldResponse>();
      const queue = [...emails];
      let completed = 0;

      async function processOne(): Promise<void> {
        while (queue.length > 0) {
          const email = queue.shift();
          if (!email) break;

          try {
            const result = await validateEmail(email);
            results.set(email, result);
          } catch (error) {
            console.error(`Failed to validate ${email}:`, error);
            // Retry once after a delay
            await new Promise(r => setTimeout(r, 1000));
            try {
              const result = await validateEmail(email);
              results.set(email, result);
            } catch {
              // Store a failure result
              results.set(email, {
                email,
                score: -1,
                verdict: 'fail' as const,
                signals: [],
                cached: false,
                latency_ms: 0,
              });
            }
          }

          completed++;
          onProgress?.(completed, emails.length);

          // Rate limit to stay within BigShield API limits
          if (delayMs > 0) {
            await new Promise(r => setTimeout(r, delayMs));
          }
        }
      }

      // Run workers in parallel
      const workers = Array.from(
        { length: Math.min(concurrency, emails.length) },
        () => processOne()
      );
      await Promise.all(workers);

      return results;
    }
    // Example: Audit all users who signed up in the last 30 days
    async function auditRecentSignups() {
      // Double quotes outside so the SQL interval literal can use single quotes
      const recentUsers = await db.query(
        "SELECT email FROM users WHERE created_at > NOW() - INTERVAL '30 days'"
      );

      const emails = recentUsers.rows.map((r: { email: string }) => r.email);
      console.log(`Auditing ${emails.length} recent signups...`);

      const results = await validateBatch(emails, {
        concurrency: 5,
        delayMs: 100,
        onProgress: (done, total) => {
          if (done % 100 === 0) {
            console.log(`Progress: ${done}/${total}`);
          }
        },
      });

      // Flag accounts that score poorly (a score of -1 means validation failed)
      let flagged = 0;
      for (const [email, result] of results) {
        if (result.score >= 0 && result.score < 30) {
          await flagForReview(email, result);
          flagged++;
        }
      }

      console.log(`Audit complete. Flagged ${flagged} accounts for review.`);
    }

Cost Savings Calculations
Let's do the math on what this integration actually saves you. These numbers are based on real data from our case study on token waste savings.
Scenario: Mid-size AI startup, 10,000 signups/month
Without BigShield:
- 14% fraudulent signups = 1,400 fake accounts per month
- Average free-tier usage per fraudulent account: $18 in tokens
- Monthly token waste: $25,200
With BigShield:
- 10,000 validations at $0.005/each = $50/month (or free on the free tier for under 1,000/month)
- Catch rate: ~92% of fraudulent signups blocked
- Remaining fraud: 112 accounts x $18 = $2,016
- Monthly token waste: $2,016
- Monthly savings: $23,184 in avoided token waste, or $23,134 net of the $50 validation cost
- ROI: roughly 463x
Scenario: Larger platform, 100,000 signups/month
Without BigShield:
- 14% fraud = 14,000 fake accounts
- Monthly token waste: $252,000
With BigShield:
- 100,000 validations at $0.003/each (volume pricing) = $300/month
- Remaining fraud: 1,120 accounts x $18 = $20,160
- Monthly savings: $231,840 in avoided token waste, or $231,540 net of the $300 validation cost
Even conservative estimates show 100x+ ROI. The validation cost is negligible compared to the compute costs of serving fraudulent accounts.
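To rerun this arithmetic with your own numbers, the calculation can be sketched as a small function. The inputs below are the scenario figures from this section; the function and field names are illustrative, and the larger scenario assumes the same 92% catch rate as the first (which its 1,120 remaining accounts imply). Gross savings are the avoided token waste, and net savings subtract the validation cost.

```typescript
interface Scenario {
  signupsPerMonth: number;
  fraudRate: number;        // fraction of signups that are fraudulent
  wastePerFraudUsd: number; // token spend per fraudulent account
  catchRate: number;        // fraction of fraudulent signups blocked
  pricePerCheckUsd: number; // validation price per email
}

interface SavingsReport {
  wasteBeforeUsd: number;
  wasteAfterUsd: number;
  validationCostUsd: number;
  grossSavingsUsd: number;
  netSavingsUsd: number;
  roi: number; // net savings divided by validation cost
}

function monthlySavings(s: Scenario): SavingsReport {
  const wasteBefore = s.signupsPerMonth * s.fraudRate * s.wastePerFraudUsd;
  const wasteAfter = wasteBefore * (1 - s.catchRate);
  const validationCost = s.signupsPerMonth * s.pricePerCheckUsd;
  const gross = wasteBefore - wasteAfter;
  const net = gross - validationCost;

  // Round to whole dollars to hide floating-point noise.
  return {
    wasteBeforeUsd: Math.round(wasteBefore),
    wasteAfterUsd: Math.round(wasteAfter),
    validationCostUsd: Math.round(validationCost),
    grossSavingsUsd: Math.round(gross),
    netSavingsUsd: Math.round(net),
    roi: Math.round(net / validationCost),
  };
}

const midSize = monthlySavings({
  signupsPerMonth: 10_000,
  fraudRate: 0.14,
  wastePerFraudUsd: 18,
  catchRate: 0.92,
  pricePerCheckUsd: 0.005,
});

const large = monthlySavings({
  signupsPerMonth: 100_000,
  fraudRate: 0.14,
  wastePerFraudUsd: 18,
  catchRate: 0.92,
  pricePerCheckUsd: 0.003,
});

console.log({ midSize, large });
```

The result is most sensitive to `fraudRate` and `wastePerFraudUsd`, so those are the two inputs worth measuring on your own platform before trusting the output.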
Advanced: Score-Based Token Budgets
Here is a pattern we love: instead of binary allow/block, use the BigShield score to dynamically set token budgets. Higher-trust accounts get more generous limits:
    function getTokenBudget(emailScore: number, plan: string): number {
      const baseBudget: Record<string, number> = {
        free: 10_000,
        starter: 100_000,
        pro: 1_000_000,
      };

      const base = baseBudget[plan] || baseBudget.free;

      // Score 90-100: full budget
      // Score 70-89: 75% budget
      // Score 50-69: 50% budget
      // Score below 50: 25% budget (not blocked, but borderline; with the
      // default block threshold, scores under 30 never reach this function)
      if (emailScore >= 90) return base;
      if (emailScore >= 70) return Math.floor(base * 0.75);
      if (emailScore >= 50) return Math.floor(base * 0.5);
      return Math.floor(base * 0.25);
    }
    // Use in your completion endpoint. If the middleware is already mounted
    // on this path with app.use, drop it here so it does not run twice.
    app.post('/api/v1/completions', bigshieldMiddleware(), async (req, res) => {
      const emailScore = (req as any).emailScore;
      // Use the user attached by the middleware if present; otherwise look it up
      const user =
        (req as any).user ??
        (await getUserByApiKey(req.headers['x-api-key'] as string));

      const tokenBudget = getTokenBudget(emailScore, user.plan);
      const tokensUsed = await getMonthlyTokenUsage(user.id);

      if (tokensUsed >= tokenBudget) {
        return res.status(429).json({
          error: 'Monthly token limit reached.',
          limit: tokenBudget,
          used: tokensUsed,
        });
      }

      // Proceed with LLM call, passing remaining budget
      const remainingTokens = tokenBudget - tokensUsed;
      const result = await generateCompletion(req.body, {
        maxTokens: Math.min(req.body.max_tokens || 4096, remainingTokens),
      });

      return res.json(result);
    });

This approach is elegant because, above the block threshold, it replaces a hard pass/fail cutoff with a sliding scale. Legitimate users who happen to have a slightly suspicious email (maybe they use a privacy relay) still get access, just with a more conservative budget until they build trust.
Error Handling and Resilience
Production integrations need solid error handling. Here are the key patterns:
- Circuit breaker: If BigShield returns errors on 3+ consecutive calls, disable validation for 60 seconds and fail open. Do not let a transient API issue block all signups.
- Timeout: Set a 2-second timeout on validation calls. BigShield typically responds in under 200ms, so if a request takes longer than 2 seconds, something is wrong.
- Idempotency: Cache validation results by email for at least an hour. There is no reason to re-validate the same email on every page load during a signup flow.
- Graceful degradation: If you cannot validate, let the user in but apply the minimum token budget and flag for async review.
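The circuit breaker and timeout patterns above can be combined in a small wrapper. The following is a minimal sketch under stated assumptions: `CircuitBreaker` and `guardedValidate` are hypothetical names, the thresholds (3 consecutive failures, 60-second cooldown, 2-second timeout) come from the list above, and the timeout only takes effect if the `validate` callback passes the abort signal on to its HTTP call, for example as the `signal` option of `fetch`.

```typescript
// Hypothetical circuit breaker: opens after N consecutive failures and
// stays open for a cooldown period, during which validation is skipped.
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openUntilMs = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold = 3,
    cooldownMs = 60_000,
    now: () => number = () => Date.now() // injectable clock for tests
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  isOpen(): boolean {
    return this.now() < this.openUntilMs;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openUntilMs = this.now() + this.cooldownMs;
      this.consecutiveFailures = 0;
    }
  }
}

// Runs `validate` with a timeout. Returns null when the breaker is open
// or the call fails, so the caller can fail open and flag for review.
async function guardedValidate<T>(
  breaker: CircuitBreaker,
  validate: (signal: AbortSignal) => Promise<T>,
  timeoutMs = 2_000
): Promise<T | null> {
  if (breaker.isOpen()) return null; // skip the API call entirely

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const result = await validate(controller.signal);
    breaker.recordSuccess();
    return result;
  } catch {
    // Covers API errors and aborts triggered by the timeout.
    breaker.recordFailure();
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```

A `null` result maps onto the graceful-degradation rule: let the user in, apply the minimum token budget, and queue the account for async review.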
Next Steps
That covers the main integration patterns for LLM pipelines. The key insight is that email validation should happen as early as possible in the pipeline, before you allocate any compute resources, and the results should be cached and used throughout the user lifecycle.
BigShield's API is designed for exactly this use case: sub-200ms response times, simple REST API, and a scoring model that works across industries. Get started with the free tier (1,000 validations/month) at bigshield.app and see how it fits into your stack.