Cost Tracking and Budget System
RankDisco records the cost of every external API call so that it can monitor usage, enforce budget limits, and reduce operational spending across all external services.
Cost Tracking Overview
Why Cost Tracking Matters
External API costs can escalate quickly without visibility:
- DataForSEO charges per API call ($0.0035 - $0.02+ per request)
- ZenRows uses credit-based pricing ($69/month for 250K basic + 10K premium)
- Cloudflare Workers AI charges per neuron ($0.011 per 1K neurons)
- YouTube Data API has strict daily quotas (10K units/day free)
Cost tracking enables:
- Budget enforcement - Prevent runaway costs with daily/hourly limits
- Cost attribution - Know which workflows, domains, and operations cost money
- Optimization - Identify expensive operations and optimize pipelines
- Forecasting - Project monthly costs based on usage patterns
What Gets Tracked
Every external API call is tracked with:
| Field | Description |
|---|---|
| service | Service identifier (e.g., dataforseo_backlinks, zenrows_premium) |
| cost_usd | Actual or estimated cost in USD |
| run_id | Associated crawl/workflow run |
| project_id | Project context (if applicable) |
| task_id | External task ID (e.g., DataForSEO task) |
| success | Whether the request succeeded |
| items_returned | Number of items in response |
| source | Origin: cron, api, queue, webhook |
Service Costs
COST_SERVICES Constants
All tracked services are defined in /packages/api/src/lib/utils/cost-tracker.js:
export const COST_SERVICES = {
// DataForSEO - App Store
DATAFORSEO_APP_INFO: "dataforseo_app_info", // ~$0.01/request
DATAFORSEO_APP_LIST: "dataforseo_app_list", // ~$0.02/request
DATAFORSEO_SERP: "dataforseo_serp", // $0.0035/request
DATAFORSEO_KEYWORDS: "dataforseo_keywords", // ~$0.005/request
// DataForSEO - Backlinks
DATAFORSEO_REFERRING_DOMAINS: "dataforseo_referring_domains", // ~$0.02/request
DATAFORSEO_SUMMARY: "dataforseo_summary", // ~$0.02/request
DATAFORSEO_BACKLINKS: "dataforseo_backlinks", // ~$0.02/request
// DataForSEO - On-Page
DATAFORSEO_INSTANT_PAGES: "dataforseo_instant_pages", // $0.000125/URL
// ZenRows (web scraping)
ZENROWS_BASIC: "zenrows_basic", // ~$0.001/request (datacenter proxy)
ZENROWS_PREMIUM: "zenrows_premium", // ~$0.01/request (residential proxy)
ZENROWS_JS_RENDER: "zenrows_js_render", // ~$0.005/request (JS rendering)
// Free sources
RSS_FEED: "rss_feed", // $0 (Apple RSS)
ITUNES_API: "itunes_api", // $0 (direct API)
// Cloudflare services
CLOUDFLARE_D1_READ: "cloudflare_d1_read", // $0.0000005/read
CLOUDFLARE_D1_WRITE: "cloudflare_d1_write", // $0.000001/write
CLOUDFLARE_KV: "cloudflare_kv", // $0.0000005/op
CLOUDFLARE_WORKER: "cloudflare_worker", // $0.0000003/request
CF_NATIVE_FETCH: "cf_native_fetch", // $0 (subrequests free)
CF_WORKERS_AI: "cf_workers_ai", // ~$0.02/call (~54 neurons)
// YouTube Data API
YOUTUBE_API: "youtube_api", // $0 (within 10K units/day)
};
Default Cost Rates
The system uses these default rates when actual costs are not available:
| Service | Rate (USD) | Notes |
|---|---|---|
| dataforseo_app_info | $0.01 | Actual cost from API response preferred |
| dataforseo_app_list | $0.02 | Actual cost from API response preferred |
| dataforseo_serp | $0.0035 | $3.50 per 1,000 requests |
| dataforseo_keywords | $0.005 | Variable based on volume |
| dataforseo_referring_domains | $0.02 | Backlinks API |
| dataforseo_instant_pages | $0.000125 | On-Page API |
| zenrows_basic | $0.001 | ~$1/1,000 requests |
| zenrows_premium | $0.01 | ~$10/1,000 requests |
| zenrows_js_render | $0.005 | ~$5/1,000 requests |
| cf_workers_ai | $0.02 | ~54 neurons average per LLM call |
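The fallback from actual to default cost can be sketched as follows. This is a minimal illustration, not the code in cost-tracker.js: `DEFAULT_RATES_USD` and `resolveCostUsd` are hypothetical names, and the rates are copied from the table above.

```javascript
// Hypothetical rate table mirroring the default rates listed above.
const DEFAULT_RATES_USD = {
  dataforseo_app_info: 0.01,
  dataforseo_app_list: 0.02,
  dataforseo_serp: 0.0035,
  dataforseo_keywords: 0.005,
  dataforseo_referring_domains: 0.02,
  dataforseo_instant_pages: 0.000125,
  zenrows_basic: 0.001,
  zenrows_premium: 0.01,
  zenrows_js_render: 0.005,
  cf_workers_ai: 0.02,
};

// Prefer the cost reported by the API; otherwise estimate from the default rate.
// `units` is the number of billable requests or URLs covered by the call.
function resolveCostUsd(service, actualCostUsd, units = 1) {
  const hasActual =
    typeof actualCostUsd === "number" &&
    Number.isFinite(actualCostUsd) &&
    actualCostUsd >= 0;
  if (hasActual) {
    return { cost_usd: actualCostUsd, estimated: false };
  }
  // Assumption: services without a listed rate are treated as free.
  const rate = DEFAULT_RATES_USD[service] ?? 0;
  return { cost_usd: rate * units, estimated: true };
}
```

Returning an `estimated` flag alongside the amount is one way to keep estimated and billed figures distinguishable in later reports.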
Budget Caps
LLM Daily Budget ($50)
The LLM classifier enforces a hard daily budget to prevent cost spirals:
// In classifier-llm.js
const MAX_DAILY_LLM_BUDGET_USD = 50.0; // $50/day limit
export async function classifyWithLLM(input, env) {
// Budget enforcement - prevent cost spirals
try {
const dailyCost = await getDailyServiceCost(env, COST_SERVICES.CF_WORKERS_AI);
if (dailyCost >= MAX_DAILY_LLM_BUDGET_USD) {
console.warn(`[llm] Daily budget exceeded: $${dailyCost.toFixed(2)} >= $${MAX_DAILY_LLM_BUDGET_USD}`);
return {
skipped: true,
skip_reason: "daily_budget_exceeded",
needs_review: true,
classification: {},
};
}
} catch (err) {
// Don't block on budget check errors, just log
console.error("[llm] Budget check failed:", err.message);
}
// ... continue with LLM call
}
When the budget is exceeded:
- LLM classification returns skipped: true with skip_reason: "daily_budget_exceeded"
- URLs are marked needs_review: true for later processing
- The pipeline continues with rules-based classification only
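A caller might combine the skipped result with the rules-based result along these lines. This is a sketch under assumptions: `applyLlmResult`, the `method` field, and the `"llm_unavailable"` reason are hypothetical and do not come from the RankDisco source.

```javascript
// Merge an LLM result into the rules-based classification.
// When the LLM was skipped (for example, daily budget exceeded), keep the
// rules result and flag the URL for later review.
function applyLlmResult(rulesResult, llmResult) {
  if (!llmResult || llmResult.skipped) {
    return {
      ...rulesResult,
      method: "rules_only", // hypothetical field
      needs_review: true,
      skip_reason: llmResult ? llmResult.skip_reason : "llm_unavailable",
    };
  }
  // The LLM answered: its classification fields override the rules result.
  return {
    ...rulesResult,
    ...llmResult.classification,
    method: "llm",
    needs_review: false,
  };
}
```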
Per-Run Budget Tracking (KV-Based)
For individual runs, the budget.js utility tracks costs in KV:
// In lib/utils/budget.js
export async function checkBudget(kv, runId, endpoint, maxRequests, maxCostUsd) {
const countKey = `budget:${runId}:${endpoint}:requests`;
const costKey = `budget:${runId}:${endpoint}:cost_usd`;
const count = parseInt(await kv.get(countKey) || "0");
const cost = parseFloat(await kv.get(costKey) || "0");
if (maxRequests > 0 && count >= maxRequests) {
return { ok: false, reason: `request limit exceeded (${count}/${maxRequests})` };
}
if (maxCostUsd > 0 && cost >= maxCostUsd) {
return { ok: false, reason: `cost limit exceeded ($${cost.toFixed(2)}/$${maxCostUsd})` };
}
return { ok: true, currentCount: count, currentCost: cost };
}
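export async function incrementBudget(kv, runId, endpoint, costEstimate = 0) {
const countKey = `budget:${runId}:${endpoint}:requests`;
const costKey = `budget:${runId}:${endpoint}:cost_usd`;
// Read the current totals before incrementing. KV has no atomic increment,
// so concurrent writers can lose updates and the counters are approximate.
const count = parseInt((await kv.get(countKey)) || "0", 10);
const cost = parseFloat((await kv.get(costKey)) || "0");
// TTL 90 days for budget window
const ttl = 60 * 60 * 24 * 90;
await kv.put(countKey, String(count + 1), { expirationTtl: ttl });
await kv.put(costKey, String(cost + costEstimate), { expirationTtl: ttl });
}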
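To exercise the per-run budget logic without a real KV namespace, a small in-memory stand-in is enough. The sketch below is illustrative only: `MemoryKV` and `spendIfWithinBudget` are hypothetical names, the stub ignores `expirationTtl`, and the helper re-implements the check-then-increment sequence using the same key layout as budget.js.

```javascript
// Minimal in-memory stand-in for a KV namespace (get/put only, no expiry).
class MemoryKV {
  constructor() {
    this.store = new Map();
  }
  async get(key) {
    return this.store.has(key) ? this.store.get(key) : null;
  }
  async put(key, value, _options = {}) {
    this.store.set(key, String(value));
  }
}

// Check the run's limits, then record one request and its cost if allowed.
// A limit of 0 disables that check, matching the checkBudget convention.
async function spendIfWithinBudget(kv, runId, endpoint, costUsd, maxRequests, maxCostUsd) {
  const countKey = `budget:${runId}:${endpoint}:requests`;
  const costKey = `budget:${runId}:${endpoint}:cost_usd`;
  const count = parseInt((await kv.get(countKey)) || "0", 10);
  const cost = parseFloat((await kv.get(costKey)) || "0");

  if (maxRequests > 0 && count >= maxRequests) {
    return { ok: false, reason: "request limit exceeded" };
  }
  if (maxCostUsd > 0 && cost >= maxCostUsd) {
    return { ok: false, reason: "cost limit exceeded" };
  }

  const ttl = 60 * 60 * 24 * 90; // 90 days, in seconds
  await kv.put(countKey, String(count + 1), { expirationTtl: ttl });
  await kv.put(costKey, String(cost + costUsd), { expirationTtl: ttl });
  return { ok: true, count: count + 1, cost: cost + costUsd };
}
```

Because the read and the write are separate KV operations, two concurrent callers can both pass the check. The limits are best treated as soft caps rather than exact guarantees.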
Workflow Budget Checking
The base workflow class (base.ts) includes budget checking per workflow:
// In workflows/base.ts
protected async checkBudget(
step: WorkflowStep,
stepName: string,
service: string,
maxCost: number,
): Promise<{ ok: boolean; remaining: number; spent: number }> {
return await step.do(`${stepName}-budget-check`, async () => {
const result = await this.env.DB.prepare(`
SELECT COALESCE(SUM(cost_usd), 0) as total_spent
FROM api_costs
WHERE run_id = ?
AND created_at > strftime('%s', 'now', '-1 hour') * 1000
`).bind(this.instanceId).first();
const spent = (result?.total_spent as number) || 0;
const remaining = maxCost - spent;
return { ok: remaining > 0, remaining, spent };
});
}
Free Tier Configuration
The system tracks free tier usage across services:
export const FREE_TIER_CONFIG = {
workers_ai: {
service: "workers_ai_llm",
quota_type: "daily",
free_units: 10000, // 10K neurons/day free
unit_name: "neurons",
cost_per_1k_units: 0.011,
units_per_request: 54, // ~54 neurons per LLM call
},
zenrows: {
service: "zenrows",
quota_type: "monthly",
free_units: 250000, // 250K basic credits
premium_units: 10000, // 10K premium credits
unit_name: "credits",
plan_cost: 69, // $69/month
},
d1: {
service: "cloudflare_d1",
quota_type: "daily",
free_reads: 5000000, // 5M reads/day
free_writes: 100000, // 100K writes/day
},
youtube: {
service: "youtube_api",
quota_type: "daily",
free_units: 10000, // 10K units/day
unit_name: "quota_units",
},
dataforseo: {
service: "dataforseo",
quota_type: "none", // No free tier - pay per use
free_units: 0,
},
};
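The free-tier arithmetic for Workers AI can be worked through in a few lines. The sketch below copies the `workers_ai` values from the configuration above; `estimateWorkersAiCost` is a hypothetical helper, and it assumes every call consumes the average of 54 neurons.

```javascript
// Values copied from the workers_ai entry of FREE_TIER_CONFIG.
const WORKERS_AI_TIER = {
  free_units: 10000,        // neurons per day included for free
  cost_per_1k_units: 0.011, // USD per 1,000 neurons beyond the free tier
  units_per_request: 54,    // average neurons per LLM call
};

// Estimate one day's Workers AI cost for a given number of LLM calls.
function estimateWorkersAiCost(requests, tier = WORKERS_AI_TIER) {
  const neurons = requests * tier.units_per_request;
  const billableNeurons = Math.max(0, neurons - tier.free_units);
  const costUsd = (billableNeurons / 1000) * tier.cost_per_1k_units;
  return { neurons, billable_neurons: billableNeurons, cost_usd: costUsd };
}
```

Under these assumptions, 100 calls use 5,400 neurons and stay inside the free tier, while 1,000 calls use 54,000 neurons, of which 44,000 are billable.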
Cost Attribution
Tracking Costs to Workflows
Every trackCost() call includes attribution context:
await trackCost(env, {
service: COST_SERVICES.DATAFORSEO_BACKLINKS,
cost_usd: task.cost, // Actual cost from API response
run_id: runId, // Links to crawl_job_runs
project_id: projectId, // Links to project
task_id: task.id, // External task reference
success: true,
items_returned: response.items?.length || 0,
source: 'queue', // Where called from
});
Workflow Cost Tracking
Workflows track costs at the step level:
// In domain-onboard workflow
await trackCost(this.env, {
service: COST_SERVICES.CF_WORKERS_AI,
cost_usd: 0.0002,
run_id: this.instanceId,
source: 'workflow',
});
Aggregating Run Costs
The updateRunCosts() function aggregates costs from api_costs to crawl_job_runs:
export async function updateRunCosts(env, runId) {
const costs = await env.DB.prepare(`
SELECT
SUM(cost_usd) as total_cost,
SUM(CASE WHEN service LIKE 'dataforseo%' THEN cost_usd ELSE 0 END) as dataforseo_cost,
SUM(CASE WHEN service LIKE 'zenrows%' THEN cost_usd ELSE 0 END) as zenrows_cost,
SUM(CASE WHEN service NOT LIKE 'dataforseo%' AND service NOT LIKE 'zenrows%'
THEN cost_usd ELSE 0 END) as other_cost
FROM api_costs
WHERE run_id = ?
`).bind(runId).first();
await env.DB.prepare(`
UPDATE crawl_job_runs SET
total_cost_usd = ?,
dataforseo_cost = ?,
zenrows_cost = ?,
other_cost = ?,
updated_at = ?
WHERE id = ?
`).bind(
costs?.total_cost || 0,
costs?.dataforseo_cost || 0,
costs?.zenrows_cost || 0,
costs?.other_cost || 0,
Date.now(),
runId
).run();
}
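The bucketing performed by the SQL above can be mirrored in plain JavaScript, which may be convenient for unit tests that do not have a D1 binding. `summarizeRunCosts` is a hypothetical helper; it assumes service identifiers are lowercase, since `startsWith` is case-sensitive while SQLite's `LIKE` is not for ASCII.

```javascript
// Group cost rows for one run into the same buckets as the SQL aggregation.
// Each row is expected to look like { service: string, cost_usd: number }.
function summarizeRunCosts(rows) {
  const totals = { total_cost: 0, dataforseo_cost: 0, zenrows_cost: 0, other_cost: 0 };
  for (const { service, cost_usd } of rows) {
    const cost = Number(cost_usd) || 0; // treat null or invalid costs as 0
    totals.total_cost += cost;
    if (service.startsWith("dataforseo")) {
      totals.dataforseo_cost += cost;
    } else if (service.startsWith("zenrows")) {
      totals.zenrows_cost += cost;
    } else {
      totals.other_cost += cost;
    }
  }
  return totals;
}
```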
Cost Queries
Database Schema
The api_costs table stores all cost events:
CREATE TABLE api_costs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_id TEXT, -- FK to crawl_job_runs
project_id TEXT, -- FK to projects
service TEXT NOT NULL, -- Service identifier
endpoint TEXT, -- Specific endpoint
cost_usd REAL NOT NULL DEFAULT 0,
request_count INTEGER DEFAULT 1,
app_id TEXT,
category_id TEXT,
task_id TEXT, -- External task ID
success INTEGER DEFAULT 1,
items_returned INTEGER,
created_at INTEGER NOT NULL,
source TEXT -- 'cron', 'api', 'queue', 'webhook'
);
-- Indexes for efficient queries
CREATE INDEX idx_api_costs_run ON api_costs(run_id);
CREATE INDEX idx_api_costs_project ON api_costs(project_id);
CREATE INDEX idx_api_costs_service ON api_costs(service);
CREATE INDEX idx_api_costs_created ON api_costs(created_at DESC);
CREATE INDEX idx_api_costs_service_date ON api_costs(service, created_at);
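A trackCost() payload has to be mapped onto these columns before it is inserted. The sketch below shows one plausible mapping; `toApiCostRow` is a hypothetical helper, and the defaults it applies (success stored as 1 or 0, created_at in epoch milliseconds) are inferred from the schema and queries in this document rather than taken from the implementation.

```javascript
// Convert a cost event into a row shaped like the api_costs table.
function toApiCostRow(event, now = Date.now()) {
  if (!event || !event.service) {
    throw new Error("service is required");
  }
  return {
    run_id: event.run_id ?? null,
    project_id: event.project_id ?? null,
    service: event.service,
    endpoint: event.endpoint ?? null,
    cost_usd: Number(event.cost_usd) || 0,       // NOT NULL DEFAULT 0
    request_count: event.request_count ?? 1,     // DEFAULT 1
    task_id: event.task_id ?? null,
    success: event.success === false ? 0 : 1,    // stored as INTEGER, DEFAULT 1
    items_returned: event.items_returned ?? null,
    created_at: now,                             // epoch milliseconds
    source: event.source ?? null,
  };
}
```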
Common Query Examples
Total Cost Today
SELECT SUM(cost_usd) as total_cost
FROM api_costs
WHERE created_at >= strftime('%s', 'now', 'start of day') * 1000;
Cost by Service (Last 7 Days)
SELECT
service,
SUM(cost_usd) as total_cost,
COUNT(*) as request_count,
SUM(CASE WHEN success = 1 THEN 1 ELSE 0 END) as success_count
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-7 days') * 1000
GROUP BY service
ORDER BY total_cost DESC;
Daily Cost Breakdown
SELECT
date(created_at / 1000, 'unixepoch') as date,
service,
SUM(cost_usd) as total_cost,
COUNT(*) as request_count
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-30 days') * 1000
GROUP BY date, service
ORDER BY date DESC, total_cost DESC;
Cost by Run
SELECT
run_id,
SUM(cost_usd) as total_cost,
COUNT(*) as api_calls,
SUM(items_returned) as total_items
FROM api_costs
WHERE run_id IS NOT NULL
GROUP BY run_id
ORDER BY total_cost DESC
LIMIT 20;
LLM Daily Usage Check
SELECT SUM(cost_usd) as daily_llm_cost
FROM api_costs
WHERE service = 'cf_workers_ai'
AND created_at >= strftime('%s', 'now', 'start of day') * 1000;
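The queries above compute time boundaries inside SQLite. When a boundary needs to be bound as a parameter from JavaScript instead, the equivalent millisecond values can be derived as sketched below. `startOfUtcDayMs` and `daysAgoMs` are hypothetical helpers; SQLite's 'now' is UTC, so the day boundary here is UTC midnight as well.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Epoch milliseconds for 00:00:00 UTC on the day containing `now`.
// Corresponds to strftime('%s', 'now', 'start of day') * 1000.
function startOfUtcDayMs(now = new Date()) {
  return Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
}

// Epoch milliseconds exactly `days` days before `now`.
// Roughly corresponds to strftime('%s', 'now', '-N days') * 1000,
// which truncates to whole seconds.
function daysAgoMs(days, now = new Date()) {
  return now.getTime() - days * MS_PER_DAY;
}
```

A value from either helper could then be bound to a `created_at >= ?` placeholder in place of the inline strftime expression.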
API Endpoints
The admin API provides cost visibility:
| Endpoint | Description |
|---|---|
| GET /api/admin/costs | Overall cost statistics |
| GET /api/admin/costs/run/:run_id | Costs for specific run |
| GET /api/admin/costs/project/:project_id | Costs for specific project |
| GET /api/admin/costs/daily | Daily breakdown |
| GET /api/admin/costs/recent | Recent cost entries (debug) |
| GET /api/admin/costs/zenrows | ZenRows quota tracking |
| GET /api/admin/costs/summary | Comprehensive usage summary |
| GET /api/admin/costs/free-tiers | Free tier configurations |
Example API Response
GET /api/admin/costs?minutes=60
{
"success": true,
"stats": {
"all_time": {
"total_cost_usd": 1234.56,
"total_requests": 89012,
"total_runs": 156,
"total_projects": 12
},
"today_cost_usd": 45.67,
"week_cost_usd": 234.89,
"month_cost_usd": 890.12,
"by_service": [
{ "service": "dataforseo_backlinks", "total_cost": 23.45, "requests": 1200 },
{ "service": "cf_workers_ai", "total_cost": 12.34, "requests": 617 },
{ "service": "zenrows_basic", "total_cost": 5.67, "requests": 5670 }
]
}
}
Budget Enforcement
How Budget Checks Prevent Overspending
Budget enforcement happens at three levels:
1. Pre-Request Check (LLM)
Before making expensive LLM calls:
const dailyCost = await getDailyServiceCost(env, COST_SERVICES.CF_WORKERS_AI);
if (dailyCost >= MAX_DAILY_LLM_BUDGET_USD) {
return { skipped: true, skip_reason: "daily_budget_exceeded" };
}
2. Per-Workflow Hourly Budget
Workflows check remaining budget before expensive steps:
const budget = await this.checkBudget(step, "backlinks", "dataforseo", 10.0);
if (!budget.ok) {
this.log(`Budget exhausted: spent $${budget.spent}`);
return { skipped: true, reason: "budget_exceeded" };
}
3. Per-Run Limits
Individual runs can have budget caps:
const budgetCheck = await checkBudget(env.DFS_BUDGETS, runId, "backlinks", 1000, 50.0);
if (!budgetCheck.ok) {
throw new Error(`Run budget exceeded: ${budgetCheck.reason}`);
}
Graceful Degradation
When budgets are exceeded, the system degrades gracefully:
| Scenario | Behavior |
|---|---|
| LLM budget exceeded | Fall back to rules-only classification |
| DataForSEO budget exceeded | Queue requests for later processing |
| ZenRows credits exhausted | Use cached data or skip non-critical scraping |
| Per-run budget exceeded | Complete run with partial results |
Cost Optimization
Pipeline Ordering
The classification pipeline is ordered from cheapest to most expensive:
1. Rules Engine (FREE) - Pattern matching, known domains
↓ (if confidence < 80%)
2. Vectorize (CHEAP) - ~$0.0001/query for similar URL lookup
↓ (if confidence < 70%)
3. Content Parser (MODERATE) - ZenRows fetch + parsing
↓ (if still uncertain)
4. LLM Fallback (EXPENSIVE) - ~$0.0002/call, daily budget capped
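The cheapest-first ordering can be expressed as a loop over stages that stops as soon as a stage's confidence threshold is met. The sketch below is illustrative: `runStages` and the stage objects are hypothetical, the stages are synchronous for brevity (the real ones would be asynchronous), and the 0.8 and 0.7 thresholds come from the diagram above while the content-parser threshold is an assumption.

```javascript
// Run classification stages in order of cost and stop once the best result
// so far meets the current stage's `stopAt` confidence threshold.
function runStages(url, stages) {
  let best = null;
  for (const stage of stages) {
    const result = stage.run(url); // { label, confidence } or null
    if (result && (!best || result.confidence > best.confidence)) {
      best = { ...result, stage: stage.name };
    }
    if (best && best.confidence >= stage.stopAt) {
      return best; // confident enough: skip the more expensive stages
    }
  }
  return best; // may be null if no stage produced a result
}

// Example wiring with stub stages; each `run` stands in for the real stage.
function buildExampleStages(stubs) {
  return [
    { name: "rules", stopAt: 0.8, run: stubs.rules },
    { name: "vectorize", stopAt: 0.7, run: stubs.vectorize },
    { name: "content_parser", stopAt: 0.7, run: stubs.contentParser }, // assumed threshold
    { name: "llm", stopAt: 0, run: stubs.llm },
  ];
}
```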
Caching Strategies
Multiple caching layers reduce costs:
| Cache | Purpose | TTL |
|---|---|---|
| Known Domains DB | Skip classification for known sites | Permanent |
| Vectorize | Similar URL classifications | N/A (similarity search) |
| Content Cache (R2) | Avoid re-fetching pages | 30 days |
| Classification KV | Recent classification results | 24 hours |
Early Termination
The pipeline terminates early when confidence is sufficient:
// In url-classifier.js
if (classification.confidence >= THRESHOLDS.HIGH_CONFIDENCE) {
return classification; // Skip remaining stages
}
Batch Operations
Expensive operations are batched:
// DataForSEO batch processing
const batch = urls.slice(0, 100); // Max 100 per batch
const results = await dataforseoInstantPages(batch, env);
// Track cost once for entire batch
await trackCost(env, {
service: COST_SERVICES.DATAFORSEO_INSTANT_PAGES,
cost_usd: batch.length * 0.000125,
request_count: batch.length,
});
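The example above handles only the first 100 URLs. Splitting a longer list into batches and pricing each one could look like the following sketch; `chunk` and `planBatches` are hypothetical helpers, and the $0.000125 per-URL rate is the default listed earlier for dataforseo_instant_pages.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Plan one cost entry per batch so each API call is tracked exactly once.
function planBatches(urls, batchSize = 100, ratePerUrlUsd = 0.000125) {
  return chunk(urls, batchSize).map((batch) => ({
    urls: batch,
    request_count: batch.length,
    cost_usd: batch.length * ratePerUrlUsd,
  }));
}
```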
Monitoring and Alerts
Real-Time Cost Monitoring
The admin console provides:
- Today's Cost - Running total updated with each API call
- Cost by Service - Breakdown showing which APIs cost most
- Trend Charts - Daily/hourly cost trends over time
- Budget Status - Progress toward daily/monthly limits
Setting Up Alerts
RankDisco has no built-in alerting, but costs can be monitored with either of the following approaches:
Cron-Based Budget Checks
// Check costs every hour
export default {
async scheduled(event, env, ctx) {
const stats = await getCostStats(env);
if (stats.today_cost_usd > 100) {
console.error(`ALERT: Daily cost exceeds $100: $${stats.today_cost_usd}`);
// Send to external alerting service
}
}
};
Query-Based Monitoring
-- Find cost anomalies (>2x average)
WITH daily_avg AS (
SELECT AVG(daily_cost) as avg_cost
FROM (
SELECT date(created_at/1000, 'unixepoch') as day, SUM(cost_usd) as daily_cost
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-30 days') * 1000
GROUP BY day
)
)
SELECT
date(created_at/1000, 'unixepoch') as day,
SUM(cost_usd) as daily_cost,
(SELECT avg_cost FROM daily_avg) as avg_cost
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-7 days') * 1000
GROUP BY day
HAVING daily_cost > 2 * (SELECT avg_cost FROM daily_avg);
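The same "more than twice the average" rule can be applied in JavaScript to daily totals that have already been fetched, for example inside a scheduled handler. `findCostAnomalies` is a hypothetical helper, and unlike the SQL above it averages over exactly the rows it is given rather than a separate 30-day window.

```javascript
// Flag days whose cost exceeds `factor` times the average daily cost.
// `dailyTotals` is an array of { day: "YYYY-MM-DD", cost_usd: number }.
function findCostAnomalies(dailyTotals, factor = 2) {
  if (dailyTotals.length === 0) {
    return [];
  }
  const sum = dailyTotals.reduce((acc, d) => acc + d.cost_usd, 0);
  const avgCost = sum / dailyTotals.length;
  return dailyTotals
    .filter((d) => d.cost_usd > factor * avgCost)
    .map((d) => ({ ...d, avg_cost: avgCost }));
}
```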
Cost Reports
Generate periodic cost reports:
// GET /api/admin/costs/daily?days=30&granularity=day
const report = await getDailyCosts(env, {
days: 30,
granularity: "day",
});
// Aggregate by service
const byService = {};
for (const row of report.costs) {
byService[row.service] = (byService[row.service] || 0) + row.total_cost;
}
console.log("Monthly Cost Summary:");
console.log(`Total: $${Object.values(byService).reduce((a, b) => a + b, 0).toFixed(2)}`);
Object.entries(byService)
.sort(([,a], [,b]) => b - a)
.forEach(([service, cost]) => {
console.log(` ${service}: $${cost.toFixed(2)}`);
});
Dashboard Integration
Key metrics for operational dashboards:
| Metric | Query |
|---|---|
| Cost today | SUM(cost_usd) WHERE created_at >= today_start |
| LLM budget remaining | $50 - SUM(cost_usd) WHERE service='cf_workers_ai' AND today |
| ZenRows credits used | SUM(request_count) WHERE service LIKE 'zenrows%' AND this_month |
| Avg cost per domain onboard | AVG(cost) GROUP BY workflow_type='domain-onboard' |
| Cost per 1K URLs classified | SUM(cost) / COUNT(*) * 1000 |
Related Documentation
- Classification Pipeline - How classification stages work
- Workflow System - Workflow execution and tracking
- Admin Console - Using the admin dashboard