Cost Tracking and Budget System

RankDisco implements comprehensive cost tracking to monitor API usage, enforce budget limits, and optimize operational spending across all external services.

Cost Tracking Overview

Why Cost Tracking Matters

External API costs can escalate quickly without visibility:

  • DataForSEO charges per API call ($0.0035 - $0.02+ per request)
  • ZenRows uses credit-based pricing ($69/month for 250K basic + 10K premium)
  • Cloudflare Workers AI charges per neuron ($0.011 per 1K neurons)
  • YouTube Data API has strict daily quotas (10K units/day free)

Cost tracking enables:

  1. Budget enforcement - Prevent runaway costs with daily/hourly limits
  2. Cost attribution - Know which workflows, domains, and operations cost money
  3. Optimization - Identify expensive operations and optimize pipelines
  4. Forecasting - Project monthly costs based on usage patterns

What Gets Tracked

Every external API call is tracked with:

| Field | Description |
| --- | --- |
| service | Service identifier (e.g., dataforseo_backlinks, zenrows_premium) |
| cost_usd | Actual or estimated cost in USD |
| run_id | Associated crawl/workflow run |
| project_id | Project context (if applicable) |
| task_id | External task ID (e.g., DataForSEO task) |
| success | Whether the request succeeded |
| items_returned | Number of items in response |
| source | Origin: cron, api, queue, webhook |
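
The fields above can be gathered into a single record before it is handed to the tracker. The following is a minimal sketch, not the project's implementation: `buildCostEvent` and its defaults are hypothetical, chosen to mirror the field list.

```javascript
// Hypothetical helper (not part of the documented API): normalizes a partial
// cost event into the tracked shape described in the field list.
function buildCostEvent(fields, now = Date.now()) {
  if (!fields || !fields.service) {
    throw new Error("service is required");
  }
  const cost = Number(fields.cost_usd ?? 0);
  if (!Number.isFinite(cost) || cost < 0) {
    throw new Error(`invalid cost_usd: ${fields.cost_usd}`);
  }
  return {
    service: fields.service,
    cost_usd: cost,
    run_id: fields.run_id ?? null,
    project_id: fields.project_id ?? null,
    task_id: fields.task_id ?? null,
    // Stored as 0/1; anything other than an explicit false counts as success
    success: fields.success === false ? 0 : 1,
    items_returned: fields.items_returned ?? null,
    source: fields.source ?? null,
    created_at: now, // epoch milliseconds
  };
}
```

Validating the cost up front keeps negative or non-numeric values out of later aggregates.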

Service Costs

COST_SERVICES Constants

All tracked services are defined in /packages/api/src/lib/utils/cost-tracker.js:

export const COST_SERVICES = {
  // DataForSEO - App Store
  DATAFORSEO_APP_INFO: "dataforseo_app_info", // ~$0.01/request
  DATAFORSEO_APP_LIST: "dataforseo_app_list", // ~$0.02/request
  DATAFORSEO_SERP: "dataforseo_serp", // $0.0035/request
  DATAFORSEO_KEYWORDS: "dataforseo_keywords", // ~$0.005/request

  // DataForSEO - Backlinks
  DATAFORSEO_REFERRING_DOMAINS: "dataforseo_referring_domains", // ~$0.02/request
  DATAFORSEO_SUMMARY: "dataforseo_summary", // ~$0.02/request
  DATAFORSEO_BACKLINKS: "dataforseo_backlinks", // ~$0.02/request

  // DataForSEO - On-Page
  DATAFORSEO_INSTANT_PAGES: "dataforseo_instant_pages", // $0.000125/URL

  // ZenRows (web scraping)
  ZENROWS_BASIC: "zenrows_basic", // ~$0.001/request (datacenter proxy)
  ZENROWS_PREMIUM: "zenrows_premium", // ~$0.01/request (residential proxy)
  ZENROWS_JS_RENDER: "zenrows_js_render", // ~$0.005/request (JS rendering)

  // Free sources
  RSS_FEED: "rss_feed", // $0 (Apple RSS)
  ITUNES_API: "itunes_api", // $0 (direct API)

  // Cloudflare services
  CLOUDFLARE_D1_READ: "cloudflare_d1_read", // $0.0000005/read
  CLOUDFLARE_D1_WRITE: "cloudflare_d1_write", // $0.000001/write
  CLOUDFLARE_KV: "cloudflare_kv", // $0.0000005/op
  CLOUDFLARE_WORKER: "cloudflare_worker", // $0.0000003/request
  CF_NATIVE_FETCH: "cf_native_fetch", // $0 (subrequests free)
  CF_WORKERS_AI: "cf_workers_ai", // ~$0.02/call (~54 neurons)

  // YouTube Data API
  YOUTUBE_API: "youtube_api", // $0 (within 10K units/day)
};

Default Cost Rates

The system uses these default rates when actual costs are not available:

| Service | Rate (USD) | Notes |
| --- | --- | --- |
| dataforseo_app_info | $0.01 | Actual cost from API response preferred |
| dataforseo_app_list | $0.02 | Actual cost from API response preferred |
| dataforseo_serp | $0.0035 | $3.50 per 1,000 requests |
| dataforseo_keywords | $0.005 | Variable based on volume |
| dataforseo_referring_domains | $0.02 | Backlinks API |
| dataforseo_instant_pages | $0.000125 | On-Page API |
| zenrows_basic | $0.001 | ~$1/1,000 requests |
| zenrows_premium | $0.01 | ~$10/1,000 requests |
| zenrows_js_render | $0.005 | ~$5/1,000 requests |
| cf_workers_ai | $0.02 | ~54 neurons average per LLM call |
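
The fallback from actual to default cost could look roughly like the sketch below. It assumes a lookup table seeded with the default rates listed above; `DEFAULT_RATES_USD` and `resolveCost` are hypothetical names, not identifiers confirmed to exist in cost-tracker.js.

```javascript
// Default rates copied from the table above (USD per request or per URL).
const DEFAULT_RATES_USD = new Map([
  ["dataforseo_app_info", 0.01],
  ["dataforseo_app_list", 0.02],
  ["dataforseo_serp", 0.0035],
  ["dataforseo_keywords", 0.005],
  ["dataforseo_referring_domains", 0.02],
  ["dataforseo_instant_pages", 0.000125],
  ["zenrows_basic", 0.001],
  ["zenrows_premium", 0.01],
  ["zenrows_js_render", 0.005],
  ["cf_workers_ai", 0.02],
]);

// Hypothetical helper: prefer the cost reported by the API response and fall
// back to the default rate times the request count. Unknown services cost 0.
function resolveCost(service, actualCostUsd, requestCount = 1) {
  if (typeof actualCostUsd === "number" && Number.isFinite(actualCostUsd)) {
    return actualCostUsd;
  }
  const rate = DEFAULT_RATES_USD.get(service) ?? 0;
  return rate * requestCount;
}
```

For example, `resolveCost("dataforseo_serp", 0.004)` returns the reported 0.004, while `resolveCost("dataforseo_serp")` falls back to the default 0.0035.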

Budget Caps

LLM Daily Budget ($50)

The LLM classifier enforces a hard daily budget to prevent cost spirals:

// In classifier-llm.js
const MAX_DAILY_LLM_BUDGET_USD = 50.0; // $50/day limit

export async function classifyWithLLM(input, env) {
  // Budget enforcement - prevent cost spirals
  try {
    const dailyCost = await getDailyServiceCost(env, COST_SERVICES.CF_WORKERS_AI);

    if (dailyCost >= MAX_DAILY_LLM_BUDGET_USD) {
      console.warn(`[llm] Daily budget exceeded: $${dailyCost.toFixed(2)} >= $${MAX_DAILY_LLM_BUDGET_USD}`);
      return {
        skipped: true,
        skip_reason: "daily_budget_exceeded",
        needs_review: true,
        classification: {},
      };
    }
  } catch (err) {
    // Don't block on budget check errors, just log
    console.error("[llm] Budget check failed:", err.message);
  }

  // ... continue with LLM call
}

When the budget is exceeded:

  • LLM classification returns skipped: true with skip_reason: "daily_budget_exceeded"
  • URLs are marked needs_review: true for later processing
  • The pipeline continues with rules-based classification only
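
A caller might combine the two results along the lines of the sketch below. This is an illustration only: `finalizeClassification` and the `category`/`confidence` fields are hypothetical; only `skipped`, `skip_reason`, `needs_review`, and `classification` come from the return value shown above.

```javascript
// Hypothetical merge step: keep the rules-based result when the LLM was
// skipped, and flag the URL for later review.
function finalizeClassification(rulesResult, llmResult) {
  if (llmResult.skipped) {
    return {
      ...rulesResult,
      needs_review: true,
      llm_skip_reason: llmResult.skip_reason,
    };
  }
  // LLM ran: its fields override the rules-based ones
  return { ...rulesResult, ...llmResult.classification, needs_review: false };
}
```

Keeping the skip reason on the record makes it possible to re-queue only the URLs that were skipped for budget reasons.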

Per-Run Budget Tracking (KV-Based)

For individual runs, the budget.js utility tracks costs in KV:

// In lib/utils/budget.js

// Checks the per-run counters against their limits.
// A limit of 0 (or less) disables that check.
export async function checkBudget(kv, runId, endpoint, maxRequests, maxCostUsd) {
  const countKey = `budget:${runId}:${endpoint}:requests`;
  const costKey = `budget:${runId}:${endpoint}:cost_usd`;

  const count = parseInt((await kv.get(countKey)) || "0", 10);
  const cost = parseFloat((await kv.get(costKey)) || "0");

  if (maxRequests > 0 && count >= maxRequests) {
    return { ok: false, reason: `request limit exceeded (${count}/${maxRequests})` };
  }
  if (maxCostUsd > 0 && cost >= maxCostUsd) {
    return { ok: false, reason: `cost limit exceeded ($${cost.toFixed(2)}/$${maxCostUsd})` };
  }
  return { ok: true, currentCount: count, currentCost: cost };
}

// Records one request and its estimated cost against the run.
export async function incrementBudget(kv, runId, endpoint, costEstimate = 0) {
  const countKey = `budget:${runId}:${endpoint}:requests`;
  const costKey = `budget:${runId}:${endpoint}:cost_usd`;

  // Read the current counters before writing the updated values
  const count = parseInt((await kv.get(countKey)) || "0", 10);
  const cost = parseFloat((await kv.get(costKey)) || "0");

  // TTL 90 days for budget window
  const ttl = 60 * 60 * 24 * 90;
  await kv.put(countKey, String(count + 1), { expirationTtl: ttl });
  await kv.put(costKey, String(cost + costEstimate), { expirationTtl: ttl });
}
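
To show how the two functions fit around an API call, here is a self-contained sketch that can be run locally. `MemoryKV` is an in-memory stand-in for a KV namespace (only `get` and `put`), `withBudget` is a hypothetical wrapper, and the two budget functions are condensed re-statements of the utility above so the block runs on its own.

```javascript
// In-memory stand-in for a KV namespace; an assumption for local testing.
class MemoryKV {
  constructor() { this.store = new Map(); }
  async get(key) { return this.store.has(key) ? this.store.get(key) : null; }
  async put(key, value, _options) { this.store.set(key, String(value)); }
}

async function readCounters(kv, runId, endpoint) {
  const count = parseInt((await kv.get(`budget:${runId}:${endpoint}:requests`)) || "0", 10);
  const cost = parseFloat((await kv.get(`budget:${runId}:${endpoint}:cost_usd`)) || "0");
  return { count, cost };
}

async function checkBudget(kv, runId, endpoint, maxRequests, maxCostUsd) {
  const { count, cost } = await readCounters(kv, runId, endpoint);
  if (maxRequests > 0 && count >= maxRequests) {
    return { ok: false, reason: `request limit exceeded (${count}/${maxRequests})` };
  }
  if (maxCostUsd > 0 && cost >= maxCostUsd) {
    return { ok: false, reason: `cost limit exceeded ($${cost.toFixed(2)}/$${maxCostUsd})` };
  }
  return { ok: true, currentCount: count, currentCost: cost };
}

async function incrementBudget(kv, runId, endpoint, costEstimate = 0) {
  const { count, cost } = await readCounters(kv, runId, endpoint);
  const ttl = 60 * 60 * 24 * 90; // 90 days, in seconds
  await kv.put(`budget:${runId}:${endpoint}:requests`, String(count + 1), { expirationTtl: ttl });
  await kv.put(`budget:${runId}:${endpoint}:cost_usd`, String(cost + costEstimate), { expirationTtl: ttl });
}

// Hypothetical wrapper: check the budget, run the call, then record it.
async function withBudget(kv, runId, endpoint, limits, costEstimate, fn) {
  const check = await checkBudget(kv, runId, endpoint, limits.maxRequests, limits.maxCostUsd);
  if (!check.ok) return { ok: false, reason: check.reason };
  const value = await fn();
  await incrementBudget(kv, runId, endpoint, costEstimate);
  return { ok: true, value };
}
```

The read-then-write sequence is not atomic, so concurrent requests for the same run can undercount. For per-run caps that is usually an acceptable approximation, but it should not be treated as a hard guarantee.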

Workflow Budget Checking

The base workflow class (base.ts) includes budget checking per workflow:

// In workflows/base.ts
protected async checkBudget(
  step: WorkflowStep,
  stepName: string,
  service: string,
  maxCost: number,
): Promise<{ ok: boolean; remaining: number; spent: number }> {
  return await step.do(`${stepName}-budget-check`, async () => {
    // api_costs stores created_at as epoch milliseconds, and workflow
    // costs are recorded with run_id set to the workflow instance ID
    const result = await this.env.DB.prepare(`
      SELECT COALESCE(SUM(cost_usd), 0) as total_spent
      FROM api_costs
      WHERE run_id = ?
      AND created_at > strftime('%s', 'now', '-1 hour') * 1000
    `).bind(this.instanceId).first();

    const spent = (result?.total_spent as number) || 0;
    const remaining = maxCost - spent;
    return { ok: remaining > 0, remaining, spent };
  });
}

Free Tier Configuration

The system tracks free tier usage across services:

export const FREE_TIER_CONFIG = {
  workers_ai: {
    service: "workers_ai_llm",
    quota_type: "daily",
    free_units: 10000, // 10K neurons/day free
    unit_name: "neurons",
    cost_per_1k_units: 0.011,
    units_per_request: 54, // ~54 neurons per LLM call
  },
  zenrows: {
    service: "zenrows",
    quota_type: "monthly",
    free_units: 250000, // 250K basic credits
    premium_units: 10000, // 10K premium credits
    unit_name: "credits",
    plan_cost: 69, // $69/month
  },
  d1: {
    service: "cloudflare_d1",
    quota_type: "daily",
    free_reads: 5000000, // 5M reads/day
    free_writes: 100000, // 100K writes/day
  },
  youtube: {
    service: "youtube_api",
    quota_type: "daily",
    free_units: 10000, // 10K units/day
    unit_name: "quota_units",
  },
  dataforseo: {
    service: "dataforseo",
    quota_type: "none", // No free tier - pay per use
    free_units: 0,
  },
};
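
The Workers AI entry above implies some simple arithmetic: how many calls fit in the free allowance, and what usage beyond it costs. The sketch below works this out from the configured numbers; `freeRequestsPerPeriod` and `overageCostUsd` are hypothetical helper names.

```javascript
// Values copied from the workers_ai entry above.
const workersAiTier = {
  free_units: 10000,
  cost_per_1k_units: 0.011,
  units_per_request: 54,
};

// How many requests fit inside the free allowance for one quota period.
function freeRequestsPerPeriod(cfg) {
  return Math.floor(cfg.free_units / cfg.units_per_request);
}

// Cost of the units consumed beyond the free allowance.
function overageCostUsd(cfg, unitsUsed) {
  const billableUnits = Math.max(0, unitsUsed - cfg.free_units);
  return (billableUnits / 1000) * cfg.cost_per_1k_units;
}

console.log(freeRequestsPerPeriod(workersAiTier)); // 185
```

At roughly 54 neurons per call, the free allowance covers 185 calls per day. Using 25,000 neurons in a day leaves 15,000 billable, or about $0.165 at $0.011 per 1K.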

Cost Attribution

Tracking Costs to Workflows

Every trackCost() call includes attribution context:

await trackCost(env, {
  service: COST_SERVICES.DATAFORSEO_BACKLINKS,
  cost_usd: task.cost, // Actual cost from API response
  run_id: runId, // Links to crawl_job_runs
  project_id: projectId, // Links to project
  task_id: task.id, // External task reference
  success: true,
  items_returned: response.items?.length || 0,
  source: 'queue', // Where called from
});

Workflow Cost Tracking

Workflows track costs at the step level:

// In domain-onboard workflow
await trackCost(this.env, {
  service: COST_SERVICES.CF_WORKERS_AI,
  cost_usd: 0.0002,
  run_id: this.instanceId,
  source: 'workflow',
});

Aggregating Run Costs

The updateRunCosts() function aggregates costs from api_costs to crawl_job_runs:

export async function updateRunCosts(env, runId) {
  const costs = await env.DB.prepare(`
    SELECT
      SUM(cost_usd) as total_cost,
      SUM(CASE WHEN service LIKE 'dataforseo%' THEN cost_usd ELSE 0 END) as dataforseo_cost,
      SUM(CASE WHEN service LIKE 'zenrows%' THEN cost_usd ELSE 0 END) as zenrows_cost,
      SUM(CASE WHEN service NOT LIKE 'dataforseo%' AND service NOT LIKE 'zenrows%'
        THEN cost_usd ELSE 0 END) as other_cost
    FROM api_costs
    WHERE run_id = ?
  `).bind(runId).first();

  await env.DB.prepare(`
    UPDATE crawl_job_runs SET
      total_cost_usd = ?,
      dataforseo_cost = ?,
      zenrows_cost = ?,
      other_cost = ?,
      updated_at = ?
    WHERE id = ?
  `).bind(
    costs?.total_cost || 0,
    costs?.dataforseo_cost || 0,
    costs?.zenrows_cost || 0,
    costs?.other_cost || 0,
    Date.now(),
    runId
  ).run();
}
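
The CASE expressions in the query bucket each row by service prefix. As an illustration of that grouping logic outside the database, the sketch below applies the same rules to an array of rows in plain JavaScript; `summarizeCosts` is a hypothetical name, and the rows are assumed to carry `service` and `cost_usd` as in api_costs.

```javascript
// Mirrors the SQL bucketing: dataforseo*, zenrows*, and everything else.
function summarizeCosts(rows) {
  const totals = { total_cost: 0, dataforseo_cost: 0, zenrows_cost: 0, other_cost: 0 };
  for (const row of rows) {
    const cost = Number(row.cost_usd) || 0; // treat NULL/invalid costs as 0
    totals.total_cost += cost;
    if (row.service.startsWith("dataforseo")) {
      totals.dataforseo_cost += cost;
    } else if (row.service.startsWith("zenrows")) {
      totals.zenrows_cost += cost;
    } else {
      totals.other_cost += cost;
    }
  }
  return totals;
}
```

This is handy for unit-testing attribution logic without a D1 binding. Note that `startsWith` is case-sensitive, which matches the lowercase service identifiers used throughout.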

Cost Queries

Database Schema

The api_costs table stores all cost events:

CREATE TABLE api_costs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  run_id TEXT, -- FK to crawl_job_runs
  project_id TEXT, -- FK to projects
  service TEXT NOT NULL, -- Service identifier
  endpoint TEXT, -- Specific endpoint
  cost_usd REAL NOT NULL DEFAULT 0,
  request_count INTEGER DEFAULT 1,
  app_id TEXT,
  category_id TEXT,
  task_id TEXT, -- External task ID
  success INTEGER DEFAULT 1,
  items_returned INTEGER,
  created_at INTEGER NOT NULL, -- Epoch milliseconds
  source TEXT -- 'cron', 'api', 'queue', 'webhook'
);

-- Indexes for efficient queries
CREATE INDEX idx_api_costs_run ON api_costs(run_id);
CREATE INDEX idx_api_costs_project ON api_costs(project_id);
CREATE INDEX idx_api_costs_service ON api_costs(service);
CREATE INDEX idx_api_costs_created ON api_costs(created_at DESC);
CREATE INDEX idx_api_costs_service_date ON api_costs(service, created_at);

Common Query Examples

Total Cost Today

SELECT SUM(cost_usd) as total_cost
FROM api_costs
WHERE created_at >= strftime('%s', 'now', 'start of day') * 1000;

Cost by Service (Last 7 Days)

SELECT
service,
SUM(cost_usd) as total_cost,
COUNT(*) as request_count,
SUM(CASE WHEN success = 1 THEN 1 ELSE 0 END) as success_count
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-7 days') * 1000
GROUP BY service
ORDER BY total_cost DESC;

Daily Cost Breakdown

SELECT
date(created_at / 1000, 'unixepoch') as date,
service,
SUM(cost_usd) as total_cost,
COUNT(*) as request_count
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-30 days') * 1000
GROUP BY date, service
ORDER BY date DESC, total_cost DESC;

Cost by Run

SELECT
run_id,
SUM(cost_usd) as total_cost,
COUNT(*) as api_calls,
SUM(items_returned) as total_items
FROM api_costs
WHERE run_id IS NOT NULL
GROUP BY run_id
ORDER BY total_cost DESC
LIMIT 20;

LLM Daily Usage Check

SELECT SUM(cost_usd) as daily_llm_cost
FROM api_costs
WHERE service = 'cf_workers_ai'
AND created_at >= strftime('%s', 'now', 'start of day') * 1000;
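
The queries above compute their time boundaries inside SQLite, where 'now' and 'start of day' are evaluated in UTC. If the boundary is computed in application code and bound as a parameter instead, the equivalent millisecond values could be derived as in this sketch; the helper names are hypothetical.

```javascript
// Midnight (UTC) of the day containing nowMs, in epoch milliseconds.
// Corresponds to strftime('%s', 'now', 'start of day') * 1000.
function startOfUtcDayMs(nowMs = Date.now()) {
  const d = new Date(nowMs);
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
}

// A rolling window boundary: nowMs minus a whole number of days.
// Corresponds to strftime('%s', 'now', '-7 days') * 1000, except that
// SQLite truncates to whole seconds while this keeps milliseconds.
function daysAgoMs(days, nowMs = Date.now()) {
  return nowMs - days * 24 * 60 * 60 * 1000;
}
```

Because the day boundary is UTC, "today" in these reports may not line up with the local calendar day of whoever is reading the dashboard.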

API Endpoints

The admin API provides cost visibility:

| Endpoint | Description |
| --- | --- |
| GET /api/admin/costs | Overall cost statistics |
| GET /api/admin/costs/run/:run_id | Costs for specific run |
| GET /api/admin/costs/project/:project_id | Costs for specific project |
| GET /api/admin/costs/daily | Daily breakdown |
| GET /api/admin/costs/recent | Recent cost entries (debug) |
| GET /api/admin/costs/zenrows | ZenRows quota tracking |
| GET /api/admin/costs/summary | Comprehensive usage summary |
| GET /api/admin/costs/free-tiers | Free tier configurations |

Example API Response

GET /api/admin/costs?minutes=60
{
  "success": true,
  "stats": {
    "all_time": {
      "total_cost_usd": 1234.56,
      "total_requests": 89012,
      "total_runs": 156,
      "total_projects": 12
    },
    "today_cost_usd": 45.67,
    "week_cost_usd": 234.89,
    "month_cost_usd": 890.12,
    "by_service": [
      { "service": "dataforseo_backlinks", "total_cost": 23.45, "requests": 1200 },
      { "service": "cf_workers_ai", "total_cost": 12.34, "requests": 617 },
      { "service": "zenrows_basic", "total_cost": 5.67, "requests": 5670 }
    ]
  }
}

Budget Enforcement

How Budget Checks Prevent Overspending

Budget enforcement happens at three levels:

1. Pre-Request Check (LLM)

Before making expensive LLM calls:

const dailyCost = await getDailyServiceCost(env, COST_SERVICES.CF_WORKERS_AI);
if (dailyCost >= MAX_DAILY_LLM_BUDGET_USD) {
  return { skipped: true, skip_reason: "daily_budget_exceeded" };
}

2. Per-Workflow Hourly Budget

Workflows check remaining budget before expensive steps:

const budget = await this.checkBudget(step, "backlinks", "dataforseo", 10.0);
if (!budget.ok) {
  this.log(`Budget exhausted: spent $${budget.spent}`);
  return { skipped: true, reason: "budget_exceeded" };
}

3. Per-Run Limits

Individual runs can have budget caps:

const budgetCheck = await checkBudget(env.DFS_BUDGETS, runId, "backlinks", 1000, 50.0);
if (!budgetCheck.ok) {
  throw new Error(`Run budget exceeded: ${budgetCheck.reason}`);
}

Graceful Degradation

When budgets are exceeded, the system degrades gracefully:

| Scenario | Behavior |
| --- | --- |
| LLM budget exceeded | Fall back to rules-only classification |
| DataForSEO budget exceeded | Queue requests for later processing |
| ZenRows credits exhausted | Use cached data or skip non-critical scraping |
| Per-run budget exceeded | Complete run with partial results |

Cost Optimization

Pipeline Ordering

The classification pipeline is ordered from cheapest to most expensive:

1. Rules Engine (FREE) - Pattern matching, known domains
   ↓ (if confidence < 80%)
2. Vectorize (CHEAP) - ~$0.0001/query for similar URL lookup
   ↓ (if confidence < 70%)
3. Content Parser (MODERATE) - ZenRows fetch + parsing
   ↓ (if still uncertain)
4. LLM Fallback (EXPENSIVE) - ~$0.0002/call, daily budget capped
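
The ordering above amounts to a cascade: run the cheapest stage first and stop as soon as a stage's confidence clears its threshold. The sketch below shows one way to express that. `classifyCascade` and the stage objects are hypothetical, the 0.8 and 0.7 thresholds come from the diagram, and the threshold for the content parser stage is an assumption since the diagram only says "if still uncertain".

```javascript
// Runs stages in order (cheapest first) and stops once the best result so far
// meets the current stage's acceptance threshold.
function classifyCascade(url, stages) {
  let best = { label: null, confidence: 0, stage: null };
  for (const stage of stages) {
    const result = stage.run(url);
    if (result && result.confidence > best.confidence) {
      best = { ...result, stage: stage.name };
    }
    if (best.confidence >= stage.acceptAt) {
      break; // confident enough: skip the more expensive stages
    }
  }
  return best;
}

// Example wiring with stub stages; real stages would call the actual services.
const exampleStages = [
  { name: "rules", acceptAt: 0.8, run: () => ({ label: "blog", confidence: 0.5 }) },
  { name: "vectorize", acceptAt: 0.7, run: () => ({ label: "news", confidence: 0.75 }) },
  { name: "content", acceptAt: 0.7, run: () => ({ label: "news", confidence: 0.8 }) }, // threshold assumed
  { name: "llm", acceptAt: 0, run: () => ({ label: "news", confidence: 0.9 }) },
];

console.log(classifyCascade("https://example.com/post", exampleStages).stage); // "vectorize"
```

In the example, the rules stage returns 0.5 (below 0.8), the Vectorize stage returns 0.75 (at or above 0.7), so the content parser and LLM stages never run.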

Caching Strategies

Multiple caching layers reduce costs:

| Cache | Purpose | TTL |
| --- | --- | --- |
| Known Domains DB | Skip classification for known sites | Permanent |
| Vectorize | Similar URL classifications | N/A (similarity search) |
| Content Cache (R2) | Avoid re-fetching pages | 30 days |
| Classification KV | Recent classification results | 24 hours |

Early Termination

The pipeline terminates early when confidence is sufficient:

// In url-classifier.js
if (classification.confidence >= THRESHOLDS.HIGH_CONFIDENCE) {
  return classification; // Skip remaining stages
}

Batch Operations

Expensive operations are batched:

// DataForSEO batch processing
const batch = urls.slice(0, 100); // Max 100 per batch
const results = await dataforseoInstantPages(batch, env);
// Track cost once for entire batch
await trackCost(env, {
  service: COST_SERVICES.DATAFORSEO_INSTANT_PAGES,
  cost_usd: batch.length * 0.000125,
  request_count: batch.length,
});
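
The snippet above handles a single batch of up to 100 URLs. For longer lists, the input would need to be split into consecutive batches first. A minimal sketch, assuming a hypothetical `chunk` helper and the $0.000125 per-URL rate from the cost table:

```javascript
// Splits an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Estimated Instant Pages cost for a number of URLs at the default rate.
function estimateInstantPagesCost(urlCount, ratePerUrl = 0.000125) {
  return urlCount * ratePerUrl;
}

const exampleUrls = Array.from({ length: 250 }, (_, i) => `https://example.com/${i}`);
console.log(chunk(exampleUrls, 100).map((b) => b.length)); // [ 100, 100, 50 ]
```

Each batch can then be sent and tracked with one cost entry, as in the snippet above, so 250 URLs produce three entries totalling roughly $0.03.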

Monitoring and Alerts

Real-Time Cost Monitoring

The admin console provides:

  1. Today's Cost - Running total updated with each API call
  2. Cost by Service - Breakdown showing which APIs cost most
  3. Trend Charts - Daily/hourly cost trends over time
  4. Budget Status - Progress toward daily/monthly limits

Setting Up Alerts

RankDisco has no built-in alerting, but you can monitor costs with scheduled checks and ad hoc queries:

Cron-Based Budget Checks

// Check costs every hour
export default {
  async scheduled(event, env, ctx) {
    const stats = await getCostStats(env);

    if (stats.today_cost_usd > 100) {
      console.error(`ALERT: Daily cost exceeds $100: $${stats.today_cost_usd}`);
      // Send to external alerting service
    }
  }
};

Query-Based Monitoring

-- Find cost anomalies (>2x average)
WITH daily_avg AS (
  SELECT AVG(daily_cost) as avg_cost
  FROM (
    SELECT date(created_at/1000, 'unixepoch') as day, SUM(cost_usd) as daily_cost
    FROM api_costs
    WHERE created_at >= strftime('%s', 'now', '-30 days') * 1000
    GROUP BY day
  )
)
SELECT
  date(created_at/1000, 'unixepoch') as day,
  SUM(cost_usd) as daily_cost,
  (SELECT avg_cost FROM daily_avg) as avg_cost
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-7 days') * 1000
GROUP BY day
HAVING daily_cost > 2 * (SELECT avg_cost FROM daily_avg);
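
The same anomaly rule can be applied in application code, for example inside a scheduled check that already has the daily totals in hand. The sketch below is an illustration under stated assumptions: `findAnomalies` is a hypothetical helper, and `dailyTotals` is assumed to be sorted oldest to newest with one `{ day, cost }` entry per day.

```javascript
// Flags recent days whose cost exceeds `factor` times the average over the
// whole window (30 days in the SQL version, the last 7 of which are checked).
function findAnomalies(dailyTotals, { recentDays = 7, factor = 2 } = {}) {
  if (dailyTotals.length === 0) return [];
  const avg = dailyTotals.reduce((sum, d) => sum + d.cost, 0) / dailyTotals.length;
  return dailyTotals
    .slice(-recentDays)
    .filter((d) => d.cost > factor * avg)
    .map((d) => ({ ...d, avg_cost: avg }));
}
```

Because the spike itself is included in the average, a single extreme day raises its own threshold. That matches the SQL above, but it means short windows are less sensitive than they may first appear.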

Cost Reports

Generate periodic cost reports:

// GET /api/admin/costs/daily?days=30&granularity=day
const report = await getDailyCosts(env, {
  days: 30,
  granularity: "day",
});

// Aggregate by service
const byService = {};
for (const row of report.costs) {
  byService[row.service] = (byService[row.service] || 0) + row.total_cost;
}

console.log("Monthly Cost Summary:");
console.log(`Total: $${Object.values(byService).reduce((a, b) => a + b, 0).toFixed(2)}`);
Object.entries(byService)
  .sort(([, a], [, b]) => b - a)
  .forEach(([service, cost]) => {
    console.log(`  ${service}: $${cost.toFixed(2)}`);
  });

Dashboard Integration

Key metrics for operational dashboards:

| Metric | Query |
| --- | --- |
| Cost today | SUM(cost_usd) WHERE created_at >= today_start |
| LLM budget remaining | $50 - SUM(cost_usd) WHERE service='cf_workers_ai' AND today |
| ZenRows credits used | SUM(request_count) WHERE service LIKE 'zenrows%' AND this_month |
| Avg cost per domain onboard | AVG(cost) GROUP BY workflow_type='domain-onboard' |
| Cost per 1K URLs classified | SUM(cost) / COUNT(*) * 1000 |