Cost Tracking and Budget System

RankDisco implements comprehensive cost tracking to monitor API usage, enforce budget limits, and optimize operational spending across all external services.

Cost Tracking Overview

Why Cost Tracking Matters

External API costs can escalate quickly without visibility:

DataForSEO charges per API call ($0.0035 - $0.02+ per request)
ZenRows uses credit-based pricing ($69/month for 250K basic and 10K premium credits)
Cloudflare Workers AI charges per neuron ($0.011 per 1K neurons)
YouTube Data API has strict daily quotas (10K units/day free)

Cost tracking enables:

Budget enforcement - Prevent runaway costs with daily/hourly limits
Cost attribution - Know which workflows, domains, and operations cost money
Optimization - Identify expensive operations and optimize pipelines
Forecasting - Project monthly costs based on usage patterns
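
As a rough illustration of the forecasting point, a month-end projection can be made by scaling month-to-date spend by the number of days elapsed. The function name and the linear-spend assumption below are illustrative only and are not part of RankDisco:

```javascript
// Hypothetical helper: linear month-end projection from month-to-date spend.
// Assumes spending continues at the same average daily rate.
function projectMonthlyCost(monthToDateUsd, now = new Date()) {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  // Day 0 of the next month is the last day of the current month.
  const daysInMonth = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const daysElapsed = now.getUTCDate(); // counts today as a full day
  return (monthToDateUsd / daysElapsed) * daysInMonth;
}

// Example: $300 spent by 10 June (a 30-day month) projects to $900.
console.log(projectMonthlyCost(300, new Date(Date.UTC(2025, 5, 10))).toFixed(2));
```

Early in the month the estimate is noisy because it rests on only a few days of data.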

What Gets Tracked

Every external API call is tracked with:

Field	Description
`service`	Service identifier (e.g., `dataforseo_backlinks`, `zenrows_premium`)
`cost_usd`	Actual or estimated cost in USD
`run_id`	Associated crawl/workflow run
`project_id`	Project context (if applicable)
`task_id`	External task ID (e.g., DataForSEO task)
`success`	Whether the request succeeded
`items_returned`	Number of items in response
`source`	Origin: `cron`, `api`, `queue`, `webhook`, `workflow`

Service Costs

COST_SERVICES Constants

All tracked services are defined in /packages/api/src/lib/utils/cost-tracker.js:

export const COST_SERVICES = {
  // DataForSEO - App Store
  DATAFORSEO_APP_INFO: "dataforseo_app_info",      // ~$0.01/request
  DATAFORSEO_APP_LIST: "dataforseo_app_list",      // ~$0.02/request
  DATAFORSEO_SERP: "dataforseo_serp",              // $0.0035/request
  DATAFORSEO_KEYWORDS: "dataforseo_keywords",      // ~$0.005/request

  // DataForSEO - Backlinks
  DATAFORSEO_REFERRING_DOMAINS: "dataforseo_referring_domains", // ~$0.02/request
  DATAFORSEO_SUMMARY: "dataforseo_summary",                     // ~$0.02/request
  DATAFORSEO_BACKLINKS: "dataforseo_backlinks",                 // ~$0.02/request

  // DataForSEO - On-Page
  DATAFORSEO_INSTANT_PAGES: "dataforseo_instant_pages",  // $0.000125/URL

  // ZenRows (web scraping)
  ZENROWS_BASIC: "zenrows_basic",       // ~$0.001/request (datacenter proxy)
  ZENROWS_PREMIUM: "zenrows_premium",   // ~$0.01/request (residential proxy)
  ZENROWS_JS_RENDER: "zenrows_js_render", // ~$0.005/request (JS rendering)

  // Free sources
  RSS_FEED: "rss_feed",                 // $0 (Apple RSS)
  ITUNES_API: "itunes_api",             // $0 (direct API)

  // Cloudflare services
  CLOUDFLARE_D1_READ: "cloudflare_d1_read",   // $0.0000005/read
  CLOUDFLARE_D1_WRITE: "cloudflare_d1_write", // $0.000001/write
  CLOUDFLARE_KV: "cloudflare_kv",             // $0.0000005/op
  CLOUDFLARE_WORKER: "cloudflare_worker",     // $0.0000003/request
  CF_NATIVE_FETCH: "cf_native_fetch",         // $0 (subrequests free)
  CF_WORKERS_AI: "cf_workers_ai",             // ~$0.02/call (~54 neurons)

  // YouTube Data API
  YOUTUBE_API: "youtube_api",                 // $0 (within 10K units/day)
};

Default Cost Rates

The system uses these default rates when actual costs are not available:

Service	Rate (USD)	Notes
`dataforseo_app_info`	$0.01	Actual cost from API response preferred
`dataforseo_app_list`	$0.02	Actual cost from API response preferred
`dataforseo_serp`	$0.0035	$3.50 per 1,000 requests
`dataforseo_keywords`	$0.005	Variable based on volume
`dataforseo_referring_domains`	$0.02	Backlinks API
`dataforseo_instant_pages`	$0.000125	On-Page API
`zenrows_basic`	$0.001	~$1/1,000 requests
`zenrows_premium`	$0.01	~$10/1,000 requests
`zenrows_js_render`	$0.005	~$5/1,000 requests
`cf_workers_ai`	$0.02	~54 neurons average per LLM call
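
The fallback described above (use the cost reported by the API when available, otherwise a default rate) can be sketched as follows. `resolveCost` and the `estimated` flag are hypothetical names used for illustration; the rates are copied from the table:

```javascript
// Default per-request rates (USD), copied from the table above.
const DEFAULT_RATES = {
  dataforseo_app_info: 0.01,
  dataforseo_app_list: 0.02,
  dataforseo_serp: 0.0035,
  dataforseo_keywords: 0.005,
  dataforseo_referring_domains: 0.02,
  dataforseo_instant_pages: 0.000125,
  zenrows_basic: 0.001,
  zenrows_premium: 0.01,
  zenrows_js_render: 0.005,
  cf_workers_ai: 0.02,
};

// Hypothetical helper: prefer the cost reported by the API, else estimate.
function resolveCost(service, actualCost, units = 1) {
  if (typeof actualCost === "number" && Number.isFinite(actualCost)) {
    return { cost_usd: actualCost, estimated: false };
  }
  const rate = DEFAULT_RATES[service] ?? 0; // unknown services default to $0
  return { cost_usd: rate * units, estimated: true };
}

console.log(resolveCost("dataforseo_serp", 0.004)); // actual cost wins
console.log(resolveCost("zenrows_basic", undefined, 1000)); // estimated from the default rate
```

Keeping an `estimated` flag alongside the amount makes it possible to tell later how much of a total is measured and how much is approximated.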

Budget Caps

LLM Daily Budget ($50)

The LLM classifier enforces a hard daily budget to prevent cost spirals:

// In classifier-llm.js
const MAX_DAILY_LLM_BUDGET_USD = 50.0; // $50/day limit

export async function classifyWithLLM(input, env) {
  // Budget enforcement - prevent cost spirals
  try {
    const dailyCost = await getDailyServiceCost(env, COST_SERVICES.CF_WORKERS_AI);

    if (dailyCost >= MAX_DAILY_LLM_BUDGET_USD) {
      console.warn(`[llm] Daily budget exceeded: $${dailyCost.toFixed(2)} >= $${MAX_DAILY_LLM_BUDGET_USD}`);
      return {
        skipped: true,
        skip_reason: "daily_budget_exceeded",
        needs_review: true,
        classification: {},
      };
    }
  } catch (err) {
    // Don't block on budget check errors, just log
    console.error("[llm] Budget check failed:", err.message);
  }
  
  // ... continue with LLM call
}

When the budget is exceeded:

LLM classification returns skipped: true with skip_reason: "daily_budget_exceeded"
URLs are marked needs_review: true for later processing
The pipeline continues with rules-based classification only

Per-Run Budget Tracking (KV-Based)

For individual runs, the budget.js utility tracks costs in KV:

// In lib/utils/budget.js
export async function checkBudget(kv, runId, endpoint, maxRequests, maxCostUsd) {
  const countKey = `budget:${runId}:${endpoint}:requests`;
  const costKey = `budget:${runId}:${endpoint}:cost_usd`;
  
  const count = parseInt((await kv.get(countKey)) || "0", 10);
  const cost = parseFloat((await kv.get(costKey)) || "0");
  
  if (maxRequests > 0 && count >= maxRequests) {
    return { ok: false, reason: `request limit exceeded (${count}/${maxRequests})` };
  }
  if (maxCostUsd > 0 && cost >= maxCostUsd) {
    return { ok: false, reason: `cost limit exceeded ($${cost.toFixed(2)}/$${maxCostUsd})` };
  }
  return { ok: true, currentCount: count, currentCost: cost };
}

export async function incrementBudget(kv, runId, endpoint, costEstimate = 0) {
  const countKey = `budget:${runId}:${endpoint}:requests`;
  const costKey = `budget:${runId}:${endpoint}:cost_usd`;

  // Read the current totals first; KV has no atomic increment
  const count = parseInt((await kv.get(countKey)) || "0", 10);
  const cost = parseFloat((await kv.get(costKey)) || "0");

  // TTL 90 days for budget window
  const ttl = 60 * 60 * 24 * 90;
  await kv.put(countKey, String(count + 1), { expirationTtl: ttl });
  await kv.put(costKey, String(cost + costEstimate), { expirationTtl: ttl });
}
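
For local testing, the KV binding can be replaced by a small in-memory stub that exposes the same `get`/`put` calls. The sketch below is an illustration under assumptions: `MemoryKV` and `spendWithinBudget` are hypothetical names, and the helper folds the check and the increment into one function. Workers KV is eventually consistent and offers no atomic increment, so concurrent callers can still overshoot a limit slightly; counters kept this way are approximate.

```javascript
// Hypothetical in-memory stand-in for a Workers KV namespace (get/put only).
class MemoryKV {
  constructor() { this.store = new Map(); }
  async get(key) { return this.store.has(key) ? this.store.get(key) : null; }
  async put(key, value, _options) { this.store.set(key, String(value)); }
}

// Hypothetical helper: check the per-run limits, then record the spend.
async function spendWithinBudget(kv, runId, endpoint, costUsd, maxRequests, maxCostUsd) {
  const countKey = `budget:${runId}:${endpoint}:requests`;
  const costKey = `budget:${runId}:${endpoint}:cost_usd`;
  const count = parseInt((await kv.get(countKey)) || "0", 10);
  const cost = parseFloat((await kv.get(costKey)) || "0");

  if (maxRequests > 0 && count >= maxRequests) return { ok: false, reason: "request limit" };
  if (maxCostUsd > 0 && cost >= maxCostUsd) return { ok: false, reason: "cost limit" };

  const ttl = 60 * 60 * 24 * 90; // same 90-day window as above
  await kv.put(countKey, String(count + 1), { expirationTtl: ttl });
  await kv.put(costKey, String(cost + costUsd), { expirationTtl: ttl });
  return { ok: true, count: count + 1, cost: cost + costUsd };
}

// Usage: the third call is rejected because the request limit is 2.
(async () => {
  const kv = new MemoryKV();
  for (let i = 0; i < 3; i++) {
    console.log(await spendWithinBudget(kv, "demo-run", "backlinks", 0.02, 2, 1.0));
  }
})();
```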

Workflow Budget Checking

The base workflow class (base.ts) includes budget checking per workflow:

// In workflows/base.ts
protected async checkBudget(
  step: WorkflowStep,
  stepName: string,
  service: string,
  maxCost: number,
): Promise<{ ok: boolean; remaining: number; spent: number }> {
  return await step.do(`${stepName}-budget-check`, async () => {
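    // Sum this instance's spend over the last hour. Column names follow the
    // api_costs schema (cost_usd, run_id; created_at in epoch milliseconds).
    // Note: the `service` argument is not applied as a filter here.
    const result = await this.env.DB.prepare(`
      SELECT COALESCE(SUM(cost_usd), 0) as total_spent
      FROM api_costs
      WHERE run_id = ?
        AND created_at > (strftime('%s', 'now') - 3600) * 1000
    `).bind(this.instanceId).first();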

    const spent = (result?.total_spent as number) || 0;
    const remaining = maxCost - spent;
    return { ok: remaining > 0, remaining, spent };
  });
}

Free Tier Configuration

The system tracks free tier usage across services:

export const FREE_TIER_CONFIG = {
  workers_ai: {
    service: "workers_ai_llm",
    quota_type: "daily",
    free_units: 10000,        // 10K neurons/day free
    unit_name: "neurons",
    cost_per_1k_units: 0.011,
    units_per_request: 54,    // ~54 neurons per LLM call
  },
  zenrows: {
    service: "zenrows",
    quota_type: "monthly",
    free_units: 250000,       // 250K basic credits
    premium_units: 10000,     // 10K premium credits
    unit_name: "credits",
    plan_cost: 69,            // $69/month
  },
  d1: {
    service: "cloudflare_d1",
    quota_type: "daily",
    free_reads: 5000000,      // 5M reads/day
    free_writes: 100000,      // 100K writes/day
  },
  youtube: {
    service: "youtube_api",
    quota_type: "daily",
    free_units: 10000,        // 10K units/day
    unit_name: "quota_units",
  },
  dataforseo: {
    service: "dataforseo",
    quota_type: "none",       // No free tier - pay per use
    free_units: 0,
  },
};
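
To make the free-tier numbers concrete, the arithmetic for Workers AI can be worked through as below. The helper names are hypothetical; the figures are copied from `FREE_TIER_CONFIG.workers_ai`. At these figures a single call works out to roughly $0.0006 (54 neurons at $0.011 per 1K), which differs from the flat $0.02 default listed in the rates table, so treat the numbers as illustrative.

```javascript
// Values copied from FREE_TIER_CONFIG.workers_ai above.
const workersAi = {
  free_units: 10000,
  cost_per_1k_units: 0.011,
  units_per_request: 54,
};

// Hypothetical helper: billable cost for one day's usage after the free allowance.
function dailyOverageCost(config, requests) {
  const unitsUsed = requests * config.units_per_request;
  const billableUnits = Math.max(0, unitsUsed - config.free_units);
  return (billableUnits / 1000) * config.cost_per_1k_units;
}

// Number of whole requests that fit inside the free allowance.
function freeRequestsPerDay(config) {
  return Math.floor(config.free_units / config.units_per_request);
}

console.log(freeRequestsPerDay(workersAi)); // 185
console.log(dailyOverageCost(workersAi, 1000).toFixed(3)); // 0.484
```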

Cost Attribution

Tracking Costs to Workflows

Every trackCost() call includes attribution context:

await trackCost(env, {
  service: COST_SERVICES.DATAFORSEO_BACKLINKS,
  cost_usd: task.cost,           // Actual cost from API response
  run_id: runId,                 // Links to crawl_job_runs
  project_id: projectId,         // Links to project
  task_id: task.id,              // External task reference
  success: true,
  items_returned: response.items?.length || 0,
  source: 'queue',               // Where called from
});
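
The internals of `trackCost()` are not shown in this document. As a rough sketch of what persisting one such event could look like against the `api_costs` table, assuming a D1-style `prepare().bind().run()` binding, consider the following. `recordCostEvent` is a hypothetical name and is not the actual implementation:

```javascript
// Hypothetical sketch of persisting one cost event; not the actual trackCost().
// `db` is assumed to expose the D1-style prepare().bind().run() chain.
async function recordCostEvent(db, event) {
  if (!event.service) throw new Error("service is required");
  const row = {
    service: event.service,
    cost_usd: event.cost_usd ?? 0,
    run_id: event.run_id ?? null,
    project_id: event.project_id ?? null,
    task_id: event.task_id ?? null,
    success: event.success === false ? 0 : 1, // stored as INTEGER 0/1
    items_returned: event.items_returned ?? null,
    source: event.source ?? null,
    created_at: Date.now(), // epoch milliseconds
  };

  await db
    .prepare(
      `INSERT INTO api_costs
         (service, cost_usd, run_id, project_id, task_id, success, items_returned, source, created_at)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)`
    )
    .bind(row.service, row.cost_usd, row.run_id, row.project_id, row.task_id,
          row.success, row.items_returned, row.source, row.created_at)
    .run();
  return row;
}
```

Missing optional fields are stored as `null` rather than `undefined`, since `undefined` is not a valid bound value for a D1 statement.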

Workflow Cost Tracking

Workflows track costs at the step level:

// In domain-onboard workflow
await trackCost(this.env, {
  service: COST_SERVICES.CF_WORKERS_AI,
  cost_usd: 0.0002,
  run_id: this.instanceId,
  source: 'workflow',
});

Aggregating Run Costs

The updateRunCosts() function aggregates costs from api_costs to crawl_job_runs:

export async function updateRunCosts(env, runId) {
  const costs = await env.DB.prepare(`
    SELECT
      SUM(cost_usd) as total_cost,
      SUM(CASE WHEN service LIKE 'dataforseo%' THEN cost_usd ELSE 0 END) as dataforseo_cost,
      SUM(CASE WHEN service LIKE 'zenrows%' THEN cost_usd ELSE 0 END) as zenrows_cost,
      SUM(CASE WHEN service NOT LIKE 'dataforseo%' AND service NOT LIKE 'zenrows%' 
          THEN cost_usd ELSE 0 END) as other_cost
    FROM api_costs
    WHERE run_id = ?
  `).bind(runId).first();

  await env.DB.prepare(`
    UPDATE crawl_job_runs SET
      total_cost_usd = ?,
      dataforseo_cost = ?,
      zenrows_cost = ?,
      other_cost = ?,
      updated_at = ?
    WHERE id = ?
  `).bind(
    costs?.total_cost || 0,
    costs?.dataforseo_cost || 0,
    costs?.zenrows_cost || 0,
    costs?.other_cost || 0,
    Date.now(),
    runId
  ).run();
}

Cost Queries

Database Schema

The api_costs table stores all cost events:

CREATE TABLE api_costs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  run_id TEXT,                     -- FK to crawl_job_runs
  project_id TEXT,                 -- FK to projects
  service TEXT NOT NULL,           -- Service identifier
  endpoint TEXT,                   -- Specific endpoint
  cost_usd REAL NOT NULL DEFAULT 0,
  request_count INTEGER DEFAULT 1,
  app_id TEXT,
  category_id TEXT,
  task_id TEXT,                    -- External task ID
  success INTEGER DEFAULT 1,
  items_returned INTEGER,
  created_at INTEGER NOT NULL,
  source TEXT                      -- 'cron', 'api', 'queue', 'webhook', 'workflow'
);

-- Indexes for efficient queries
CREATE INDEX idx_api_costs_run ON api_costs(run_id);
CREATE INDEX idx_api_costs_project ON api_costs(project_id);
CREATE INDEX idx_api_costs_service ON api_costs(service);
CREATE INDEX idx_api_costs_created ON api_costs(created_at DESC);
CREATE INDEX idx_api_costs_service_date ON api_costs(service, created_at);

Common Query Examples

Total Cost Today

SELECT SUM(cost_usd) as total_cost
FROM api_costs
WHERE created_at >= strftime('%s', 'now', 'start of day') * 1000;

Cost by Service (Last 7 Days)

SELECT
  service,
  SUM(cost_usd) as total_cost,
  COUNT(*) as request_count,
  SUM(CASE WHEN success = 1 THEN 1 ELSE 0 END) as success_count
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-7 days') * 1000
GROUP BY service
ORDER BY total_cost DESC;

Daily Cost Breakdown

SELECT
  date(created_at / 1000, 'unixepoch') as date,
  service,
  SUM(cost_usd) as total_cost,
  COUNT(*) as request_count
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-30 days') * 1000
GROUP BY date, service
ORDER BY date DESC, total_cost DESC;

Cost by Run

SELECT
  run_id,
  SUM(cost_usd) as total_cost,
  COUNT(*) as api_calls,
  SUM(items_returned) as total_items
FROM api_costs
WHERE run_id IS NOT NULL
GROUP BY run_id
ORDER BY total_cost DESC
LIMIT 20;

LLM Daily Usage Check

SELECT SUM(cost_usd) as daily_llm_cost
FROM api_costs
WHERE service = 'cf_workers_ai'
  AND created_at >= strftime('%s', 'now', 'start of day') * 1000;
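
The same check can be wrapped in a small function. The sketch below shows one way a helper like `getDailyServiceCost` might be written; the real implementation is not shown in this document and may differ. It assumes `env.DB` is a D1-style binding and that `created_at` holds epoch milliseconds. `dailyServiceCost` is a hypothetical name.

```javascript
// Hypothetical sketch of a daily-cost lookup; the real getDailyServiceCost may differ.
// Assumes env.DB is a D1-style binding and created_at holds epoch milliseconds.
async function dailyServiceCost(env, service, now = new Date()) {
  // Midnight UTC, matching SQLite's 'start of day' modifier.
  const startOfDayMs = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const row = await env.DB
    .prepare(
      `SELECT COALESCE(SUM(cost_usd), 0) AS daily_cost
       FROM api_costs
       WHERE service = ? AND created_at >= ?`
    )
    .bind(service, startOfDayMs)
    .first();
  return Number(row?.daily_cost ?? 0);
}
```

Computing the day boundary in JavaScript and binding it as a parameter keeps the SQL free of date arithmetic, at the cost of relying on the Worker's clock rather than the database's.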

API Endpoints

The admin API provides cost visibility:

Endpoint	Description
`GET /api/admin/costs`	Overall cost statistics
`GET /api/admin/costs/run/:run_id`	Costs for specific run
`GET /api/admin/costs/project/:project_id`	Costs for specific project
`GET /api/admin/costs/daily`	Daily breakdown
`GET /api/admin/costs/recent`	Recent cost entries (debug)
`GET /api/admin/costs/zenrows`	ZenRows quota tracking
`GET /api/admin/costs/summary`	Comprehensive usage summary
`GET /api/admin/costs/free-tiers`	Free tier configurations

Example API Response

GET /api/admin/costs?minutes=60

{
  "success": true,
  "stats": {
    "all_time": {
      "total_cost_usd": 1234.56,
      "total_requests": 89012,
      "total_runs": 156,
      "total_projects": 12
    },
    "today_cost_usd": 45.67,
    "week_cost_usd": 234.89,
    "month_cost_usd": 890.12,
    "by_service": [
      { "service": "dataforseo_backlinks", "total_cost": 23.45, "requests": 1200 },
      { "service": "cf_workers_ai", "total_cost": 12.34, "requests": 617 },
      { "service": "zenrows_basic", "total_cost": 5.67, "requests": 5670 }
    ]
  }
}
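
A client consuming this response might derive the effective cost per request for each service, which is a quick way to compare observed spend against the default rates. `costPerRequest` is a hypothetical helper, shown here with the values from the example response:

```javascript
// Hypothetical consumer of the by_service array shown above.
function costPerRequest(byService) {
  return byService.map(({ service, total_cost, requests }) => ({
    service,
    usd_per_request: requests > 0 ? total_cost / requests : 0,
  }));
}

const byServiceExample = [
  { service: "dataforseo_backlinks", total_cost: 23.45, requests: 1200 },
  { service: "cf_workers_ai", total_cost: 12.34, requests: 617 },
  { service: "zenrows_basic", total_cost: 5.67, requests: 5670 },
];

for (const { service, usd_per_request } of costPerRequest(byServiceExample)) {
  console.log(`${service}: $${usd_per_request.toFixed(4)}/request`);
}
```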

Budget Enforcement

How Budget Checks Prevent Overspending

Budget enforcement happens at three levels:

1. Pre-Request Check (LLM)

Before making expensive LLM calls:

const dailyCost = await getDailyServiceCost(env, COST_SERVICES.CF_WORKERS_AI);
if (dailyCost >= MAX_DAILY_LLM_BUDGET_USD) {
  return { skipped: true, skip_reason: "daily_budget_exceeded" };
}

2. Per-Workflow Hourly Budget

Workflows check remaining budget before expensive steps:

const budget = await this.checkBudget(step, "backlinks", "dataforseo", 10.0);
if (!budget.ok) {
  this.log(`Budget exhausted: spent $${budget.spent}`);
  return { skipped: true, reason: "budget_exceeded" };
}

3. Per-Run Limits

Individual runs can have budget caps:

const budgetCheck = await checkBudget(env.DFS_BUDGETS, runId, "backlinks", 1000, 50.0);
if (!budgetCheck.ok) {
  throw new Error(`Run budget exceeded: ${budgetCheck.reason}`);
}

Graceful Degradation

When budgets are exceeded, the system degrades gracefully:

Scenario	Behavior
LLM budget exceeded	Fall back to rules-only classification
DataForSEO budget exceeded	Queue requests for later processing
ZenRows credits exhausted	Use cached data or skip non-critical scraping
Per-run budget exceeded	Complete run with partial results
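
The first row of the table (LLM budget exceeded, fall back to rules) can be sketched as a thin wrapper. `classifyWithFallback` and the `degraded_reason` field are hypothetical; the LLM classifier and the rules result are passed in so the example stays self-contained:

```javascript
// Hypothetical wrapper: try the LLM, fall back to the rules-only result when skipped.
// `classifyWithLLM` and `rulesResult` are injected so the sketch stays self-contained.
async function classifyWithFallback(input, rulesResult, classifyWithLLM) {
  const llm = await classifyWithLLM(input);
  if (llm.skipped) {
    return {
      ...rulesResult,
      needs_review: true, // revisit once budget is available again
      degraded_reason: llm.skip_reason,
    };
  }
  return { ...llm.classification, needs_review: false };
}
```

Recording why a result was degraded makes it straightforward to find and reprocess those URLs later.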

Cost Optimization

Pipeline Ordering

The classification pipeline is ordered from cheapest to most expensive:

1. Rules Engine (FREE) - Pattern matching, known domains
   ↓ (if confidence < 80%)
2. Vectorize (CHEAP) - ~$0.0001/query for similar URL lookup
   ↓ (if confidence < 70%)
3. Content Parser (MODERATE) - ZenRows fetch + parsing
   ↓ (if still uncertain)
4. LLM Fallback (EXPENSIVE) - ~$0.0002/call, daily budget capped
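
The cascade above can be sketched as a function that only invokes the next stage when the previous result falls below its threshold. This is an illustration, not the actual pipeline: the stage functions are injected, confidence is assumed to be on a 0 to 1 scale, and `contentThreshold` is a made-up parameter because the diagram does not give a number for "still uncertain".

```javascript
// Hypothetical cascade: each stage runs only if the previous result is below its threshold.
// Stage functions are injected; the 0.8 and 0.7 thresholds mirror the diagram above.
async function classifyCascade(url, stages) {
  let result = await stages.rules(url);
  if (result.confidence >= 0.8) return { ...result, stage: "rules" };

  result = await stages.vectorize(url);
  if (result.confidence >= 0.7) return { ...result, stage: "vectorize" };

  result = await stages.content(url);
  if (result.confidence >= stages.contentThreshold) return { ...result, stage: "content" };

  // Last resort: the budget-capped LLM call.
  return { ...(await stages.llm(url)), stage: "llm" };
}
```

Returning the `stage` that produced the answer makes it easy to measure how often the more expensive stages are actually reached.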

Caching Strategies

Multiple caching layers reduce costs:

Cache	Purpose	TTL
Known Domains DB	Skip classification for known sites	Permanent
Vectorize	Similar URL classifications	N/A (similarity search)
Content Cache (R2)	Avoid re-fetching pages	30 days
Classification KV	Recent classification results	24 hours
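
As an illustration of the Classification KV row, a cache-aside lookup with a 24-hour TTL might look like the sketch below. `cachedClassification` and the key format are assumptions; `kv` is expected to behave like a Workers KV binding (`get` returns `null` on a miss, `put` accepts `expirationTtl` in seconds).

```javascript
const CLASSIFICATION_TTL_SECONDS = 60 * 60 * 24; // 24 hours, per the table above

// Hypothetical cache-aside helper; `kv` is assumed to be a Workers KV-style
// binding with get(key) and put(key, value, { expirationTtl }).
async function cachedClassification(kv, url, classify) {
  const key = `classification:${url}`; // key format is an assumption
  const hit = await kv.get(key);
  if (hit !== null) return { ...JSON.parse(hit), cached: true };

  const fresh = await classify(url);
  await kv.put(key, JSON.stringify(fresh), { expirationTtl: CLASSIFICATION_TTL_SECONDS });
  return { ...fresh, cached: false };
}
```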

Early Termination

The pipeline terminates early when confidence is sufficient:

// In url-classifier.js
if (classification.confidence >= THRESHOLDS.HIGH_CONFIDENCE) {
  return classification; // Skip remaining stages
}

Batch Operations

Expensive operations are batched:

// DataForSEO batch processing
const batch = urls.slice(0, 100); // Max 100 per batch
const results = await dataforseoInstantPages(batch, env);
// Track cost once for entire batch
await trackCost(env, {
  service: COST_SERVICES.DATAFORSEO_INSTANT_PAGES,
  cost_usd: batch.length * 0.000125,
  request_count: batch.length,
});
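
The snippet above handles only the first 100 URLs. For longer lists, a chunking helper can split the input into consecutive batches and compute the cost entry for each one. `chunk` and `planBatches` are hypothetical helpers; the rate is the `dataforseo_instant_pages` default:

```javascript
const INSTANT_PAGES_RATE_USD = 0.000125; // per URL, from the rates table
const MAX_BATCH_SIZE = 100;

// Split a list into consecutive batches of at most `size` items.
function chunk(items, size = MAX_BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical planner: one cost entry per batch, so no URL past the first 100 is dropped.
function planBatches(urls) {
  return chunk(urls).map((batch) => ({
    urls: batch,
    request_count: batch.length,
    cost_usd: batch.length * INSTANT_PAGES_RATE_USD,
  }));
}

const plan = planBatches(Array.from({ length: 250 }, (_, i) => `https://example.com/${i}`));
console.log(plan.map((b) => b.request_count)); // [ 100, 100, 50 ]
```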

Monitoring and Alerts

Real-Time Cost Monitoring

The admin console provides:

Today's Cost - Running total updated with each API call
Cost by Service - Breakdown showing which APIs cost most
Trend Charts - Daily/hourly cost trends over time
Budget Status - Progress toward daily/monthly limits

Setting Up Alerts

RankDisco has no built-in alerting, but you can monitor costs with the following approaches:

Cron-Based Budget Checks

// Check costs every hour
export default {
  async scheduled(event, env, ctx) {
    const stats = await getCostStats(env);
    
    if (stats.today_cost_usd > 100) {
      console.error(`ALERT: Daily cost exceeds $100: $${stats.today_cost_usd}`);
      // Send to external alerting service
    }
  }
};
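
If more than one limit needs watching, the comparison can be pulled into a small pure function that returns alert messages for the caller to forward. `evaluateCostAlerts` and the limit values are hypothetical; the field names follow the `stats` object in the example API response:

```javascript
// Hypothetical threshold check: returns alert messages instead of logging directly,
// so the caller can forward them to whatever alerting service is in use.
function evaluateCostAlerts(stats, limits) {
  const alerts = [];
  for (const [field, limit] of Object.entries(limits)) {
    const value = stats[field];
    if (typeof value === "number" && value > limit) {
      alerts.push(`${field} is $${value.toFixed(2)}, above the $${limit.toFixed(2)} limit`);
    }
  }
  return alerts;
}

const exampleAlerts = evaluateCostAlerts(
  { today_cost_usd: 45.67, week_cost_usd: 234.89 },
  { today_cost_usd: 100, week_cost_usd: 200 }
);
console.log(exampleAlerts); // only the weekly limit is exceeded
```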

Query-Based Monitoring

-- Find cost anomalies (>2x average)
WITH daily_avg AS (
  SELECT AVG(daily_cost) as avg_cost
  FROM (
    SELECT date(created_at/1000, 'unixepoch') as day, SUM(cost_usd) as daily_cost
    FROM api_costs
    WHERE created_at >= strftime('%s', 'now', '-30 days') * 1000
    GROUP BY day
  )
)
SELECT 
  date(created_at/1000, 'unixepoch') as day,
  SUM(cost_usd) as daily_cost,
  (SELECT avg_cost FROM daily_avg) as avg_cost
FROM api_costs
WHERE created_at >= strftime('%s', 'now', '-7 days') * 1000
GROUP BY day
HAVING daily_cost > 2 * (SELECT avg_cost FROM daily_avg);

Cost Reports

Generate periodic cost reports:

// GET /api/admin/costs/daily?days=30&granularity=day
const report = await getDailyCosts(env, {
  days: 30,
  granularity: "day",
});

// Aggregate by service
const byService = {};
for (const row of report.costs) {
  byService[row.service] = (byService[row.service] || 0) + row.total_cost;
}

console.log("Monthly Cost Summary:");
console.log(`Total: $${Object.values(byService).reduce((a, b) => a + b, 0).toFixed(2)}`);
Object.entries(byService)
  .sort(([,a], [,b]) => b - a)
  .forEach(([service, cost]) => {
    console.log(`  ${service}: $${cost.toFixed(2)}`);
  });

Dashboard Integration

Key metrics for operational dashboards:

Metric	Query
Cost today	`SUM(cost_usd) WHERE created_at >= today_start`
LLM budget remaining	`$50 - SUM(cost_usd) WHERE service='cf_workers_ai' AND today`
ZenRows credits used	`SUM(request_count) WHERE service LIKE 'zenrows%' AND this_month`
Avg cost per domain onboard	`AVG(cost) GROUP BY workflow_type='domain-onboard'`
Cost per 1K URLs classified	`SUM(cost) / COUNT(*) * 1000`

Related Documentation

Classification Pipeline - How classification stages work
Workflow System - Workflow execution and tracking
Admin Console - Using the admin dashboard
