The Forge

Feed it provider API keys. Get back a ranked catalog of the datasets actually available to you, plus a slate of strategy candidates built on top of them.

Overview

Most quants spend more time hunting for usable data than they spend on alpha. Provider APIs have inconsistent coverage, undocumented gotchas, and asymmetric quality across asset classes. The Forge encodes that exploration as a pipeline of cooperating LLM agents, each with a narrow job.

The pipeline runs once per scan and produces three artifacts:

  1. A catalog of datasets discovered across your active providers, each scored 0–100 on six quality dimensions and assigned a verdict.
  2. A list of research plans — hypotheses the Planner agent proposed by combining your useful datasets.
  3. A set of strategy candidates, each one a runnable backtest with metrics.

None of this auto-deploys. The whole point is to surface options for you to review.

Agents

Four cooperating agents, each with a narrow job.

Agent       What it does
Discovery   Classifies your API keys against the bundled registry (14 providers). Enumerates each provider's capabilities into the catalog.
Probe       Calls each provider for a small sample and scores it on the six quality dimensions. The only agent that talks to external APIs.
Planner     Reads the catalog and proposes research plans — falsifiable hypotheses that combine 1–3 useful datasets.
Synthesis   Codes one research plan as a runnable strategy, backtests it, and parks the result for your review. The only agent that writes executable code.
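
The four agents run as one sequential scan. A minimal sketch of that flow, with stubbed agents and illustrative names (the real orchestration is internal to the Forge and is not part of any public API):

# Hypothetical orchestration sketch: the agent functions below are stubs standing
# in for the LLM-backed agents, and every name here is illustrative.

def discovery_agent(api_keys):
    # Classify keys against the provider registry; enumerate capabilities.
    return [{"provider": "coinalyze", "capability": "open_interest", "asset_class": "crypto"}]

def probe_agent(dataset):
    # Pull a small sample from the provider and score the six quality dimensions.
    return {**dataset, "dimensions": {"coverage": 80, "freshness": 90}}

def planner_agent(catalog):
    # Combine useful datasets into falsifiable hypotheses.
    return [{"hypothesis": "...", "inputs": [d["provider"] for d in catalog]}]

def synthesis_agent(plan):
    # Generate one runnable strategy, backtest it, park the result for review.
    return {"plan": plan, "metrics": {"sharpe": 0.0}}

def run_forge_scan(api_keys):
    catalog    = [probe_agent(d) for d in discovery_agent(api_keys)]
    plans      = planner_agent(catalog)
    candidates = [synthesis_agent(p) for p in plans]
    return catalog, plans, candidates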

Catalog

The catalog is a per-install table of every dataset the Forge knows about. Each row is a (provider, capability, asset_class) triple with quality scores, a verdict, and links to provenance.
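
As a sketch, one row carries roughly this shape; the field names here are illustrative, not the actual column names:

from dataclasses import dataclass, field

# Illustrative shape of one catalog row; the field names are assumptions.
@dataclass
class CatalogRow:
    provider: str            # e.g. "coinalyze"
    capability: str          # e.g. "open_interest"
    asset_class: str         # e.g. "crypto"
    dimensions: dict         # the six 0-100 scores from the Probe agent
    score: float             # geometric-mean aggregate (see below)
    verdict: str             # "useful" | "marginal" | "poor" | "irrelevant"
    provenance: list = field(default_factory=list)  # links to the probe runs behind the scores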

The six quality dimensions

Each dimension is scored 0–100 by the Probe agent.

Dimension           Question it answers
coverage            How much of the universe (symbols, time range) is actually present?
freshness           How recent is the most recent point? How long is the publish lag?
completeness        What fraction of expected fields are non-null?
uniqueness          Does this dataset duplicate something else in the catalog?
stability           Does historical data get revised? Is the schema stable across releases?
trading_relevance   Is this data plausibly tradable (vs. illustrative or after-the-fact)?

The aggregate score

The six dimensions are combined as a geometric mean, not an arithmetic one, so a single weak dimension drags the aggregate down far harder than averaging would. A Polygon endpoint that scores 90/100 on five dimensions but 0/100 on coverage is not "average": it averages 75 arithmetically, but its geometric score (with the zero floored at 1, as in the snippet below) is only about 42, and the verdict reflects that.

# From auracle/forge/catalog.py
import math

# dims holds the six dimension scores (0-100) for one dataset.
# Floor the zeroes at 1 so log() is defined.
floors = [max(d, 1) for d in dims]
score  = math.exp(sum(math.log(d) for d in floors) / len(floors))
# All six at 90 gives ~90; a single 0 (floored to 1) drags the score down to ~42.
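
A worked comparison of the two means, using the same floor-at-1 convention as the snippet above:

import math

dims   = [90, 90, 90, 90, 90, 0]        # strong everywhere except coverage
arith  = sum(dims) / len(dims)          # 75.0 -- would look comfortably "useful"
floors = [max(d, 1) for d in dims]
geo    = math.exp(sum(math.log(d) for d in floors) / len(floors))
print(round(arith, 1), round(geo, 1))   # 75.0 42.5 -- drops to "marginal"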

Verdict thresholds

The aggregate score maps to a verdict by these thresholds:

Score   Verdict      What it means
≥ 70    useful       Pass straight into the Planner's input pool.
≥ 40    marginal     Planner may use it with a caveat; backfill or supplement first.
≥ 15    poor         Excluded from Planner. Visible in the catalog so you know it exists.
< 15    irrelevant   Effectively hidden. Catalog entry kept for audit only.
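
The mapping itself is just a threshold ladder; a sketch assuming the thresholds above (the actual helper in the codebase may differ):

def verdict_for(score: float) -> str:
    # Thresholds from the table above, checked from the top down.
    if score >= 70:
        return "useful"
    if score >= 40:
        return "marginal"
    if score >= 15:
        return "poor"
    return "irrelevant"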

Research

A research plan is a structured hypothesis. The Planner emits them as JSON in forge_research_plans.plan_json; you can also write them by hand.

{
  "hypothesis": "OI divergence on Coinalyze precedes 4h mean reversion in BTC-USD",
  "inputs": [
    { "dataset_id": 12, "alias": "oi"   },
    { "dataset_id":  7, "alias": "bars" }
  ],
  "feature_spec": {
    "oi_change_24h": "oi.delta(24h) / oi.rolling(7d).mean()",
    "ret_4h":        "bars.close.pct_change(4h)"
  },
  "label_spec":   { "horizon": "4h", "kind": "log_return" },
  "evaluation": {
    "walk_forward": { "train": "12mo", "test": "1mo", "step": "1mo" },
    "costs":  { "bps": 5 },
    "metrics": ["sharpe", "max_dd", "hit_rate"]
  }
}

Plans can be queued, paused, retried, or rejected. The Planner won't re-emit a plan it already proposed in the past 30 days (content-hash dedupe).
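
One plausible way to compute that content hash, shown as a sketch since the exact scheme isn't specified here: canonicalize the plan JSON, then fingerprint it.

import hashlib
import json

def plan_fingerprint(plan: dict) -> str:
    # Canonicalize the plan JSON (sorted keys, no whitespace) so logically
    # identical plans hash the same, then fingerprint the bytes.
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# A plan whose fingerprint matches one emitted in the last 30 days is skipped.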

Candidates

A strategy candidate is the output of running Synthesis on one plan. It lives in forge_strategy_candidates, alongside the generated strategy code (written under _forge/) and the backtest metrics.

In the web UI, candidates appear in a table sorted by Sharpe. Clicking one opens the candidate review surface — equity curve, trade list, generated code, and a Promote button. Promotion copies the file out of _forge/ into your active strategies directory; only then can it be deployed.

Important: Candidates are never automatically deployed. There is no path from "Synthesis produced a good-looking backtest" to "this is now running against real money" that doesn't go through an explicit human promotion step.
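
You can also inspect candidates outside the web UI by querying the table directly. A sketch using psycopg2; the column names (id, sharpe, status) and the connection string are assumptions, not the documented schema:

import psycopg2

# Column names and connection string below are illustrative assumptions.
conn = psycopg2.connect("dbname=auracle")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT id, sharpe, status FROM forge_strategy_candidates "
        "ORDER BY sharpe DESC LIMIT 10"
    )
    for row in cur.fetchall():
        print(row)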

Limits & audit

The Forge spends money (LLM tokens) and writes code (synthesized strategies). Both are bounded.

Cost caps

Hitting any cap (the defaults are listed under Costs below) pauses the scan; the partial catalog is preserved.
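
A sketch of what that budget guard amounts to, with illustrative names rather than the real runner's internals:

class ScanPaused(Exception):
    """Raised when a scan exceeds its budget; the partial catalog is kept."""

class ScanBudget:
    # Hypothetical per-scan budget guard; the real accounting lives in the runner.
    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Called after every agent invocation.
        self.spent_usd += cost_usd
        if self.spent_usd > self.cap_usd:
            raise ScanPaused(f"per-scan cap ${self.cap_usd:.2f} exceeded")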

Audit trail

forge_agent_runs records every agent invocation: which agent, which dataset, input fingerprint, token usage, wall-clock time, output fingerprint, and the install_uuid. Replay a scan by re-running the same input fingerprints; the output should hash-match (modulo non-determinism flagged at the LLM call boundary).
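
Shaped after the fields described above, an agent-run record looks roughly like this; the field names are illustrative, not the actual column names:

from dataclasses import dataclass
from typing import Optional

# Illustrative record shape for forge_agent_runs; real columns and types may differ.
@dataclass
class AgentRun:
    agent: str                    # Discovery / Probe / Planner / Synthesis
    dataset_id: Optional[int]     # which catalog dataset, where applicable
    input_fingerprint: str        # hash of the agent's inputs
    output_fingerprint: str       # hash of what it produced
    tokens_in: int
    tokens_out: int
    wall_clock_s: float
    cost_usd: float               # summed by the query under Costs below
    install_uuid: str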

Promotion gate

Generated strategies are sandboxed and never auto-deployed. You review the code, the backtest, and the metrics before Promote moves the file into your active strategies directory; only then can it be deployed live.

Costs

Default budgets per agent (configurable per install).

Agent       Per-call cap            Per-scan cap        Model
Discovery   4k input / 1k output    $0.10               Claude Haiku
Probe       2k input / 500 output   $0.50 / dataset     Claude Haiku
Planner     16k input / 4k output   $1.50               Claude Sonnet
Synthesis   32k input / 8k output   $2.50 / candidate   Claude Sonnet

Cost is logged in forge_agent_runs.cost_usd. Aggregate with a direct SQL query (SELECT SUM(cost_usd) FROM forge_agent_runs WHERE created_at > now() - interval '30 days') until the auracle forge costs CLI ships in v1.1.