AI cannot predict markets. Here is what it can do.

A controlled comparison of four systematic investment approaches using identical input data, January 2012 to January 2026.

Why LLMs fail as market predictors — and where they add value

Large language models are trained to converge on what is most probable given everything they have seen; in effect, they find consensus in data. In markets, prices already encode that consensus, so an LLM directed at market data will produce conclusions that are already priced in. Signal 1 demonstrates this empirically: given full access to historical fundamental data and asked to classify stocks by value, the LLM underperforms even a basic quantitative value factor (−1.02% annualized versus +6.62% for Signal 2). This is not a model quality problem. It is a structural one: markets offer no fixed ground truth for a model to converge on.

Quantitative systematic signals work not because they predict direction, but because they identify persistent statistical patterns that have historically preceded returns. Signal 2 is the raw version of this: a simple value factor constructed from fundamental financial data. Signal 3 applies a proprietary, machine-learning-based signal enhancement process developed through years of institutional systematic investing experience. The improvement from Signal 2 to Signal 3 (Sharpe 0.903 to 1.179) reflects that process: not additional data, but better use of the same data.

Signal 4 uses LLMs in a role they are genuinely suited for: contextual validation. Rather than asking the model to predict returns, we ask it whether the real-world context of a specific stock significantly contradicts what the quantitative signal is saying. Where it does, the position is not taken. The LLM never predicts direction; it provides a contextual validity layer that the quantitative model, by construction, cannot provide for itself. The result is a cleaner, more robust signal: relative to Signal 3, the Sharpe ratio rises from 1.179 to 1.241 and maximum drawdown narrows from −12.30% to −11.78%. That difference is the measurable value of the layer.


The Four Signals

Signal 1
LLM Analyst

A large language model was given complete historical fundamental financial data for each stock up to the rebalance date and asked to score each stock's value on a scale of 1 to 5. Long positions were taken in stocks scored as high value and short positions in stocks scored as low value. The LLM had access to the same data as the quantitative signals: no more, no less.
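
The page does not say how the 1 to 5 scores were turned into positions. The following is a minimal sketch, assuming scores of 4 or 5 count as high value, 1 or 2 as low value, and each side is equal-weighted to the dollar-neutral gross exposure listed under Methodology. The thresholds, the equal weighting, and the function name `scores_to_weights` are all hypothetical.

```python
from typing import Dict

# Hypothetical cut-offs: the page does not say which scores count as
# "high" or "low" value.
HIGH_VALUE_MIN = 4
LOW_VALUE_MAX = 2


def scores_to_weights(scores: Dict[str, int]) -> Dict[str, float]:
    """Turn 1-5 value scores into dollar-neutral long/short weights (sketch)."""
    longs = [t for t, s in scores.items() if s >= HIGH_VALUE_MIN]
    shorts = [t for t, s in scores.items() if s <= LOW_VALUE_MAX]
    weights: Dict[str, float] = {}
    # Assumed equal weighting: longs sum to +1.0, shorts sum to -1.0.
    for t in longs:
        weights[t] = 1.0 / len(longs)
    for t in shorts:
        weights[t] = -1.0 / len(shorts)
    return weights


# Score 3 is treated as neutral here, so CCC is not held.
print(scores_to_weights({"AAA": 5, "BBB": 4, "CCC": 3, "DDD": 1}))
# {'AAA': 0.5, 'BBB': 0.5, 'DDD': -1.0}
```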

Ann. Return: −1.02%
Sharpe: −0.301
Max Drawdown: −15.19%
Avg. Long / Short positions: 131 / 104

Signal 2
Quantitative Value

A standard quantitative value factor constructed from fundamental financial data. Stocks are ranked by earnings yield (earnings per share divided by price), the most direct measure of how cheaply a stock's earnings can be bought. The signal goes long the cheapest stocks and short the most expensive. No processing or enhancement is applied. This is the baseline systematic approach.
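
A minimal sketch of the earnings-yield ranking, assuming equal weights and a symmetric cut-off that takes the top and bottom 20% of names. The function name and its `fraction` parameter are hypothetical. Signal 2's average book of 100 long and 185 short names shows the original construction is not symmetric, so the cut-off here is purely illustrative.

```python
from typing import Dict


def earnings_yield_weights(eps: Dict[str, float], price: Dict[str, float],
                           fraction: float = 0.2) -> Dict[str, float]:
    """Long the highest earnings-yield names, short the lowest (sketch)."""
    # Earnings yield = earnings per share / price; skip missing or non-positive prices.
    ey = {t: eps[t] / price[t] for t in eps if price.get(t, 0.0) > 0}
    ranked = sorted(ey, key=ey.get, reverse=True)  # cheapest (highest yield) first
    n = max(1, int(len(ranked) * fraction))  # hypothetical symmetric cut-off
    if 2 * n > len(ranked):
        raise ValueError("not enough names for non-overlapping long and short legs")
    longs, shorts = ranked[:n], ranked[-n:]
    # Equal weights: gross long = +1.0, gross short = -1.0.
    weights = {t: 1.0 / n for t in longs}
    weights.update({t: -1.0 / n for t in shorts})
    return weights


eps = {"A": 5.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": 3.0}
price = {"A": 50.0, "B": 40.0, "C": 50.0, "D": 20.0, "E": 25.0}
print(earnings_yield_weights(eps, price))
# {'E': 1.0, 'D': -1.0}
```

Negative earnings produce a negative yield, so loss-making names rank as the most expensive and land on the short side.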

Ann. Return: +6.62%
Sharpe: 0.903
Max Drawdown: −11.37%
Avg. Long / Short positions: 100 / 185

Signal 3
Systematic Signal

Signal 2 enhanced through a proprietary, machine-learning-based systematic process built on institutional investing experience. The process improves signal quality without additional data inputs: all enhancement is applied to the same fundamental fields used in Signal 2. The specific methodology is not disclosed.

Same input data as all other signals. No additional data sources.

Ann. Return: +9.61%
Sharpe: 1.179
Max Drawdown: −12.30%
Avg. Long / Short positions: 86 / 161

Signal 4
Systematic + LLM Filter

Signal 3 with a qualitative LLM filter applied at each rebalance. For each position the quantitative signal wants to take, a large language model assesses whether the fundamental context of that stock significantly contradicts the signal's direction. Positions where the LLM identifies a significant contradiction are not taken. The LLM does not predict returns; it provides a contextual validity check.
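
A minimal sketch of the filter step, assuming the LLM call is wrapped in a hypothetical callable `contradicts(ticker, side)` that returns True when the model flags a significant contradiction. It also assumes the surviving positions are rescaled to restore gross long +1.0 and gross short −1.0; the page does not say whether Signal 4 rescales after filtering or leaves the freed exposure unallocated.

```python
from typing import Callable, Dict


def apply_llm_filter(weights: Dict[str, float],
                     contradicts: Callable[[str, int], bool]) -> Dict[str, float]:
    """Drop positions the (hypothetical) LLM check flags, then re-normalize."""
    # `contradicts(ticker, side)` stands in for the LLM call; side is +1 or -1.
    kept = {t: w for t, w in weights.items()
            if w != 0 and not contradicts(t, 1 if w > 0 else -1)}
    long_gross = sum(w for w in kept.values() if w > 0)
    short_gross = -sum(w for w in kept.values() if w < 0)
    # Assumed rescaling back to gross long +1.0 and gross short -1.0.
    return {t: (w / long_gross if w > 0 else w / short_gross)
            for t, w in kept.items()}


signal = {"A": 0.5, "B": 0.5, "C": -0.5, "D": -0.5}
flagged = {"B"}  # toy stand-in for the LLM's verdicts
print(apply_llm_filter(signal, lambda ticker, side: ticker in flagged))
# {'A': 1.0, 'C': -0.5, 'D': -0.5}
```

The filter only ever removes positions; it never adds or flips one, which matches the description of the LLM as a validity check rather than a predictor.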

Ann. Return: +10.38%
Sharpe: 1.241
Max Drawdown: −11.78%
Avg. Long / Short positions: 83 / 160
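
The page does not state how the summary statistics above are computed. A minimal sketch of common conventions, assuming daily portfolio returns, 252 trading days per year, a zero risk-free rate for the Sharpe ratio, and drawdown measured from the running peak of the compounded equity curve. The function name `summary_stats` is hypothetical, and the published figures may use different conventions.

```python
import math
import statistics
from typing import Dict, Sequence

TRADING_DAYS = 252  # assumed annualization factor for daily returns


def summary_stats(daily_returns: Sequence[float]) -> Dict[str, float]:
    """Annualized return, Sharpe ratio and max drawdown from daily returns (sketch)."""
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r                          # compound the equity curve
        peak = max(peak, equity)                   # running high-water mark
        max_dd = min(max_dd, equity / peak - 1.0)  # worst fall from a peak so far
    # Geometric annualized return over the sample.
    ann_return = equity ** (TRADING_DAYS / len(daily_returns)) - 1.0
    # Sharpe ratio with an assumed zero risk-free rate, annualized by sqrt(252).
    sharpe = (statistics.mean(daily_returns) / statistics.stdev(daily_returns)
              * math.sqrt(TRADING_DAYS))
    return {"ann_return": ann_return, "sharpe": sharpe, "max_drawdown": max_dd}


stats = summary_stats([0.01, -0.02, 0.01])
# max_drawdown is about -0.02; sharpe is 0.0 because the mean daily return is zero.
```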

Methodology

Universe: S&P 500 constituents (fixed as of January 2020)
Backtest period: January 2012 to January 2026
Signal update frequency: Daily (updates on each new SEC filing)
Input data: Fundamental financial statement data only
All signals use: Identical input data; no signal has an informational advantage
Long-short structure: Dollar-neutral (gross long = +1.0, gross short = −1.0)
Point-in-time integrity: SEC EDGAR filing dates used as data-availability anchors
LLM model: Claude (Sonnet 4.6)

All signals use identical fundamental input data. No signal has an informational advantage over any other. The only variable is methodology.

Data sourced from SEC EDGAR financial statement filings. Filing dates used as point-in-time anchors to ensure no forward-looking information is used in signal construction.
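
Point-in-time alignment of this kind is often done with an as-of join: each rebalance date picks up the most recent filing for the same ticker whose filing date precedes it. A minimal sketch using `pandas.merge_asof`, with hypothetical column names (`ticker`, `filed`, `date`, `eps`) and toy data. Excluding filings dated on the rebalance day itself (`allow_exact_matches=False`) is a conservative assumption, not a documented detail of this backtest.

```python
import pandas as pd

# Toy data with hypothetical column names: one row per SEC filing.
filings = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB"],
    "filed": pd.to_datetime(["2020-02-14", "2020-05-01", "2020-03-02"]),
    "eps": [2.0, 2.4, 1.1],
})
rebalances = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "date": pd.to_datetime(["2020-03-31", "2020-06-30", "2020-02-28", "2020-03-31"]),
})


def point_in_time(rebalances: pd.DataFrame, filings: pd.DataFrame) -> pd.DataFrame:
    """Attach to each rebalance row the latest filing made strictly before it."""
    # merge_asof needs both frames sorted on their time keys.
    left = rebalances.sort_values("date")
    right = filings.sort_values("filed")
    return pd.merge_asof(
        left, right,
        left_on="date", right_on="filed",
        by="ticker",                 # only match filings for the same stock
        direction="backward",        # never look forward in time
        allow_exact_matches=False,   # assumption: same-day filings not yet usable
    )


print(point_in_time(rebalances, filings))
```

The BBB row dated 2020-02-28 receives no data, because BBB's first filing in the toy set is dated 2020-03-02; that gap is exactly the look-ahead the filing-date anchor prevents.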


The signals on this page were constructed by an LLM operating within our systematic framework, under constraints on data inputs and complexity, with no manual signal engineering. The research you are looking at is itself a demonstration of the approach.