Strategy OS

Case Study #001 — Strategy Autopsy

I Built an Insider-Buying Trading Bot. Paper Returned +64%. Live Went 0-of-5. Here's the Audit.

The signal may not be dead. The naive implementation is.

Published 2026-05-24 · Insider cluster buying · US equities · ~20 min read

The 60-second version

TL;DR

  • Built EC_V1, a fully systematic equity bot that buys clusters of insider open-market purchases on US stocks. Public methodology, three-window backtest, fixed parameters.
  • Paper run with $100k notional, 4 simultaneous positions, hold 10 trading days: equity went from $100,000 to $164,034 over six weeks. +64.03% return. Max drawdown 3.48%. It looked great.
  • Live deployment with $965 cash account, $10 max risk per trade, max 2 positions: 5 of 5 closed trades lost. ~$58 realized losses on a 5-week window. It looked terrible.
  • Most of that gap is implementation, not signal. Sizing was 144× smaller. Hold period was 21 days in live vs 10 in paper. Live had an early-failure exit that paper didn't. The IBKR Gateway died twice mid-trade. The account was cash, not margin, which blocked the time-exit logic entirely.
  • After the apples-to-apples cleanup, both paper and live show the same flat-to-negative performance on 2026 forward data. The strategy worked from 2018–2024 (validated out-of-sample with t-statistic 2.24 on 169 trades), then decayed in 2025 H1 and never fully recovered.
  • Honest level of claim: insider-cluster buying as a naive, unsegmented buy-the-event signal looks fragile in 2025–2026. It is not proven to be inverted. The data we have supports “edge is conditional” far better than “cluster buying is now anti-signal.”
  • The most useful artifact from this experiment is not the strategy. It's the audit framework that exposed how a paper backtest could be off by orders of magnitude from the live result. That audit framework is now what Strategy OS is built around.

1. What I built

EC_V1: an insider-cluster trading bot

The premise of EC_V1 is older than the bot. Cohen, Malloy, and Pomorski (J. Finance, 2012) showed that “opportunistic” insider purchases earned about 82 basis points per month in value-weighted abnormal returns, while “routine” insider purchases earned essentially zero. Kang, Kim, and Wang (2018) extended this with what they called “cluster purchases” — cases where two or more insiders bought on the open market within a short window — and found that such clusters produced roughly 3.8% abnormal return at the 21-trading-day horizon.

EC_V1 is the most boring, most disciplined possible version of that idea. Six explicit rules:

  1. Source the event from SEC Form 4 filings. Open-market buys only. Exclude option exercises, grants, and every other transaction code that is not someone actively writing a check.
  2. Require at least $100,000 in open-market-buy dollars on the candidate ticker, on the candidate day.
  3. Require a cluster: at least two distinct insiders buying open-market on the same ticker within a rolling seven-trading-day window, totaling at least $250,000.
  4. Exclude any transaction flagged as a Rule 10b5-1 plan. Pre-planned, calendar-driven buys are information-empty by construction.
  5. Equal-weight position sizing. Up to four open positions at a time. No leverage, no shorts.
  6. Hold every position for exactly ten trading days. Exit at the open. That's the entire exit policy.
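Rules 1 through 4 reduce to a filter over parsed Form 4 rows. What follows is a minimal sketch. It assumes the filings have already been parsed and each date mapped to an integer trading-day index. The `Filing` record, its fields, and `cluster_signals` are hypothetical names, not part of any real library. Transaction code `P` is the Form 4 code for an open-market or private purchase, so option exercises (`M`) and grants (`A`) drop out automatically.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    """One parsed Form 4 transaction row (hypothetical schema)."""
    ticker: str
    insider: str
    day: int          # trading-day index; mapping calendar dates to it is not shown
    code: str         # Form 4 transaction code; "P" = open-market or private purchase
    dollars: float
    is_10b5_1: bool   # True if the filer flagged the trade as a Rule 10b5-1 plan trade


def cluster_signals(filings, day_min=100_000, cluster_min=250_000, window=7):
    """Return (ticker, day) pairs that pass rules 1-4."""
    # Rules 1 and 4: open-market purchases only, no 10b5-1 plan trades.
    buys = [f for f in filings if f.code == "P" and not f.is_10b5_1]
    by_ticker = defaultdict(list)
    for f in buys:
        by_ticker[f.ticker].append(f)

    signals = []
    for ticker, rows in by_ticker.items():
        for d in sorted({f.day for f in rows}):
            # Rule 2: at least day_min of open-market buying on the candidate day.
            if sum(f.dollars for f in rows if f.day == d) < day_min:
                continue
            # Rule 3: the trailing `window` trading days ending on d
            # (days d - window + 1 through d) need >= 2 distinct insiders
            # and >= cluster_min in total.
            recent = [f for f in rows if d - window < f.day <= d]
            if len({f.insider for f in recent}) >= 2 and sum(f.dollars for f in recent) >= cluster_min:
                signals.append((ticker, d))
    return signals
```

Rules 5 and 6 (equal-weight sizing and the fixed ten-day hold) act downstream of the signal and are not modeled here.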

Cost model: 7.5 basis points fees per side, 7.5 basis points slippage per side. Total round-trip friction: 30 bps. Realistic for institutional execution on liquid US names. Not realistic for a retail trader with $1,000 in a cash account.
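The cost model is simple enough to write down directly. With 7.5 bps of fees and 7.5 bps of slippage charged on both entry and exit, the round trip comes to 30 bps. The `fixed_fee_per_order` argument in this sketch is a hypothetical flat commission. It is not part of the EC_V1 model and does not quote any broker's fee schedule. It is included to show why a flat fee that is negligible at $25,000 dominates at $200.

```python
def round_trip_cost_bps(notional, fee_bps=7.5, slippage_bps=7.5, fixed_fee_per_order=0.0):
    """Total entry + exit friction, in basis points of position notional."""
    # Proportional costs: fees and slippage on both the buy and the sell.
    proportional = 2 * (fee_bps + slippage_bps)
    # Hypothetical flat fee per order, converted to bps of this notional.
    fixed = 2 * fixed_fee_per_order * 10_000 / notional
    return proportional + fixed


def net_return(gross_return, notional, **cost_kwargs):
    """Gross fractional return minus round-trip friction."""
    return gross_return - round_trip_cost_bps(notional, **cost_kwargs) / 10_000


print(round_trip_cost_bps(25_000))                        # 30.0
print(round_trip_cost_bps(200, fixed_fee_per_order=1.0))  # 130.0
```

Under the hypothetical $1 fee, the same trade costs 30.8 bps at $25,000 and 130 bps at $200. The percentage model only holds while fixed costs stay small relative to notional.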

The full filter ablation, including the parameters dropped and the parameters kept, is published at /methodology. Everything below is derived from that same data set, audited before publication.

2. What paper did

+64.03% in six weeks. Max drawdown 3.48%.

The paper run started 2026-04-08 with a notional starting cash balance of $100,000, position size around $25,000 per trade, max four positions. By 2026-05-22 the simulated equity stood at $164,033.84, a return of 64.03% on the period. There were 50 closed trades. The win rate was 72%. The average per-trade return was +5.55%. The biggest winner was GO at +35.53%. The biggest loser was CHTR at −15.35%.

Start equity: $100,000.00
End equity: $164,033.84
Period return: +64.03%
Max drawdown: 3.48%
Closed trades: 50
Win rate: 72.0%
Avg per-trade return: +5.55%
Hold (configured): 10 trading days

You would publish that on a landing page. People did. We did. The problem is that almost none of it is what the strategy was actually testing.
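Each of the headline numbers above can be recomputed from two inputs: an equity series and a list of per-trade returns. This minimal sketch assumes fractional returns (0.0555 for +5.55%) and counts a zero-return trade as a non-win. The function names are illustrative. The drawdown figure also depends on how finely the equity series is sampled, because a daily series misses intraday troughs.

```python
from statistics import mean


def period_return(equity):
    """Fractional return from the first to the last equity observation."""
    return equity[-1] / equity[0] - 1


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def trade_stats(returns):
    """Count, win rate (strictly positive returns), and mean per-trade return."""
    wins = sum(1 for r in returns if r > 0)
    return {"trades": len(returns), "win_rate": wins / len(returns), "avg_return": mean(returns)}
```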

3. What live did

Zero wins on five closed trades. ~$58 in realized losses.

The live run uses a real IBKR cash account, account number U25173122, with a starting NetLiq of around $1,000. The bot ticks every 60 seconds. Signals come from the same EC_V1 paper pipeline. Each entry requires Telegram approval, and after each fill the bot places a server-side GTC stop at −10% of the fill price.

Between 2026-04-27 and 2026-05-23 the bot received fourteen candidate signals, executed six BUYs, closed five of them at a loss, and held one (GEHC) open with a small unrealized gain. The other eight candidates were skipped before they ever became orders. The reasons matter as much as the outcomes.

| Order | Ticker | Entry | Shares | Notional | Outcome | Realized $ |
|---|---|---|---|---|---|---|
| ord43 | CBZ | 2026-04-27 | 6 | $184 | stop, exec event lost during outage | −$2.38 |
| ord44 | CHTR | 2026-04-29 | 1 | $174 | reconciled after Gateway outage | −$15.43 |
| ord45 | GEHC | 2026-05-01 | 3 | $180 | open, time-exit blocked by cash-account rule | +$11.51 (unrealized) |
| ord48 | OPCH | 2026-05-05 | 21 | $455 | stop hit during Gateway outage, exec event missed | −$35.33 |
| ord53 | UPST | 2026-05-14 | 3 | $87 | early-failure exit at −3.10% on day 2 | −$2.88 |
| ord56 | VVV | 2026-05-18 | 2 | $66 | early-failure exit at −3.13% on day 2 | −$2.39 |

Total realized: −$58.41, plus one open winner at roughly +$11.50 unrealized. On a thousand-dollar account, that is a 5-6% drawdown in three weeks, with a loss on every closed trade. It is the kind of result that, taken at face value, ends the project.

On top of that, the bot rejected or skipped more signals than it took: three were declared order_unbuildable because $10 of risk could not cover even one share of a $30+ stock with a 10% stop; three more were skipped because the account already held the maximum of two open positions; one was skipped because the operator clicked Skip; two were skipped because the Gateway was disconnected. None of those skips show up in the realized P&L. All of them are part of the actual behavior of the system.
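The `order_unbuildable` outcome comes from fixed-risk sizing. This minimal sketch assumes shares are sized as floor(max_risk / (entry × stop_pct)) and the stop is rounded to the cent. `size_order` is a hypothetical name, and the sketch illustrates the rule rather than reproducing the live bot's code, whose exact sizing and rounding may differ.

```python
import math


def size_order(entry_price, max_risk=10.0, stop_pct=0.10):
    """Return (shares, stop_price), or None if not even one share fits the risk cap.

    Hypothetical fixed-risk rule: a stop filling exactly at stop_price loses
    shares * entry_price * stop_pct, which must stay <= max_risk. Gaps through
    the stop can lose more; this sketch ignores that.
    """
    risk_per_share = entry_price * stop_pct
    shares = math.floor(max_risk / risk_per_share)
    if shares < 1:
        return None  # the case the live bot logged as order_unbuildable
    stop_price = round(entry_price * (1 - stop_pct), 2)
    return shares, stop_price
```

Because the cap is a fixed dollar amount, the set of tradable tickers shrinks as share price rises. The skipped signals are therefore not a random sample of the signal stream.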

4. Why the comparison is unfair

Paper and live were not the same strategy

Most retail systematic traders never reach this part of the analysis, because by here the obvious conclusion is “the strategy died.” That conclusion would be wrong, or at least premature. The paper run and the live run differ in at least seven specific ways. Every one of those differences pushes the live result in the same direction, which is down.

| Parameter | Paper | Live | Net effect on live |
|---|---|---|---|
| Notional per trade | ~$25,000 | $66–$455 | Fixed costs become catastrophic % |
| Max concurrent positions | 4 | 2 | Half the signals never enter |
| Hold period | 10 trading days | 21 trading days | Different strategy entirely |
| Early-failure exit | none | −3% at day 2 = forced sell | Cuts reversals that would have won |
| Max risk per trade | not capped | $10 hard cap | Stocks >$30 = unbuildable |
| Account type | unconstrained | cash, not margin | Blocks time-exit logic (IBKR Error 201) |
| Operational reliability | 100% uptime | Gateway died twice mid-trade | Lost execution events, manual reconcile |

The first row is the headline one. Paper said GO returned +35.5% and earned roughly $8,800 on a $25,000 position. Live, on the same signal at the same time, would have returned the same +35.5% on a $200 position and earned about $70 before any commission. The percentages are the same. The dollars are meaningless at this scale.

The fourth row is the most operationally important. The live policy added an “early-failure exit” that forces a sell if a position is down 3% after two trading days. The motivation is sensible: cap tail risk early. But the consequence is that the live strategy is no longer the strategy that was tested in paper. UPST and VVV are the two trades where this kicked in, and both were sold just past the −3% threshold. Whether either would have recovered to a positive return by day 10 is unknowable from the data we have. But the ablation shows the underlying strategy winning 56-63% of trades at the 10-day horizon, which strongly suggests that some non-zero fraction of early-failure exits are cutting eventual winners.
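The difference between the two exit policies is a single extra branch. This sketch uses hypothetical names. It assumes the early-failure check is evaluated once, on day 2, and that a position exactly at −3% counts as failed. Under those assumptions, the paper and live configurations differ only in their arguments.

```python
def exit_reason(days_held, unrealized_return, hold_days=10,
                early_check_day=None, early_threshold=-0.03):
    """Return why an open position should be closed, or None to keep holding.

    early_check_day=None models the paper policy (time exit only).
    early_check_day=2 with early_threshold=-0.03 models the live early-failure rule.
    """
    failed_early = (
        early_check_day is not None
        and days_held == early_check_day
        and unrealized_return <= early_threshold
    )
    if failed_early:
        return "early_failure"
    if days_held >= hold_days:
        return "time_exit"
    return None


paper = dict(hold_days=10)
live = dict(hold_days=21, early_check_day=2)
print(exit_reason(2, -0.031, **paper))  # None: paper keeps holding
print(exit_reason(2, -0.031, **live))   # early_failure
```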

The sixth row is the one that should have been caught in pre-flight. Cash-account rules at IBKR prevent the algorithm from issuing market sells that could leave a momentary short position during settlement. The broker rejected the bot's time-exit order for GEHC with Error 201, and the position simply stayed open. By the time the audit completed, the trade was 23 days into a strategy that was tested on 10-day holds.
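Every row in the paper-vs-live table is a parameter that differed between what was tested and what was deployed. The cheapest pre-flight check is to diff the two configurations and refuse to deploy on any difference. This sketch uses hypothetical key names, with values taken from the table. A drift such as `account_type` would still need a broker-specific test, because a diff shows only that something changed, not what the broker will reject.

```python
def config_drift(tested, deployed):
    """Return (key, tested_value, deployed_value) for every parameter that differs."""
    keys = sorted(set(tested) | set(deployed))
    return [(k, tested.get(k), deployed.get(k)) for k in keys if tested.get(k) != deployed.get(k)]


# Values taken from the paper-vs-live table; key names are hypothetical.
PAPER = {"max_positions": 4, "hold_days": 10, "early_failure_exit": None,
         "max_risk_per_trade": None, "account_type": "unconstrained"}
LIVE = {"max_positions": 2, "hold_days": 21, "early_failure_exit": (2, -0.03),
        "max_risk_per_trade": 10.0, "account_type": "cash"}

for key, paper_value, live_value in config_drift(PAPER, LIVE):
    print(f"{key}: paper={paper_value!r} live={live_value!r}")
```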

5. What the data says about the strategy itself

Out-of-sample edge, then decay, then a small recovery

Setting aside the live failure for a moment: did EC_V1 ever have a real edge? The answer the data gives is “yes, across three non-overlapping windows from 2018 through 2025, but the edge degraded meaningfully during 2025 and stayed weak through the first half of 2026.”

Three-window ablation (cost-adjusted, fixed parameters)

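| Window | Phase | Trades | Expectancy | Excess vs SPY | Win rate | t-stat |
|---|---|---|---|---|---|---|
| 2018-01-01 → 2020-10-20 | In-sample | 94 | +3.65% | +4.29% | 62.8% | +2.78 |
| 2020-10-21 → 2023-04-08 | In-sample | 170 | +2.08% | +2.56% | 56.5% | +2.44 |
| 2023-04-09 → 2025-12-31 | Out-of-sample | 169 | +2.73% | +2.28% | 56.8% | +2.24 |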

All three windows: layer cluster_omb_total_250k, hold 10 trading days, equal-weight, four positions max, 30 bps round-trip cost. The OOS t-stat of 2.24 crosses the conventional 2.0 significance threshold.

That is real. Three different market regimes — late-cycle rally, COVID shock and recovery, post-2022 rate cycle — all produced positive expectancy after costs, in the same direction, with consistent win rates around 56-63%. A curve-fit result typically collapses under a regime change. This one did not.
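The t-statistic can be read as the standard one-sample test of per-trade returns against zero. The sketch below assumes that method, and further assumes the t-stat column was computed on the expectancy column rather than on excess vs SPY. One caveat the formula hides: overlapping 10-day holds are not fully independent, so a t-stat computed this way can overstate significance.

```python
import math
from statistics import mean, stdev


def expectancy_t_stat(returns):
    """One-sample t-statistic of the mean per-trade return against zero."""
    n = len(returns)
    return mean(returns) / (stdev(returns) / math.sqrt(n))


def implied_trade_sd(expectancy, n_trades, t_stat):
    """Invert the t-statistic to recover the per-trade standard deviation."""
    return expectancy * math.sqrt(n_trades) / t_stat


# OOS row: n = 169, expectancy +2.73%, t = 2.24 (assumes t was computed on expectancy).
print(round(implied_trade_sd(0.0273, 169, 2.24), 3))  # 0.158
```

Under that assumption, the OOS row implies a per-trade standard deviation of roughly 16%. Because t grows with the square root of the trade count, the same mean and dispersion would have needed about 135 trades to reach t = 2.0, so 169 trades clear the bar without much room.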

Then 2025 happened. The OOS window aggregates a strong 2024 with a much worse 2025, and the aggregate hides the inflection. Broken down by half-year:

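| Period | Trades | Win rate | Avg / trade | Net P&L |
|---|---|---|---|---|
| 2023 H2 (Apr–Dec) | 46 | 56.5% | +3.89% | +$44,710 |
| 2024 H1 | 35 | 62.9% | +4.21% | +$36,865 |
| 2024 H2 | 27 | 66.7% | +6.98% | +$47,086 |
| 2025 H1 | 23 | 39.1% | −4.94% | −$28,389 |
| 2025 H2 | 38 | 55.3% | +1.60% | +$15,210 |
| 2026 Q1–Q2 (post-OOS) | 23 | 43.5% | −1.10% | −$6,305 |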

The 2024 peak was real. The 2025 H1 collapse was real. The 2025 H2 partial recovery was real. The 2026 Q1–Q2 sample is too small to call — 23 trades, t-statistic −0.13, statistically indistinguishable from zero edge — but it is consistent with the trajectory of a strategy whose alpha is fragile in the current regime.
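The half-year breakdown is a group-by over closed trades. This sketch assumes each trade is bucketed by its exit date (the text does not say whether entry or exit date was used) and that returns are fractional. The tuple layout and the `by_half_year` name are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def by_half_year(trades):
    """Summarize (exit_date, fractional_return, pnl) tuples by calendar half-year."""
    buckets = defaultdict(list)
    for exit_date, ret, pnl in trades:
        half = "H1" if exit_date.month <= 6 else "H2"
        buckets[f"{exit_date.year} {half}"].append((ret, pnl))

    summary = {}
    for label in sorted(buckets):  # "2024 H2" sorts before "2025 H1"
        rets = [r for r, _ in buckets[label]]
        summary[label] = {
            "trades": len(rets),
            "win_rate": sum(r > 0 for r in rets) / len(rets),
            "avg_return": mean(rets),
            "pnl": sum(p for _, p in buckets[label]),
        }
    return summary


example = [(date(2025, 2, 3), -0.10, -2_500.0), (date(2025, 8, 1), 0.02, 500.0)]
for label, row in by_half_year(example).items():
    print(label, row["trades"], row["pnl"])
```

Splitting the OOS window this way is what exposes the 2025 H1 inflection that the aggregate row hides.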

A second observation: most of the 2025 H1 drawdown came from a handful of biotech blowups (FLNC, IMNM, SLDB, BHVN, plus one non-biotech tail in SBET). Insider buying ahead of a failed clinical trial is not a contrarian signal; it is a CEO with cap-table exposure trying to defend a position. The same cluster filter that produced edge from 2018 through 2024 happened to concentrate exposure in this kind of name in early 2025. Whether the strategy fundamentally degraded or whether 2025 H1 was a single concentrated bad regime is, on this sample, an honest open question.

6. What this audit supports, and what it doesn't

The claim ladder

It would be very convenient to write a clickable thesis here. It would also be wrong. The data we collected supports some claims cleanly and refuses to support others. We are publishing both.

Level 1 — proven

This specific live deployment of EC_V1 lost on five of five closed trades.

Level 2 — strongly supported

Most of the paper-to-live gap is implementation, not signal: sizing, exit policy, account constraints, and operational outages all push the same way.

Level 3 — likely supportable

Naive insider-cluster buying as a buy-the-event signal is fragile and conditional in 2025–2026. The earlier decade's edge does not appear to extend cleanly to the recent regime, at least not without further conditioning.

Level 4 — not yet supportable

“Insider cluster buying has inverted into an anti-signal.” The data we have is consistent with this hypothesis in 2025 H1 only. The 2025 H2 recovery and the near-zero 2026 sample do not let us cross the line into a structural inversion claim. We will not write that headline until we have substantially more out-of-sample evidence.

The press-friendly version of this story is “the magic signal died.” That is a level 4 claim. The honest version sits at levels 2 and 3. Both can be true at the same time: the signal can be fragile and the naive live implementation can be the primary culprit for this specific failure.

7. What Strategy OS becomes after this

The strategy is not the product. The audit is.

Strategy OS was originally pitched as a delivery pipeline for insider event signals. After running EC_V1 through the full backtest → paper → live cycle and auditing the gaps between those three stages, it became uncomfortably clear that the most reproducibly valuable artifact from this project is not the strategy. It is the machinery that made the gap visible.

Strategy OS is being repositioned around that machinery. The product is no longer “subscribe to our insider signals.” It is:

Audit your own backtest

Upload trade records and equity curves from your existing backtest. Get a reproducibility report: cost realism, sizing realism, slippage assumptions vs typical execution.

Compare paper to live

Side-by-side trade-level comparison of what your simulator said vs what your broker actually did. Quantified divergence, attributed by cause.
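The core of a paper-to-live comparison is a join on trade identity. This minimal sketch assumes both sides can be keyed by ticker and entry date; the names and data shapes are hypothetical. Unmatched trades are reported separately because, as the EC_V1 live log shows, skipped entries can matter as much as price differences on the trades both sides took. Attribution by cause would sit on top of this join and is not shown.

```python
def compare_paper_live(paper, live):
    """Trade-level comparison of paper and live results.

    Both inputs map a hypothetical (ticker, entry_date) key to a fractional
    return. Returns (live-minus-paper gap per matched trade, paper-only keys,
    live-only keys).
    """
    matched = {k: live[k] - paper[k] for k in sorted(paper.keys() & live.keys())}
    paper_only = sorted(paper.keys() - live.keys())
    live_only = sorted(live.keys() - paper.keys())
    return matched, paper_only, live_only
```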

Alpha decay monitoring

Rolling expectancy and t-statistic on your strategy across recent windows. Early warning when an edge is fading rather than after the drawdown.
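Rolling expectancy and t-statistic amount to the same one-sample test applied to a trailing window of trades. This sketch assumes a 30-trade window and raises a warning when the rolling t falls below 1.0. Both numbers are hypothetical defaults, not calibrated thresholds.

```python
import math
from statistics import mean, stdev


def rolling_edge(returns, window=30):
    """Yield (end, expectancy, t_stat) for each trailing window of trades.

    `end` is the number of trades seen so far (the exclusive slice end).
    """
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        sd = stdev(chunk)
        t = mean(chunk) / (sd / math.sqrt(window)) if sd > 0 else float("nan")
        yield end, mean(chunk), t


def first_warning(returns, window=30, t_floor=1.0):
    """Trade count at which the rolling t-stat first drops below t_floor, else None."""
    for end, _, t in rolling_edge(returns, window):
        if t < t_floor:
            return end
    return None
```

A trailing window trades responsiveness for noise. A shorter window flags decay sooner but also raises more false alarms on a strategy with per-trade dispersion as high as EC_V1's.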

Readiness gates before live

Explicit pass/fail gates on cost model, sample size, regime coverage, paper-to-live consistency before any real capital goes in. The gate EC_V1 should have had.
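The four gates can be written as explicit pass/fail checks. This sketch uses hypothetical field names and thresholds (100 out-of-sample trades, three regimes). Where those bars belong is a per-strategy judgment, and this is an illustration rather than Strategy OS's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Readiness:
    """Inputs to the pre-live gates (hypothetical schema)."""
    modeled_cost_bps: float      # round-trip friction assumed in the backtest
    expected_cost_bps: float     # friction expected at the deployed position size
    oos_trades: int
    regimes_covered: int
    config_drift: list = field(default_factory=list)  # parameters that differ paper vs live


def readiness_gates(r, min_trades=100, min_regimes=3):
    """Return (all_passed, per-gate results). Thresholds are hypothetical."""
    gates = {
        "cost_model": r.modeled_cost_bps >= r.expected_cost_bps,
        "sample_size": r.oos_trades >= min_trades,
        "regime_coverage": r.regimes_covered >= min_regimes,
        "paper_live_consistency": not r.config_drift,
    }
    return all(gates.values()), gates
```

Each gate corresponds to one of the gaps in this audit: cost realism at the deployed size, sample size, regime coverage, and paper-to-live drift.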

EC_V1 is now the first published case study on this platform. The next case studies will not be ours.

If you have a strategy that looks good in backtest

Bring it. We will run the same audit on it that we just ran on ours. Same honesty, same data, same claim ladder. We are opening a small number of audit pilots while the pivot lands. Pricing and intake will be published shortly; in the meantime, the application form below routes directly to the audit cohort.