Strategy OS

Methodology & Evidence

How the event-cluster workflow is built, tested, and what it does not claim.

This page describes the research methodology behind the signals delivered to Founding Pilot participants. It shows backtested results across three distinct market windows, the filter decisions that shaped the current workflow, and the limitations we consider important for a disciplined trader to know before engaging.

All performance figures below derive from historical simulations with transaction cost assumptions (7.5 bps fees + 7.5 bps slippage per side). Past simulated results do not guarantee future outcomes. Individual trades can and do lose, including during periods when the aggregate edge is positive.

The question

Can disciplined insider-cluster filtering produce a reproducible edge?

The starting point is public literature suggesting that open-market insider purchases, especially clustered purchases by multiple insiders, can carry short-term informational value. This work does not ask whether such edges exist in academic samples. It asks whether a specific filter set, applied with fixed parameters across multiple regimes, survives out-of-sample testing after realistic trading costs.

The strategy

event_cluster_v1: scope and rules

Universe

US equities with normalized SEC Form 4 insider transactions, excluding warrants, rights, units, preferred shares, and 10b5-1 plan filings.

Signal definition

Issuer-day events where a cluster of insiders accumulated open-market buys (OMB) within a 7-trading-day window: total cluster OMB value ≥ $250k and at least 2 unique buyers.
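
As an illustration of the signal definition above, here is a minimal Python sketch. The `Buy` record, its field names, and the integer trading-day index are assumptions made for the example, not the workflow's actual data model. Whether the window is trailing and includes the event day is also assumed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Buy:
    issuer: str        # issuer id or ticker (hypothetical field)
    insider: str       # reporting insider id (hypothetical field)
    trading_day: int   # trading-day index, so weekends and holidays are already skipped
    value_usd: float   # dollar value of the open-market buy

WINDOW_DAYS = 7            # 7-trading-day window from the signal definition
MIN_CLUSTER_USD = 250_000  # cluster OMB threshold
MIN_UNIQUE_BUYERS = 2      # 2+ unique buyers

def cluster_events(buys):
    """Return sorted (issuer, trading_day) pairs whose trailing window qualifies."""
    events = set()
    for b in buys:
        # Assumed: the window is trailing and includes the event day itself.
        window = [
            x for x in buys
            if x.issuer == b.issuer
            and b.trading_day - WINDOW_DAYS < x.trading_day <= b.trading_day
        ]
        total = sum(x.value_usd for x in window)
        buyers = {x.insider for x in window}
        if total >= MIN_CLUSTER_USD and len(buyers) >= MIN_UNIQUE_BUYERS:
            events.add((b.issuer, b.trading_day))
    return sorted(events)

buys = [
    Buy("AAA", "ceo", 100, 150_000.0),
    Buy("AAA", "director_1", 103, 120_000.0),   # completes a 2-buyer, $270k cluster
    Buy("BBB", "cfo", 100, 400_000.0),          # large, but a single buyer
    Buy("CCC", "ceo", 100, 200_000.0),
    Buy("CCC", "director_2", 110, 200_000.0),   # falls outside the 7-day window
]
print(cluster_events(buys))  # [('AAA', 103)]
```

Only the AAA cluster qualifies: BBB clears the dollar threshold with one buyer, and the two CCC buys are too far apart to share a window.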

Position sizing

Equal-weight across up to 4 concurrent positions. No leverage. No shorts. 10 trading-day holding period.

Cost assumptions

7.5 bps fees per side + 7.5 bps slippage per side, applied on entry and exit. Total round-trip friction: 30 bps.
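
The friction arithmetic can be sketched in a few lines of Python. The function names and the multiplicative convention (costs worsen both the entry price and the exit price) are assumptions for illustration; the simulation's exact cost mechanics are not specified on this page.

```python
FEE_BPS_PER_SIDE = 7.5
SLIPPAGE_BPS_PER_SIDE = 7.5

def round_trip_bps(fee_bps=FEE_BPS_PER_SIDE, slippage_bps=SLIPPAGE_BPS_PER_SIDE):
    """Friction paid once on entry and once on exit, in basis points."""
    return 2 * (fee_bps + slippage_bps)

def net_return(gross_return, fee_bps=FEE_BPS_PER_SIDE, slippage_bps=SLIPPAGE_BPS_PER_SIDE):
    """Net return after costs, assuming costs scale the entry price up and the exit price down."""
    per_side = (fee_bps + slippage_bps) / 10_000
    return (1 + gross_return) * (1 - per_side) / (1 + per_side) - 1

print(round_trip_bps())             # 30.0
print(f"{net_return(0.03):.4%}")    # a 3% gross move nets slightly under 2.70%
```

Under this convention the cost of a round trip is close to, but not exactly, a flat 30 bps subtraction, because the exit-side cost applies to the post-move price.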

The test design

Three non-overlapping windows, one fixed parameter set

The filter parameters were selected on the first two windows (in-sample), then applied unchanged to the third window (out-of-sample). The third window covers a distinct market regime from the first two, which is the critical test: if the parameters were overfit, the out-of-sample window would show meaningful degradation or a sign change in the excess expectancy.
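
A minimal sketch of the window split, assuming trades are assigned to a window by entry date. The page does not say how trades that straddle a window boundary are handled, so that rule and the helper names below are hypothetical.

```python
from datetime import date

# Window boundaries as listed in the results table.
WINDOWS = [
    ("in-sample", date(2018, 1, 1), date(2020, 10, 20)),
    ("in-sample", date(2020, 10, 21), date(2023, 4, 8)),
    ("out-of-sample", date(2023, 4, 9), date(2025, 12, 31)),
]

def window_for(entry_date):
    """Return (window index, phase) for a trade entry date, or None if outside all windows."""
    for i, (phase, start, end) in enumerate(WINDOWS):
        if start <= entry_date <= end:
            return i, phase
    return None

def windows_are_disjoint(windows):
    """Check that no date can belong to two windows."""
    ordered = sorted(windows, key=lambda w: w[1])
    return all(prev[2] < nxt[1] for prev, nxt in zip(ordered, ordered[1:]))

print(windows_are_disjoint(WINDOWS))       # True
print(window_for(date(2024, 6, 3)))        # (2, 'out-of-sample')
```

Parameter selection would only ever see trades whose phase is "in-sample"; the out-of-sample trades are scored once, with the parameters frozen.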

Window                  | Phase         | Trades | Expectancy | Excess vs SPY | Winrate | Max DD
2018-01-01 → 2020-10-20 | In-sample     | 94     | 3.65%      | 4.29%         | 62.8%   | $41.2k
2020-10-21 → 2023-04-08 | In-sample     | 170    | 2.08%      | 2.56%         | 56.5%   | $21.2k
2023-04-09 → 2025-12-31 | Out-of-sample | 169    | 2.73%      | 2.28%         | 56.8%   | $39.3k

Trades = number of closed positions over the window. Expectancy = mean per-trade return after costs. Excess vs SPY = mean per-trade return minus the SPY return over the same holding window. Winrate = share of closed trades with a positive return. Max DD = maximum drawdown in dollars on $100k starting capital, equal-weight, 4 positions maximum.
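
The column definitions above can be sketched in Python. The function name, the input lists, and the simplified equity curve (trades applied one after another, each on a fixed slot of capital divided by 4, no compounding) are assumptions for illustration. The actual simulation's handling of concurrent positions is not described on this page.

```python
def summarize(trade_returns, spy_returns, capital=100_000.0, max_positions=4):
    """Compute the table metrics from per-trade net returns and matching SPY returns."""
    n = len(trade_returns)
    expectancy = sum(trade_returns) / n
    excess = sum(r - s for r, s in zip(trade_returns, spy_returns)) / n
    winrate = sum(1 for r in trade_returns if r > 0) / n

    # Assumed sizing: each trade uses one fixed equal-weight slot.
    slot = capital / max_positions
    equity = peak = capital
    max_dd = 0.0
    for r in trade_returns:
        equity += slot * r
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)   # largest peak-to-trough drop in dollars

    return {
        "trades": n,
        "expectancy": expectancy,
        "excess_vs_spy": excess,
        "winrate": winrate,
        "max_dd_usd": max_dd,
    }

# Made-up returns, only to show the mechanics.
stats = summarize(
    trade_returns=[0.10, -0.04, -0.06, 0.08],
    spy_returns=[0.01, 0.01, -0.02, 0.02],
)
print(stats)
```

In this toy input the equity peaks at $102.5k after the first trade and falls to $100k after the third, so the dollar drawdown is about $2.5k even though the sequence ends in profit.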

The key finding

Cross-regime consistency, statistical significance on OOS

The out-of-sample window produced 169 closed trades, +2.73% mean expectancy after costs, and +2.28% excess over SPY over the same holding windows. The t-statistic on excess expectancy reached 2.24, which exceeds the conventional threshold of about 2.0 (roughly the 5% level for a two-tailed test; the one-tailed 5% threshold is lower, at about 1.65).
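
For readers who want to check the arithmetic, here is a sketch assuming a plain one-sample t-test on per-trade excess returns, with trades treated as independent. The page does not state how the t-statistic was computed, so this is an assumption; overlapping holding periods would weaken the independence premise.

```python
import math
from statistics import mean, stdev

def t_statistic(excess_returns):
    """One-sample t-statistic against a null of zero mean excess return."""
    n = len(excess_returns)
    return mean(excess_returns) / (stdev(excess_returns) / math.sqrt(n))

def implied_stdev(mean_excess, n_trades, t_stat):
    """Back out the per-trade standard deviation consistent with the reported figures."""
    return mean_excess * math.sqrt(n_trades) / t_stat

# Reported OOS figures: 2.28% mean excess, 169 trades, t = 2.24.
print(round(implied_stdev(0.0228, 169, 2.24), 4))  # 0.1323
```

Under these assumptions the reported numbers imply a per-trade excess return standard deviation of roughly 13%, several times the mean. That is consistent with the statement that individual trades can and do lose while the aggregate is positive.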

Equally important: the sign and approximate magnitude of the edge are consistent across all three windows, which covered distinct market regimes (late-cycle rally, COVID shock + recovery, 2023-2025 rate regime). A curve-fit result typically collapses under a regime change; this set did not.

Important caveat. Statistical significance on 169 trades is not proof of a persistent edge — it is evidence against the null hypothesis of no edge in this specific sample. The workflow is delivered with this uncertainty explicit. Pilot closeout reports include the per-trade distribution, not only the aggregate.

Filter evolution (transparency)

What we dropped and why

An earlier version of the filter required a CEO or CFO to participate in the cluster, and also required a minimum routine-versus-opportunistic score. Applied to the OOS window, the full ablation showed:

Filter layer                     | Trades | Expectancy | Excess | Max DD | t-stat
base_omb_100k                    | 254    | 1.63%      | 1.21%  | $28.5k | 1.87
cluster_unique_buyers_2+         | 180    | 3.17%      | 2.71%  | $46.7k | 2.44
cluster_omb_total_250k (current) | 169    | 2.73%      | 2.28%  | $39.3k | 2.24
cluster_ceo_cfo                  | 127    | 3.19%      | 2.57%  | $26.5k | 2.04

The CEO/CFO layer produced a higher mean expectancy (3.19% versus 2.73%) and a smaller drawdown ($26.5k versus $39.3k), but it cut trade count by 25% (169 to 127) and lowered the t-statistic (2.04 versus 2.24). The chosen layer accepts a lower per-trade expectancy in exchange for broader applicability and a more stable sample size. This is a judgment call, stated explicitly here so a pilot participant can disagree.
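
The trade-off can be checked directly from the ablation rows. The sketch below copies the table values into a list and compares each layer against the current one; the structure and function name are illustrative only.

```python
# (name, trades, expectancy %, excess %, max DD in $k, t-stat), copied from the ablation table.
LAYERS = [
    ("base_omb_100k", 254, 1.63, 1.21, 28.5, 1.87),
    ("cluster_unique_buyers_2+", 180, 3.17, 2.71, 46.7, 2.44),
    ("cluster_omb_total_250k", 169, 2.73, 2.28, 39.3, 2.24),
    ("cluster_ceo_cfo", 127, 3.19, 2.57, 26.5, 2.04),
]
CURRENT = "cluster_omb_total_250k"

def compare_to_current(layers, current=CURRENT):
    """Express each other layer as a change relative to the current layer."""
    by_name = {row[0]: row for row in layers}
    _, cur_trades, cur_exp, _, cur_dd, cur_t = by_name[current]
    out = {}
    for name, trades, exp, _, dd, t in layers:
        if name == current:
            continue
        out[name] = {
            "trade_change_pct": round(100 * (trades - cur_trades) / cur_trades, 1),
            "expectancy_change_pts": round(exp - cur_exp, 2),
            "max_dd_change_k": round(dd - cur_dd, 1),
            "t_stat_change": round(t - cur_t, 2),
        }
    return out

for name, diff in compare_to_current(LAYERS).items():
    print(name, diff)
```

For `cluster_ceo_cfo` this gives a trade count change of about -24.9%, an expectancy change of +0.46 points, and a t-statistic change of -0.20.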

Limitations

What this research does not show

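  • The samples are in the hundreds of trades, not thousands. The tested windows start in 2018, so they cover March 2020 (in the first in-sample window) but no event comparable to 2008. A further rare-regime event within the sample would shift tail measurements non-trivially.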
  • The cost model (30 bps round-trip) is reasonable for liquid names at institutional order sizes. Retail execution on thin names can realize worse slippage and would degrade net expectancy.
  • The workflow does not handle earnings-gap overlay, sector concentration constraints, or correlation caps. A trader wanting those constraints should apply them as a layer on top.
  • Drawdown on the OOS window reached $39.3k peak-to-trough on a $100k paper base for the selected layer, roughly 39% of starting capital. This is structural to the strategy, not a rare event.
  • Results are for an equal-weight, 4-position portfolio. Different sizing rules will produce different risk profiles.
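
To make the execution-cost limitation concrete, here is a first-order sensitivity sketch. It assumes that each extra basis point of round-trip friction removes one basis point of per-trade expectancy and that gross returns are otherwise unchanged. The alternative friction levels are hypothetical inputs, not measurements.

```python
OOS_EXPECTANCY = 0.0273          # reported OOS expectancy, already net of the modeled costs
MODELED_ROUND_TRIP_BPS = 30.0    # cost model used in the simulations

def expectancy_at(round_trip_bps):
    """Approximate net expectancy if realized round-trip friction differs from the model."""
    extra = (round_trip_bps - MODELED_ROUND_TRIP_BPS) / 10_000
    return OOS_EXPECTANCY - extra

def breakeven_round_trip_bps():
    """Round-trip friction at which the approximate net expectancy reaches zero."""
    return MODELED_ROUND_TRIP_BPS + OOS_EXPECTANCY * 10_000

for bps in (30, 60, 100, 150):   # hypothetical realized friction levels
    print(bps, f"{expectancy_at(bps):.2%}")
print(round(breakeven_round_trip_bps()))  # 303
```

The approximation ignores second-order effects, and thin names may also differ in gross return, so it is a rough guide to how quickly worse fills erode the reported figure rather than a forecast.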

What the Founding Pilot includes

Research output, delivered with context

The pilot delivers daily signals from this workflow, with AI-generated context and risk notes, a weekly review, and a closeout report at 6 weeks. It does not deliver personalized allocation, position sizing instructions, or automated execution. Pilot participants use the output within their own process and judgment.