The evidence

Two kinds of evidence matter here: the research showing that founding-team failure has measurable structure, and the honest status of this instrument's own validation. Both belong on the same page, because an assessment product that shows you its sources but hides its own status isn't practising what it measures.

The research on team failure

65%

of failures within venture portfolios were attributed by investors to problems within the management team.

SOURCE: SURVEY OF VENTURE INVESTORS, CITED IN WASSERMAN, THE FOUNDER'S DILEMMAS (2012). AN ATTRIBUTION SURVEY — INVESTOR JUDGMENT, NOT MEASURED CAUSATION. WE QUOTE IT AS EXACTLY THAT.

+28.6%

increase in the hazard of a co-founder leaving, per additional prior social relationship within the founding team. Teams of friends proved less stable than teams of strangers.

SOURCE: WASSERMAN'S DATASET OF THOUSANDS OF US STARTUPS, THE FOUNDER'S DILEMMAS (2012).

73%

of founding teams split equity within a month of founding — before contributions are knowable. A third split equally; most equal splitters settled it in a day or less.

SOURCE: WASSERMAN (2012). SPEED OF AGREEMENT AS A PROXY FOR AVOIDED NEGOTIATION.

>50%

of startups had replaced their founder-CEO by the third financing round; 62% of those successions were initiated by the board rather than the founder.

SOURCE: WASSERMAN (2012).

57 / 43

the approximate split of US founders motivated primarily by control versus primarily by wealth — the "Rich versus King" trade-off. Both are legitimate; inconsistency between motive and decisions is where the damage concentrates.

SOURCE: WASSERMAN (2012). THE INSTRUMENT MEASURES THIS DISPOSITION BEHAVIOURALLY, PER FOUNDER.

This instrument's own status — stated plainly

VenturePressure's scoring is deterministic and fully inspectable: fixed rules, versioned algorithms, reproducible scores, evidence trails on every finding. What it is not yet is a statistically validated predictive psychometric — and it will not describe itself as one until the data says otherwise. The ladder, and where we stand on it:

Structured practitioner judgment — explicit rules, thresholds set by expertise, every report human-reviewed ◄ WE ARE HERE
Provisional operating bands — thresholds refined against 30–50 respondents
Reliability and item analysis — internal consistency and discrimination statistics published
Empirical cut-points — distribution-based thresholds from a sufficient sample
Outcome tracking — configurations followed longitudinally against what actually happened

Every engagement moves the instrument up this ladder — pilot teams are told this explicitly, and it's why pilot pricing exists. In the meantime, the instrument's honesty mechanisms don't wait for statistics: signal-quality flags on every session, confidence levels on every finding, and a claim boundary on every report. If a finding doesn't survive your inspection at the debrief, it comes out of the report.

Why show you this?

Because the alternative is the industry standard: four-colour frameworks with validation studies nobody reads, sold with certainty nobody's earned. An instrument built on the premise that founding teams break through unexamined structure shouldn't itself have unexamined structure. Here's ours.