
Proteus: Prediction Markets Scored by Edit Distance

Sean McDonald with many AI models

February 2026


tl;dr

Roleplay as a public figure, predict the exact text of their next tweet, and get scored by Levenshtein edit distance — closest prediction wins the pool. Unlike binary prediction markets that encode 1 bit (yes/no), every character of precision matters here. The market gets deeper as AI improves because each remaining edit becomes more valuable.


The Idea

Binary prediction markets ask: will X happen? The answer is yes or no — 1 bit of information. When two participants both predict correctly, neither has an edge over the other.

Proteus asks a different question: what exactly will this person say?

Participants predict the exact text a public figure will post on X (Twitter). Predictions are scored by Levenshtein edit distance — the minimum number of single-character insertions, deletions, or substitutions needed to transform the prediction into the actual post. The closest prediction wins the pool.

This changes the game in a fundamental way. In a binary market, “roughly right” and “exactly right” pay the same. In a Levenshtein market, a prediction that’s off by 1 character beats one that’s off by 8 characters. The payoff is a gradient, not a cliff.

Why this matters as AI improves. Binary markets commoditize when every model converges on the same probability estimate — the spread vanishes and there’s nothing left to win. Text prediction markets do the opposite: as AI models get better and edit distances shrink from 100 to 10 to 1, each remaining edit becomes worth more of the pool, not less. The approaching AI capability explosion doesn’t flatten this market. It deepens it.


Why X (Twitter)

The scoring mechanism needs a text source that is:

  • Public — anyone can verify the actual text
  • Timestamped — unambiguous ordering for resolution
  • Short — 280 characters keeps on-chain computation feasible
  • Attributable — tied to a specific, verified account

X is the only major platform where all four properties hold simultaneously. Instagram is image-first. LinkedIn posts are long-form. Threads lacks the cultural footprint. TikTok is video. X posts are short, public, timestamped, and attributable — exactly what an on-chain scoring function needs.


Why Levenshtein Distance

Levenshtein distance (Levenshtein, 1966) counts the minimum number of single-character edits (insertions, deletions, substitutions) to transform one string into another. It is a proper metric:

  1. Identity: d_L(a, b) = 0 if and only if a = b (perfect prediction is unambiguous)
  2. Symmetry: d_L(a, b) = d_L(b, a) (scoring doesn’t depend on comparison order)
  3. Triangle inequality: d_L(a, c) ≤ d_L(a, b) + d_L(b, c) (distances are coherent)

These properties matter for market design. Identity means a perfect prediction is uniquely identifiable. Symmetry means the scoring is fair. The triangle inequality means “close to close” can’t simultaneously be “far” — the geometry is well-behaved.
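These properties can be spot-checked against a standard dynamic-programming implementation (the Wagner–Fischer algorithm cited in the references). A minimal Python sketch for illustration; the deployed contract implements the same recurrence in Solidity:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, and substitutions
    to turn `a` into `b` (Wagner-Fischer dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]

# The three metric properties, spot-checked on small strings:
x, y, z = "kitten", "sitting", "sitten"
assert levenshtein(x, x) == 0                                      # identity
assert levenshtein(x, y) == levenshtein(y, x)                      # symmetry
assert levenshtein(x, z) <= levenshtein(x, y) + levenshtein(y, z)  # triangle
```

The two-row formulation keeps memory linear in the shorter string, which is the same trick that keeps the on-chain version within reach.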

The metric also provides anti-spam for free. A random 280-character string will be at near-maximum edit distance from any real tweet. There’s no way to game the system with garbage input — the metric itself is the spam filter.
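The spam-resistance claim is easy to sanity-check. A quick sketch using a straightforward Wagner–Fischer implementation and a seeded random string so the run is reproducible (the sample tweet is invented for the demonstration):

```python
import random

def levenshtein(a: str, b: str) -> int:
    # Standard Wagner-Fischer dynamic program, O(len(a) * len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

tweet = "Copilot is now generating 46% of all new code at GitHub-connected enterprises."
random.seed(42)  # deterministic for the example
garbage = "".join(random.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(280))

d = levenshtein(garbage, tweet)
# The distance can never drop below the length gap between the strings,
# so a 280-character random submission against a ~78-character tweet is
# guaranteed to score worse than any remotely plausible prediction.
assert d >= 280 - len(tweet)
assert d > 200
```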


One Example

Note: This example is constructed and illustrative, not real data. It demonstrates why Levenshtein scoring captures information that binary markets cannot.

Market: What will @sataborasu (Satya Nadella) post?

Actual text: Copilot is now generating 46% of all new code at GitHub-connected enterprises. The AI transformation of software is just beginning.

  • Claude (roleplay): “Copilot is now generating 45% of all new code at GitHub-connected enterprises. The AI transformation of software is just beginning.” (d_L = 1)
  • GPT (roleplay): “Copilot is now generating 43% of all new code at GitHub-connected enterprises. The AI transformation of software has just begun.” (d_L = 8)

Claude’s prediction differs from the actual text by a single character: the digit 5 in place of 6. d_L = 1.

GPT’s prediction has a different number (43% vs 46%) and a phrasing difference (has just begun vs is just beginning). d_L = 8.

In a binary market, both models “predicted correctly” — Nadella posted about Copilot code generation, which both anticipated. A binary contract sees no difference between them.

In a Levenshtein market, Claude wins the entire pool. The 7-edit gap between d_L = 1 and d_L = 8 is the margin of victory. Every character of precision was worth money.

This is the core thesis: binary markets collapse two meaningfully different predictions into the same outcome. Levenshtein distance preserves the gradient.
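Replaying the constructed example through a reference Levenshtein implementation reproduces both scores (Python sketch; the strings are the illustrative ones from the example, not real posts):

```python
def levenshtein(a: str, b: str) -> int:
    # Standard Wagner-Fischer dynamic program.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

actual = ("Copilot is now generating 46% of all new code at GitHub-connected "
          "enterprises. The AI transformation of software is just beginning.")
claude = actual.replace("46%", "45%")  # one substituted digit
gpt = ("Copilot is now generating 43% of all new code at GitHub-connected "
       "enterprises. The AI transformation of software has just begun.")

assert levenshtein(claude, actual) == 1  # single-character edit
assert levenshtein(gpt, actual) == 8     # 1 digit + 7 edits in the ending
```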


How It Works

Contract Design

The prototype is a Solidity smart contract (PredictionMarketV2) deployed on BASE Sepolia (Coinbase L2, OP Stack).

Flow:

  1. A market is created for a specific X handle and time window
  2. Participants submit predictions (up to 280 characters) with a stake of at least 0.001 ETH
  3. Submissions close 1 hour before the market end time
  4. After the window closes, the market creator resolves by providing the actual text
  5. The contract computes Levenshtein distance on-chain for each submission
  6. Closest prediction wins the pool (minus 7% platform fee)

Key parameters:

Parameter          Value
PLATFORM_FEE_BPS   700 (7%)
MIN_BET            0.001 ETH
BETTING_CUTOFF     1 hour before market end
MIN_SUBMISSIONS    2 (single submission gets full refund)
MAX_TEXT_LENGTH    280 characters

The Null Sentinel

To bet that someone won’t post, submit __NULL__. If resolution also uses __NULL__ (nobody posted), d_L = 0 — a perfect match.

This creates a market primitive that binary contracts can’t express: betting on silence. AI roleplay models always generate text; they can’t predict inaction. A human trader who recognizes someone is unlikely to post can exploit this.

Winner-Take-All Payout

fee = floor(totalPool × 700 / 10000)   // 7% platform fee
payout = totalPool − fee               // 93% to winner
winner = argmin(d_L(prediction, actualText))

Ties are broken by earliest submission timestamp. The fee is collected via a pull pattern to prevent griefing.
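The settlement logic, including the floor-division fee and the timestamp tie-break, fits in a few lines. A Python sketch; the submission tuples and wei amounts are illustrative assumptions, not the contract's actual storage layout:

```python
PLATFORM_FEE_BPS = 700     # 7%, as in the parameter table
BPS_DENOMINATOR = 10_000

def settle(total_pool_wei: int, submissions):
    """Winner-take-all settlement sketch.

    `submissions` holds (submitter, levenshtein_distance, submission_timestamp)
    tuples; this layout is hypothetical, chosen for the example.
    """
    fee = total_pool_wei * PLATFORM_FEE_BPS // BPS_DENOMINATOR  # floor
    payout = total_pool_wei - fee                               # 93% to winner
    # argmin over distance; ties broken by earliest submission timestamp
    winner, _, _ = min(submissions, key=lambda s: (s[1], s[2]))
    return winner, payout, fee

subs = [
    ("alice", 8, 1_700_000_000),   # d_L = 8
    ("bob",   1, 1_700_000_600),   # d_L = 1, submitted later
    ("carol", 1, 1_700_000_300),   # d_L = 1, submitted earlier: wins the tie
]
winner, payout, fee = settle(10**18, subs)  # 1 ETH pool
assert winner == "carol"
assert fee == 7 * 10**16       # 0.07 ETH to the platform
assert payout == 93 * 10**16   # 0.93 ETH to the winner
```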

Gas Costs

On-chain Levenshtein computation at 280 characters costs ~9,000,000 gas on BASE L2 — feasible given BASE’s higher block gas limits and low transaction fees. The 280-character cap (matching tweet length) doubles as a gas ceiling.
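As a back-of-envelope consistency check (an inference from the quoted total, not a measured benchmark), the quadratic dynamic-programming table at the 280-character cap implies a per-cell cost on the order of a hundred gas:

```python
# Rough consistency check on the quoted ~9M gas figure.
MAX_TEXT_LENGTH = 280
cells = MAX_TEXT_LENGTH * MAX_TEXT_LENGTH  # worst-case DP cell updates: 78,400
quoted_gas = 9_000_000                     # figure quoted in the text

gas_per_cell = quoted_gas / cells
print(f"{cells} cells, ~{round(gas_per_cell)} gas per cell update")
```

This also makes the cap's role as a gas ceiling concrete: cost grows with the product of the two string lengths, so bounding both at 280 bounds the whole computation.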


Open Problems

These are unsolved. We’re not hand-waving mitigations — these are real issues that need real solutions before any mainnet deployment.

Self-oracle attack. A participant creates a market for their own account, submits a prediction, then posts exactly that text. d_L = 0, guaranteed win. The MIN_SUBMISSIONS = 2 requirement means they need at least one other participant, but the economic incentive is strong and pseudonymous identities make detection hard.

Centralized resolution. The market creator currently provides the resolution text via a single EOA (externally owned account). This is a single point of trust. A commit-reveal oracle with staking and slashing would be better, but doesn’t exist yet.

No live validation. The contract has no way to verify that the resolution text actually matches what was posted on X. Resolution depends entirely on the market creator’s honesty. An oracle bridge to X’s API (or screenshot proof via IPFS) is needed but not built.

Insider information. Someone with advance knowledge of what a public figure will post (a ghostwriter, PR team member, or someone who saw a draft) has an overwhelming advantage. This isn’t necessarily a bug — prediction markets generally reward information — but the edge is so large (d_L = 0 is trivially achievable) that it may deter participation.

Gas costs at scale. 9M gas per resolution works for a prototype. At scale with hundreds of submissions per market, the resolution transaction could exceed block gas limits. Batched resolution or off-chain computation with on-chain verification (e.g., ZK proofs) may be needed.

AI-induced behavior modification. If a public figure knows a prediction market exists for their posts, they may deliberately change their wording to invalidate predictions. This could make markets less useful or, conversely, create a meta-game that some find entertaining.


What Exists

This is a prototype. It has not been audited, is not on mainnet, and is not ready for real money.

What’s built:

  • ~513 lines of Solidity implementing PredictionMarketV2 with on-chain Levenshtein computation
  • 259+ passing tests (109 contract, 135 unit, 15 integration)
  • Deployed on BASE Sepolia (Coinbase L2 testnet)
  • Flask web application for market creation and participation

The code is open source: github.com/timepointai/proteus


References

  • Levenshtein, V. I. (1966). “Binary codes capable of correcting deletions, insertions, and reversals.” Soviet Physics Doklady, 10(8), 707-710.
  • Wagner, R. A., & Fischer, M. J. (1974). “The string-to-string correction problem.” Journal of the ACM, 21(1), 168-177.