Explanation of the key parts & caveats
- Database: We store events, markets, and snapshots so you can later backtest, compute historical features, or track resolved outcomes.
- Market fetching & filtering: You fetch all markets and heuristically filter “economic” ones using token matching. You may instead use metadata or event categories if the API supports it.
- Signal / model: The simple model takes the implied probability (i.e. price) and then adjusts it based on naive sentiment from news titles. That’s extremely simplistic; you’ll want to replace that with a more rigorous model (e.g. time-series regression, macro forecasts, natural language sentiment, etc.).
- Confidence score: Here we base it on the absolute difference between the model and implied probabilities, scaled. You could also consider liquidity, volume, variance in historic predictions, or ensemble consistency (see the sketch after this list).
- Background / news lookup: We do a simple Google search and parse the top titles. In practice, you might want to integrate a proper news API (e.g. NewsAPI, RSS from Bloomberg / Reuters / Econ blogs) for more reliable results.
- Bet selection: We pick the highest-confidence non-neutral signal as the “best bet” for that run.
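For instance, a slightly richer confidence score might discount thin markets. A minimal sketch; the function name, the volume-based liquidity proxy, and the scaling constants are all illustrative assumptions, not part of the prototype:

def confidence_score(model_prob, implied_prob, volume, vol_scale=5000.0):
    """Blend edge size with a crude liquidity weight (constants are arbitrary)."""
    edge = min(1.0, abs(model_prob - implied_prob) * 5)   # same scaling as the prototype
    liquidity = min(1.0, (volume or 0) / vol_scale)       # 0..1, saturates at vol_scale contracts
    return edge * (0.5 + 0.5 * liquidity)                 # thin markets get discounted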
Extensions & improvements you should consider
- Better filtering of economic events: use event metadata from Kalshi (if available) instead of string heuristics.
- Time series / feature engineering: use historical snapshots, price movement, volume trends, volatility, momentum, etc., to build features.
- Sentiment / NLP model: use a proper sentiment analysis / news scoring model (e.g. from HuggingFace or OpenAI) rather than naive word matching.
- Risk management / position sizing: don't bet too much; consider limiting exposure, hedging across correlated markets, etc.
- Backtesting and evaluation: over time, compare your model's predictions to actual outcomes to refine weights and calibration.
- Automated trade execution: once your confidence is high, you can integrate the "place order" API endpoints and manage execution, slippage, etc.
- Rate limits, error handling, retries: add logic to handle API errors, HTTP rate limits, and network issues.
- Caching / incremental updates: instead of fetching all markets every time, just fetch changes / new snapshots.
- Better news retrieval: use RSS / API feeds from major economic news sources (Bloomberg, Reuters, Fed announcements, etc.), not raw Google scraping.
- Ice cream.
Welcome to the world where betting, machine intelligence, and markets collide. The goal of this project is simple (yet audacious): let AI identify value bets on economic prediction markets, automatically fetch background data, and rank the best bet with an explanation + a confidence score. In short: “Use AI to bet on f*cking everything.”
In this post, I’ll walk you through:
- Why betting on prediction markets is an interesting use case
- How the Kalshi API works (authentication, fetching markets, etc.)
- The architecture of the Python script
- How the model / signal is constructed
- Some caveats, risks, and ideas for improvement
- A worked example / thought experiment
- Next steps
Why prediction markets + AI?
Prediction markets (like Kalshi) let users trade binary “yes/no” contracts about future events. The current market price of a contract (e.g. $0.40) can be interpreted as the implied probability that the “yes” outcome will happen (e.g. 40%). Thus, these markets aggregate collective information and beliefs, and respond to new data in real time.
If you believe your AI / model / research can predict better than the market (or at least differently in a useful way), you can try to exploit that difference.
This is akin to “value betting” in sports or financial markets: find cases where your estimated probability > market-implied probability → expected value (EV) > 0.
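To make “EV > 0” concrete, here is a tiny illustrative helper (the function name is hypothetical; prices are expressed as probabilities in [0, 1]):

def expected_value_yes(model_prob: float, yes_price: float) -> float:
    """Expected profit per $1 'yes' contract, which pays $1 if the event resolves yes."""
    win = model_prob * (1.0 - yes_price)       # profit when the event happens
    loss = (1.0 - model_prob) * yes_price      # premium lost when it doesn't
    return win - loss                          # simplifies to model_prob - yes_price

# e.g. model says 42%, market prices 'yes' at 0.35 -> EV of about +0.07 per contract
print(expected_value_yes(0.42, 0.35))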
Prediction markets also have advantages:
- They are often more efficient and transparent (no hidden “vig” or juice like a sportsbook).
- They cover many domains (economics, events, politics), not just sports.
- The framework naturally lends itself to combining your own models + external data + sentiment.
Understanding the Kalshi API & authentication
API keys / signing requests
To interact with Kalshi programmatically, you need an API key. According to the docs:
- Go to your account / profile settings → “API Keys” → “Create New API Key.” You will be given two parts: a key ID and a private key (RSA format).
- The private key is only shown once — store it securely. You cannot retrieve it later.
- Each API request must be signed using RSA over a concatenation of timestamp, HTTP method, and path. You also send the headers KALSHI-ACCESS-KEY (the key ID), KALSHI-ACCESS-TIMESTAMP, and KALSHI-ACCESS-SIGNATURE.
Kalshi provides SDKs to help with signing / abstraction. The Python SDK (kalshi-python) is one of them.
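If you'd rather not use the SDK, here's roughly what manual signing looks like with the cryptography library. Treat this as a sketch: the message layout (millisecond timestamp + method + path) and RSA-PSS/SHA-256 padding follow the docs' description above, but verify the exact details and the path prefix against the current documentation.

import base64
import time

import requests
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def signed_get(path, key_id, private_key_pem):
    # path should include the API prefix, e.g. "/trade-api/v2/portfolio/balance"
    private_key = serialization.load_pem_private_key(private_key_pem.encode(), password=None)
    timestamp = str(int(time.time() * 1000))                 # milliseconds
    message = (timestamp + "GET" + path).encode()
    signature = private_key.sign(
        message,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.DIGEST_LENGTH),
        hashes.SHA256(),
    )
    headers = {
        "KALSHI-ACCESS-KEY": key_id,
        "KALSHI-ACCESS-TIMESTAMP": timestamp,
        "KALSHI-ACCESS-SIGNATURE": base64.b64encode(signature).decode(),
    }
    return requests.get("https://api.elections.kalshi.com" + path, headers=headers)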
Market / data endpoints
Once authenticated, you can use endpoints like:
- get_markets — list markets (with paging / cursors)
- get_event / get_market — fetch details of a specific event or market
- get_trades, get_orderbook, etc. — get historical trades, price depth, etc.
The API also supports public (unauthenticated) endpoints for market listing. The docs recommend starting with public endpoints like GetMarkets before diving into authenticated ones.
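As a quick unauthenticated smoke test, you can hit the markets endpoint directly. A sketch; field names such as ticker and title are assumptions and may differ slightly from the current response schema:

import requests

resp = requests.get(
    "https://api.elections.kalshi.com/trade-api/v2/markets",
    params={"limit": 10},
)
resp.raise_for_status()
for market in resp.json().get("markets", []):
    print(market.get("ticker"), market.get("title"))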
Using this API, our script fetches all markets, filters those tied to economic / macro events, and stores their latest prices, volumes, etc.
The architecture: how the script works (at a glance)
Here’s the high-level flow:
- Initialize / create a SQLite database with tables for events, markets, snapshots, and model signals.
- Fetch all markets (via pagination) using the Kalshi API.
- Filter markets whose name / ticker suggests they are economic (CPI, inflation, interest rates, GDP, unemployment, etc.).
- Insert or update the markets and event info into SQLite; also insert a snapshot entry (price, timestamp, volume).
- For each economic market, compute a “signal” — that is, compare the market-implied probability (based on price) vs. your model’s probability (augmented by news / sentiment).
- Rank signals by confidence, pick the strongest, and output the “best bet” with explanation and supporting news.
Optionally, one could extend this to automated trading (placing orders) if confidence is high enough.
Model / signal: comparing implied vs estimated + news
Here’s the intuition:
- The market-implied probability equals yes_price (e.g. 0.40) for the “yes” side (and 1 - yes_price for “no”).
- Your model tries to estimate a “true” probability for “yes” (based on your data, forecasts, sentiment).
- If model_prob > implied_prob + margin, that suggests value in betting “yes.”
- If model_prob < implied_prob - margin, you might bet “no.”
- Otherwise, the difference is too small → no bet.
In the prototype, we used a naive sentiment bias derived from news headlines:
- We fetch a few news titles about the event (e.g. “CPI inflation surge expected”)
- If words like “rise”, “surge”, “increase” show up, we add a small positive bias; if “fall”, “decline”, etc., we subtract a bias
- Then clamp the resulting model probability into [0.01, 0.99]
- Confidence is a function of how far model and implied diverge (scaled).
This is oversimplified, but it gives the framework to plug in any more advanced model (ML, time series, NLP, etc.).
import sqlite3
import time
import json
import requests
import math
from datetime import datetime, timezone
from kalshi_python import Configuration, KalshiClient
from urllib.parse import quote_plus
# (Optional) for news / web search
from bs4 import BeautifulSoup
# ========== Configuration & setup ==========
API_KEY_ID = "your_api_key_id"
PRIVATE_KEY_PEM = """-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----"""
# Base API host (use demo or production as appropriate)
API_HOST = "https://api.elections.kalshi.com/trade-api/v2"
config = Configuration(host=API_HOST)
config.api_key_id = API_KEY_ID
config.private_key_pem = PRIVATE_KEY_PEM
client = KalshiClient(config)
# SQLite setup
DB_PATH = "kalshi_econ.db"
def init_db():
    conn = sqlite3.connect(DB_PATH)
    c = conn.cursor()
    # event table
    c.execute("""
        CREATE TABLE IF NOT EXISTS events (
            event_ticker TEXT PRIMARY KEY,
            name TEXT,
            category TEXT,
            close_ts INTEGER,
            resolution TEXT
        )""")
    # market table
    c.execute("""
        CREATE TABLE IF NOT EXISTS markets (
            market_ticker TEXT PRIMARY KEY,
            event_ticker TEXT,
            yes_price REAL,
            no_price REAL,
            last_trade_ts INTEGER,
            volume REAL,
            FOREIGN KEY(event_ticker) REFERENCES events(event_ticker)
        )""")
    # trade history / snapshots
    c.execute("""
        CREATE TABLE IF NOT EXISTS market_snapshots (
            snapshot_id INTEGER PRIMARY KEY AUTOINCREMENT,
            market_ticker TEXT,
            ts INTEGER,
            yes_price REAL,
            no_price REAL,
            volume REAL
        )""")
    # optional: your model signals / bets
    c.execute("""
        CREATE TABLE IF NOT EXISTS model_signals (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            market_ticker TEXT,
            ts INTEGER,
            implied_prob REAL,
            model_prob REAL,
            signal TEXT,
            confidence REAL
        )""")
    conn.commit()
    conn.close()
# ========== Fetch & store data ==========
def fetch_all_markets(limit=1000):
    """Fetch all markets via pagination."""
    all_markets = []
    cursor = None
    while True:
        resp = client.get_markets(limit=limit, cursor=cursor)
        data = resp.data
        for m in data.markets:
            all_markets.append(m)
        cursor = data.cursor
        if not cursor:
            break
        # rate-limit sleep if needed
        time.sleep(0.2)
    return all_markets
def filter_economic_markets(markets):
    """Filter markets whose underlying event is economic in nature."""
    econ = []
    keywords = ["cpi", "inflation", "fed", "gdp", "unemployment", "rate", "ppi"]
    for m in markets:
        # Heuristic: check whether the market name or ticker mentions "CPI", "Fed", "inflation", "GDP", etc.
        name = m.name.lower() if hasattr(m, "name") else ""
        ticker = m.market_ticker.lower()
        if any(tok in name or tok in ticker for tok in keywords):
            econ.append(m)
    return econ
def store_markets(markets):
    conn = sqlite3.connect(DB_PATH)
    c = conn.cursor()
    for m in markets:
        # store event
        ev = m.event
        c.execute("""
            INSERT OR IGNORE INTO events(event_ticker, name, category, close_ts, resolution)
            VALUES (?, ?, ?, ?, ?)
        """, (ev.event_ticker, ev.name, ev.category if hasattr(ev, "category") else None,
              ev.close_ts, ev.resolution if hasattr(ev, "resolution") else None))
        # store market
        # The API may return "last_price" for the yes side and no = 1 - yes (depending on representation). Adapt as needed.
        # Here, assume m.last_price is the yes side, and no_price = 1 - yes_price.
        yes_price = m.last_price
        no_price = 1.0 - yes_price
        c.execute("""
            INSERT OR REPLACE INTO markets(market_ticker, event_ticker, yes_price, no_price, last_trade_ts, volume)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (m.market_ticker, ev.event_ticker, yes_price, no_price, m.last_trade_ts, m.volume))
        # snapshot
        c.execute("""
            INSERT INTO market_snapshots(market_ticker, ts, yes_price, no_price, volume)
            VALUES (?, ?, ?, ?, ?)
        """, (m.market_ticker, int(time.time()), yes_price, no_price, (m.volume or 0)))
    conn.commit()
    conn.close()
# ========== External news / background fetch ==========
def search_news_for_event(event_name, num=5):
    """Do a simple web search and return a list of (title, snippet, url)."""
    query = quote_plus(event_name + " outlook analysis 2025")
    url = f"https://www.google.com/search?q={query}"
    # (Note: Google search may block automated requests; you may need to use a search API.)
    headers = {"User-Agent": "Mozilla/5.0 (compatible)"}
    resp = requests.get(url, headers=headers)
    soup = BeautifulSoup(resp.text, "html.parser")
    results = []
    for g in soup.select(".kCrYT a"):
        href = g.get("href")
        if href and href.startswith("/url?q="):
            actual = href.split("/url?q=")[1].split("&sa=")[0]
            title = g.text
            results.append((title, "", actual))
        if len(results) >= num:
            break
    return results
# ========== Simple “model” & bet suggestion logic ==========
def compute_signal_for_market(market_ticker):
    """
    Get the latest market, compute implied probability, build a naive model for true probability,
    then compute signal and confidence.
    """
    conn = sqlite3.connect(DB_PATH)
    c = conn.cursor()
    c.execute("SELECT yes_price FROM markets WHERE market_ticker = ?", (market_ticker,))
    row = c.fetchone()
    if not row:
        conn.close()
        return None
    implied = row[0]

    # *** Simple model: treat implied as base, then adjust by news sentiment ***
    # For demonstration: if latest news has strong language ("sharp rise inflation") push model a bit.
    # A real model would parse economic forecasts, time series, etc.
    # Here we fetch news:
    c.execute("SELECT event_ticker FROM markets WHERE market_ticker = ?", (market_ticker,))
    evt = c.fetchone()[0]
    c.execute("SELECT name FROM events WHERE event_ticker = ?", (evt,))
    ev_name = c.fetchone()[0]
    news = search_news_for_event(ev_name, num=3)

    # Very crude sentiment: if news titles contain "rise", "surge", "jump" → upward bias
    bias = 0.0
    for title, _, _ in news:
        t = title.lower()
        if "surge" in t or "rise" in t or "increase" in t or "jump" in t:
            bias += 0.02
        if "fall" in t or "decline" in t or "drop" in t:
            bias -= 0.02

    model_prob = implied + bias
    # clamp
    model_prob = max(0.01, min(0.99, model_prob))

    if model_prob > implied + 0.01:
        signal = "bet_yes"
    elif model_prob < implied - 0.01:
        signal = "bet_no"
    else:
        signal = "no_bet"

    # Confidence: based on magnitude of difference and number of sentiment signals
    diff = abs(model_prob - implied)
    confidence = min(1.0, diff * 5)  # e.g. if diff = 0.1 → confidence = 0.5

    # Persist the signal so you can study calibration / performance over time
    c.execute("""
        INSERT INTO model_signals(market_ticker, ts, implied_prob, model_prob, signal, confidence)
        VALUES (?, ?, ?, ?, ?, ?)
    """, (market_ticker, int(time.time()), implied, model_prob, signal, confidence))
    conn.commit()
    conn.close()

    return {
        "market_ticker": market_ticker,
        "implied_prob": implied,
        "model_prob": model_prob,
        "signal": signal,
        "confidence": confidence,
        "news": news,
    }
def choose_best_bet(signals):
    """
    Among signals, pick the one with the highest confidence (and non-neutral) as the "best bet".
    """
    best = None
    for s in signals:
        if s["signal"] != "no_bet":
            if best is None or s["confidence"] > best["confidence"]:
                best = s
    return best
# ========== Main orchestration ==========
def main():
    init_db()
    print("Fetching markets …")
    markets = fetch_all_markets(limit=500)
    print(f"Fetched {len(markets)} markets")
    econ_markets = filter_economic_markets(markets)
    print(f"Filtered {len(econ_markets)} economic markets")
    store_markets(econ_markets)

    # compute signals for each econ market
    signals = []
    for m in econ_markets:
        sig = compute_signal_for_market(m.market_ticker)
        if sig:
            signals.append(sig)

    # pick best bet
    best = choose_best_bet(signals)
    if best:
        print("=== Best bet recommendation ===")
        print(f"Market: {best['market_ticker']}")
        print(f"Signal: {best['signal']}")
        print(f"Model prob: {best['model_prob']:.3f}, Implied prob: {best['implied_prob']:.3f}")
        print(f"Confidence: {best['confidence']:.3f}")
        print("News influencing decision:")
        for title, _, url in best["news"]:
            print(f" - {title} → {url}")
    else:
        print("No strong bet signal at this time.")

if __name__ == "__main__":
    main()
This approach mirrors the core idea of “value betting” in sports: convert odds → implied probability; compare with your own estimate; bet when you believe your estimate is more accurate.
It also echoes the principle that calibration (the match between predicted probabilities and actual frequencies) is often more valuable in betting models than mere accuracy.
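Once you start recording how markets actually resolve, checking calibration can be as simple as a Brier score over your stored signals. A sketch, assuming a hypothetical outcomes table with a 0/1 outcome column (the prototype schema doesn't include one yet):

import sqlite3

def brier_score(db_path="kalshi_econ.db"):
    """Mean squared error between model_prob and the realized outcome (0 or 1)."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT s.model_prob, o.outcome "
        "FROM model_signals s JOIN outcomes o ON o.market_ticker = s.market_ticker"
    ).fetchall()
    conn.close()
    if not rows:
        return None
    return sum((p - y) ** 2 for p, y in rows) / len(rows)  # lower is better calibrated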
Code walkthrough (key parts)
Here’s a deeper dive into important segments:
Database & schema
We create tables:
events(event_ticker PRIMARY KEY, name, category, close_ts, resolution)
markets(market_ticker PRIMARY KEY, event_ticker, yes_price, no_price, last_trade_ts, volume)
market_snapshots(snapshot_id, market_ticker, ts, yes_price, no_price, volume)
model_signals(id, market_ticker, ts, implied_prob, model_prob, signal, confidence)
This structure lets you track historical price changes and how your model’s signals evolve over time.
Fetching & storing markets
- fetch_all_markets() pages through results using the cursor until it is exhausted.
- filter_economic_markets() is a heuristic: it simply checks whether the market or event name contains tokens like “cpi”, “inflation”, etc. You could improve this by relying on richer metadata from the API if available.
- store_markets() inserts/updates market and event rows, plus snapshot logs.
Signal computation
compute_signal_for_market() is the meat of the decision logic:
- Reads the latest yes_price → implied probability
- Fetches the event name → runs a simple Google search to get a few news titles
- Computes a “bias” from those titles (rudimentary sentiment)
- Sets model_prob = implied + bias (clamped)
- Determines a signal (“bet_yes”, “bet_no”, or “no_bet”) based on margin
- Assigns a confidence = min(1.0, diff * 5) as a scaling of how big the gap is
choose_best_bet() picks the signal with the highest confidence (non-neutral).
Orchestration (main())
- Initialize DB
- Fetch markets
- Filter & store
- Compute signals for each econ market
- Pick & print the best bet + explanation + news
You could extend this loop to run periodically (cron / daemon), track your bet results, or trigger trade execution.
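A minimal periodic runner is just a loop around main() (or a cron entry invoking the script); a sketch:

import time

if __name__ == "__main__":
    while True:
        main()                # fetch, filter, score, and print the best bet
        time.sleep(60 * 60)   # re-run hourly; pick whatever cadence suits your data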
Worked Example / Thought Experiment
Suppose one of the markets is:
Market: “Will U.S. CPI YoY exceed 4.0% in June 2025?”
Yes price: 0.35 → implied probability = 35%
No price: 0.65 → implied probability = 65%
Your AI model, using inflation data, Fed communications, supply chain indicators, etc., estimates that the true chance of CPI > 4.0% is 42%. Meanwhile, a recent news article says “inflation pressures intensifying — CPI expected to surge.” In the naive prototype, that headline gives a small positive sentiment bias (+0.02), so model_prob = 0.35 + 0.02 = 0.37 (in practice you'd let the full 42% estimate drive model_prob, not just the headline bias).
Since 0.37 > 0.35 + 0.01, the signal is “bet_yes”, with a confidence proportional to (0.37 − 0.35), i.e. min(1.0, 0.02 × 5) = 0.10 under the prototype's scaling.
If, among all economic markets, that signal has the greatest confidence, you choose it as your best bet. You then output the news articles that triggered the bias, the implied vs. model probabilities, and the confidence score.
You might then execute a small position (e.g. a fraction of your bankroll using the Kelly criterion or capped sizing) if confidence is high, as sketched below.
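For reference, a sketch of full-Kelly sizing for a binary contract (in practice most people scale this down, e.g. quarter-Kelly); the function name is illustrative:

def kelly_fraction(model_prob, yes_price):
    """Fraction of bankroll to stake on 'yes': buying at yes_price risks yes_price to win 1 - yes_price."""
    b = (1.0 - yes_price) / yes_price                   # net odds per unit staked
    f = (b * model_prob - (1.0 - model_prob)) / b
    return max(0.0, f)                                  # never stake on a negative edge

# Worked example above: p = 0.42, price = 0.35 -> about 0.108, i.e. ~10.8% of bankroll at full Kelly
print(round(kelly_fraction(0.42, 0.35), 3))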
Strengths, limitations & improvements
Strengths
- Modular / extensible: You can plug in better models (ML, ensemble, time series) instead of naive bias.
- Transparent explanations: You output news, probability comparisons, and a confidence score, so you understand the “why” behind each recommendation.
- Persisted history: Using SQLite lets you backtest, track signal performance, and refine over time.
- Data fusion: You combine market data + external signals (news) in a unified pipeline.
Key limitations & risks
- Naive sentiment model: The biasing from simple keyword matches is extremely fragile. Real news parsing / sentiment analysis (e.g. via NLP models) is needed.
- Overfitting / data snooping: If you tailor your model too much to past events, you may “find patterns” that don’t generalize.
- Market efficiency: Prediction markets may already price in most publicly available info. The alpha opportunity may be very small or disappearing fast.
- Transaction costs / slippage / liquidity: Even if your model sees value, execution (bid/ask spreads, insufficient liquidity) may eliminate profit.
- Confidence calibration: Your confidence scaling is ad hoc; better calibration (e.g. Bayesian, historical signal accuracy) is needed.
- Legal / regulatory / financial risk: Betting/trading involves capital risk; always be cautious.
Suggested improvements
- Use an NLP / sentiment model (e.g. a transformer) to score news, not just keyword heuristics (see the sketch at the end of this post).
- Use time series features: price momentum, volatility, cross-market correlations.
- Calibrate your confidence based on how past signals fared (e.g. a Bayesian score or rolling accuracy).
- Use proper bet sizing (Kelly criterion or fractional Kelly).
- Incorporate trade execution logic with risk controls (max stake, stop losses).
- Expand external data: macroeconomic reports, central bank minutes, expert forecasts, research papers.
- Run distributed / ensemble models and compare consistency across them.
- Include backtesting: simulate how your signals would have performed historically to validate your strategy.
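As a sketch of the NLP / sentiment suggestion above: swap the keyword matching for an off-the-shelf transformer pipeline. The general-purpose default model here is a placeholder (a finance-tuned model would likely suit economic headlines better), and mapping “positive sentiment” to an upward probability bias is market-specific, so treat the scale and sign conventions as assumptions.

from transformers import pipeline  # pip install transformers

sentiment = pipeline("sentiment-analysis")  # general-purpose placeholder model

def headline_bias(titles, scale=0.05):
    """Map average headline sentiment in [-1, 1] to a small probability adjustment."""
    if not titles:
        return 0.0
    signed_scores = []
    for result in sentiment(titles):
        sign = 1.0 if result["label"] == "POSITIVE" else -1.0
        signed_scores.append(sign * result["score"])
    return scale * sum(signed_scores) / len(signed_scores)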