BitGuard: Bitcoin Address Risk Intelligence
Mission
Build a data-driven system that identifies risky Bitcoin wallets before fraud occurs, helping exchanges, businesses, and individuals transact confidently and safely in the crypto ecosystem.
Website: bitguard.pro
Why This Problem Matters
Unlike traditional finance, Bitcoin is pseudonymous and its transactions are irreversible. That creates a simple but serious problem: users often have to decide whether to send funds without a clear way to assess risk first. An estimated $17B was lost to crypto scams and fraud in 2025 alone.1,8 Roughly 50% of transactions come from new users who are unaware of scam risks.2 Yet existing crypto risk tools are either unreliable or locked behind enterprise-level paywalls. BitGuard was built to meet that need: it provides a fast, understandable risk signal before money moves.
Target Users & Pain Points
What We Heard
- Explainability had to be part of the MVP.
- The interface had to feel lightweight and immediately understandable.
- Output needed to support a real send / do not send decision.
- Users want fast clarity, not deep investigation.
- Trust comes from a score with a simple explanation.
Who We Built For
- Retail crypto users who are not blockchain-literate, and users who copy addresses from the wild and want a quick confidence check.
- AI agents interacting with our API via the x402 payment protocol. AI agents are emerging as a key driver in prediction markets ($6 billion in weekly volume in March 2026) and should be able to call our API seamlessly to transact cryptocurrency with confidence.
MVP Experience
Website: bitguard.pro
- User pastes a Bitcoin address into BitGuard
- The API returns, typically in under 5 seconds:
  - a risk category derived from the model's confidence score
  - plain-language explanations of why the risk was identified
  - attributes of the address and its transactions, including a link to mempool.space
- The user can use that output to decide whether to continue or investigate further.
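To make the response shape concrete, here is a minimal Python sketch of how a model probability and per-feature SHAP values could be turned into the output above. The risk thresholds, feature names, explanation templates, and the `build_response` helper are hypothetical, not BitGuard's actual values; only the mempool.space address-URL pattern is real.

```python
# Hypothetical probability cutoffs; the writeup does not state BitGuard's actual bands.
RISK_BANDS = [(0.8, "High risk"), (0.5, "Elevated risk"), (0.2, "Low risk")]

# Hypothetical plain-language templates keyed by illustrative feature names.
EXPLANATIONS = {
    "coinjoin_tx_count_hop1": "Frequent CoinJoin (mixing) activity near this address",
    "dust_tx_count_hop1": "Many dust transactions in this address's network",
    "unique_recipients_out": "Sends to an unusually large number of unique recipients",
    "active_block_span": "Short burst of activity, typical of burner wallets",
}


def risk_category(probability: float) -> str:
    """Map a model probability to a coarse risk category."""
    for threshold, label in RISK_BANDS:
        if probability >= threshold:
            return label
    return "Minimal risk"


def build_response(address: str, probability: float,
                   shap_values: dict[str, float], top_k: int = 3) -> dict:
    """Assemble a JSON-serializable response from a score and SHAP values."""
    # Keep only features that pushed the score toward "illicit" (positive SHAP),
    # largest contribution first.
    drivers = sorted(
        (item for item in shap_values.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )[:top_k]
    return {
        "address": address,
        "risk_category": risk_category(probability),
        "probability": round(probability, 3),
        # Fall back to the raw feature name if no template exists.
        "explanations": [EXPLANATIONS.get(name, name) for name, _ in drivers],
        "mempool_url": f"https://mempool.space/address/{address}",
    }
```

For example, `build_response("bc1qexample", 0.91, {"coinjoin_tx_count_hop1": 0.42, "dust_tx_count_hop1": 0.17, "active_block_span": -0.05})` (placeholder address) returns a "High risk" category whose explanations list the CoinJoin driver and then the dust driver; the negative contribution is left out because it argues against risk rather than explaining it.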
Exploratory Graph Data Analysis and Pipeline
KEY EDA TAKEAWAYS
We chose a graph database over a relational database because it allows quick exploration of multi-hop transaction relationships. BitGuard uses the EBA dataset3,5, presented at NeurIPS in December 2025 as the first publicly available graph dataset spanning Bitcoin's full history. It contains over 16 years' worth of transaction data, from Bitcoin's inception until recently, with ~2.4 billion nodes and ~26.3 billion edges. We evaluated several expansion strategies, including 3-hop and dynamic-hop expansion, before settling on 2 hops. Supernode avoidance was necessary: a 5,000 ms time limit and a 1,000-transaction limit keep queries from traversing crypto exchanges and keep the analysis focused on wallet → wallet behavior. Despite major compute hurdles with a 2 TB graph, we ultimately decided to work with the full graph to save time.
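The 2-hop expansion with supernode avoidance can be sketched as a breadth-first search. This is a simplified, in-memory Python illustration rather than BitGuard's Cypher query: it assumes the graph is a plain adjacency dict, treats the 1,000-transaction limit as a cap on a neighbor's degree, and treats 5,000 ms as a wall-clock budget for the whole expansion. `expand_two_hops` is a hypothetical name.

```python
import time
from collections import deque


def expand_two_hops(graph: dict[str, list[str]], seed: str, max_hops: int = 2,
                    max_degree: int = 1000, time_budget_ms: int = 5000):
    """Breadth-first expansion from `seed` that refuses to pass through supernodes.

    graph maps an address to its neighboring addresses, one entry per
    transaction edge, so len(graph[addr]) approximates its transaction count.
    Returns (address -> hop distance, set of skipped supernodes).
    """
    deadline = time.monotonic() + time_budget_ms / 1000
    visited = {seed: 0}
    skipped: set[str] = set()
    queue = deque([seed])
    while queue:
        if time.monotonic() > deadline:
            break  # return a partial neighborhood instead of running unbounded
        node = queue.popleft()
        hop = visited[node]
        if hop == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor in visited or neighbor in skipped:
                continue
            if len(graph.get(neighbor, [])) > max_degree:
                skipped.add(neighbor)  # likely an exchange or other service hub
                continue
            visited[neighbor] = hop + 1
            queue.append(neighbor)
    return visited, skipped
```

Skipping high-degree nodes outright, rather than visiting them and stopping there, keeps exchange hot wallets out of the neighborhood entirely, which matches the goal of focusing on wallet → wallet behavior.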
Training Data Sources: BitcoinHeist and Elliptic++
BITCOINHEIST
Academic dataset of Bitcoin ransomware addresses, developed by Akcora et al. and published in the UCI Machine Learning Repository4,9, covering 28 ransomware families.
ELLIPTIC++
Graph-based dataset of illicit Bitcoin addresses developed at Georgia Tech7, covering a broad range of illicit wallet categories including darknet markets, exchange hacks, and scams.
WHY BOTH?
Combining these datasets gives BitGuard broader coverage of illicit wallet types and behaviors: BitcoinHeist contributes ransomware-focused labels, while Elliptic++ adds darknet markets, exchange hacks, and scams. Together they provide approximately 513,000 wallets with their on-chain data, ~35,000 of them illicit, for a class imbalance of roughly 13:1 (licit to illicit).
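Combining two label sources raises a practical question: what to do when both datasets label the same address. A minimal sketch, assuming each source has already been reduced to an address → 0/1 map (1 = illicit) and that an illicit label from either source wins; this is a hypothetical policy, since the writeup does not say how overlaps were resolved.

```python
def merge_labels(heist: dict[str, int], elliptic: dict[str, int]) -> dict[str, int]:
    """Union two address -> label maps (1 = illicit, 0 = licit).

    Hypothetical policy: an address labeled illicit by either source stays illicit.
    """
    merged = dict(heist)
    for address, label in elliptic.items():
        merged[address] = max(merged.get(address, 0), label)
    return merged


def imbalance(labels: dict[str, int]) -> tuple[int, int, float]:
    """Return (licit count, illicit count, licit:illicit ratio)."""
    illicit = sum(labels.values())
    licit = len(labels) - illicit
    return licit, illicit, licit / illicit
```

The resulting licit:illicit ratio is the value LightGBM's `scale_pos_weight` parameter is commonly set to for imbalanced binary problems; whether BitGuard used it, `is_unbalance`, or neither is not stated here.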
Feature Engineering and Key Signals
We translated Neo4j graph output into 138 features, grouped by hop distance and direction. The strongest signals were CoinJoin activity and dust transactions, which appeared 7x and 8x more frequently, respectively, in illicit wallets. Overall, the feature set captures the following patterns:
- tx volume: BTC sent and received, max, average, etc.
- network structure: unique wallet interactions, transaction asymmetry, network depth
- CoinJoin txs: mixing patterns used to hide fund origins, commonly used by illicit wallets
- dust txs: tiny BTC amounts used to expose a target wallet’s network, common in illicit wallets
- round-number txs: suspiciously large round-number amounts that may indicate laundering
- temporal behavior: active lifespan measured in blocks (block height serves as a proxy for time)
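Several of the signals above can be computed directly from a wallet's transactions. The Python sketch below shows single-wallet (hop-0) versions only; the real pipeline groups features by hop distance and direction, which this does not reproduce. The `Tx` record, the CoinJoin heuristic (several distinct inputs plus at least three equal-value outputs), and the 0.01 BTC round-number unit are hypothetical; the 546-satoshi dust threshold is Bitcoin Core's default for P2PKH outputs.

```python
from collections import Counter
from dataclasses import dataclass

DUST_LIMIT_SATS = 546        # Bitcoin Core's default dust threshold for a P2PKH output
ROUND_UNIT_SATS = 1_000_000  # hypothetical: multiples of 0.01 BTC count as "round"
MIN_EQUAL_OUTPUTS = 3        # hypothetical CoinJoin heuristic parameter


@dataclass
class Tx:
    block_height: int
    inputs: list[tuple[str, int]]   # (address, value in satoshis)
    outputs: list[tuple[str, int]]  # (address, value in satoshis)


def looks_like_coinjoin(tx: Tx) -> bool:
    """Simplified heuristic: several distinct input addresses plus a group of
    at least MIN_EQUAL_OUTPUTS outputs with identical value."""
    distinct_inputs = len({addr for addr, _ in tx.inputs})
    value_counts = Counter(value for _, value in tx.outputs)
    largest_equal_group = max(value_counts.values(), default=0)
    return distinct_inputs >= MIN_EQUAL_OUTPUTS and largest_equal_group >= MIN_EQUAL_OUTPUTS


def wallet_features(wallet: str, txs: list[Tx]) -> dict[str, float]:
    """Hop-0 features for one wallet from the transactions it appears in."""
    sent = received = 0
    dust_received = round_sent = coinjoins = 0
    recipients: set[str] = set()
    heights = [tx.block_height for tx in txs]
    for tx in txs:
        spends = any(addr == wallet for addr, _ in tx.inputs)
        if looks_like_coinjoin(tx):
            coinjoins += 1
        for addr, value in tx.outputs:
            if addr == wallet:
                received += value
                if value < DUST_LIMIT_SATS:
                    dust_received += 1
            elif spends:
                # Approximation: every non-self output of a tx the wallet spends in
                # counts as "sent", even when other parties funded part of the tx.
                sent += value
                recipients.add(addr)
                if value > 0 and value % ROUND_UNIT_SATS == 0:
                    round_sent += 1
    return {
        "btc_sent": sent / 1e8,
        "btc_received": received / 1e8,
        "dust_outputs_received": dust_received,
        "round_outputs_sent": round_sent,
        "coinjoin_tx_count": coinjoins,
        "unique_recipients": len(recipients),  # high values suggest hub-and-spoke
        "active_block_span": max(heights) - min(heights) if heights else 0,
    }
```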
Modeling Approach and Selection
The modeling decision came down to balancing performance, interpretability, and inference speed. Logistic Regression and Random Forest provided useful baselines with high interpretability and fast inference, but they could not capture our complex feature interactions as well as other options. Graph Neural Networks were a natural candidate for graph data, but their limited interpretability (“black box”) and comparatively slow inference made GNNs a poor fit for our requirements. Ultimately, we selected LightGBM for its best overall performance on the test set; it remains fast and interpretable through SHAP values, a key requirement for making results explainable and justifiable to our end users. We prioritized high accuracy (0.98); the gap to our F1 score (0.88) reflects the 13:1 class imbalance, which inflates accuracy. Our false positive rate was only 0.3%, but our false negative rate was 17%. Additional training data and further hyperparameter tuning could improve performance.
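The reported metrics are mutually consistent, which a quick worked check confirms. The sketch works in expected counts per illicit wallet and assumes the test set has the same 13:1 mix as the training data, with the stated 0.3% false positive rate and 17% false negative rate.

```python
def metrics_from_rates(fpr: float, fnr: float, neg_per_pos: float) -> dict[str, float]:
    """Derive accuracy, precision, recall, and F1 from error rates and class ratio.

    Uses expected counts per 1 illicit wallet, so no raw confusion matrix is needed.
    """
    positives, negatives = 1.0, neg_per_pos
    tp = positives * (1 - fnr)   # illicit wallets correctly flagged
    fn = positives * fnr         # illicit wallets missed
    fp = negatives * fpr         # licit wallets wrongly flagged
    tn = negatives * (1 - fpr)   # licit wallets correctly passed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


m = metrics_from_rates(fpr=0.003, fnr=0.17, neg_per_pos=13)
print({name: round(value, 3) for name, value in m.items()})
```

This gives an accuracy of about 0.985, precision of about 0.955, recall of 0.83, and F1 of about 0.888, in line with the reported 0.98 and 0.88. It also shows why accuracy looks stronger: the 13 licit wallets per illicit one dominate its denominator, while F1 ignores true negatives entirely.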
Model Performance and Limitations
Where the Model Performs Well
- High CoinJoin Tx activity in network
- High Dust Tx activity in network
- Suspicious hub-and-spoke patterns (one source sending to many unique recipients)
- Short bursts of high activity, common in burner wallets.
Edge Cases and Limitations
- Model skews toward ransomware patterns (dominant type in training data)
- Privacy-oriented wallets that use CoinJoin legitimately (Wasabi, JoinMarket) may be flagged as risky
- Novel illicit on-chain patterns may be harder to detect if not learned by the model.
System Architecture Walkthrough
The system architecture includes an Oracle-hosted, user-facing frontend web app, a submit endpoint that routes requests to a Redis cache, a Neo4j database, an ML model, and an API response layer. Together these carry each request from the user through cached and graph-backed scoring and back to the frontend.
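The request flow described above follows a cache-aside pattern: check the cache, and only on a miss run the graph query, feature extraction, and model. A minimal Python sketch, in which an in-memory TTL dict stands in for Redis and three hypothetical callables stand in for the Neo4j query, the featurizer, and the LightGBM model; the one-hour TTL is also an assumption.

```python
import time
from typing import Callable


class TTLCache:
    """In-memory stand-in for Redis: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def score_address(address: str, cache: TTLCache,
                  fetch_subgraph: Callable[[str], dict],
                  featurize: Callable[[dict], dict],
                  predict: Callable[[dict], float]) -> dict:
    """Cache-aside: return a cached result if present; otherwise query the
    graph, build features, score, and cache the response."""
    cached = cache.get(address)
    if cached is not None:
        return {**cached, "cached": True}
    subgraph = fetch_subgraph(address)  # a Neo4j 2-hop query in the real system
    features = featurize(subgraph)
    response = {"address": address, "probability": predict(features)}
    cache.set(address, response)
    return {**response, "cached": False}
```

Caching by address means repeat lookups skip the expensive 2-hop graph query; the TTL bounds how stale a cached score can become as new blocks arrive.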
Key Technical Challenges
| Challenge | Our Solution |
|---|---|
| Monolithic fragility: the frontend, cache, model API, and graph database ran as a single entity | Separation of concerns: services communicate via JSON; standalone, monetizable API |
| Database computation: queries blew up exponentially on supernodes | Database optimization: optimized indexes for Cypher queries |
| User-centric design: the model API's response output was too complex | Usability engineering: cached results, interpretable frontend, modern mobile support |
Roadmap & Expansion
- Proper session tokens with x402 protocol1,6 compatibility
  - Allows pay-per-call monetization of our API by AI agents
- Ingest up-to-date transaction data
  - Allows training on the latest blocks
- Train to catch additional malicious behaviors
  - Atomic swaps
  - Additional cryptocurrencies (Ethereum, Solana)
- UX improvements
  - User profiles / saved searches / x402 private tokens
  - User wallet integration
  - Chrome extension
  - Discord & Telegram bots
Addressable Market
Our most similar competitor is TokenSniffer, a scam detector for the Ethereum blockchain. BitGuard differs in taking a true graph/ML-based approach that is not limited to tokens. TokenSniffer charges customers $100/month for up to 500 tests per day.
Based on BitGuard’s Serviceable Obtainable Market (SOM), charging $20/month to 1% of our SOM would generate $30.7 million in annual recurring revenue (ARR). From that starting point, distribution through AI agent wallets and exchanges, using the x402 protocol, will serve as an additional lever for scale.
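The ARR figure can be worked backward to show what it implies. A short Python check, assuming ARR is simply monthly price × 12 × paying users (the SOM size itself is not stated above):

```python
def implied_som(arr: float, monthly_price: float, penetration: float) -> tuple[float, float]:
    """Back out paying users and total SOM size from an ARR target."""
    paying_users = arr / (monthly_price * 12)
    som_size = paying_users / penetration
    return paying_users, som_size


users, som = implied_som(arr=30.7e6, monthly_price=20, penetration=0.01)
print(f"{users:,.0f} paying users out of a SOM of ~{som / 1e6:.2f} million")
```

This implies about 127,917 paying users, and therefore a SOM of roughly 12.79 million users.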
Acknowledgements
Course Staff:
- Puya Vahabi
- Danielle Cummings
Team:
- Chirag Agarwal
- Noah Cederholm
- Steven Au
- Wes Morberg
Additional Resources:
Github: https://github.com/thesteau/bitguard
Website: bitguard.pro
Slide Deck: https://docs.google.com/presentation/d/1I7ndbJvwnAgakOInzbrN8wwC0LjAQpDl9E0X6zAMHbg/edit?usp=sharing
Works Cited
1. Coinbase. “2024 Shareholder Letter.” Coinbase, 13 Feb. 2025.
2. “Coinbase Bytes Newsletter.” Coinbase, 25 Mar. 2026, www.coinbase.com/bytes/archive/the-future-of-ai-and-crypto. Accessed 15 Apr. 2026.
3. Jalili, Vahid. “EBA.” B1aab.ai, 2026, eba.b1aab.ai/. Accessed 13 Apr. 2026.
4. “UCI Machine Learning Repository.” UCI.edu, 2020, archive.ics.uci.edu/dataset/526/bitcoinheistransomwareaddressdataset. Accessed 17 Apr. 2026.
5. Jalili, Vahid. “The Temporal Graph of Bitcoin Transactions.” The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025.
6. “X402.org.” X402.org, 2025, www.x402.org/.
7. Elmougy, Youssef. “GitHub - Git-Disl/EllipticPlusPlus: Elliptic++ Dataset: A Graph Network of Bitcoin Blockchain Transactions and Wallet Addresses.” GitHub, 2025, github.com/git-disl/EllipticPlusPlus.
8. Chainalysis Team. “Record $17 Billion Estimated Stolen in Crypto Scams and Fraud in 2025 as Impersonation Tactics and AI Enablement Surge.” Chainalysis.com, Chainalysis, 13 Jan. 2026, www.chainalysis.com/blog/crypto-scams-2026/.
9. Akcora, Cuneyt Gurcan, Yitao Li, Yulia R. Gel, and Murat Kantarcioglu. “BitcoinHeist: Topological Data Analysis for Ransomware Detection on the Bitcoin Blockchain.” arXiv:1906.07852, 2019.
