💹 Project Plan: A Collaborative Framework for Model Development
Objective
This document outlines our unified strategy for building a robust, production-grade crypto trading model. To balance the immediate needs of our modeling teams with our long-term architectural goals, we will adopt a "Parallel Paths" approach.
Our core modeling work will follow a scientific method: we will establish a single baseline model and then quantitatively measure the impact of new features and ideas. The focus will be on a regression approach to predict continuous 1-day forward returns, allowing for more nuanced strategy development than simple classification.
The Baseline Model: Our Source of Truth
To ensure a scientific process, we will establish a single baseline model stored in our main GitHub branch. This model serves as our benchmark, and all feature experiments will be measured against its performance.
- Architecture: XGBoost Regressor
- Core Features: RSI Family (e.g., `rsi_7d`, `rsi_14d`, `rsi_30d`)
- Initial Performance Benchmark: Information Coefficient (IC), with a target IC > 0.02.
- Rule: A new feature will be merged into the baseline only if it demonstrably improves upon this benchmark in a rigorous walk-forward validation.
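For clarity on how this benchmark is measured, below is a minimal sketch of an IC calculation; the DataFrame schema (`date`, `prediction`, `fwd_return_1d`) is an illustrative assumption, not the pipeline's actual interface:

```python
import pandas as pd
from scipy.stats import spearmanr

def daily_ic(results: pd.DataFrame) -> pd.Series:
    """Per-date Spearman rank correlation between predictions and realized 1-day forward returns."""
    return results.groupby('date').apply(
        lambda day: spearmanr(day['prediction'], day['fwd_return_1d'])[0]
    )

# The benchmark is the mean of the daily series:
# mean_ic = daily_ic(results).mean()  # target: > 0.02
```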
Core Feature Hypotheses to Test
Our team’s prior research has identified several feature families with high potential. The squads will treat these as our primary hypotheses to test against the baseline model:
- Hypothesis 1: Derivatives Alpha. Features derived from Open Interest (OI) and Funding Rates will significantly increase the model's Information Coefficient.
- Hypothesis 2: Market Regime Context. Adding market dominance features (e.g., `btc_dominance`) will improve the model's performance during different market regimes.
- Hypothesis 3: Volume Signals. Incorporating normalized volume signals (Volume Z-scores) will enhance the model's ability to confirm price trends.
- Hypothesis 4: Fundamental Quality Filter. Using on-chain data (TVL momentum, token unlocks) as a pre-modeling filter will improve the quality of our asset universe and lead to a more robust final signal.
Technology & Validation
- Data Pipeline: The established Bronze Layer (raw data) will serve as our foundation.
- Silver Layer Development: All Silver Layer feature engineering for this track will be developed in our existing Python/pandas pipeline to ensure immediate delivery.
- Validation: All hypotheses will be tested using our proven walk-forward validation framework to prevent overfitting and ensure out-of-sample robustness.
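For newcomers to the framework, here is a simplified sketch of what monthly walk-forward splitting looks like; the real framework already exists, and this illustration assumes a datetime-indexed DataFrame and a 12-month minimum training window:

```python
import pandas as pd

def monthly_walk_forward(df: pd.DataFrame, min_train_months: int = 12):
    """Yield (train, test) pairs: expanding training window, one held-out month per step."""
    months = df.index.to_period('M').unique().sort_values()
    for i in range(min_train_months, len(months)):
        train = df[df.index.to_period('M') < months[i]]  # all data before the test month
        test = df[df.index.to_period('M') == months[i]]  # the out-of-sample month
        yield train, test
```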
Parallel Effort: Long-Term Cloud Foundation
While the ML team develops features in pandas, the data engineering team will work in parallel, focusing on building our future-state architecture.
- Goal: To build a scalable, automated, production-grade data pipeline.
- Technology: PySpark, Google BigQuery, and Apache Airflow.
- Synergy: The feature logic and transformations created in pandas in Track 1 will serve as the direct blueprint for the final PySpark implementation in Track 2.
- Team: This effort will be led by @Madhavi Dixit and @Ankit Verma.
- Integration: The data engineering team will handle this integration, so squads do not need to connect to the cloud stack for now. Focus on developing the baseline; we will deliver the data you need.
Squad Missions & Technical Specifications
Official Toolkit
To maintain consistency across experiments, all squads will utilize the following core libraries:
- Modeling: `XGBoost` for our baseline architecture.
- Hyperparameter Tuning: `Optuna` for systematic optimization.
- Feature Importance: `SHAP` for model interpretability.
- Backtesting: Our internal walk-forward validation framework.
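To make the toolkit concrete, here is a minimal sketch of how `XGBoost` and `Optuna` might fit together. The stand-in data, search ranges, and generic CV scorer are assumptions for illustration; real experiments would score mean IC inside the walk-forward harness:

```python
import numpy as np
import optuna
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 10)), rng.normal(size=500)  # stand-in data

def objective(trial: optuna.Trial) -> float:
    model = XGBRegressor(
        max_depth=trial.suggest_int('max_depth', 3, 8),
        learning_rate=trial.suggest_float('learning_rate', 1e-3, 0.3, log=True),
        n_estimators=trial.suggest_int('n_estimators', 100, 600),
        subsample=trial.suggest_float('subsample', 0.5, 1.0),
    )
    # Swap this generic scorer for mean IC from the walk-forward framework.
    return cross_val_score(model, X, y, cv=3, scoring='r2').mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=25)
print(study.best_params)
# For interpretability, SHAP can then explain the tuned model, e.g.:
# shap.TreeExplainer(best_model).shap_values(X)
```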
Research Squad 1: The Baseline Model (Momentum & Regime)
Squad Assignment:
- Lead: @Nidhi Parab
- Members: @Brian Meng, @Rishik Gowrishetti, @Poojita Pondari, @Sharanya Emmadisetty
Mission:
Your squad's mission is to build, tune, and validate the foundational baseline model for the entire project. This model will serve as the "control group" and official source of truth. Its performance, measured by the Information Coefficient (IC), will be the benchmark that all other feature squads must demonstrably beat.
Phased Research Plan:
To accomplish this mission scientifically, your work will be divided into two phases to isolate the impact of each feature family.
Phase 1: Establish the Core Momentum Baseline
First, establish a stable benchmark using only the highest-priority RSI features.
- Action: Build and backtest a model using only the `RSI Family` features.
- Goal: Document a stable Information Coefficient for this RSI-only model. This becomes the internal benchmark for your squad's next phase.
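For reference, a minimal sketch of the RSI family using Wilder's smoothing; this is one common formulation, and the exact definition in our pipeline may differ:

```python
import pandas as pd

def rsi(close: pd.Series, window: int) -> pd.Series:
    """Relative Strength Index via Wilder's smoothing (EWM with alpha = 1/window)."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / window, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / window, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)

# rsi_7d, rsi_14d, rsi_30d are then rsi(close, 7), rsi(close, 14), rsi(close, 30)
```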
Phase 2: Test the Market Regime Hypothesis
Once the momentum baseline is stable, test the core hypothesis by adding the market regime features.
- Hypothesis: Adding market regime signals will improve the model's predictive power over a pure momentum model.
- Action: Add the `Market Regime Signals` to the model from Phase 1.
- Measurement & Validation: Compare the IC of this enhanced model against the RSI-only baseline. A statistically significant increase will validate the hypothesis.
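One plausible way to quantify "statistically significant" is a paired t-test on the matched daily IC series from the two walk-forward runs; the test choice and the stand-in data below are assumptions, not a mandated procedure:

```python
import numpy as np
from scipy import stats

# Daily IC series from the baseline and enhanced runs (stand-in data for illustration)
rng = np.random.default_rng(1)
ic_baseline = rng.normal(0.020, 0.05, 250)
ic_enhanced = rng.normal(0.028, 0.05, 250)

# A low p-value with a positive mean difference supports the regime hypothesis
t_stat, p_value = stats.ttest_rel(ic_enhanced, ic_baseline)
print(f"mean lift={np.mean(ic_enhanced - ic_baseline):.4f}, t={t_stat:.2f}, p={p_value:.3f}")
```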
Technical Specifications:
- Data Filtering: The model must be trained on a high-quality asset universe. Before feature engineering, the Bronze Layer data will be filtered using the following criteria:
```python
# Apply asset filters (fundamentals used as quality screens)
filtered_data = data[
    (data['tvl_momentum_30d'] > 0.5) &   # Strong DeFi growth
    (data['unlock_pct_30d'] < 0.1) &     # Low unlock risk
    (data['is_stablecoin'] == False) &   # Quality assets only
    (data['is_meme_coin'] == False)      # No meme coins
]
```
- Model Architecture: The project's baseline architecture is an XGBoost Regressor.
- Target Variable: The model will predict the continuous 1-day forward return.
- Validation: All tests must use the standardized walk-forward validation framework with monthly retraining.
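For reference, a hedged sketch of how the 1-day forward-return target might be constructed from the filtered data above; the `asset`, `date`, and `close` column names are assumptions:

```python
# Sort so shifts operate in chronological order within each asset
filtered_data = filtered_data.sort_values(['asset', 'date'])

# Next-day close relative to today's close, computed per asset to avoid cross-asset leakage
filtered_data['fwd_return_1d'] = (
    filtered_data.groupby('asset')['close'].shift(-1) / filtered_data['close'] - 1
)
```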
Definition of Done
This squad's work is complete when you deliver the final, stable baseline model that incorporates the best combination of momentum and regime features. The deliverable must include a report detailing the final IC of the combined model and the measured performance lift that was achieved by adding the regime features in Phase 2. This model will then become the new official benchmark for the entire project.
Squad 2: The Derivatives & Volatility Experiment
- Lead: @Rachana Dharmavaram
- Team Assignment: @Jimmy Wu, @zihao yang, @Benjamin Lu
- Mission: Your squad's mission is to test the hypothesis that derivatives and advanced volatility data can provide a significant predictive edge over pure price momentum. You will add this new feature set to the baseline XGBoost model and rigorously measure its impact on the Information Coefficient (IC).
Technical Specifications
1. Hypothesis to Test (Feature Set):
You will add the following feature families to the baseline model to test their predictive power.
- Derivatives - Open Interest Signals: `oi_change_24h`, `oi_volume_ratio`, `oi_momentum_7d`, `oi_dominance`, etc.
- Derivatives - Funding Rate Signals: `funding_rate_zscore`, `funding_momentum_7d`, `oi_funding_divergence`, etc.
- Advanced Volatility: `atr_14` (Average True Range), `vol_regime_change`, etc.
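To illustrate the intent of these families, here is a minimal sketch of a few representative features; the long-format schema (`asset`, `open_interest`, `funding_rate` columns) and the 30-day z-score window are assumptions:

```python
import pandas as pd

def rolling_zscore(s: pd.Series, window: int = 30) -> pd.Series:
    """Rolling z-score: how unusual the latest value is versus its recent history."""
    return (s - s.rolling(window).mean()) / s.rolling(window).std()

def add_derivatives_features(df: pd.DataFrame) -> pd.DataFrame:
    """Compute per-asset transforms so series from different assets never mix."""
    g = df.groupby('asset')
    df['oi_change_24h'] = g['open_interest'].pct_change(1)
    df['oi_momentum_7d'] = g['open_interest'].pct_change(7)
    df['funding_rate_zscore'] = g['funding_rate'].transform(rolling_zscore)
    return df
```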
2. Phased Feature Testing:
To precisely measure the contribution of each feature family, you will test them in sequence:
- Phase A: Test Derivatives. Add only the Derivatives features (OI and Funding Rates) to the baseline model and measure the IC lift.
- Phase B: Test Volatility. Add only the Advanced Volatility features to the baseline model and measure the IC lift.
This approach will allow you to determine if one family provides a stronger signal than the other, or if their combined effect is necessary to achieve a performance boost.
3. Data Filtering:
- You will use the exact same data filtering logic as the baseline model squad to ensure a fair, apples-to-apples comparison.
4. Target Variable:
- Target: Same as the baseline: Predict the continuous 1-day forward return.
5. Model & Validation:
- Architecture: You will use the established baseline model architecture: XGBoost Regressor. The goal is to isolate the impact of your features, not to change the model type.
- Validation: Same as the baseline: Rigorous walk-forward validation with monthly retraining.
Definition of Done
This squad's work is complete when you deliver a report detailing the final IC of the model with your new features. The report must clearly show the performance lift (or degradation) compared to the Squad 1 baseline, providing a clear "yes" or "no" on the hypothesis that derivatives data adds value.
Squad 3: The Fama-French & Cross-Asset Factor Experiment
- Lead: @Venkata Sri Sai Surya Mandava
- Team Assignment: @deepthim, @Monica Lama, @litesh perumalla
- Mission: Your squad's mission is to test the hypothesis that features inspired by the Fama-French factors and other cross-asset signals can add a new layer of predictive power beyond single-asset momentum. You will add this new feature set to the baseline XGBoost model and rigorously measure its impact on the Information Coefficient (IC).
Technical Specifications
1. Hypothesis to Test (Feature Set):
You will research and implement the following feature families to test their predictive power against the baseline model.
- Crypto Fama-French Factors: This is a key research area. The goal is to create crypto-native proxies for the classic Fama-French factors.
  - `crypto_mkt_factor` (i.e., Beta, or market sensitivity)
  - `crypto_smb_factor` (Size, based on market cap rank)
  - `crypto_hml_factor` (Value, potentially using a proxy like TVL-to-market-cap ratio)
  - `crypto_momentum_factor` (Winners vs. Losers, cross-sectionally)
- Cross-Asset Signals: These features measure how an asset is behaving relative to the rest of the market on a given day.
  - `volume_zscore_cross_sectional`
  - `return_zscore_cross_sectional`
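As a starting point for this research, here is a hedged sketch of the cross-asset signals and one factor proxy. The SMB construction shown (small-cap basket return minus large-cap basket return, split by market-cap rank) is one plausible interpretation rather than a settled definition, and the `date`, `return_1d`, and `mcap_rank` columns are assumed:

```python
import pandas as pd

def cross_sectional_zscore(df: pd.DataFrame, col: str) -> pd.Series:
    """How an asset's value compares to all other assets on the same date."""
    g = df.groupby('date')[col]
    return (df[col] - g.transform('mean')) / g.transform('std')

def crypto_smb_factor(df: pd.DataFrame, q: float = 0.3) -> pd.Series:
    """Daily small-minus-big return spread; rank 1 is assumed to be the largest asset."""
    def smb(day: pd.DataFrame) -> float:
        small = day.loc[day['mcap_rank'] >= day['mcap_rank'].quantile(1 - q), 'return_1d']
        big = day.loc[day['mcap_rank'] <= day['mcap_rank'].quantile(q), 'return_1d']
        return small.mean() - big.mean()
    return df.groupby('date').apply(smb)

# df['volume_zscore_cross_sectional'] = cross_sectional_zscore(df, 'volume')
# df['return_zscore_cross_sectional'] = cross_sectional_zscore(df, 'return_1d')
```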
2. Phased Research Plan:
To isolate the impact of each signal type, your squad will test the hypotheses sequentially:
- Phase A: Test the Fama-French Hypothesis. This is the primary mission. Focus solely on researching, implementing, and testing the four crypto-native Fama-French factors. Measure their combined impact on the baseline model's IC.
- Phase B: Test the Cross-Asset Hypothesis. Separately, test the `volume_zscore_cross_sectional` and `return_zscore_cross_sectional` features against the baseline model to measure their predictive power in isolation.
3. Data Filtering:
- You will use the exact same data filtering logic as the baseline model squad to ensure a fair, apples-to-apples comparison.
4. Target Variable:
- Target: Same as the baseline: Predict the continuous 1-day forward return.
5. Model & Validation:
- Architecture: You will use the established baseline model architecture: XGBoost Regressor. Your goal is to isolate the impact of your features, not to change the model type.
- Validation: Same as the baseline: rigorous walk-forward validation with monthly retraining.
Definition of Done
Your final report should not only provide a "yes/no" on the Fama-French hypothesis but also comment on the value of the Cross-Asset signals. This phased approach will help us understand if one feature family is significantly more powerful than the other, or if they are most effective when combined.
Project & Inter-Team Lead
John Swindell
My goal will be to coordinate between the squads, ensure the teams are unblocked, manage the project roadmap, and communicate progress to leadership. Feel free to reach out if you need any help!
Future Research: Advanced Model Architectures
The following ideas for advanced model configurations are not part of our immediate plan but should be considered our experimental backlog.
Once our feature engineering sprints with the baseline XGBoost Regressor are complete and we have a robust set of validated features, we can begin testing these alternative architectures to see if they provide an additional performance lift. The goal is to separate feature discovery from model discovery to ensure a clear, scientific process.
Experiment A: Regularized Linear Models
- Hypothesis: A simple, regularized linear model (like Ridge) with adaptive parameters for different market regimes could provide a more stable and interpretable baseline.
- Potential Implementation:
```python
from sklearn.linear_model import Ridge

# Regime-adaptive regularization: heavier shrinkage in bear markets
def regime_adaptive_ridge(X, y, market_regime):
    alphas = {'bull': 0.1, 'bear': 1.0, 'neutral': 0.5}
    ridge = Ridge(alpha=alphas[market_regime])
    return ridge.fit(X, y)
```
Experiment B: Advanced Tree Ensembles
- Hypothesis: A Random Forest Regressor might capture different feature interactions than XGBoost. We can also explore custom, crypto-specific loss functions within XGBoost to better align the model's objective with our financial goals.
- Potential Implementation:
```python
import numpy as np

# Custom objective for crypto-specific loss
def crypto_loss(y_true, y_pred):
    """Asymmetric squared loss: over-predictions (y_pred > y_true) are penalized twice as heavily."""
    residual = y_true - y_pred
    loss = np.where(residual > 0, residual**2, 2 * residual**2)
    return loss.mean()
```
Experiment C: Non-Parametric Models
- Hypothesis: An adaptive K-Nearest Neighbors (KNN) regressor could capture local, non-linear patterns in the data that tree-based models might miss.
- Potential Implementation:
```python
from sklearn.neighbors import KNeighborsRegressor

# Adjust K based on market conditions: wider neighborhoods when volatility is high
def adaptive_knn(X, y, market_volatility):
    k = max(5, min(50, int(20 * (1 + market_volatility))))
    return KNeighborsRegressor(n_neighbors=k, weights='distance').fit(X, y)
```
Experiment D: Deep Learning Models
- Hypothesis: A sequential model, like an LSTM-Transformer hybrid, could better capture complex time-series dependencies in our data compared to the tree-based baseline.
- Potential Implementation:
```python
import torch.nn as nn

# LSTM-Transformer hybrid for sequential dependencies
class CryptoLSTMTransformer(nn.Module):
    # ... (implementation details) ...
    pass
```
Standardized Performance & Evaluation Metrics
To ensure all feature experiments can be compared objectively, every squad will evaluate their model's performance using the following standardized set of financial and regime-specific metrics. These metrics go beyond simple model accuracy and measure the real-world financial viability of our strategies.
1. Core Financial Metrics
All backtests must report on the following core metrics to provide a holistic view of performance. These will be calculated using a shared utility function.
- Information Coefficient (IC): The primary measure of our model's raw predictive power.
- Sharpe Ratio: The measure of our risk-adjusted return.
- Max Drawdown: The measure of our worst-case loss scenario.
- Hit Rate (Directional Accuracy): The measure of how often our model correctly predicts the direction of the market.
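As a reference point, here is a minimal sketch of what the shared utility might contain; daily strategy returns are assumed, and the 365-day annualization factor (reflecting crypto's continuous trading) is an assumption:

```python
import numpy as np
import pandas as pd

def sharpe_ratio(returns: pd.Series, periods_per_year: int = 365) -> float:
    """Annualized risk-adjusted return (risk-free rate assumed to be zero)."""
    return returns.mean() / returns.std() * np.sqrt(periods_per_year)

def max_drawdown(returns: pd.Series) -> float:
    """Worst peak-to-trough decline of the cumulative equity curve (a negative number)."""
    equity = (1 + returns).cumprod()
    return (equity / equity.cummax() - 1).min()

def hit_rate(preds: pd.Series, fwd_returns: pd.Series) -> float:
    """Fraction of observations where predicted and realized directions agree."""
    return float((np.sign(preds) == np.sign(fwd_returns)).mean())
```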
2. Regime-Specific Analysis
To ensure our model is robust, we must understand how it performs under different market conditions. Performance will be calculated and reported separately for the following regimes:
- Bull Market Performance
- Bear Market Performance
- Neutral/Sideways Market Performance
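Assuming each backtest row already carries a regime label (the labeling rule itself is out of scope here, and the `market_regime`, `strategy_return`, `prediction`, and `fwd_return_1d` columns are hypothetical), the per-regime breakdown is a simple groupby over the utilities sketched above:

```python
# Per-regime breakdown of an assumed backtest results frame
for regime, grp in backtest_results.groupby('market_regime'):
    print(f"{regime}: Sharpe={sharpe_ratio(grp['strategy_return']):.2f}, "
          f"hit rate={hit_rate(grp['prediction'], grp['fwd_return_1d']):.1%}")
```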
The ML4T Research Workflow: From Data to Strategy
To ensure our research is systematic and reproducible, each strategy will follow the established Machine Learning for Algorithmic Trading (ML4T) workflow. This framework provides a structured, scientific process for developing and validating our models from start to finish.
The cycle consists of the following key stages:
- Data Sourcing & Feature Universe (`Bronze` → `Silver`):
- Input: We begin with our standardized data from the `Bronze Layer`.
- Action: We apply the project's core asset filters (e.g., TVL momentum, no meme/stablecoins) to define our investment universe.
- Output: A clean, filtered dataset ready for feature engineering in the `Silver Layer`.
- Hypothesis-Driven Feature Engineering:
- Input: The filtered dataset.
- Action: Each squad engineers its assigned core features. For example, the Derivatives squad will calculate `oi_change_24h` and `funding_rate_zscore`, while the Momentum squad focuses on the `RSI Family`. This is where your team's validated insights are implemented as code.
- Output: A rich feature matrix (`X`) for model training.
- Model Training & Optimization:
- Input: The feature matrix and the target variable (e.g., 1-day forward returns).
- Action: We train our chosen primary model architecture (e.g., XGBoost, Random Forest). Hyperparameter tuning will be conducted systematically, using tools like Optuna as specified.
- Output: A trained model ready for evaluation.
- Walk-Forward Backtesting & Evaluation:
- Input: The trained model and out-of-time-sample data.
- Action: The model's predictive performance is rigorously evaluated using our established walk-forward validation framework. We measure success using the Metrics defined in this plan, focusing on the Information Coefficient (IC), Sharpe Ratio, and Hit Rate.
- Output: A set of performance metrics and analytics (e.g., feature importance, regime-specific performance).
- Signal Integration & Review:
- Input: Performance metrics and model predictions (signals).
- Action: For Strategy 3, signals from different models and feature sets are integrated. For all strategies, the results are documented and reviewed.
- Output: A decision on whether the new feature/model is successful and should be merged into our baseline, or if the hypothesis needs refinement. This marks the end of one cycle and the beginning of the next.
This research workflow ensures that every idea is tested against the same rigorous standards, making our results comparable and robust.
Development & Collaboration Workflow: A Step-by-Step Guide
To complement the research workflow, our team will use a standardized development cycle based on software engineering best practices. This ensures all contributions are isolated, tested, and systematically integrated into the main codebase, preventing conflicts and maintaining a single source of truth.
Our development cycle is: branch → test → measure → merge if successful
Here is the step-by-step process for every team member:
- Step 1: Create a Branch from the Baseline Model
- Action: Before starting any new work, pull the latest version of the `main` branch from our GitHub repository. Create a new branch with a descriptive name.
- Example Name: `feature/strategy1-volatility-bands` or `fix/strategy2-data-pipeline`.
- Purpose: This isolates your work from the stable baseline, allowing you to experiment freely without affecting the work of others.
- Step 2: Develop & Test Your Feature or Model
- Action: In your branch, implement the required changes. This could be adding a new feature, tuning a model, or fixing a bug. Run your code and conduct initial tests in your local environment to ensure it works as expected.
- Purpose: To complete the core development task of your assignment.
- Step 3: Measure the Impact
- Action: Run a backtest using the established walk-forward validation framework. The primary goal is to objectively measure whether your change improved the model.
- Success Criteria: Your change is considered successful if it meets the project's "Quantitative Heuristics for Merging New Features," primarily demonstrating a consistent IC improvement over the baseline.
- Purpose: To validate your hypothesis with data before proposing it for inclusion in the main model.
- Step 4: Create a Pull Request (PR)
- Action: Once your feature is validated, push your branch to GitHub and open a Pull Request to merge it into the `main` branch. In the PR description, clearly state the hypothesis you were testing and summarize the results, including the final IC score versus the baseline's IC. Paste key charts or a summary table directly into the PR description to make the review self-contained and easy to track in GitHub's history.
- Purpose: To formally propose your change and provide context for code reviewers.
- Step 5: Review & Merge
- Action: Another team member will review your code for quality and your results for validity. If the change is approved, your branch will be merged into the `main` branch. The baseline model is now updated with your successful feature.
- Purpose: To ensure code quality, validate results, and complete the integration of a successful experiment. Your cycle is now complete, and you can start on the next task by creating a new branch.
Project Roadmap & Sprints
Sprint 1: Foundation & Baseline Benchmarking (Week 1)
Goal: The objective for this week is to establish a fully functional, end-to-end data pipeline and train the initial baseline model to create our performance benchmark.
Key Deliverables:
- Data Ingestion & Filtering:
- Establish a connection to the Bronze Layer data.
- Implement the foundational asset quality filters (e.g., fundamental and liquidity screens).
- Feature Engineering Pipeline:
- Develop and validate the initial feature engineering scripts for the baseline model (Momentum & Regime features).
- Validation Harness Construction:
- Build and test the standardized walk-forward cross-validation framework that all squads will use.
- Initial Model Benchmark:
- By Friday, execute the first run of the baseline `XGBoost` model.
- Formally document its initial performance metrics (IC, Sharpe Ratio) as the official project benchmark.
Sprint 2: Experimental Sprints & Performance Analysis (Week 2)
Goal: This week is focused on running the planned feature experiments, analyzing the results, and synthesizing our findings into a final recommendation for the V1 production model.
Key Deliverables:
- Parallel Feature Experimentation:
- Each squad will conduct their assigned feature experiment (e.g., Derivatives, Fama-French).
- Run their feature set through the established pipeline and validation harness.
- Hyperparameter Optimization:
- Implement a hyperparameter tuning process using a library like `Optuna` to optimize the baseline model.
- Comprehensive Performance Analysis:
- Conduct a deep-dive analysis of the results.
- Include feature importance (SHAP values).
- Analyze regime-specific performance (bull vs. bear markets).
- Final Report & Recommendations:
- By Friday, produce the final analysis report in our shared analysis document.
- Detail the results of each experiment.
- Provide a clear recommendation for the `Model_V1` feature set.
Project Goals & Risk Management
Success Criteria for Our Research Sprints
Our success will not be measured by hitting arbitrary performance targets, but by our ability to rigorously execute our research plan and generate clear, actionable insights. The project is considered a success if we achieve the following:
- Primary Goal: Validate or Invalidate Hypotheses. For each feature squad, successfully complete the backtest and deliver a final report with a clear, data-driven conclusion on whether their feature set improved the baseline model's Information Coefficient.
- Secondary Goal: Create a Robust Baseline. Deliver a final, validated baseline model that incorporates any new features that proved to be successful during the experimental sprints.
- Process Goal: Establish a Standardized Workflow. Successfully implement and follow the `branch → test → measure → merge` workflow, creating a reusable and scientific process for all future model development.
Key Learning Objectives
By the end of this project, the team will have gained practical experience in:
- Systematic Feature Engineering: Moving beyond simple indicators to test and validate complex feature families.
- Financial Model Evaluation: Using industry-standard metrics (IC, Sharpe Ratio, Max Drawdown) to evaluate strategy performance.
- Robust Backtesting: Implementing and interpreting the results from a rigorous walk-forward validation framework.
- Collaborative ML Development: Using a shared codebase and version control (Git) to manage a collaborative research project.
Risk Management & Contingency Plans
We will proactively manage the following potential risks:
- Technical Risks:
- Data Quality Issues: We will implement robust strategies for handling missing data and outliers, led by our Data Quality squad.
- Overfitting: We will mitigate this through the mandatory use of our strict walk-forward validation framework.
- Feature Multicollinearity: We will perform regular correlation analysis and use feature selection techniques to ensure model stability.
- Timeline Risks:
- Model Complexity: If a proposed feature proves too complex to implement quickly, we will time-box the research and prioritize simpler alternatives to maintain momentum.
- Integration Challenges: We will hold daily stand-ups within each squad and a weekly cross-team sync to address any integration issues immediately.
Quantitative Heuristics for Merging New Features
While our primary goal is validating hypotheses, we will use the following quantitative heuristics to guide our decisions. A feature experiment will generally be considered successful and a candidate for merging into the baseline model if it achieves the following in a rigorous walk-forward validation:
- Information Coefficient (IC): Demonstrates a consistent IC greater than 0.02, indicating a clear predictive signal.
- Hit Rate: Achieves a directional accuracy greater than 55%, showing it predicts the market's direction better than chance.
- Risk-Adjusted Contribution: Has a positive impact on the baseline model's Sharpe Ratio without significantly increasing the Max Drawdown, ensuring the new feature adds value without adding undue risk.
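These heuristics are simple enough to encode as an explicit gate in the review process; a hedged sketch, with thresholds copied from the list above and the drawdown tolerance left as an assumption:

```python
def passes_merge_heuristics(mean_ic: float, hit_rate: float,
                            sharpe_delta: float, mdd_delta: float,
                            mdd_tolerance: float = 0.02) -> bool:
    """True if a feature experiment meets the merge criteria in this plan."""
    return (
        mean_ic > 0.02                  # consistent predictive signal
        and hit_rate > 0.55             # better-than-chance directional accuracy
        and sharpe_delta > 0            # improves the baseline's risk-adjusted return
        and mdd_delta <= mdd_tolerance  # no significant increase in Max Drawdown
    )
```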
Team Communication & Collaboration Cadence
To ensure our squads stay aligned and information flows freely, we will adhere to the following simple meeting cadence:
- Daily Squad Stand-up (Slack): Each research squad will share a brief daily sync message covering progress, the day's plan, and any blockers. Use it to let teammates know what you're working on and to coordinate.
- Weekly Cross-Squad Sync (30 mins): A single weekly meeting for all ML team members to share key findings, discuss challenges, and ensure there are no overlapping efforts.
- End-of-Sprint Demo & Review (30 mins): At the end of each sprint, the squads will present their final reports and "yes/no" conclusions on their hypotheses to project stakeholders.