💹 Project Plan: A Collaborative Framework for Model Development

Objective

This document outlines our unified strategy for building a robust, production-grade crypto trading model. To balance the immediate needs of our modeling teams with our long-term architectural goals, we will adopt a "Parallel Paths" approach.

Our core modeling work will follow a scientific method: we will establish a single baseline model and then quantitatively measure the impact of new features and ideas. The focus will be on a regression approach to predict continuous 1-day forward returns, allowing for more nuanced strategy development than simple classification.


The Baseline Model: Our Source of Truth

To ensure a scientific process, we will establish a single baseline model stored in our main GitHub branch. This model serves as our benchmark, and all feature experiments will be measured against its performance.

Core Feature Hypotheses to Test

Our team’s prior research has identified several feature families with high potential. The squads will treat these as our primary hypotheses to test against the baseline model:

  1. Hypothesis 1: Derivatives Alpha. Features derived from Open Interest (OI) and Funding Rates will significantly increase the model's Information Coefficient.
  2. Hypothesis 2: Market Regime Context. Adding market dominance features (e.g., btc_dominance) will improve the model's performance during different market regimes.
  3. Hypothesis 3: Volume Signals. Incorporating normalized volume signals (Volume Z-scores) will enhance the model's ability to confirm price trends.
  4. Hypothesis 4: Fundamental Quality Filter. Using on-chain data (TVL momentum, token unlocks) as a pre-modeling filter will improve the quality of our asset universe and lead to a more robust final signal.
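As a concrete illustration of Hypothesis 3, a normalized volume signal can be computed as a rolling z-score. This is a minimal sketch; the 30-day window and the synthetic data are illustrative assumptions, not project-mandated parameters:

```python
import numpy as np
import pandas as pd

def volume_zscore(volume: pd.Series, window: int = 30) -> pd.Series:
    """Rolling z-score of volume; values above ~2 flag unusually heavy trading."""
    rolling = volume.rolling(window)
    return (volume - rolling.mean()) / rolling.std()

# Toy example on synthetic daily volume
rng = np.random.default_rng(0)
vol = pd.Series(rng.lognormal(mean=10.0, sigma=0.3, size=120))
vol_z = volume_zscore(vol)
```

The first `window - 1` values are NaN by construction, so downstream code should drop or mask the warm-up period before training.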

Technology & Validation


(Parallel Effort): Long-Term Cloud Foundation

While the ML team develops features in pandas, the data engineering team will work in parallel, focusing on building our future-state architecture.


Squad Missions & Technical Specifications

Official Toolkit

To maintain consistency across experiments, all squads will utilize the following core libraries:


Research Squad 1: The Baseline Model (Momentum & Regime)

Squad Assignment:

Mission:
Your squad's mission is to build, tune, and validate the foundational baseline model for the entire project. This model will serve as the "control group" and official source of truth. Its performance, measured by the Information Coefficient (IC), will be the benchmark that all other feature squads must demonstrably beat.

Phased Research Plan:
To accomplish this mission scientifically, your work will be divided into two phases to isolate the impact of each feature family.

Phase 1: Establish the Core Momentum Baseline
First, establish a stable benchmark using only the highest-priority RSI features.

Phase 2: Test the Market Regime Hypothesis
Once the momentum baseline is stable, test the core hypothesis by adding the market regime features.
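For Phase 1, the highest-priority RSI features could be computed along these lines. This is a sketch of the classic Wilder-smoothed RSI; the 14-period default is a common convention, not a project-specified setting:

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Wilder-smoothed RSI on a close-price series (0-100 scale)."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)

# Sanity check: a steadily rising price series should push RSI toward 100
prices = pd.Series([float(p) for p in range(1, 101)])
rsi_14 = rsi(prices)
```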

Technical Specifications:

  1. Data Filtering: The model must be trained on a high-quality asset universe. Before feature engineering, the Bronze Layer data will be filtered using the following criteria:
# Apply asset filters (your discovery: fundamentals as filters)
filtered_data = data[
    (data['tvl_momentum_30d'] > 0.5) &      # Strong DeFi growth
    (data['unlock_pct_30d'] < 0.1) &        # Low unlock risk
    (~data['is_stablecoin']) &              # Quality assets only
    (~data['is_meme_coin'])
]
  2. Model Architecture: The project's baseline architecture is an XGBoost Regressor.
  3. Target Variable: The model will predict the continuous 1-day forward return.
  4. Validation: All tests must use the standardized walk-forward validation framework with monthly retraining.
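The walk-forward loop with monthly retraining can be sketched as follows. This is a simplified, single-asset sketch: it uses scikit-learn's GradientBoostingRegressor as a stand-in for the project's XGBoost Regressor (swap in `xgboost.XGBRegressor` in the real pipeline), and synthetic data in place of the Silver Layer feature matrix. The expanding-window, train-on-past / score-next-month structure is the part that matters:

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for XGBoost here

def walk_forward_ic(df, feature_cols, target_col="fwd_return_1d"):
    """Retrain each calendar month on all prior data, predict the next month,
    and score each test month with the Spearman rank IC."""
    month = df.index.to_period("M")
    months = month.unique()
    ics = []
    for i in range(1, len(months)):
        train, test = df[month < months[i]], df[month == months[i]]
        model = GradientBoostingRegressor(random_state=0)
        model.fit(train[feature_cols], train[target_col])
        ic, _ = spearmanr(model.predict(test[feature_cols]), test[target_col])
        ics.append(ic)
    return pd.Series(ics, index=months[1:].astype(str))

# Synthetic example: one feature with genuine (noisy) predictive power
rng = np.random.default_rng(1)
idx = pd.date_range("2024-01-01", periods=180, freq="D")
signal = rng.normal(size=180)
panel = pd.DataFrame(
    {"rsi_14": signal, "fwd_return_1d": 0.5 * signal + rng.normal(size=180)},
    index=idx,
)
monthly_ic = walk_forward_ic(panel, ["rsi_14"])
```

Reporting the per-month IC series (not just its mean) is what lets squads demonstrate a *consistent* improvement over the baseline rather than a lucky month.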

Definition of Done

This squad's work is complete when you deliver the final, stable baseline model that incorporates the best combination of momentum and regime features. The deliverable must include a report detailing the final IC of the combined model and the measured performance lift that was achieved by adding the regime features in Phase 2. This model will then become the new official benchmark for the entire project.


Squad 2: The Derivatives & Volatility Experiment


Technical Specifications

1. Hypothesis to Test (Feature Set):

You will add the following feature families to the baseline model to test their predictive power.

2. Phased Feature Testing:

To precisely measure the contribution of each feature family, you will test them in sequence:

This approach will allow you to determine if one family provides a stronger signal than the other, or if their combined effect is necessary to achieve a performance boost.

3. Data Filtering:

4. Target Variable:

5. Model & Validation:


Definition of Done

This squad's work is complete when you deliver a report detailing the final IC of the model with your new features. The report must clearly show the performance lift (or degradation) compared to the Squad 1 baseline, providing a clear "yes" or "no" on the hypothesis that derivatives data adds value.


Squad 3: The Fama-French & Cross-Asset Factor Experiment


Technical Specifications

1. Hypothesis to Test (Feature Set):

You will research and implement the following feature families to test their predictive power against the baseline model.

2. Phased Research Plan:

To isolate the impact of each signal type, your squad will test the hypotheses sequentially:

3. Data Filtering:

4. Target Variable:

5. Model & Validation:


Definition of Done

Your final report should not only provide a "yes/no" on the Fama-French hypothesis but also comment on the value of the Cross-Asset signals. This phased approach will help us understand if one feature family is significantly more powerful than the other, or if they are most effective when combined.


Project & Inter-Team Lead

John Swindell

My goal will be to coordinate between the squads, ensure the teams are unblocked, manage the project roadmap, and communicate progress to leadership. Feel free to reach out if you need any help!


Future Research: Advanced Model Architectures

The following ideas for advanced model configurations are not part of our immediate plan but should be considered our experimental backlog.

Once our feature engineering sprints with the baseline XGBoost Regressor are complete and we have a robust set of validated features, we can begin testing these alternative architectures to see if they provide an additional performance lift. The goal is to separate feature discovery from model discovery to ensure a clear, scientific process.

Experiment A: Regularized Linear Models

Experiment B: Advanced Tree Ensembles

Experiment C: Non-Parametric Models

Experiment D: Deep Learning Models


Standardized Performance & Evaluation Metrics

To ensure all feature experiments can be compared objectively, every squad will evaluate their model's performance using the following standardized set of financial and regime-specific metrics. These metrics go beyond simple model accuracy and measure the real-world financial viability of our strategies.

1. Core Financial Metrics

All backtests must report on the following core metrics to provide a holistic view of performance. These will be calculated using a shared utility function.
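A minimal sketch of what that shared utility function might look like, assuming it receives a daily strategy return series (the specific metric list and the 365-day annualization convention for crypto are assumptions to be confirmed against the shared framework):

```python
import numpy as np
import pandas as pd

def core_metrics(daily_returns: pd.Series, periods_per_year: int = 365) -> dict:
    """Core financial metrics for a daily strategy return series.
    Crypto trades every day, hence 365 periods per year by default."""
    equity = (1 + daily_returns).cumprod()
    drawdown = equity / equity.cummax() - 1
    return {
        "sharpe": daily_returns.mean() / daily_returns.std() * np.sqrt(periods_per_year),
        "hit_rate": (daily_returns > 0).mean(),
        "max_drawdown": drawdown.min(),
        "total_return": equity.iloc[-1] - 1,
    }

# Toy example on one year of synthetic daily returns
rng = np.random.default_rng(2)
metrics = core_metrics(pd.Series(rng.normal(0.001, 0.02, size=365)))
```

Keeping these calculations in one shared function is what makes results from different squads directly comparable.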

2. Regime-Specific Analysis

To ensure our model is robust, we must understand how it performs under different market conditions. Performance will be calculated and reported separately for the following regimes:
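Mechanically, the regime breakdown is a groupby over a regime label column. The labels below ("bull"/"bear"/"sideways") are hypothetical placeholders; the actual regime definitions come from the shared framework:

```python
import numpy as np
import pandas as pd

# Hypothetical regime labels on synthetic daily returns
rng = np.random.default_rng(3)
perf = pd.DataFrame({
    "daily_return": rng.normal(0.001, 0.02, size=300),
    "regime": rng.choice(["bull", "bear", "sideways"], size=300),
})

# Report metrics separately for each regime
by_regime = perf.groupby("regime")["daily_return"].agg(
    sharpe=lambda r: r.mean() / r.std() * np.sqrt(365),
    hit_rate=lambda r: (r > 0).mean(),
    n_days="count",
)
```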


The ML4T Research Workflow: From Data to Strategy

To ensure our research is systematic and reproducible, each strategy will follow the established Machine Learning for Algorithmic Trading (ML4T) workflow. This framework provides a structured, scientific process for developing and validating our models from start to finish.

The cycle consists of the following key stages:

  1. Data Sourcing & Feature Universe (Bronze → Silver):
    • Input: We begin with our standardized data from the Bronze Layer.
    • Action: We apply the project's core asset filters (e.g., TVL momentum, no meme/stablecoins) to define our investment universe.
    • Output: A clean, filtered dataset ready for feature engineering in the Silver Layer.
  2. Hypothesis-Driven Feature Engineering:
    • Input: The filtered dataset.
    • Action: Each squad engineers its assigned core features. For example, the Derivatives squad will calculate oi_change_24h and funding_rate_zscore, while the Momentum squad focuses on the RSI family. This is where your team's validated insights are implemented as code.
    • Output: A rich feature matrix (X) for model training.
  3. Model Training & Optimization:
    • Input: The feature matrix and the target variable (e.g., 1-day forward returns).
    • Action: We train our chosen primary model architecture (e.g., XGBoost, Random Forest). Hyperparameter tuning will be conducted systematically, using tools like Optuna as specified.
    • Output: A trained model ready for evaluation.
  4. Walk-Forward Backtesting & Evaluation:
    • Input: The trained model and out-of-sample data.
    • Action: The model's predictive performance is rigorously evaluated using our established walk-forward validation framework. We measure success using the metrics defined in this plan, focusing on the Information Coefficient (IC), Sharpe Ratio, and Hit Rate.
    • Output: A set of performance metrics and analytics (e.g., feature importance, regime-specific performance).
  5. Signal Integration & Review:
    • Input: Performance metrics and model predictions (signals).
    • Action: For Strategy 3, signals from different models and feature sets are integrated. For all strategies, the results are documented and reviewed.
    • Output: A decision on whether the new feature/model is successful and should be merged into our baseline, or if the hypothesis needs refinement. This marks the end of one cycle and the beginning of the next.
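Stage 2's feature engineering can be sketched for the derivatives features named above. The feature names (oi_change_24h, funding_rate_zscore) come from this plan; the input column names `open_interest` and `funding_rate`, the assumption of daily bars, and the 30-day z-score window are illustrative assumptions:

```python
import numpy as np
import pandas as pd

def add_derivatives_features(df: pd.DataFrame, window: int = 30) -> pd.DataFrame:
    """Sketch of Stage 2 for the Derivatives squad.
    Assumes daily rows with `open_interest` and `funding_rate` columns."""
    out = df.copy()
    out["oi_change_24h"] = out["open_interest"].pct_change(1)  # 1 row = 24h on daily bars
    roll = out["funding_rate"].rolling(window)
    out["funding_rate_zscore"] = (out["funding_rate"] - roll.mean()) / roll.std()
    return out

# Toy example on synthetic open interest and funding data
rng = np.random.default_rng(4)
raw = pd.DataFrame({
    "open_interest": np.linspace(1e9, 2e9, 100),
    "funding_rate": rng.normal(1e-4, 5e-5, size=100),
})
features = add_derivatives_features(raw)
```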

This research workflow ensures that every idea is tested against the same rigorous standards, making our results comparable and robust.


Development & Collaboration Workflow: A Step-by-Step Guide

To complement the research workflow, our team will use a standardized development cycle based on software engineering best practices. This ensures all contributions are isolated, tested, and systematically integrated into the main codebase, preventing conflicts and maintaining a single source of truth.

Our development cycle is: branch → test → measure → merge if successful

Here is the step-by-step process for every team member:

  1. Step 1: Create a Branch from the Baseline Model
    • Action: Before starting any new work, pull the latest version of the main branch from our GitHub repository. Create a new branch with a descriptive name.
    • Example Name: feature/strategy1-volatility-bands or fix/strategy2-data-pipeline.
    • Purpose: This isolates your work from the stable baseline, allowing you to experiment freely without affecting the work of others.
  2. Step 2: Develop & Test Your Feature or Model
    • Action: In your branch, implement the required changes. This could be adding a new feature, tuning a model, or fixing a bug. Run your code and conduct initial tests in your local environment to ensure it works as expected.
    • Purpose: To complete the core development task of your assignment.
  3. Step 3: Measure the Impact
    • Action: Run a backtest using the established walk-forward validation framework. The primary goal is to objectively measure whether your change improved the model.
    • Success Criteria: Your change is considered successful if it meets the project's "Quantitative Heuristics for Merging New Features," primarily demonstrating a consistent IC improvement over the baseline.
    • Purpose: To validate your hypothesis with data before proposing it for inclusion in the main model.
  4. Step 4: Create a Pull Request (PR)
    • Action: Once your feature is validated, push your branch to GitHub and open a Pull Request to merge it into the main branch. In the PR description, clearly state the hypothesis you were testing and summarize the results (including the final IC score vs. the baseline's IC). Paste key charts or a summary table directly into the PR description to make the review self-contained and easy to track in GitHub's history.
    • Purpose: To formally propose your change and provide context for code reviewers.
  5. Step 5: Review & Merge
    • Action: Another team member will review your code for quality and your results for validity. If the change is approved, your branch will be merged into the main branch. The baseline model is now updated with your successful feature.
    • Purpose: To ensure code quality, validate results, and complete the integration of a successful experiment. Your cycle is now complete, and you can start on the next task by creating a new branch.

Project Roadmap & Sprints


Sprint 1: Foundation & Baseline Benchmarking (Week 1)

Goal: The objective for this week is to establish a fully functional, end-to-end data pipeline and train the initial baseline model to create our performance benchmark.

Key Deliverables:


Sprint 2: Experimental Sprints & Performance Analysis (Week 2)

Goal: This week is focused on running the planned feature experiments, analyzing the results, and synthesizing our findings into a final recommendation for the V1 production model.

Key Deliverables:


Project Goals & Risk Management

Success Criteria for Our Research Sprints

Our success will not be measured by hitting arbitrary performance targets, but by our ability to rigorously execute our research plan and generate clear, actionable insights. The project is considered a success if we achieve the following:


Key Learning Objectives

By the end of this project, the team will have gained practical experience in:


Risk Management & Contingency Plans

We will proactively manage the following potential risks:


Quantitative Heuristics for Merging New Features

While our primary goal is validating hypotheses, we will use the following quantitative heuristics to guide our decisions. A feature experiment will generally be considered successful and a candidate for merging into the baseline model if it achieves the following in a rigorous walk-forward validation:


Team Communication & Collaboration Cadence

To ensure our squads stay aligned and information flows freely, we will adhere to the following simple meeting cadence: