MIDS Capstone Project Spring 2025

Deep Learning Sentiment Portfolio (DSP)

An ML-Driven Portfolio Framework to Adapt to Regime Change

Problem & Motivation 

What emotions do these words stir in you—Dotcom Bubble, Great Financial Crisis, COVID-19, Trade War? Did a chill crawl down your spine? Each time the market undergoes a major shift—economic or not—a great deal of social wealth vanishes. Think you’re safe if you don’t invest directly? Think again. Your social security, pension, or 401(k) can still suffer heavy losses.

Navigating investing through such market shifts, or regime changes, is hardly a new challenge. Yet despite decades of work by practitioners and researchers, truly satisfying solutions remain rare. We believe this is largely due to existing methods’ limited ability to capture the non-linear, intricate, and dynamic relationships at both the individual-asset and market levels.

Fortunately, methods such as neural networks are known for their strength in modeling non-linear relationships. Additionally, natural language processing (NLP) opens the door to handling unstructured input, providing new ways to capture signals related to market dynamics.

Our goal in this project is to build a machine learning–based portfolio construction framework that could potentially help both retail and institutional investors better capture market shifts and preserve capital—particularly during periods of regime change.

Data Source & Data Science Approach

Overall Roadmap: we built two main models, a text sentiment model and a deep learning model. The text sentiment model aims to extract information from financial news to guide market-timing decisions, while the deep learning model applies recent neural network techniques to improve the security selection process.

Data Source:

Sentiment Model: 

  • We began by assembling a corpus of over 16 million financial news articles spanning 2000 to 2024, sourced primarily from the Financial News and Stock Price Integration (FNSPID) dataset and platforms such as The New York Times. To fill gaps in the early years, we also built custom web-scraping tools to collect a reasonable volume of news for each day in the first five to six years of our dataset.
  • We then screened, cleaned, and merged the massive datasets into a workable size and format. We also used FinBERT, a BERT-based transformer model pre-trained on a large corpus of financial text, to conduct exploratory data analysis (EDA) and better understand the distribution of information.
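
As a flavor of the screening-and-merging step, here is a minimal deduplication-and-filter sketch. The field names and the normalized-title dedup key are simplified assumptions for illustration, not our exact pipeline:

```python
from datetime import date

def clean_and_merge(articles, start=date(2000, 1, 1), end=date(2024, 12, 31)):
    """Deduplicate and date-filter news articles from multiple sources.

    `articles` is an iterable of dicts with 'date' (datetime.date),
    'title', and 'body' keys -- a simplified stand-in for the merged
    FNSPID / NYT / scraped feeds.
    """
    seen = set()
    cleaned = []
    for art in articles:
        if not (start <= art["date"] <= end):
            continue  # outside the study window
        # Normalize the title so near-duplicate reposts collapse to one key.
        key = (art["date"], " ".join(art["title"].lower().split()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(art)
    return cleaned
```

The first occurrence of a duplicated story wins; a production version would also merge metadata across sources rather than simply dropping repeats.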

Deep Learning Model: 

  • The main data source is WRDS (Wharton Research Data Services). More specifically, we merged data from three providers: CRSP, Compustat, and JKP, which supply returns, accounting ratios, and many literature-based return metrics. We focused on stocks historically in the S&P 500 universe, from 2000 to 2024 at daily frequency. Market-level data (S&P 500 index returns) comes from Yahoo Finance. After careful cleaning and joining, the combined dataset is about 1 GB in size, with roughly 3 million rows and 30+ columns.
  • We conducted EDA to check the structural (e.g., missing values) and numerical (e.g., distribution, time-series, and correlation) properties of the data. The data is relatively clean but occasionally contains unusual outliers, which we subsequently corrected. We also noted that, at first glance, correlations between ‘return’ and most numerical variables are immaterial, inviting us to investigate non-linear relationships.
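
Outlier correction of the kind described above can be illustrated with a simple percentile-based winsorization; the 1%/99% thresholds and the winsorizing approach itself are hypothetical here, shown only to make the idea concrete:

```python
def winsorize(values, lower_pct=0.01, upper_pct=0.99):
    """Clip extreme values to the given empirical percentiles.

    An illustration of outlier correction for one numeric column of the
    merged CRSP/Compustat/JKP panel; thresholds are assumptions.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[max(0, int(lower_pct * (n - 1)))]
    hi = ordered[min(n - 1, int(upper_pct * (n - 1)))]
    return [min(max(v, lo), hi) for v in values]
```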

Data Science Approach

Sentiment Model

  • The text sentiment model aims to use finance-related news to predict the direction of the S&P 500 index’s next-day return.
  • We aggregated the consolidated news data described above into daily frequency, leveraging FinBERT’s text embeddings and a custom attention-based aggregation method. We then used the Hugging Face Trainer class to adapt FinBERT’s downstream task from standard sentiment classification to predicting the direction of the next day’s S&P 500 index return from the previous day’s news abstracts.
  • Besides this attention-based aggregation plus fine-tuning approach (the ‘Attention-Pooled Fine-Tuned NN’ method), we also tried several other methods for the same task, including a random forest, mean-pooling aggregation with fine-tuning, prediction without base-model fine-tuning, and an ensemble that combines the results of all the above models.
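
The attention-based daily aggregation can be sketched as follows. In the real model the embeddings come from FinBERT and the query vector is learned during training; here both are toy inputs, so this shows only the pooling mechanics:

```python
import math

def attention_pool(embeddings, query):
    """Pool per-article embeddings into one daily vector.

    Each article i gets weight alpha_i = softmax_i(query . e_i), and the
    daily embedding is sum_i alpha_i * e_i.
    """
    scores = [sum(q * x for q, x in zip(query, e)) for e in embeddings]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(exps)
    alphas = [x / z for x in exps]
    dim = len(embeddings[0])
    pooled = [sum(a * e[d] for a, e in zip(alphas, embeddings))
              for d in range(dim)]
    return pooled, alphas
```

Articles whose embeddings align with the query receive higher weight, so a few highly relevant stories can dominate the day's representation instead of being averaged away.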

Deep Learning Model

  • In essence, we aimed to build a non-linear, dynamic framework for predicting expected security returns, an area where neural networks are particularly well suited to capture complex relationships and interactions. More specifically, we adopted a transformer architecture (the same cutting-edge technology used in machine translation) to capture both temporal and cross-asset relationships, using two transformer structures applied in sequence.
  • Our goal was to use input features similar to the benchmark model’s, but apply a different information-processing architecture to predict expected returns, ultimately improving security selection.
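
A minimal sketch of the two-stage idea follows, with a bare-bones self-attention (identity Q/K/V projections and no learned weights), so it illustrates the data flow rather than our trained model; the `features[asset][time]` layout is an assumption:

```python
import math

def self_attention(X):
    """Scaled dot-product self-attention over a list of vectors,
    with identity Q/K/V projections for simplicity."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # stable softmax
        z = sum(exps)
        w = [e / z for e in exps]
        out.append([sum(wi * v[j] for wi, v in zip(w, X)) for j in range(d)])
    return out

def two_stage(features):
    """features[asset][time] is a feature vector (hypothetical layout).
    Stage 1 attends over each asset's own history (temporal);
    stage 2 attends across assets at each date (cross-asset)."""
    temporal = [self_attention(seq) for seq in features]
    n_assets, n_dates = len(features), len(features[0])
    return [self_attention([temporal[i][t] for i in range(n_assets)])
            for t in range(n_dates)]
```

The output per date and asset would then feed a small prediction head that maps the attended features to an expected return.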

Deep Learning Model + Sentiment Model

  • Finally, we put the sentiment and deep learning models together with this question in mind: can we combine market timing (sentiment model output) and security selection (deep learning output) to improve the overall portfolio profile, especially in times of crisis?
  • More specifically, we incorporated the sentiment model’s output in two ways: 1) as an additional input to the deep learning model; 2) as a gauge for the portfolio’s overall market exposure (e.g., if the sentiment model predicts that tomorrow’s market return is highly likely to be positive, we can increase the portfolio’s overall market exposure).
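
The exposure-gauge idea in 2) can be sketched with a simple linear gating rule; the 20% floor and the linear form are illustrative assumptions, not our exact production rule:

```python
def gated_weights(weights, p_up, floor=0.2):
    """Scale portfolio weights by sentiment confidence.

    `p_up` is the sentiment model's probability that tomorrow's market
    return is positive. Full exposure is kept when p_up = 1; exposure
    shrinks linearly toward `floor` of full exposure as p_up falls to 0.
    """
    exposure = floor + (1.0 - floor) * max(0.0, min(1.0, p_up))
    return [w * exposure for w in weights]
```

The residual (un-deployed) fraction of capital would sit in cash, which is how a market-timing signal can dampen drawdowns in crisis periods.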

Infrastructure Specifics

The benchmark and final models are both deployed on AWS EC2, using a FastAPI application to provide endpoints that host the prediction scripts. The EC2 instance was originally intended to host the benchmark model, which contains no machine learning components or frameworks that could be run in SageMaker AI. The sentiment and deep learning models were both trained in SageMaker AI using the PyTorch estimator.

The interaction between the Streamlit application and the EC2 instance is facilitated by API Gateway and a Lambda function pointing to the respective FastAPI endpoint within the instance, with an Elastic IP holding the endpoint constant. The endpoint receives the payload and returns an HTTPS link to an output stored in a publicly readable S3 bucket. DynamoDB stores past outputs for future retrieval; database keys include the input parameters, so repeated requests return the stored output rather than re-running the prediction.
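
The request-deduplication behavior can be sketched as follows, with an in-memory dict standing in for DynamoDB, a hypothetical bucket name, and the FastAPI/Lambda wiring omitted:

```python
import hashlib
import json

class PredictionCache:
    """Keys derive from the input parameters, so a repeated request
    returns the stored S3 link instead of re-running prediction.
    `run_prediction` is a caller-supplied callable (hypothetical)."""

    def __init__(self, run_prediction):
        self._store = {}          # stand-in for the DynamoDB table
        self._run = run_prediction

    def _key(self, params):
        # Canonical JSON makes the key insensitive to parameter order.
        canon = json.dumps(params, sort_keys=True)
        return hashlib.sha256(canon.encode()).hexdigest()

    def get_or_predict(self, params):
        key = self._key(params)
        if key not in self._store:
            output_name = self._run(params)  # expensive prediction
            self._store[key] = (
                f"https://example-bucket.s3.amazonaws.com/{output_name}"
            )
        return self._store[key]
```

Hashing a sorted-key JSON dump is one simple way to build a deterministic partition key from arbitrary input parameters.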

Evaluation

Sentiment Model

  • We evaluated the models using both data science metrics, such as prediction accuracy (the original labels are reasonably balanced), and investment metrics from a simple trading simulation, such as the Sharpe ratio (a commonly used finance metric measuring return per unit of risk; the higher, the better) and maximum drawdown (another common finance metric measuring the largest peak-to-trough loss over the investment period; the less negative, the better).
  • Notably, our best-performing model is the Attention-Pooled Fine-Tuned NN, which outpaced the buy-and-hold baseline and all other alternative methods, achieving a 0.44 Sharpe ratio (vs. 0.34 for buy-and-hold) and a -34% maximum drawdown (vs. -57% for buy-and-hold).
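
For reference, the two investment metrics can be computed as in this sketch, annualizing with the usual 252 trading days (a convention, not something specific to our data):

```python
import math

def sharpe_ratio(daily_returns, periods=252):
    """Annualized mean excess return per unit of volatility
    (risk-free rate assumed zero for simplicity)."""
    n = len(daily_returns)
    mu = sum(daily_returns) / n
    var = sum((r - mu) ** 2 for r in daily_returns) / (n - 1)
    return mu / math.sqrt(var) * math.sqrt(periods)

def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the cumulative wealth curve,
    returned as a negative fraction (e.g. -0.34 for -34%)."""
    wealth, peak, mdd = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        mdd = min(mdd, wealth / peak - 1.0)
    return mdd
```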

Deep Learning Model

  • To assess whether our machine learning model improves on traditional methods, we established a statistics-based portfolio framework as our benchmark, built on well-studied factors from prior research (i.e., extended Fama-French factors). This choice effectively raises the bar: a well-performing statistical model beats the market, and we have to beat the statistical model. Indeed, our benchmark model meaningfully beat the market over the research period (2000-2024).
  • We also evaluated the model using both data science metrics, such as training/validation loss (MSE) on return prediction, and investment metrics from an optimization-based backtest driven by the predicted returns.
  • After an extensive logic- and numerics-based parameter search, we selected the best in-sample model by its Sharpe ratio and maximum drawdown profile. Because the training period includes two major crises and the validation period contains another, our final model also performs reasonably well out of sample. The selected deep learning model beats the benchmark on Sharpe ratio (0.95 vs. 0.86) while registering a smaller maximum drawdown (-42% vs. -43%).
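
A chronological train/validation/out-of-sample layout of the kind described above can be sketched as follows; the 60/20/20 fractions are hypothetical, not our exact split:

```python
def chronological_split(dates, train_frac=0.6, val_frac=0.2):
    """Split an already-sorted date index into train/validation/test
    blocks without shuffling, so each later block stays strictly
    out-of-sample relative to the blocks before it."""
    n = len(dates)
    i = int(n * train_frac)
    j = i + int(n * val_frac)
    return dates[:i], dates[i:j], dates[j:]
```

Keeping the split chronological (never random) is what allows crises that fall in the training and validation windows to inform the model before it faces the held-out period.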

Deep Learning Model + Sentiment Model

  • Here we wanted to answer one question: can the sentiment model’s probability output (that the next day’s return is positive) further improve the deep learning model’s portfolio risk-return profile, especially in times of crisis?
  • It turns out that incorporating sentiment outputs further mitigates the risk of large, sudden losses in our deep learning model, as evidenced by a reduced maximum drawdown (-41% vs. -42%). While the improvement is incremental, the model’s performance during crisis periods shows marginal gains, which aligns with the core objective of our project.

Key Learnings & Impact

Through this project, the team gained thorough hands-on experience applying machine learning in the investment space: specifically, improving a portfolio’s risk-return profile, relative to a purely statistical method, by tuning market timing and enhancing security selection, especially during crisis periods.

After diligent effort, we produced a prototype portfolio construction framework that combines a sentiment model and a transformer-based deep learning model to improve on a statistical benchmark, providing a valuable window for further research in this area and helping democratize the concept for users outside highly specialized investment groups.

The team acknowledged and worked through multi-faceted challenges, including decomposing the ultimate project goal into workable components, connecting a highly specialized finance problem to relevant machine learning techniques, collecting and consolidating text data, adapting traditional data science techniques to handle time-series data, and attempting to predict market and security returns, which are notoriously intricate, noisy, and of low signal-to-noise ratio.

Had we had more time and resources, we would:

  • Spend more time on data collection for the sentiment model to improve data representation in the early 2000s
  • Explore reinforcement learning, which could better align the data science metric (training loss) with the ultimate portfolio goal (Sharpe ratio), replace the traditional optimization method with a multi-period reinforcement process, enhance model robustness across environments, and take model performance to new heights
  • Conduct further finance studies, including turnover analysis, transaction cost analysis, and an implementation study, to translate the prototype into a more realistic product.

Acknowledgements 

We would like to thank the U.C. Berkeley School of Information (the I School) for providing access to WRDS and the New York Times API. We thank the School for valuable classes, such as 266, 267, 255, and 241, which gave us relevant knowledge and experience and inspired transferable ideas for our project. Most importantly, we would like to extend our gratitude to Joyce and Danielle, our Capstone (210) instructors, for guiding us through the process, proactively providing useful materials, holding multiple discussions, and even connecting us with people who helped us clear bottlenecks along the way. To that end, we would also like to thank Tarun Sanghi, who provided additional valuable domain-expert feedback and suggestions for our project.

References

  • MIDS Capstone Summer 2024, "Very Intelligent Portfolio"
  • Larry Cao, "AI Pioneers in Investment Management"
  • Haifeng Li and Mo Hai, "Deep Reinforcement Learning Model for Stock Portfolio Management"
  • Z. Wang, B. Huang, S. Tu, K. Zhang, and L. Xu (2021), "DeepTrader: A Deep Reinforcement Learning Approach for Risk-Return Balanced Portfolio Management with Market Conditions Embedding"
  • Srijan Sood, Kassiani Papasotiriou, Marius Vaiciulis, and Tucker Balch (J.P. Morgan AI Research), "Deep Reinforcement Learning for Optimal Portfolio Allocation: A Comparative Study with Mean-Variance Optimization"
  • Alfie Brixton, Jeff Cao, Pete Hecht, and Bryan Kelly (AQR Alternative Thinking, 2024), "Can Machines Build Better Stock Portfolios? The Virtue of Complexity in the Cross-Section of Stocks"
  • Antoine Didisheim, Shikun Ke, Bryan Kelly, and Semyon Malamud, "APT or 'AIPT'? The Surprising Dominance of Large Factor Models"
  • Z. Dong (2024), "FNSPID: Financial News and Stock Price Integration Dataset," official GitHub repository
  • Theis I. Jensen, Bryan Kelly, and Lasse H. Pedersen, Global Factor Data (jkpfactors.com)
Last updated: April 16, 2025