Stock signal model (FinBERT + EDGAR fundamentals)
Personal R&D project combining market time series, SEC EDGAR fundamentals, and transformer-based text embeddings (FinBERT) to predict whether a stock is likely to meet a target gain threshold within a defined window.
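The prediction target described above can be sketched as a labeling function. This is a minimal illustration, not the project's actual code: the function name, the use of closing prices, and the parameters `horizon` and `target_gain` are all assumptions made for the example.

```python
from typing import Optional, Sequence


def label_target_gain(
    closes: Sequence[float],
    i: int,
    horizon: int,
    target_gain: float,
) -> Optional[int]:
    """Label bar i (hypothetical helper, assumes closing prices).

    Returns 1 if any close in the next `horizon` bars is at least
    closes[i] * (1 + target_gain), 0 if none is, and None when fewer
    than `horizon` future bars exist.
    """
    if i + horizon >= len(closes):
        # Incomplete forward window: drop the example rather than guess.
        return None
    threshold = closes[i] * (1.0 + target_gain)
    future = closes[i + 1 : i + 1 + horizon]
    return int(max(future) >= threshold)
```

Returning None for incomplete windows keeps the most recent bars out of the training set instead of silently labeling them as negatives.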
Role
Designer and engineer (personal project)
Timeframe
Ongoing (started 2023; latest iteration 2025)
Outcomes
- Built an end-to-end feature pipeline that aligns heterogeneous sources (price/volume, filings-derived signals, text embeddings).
- Iterated from technical-analysis-only baselines to a multi-modal feature set and strategy-aware backtesting.
- Planned leakage controls and walk-forward validation up front, so that no headline metric is claimed before the evaluation can support it.
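One way walk-forward validation could be set up is sketched below. It assumes time-ordered samples, a rolling (not expanding) training window, and an `embargo` gap between train and test; the function name and all parameters are hypothetical and chosen for illustration only.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int,
    train_size: int,
    test_size: int,
    embargo: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) over time-ordered samples.

    Each test block starts `embargo` samples after its training block
    ends, and the window then rolls forward by `test_size`.
    """
    start = 0
    while True:
        train_end = start + train_size
        test_start = train_end + embargo
        test_end = test_start + test_size
        if test_end > n_samples:
            break  # not enough data left for a full test block
        yield list(range(start, train_end)), list(range(test_start, test_end))
        start += test_size
```

When labels look ahead over a fixed window, setting `embargo` to at least that window length keeps training labels from overlapping the test period, which is one common leakage control.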
Stack
- Python
- SEC EDGAR fundamentals
- FinBERT embeddings (reduced in dimensionality for training efficiency)
- Neural network classifier + backtesting loop (with strategy constraints such as trailing stops)
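As an illustration of the kind of strategy constraint a backtesting loop can enforce, here is a minimal trailing-stop sketch for a single long position. It assumes one price per bar and exits at the bar's price once the stop is breached; the function name and the `trail_pct` parameter are hypothetical, and fees and slippage are ignored.

```python
from typing import Sequence, Tuple


def simulate_trailing_stop(
    prices: Sequence[float],
    entry_index: int,
    trail_pct: float,
) -> Tuple[int, float]:
    """Return (exit_index, fractional_return) for a long position.

    The stop trails the highest price seen since entry by `trail_pct`.
    If it is never hit, the position is closed at the last price.
    """
    entry_price = prices[entry_index]
    peak = entry_price
    for i in range(entry_index + 1, len(prices)):
        price = prices[i]
        peak = max(peak, price)  # ratchet the reference high upward
        if price <= peak * (1.0 - trail_pct):
            return i, price / entry_price - 1.0
    last = len(prices) - 1
    return last, prices[last] / entry_price - 1.0
```

Running each predicted-positive entry through a rule like this turns classifier outputs into per-trade returns, which is what lets a backtest score the strategy rather than the classification accuracy alone.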
Writeup
The core challenge is alignment: joining market history with filing-derived fundamentals and NLP embeddings at the right grain and time horizon. I built a dataset construction pipeline that produces model-ready examples per symbol and window, experimented with reducing FinBERT embeddings to a smaller representation to keep training efficient, and used backtesting to tie model outputs to a concrete decision strategy rather than treating the problem as a purely offline classification task.
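The alignment problem can be illustrated with a point-in-time ("as-of") join: each price bar only sees the most recent filing that was already public on that date. The sketch below uses only the standard library and assumes filings are keyed by the date they became available (not the fiscal period end); the function names and the `eps` field are hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Optional, Tuple

Filing = Tuple[date, Dict[str, float]]  # (availability date, fundamentals)


def latest_filing_as_of(filings: List[Filing], as_of: date) -> Optional[Dict[str, float]]:
    """Return fundamentals from the latest filing available on or before
    `as_of`. `filings` must be sorted by availability date, ascending."""
    dates = [d for d, _ in filings]
    idx = bisect_right(dates, as_of)
    return filings[idx - 1][1] if idx > 0 else None


def align_bars_with_filings(
    bars: List[Tuple[date, float]],
    filings: List[Filing],
) -> List[Dict[str, object]]:
    """Attach point-in-time fundamentals to each (date, close) bar,
    dropping bars that precede the first available filing."""
    rows: List[Dict[str, object]] = []
    for bar_date, close in bars:
        fundamentals = latest_filing_as_of(filings, bar_date)
        if fundamentals is None:
            continue
        rows.append({"date": bar_date, "close": close, **fundamentals})
    return rows
```

Keying on the availability date is the design choice that matters here: joining on the fiscal period end would expose the model to numbers the market had not yet seen. A stricter variant could use `bisect_left` so that a filing released on the same day as a bar is not used for that bar.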