Stock signal model (FinBERT + EDGAR fundamentals)

Personal R&D project combining market time series, SEC EDGAR fundamentals, and transformer-based text embeddings (FinBERT) to predict whether a stock is likely to meet a target gain threshold within a defined window.
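
As a rough illustration of the prediction target, the sketch below labels each bar 1 if the price reaches a target gain within the next few bars. The function name, the use of closing prices (rather than intraday highs), and the example threshold are assumptions for illustration, not details taken from the project.

```python
from typing import List, Optional, Sequence


def label_target_hit(
    prices: Sequence[float], horizon: int, target_gain: float
) -> List[Optional[int]]:
    """Label bar t as 1 if any of the next `horizon` prices is at least
    prices[t] * (1 + target_gain), else 0.

    Bars without a full forward window get None so they can be dropped
    instead of being silently labeled 0.
    """
    labels: List[Optional[int]] = []
    for t, entry in enumerate(prices):
        future = prices[t + 1 : t + 1 + horizon]
        if len(future) < horizon:
            labels.append(None)  # incomplete forward window
        else:
            labels.append(int(max(future) >= entry * (1 + target_gain)))
    return labels


# Hypothetical example: 5% target within 2 bars.
example_labels = label_target_hit([100, 102, 106, 101, 100], horizon=2, target_gain=0.05)
```

Because the label looks forward, the last `horizon` bars of any series have no valid label and should be excluded from training.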

Role

Designer and engineer (personal project)

Timeframe

Ongoing (started 2023; latest iteration 2025)

Outcomes

  • Built an end-to-end feature pipeline that aligns heterogeneous sources (price/volume, filings-derived signals, text embeddings).
  • Iterated from technical-analysis-only baselines to a multi-modal feature set and strategy-aware backtesting.
  • Kept evaluation honest by planning leakage controls and walk-forward validation before reporting any headline metrics.
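
One way walk-forward validation with a leakage gap could be set up is sketched below. It is a minimal, standard-library-only illustration: the function name, the expanding-window choice, and the `embargo` parameter are assumptions, not the project's actual code.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, min_train: int, embargo: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over time-ordered samples.

    Training always precedes testing. `embargo` drops the last few
    training samples before each test block, so that forward-looking
    labels in training cannot overlap the test period.
    """
    if min_train <= embargo:
        raise ValueError("min_train must exceed embargo")
    fold_size = (n_samples - min_train) // n_folds
    if fold_size <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        test_start = min_train + k * fold_size
        # The last fold absorbs any remainder.
        test_end = n_samples if k == n_folds - 1 else test_start + fold_size
        train_end = test_start - embargo
        yield list(range(train_end)), list(range(test_start, test_end))


# Hypothetical example: 100 samples, 4 folds, 5-sample embargo.
splits = list(walk_forward_splits(100, 4, min_train=20, embargo=5))
```

Setting `embargo` to at least the label horizon is a common choice, since a label computed at bar t depends on prices up to bar t + horizon.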

Stack

  • Python
  • SEC EDGAR fundamentals
  • FinBERT embeddings (reduced in dimensionality for training efficiency)
  • Neural network classifier and backtesting loop (with strategy constraints such as trailing stops)
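
To make the trailing-stop constraint concrete, here is a minimal sketch of how a single long trade might be exited in a backtest. The function name, the percentage-based stop, and the assumption that fills happen at the observed price are all illustrative simplifications rather than the project's implementation.

```python
from typing import Sequence, Tuple


def trailing_stop_exit(prices: Sequence[float], trail_pct: float) -> Tuple[int, float]:
    """Simulate one long trade entered at prices[0].

    Tracks the highest price seen since entry and exits at the first bar
    whose price is at or below peak * (1 - trail_pct). If the stop never
    triggers, the trade is closed at the last bar.
    Returns (exit_index, trade_return).
    """
    if not prices:
        raise ValueError("prices must not be empty")
    entry = prices[0]
    peak = entry
    for i in range(1, len(prices)):
        price = prices[i]
        peak = max(peak, price)
        if price <= peak * (1 - trail_pct):
            return i, price / entry - 1
    return len(prices) - 1, prices[-1] / entry - 1


# Hypothetical example: 10% trailing stop.
exit_index, trade_return = trailing_stop_exit([100, 110, 120, 107, 130], trail_pct=0.10)
```

In this example the peak is 120, so the stop level is 108 and the trade exits at the fourth bar (price 107), missing the later move to 130. That is the kind of strategy effect a classification metric alone would not reveal.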

Writeup

The core challenge is alignment: joining market history with filing-derived fundamentals and NLP embeddings at the right grain and time horizon. I built a dataset construction pipeline that produces model-ready examples per symbol/window. To keep training efficient, I experimented with reducing FinBERT embeddings to a smaller representation. I then used backtesting to connect model outputs to a concrete decision strategy, rather than treating the problem as a purely offline classification task.
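
The alignment problem can be illustrated with a point-in-time ("as-of") join: each price bar only sees the most recent filing that was already public. The sketch below is a standard-library illustration under stated assumptions. The function name is hypothetical, filings are keyed by their filing date rather than fiscal period end, and a filing is conservatively treated as usable only on bars dated after it was filed.

```python
from bisect import bisect_left
from datetime import date
from typing import List, Optional, Sequence, Tuple


def as_of_join(
    bar_dates: Sequence[date], filings: Sequence[Tuple[date, float]]
) -> List[Optional[float]]:
    """For each bar date, return the value from the latest filing filed
    strictly before that date, or None if no filing exists yet.

    `filings` is a sequence of (filed_date, value) pairs.
    """
    ordered = sorted(filings, key=lambda f: f[0])
    filed_dates = [filed for filed, _ in ordered]
    aligned: List[Optional[float]] = []
    for bar in bar_dates:
        # Index of the first filing dated on or after the bar;
        # everything before that index was filed earlier.
        i = bisect_left(filed_dates, bar)
        aligned.append(ordered[i - 1][1] if i > 0 else None)
    return aligned


# Hypothetical example values.
filings = [(date(2024, 2, 1), 1.0), (date(2024, 5, 1), 2.0)]
bars = [date(2024, 1, 31), date(2024, 2, 1), date(2024, 2, 2), date(2024, 5, 2)]
aligned = as_of_join(bars, filings)
```

Joining on fiscal period end instead of filing date would leak information, because a quarter's numbers are typically published weeks after the quarter closes. The same idea applies to text embeddings: each should be timestamped by when the underlying document became available.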

LinkedIn

linkedin.com

GitHub

github.com