Backtesting Data: What You Need and Where to Find It
Backtesting data is the historical price, volume, and market information that a trading strategy runs against to simulate past performance. Without clean, relevant data, your backtest results actively mislead you instead of informing you.
Key Takeaways
- Backtesting data quality matters more than strategy complexity. Clean data is the foundation of reliable results.
- Use at least 100 trades and two market regimes in your backtest sample to avoid overfitting to one market condition.
- Check every symbol for stock splits, dividends, and corporate actions before running any backtest to avoid data artifacts.
- Test your strategy at multiple timeframes and data frequencies before trusting the results from a single dataset.
- Reserve 20% of your historical data as an out-of-sample period that you never touch during strategy development.
What Types of Data Do You Need for Backtesting?
The type of data you need depends entirely on the strategy you plan to test. A simple moving average crossover on SPY only requires daily OHLC (open, high, low, close) prices. A mean-reversion scalping strategy on NQ futures needs minute-level or tick data to capture the small intraday moves it trades on. Most retail backtests rely on four price-based data types. OHLC data is the baseline: each bar has open, high, low, and close prices. Volume data shows how many shares or contracts traded. Adjusted price data accounts for dividends and stock splits so total return calculations stay accurate. Tick data records every individual trade and is needed for high-frequency or scalping strategies. A fifth type, fundamental data, such as earnings reports, P/E ratios, and institutional holdings, is needed for strategies that trade on company health rather than price movement alone. I do not recommend mixing fundamental and price data in the same backtest unless you know exactly how often each one updates.
- OHLC price data: the minimum requirement for most strategies
- Volume data: confirms whether price moves have participation
- Adjusted close: corrects for dividends and splits to show true return
- Tick data: every trade recorded, essential for scalping and HFT
- Fundamental data: for strategies tied to earnings or valuation
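To make the relationship between tick data and OHLC bars concrete, here is a minimal Python sketch that aggregates ticks into fixed-length bars. The Bar class, the ticks_to_bars name, and the (timestamp, price, size) tick layout are assumptions for illustration, not any vendor's format, and the sample ticks are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Bar:
    start: int      # bar start time in epoch seconds
    open: float
    high: float
    low: float
    close: float
    volume: float


def ticks_to_bars(ticks: Iterable[Tuple[int, float, float]],
                  bar_seconds: int = 60) -> List[Bar]:
    """Aggregate time-sorted (timestamp, price, size) ticks into OHLCV bars."""
    bars: List[Bar] = []
    for ts, price, size in ticks:
        start = ts - ts % bar_seconds  # floor the timestamp to the bar boundary
        if bars and bars[-1].start == start:
            bar = bars[-1]
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price          # the latest tick so far is the close
            bar.volume += size
        else:
            # First tick of a new bar sets open, high, low, and close.
            bars.append(Bar(start, price, price, price, price, size))
    return bars


# Hypothetical ticks: (epoch seconds, price, size)
ticks = [(0, 100.0, 5), (20, 101.5, 2), (59, 99.0, 1), (60, 99.5, 4), (90, 100.5, 3)]
bars = ticks_to_bars(ticks)
```

Note that this sketch emits no bar for an interval without ticks. Whether to fill such gaps depends on the strategy, so treat that as a design decision rather than a default.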
Where to Find Reliable Backtesting Data
TradingView provides the most accessible source for backtesting data if you are writing Pine Script. The built-in data covers stocks, forex, crypto, and futures with daily, hourly, and minute resolution. You can access it directly from Pine Script using the request.security() function (security() in Pine Script v4 and earlier) to fetch any symbol and timeframe. For traders who prefer to backtest outside TradingView, free and paid options exist. Yahoo Finance offers free daily OHLC data for US equities. QuantConnect provides free minute-level data for stocks and futures through its API. Polygon.io charges for higher quality, cleaner data with better coverage of corporate actions. I tested a EURUSD trend-following strategy on TradingView data and got very different Sharpe ratios depending on whether I used daily or 4-hour bars. The daily bars smoothed out noise and made the strategy look profitable. The 4-hour bars showed the real story: a strategy that barely broke even after slippage. That difference taught me to always test at multiple timeframes before trusting any single result.
- TradingView: best for Pine Script users, built-in data for stocks, forex, crypto, and futures
- Yahoo Finance: free daily OHLC for US equities
- QuantConnect: free minute-level data for stocks and futures
- Polygon.io: paid premium data with corporate action adjustments
- Always test at multiple timeframes to avoid data frequency bias
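Comparing Sharpe ratios across bar sizes only makes sense when each one is annualized with the number of bars per year for its own timeframe. The sketch below shows one common way to do that in Python. The function name and the bars-per-year constants (252 daily bars, and 6 four-hour bars a day over roughly 260 trading days) are assumptions for illustration, and the return series are made up.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def annualized_sharpe(returns: Sequence[float],
                      periods_per_year: float,
                      risk_free_per_period: float = 0.0) -> float:
    """Mean excess return over its sample standard deviation, scaled to a year."""
    excess = [r - risk_free_per_period for r in returns]
    sd = stdev(excess)  # needs at least two returns
    if sd == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return mean(excess) / sd * math.sqrt(periods_per_year)


# Assumed bar counts per year; adjust to the instrument's real trading calendar.
DAILY_BARS_PER_YEAR = 252
FOUR_HOUR_BARS_PER_YEAR = 6 * 260

# Hypothetical per-bar strategy returns for the same period at two frequencies.
daily_returns = [0.004, -0.002, 0.003, 0.001, -0.001]
four_hour_returns = [0.001, -0.002, 0.002, 0.0, -0.001, 0.001, 0.002, -0.003]

sharpe_daily = annualized_sharpe(daily_returns, DAILY_BARS_PER_YEAR)
sharpe_4h = annualized_sharpe(four_hour_returns, FOUR_HOUR_BARS_PER_YEAR)
```

If the two figures disagree sharply after costs, that is a reason to distrust the more flattering one until you understand where the difference comes from.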
How Data Quality Destroys Backtest Reliability
Bad data produces bad backtests. This sounds obvious, but it is the most common mistake I see in published trading strategies. Survivorship bias is the worst offender: if your stock data only includes companies that still trade today, you miss the bankrupt and delisted ones that would have destroyed your returns. A backtest that starts in 2000 but uses only today's S&P 500 constituents misses Enron, WorldCom, and Lehman Brothers. Look-ahead bias is another trap. It happens when your backtest uses data that would not have been available at the time of the trade. For example, using an adjusted close that incorporates a dividend announcement from the following week makes past trades look better than they really were. Synchronization matters too. I once compared two forex broker data feeds for the same EURUSD period. The timestamps differed by up to 8 seconds on hourly bars. That gap changed the strategy entry prices enough to shift the profit factor from 1.8 to 1.2. Eight seconds. Always verify that your timestamps are in sync.
- Survivorship bias: delisted stocks missing from the data inflate returns
- Look-ahead bias: future information leaking into past signals
- Dividend and split adjustment errors distort total return
- Timestamp synchronization: feeds that differ by seconds change entry prices
- Split-adjusted data that omits reverse splits creates artificial gaps
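A timestamp check like the one described above can be sketched in a few lines of Python. This version assumes both feeds contain the same bars in the same order, with timestamps in epoch seconds. The function name, the one-second tolerance, and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def timestamp_offsets(feed_a: Sequence[float],
                      feed_b: Sequence[float],
                      tolerance_seconds: float = 1.0) -> List[Tuple[int, float]]:
    """Return (bar index, offset in seconds) for bars whose timestamps disagree.

    Assumes the two feeds are already aligned bar for bar.
    """
    if len(feed_a) != len(feed_b):
        raise ValueError("feeds must contain the same number of bars")
    flagged = []
    for i, (a, b) in enumerate(zip(feed_a, feed_b)):
        offset = b - a
        if abs(offset) > tolerance_seconds:
            flagged.append((i, offset))
    return flagged


# Hypothetical hourly bar timestamps from two brokers (epoch seconds).
broker_a = [0, 3600, 7200]
broker_b = [0, 3608, 7199]
mismatches = timestamp_offsets(broker_a, broker_b)
```

A non-empty result does not say which feed is right. It only tells you that entry prices taken from the two feeds are not comparable until the offset is explained.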
My First Backtest Data Mistake
I backtested a QQQ mean-reversion strategy using 5-minute bars from 2020 to 2023 and found what looked like a solid edge: a 1.8 profit factor and a 58% win rate. The strategy bought after a 1% drop within 20 minutes and sold when price returned to the 20-period SMA on the 5-minute chart. I was ready to trade it live. When the live results consistently underperformed the backtest, I dug into the data and found the problem. My historical data was not adjusted for stock splits. I had assumed the raw price bars formed one continuous series. In reality, each split left a gap in the unadjusted prices that my strategy treated as a normal price drop. The backtest bought every single split as if it were a dip. The live strategy had no splits to trade, so it generated far fewer signals. That experience changed how I prepare data. I now check every symbol for splits and dividends before running any backtest. Splits are not dips. They are mechanical adjustments that no strategy should trade on.
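One way to catch this kind of artifact before a backtest runs is to scan the unadjusted closes for bar-to-bar moves that sit suspiciously close to a whole-number split ratio. The Python sketch below is a heuristic under stated assumptions: the function name, the ratio list, and the 2% tolerance are illustrative choices, and the prices are made up rather than taken from any real symbol. Any hit still needs to be confirmed against the symbol's corporate action records.

```python
from typing import List, Sequence, Tuple


def find_split_candidates(closes: Sequence[float],
                          ratios: Sequence[int] = (2, 3, 4, 5, 10),
                          tolerance: float = 0.02) -> List[Tuple[int, str]]:
    """Flag bars whose close-to-close move is within tolerance of a split ratio."""
    hits: List[Tuple[int, str]] = []
    for i in range(1, len(closes)):
        prev, cur = closes[i - 1], closes[i]
        if prev <= 0 or cur <= 0:
            continue  # skip bad prints rather than divide by zero
        change = prev / cur  # about 4.0 after an unadjusted 4:1 split
        for n in ratios:
            if abs(change - n) / n <= tolerance:
                hits.append((i, f"{n}:1 split?"))
            elif abs(change * n - 1) <= tolerance:
                # change is close to 1/n, so price jumped roughly n-fold
                hits.append((i, f"1:{n} reverse split?"))
    return hits


# Hypothetical unadjusted closes with one 4:1 split between bars 1 and 2.
closes = [400.0, 402.0, 100.8, 101.2, 100.5]
candidates = find_split_candidates(closes)
```

A genuine crash can also trip the check, which is why the labels end in a question mark. The point is to force a manual look before the strategy ever sees the bar.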
How Much Data Is Enough for a Meaningful Backtest?
Two rules of thumb guide how much backtesting data you need. First, aim for at least 100 trades in your backtest sample. A sample of 20 trades can look amazing or terrible by pure luck. One hundred trades gives the law of large numbers room to work. Second, your data window should include at least two different market regimes. A backtest that only covers the 2020-2021 bull market is not a test. It is a bull market diary. A balanced approach: for a daily strategy on SPY, use 10-15 years of data. That captures multiple bull and bear cycles. For a 5-minute strategy on NQ, 2-3 years of tick or minute data is usually sufficient because the number of bars is far higher. Always reserve the most recent 20% of your data as an out-of-sample period that you do not touch during strategy development.
- Minimum 100 trades so the results are less likely to be pure luck
- At least two market regimes: bull, bear, or sideways
- 10-15 years for daily strategies on major indices like SPY
- 2-3 years for intraday strategies on liquid instruments like NQ
- Reserve 20% of data as out-of-sample to avoid overfitting
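The holdout rule and the trade-count rule above can be sketched in Python as follows. The split is chronological, never shuffled, so the out-of-sample period is always the most recent data. The function names are hypothetical and the integers stand in for time-ordered bars.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(bars: Sequence[T],
                        holdout_fraction: float = 0.2) -> Tuple[List[T], List[T]]:
    """Split time-ordered bars into (in-sample, out-of-sample) without shuffling."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    n_holdout = round(len(bars) * holdout_fraction)
    cut = len(bars) - n_holdout
    return list(bars[:cut]), list(bars[cut:])


def has_enough_trades(trade_pnls: Sequence[float], minimum: int = 100) -> bool:
    """Apply the rule of thumb of at least `minimum` trades in the sample."""
    return len(trade_pnls) >= minimum


# Stand-in for ten time-ordered bars, oldest first.
bars = list(range(10))
in_sample, out_of_sample = chronological_split(bars)
```

Develop and tune only on in_sample, then run the finished strategy once on out_of_sample. Re-tuning after looking at the holdout result turns it into in-sample data.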
This page is for informational purposes only and does not constitute investment advice. All trading and backtesting carries substantial risk of loss. Past performance does not guarantee future results. Always consult a qualified financial advisor before making trading decisions.