OHLC data quality for backtesting: the complete guide 2026

Q: What is OHLC data in backtesting?

OHLC (Open, High, Low, Close) data represents the opening price, highest price, lowest price, and closing price of a given time bar. In backtesting, it is the raw material of every simulation: each buy or sell signal is calculated from these four values. A valid OHLC bar must satisfy High greater than or equal to Open, Low, and Close, and Low less than or equal to Open, High, and Close. Any bar violating this rule is corrupted and must be excluded before calculation.

Q: How do you check OHLC data quality?

OHLC validation follows four steps: (1) mathematical consistency check (High greater than or equal to max(Open, Low, Close), Low less than or equal to min(Open, High, Close)); (2) gap detection by comparing consecutive timestamps against the expected timeframe; (3) duplicate search for identical timestamps in the series; (4) timezone alignment verification. Platforms like Backtrex automate all these checks before every backtest run.

Q: What is repainting and why does it distort backtesting?

Repainting is the retroactive modification of an indicator's historical values. A repainting indicator shows perfect entries on past bars in a backtest, but those entries would not have been available at that exact moment in real time. The result is artificial historical performance that cannot be reproduced in live trading. The fix is to always use the previous confirmed bar's data (close[1]) instead of the current unclosed bar (close[0]).

Q: What are the best free OHLC data sources for backtesting?

For Forex and CFDs, Dukascopy provides free tick data with millisecond precision for over 700 instruments. For equities, Yahoo Finance covers 20 to 30 years of history but requires careful verification of split adjustments. TradingView offers solid data for liquid markets, with extended API access on Premium plans. For professional use, Refinitiv (LSEG) or TickData are the institutional references.

Q: How do you detect corrupted OHLC data in a dataset?

Three main warning signs: (1) bars where High is below Close or Open, or Low is above Close or Open; (2) sudden price jumps inconsistent with the asset's typical volatility, often the artifact of an unhandled split or dividend adjustment; (3) duplicate timestamps or bars in non-chronological order. A simple Python validation script can detect these anomalies in seconds across years of historical data.

Q: Can you backtest with daily data or do you need minute-level data?

It depends on your strategy. Swing strategies based on daily close signals work well with daily bars. Intraday, scalping, or session-based strategies (London, New York opens) require at least M1 data. With daily bars, you cannot model intraday spreads, stop hunts, or the intrabar price movements that directly impact your stop loss and take profit levels.

Q: Does Backtrex automatically validate my OHLC data before a backtest?

Yes. Backtrex includes a native OHLC validation layer that checks mathematical consistency on every bar, detects abnormal gaps, and enforces the anti-repainting rule across all strategy blocks. If corrupted data is detected, the platform flags it clearly before launching the computation, preventing you from interpreting results built on defective data.

Published on June 24, 2026 · 13 min read

Tags: Backtesting, OHLC, Historical data, Repainting, Validation

Backtrex Team · The team behind Backtrex, the visual no-code backtesting platform for traders.

Validating OHLC data quality before backtesting means checking the mathematical consistency of every bar (High greater than or equal to Open, Low, and Close; Low less than or equal to Open, High, and Close), verifying the absence of gaps, and confirming temporal reliability across the full historical dataset. Without this step, a backtest can show outstanding results that will never repeat in live trading, leading to capital decisions built on fictional performance.

Why data quality is critical in backtesting

Garbage in, garbage out: the fundamental principle

In quantitative finance, the principle of "garbage in, garbage out" is foundational: if you feed corrupted data into your backtesting system, it will return corrupted results. A perfectly coded backtest cannot compensate for defective source data.

According to ESMA, which draws on studies by national regulators such as the AMF (Autorité des Marchés Financiers), between 74% and 89% of retail client accounts lose money when trading CFDs and Forex. Part of these losses likely stems from poor pre-trade evaluation of strategies, often rooted in backtests that are built on low-quality data and paint an unrealistically positive picture.

A single corrupted OHLC bar in your dataset can cause:

A buy or sell signal triggered at an impossible price (for example, a Low above a High)
An underestimated maximum drawdown because a high-amplitude bar was stripped from the dataset
An artificially inflated Sharpe ratio caused by missing volatility on certain periods
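To make the drawdown point concrete, here is a minimal sketch in plain Python. The closing prices are invented for illustration and `max_drawdown` is a hypothetical helper, not a Backtrex function; it only shows how one missing high-amplitude bar changes the statistic.

```python
def max_drawdown(closes):
    """Return the largest peak-to-trough decline as a fraction of the peak."""
    peak = closes[0]
    worst = 0.0
    for price in closes:
        peak = max(peak, price)
        worst = max(worst, (peak - price) / peak)
    return worst


# Invented closing prices; the fourth bar is a sharp sell-off.
full_series = [100.0, 110.0, 105.0, 88.0, 99.0, 104.0]
# The same series with the sell-off bar missing from the export.
stripped_series = [100.0, 110.0, 105.0, 99.0, 104.0]

dd_full = max_drawdown(full_series)
dd_stripped = max_drawdown(stripped_series)
print(f"with the bar: {dd_full:.1%}, without it: {dd_stripped:.1%}")
```

In this toy series, the missing bar halves the measured drawdown, from 20% to 10%.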

The perfect backtest trap

A backtest showing 95% winning trades over 5 years should immediately raise red flags. Before searching for an error in your strategy logic, audit the integrity of your OHLC data. A near-perfect track record is almost always a data artifact.

The 3 most common types of OHLC data errors

OHLC data can be corrupted in three main ways, each with a distinct impact on your backtest results.

Error type	Description	Impact on backtest
OHLC inconsistency	Low above High, Close above High, or Close below Low	Signals fired at prices that never existed in the market
Gaps and missing bars	Missing bars during open market hours (server outages, export failures)	Underestimated drawdown, unrealistic price jumps between consecutive bars
Duplicates and bad timestamps	Two bars sharing the same timestamp, or bars in non-chronological order	Miscalculated indicators, non-reproducible results across runs

Validating OHLC data: the complete checklist

Checking OHLC consistency

The first check is mathematical and must be applied bar by bar. For every bar in your dataset, the following conditions must hold without exception.

Verify High >= max(Open, Low, Close)

If High is below Open, Low, or Close for any bar, that bar is mathematically impossible. Flag it and exclude it from the dataset before any analysis.

Verify Low <= min(Open, High, Close)

If Low exceeds Open, High, or Close, apply the same treatment: the bar is corrupted and must be excluded. This condition is frequently violated in dividend-adjusted equity data.

Check for zero or negative values

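A price of zero or below does not occur on standard markets such as spot Forex, equities, and indices; the rare exceptions, such as WTI crude oil futures in April 2020, are well documented. Any other zero in an OHLC series is a data artifact, typically from an export error or a placeholder bar inserted by the data provider.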

Check volume when available

On centralized exchanges (CME, Euronext), a bar with zero volume during active market hours is suspicious. It is not strictly an OHLC error, but it warrants investigation before including the period in your backtest.
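The four checks above can be sketched in plain Python as follows. The `Bar` class, the `bar_issues` function, and the sample values are assumptions made for this illustration; adapt the field names to your own data format. Zero volume is reported as a point to investigate, not as a hard error, in line with the note above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bar:
    timestamp: str
    open: float
    high: float
    low: float
    close: float
    volume: Optional[float] = None  # None when the source has no volume


def bar_issues(bar: Bar) -> List[str]:
    """Return a list of human-readable problems found in one bar."""
    issues = []
    if any(p <= 0 for p in (bar.open, bar.high, bar.low, bar.close)):
        issues.append("non-positive price")
    if bar.high < max(bar.open, bar.low, bar.close):
        issues.append("high below open/low/close")
    if bar.low > min(bar.open, bar.high, bar.close):
        issues.append("low above open/high/close")
    if bar.volume is not None and bar.volume == 0:
        issues.append("zero volume (investigate)")
    return issues


# Invented sample bars: one clean, two with problems.
bars = [
    Bar("2024-01-02 09:00", 100.0, 101.5, 99.5, 101.0, 1200),
    Bar("2024-01-02 10:00", 101.0, 100.5, 100.0, 100.8, 900),  # high < open
    Bar("2024-01-02 11:00", 100.8, 101.0, 0.0, 100.9, 0),      # zero low, zero volume
]

report = {}
for bar in bars:
    problems = bar_issues(bar)
    if problems:
        report[bar.timestamp] = problems

for timestamp, problems in report.items():
    print(timestamp, problems)
```

Returning a list of issues per bar, instead of a single pass/fail flag, makes it easier to decide which bars to exclude and which to investigate.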

Detecting gaps and missing bars

A data gap is not always an error: Forex markets close on weekends, equities observe public holidays. The problem arises when a bar is absent during normal market open hours.

To detect gaps, calculate the time difference between consecutive bars and compare it to your expected timeframe. On an H1 chart, any gap exceeding 3,600 seconds during an active session (excluding known market closures) signals a potential missing bar that must be investigated.
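A minimal sketch of that comparison, assuming the timestamps are already sorted, expressed in a single timezone, and restricted to one active session. The function name `find_gaps` and the sample timestamps are illustrative; filtering out weekends and holidays against a market calendar is left to the caller.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(hours=1)):
    """Return (previous, current, missing_bars) for every spacing above `expected`."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta > expected:
            # Number of whole bars that should sit between prev and curr.
            missing = delta // expected - 1
            gaps.append((prev, curr, missing))
    return gaps


# Invented H1 series for a Wednesday session; 10:00 and 11:00 are absent.
h1 = [
    datetime(2024, 1, 3, 8),
    datetime(2024, 1, 3, 9),
    datetime(2024, 1, 3, 12),
    datetime(2024, 1, 3, 13),
]

gaps = find_gaps(h1)
for prev, curr, missing in gaps:
    print(f"{missing} bar(s) missing between {prev} and {curr}")
```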

Legitimate gaps vs abnormal gaps

On Forex, weekend gaps (Friday 10pm to Sunday 10pm UTC) are expected and normal. On S&P 500 futures (CME), the near-24/5 data has a 60-minute maintenance break each night. Any other interruption during published trading hours warrants a closer look before you trust the surrounding data.

Identifying duplicates and anomalies

A duplicate is a bar whose timestamp appears more than once in the time series. Under MiFID II, ESMA's clock synchronization standards (RTS 25) require trading venues and their participants to timestamp reportable events with a granularity as fine as one microsecond for the fastest trading activity. Despite this regulatory framework, free historical data sources frequently contain duplicates, especially in datasets adjusted for corporate actions such as stock splits or dividend payments.
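Both duplicate timestamps and ordering problems can be found with a short standard-library sketch like the one below. It assumes timestamps are ISO 8601 strings in one consistent format and timezone, so that string comparison matches chronological order; the function names and sample data are illustrative.

```python
from collections import Counter


def duplicate_timestamps(timestamps):
    """Return every timestamp that appears more than once, sorted."""
    counts = Counter(timestamps)
    return sorted(ts for ts, n in counts.items() if n > 1)


def out_of_order_positions(timestamps):
    """Return indexes where a timestamp is earlier than the one before it."""
    return [
        i for i in range(1, len(timestamps))
        if timestamps[i] < timestamps[i - 1]
    ]


# Invented series with one duplicate and one bar out of order.
series = [
    "2024-01-03T08:00:00Z",
    "2024-01-03T09:00:00Z",
    "2024-01-03T09:00:00Z",  # duplicate
    "2024-01-03T11:00:00Z",
    "2024-01-03T10:00:00Z",  # earlier than the previous bar
]

dupes = duplicate_timestamps(series)
misplaced = out_of_order_positions(series)
print("duplicates:", dupes)
print("out-of-order indexes:", misplaced)
```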

Verifying timezone alignment

All data in your dataset must use a consistent, documented timezone. A one-hour offset introduced by a daylight saving transition not handled by your data provider can desynchronize your market sessions and generate fictitious signals at London or New York session opens, where a large portion of daily volatility is concentrated.
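The daylight saving effect can be illustrated with Python's standard `zoneinfo` module. The 08:00 local time used for the London open is an assumption for this example, and `london_open_utc` is a hypothetical helper; the point is only that the same local session time maps to two different UTC hours over the year.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def london_open_utc(year, month, day, hour=8):
    """Convert a London wall-clock session open to UTC."""
    local = datetime(year, month, day, hour, tzinfo=LONDON)
    return local.astimezone(timezone.utc)


winter = london_open_utc(2024, 1, 15)  # London is on GMT (UTC+0)
summer = london_open_utc(2024, 7, 15)  # London is on BST (UTC+1)

print("winter open in UTC:", winter.strftime("%H:%M"))
print("summer open in UTC:", summer.strftime("%H:%M"))
```

A dataset stored in UTC that hard-codes a single hour for the session open would therefore be off by one hour for roughly half the year.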

Repainting: the enemy of realistic backtesting

Definition and concrete examples

Repainting is the retroactive modification of an indicator's past values. In backtesting, this means the indicator uses future data to calculate its past values, creating the illusion of excellent historical performance that is impossible to reproduce in live trading.

Concrete example: an indicator computing the "highest High of the last 20 bars" that includes the current, unclosed bar (index [0]) changes its value with every tick until that bar closes. In a backtest, you see only the final locked value, but in real time you would have seen every intermediate value during bar formation. The anti-repainting rule is to always reference the previous confirmed, closed bar (index [1], for example high[1] or close[1]) in any conditional calculation.
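In Python terms, the rule amounts to building the lookback window from bars that end one position before the bar being evaluated. The sketch below is an illustration under that assumption, with invented values and a shortened lookback of 3 for readability; `highest_high_confirmed` is a hypothetical name.

```python
def highest_high_confirmed(highs, lookback=20):
    """For each bar i, return the highest high of the `lookback` bars ending at i - 1.

    The bar being evaluated is never part of its own window. Positions with
    fewer than `lookback` closed bars available return None.
    """
    levels = []
    for i in range(len(highs)):
        window = highs[max(0, i - lookback):i]  # slice stops before bar i
        levels.append(max(window) if len(window) == lookback else None)
    return levels


highs = [10.0, 11.0, 12.0, 11.5, 13.0]
levels = highest_high_confirmed(highs, lookback=3)

# A breakout is only flagged against a level that was known before the bar formed.
breakouts = [
    i for i, (high, level) in enumerate(zip(highs, levels))
    if level is not None and high > level
]
print(levels)
print("breakout bars:", breakouts)
```

Had the window included the current bar, the level on the last bar would be its own high of 13.0 and the breakout comparison would never trigger.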

For a broader look at common backtesting pitfalls, see our guide on common backtesting mistakes.

How to detect repainting in your data

The most reliable method to detect whether an indicator repaints is the real-time vs retrospective comparison test:

Record the signals generated by the indicator in real time over 2 to 4 weeks
Compare those signals to what the indicator shows retrospectively on the same bars
If the retroactive signals differ from the signals you observed in real time, the indicator repaints
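The comparison in the last two steps can be automated once both sets of signals are stored by bar timestamp. The sketch below assumes each set is a dictionary mapping a timestamp to a signal ("buy", "sell", or None for no signal); the function name and the data are invented for illustration.

```python
def repainted_bars(live_signals, retro_signals):
    """Return timestamps whose signal differs between the live log and the retrospective run."""
    all_timestamps = sorted(set(live_signals) | set(retro_signals))
    return [
        ts for ts in all_timestamps
        if live_signals.get(ts) != retro_signals.get(ts)
    ]


# Signals logged as bars closed in real time.
live = {
    "2024-03-04T10:00Z": "buy",
    "2024-03-04T14:00Z": None,
    "2024-03-05T09:00Z": "sell",
}
# Signals the indicator shows on the same bars when reloaded later.
retro = {
    "2024-03-04T10:00Z": "buy",
    "2024-03-04T14:00Z": "buy",  # appeared only after the fact
    "2024-03-05T09:00Z": "sell",
}

changed = repainted_bars(live, retro)
print("bars that changed after the fact:", changed)
```

An empty result over the observation period does not prove the indicator never repaints, but any non-empty result is enough to rule it out for automated backtesting.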

Indicators known to repaint

Zigzag: recalculates its pivot highs and lows with every new bar. Using it in automated backtesting systematically produces fictional results.

Volume Profile (certain implementations): the current profile recalculates in real time, making retroactive entries impossible to reproduce.

Ichimoku cloud (future projection): the forward cloud shifts with each new candle, which can affect retrospective calculations depending on the implementation.

Certain non-normalized adaptive moving averages: some variants recalculate past periods based on recent data, causing subtle repainting.

ATR computed on the current unclosed bar: the value changes until the bar closes, creating partial repainting on the active bar.

Native anti-repainting in Backtrex

Backtrex enforces the anti-repainting rule natively: all strategy blocks use exclusively the data of the previous confirmed, closed bar. This is a technical guarantee built into the platform, not a coding convention. Learn more on the anti-repainting features page.

Reliable data sources for backtesting

Free vs paid data

The choice of OHLC data source matters as much as your strategy logic. Free sources can be sufficient for preliminary analysis but carry documented limitations in quality, completeness, and timestamp precision.

Source	Data type	OHLC quality	Historical depth	Cost
TradingView	Aggregated multi-source	Good (liquid markets)	5 to 15 years depending on the asset	Free / Premium
Yahoo Finance	Dividend and split-adjusted	Fair (split adjustment errors possible)	20 to 30 years (equities)	Free
Dukascopy	Tick with millisecond timestamps	Excellent	10 to 15 years Forex/CFD	Free
Refinitiv (LSEG)	Normalized institutional	Market reference	40+ years	Paid (institutional)
TickData	Validated institutional	Excellent	30+ years	Paid

Comparing sources: TradingView, Yahoo Finance, Dukascopy

TradingView aggregates data from multiple providers (direct exchanges, Quandl, Trading Economics). Quality is generally solid for liquid markets (major Forex pairs, indices, crypto). API access to extended historical data is reserved for Premium subscribers.

Yahoo Finance is a popular source for equities, especially via the Python library yfinance. Its main issue is the dividend and split adjustment process: retroactive adjustment errors regularly introduce inconsistent bars, with an adjusted Close higher than the unadjusted High on certain periods.
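One way such a bar can arise is when an adjusted close is paired with Open, High, and Low values that were not scaled by the same factor. The sketch below assumes each row provides raw Open, High, Low, and Close plus an adjusted close, and rescales the raw values with the ratio of adjusted to raw close. The numbers are invented (a reverse split is used so that the adjusted close ends up above the raw high), and `adjust_bar` is a hypothetical helper, not part of yfinance.

```python
def adjust_bar(open_, high, low, close, adj_close):
    """Scale raw open/high/low by the factor that maps close to adj_close."""
    factor = adj_close / close
    return (open_ * factor, high * factor, low * factor, adj_close)


# Invented raw bar recorded before a later 1-for-10 reverse split.
raw_open, raw_high, raw_low, raw_close = 2.0, 2.5, 1.5, 2.0
adj_close = 20.0

# Mixing the adjusted close with the raw high produces an impossible bar.
mixed_is_consistent = raw_high >= adj_close

adjusted = adjust_bar(raw_open, raw_high, raw_low, raw_close, adj_close)
adj_open, adj_high, adj_low, _ = adjusted
adjusted_is_consistent = adj_high >= max(adj_open, adj_low, adj_close)

print("mixed bar consistent:", mixed_is_consistent)
print("fully adjusted bar:", adjusted, "consistent:", adjusted_is_consistent)
```

Whichever convention you choose, the four prices of a bar should all be raw or all be adjusted before the consistency checks are run.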

Dukascopy provides free tick data with millisecond precision for over 700 Forex and CFD instruments. It is typically the best free option for intraday Forex backtesting. Data is exportable in CSV with precise UTC timestamps, making temporal validation straightforward.

Selection criteria

Before importing any data source into your backtesting workflow, verify these four fundamental points.

Documented timezone: is the data in UTC, EST, or local market time? An undocumented or inconsistent timezone can introduce session offsets on London, New York, or Tokyo opens.

Transparent adjustment policy: for equities, how are splits and dividends handled? Prefer sources that provide both raw and adjusted data, along with a history of adjustments applied.

Versioned correction policy: some providers silently correct past errors, which can change your backtest results between two runs on the same dates. Prefer providers who version their corrections.

Minimum available resolution: for scalping or day trading, you need at least M1 (one-minute) data. Daily bars cannot realistically model intraday spreads, stop hunts, or intrabar price movements.
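For the correction-policy point above, when a provider does not version its data, one possible safeguard is to fingerprint each download yourself and store the digest next to your backtest results. This is a minimal sketch using the standard `hashlib` module; `dataset_fingerprint` and the sample rows are assumptions made for this illustration.

```python
import hashlib


def dataset_fingerprint(rows):
    """Return a SHA-256 hex digest over the rows, in order.

    Any changed, added, removed, or reordered value produces a different digest.
    """
    digest = hashlib.sha256()
    for row in rows:
        line = ",".join(str(value) for value in row) + "\n"
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


# Invented rows: (date, open, high, low, close) from two downloads of the same day.
first_download = [("2023-06-01", 100.0, 101.0, 99.0, 100.5)]
second_download = [("2023-06-01", 100.0, 101.2, 99.0, 100.5)]  # high silently revised

fp_first = dataset_fingerprint(first_download)
fp_second = dataset_fingerprint(second_download)
print("data changed between downloads:", fp_first != fp_second)
```

If two backtests over the same dates disagree, comparing the stored digests tells you immediately whether the input data changed or the strategy did.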

Automating data validation

Backtrex: built-in OHLC validation

Backtrex integrates a native OHLC validation layer before every backtest. Before computing your strategy, the platform automatically checks the mathematical consistency of every bar, detects and flags abnormal gaps over the selected period, enforces the anti-repainting rule on all strategy blocks, and rejects any corrupted data with a detailed error report.

This automation is especially valuable for no-code traders who do not want to write Python or Pine Script validation scripts. The Backtrex backtest features page details all the technical guarantees the platform provides. To complement your validation process, our guide on overfitting in backtesting explains how to prevent over-optimization once your data is clean.

Backtesting without surprises

Backtrex guarantees that your backtest results are built on validated, repainting-free OHLC data. That is the only way to honestly compare your strategy's performance against real market conditions. View plans and pricing.

Tools for traders who code

If you prefer to validate your data manually before importing it into a backtesting platform, the Python libraries pandas and numpy let you check OHLC consistency in a few lines of code. For Pine Script on TradingView, you can add checks on high >= close and low <= close (and the same comparisons against open) at the start of your script to flag any corrupted bar in real time.

For a comprehensive view of how live testing complements data validation, see our guide on backtesting vs forward testing.

Important Risk Warning

Trading financial instruments involves significant risk of capital loss. Past performance does not guarantee future results. Backtest results presented on this platform are based on historical data and do not constitute investment advice. You should not invest money you cannot afford to lose. Always consult a qualified financial advisor before making any investment decisions.

Conclusion

OHLC data quality is the invisible foundation of every reliable backtest. A systematic check of mathematical consistency, gaps, duplicates, and timezone alignment is essential before interpreting any result. Repainting remains the most insidious threat: invisible in data logs but devastating for signal reliability, leading to strategies that look exceptional in a backtest and fail immediately in live markets.

Whether you validate your data manually with Python or use a platform like Backtrex that automates these checks, the key principle is never to trust a backtest without first auditing its source data. Explore the Backtrex features page to see how the platform addresses these issues systematically at every step of the backtesting workflow.