Trading strategy metrics: how to tell whether a strategy is good

A good trading strategy is not the one with the highest return in a single backtest. Return without context says very little: it can come from excessive leverage, one rare lucky event, parameter overfitting, or a risk that simply has not appeared in the selected history yet.

That is why strategies are evaluated through a set of metrics. Some show whether the system has positive expectancy. Others describe how severe the drawdowns were, how often capital stayed below its previous high, how much return came per unit of risk, and whether the equity curve looks like a stable process or a single lucky episode.

The main mistake is to look for one final number. Profit factor, Sharpe ratio, Sortino ratio, max drawdown, CAGR, and Calmar ratio answer different questions. A strong strategy should look convincing not in one metric, but in the combination of them.

Profit factor

Profit factor shows the ratio of gross profit to gross loss:

profit factor = gross profit / gross loss

If a strategy made $120,000 on winning trades and lost $80,000 on losing trades, its profit factor is 1.5. Formally, this means that every dollar of loss was matched by 1.5 dollars of profit.

The metric is useful because it looks not only at final PnL, but at the relationship between money won and money lost. But it is easy to misread when there are few trades or the distribution of outcomes is highly asymmetric. A strategy can have a high profit factor after several large winners while still depending on rare events. The opposite is also possible: a long sequence of small gains can produce an acceptable profit factor until the one large loss that offsets them appears in the sample.

So profit factor should be read together with the number of trades, average win, average loss, the largest losing trade, and stability across neighboring periods. By itself, it answers the question "did profits cover losses in this history", but it does not prove that the relationship will persist.
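A minimal Python sketch of that reading, assuming trades arrive as a plain list of per-trade PnL values after costs. `trade_stats` and `TradeStats` are hypothetical names, and the trade list is invented only to reproduce the $120,000 / $80,000 example:

```python
from dataclasses import dataclass


@dataclass
class TradeStats:
    trades: int
    profit_factor: float
    average_win: float
    average_loss: float   # reported as a positive magnitude
    largest_loss: float   # reported as a positive magnitude


def trade_stats(pnls: list[float]) -> TradeStats:
    """Profit factor plus the companion numbers it should be read with."""
    wins = [p for p in pnls if p > 0]
    losses = [-p for p in pnls if p < 0]  # store losses as positive amounts
    gross_profit = sum(wins)
    gross_loss = sum(losses)
    # With no losing trades the ratio is undefined; report infinity explicitly.
    pf = gross_profit / gross_loss if gross_loss > 0 else float("inf")
    return TradeStats(
        trades=len(pnls),
        profit_factor=pf,
        average_win=gross_profit / len(wins) if wins else 0.0,
        average_loss=gross_loss / len(losses) if losses else 0.0,
        largest_loss=max(losses, default=0.0),
    )


# Hypothetical trade list: $120,000 won and $80,000 lost in total.
stats = trade_stats([50_000, 40_000, 30_000, -20_000, -25_000, -35_000])
print(stats.profit_factor)  # 1.5
print(stats.largest_loss)   # 35000
```

Returning infinity when there are no losing trades is a deliberate choice: it flags a sample too clean to judge instead of hiding it behind a large finite number.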

Win rate and expectancy

Win rate is the share of profitable trades:

win rate = winning trades / all trades

At first glance, a high share of winning trades looks like a sign of quality. In practice, it is one of the most dangerous metrics when viewed alone. A strategy with an 80% win rate can lose money if the average loss is much larger than the average win. A strategy with a 35% win rate can be profitable if the rare winners are large enough.

That is why win rate needs expectancy next to it: the average expected result per trade. In the formula below, the average loss is taken as a positive amount.

expectancy = win rate * average win - loss rate * average loss

Expectancy turns trade frequency and trade size into a more honest question: how much did the strategy make or lose on average per trade before position scaling. If expectancy is positive, the strategy at least had a historical edge. If it is negative, a high win rate does not help: the system is right often, but wrong too expensively.
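The formula can be checked with a short sketch on the two profiles described above. The helper name `expectancy` and the trade values are hypothetical; breakeven trades count toward the total but neither as wins nor as losses:

```python
def expectancy(pnls: list[float]) -> float:
    """win rate * average win - loss rate * average loss, per trade."""
    n = len(pnls)
    wins = [p for p in pnls if p > 0]
    losses = [-p for p in pnls if p < 0]  # losses as positive amounts
    win_rate = len(wins) / n
    loss_rate = len(losses) / n
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return win_rate * avg_win - loss_rate * avg_loss


# Hypothetical: 80% winners of +100, 20% losers of -500.
high_win_rate = [100] * 8 + [-500] * 2
# Hypothetical: 35% winners of +300, 65% losers of -100.
low_win_rate = [300] * 7 + [-100] * 13
print(expectancy(high_win_rate))  # about -20: right often, wrong too expensively
print(expectancy(low_win_rate))   # about +40
```

With this convention the result equals the plain average PnL per trade, which is a useful cross-check against the backtest report.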

In algorithmic trading, it is especially important to calculate expectancy after fees, spread, slippage, and funding. A small edge before costs can disappear completely if the strategy trades often or operates in thin markets.
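A rough per-trade cost check, sketched in basis points of traded notional. The cost split (fees and slippage paid on each side, spread and funding charged once per round trip), the function name, and all numbers are illustrative assumptions, not exchange data:

```python
def net_edge_bps(gross_edge_bps: float,
                 fee_bps_per_side: float,
                 slippage_bps_per_side: float,
                 spread_bps: float = 0.0,
                 funding_bps: float = 0.0) -> float:
    """Per-trade edge in basis points after round-trip costs.

    Assumed cost model: fees and slippage are paid on entry and exit,
    spread_bps is the total spread cost of the round trip, and
    funding_bps is what the position pays while it is held.
    """
    round_trip_cost = (2 * (fee_bps_per_side + slippage_bps_per_side)
                       + spread_bps + funding_bps)
    return gross_edge_bps - round_trip_cost


# Hypothetical edge of 8 bps per trade before costs.
print(net_edge_bps(8.0, fee_bps_per_side=2.0, slippage_bps_per_side=1.0, spread_bps=1.0))  # 1.0
print(net_edge_bps(8.0, fee_bps_per_side=2.5, slippage_bps_per_side=1.0, spread_bps=1.0))  # 0.0
```

In this example, half a basis point of extra fee per side is enough to erase the edge, which is why frequent traders are so sensitive to execution assumptions.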

Sharpe ratio

Sharpe ratio connects excess return with the volatility of returns:

Sharpe ratio = (portfolio return - risk-free rate) / standard deviation of returns

The idea goes back to William Sharpe's work on the reward-to-variability ratio: return should not be evaluated in isolation, but relative to the risk taken to obtain it [1]. Sharpe later described a broader version of the metric as a way to compare differential return with its variability [2].

The practical meaning is simple: two strategies can have the same CAGR but very different paths to it. If one grew relatively smoothly and the other reached the same result through sharp swings, the first one will usually have the higher Sharpe ratio.
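A minimal sketch of the per-period calculation, assuming a list of simple periodic returns and a risk-free rate already expressed per period (for example, an annual rate divided by 252 for daily data, which is itself a simplification). The function name is hypothetical, and the two series are invented so that they share the same arithmetic mean of 0.5% per period; only the path differs:

```python
import statistics


def sharpe_ratio(returns: list[float], risk_free_per_period: float = 0.0) -> float:
    """Per-period Sharpe: mean excess return / sample std of excess returns."""
    excess = [r - risk_free_per_period for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


# Two hypothetical series with the same mean return of 0.5% per period.
smooth = [0.004, 0.006, 0.005, 0.004, 0.006]
choppy = [0.03, -0.02, 0.025, -0.015, 0.005]
print(round(sharpe_ratio(smooth), 2))  # 5.0
print(round(sharpe_ratio(choppy), 2))  # 0.22
```

`statistics.stdev` uses the sample (n - 1) denominator; some tools use the population version, which shifts the result slightly on short histories.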

The limitation of Sharpe ratio is that standard deviation penalizes all volatility, both downside and upside. For strategies with return distributions close to normal, this can be a reasonable approximation. For strategies with asymmetric returns, rare large losses, or strong dependence on tail risk, Sharpe can make the risk profile look better than it really is.

Another problem is periodicity. Annualizing a daily Sharpe ratio, usually by multiplying it by the square root of the number of trading days in a year, assumes that returns are independent and identically distributed, and that assumption often hides clustered losses. If a strategy earns a little every day and occasionally drops sharply, average volatility can underestimate the risk of a break in the equity curve.
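A sketch of the usual square-root scaling, plus a lag-1 autocorrelation check as one simple way to see whether the independence assumption is plausible. The function names are hypothetical, a zero risk-free rate is assumed, and 252 trading days per year is a convention rather than a law (strategies that trade every calendar day often use 365):

```python
import math
import statistics


def annualized_sharpe(returns: list[float], periods_per_year: int = 252) -> float:
    """Per-period Sharpe (zero risk-free rate) scaled by sqrt(periods_per_year).

    The square-root rule treats returns as independent and identically
    distributed; serially correlated returns break that assumption.
    """
    per_period = statistics.mean(returns) / statistics.stdev(returns)
    return per_period * math.sqrt(periods_per_year)


def lag1_autocorrelation(returns: list[float]) -> float:
    """Sample correlation between each return and the next one."""
    m = statistics.mean(returns)
    dev = [r - m for r in returns]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


# Hypothetical daily returns where gains and losses arrive in streaks.
streaky = [0.01] * 5 + [-0.008] * 5 + [0.01] * 5 + [-0.008] * 5
print(round(lag1_autocorrelation(streaky), 2))  # 0.65
print(round(annualized_sharpe(streaky), 2))
```

When the autocorrelation is clearly positive, as here, losing days cluster and the square-root rule tends to overstate the annualized Sharpe ratio.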

Sortino ratio

Sortino ratio is similar to Sharpe, but its denominator uses downside deviation: deviations below a chosen minimum acceptable return.

Sortino ratio = (portfolio return - minimum acceptable return) / downside deviation

The point is that investors are usually concerned not with any volatility, but with negative deviations. A large positive day is not a problem; a large negative day changes capital risk. CFA Institute describes the Sortino ratio as a variation of Sharpe in which the minimum acceptable return replaces the risk-free rate and downside deviation replaces standard deviation [3].

Sortino is useful for strategies with asymmetric outcomes: trend following, breakout systems, option structures, and portfolios with rare large moves. It helps separate "uneven but mostly positive" returns from returns where most of the variability comes through drawdowns.
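A minimal Sortino sketch. Downside deviation is computed here as the root mean square of shortfalls below the minimum acceptable return (MAR), averaged over all periods; that is one common convention, and both the MAR and the denominator are assumptions that change the number. The two hypothetical series are mirror images with the same mean and the same standard deviation, so their Sharpe ratios match, but one has a rare large gain and the other a rare large loss:

```python
import math
import statistics


def downside_deviation(returns: list[float], mar: float = 0.0) -> float:
    """Root mean square of shortfalls below MAR.

    Assumed convention: periods at or above MAR count as zero shortfall,
    and the average is taken over all periods, not only the losing ones.
    """
    shortfalls = [min(r - mar, 0.0) ** 2 for r in returns]
    return math.sqrt(sum(shortfalls) / len(returns))


def sortino_ratio(returns: list[float], mar: float = 0.0) -> float:
    dd = downside_deviation(returns, mar)
    excess = statistics.mean(returns) - mar
    return excess / dd if dd > 0 else math.inf


upside_tail = [-0.01] * 9 + [0.13]     # small losses, one large gain
downside_tail = [0.018] * 9 + [-0.122]  # small gains, one large loss
print(round(sortino_ratio(upside_tail), 2))    # 0.42
print(round(sortino_ratio(downside_tail), 2))  # 0.1
```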

But Sortino is not a magic shield either. It depends on the chosen threshold, the length of the history, and the way downside deviation is calculated. If the sample has not yet included a real stress period, downside risk will look softer than it actually is.

Volatility

Volatility usually measures the dispersion of returns around their mean. In classical portfolio theory, portfolio risk depends not only on the expected return of individual assets, but also on the variability and relationships among their returns [4].

For a trading strategy, volatility shows how unevenly results are distributed over time. All else equal, lower volatility makes a system easier to scale: it is easier to set limits, calculate margin, withstand drawdowns, and compare the strategy with other sources of risk.
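For reference, a sketch of annualized and rolling volatility from per-period returns, assuming the same square-root scaling as the Sharpe calculation and a trailing window length chosen by the user. The function names and the sample returns are hypothetical; the rolling values show how the normal regime shifts over time:

```python
import math
import statistics


def annualized_volatility(returns: list[float], periods_per_year: int = 252) -> float:
    """Sample standard deviation of per-period returns, scaled by sqrt(periods)."""
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


def rolling_volatility(returns: list[float], window: int,
                       periods_per_year: int = 252) -> list[float]:
    """Annualized volatility over each trailing window of `window` periods."""
    return [
        annualized_volatility(returns[i - window:i], periods_per_year)
        for i in range(window, len(returns) + 1)
    ]


# Hypothetical daily returns: a calm stretch followed by a noisier one.
returns = [0.002, -0.001] * 10 + [0.02, -0.015] * 10
vols = rolling_volatility(returns, window=20)
print(f"{vols[0]:.1%} -> {vols[-1]:.1%}")
```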

But low volatility does not always mean low risk. Strategies that sell insurance, average down against the move, or collect a small premium for rare risk can show a calm equity curve for a long time, until an event arrives that does not fit into ordinary daily volatility.

So volatility should be used as a description of the normal regime, not as a complete measure of danger. For strategies with tail risks, it needs to sit next to max drawdown, tail losses, stress tests, and liquidity scenarios.

Max drawdown

Max drawdown is the largest fall in capital from a peak (the highest equity reached so far) to a subsequent trough:

max drawdown = (trough equity - peak equity) / peak equity

This metric answers the question of how deep a historical capital hole had to be endured. Unlike volatility, drawdown is path-dependent: not only the dispersion of returns matters, but also the order of losses.

For example, two strategies can have the same average return and volatility but a different sequence of outcomes. If losses are clustered, the equity curve goes deeply underwater. If they are mixed with profitable periods, the drawdown can be much milder.
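A sketch that follows the formula above, using the running peak so that each point is compared with the highest equity reached before it, and keeping the same negative sign convention. `max_drawdown` and `equity_from_returns` are hypothetical helpers; the two sequences contain exactly the same returns in a different order, so their final equity and volatility match while their drawdowns do not:

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # running peak up to this point
        worst = min(worst, (value - peak) / peak)
    return worst


def equity_from_returns(returns: list[float], start: float = 100.0) -> list[float]:
    """Compound simple per-period returns into an equity curve."""
    equity = [start]
    for r in returns:
        equity.append(equity[-1] * (1 + r))
    return equity


clustered = [0.1, 0.1, -0.1, -0.1, -0.1, 0.1]  # losses arrive together
mixed = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]      # losses interleaved with gains
print(round(max_drawdown(equity_from_returns(clustered)), 3))  # -0.271
print(round(max_drawdown(equity_from_returns(mixed)), 3))      # -0.118
```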

Max drawdown is especially important because recovery is nonlinear. After a 20% loss, a 25% gain is needed to return to the starting level. After a 50% loss, a 100% gain is required. So drawdown depth affects not only psychological resilience, but also the geometry of future returns.
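The recovery arithmetic follows from needing (1 - d) * (1 + g) = 1 after a loss d, so g = 1 / (1 - d) - 1. A small sketch with a hypothetical function name, taking the loss as a positive fraction (the opposite sign convention from the max drawdown formula):

```python
def required_recovery_gain(loss: float) -> float:
    """Gain needed to return to the previous peak after a fractional loss."""
    if not 0 <= loss < 1:
        raise ValueError("loss must be in [0, 1)")
    return 1 / (1 - loss) - 1


for loss in (0.10, 0.20, 0.50, 0.75):
    print(f"{loss:.0%} loss -> {required_recovery_gain(loss):.0%} gain to recover")
```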

The limitation of max drawdown is that it shows one worst episode in the selected history. It does not say how often drawdowns occurred, how long recovery took, or whether a longer history could have shown a worse scenario. That is why max drawdown should be paired with drawdown duration and time under water.
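One way to sketch both measures from an equity series, assuming a period counts as underwater whenever equity is below its previous high. The function name and the sample path are hypothetical:

```python
def underwater_stats(equity: list[float]) -> tuple[int, float]:
    """Return (longest underwater stretch in periods, share of periods underwater).

    An unrecovered drawdown at the end of the series still counts.
    """
    peak = equity[0]
    longest = current = underwater = 0
    for value in equity:
        if value >= peak:
            peak = value  # new high (or back at the old one): the stretch ends
            current = 0
        else:
            current += 1
            underwater += 1
            longest = max(longest, current)
    return longest, underwater / len(equity)


print(underwater_stats([100, 120, 90, 110, 130, 117]))  # (2, 0.5)
```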

CAGR

CAGR, or compound annual growth rate, is the average annual capital growth rate with compounding:

CAGR = (ending equity / starting equity)^(1 / years) - 1

This metric is useful for comparing strategies over different horizons. If one strategy was tested for three years and another for seven, the simple total percentage return is not very informative. CAGR brings the result to an annual scale.
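A direct translation of the formula, applied to two hypothetical backtests of different lengths. The function name is hypothetical, and years are passed in as a number; converting from calendar days (for example, days / 365.25) is a separate choice:

```python
def cagr(start_equity: float, end_equity: float, years: float) -> float:
    """Compound annual growth rate between two equity values."""
    return (end_equity / start_equity) ** (1 / years) - 1


# Hypothetical: +50% over 3 years versus +100% over 7 years.
print(f"{cagr(100_000, 150_000, 3):.1%}")  # 14.5%
print(f"{cagr(100_000, 200_000, 7):.1%}")  # 10.4%
```

The shorter test has the smaller total return but the higher annual rate, which is exactly the comparison a raw percentage hides.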

But CAGR does not show the path. A strategy with a 25% CAGR and a 60% max drawdown is a very different object from a strategy with a 15% CAGR and a 10% max drawdown. The first may look stronger in a return table, but be unsuitable for real capital if the drawdown exceeds the risk budget.

CAGR is also sensitive to start and end points. If the test begins before a strong trend and ends at a peak, annual growth will be overstated. That is why CAGR should be checked on rolling windows: how does annual return change when the analysis period shifts?
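A rolling-window version can be sketched like this, assuming equity sampled at a fixed frequency (for example monthly, with `periods_per_year=12`). `rolling_cagr` and its parameters are hypothetical, and the equity path is invented:

```python
def rolling_cagr(equity: list[float], window: int, periods_per_year: int) -> list[float]:
    """CAGR over every span of `window` periods, one value per start point."""
    years = window / periods_per_year
    return [
        (equity[i + window] / equity[i]) ** (1 / years) - 1
        for i in range(len(equity) - window)
    ]


# Hypothetical monthly equity: flat for two years, then a strong final year.
equity = [100.0] * 24 + [100.0 * 1.03 ** k for k in range(1, 13)]
rates = rolling_cagr(equity, window=12, periods_per_year=12)
print(f"worst 12m: {min(rates):.1%}, best 12m: {max(rates):.1%}")
```

A wide gap between the worst and best windows, as in this example, says the headline CAGR depends heavily on where the test starts and ends.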

Calmar ratio

Calmar ratio connects return with maximum drawdown:

Calmar ratio = annualized return / absolute maximum drawdown

Unlike Sharpe, where risk is described through standard deviation, Calmar looks at the worst capital decline. In practical performance measurement literature, it belongs to drawdown-based metrics: a strategy is evaluated by how much annual return it produced per unit of maximum drawdown [5].
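A self-contained sketch that uses CAGR as the annualized return and the full-history max drawdown as the denominator. Both choices are assumptions, and a different calculation window would give a different value. The function name and the sample path are hypothetical:

```python
import math


def calmar_ratio(equity: list[float], periods_per_year: int) -> float:
    """CAGR divided by the absolute maximum drawdown of the same equity series."""
    years = (len(equity) - 1) / periods_per_year
    annual_return = (equity[-1] / equity[0]) ** (1 / years) - 1
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, (value - peak) / peak)
    # No drawdown at all: the ratio is unbounded, which usually means the sample is too short.
    return annual_return / abs(worst) if worst < 0 else math.inf


# Hypothetical yearly equity: a 20% loss in year one, then recovery to 121.
print(round(calmar_ratio([100, 80, 121], periods_per_year=1), 2))  # 0.5
```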

Calmar is useful where the main risk is not daily unevenness, but capital loss from a peak. For trading systems, this is often closer to reality: an investor or risk manager asks not "what was the standard deviation", but "how far did the strategy fall and was it able to recover".

But Calmar inherits the weaknesses of max drawdown. One worst episode can sharply worsen the metric, while the absence of a major crisis in the history can make it too optimistic. Different calculation windows also give different answers: a 36-month window, the full history, and rolling periods are not the same thing.

Equity curve

The equity curve is the chart of a strategy's capital over time. It is not a separate formula, but a visual check of how all the metrics appear dynamically.

A healthy equity curve does not have to be perfectly smooth. Real trading has drawdowns, flat periods, regime changes, and losing streaks. The suspicious part is not unevenness itself, but a shape that does not match the stated logic of the strategy.

Common red flags are:

almost the entire result was made in one short segment of history;
the curve crawls upward in small steps for a long time, then occasionally drops sharply;
recovery after drawdowns takes longer and longer;
the strategy earns only in one market regime;
the equity curve improves sharply after parameter fitting, but breaks on neighboring windows;
capital growth is accompanied by higher leverage rather than a stable edge.
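The first red flag can be turned into a rough number: the share of total profit earned in the best stretch of consecutive periods. This is a hypothetical heuristic rather than a standard metric, and the function name, window length, and sample PnL are assumptions:

```python
def best_window_share(period_pnl: list[float], window: int) -> float:
    """Share of total PnL earned in the best run of `window` consecutive periods.

    Values near or above 1.0 mean the rest of the history adds little or loses money.
    """
    total = sum(period_pnl)
    if total <= 0:
        raise ValueError("total PnL must be positive for this ratio to be meaningful")
    best = max(
        sum(period_pnl[i:i + window])
        for i in range(len(period_pnl) - window + 1)
    )
    return best / total


# Hypothetical monthly PnL for one year, in thousands of dollars.
monthly_pnl = [1, -2, 0, 30, 25, -1, 2, -3, 1, 0, -2, 1]
print(round(best_window_share(monthly_pnl, window=2), 2))  # 1.06
```

Here two months produced more than the whole year's net result, and the other ten months lost money in aggregate.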

The equity curve is also useful because it shows the strategy's behavior over time, not only the final row in a report. If CAGR is high but the curve consists of long stagnation periods and one lucky jump, that is a different risk from steady accumulation across regimes.

How to read the metrics together

Strategy evaluation does not start with the question "which metric is best", but with the question "what risk does this strategy take to produce return".

A practical frame can look like this:

profit factor shows whether profits covered losses;
win rate and expectancy explain how this happened: frequent small wins or rare large ones;
Sharpe ratio shows return per unit of total volatility;
Sortino ratio clarifies how much return compensated specifically for downside risk;
volatility describes the normal unevenness of the process;
max drawdown shows the worst historical capital decline;
CAGR normalizes growth to an annual horizon;
Calmar ratio links CAGR with drawdown depth;
the equity curve shows whether final numbers hide one lucky episode or a fragile risk shape.

A good strategy usually does not look perfect in every metric. Trend following can have a low win rate but positive expectancy because of large trends. Mean reversion can have a high win rate but be vulnerable to rare large losses. Market making can show a smooth equity curve until inventory risk or adverse selection appears.

So strategies should be compared within their class, horizon, trading frequency, and execution model. For tools like ai-trader, the practical value of these metrics is not a pretty report, but discipline: fixing the risk budget, detecting strategy degradation, comparing live results with the backtest, and noticing in time when the risk profile no longer matches the original hypothesis.

Conclusion

A strategy does not become good only because it has a high CAGR, a nice profit factor, or a Sharpe ratio above neighboring variants. Every metric compresses complex capital behavior into one number and therefore necessarily loses something.

A more reliable approach is to look at the combination: is expectancy positive after costs, does the result depend on rare lucky trades, how deep and long are the drawdowns, does return compensate for the risk taken, does behavior persist across different windows, and does the equity curve avoid looking like a product of overfitting.

A good strategy is a testable system with a clear source of edge, limited downside, and behavior that can be explained before capital is deployed, not only after a successful backtest.