Deep Reinforcement Learning (DRL) is becoming increasingly popular for developing financial trading agents. However, the extreme volatility of financial markets, combined with the difficulty of optimizing DRL agents, leads such agents to make riskier trades. As a result, while the agents can earn higher profits, they are also vulnerable to significant losses. The performance of a financial trading agent is usually evaluated in terms of Profit and Loss (PnL), which also typically serves as the agent's reward. However, in addition to PnL, traders often take into account other aspects of the agent's behavior, such as the risk associated with the positions it opens. A widely used metric that captures the risk-related component of an agent's performance is the Sharpe ratio, which evaluates a portfolio's risk-adjusted performance. In this paper, we propose a Sharpe ratio-based reward shaping approach that enables optimizing DRL agents with respect to both PnL and the Sharpe ratio, with the objective of improving the overall performance of the portfolio by mitigating the risk inherent in the agent's decisions. The effectiveness of the proposed method in improving several performance metrics is demonstrated on a dataset provided by Speedlab AG, which contains 14 instruments.
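To make the idea of Sharpe ratio-based reward shaping concrete, the following is a minimal sketch in Python. It assumes a simple additive combination of the per-step PnL with a rolling Sharpe ratio term; the abstract does not specify the paper's exact shaping function, so the function names (`shaped_reward`, `sharpe_ratio`), the weighting coefficient `lam`, and the rolling window size are illustrative assumptions rather than the authors' formulation.

```python
import numpy as np

def sharpe_ratio(returns, eps=1e-8):
    """Sharpe ratio over a window of per-step returns.

    Risk-free rate and annualization are omitted for simplicity;
    eps guards against division by zero on flat return series.
    """
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / (returns.std() + eps)

def shaped_reward(pnl, recent_returns, lam=0.5, window=50):
    """Combine the raw PnL reward with a rolling Sharpe ratio term.

    lam and window are hypothetical hyperparameters: lam trades off
    profit against risk-adjusted performance, window sets how much
    recent history the Sharpe estimate uses.
    """
    recent = np.asarray(recent_returns, dtype=float)[-window:]
    return pnl + lam * sharpe_ratio(recent)

# Toy usage with synthetic per-step returns (illustrative only)
rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, size=200)
r = shaped_reward(pnl=returns[-1], recent_returns=returns, lam=0.5)
print(f"shaped reward: {r:.6f}")
```

Under this additive form, a risk-seeking agent that earns high PnL through volatile positions receives a smaller shaped reward than one achieving similar PnL with steadier returns, which is the behavior the proposed approach aims to encourage.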