This paper investigates the impact of action sampling distributions on the efficiency and interpretability of Proximal Policy Optimization (PPO) in autonomous driving tasks. We compare PPO with Gaussian, Truncated Normal, and Beta sampling in the photorealistic AirSim Neighborhood environment. Our experiments demonstrate that the Beta distribution significantly improves learning speed: the Beta-based agent completes the task in fewer timesteps than the Gaussian and Truncated Normal agents. Moreover, Grad-CAM visualizations reveal that Beta-trained policies develop more interpretable decision-making strategies, focusing clearly on relevant obstacles rather than exhibiting diffuse attention patterns. These findings suggest that the Beta distribution is preferable for PPO-based agents in bounded action spaces, particularly in safety-critical domains that require both sample efficiency and explainability.
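To illustrate why the choice of sampling distribution matters in a bounded action space, the following is a minimal sketch and not the paper's implementation. It assumes a one-dimensional action in a hypothetical range of [-1, 1], hand-picked distribution parameters in place of policy-network outputs, and clipping as the way out-of-range Gaussian samples are handled. A Beta sample lies in [0, 1] by construction and can be rescaled to the action range, whereas a clipped Gaussian places a share of its samples exactly on the boundary.

```python
import random


def sample_beta_action(alpha, beta, low, high, rng):
    """Draw x ~ Beta(alpha, beta) on [0, 1] and rescale it affinely to [low, high]."""
    x = rng.betavariate(alpha, beta)
    return low + (high - low) * x


def sample_clipped_gaussian_action(mean, std, low, high, rng):
    """Draw from an unbounded Gaussian, then clip the sample to the action bounds."""
    a = rng.gauss(mean, std)
    return min(max(a, low), high)


rng = random.Random(0)
LOW, HIGH = -1.0, 1.0  # hypothetical action range (assumption, not from the paper)
N = 10_000

# Illustrative parameters only; a trained policy network would output these.
beta_actions = [sample_beta_action(2.0, 5.0, LOW, HIGH, rng) for _ in range(N)]
gauss_actions = [
    sample_clipped_gaussian_action(0.8, 0.5, LOW, HIGH, rng) for _ in range(N)
]

# Beta samples never leave the bounds, so no clipping is needed.
beta_in_bounds = all(LOW <= a <= HIGH for a in beta_actions)

# Fraction of Gaussian samples that were pushed onto a boundary by clipping.
clipped_fraction = sum(a in (LOW, HIGH) for a in gauss_actions) / N

print("Beta samples all in bounds:", beta_in_bounds)
print("Fraction of Gaussian samples clipped:", round(clipped_fraction, 3))
```

With a Gaussian mean close to the upper bound, a sizeable share of samples collapses onto the boundary value, so the executed action no longer matches the sampled one. The rescaled Beta sample avoids this mismatch, which is one commonly cited motivation for Beta policies in bounded action spaces.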