Generative Adversarial Networks (GANs) are widely used for image synthesis, but their training dynamics remain unstable and difficult to compare fairly across variants. In this work, we present a controlled empirical benchmark of six GAN variants trained on a curated single-class chest X-ray subset under matched architectural, optimization, and preprocessing conditions. The evaluation uses three independent training seeds while keeping the data split fixed, enabling a more robust comparison of optimization behavior and distributional alignment. Beyond Fréchet Inception Distance (FID) and Inception Score (IS), we combine loss-based stability diagnostics with embedding-based geometric analysis. Specifically, we analyze rolling loss variability together with centroid distance, coverage, and precision-oriented proxies computed in a shared feature space. This multi-perspective evaluation shows that smoother adversarial dynamics do not necessarily imply better alignment with the real data distribution, and that aggregate generative metrics may obscure differences in feature-space coverage and centroid displacement. The study is intended as a controlled empirical comparison rather than a new generative method. Its results highlight the value of reporting stability, classical generative metrics, and geometric diagnostics jointly when evaluating GANs for medical image synthesis.
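As an illustration of the diagnostics named above, the following is a minimal sketch and not the implementation used in the study. The abstract does not define the exact estimators, so everything here is an assumption: the function names, the rolling window size, and the k-nearest-neighbour radius formulation of the precision and coverage proxies are all hypothetical choices, and the embeddings are assumed to be plain NumPy arrays of shape `(n_samples, n_features)` taken from a shared feature extractor.

```python
import numpy as np


def rolling_std(losses, window):
    """Standard deviation of the loss over each sliding window (assumed diagnostic)."""
    losses = np.asarray(losses, dtype=float)
    n_windows = len(losses) - window + 1
    return np.array([losses[i:i + window].std() for i in range(n_windows)])


def centroid_distance(real, fake):
    """Euclidean distance between the mean real and mean generated embeddings."""
    return float(np.linalg.norm(real.mean(axis=0) - fake.mean(axis=0)))


def _pairwise(a, b):
    """Dense Euclidean distance matrix of shape (len(a), len(b))."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def _knn_radius(x, k):
    """Distance from each point to its k-th nearest neighbour within x."""
    d = np.sort(_pairwise(x, x), axis=1)  # column 0 is the zero self-distance
    return d[:, k]


def precision_proxy(real, fake, k=3):
    """Fraction of generated points lying inside at least one real k-NN ball."""
    radii = _knn_radius(real, k)
    d = _pairwise(fake, real)  # (n_fake, n_real)
    return float((d <= radii[None, :]).any(axis=1).mean())


def coverage_proxy(real, fake, k=3):
    """Fraction of real points whose k-NN ball contains a generated point."""
    radii = _knn_radius(real, k)
    d = _pairwise(real, fake)  # (n_real, n_fake)
    return float((d.min(axis=1) <= radii).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 16))           # stand-in for real embeddings
    fake = rng.normal(loc=0.5, size=(200, 16))  # stand-in for generated embeddings
    print("centroid distance:", centroid_distance(real, fake))
    print("precision proxy:", precision_proxy(real, fake))
    print("coverage proxy:", coverage_proxy(real, fake))
    print("max rolling std:", rolling_std(rng.normal(size=500), window=50).max())
```

Under these assumptions, a generator could show a low rolling loss deviation while still yielding a large centroid distance or a low coverage proxy, which is the kind of disagreement between stability and alignment that the abstract describes.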
*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.