The primary research problem addressed in this study is the challenge of building robust, scalable, and fault-tolerant deep learning systems in distributed environments. The reviewed literature shows that traditional deep learning models, particularly autoencoders, are effective for feature extraction, dimensionality reduction, and unsupervised learning. Deploying these models in distributed systems, however, presents two main challenges. First, as dataset size and model complexity grow, computational workloads must be distributed efficiently to avoid bottlenecks. Second, distributed systems are inherently unreliable because of network issues, hardware failures, and resource contention, which makes fault tolerance a critical requirement. To address these challenges, this study developed a Fine-grained Actor-based Autoencoder Model that targets fault tolerance, scalability, and modularity. The system integrates principles from actor-based distributed computing with neural network architectures, yielding a fault-tolerant, distributed autoencoder capable of efficient training across multiple Graphics Processing Units (GPUs). In the experiments, the model converged rapidly to a final reconstruction error of 0.240471, indicating high reconstruction accuracy in unsupervised learning tasks. The study also observed that the developed linear autoencoder is nearly equivalent to Principal Component Analysis (PCA), as evidenced by near-zero subspace angles and comparable reconstruction errors. The actor-based architecture, implemented through a FaultTolerantActorSystem, showed significant potential for scalability and fault tolerance: the model successfully simulated and handled actor failures, maintaining uninterrupted operation in error-prone distributed environments. The modular design, which separates the encoder, decoder, and bottleneck components, facilitates easier debugging, testing, and future extension.
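
The subspace-angle comparison between a linear autoencoder and PCA can be illustrated with a minimal sketch. It is not the study's code: the data are synthetic, the function name `principal_angles` is hypothetical, and `decoder_weights` is a stand-in for trained decoder weights, built here by mixing the PCA basis with an invertible matrix (the form an optimal linear autoencoder is known to take). Under these assumptions, the principal angles between the two subspaces should be close to zero.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

rng = np.random.default_rng(0)

# Synthetic, centered data (assumption: 500 samples, 10 features).
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))
X -= X.mean(axis=0)

# PCA basis: top-k right singular vectors of the centered data.
k = 3
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pca_basis = Vt[:k].T  # shape (10, k)

# Stand-in for a trained linear decoder (assumption): the PCA basis mixed
# by an invertible k x k matrix (orthogonal factor times positive scaling).
mixing = np.linalg.qr(rng.normal(size=(k, k)))[0] @ np.diag([0.5, 1.0, 2.0])
decoder_weights = pca_basis @ mixing

angles = principal_angles(pca_basis, decoder_weights)
unrelated = principal_angles(pca_basis, rng.normal(size=(10, k)))

print("angles vs. decoder subspace:", angles)
print("angles vs. random subspace: ", unrelated)
```

With real trained weights, `decoder_weights` would be replaced by the decoder's weight matrix; angles near zero then indicate that the autoencoder spans the same subspace as PCA, while an unrelated subspace yields clearly non-zero angles.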