Human action recognition (HAR) in still images is a critical task for applications ranging from surveillance to human-computer interaction. This research introduces an innovative approach to HAR based on transfer learning. Using the InceptionResNetV2 architecture pre-trained on ImageNet, we fine-tuned a model on the Data Sprint 76 dataset from AIPlanet, which comprises a limited set of still images depicting 15 human actions. Our objective was to leverage the pre-trained network's rich feature-extraction capabilities and adapt them to the domain-specific task of HAR. We implemented a two-phase training process: we first trained the custom model with the base layers frozen so that the new classification head could learn the 15 classes, and then selectively unfroze one-fourth of the base model's layers and fine-tuned them to improve feature adaptation to the HAR task. Data augmentation techniques were crucial in simulating a more extensive dataset, mitigating the risk of overfitting, and enhancing the model's generalization. The effectiveness of our model is quantified by a training accuracy of 88.43% and a validation accuracy of 77.30%. To interpret the model's decision-making process, we integrated Gradient-weighted Class Activation Mapping (Grad-CAM), which provides visual explanations for the model's predictions. This insight was critical in understanding the model's focus areas within the images and helped identify the causes of the misclassifications observed in the confusion matrix. This study demonstrates that transfer learning, coupled with visualization techniques such as Grad-CAM, can effectively mitigate the challenges posed by limited data in HAR.
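The two-phase training procedure described in the abstract can be sketched as follows. This is a minimal illustration assuming a Keras/TensorFlow implementation; the classification head, learning rates, and the choice to unfreeze the top quarter of the base layers are illustrative assumptions, not the paper's reported configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionResNetV2

NUM_CLASSES = 15  # the dataset's 15 action categories

# Phase 1: freeze the ImageNet-pretrained base and train only the new head.
base = InceptionResNetV2(weights="imagenet", include_top=False,
                         input_shape=(299, 299, 3), pooling="avg")
base.trainable = False

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),   # head size is an assumption
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=...)

# Phase 2: unfreeze roughly the last quarter of the base layers and
# fine-tune at a much lower learning rate; recompiling is required for
# the change in trainable state to take effect.
base.trainable = True
cutoff = int(len(base.layers) * 0.75)
for layer in base.layers[:cutoff]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=...)
```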
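The Grad-CAM visualization mentioned above can likewise be sketched in a few lines. This assumes a Keras functional model in which the final convolutional layer is reachable by name (for Keras's InceptionResNetV2 the last convolutional activation is named "conv_7b_ac"); the `grad_cam` helper and its arguments are hypothetical, and a model with a nested base network would need the gradient model built against that base instead.

```python
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index=None):
    """Return a [0, 1] heatmap over the conv layer's spatial grid for one image."""
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[tf.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))  # default to predicted class
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)            # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))            # global-average-pooled grads
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)  # weighted sum of maps
    cam = tf.nn.relu(cam)                                   # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

Upsampling the returned heatmap to the input resolution and overlaying it on the image yields the focus-area visualizations the abstract refers to.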