High-quality, meaningful data are crucial for successfully implementing analytics solutions that apply artificial intelligence (AI) and perform simulations using physics-based models. In this context, this paper proposes a semi-automated approach for the semantic enrichment of building energy consumption data for Sofia, delivering a more meaningful dataset for further analytics and simulations. The aim is to enrich the building energy consumption dataset of the City of Sofia, Bulgaria, obtained from the Sustainable Energy Development Agency, with cadastral and spatial data, including a cadastral identifier, geometry, coordinates, built-up area, number of floors, etc. The data enrichment process is time-consuming because it requires substantial manual work. For this reason, a semi-automated data enrichment pipeline has been developed, including processing activities such as data classification, cleaning, filtering, validation, aggregation, augmentation and formatting. A dedicated crawler was developed to collect the additional data needed for the enrichment. As a result, 1991 of a total of 2586 building data points (about 77%) have been successfully enriched. The enriched dataset is used for statistical and clustering analyses and applied to elaborate the energy atlas of Sofia.
*** Title, author list and abstract as seen in the Camera-Ready version of the paper that was provided to Conference Committee. Small changes that may have occurred during processing by Springer may not appear in this window.