LSTM-Based Models for Day-Ahead Electrical Load Forecast: A Novel Feature Selection Method Including Weather Data
- Authors: Vasenin, D.; Pasetti, M.; Astolfi, D.; Savvin, N.; Vasile, A.; Zizzo, G.
- Publication year: 2025
- Type: Journal article
- OA Link: http://hdl.handle.net/10447/691443
Abstract
This study introduces a two-step feature selection framework combining Distance Correlation (dCor) and Sequential Feature Selection (SFS), tailored for LSTM-based load forecasting. While each method is well established, their sequential and comparative use reveals meaningful inconsistencies between generic statistical relevance and task-specific predictive utility, providing practical insight into feature prioritization. Accurate day-ahead load forecasting is important for several purposes, such as the participation of utility companies in energy markets, the operation of power systems and the management of smart buildings. On these grounds, the objective of the present paper is to assess four state-of-the-art models for day-ahead forecasting, based on Long Short-Term Memory (LSTM) networks, while including and appropriately selecting various types of features in addition to the load time series alone. Three types of features are considered: temporal metadata derived from the calendar date, weather measurements, and the rate of change of the load or of the weather parameters. A two-step feature selection method is proposed: the first step refines the correlation analysis by using the Distance Correlation, and the second is a Sequential Feature Selection in which each input variable is added to the baseline forecast. The proposed methodology is tested on real-world data, sourced from the Energy Management System records of an electric power station located at the Engineering Campus of the University of Brescia, Italy. The results show that temporal metadata are the type of feature whose inclusion provides the highest improvement in the average accuracy metrics for all the considered models. The correlation analysis identifies the ambient temperature as the most important weather parameter to incorporate in the forecast, and this selection proves more advantageous than the one obtained from the Sequential Feature Selection.
The load change rate is also identified as a meaningful feature to incorporate in the model, but adding only this variable is insufficient for achieving a high-accuracy forecast. Finally, by testing the model on a one-month rolling forecast, day by day, it is argued that high accuracy can be achieved without overcomplicating the model structure, by rigorously selecting the most appropriate features. The best-performing model (LSTM with temporal and weather features) achieved a MAPE of 7.24% and an RMSE of 3.4 kW, a substantial improvement over the baseline MAPE of 17.4%. These results demonstrate the impact of feature selection in enhancing day-ahead forecasting accuracy.
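The two-step selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the feature names (`hour_sin`, `temperature`, `noise`) are hypothetical, and an ordinary least-squares regression stands in for the LSTM so that the example stays short and self-contained. Step 1 ranks candidate features by their sample distance correlation with the load; step 2 adds each candidate, one at a time, to a load-only baseline and compares the resulting MAPE.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D series (1 = identical up to scale)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    def centred(v):
        d = np.abs(v - v.T)  # pairwise distance matrix
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

    a, b = centred(x), centred(y)
    dcov2 = (a * b).mean()                            # squared distance covariance
    dvar2 = np.sqrt((a * a).mean() * (b * b).mean())  # product of distance std-devs, squared
    return float(np.sqrt(max(dcov2, 0.0) / dvar2)) if dvar2 > 0 else 0.0


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


def fit_and_score(columns, target, split):
    """Stand-in forecaster (assumption): least squares with intercept, NOT an LSTM."""
    X = np.column_stack([np.ones(len(target))] + columns)
    coef, *_ = np.linalg.lstsq(X[:split], target[:split], rcond=None)
    return mape(target[split:], X[split:] @ coef)


# Synthetic hourly data, for illustration only (not the Brescia dataset).
rng = np.random.default_rng(0)
n = 24 * 40
hour = np.arange(n) % 24
features = {
    "hour_sin": np.sin(2 * np.pi * hour / 24),
    "temperature": rng.normal(20.0, 5.0, n),
    "noise": rng.normal(0.0, 1.0, n),  # deliberately irrelevant candidate
}
load = 50 + 10 * features["hour_sin"] + 1.5 * features["temperature"] + rng.normal(0, 1, n)

# Step 1: rank candidates by distance correlation with the load.
dcor = {name: distance_correlation(col, load) for name, col in features.items()}

# Step 2: add each candidate, one at a time, to a load-only baseline
# (here the baseline input is the load at the same hour of the previous day).
lag24, target = load[:-24], load[24:]
split = int(0.8 * len(target))  # chronological train/test split
baseline_mape = fit_and_score([lag24], target, split)
sfs_mape = {name: fit_and_score([lag24, col[24:]], target, split)
            for name, col in features.items()}
best = min(sfs_mape, key=sfs_mape.get)

print("dCor ranking:", sorted(dcor, key=dcor.get, reverse=True))
print("baseline MAPE:", round(baseline_mape, 2), "| best added feature:", best)
```

The two steps need not agree: a feature can score well on distance correlation and still add little in the second step if the baseline inputs already carry the same information, which is the kind of inconsistency the abstract refers to.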
