Missing Data in Space-time: Long Gaps Imputation Based On Functional Data Analysis
- Authors: Di Salvo, F; Plaia, A; Ruggieri, M
- Year of publication: 2017
- Type: Proceedings (type no longer active)
- Keywords: missing; space-time; functional data analysis
- OA Link: http://hdl.handle.net/10447/243816
High-dimensional data with spatio-temporal structures are of great interest in many fields of research, but their complexity leads to practical issues when formulating statistical models. Functional data analysis through smoothing methods is a suitable framework for incorporating space-time structures: extending the basic methodology to the multivariate spatio-temporal setting, we refer to Generalized Additive Models for estimating functional data taking the spatial and temporal dependences into account, and to Functional Principal Component Analysis as a classical dimension reduction technique to cope with the high dimensionality and with the number of estimated effects. Since spatial and temporal dependences integrate information of different types and from different sources, this framework serves as a synthesis of information and gives important opportunities for data processing and analysis, including extremely effective dimension reduction and estimation of missing values. The underlying idea is to work with an estimated variance function, represented in terms of the bases and parameters defined in the estimation process, by means of which the variability is expressed in terms of the main temporal and spatial effects; the functional principal component analysis provides dimension reduction, determining the uncorrelated linear combinations of the original variables that account for most of the variability expressed by the variance function. The eigenfunctions, or principal component functions, also form an orthonormal set of functions, which can be used to fill gaps in incomplete data: we explore the performance of imputation procedures based on Functional Data Analysis and Empirical Orthogonal Function approaches when missing values, and mainly long gaps, are present in the original data set.
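To make the gap-filling idea concrete, the following is a minimal sketch of a generic iterative Empirical Orthogonal Function imputation on a time-by-station matrix. It is not the authors' procedure: it uses a plain SVD in place of the GAM-based functional estimation and FPCA described above, and the function name `eof_impute`, the parameters `n_modes`, `n_iter` and `tol`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def eof_impute(X, n_modes=2, n_iter=200, tol=1e-8):
    """Fill NaNs in a (time x station) matrix by iterating a truncated SVD.

    Hypothetical helper: a generic EOF-style scheme, not the paper's method.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from station (column) means computed on the observed values only.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    if not missing.any():
        return filled
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Reconstruct from the leading modes (the orthonormal EOFs are rows of Vt).
        recon = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes] + mean
        change = np.max(np.abs(recon[missing] - filled[missing]))
        # Only missing cells are updated; observed values are never altered.
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled

# Synthetic example (assumed data): two temporal modes shared by six stations.
t = np.linspace(0.0, 4.0 * np.pi, 120)
a = np.array([1.0, 0.8, 0.5, -0.3, 0.9, 0.2])
b = np.array([0.2, -0.5, 1.0, 0.7, 0.1, -0.9])
truth = np.outer(np.sin(t), a) + np.outer(np.cos(t), b) + 5.0

X = truth.copy()
X[40:70, 0] = np.nan  # one long gap of 30 consecutive times at station 0
filled = eof_impute(X, n_modes=2)
```

The number of retained modes plays the role of the dimension reduction step: too few modes underfit the shared space-time structure, while too many start to reproduce the initial guess in the gap.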
In order to compare and validate the proposed procedures, a simulation plan is carried out and some performance indicators are computed under different missing value patterns and in the presence of long gaps.
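A simulation of this kind can be sketched as follows: remove a block of consecutive observations from complete data, impute it, and score the imputed values against the withheld truth. The abstract does not name the indicators or the gap patterns, so the choices here (RMSE, MAE and bias; a single block gap; linear interpolation in time as a placeholder imputer) and the helper names `insert_long_gap` and `performance_indicators` are assumptions.

```python
import numpy as np

def insert_long_gap(X, column, start, length):
    """Return a copy of X with `length` consecutive times set to NaN in one column."""
    out = np.array(X, dtype=float, copy=True)
    out[start:start + length, column] = np.nan
    return out

def performance_indicators(truth, imputed, missing):
    """Score imputed values on the artificially removed cells only."""
    err = imputed[missing] - truth[missing]
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "bias": float(np.mean(err)),
    }

# Assumed complete data: a seasonal signal plus noise at four stations.
rng = np.random.default_rng(0)
t = np.arange(100)
truth = np.sin(2.0 * np.pi * t / 25.0)[:, None] + 0.1 * rng.standard_normal((100, 4))

gappy = insert_long_gap(truth, column=1, start=30, length=20)
missing = np.isnan(gappy)

# Placeholder imputer: linear interpolation in time within each column.
imputed = gappy.copy()
for j in range(imputed.shape[1]):
    obs = ~np.isnan(imputed[:, j])
    imputed[~obs, j] = np.interp(t[~obs], t[obs], imputed[obs, j])

scores = performance_indicators(truth, imputed, missing)
```

Repeating the loop over several gap lengths, positions and imputation methods gives a table of indicators per missing value pattern, which is the kind of comparison the simulation plan describes.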