## Cluster analysis of HVSR peak datasets to detect geological structures

**Authors:** Capizzi, P; Martorana, R; Stassi, G; D’Alessandro, A; Luzio, D

**Publication year:** 2014

**Type:** Proceedings

**OA Link:** http://hdl.handle.net/10447/96023

### Abstract

A modified centroid-based algorithm has been applied to HVSR (Horizontal to Vertical Spectral Ratio) datasets (Nakamura, 2000). The data were acquired for seismic microzoning studies in several urban centers in Sicily, which also aimed at a detailed reconstruction of the top of the seismic bedrock (Di Stefano et al., 2014). The HVSR data had previously been processed to extract the frequency and amplitude of the peaks, using a code based on the clustering of HVSR curves computed in sliding time windows. In centroid-based clustering, each cluster is represented by a central vector, which need not be a member of the dataset. Once the number of clusters is fixed, the algorithm finds the cluster centers and assigns each HVSR peak to a cluster so that the squared distances from the cluster centroids are minimized. It then computes the new cluster means, which become the centroids for the next iteration. The algorithm converges to a (local) optimum when the assignments no longer change; there is no guarantee that it finds the global optimum.

The proposed algorithm does not fix the number k of clusters in advance and, for each value of k, automatically chooses the initial centroids from the dataset. In particular, the UTM coordinates, amplitude and lithology values are the same for all k initial centroids and equal the average values over all the units to be partitioned, so the initial centroids differ only in the frequency of the H/V peaks. The distance of each unit from the initial centroids, and from those obtained after each iteration, was calculated as the weighted sum of the normalized Euclidean distances for all the variables considered: UTM coordinates, frequency, amplitude and lithology. The weights were optimized by taking into account, for each number k of groups, the intra-cluster and inter-cluster variances.
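
The following Python sketch illustrates the kind of modified centroid-based (k-means-style) procedure described above. It is not the authors' code. The column layout, the min-max normalization, the quantile rule for the initial frequencies, the weight values and all function names are assumptions made for illustration. Squared weighted differences are used so that the mean update exactly minimizes the within-cluster sum, matching the "squared distances" of the classic scheme.

```python
import numpy as np

# Assumed column layout of each HVSR peak ("unit"); hypothetical, not from the paper:
# 0 UTM easting, 1 UTM northing, 2 peak frequency, 3 peak amplitude, 4 lithology code
FREQ = 2


def normalize(data):
    """Min-max scale each variable to [0, 1] (one possible normalization)."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns stay at 0
    return (data - lo) / span


def expand_weights(w_xy, w_f, w_a, w_l):
    """Per-column weights; both UTM coordinates share the spatial weight."""
    return np.array([w_xy, w_xy, w_f, w_a, w_l], dtype=float)


def distances(units, centroids, w):
    """Weighted sum of squared normalized differences, shape (n_units, k)."""
    diff = units[:, None, :] - centroids[None, :, :]
    return (w * diff ** 2).sum(axis=2)


def initial_centroids(units, k):
    """Every centroid starts at the mean of all units; only the frequency
    differs (assumed here: k evenly spaced interior quantiles of the data)."""
    centroids = np.tile(units.mean(axis=0), (k, 1))
    centroids[:, FREQ] = np.quantile(units[:, FREQ], np.linspace(0, 1, k + 2)[1:-1])
    return centroids


def cluster_peaks(data, k, weights, max_iter=100):
    """Hypothetical helper. weights = (w_xy, w_f, w_a, w_l);
    returns cluster labels and centroids in normalized units."""
    units = normalize(np.asarray(data, dtype=float))
    w = expand_weights(*weights)
    centroids = initial_centroids(units, k)
    labels = None
    for _ in range(max_iter):
        new_labels = distances(units, centroids, w).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments no longer change: a (local) optimum
        labels = new_labels
        for j in range(k):
            members = units[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Because all initial centroids share the same position, amplitude and lithology, the first assignment is driven by peak frequency alone. The later iterations then let the spatial, amplitude and lithology terms reshape the groups according to their weights. Like any k-means variant, the result is only a local optimum, so re-running it for several k values and weight vectors is part of the procedure rather than an afterthought.
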
The choice of the optimal number k of classes was also supported by the analysis of intra-cluster and inter-cluster variances, but it remains essentially subjective, relying on a priori information and contextual data. The frequency distributions of the H/V peaks obtained with this centroid-based algorithm for different values of k were used to reconstruct geological discontinuities and to separate stratigraphic peaks from morphological ones. In many cases, cluster analysis of the HVSR data gave excellent results, grouping peaks that can be attributed to the same seismic structures. In some cases, however, the partition depends strongly on the weights used to compute the distance and on the geological and stratigraphic knowledge of the area; in other cases, the results were similar regardless of the a priori choices. These results underline that the most appropriate clustering algorithm for a particular problem often has to be chosen experimentally.
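
The abstract does not specify how the intra-cluster and inter-cluster variances are combined when comparing weights and values of k. As one possible illustration, the Python sketch below computes weighted within-cluster (W) and between-cluster (B) sums of squares and combines them into the Calinski-Harabasz ratio. This standard index is swapped in here as an assumption and is not the authors' criterion. The function names and the optional `weights` argument are hypothetical.

```python
import numpy as np


def variance_decomposition(units, labels, weights=None):
    """Weighted within-cluster (W) and between-cluster (B) sums of squares.

    units: (n, p) array of already-normalized variables; labels: (n,) cluster ids.
    weights: optional per-column weights (hypothetical); defaults to all ones.
    """
    units = np.asarray(units, dtype=float)
    labels = np.asarray(labels)
    w = np.ones(units.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    grand_mean = units.mean(axis=0)
    within = between = 0.0
    for j in np.unique(labels):
        members = units[labels == j]
        centroid = members.mean(axis=0)
        within += (w * (members - centroid) ** 2).sum()
        between += len(members) * (w * (centroid - grand_mean) ** 2).sum()
    return within, between


def calinski_harabasz(units, labels, weights=None):
    """Between- over within-cluster variance, each divided by its degrees of freedom."""
    n, k = len(units), len(np.unique(labels))
    if not 1 < k < n:
        raise ValueError("need 1 < k < n clusters")
    within, between = variance_decomposition(units, labels, weights)
    if within == 0:
        return float("inf")  # perfectly compact clusters
    return (between / (k - 1)) / (within / (n - k))
```

Evaluating the score for each candidate k (and weight vector) and looking for a maximum or an elbow gives a quantitative guide. As the abstract stresses, the final choice of partition still rests on a priori geological and stratigraphic information.
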