Generation of hierarchically correlated multivariate symbolic sequences: With an application to the assessment of bootstrap confidence in phylogenetic analysis.
- Authors: Tumminello, M.; Lillo, F.; Mantegna, R.
- Publication year: 2008
- Type: Journal article
- Key words: Complex systems, Multivariate analysis, Combinatorics; graph theory
- OA Link: http://hdl.handle.net/10447/45660
We introduce a method to generate multivariate series of symbols from a finite alphabet with a given hierarchical structure of similarities based on the Hamming distance. The target hierarchical structure of similarities is arbitrary; for instance, it can be the one obtained by applying a hierarchical clustering method to an empirical matrix of similarities. The method is based on a generating mechanism that does not rely on a mutation rate, a quantity widely used in phylogenetic analysis. We use the proposed simulation method to investigate the relationship between the bootstrap value associated with a node of a phylogeny and the probability of finding that node in the true phylogeny. The results of this analysis are compared with those obtained in the literature according to an evolutionary model with a constant per-symbol mutation rate. We observe that the relationship between the bootstrap value of a node and the probability of the corresponding clade being correct is sensitive both to the length of the data series and to the length of the branch connecting the node to its closest ancestor in the phylogenetic tree, whereas this relationship is only slightly affected by the topology of the true phylogeny and by the absolute value of the similarity.
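The two standard ingredients named in the abstract, a Hamming-based similarity between symbolic sequences and a bootstrap over sequence positions, can be illustrated with a minimal Python sketch. This is not the authors' generating mechanism, which the abstract does not specify. The function names, the toy sequences, and the three-sequence grouping test used as a stand-in for "a node is recovered" are all assumptions made for illustration.

```python
import random


def hamming_similarity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def bootstrap_replicate(seqs, rng):
    """Resample positions with replacement, using the same positions for every sequence."""
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in seqs]


def bootstrap_support(seqs, i, j, k, n_rep=1000, seed=0):
    """Fraction of replicates in which i and j are closer to each other than either is to k.

    Hypothetical stand-in for the bootstrap value of the node joining i and j.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        rep = bootstrap_replicate(seqs, rng)
        s_ij = hamming_similarity(rep[i], rep[j])
        s_ik = hamming_similarity(rep[i], rep[k])
        s_jk = hamming_similarity(rep[j], rep[k])
        if s_ij > s_ik and s_ij > s_jk:
            hits += 1
    return hits / n_rep


# Toy data (assumed): sequences 0 and 1 differ at one position, sequence 2 is distant.
toy = ["AAAAAAAA", "AAAAAAAT", "TTTTTTTT"]
print(hamming_similarity(toy[0], toy[1]))  # 7 of 8 positions agree
print(bootstrap_support(toy, 0, 1, 2))
```

In this toy setting the support for grouping sequences 0 and 1 is close to 1 because they disagree at a single position. Shorter sequences, or a smaller gap between the within-group and between-group similarities, would lower it, which is the kind of dependence on series length and branch length that the abstract reports.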