Exploiting statistical energy test for comparison of multiple groups in morphometric and chemometric data
2015 (English). In: Chemometrics and Intelligent Laboratory Systems, ISSN 0169-7439, E-ISSN 1873-3239, Vol. 146, p. 10-23. Article in journal (Refereed). Published
Abstract [en]
A multivariate permutation-based energy test of equal distributions is considered here. The approach belongs to the emerging field of ε-statistics and uses the natural logarithm of the Euclidean distance for the within-sample and between-sample components. The permutation result is enhanced by a tail approximation through the generalized Pareto distribution to boost the precision of the obtained p-values. Generalization from the two-sample case to multiple samples is achieved by combining p-values through meta-analysis. Several strategies of varied statistical power are possible; the maximum of all pairwise p-values is chosen here. The proposed approach is tested on several morphometric and chemometric data sets. Each data set is additionally transformed by principal component analysis for dimensionality reduction and visualization in 2D space. Variable selection, namely sequential search and multi-cluster feature selection, is applied to reveal in which aspects the groups differ most.
Morphometric data sets used: 1) survival data of house sparrows Passer domesticus; 2) orange and blue varieties of rock crabs Leptograpsus variegatus; 3) ontogenetic stages of trilobite species Trimerocephalus lelievrei; 4) marine phytoplankton species Prorocentrum minimum.
Chemometric data sets used: 1) essential oil composition of specimens of the medicinal plant Hyptis suaveolens; 2) chemical information of olive oil samples; 3) elemental composition of biomass ash; 4) exchangeable cations of earth metals in forest soil samples.
Statistically significant differences between groups were successfully indicated, but the selection of variables had a profound effect on the result. The permutation-based energy test and its multi-sample generalization through meta-analysis proved useful as an unbalanced non-parametric MANOVA approach. The introduced solution is simple, yet flexible and powerful, and is by no means confined to morphometrics or chemometrics alone, but has a wide range of potential applications. Copyright © 2015 Elsevier B.V.
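The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact form of the statistic, the small `eps` guard against zero distances, the exclusion of self-distances in the within-sample terms, and all function names are assumptions made for illustration. The generalized Pareto tail approximation mentioned in the abstract is omitted, so p-values here come from the raw permutation counts only.

```python
from itertools import combinations

import numpy as np


def log_energy_statistic(x, y, eps=1e-12):
    """Two-sample energy statistic with a log-distance kernel (assumed form).

    Between-sample term minus the two within-sample terms; larger values
    suggest the samples come from different distributions.
    """
    def mean_log_dist(a, b, skip_diagonal):
        # Pairwise Euclidean distances between rows of a and rows of b.
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
        if skip_diagonal:
            # Drop self-distances, for which the logarithm is undefined.
            d = d[~np.eye(len(a), dtype=bool)]
        return np.mean(np.log(d + eps))

    return (2.0 * mean_log_dist(x, y, False)
            - mean_log_dist(x, x, True)
            - mean_log_dist(y, y, True))


def permutation_pvalue(x, y, n_perm=999, rng=None):
    """Permutation p-value for the two-sample log-energy statistic."""
    rng = np.random.default_rng(rng)
    pooled = np.vstack([x, y])
    n = len(x)
    observed = log_energy_statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Reassign group labels at random, keeping the group sizes fixed.
        idx = rng.permutation(len(pooled))
        stat = log_energy_statistic(pooled[idx[:n]], pooled[idx[n:]])
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def multi_group_pvalue(groups, **kwargs):
    """Combine all pairwise p-values by taking their maximum."""
    return max(permutation_pvalue(a, b, **kwargs)
               for a, b in combinations(groups, 2))
```

Taking the maximum means the combined p-value is small only when every pair of groups differs, which is one of the several combination strategies the abstract alludes to. Group sizes need not be equal, since the permutation step preserves whatever sizes the original samples have.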
Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2015. Vol. 146, p. 10-23
Keywords [en]
ε-statistics, Permutation-based two-sample test, Non-parametric MANOVA, Multivariate analysis, Variable selection, Elliptical Fourier descriptors, Morphospace
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:hh:diva-28204; DOI: 10.1016/j.chemolab.2015.04.018; ISI: 000360595100002; Scopus ID: 2-s2.0-84929192791; OAI: oai:DiVA.org:hh-28204; DiVA, id: diva2:809939
Note
Funding for this work was provided by a grant (No. LEK-09/2012) from the Research Council of Lithuania under National Research Programme "Ecosystems in Lithuania: climate change and human impact".
2015-05-05 2015-05-05 2017-12-04 Bibliographically approved