References of "Roure, Béatrice"
Peer Reviewed
A Large and Consistent Phylogenomic Dataset Supports Sponges as the Sister Group to All Other Animals.
Simion, Paul; Philippe, Hervé; Baurain, Denis et al

in Current Biology (2017)

Resolving the early diversification of animal lineages has proven difficult, even using genome-scale datasets. Several phylogenomic studies have supported the classical scenario in which sponges (Porifera) are the sister group to all other animals ("Porifera-sister" hypothesis), consistent with a single origin of the gut, nerve cells, and muscle cells in the stem lineage of eumetazoans (bilaterians + ctenophores + cnidarians). In contrast, several other studies have recovered an alternative topology in which ctenophores are the sister group to all other animals (including sponges). The "Ctenophora-sister" hypothesis implies that eumetazoan-specific traits, such as neurons and muscle cells, either evolved once along the metazoan stem lineage and were then lost in sponges and placozoans or evolved at least twice independently in Ctenophora and in Cnidaria + Bilateria. Here, we report on our reconstruction of deep metazoan relationships using a 1,719-gene dataset with dense taxonomic sampling of non-bilaterian animals that was assembled using a semi-automated procedure, designed to reduce known error sources. Our dataset outperforms previous metazoan gene superalignments in terms of data quality and quantity. Analyses with a best-fitting site-heterogeneous evolutionary model provide strong statistical support for placing sponges as the sister group to all other metazoans, with ctenophores emerging as the second-earliest branching animal lineage. Only those methodological settings that exacerbated long-branch attraction artifacts yielded Ctenophora-sister. These results show that methodological issues must be carefully addressed to tackle difficult phylogenetic questions and pave the road to a better understanding of how fundamental features of animal body plans have emerged.

Peer Reviewed
Pitfalls in supermatrix phylogenomics
Philippe, Hervé; Vienne, Damien M. De; Ranwez, Vincent et al

in European Journal of Taxonomy (2017), 283

In the mid-2000s, molecular phylogenetics turned into phylogenomics, a development that improved the resolution of phylogenetic trees through a dramatic reduction in stochastic error. While some then predicted “the end of incongruence”, it soon appeared that analysing large amounts of sequence data without an adequate model of sequence evolution amplifies systematic error and leads to phylogenetic artefacts. With the increasing flood of (sometimes low-quality) genomic data resulting from the rise of high-throughput sequencing, a new type of error has emerged. Termed here “data errors”, it lumps together several kinds of issues affecting the construction of phylogenomic supermatrices (e.g., sequencing and annotation errors, contaminant sequences). While easy to deal with at a single-gene scale, such errors become very difficult to avoid at the genomic scale, both because hand curating thousands of sequences is prohibitively time-consuming and because the suitable automated bioinformatics tools are still in their infancy. In this paper, we first review the pitfalls affecting the construction of supermatrices and the strategies to limit their adverse effects on phylogenomic inference. Then, after discussing the relative non-issue of missing data in supermatrices, we briefly present the approaches commonly used to reduce systematic error.

Peer Reviewed
Impact of Missing Data on Phylogenies Inferred from Empirical Phylogenomic Data Sets
Roure, Béatrice; Baurain, Denis; Philippe, Hervé

in Molecular Biology and Evolution (2013), 30(1), 197-214

Progress in sequencing technology allows researchers to assemble ever-larger supermatrices for phylogenomic inference. However, current phylogenomic studies often rest on patchy data sets, with some having 80% missing (or ambiguous) data or more. Though early simulations had suggested that missing data per se do not harm phylogenetic inference when using sufficiently large data sets, Lemmon et al. (Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130-145.) have recently cast doubt on this consensus in a study based on the introduction of parsimony-uninformative incomplete characters. In this work, we empirically reassess the issue of missing data in phylogenomics while exploring possible interactions with the model of sequence evolution. First, we note that parsimony-uninformative incomplete characters are actually informative in a probabilistic framework. A reanalysis of Lemmon's data set with this in mind gives a very different interpretation of their results and shows that some of their conclusions may be unfounded. Second, we investigate the effect of the progressive introduction of missing data in a complete supermatrix (126 genes × 39 species) capable of resolving animal relationships. These analyses demonstrate that missing data perturb phylogenetic inference slightly beyond the expected decrease in resolving power. In particular, they exacerbate systematic errors by reducing the number of species effectively available for the detection of multiple substitutions. Consequently, large sparse supermatrices are more sensitive to phylogenetic artifacts than smaller but less incomplete data sets, a finding that argues for experimental designs aimed at collecting a modest number (∼50) of highly covered genes.
Our results further confirm that including incomplete yet short-branch taxa (i.e., slowly evolving species or close outgroups) can help to eschew artifacts, as predicted by simulations. Finally, it appears that selecting an adequate model of sequence evolution (e.g., the site-heterogeneous CAT model instead of the site-homogeneous WAG model) is more beneficial to phylogenetic accuracy than reducing the level of missing data.