WP 1: Data acquisition
Collection and conservation of DNA samples: Three hundred and thirty-two (332) individual cDNA libraries have been produced and transferred to the SP6 program (DNA bank); they will constitute a valuable resource for the community.
Sequencing: All data are now available for the 15 species. They were obtained using a total of 35 lanes, yielding a gross total of 13.3 billion reads of 7 to 100 base pairs. The data were used in SP1 and were also made available to other scientists (not SP1 partners) for other purposes (SNP development, gene annotation…).
WP 2: Sequence pre-treatment and database (see Bioinformatics project)
Polymorphisms were called for all individuals and annotated as coding vs. non-coding and, within coding regions, as synonymous vs. non-synonymous sites. Transcriptomes were assembled for all species (see SP4) and polymorphism data were investigated in all species (Table SP1).
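As an illustration of the synonymous vs. non-synonymous annotation described above, here is a minimal sketch in Python. It is not the pipeline used in the project: the function name `classify_coding_snp` and its arguments are hypothetical, and it assumes the standard genetic code, a known reading frame, and a single-base change within one codon.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change in a codon (hypothetical helper).

    codon    -- reference codon, e.g. "GAA"
    position -- 0-based position of the SNP within the codon (0, 1 or 2)
    alt_base -- alternative base observed at that position
    """
    codon = codon.upper()
    alt_base = alt_base.upper()
    if len(codon) != 3 or position not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and a position in 0..2")
    if alt_base == codon[position]:
        raise ValueError("alternative base is identical to the reference base")
    mutated = codon[:position] + alt_base + codon[position + 1:]
    # Same amino acid (or stop) before and after the change: synonymous.
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


if __name__ == "__main__":
    print(classify_coding_snp("GAA", 2, "G"))  # GAA (Glu) to GAG (Glu)
    print(classify_coding_snp("GAA", 0, "A"))  # GAA (Glu) to AAA (Lys)
```

A real annotation step would also have to handle non-coding positions, ambiguous bases and codons carrying more than one SNP, which this sketch deliberately ignores.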
The quantity of data obtained is satisfactory for each species and exceeds the expectations set at the start of ARCAD. Sequencing was also successful on the outgroups. The ancestral state will thus be available for a large number of SNPs.
Table SP1. Polymorphism data obtained on the focal species of the quadruplets. # wild and # cultiv. are the numbers of individuals finally kept for the analysis. Contigs is the number of independent read assemblies obtained from the raw sequences (see WP4). SNP is the number of single nucleotide polymorphisms obtained in the wild and cultivated compartments of the different crops.
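To make concrete how outgroup sequences can give the ancestral state of a SNP, the following Python sketch applies a simple parsimony rule. It is only an illustration under stated assumptions, not the method used in the project: the function name `infer_ancestral_allele` is hypothetical, and the rule assumed here is that a SNP is polarized only when all informative outgroups carry the same base and that base is one of the two alleles segregating in the focal species.

```python
VALID_BASES = set("ACGT")


def infer_ancestral_allele(allele_a, allele_b, outgroup_alleles):
    """Return the inferred ancestral allele, or None if it cannot be polarized.

    allele_a, allele_b -- the two alleles segregating in the focal species
    outgroup_alleles   -- bases observed at the homologous site in the outgroups;
                          missing data (None, "N", "-") is ignored
    """
    focal = {allele_a.upper(), allele_b.upper()}
    observed = {
        base.upper()
        for base in outgroup_alleles
        if base and base.upper() in VALID_BASES
    }
    # Require unanimous outgroups: conflicting or absent data gives no call.
    if len(observed) != 1:
        return None
    (outgroup_base,) = observed
    # The outgroup base must match one of the focal alleles to be informative.
    return outgroup_base if outgroup_base in focal else None


if __name__ == "__main__":
    print(infer_ancestral_allele("A", "G", ["G", "G"]))  # outgroups agree on G
    print(infer_ancestral_allele("A", "G", ["G", "A"]))  # outgroups conflict
```

Stricter or probabilistic polarization schemes exist; the unanimous-outgroup rule is chosen here only because it is easy to read.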
All data will be made publicly available through the ARCAD web site.
WP 3: Comparative population genomics of the domestication process
Results confirmed that annual plants (at least einkorn and sorghum) experienced strong bottlenecks and accumulated a severe domestication cost. All other species except banana also showed a slight increase in the rate of fixation of non-synonymous mutations in the cultivated compartment compared to the wild one (domestication cost), but this difference was not significant. Banana is puzzling: its pattern suggests a specific feature that now has to be modelled correctly, through deeper analysis and discussion with banana specialists.
Another important feature was clear in einkorn: modelling migration during and after domestication greatly improved the fit of the model to the data. This is an important contribution of the new ∂a∂i method. It suggests that cultivated forms regularly incorporated new alleles from the wild for a long period after domestication. It could also suggest that the wild form was shaped by recurrent gene flow from the cultivated compartment.
WP 4: Life history traits and genome evolution
The results suggest:
- No evidence for selection on codon usage acting in any of the species in this study
- Signatures of gBGC in GC-rich Monocots (grasses, palm tree, banana) but also in some GC-poor Eudicots
- In accordance with theoretical predictions, lack of gBGC in selfing grass species (deficit in heterozygote positions)
ARCAD results provide a unique window into neutral polymorphism and underline the importance of meiotic recombination and neutral forces like gBGC in shaping polymorphism and GC-content in land plants. Our results show that GC-biased gene conversion is the major force acting on GC-content evolution in land plants, suggesting that this neutral process is widespread.
WP 5: Comparative functional genomics
The results, published in BMC Plant Biology (2014), show that lineage-specific duplicated genes are a much more important substrate for positive selection than single-copy genes. This is, to our knowledge, the first genome-scale study to empirically demonstrate that duplicated genes fuel adaptation in angiosperms.
The pipeline developed to detect selection has been applied to two candidate gene families. It is also available to partners working on other species who wish to evaluate the role of adaptation in the evolution of their favourite gene family.