Main results
WP1: Tests and setup of GbS laboratory protocols
We tested genomic complexity reduction methods on the 7 species or species complexes retained in WP1 (Citrus, coconut, three coffee species, cotton and breadfruit). DNA sequencing has been completed for all of them except breadfruit, for which it is still in progress. We received the sequence data in July 2014, and the bioinformatic processing is under way with the help of the bioinformatic tools (scripts) developed in SP5 (see section “3.3 Bio-informatic development and analyses” below). As a first result of the data analysis, Table SP5-2 gives, for each species, the number of DNA fragment sequences (reads) available for SNP detection and genotyping.
However, sequence data analysis and SNP detection must be completed before any conclusion can be drawn on the comparative efficiency of the laboratory conditions and combinations tested on the different species.
Table SP5-2. GbS development: summary of DNA sequencing performed on the species analysed within ARCAD/SP5
Species | Number of accessions | Restriction enzyme | Size selection (bp) | Sequencing (platform, paired-end read length in bp) | Number of read pairs |
Citrus spp. | 12 | ApeKI | 400-500 | HiSeq 2 x 100 | 3 000 000 |
Citrus spp. | 12 | PstI/MseI | >200 | HiSeq 2 x 100 | 17 000 000 |
Coffea canephora | 8 | ApeKI | 400-500 | HiSeq 2 x 100 | 8 000 000 |
Coffea arabica | 8 | ApeKI | 400-500 | HiSeq 2 x 100 | 10 000 000 |
Coffea arabusta | 8 | ApeKI | 400-500 | HiSeq 2 x 100 | 14 000 000 |
Cotton | 24 | ApeKI | 400-500 | HiSeq 2 x 100 | 32 000 000 |
Coconut | 12 | ApeKI | 400-500 | HiSeq 2 x 100 | 6 000 000 |
Breadfruit | 8 | PstI/MseI | >200 | MiSeq 2 x 250 | Sequencing in progress |
Breadfruit | 8 | ApeKI | 400-500 | MiSeq 2 x 250 | Sequencing in progress |
Breadfruit | 8 | ApeKI | >200 | MiSeq 2 x 250 | Sequencing in progress |
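Because the number of accessions differs between species, the read-pair totals in Table SP5-2 are easier to compare as a per-accession average. The short Python sketch below computes that average from the table values. It assumes that reads were split evenly among the multiplexed accessions, which the report does not state and which real GbS runs rarely achieve; `RUNS` and `pairs_per_accession` are illustrative names.

```python
# Read-pair totals and accession counts copied from Table SP5-2 (breadfruit omitted,
# sequencing still in progress). ASSUMPTION: reads are split evenly across accessions.
RUNS = [
    ("Citrus spp.", "ApeKI", 12, 3_000_000),
    ("Citrus spp.", "PstI/MseI", 12, 17_000_000),
    ("Coffea canephora", "ApeKI", 8, 8_000_000),
    ("Coffea arabica", "ApeKI", 8, 10_000_000),
    ("Coffea arabusta", "ApeKI", 8, 14_000_000),
    ("Cotton", "ApeKI", 24, 32_000_000),
    ("Coconut", "ApeKI", 12, 6_000_000),
]


def pairs_per_accession(total_pairs: int, n_accessions: int) -> float:
    """Mean number of read pairs per accession under an even split."""
    return total_pairs / n_accessions


if __name__ == "__main__":
    for species, enzyme, n_acc, pairs in RUNS:
        mean = pairs_per_accession(pairs, n_acc)
        print(f"{species:<18} {enzyme:<10} {mean:>12,.0f} read pairs per accession")
```

Under that assumption, the Citrus ApeKI library gives about 250 000 read pairs per accession, against about 1.4 million for the PstI/MseI library.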
WP2: Tests on heterozygous perennial species
Because little information was available on the use of this method in highly heterozygous species, we developed GbS markers on segregating populations of three allogamous perennial species: grapevine (small, fully sequenced genome), olive tree (medium-sized, incompletely sequenced genome) and rubber tree (large, partly sequenced genome). For each of these three species, preliminary work led us to prepare the libraries with the ApeKI restriction enzyme, either with size selection (grapevine, olive tree) or without (rubber tree).
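The choice between libraries with and without size selection can be explored in silico before wet-lab work. The sketch below assumes a complete single-enzyme digest: it scans a sequence for ApeKI sites (G^CWGC, W = A or T), returns the fragment lengths, and optionally keeps only a size window such as 400-500 bp. It ignores adapters, the bottom strand and partial digestion, treating the size range as insert size is an assumption, and the random sequence is a hypothetical stand-in for a real genome; `digest` and `size_select` are illustrative names, not part of the ARCAD_SP5_GBS pipeline.

```python
import random
import re
from typing import List, Optional

# ApeKI recognises G^CWGC (W = A or T) and cuts the top strand after the first G.
# The lookahead lets overlapping sites (e.g. GCAGCTGC) all be reported.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")
APEKI_CUT_OFFSET = 1


def digest(seq: str) -> List[int]:
    """Fragment lengths (bp) from a complete ApeKI digest of the top strand."""
    seq = seq.upper()
    cuts = [m.start() + APEKI_CUT_OFFSET for m in APEKI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def size_select(lengths: List[int], lo: Optional[int] = None, hi: Optional[int] = None) -> List[int]:
    """Keep fragments within [lo, hi] bp; leaving both bounds unset mimics 'no sizing'."""
    return [n for n in lengths if (lo is None or n >= lo) and (hi is None or n <= hi)]


if __name__ == "__main__":
    # Hypothetical random 200 kb sequence standing in for a real genome.
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(200_000))
    fragments = digest(genome)
    print("fragments, no sizing:", len(size_select(fragments)))
    print("fragments, 400-500 bp:", len(size_select(fragments, 400, 500)))
```

A double digest such as PstI/MseI would need a second site pattern and would keep only fragments with one end from each enzyme, which this single-enzyme sketch does not model.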
The work on grapevine made it possible to test several thresholds for the different parameters: minimum marker coverage versus the percentage of missing data, and the use of multiple SNPs within reads to develop multi-allelic markers. In total, 2168 markers were recovered and 1630 were mapped. Of these 1630, 948 markers corresponded to single SNPs, while 682 corresponded to multiple linked SNPs (up to 14). With a minimum coverage threshold of 8X, the number of markers was much higher, and the resulting map had fewer large gaps (> 10 cM) without additional segregation distortion. This threshold also produced fewer missing data and should therefore be preferred. Finally, as expected, sequencing on the HiSeq platform provided more markers (2428) than sequencing on the MiSeq platform (1630).
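The trade-off between minimum coverage and missing data can be made concrete with a small filter. In this sketch, an individual's genotype is set to missing when fewer than `min_depth` reads support it, and a marker is kept only if its missing rate stays at or below `max_missing`. The report does not describe how its thresholds were applied, so this per-genotype masking, the 25 % missing-data ceiling, the function names and the toy data are assumptions; only the 8X value comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Call:
    genotype: Optional[str]  # e.g. "A/G"; None when no genotype was called
    depth: int               # reads covering the marker in this individual


def mask_low_coverage(calls: List[Call], min_depth: int) -> List[Optional[str]]:
    """Set genotypes supported by fewer than min_depth reads to missing (None)."""
    return [c.genotype if c.genotype is not None and c.depth >= min_depth else None for c in calls]


def missing_rate(genotypes: List[Optional[str]]) -> float:
    """Fraction of individuals with a missing genotype."""
    return sum(g is None for g in genotypes) / len(genotypes) if genotypes else 1.0


def filter_markers(markers: Dict[str, List[Call]], min_depth: int,
                   max_missing: float) -> Dict[str, List[Optional[str]]]:
    """Keep markers whose missing rate, after depth masking, is at most max_missing."""
    kept = {}
    for name, calls in markers.items():
        genotypes = mask_low_coverage(calls, min_depth)
        if missing_rate(genotypes) <= max_missing:
            kept[name] = genotypes
    return kept


# Hypothetical toy data: two markers scored on four progeny.
TOY = {
    "m1": [Call("A/G", 12), Call("A/A", 9), Call("G/G", 15), Call("A/G", 8)],
    "m2": [Call("C/T", 10), Call("C/C", 5), Call(None, 0), Call("T/T", 30)],
}

if __name__ == "__main__":
    for min_depth in (8, 20):
        kept = filter_markers(TOY, min_depth, max_missing=0.25)
        print(f"min depth {min_depth}X: {sorted(kept)} kept")
```

On the toy data, raising the depth threshold from 8X to 20X also removes `m1`, because all of its calls become missing, which shows how a stricter coverage requirement trades markers for call confidence.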
For olive tree, 7856 SNPs were obtained from the 81 individuals. After reconstruction, these SNPs were transformed into 3499 markers, 2835 of which were mapped (1754 single-SNP markers and 1081 reconstructed markers).
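Several SNPs falling in the same sequenced tag can be combined into one multi-allelic marker, because each read shows which alleles occur together on one chromosome. The sketch below illustrates that idea under stated assumptions: a diploid individual, reads aligned to the tag with known SNP offsets, and a hypothetical `min_reads` cut-off of 2. It is not the reconstruction method used for the olive tree or grapevine data, which the report does not detail, and all names and reads are illustrative.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def read_haplotype(read: str, snp_offsets: List[int]) -> str:
    """Bases carried by one read at each SNP position of the tag, in order."""
    return "".join(read[i] for i in snp_offsets)


def call_haplotypes(reads: List[str], snp_offsets: List[int], min_reads: int = 2) -> Tuple[str, ...]:
    """Diploid call: the (at most two) most frequent read haplotypes seen >= min_reads times.
    Returns a pair of haplotypes, or () when no haplotype reaches min_reads."""
    counts = Counter(read_haplotype(r, snp_offsets) for r in reads)
    top = [h for h, n in counts.most_common(2) if n >= min_reads]
    if len(top) == 1:
        return (top[0], top[0])
    return tuple(sorted(top))


def encode_multiallelic(calls: Dict[str, Tuple[str, ...]]) -> Dict[str, Optional[Tuple[int, int]]]:
    """Number the distinct haplotypes (alleles) across individuals and recode each call."""
    alleles = sorted({h for call in calls.values() for h in call})
    code = {h: i for i, h in enumerate(alleles)}
    return {ind: (code[c[0]], code[c[1]]) if len(c) == 2 else None for ind, c in calls.items()}


# Hypothetical 7 bp tag with two SNPs at offsets 2 and 5.
SNPS = [2, 5]
READS = {
    "ind1": ["AAGCCTA", "AAGCCTA", "AATCCGA", "AATCCGA"],
    "ind2": ["AAGCCTA", "AAGCCTA", "AAGCCTA"],
    "ind3": ["AAGCCGA", "AAGCCGA", "AATCCGA", "AATCCGA"],
}

if __name__ == "__main__":
    calls = {ind: call_haplotypes(r, SNPS) for ind, r in READS.items()}
    print(encode_multiallelic(calls))
```

Here two bi-allelic SNPs combine into a single marker with three alleles (GG, GT and TG), which carries more segregation information than either SNP alone.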
For rubber tree, the preliminary bioinformatic analysis of the sequences (164 × 10⁶ good-quality reads for the 275 genotypes) led to the identification of 121 000 SNPs, and application of the ARCAD_SP5_GBS pipeline identified 12 000 potential SNP markers. Genetic mapping of these markers now has to be performed, using a pseudo-testcross strategy, to determine the final number of informative SNP markers.
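In a pseudo-testcross, markers that are heterozygous in one parent and homozygous in the other segregate 1:1 in the progeny and are mapped separately for each parent. The sketch below shows two preliminary steps this implies for bi-allelic SNPs: assigning each marker to a parent from the parental genotypes, and testing its progeny for 1:1 segregation with a chi-square statistic (critical value 3.841 for 1 degree of freedom at the 5 % level). Genotype strings, function names and the example counts are hypothetical; the report does not say which software performs the mapping.

```python
from typing import List, Optional

CHI2_CRIT_1DF_5PCT = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def is_het(genotype: str) -> bool:
    """True for a heterozygous bi-allelic genotype written like 'A/G'."""
    return len(set(genotype.split("/"))) == 2


def pseudo_testcross_parent(mother: str, father: str) -> Optional[str]:
    """Parent whose map a marker belongs to: heterozygous in one parent and homozygous
    in the other (lm x ll or nn x np). Returns None for other configurations."""
    het_m, het_f = is_het(mother), is_het(father)
    if het_m and not het_f:
        return "mother"
    if het_f and not het_m:
        return "father"
    return None


def chi2_one_to_one(progeny: List[Optional[str]], class_a: str, class_b: str) -> float:
    """Chi-square statistic for a 1:1 ratio between two progeny genotype classes.
    Genotypes must be written consistently (e.g. 'A/G', never 'G/A'); missing (None)
    and unexpected genotypes are ignored."""
    n_a = sum(g == class_a for g in progeny)
    n_b = sum(g == class_b for g in progeny)
    n = n_a + n_b
    if n == 0:
        raise ValueError("no progeny in either expected class")
    return (n_a - n_b) ** 2 / n


if __name__ == "__main__":
    # Hypothetical marker heterozygous in the mother only, scored on 100 progeny.
    progeny = ["A/A"] * 60 + ["A/G"] * 40
    print(pseudo_testcross_parent("A/G", "A/A"))
    stat = chi2_one_to_one(progeny, "A/A", "A/G")
    print(f"chi2 = {stat:.2f}, distorted: {stat > CHI2_CRIT_1DF_5PCT}")
```

For the example marker, a 60:40 split gives a statistic of 4.0, above 3.841, so it would be flagged as distorted at the 5 % level.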
Bio-informatic development and analyses
The ARCAD_SP5_GBS pipeline, available on GitHub (https://github.com/SouthGreenPlatform/arcad-hts/tree/master/sp5_gbs), provides original tools, such as scripts for comparing two sets of SNPs detected independently on the same accession. These scripts allow rapid comparison of SNP detection results obtained with different software packages or pipelines, such as STACKS, TASSEL and the ARCAD_SP4 pipeline. Another original development transforms bi-allelic single-SNP markers into more informative multi-allelic markers, using the information from multiple adjacent SNPs.
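Comparing two SNP call sets for one accession essentially reduces to set operations on SNP positions plus a genotype concordance check. The Python sketch below assumes that both call sets have already been parsed into dictionaries keyed by (contig, position), for example from VCF files; the names are illustrative and do not reproduce the scripts in the ARCAD_SP5_GBS repository.

```python
from typing import Dict, NamedTuple, Tuple

SnpKey = Tuple[str, int]  # (contig or chromosome, position)


class Comparison(NamedTuple):
    shared: int      # SNP positions detected by both methods
    only_a: int      # positions detected only by method A
    only_b: int      # positions detected only by method B
    concordant: int  # shared positions with identical genotypes


def normalise(genotype: str) -> str:
    """Order alleles so that 'G/A' and 'A/G' compare equal."""
    return "/".join(sorted(genotype.split("/")))


def compare_snp_sets(a: Dict[SnpKey, str], b: Dict[SnpKey, str]) -> Comparison:
    """Compare two SNP call sets produced for the same accession."""
    shared = a.keys() & b.keys()
    concordant = sum(normalise(a[k]) == normalise(b[k]) for k in shared)
    return Comparison(len(shared), len(a.keys() - b.keys()), len(b.keys() - a.keys()), concordant)


if __name__ == "__main__":
    # Hypothetical calls for one accession from two tools.
    tool_a = {("chr1", 100): "A/G", ("chr1", 250): "C/C", ("chr2", 40): "T/T"}
    tool_b = {("chr1", 100): "G/A", ("chr1", 250): "C/T", ("chr3", 7): "A/A"}
    print(compare_snp_sets(tool_a, tool_b))
```

For the example calls, the two tools share two positions, each reports one position the other misses, and one of the two shared genotypes agrees.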