Main results

WP1: Tests and setup of GbS laboratory protocols

We tested genomic complexity reduction methods on the 7 species or species complexes retained in WP1 (Citrus, Coconut, three Coffee species, Cotton and Breadfruit). DNA sequencing has been completed for all of them except Breadfruit, for which it is still in progress. We received the sequence data in July 2014, and bio-informatic processing is under way using the bio-informatic tools (scripts) developed in SP5 (see section “3.3 Bio-informatic development and analyses” below). As a first result of the data analysis, Table SP5-2 gives, for each species, the number of DNA fragment sequences (reads) available for SNP detection and genotyping.

However, sequence data analysis and SNP detection must be completed before we can draw conclusions about the comparative efficiency of the laboratory conditions and combinations tested on the different species.

Table SP5-2. GbS development: summary of DNA sequencing performed on the species analysed within ARCAD/SP5

Samples		GbS
Species	Number of accessions	Restriction enzyme	Size selection (bp)	Sequencing (platform, paired-end read length in bp)	Number of read pairs
Citrus spp.	12	ApeKI	400-500	HiSeq 2 x 100	3 000 000
Citrus spp.	12	PstI/MseI	>200	HiSeq 2 x 100	17 000 000
Coffea canephora	8	ApeKI	400-500	HiSeq 2 x 100	8 000 000
Coffea arabica	8	ApeKI	400-500	HiSeq 2 x 100	10 000 000
Coffea arabusta	8	ApeKI	400-500	HiSeq 2 x 100	14 000 000
Cotton	24	ApeKI	400-500	HiSeq 2 x 100	32 000 000
Coconut	12	ApeKI	400-500	HiSeq 2 x 100	6 000 000
Breadfruit	8	PstI/MseI	>200	MiSeq 2 x 250	Sequencing in progress
Breadfruit	8	ApeKI	400-500	MiSeq 2 x 250	Sequencing in progress
Breadfruit	8	ApeKI	>200	MiSeq 2 x 250	Sequencing in progress
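The read-pair totals in Table SP5-2 can be turned into a rough per-accession sequencing effort, which helps when comparing runs that multiplexed different numbers of accessions. The sketch below is a minimal Python illustration that assumes read pairs were split evenly across barcoded accessions; in practice barcode representation is uneven, so these figures are averages only. The function name `pairs_per_accession` is hypothetical.

```python
# Read-pair counts transcribed from Table SP5-2 (completed runs only;
# the Breadfruit runs were still in progress and are therefore left out).
runs = [
    # (species, enzyme, number of accessions, read pairs)
    ("Citrus spp.", "ApeKI", 12, 3_000_000),
    ("Citrus spp.", "PstI/MseI", 12, 17_000_000),
    ("Coffea canephora", "ApeKI", 8, 8_000_000),
    ("Coffea arabica", "ApeKI", 8, 10_000_000),
    ("Coffea arabusta", "ApeKI", 8, 14_000_000),
    ("Cotton", "ApeKI", 24, 32_000_000),
    ("Coconut", "ApeKI", 12, 6_000_000),
]


def pairs_per_accession(accessions: int, read_pairs: int) -> float:
    """Average read pairs per accession, assuming an even split across barcodes."""
    return read_pairs / accessions


for species, enzyme, n, pairs in runs:
    print(f"{species:<18} {enzyme:<10} {pairs_per_accession(n, pairs):>12,.0f}")
```

Under this even-split assumption, the Citrus PstI/MseI library yields about 1.4 million read pairs per accession, against 250 000 for the Citrus ApeKI library.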

WP2: Test on heterozygous perennial species

Because little background information is available on the use of this method in highly heterozygous species, we developed GbS markers on segregating populations of three allogamous perennial species: grapevine (fully sequenced, small genome), olive tree (incompletely sequenced, medium-sized genome) and rubber tree (partly sequenced, large genome). For each of these three species, preliminary work led us to prepare the libraries with the ApeKI restriction enzyme, either with size selection (grapevine, olive tree) or without (rubber tree).
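Choosing an enzyme and a size window is often guided by an in silico digest of a reference or draft genome. The sketch below is not the preliminary work referred to above; it is a hypothetical Python illustration, assuming the standard recognition sites (ApeKI G^CWGC, PstI CTGCA^G, MseI T^TAA), that lists fragments bounded by cut sites on both ends and counts those falling in a size window such as 400-500 bp. All function names are illustrative.

```python
import random
import re

# Hypothetical helper table: recognition site (regex) and cut offset on the
# top strand. All three sites are palindromic, so scanning one strand suffices.
ENZYMES = {
    "ApeKI": (r"GC[AT]GC", 1),  # G^CWGC, W = A or T
    "PstI": (r"CTGCAG", 5),     # CTGCA^G
    "MseI": (r"TTAA", 1),       # T^TAA
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Return 0-based positions where the enzyme cuts the top strand."""
    pattern, offset = ENZYMES[enzyme]
    # A zero-width lookahead also reports overlapping sites.
    return [m.start() + offset for m in re.finditer(f"(?={pattern})", seq)]


def fragment_lengths(seq: str, enzymes: list[str]) -> list[int]:
    """Lengths of fragments flanked by a cut site on both ends."""
    cuts = sorted({p for e in enzymes for p in cut_positions(seq, e)})
    return [b - a for a, b in zip(cuts, cuts[1:])]


def count_in_window(lengths: list[int], lo: int, hi: int) -> int:
    """Number of fragments retained by a size selection between lo and hi bp."""
    return sum(lo <= n <= hi for n in lengths)


if __name__ == "__main__":
    # A random sequence stands in for a real genome in this sketch.
    random.seed(1)
    genome = "".join(random.choice("ACGT") for _ in range(200_000))
    frags = fragment_lengths(genome, ["ApeKI"])
    print(len(frags), "ApeKI fragments,",
          count_in_window(frags, 400, 500), "between 400 and 500 bp")
```

For the PstI/MseI combination, this simple version treats all cut sites alike and does not distinguish PstI-MseI fragments from MseI-MseI ones.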

The work on grapevine made it possible to test several thresholds for the different parameters: the minimum coverage required for a marker versus the percentage of missing data, and the use of multiple SNPs within a read to develop multi-allelic markers. In total, 2168/1630 markers were recovered and mapped. Of the 1630, 948 markers corresponded to single SNPs, while 682 corresponded to multiple linked SNPs (up to 14). With a threshold of 8X, the number of markers was much higher and allowed the construction of a map with fewer large gaps (> 10 cM), without increasing segregation distortion. This threshold also yielded less missing data and should therefore be preferred. Finally, as expected, sequencing on the HiSeq instrument provided more markers (2428) than on the MiSeq instrument (1630).
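The trade-off between minimum coverage and missing data can be sketched as a two-step filter: genotype calls supported by fewer reads than a depth threshold are set to missing, then markers whose missing-data rate exceeds a ceiling are discarded. This is a hypothetical Python illustration, not the ARCAD_SP5_GBS code; the 8X default comes from the grapevine work, while the 20 % missing-data ceiling (`max_missing=0.2`) is an arbitrary assumption.

```python
from typing import Optional

Call = Optional[str]  # a genotype such as "AB", or None when missing


def mask_low_depth(calls: list[Call], depths: list[int], min_depth: int) -> list[Call]:
    """Set to missing every call supported by fewer than min_depth reads."""
    return [c if d >= min_depth else None for c, d in zip(calls, depths)]


def missing_rate(calls: list[Call]) -> float:
    """Fraction of individuals with no genotype call."""
    return sum(c is None for c in calls) / len(calls)


def filter_markers(markers: dict[str, tuple[list[Call], list[int]]],
                   min_depth: int = 8,
                   max_missing: float = 0.2) -> dict[str, list[Call]]:
    """Keep markers whose missing-data rate after depth masking is <= max_missing."""
    kept = {}
    for name, (calls, depths) in markers.items():
        masked = mask_low_depth(calls, depths, min_depth)
        if missing_rate(masked) <= max_missing:
            kept[name] = masked
    return kept


# Toy data: genotype calls and read depths for five individuals.
markers = {
    "m1": (["AA", "AB", "AB", "BB", "AB"], [12, 9, 8, 20, 15]),
    "m2": (["AA", "AB", "AB", "BB", "AB"], [3, 9, 4, 20, 5]),
}
print(sorted(filter_markers(markers)))               # ['m1']
print(sorted(filter_markers(markers, min_depth=4)))  # ['m1', 'm2']
```

Raising `min_depth` makes each retained call more reliable but masks more calls, so fewer markers pass `max_missing`; scanning several thresholds, as was done for grapevine, makes this trade-off visible.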

For olive tree, 7856 SNPs were obtained from 81 individuals. After reconstruction, these SNPs were transformed into 3499 markers, 2835 of which were mapped (1754 single-SNP markers and 1081 reconstructed markers).
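One way to picture the reconstruction of several SNPs into a single marker is to treat the combination of alleles carried by each read at linked SNP positions as a haplotype, and to code each distinct haplotype as an allele of a multi-allelic marker. The sketch below is a hypothetical Python illustration under simple assumptions (diploid individuals, haplotypes already read off each read, at least two supporting reads per haplotype); it is not the reconstruction method actually used in the project.

```python
from collections import Counter
from typing import Optional


def call_haplotypes(read_haplotypes: list[str],
                    min_count: int = 2) -> Optional[tuple[str, str]]:
    """Diploid call: the (at most two) most frequent haplotypes with >= min_count reads."""
    counts = Counter(read_haplotypes)
    supported = [h for h, n in counts.most_common() if n >= min_count][:2]
    if not supported:
        return None                              # too little evidence: missing
    if len(supported) == 1:
        return (supported[0], supported[0])      # homozygous
    return (min(supported), max(supported))      # heterozygous, fixed order


def encode_multiallelic(population: dict[str, list[str]]):
    """Code each distinct haplotype as an allele letter (a, b, c, ...)."""
    genotypes = {ind: call_haplotypes(reads) for ind, reads in population.items()}
    haplotypes = sorted({h for g in genotypes.values() if g for h in g})
    codes = {h: chr(ord("a") + i) for i, h in enumerate(haplotypes)}
    coded = {ind: (None if g is None else codes[g[0]] + codes[g[1]])
             for ind, g in genotypes.items()}
    return codes, coded


# Toy data: each string gives the alleles one read carries at three linked SNPs.
population = {
    "P1": ["ACT", "ACT", "GCT", "GCT"],
    "P2": ["ACA", "ACA", "ACA"],
    "F1": ["ACT", "ACT", "ACA", "ACA", "GCT"],
}
codes, coded = encode_multiallelic(population)
print(codes)
print(coded)
```

Three bi-allelic SNPs collapsed this way give a marker with up to three observed alleles here, whereas each SNP taken alone offers at most two.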

For rubber tree, the preliminary bio-informatic analysis of the sequences (164 × 10⁶ good-quality reads for the 275 genotypes) identified 121 000 SNPs, and applying the ARCAD_SP5_GBS pipeline identified 12 000 potential SNP markers. Genetic mapping of these markers, using a pseudo-testcross strategy, now has to be performed to determine the final number of informative SNP markers.
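In a pseudo-testcross, each parent of a full-sib family is mapped separately using markers that are heterozygous in that parent and homozygous in the other; such markers are expected to segregate 1:1 in the progeny. The sketch below is a hypothetical Python illustration of that sorting step and of a chi-square test for 1:1 segregation (1 degree of freedom, with 3.841 as the 5 % critical value). The lm x ll / nn x np labels follow the JoinMap cross-pollinator notation; none of the function names come from the project's tools.

```python
from typing import Optional

CHI2_1DF_5PCT = 3.841  # critical chi-square value, 1 df, alpha = 0.05


def is_het(genotype: str) -> bool:
    return genotype[0] != genotype[1]


def classify_marker(mother: str, father: str) -> str:
    """Assign a marker to a parental map from the two parental genotypes."""
    if is_het(mother) and not is_het(father):
        return "female"          # lm x ll: segregates in the female parent only
    if is_het(father) and not is_het(mother):
        return "male"            # nn x np: segregates in the male parent only
    if is_het(mother) and is_het(father):
        return "both"            # both parents heterozygous: not used here
    return "uninformative"       # both parents homozygous: no segregation


def segregation_chi2(progeny: list[Optional[str]]) -> float:
    """Chi-square for a 1:1 ratio of heterozygous vs homozygous offspring."""
    calls = [g for g in progeny if g is not None]
    n_het = sum(is_het(g) for g in calls)
    n_hom = len(calls) - n_het
    # With expected counts N/2 each, sum((O - E)^2 / E) reduces to (n1 - n2)^2 / N.
    return (n_het - n_hom) ** 2 / len(calls)


progeny = ["AB"] * 15 + ["AA"] * 5 + [None] * 2
chi2 = segregation_chi2(progeny)
print(classify_marker("AB", "AA"), chi2, chi2 > CHI2_1DF_5PCT)
```

Markers whose chi-square exceeds the critical value show segregation distortion and are usually flagged before map construction.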

Bio-informatic development and analyses

The ARCAD_SP5_GBS pipeline, available on GitHub (https://github.com/SouthGreenPlatform/arcad-hts/tree/master/sp5_gbs), provides original tools such as scripts for comparing two sets of SNPs detected independently on the same accession. These allow rapid comparison of SNP detection results obtained with different software tools or pipelines such as STACKS, TASSEL and the ARCAD_SP4 pipeline. Another original development transforms bi-allelic single-SNP markers into more informative multi-allelic markers, using information from multiple adjacent SNPs.
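Comparing two SNP sets detected independently on the same accession can be sketched as a set operation on SNP positions followed by a check of genotype concordance at the shared positions. This is a hypothetical Python illustration, not the script distributed with ARCAD_SP5_GBS; it assumes each caller's output has already been reduced to a mapping from (chromosome, position) to an unphased genotype such as "A/G".

```python
def normalize(genotype: str) -> str:
    """Make genotypes order-independent, so "G/A" and "A/G" compare equal."""
    return "".join(sorted(genotype.replace("/", "").replace("|", "")))


def compare_snp_sets(set_a: dict, set_b: dict) -> dict:
    """Compare two {(chrom, pos): genotype} mappings for one accession."""
    keys_a, keys_b = set(set_a), set(set_b)
    shared = keys_a & keys_b
    union = keys_a | keys_b
    concordant = sum(normalize(set_a[k]) == normalize(set_b[k]) for k in shared)
    return {
        "only_a": len(keys_a - keys_b),
        "only_b": len(keys_b - keys_a),
        "shared": len(shared),
        "concordant": concordant,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy data standing in for the outputs of two SNP callers on one accession.
calls_a = {("chr1", 100): "A/G", ("chr1", 250): "C/C", ("chr2", 30): "T/T"}
calls_b = {("chr1", 100): "G/A", ("chr1", 250): "C/T", ("chr3", 5): "A/A"}
print(compare_snp_sets(calls_a, calls_b))
```

Positions reported by only one caller, and shared positions with discordant genotypes, are the natural candidates for closer inspection.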

Arcad Project

Crop biodiversity research and resource center
