Development of STS and CAPS markers for variety identification and genetic diversity analysis of tea germplasm in Taiwan

Background Tea (Camellia sinensis) is an important economic crop in Taiwan. In particular, the two major commercial types of tea produced in Taiwan, Paochong tea and Oolong tea, are famous around the world, and they must be manufactured from specific cultivars. Nevertheless, many elite cultivars have been illegally introduced to foreign countries. Because of the lower cost, large amounts of “Taiwan-type tea” are produced and imported to Taiwan, causing serious damage to the tea industry. It is therefore urgent to develop stable, fast and reliable DNA markers for fingerprinting tea cultivars in Taiwan and protecting the intellectual property rights of breeders. Furthermore, evaluation of the genetic diversity and phylogenetic relationships of tea germplasm in Taiwan is imperative for parental selection in cross-breeding programs and for avoiding genetic vulnerability.
Results Two STS and 37 CAPS markers derived from the cytoplasmic genomes and ESTs of tea were developed in this study, providing a useful tool for distinguishing all investigated germplasm. For identifying the 12 prevailing tea cultivars in Taiwan, five core markers were selected: one mitochondrial, one chloroplast and three nuclear markers. Based on principal coordinate analysis and cluster analysis, the 55 tea germplasm in Taiwan were divided into three groups: sinensis type (C. sinensis var. sinensis), assamica type (C. sinensis var. assamica) and Taiwan wild species (C. formosensis). The genetic diversity analysis revealed that both the sinensis (0.44) and assamica (0.41) types had higher genetic diversity than the wild species (0.25). A close genetic distance was found between the first (Chin-Shin-Oolong) and the third (Shy-Jih-Chuen) most prevalent cultivars, and many recently released varieties are descendants of Chin-Shin-Oolong. This implies a potential risk of genetic vulnerability for tea cultivation in Taiwan.
Conclusions We have successfully developed a tool for tea germplasm discrimination and genetic diversity analysis, as well as a set of core markers for effective identification of prevailing cultivars in Taiwan. According to the results of phylogenetic analysis on prevailing tea cultivars, it is necessary to broaden genetic diversity from wild species or plant introduction in future breeding programs. Electronic supplementary material The online version of this article (doi:10.1186/1999-3110-55-12) contains supplementary material, which is available to authorized users.


Background
Tea (Camellia sinensis) is one of the most important beverage crops in the world and a significant economic crop in Taiwan. Currently, there are 14,091 hectares of tea farms in Taiwan, producing 17,310 tons per year (Council of Agriculture 2012). Tea has been planted in Taiwan for about 200 years and has been manufactured into different types of tea according to era and production area (Chiu 1988; Jun and Lin 1997). Because different types of tea are produced from specific cultivars, numerous tea cultivars are grown in Taiwan. Paochong tea and Oolong tea are the two major types of tea currently produced in Taiwan, whereas black and green tea are minor types. The cultivars suitable for making Paochong or Oolong tea are Chin-Shin-Oolong, TTES-12, Shy-Jih-Chuen, Chin-Shin-Dahpan, and TTES-13, whereas Chin-Shin-Gantzy is suited to green tea, and TTES-8 and TTES-18 are suitable for black tea (Tsai et al. 2004b). Furthermore, many other germplasm, including landraces, introduced varieties, and wild species, could be selected or used for breeding new varieties.
Tea is a woody, perennial, out-crossing crop that is highly heterozygous (Barua 1963). In tea breeding, the key criteria for parental selection are superior parental traits and wide genetic diversity between parents, which prevents weakness in the progeny (Bandyopadhyay 2011). Furthermore, many elite cultivars developed in Taiwan have been illegally introduced to China, Vietnam, Thailand, Indonesia, and other countries. Because of the lower cost, large amounts of "Taiwan-type tea" are produced and imported to Taiwan, causing serious damage to the tea industry. Therefore, seedlings and products of tea have been protected by the "Plant Variety and Plant Seed Act", which was enacted in 2004. In addition, a scientific database for identifying and examining tea varieties should be developed to handle suspected infringements.
The simplest method for genetic diversity assessment and variety identification of tea or its commercial product (processed tea) is based on morphological traits. However, the available morphological traits are limited in number and easily affected by the environment and the growth stage of the plant (Gunasekare 2007; Bandyopadhyay 2011). DNA markers are genetic markers that arise from various classes of DNA mutations and rearrangements (Collard et al. 2005). Compared with morphological traits, DNA markers have numerous advantages: multiple marker types are available, polymorphism is relatively abundant, genomic coverage is extensive, results are not affected by growth stage, plant tissue, environment or gene expression, only a small quantity of DNA is needed per assay, large numbers of samples can be analyzed in a short time, and results are more reproducible (Powell et al. 1996; Collard et al. 2005; Jones et al. 2009). DNA markers including RAPD (randomly amplified polymorphic DNA), ISSR (inter-simple sequence repeat) and AFLP (amplified fragment length polymorphism) have been well developed for genetic fingerprinting and phylogenetic studies of tea in Taiwan (Lai et al. 2001; Tsai et al. 2003; Hu et al. 2005; Lin et al. 2005). Nevertheless, these markers are dominant, and their reproducibility and capacity for variety identification are lower than those of targeted, locus-specific DNA markers such as STS and CAPS.
An STS (sequence tagged site) is a relatively short, single-copy DNA sequence that can be specifically amplified by PCR (Olson et al. 1989). CAPS (cleaved amplified polymorphic sequence), also called PCR-RFLP (polymerase chain reaction-restriction fragment length polymorphism), uses amplified DNA fragments digested with a restriction endonuclease to reveal restriction site polymorphisms (Konieczny and Ausubel 1993). STS and CAPS markers are co-dominant, locus-specific, and highly reproducible. Their genotypes are easily scored and interpreted, and only a small quantity of DNA is needed per assay. In addition, the sizes of the cleaved and uncleaved amplification products can be adjusted by appropriate placement of the PCR primers. The procedure is technically simple and gives robust results because an amplification product is always obtained (Drenkard et al. 1997).
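The co-dominant nature of a CAPS marker can be illustrated with a minimal in-silico digestion sketch. The two amplicon sequences below are hypothetical and are not taken from this study; EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example enzyme. One allele carries the site and the other carries a SNP that abolishes it, so the three genotypes give three distinct band patterns.

```python
# Hypothetical amplicons: the two alleles differ by one SNP inside an EcoRI site.
ALLELE_A = "ATGC" * 5 + "GAATTC" + "TTAGC" * 4  # restriction site present
ALLELE_B = "ATGC" * 5 + "GAGTTC" + "TTAGC" * 4  # SNP (A>G) abolishes the site


def digest(amplicon, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the cut position inside the site (1 for EcoRI: G^AATTC).
    """
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def band_pattern(alleles, site="GAATTC", cut_offset=1):
    """Distinct band sizes expected on a gel for a genotype (list of alleles)."""
    bands = set()
    for allele in alleles:
        bands.update(digest(allele, site, cut_offset))
    return sorted(bands, reverse=True)


if __name__ == "__main__":
    print(band_pattern([ALLELE_A, ALLELE_A]))  # [25, 21]      cut homozygote
    print(band_pattern([ALLELE_A, ALLELE_B]))  # [46, 25, 21]  heterozygote
    print(band_pattern([ALLELE_B, ALLELE_B]))  # [46]          uncut homozygote
```

Because the heterozygote shows both the uncut band and the two cleaved bands, it can be told apart from either homozygote, which is what makes the marker co-dominant.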
DNA markers can be developed from the whole nuclear genome or from expressed sequence tags (ESTs). Because whole-genome sequences of the tea plant are not yet available, it is feasible to develop nuclear markers from EST databases. ESTs are short cDNA sequences reverse-transcribed from mRNA. In general, when EST-derived primer pairs are used to amplify the nuclear genome, the amplicons may contain intron sequences, which display higher variation and allow informative markers to be developed for variety identification (Shu et al. 2010). DNA markers can also be derived from the cytoplasmic genomes, namely the mitochondrial genome (mtDNA) and the chloroplast genome (cpDNA). Cytoplasmic CAPS markers are maternally inherited from a haploid genome (Kaundun and Matsumoto 2011) and have a slower nucleotide substitution rate than nuclear DNA (Palmer 1992). Because of this conservative evolution, they have been widely used to detect the geographical origins of plant species (Kaundun and Matsumoto 2002; Katoh et al. 2003) and population differentiation (Schaal and Olsen 2000).
The aim of this study was to develop stable, fast and reliable STS and CAPS DNA markers for fingerprinting commercial tea varieties in Taiwan and protecting the intellectual property rights of breeders. Furthermore, the genetic diversity and phylogenetic relationships of tea germplasm in Taiwan were assessed to provide information for parental selection.

Plant materials and DNA extraction
A total of 55 germplasm were analyzed in this study, including 22 selected from crosses between varieties, nine local cultivars (landraces), 16 introduced varieties, and eight wild species. Taxonomically, they are classified as 22 C. sinensis var. sinensis (S), 12 C. sinensis var. assamica (A), 11 C. sinensis var. sinensis × var. assamica (SA), two C. sinensis var. assamica × var. sinensis (AS), seven C. formosensis (F), and one C. formosensis var. yungkangensis (FY) (Hu et al. 2005; Su 2007; Su et al. 2009). All samples were collected from the germplasm garden at the Tea Research and Extension Station in Taoyuan County, Taiwan, except for four samples (I4~I7), which were obtained from the tea garden of Tung Pang Black Tea Co., Ltd. in Nantou County, Taiwan (Table 1).
The DNA was isolated from buds and leaves by using a modification of Doyle and Doyle (1990) described by Hu et al. (2005).

Design of CAPS markers
Nuclear amplicons that showed two bands with length polymorphism were directly applied as STS markers. PCR products shorter than 1 kb were sequenced with an ABI PRISM 3730 DNA Analyzer (Applied Biosystems, USA). To screen for SNPs (single nucleotide polymorphisms) and InDels (insertions/deletions), sequence analyses were conducted with SeqMan Pro v.7.1 software (DNAStar, Inc., Madison, WI, USA). Sequences with SNPs or InDels were converted to CAPS markers with SNP2CAPS software (Thiel et al. 2004). To check restriction patterns, PCR reactions were performed in a final volume of 11.7 μL containing 1× Taq buffer, 2 mM MgCl2, 0.27 mM dNTPs, 0.26 μM of each primer, 1 U Taq DNA polymerase (Invitrogen by Life Technologies), and 40 ng DNA. Amplification was carried out in a T-Gradient thermocycler (Biometra, Germany) programmed for 5 min of preheating at 94°C followed by 35 cycles of 30 s at 94°C (denaturation), 30 s at 55-60°C depending on the primer pair (annealing) and 1 min at 72°C (extension), with a final incubation of 10 min at 72°C. Amplification products were analyzed on 2% agarose gels stained with ethidium bromide to check the amplified fragments. Amplified fragments were then digested with restriction enzymes to detect CAPS, and the products were resolved by electrophoresis on 2% agarose gels.

Data analysis and variety identification
Cytoplasmic markers were scored as haploid genotypes and nuclear markers as diploid genotypes, and each allele was assigned a letter for a particular primer set/enzyme combination. The polymorphism information of the STS and CAPS markers was analyzed with PowerMarker v.3.25 (Liu and Muse 2005) to obtain the number of alleles and the polymorphism information content (PIC) per marker.
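As an illustration of how a PIC value follows from letter-coded genotypes, the sketch below uses the standard formula of Botstein et al. (1980), PIC = 1 - Σp_i² - Σ_{i<j} 2p_i²p_j², where p_i is the frequency of allele i. The genotype list is hypothetical, and the exact computation inside PowerMarker is assumed rather than reproduced here.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies from letter-coded genotypes.

    Each genotype is a string of allele letters: one letter for a haploid
    (cytoplasmic) marker, two letters for a diploid (nuclear) marker.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(frequencies):
    """Polymorphism information content (Botstein et al. 1980)."""
    p = list(frequencies)
    homozygosity = sum(x * x for x in p)
    cross_term = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1 - homozygosity - cross_term


if __name__ == "__main__":
    # Hypothetical diploid scores for one CAPS locus in four accessions.
    freqs = allele_frequencies(["AA", "AB", "BB", "AB"])
    print(freqs)                # {'A': 0.5, 'B': 0.5}
    print(pic(freqs.values()))  # 0.375
```

For a locus with two alleles the PIC cannot exceed 0.375, which helps explain why the bi-allelic cytoplasmic markers in this study have lower PIC values than the multi-allelic nuclear ones.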

Genetic diversity analysis
In this study, the tea germplasm consisted of three main groups: the sinensis type (S and SA), the assamica type (A and AS) and the wild species in Taiwan (F and FY) (Table 1). The genetic diversity of these germplasm was analyzed with Popgene v.1.32 (Yeh and Boyle 1997) to estimate the observed number of alleles (NA), the effective number of alleles (Ne), the observed heterozygosity (Ho), Nei's gene diversity (H), and Shannon's information index (I) per group.
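The per-locus quantities behind these parameters can be sketched as follows, using their textbook definitions: Ne = 1/Σp², H = 1 - Σp² and I = -Σp ln p. The genotype data are hypothetical, and Popgene may apply sample-size corrections or average across loci in ways not shown here, so this is an illustration of the definitions rather than a reimplementation of the software.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Diversity statistics for one diploid locus.

    `genotypes` is a list of (allele, allele) tuples, one per individual.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    freqs = [n / len(alleles) for n in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return {
        "Na": len(freqs),                            # observed number of alleles
        "Ne": 1 / sum_sq,                            # effective number of alleles
        "Ho": heterozygotes / len(genotypes),        # observed heterozygosity
        "H": 1 - sum_sq,                             # Nei's gene diversity
        "I": -sum(p * math.log(p) for p in freqs),   # Shannon's information index
    }


if __name__ == "__main__":
    # Hypothetical scores for four individuals at one locus.
    sample = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
    for name, value in locus_diversity(sample).items():
        print(name, round(value, 4))
```

Group-level values such as those reported per germplasm group would then be obtained by averaging the per-locus statistics over all 39 loci.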

Cluster analysis and principal coordinate analysis
Both the average genetic distances among the three main groups and the genetic distances between pairs of germplasm were calculated with the modified Rogers' distance (MRD) method (Wright 1978) using TFPGA v.1.3 (Miller 1997). Based on the MRD values of all pairwise combinations, cluster analysis and principal coordinate analysis (PCoA) were performed (Rohlf 1997). A dendrogram of the genetic relationships was constructed by the unweighted pair group method with arithmetic mean (UPGMA). For the PCoA, the first two extracted coordinates were used to draw the PCoA plot.
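For reference, the modified Rogers' distance of Wright (1978) between two samples is d = sqrt( ΣΣ(p_ij - q_ij)² / (2m) ), where p_ij and q_ij are the frequencies of allele j at locus i and m is the number of loci. The sketch below applies it to two individual accessions whose genotypes are letter-coded; the genotype data are hypothetical, and the way TFPGA handles missing data or mixed ploidy is not modeled.

```python
import math
from collections import Counter


def genotype_frequencies(genotype):
    """Allele frequencies within one genotype string, e.g. 'AB' -> A: 0.5, B: 0.5."""
    counts = Counter(genotype)
    return {allele: n / len(genotype) for allele, n in counts.items()}


def modified_rogers_distance(x, y):
    """Modified Rogers' distance between two multilocus genotypes.

    `x` and `y` are lists of per-locus genotype strings in the same locus order
    (one letter for haploid cytoplasmic loci, two letters for diploid loci).
    """
    if len(x) != len(y):
        raise ValueError("both accessions must be scored at the same loci")
    total = 0.0
    for gx, gy in zip(x, y):
        fx, fy = genotype_frequencies(gx), genotype_frequencies(gy)
        for allele in set(fx) | set(fy):
            total += (fx.get(allele, 0.0) - fy.get(allele, 0.0)) ** 2
    return math.sqrt(total / (2 * len(x)))


if __name__ == "__main__":
    # Hypothetical profiles: one cytoplasmic locus followed by two nuclear loci.
    acc1 = ["A", "AB", "AA"]
    acc2 = ["A", "BB", "BB"]
    print(round(modified_rogers_distance(acc1, acc2), 4))  # 0.6455
```

The distance is 0 for identical profiles and 1 when the two accessions share no allele at any locus, so the values are directly comparable across marker sets of different size.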

Polymorphism of STS and CAPS markers
The STS and CAPS markers in this study were derived from the cytoplasmic genomes and nuclear ESTs. From six polymorphic DNA sequences of the cytoplasmic genomes, 14 SNPs and the same number of InDels were screened and successfully converted into three chloroplast CAPS (C01~C03) and seven mitochondrial CAPS (M01~M07) markers. A total of 54 nuclear EST primer pairs were tested, including four pairs from previous studies (Matsumoto 2003, 2004) and 50 pairs designed from the public EST database of NCBI. Of these, 27 primer pairs amplified amplicons of the expected size, whereas the remaining 27 did not yield any scorable amplicon or yielded amplicons longer than 1 kb. Among the 27 amplicons of the expected size, 11 had no SNP, three had SNPs that did not affect a restriction site, and the remaining 13 had 90 SNPs and four InDels. These polymorphisms were successfully converted into two STS (PAL and F3H) and 27 CAPS markers (G01~G27). For example, the design of a CAPS marker from one SNP in an EST sequence encoding a zinc finger protein is shown in Additional file 1: Figure S1. Detailed information on the two STS and 37 CAPS markers (10 cytoplasmic and 27 nuclear CAPS markers) is listed in Table 2.
A total of 98 alleles at 39 polymorphic loci were detected in the 55 germplasm. At the 10 cytoplasmic CAPS loci, the average number of alleles was 2 and the polymorphism information content (PIC) ranged from 0.13 (C02) to 0.35 (M06), with an average of 0.25 per locus. At the 29 nuclear STS and CAPS loci, the number of alleles varied from 2 to 7, with an average of 2.7 per locus. The PIC values varied widely from 0.04 (G16) to 0.62 (G22), with an average of 0.34 per locus (Table 2).

Identification of the prevailing tea cultivars in Taiwan
The two STS and 37 CAPS markers developed in this study can distinguish all 55 core germplasm in Taiwan, and their band patterns are shown in Additional file 1: Table S1. For the 12 prevailing tea cultivars in Taiwan, the electrophoresis patterns of the cleaved fragments for each STS and CAPS marker are shown in Additional file 1: Figure S2. To establish a flow chart for identifying the 12 prevailing cultivars, five core markers, M02 (mitochondria), C02 (chloroplast), and G01, G03, and G04 (nuclear), were selected on the basis of variety specificity and PIC value. First, the sinensis type and assamica type groups were distinguished using the M02 marker. Second, G03 and C02 were used to discriminate the four cultivars within the assamica type group, and G03, G01 and G04 were used to separate the eight cultivars within the sinensis type group (Figure 1). In addition to the five core markers, the remaining 34 markers can serve as a supplementary tool if more new varieties need to be identified in the future.
Table 2 note: "C" represents chloroplast CAPS markers, "M" represents mitochondria CAPS markers, "G" represents nuclear CAPS markers, and "PAL" and "F3H" represent nuclear STS markers. F3H: flavanone 3-hydroxylase. § PIC: polymorphism information content.
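Whether a chosen marker panel is sufficient for identification can be checked mechanically: every cultivar must have a unique combination of genotypes across the panel. The sketch below shows that check. The cultivar names (cv1 to cv4) and genotype letters are hypothetical placeholders, not the actual scores from Additional file 1: Table S1; only the marker codes are borrowed from the text.

```python
def indistinguishable_pairs(profiles, markers):
    """Return pairs of cultivars that share the same genotype at every marker.

    `profiles` maps cultivar name -> {marker code: genotype string}.
    An empty result means the panel separates all cultivars.
    """
    first_seen = {}
    clashes = []
    for cultivar, genotype in profiles.items():
        key = tuple(genotype[marker] for marker in markers)
        if key in first_seen:
            clashes.append((first_seen[key], cultivar))
        else:
            first_seen[key] = cultivar
    return clashes


if __name__ == "__main__":
    # Hypothetical genotype table (placeholder cultivars and allele letters).
    profiles = {
        "cv1": {"M02": "A", "G03": "AB", "C02": "A"},
        "cv2": {"M02": "A", "G03": "AA", "C02": "A"},
        "cv3": {"M02": "B", "G03": "AB", "C02": "A"},
        "cv4": {"M02": "B", "G03": "AB", "C02": "B"},
    }
    print(indistinguishable_pairs(profiles, ["M02"]))
    print(indistinguishable_pairs(profiles, ["M02", "G03"]))
    print(indistinguishable_pairs(profiles, ["M02", "G03", "C02"]))
```

In this toy table one marker leaves two unresolved pairs, two markers leave one, and three markers resolve everything, mirroring the stepwise logic of the flow chart.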

Genetic diversity of tea germplasm in Taiwan
On the basis of taxonomy, the 55 tea germplasm in Taiwan can be divided into three groups: the sinensis type (S and SA), the assamica type (A and AS) and the wild species in Taiwan (F and FY) (Table 1). The average genetic distances among the three groups are shown in Table 3. The average genetic distance between the sinensis type (S and SA) and the wild species (F and FY) is 0.45, and that between the assamica type (A and AS) and the wild species (F and FY) is 0.47. Both distances are larger than that between the sinensis type (S and SA) and the assamica type (A and AS) (0.28). According to the genetic distance matrix of MRD coefficients among all 55 core germplasm (Table 4 and Additional file 1: Table S2), the average genetic distances within the sinensis type (0.44) and the assamica type (0.41) were larger than that within the wild species (0.25) (Table 4). This also demonstrates that the cultivated species (C. sinensis) had greater genetic diversity than the wild species (C. formosensis).

Cluster analysis and principal coordinate analysis of tea germplasm in Taiwan
The genetic distances between all pairwise combinations are listed in Additional file 1: Table S2. The values among the 55 surveyed germplasm in Taiwan ranged from 0.08 to 0.69, with an average of 0.49. Among the 47 cultivated tea germplasm (C. sinensis), the average was 0.47. Within the 33 sinensis type germplasm (S and SA), the genetic distances ranged from 0.11 (TTES-14 and TTES-15) to 0.62 (Chin-Shin-Oolong and TTES-17), with an average of 0.44. Within the 14 assamica type germplasm (A and AS), the genetic distances ranged from 0.08 (TTES-8 and Jaipuri; Shan-1 and Shan-2) to 0.58 (Shan and Manipuri), with an average of 0.41. However, the genetic distances among the eight wild species (F and FY) were relatively small, ranging from 0.11 (Long-Tou wild tea and Le-Ye wild tea; Ming-Hai wild tea and Nan-Fong wild tea) to 0.38 (De-Hua-She wild tea and Yung-Kang wild tea), with an average of 0.25. In the PCoA based on MRD estimates of all 55 germplasm, the first, second and third principal coordinates (abbreviated PC1, PC2 and PC3) explained 24.5%, 15.9% and 11.3% of the molecular variance, respectively, with a cumulative contribution of 51.8%. The first two principal coordinates were used to draw the PCoA plot shown in Figure 2. Along PC1, the 55 tea germplasm were divided into two major groups, cultivated tea (C. sinensis) and the wild species in Taiwan (C. formosensis).
Along PC2, the cultivated tea (C. sinensis) was divided into two major groups, the sinensis type (S and SA) and the assamica type (A and AS). The UPGMA dendrogram separated the 55 germplasm into three major groups (Figure 3). At a genetic distance coefficient of 0.57, the first group (Group I), comprising C. formosensis, was separated from the cultivated germplasm (C. sinensis). When the coefficient was reduced to 0.51, the assamica type (A and AS) (Group II) and the sinensis type (S and SA) (Group III) germplasm were distinguished. Group III was divided into three subgroups, namely Group IIIa, Group IIIb, and Group IIIc. Many famous varieties belonged to Group IIIa, including Chin-Shin-Oolong, Shy-Jih-Chuen, Bair-Mau-Hour, Wuu-Yi, and Horng-Shin-Dahpan and its derived varieties (TTES-3, TTES-4, and TTES-9). Group IIIb comprised Tiee-Guan-In, and Hwang-Gan and its derived varieties (TTES-10 and others). Group IIIc, on the other hand, contained Chin-Shin-Gantzy, Chin-Shin-Dahpan and its derived variety (TTES-1), Dah-Yeh-Oolong and its derived varieties, and Ying-Jy-Horng-Shin and its derived variety (TTES-13).
Figure 1 The flow chart for identifying 12 prevailing tea cultivars in Taiwan. The yellow circle frames represent marker codes, and the blue square frames represent cultivar codes. Cultivar and marker codes are shown in Tables 1 and 2. By using five core markers, the 12 prevailing cultivars could be identified. M02 can be used to assign cultivars to the sinensis or assamica group. G03 and C02 are employed to identify the four cultivars within the assamica group. Cultivars of the sinensis group can be distinguished by G03, G01 and G04.
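The UPGMA clustering behind such a dendrogram can be sketched in a few lines: at each step the two clusters with the smallest average between-cluster distance are merged, and that average becomes the height of the new node. The labels and distances below are hypothetical, not values from Additional file 1: Table S2, and this naive version recomputes averages from the original matrix instead of using the efficient update rule found in dedicated software.

```python
from itertools import combinations


def upgma(labels, dist):
    """Naive UPGMA clustering.

    `dist` maps frozenset({a, b}) -> distance for every pair of labels.
    Returns a list of merges as (cluster_1, cluster_2, height) tuples.
    """
    def cluster_distance(c1, c2):
        # UPGMA: arithmetic mean of all pairwise distances between the clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    # Hypothetical accessions: one wild (W1), two sinensis (S1, S2), one assamica (A1).
    pairs = {
        ("S1", "S2"): 0.2, ("S1", "A1"): 0.4, ("S2", "A1"): 0.5,
        ("W1", "S1"): 0.6, ("W1", "S2"): 0.6, ("W1", "A1"): 0.7,
    }
    dist = {frozenset(pair): value for pair, value in pairs.items()}
    for c1, c2, height in upgma(["W1", "S1", "S2", "A1"], dist):
        print(c1, c2, round(height, 4))
```

In this toy example the two sinensis accessions join first, the assamica accession joins them next, and the wild accession joins last at the greatest height, the same branching order as the three major groups described above.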

Polymorphism of STS and CAPS markers
In this study, 11 nuclear CAPS markers, including G03, G12, G14, G15, G17, G18, G20, G22, G23 and G26, showed multi-allelic patterns, while the others had only two alleles (bi-allelic) (Table 2). The bi-allelic markers had only one restriction site within each CAPS locus, and their genotypes were easily scored and interpreted. The multi-allelic markers, in contrast, resulted from point mutations at different positions within a locus that had more than one polymorphic restriction site. They yielded more complicated genotypes but can still be very useful. For example, multi-allelic CAPS markers have been used in pepper breeding for viral resistance (Yeam et al. 2005).
Polymorphism information content (PIC) measures how informative a locus is and reflects the genetic variation detected by a marker. A PIC value larger than 0.5, between 0.25 and 0.5, or smaller than 0.25 indicates that the locus is highly informative, reasonably informative, or slightly informative, respectively (Botstein et al. 1980). Across all 39 cytoplasmic and nuclear markers examined in this study, the PIC ranged from 0.04 to 0.62, with an average of 0.32. The average PIC of the 10 cytoplasmic markers was 0.25; seven of them were reasonably informative and the remaining three were slightly informative (Table 2). The 29 nuclear markers had an average PIC of 0.34; six were highly informative, 16 were reasonably informative, and the remaining seven were slightly informative. The average PIC of the nuclear markers was higher than that of the cytoplasmic markers, and the average of the mtDNA markers (0.29) was higher than that of the cpDNA markers (0.18) (Table 2). Similar results were reported by Ishii et al. (2001), who found that nuclear microsatellites (average PIC 0.89) had higher PIC values than chloroplast microsatellites (average PIC 0.38) among A-genome species of rice. Because the variation of cytoplasmic markers is lower than that of nuclear markers, the former can be used to examine relationships among distantly related taxa, whereas the latter are more suitable for assessing the genetic diversity of closely related taxa.
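The three informativeness classes can be written as a small helper, shown here as a sketch. The treatment of the exact boundary values 0.25 and 0.5 (assigned to the "reasonably informative" class) is an assumption, since the thresholds in the text do not specify it. The example values are the PIC extremes quoted in this paper for C02, M06, G16 and G22.

```python
def classify_pic(pic):
    """Classify a locus by PIC following the thresholds of Botstein et al. (1980).

    Assumption: the boundary values 0.25 and 0.5 fall in the middle class.
    """
    if pic > 0.5:
        return "highly informative"
    if pic >= 0.25:
        return "reasonably informative"
    return "slightly informative"


if __name__ == "__main__":
    # PIC values quoted in the text for four markers.
    reported = {"C02": 0.13, "M06": 0.35, "G16": 0.04, "G22": 0.62}
    for marker, value in reported.items():
        print(marker, value, classify_pic(value))
```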
In our previous study, the observed number of EST-SSR alleles (NA) per locus was 5.6 (Hu et al. 2011). In this study, however, the values for STS and CAPS markers derived from the cytoplasmic and nuclear genomes were 2.00 and 2.69, respectively. The PIC per locus for EST-SSR (0.62) was also higher than those of the cytoplasmic (0.25) and nuclear (0.34) STS and CAPS markers. Because the size differences between polymorphic bands of EST-SSR markers are small, high-resolution agarose gels, polyacrylamide gel electrophoresis or a Genetic Analyzer were required (Hu et al. 2011). In contrast, the size differences between polymorphic bands of STS and CAPS markers are large, so less expensive agarose gels are sufficient to obtain accurate data.
Table 4 The genetic diversity and genetic distance of different tea groups based on 10 cytoplasmic markers and 29 nuclear markers

Identification of 12 prevailing tea cultivars in Taiwan
In this study, 12 prevailing cultivars were selected for variety identification based on the following criteria: (1) the acreage under cultivation of each variety; (2) the suitability of the variety for manufacturing a unique type of tea; and (3) whether the variety was newly bred. According to statistics from the Tea Research and Extension Station in 2011, these 12 cultivars occupy over 98% of the tea acreage of Taiwan. Of these 12 cultivars, Chin-Shin-Oolong, TTES-12, Shy-Jih-Chuen, Chin-Shin-Dahpan, and TTES-13 are the top five cultivars in Taiwan and are suitable for both Paochong tea and Oolong tea. Shy-Jih-Chuen and Chin-Shin-Dahpan are mainly grown in Nantou County and the north-west region of Taiwan, respectively, while the others are distributed throughout Taiwan (estimated by the Tea Research and Extension Station in 2011). In addition, Chin-Shin-Gantzy is suited to green tea, and TTES-18, TTES-8, and TTES-7 are excellent cultivars for making black tea. Chin-Shin-Gantzy is cultivated in New Taipei City, and the other three cultivars are mainly planted in Nantou County (estimated by the Tea Research and Extension Station in 2011). Varieties TTES-19 and TTES-20 were bred for manufacturing Paochong and Oolong tea and have been protected by the "Plant Variety and Plant Seed Act" in Taiwan since 2004 (Tsai et al. 2004a). TTES-21, on the other hand, was designated in 2008 for black tea processing (Chiu et al. 2009). These are the cultivars for which variety identification is most urgently needed in Taiwan. Commercial tea products are manufactured using high temperature and fermentation treatments at a panning step, and these processes can lead to dramatic DNA degradation. Additionally, tea merchants or farmers often blend tea of different varieties to enhance flavor or reduce material cost. To address these problems, we have reported that DNA markers shorter than 1 kb are less affected by processing treatments and are useful for variety identification.
Moreover, chloroplast DNA markers, with their haploid genotypes and maternal inheritance, can be effectively applied to identify mixed-variety tea products (Hu et al. 2006). Since most STS and CAPS markers in this study are shorter than 850 bp, they have potential for identifying different varieties or mixed varieties in processed tea.
Figure 2 PCoA plot of the 55 tea germplasm in Taiwan (see Table 1). A and AS: the assamica type; S and SA: the sinensis type; F and FY: the Taiwanese wild species. The first dimension, explaining 24.5% of the genetic variation, separated C. formosensis from the other groups, and the second dimension, explaining 15.9% of the genetic variation, separated C. sinensis var. assamica and the C. sinensis var. assamica × var. sinensis hybrid from the other groups.

Genetic diversity of tea germplasm in Taiwan
The principal coordinate analysis and the cluster analysis gave consistent classifications of the germplasm. The 55 germplasm can be divided into three groups: the sinensis type (S and SA), the assamica type (A and AS) and the Taiwan wild species (F and FY). The sinensis type (S and SA) and the assamica type (A and AS) are generally called cultivated tea (C. sinensis). The former is a shrub with small leaves that can withstand cold climates, while the latter is a tall, tree-like plant with large leaves that is suited to warm tropical climates (Banerjee 1992). The assamica type also has a higher flavanol content and is therefore more suitable for making black tea, whereas the sinensis type is suitable for manufacturing green tea or Oolong tea (Takeo 1992). The Taiwan wild species (C. formosensis; Su et al. 2009) can be well distinguished from cultivated tea (C. sinensis) by its glabrous ovaries and winter buds. In this study, the results of both principal coordinate analysis and cluster analysis support the view that the wild species (C. formosensis) is monophyletic and independent of the cultivated tea (C. sinensis).
Genetic diversity can be assessed by many parameters. NA (observed number of alleles) is the mean number of alleles with nonzero frequency across loci; Ne (effective number of alleles) is an estimate of the mean number of equally frequent alleles in an ideal population; Ho (observed heterozygosity) is the estimated proportion of observed heterozygotes at a given locus; H (Nei's gene diversity) is the estimated proportion of heterozygotes expected under random mating; and I (Shannon's information index) is an index measuring gene diversity (Yeh and Boyle 1997). In the genetic diversity analysis, all parameters and indices showed higher genetic diversity in the sinensis type (S and SA) and the assamica type (A and AS) than in the wild species (Table 4). One possible explanation is that the cultivated tea (A, AS, S and SA) originated from diverse regions (China, Myanmar, Thailand, India, and so on) and was frequently inter-crossed, whereas genetic recombination in the wild species occurred only within limited local populations in Taiwan. This result differs from that of Lai et al. (2001), who used RAPD and ISSR markers to evaluate the gene diversity of 37 tea samples in Taiwan.
They reported that the native Taiwan wild species had the highest genetic diversity, followed by the sinensis type and the assamica type (Lai et al. 2001). Two objections can be raised against this conclusion. First, two (Laitou and Shueijing wild tea) of the six native Taiwan wild tea samples in Lai et al. (2001) are C. furfuracea rather than C. formosensis, as authenticated by Su (2007), which would lead to an overestimate of the diversity of the native wild species. Second, all three assamica varieties surveyed in Lai et al. (2001) originated only from India, and these are not representative of the tea wild species.
Figure 3 Dendrogram of 55 tea germplasm in Taiwan using 39 STS and CAPS loci by the UPGMA method based on the modified Rogers' distance coefficient. Three major groups were identified in this dendrogram. Group I included C. formosensis (F) and C. formosensis var. yungkangensis (FY), Group II included C. sinensis var. assamica (A) and the C. sinensis var. assamica × var. sinensis hybrid (AS), and Group III included C. sinensis var. sinensis (S) and the C. sinensis var. sinensis × var. assamica hybrid (SA). The three subgroups of Group III comprised different introduced germplasm and their derived varieties.
The tea industry in Taiwan began in the Jiaqing era of Ching Dynasty (AD 1796 to 1820), and a few tea varieties were introduced from China (Jun 1997). During the Japanese occupation period (AD 1896 to 1945), four landraces including Chin-Shin-Oolong, Dah-Yeh-Oolong, Chin-Shin-Dahpan, and Ying-Jy-Horng-Shin were recommended to the tea farmers. In addition, Hwang-Gan and Horng-Shin-Dahpan were also the prevailing cultivars at that time. Since 1945, the above six varieties have been used as female parents for hybridization breeding (Sanui 2011;Shyu and Juan 1993). According to cluster analysis in this study (Figure 3), Chin-Shin-Oolong and Horng-Shin-Dahpan belonged to Group IIIa, Hwang-Gan was classified in Group IIIb, and the remaining three landraces (Dah-Yeh-Oolong, Chin-Shin-Dahpan, Ying-Jy-Horng-Shin) were categorized in Group IIIc. However, all of these six varieties were introduced from Fukien or Guangdong of China (Sanui 2011).
Genetic vulnerability is a common problem in most tea-producing countries, because only a few specific varieties are grown on a large scale and few varieties have been used as breeding parents (Yao et al. 2008). For example, the famous cultivar Yabukita, used for making green tea, accounts for more than 80% of the tea acreage in Japan (Kaundun and Matsumoto 2004). Moreover, other prevailing varieties, including Kanayamidori, Sayamakaori, Saemidori, Okumidori and Meiryoku, were selected from Yabukita (Tanaka 2012). When most cultivars are replaced by a few varieties, some alleles may be eliminated, resulting in genetic erosion. If a severe biotic or abiotic stress occurs, it is likely to reduce the production of the same or closely related cultivars, which could induce a crisis in the tea industry and possibly lead to its collapse. A similar problem exists in Taiwan. The top three prevailing cultivars occupy 84.2% of the tea acreage in Taiwan: Chin-Shin-Oolong (57.3%), TTES-12 (13.7%) and Shy-Jih-Chuen (13.2%) (estimated by the Tea Research and Extension Station in 2011).
A high similarity between Chin-Shin-Oolong and Shy-Jih-Chuen was previously found on the basis of leaf morphological characters and ISSR DNA markers (Hu 2004). In this study, the genetic distance between these two cultivars (0.21) is far below the average (0.49) (Table 4 and Additional file 1: Table S2), and their alleles at all 10 cytoplasmic markers are identical (Additional file 1: Table S1). These results confirm that Shy-Jih-Chuen originated from Chin-Shin-Oolong. In addition, these two cultivars account for 70.5% of all tea plantations in Taiwan, and Chin-Shin-Oolong is also the male parent of two other new varieties, TTES-19 and TTES-20, which were released in 2004 (Tsai et al. 2004a). To avoid genetic vulnerability and increase the genetic diversity of tea varieties in Taiwan, new parental lines could be selected by referring to the dendrogram of the cluster analysis in this study (Figure 3). Elite parents from different geographical origins or genetic backgrounds could also be chosen.

Conclusions
Tea is an important economic crop in Taiwan. Across different eras and production areas, many unique types of tea have been developed on the island, and accordingly, various genetic resources including introduced varieties, landraces, bred varieties and wild species have been adopted. To develop a stable, fast and reliable marker system for variety identification and for assessing the genetic diversity of germplasm in Taiwan, 37 CAPS and two STS markers were successfully designed. Above all, five core markers were found to be sufficient for identifying the prevailing varieties.
Based on the genetic diversity analysis, principal coordinate analysis and cluster analysis of tea germplasm in Taiwan, three conclusions are proposed. First, high genetic divergence was found between the cultivated (C. sinensis) and wild species (C. formosensis) in Taiwan, although the genetic resources of the wild species have not yet been well used. Second, the genetic diversity of the wild species among different areas of Taiwan was relatively small. Finally, the genetic relationships among the top prevailing cultivars are too close. Therefore, broadening the genetic diversity of tea varieties is necessary for tea breeding in Taiwan.

Additional file
Additional file 1: Table S1. The band patterns of each STS and CAPS marker for all 55 core tea germplasm in Taiwan. Table S2. Matrix of genetic distance among pairs of 55 tea germplasm in Taiwan based on modified Roger's distance coefficients. Figure S1. A. Partial nucleotide sequences of three cultivars amplified with G01 primer set, and arrow