The level of genetic diversity and differentiation of tropical lotus, Nelumbo nucifera Gaertn. (Nelumbonaceae) from Australia, India, and Thailand.

BACKGROUND
Nelumbo nucifera Gaertn., a perennial aquatic macrophyte, has been cultivated in several Asian countries for its economic importance and medicinal uses. Two distinct ecotypes of the species, tropical lotus and temperate lotus, are recognized based on the geographical location to which the genotypes are adapted. The genetic diversity levels and differentiation of the tropical lotus from poorly studied geographic regions remain unclear. Here, the population genetic diversity and structure of 15 tropical lotus populations sampled from previously understudied parts of the natural distribution range, including India, Thailand, and Australia, were assessed using nine polymorphic SSR markers.


RESULTS
The SSR markers used to genotype the 216 individuals yielded 65 alleles. The highest and lowest genetic diversity estimates were found in the Thai and Indian populations, respectively. STRUCTURE analysis revealed three distinct genetic clusters with relatively low admixture, supported by PCoA. Gene flow among the three genetic clusters was low (mean Nm = 0.346), and a Mantel test for isolation by distance revealed a positive correlation between genetic and geographic distances (r = 0.448, P = 0.004). In addition, AMOVA revealed higher variation among populations (59.98%) than within populations of the three groups. Overall, the populations used in this study showed a high level of genetic differentiation (FST = 0.596).


CONCLUSIONS
The nine polymorphic microsatellite markers used in our study sufficiently differentiated the fifteen tropical N. nucifera populations based on geography. These populations presented different genetic variability, thereby confirming that populations found in each country are unique. The low genetic diversity (HE = 0.245) could be explained by limited gene flow and clonal propagation. Conserving the available diversity using various conservation approaches is essential to enable the continued utilization of this economically important crop species. We, therefore, propose that complementary conservation approaches ought to be introduced to conserve tropical lotus, depending on the genetic variations and threat levels in populations.

2014). N. nucifera is mainly distributed in Asia and Australia (Han et al. 2007), and has also been utilized for its economic importance (Yang et al. 2013). In China, for example, N. nucifera seeds are widely used in the preparation of Chinese herbal medicine (Chen et al. 2008; Li et al. 2010), and the rhizome of this species is a common vegetable (Tian et al. 2008). The lotus is one of the main traditional flowers in China, while in India and Vietnam it is regarded as the national flower (Chen et al. 2008; Tian et al. 2014).
Lotus flowers are protogynous and usually out-crossed by insects (Kubo et al. 2009). The species can be propagated either by seeds or by rhizomes (Goel et al. 2001; Pan et al. 2011). Lotus is capable of producing new hybrids through hybridization between wild and domesticated varieties. So far, a sizable number of cultivars have been developed from N. nucifera (Li et al. 2015). Notably, wild lotus populations have served as essential germplasm sources for breeding purposes (Xue et al. 2006; Han et al. 2007), and varied agro-climatic conditions have contributed to the existence of diverse genotypes of wild lotus in China.
Recently, morphological, ecological, and genetic studies of lotus indicated that the Southeast Asian lotus is distinct from the Chinese lotus (Li et al. 2010). Zhang and Wang (2006) grouped N. nucifera populations into two distinct ecotypes, tropical lotus and temperate lotus, based on the geographical location to which the genotypes are adapted. These ecotypes differ in flowering duration, growth, and rhizome morphology. The temperate lotus has an annual growth habit and a big rhizome, whereas the tropical lotus is perennial and has a small rhizome and a long flowering period (Zhang and Wang 2006). Lotus grown in East and North-east Asian countries belongs to the temperate group, whereas lotus grown in South-east Asian countries and Australia is considered the tropical ecotype (Zhang and Wang 2006; Li et al. 2010). A previous study revealed that the Thai lotus, one of the tropical lotus groups, had a flowering period 2 to 3 months longer than that of the Chinese cultivars (Li et al. 2010; Yang et al. 2013). Tropical lotus is often used to enhance the ornamental value of temperate lotus by providing valuable traits for developing varieties with a more extended flowering period (Li et al. 2010; Liu et al. 2012; Yang et al. 2013).
Future breeding programs and conservation of N. nucifera will depend on the available knowledge of genetic variation among populations (Han et al. 2009; Hu et al. 2012). In addition, studies of genetic diversity and structure provide a basis for evidence-based management planning (Luo et al. 2018). Previous studies have assessed the genetic diversity of N. nucifera (Han et al. 2009; Pan et al. 2011), with most attention given to the temperate lotus. These studies revealed high genetic diversity levels for N. nucifera using various molecular markers (Na et al. 2009; Han et al. 2009; Pan et al. 2011). In contrast, population genetic studies on tropical lotus have mostly used populations from Thailand, and with relatively small sample sizes (Li et al. 2010; Hu et al. 2012). Comparisons of the genetic diversity of the two ecotypes have yielded conflicting results. For instance, Liu et al. (2012) indicated that tropical lotus had lower genetic diversity than temperate lotus. However, a more recent study by Yang et al. (2013) showed that the wild tropical lotus had higher genetic diversity than the temperate ecotype. Hu et al. (2012) also reported, using AFLP and SSR markers, that the natural lotus accessions from Thailand were differentiated from other natural lotus accessions in South-east Asian countries and China. In these studies, only a few samples of tropical lotus were included, and the tropical lotus was poorly represented in comparison to the temperate groups. To date, the genetic diversity of the tropical N. nucifera ecotype has not been explicitly addressed in the other major distribution regions, including India and Australia, in comparison with the Thai populations. The genetic diversity levels and differentiation of the tropical lotus from these poorly studied geographic regions remain unclear. Therefore, there is a need to conduct population genetic studies of tropical lotus from these understudied areas.
Here, we genotyped 15 tropical N. nucifera populations sampled from the natural distribution ranges in Australia, India, and Thailand using nine polymorphic microsatellite markers. We aimed to (i) evaluate the level of genetic diversity of the tropical lotus populations from the previously poorly studied natural distribution ranges, and (ii) estimate the degree of differentiation and population structure of N. nucifera.

Sample collections and DNA extraction
Fifteen wild tropical N. nucifera populations comprising 216 individuals were sampled from the natural distribution range in Australia, India, and Thailand (Table 1; Fig. 1). Because N. nucifera is a clonal species, leaf samples were collected at least 10 m apart to reduce the chance of resampling the same individual. The collected leaves were dried with silica gel and kept in a refrigerator until DNA extraction. DNA extraction and quantification followed the protocol published in Islam et al. (2020), and the extracted DNA was stored in a freezer at −20 °C for subsequent analysis.

SSR genotyping and PCR amplifications
Nine SSR markers previously developed for N. nucifera by Tian et al. (2008), Kubo et al. (2009), and Pan et al. (2010) were selected for the present study (Additional file 1: Table S1). All forward primers were labeled with the fluorescent dye FAM (Applied Biosystems, Foster City, CA, USA). Polymerase chain reaction (PCR) amplifications were performed following Islam et al. (2020). PCR products were confirmed by electrophoresis on a 2.0% (w/v) agarose gel stained with ethidium bromide. The products were then run on an ABI 3730 XL automated sequencer (Wuhan Gene Create Biological Engineering Co. Ltd., Wuhan, China). GeneScan 500 LIZ (Applied Biosystems) was used as the size standard in each lane to allow correct determination of fragment sizes. Lastly, allele sizes were scored manually using GeneMarker 2.2.0 (SoftGenetics) with the default settings.

Data analysis

Genetic diversity indices
Cervus version 3.0 (Kalinowski et al. 2007) was used to test for deviations from Hardy-Weinberg equilibrium with Bonferroni correction, and to estimate the polymorphic information content (PIC) of each SSR marker and the inbreeding coefficient (FIS) of the 15 N. nucifera populations. GenAlEx version 6.5 (Peakall and Smouse 2012) was used to determine the genetic diversity characteristics of all loci and populations. The following parameters were assessed for each SSR marker: the number of alleles per locus (Na), the effective number of alleles (Ne), observed heterozygosity (Ho), and expected heterozygosity (He). The population-based characteristics estimated were the number of alleles per locus (NA), the effective number of alleles (NE), observed heterozygosity (HO), expected heterozygosity (HE), Shannon's information index (IS), the number of private alleles (NP), and the inbreeding coefficient (FIS).
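As an illustration of the per-locus indices listed above, the following minimal Python sketch computes Na, Ne, Ho, He, and Shannon's index from diploid genotypes at a single locus. It assumes the standard frequency-based definitions (Ne = 1/Σp², He = 1 − Σp², I = −Σ p ln p); the function name and input layout are hypothetical, and GenAlEx may apply additional options (for example, sample-size corrections) that are not reproduced here.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Diversity indices for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Hypothetical helper for illustration; not the GenAlEx implementation.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    freqs = [count / total for count in Counter(alleles).values()]
    homozygosity = sum(p * p for p in freqs)  # expected homozygosity, sum of p^2
    return {
        "Na": len(freqs),                          # number of distinct alleles
        "Ne": 1.0 / homozygosity,                  # effective number of alleles
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "He": 1.0 - homozygosity,                  # expected heterozygosity
        "I": -sum(p * math.log(p) for p in freqs),  # Shannon's information index
    }


# Made-up genotypes (fragment sizes in bp) for four individuals
example = [(100, 100), (100, 102), (102, 102), (100, 102)]
print(locus_diversity(example))
```

With two equally frequent alleles, as in this made-up example, Ne equals 2 and He equals 0.5, which is a convenient sanity check for the definitions.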

Population structure
To examine the level of genetic variation among N. nucifera populations and to estimate genetic differentiation (FST), an analysis of molecular variance (AMOVA) was performed in Arlequin version 3.1 (Excoffier et al. 2005). STRUCTURE version 2.3.3 (Pritchard et al. 2000), which uses a Bayesian algorithm, was used to assign populations to genetic clusters. Ten independent runs were performed for each K from 1 to 15 (where K is the assumed number of populations), each with 100,000 burn-in steps followed by 1,000,000 Markov chain Monte Carlo (MCMC) iterations. The online tool STRUCTURE HARVESTER (http://taylor0.biology.ucla.edu/structureHarvester/) (Earl 2012) was used to analyze the results and identify the most suitable number of genetic clusters. Replicate runs at the selected K were aligned in CLUMPP version 1.1.2 (Jakobsson and Rosenberg 2007) using the Greedy algorithm with 10,000 replications. The resulting genetic structure of the 15 N. nucifera populations was displayed in DISTRUCT version 1.1 (Rosenberg 2004). Principal coordinate analysis (PCoA) based on Nei's genetic distance matrix (Nei et al. 1983) was performed in GenAlEx version 6.5 (Peakall and Smouse 2012).
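The selection of K from replicate STRUCTURE runs is commonly based on the ΔK statistic of Evanno et al. (2005), which is the quantity reported by STRUCTURE HARVESTER. The sketch below shows the calculation under the assumption that ΔK is taken as the absolute second-order difference of the mean Ln P(D) divided by the standard deviation of Ln P(D) at K; the function name and the replicate values are hypothetical and are not data from this study.

```python
import statistics


def delta_k(lnp_by_k):
    """Evanno-style delta K from replicate log-likelihoods.

    lnp_by_k: dict mapping K -> list of Ln P(D) values from replicate runs.
    Returns a dict K -> delta K for every K that has both neighbours K-1 and K+1.
    """
    mean = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    sd = {k: statistics.stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean or k + 1 not in mean or sd[k] == 0:
            continue  # delta K is undefined at the ends of the K range
        second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
        result[k] = second_diff / sd[k]
    return result


# Made-up Ln P(D) values for three replicate runs per K
runs = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4503, -4497],
    3: [-4000, -4001, -3999],
    4: [-3990, -3995, -3985],
    5: [-3985, -3990, -3980],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs both neighbouring values of K, it cannot be evaluated at K = 1 or at the largest K tested.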

Bottleneck analysis
Bottlenecks across the 15 N. nucifera populations were assessed using Bottleneck version 1.2.02 (Piry et al. 1999). Wilcoxon's signed-rank tests were performed with 10,000 simulations at the 5% significance level under the two-phase model (TPM = 70% SMM + 30% IAM), the stepwise mutation model (SMM), and the infinite allele model (IAM). Deviation of the allele frequency distribution from the normal L-shaped distribution (mode shift), which indicates a demographic bottleneck, was also checked (Luikart and Cornuet 1998).
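The mode-shift indicator mentioned above can be sketched as follows. The idea is that in a population at mutation-drift equilibrium most alleles are rare, so the lowest allele frequency class is the most populated (an L-shaped distribution), whereas after a bottleneck an intermediate class may contain more alleles. The ten equal-width classes, their boundary handling, and the function names are assumptions made for illustration and may differ from the Bottleneck program.

```python
def allele_frequency_classes(freqs, n_classes=10):
    """Count alleles falling in each of n_classes equal-width frequency classes."""
    counts = [0] * n_classes
    for p in freqs:
        if p <= 0:
            continue  # alleles absent from the population are ignored
        index = min(int(p * n_classes), n_classes - 1)  # p = 1.0 goes in the last class
        counts[index] += 1
    return counts


def is_mode_shifted(freqs, n_classes=10):
    """True when some higher class holds more alleles than the lowest class."""
    counts = allele_frequency_classes(freqs, n_classes)
    return max(counts[1:]) > counts[0]


# Made-up allele frequencies pooled over loci for two hypothetical populations
l_shaped = [0.02, 0.03, 0.05, 0.08, 0.15, 0.67]
shifted = [0.05, 0.25, 0.28, 0.42]
print(is_mode_shifted(l_shaped), is_mode_shifted(shifted))
```

This check is only qualitative; the Wilcoxon tests described above remain the formal test for heterozygosity excess.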

Estimation of historical gene flow
The number of migrants per generation (Nm) among the genetic groups (K = 3) over the previous 4Ne generations (Ne = effective population size) was estimated in MIGRATE-n version 3.7.2 (Beerli 2012). A Bayesian coalescent inference approach (Beerli 2006) was used with the Brownian approximation model. The mutation-scaled effective population size (θ) and the mutation-scaled migration rate (M) were obtained from the program with default settings and then used to calculate Nm as Nm = (θa × Mb→a)/4, i.e., the number of migrants per generation from population b into population a (Beerli 2012).
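The conversion from the MIGRATE-n parameters to Nm described above is a simple product, sketched here for all ordered pairs of groups at once. The function name, the group labels, and the θ and M values are hypothetical and serve only to show the arithmetic; they are not estimates from this study.

```python
def migrants_per_generation(theta, migration):
    """Nm for each (donor, recipient) pair as theta[recipient] * M / 4.

    theta: dict recipient -> mutation-scaled effective population size.
    migration: dict (donor, recipient) -> mutation-scaled migration rate M.
    """
    return {
        (donor, recipient): theta[recipient] * rate / 4.0
        for (donor, recipient), rate in migration.items()
    }


# Made-up parameter values for two hypothetical groups
theta = {"a": 0.4, "b": 0.2}
migration = {("b", "a"): 5.0, ("a", "b"): 2.0}
nm = migrants_per_generation(theta, migration)
print(nm)
```

Note that θ is always taken from the recipient population, so Nm from b to a and from a to b generally differ even when the two M values are equal.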

Characteristics of the microsatellite markers
All SSR markers showed significant deviations from Hardy-Weinberg equilibrium. The nine microsatellite markers used in the present study were highly polymorphic in all populations. For each microsatellite marker, the effective number of alleles (Ne) varied from 1.223 to 1.956 (mean = 1.475). Observed (Ho) and expected heterozygosity (He) estimates ranged from 0.067 to 0.631 and from 0.140 to 0.474, respectively (mean Ho = 0.274 and mean He = 0.245) (Additional file 1: Table S1). The Nelumbo-13 and PR05 markers had the highest number of alleles (Na = 10). PIC is a measure of the informativeness of SSR markers (Babu et al. 2014), and markers with high PIC values have high discriminating ability and are recommended for population genetic diversity studies (Ngailo et al. 2016). PIC values of the microsatellite markers used in our study varied from 0.322 at locus Nelumbo-32 to 0.775 at locus PR05 (mean = 0.593). Only the Nelumbo-32 and NNEST17 markers had PIC values below 0.50.
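For readers unfamiliar with PIC, the sketch below computes it from the allele frequencies at one locus using the widely used formula of Botstein et al. (1980), PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². That this is the exact formula behind the values reported above is an assumption, and the function name and example frequencies are hypothetical.

```python
def polymorphic_information_content(freqs):
    """PIC for one locus from a list of allele frequencies that sum to 1."""
    sum_sq = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - sum_sq - cross


# Made-up frequencies: a biallelic locus and a more even four-allele locus
print(polymorphic_information_content([0.5, 0.5]))
print(polymorphic_information_content([0.25, 0.25, 0.25, 0.25]))
```

PIC increases with the number of alleles and with the evenness of their frequencies, which is why loci with many alleles tend to have the highest values.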

Genetic diversity of N. nucifera
Sixty-five alleles were identified in the 15 tropical N. nucifera populations, ranging from six to ten alleles per locus (mean = 7.220) (Additional file 1: Table S1). The number of effective alleles (NE) and the number of observed alleles (NA) per population ranged from 1.140 to 2.023 and from 1.333 to 2.667, respectively. Observed and expected heterozygosity ranged from 0.044 to 0.824 and from 0.081 to 0.470, respectively. The average expected heterozygosity was higher in Thailand (HE = 0.358) than in both India and Australia. Similarly, Shannon's information index varied from 0.129 to 0.730. Private alleles were detected in nine of the 15 populations examined. Eleven of the 19 observed private alleles were found in populations sampled from Thailand, and population T4 had the highest count (6) (Table 2). Eleven populations showed low inbreeding coefficients (FIS), which reflects high levels of cross-pollination in these populations. Only four populations (A1, A6, I1, and I6) had positive FIS values, suggesting inbreeding among the individuals of these populations. Populations T4 and I3 had the highest and lowest genetic diversity (HE), respectively. Overall, the microsatellite markers showed low genetic variation in the N. nucifera populations.
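The sign of FIS discussed above follows directly from the relation between observed and expected heterozygosity. The sketch below assumes the common definition FIS = 1 − HO/HE; the function name and the numerical values are hypothetical, and software such as Cervus or GenAlEx may use estimators with additional corrections.

```python
def inbreeding_coefficient(observed_het, expected_het):
    """FIS = 1 - Ho/He; undefined (NaN) when He is zero."""
    if expected_het == 0:
        return float("nan")
    return 1.0 - observed_het / expected_het


# Made-up values: heterozygote deficit (positive FIS) and excess (negative FIS)
deficit = inbreeding_coefficient(0.20, 0.25)
excess = inbreeding_coefficient(0.30, 0.25)
print(deficit, excess)
```

A positive value indicates fewer heterozygotes than expected under random mating, and a negative value indicates an excess of heterozygotes.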

Genetic structure of N. nucifera
The Bayesian clustering in STRUCTURE suggested three genetic clusters in the N. nucifera populations, according to delta K. These populations were divided geographically according to the three countries (India, Thailand, and Australia), except for two Australian populations that were assigned together with the Indian populations (Fig. 2). Among the 15 tropical N. nucifera populations, the highest (2.383) and lowest (0.005) genetic distances were found among the populations sampled from Australia (Additional file 2: Table S2). The PCoA revealed clustering patterns similar to the STRUCTURE results, including the assignment of the two Australian populations to the Indian cluster (Fig. 3). The first and second axes in the PCoA explained 72.75% of the total variation (Fig. 3).
Results of AMOVA revealed higher variation among populations (59.98%) than within populations (40.02%) of the three countries, supported by a high level of genetic differentiation (FST = 0.596) (Table 3). In addition, Mantel's test confirmed the existence of a significant positive correlation between Nei's genetic distance (Nei et al. 1983) and geographic distance (km) for all pairwise populations (r = 0.448, P = 0.004) (Fig. 4). The Mantel test results indicated that the geographical distribution of the populations had contributed significantly to the observed genetic diversity.
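The Mantel test reported above can be sketched as a permutation test on two distance matrices. The following minimal Python version computes the Pearson correlation between the upper triangles of a genetic and a geographic distance matrix and obtains a one-tailed P value by jointly permuting the rows and columns of one matrix. The function names, the number of permutations, the seed, and the toy matrices are assumptions for illustration and do not reproduce the software settings or data of this study.

```python
import math
import random


def _upper_triangle(matrix, order):
    """Upper-triangle entries after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (r, one-tailed P) for two symmetric distance matrices."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo_vector = _upper_triangle(geographic, identity)
    r_observed = _pearson(_upper_triangle(genetic, identity), geo_vector)
    at_least_as_large = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in the genetic matrix
        if _pearson(_upper_triangle(genetic, order), geo_vector) >= r_observed:
            at_least_as_large += 1
    return r_observed, (at_least_as_large + 1) / (permutations + 1)


# Toy example: four hypothetical sites on a line, genetic distance proportional to distance
positions = [0, 1, 3, 7]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[2 * d for d in row] for row in geo]
r, p = mantel_test(gen, geo)
print(r, p)
```

Permuting whole rows and columns together, rather than shuffling individual matrix entries, preserves the dependence among distances that share a population, which is the reason an ordinary correlation test is not appropriate here.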

Population demographic bottlenecks and historical gene flow
The results of the bottleneck analysis are outlined in Additional file 3: Table S3 Table S4.

Genetic diversity in wild N. nucifera populations
In the current study, we included 15 tropical N. nucifera populations sampled from natural distribution ranges in Australia, India, and Thailand. The genetic diversity (HE) values varied from 0.160 to 0.277, 0.081 to 0.216, and 0.254 to 0.470 in the Australian, Indian, and Thai populations, respectively. This highlights the presence of wide genetic variability within each genetic group. Overall, this study found low genetic diversity (mean HE = 0.245), which is mainly explained by low gene flow and clonal propagation. The inbreeding coefficient analysis revealed that the A1, A2, I1, and I2 populations had significant positive FIS values, which may be attributed to a high level of mating between closely related individuals. This phenomenon might have contributed to the observed low level of genetic diversity in these populations. The mean expected heterozygosity found in this study is greater than that recorded for the tropical lotus (HE = 0.152) by Liu et al. (2012), but lower than the value (HE = 0.320) reported for the same ecotype by Yang et al. (2013). Moreover, the genetic diversity level found in this study is lower than the values reported, using microsatellite markers, for other aquatic plant species such as Ottelia acuminata (HE = 0.351) by Zhai et al. (2018) and Ottelia acuminata var. jingxiensis (HE = 0.441) by Li et al. (2019). Among the three countries, the highest genetic diversity estimate in this study was found in the populations from Thailand (HE = 0.360), whereas the lowest was found in the Indian populations (HE = 0.156). The higher genetic diversity found in the Thai populations might be related to an inherently broad genetic base of the germplasm or to the presence of suitable growing conditions for the species in this country. The highest genetic diversity value (HE = 0.470) was found in T4.
From this result, we suggest that the higher genetic diversity of this population, which also had the highest number of private alleles (NP = 6; Table 2), might have accumulated over a long evolutionary history. The mean PIC value found in the present study (0.593) is slightly higher than that of Liu et al. (2012), which observed a mean value of 0.537. Recently, the study conducted by Islam et al. (2020) on N. lutea sampled from the USA revealed higher PIC values (0.793) than those of the current study. Overall, the present results show that the SSR markers used are suitable for genetic diversity studies in N. nucifera germplasm. Unlike sexual reproduction, clonal propagation involves no genetic recombination, and only the rhizome is used as planting material (Chen et al. 2008). Therefore, a cultivar's total heterozygosity remains the same under vegetative propagation from the same rhizomes. As a result, plants propagated clonally generally have lower genetic variation than sexually propagated ones (Chen et al. 2008; Xue et al. 2006). Li et al. (2015) inferred that asexual reproduction through rhizomes in N. nucifera contributed to low genetic diversity. The positive FIS values observed in four N. nucifera populations (Table 2) reflect an excess of homozygous individuals, which is expected to contribute to the lower genetic diversity in these populations, an aspect supported by the study of Hyten et al. (2006). Similarly, Beatty and Provan (2011) stated that the habitats of species in peripheral areas are highly fragmented, with populations often found at the edge of their ranges. In the present study, most of the N. nucifera populations were sampled from peripheral areas, for instance, the Australian populations and some Indian populations. Therefore, it is likely that these populations had already been affected by habitat fragmentation, which eventually leads to lower genetic diversity.

Population genetic structure
A higher percentage of variation in N. nucifera was found among populations than within populations of the three countries; however, this difference was not significant. The PCoA revealed that the N. nucifera populations were distinct. Similarly, the STRUCTURE analysis showed three distinct genetic clusters with low admixture (Fig. 2), supported by the PCoA. A1 and A6 consistently clustered together with the Indian populations. This clustering pattern is difficult to explain in terms of geographical proximity. However, we infer that these populations either diverged a long time ago from the same ancestors or were recently introduced by humans. Xue et al. (2006) suggested that birds can occasionally disperse seeds. The geographic distance between the A1 and A6 populations is approximately 106 km, and gene flow can occur between them. Because of this, the populations might share a common ancestral polymorphism, which differentiates them from the other Australian populations. The small sample sizes of the two populations might also have contributed to the observed clustering pattern. The high level of differentiation (FST = 0.596) in the present study is lower than the previous findings reported for N. nucifera (Han et al. 2007; Pan et al. 2011). A recent study by Islam et al. (2020) identified low gene flow, founder effects, inbreeding, and common ancestry as the major reasons for genetic differentiation in N. lutea populations in the USA. Slatkin (1987) reported that low gene flow (Nm less than one) can cause genetic differentiation among populations. Hence, the high FST (0.596) and the low gene flow (0.346) found in this study contributed to the observed genetic structure, supported by the significant IBD pattern in the study area (r = 0.448, P = 0.004). Zhang et al. (2019) suggested that asexual propagation would also reduce genetic differences among individuals within populations and increase differences among populations.

Gene flow estimation and bottleneck analysis
Gene flow may have a significant impact on the genetic differentiation of local populations (Storfer 1999). It plays a vital role in influencing genetic variation within populations by limiting inbreeding depression (Robledo-Arnuncio et al. 2014). The Migrate-n analysis indicated that the highest gene flow (Nm = 0.577) was from India to Thailand, and the lowest (Nm = 0.095) was from India to Australia. The agents of gene flow in lotus can be insects, birds, water currents (Kubo et al. 2009; Xue et al. 2006), and humans. However, given the large geographical distance between Thailand and India, gene flow is unlikely to be mediated by insects because of their short flight ranges. Therefore, water currents, birds, or anthropogenic introductions may be among the main drivers of gene flow in lotus between the two countries. Slatkin (1987) indicated that genetic drift results in higher genetic differentiation when gene flow among populations is less than one (Nm < 1). We therefore suggest that, given the low gene flow, genetic drift might have influenced the observed genetic differentiation in the N. nucifera populations. Li et al. (2010) also reported a low level of recurrent gene flow among wild populations of N. nucifera sampled from China, Japan, India, and Thailand. The bottleneck analysis revealed that seven of the 15 N. nucifera populations had experienced bottlenecks, of which T4 and T5 had significant probabilities (Additional file 3: Table S3). Chen et al. (2019) outlined that habitat loss, fragmentation, and over-exploitation were the major factors contributing to bottlenecks in N. nucifera populations. Hence, from the observations made in the present study, we presume that some of the populations (T4 and T5) have already been affected by fragmentation.

Implications for conservation
The presence of high genetic diversity (HE) within crop species plays a critical role in crop improvement programs (Salgotra et al. 2015). In addition, genetic diversity determines the potential of a species to survive and adapt to changing environmental conditions (Otálora et al. 2015; Chen et al. 2019). The lotus varieties currently in production were obtained by continuous selection from the wide diversity available in agricultural fields and in the wild (Tian et al. 2008). According to Hu et al. (2012), wild N. nucifera populations found in Thailand and northeastern China are valuable germplasm for lotus breeding. In another study, it was reported that the tropical lotus germplasm found in Thailand was used in breeding to improve the ornamental and economic value of Chinese lotus varieties (Yang et al. 2013).
Notably, breeders have used the genetic variation found in wild species to identify agriculturally important traits and introduce them into new varieties (Samiei et al. 2010). This suggests that countries depend, in one way or another, on the genetic resources of other countries to improve their indigenous species. Our study found low genetic variability in most populations. The wetlands that serve as habitat for tropical lotus have been converted to agriculture and other land uses (Laongsri et al. 2009). This will likely affect the populations, and their genetic diversity might continue to decline. Two (T4 and T5) of the seven populations that had experienced recent bottlenecks had significant probabilities, indicating that anthropogenic and natural factors had already threatened them. Hence, the conservation of this threatened tropical lotus germplasm deserves special attention to ensure its continued availability. The highly diverse populations found in this study could be valuable germplasm for future breeding programs of the crop. Conservation priority should be given to the populations with the highest genetic diversity and to those that have exhibited recent bottlenecks. Hence, we suggest the implementation of complementary conservation approaches (i.e., in situ and ex situ) for this species.

Conclusions
Population genetic structure studies of N. nucifera are essential to identify populations with unique traits and to design appropriate conservation methods. The nine polymorphic microsatellite markers used in our study sufficiently differentiated the 15 tropical N. nucifera populations based on geography. The populations showed different levels of genetic variability, and the results confirmed that the populations found in each country are unique. Geographically separated populations will likely develop genetic differences due to adaptation to different habitats. We recommend that future breeding programs and conservation of N. nucifera utilize the germplasm of the tropical populations with high genetic diversity identified in our study. Further studies using additional samples from all the distribution areas of the species and more markers should be conducted to gain more insight into the population genetic structure of N. nucifera. Conserving the available diversity using various conservation approaches is essential to enable the continued utilization of this economically important crop species. Therefore, based on the findings of this study, conservation priority should be given to populations with a high level of genetic diversity (e.g., T4 in Thailand) and to those that have exhibited bottlenecks. We recommend that complementary conservation approaches be implemented to maintain endangered and declining populations of tropical lotus.