The primary goal of our study was to identify reliable marker-trait associations for key agronomic and biochemical traits and to accelerate the development of quinoa varieties tailored for cultivation in areas with diverse environments. An early identification and selection of desirable traits, including those that appear late during plant development or at maturity, are crucial for advancing the breeding cycle. Genome-wide association studies using a diverse mapping panel offer a promising approach toward this goal because they do not require the development of permanent mapping populations (biparental or other complex populations) that consume much time.
ICBA’s gene bank has one of the largest quinoa collections outside South America. A diverse collection based on phenotypic and biochemical data was shortlisted to constitute an association mapping panel for further rigorous phenotyping over the years, and for re-sequencing the genomes of interesting quinoa accessions.
Phenotyping
The heritability estimates, range, and Q–Q plots of the phenotypic traits in our study clearly indicate the fitness of data for marker-trait analysis (Fig. 1). The economically important trait TGW revealed the highest heritability in both seasons, suggesting its suitability in breeding programs as an important selection criterion. Heritability of DTF was found to be relatively high (81–90%), while PH showed a slightly lower heritability (53–71%) (Table 1). Most such traits are robust and relatively little influenced by genotype × environment (G × E) interaction. Maldonado-Taipe et al.13 also reported a high heritability of DTF (82.99%) and TGW (91.06%) in a biparental F2:3 mapping population derived from Chilean (PI-614889) × Peruvian (CHEN-109) accessions. Another recent study also reported high heritability of DTF (91.0%) and TGW (89.7%) in a GWAS panel comprising 303 quinoa accessions12. However, the heritability of the PH trait reported by Maldonado-Taipe et al.13 and Patiranage et al.12 in their respective quinoa mapping populations showed a contrasting pattern. Heritability of PH in a biparental mapping population was 38.4%13, whereas in a GWAS panel it was 85.0%12. Al-Naggar et al.14 also reported 60.7% heritability for PH in their studies. This moderate to high heritability of PH indicates its relatively low influence by the environment, although a more robust study involving multi-year and multi-location experiments could provide more robust information. Followed by TGW, GD also showed very high heritability, whereas the other three traits (SD, PL, and SY) revealed moderate to high heritability in both growing seasons. SD and PL also showed high correlation with PH over the seasons (70–89%), signifying the importance of all three traits for quinoa improvement programs through both direct and indirect selection (Supplementary Table S5). Nevertheless, none of the associations revealed confounding effects of PH on SD/PL or vice versa, which was surprising, suggesting that more detailed investigation is required to better understand these traits. Based on previous reports and our study, DTF and TGW are highly robust and reliable for marker-trait analysis in quinoa, followed by PH and PL. Interestingly, in our genomic association study, consistent and robust genomic regions observed over the seasons were evident only for DTF among all agronomic traits investigated.
SNP distribution across the quinoa genome
The overall average SNP density in our study was found to be 3.11 SNPs/10 kb and the B genome had a higher SNP density than the A genome. Patiranage et al.12 reported an average SNP density of 2.39 SNPs/kb while performing whole-genome sequencing of 312 accessions on the Illumina platform, ranging from 0 to 122 SNPs/kb, and no significant difference was observed in SNP densities of the A and B genomes. The size of the B genome has been reported to be larger than that of the A genome in quinoa10. A previous report12 and our study clearly indicate that the genome sizes and SNP densities are not correlated. Furthermore, both studies supported the view of possible recombination and chromosomal rearrangements between the A and B sub-genomes10.
Genetic diversity
In order to understand the genetic diversity of the germplasm used for our study, principal component analysis was conducted that broadly divided the whole population into three diverse groups: groups I, II, and III (Fig. 3a). Group I comprised mainly accessions from lowlands, whereas groups II and III had accessions from highlands. The majority of the lowland-specific group I accessions came from coastal areas of Chile, where temperature, light, and humidity conditions are significantly different from those of the Peruvian/Bolivian highlands. Genotypes categorized in groups II and III were mainly from the highlands of Peru and Bolivia. Recent reports with germplasm panels also divided the populations into those from highlands and lowlands, like our finding11,12. The first and second principal components in our study explained 37.65% and 9.42% of the total variation, respectively. In previous PCA studies11,12, PC1 and PC2 explained 23.35/31.6% and 9.45/6.80% variation, respectively, thereby showing a similar trend as our report. Also, similar to the analyses performed by Mizuno et al.11 and Patiranage12, accessions categorized as “collected from USA” were spread in all three sub-groups in the PC analysis in our study (Fig. 3a). This raises considerable concern about the reliability of passport data obtained from the WSU repository about the correct origin of the quinoa accessions, as addressed before11,12. Unlike Mizuno et al.11, we could not differentiate the two highland groups, even though they showed a clear distinction in Fig. 3a. This might be due to either the higher movement of germplasm materials between southern and northern highland areas or inappropriate collection site information. Similar patterns and problems were reported in a recent study on deciphering the diversity of Iranian wheat landraces15. A well-organized systematic germplasm collection in centers of diversity established using advanced GIS tools should help in overcoming such problems. In the case of quinoa, this is very important as researchers have defined five distinct ecotypes of Chenopodium quinoa based on origin16. However, even using high-density marker platforms, we could not explain the genetic basis of those five groups. This information is important and should be investigated in detail in the future as it could be quite useful for the improvement of quinoa germplasm and its agricultural deployment.
A mean genome-wise LD in the GWAS panel was observed at 118.5 kb estimated as decayed half to its maximum, which is one of the most commonly used thresholds to define unlinked loci17. Interestingly, we observed an inverse relationship between population size and LD decay in our study, which corroborates results from previous reports11,12. The numbers of lines in the populations of Mizuno et al.11, Patiranage et al.12, and our study were 136, 312, and 201, respectively. LD decay was found to be minimal in the population with 312 lines12, intermediate with 201 lines (our study), and maximal with 136 lines11. This clearly underpins the importance of larger population size to minimize LD decay.
Marker-trait associations
Marker-trait associations (MTAs) largely depend on the extent of LD decay in self-pollinated crop species. There are two contrasting reports, one suggesting the non-suitability of GWAS in quinoa for determining genomic associations, while another one strongly favors it12. Mizuno et al.11 performed their study with a set of 136 genotypes phenotyped for six traits using a GBS-based assay with 5753 SNPs, whereas Patiranage et al.12 conducted an analysis involving 312 germplasm accessions phenotyped for 17 traits and GWAS was done with 2.9 million SNP markers identifying 1480 significant MTAs. Our MTA study was conducted on a germplasm set of 187 lines phenotyped for seven traits using 383,251 SNP markers. We successfully identified a total of 387 genomic associations with four out of seven traits investigated (Supplementary Table S3). These three reports (including our current study) highlight the importance of marker density, population size, and phenotypic as well as existing genotypic variability of the target traits.
It was interesting to note that, out of 17 traits subjected to analysis by Patinarage et al.12, only DTF and TGW showed a consistent effect across two seasons. Our analysis also showed the same pattern only for DTF with consistency over years. In our study, TGW also revealed moderate to high correlation with GD (Supplementary Table S5), although no significant marker-trait associations were found for GD, possibly because of the degree of precision for trait phenotyping. Phenotypic variability of GD may largely depend on the type of scale and precision used for the measurements. The higher the precision, the greater the chance for identifying QTLs. We recommend higher precision for genome-wide association studies, especially for traits with smaller dimensions such as grain diameter.
Defining MTAs in GWAS is relatively challenging as compared to biparental mapping analysis because of threshold values. Following a highly stringent and conservative criterion, Bonferroni correction, one loses the majority of the MTAs18, which otherwise might be declared significant considering the FDR method19. A large number of studies are available in which investigators define a threshold of 0.0001 (or less) to declare significant MTAs15. Here, we followed a slightly modified approach in which we considered FDR ≤ 0.1 and − Log10P ≥ 4 as a threshold only when more than one SNP was found associated in a genomic region of 118.5 kb (as per LD decay results). Interestingly, qDTF5B.1 and qDTF7B.1 revealed 19 and 33 SNPs, respectively, associated significantly with DTF within the region (Table 2; Supplementary Table S3). This method can help greatly in developing diagnostic markers and deploying GWAS results in quinoa germplasm improvement processes.
Genomic associations for saponin content were identified on chromosomes 1B, 4A, 5A, and 5B (Table 2) based on previously reported saponin content and genotypic data for our GWAS population20. Jarvis et al.10 and Patinarage et al.12 reported a consistent and major-effect genomic association on chromosome 5B that corroborates our findings. Patinarage et al.12 also reported inconsistent QTLs on chromosomes 1B, 4A, and 5A, as in our study. This suggests that these chromosomes might also harbor minor QTLs underpinning the trait. A more detailed analysis involving contrasting large biparental mapping populations would be beneficial for detecting minor QTLs with a higher confidence on these chromosomes for understanding the genetic basis of saponin content in quinoa seeds.
Candidate genes
Putative candidate genes were searched within LD region of associated genomic regions and homologs reported for controlling flowering time or floral development within the distinct genomic regions that were identified in our study. The genomic region qDTF4B.1 consisted of genes encoding Zinc-finger homeodomain protein 4 (ZHD4) and PHD finger protein. The ZHD4 also called as FLORAL TRANSITION AT THE MERISTEM2 (FTM2) has been reported playing an important role in regulating floral induction in plants21. The PHD finger protein plays a key function in the chromatin-mediated repression of flowering, ensuring the precise control of flowering time22,23. It has been reported to be involved in controlling both vernalization and photoperiod pathways in Arabidopsis24. Another important gene near qDTF8B.1 was Protein POLLENLESS 3 which is essentials for male fertility, especially for microspore and pollen grain production in Arabidopsis25. Among the genes located in the genomic region qDTF5B.1, the FUT1 gene encoding galactoside 2-alpha-L-fucosyltransferase regulates flower development in tobacco (Nicotiana tabacum)26 while HEAT SHOCK TRANSCRIPTION FACTOR B2b (HsfB2b) acts as a transcriptional repressor of VIN3, a gene induced by long-term cold for flowering in Arabidopsis thaliana27. Further downstream of qDTF5B.1, a gene encoding transcriptional regulator SLK2 was located within the LD region. SLK2 functions together with co-regulator SEUSS in LEUNIG-regulated processes during flower development, thereby controlling flowering time28. The genomic region qDTF7A (1.92–2.07 Mb) includes WUSCHEL-RELATED HOMEOBOX 3 (WOX3) and a gene encoding an ankyrin repeat-containing protein. WOX3 genes are involved in the formation of lateral sepals and sepal margins in flowers29. Although ankyrin repeat-containing protein in quinoa has, to our knowledge, not been reported to be involved in flowering control, Arabidopsis plants with a mutated ankyrin repeat-containing protein showed delayed flowering30. The genomic region downstream of qDTF7A contained a gene encoding the protein UPSTREAM OF FLC (UFC). The UFC gene is adjacent to FLOWERING LOCUS C (FLC), which responds to vernalization and acts as a floral repressor in many temperate species31,32.
The genomic region qTGW1B.1 mapped for thousand-grain weight consists of 14 genes in the LD region. However, one gene encoding Agamous-like MADS-box protein AGL12 was of importance as it has been reported to be involved in numerous developmental process including seed and embryo development33. The genomic regions associated with panicle length consists of genes encoding FRL4B: FRIGIDA-like protein 4b and FAR1-RELATED SEQUENCE (FRS) 7 protein in genomic region qPL3A.1 and F-box/FBD/LRR-repeat protein in qPL7B.1. The FRIGIDA-like protein and FAR1 proteins has been involved in controlling flowering time, leading to late flowering phenotype in plants and inducing the development of enlarged panicles34,35. Further, FRS members has been reported as the candidate genes for Panicle and Spikelet Degeneration (PSD) gene in rice36 and has been mapped in qPL5 of rice37. F-box protein encoding genes has been reported to be involved in floral transition as well as panicle and seed development38.
The genomic region qSPN5B.1 on chromosome Cq5B (8.90–9.54 Mb) associated with saponin content contains two copies of bHLH25 and several UGT (UDP-GLYCOSYLTRANSFERASE) genes. The bHLH25 transcription factor has been identified in several studies and reported as TRITERPENE SAPONIN BIOSYNTHESIS ACTIVATING REGULATOR 2 (TSAR2) controlling the production of saponin in quinoa seeds10,12,13. Apart from bHLH25, the genomic region qSPN5B.1 also contains several UDP-GLYCOSYLTRANSFERASE genes involved in the glycosylation of triterpenoid saponins39,40,41,42. The other genomic regions associated with saponin content in quinoa seeds primarily contain genes involved in fatty acid and carbohydrate metabolism.
The SNP markers and candidate genes identified in this study can be efficiently used to develop the molecular markers-based robust and reliable selection system that can further help to fasten the pace of conventional breeding efforts. This is crucial for the traits which appear only during late growth stages (flowering duration and plant height) or only after harvesting (saponin content). Early identification of desired segregants through molecular markers will allow us to keep only positive plants without maintaining full population that include unwanted genotypes and save the limited resources. Similarly crossing of the desired plants would be possible without waiting for the trait to appear during late growth stage. The developed improved quinoa genotypes will be easily accepted by farmers once the major bottlenecks in quinoa genotypes (saponin content and long duration) will be overcome.
Quinoa has earned special attention worldwide due to its exceptional nutritional quality and health benefits and its ability to adapt to marginal environments especially saline soils and drought stressed marginal agroecosystems. Most of the target regions need improvement of traits like shortening of growth period, sweet grain types, reduced water requirement and non-lodging type sturdy stem with medium plant height. Current varieties in the public domain do not have these desired traits hence farmers are discouraged from cropping quinoa either due to non-availability of preferred varieties or extra cost incurred on post-harvest processing. Based on these assumptions, we targeted to breed the desired varieties for the region which are high yielding, sweet genotype with early maturity. This will allow poor farmers to have access ‘free to use’ sweet quinoa varieties as all existing sweet quinoa varieties are from private sector and mostly poorly adapted to relatively hot regions.