A simple sequence repeat (SSR), or microsatellite, is a short DNA motif, usually 1 to 6 bp in length, that is repeated in tandem a number of times. More than 6,947,000 SSRs were identified by genome-wide scanning with an in-house Python script. The microsatellites were counted according to the number of nucleotides per repeat unit: 6,866,782 mononucleotide, 309,157 dinucleotide, 42,533 trinucleotide, 53,499 tetranucleotide, 7,355 pentanucleotide and 468 hexanucleotide.
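The in-house script itself is not described here, so the following is only a minimal sketch of how perfect SSRs with 1 to 6 bp motifs might be located and counted by motif length. The function names, the regular-expression approach and, in particular, the minimum repeat thresholds in `MIN_REPEATS` are illustrative assumptions and are not taken from the original analysis.

```python
import re
from collections import Counter

# ASSUMPTION: minimum number of repeat units per motif length.
# These thresholds are hypothetical; the source does not state the ones it used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeat_count) tuples for perfect SSRs.

    Coordinates are 0-based and half-open. Only A/C/G/T are considered,
    so runs of N are never reported.
    """
    seq = seq.upper()
    hits = []
    for k in sorted(min_repeats):
        n = min_repeats[k]
        # A k-bp motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_reducible(motif):
                # Belongs to a shorter motif class; step forward one base.
                pos = m.start() + 1
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
            pos = m.end()
    return hits


def count_by_motif_length(hits):
    """Count SSRs per motif length (1 = mononucleotide, ..., 6 = hexanucleotide)."""
    return Counter(len(motif) for _, _, motif, _ in hits)


example = "GG" + "A" * 12 + "CT" + "AC" * 7 + "GTT" + "ATC" * 5 + "G"
ssrs = find_perfect_ssrs(example)
counts = count_by_motif_length(ssrs)
```

With these assumed thresholds, the toy sequence yields one mononucleotide, one dinucleotide and one trinucleotide SSR. Skipping reducible motifs keeps a run such as (A)12 from being counted again as (AA)6; other thresholds or overlap rules would give different totals.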

Classification criteria: SSRs were classified into two types according to their location: type I SSRs are located within genes, and type II SSRs are found in intergenic regions. According to the arrangement of nucleotides within the repeat motifs, four categories of SSRs can be defined (1, 2): (1) simple perfect SSRs (SP), consisting of an uninterrupted run of a single repeat unit, e.g., (ATC)n; (2) simple imperfect SSRs (SIP), with one or more interruptions in the run of repeats, e.g., (ATC)mTT(ATC)n; (3) compound perfect SSRs (CP), typically a combination of SP SSRs, e.g., (AC)m(AGAC)n; and (4) compound imperfect SSRs, or interrupted CPs (CIP), which combine compound and imperfect features, e.g., (ATC)mTTG(AT)kGC(AT)n.
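The two classification schemes above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used: the gene intervals, the rule that any overlap with a gene counts as "located in genes", and the segment representation (a `(motif, repeat_count)` tuple for a repeat block, a plain string for an interruption) are all hypothetical choices.

```python
def classify_by_location(ssr_start, ssr_end, genes):
    """Return 'type I' for a genic SSR and 'type II' for an intergenic one.

    ASSUMPTION: intervals are 0-based and half-open, and any overlap
    with a gene interval is enough to call the SSR genic.
    """
    for gene_start, gene_end in genes:
        if ssr_start < gene_end and gene_start < ssr_end:
            return "type I"
    return "type II"


def classify_by_structure(segments):
    """Return 'SP', 'SIP', 'CP' or 'CIP' for a pre-parsed SSR.

    ASSUMPTION: `segments` is a hypothetical representation in which
    a tuple (motif, repeat_count) is a repeat block and a plain string
    is an interruption between blocks.
    """
    motifs = {seg[0] for seg in segments if isinstance(seg, tuple)}
    if not motifs:
        raise ValueError("an SSR needs at least one repeat block")
    interrupted = any(isinstance(seg, str) for seg in segments)
    compound = len(motifs) > 1  # more than one distinct repeat unit
    if compound:
        return "CIP" if interrupted else "CP"
    return "SIP" if interrupted else "SP"


# Hypothetical gene intervals on one chromosome.
genes = [(100, 500), (900, 1200)]
location_genic = classify_by_location(120, 140, genes)
location_intergenic = classify_by_location(600, 620, genes)

# The four example structures from the text, with arbitrary repeat counts.
sp = classify_by_structure([("ATC", 8)])
sip = classify_by_structure([("ATC", 5), "TT", ("ATC", 4)])
cp = classify_by_structure([("AC", 6), ("AGAC", 5)])
cip = classify_by_structure([("ATC", 5), "TTG", ("AT", 6), "GC", ("AT", 7)])
```

Turning raw sequence into such segments (deciding how long an interruption may be, or how close two repeat blocks must lie to form a compound SSR) is the harder part and depends on parameters that are not given here.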

