STATISTICS



1. Phylogenetic tree of species in FishGET


The phylogenetic tree of the 8 species was built with the TimeTree public resource (http://www.timetree.org/).

  • The species in FishGET span two different whole genome duplication events found in teleost fish.
  • These species cover the orders Amiiformes, Perciformes, Salmoniformes, Siluriformes, Characiformes, and Cypriniformes.
  • Amia calva belongs to Amiiformes and is a holostean (Holostei) species that diverged before the TGD (teleost-specific whole genome duplication event).
  • Oncorhynchus mykiss belongs to Salmoniformes. In salmonids, gene fractionation may still be ongoing because of an additional and relatively recent WGD event (SaGD: salmonid-specific whole genome duplication event).

2. Overview of the data analysis process


FishGET collected 95 bioprojects of 8 species from the SRA database and 2 bioprojects of Ctenopharyngodon idella from the GSA database.

  • Articles corresponding to these bioprojects were manually curated: 79 bioprojects were matched to a total of 76 articles.
  • The transcript sources for Danio rerio are Ensembl, LncRBase V.2, and ZFLNC. We downloaded a high-resolution Danio rerio embryo development bioproject (PRJEB12982, White, R. J.) for expression profile construction.
  • For the species other than Danio rerio, nearly all Illumina paired-end RNA-seq data available in SRA were covered.


  • In this study, all raw data were analyzed uniformly and to a high standard, so that the results are accurate and reliable and data sets from different species can be compared and analyzed together.
  • After quality control, 89 bioprojects were used for transcript assembly and 93 bioprojects for expression profile construction.
  • Ctenopharyngodon idella and Oncorhynchus mykiss each have 2 bioprojects that are not used for transcript assembly or expression profile construction, but only for storing article information.

3. Homology of transcriptome from 8 species


Using the Bidirectional Best Hit (BBH) method, we analyzed the homology of transcripts between each pair of species.

  • If two transcripts from two species each have their highest correlation with the other, the two transcripts are considered homologous.
  • Homologous gene pairs were identified according to whether the transcripts were representative RNAs or not.
  • Among all species pairs, Danio rerio and Ctenopharyngodon idella show the strongest homology.
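
The reciprocal rule described above can be illustrated with a minimal Python sketch. It is not the FishGET pipeline itself: the function names, the transcript IDs, and the scores are hypothetical, and the sketch only assumes that some pairwise score (higher meaning a better match) is already available in both directions.

```python
def best_hits(scores):
    """Map each query transcript to its single best-scoring target.

    `scores` maps (query, target) pairs to a numeric score where higher
    is better. On ties, the first pair encountered is kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


# Hypothetical scores between transcripts of two species.
a_vs_b = {
    ("dr_t1", "ci_t1"): 0.95,
    ("dr_t1", "ci_t2"): 0.60,
    ("dr_t2", "ci_t2"): 0.80,
    ("dr_t3", "ci_t2"): 0.85,
}
b_vs_a = {
    ("ci_t1", "dr_t1"): 0.95,
    ("ci_t2", "dr_t3"): 0.85,
    ("ci_t2", "dr_t2"): 0.80,
}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('dr_t1', 'ci_t1'), ('dr_t3', 'ci_t2')]
```

In this toy input, dr_t2 has ci_t2 as its best hit, but ci_t2 prefers dr_t3, so dr_t2 is left without a homolog. This is the behaviour that makes the bidirectional rule stricter than a one-way best hit.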

4. Distribution of bioprojects by experiment type


All bioprojects have been manually organized and divided into 18 experiment types to help users conduct comparative analysis.

  • Baseline is the experiment type with the largest number of both bioprojects (32) and samples (675), and it is available for every species.
  • Challenge is the second most common experiment type, mainly concentrated in Ctenopharyngodon idella and Oncorhynchus mykiss.