Species Name | Genome Version | download url |
---|---|---|
Danio rerio | GRCz11 | Download |
Protopterus annectens | GCF_019279795.1 | Download |
Monopterus albus | GCF_001952655.1 | Download |
Gadus morhua | GCF_902167405.1 | Download |
Clupea harengus | GCF_900700415.2 | Download |
Salmo salar | GCF_000233375.1_ICSASG_V2 | Download |
Lates calcarifer | ASM164080v1 | Download |
Collichthys lucidus | GCA_004119915.2 | Download |
Amia calva | GCA_016984155.1 | Download |
Ictalurus punctatus | GCF_001660625.1_IpCoco_1.2 | Download |
Cyprinus carpio | GCF_018340385.1_ASM1834038v1 | Download |
Perca fluviatilis | GCF_010015445.1 | Download |
Pimephales promelas | GCF_016745375.1 | Download |
Ctenopharyngodon idella | GC_v1 | Download |
Xiphophorus hellerii | GCF_003331165.1 | Download |
Coregonus clupeaformis | GCF_020615455.1 | Download |
Larimichthys crocea | GGCF_000972845.2 | Download |
Siniperca chuatsi | GCF_020085105.1 | Download |
Oryzias latipes | GCF_002234675.1 | Download |
Astyanax mexicanus | GCF_023375975.1 | Download |
Triplophysa tibetana | GCA_008369825.1 | Download |
Oreochromis niloticus | GCA_001858045.2 | Download |
Oncorhynchus mykiss | GCF_013265735.2_USDA_OmykA_1.1 | Download |
Labeo rohita | GCA_004120215.1_ASM412021v1 | Download |
Hypophthalmichthys molitrix | CNP0000974 | Download |
Channa argus | GCA_004786185.1 | Download |
Lepisosteus oculatus | GCF_000242695.1 | Download |
Tetraodon nigroviridis | GCA_000180735.1 | Download |
Pangasianodon hypophthalmus | GCF_009078355.1_GENO_Phyp_1.0 | Download |
Puntigrus tetrazona | GCF_018831695.1 | Download |
Takifugu rubripes | GCF_901000725.2 | Download |
Gasterosteus aculeatus | GCF_016920845.1 | Download |
Nothobranchius fuzeri | GCF_027789165.1 | Download |
Megalobrama amblycephala | GCF_901000725.2 | Download |
Tachysurus fulvidraco | GCF_003724035.1_ASM372403v1 | Download |
Gene basic information: a: Gene basic information, species, chromosome, chromosome position, original database id and description b: Sequence quick pull, select nucleic acid or protein will be presented in fasta format c: Gene compared with Swiss-prot vertebrate data id corresponding information d-e: GO/KEGG annotation information.
RNA information: a: Visualization of the gene transcript node, click on the transcript to browse the substructure and download the corresponding sequence b: Visualization of the transcriptome analysis on the gene information page (the process of selecting the data set is similar to the "data selection" in the previous "Omics Online Analysis" " part) c: Statistical test analysis parameter setting d: Analysis result picture display, click "picture ggplot file" below to download the corresponding ggplot2 file, upload it to the Retuning Plots part of Visual Omics, and you can perform online high-freedom degree of picture editing.
protein information: a: Prediction and visualization of protein domains corresponding to genes, below are predicted result files, visualization pictures and corresponding ggplot2 files. b: Protein subcellular localization c: Protein interaction information
SNP information: a: SNP information associated with the FishCODE gene page, users can click on the SNP form row to jump to the detailed information page of the SNP, b: basic information of the SNP site c: SNP flanking sequence (optional distance) d: multiple genome positions Point comparison information e: SNP genome annotation information. f: Population type, size, sampling location and other information. g: Article information h: SNP visualization information display. Click the target variation to display the position information of the SNP and its adjacent sites on the genome, as well as the detailed information of the SNP site.
Macro related information: a: Bisulfite-Seq DMRs analyzes the associated gene information, and associates the gene with the experimental conditions of the data set; b: the gene obtained by RNA-Seq differential analysis, associates the gene with the experimental conditions of the data set.
The purpose of this part of the code is to help users pre-process their own transcriptome raw data on their own machine. The output of this script is the gene count and TPM file. These output results can be input by the user as an input file to the "upload own data" section of FishCODE's online transcriptome analysis.
Considering the large-scale computing power required for the pre-processing of omics raw data, this series of processing scripts currently only supports Linux or Linux-like systems. More local software supported by the system are in our development list, so please pay attention to our webpage. It is worth noting that although FishCODE focuses on the analysis of fish omics data, this series of scripts is a general omics data processing flow, and users can use this processing flow to process any omics annotations (at least genome and gff files) Species.
Version Name | Modify Time | Download | Breif Introdunction |
---|---|---|---|
Transcriptome_linux_1.1 | 25 June 2023 | download | The original version, users can use this series of scripts to process omics raw data on the linux system |
The pp.conf file records the absolute paths of the dependent software mentioned below. Before you start you must configure this file and replace the execution path of the software with the absolute path to the software on your server!
Dependent software: gffread, salmon 1.4.0, bedtools v2.26.0, mashmap
$ bash First_bulid_index.sh ./build_faidx_example/GCF_901000725.2_fTakRub1.2_genomic.fna ./build_faidx_example/GCF_901000725.2_fTakRub1.2_genomic.gff 30
Note: 1. Your genome and corresponding annotation files must be in the same folder. 2. All the following working directories default to the decompressed directory.
3. The first parameter is the genome location, the second parameter is the gff file location, and the third parameter is the number of threads
Dependent software: NGSQCToolkit_v2.3.3 (IlluQC.pl)
$ bash filter.sh 792045 SRR17641338
Note: The first parameter is the name of the upper folder where fastq.gz is located, or the project name, and the second parameter is the prefix of fastq.gz. For the specific file structure, please refer to the compressed package. The user must create a project under the parent directory 00_data, and store the sequencing files to be processed under the project name. "./00_data/projectname/*fastq.gz"
Dependent software: salmon 1.4.0
$ bash salmon.sh 792045 SRR17641338 ./build_faidx_example/GCF_901000725.2_fTakRub1.2_genomic.fna ./build_faidx_example/GCF_901000725.2_fTakRub1.2_genomic.gff
Note: The first parameter is the project name of the second step, the second parameter is the prefix of fastq.gz, the third parameter is the genome location, and the fourth parameter is the location of the gff file (note that the location of this genome and annotation file must Strictly the input genome and gff file path for the first step)
1. The output result of the third step will be in 01_result/792045/SRR17641338_quant/ under the same directory
2. All directory structures and demonstration data examples are included in the compressed package, allowing users to try it easily.
3. In order to facilitate users to generate transcriptome data corresponding to the species included in FishCODE, we provide a download package of the genome and annotation files of FishCODE's 35 fish species in the "Help->Usage->gene information" section.
The purpose of this part of the code is to help users preprocess their own methylome (only referring to the "Bisulfite-Seq" library construction strategy) raw data on their own machines. The output of this script is a file of methylation ratios for the sites. Users can directly import these output results into the methlKit package as input files.
Considering the large-scale computing power required for the pre-processing of omics raw data, this series of processing scripts currently only supports Linux or Linux-like systems. More local software supported by the system are in our development list, so please pay attention to our webpage. It is worth noting that although FishCODE focuses on the analysis of fish omics data, this series of scripts is a general omics data processing flow, and users can use this processing flow to process any omics annotations (at least genome and gff files) Species.
Version Name | Modify Time | Download | Breif Introdunction |
---|---|---|---|
Methylome_linux_1.1 | 25 June 2023 | download | The original version, users can use this series of scripts to process omics raw data on the linux system |
Dependent software: FASTX Toolkit 0.0.13
$ bash filter_fastx.sh 555065 SRR9697472
Note: The first parameter of the quality control script is the project name, and the second parameter is the prefix name of fastq.gz. But before you run the script, you need to replace "@SRR" on line 20 of "filter_fastx.sh" with the special flag string of your sequencing data. What are special identifiers? Take NCBI's original sequencing file as an example. The special string of SRRXXXX.fastq.gz is "@SRR". You can zless open the fastq.gz file to view the first line and replace the "NCBI's" in line 20 of filter_fastx.sh. @SRR".
Dependent software: BSMAP 2.9, samtools 1.3.1, Python 2.7.15 [sys, time, os, array, optparse]
$ bash methy.sh 555065 SRR9697472 ./zebrafish/GCF_000002035.6_GRCz11_genomic.fna
Note: The first parameter of the quality control script is the project name, the second is the prefix name of fastq.gz, and the third parameter is the absolute path of the genome of the corresponding species.
1. The output result of the third step will be in 02methyPos/01methratio/555065 under the same directory
2. All directory structures and demonstration data examples are included in the compressed package, allowing users to try it easily.
3. *methratio.txt is the final output result, where *methratio is the methylation status of all sites, and the final result is the filtered site file with coverage
4. Considering that methylation computing resources consume a lot, we recommend that your running server has at least 100G of RAM, 200G of storage space, and 10 CPU cores. (actually depends on your genome size)