ELIT: A website sharing evaluations of lncRNA identification tools
About this project
Numerous tools for lncRNA identification have been developed, which makes choosing among them difficult. We aim to provide advice for a variety of situations.
General guide for users
(Models)
Category | Detail category | Suggested software/models/tactics |
---|---|---|
Species | general suggestion | Choose a model whose training data come from a species genetically close to the species being tested, although the optimal model cannot always be obtained this way. Use COME and lncScore for supported species. Follow the results given in the section on different species. |
Species | mammal | FEELnc_all_cl, CPC and hmmscan_A |
Species | bird | FEELnc_all_cl, hmmscan_both and CPAT_mouse |
Species | reptile | hmmscan_A, CPC and CPAT_mouse |
Species | fish | CPC2, CPAT_zebrafish and CPC |
Species | plant | CPAT_fly, FEELnc_all_cl and CPAT_mouse |
Species | others | FEELnc_all_cl, CPC and CPAT_mouse |
Data quality [1] | high | Focus on other factors such as species and speed. |
Data quality [1] | low | COME, hmmscan, PLncPRO and CPAT |
Number of records | small number (e.g., fewer than 5000) | FEELnc is not recommended because it lacks pre-trained models. |
Number of records | large number (e.g., more than 5000) | Parallel running is recommended for PLncPRO, CNCI, CPC and hmmscan. |
Joint prediction | vote | Vote prediction can improve the overall accuracy of prediction. Select no more than 4 suitable models. |
Joint prediction | rough | Apply this tactic when specificity is more important than sensitivity and overall accuracy. |
[1] High-quality transcripts: transcripts that are largely complete, e.g., transcripts curated manually or assembled from specialized sequencing such as PacBio or CAGE-seq. Low-quality transcripts: transcripts that are incomplete or misassembled, e.g., transcripts assembled from RNA-seq for routine differential expression analysis.
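
The vote tactic in the table above can be sketched as a simple majority vote over per-model labels. This is a minimal illustration rather than ELIT's own implementation: the function name `majority_vote`, the label strings `"lncRNA"` and `"coding"`, the tie-breaking rule and the transcript IDs and labels in the example are all assumptions made for the sketch.

```python
from collections import Counter

# Hypothetical label strings; real tools report their own class names or scores.
LNCRNA = "lncRNA"
CODING = "coding"


def majority_vote(predictions, tie_label=CODING):
    """Combine per-model labels for each transcript by majority vote.

    predictions: dict mapping model name -> dict of transcript ID -> label.
    tie_label: label assigned when the models split evenly (an assumption
               of this sketch, not a rule stated by ELIT).
    Only transcripts labelled by every model are returned.
    """
    if not predictions:
        return {}
    per_model = list(predictions.values())
    # Keep only transcripts that all models have labelled.
    shared_ids = set(per_model[0])
    for labels in per_model[1:]:
        shared_ids &= set(labels)

    consensus = {}
    for tid in sorted(shared_ids):
        counts = Counter(labels[tid] for labels in per_model)
        n_lnc, n_cod = counts[LNCRNA], counts[CODING]
        if n_lnc > n_cod:
            consensus[tid] = LNCRNA
        elif n_cod > n_lnc:
            consensus[tid] = CODING
        else:
            consensus[tid] = tie_label
    return consensus


# Made-up labels for illustration only, keyed by three of the models
# suggested for mammals in the table.
example = {
    "FEELnc_all_cl": {"tx1": LNCRNA, "tx2": CODING, "tx3": LNCRNA},
    "CPC": {"tx1": LNCRNA, "tx2": CODING, "tx3": CODING},
    "hmmscan_A": {"tx1": CODING, "tx2": CODING, "tx3": LNCRNA},
}
result = majority_vote(example)
print(result)
```

Using an odd number of models avoids ties, so the `tie_label` fallback only matters when two or four models are combined.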
Collections of software
Collection of software for lncRNA identification
Software package | Input | Algorithm | Features | Online analysis | Binary/source | Reviewed | Supported species |
---|---|---|---|---|---|---|---|
CPC | Sequence | SVM | ORF, consv | http://cpc.cbi.pku.edu.cn/programs/run_cpc.jsp | http://cpc.cbi.pku.edu.cn | true | all species |
CPC2 | Sequence | SVM | Fickett, ORF, pI | http://cpc2.cbi.pku.edu.cn/ | http://cpc2.cbi.pku.edu.cn/ | true | all species |
CNCI | Sequence | SVM | MLCDS | | http://www.bioinfo.org/software/cnci | true | all species |
CPAT | Sequence/(GM and R) | LR | ORF, Fickett, hexamers | http://lilab.research.bcm.edu/cpat/ | https://sourceforge.net/projects/rna-cpat/files/ | true | all species |
FEELnc | Sequence | RF | ORF, k-mer | | https://github.com/tderrien/FEELnc | true | all species |
hmmscan | Sequence | Cut-off | SS | https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan | http://hmmer.org/download.html | true | all species |
longdist | Sequence | SVM | np of ORF, ORF | | https://github.com/hugowschneider/longdist.py | true | all species |
PLEK | Sequence | SVM | k-mer | | https://sourceforge.net/projects/plek/files/ | true | all species |
PLncPRO | Sequence | RF | ORF, consv | | http://ccbb.jnu.ac.in/plncpro | true | all species |
RNAplonc | Sequence | REPTree | k-mer, ORF | | https://github.com/TatianneNegri/RNAplonc | true | all species |
COME | GM | BRF | GC%, conservation, SS | | https://github.com/lulab/COME | true | human, mouse, fly, worm and Arabidopsis |
iSeeRNA | GM | LR | ORF, di-mer, tri-mer, consv | http://sunlab.cpy.cuhk.edu.hk/iSeeRNA/webserver.html | https://sunlab.cpy.cuhk.edu.hk/iSeeRNA/download.html | true | human, mouse |
lncRScan-SVM | GM | SVM | ORF, tri-mer, exon, consv | | https://sourceforge.net/projects/lncrscansvm/files/ | true | human, mouse |
lncScore | Sequence and GM | LR | ORF, exon, MCSS | | https://github.com/WGLab/lncScore | true | human, mouse |
BASiNET | Sequence | SVM | topological measurements of network | | https://cran.r-project.org/package=BASiNET | false | all species |
PLIT | Sequence | RF | ORF, GC%, Fickett, hexamers, codon bias | | https://github.com/deshpan4/PLIT | false | all species |
LGC | Sequence | GLM | feature relationship: LGC (ORF length and GC content) | http://bigd.big.ac.cn/lgc/calculator | https://bigd.big.ac.cn/biocode/tools/4/releases/4 | false | all species |
lncRNAnet | Sequence | CNN and RNN | ORF | | https://github.com/nofundamental/lncRNAnet | false | all species |
lncADeep | Sequence | DBN | ORF, hexamer, Fickett, UTR, GC%, SS | | http://cqb.pku.edu.cn/ZhuLab/lncadeep/ | false | all species |
lncFinder | Sequence | RF, SVM, LR, ELM, DL | ORF, hexamer, SS, EIIP | http://bmbl.sdstate.edu/lncfinder | https://CRAN.R-project.org/package=LncFinder | false | all species |
Input: GM (gene model, mostly a GTF file), R (reference genome); Algorithm: SVM (support vector machine), LR (logistic regression), RF (random forest), REPTree (reduced error pruning tree), BRF (balanced random forest), GLM (generalized linear model), DBN (deep belief network), RNN (recurrent neural network), CNN (convolutional neural network), ELM (extreme learning machine), DL (deep learning); Features: consv (sequence conservation), SS (secondary structure), np (nucleotide patterns), MCSS (maximum coding subsequence), MLCDS (most-like coding domain sequence), Fickett (Fickett TESTCODE score), pI (isoelectric point), EIIP (electron–ion interaction pseudo-potential), socf (sequence-order correlation factors); Web: T (has a web server), F (no web server)
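
To narrow the collection above to tools that fit a given dataset, the table can be treated as a small list of records and filtered by species and by available input. The sketch below is illustrative only: it encodes a handful of rows from the table, and the `Tool` class, its field names and the `usable_tools` helper are hypothetical names introduced here, not part of ELIT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    needs_gene_model: bool  # True if the Input column lists only GM
    species: frozenset      # empty set stands for "all species"


# A few rows copied from the table above (not the full collection).
TOOLS = [
    Tool("CPC2", False, frozenset()),
    Tool("FEELnc", False, frozenset()),
    Tool("COME", True, frozenset({"human", "mouse", "fly", "worm", "Arabidopsis"})),
    Tool("iSeeRNA", True, frozenset({"human", "mouse"})),
    Tool("lncRScan-SVM", True, frozenset({"human", "mouse"})),
]


def usable_tools(species, have_gene_model):
    """Return names of tools that support the species and the available input."""
    names = []
    for tool in TOOLS:
        species_ok = not tool.species or species in tool.species
        input_ok = have_gene_model or not tool.needs_gene_model
        if species_ok and input_ok:
            names.append(tool.name)
    return names


print(usable_tools("mouse", have_gene_model=True))
print(usable_tools("zebrafish", have_gene_model=False))
```

With a mouse gene model available, all five example tools pass the filter; for zebrafish sequences without a gene model, only the sequence-based, all-species tools remain.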
Contact us
xqxia@ihb.ac.cn yduan94@ihb.ac.cn