ELIT: A website sharing an evaluation of lncRNA identification tools

About this project

Numerous tools for lncRNA identification have been developed, which makes choosing among them difficult. We aim to provide advice for a variety of situations.

General guide for users

(Models)

Tips for picking models
| Category | Detail category | Suggested software/models/tactics |
| --- | --- | --- |
| Species | general suggestion | Choose a model whose training species is genetically close to the species of your data, although this does not always yield the optimal model. For supported species, use COME and lncScore. Follow the results given in the section on different species. |
| | mammal | FEELnc_all_cl, CPC and hmmscan_A |
| | bird | FEELnc_all_cl, hmmscan_both and CPAT_mouse |
| | reptile | hmmscan_A, CPC and CPAT_mouse |
| | fish | CPC2, CPAT_zebrafish and CPC |
| | plant | CPAT_fly, FEELnc_all_cl and CPAT_mouse |
| | others | FEELnc_all_cl, CPC and CPAT_mouse |
| Data quality [1] | high | Focus on other criteria such as species and speed. |
| | low | COME, hmmscan, PLncPRO and CPAT |
| Number of records | small number (e.g. fewer than 5000) | FEELnc is not recommended because it lacks pre-trained models. |
| | large number (e.g. more than 5000) | Parallel running is recommended for PLncPRO, CNCI, CPC and hmmscan. |
| Joint prediction | vote | Voting across models can improve overall prediction accuracy. Select no more than 4 suitable models. |
| | rough | Apply this tactic when specificity is more important than sensitivity and overall accuracy. |
[1] High-quality transcripts are largely complete, e.g. transcripts curated manually or assembled from specialized sequencing such as PacBio or CAGE-seq. Low-quality transcripts are incomplete or misassembled, e.g. transcripts assembled from RNA-seq for routine differential expression analysis.
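The "vote" tactic under Joint prediction can be sketched as follows. This is a minimal illustration, not the procedure used by ELIT: the input layout (one dictionary of transcript ID to lncRNA call per model), the function name `vote`, the `min_votes` parameter and the example calls are all assumptions made for the sketch. Parsing each tool's real output into this layout is left out.

```python
from collections import Counter


def vote(predictions, min_votes=None):
    """Combine per-model calls for each transcript by voting.

    predictions: dict mapping model name -> dict of transcript ID -> bool
                 (True means the model called the transcript a lncRNA).
                 This layout is hypothetical, chosen for the sketch.
    min_votes:   number of models that must call a transcript a lncRNA;
                 defaults to a strict majority of the models supplied.
    """
    n_models = len(predictions)
    if min_votes is None:
        min_votes = n_models // 2 + 1  # strict majority

    counts = Counter()
    ids = set()
    for calls in predictions.values():
        ids.update(calls)
        for tid, is_lnc in calls.items():
            if is_lnc:
                counts[tid] += 1

    # A transcript missing from a model's output counts as no vote.
    return {tid: counts[tid] >= min_votes for tid in sorted(ids)}


# Invented example calls from three of the models named in the table.
calls = {
    "CPC2": {"tx1": True, "tx2": False, "tx3": True},
    "CPAT_mouse": {"tx1": True, "tx2": False, "tx3": False},
    "FEELnc_all_cl": {"tx1": True, "tx2": True, "tx3": False},
}
consensus = vote(calls)
print(consensus)
```

With an even number of models a 2-2 split is possible; under the strict-majority default used here such a transcript is not called a lncRNA. Raising `min_votes` makes the consensus stricter.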

Collections of software

(Installation)

Collection of software for lncRNA identification
| Software package | Input | Algorithm | Features | Online analysis | Binary/source | Reviewed | Supported species |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CPC | Sequence | SVM | ORF, consv | http://cpc.cbi.pku.edu.cn/programs/run_cpc.jsp | http://cpc.cbi.pku.edu.cn | true | all species |
| CPC2 | Sequence | SVM | Fickett, ORF, pI | http://cpc2.cbi.pku.edu.cn/ | http://cpc2.cbi.pku.edu.cn/ | true | all species |
| CNCI | Sequence | SVM | MLCDS | | http://www.bioinfo.org/software/cnci | true | all species |
| CPAT | Sequence/(GM and R) | LR | ORF, Fickett, hexamers | http://lilab.research.bcm.edu/cpat/ | https://sourceforge.net/projects/rna-cpat/files/ | true | all species |
| FEELnc | Sequence | RF | ORF, k-mer | | https://github.com/tderrien/FEELnc | true | all species |
| hmmscan | Sequence | Cut-off | SS | https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan | http://hmmer.org/download.html | true | all species |
| longdist | Sequence | SVM | np of ORF, ORF | | https://github.com/hugowschneider/longdist.py | true | all species |
| PLEK | Sequence | SVM | k-mer | | https://sourceforge.net/projects/plek/files/ | true | all species |
| PLncPRO | Sequence | RF | ORF, consv | | http://ccbb.jnu.ac.in/plncpro | true | all species |
| RNAplonc | Sequence | REPTree | k-mer, ORF | | https://github.com/TatianneNegri/RNAplonc | true | all species |
| COME | GM | BRF | GC%, conservation, SS | | https://github.com/lulab/COME | true | human, mouse, fly, worm and Arabidopsis |
| iSeeRNA | GM | LR | ORF, di-mer, tri-mer, consv | http://sunlab.cpy.cuhk.edu.hk/iSeeRNA/webserver.html | https://sunlab.cpy.cuhk.edu.hk/iSeeRNA/download.html | true | human, mouse |
| lncRScan-SVM | GM | SVM | ORF, tri-mer, exon, consv | | https://sourceforge.net/projects/lncrscansvm/files/ | true | human, mouse |
| lncScore | Sequence and GM | LR | ORF, exon, MCSS | | https://github.com/WGLab/lncScore | true | human, mouse |
| BASiNET | Sequence | SVM | topological measurements of network | | https://cran.r-project.org/package=BASiNET | false | all species |
| PLIT | Sequence | RF | ORF, GC%, Fickett, hexamers, codon bias | | https://github.com/deshpan4/PLIT | false | all species |
| LGC | Sequence | GLM | feature relationship: LGC (ORF length and GC content) | http://bigd.big.ac.cn/lgc/calculator | https://bigd.big.ac.cn/biocode/tools/4/releases/4 | false | all species |
| lncRNAnet | Sequence | CNN and RNN | ORF | | https://github.com/nofundamental/lncRNAnet | false | all species |
| lncADeep | Sequence | DBN | ORF, hexamer, Fickett, UTR, GC%, SS | | http://cqb.pku.edu.cn/ZhuLab/lncadeep/ | false | all species |
| lncFinder | Sequence | RF, SVM, LR, ELM, DL | ORF, hexamer, SS, EIIP | http://bmbl.sdstate.edu/lncfinder | https://CRAN.R-project.org/package=LncFinder | false | all species |
Input: GM (gene model, mostly a GTF file), R (reference genome).
Algorithm: SVM (support vector machine), LR (logistic regression), RF (random forest), REPTree (reduced error pruning tree), BRF (balanced random forest), DBN (deep belief network), RNN (recurrent neural network), CNN (convolutional neural network), ELM (extreme learning machine), DL (deep learning).
Features: consv (sequence conservation), SS (secondary structures), np (nucleotide patterns), MCSS (maximum coding subsequence), MLCDS (most-like coding domain sequence), Fickett (Fickett TESTCODE score), pI (isoelectric point), EIIP (electron–ion interaction pseudo-potential), socf (sequence-order correlation factors).
Web: T (has a web server), F (has no web server).

Contact us

xqxia@ihb.ac.cn yduan94@ihb.ac.cn