WebArray 1.4

WebArray 1.4 is a combination of WebArray, WebArrayDB, and CellPred. This file
will help you install it. PLEASE read the whole document before installing.

Contents:
    A. Requirements
    B. Steps to install WebArray/WebArrayDB on POSIX systems
    C. Steps to install WebArray/WebArrayDB on Windows
    D. FAQ and Tips

*************************************************************************
*                            A. Requirements                            *
*************************************************************************

1. Unix-like OS or Win32 platform
2. Apache
3. MySQL (>= 4.1, with the "GROUP_CONCAT" function for cross-platform analysis)
4. R/Bioconductor
5. Python (>= 2.3)
6. Ghostscript (ESP Ghostscript 815.04 under Linux and AFPL Ghostscript 8.51
   under Windows were tested; other versions will most likely work as well.)

Apache:
    Apache should be configured to run Python scripts as CGI, preferably with
    mod_python.

MySQL:
    You need the privilege to create databases and add users.

R/Bioconductor:
    1. R should be built as a shared library, i.e. configure it with the
       option "--enable-R-shlib" before making R. Otherwise you cannot use
       RPy.
       -- NOT required any more since RPy is not used!
    2. Bioconductor (http://www.bioconductor.org) packages:
       scatterplot3d, limma, vsn, affy, sma, statmod, siggenes, ade4, made4,
       and AnnotationDbi.
       made4 requires scatterplot3d, so download and install a suitable
       version of scatterplot3d first, e.g.
           # wget http://cran.r-project.org/src/contrib/Archive/scatterplot3d/scatterplot3d_0.3-23.tar.gz
           # R CMD INSTALL scatterplot3d_0.3-23.tar.gz
       or install it directly in R:
           > install.packages('scatterplot3d')
       The rest can be installed with two commands in R:
           > source('http://www.bioconductor.org/biocLite.R')
           > biocLite(c('limma', 'vsn', 'affy', 'statmod', 'siggenes', 'ade4', 'made4', 'AnnotationDbi'))
    3. Other R packages:
       (1) RMySQL (if more than one MySQL version is installed, you need to
           point to the one in use, e.g.
           R CMD INSTALL --configure-args='--with-mysql-inc=/usr/include/mysql --with-mysql-lib=/usr/lib64/mysql' RMySQL_0.5-11.tar.gz)
       (2) multcomp (needs mvtnorm)
       (3) muStat (needs muUtil and muS2RC)
       (4) snow (Rmpi or rpvm should be installed too if you want to use MPI
           or PVM for parallel computation; otherwise snow will use SOCK mode
           instead. NOTE that snow is only available on POSIX systems like
           Linux.)
       (5) gplots (for heatmaps)
       These can be installed in R:
           > install.packages(c('RMySQL', 'multcomp', 'muStat', 'snow', 'gplots'))  # don't include 'snow' on Windows
       (6) sma
           > install.packages('remotes')
           > remotes::install_github('gnyamundanda/sma')

Python modules:
    1. MySQLdb (http://sourceforge.net/projects/mysql-python, version >= 1.2.1)
    2. pycrypto (http://www.amk.ca/python/code/crypto.html)
    3. Karrigell. The necessary modules are already packed in WebArray, so
       you do not need to do anything about it.

    An easy way to install Python modules is to use "Easy Install"
    (http://peak.telecommunity.com/DevCenter/EasyInstall).
    If "Easy Install" is already installed, just run this command at a console:
        # easy_install MySQL-python pycrypto
    else:
        (1) download ez_setup.py from http://peak.telecommunity.com/dist/ez_setup.py
        (2) run this command at a console:
            # python /PATH_TO/ez_setup.py MySQL-python pycrypto

NOTICE: On Windows, please make sure that R, Python, and Ghostscript
(gswin32c.exe) are in your search PATH.

*************************************************************************
*        B. Steps to install WebArray/WebArrayDB on POSIX systems       *
*************************************************************************

1. Download the WebArray package: WebArray-X.X.tar.gz (version >= 1.3)

2. Unpack it:
    # tar xzf WebArray-X.X.tar.gz

3. Put the files in the correct locations:
    You now have a folder "WebArray-X.X" with two sub-directories,
    "webarray-X.X" and "cgi-bin/webarray-X.X", containing pages and CGI scripts.
    Their destination directories are "/var/www/html" and "/var/www/cgi-bin"
    respectively (or "/home/Your_ID/public_html" and
    "/home/Your_ID/public_html/cgi-bin" for personal use):
        # cd WebArray-X.X
        # mv webarray-X.X /var/www/html
        # mv cgi-bin/webarray-X.X /var/www/cgi-bin
        # chown -R apache: /var/www/cgi-bin/webarray-X.X

    Change the "index.html" file if you install WebArray under
    "/home/Your_ID/public_html":
        # cd /home/Your_ID/public_html/webarray-X.X
        # cp index_user.html index.html

    Make the links on the web pages work by copying the package:
        # cp WebArray-X.X.tar.gz /var/www/html/webarray-X.X/sources/X.X

    If you prefer http://localhost/webarray, you may create links with the
    commands below. Make sure that your Apache has been configured to follow
    symbolic links for these directories (/var/www/html/ and /var/www/cgi-bin/).
        # ln -s /var/www/html/webarray-X.X /var/www/html/webarray && ln -s /var/www/cgi-bin/webarray-X.X /var/www/cgi-bin/webarray
    Then edit the file /var/www/html/webarray-X.X/index.html: replace
    "webarray-X.X" with "webarray".

4. Create the user directory and the system data directory, create RSA keys
   (RSA keys are OPTIONAL, for a safer logon), and perform some other optional
   operations.
    The user directory is used to store user-uploaded data and analysis
    results, while the system data directory holds the files deposited to the
    database.
        # cd /var/www/html
        # mkdir webarray_users webarray_files
        # chown apache: webarray_users
        # chmod 0775 webarray_users
        # cd webarray-X.X
        # ln -s ../webarray_users users
        # ln -s ../webarray_files webarraydb
        # /var/www/cgi-bin/webarray/initRSA

    By default, WebArray uses the POSIX command "sendmail" to send mail to
    users. If you want to use SMTP instead, set up an SMTP account by
    generating a file named MAIL as follows, then edit it with your own
    account information:
        # cp MAIL_example MAIL

5. Create databases:
        # cd /var/www/cgi-bin/webarray-X.X
        # cp -p DBSRC_EXAMPLE DBSRC
    Then edit the file 'DBSRC' to fit your case; typically, change the password.
    If you have the privilege to create MySQL databases, you may go forward
    (use -p only if a password is required for the MySQL manager):
        # ./mkDB [-d /var/www/html/webarray/webarraydb] -u mysql_manager_id -p
    Now the database has been created. An encrypted version of 'DBSRC', named
    'DB', has been created too, and the original file 'DBSRC' has been deleted.

    Other specialized databases can be created:
        # ./mkmpmdb [-n dbname] [-t tb_definition_file (TABLES_XXX.txt)] [-d data_file_directory] [-u MySQL_Manager_Name] [-p]

6. Parallel computation. (OPTIONAL)
    WebArrayDB attempts to use multiple processors/cores on a Linux machine
    (SMP), but some settings might be necessary.

    If the R package "snow" runs in "SOCKET" mode:
    (1) In the file "DAEMON_HOST", there is a line:
            USER_NAME = 'apache'
        Change the user name 'apache' to one that has the privilege to use
        the server by "ssh".

    For a Linux cluster:
    (1) All software/packages must be usable on all nodes.
    (2) In the file "DAEMON_HOST", there is a line:
            USER_NAME = 'apache'
        Change the user name 'apache' to one that has the privilege to use
        the cluster. IMPORTANT: if you want each node to be able to run jobs
        separately, you need to create a file "CLUSTER_NODES" (see below).
        If you disable the variable "JOB_ON_LOCAL_ONLY" by assigning it the
        value "False", other nodes are allowed to run jobs alone; in that
        case the other nodes should share the folders "/var/www" and
        "/var/log/webarraydb" with the host node of the computation daemon.
        You also need to change the cluster user's initial group to "apache"
        (the same as "GRP_NAME" in the file "DAEMON_HOST"):
            # usermod -g apache cluster_user_name
    (3) The "snow" package in R may use "MPI", "PVM" or "SOCK" mode for
        parallel computation. The function "getClusterOption('type')" will
        tell you which mode is set as default.
        Different preparations are needed depending on the mode "snow" is
        configured for:
        (a) For SOCK mode, you need to create a plain TEXT file named
            "CLUSTER_NODES" in the directory "/var/www/cgi-bin/webarray-X.X"
            to list the nodes (one node name per line; the same node name can
            be repeated according to the number of its CPU cores you want to
            use). Note that the user must be able to ssh among nodes (even if
            there is only one node) without having to type a password!
            By default, WebArrayDB limits the computation running in parallel
            to within a node. If you want a job to run throughout all the
            nodes, change the value of "PARALLEL_IN_NODE_ONLY" to "False" in
            the file "DAEMON_HOST".
        (b) For MPI, you can manually initialize all nodes for MPI with the
            "lamboot" command; otherwise WebArrayDB will run
            "lamboot CLUSTER_NODES" itself.
        (c) For PVM, you need to launch the PVM virtual machine and add all
            nodes you want to use to the virtual machine.

7. As root, run the daemon for computation:
        # cd /var/www/cgi-bin/webarray-X.X
        # ./analyze_dmp
    To kill the daemon:
        # ./analyze_dmp --stop
    or
        # kill `ps -a | grep analyze_dmp | cut -d' ' -f1`

    You may change the way WebArray/WebArrayDB schedules jobs with the command
    analyze_dmp. To get help on its usage:
        # ./analyze_dmp --help

    If you want to run it automatically, add these lines to /etc/rc.d/rc.local:
        /sbin/service mysqld start
        /var/www/cgi-bin/webarray-X.X/analyze_dmp
    (If you want to use a python that is not in /usr/bin, e.g. one in
    /usr/local/bin, and the command "strings /proc/1/environ" does show that
    /usr/local/bin is prior to /usr/bin, you need to change the PATH first;
    the second line should then be:
        PATH=/usr/local/bin:$PATH /var/www/cgi-bin/webarray-X.X/analyze_dmp
    )
    and add a link to R under /var/www/cgi-bin/webarray-X.X:
        # ln -s `which R` /var/www/cgi-bin/webarray-X.X/R

    You may want to run it as a service by writing some scripts in /etc/rc.d too.
    That is fine, but bear in mind that analyze_dmp should be launched from
    the directory "/var/www/cgi-bin/webarray-X.X" since it depends on some
    other modules there.

8. GO integration. (OPTIONAL)
    To use GO (Gene Ontology) terms to search probes in WebArrayDB, you need
    to download and install a GO database (MySQL version). Tell WebArrayDB
    about it with the "--GO" option when making the database:
        # ./mkmpmdb [-n dbname] [-t tb_definition_file (TABLES.txt)] [-d data_file_directory] [--GO=GO_dbname] [-u MySQL_Manager_Name] [-p]
    or tell WebArrayDB later with the command 'addGO' if you have already
    made the databases:
        # ./addGO [-u MySQL_Manager_Name] [-p] GO_dbname

9. Demo data and tutorial integration. (OPTIONAL)
    You may optionally put the demo data and tutorial videos on your server:
    (a) Download the zipped demo data packages and put them in
        /var/www/html/webarray-X.X
    (b) Download the file "tutorial-X.X.tar.gz", decompress it, and put all
        extracted files into the folder /var/www/html/webarray-X.X/tutorial

10. Some other optional modifications:
    (a) At the beginning of the script "DAEMON_HOST", change the email
        address and bug-reporting message to the ones you will use.
    (b) In the web page file "intro.html", change the contact email and
        other related information.

*************************************************************************
*          C. Steps to install WebArray/WebArrayDB on Windows           *
*************************************************************************

Note: you need administrator privileges.

1. Download the WebArray package: WebArray-X.X.tar.gz (version >= 1.3)

2. Unpack it (WinRAR will do).

3. Put the files in the correct locations:
    After unpacking you get a folder "WebArray-X.X" with two sub-directories,
    "webarray-X.X" and "cgi-bin/webarray-X.X", containing pages and CGI
    scripts. Their destination directories are determined by the location of
    Apache.
    E.g., typically for Apache 2.2, they are respectively
        "C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\"
    and
        "C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin\"
    The page folder would then be
        "C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\webarray-X.X",
    and the CGI folder would be
        "C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin\webarray-X.X".
    You may rename both folders from "webarray-X.X" to "webarray" if you
    like, but remember to make the same change in the file "index.html" by
    replacing "webarray-X.X" with "webarray".

    Make the links on the web pages work by copying the downloaded package
    "WebArray-X.X.tar.gz" to
        "C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\webarray-X.X\sources\X.X"

4. Adapt the CGI code for Windows by double-clicking the file
   "modCode4Win.py" in the CGI folder.

5. Create databases:
    In the CGI folder, copy or rename the file "DBSRC_EXAMPLE" to "DBSRC" and
    edit it to fit your case; typically, change the password.
    If you have the privilege to create MySQL databases, you may go forward
    (use -p only if a password is required for the MySQL manager):
        C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin\webarray-X.X> python mkDB -u mysql_manager_id -p
    Now the database has been created. An encrypted version of 'DBSRC', named
    'DB', has been created too, and the original file 'DBSRC' has been deleted.

    Other specialized databases can be created:
        # python mkmpmdb [-n dbname] [-t tb_definition_file (TABLES_XXX.txt)] [-d data_file_directory] [-u MySQL_Manager_Name] [-p]

6. Parallel computation.
    WebArray/WebArrayDB can run several jobs in parallel with the proper
    settings in the next step, but cannot do parallel computation within a
    single job, since the "snow" package is not available in R for Windows.

7.
   Run the daemon for computation by double-clicking the file
   "analyze_dmp.py" or with the command:
        C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin\webarray-X.X> analyze_dmp
    To stop it:
        C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin\webarray-X.X> analyze_dmp --stop

    You may change the way WebArray/WebArrayDB schedules jobs with the command
    analyze_dmp. To learn more, use the command:
        C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin\webarray-X.X> analyze_dmp -h

8. GO integration. (OPTIONAL)
    To use GO (Gene Ontology) terms to search probes in WebArrayDB, you need
    to download and install a GO database (MySQL version). Tell WebArrayDB
    about it with the "--GO" option when making the database:
        # python mkmpmdb [-n dbname] [-t tb_definition_file (TABLES.txt)] [-d data_file_directory] [--GO=GO_dbname] [-u MySQL_Manager_Name] [-p]
    or tell WebArrayDB later with the command 'addGO' if you have already
    made the databases:
        # python addGO [-u MySQL_Manager_Name] [-p] GO_dbname

9. Tutorial integration. (OPTIONAL)
    You may optionally put the tutorial videos on your server: download the
    file "tutorial-X.X.tar.gz", decompress it, and put all extracted files
    into the "tutorial" folder under the page folder, e.g.
        "C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\webarray-X.X\tutorial"

10. Some other optional modifications:
    (a) At the beginning of the script "DAEMON_HOST", change the email
        address and bug-reporting message to the ones you will use.
    (b) In the web page file "intro.html", change the contact email and
        other related information.

*************************************************************************
*                            D. FAQ and Tips                            *
*************************************************************************

    1. Symbolic link of WebArray (linked to WebArray-X.X) doesn't work on CentOS servers
    2. How to update from WebArray 1.0 or 1.1 to WebArray 1.3
    3. Details of the file "CLUSTER_NODES"
    4. How to run WebArrayDB alongside WebArray
    5. How to update an existing WebArray/WebArrayDB with a newer release of the same version?

1.
   Symbolic link of WebArray (linked to WebArray-X.X) doesn't work on CentOS servers
    On CentOS, if you want to make a symbolic link under the CGI directory,
    you may need to change the options in the related section of Apache's
    configuration file (/etc/httpd/conf/httpd.conf).
    The original section may look like:
            AllowOverride None
            Options None
            Order allow,deny
            Allow from all
    and should be changed to:
            AllowOverride None
            Options FollowSymLinks
            Order allow,deny
            Allow from all

2. How to update from WebArray 1.0 or 1.1 to WebArray 1.3
    (a) Just install the new version as described above, either in the same
        directories as the old version or in another directory.
    (b) If you decide to create a different MySQL database (defined in the
        file DBSRC) but want to keep all user information, just copy two
        tables ("users" and "requests") from the old version to the new
        version.
    (c) Users' files and results can be kept by reusing the old user
        directory at Step 4.

3. Details of the file "CLUSTER_NODES"
    CLUSTER_NODES is needed only if you want to run WebArrayDB on a Linux
    cluster (a group of Linux servers/nodes). For example, suppose there are
    two Linux machines, linode1 and linode2, each with 4 CPU cores. If you
    want WebArrayDB to use 2 cores of linode1 and 3 cores of linode2, the
    file "CLUSTER_NODES" should have 5 lines:
            linode1
            linode1
            linode2
            linode2
            linode2

4. How to run WebArrayDB alongside WebArray
    Although WebArray-1.3 is functionally a combination of WebArray and
    WebArrayDB, people may still want to keep a pure WebArray
    (WebArray <= 1.1). The following steps may help:
    (a) PORT for the computation daemon
        In the file "DAEMON_HOST" in WebArrayDB, there is a line:
            port = 1970
        Change 1970 to something different, e.g. 1971.
    (b) Database name
        Before creating the database, change the database name in the file
        "DBSRC". There is a line:
            db = 'webarray'
        Change it to
            db = 'webarraydb'
        or something else.
    (c) User folder
        When creating the user folder, you may use a different name as well,
        e.g. "webarraydb_user".

5.
   How to update an existing WebArray/WebArrayDB with a newer release of the same version?
    Usually there are few changes between different releases of the same
    version. Simply replace all files with those in the new release, then
    make sure to:
    (a) restore the password (the variable "rc5_passwd") in the file
        "tools.py" if you changed it before.
    (b) create the database (see Step 5). This will not destroy existing
        database tables; tables are created only if they do not exist. User
        accounts like "guest" and "demo" are created as well if they do not
        exist.
        IMPORTANT: If the structure of an existing table has changed (a
        comparison of the two versions of TABLES.txt will tell), there are
        two scripts to upgrade it, "modColumns.py" and "modTables.py".
        "modColumns.py" can be used to change column names and definitions;
        "modTables.py" can be used to add new columns or change existing
        definitions. Running these scripts without parameters prints their
        usage information.
        The structure of the database does not change often, but sometimes
        it does. A recent example: the "probe" table was changed after
        Sep 26, 2008; the following commands will update it:
            # ./modColumns.py [-u MySQL_Manager_Name] [-p] -d webarray -t probe -s 'chr_start,chr_end,sequence' -o 'probe_start,probe_end, probe_sequence'
            # ./modTables.py [-u MySQL_Manager_Name] [-p] -d webarray -t TABLES.txt probe
    (c) restore the user name (changed from "apache") for parallel
        computation (see Step 6).
    (d) reapply other changes (see Step 10).

6. How to enable WebArray/WebArrayDB to read new types of Affymetrix GeneChips?
    When R reads a new type of Affymetrix GeneChip, it tries to fetch the
    meta-data from Bioconductor and install it into R's library. The daemon
    therefore needs the privilege to write to R's library directory (usually
    "/usr/lib/R/library"). One way to address this is to set the directory's
    group to that of the daemon (i.e. "apache"):
        # chown root:apache /usr/lib/R/library
        # chmod g+rwx /usr/lib/R/library

Xiao-Qin Xia
Mon Sep 20 10:31:06 PDT 2010
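
Appendix: sketch for writing "CLUSTER_NODES"

The "CLUSTER_NODES" format described in Step 6 of section B and in FAQ 3
(one node name per line, repeated once per CPU core to use) is easy to
generate with a few lines of Python. The following is only a minimal sketch
and is not part of WebArray: the function name cluster_nodes_text is
hypothetical, and linode1/linode2 are the example hosts from FAQ 3.

```python
import sys


def cluster_nodes_text(cores_per_node):
    """Build CLUSTER_NODES content from (node_name, cores_to_use) pairs.

    Hypothetical helper, not shipped with WebArray: each node name is
    written on its own line, once for every core that should be used.
    """
    lines = []
    for node, cores in cores_per_node:
        lines.extend([node] * cores)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example from FAQ 3: 2 cores of linode1 and 3 cores of linode2.
    sys.stdout.write(cluster_nodes_text([("linode1", 2), ("linode2", 3)]))
```

Redirect the output to the file "CLUSTER_NODES" in the CGI directory
(/var/www/cgi-bin/webarray-X.X) and check it by hand before starting the
daemon.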