Tuesday, 5 April 2016

Setting up a bioinformatics workbench on a university cluster

I use Miniconda to install a genomics workbench on our university cluster for my own use. It should be set up in a way that lets me reproduce the same setup on a separate computer if required. Our cluster already has Python 2.5 (on the CentOS 5 nodes) and Python 2.6 (on the CentOS 6 nodes) installed, but I need Python 3 for most of the tools I use (Snakemake in particular) and Python 2.7 for MACS2.

The required tools and their application areas are:

General NGS
- BWA, Bowtie2, FastQC, fastx_toolkit, HTSlib, Picard, Samtools

ChipSeq
- MACS2, Phantompeakqualtools, R::SPP

RNASeq
- Tophat, R::{edgeR, DESeq2, limma}

Statistical genomics
- JAGS, R::rjags, various bioconductor packages

Network analysis
- networkx, R::igraph

Scripting
- Python 3, R, Python 2.7 (for MACS2), RUnit, nose, numpy

Workflow management
- Snakemake, rpy2 (to run R from Snakemake), pysftp (to copy files to/from a storage server)

# git, vim, etc. are assumed to be present already

---------------
I can't install packages from the web within R on our cluster, so everything has to be installed from source tarballs using R CMD INSTALL <dependencies> <reqd_package>.
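A minimal shell sketch of that workflow, assuming the source tarballs have already been copied onto the cluster. The helper checks that every tarball exists before installing anything, then calls R CMD INSTALL on each one in turn, so dependencies must be listed before the packages that need them. The function name install_r_tarballs is hypothetical; the tarball names in the example are the coda/rjags ones used later in this post.

```shell
#!/usr/bin/env bash

# Hypothetical helper: install local R source tarballs in the order given.
# Nothing is fetched from the web, so dependencies must come first.
install_r_tarballs() {
  local pkg
  # Fail early if any tarball is missing, before installing anything.
  for pkg in "$@"; do
    if [ ! -f "${pkg}" ]; then
      echo "missing source tarball: ${pkg}" >&2
      return 1
    fi
  done
  # One R CMD INSTALL call per package keeps the install order explicit.
  for pkg in "$@"; do
    R CMD INSTALL "${pkg}" || return 1
  done
}

# Example (coda first, because rjags depends on it):
# install_r_tarballs coda_0.18-1.tar.gz rjags_4-6.tar.gz
```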

Most things can be obtained through Miniconda, and installing them that way should save me from having to work out dependencies myself. I use Miniconda rather than Anaconda because Anaconda is rather bloated for a quick setup/teardown setting.

----------------
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Accept the license and install to ${HOME}/miniconda3
# Log out and log back in
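The interactive installer asks whether to prepend the Miniconda install location to PATH in ~/.bashrc, which is presumably why logging out and back in is needed. Below is a sketch of doing that step by hand, assuming the ${HOME}/miniconda3 prefix chosen above. The guard stops the directory from being added twice if ~/.bashrc is sourced again. For an unattended install (for example, when repeating this on a second computer), the installer's batch mode, bash Miniconda3-latest-Linux-x86_64.sh -b -p "${HOME}/miniconda3", skips the prompts and does not edit ~/.bashrc, so a snippet like this is then needed.

```shell
#!/usr/bin/env bash
# Assumed install prefix, matching the location chosen in the installer.
CONDA_ROOT="${HOME}/miniconda3"

# Prepend ${CONDA_ROOT}/bin to PATH unless it is already there.
case ":${PATH}:" in
  *":${CONDA_ROOT}/bin:"*) ;;
  *) export PATH="${CONDA_ROOT}/bin:${PATH}" ;;
esac
```

Putting these lines in ~/.bashrc has the same effect as answering yes to the installer's PATH prompt.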

Add the channels for R and bioinformatics tools:
conda config --add channels r
conda config --add channels bioconda

I had to search for a conda channel that contained jags:
conda config --add channels trent

Set up the bioinformatics environment:
conda create -n bfx --dry-run --file bfx.conda.packages
conda create -n bfx --file bfx.conda.packages

Here bfx.conda.packages lists one package per line:
python
R
bwa
bowtie2
fastqc
fastx_toolkit
htslib
picard
samtools
r-spp
tophat
bioconductor-edger
bioconductor-deseq2
bioconductor-limma
networkx
r-runit
nose
numpy
snakemake
rpy2
pysftp
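To make the environment easy to rebuild on a second computer, the package list can be written out and dry-run from a single script. This is a minimal sketch that assumes it is run from the directory that should hold bfx.conda.packages. It only calls conda if conda is already on PATH, and then only as a dry run.

```shell
#!/usr/bin/env bash

# Write the package list used by `conda create --file` (one spec per line).
cat > bfx.conda.packages <<'EOF'
python
R
bwa
bowtie2
fastqc
fastx_toolkit
htslib
picard
samtools
r-spp
tophat
bioconductor-edger
bioconductor-deseq2
bioconductor-limma
networkx
r-runit
nose
numpy
snakemake
rpy2
pysftp
EOF

# Only touch conda if it is available; a dry run changes nothing.
if command -v conda >/dev/null 2>&1; then
  conda create -n bfx --dry-run --file bfx.conda.packages
else
  echo "conda not on PATH; wrote bfx.conda.packages only" >&2
fi
```

Once the environment exists, conda list -n bfx --export writes the resolved package versions in the same one-spec-per-line format. That file can be passed back to conda create --file on the second computer to pin the same versions.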

However, the trent channel only had jags-3.4.0, and I need jags-4.* to work with rjags etc., so I removed it from the environment:
conda remove --name bfx jags

--------
I was not able to install the following through this route:

Phantompeakqualtools (this is only a script, though)
R::igraph
R::rjags
jags
macs2

-------
Therefore, I installed JAGS independently of conda, set PKG_CONFIG_PATH to point to the directory holding its jags.pc file (lib/pkgconfig under the JAGS install prefix), and ran 'pkg-config jags' to check that it could be found. I then installed coda and rjags from source packages from within R in the bfx environment, using this install script:
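# Directory holding the downloaded source tarballs (placeholder path)
setwd(file.path('path', 'to', 'source', 'R-packages'))
# repos = NULL installs local files without resolving dependencies,
# so coda is listed before rjags, which depends on it
install.packages(
  c('coda_0.18-1.tar.gz', 'rjags_4-6.tar.gz'),
  type = 'source',
  repos = NULL
)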
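The JAGS environment setup can be sketched as follows. This assumes JAGS was built from source into a user-writable prefix. JAGS_PREFIX and its ${HOME}/local/jags default are hypothetical, and on some 64-bit systems the libraries are installed into lib64 rather than lib. The script prepends the prefix's pkgconfig directory to PKG_CONFIG_PATH and adds the library directory to LD_LIBRARY_PATH so that rjags can load libjags at run time. It then asks pkg-config for the installed version.

```shell
#!/usr/bin/env bash
# Hypothetical prefix passed to `./configure --prefix=...` when building JAGS.
JAGS_PREFIX="${JAGS_PREFIX:-${HOME}/local/jags}"

# pkg-config looks for jags.pc in the directories listed in PKG_CONFIG_PATH.
export PKG_CONFIG_PATH="${JAGS_PREFIX}/lib/pkgconfig${PKG_CONFIG_PATH:+:${PKG_CONFIG_PATH}}"

# The shared library must also be findable when rjags is loaded in R.
export LD_LIBRARY_PATH="${JAGS_PREFIX}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Report the JAGS version that rjags would be compiled against, if found.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists jags; then
  echo "found JAGS $(pkg-config --modversion jags)"
else
  echo "jags.pc not found under ${JAGS_PREFIX}/lib/pkgconfig" >&2
fi
```

Sourcing this (rather than running it) before starting R keeps both variables set in the current shell.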

------
Create a py27 environment for running MACS2.
Installing macs2 pulls in Python 2.7, numpy...
conda create -n py27.slim --dry-run macs2
conda create -n py27.slim macs2
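MACS2 lives in its own environment. One way to call it while the bfx environment is active (for example, from a Snakemake rule) is by its full path in that environment's bin directory, so no environment switching is needed. This sketch assumes the default ${HOME}/miniconda3 prefix and conda's usual envs/<name>/bin layout. run_macs2 is a hypothetical helper, and the file names in the example are placeholders.

```shell
#!/usr/bin/env bash
# Assumed Miniconda prefix; environments live under ${CONDA_ROOT}/envs/<name>.
CONDA_ROOT="${CONDA_ROOT:-${HOME}/miniconda3}"
MACS2="${CONDA_ROOT}/envs/py27.slim/bin/macs2"

# Hypothetical wrapper: run macs2 from py27.slim without activating it.
# The script's shebang already points at that environment's Python 2.7.
run_macs2() {
  if [ ! -x "${MACS2}" ]; then
    echo "macs2 not found at ${MACS2}" >&2
    return 1
  fi
  "${MACS2}" "$@"
}

# Example with placeholder file names:
# run_macs2 callpeak -t chip.bam -c input.bam -f BAM -g hs -n sample1 --outdir peaks
```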