--------------------------
A)
Apps:
Set up my Thunderbird account to point to uni emails
Installed Chromium from Ubuntu Software
Removed Firefox / Amazon from the taskbar
TODO: install the Thunderbird calendar plugin (? Lightning plugin - see here)
Printing set up as described here
B)
Added 'tools' to home dir
cd
mkdir tools
Basic command line tools and .bashrc setup:
Installed vim
sudo apt-get install vim ### installs vim / vim-runtime
vim was then set as the default editor:
sudo update-alternatives --config editor # selected vim.basic from the options
Changed default 'alias ll="ls -alF"' to 'alias ll="ls -l"' in .bashrc
Changed default PS1 definition (\u@\t instead of \u@\h):
...
# For colour prompt (eg, standard terminal)
PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\t\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
# For non-colour prompt (eg, in tmux)
PS1='${debian_chroot:+($debian_chroot)}\u@\t:\w\$ '
...
Already installed:
awk / sed / perl5.22 / python2.7.12 / python3.5.2 / make4.1 / curl 7.50.1 / wget 1.18 / ssh / scp / gzip (etc) / grep / sqlite3
To be installed:
java / sbt / git / tmux
sudo apt-get install git ### git2.9.3 also git-man / liberror-perl
git config --global user.name "*** **"
git config --global user.email "***_***@***.**"
sudo apt-get install tmux # installed tmux 2.2
See here for info about tmux
Installation of java
sudo apt-get install openjdk-8-jdk
Installation of sbt
echo "deb https://dl.bintray.com/sbt/debian /" |\
sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 \
--recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
sudo apt-get update
sudo apt-get install sbt
sudo apt-get install scala # so I can run scala .jars at CLI
Installation of intellij with scala and vim plugins (within ~/tools/intellij/...); downloaded from website
sudo apt-get install lsb-core
Installed gnome-tweak-tool so that I could disable the caps-lock
sudo apt-get install gnome-tweak-tool
# disabled caps-lock
Installed icedtea plugin so that I can use java webstart on IGV etc
sudo apt-get install icedtea-8-plugin
C)
cd ~/tools
touch README.sh # code to set up bfx tools added here
BFX Workflow tools:
Currently don't have R / conda / lyx / jupyter / snakemake / mendeley installed, nor any of the standard NGS analysis kit (picard / tabix / samtools / hisat / bwa etc...)
gnu-parallel can also be installed from bioconda
First install miniconda (py3.5 version):
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
# wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86.sh # used on 32 bit
bash Miniconda3-latest-Linux-x86*.sh
# accepted license; installed to /home/<me>/tools/miniconda3; added path to miniconda3 to ~/.bashrc
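The line the installer appends to ~/.bashrc should look something like this (path as chosen during the install above):

```
# added by Miniconda3 installer
export PATH="/home/<me>/tools/miniconda3/bin:$PATH"
```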
Reopened terminal
conda list # (conda 4.2.12, conda-env 2.6.0, ...)
conda update conda
conda list # (conda 4.2.12, conda-env 2.6.0, ..., python 3.5.2, ...)
Added channels for R, conda-forge, bioconda, biobuilds (a bunch of biofx tools) and my own channel. Did this in the wrong order to begin with, which meant that the channel 'r' was searched last and my own channel was searched first:
# conda config --add channels r
# conda config --add channels conda-forge
# conda config --add channels bioconda
# conda config --add channels biobuilds
# conda config --add channels russh
Prefer the idea of searching R, conda-forge and bioconda before biobuilds and my own channel (note that conda-forge, defaults, r and bioconda are added in the order specified in the bioconda docs).
conda config --add channels russh
conda config --add channels biobuilds
conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels r
conda config --add channels bioconda
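Since conda config --add prepends each channel to the top of the list, the resulting ~/.condarc should end up with the channels in reverse order of the commands above (highest priority first):

```yaml
channels:
  - bioconda
  - r
  - defaults
  - conda-forge
  - biobuilds
  - russh
```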
Set up a global environment for general / exploratory work: requires snakemake, R, bioconductor. To make/use snakemake environment.yaml files, we need conda-env.
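For reference, a minimal conda-env environment.yaml might look like the following sketch (the environment name and package list are hypothetical); it would be instantiated with conda env create -f environment.yaml:

```yaml
name: bfx-global
channels:
  - bioconda
  - r
dependencies:
  - snakemake
  - r-essentials
  - bioconductor-limma
```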
conda install anaconda-client
conda install -c bioconda \
snakemake
# (installs several other packages)
TODO: Split these packages into globally required packages and work-package specific requirements
Couldn't install snakemake on my 32bit home computer using the same approach
# pip install snakemake ## used on 32bit
#
# Installed base R and a few important packages
# - Note that the most up-to-date R-base is not appropriate (15/11/2016) since despite the availability of libcairo2-dev, etc, it installs with capabilities('cairo') == FALSE, and hence can't draw transparencies in ggplot (for example; see here).
conda install -c r \
    r-base="3.3.1 1" \
    r-devtools \
    r-gdata \
    r-gridextra \
    r-roxygen2 \
    r-testthat \
    rpy2 \
    r-essentials
conda install -c r rstudio
# checked that capabilities('cairo') == TRUE after each step
conda install -c bioconda \
bioconductor-biobase \
bioconductor-geoquery \
bioconductor-limma \
bioconductor-org.hs.eg.db \
bioconductor-org.mm.eg.db \
bioconductor-s4vectors
# All the above bioconda installs are absent from conda for 32bit (TODO - add 32bit version of these to my conda channel)
conda install -c bioconda \
samtools \
picard \
htslib
conda install -c biobuilds \
tabix
# Some workpackage-specific libraries
# TODO - workpackage-specific conda channel
conda install -c bioconda \
r-ggally \
r-gplots \
bioconductor-pcamethods
To run snakemake.remote.HTTP we require python packages 'boto', 'moto', 'filechunkio', 'pysftp', 'dropbox', 'requests', 'ftputil'
conda install boto \
dropbox \
filechunkio \
ftputil \
moto \
pysftp \
requests
Installs the above, and updates `requests` to a version with an identical version number.
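As a sketch of how these end up being used, a Snakefile rule pulling a remote file via snakemake.remote.HTTP looks something like this (the URL, rule name and filenames are hypothetical; API as of snakemake 3.x):

```
from snakemake.remote.HTTP import RemoteProvider as HTTPRemoteProvider

HTTP = HTTPRemoteProvider()

rule fetch_annotation:
    input:
        # remote() makes snakemake download the file before the rule runs
        HTTP.remote("www.example.org/annotation.gtf.gz", keep_local=True)
    output:
        "data/annotation.gtf.gz"
    shell:
        "mv {input} {output}"
```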
conda install anaconda-build \
conda-build
conda skeleton cran --recursive moments
conda build r-moments
anaconda upload \
${MINICONDA}/conda-bld/linux-64/r-moments-0.14-r3.3.1_0.tar.bz2
This uploaded r-moments to my (russH) anaconda account, from which it could be installed into my miniconda R:
conda install -c russh r-moments
However, I was not able to upload doBy using the same mechanism - the conda skeleton calls eventually ran cleanly (after it urged me to install CacheControl and then lockfile, which I did), but the conda build step failed:
conda skeleton cran --recursive doBy
# warning =>
rm -R r-doby
conda install cachecontrol
conda skeleton cran --recursive doBy
# error =>
# "In order to use the FileCache you must have lockfile installed."
conda install lockfile
rm -R r-doby
conda skeleton cran --recursive doBy
conda build r-doby
# had it worked, this would have installed r-base "3.3.1_5" and a bunch of other packages without giving me a chance to prevent it - thankfully I didn't lose 'cairo' again. And the error message was:
+ source <CONDA>/bin/activate <CONDA>/conda-bld/r-doby_1479296072493/_b_env_placehold_pl
+ mv DESCRIPTION DESCRIPTION.old
+ grep -v '^Priority: ' DESCRIPTION.old
+ <CONDA>/conda-bld/r-doby_1479296072493/_b_env_placehold_pl/bin/R CMD INSTALL --build .
<CONDA>/conda-bld/r-doby_1479296072493/_b_env_placehold_pl/bin/R: 12: [: Linux: unexpected operator
<CONDA>/conda-bld/r-doby_1479296072493/_b_env_placehold_pl/lib/R/bin/R: 12: [: Linux: unexpected operator
* installing to library ‘<CONDA>/conda-bld/r-doby_1479296072493/_b_env_placehold_pl/lib/R/library’
Error: error reading file '<CONDA>/conda-bld/r-doby_1479296072493/work/doBy/DESCRIPTION'
Command failed: /bin/bash -x -e <CONDA>/conda-bld/r-doby_1479296072493/work/doBy/conda_build.sh
... and there's a bizarre directory called _b_env_placehold_placehold_<ad_infinitum> in the r-doby_1477.../ directory
<I rewrote my work package to use lapply(split(), f()) instead of doBy and dplyr::arrange_(df, colname) instead of doBy::orderBy>
The version of GEOquery that I downloaded from bioconda was 2.38.4. This caused some issues with one of my scripts since it didn't download GPL annotation files from NCBI. The cause of this was 'http' URLs being hardcoded into GEOquery::2.38 whereas these http sites all fail when accessing the NCBI URLs (NCBI have updated to https). The latest version of GEOquery (2.40 as of 17/11/2016) has the 'https' URLs included, but is not yet available on bioconda. Having little experience with anaconda/bioconda I tried to get GEOquery uploaded into my bit of anaconda cloud using anaconda upload. Some initial errors (~ identical to those for r-doBy) were observed, but these seem to result from using filenames that are too long.
i) downloaded and unzipped the bioconda github repository:
cd ~/tools
wget https://github.com/bioconda/bioconda-recipes/archive/master.zip
unzip master.zip
ii) setup a conda environment for building/uploading/installing bioconductor packages
cd bioconda-recipes-master/scripts/bioconductor/
conda create --name bioconductor-recipes --file requirements.txt
source activate bioconductor-recipes
iii) tried setting up the build recipe for GEOquery:
./bioconductor_skeleton.py --recipes ../../recipes GEOquery
# failed - lack of pyaml in requirements.txt
conda install pyaml
./bioconductor_skeleton.py --recipes ../../recipes GEOquery
# did nothing, since bioconductor-geoquery was already in ../../recipes
./bioconductor_skeleton.py --recipes ../../recipes GEOquery --force
# finally made the recipe for geoquery
iv) Build geoquery
cd ../../recipes
conda build bioconductor-geoquery
# failed with bizarre placehold_placehold_... directory put into ~/tools/miniconda3/conda-bld (as above for doBy)
Searched on conda bugs at github, found the source of this error may be that my filename is too long while trying to build the package. Therefore specified a shorter dirname to output the conda build to:
conda build bioconductor-geoquery --croot ~/temp
Package builds fine.
v) Therefore uploaded to anaconda-cloud:
cd ~/temp
anaconda upload ./linux-64/bioconductor-geoquery-2.40.0-r3.3.1_0.tar.bz2
source activate <...>
conda install bioconductor-geoquery=2.40
SWOOP! - It's now available in my work package.
So I repeated it for :
ArrayExpress;
ArrayExpress' dependency bioconductor-oligo;
oligo's dependencies affxparser, oligoClasses and r-ff
since none of these are available on bioconda / anaconda's main channels
conda config --add channels terradue
conda install -c terradue r-ff # installs r-ff and r-bit
# note that I couldn't build r-ff on my ubuntu box
# terradue must be an available channel, so that oligoClasses installs
source activate bioconductor-recipes
cd ~/tools/bioconda-recipes-master/scripts/bioconductor/
./bioconductor_skeleton.py --recipes ../../recipes affxparser
cd ../../recipes
conda build bioconductor-affxparser --croot ~/temp
cd ~/temp
anaconda upload ./linux-64/bioconductor-affxparser
cd ~/tools/bioconda-recipes-master/scripts/bioconductor/
./bioconductor_skeleton.py --recipes ../../recipes oligoClasses
cd ../../recipes
conda build bioconductor-oligoclasses --croot ~/temp
cd ~/temp
anaconda upload ./linux-64/bioconductor-oligoclasses
cd ~/tools/bioconda-recipes-master/scripts/bioconductor/
./bioconductor_skeleton.py --recipes ../../recipes oligo
cd ../../recipes
conda build bioconductor-oligo --croot ~/temp -c russh -c terradue
cd ~/temp
anaconda upload ./linux-64/bioconductor-oligo-1.38.0-r3.3.1_0.tar.bz2
cd ~/tools/bioconda-recipes-master/scripts/bioconductor/
./bioconductor_skeleton.py --recipes ../../recipes ArrayExpress
cd ../../recipes
conda build bioconductor-arrayexpress --croot ~/temp -c russh -c terradue
cd ~/temp
anaconda upload ./linux-64/bioconductor-arrayexpress-1.34.0-r3.3.1_0.tar.bz2
jupyter install:
conda install jupyter
# r-irkernel was already installed
lyx (2.2.0-2build1) was installed from Ubuntu Software (the most recent version is 2.2.2, so the UbS version should be OK). However, although I could open lyx documents and convert basic lyx docs into pdf, I was not able to convert lyx documents that contained any R code (despite adding the knitr module and adding the Rscript path). I therefore removed lyx and reinstalled it using apt-get; this also didn't compile knitr projects. So I downloaded and built lyx myself; this required automake, zlib and libqt:
sudo apt-get install automake \
zlib1g-dev \
libqt4-dev
wget ftp://ftp.lyx.org/pub/lyx/stable/2.2.x/lyx-2.2.2.tar.gz
tar -xzvf lyx*.tar.gz
cd lyx-2.2.2/
./autogen.sh
./configure
make
sudo make install
I couldn't find any info regarding when the knitr-lyx integration changed, though. Nonetheless, using Ctrl-L makes writing R code quicker in lyx (and my existing lyx 2.0 notebooks still compile regardless of the change).
TODO: Notes on setting up work-package specific options for lyx notebooks
<IGNORE - Using r-base 3.3.1_1 eliminated the need for this code:>
Fonts weren't correct for use in jupyter/IRKernel. For example, on putting "hist(rnorm(100))" into a jupyter/IRkernel page on chrome, I received the error "X11 font -adobe-helvetica-%s-%s---%d-------*, face 1 at size 6 could not be loaded" - suggesting that jupyter/R was using x11 rather than cairo to work with graphics and that some of the relevant x11 fonts were missing. I initially installed R::cairo into my conda environment, but this didn't fix the error (indeed, capabilities('cairo') remained FALSE within jupyter). So I installed a range of x11 fonts libraries, rebooted the computer and this particular error disappeared.
sudo apt-get install \
x11-xfs-utils \
xfonts-base \
xfonts-scalable \
xfonts-75dpi \
xfonts-100dpi \
xfsprogs \
xfslibs-dev
Although this fixed the issue with fonts, other issues were subsequently highlighted: eg, when using ggplot with transparencies, the error 'Warning message: In grid.Call.graphics(L_polygon, x$x, x$y, index) : semi-transparency is not supported on this device: reported only once per page' was seen, and a series of blank scatter plots resulted (that is, the axes / background were all visible but the data points were missing from the plot). Therefore, I installed libcairo2-dev
sudo apt-get install libcairo2-dev
# installs a few dependencies as well
Disk setup:
The computer has two separate hard drives: one (OS) is a solid-state drive (~500 GB, split between Windows and Ubuntu) and the other (DATA) is for storing data / workpackages etc (~2 TB). My workpackages are kept in a folder called jobs_llr on DATA, to which I link from ${HOME}/jobs_llr. External datasets are stored in ext_data on DATA and internal datasets in int_data on DATA. If files in the latter two directories are required by a work package, a soft link from jobs_llr/job_name/data/<sub_dir(s)>/<file_name> should be made.
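The linking scheme can be sketched as follows (scratch directories stand in for /media/<me>/DATA and ${HOME}, and the dataset filename is hypothetical):

```shell
# scratch paths standing in for the real mount points
DATA=$(mktemp -d)      # stands in for /media/<me>/DATA
HOMEDIR=$(mktemp -d)   # stands in for ${HOME}

mkdir -p "$DATA/jobs_llr" "$DATA/ext_data" "$DATA/int_data"

# link the workpackage root from the (scratch) home directory
ln -s "$DATA/jobs_llr" "$HOMEDIR/jobs_llr"

# link an external dataset into a work package's data/ subdirectory
mkdir -p "$DATA/jobs_llr/job_name/data/geo"
touch "$DATA/ext_data/GSE0_series_matrix.txt"
ln -s "$DATA/ext_data/GSE0_series_matrix.txt" \
      "$DATA/jobs_llr/job_name/data/geo/GSE0_series_matrix.txt"
```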
## Added to cron using crontab -e to ensure any internal raw data is backed up locally:
## Note this makes an int_data directory within /My_Passport/pog/
# (check &) back up all raw data from /media/DATA/int_data to my external HDD
# at 5-past-midnight on Tuesday of each week
# -r = recursive
# -t = keep modification times
5 0 * * 2 /usr/bin/rsync -rt /media/<me>/DATA/int_data /media/<me>/My_Passport/pog
E)
Clone my analysis repositories:
git clone https://***@bitbucket.org/***/***