Wednesday, 19 August 2015

Prepare py3 for probabilistic programming

A Python 3 environment for running the code from Probabilistic Programming.
Note that the code is written for py2.7, but I want to update it for py3.4.

apt-get install apache2-dev
conda create --name phack python=3.4 \
  ipython ipython-notebook matplotlib \
  numpy pymc pyzmq scipy tornado jinja2

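# praw is not available via anaconda, so install it with pip
# wsgiref needs no install: it is part of the Python standard library in both py2.7 and py3.4
# mod_wsgi is a separate package (an Apache module, hence apache2-dev above), not a py3.4 rename of wsgiref
source activate phack
pip install praw
pip install mod_wsgi # not a substitute for wsgiref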

# ipython check that they're all installed:
In [1]: import matplotlib
In [2]: import numpy
In [3]: import pymc
In [4]: import zmq
In [5]: import scipy
In [6]: import tornado
In [7]: import wsgiref
In [8]: import praw
In [9]: import jinja2
# nb, the import name for pyzmq is 'zmq'; 'wsgiref' is imported from the standard library, not from mod_wsgi
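
The same check can be scripted rather than typed line by line. This is a minimal sketch, assuming the import names from the session above; check_imports is a name made up for this example and not part of any library.

```python
import importlib

# Import names (not package names) taken from the ipython session above.
MODULES = ["matplotlib", "numpy", "pymc", "zmq", "scipy",
           "tornado", "wsgiref", "praw", "jinja2"]


def check_imports(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # Usually ImportError, but a broken install can raise other errors.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in sorted(check_imports(MODULES).items()):
        print("{:<12} {}".format(name, "ok" if ok else "MISSING"))
```

Run it with the phack environment active; any line marked MISSING points at a package that still needs installing.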

Friday, 7 August 2015

Installing anaconda-python

Anaconda provides a Python distribution that includes a range of Python libraries for machine learning, statistics and data processing.

I installed anaconda and various Python libraries onto a fresh Linux Mint 17 install.
Downloaded anaconda from here and installed it using:
bash Anaconda-2.3.0-Linux-x86_64.sh

Installed into ~/anaconda and added ~/anaconda/bin to the PATH in ~/.bashrc

Updated anaconda:
conda update conda

The updated packages were:
    conda:      3.14.1-py27_0 --> 3.15.1-py27_0
    conda-env:  2.2.3-py27_0  --> 2.3.0-py27_0 
    pip:        7.0.3-py27_0  --> 7.1.0-py27_0 
    setuptools: 17.1.1-py27_0 --> 18.0.1-py27_0

Created a python3 environment:
conda create --name py3 python=3.4
source activate py3
Forgot to install pandas etc. along with python 3.4, so tried to remove the environment and start again:
conda remove --name py3 --all # fails: can't remove py3 while it is the active environment

source deactivate
conda remove --name py3 --all
conda create --name py3 python=3.4 numpy pandas ipython
source activate py3

Recreating the environment wasn't strictly necessary: other libraries from anaconda can be added to the existing py3 environment. For example, for scipy:
conda install --name py3 scipy
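
To confirm that the py3 environment, rather than the root anaconda install, is the interpreter in use after source activate py3, something like the following can be run. It is only a sketch: interpreter_summary is a made-up helper name, and the expectation that the prefix ends in envs/py3 assumes the default conda directory layout.

```python
import sys


def interpreter_summary():
    """Return the version and install locations of the running interpreter."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,
        "executable": sys.executable,
    }


if __name__ == "__main__":
    info = interpreter_summary()
    # Inside the activated environment the prefix is expected to end in
    # envs/py3 (an assumption based on the default conda layout).
    for key in ("version", "prefix", "executable"):
        print("{:<11} {}".format(key, info[key]))
```

If the version printed starts with 2.7, the root environment is still the active one.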

Wednesday, 5 August 2015

biolearnr intro

Bioinformatics would be a seriously lame duck without statistics.

This is my blog about statistics, machine learning, data mining and related topics. My background is in cancer genomics, network analysis and cell signalling. I'm currently working through a series of MOOCs that are relevant to this blog (many more interesting courses and resources are listed on the Open Source Data Science Masters website and on machinelearningmastery):
- Ian Witten's introductory WEKA course
- The Caltech/JPL data analytics course
- Bill Howe's intro to data science

Also, I'm always working through statistics and machine learning books. At the moment I'm working through Bishop's PRML, Gelman's Bayesian Data Analysis and Casella & Berger's Statistical Inference.

All of the above, and a variety of bioinformatics papers, will probably end up on here. Unless it ends up like biographr, where I wrote an intro and then nothing for 8 months.