Software projects I've worked on


Co-founder of a machine learning competition team.


Seizure prediction from iEEG as part of the American Epilepsy Society Seizure Prediction Challenge, mainly using scikit-learn. Finished in the top ~5% (16/527) of entries.


Convnet classification for identifying plankton as part of the National Data Science Bowl, using the pylearn2 deep learning library. Finished in the top ~5% (57/1049) of entries.


Gaussian-process-based Bayesian optimisation of NGS assembly using GPy.


NLP analysis of the UK Parliament Hansard using NLTK.


Tool to predict whether a given protein will be secreted, created as part of my PhD research. This project is a Python runner script wrapped around Fortran and C utilities.


A parallelised batch phylogenetic tree generation tool in Python and SQL, created as part of my PhD.

Ecumenical Forest

An implementation of Tom Rainforth’s Canonical Correlation Forest Classifier as a scikit-learn compatible module.


A tool to conduct taxonomic profiling of NGS sequencing libraries for screening purposes.


A C++ and Bash based tool for rapidly finding and identifying shared K-length subsequences (k-mers) between different genomes/transcriptomes. This project involved contributions to the K-mer Analysis Toolkit.

ETE Python3 Port

Ported this Python toolkit for analysis of tree structures to Python 3. All changes were merged back into the main project and formed a significant proportion of the current major release.


An OpenCV and C++ based tool for analysing microscopy images and automatically extracting images of cells.


A K-means clustering tool for partitioning meta-omic sequencing data based on compositional features. This tool is implemented in C++ using Armadillo and MLPACK.


A tool written with gngdb for pretty-printing equations in markdown when MathJax isn’t supported.