Projects

Software projects I've worked on

Neuroglycerin

Co-founder of the Neuroglycerin machine learning competition team.

Hail-Seizure

Seizure prediction from intracranial EEG (iEEG) for the American Epilepsy Society Seizure Prediction Challenge, mainly using scikit-learn. Finished in the top ~5% (16/527) of entries.

Neukrill-Net

Convnet classification for identifying plankton as part of the National Data Science Bowl using the pylearn2 deep learning library. Finished in the top ~5% (57/1049) of entries.

BayeHem

Gaussian-process-based Bayesian optimisation of NGS assembly using GPy.

hansard-miner

NLP analysis of the UK Parliament's Hansard using NLTK.

predict-secretome

A tool, created as part of my PhD research, to predict whether a given protein will be secreted. The project is a Python runner script wrapped around Fortran and C utilities.

Dendrogenous

A parallelised batch phylogenetic tree generator written in Python and SQL, created as part of my PhD.

Ecumenical Forest

An implementation of Tom Rainforth’s Canonical Correlation Forest Classifier as a scikit-learn compatible module.

DueyDrop

A tool for taxonomic profiling of NGS libraries for screening purposes.

eDicer

A C++ and Bash-based tool for rapidly finding and identifying shared K-length sequences (k-mers) between different genomes/transcriptomes. This project involved contributions to the K-mer Analysis Toolkit.

ETE Python3 Port

Ported ETE, a Python toolkit for the analysis of tree structures, to Python 3. All changes were merged back into the main project and formed a significant proportion of the current major release.

μ-Colander

An OpenCV and C++ based tool for analysing microscopy images and automatically extracting images of cells.

ParKour

A K-means clustering tool for partitioning meta-omic sequencing data based on compositional features. This tool is implemented in C++ using Armadillo and MLPACK.

markdown-pprint

A tool written with gngdb for pretty printing equations in markdown when MathJax isn’t supported.