Projects
Software projects I've worked on
Neuroglycerin
Co-founder of this machine learning competition team.
Hail-Seizure
Seizure prediction from iEEG for the American Epilepsy Society Seizure Prediction Challenge, mainly using scikit-learn. Finished in the top ~5% (16/527) of entries.
Neukrill-Net
Convnet classification for identifying plankton as part of the National Data Science Bowl using the pylearn2 deep learning library. Finished in the top ~5% (57/1049) of entries.
BayeHem
Gaussian-process-based Bayesian optimisation of NGS assembly using GPy.
hansard-miner
NLP analysis of the UK parliament Hansard using NLTK.
predict-secretome
Tool, created as part of my PhD research, to predict whether a given protein will be secreted. This project is a Python runner script wrapped around Fortran and C utilities.
Dendrogenous
A parallelised batch phylogenetic tree generation tool in Python and SQL, created as part of my PhD.
Ecumenical Forest
An implementation of Tom Rainforth’s Canonical Correlation Forest Classifier as a scikit-learn compatible module.
DueyDrop
A tool for taxonomic profiling of NGS libraries for screening purposes.
eDicer
A C++ and bash based tool for rapidly finding and identifying shared K-length subsequences (k-mers) between different genomes/transcriptomes. This project involved contributions to the K-mer Analysis Toolkit.
ETE Python3 Port
Ported this Python toolkit for the analysis of tree structures to Python 3. All changes were merged back into the main project and formed a significant proportion of the current major release.
μ-Colander
An OpenCV and C++ based tool for analysing microscopy images and automatically extracting images of cells.
ParKour
A K-means clustering tool for partitioning meta-omic sequencing data based on compositional features. This tool is implemented in C++ using Armadillo and MLPACK.
markdown-pprint
A tool written with gngdb for pretty printing equations in markdown when MathJax isn’t supported.