Efficient Experimental Design

Finlay Maguire

root@finlaymagui.re

Overview

  • Experiments as parameter optimisation problems
  • Specific examples
  • Bayesian optimisation
  • Step-through of SpearSeq
  • Conclusions

Parameter Optimisation Problems in Biology

In the lab

  • Codon optimisation
  • PCR conditions
  • Protein purification
  • Chemical synthesis (reagent ratios/conditions/catalysts)
  • Brewing
  • Synthetic Biology

Computational problems

  • Finding the optimal assembly (according to some metric)
  • Training detection algorithms e.g. motifs, genes etc.
  • Optimising clustering methods
  • Really any optimisation problem (especially non-convex)

Specific examples

  • Given a protein of interest, which sequence will maximise expression? (e.g. 900 nt = 300 codons ≈ 2^300 possible sequences)
  • Given a set of sequencing data, which preprocessing/assembly parameters will produce the most likely assembly?
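
A rough way to size the search space in the codon example is to multiply the number of synonymous codons available at each residue. The sketch below does this in log space using the standard genetic code; the 300-residue protein is a made-up placeholder, chosen only so that the average degeneracy is two codons per residue.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2,
    "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def log10_synonymous_sequences(protein: str) -> float:
    """Return log10 of the number of coding sequences that encode `protein`."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


# Hypothetical 300-residue protein, for illustration only.
toy_protein = "MKT" * 100
exponent = log10_synonymous_sequences(toy_protein)
print(f"about 10^{exponent:.0f} synonymous sequences")
```

Even at two choices per codon the space is far too large to test exhaustively, which is why the choice of which sequences to try matters.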

So how would you choose your datapoints more efficiently?

Probabilistically!

The prior is random draws from a Gaussian process (GP)
Add your data
Calculate the posterior
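
The three steps (draw from the prior, add data, calculate the posterior) can be sketched with NumPy alone. This is a minimal illustration, not the implementation used in the talk: the squared-exponential kernel, its length scale, the zero mean and the three observations are all assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D point sets a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)
    solved = np.linalg.solve(k_tt, k_ts)  # K_tt^{-1} K_ts
    mean = solved.T @ y_train
    cov = k_ss - k_ts.T @ solved
    return mean, cov


rng = np.random.default_rng(0)
x_test = np.linspace(0.0, 5.0, 50)

# 1. Prior: random functions drawn from the GP before seeing any data.
#    A small jitter on the diagonal keeps the covariance well conditioned.
prior_cov = rbf_kernel(x_test, x_test) + 1e-6 * np.eye(len(x_test))
prior_draws = rng.multivariate_normal(np.zeros(len(x_test)), prior_cov, size=3)

# 2. Add your data (made-up observations, for illustration only).
x_train = np.array([1.0, 2.5, 4.0])
y_train = np.sin(x_train)

# 3. Calculate the posterior: the mean passes through the data and the
#    uncertainty shrinks near the observed points.
post_mean, post_cov = gp_posterior(x_train, y_train, x_test)
post_std = np.sqrt(np.clip(np.diag(post_cov), 0.0, None))
```

Plotting `prior_draws`, then `post_mean` with a band of `post_std` around it, would give pictures of the kind these slides show.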

Step-through of optimisation

Choose 3 Random Initial Values
Prior GP distribution
Use an acquisition function (AQ) to select the next experimental point
Another demonstration of AQ function
Acquisition functions trade off exploration against exploitation
Pick another point using the AQ and refit GP
Do the same again
And again
For as long as you want
...
Until the result is good enough or you have used your evaluation budget
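
The whole step-through can be written as a short loop. The sketch below is an illustration under stated assumptions rather than the code behind these slides: it uses a zero-mean GP with a squared-exponential kernel, an upper-confidence-bound acquisition function (the slides do not say which acquisition function was used), a fixed 1-D grid of candidate points, and a hypothetical `run_experiment` function standing in for the real, expensive measurement.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential covariance with unit variance."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_fit_predict(x_obs, y_obs, x_grid, noise=1e-6):
    """Zero-mean GP posterior mean and standard deviation on x_grid."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_og = rbf_kernel(x_obs, x_grid)
    solved = np.linalg.solve(k_oo, k_og)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_og * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def upper_confidence_bound(mean, std, kappa=2.0):
    """A high mean rewards exploitation, a high std rewards exploration."""
    return mean + kappa * std


def run_experiment(x):
    """Hypothetical stand-in for a real experiment (two-peaked response)."""
    return float(np.exp(-(x - 0.7) ** 2 / 0.02)
                 + 0.5 * np.exp(-(x - 0.2) ** 2 / 0.05))


def bayesian_optimise(budget=15, n_initial=3, seed=0):
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 201)
    # Choose 3 random initial values and evaluate them.
    x_obs = rng.uniform(0.0, 1.0, size=n_initial)
    y_obs = np.array([run_experiment(x) for x in x_obs])
    # Repeat until the evaluation budget is used.
    while len(x_obs) < budget:
        mean, std = gp_fit_predict(x_obs, y_obs, x_grid)  # refit the GP
        scores = upper_confidence_bound(mean, std)
        x_next = x_grid[np.argmax(scores)]                # pick the next point
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, run_experiment(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best], x_obs, y_obs


best_x, best_y, xs, ys = bayesian_optimise()
print(f"best setting so far: x = {best_x:.3f}, response = {best_y:.3f}")
```

Raising `kappa` makes the search more exploratory; lowering it makes the search stay closer to the best point seen so far.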

Results

  • SpearSeq found the optimal assembly parameters for a test assembly in 4-5 trials
  • Bayesian Optimisation of synthetic construct expression found that 5' UTR folding free energy and 5' UTR length are the two most important features for expression of the synthetic gene (Gonzalez, 2015)

Conclusion

  • Never just use a grid search; even a naive random search is better (Bergstra and Bengio, 2012)
  • Bayesian Optimisation for more efficient experimental design
  • Not limited to a single parameter (or objective)
  • Several recent, relatively easy-to-use libraries and implementations exist (e.g. GPyOpt, Spearmint)
  • Tell me about experimental ideas that you think this might be applied to
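
One intuition behind the grid-versus-random point is that a grid repeats the same few values of each parameter, whereas random sampling tries a new value of every parameter on every trial. The sketch below counts distinct values per parameter for the same budget of 16 trials; the two-parameter unit square is an assumption made for illustration.

```python
import numpy as np


def grid_points(n_per_axis):
    """All combinations of n_per_axis evenly spaced values in 2-D."""
    axis = np.linspace(0.0, 1.0, n_per_axis)
    xx, yy = np.meshgrid(axis, axis)
    return np.column_stack([xx.ravel(), yy.ravel()])


def random_points(n_total, seed=0):
    """n_total points drawn uniformly from the unit square."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(n_total, 2))


grid = grid_points(4)      # 4 x 4 = 16 trials
rand = random_points(16)   # 16 trials

# Distinct values tried for the first parameter under each design.
n_grid = len(np.unique(grid[:, 0]))
n_rand = len(np.unique(rand[:, 0]))
print(f"grid: {n_grid} distinct values, random: {n_rand} distinct values")
```

If only one of the two parameters really matters, the grid has effectively spent 16 trials testing 4 settings of it, while the random design has tested 16.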