Bayesian Optimization within automation

I just wanted to share some of the articles and libraries on Bayesian optimization that I have found to apply to experimental science.

I think this kind of approach is exciting for closed-loop experimentation.
I am by no means an expert so please chime in if you have other resources you want to share.

First of all, if you don’t know what BO (Bayesian optimization) is: it is a method for efficiently and sequentially searching for optimal “settings”.
I think GitHub - AnotherSamWilson/ParBayesianOptimization: Parallelizable Bayesian Optimization in R
does a good job of explaining it.
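To make the idea concrete, here is a minimal, self-contained sketch of the loop these libraries implement (this is not any particular package’s API): fit a surrogate model to the experiments run so far, score candidate settings with an acquisition function, run the most promising candidate, and repeat. The toy objective, kernel length scale, and all numbers below are invented purely for illustration.

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Squared-exponential kernel between two 1-D arrays of points
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian-process posterior mean and std at candidates Xs, given data (X, y)
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.sum((Ks.T @ Kinv) * Ks.T, axis=1)  # prior variance is 1
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def objective(x):
    # Toy "experiment": unknown to the optimizer, true optimum at x = 2
    return -(x - 2.0) ** 2

grid = np.linspace(0.0, 5.0, 501)   # candidate "settings"
X = np.array([0.5, 4.5])            # two initial experiments
y = objective(X)

for _ in range(10):                 # fixed experimental budget
    mu, sigma = gp_posterior(X, y, grid)
    ucb = mu + 2.0 * sigma          # upper-confidence-bound acquisition
    x_next = grid[np.argmax(ucb)]   # most promising next experiment
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

best_x = X[np.argmax(y)]            # ends up close to the true optimum
```

In a real closed loop, `objective` would be the robot running an experiment and returning a measurement, and the surrogate/acquisition would come from one of the libraries above rather than this hand-rolled GP.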

Paul Jensen and Mark Hendricks had a presentation at SLAS 2023 about “Automated Sciences” Automated Science | The Jensen Lab

This is also a good read
Bayesian reaction optimization as a tool for chemical synthesis | Nature

I’ll take the liberty of adding the library that we are going to use soon for optimizing the extraction of DNA: GitHub - novonordisk-research/ProcessOptimizer: A tool to optimize real world problems

6 Likes

Thank you! I’ve been collecting works on BO

I believe that I even reached out to @marcosfelt about it.

1 Like

A package was shared on a similar thread (Liquid Class Optimization - #8 by marcosfelt) that might be useful.

3 Likes

This is a great list! I had not seen the presentation from Jensen and Hendricks - thanks for sharing!

As @evwolfson mentioned, I created Summit for solving scientific problems with Bayesian optimization. We really focus on making it easy to get started, but if you try it and have any issues let me know!

Also, I’d classify the applications of Bayesian optimization into three categories:

  1. Discovery: Finding new molecules to serve a particular application or efficiently selecting from an existing library (e.g., [2012.07127] Accelerating high-throughput virtual screening through molecular pool-based active learning)
  2. Process optimization: Improving recipes for new or existing processes (e.g., https://pubs.acs.org/doi/10.1021/acscentsci.3c00050 and Data-science driven autonomous process optimization | Communications Chemistry)
  3. System tuning: Optimizing the parameters of a control/analytical system either in the lab or pilot scale (e.g., Asynchronous parallel Bayesian optimization for AI-driven cloud laboratories | Bioinformatics | Oxford Academic)

Happy to answer any questions if I can!

3 Likes

BO is something I want to work into my own LC validation efforts, but I haven’t quite gotten to the point where I have spare time to plan and work on it. Do any of these solutions keep track of, or measure, a value relevant to the time spent on optimization? From what I’ve seen this is typically called “regret”: a metric that summarizes the materials, time, and resources lost by not using the optimal variables. It combines the time spent optimizing with the difference between the best return of the black-box function and the returns actually obtained during optimization; something like sum[F(optimal)-F(tests)] + scaled_time_metric… the higher the value, the higher the regret.

From my understanding, it would be a great variable to track for conveying the importance of optimization, and the impact of BO on minimizing that “regret”.
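For what it’s worth, here is a small sketch of how that could be computed, following the decomposition suggested above. All numbers are invented for illustration, and in practice F(optimal) is unknown; it is often approximated after the fact by the best value ever observed.

```python
import numpy as np

# Hypothetical optimization history: objective values observed at each
# trial of a (maximization) campaign, plus an assumed true optimum.
f_optimal = 0.95                                          # assumed best achievable value
f_trials = np.array([0.40, 0.55, 0.70, 0.68, 0.90, 0.93])  # observed values
trial_minutes = np.array([30, 30, 30, 30, 30, 30])         # time per trial

# Cumulative regret: summed gap between the optimum and each trial
cumulative_regret = np.sum(f_optimal - f_trials)

# Simple regret: gap between the optimum and the best trial found
simple_regret = f_optimal - f_trials.max()

# A cost-weighted total along the lines of
# sum[F(optimal) - F(tests)] + scaled_time_metric
time_penalty = 0.001 * trial_minutes.sum()   # illustrative scaling factor
total_regret = cumulative_regret + time_penalty
```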

Yes, that is the correct definition of regret. There are some nuances here. BO algorithms have an acquisition function that scores the quality of potential new experiments. While most acquisition functions do not explicitly keep track of cost, they implicitly trade off exploring (possibly finding a better solution) against exploiting (optimizing around the best known parameters).

For example, expected improvement is known for being quite exploitative, while upper confidence bound (with the right hyperparameters) tends to explore more. Other acquisition functions make different trade-offs.
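A rough, library-agnostic sketch of those two acquisition functions, evaluated on a made-up GP posterior over three candidate experiments (the means, standard deviations, and beta value are illustrative only):

```python
import math
import numpy as np

def expected_improvement(mu, sigma, best_y):
    # EI: expected amount by which a candidate beats the incumbent best_y.
    # It shrinks quickly once the posterior mean drops below best_y.
    z = (mu - best_y) / sigma
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2)))
    return (mu - best_y) * cdf + sigma * pdf

def upper_confidence_bound(mu, sigma, beta=2.0):
    # UCB: optimistic score; a larger beta weights uncertainty more heavily
    return mu + beta * sigma

# Hypothetical posterior: candidate 0 sits next to the incumbent (low
# uncertainty), candidates 1 and 2 are further away (higher uncertainty).
mu = np.array([0.92, 0.60, 0.40])
sigma = np.array([0.05, 0.15, 0.25])
best_y = 0.90

ei = expected_improvement(mu, sigma, best_y)
ucb = upper_confidence_bound(mu, sigma, beta=3.0)
# In this toy example EI ranks the near-incumbent candidate (index 0)
# highest, while UCB with beta=3 ranks the most uncertain one (index 2)
# highest, illustrating the exploit/explore difference described above.
```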

One important point that often confuses people is that BO algorithms usually do not converge to an optimal point. Common acquisition functions (expected improvement, upper confidence bound) will, to some extent, continue exploring as long as they have the budget and there is uncertainty in the model. So you typically set a budget in advance and stop once you hit it or are satisfied with the results.

Finally, if the cost of experiments varies as a function of parameters (e.g., certain liquid class parameters take longer to evaluate or need more material), you could look into cost-aware BO.
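As a toy illustration of the cost-aware idea: one common formulation scales the acquisition value by the estimated cost of each candidate experiment (all numbers here are invented).

```python
import numpy as np

# Hypothetical per-candidate values: plain EI scores (as produced by any
# BO library) and an estimated cost per experiment (robot minutes,
# reagent volume, etc., mapped onto a common unit).
ei = np.array([0.050, 0.045, 0.010])
cost = np.array([10.0, 2.0, 1.0])    # candidate 0 is 5x pricier than 1

# Cost-aware variant: expected improvement per unit cost
ei_per_cost = ei / cost

next_plain = int(np.argmax(ei))           # ignores cost: picks candidate 0
next_aware = int(np.argmax(ei_per_cost))  # prefers the cheap candidate 1
```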

4 Likes

That’s a super interesting breakdown. It would be cool to see these get integrated more programmatically into people’s workflows, especially accounting for cost and payoff metrics.

Interesting, there’s so much to dive into here. I know there’s a lot of wonderful digital and regular chemistry work being done in the space so please share anything you feel could be relevant!

Hi all!

  1. I am currently applying Bayesian Optimisation to yeast optimisation. I will have access to a Beckman Biomek i7, and we will use our Matterhorn Studio platform to host Bayesian Optimisation algorithms that can suggest new candidate experiments 24/7, i.e. whenever an experiment has finished. If nothing goes wrong, this should close the loop! (famous last words)

  2. I will give a talk on that work here in September: Sign Up | LinkedIn

  3. I am also this week at https://www.accelerate23.ca, and hope to contribute to this emerging space there!

  4. Also, feel free to check out our seminar series: Matterhorn Studio

Happy to hop on a call and discuss BO designs. :slight_smile:

3 Likes

That’s a very cool application! What kind of yeast optimization are you doing? Are we talking metabolic engineering?

I am not the bioengineer on the project, but put simply we’re trying to optimise e.g. the nitrogen-to-carbon ratio. I think metabolic engineering is down the road, but baby steps!