One of the “hottest” new areas in drug development is the use of artificial intelligence (AI). Just as other industries have developed predictive software to make more precise and accurate decisions, many believe that the pharma industry will be particularly attracted to using AI in drug research as well as in trial design, resulting in speedier delivery of new therapies. AI is certainly a compelling approach, though it still has to prove itself.

One pioneering AI company is Numerate Inc., a privately held California-based biotechnology firm that applies novel machine-learning algorithms, at cloud scale, to overcome major challenges in small molecule drug discovery. Numerate’s drug design platform combines advances in computer science and statistics with traditional medicinal chemistry approaches to address, in parallel, the factors that determine the success and failure of a drug candidate. Numerate is using this proprietary platform to develop a pipeline of drug programs in the cardiovascular, metabolic and neurodegenerative disease areas, where traditional methods have been unable to generate lead compounds that meet all the criteria for a drug candidate.

Leading Numerate is president and CEO Guido Lanza. Formerly, Lanza was the co-founder and chief technical officer of Pharmix, where he served on the company’s Board of Directors, and he was named one of Business Week’s “Tech’s Best Young Entrepreneurs under-30” in 2006. Prior to Pharmix, Lanza was a research scientist with Professor John Koza of Stanford University, with whom he developed applications of genetic programming in bioinformatics and computational biology. Lanza is the author of 10 scientific publications and the inventor of four issued patents.

WuXi AppTec Communications, as part of a new industry series, recently interviewed Lanza about the clinical direction and goals of Numerate, as well as what the future looks like for AI in drug development.

WuXi: How would you describe the landscape of companies involved in the field of AI? Are they mostly start-ups, or big data companies, such as Google and Intel?

Guido Lanza: The split seems pretty even, with startups like Numerate, Insilico Medicine, Berg Health, NuMedii, etc. and bigger companies like GE and IBM. What is remarkable is that there is relatively little overlap in the capabilities being developed. This industry has been extremely data-rich and algorithm-poor for so long that there are many, many problems to attack with AI and there will be for the foreseeable future.

WuXi: How does your company differ from others using AI in drug discovery and development? How are you applying AI?

Guido Lanza: The first, most obvious differentiator for Numerate is our longevity. We built an AI-driven company at a time when nobody was looking for AI. We did this by starting with a team of both computer scientists and drug hunters – people with compounds in the clinic and on the market. This forced us, in large part, to hide the AI and evolve the business in a much more traditional platform-company manner, around partnerships focused on services and R&D collaboration. Over the course of 10 years, this model has allowed us to invest almost $50 million into our platform, with most of that being non-dilutive funding.

From a scientific standpoint, our differentiators are around our translational capabilities. First, we are able to work on emerging biology with extremely small datasets, the kind not suitable for deep-learning types of approaches. Second, our modeling is based on 3D ligand information; there is no need for structural information. The combination of these abilities allows our machine-learning algorithms to unlock programs with phenotype-driven, often low-throughput, high-content biology. The other translational axis is around our ADME and toxicity prediction capabilities. Here, we have invested more than $10 million, including large contracts with the U.S. Department of Defense’s Defense Threat Reduction Agency (DTRA), in order to build and validate a system focused on rapidly translating leads into clinical candidates. Today, many of our conversations with pharma are focused on this capability, which is unique in its ability to learn from all past programs to inform every future chemical design and candidate selection decision.

WuXi: How will AI change drug discovery and development and clinical research?

Guido Lanza: The industry has been trying to address the costs and time of discovery and development using a variety of in silico methods for more than a decade. Clearly, this is also one benefit of using AI over brute-force, lab-centric methods. However, focusing on the in silico aspect misses the potential for AI to impact our biggest challenge as an industry – increasing the rates of translation from basic biology to the patient.

In the earliest stages, the key challenges addressable by AI approaches are around extracting a large amount of information from a relatively small dataset. For example, our platform has allowed us to translate academic-stage programs with very little data and low-throughput, high-content assays into full-blown lead-optimization-stage programs very quickly. We did this with the Gladstone Institutes, and are now starting on a couple of programs with UCLA and Mayo Clinic.

The second challenge is around the integration of the vast amounts of data (omics data, for example) around a single program. Here, companies like Berg Health are able to integrate that data to drive a program with much more predictability. There are also groups, like Watson and smaller groups like Benevolent, applying NLP (natural language processing) to make sure the collective knowledge of biology is present when making decisions, which makes it possible both to interpret results and to draw non-obvious connections.

However, the area where AI can make the biggest impact is by introducing, for the first time, a true learning loop in the industry. The idea that all decisions can be guided by all preceding successes and failures is profound. We have been building AI algorithms to predict, for example, the PK and toxicity profiles of compounds, but now, for the first time, companies are willing to share their data to allow us to roll this out. We will be doing so with one to two big pharma partners in the next six months, and it looks like more will follow shortly thereafter.

WuXi: Will AI applications eventually become the norm for biotech and pharmaceutical R&D? If so, how soon?

Guido Lanza: In the next three to five years, AI algorithms will be applied industry-wide. Here too, acceptance will vary based on the approach and the value it delivers.

Focusing on the preclinical space, NLP-based approaches will remain niche for repurposing but will become more broadly used for interpreting results. Approaches like ours for translating emerging biology based on phenotypic signals will either be used as one-offs by biotechs looking to boot up projects or become more ubiquitous, allowing pharma to bypass the traditional HTS-seeded campaigns of today. Similarly, structure-based, simulation-driven AI will continue to unlock an ever-increasing number of targets.

More importantly, during the next three to five years, we will move from isolated and anecdotal successes to AI being a demonstrated way of taking the combined knowledge – whether within a company or from the industry more broadly – and avoiding at least some, if not most, past mistakes. If AI proves to be better than the current process along any axis (PK, ADME, animal tox, clinical safety, etc.), then it will be a significant competitive disadvantage to approach problems in a purely traditional fashion. This means that in three to five years no clinical candidate will be selected that has not been run through a variety of AI-driven models, including predictive animal tox, predictive human tox, predictive PK, etc.

WuXi: What are the challenges/barriers in employing AI in pharmaceutical and biotech drug development?

Guido Lanza: Current challenges are primarily around culture. The first is that AI is, by its nature, not meant to be interpretable, but used more as a “black box.” I often hear that, in order to believe the predictions, scientists want to know how the AI arrived at them. I think this is the wrong way to think about AI generally. The point is that these algorithms can see signals in data that are either too narrow or too broad for a human to see. Therefore, if we require the AI to generate human-interpretable results, we are likely limiting it to the least interesting problems.

A good example here is the prediction of human facial features from a raw genome sequence. Human Longevity, Inc. has shown that this is possible even though they have no model of the underlying developmental biology. Requiring an “understandable” prediction would likely limit the technology to finding a simple genetic marker for nose shape or length – which is hardly as impactful.

The other major cultural challenge is around data. Pharma companies need to be more open about their data. This does not mean sharing the latest data on the hottest target, which they are currently in a race to develop. It means sharing the millions of data points that could be used to predict future failures. As a company that has been working for 10 years on predictive ADME and tox, we realize that this is a big ask, but there are companies, like GSK with its ATOM collaboration, leading the way and catalyzing the creation of new algorithms and approaches.

WuXi: What kinds of partnerships are important for the development of your company?

Guido Lanza: Our customer partnerships fall into three areas. First, we partner with large pharma in build-to-buy or bounty-hunter style collaborations. In these, the partner, as in our Takeda deal, has pre-negotiated options to in-license assets being generated by our AI platform. In addition, we partner with large pharma around collaborations that are more data-focused and less pipeline-focused. In those, the pharma company shares data with us, usually around PK/ADME or safety, we apply our AI platforms, and both companies benefit from the resulting models. The final type of collaboration is our academic partnerships, where we look to feed our internal pipeline with collaborations like our previous, very successful collaboration with the Gladstone Institutes and current ones with UCLA and the Mayo Clinic. This is a way to get access to the most promising emerging biology and translate the projects into partnerable assets using our AI platform.

The other type of partnership that is crucial is with a network of highly capable CROs. It is my belief that, for the foreseeable future, people will not (and should not) accept AI predictions as gospel. Especially early on, when these approaches are being refined, the ability to reduce the predictions to practice and validate them will allow us (and other AI companies) to capture the value we are delivering. Clearly, in our case, we need a partner like WuXi on the wet-lab chemistry and biology, while other companies working on repurposing may form relationships with CROs around clinical work. The burden of (lab) proof remains on the AI company, and this is likely to continue well into the future.

WuXi: How is your business model different from traditional biotech and pharma start-ups?

Guido Lanza: Traditional biotech and pharma start-ups generally focus on a small number of targets or on single therapeutic areas. We are focused more on the platform and how it can transform the broader industry. Our business model is focused on capturing the value of the output – the chemical assets we can generate. We also collaborate around our platform in collaborations that are focused more on data and validation and less on revenue, but our bread and butter is to build a pipeline of assets that can later be out-licensed. We have been building and scaling a pipeline of programs across a number of therapeutic areas (CNS, cardiometabolic, inflammation); these programs are a mix of Numerate-initiated programs, academic collaborations, and build-to-buy or bounty-hunter deals.

WuXi: What lessons have you learned about running an AI start-up in this space?

Guido Lanza: As an entrepreneur just starting up, it’s fairly easy to assume that solving the technical hurdle, in this case generating the algorithms and platform, is the hardest part. What we quickly learned is that there are equally hard data science and business problems. First, you have to really understand the data you are applying the algorithms to. We have spent decades in computational chemistry talking, sometimes implicitly, about the train/test paradox, where models perform poorly when applied prospectively in the lab even though they performed very well retrospectively. In order to solve this, it was crucial to understand the data, to account for the mess that is biology (and the noise it brings), and the challenges of chemistry (and the biases they bring). On the business side, the key problems are around making sure the output is actually valuable from a scientific and commercial standpoint. After all, adding a methyl to a known drug will maybe produce another active compound, but its value is almost zero. To make sure the output is valuable, it is crucial to have a team of drug hunters who really understand what will be of value to our ultimate customer, the pharma companies.
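
The train/test gap Lanza describes can be illustrated with a toy example. The following is a minimal sketch, not Numerate's method: the compound series, the activity values, and the per-series-mean "model" are all hypothetical, invented here only to show why a retrospective split that leaves every chemical series represented in training can look far better than a split that holds out an entire series, which is closer to what a prospective lab test looks like.

```python
from statistics import mean

# Hypothetical toy data: four chemical series, five compounds each.
# Each series clusters tightly around its own activity level, mimicking
# the series-level bias that real chemistry datasets often carry.
BASES = {"A": 5.0, "B": 6.0, "C": 7.0, "D": 8.0}
OFFSETS = [-0.2, -0.1, 0.0, 0.1, 0.2]
compounds = [(s, b + o) for s, b in BASES.items() for o in OFFSETS]


def fit(train):
    """Memorize the mean activity per series, plus a global fallback."""
    by_series = {}
    for series, activity in train:
        by_series.setdefault(series, []).append(activity)
    series_means = {s: mean(v) for s, v in by_series.items()}
    global_mean = mean(a for _, a in train)
    return series_means, global_mean


def mean_abs_error(model, test):
    """Predict the series mean if the series was seen, else the global mean."""
    series_means, global_mean = model
    return mean(abs(a - series_means.get(s, global_mean)) for s, a in test)


# Retrospective-style split: hold out one compound from every series,
# so each test compound has close relatives in the training set.
retro_test = compounds[4::5]
retro_train = [c for i, c in enumerate(compounds) if i % 5 != 4]

# Prospective-style split: hold out series D entirely, so the model
# must predict chemistry it has never seen.
pro_test = [c for c in compounds if c[0] == "D"]
pro_train = [c for c in compounds if c[0] != "D"]

retro_mae = mean_abs_error(fit(retro_train), retro_test)
pro_mae = mean_abs_error(fit(pro_train), pro_test)

print(f"retrospective MAE:     {retro_mae:.2f}")
print(f"held-out-series MAE:   {pro_mae:.2f}")
```

On this made-up data the held-out-series error comes out several times larger than the retrospective error, because the model has never seen the new series and can only fall back on the global average.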

WuXi: How is this different from previous in silico booms?

Guido Lanza: Broadly, the impact of AI is fairly obvious across many industries. Computing and data storage are finally cheap enough that we can apply the right AI algorithms to problems in discovery and development. This has enabled the large players (IBM and GE) to start educating the industry, which has been struggling with R&D productivity for decades, about just how much could be done with what is currently locked away in pharma’s in-house databases. For the first time in over a decade, people are starting in silico drug discovery start-ups. We built Numerate’s business very much by putting the AI behind the scenes, focusing on the output (chemistry and programs) and speaking very little about our approach. Today, we are excited to have the opportunity to be much more open about our approach with both large and small companies. In fact, there is a boom of start-ups in this space, thanks in part to an ever-increasing amount of data in the public domain and the increasing willingness of some pharmaceutical companies to share their data.

The competitive environment is also very different from the boom of 10 to 15 years ago. I think the entrepreneurs starting companies in AI are realizing that we are not, for the most part, in competition with each other. Nobody has the model of selling packaged software or seats to pharma anymore. Instead, we focus on either our own pipeline or on R&D collaborations, which are unlikely to be directly competitive. As a result, there is a true sense of community forming – from sharing contacts and providing references to collaborating to organize conferences – which would have been unheard of even five years ago.