Why me? The mutagenic origins of cancer for individual tumors and tumor types

December 01, 2022
  • 00:00It's my pleasure to
  • 00:03introduce Jeffrey Townsend as
  • 00:04today's grand rounds speaker.
  • 00:07Jeff is the Elihu Professor of
  • 00:10Biostatistics and Professor of
  • 00:11Ecology and Evolutionary Biology
  • 00:13and the co-leader of the Genetics,
  • 00:15Genomics and Epigenetics Program
  • 00:17at Yale Cancer Center.
  • 00:19He received his PhD in organismic
  • 00:23and evolutionary biology at Harvard
  • 00:25University and in 2019 received
  • 00:27the prestigious membership in the
  • 00:29Connecticut Academy of Science and
  • 00:31Engineering for his work in
  • 00:33developing innovative tools
  • 00:36to study population biology,
  • 00:38including evolution of
  • 00:39antimicrobial resistance,
  • 00:41disease evolution and transmission, and
  • 00:43evolution of tumorigenesis.
  • 00:46His research has enabled curtailment of
  • 00:49pathogen evolution and outbreak mitigation,
  • 00:51and has been used to inform therapeutic
  • 00:54approaches in cancer metastasis.
  • 00:56So in recognition of his
  • 00:59prominence in the field,
  • 01:00in 2021 Jeff was selected as the
  • 01:03co-chair-elect of the Cancer Evolution
  • 01:05Working Group by the AACR, and his
  • 01:07lab is currently working on many
  • 01:10projects, including developing
  • 01:12bioinformatics tools for cancer
  • 01:14genetics, epigenetics, and epidemiology,
  • 01:17and nonlinear mathematical
  • 01:19models of disease epidemiology.
  • 01:22So it's my pleasure to give the
  • 01:23podium to Jeff, and we look forward
  • 01:25to hearing your presentation.
  • 01:29Thank you House for that wonderful
  • 01:32introduction, and thank you, and
  • 01:35Ken, for the encouragement to
  • 01:38present today for this audience.
  • 01:40And thank you all for basically the
  • 01:43opportunity to present the kind of
  • 01:45work that we've been doing in my lab.
  • 01:47The title of my talk is "Why me?
  • 01:51The mutagenic origins of cancer
  • 01:53for individual tumors and tumor
  • 01:56types," and I'm going to spend some
  • 01:58time talking about that title.
  • 02:00But first let me just go
  • 02:01by my disclosure slide.
  • 02:02I have done consulting for Black Diamond
  • 02:05Therapeutics and Agios Pharmaceuticals.
  • 02:08And so this title, "Why me?",
  • 02:11was inspired by the fact that
  • 02:14as I started on this work,
  • 02:16originally largely with Vincent Cannataro,
  • 02:18who you'll see a picture of later,
  • 02:20we realized that what we were doing to
  • 02:22try to understand just what individual
  • 02:24variants were contributing to cancer
  • 02:26actually, to some degree (and the
  • 02:28degree to which it addresses it
  • 02:30I'd love for you to think about
  • 02:32as I give this talk), answers the
  • 02:34question, for an individual patient,
  • 02:36of what the causation of their individual
  • 02:38cancer was. And I'll go through
  • 02:40a lot of detail about that,
  • 02:41but that gets down to
  • 02:43the mutagenic origins.
  • 02:45Again not the physiological origins
  • 02:47but mutagenic origins of cancer for
  • 02:49individual tumors and tumor types.
  • 02:51And I think this is obviously
  • 02:53of interest to anyone who studies cancer,
  • 02:56what the mutagenic origins of cancer
  • 02:58are, and certainly of interest in one
  • 03:01way or another to patients who
  • 03:03have come down with cancer.
  • 03:08It has been widely reported that one of
  • 03:10the most difficult questions that patients
  • 03:12and doctors struggle with upon diagnosis
  • 03:14of cancer is the question: why me?
  • 03:16Why was I struck with this ailment?
  • 03:19And it's natural for patients to want
  • 03:21to understand the causes behind their
  • 03:23calamities, and it's difficult to hear only
  • 03:26statistics and probabilities as a response.
  • 03:28So the traditional way that you answer
  • 03:30this question of why me is to say,
  • 03:31well, did you smoke? That elevates
  • 03:33your probability.
  • 03:34Do you have this genetic predisposition
  • 03:36that elevates your probability?
  • 03:37You know, how old are you?
  • 03:41What is your ethnic background?
  • 03:43There's lots of different predictors
  • 03:44for whether or not someone
  • 03:46might come down with cancer.
  • 03:47But those aren't answers about why
  • 03:48you came down with your cancer.
  • 03:49Those are answers about
  • 03:51generalizations about your life.
  • 03:53So to date,
  • 03:55these statistics and probabilities are
  • 03:57nearly the only answer that science
  • 03:59and medicine have been able to give.
  • 04:02And one answer that's sort of
  • 04:04straightforward and obvious, if you
  • 04:07are a proponent of sort of the genetic
  • 04:09evolutionary model of what makes
  • 04:11cancer happen is that mutations happen
  • 04:13and that's why you have your cancer.
  • 04:15It's a very general answer,
  • 04:17though it's not terribly satisfying,
  • 04:20but it can be broken down into a
  • 04:21lot of different kinds of mutations.
  • 04:23So there are clock-like endogenous
  • 04:26mutations and processes that fuel
  • 04:28mutation throughout the body over a lifetime.
  • 04:31So as your body ages,
  • 04:33you get these mutations that happen
  • 04:36simply because the cellular processes
  • 04:38that reproduce your DNA are not perfectly
  • 04:41designed to reproduce it perfectly,
  • 04:44and they can't be just because of
  • 04:46the third law of thermodynamics.
  • 04:48So they're endogenous processes
  • 04:49that fuel mutation in your
  • 04:52body throughout your lifetime.
  • 04:53There are also mutational processes
  • 04:56that are fueled by exogenous sources,
  • 04:58such as viral infection
  • 05:00inducing APOBEC activity.
  • 05:01So viral infection can cause your
  • 05:03cell to react in certain ways,
  • 05:06maybe for cellular defense.
  • 05:08And those mutations that are
  • 05:10brought about as sort of secondary
  • 05:12consequences of your response to viral
  • 05:15infection can also lead to cancer.
  • 05:17And the third category is exogenous
  • 05:20mutagenic sources such as tobacco
  • 05:21smoke that may affect your lungs
  • 05:24or your head and neck, or UV
  • 05:26radiation that can affect your skin.
  • 05:29So these are all sources of mutation that
  • 05:31we know about and probabilistically we
  • 05:33can tell patients about the fact that,
  • 05:35you know,
  • 05:35exposing yourself to a lot of sun
  • 05:37may increase your risk for melanoma.
  • 05:38If you have a melanoma,
  • 05:39it may be partly due to the fact that
  • 05:41you exposed yourself to a lot of
  • 05:43sunlight at some point during your life.
  • 05:46Now one of the, I think, quite
  • 05:48revolutionary discoveries
  • 05:50of recent times was that we
  • 05:53can actually trace a lot of
  • 05:56those sources, I should say certainly
  • 05:58all the sources I just mentioned
  • 06:01but many others as well: when we
  • 06:04sequence a tumor, for instance,
  • 06:07we can trace signatures of those different
  • 06:10sources in the DNA mutations that happen.
  • 06:13So certain DNA mutations are more frequent.
  • 06:16And I'll explain this in
  • 06:17more detail a little later.
  • 06:18Certain mutations are a little
  • 06:20more frequent when you have UV
  • 06:23mutagenesis, and other mutations are
  • 06:24more frequent when you have just
  • 06:26simple aging processes, etcetera.
  • 06:28And it turns out that there are
  • 06:30enough mutations in the typical
  • 06:31tumor that you can do a sort of
  • 06:33machine learning deconvolution
  • 06:34(and I won't go into the detail
  • 06:36about that) to figure out
  • 06:38for a given tumor what were the
  • 06:40different sources that contributed
  • 06:41these mutations. And this is really,
  • 06:44really extraordinary,
  • 06:46that we can figure that out.
  • 06:49And just to give you a little bit
  • 06:51more detail on that:
  • 06:54So here's S1, which is typically
  • 06:56called deamination of 5-methylcytosine,
  • 06:58and that's considered to be sort of
  • 07:01an endogenous aging process that
  • 07:03occurs without any particular cause
  • 07:05other than time passing
  • 07:08for our body through development.
  • 07:10S2 is one of two signatures
  • 07:13that we associate with APOBEC
  • 07:15activity. There's defective
  • 07:16homologous recombination DNA repair,
  • 07:18which may be mutation based
  • 07:20and therefore endogenous,
  • 07:22but related to a very specific
  • 07:23process that might be treatable.
  • 07:25There's tobacco smoke, which you can see
  • 07:27of course largely affects lung cancer,
  • 07:29but you can also see some for liver,
  • 07:34head, kidney, etcetera;
  • 07:35there's some other sources of tobacco smoking.
  • 07:38And S5, which you also see
  • 07:40as large circles here,
  • 07:42is another signature that has
  • 07:44been traced essentially to aging
  • 07:46processes.
  • 07:47Although it's a little less well understood
  • 07:49what the underlying basis of it is,
  • 07:52it's very clear that age is highly correlated
  • 07:56with the amount of S5 mutation you get.
  • 07:59Then there's defective DNA
  • 08:00mismatch repair,
  • 08:01ultraviolet light, etcetera.
  • 08:02And you can see these distribute themselves
  • 08:03differently for different types of cancer.
  • 08:05And so again this is
  • 08:07very consistent with what we knew already,
  • 08:08very consistent with what we generally
  • 08:10did, which was say predictably, like,
  • 08:11if you have a lot of exposure to sun,
  • 08:13you're more likely to get UV exposure,
  • 08:15and that UV exposure then is more
  • 08:18likely to translate to mutations
  • 08:20that may or may not cause melanoma.
  • 08:22But once you have those mutations there,
  • 08:24you know, they may cause that.
  • 08:26So this is great: we've got
  • 08:28the ability to see the
  • 08:29trace of exposure
  • 08:32in cells to these mutagens.
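
The deconvolution idea mentioned above can be illustrated with a small sketch. This is not the method used in the talk (which is not described there); it is a minimal Python example under stated assumptions: the signatures are already known, each is a probability distribution over mutation contexts, and the mixture weights for one tumor are estimated with a simple expectation-maximization update. The function name `signature_weights`, the three toy contexts, and all numbers are hypothetical.

```python
def signature_weights(counts, signatures, n_iter=500):
    """Estimate how much each known signature contributed to one tumor.

    counts: observed mutation counts per context (toy example: 3 contexts).
    signatures: list of per-signature probability vectors over the same contexts.
    Returns mixture weights that sum to 1 (EM for a multinomial mixture).
    """
    k = len(signatures)
    weights = [1.0 / k] * k  # start from equal contributions
    for _ in range(n_iter):
        attributed = [0.0] * k
        for c, n in enumerate(counts):
            # Probability of context c under the current mixture.
            mix = sum(weights[j] * signatures[j][c] for j in range(k))
            if n == 0 or mix == 0.0:
                continue
            # Split the n mutations in context c among the signatures.
            for j in range(k):
                attributed[j] += n * weights[j] * signatures[j][c] / mix
        total = sum(attributed)
        weights = [a / total for a in attributed]
    return weights


# Hypothetical toy signatures over three contexts (each sums to 1).
aging = [0.8, 0.1, 0.1]
uv = [0.1, 0.1, 0.8]

# Toy counts built as a 70% aging / 30% UV mixture of 1000 mutations.
counts = [590, 100, 310]

weights = signature_weights(counts, [aging, uv])
print([round(w, 2) for w in weights])  # close to 0.7 and 0.3
```

Note that these weights describe which processes produced the mutations that are present, which is the first of the two stages discussed next; they say nothing yet about which of those mutations matter for the cancer.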
  • 08:35The
  • 08:35one thing that's missing, though,
  • 08:38is that the extent to which each of
  • 08:41those processes actually contributes to
  • 08:43tumorigenesis still remains unknown.
  • 08:44So we can look at what mutations
  • 08:46are in the genome.
  • 08:47But if I count up mutations in the genome,
  • 08:49here's one, here's one,
  • 08:49here's one, here's one.
  • 08:50That doesn't tell me how much
  • 08:52each of those mutations is actually
  • 08:54contributing to tumorigenesis.
  • 08:55In fact, most of those mutations are
  • 08:57not contributing to tumorigenesis.
  • 08:58And most analyses find that there are only
  • 09:00a few mutations that are contributing
  • 09:02at a significant level, at
  • 09:04this SNV, single nucleotide variant,
  • 09:06level, to tumorigenesis.
  • 09:10So we really need to have,
  • 09:13you know, another tool in our
  • 09:15plate to figure out
  • 09:17what level each of these
  • 09:19endogenous and exogenous processes
  • 09:21is contributing to a given cancer.
  • 09:23And here's just a schematic
  • 09:24for this, right? You know,
  • 09:26so mutation 1, mutation 2,
  • 09:28up to mutation N, however many there
  • 09:30are that are really affecting cancer,
  • 09:32they can cause increased cellular
  • 09:34proliferation and survival.
  • 09:36And sunlight, UV
  • 09:39radiation, may be contributing to some
  • 09:41of those mutations more than others,
  • 09:43because certain mutations are caused by
  • 09:45sunlight and other ones are not. Similarly,
  • 09:47aging may contribute to some of
  • 09:49those mutations more than others.
  • 09:50And what I've got right here is,
  • 09:52you know,
  • 09:53if you take nothing else from this lecture,
  • 09:55this is the main thing that I
  • 09:57want to emphasize is that there's
  • 09:58sort of two stages to this.
  • 09:59One is, you know,
  • 10:00what mutagens have you been exposed
  • 10:02to that are contributing to the set of
  • 10:04mutations that are causing your cancer?
  • 10:06And the second is how much does each of those
  • 10:08mutations actually contribute to the
  • 10:11increased cellular proliferation and
  • 10:13survival that is the phenotype of cancer.
  • 10:17And there's a way to figure this out.
  • 10:20But to figure it out we need
  • 10:21to sort of deconvolve something.
  • 10:23And this is an old idea and I'm
  • 10:24going to go through it in some
  • 10:26detail just to make sure that it's
  • 10:27clear to everyone that cancers are
  • 10:29the outcome of an evolutionary
  • 10:31process that's driven by mutation,
  • 10:33consequent genetic variation
  • 10:35created by that mutation,
  • 10:36and natural selection for
  • 10:38the more oncogenic variants.
  • 10:40This is from Peter Nowell's
  • 10:421976 Science article,
  • 10:43a very well known article where he
  • 10:45just went through the idea that,
  • 10:46you know, it's an evolutionary
  • 10:47process that actually produces
  • 10:50malignancies, and in this depiction
  • 10:53you can see cellular lineages
  • 10:57differentiating and dividing.
  • 10:59You see a lot of lineages that are
  • 11:01hashed out here meaning they go
  • 11:03extinct and that's the selective
  • 11:05process in operation.
  • 11:06You know, most of our cells
  • 11:08are dying at the same rate as
  • 11:10they're dividing, typically,
  • 11:12in an adult.
  • 11:13So there's a lot of death going on;
  • 11:15we don't usually emphasize that.
  • 11:17So that death may be going on and
  • 11:19what happens is that at some point
  • 11:21you get lineages that are reproducing
  • 11:22a lot more than they are dying.
  • 11:24And those ones,
  • 11:25in the case that they cause
  • 11:28difficulties for your life are
  • 11:30usually referred to as malignancies,
  • 11:32especially if they can then
  • 11:34migrate to other locations.
  • 11:35And
  • 11:37these later evolved lineages are
  • 11:40usually the product of a series
  • 11:42of mutations that come along
  • 11:44during this evolutionary process,
  • 11:46and what's happening with those
  • 11:48mutations is they're actually enabling the
  • 11:51cells to survive and proliferate better,
  • 11:53so they're selected, as the terminology
  • 11:55we use in evolutionary biology goes,
  • 11:57and they persist.
  • 11:59And that arising of those mutations
  • 12:02within individual cells within
  • 12:04cancer lineages is what we need to
  • 12:07sort of understand because there's
  • 12:08two processes going on here.
  • 12:09One is the appearance of these
  • 12:11mutations and then there's the amount
  • 12:13that they actually increase the
  • 12:14survival and replication of the cells.
  • 12:16So to quantify cancer effect size,
  • 12:19which is what I typically call
  • 12:20this, the effect,
  • 12:22you know,
  • 12:22on cells of actually leading to cancer,
  • 12:25which in evolutionary biology we
  • 12:27just call a selection coefficient.
  • 12:30We need to understand what the
  • 12:31prevalence in a population,
  • 12:33patient population is of a tumor
  • 12:34and we need to deconvolve that
  • 12:36prevalence into two factors because
  • 12:38when we see a certain mutation
  • 12:41very commonly in a kind of cancer,
  • 12:43that doesn't mean it's causing
  • 12:44a lot of the cancer.
  • 12:46It may just be that the mutation
  • 12:47rate is very high and we've seen
  • 12:49that very frequently in instances
  • 12:51where we have genes that are very
  • 12:52large or have very high mutation
  • 12:54rates that show up frequently
  • 12:55when we sequence tumors,
  • 12:56but are not significant causes of cancer.
  • 12:59And so we really need to understand,
  • 13:01you know,
  • 13:01which ones are actually contributing to
  • 13:02cancer and which ones are
  • 13:03just typically contributing to
  • 13:04prevalence because of an
  • 13:05underlying mutation rate.
  • 13:06So to quantify the cancer effect size,
  • 13:09we have to do a fairly straightforward thing,
  • 13:11which is take that prevalence,
  • 13:12how frequently we see them in
  • 13:13patients, and deconvolve it into
  • 13:15the baseline mutation rate,
  • 13:16how frequently the mutations are
  • 13:18occurring in the lineage, and
  • 13:21the degree of selection for that
  • 13:23mutation in the cancer lineage.
  • 13:26And if we can differentiate those two things,
  • 13:28then we can better understand how
  • 13:29much of why that
  • 13:31mutation is there is because of the
  • 13:33underlying mutations that are happening
  • 13:34across your whole genome that
  • 13:36aren't necessarily relevant, and how
  • 13:37much is due to those individual
  • 13:39mutations actually increasing the
  • 13:41proliferation and survival of the cell.
  • 13:43So here's just a schematic of that.
  • 13:45This is just basic evolutionary biology
  • 13:48101. Mutation creates variation,
  • 13:50symbolized by the different shades
  • 13:52of gray there. Unfavorable mutations
  • 13:55are selected against.
  • 13:56Reproduction and mutation occur,
  • 13:58and the favorable mutations are more
  • 14:01likely to survive and reproduce.
  • 14:03And the point of this is that
  • 14:05both the mutation rate and the
  • 14:08extent to which they contribute
  • 14:10to survival and reproduction
  • 14:13contribute to what you see as an
  • 14:16end product of the process of cellular
  • 14:19differentiation, especially into cancers.
  • 14:22All right.
  • 14:23So how do we figure out
  • 14:24that baseline mutation rate?
  • 14:25Well, it's already been done for me, anyway.
  • 14:28A lot of the work was already done,
  • 14:29which is really great, by
  • 14:33Lawrence and others.
  • 14:35This is a 2013 paper, quite a
  • 14:36while ago, where they showed that
  • 14:38mutation rate varies widely across
  • 14:40the genome and correlates with DNA
  • 14:43replication timing and expression level.
  • 14:45So there's these covariates.
  • 14:45I'm not going to go into a lot
  • 14:47of detail about this.
  • 14:48I've talked about this before
  • 14:50with various audiences here, but
  • 14:51that mutation rate varies and correlates
  • 14:54with DNA replication timing and
  • 14:56expression level, with heterochromatin marks,
  • 14:58a bunch of other correlates that we can
  • 15:01actually get about individual tumors.
  • 15:03Those allow us to ask questions
  • 15:04about you know a given gene and
  • 15:06whether or not it's got a very high
  • 15:09mutation rate or a low mutation rate.
  • 15:11By using those correlates to help us
  • 15:13predict that along with synonymous
  • 15:15changes in the genome which we can
  • 15:17presume don't have any effect on the
  • 15:20proliferation and survival of of cells.
  • 15:22So for instance,
  • 15:23olfactory receptors,
  • 15:24which early on were this bugaboo that would
  • 15:26show up when we did this tumor sequencing,
  • 15:28happen to be in a part of
  • 15:29the genome that gets a very,
  • 15:30very high mutation rate.
  • 15:31It's basically not expressed,
  • 15:33and non-expressed
  • 15:34parts of the genome don't have
  • 15:35transcription-coupled repair,
  • 15:36etcetera.
  • 15:37CSMD3 is another example where
  • 15:39there's very high levels,
  • 15:40high correlates and also very
  • 15:42high mutation rate.
  • 15:43And typically it's not considered
  • 15:44to be a driver even though you see
  • 15:46it a lot in tumor sequencing,
  • 15:48and you can do regressions on
  • 15:49this. And then I'm just going to
  • 15:51very quickly mention that
  • 15:52this wonderful work was done by Lawrence,
  • 15:54but typically that work was only applied
  • 15:57to the question of whether or not
  • 15:59genes were overburdened with mutations.
  • 16:01So in other words, they got these
  • 16:02mutation rates and they just said,
  • 16:03well, is it more than we expect?
  • 16:05And then they calculated a P value for
  • 16:06whether we should put this gene in the
  • 16:08category of mutated or not, and then
  • 16:09they left behind that mutation rate
  • 16:11and just looked at prevalence in
  • 16:13most of the analyses that were done
  • 16:14from 2013 through 2018 or so.
  • 16:18So typically that was sort of left
  • 16:20behind at that point, and that's what
  • 16:23Vincent Cannataro and I in 2018 sort
  • 16:24of picked up on and said, look,
  • 16:25this mutation rate is important
  • 16:27for more than just calculating P values.
  • 16:28It's actually important for
  • 16:30calculating the effect.
  • 16:31You know, in the biostatistician's mind
  • 16:33the P value is sort of a secondary thing.
  • 16:36First you calculate the effect of
  • 16:37the thing you're looking at and
  • 16:39then you see
  • 16:40whether you should trust that effect.
  • 16:42And so that's what Vincent Cannataro
  • 16:44and I did, and here's a sort of
  • 16:46brief introduction to how we do that
  • 16:48calculation by convolving the gene
  • 16:49based rates from the silent sites and
  • 16:51covariates with their trinucleotide
  • 16:53context.
  • 16:53So you can just go through tumor
  • 16:55sequence data and you can look at
  • 16:57what the underlying mutation rate
  • 16:59is using basically that Lawrence et
  • 17:01al. approach that I talked about
  • 17:02with the covariates;
  • 17:03you can sort of look at every gene
  • 17:04in the genome and calculate what
  • 17:06the mutation rate is.
  • 17:06And this is just one of these plots
  • 17:08that's just a scatter plot on one axis
  • 17:10of what the different gene rates are.
  • 17:11And you can see there's quite a
  • 17:13wide range here.
  • 17:14And I think that's the most important
  • 17:16lesson of this little image is that
  • 17:18the mutation rate varies quite
  • 17:19extensively from gene to gene from 10
  • 17:21to the minus two to 10 to the minus 4.
  • 17:23In this particular instance,
  • 17:25so that's two orders of magnitude
  • 17:27of variation in mutation rates.
  • 17:28So when you see,
  • 17:29you know, one gene mutated in a
  • 17:32cancer tumor cohort at,
  • 17:33you know,
  • 17:34100 copies out of 1000 and another
  • 17:36at 10 out of 1000,
  • 17:37that's only one order of magnitude
  • 17:39difference in prevalence, and you
  • 17:40can explain that by just half
  • 17:41of this mutation rate diagram.
  • 17:43In other words,
  • 17:43mutation rate can explain a lot of
  • 17:46the differences in how prevalent genes
  • 17:48are when you look in a patient population.
  • 17:50So you shouldn't take that
  • 17:52prevalence as an indicator,
  • 17:54as a strong indicator, of how
  • 17:56important a gene is in the cancer;
  • 17:58you really need to basically understand
  • 18:01this underlying mutation rate as well.
  • 18:03And so then you can take different
  • 18:06genes that are on that diagram
  • 18:09and you can look at each individual
  • 18:11tumor and you can map out what
  • 18:14the trinucleotide rates are.
  • 18:15So this rate,
  • 18:16the rate above, is just the rate at
  • 18:18which the gene itself gets mutated.
  • 18:20But if we want to look at every given site,
  • 18:23the important thing is that the
  • 18:25different mutational processes that I
  • 18:27mentioned earlier in this talk affect
  • 18:29different sites at different frequencies.
  • 18:31I have a question right there.
  • 18:34Is that normalized for length,
  • 18:37the mutation rate itself?
  • 18:39In this case it is, yes.
  • 18:43So these different mutational
  • 18:46processes contribute differently.
  • 18:49So in this case,
  • 18:49I'm looking at lung cancer,
  • 18:50which is why we have KRAS
  • 18:52and EGFR highlighted here.
  • 18:54And in lung cancer,
  • 18:55you get a lot of these C-to-A mutations
  • 18:57that are preceded by a T and
  • 18:59followed by an A, and also ones that
  • 19:02are preceded by and followed by
  • 19:03an A and a C, and an A and an A.
  • 19:05So all of these bright red
  • 19:08trinucleotide contexts get much
  • 19:10more mutation than other ones.
  • 19:11And again I just want to
  • 19:13emphasize that the coloration
  • 19:14here is scaled to how often we see it.
  • 19:16And so you see almost an order of magnitude,
  • 19:19sometimes more
  • 19:20with some cancer types, of variation,
  • 19:22again, in how frequently given
  • 19:25sites get mutated over other sites.
  • 19:27So when you combine this
  • 19:29plus the gene by gene rates,
  • 19:30you're talking about 3 orders of magnitude,
  • 19:32maybe even four in some cases,
  • 19:34between a given site and
  • 19:36another site in the genome,
  • 19:37and how frequently it gets mutated.
  • 19:38So this is a really important
  • 19:39factor to take into consideration
  • 19:41when wondering whether or not
  • 19:42a given site is important for
  • 19:44driving cancer. And what you can
  • 19:45do is you can basically take this
  • 19:47map and look at each gene, and
  • 19:49here
  • 19:50I'm looking at an excerpt of a
  • 19:52tiny little part of the genome.
  • 19:54Sorry.
  • 19:55This is site 850 to 870 in EGFR,
  • 19:59here's site 1 to 20 in KRAS, and here's
  • 20:03site 30 to 50 in CTNNB1.
  • 20:06And I just want to mention that
  • 20:07you take these rates
  • 20:09and then you make sure that the
  • 20:10individual site rates are accommodated
  • 20:12by ensuring that, you know, TCA is much more
  • 20:15frequent than CCG changing to
  • 20:19an A, and do all of the math,
  • 20:22which is very straightforward here
  • 20:24but a lot of accounting
  • 20:26bioinformatically, and then map it
  • 20:29through the actual genetic code.
  • 20:31So you're looking at every single
  • 20:32site in that protein and saying, well,
  • 20:34how likely is this 850 histidine to
  • 20:36change, based on its three codon
  • 20:38sites, into a tyrosine or a proline
  • 20:41or a phenylalanine, etcetera, etcetera.
  • 20:42And some
  • 20:44changes of course can't really
  • 20:46happen through a single
  • 20:47nucleotide mutation;
  • 20:48others can in multiple ways, etcetera.
  • 20:50So there's a lot of addition to add up here.
  • 20:52But once you add it all up,
  • 20:53this diagram tells you how likely
  • 20:55each different change is to happen
  • 20:57by neutral mutation.
  • 20:58That is when we just expect new
  • 21:00mutations to be sprayed on there
  • 21:01and have no difference in the
  • 21:03replication and survival.
  • 21:04So then we get this diagram of how
  • 21:06much each amino acid position would
  • 21:08be expected to be mutated,
  • 21:10and then we can compare that
  • 21:12to what's actually observed.
  • 21:18What's actually observed is a much,
  • 21:22much more rarefied set of mutations
  • 21:24than what you'd expect
  • 21:25based on neutral evolution.
  • 21:27And that's because when we sample tumors,
  • 21:29we're sampling tumors that have been
  • 21:31under selection for very specific
  • 21:32mutations and because right here I've
  • 21:34selected sites that actually do have an
  • 21:37effect on proliferation and survival.
  • 21:38So on the top, EGFR 858
  • 21:42leucine is a very well known mutational site.
  • 21:45The KRAS G12 is also a very well known one.
  • 21:48And then this
  • 21:49part of CTNNB1,
  • 21:50it's a domain that is known to be
  • 21:53oncogenic when it gets mutated, at a slightly
  • 21:54lower level in terms of the others.
  • 21:56But the whole region across here is sort
  • 21:59of known to be important to oncogenesis.
  • 22:02And so what you can basically do is
  • 22:04take the prevalence that we see, and
  • 22:06this is in very crude terms, and
  • 22:09there are some corrections involved that
  • 22:11I'm not going to go into, but essentially
  • 22:14divide the observed
  • 22:16here by the expected on the same
  • 22:19plot on the left, and that
  • 22:21gives you a metric for the cancer effect:
  • 22:23how strongly that site
  • 22:25is mutated, sorry,
  • 22:27how strongly that site is
  • 22:29selected once it is mutated.
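
The crude version of this calculation, observed divided by expected, can be written out as a short Python sketch. It deliberately omits the corrections the speaker alludes to, so it should be read as an illustration of the ratio only; the function name `crude_cancer_effect` and every number below are hypothetical.

```python
def crude_cancer_effect(n_with_variant, n_tumors, expected_rate):
    """Observed prevalence divided by the expected neutral rate.

    n_with_variant: tumors in the cohort carrying the variant.
    n_tumors: total tumors in the cohort.
    expected_rate: expected per-tumor frequency of the variant from
        mutation alone (gene rate and trinucleotide context combined).
    """
    if n_tumors <= 0 or expected_rate <= 0:
        raise ValueError("n_tumors and expected_rate must be positive")
    observed = n_with_variant / n_tumors
    return observed / expected_rate


# Two hypothetical variants, each seen in 100 of 1000 tumors,
# but at sites whose expected neutral rates differ 100-fold.
high_rate_site = crude_cancer_effect(100, 1000, 1e-2)
low_rate_site = crude_cancer_effect(100, 1000, 1e-4)
print(high_rate_site, low_rate_site)
```

Both variants have the same prevalence, yet the one at the low-mutation-rate site has a ratio about 100 times larger, which is the point made earlier in the talk about not reading prevalence alone as importance.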
  • 22:32And as I said these are well known
  • 22:34sites in these particular cancers.
  • 22:36And if you do that across all the
  • 22:38different sites that you can look at
  • 22:39what you see is a distribution
  • 22:40that looks like this, where on the
  • 22:42X axis is the cancer effect size.
  • 22:45It ranges from 10 to the zero to 10 to the
  • 22:476th, maybe even a little bit more, typically.
  • 22:49And why is that?
  • 22:50Why,
  • 22:51what does this range mean?
  • 22:53The range is what it is because,
  • 22:54well, it's complicated and
  • 22:56I don't want to go into a lot of
  • 22:58detail, but population genetically
  • 22:59it has to do with the population
  • 23:01size of the cancer,
  • 23:02the reproductive population size:
  • 23:03how many cells in the cancer
  • 23:05could possibly reproduce.
  • 23:06I'm not going to go into more
  • 23:07about that,
  • 23:08but that's why it exists across
  • 23:10this wide range.
  • 23:11The density here is just
  • 23:13a density plot across cancer effect
  • 23:15size of these different mutations.
  • 23:16So most of the mutations lie at this
  • 23:18very low range where it's not even
  • 23:21clear necessarily if they're under any
  • 23:23selection below say 10 to the four or so.
  • 23:26And in blue I show you the
  • 23:28synonymous mutations and in red
  • 23:30the nonsynonymous mutations.
  • 23:31So there's just a slight
  • 23:33bias, over the synonymous
  • 23:36mutations, of nonsynonymous
  • 23:38mutations to be oncogenic.
  • 23:39But the really important mutations
  • 23:40are all out on this tail here,
  • 23:42and I've just shown 2 here for reference.
  • 23:44Here's a P53 mutation that's quite common.
  • 23:46Here's an NFE2L2 mutation that's quite
  • 23:48common in lung squamous cell carcinoma.
  • 23:51So these tail mutations are the
  • 23:52ones that are important.
  • 23:53And this harks back to
  • 23:54what I was saying earlier
  • 23:55when we say, oh, lots of mutations
  • 23:57are happening in the genome because
  • 23:59of UV light or something like that.
  • 24:01If they're not these key
  • 24:03mutations out here on the tail,
  • 24:04they're not contributing much to cancer.
  • 24:06So we really need that component to be
  • 24:09included if we want to ask the question
  • 24:12what is causing cancer in an individual
  • 24:14tumor in an individual patient.
  • 24:17You can do this diagram
  • 24:18not just for lung cancer,
  • 24:19but for lots of different cancers,
  • 24:21and we see very much the same pattern.
  • 24:31OK. Now just to provide you a little
  • 24:33bit of perhaps validation that this
  • 24:36cancer effect you know is meaningful,
  • 24:38probably many of you are
  • 24:39familiar with GLENVAR variants,
  • 24:41variants that have been attributed over
  • 24:44time with some clinical significance.
  • 24:47And these by the way these are Clint
  • 24:49Barbarians that were attributed significance,
  • 24:50not potential, not ones that
  • 24:53weren't attributed significance.
  • 24:54And on the X axis we've sort of divided them,
  • 24:58those ClinVar variants, up into
  • 24:59some categories I'll talk about,
  • 25:00but on the Y axis
  • 25:01is the scaled selection coefficient,
  • 25:03and there are basically
  • 25:05two comparisons
  • 25:06I really want to emphasize here.
  • 25:07If we look at ClinVar single-nucleotide
  • 25:09variants that are recurrent within
  • 25:11cancer type and compare it to other
  • 25:14single nucleotide variants that
  • 25:15are recurrent within cancer type,
  • 25:17we see that the ClinVar variants have a much,
  • 25:20much higher distribution of selection
  • 25:21coefficients than the other SNVs.
  • 25:24So in other words, there's,
  • 25:26you know this literally,
  • 25:28this is saying that ClinVar
  • 25:30predicts cancer effect.
  • 25:31But the opposite is also true, and I'll
  • 25:33show you that in the next slide.
  • 25:34And then we can also compare ClinVar SNVs
  • 25:37that are a single hit within a cancer type,
  • 25:40that is, ones that we only see once, when
  • 25:43they're ClinVar single-nucleotide variants
  • 25:46versus other SNVs that are single hit.
  • 25:48And you can see that the cancer
  • 25:51effect size of those ones that are,
  • 25:53you know, known oncogenic or believed
  • 25:56oncogenic variants is much higher
  • 25:58than the cancer effect of the ones that
  • 26:00are not believed to be oncogenic.
  • 26:02And this is highly significant from
  • 26:04a statistical point of view.
  • 26:07By the way, this is work of Jeffrey Mandell,
  • 26:08who's sitting over here in the audience,
  • 26:10a grad student in my lab.
  • 26:12And so that should be reassuring.
  • 26:17Furthermore,
  • 26:17if you take the mean or the top
  • 26:20cancer effect of a given variant,
  • 26:23they're much stronger predictors of
  • 26:25ClinVar status than the SIFT score,
  • 26:28the PolyPhen-2 score,
  • 26:30or variant prevalence.
  • 26:31Any of these measures that are
  • 26:32typically used to try to say whether
  • 26:34a variant is important or not.
  • 26:36So really,
  • 26:36you should be using cancer effects if you
  • 26:38want to know whether a variant is important.
  • 26:41This is also work by Jeff Mandell.
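One way to make "stronger predictor of ClinVar status" concrete is the area under the ROC curve: the probability that a randomly chosen ClinVar variant scores higher than a randomly chosen other variant. The sketch below is only an illustration of that idea. The talk does not say which statistic was used, and every number here is made up.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical values, for illustration only (not data from the talk)
clinvar_effect = [9500.0, 12000.0, 800.0, 30000.0]   # cancer effect, ClinVar variants
other_effect = [50.0, 700.0, 20.0, 900.0, 10.0]      # cancer effect, other variants
clinvar_prevalence = [3, 1, 2, 5]                    # tumors carrying each ClinVar variant
other_prevalence = [2, 1, 4, 1, 3]                   # tumors carrying each other variant

auc_effect = auc(clinvar_effect, other_effect)
auc_prevalence = auc(clinvar_prevalence, other_prevalence)
print(f"AUC using cancer effect: {auc_effect:.2f}")
print(f"AUC using prevalence:    {auc_prevalence:.2f}")
```

An AUC near 1 means the score separates the two groups almost perfectly, and 0.5 means it does no better than chance.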
  • 26:45OK, so hopefully I've persuaded you that
  • 26:48cancer effect is a measure that you
  • 26:50should be thoughtful about and use
  • 26:52in the research you're doing.
  • 26:54But what we wanted to get to from
  • 26:55the beginning of this talk was
  • 26:57the extent to which each of those
  • 26:58processes contribute to tumorigenesis.
  • 27:00So if you'll at least walk
  • 27:01with me on the idea that cancer effect
  • 27:03quantifies the degree to which a given
  • 27:05variant contributes to tumorigenesis,
  • 27:07then that gives us the
  • 27:09key to finish that association,
  • 27:11as I said. So we know, you know,
  • 27:14from Alexandrov's work...
  • 27:15the degree to which... no, sorry.
  • 27:17We know from this work the degree to which
  • 27:19mutations contribute to the increased
  • 27:21cellular proliferation and survival.
  • 27:23And we know from Alexandrov's work and others,
  • 27:26Stratton and Nik-Zainal and others,
  • 27:28what the contribution of various
  • 27:31mutagenic processes toward creating
  • 27:33those mutations are.
  • 27:34And so by putting those two things together,
  • 27:36we can understand the relationship between
  • 27:39these increased cellular proliferation
  • 27:41and survival and the actual processes
  • 27:43underlying these mutational effects.
  • 27:46So just going back again to Alexandrov's work,
  • 27:49we know each signature contributes
  • 27:51differentially to mutation counts
  • 27:52observed in each cancer type.
  • 27:53I showed this slide earlier and
  • 27:55here's the slide where
  • 27:57you can sort of, like,
  • 27:59fade out for a moment if you want,
  • 28:01and then come back in a moment.
  • 28:02All it's saying...
  • 28:03this is where I'm going to
  • 28:06narrate through, for those of you
  • 28:07who are really interested, how we
  • 28:09actually calculate this process.
  • 28:11But if you've understood everything
  • 28:13before, there's nothing new here.
  • 28:14It's just the bookkeeping of how we
  • 28:17calculate this process, and the
  • 28:19point is that for each
  • 28:21source of mutation
  • 28:23(here's deamination with age, APOBEC, tobacco,
  • 28:26and then an unknown clock-like signature,
  • 28:28which were the four sources that
  • 28:30came out of the deconvolution for
  • 28:32a particular tumor in the TCGA data
  • 28:35set that turned out to be useful
  • 28:37for illustration of this),
  • 28:39for each of those processes,
  • 28:40there's a weight of mutation that
  • 28:42they contribute to given
  • 28:44trinucleotides that are listed down here.
  • 28:47So deamination with age really
  • 28:49focuses on these AC to TG mutations.
  • 28:52That's what they cause for the most part.
  • 28:54But then there's a few other ones
  • 28:55here that are quite frequent.
  • 28:57APOBEC really focuses on TCA or
  • 29:01TCC or TCG or TCT changing to T,
  • 29:05and tobacco has a broader distribution
  • 29:08of mutagenic
  • 29:09effect, and this unknown clock-like
  • 29:11signature,
  • 29:12which is another aging signature, has a
  • 29:15generally quite broad distribution as well.
  • 29:18So we deconvolve that tumor into these
  • 29:22different signatures to understand
  • 29:23how much each one is contributing.
  • 29:25That gives us a signature weight
  • 29:27for every signature here.
  • 29:29And I'm just emphasizing that,
  • 29:30you know,
  • 29:31we can do lots of uncertainty analysis by
  • 29:34bootstrapping the signature
  • 29:35deconvolution.
  • 29:35That's what all these dots are: many
  • 29:37bootstraps on a given tumor,
  • 29:38just saying how much of that
  • 29:40signature do we really believe
  • 29:41is contributing to that cancer.
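The deconvolution-plus-bootstrap idea described above can be sketched in a few lines. This is a toy under stated assumptions, not the lab's pipeline: the two "signatures" over three mutation channels and the counts are invented, and the fit uses a generic multiplicative-update non-negative least squares, swapped in here because it needs only the standard library. Real signatures span 96 trinucleotide channels.

```python
import random

def fit_weights(counts, signatures, iterations=500):
    """Non-negative least squares by multiplicative updates:
    counts ~ sum_k w[k] * signatures[k]; returns weights normalized to sum to 1."""
    k = len(signatures)
    w = [1.0] * k
    # b = S m and a = S S^T are fixed across iterations
    b = [sum(s * m for s, m in zip(sig, counts)) for sig in signatures]
    a = [[sum(x * y for x, y in zip(si, sj)) for sj in signatures] for si in signatures]
    for _ in range(iterations):
        aw = [sum(a[i][j] * w[j] for j in range(k)) for i in range(k)]
        w = [w[i] * b[i] / aw[i] if aw[i] > 0 else 0.0 for i in range(k)]
    total = sum(w)
    return [x / total for x in w]

def bootstrap_weights(counts, signatures, n_boot=200, seed=0):
    """Resample the tumor's mutations with replacement and refit each resample."""
    rng = random.Random(seed)
    mutations = [channel for channel, n in enumerate(counts) for _ in range(n)]
    fits = []
    for _ in range(n_boot):
        resampled = [0] * len(counts)
        for channel in rng.choices(mutations, k=len(mutations)):
            resampled[channel] += 1
        fits.append(fit_weights(resampled, signatures))
    return fits

# Hypothetical signature profiles over three channels (each sums to 1)
signatures = [
    [0.8, 0.1, 0.1],  # stand-in for an aging-like signature
    [0.1, 0.1, 0.8],  # stand-in for a tobacco-like signature
]
counts = [25, 4, 11]  # toy tumor built as 30 * first + 10 * second

point = fit_weights(counts, signatures)
boots = bootstrap_weights(counts, signatures)
low = min(fit[0] for fit in boots)
high = max(fit[0] for fit in boots)
print(f"first-signature weight: {point[0]:.3f} (bootstrap range {low:.3f} to {high:.3f})")
```

Each bootstrap fit corresponds to one of the dots on the slide: the spread of the fitted weights indicates how much of a signature's contribution to that tumor is believable.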
  • 29:43So we do that, and then,
  • 29:45in addition to
  • 29:47understanding how much each signature
  • 29:48has contributed to the cancer,
  • 29:49we look at the probability that
  • 29:51each source created each variant.
  • 29:53And we know that because we know
  • 29:54what the sources are and we can
  • 29:55just look at the relative height
  • 29:56of these bars essentially to give
  • 29:58us the probability that each source
  • 29:59contributed to a given variant, and
  • 30:02then that probability comes out
  • 30:04just by multiplying those together,
  • 30:06essentially, and that gives us, for p53
  • 30:08here, KLF5, and this odorant receptor
  • 30:11which doesn't have much cancer effect,
  • 30:13the probability that each source
  • 30:15contributed to creating each variant.
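The "multiply those together" step can be written out as a small calculation: for a variant in a given trinucleotide context, each source's probability is its signature weight in the tumor times that signature's rate for the context, normalized across sources. The weights, rates, and context labels below are hypothetical placeholders chosen only to show the arithmetic.

```python
def source_probabilities(weights, signatures, context):
    """P(source | variant) is proportional to the source's weight in the tumor
    times the source's relative rate for the variant's trinucleotide context."""
    joint = {name: weights[name] * signatures[name][context] for name in weights}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

# Hypothetical signature weights for one tumor
weights = {"aging": 0.5, "APOBEC": 0.2, "tobacco": 0.3}

# Hypothetical per-context rates for each signature (two contexts shown)
signatures = {
    "aging":   {"A[C>T]G": 0.30, "T[C>T]A": 0.01},
    "APOBEC":  {"A[C>T]G": 0.01, "T[C>T]A": 0.25},
    "tobacco": {"A[C>T]G": 0.02, "T[C>T]A": 0.02},
}

cpg_variant = source_probabilities(weights, signatures, "A[C>T]G")
tca_variant = source_probabilities(weights, signatures, "T[C>T]A")
print(cpg_variant)
print(tca_variant)
```

In this toy tumor the first variant is attributed mostly to aging and the second mostly to APOBEC, even though APOBEC carries the smallest overall weight, because the context of the variant matters as much as the weight of the signature.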
  • 30:17And then we take that effect size
  • 30:18that I just described to you,
  • 30:20which is very high for this P53 variant,
  • 30:23quite a bit lower for KLF5,
  • 30:24but still there, and is basically
  • 30:26nonexistent for the odorant receptor
  • 30:28mutation.
  • 30:29So this is a really important variant,
  • 30:31this is a less important variant,
  • 30:32and this is not important at
  • 30:34all. And then we can just sort of
  • 30:36multiply through each variant by the
  • 30:38probability that each source created
  • 30:39that variant and that gives us this
  • 30:42final thing which is the proportional
  • 30:43mutation source effect size.
  • 30:45That's a mouthful.
  • 30:46But what we're just trying to say
  • 30:48is how much of this given variant
  • 30:51was caused by the particular
  • 30:53mutational process... or sorry,
  • 30:56how much of the selection for oncogenesis
  • 30:57was caused by that particular process.
  • 30:59So the TP53 bars
  • 31:03are much higher than the ones in
  • 31:05KLF5, and those are way higher
  • 31:07than anything in the odorant receptor,
  • 31:09because the odorant receptor in
  • 31:11fact doesn't do anything for cancer.
  • 31:13So then
  • 31:16you can look across all of those,
  • 31:19all of the variants,
  • 31:20not just these ones to look at what
  • 31:22the average attributable effect size
  • 31:23is in a given tumor and you get this
  • 31:25distribution which says oh for this
  • 31:27tumor you know most of the oncogenic
  • 31:30cause came from deamination with age.
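The bookkeeping for a whole tumor can be sketched as follows: weight each variant's source probabilities by that variant's cancer effect, sum per source across variants, and normalize to get the share of effect attributable to each source. The gene labels echo the slide, but the effect sizes and probabilities are invented for illustration, and normalizing to shares is an assumption about how the final proportions are reported.

```python
def attributable_effect(variants):
    """variants: list of (cancer_effect, {source: probability}) pairs.
    Returns each source's share of the tumor's total effect-weighted attribution."""
    totals = {}
    for effect, probabilities in variants:
        for source, p in probabilities.items():
            totals[source] = totals.get(source, 0.0) + effect * p
    grand_total = sum(totals.values())
    return {source: value / grand_total for source, value in totals.items()}

# Hypothetical tumor with three variants (effect sizes and probabilities are made up)
tumor = [
    (10000.0, {"aging": 0.8, "tobacco": 0.2}),  # TP53-like variant, very high effect
    (1000.0,  {"aging": 0.1, "tobacco": 0.9}),  # KLF5-like variant, lower effect
    (1.0,     {"aging": 0.5, "tobacco": 0.5}),  # odorant-receptor-like passenger
]

shares = attributable_effect(tumor)
print(shares)
```

The passenger variant barely moves the result, which is the point of the method: a source only counts toward causation to the extent that it produced the variants that were strongly selected.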
  • 31:33And for this tumor?
  • 31:35You know the second most common process
  • 31:37that was creating mutations that led to
  • 31:40oncogenesis was this light gray, which
  • 31:41is this unknown clock-like signature 5.
  • 31:43So
  • 31:44this is a largely aging-driven
  • 31:46tumor, and there's a little bit
  • 31:48of APOBEC here and a little
  • 31:50bit of tobacco smoke, and you
  • 31:52can follow it through like that.
  • 31:53So this is one example for a
  • 31:55given tumor and then that result,
  • 31:57you know it basically tells you what
  • 31:59at least with the knowledge we have
  • 32:01right now what the effect size by
  • 32:03mutational source for this tumor was,
  • 32:05this is a lung cancer tumor.
  • 32:06By the way.
  • 32:07Now you can look at this not just at,
  • 32:10you know,
  • 32:11you can sort of understand that
  • 32:13for a given site,
  • 32:14but then you can also look at
  • 32:18what a set of sites all look like.
  • 32:21So this is just a diagram where we do that.
  • 32:22Again,
  • 32:23a little bit complex,
  • 32:24but hopefully this everyone can
  • 32:26follow along directly with.
  • 32:28If you look across the genome,
  • 32:29there's an average mutational weight.
  • 32:31So tobacco smoke is causing a
  • 32:33certain number of the mutations,
  • 32:34certain proportion of the mutations
  • 32:36and then a number of others.
  • 32:37And in these diagrams,
  • 32:38I've sort of put the major mutagenic
  • 32:40cause on the left and then stacked
  • 32:42all the other causes on the right,
  • 32:44just because it helps you really
  • 32:46see the differential effect
  • 32:47of these different processes.
  • 32:49So tobacco smoking is the major cause
  • 32:52of load in general, in terms of
  • 32:56the underlying genomic mutation.
  • 32:58But if you look at from site to site,
  • 33:01each site has a different probability
  • 33:03of being caused by tobacco smoke.
  • 33:05So here's
  • 33:07KRAS G12C: very, very,
  • 33:09very strongly caused by tobacco smoke.
  • 33:11Maybe that's not surprising: in
  • 33:13lung cancer, we see that variant very,
  • 33:15very frequently.
  • 33:16We very rarely see KRAS G12C in
  • 33:19other cancers like pancreatic cancer,
  • 33:20other KRAS-driven cancers.
  • 33:22So why is that?
  • 33:23Well, it's just because that site is
  • 33:25hit a lot more in terms of mutations.
  • 33:28It doesn't appear to have anything to
  • 33:31do, from our calculations, with its particular
  • 33:34cancer effect relative to other variants.
  • 33:37And, in contrast, here's EGFR L858R.
  • 33:41It's a long-known fact that you
  • 33:43rarely see those in individuals who
  • 33:45are smokers.
  • 33:47You see that in non-smokers, and the
  • 33:49reason is it's not caused by smoking.
  • 33:51So when you see a patient with this EGFR
  • 33:55mutation, they typically are not a smoker.
  • 33:57There's a lot more individuals coming
  • 34:00in with EGFR who are not smokers than
  • 34:03are smokers, relatively speaking.
  • 34:05So you can do this for lung adenocarcinoma.
  • 34:09You can look at other variants of
  • 34:11course lung squamous cell carcinoma.
  • 34:12Here you see PI3 kinase largely driven
  • 34:15in lung squamous cell carcinoma by other
  • 34:18effects, mostly APOBEC, but in fact
  • 34:22not at all driven by tobacco smoke.
  • 34:23Again, that's an empirical observation
  • 34:25that people have noted many times,
  • 34:27that individuals with lung squamous
  • 34:29cell carcinoma who have a PI3 kinase
  • 34:30mutation are less frequently
  • 34:32smokers than those with other mutations.
  • 34:34And p53 mutations, on the other hand,
  • 34:37are diverse,
  • 34:37some of them likely to be created by
  • 34:39tobacco smoke, some of them less likely.
  • 34:42OK, so we can look at this on an
  • 34:44individual basis and then we can
  • 34:45look at some other cancers.
  • 34:46So here's bladder cancer and cervical cancer.
  • 34:49I just added these because
  • 34:51maybe this is a little less well known,
  • 34:54but a lot of the mutation in both
  • 34:56bladder cancer and cervical cancer,
  • 34:58at least by
  • 35:00this deconvolution approach, appears
  • 35:02to be attributable to APOBEC mutation.
  • 35:05APOBEC is this apolipoprotein B
  • 35:09mRNA-editing enzyme that enzymatically, we know,
  • 35:11mutates DNA and appears to be a
  • 35:14viral defense protein.
  • 35:15And what we see is that a lot of
  • 35:18the mutagenic cause in
  • 35:20the genome is created by APOBEC,
  • 35:22some of it by aging, in bladder
  • 35:24cancer and cervical cancer.
  • 35:25There's a little bit of defective homologous
  • 35:28recombination as a source there as well.
  • 35:30But as you can see, for a number
  • 35:32of these mutations,
  • 35:33some p53 mutations, FGFR3, KRAS G12D,
  • 35:37we see almost no cause from APOBEC.
  • 35:39But on the other hand,
  • 35:40this other FGFR3 mutation is very
  • 35:42likely to be caused by APOBEC;
  • 35:45PI3 kinase,
  • 35:45again, very likely to be caused
  • 35:47by APOBEC mutation.
  • 35:48Cervical cancer, the same thing.
  • 35:50All right,
  • 35:50so we can look at the individual variants here.
  • 35:52Let's get back to the main theme
  • 35:55that this talk hopefully is
  • 35:57presenting to you,
  • 35:58which is that once we understand for every
  • 36:00one of these variants what the causes
  • 36:02are and how much they're causing cancer,
  • 36:04we can then look at tumor
  • 36:06causation by tumor type.
  • 36:08And this isn't the best way to contrast them,
  • 36:10I'll show you another that maybe
  • 36:11contrast it a little bit better.
  • 36:12But here we have all the different
  • 36:15signatures on the Y axis and all
  • 36:17the different cancers on the X
  • 36:19axis and the red is the amount that
  • 36:22the tumor type is caused by that
  • 36:25particular signature, and the gray,
  • 36:27or black, is the amount of mutation
  • 36:29you see due to that signature.
  • 36:32And there's some big differences,
  • 36:34say in signature 5 here for thyroid
  • 36:36cancer where you see an enormous amount
  • 36:39of cause but much less mutation.
  • 36:42But it's a little hard to read that dot plot.
  • 36:45Down below we have just these
  • 36:47bar plots showing
  • 36:49the weight of mutation:
  • 36:51how much
  • 36:54of the mutation in the genome was
  • 36:57caused by a given mutational process.
  • 36:58And on the right the effects, and these
  • 37:01may look pretty similar, but I'll
  • 37:02show you the contrast that shows you
  • 37:04how they're different in a moment.
  • 37:06The thing I want to emphasize right
  • 37:07now is we've given colors for all
  • 37:09of those exogenous sources that
  • 37:11may in principle be things that
  • 37:13we could intervene on to stop.
  • 37:15So UV light; defective
  • 37:18homologous recombination, presumably,
  • 37:19maybe there'd be a way to do that;
  • 37:21APOBEC, perhaps if we,
  • 37:23you know, avoided viral infection;
  • 37:25tobacco, certainly interventional; alcohol,
  • 37:28definitely something we can intervene on;
  • 37:30mutagenic chemical exposures, definitely
  • 37:32something we can do intervention on. Anyway,
  • 37:34all those interventional ones are the
  • 37:36colored ones, and the aging ones are the
  • 37:38gray ones, and then the unknown processes,
  • 37:40the processes that we haven't figured out
  • 37:42what they're associated with, are in black.
  • 37:44So this diagram actually tells you
  • 37:46a lot about what you can do now
  • 37:49to understand more about cancer,
  • 37:50right, or to intervene. We can
  • 37:53intervene a lot on
  • 37:55these cancer types for which we see a lot of color.
  • 37:58There's much less we can
  • 37:59do for the ones where we don't.
  • 38:01So for instance glioma: a lot
  • 38:03of aging, not a lot of other things;
  • 38:05thyroid cancer, a lot of APOBEC,
  • 38:07but other than that, aging; glioblastoma,
  • 38:10again, a lot of aging; prostate cancer,
  • 38:13a lot of aging,
  • 38:14just a little bit of APOBEC and
  • 38:17defective homologous recombination.
  • 38:18So there's some we don't have much
  • 38:20way to intervene on. Skin cancer:
  • 38:22extremely easy to intervene to
  • 38:25reduce the number of mutations.
  • 38:27Lung cancer: a lot of tobacco,
  • 38:30a lot of defective
  • 38:31homologous recombination and APOBEC.
  • 38:32So there's a lot we can do in terms
  • 38:34of stopping those and then also where
  • 38:36there's a lot more to understand.
  • 38:38So for instance, breast cancer,
  • 38:39ER-negative breast cancer:
  • 38:40nearly half of the mutations,
  • 38:42we don't know why they're being
  • 38:44caused, process-wise.
  • 38:45So this is something to be investigated,
  • 38:47because if we could figure it out,
  • 38:48maybe there are these intervenable
  • 38:51processes that we could do something about,
  • 38:55et cetera. So you can sort
  • 38:56of look at the black.
  • 38:56That gives you an idea of how much
  • 38:58we still need to learn, and the gray
  • 39:00gives you an idea of
  • 39:01how much
  • 39:04aging versus other processes
  • 39:06seem to be causing that given cancer.
  • 39:09And of course the cancers that
  • 39:11are most age-related are at
  • 39:12the bottom of this diagram,
  • 39:13and the ones that are least
  • 39:15age-related tend to be higher up.
  • 39:18So this is just a bigger diagram
  • 39:21of that same picture, in case
  • 39:23you can see it better.
  • 39:25And then I'm going to show you,
  • 39:26I'm not going to show you
  • 39:27the actual cancer types,
  • 39:28but just an animation that actually
  • 39:30Vincent Cannataro made that helps
  • 39:32you see the difference between
  • 39:33the cancer mutation and effects.
  • 39:35So this just varies between how
  • 39:37much mutation is causing the given
  • 39:39cancer and how much of the cancer
  • 39:41effect by those mutations is causing
  • 39:43the cancer and allows you to sort
  • 39:45of see how different they are.
  • 39:49You know, for ones like skin cancer,
  • 39:52it doesn't change that much because nearly
  • 39:53all the mutations are caused by UV anyway.
  • 39:57All right. So
  • 40:00as I said earlier,
  • 40:01the extent to which the processes
  • 40:02contribute to tumorigenesis
  • 40:03has been unknown.
  • 40:04But now we can link it together
  • 40:07with this process.
  • 40:08And now I wanted to go back to
  • 40:10this slide because I'm going to
  • 40:11show you a bunch of diagrams and
  • 40:13they're pretty complicated diagrams.
  • 40:15But on the left hand side is going
  • 40:16to be a bar plot that's
  • 40:18reflecting how much each process
  • 40:20is contributing to the mutations.
  • 40:22It's this left-hand side,
  • 40:24and on the right of each plot
  • 40:26is going to be another bar.
  • 40:27Plot that shows you how much each
  • 40:29mutation is contributing to the increased
  • 40:32cellular proliferation and survival.
  • 40:34for four different cancers.
  • 40:36Here's primary skin cancer...
  • 40:38sorry, primary and metastatic skin cancer,
  • 40:42colorectal cancer
  • 40:45(actually this is
  • 40:46colon cancer),
  • 40:47HPV-negative head and
  • 40:48neck cancer, and thyroid cancer, and the
  • 40:51diagrams are this bar versus this bar.
  • 40:54So the bar on the left is how
  • 40:56much of the mutation for a specific
  • 40:58tumor was contributed by a given
  • 41:00process, and then how much of the
  • 41:02oncogenesis for that tumor was
  • 41:04caused by that particular process.
  • 41:06And I've lined these up so that
  • 41:08what I'm showing you is just
  • 41:10five examples here from TCGA, and
  • 41:12this is actually from some data
  • 41:14gathered here at Yale on melanoma.
  • 41:16But anyway,
  • 41:17we've looked across these
  • 41:19different cancer types,
  • 41:20these different tumors and the
  • 41:21question is are these different
  • 41:23or are these similar?
  • 41:24Like is the basic mutagenic effect
  • 41:25and the cancer effect similar or is
  • 41:27it different and you see they're very
  • 41:29similar for these two tumors,
  • 41:31very similar for this third
  • 41:32getting a little different here
  • 41:34and getting quite different here.
  • 41:35And these are arranged at the
  • 41:38quartiles of the distribution.
  • 41:40So it sort of represents the
  • 41:41range of what you see in patients.
  • 41:43So most of the time mutagenic
  • 41:45effect and cancer causation
  • 41:46are aligned very closely in primary
  • 41:48skin cancer, and that's because UV is
  • 41:50causing almost all these mutations
  • 41:51and changing things. In colon cancer,
  • 41:53As you extend from the more
  • 41:55similar to the more different,
  • 41:56you see a lot more heterogeneity
  • 41:58from patient to patient in terms of
  • 42:01whether or not the causative factors
  • 42:02are the same as the mutagenic factors.
  • 42:05And that gets even more extreme
  • 42:06with HPV-negative head and
  • 42:07neck cancer, and even more
  • 42:10extreme with thyroid cancer.
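Ranking tumors from "very similar" to "quite different" requires some measure of how far the mutation-weight bar is from the cancer-effect bar. The talk does not name the measure used, so the sketch below uses Jensen-Shannon divergence purely as one reasonable choice. The two example tumors are hypothetical.

```python
import math

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits between two distributions over the same
    sources: 0 for identical distributions, 1 for completely disjoint ones."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)

# Hypothetical two-source tumors: [share from source A, share from source B]
skin_like_mutation, skin_like_effect = [0.90, 0.10], [0.88, 0.12]
thyroid_like_mutation, thyroid_like_effect = [0.20, 0.80], [0.70, 0.30]

small = jensen_shannon(skin_like_mutation, skin_like_effect)
large = jensen_shannon(thyroid_like_mutation, thyroid_like_effect)
print(f"skin-like tumor divergence:    {small:.4f}")
print(f"thyroid-like tumor divergence: {large:.4f}")
```

Sorting tumors by such a score and picking those at the quartiles would give a representative spread of patients, from cases where mutation and causation agree to cases where they diverge.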
  • 42:12So, but let me just emphasize again,
  • 42:15these measures are for individual patients.
  • 42:17So in principle this calculation can
  • 42:19be done on any tumor sequenced from
  • 42:22an individual patient to tell you
  • 42:25what the causation of their cancer was,
  • 42:28at least to the level that we
  • 42:30are able to analyze this now.
  • 42:31There's a bunch of things that
  • 42:33we would love to also
  • 42:35be able to incorporate into this.
  • 42:36This is only single nucleotide mutations.
  • 42:38It doesn't take into account
  • 42:39copy number variation.
  • 42:40It doesn't take into account
  • 42:42epigenetic changes.
  • 42:42And as I said at the outset,
  • 42:44none of this has to do with physiological
  • 42:46things like whether you exercise and have
  • 42:47good autophagy, you know.
  • 42:49It's not that physiological
  • 42:51question of why you got cancer,
  • 42:52but it is the mutagenic
  • 42:53answer of why you got cancer,
  • 42:55down at the SNV level.
  • 43:00So it reveals that.
  • 43:04So I think we're very good there.
  • 43:05I would argue that the logic behind
  • 43:07this is right and that we can apply
  • 43:10that same logic to epigenetics,
  • 43:11to copy number changes, et cetera.
  • 43:13There's just a lot of understanding
  • 43:14we still need of the basic underlying
  • 43:16mutation rate for those things
  • 43:18in order to actually do that and
  • 43:19we're trying to work on that now.
  • 43:23Now, this is basically
  • 43:25the end of the major talk, but
  • 43:27I just want to emphasize that this
  • 43:28doesn't just apply to the origin of
  • 43:30cancer in early tumorigenesis.
  • 43:33The same processes are going on
  • 43:36in patients as we treat them as well.
  • 43:38So if you have a
  • 43:41patient where you take out a biopsy or
  • 43:43a resection, and then they undergo some
  • 43:46sort of treatment and have recurrence,
  • 43:48there are ways to figure out exactly
  • 43:50what the underlying processes that are
  • 43:52contributing to the mutations that
  • 43:54cause that recurrence are.
  • 43:56So that should be of interest to all of
  • 43:58us who are interested in figuring out
  • 44:00what's causing recurrence in cancer.
  • 44:02So there's a clinical as well as the
  • 44:04sort of more public health side that
  • 44:05I was talking about with regard to
  • 44:07these mutations, a clinical side of
  • 44:08how we might be able to apply this.
  • 44:10And just to give you two material
  • 44:13examples of this,
  • 44:15I'm going to show you two sort of
  • 44:18tree studies of individual patients.
  • 44:20These were led by Nick Fisk in my lab.
  • 44:24And here's a patient who was diagnosed
  • 44:29with stage 3B lung cancer.
  • 44:32They had an EGFR exon 19 deletion
  • 44:35and their tumor was resected.
  • 44:37They were given cisplatin and pemetrexed,
  • 44:40and there's a little pie
  • 44:42chart, and this is a phylogenetic tree
  • 44:44relating their metastatic tumors to
  • 44:46the primary tumor, and it's been dated.
  • 44:48We have all these techniques in my
  • 44:49lab to date that, based on when
  • 44:51the primary tumor was and how many
  • 44:53mutations we see, et cetera, et cetera.
  • 44:54These were extracted at a later
  • 44:56date than this,
  • 44:57and so that gives us a way
  • 44:58to calibrate the time.
  • 44:59And what you see here in these pie
  • 45:01charts is I've made it simpler.
  • 45:03I just am looking at all other
  • 45:05kinds of mutagenic sources.
  • 45:07And one specific source
  • 45:08that I'm interested in.
  • 45:10And in this particular case,
  • 45:11the source I'm interested in
  • 45:12is the effect of cisplatin,
  • 45:14which we know has a mutagenic
  • 45:16effect on tumors even as it,
  • 45:19you know, applies its own selective
  • 45:21effect killing tumor cells. And what
  • 45:23you can see here is the cisplatin
  • 45:26mutations on this branch. So this
  • 45:28is independently determined, right?
  • 45:30This isn't because cisplatin is here;
  • 45:31we just did the deconvolution and boom,
  • 45:33here are all these cisplatin mutations.
  • 45:35This white pie piece here, almost,
  • 45:37you know, a bit less than 1/4, or
  • 45:40around 1/5, of the mutations in
  • 45:42this tumor are now cisplatin-derived
  • 45:44mutations, and we can deconvolve
  • 45:46that by doing this tree and seeing,
  • 45:48OK, on this branch right here,
  • 45:49that's how many are that kind of
  • 45:51mutation. And then, of
  • 45:53course, that tumor continued to evolve,
  • 45:55and the reason it continued to evolve
  • 45:57was the patient was given erlotinib.
  • 46:00Unfortunately erlotinib wasn't
  • 46:01very successful in this case,
  • 46:03because they got the EGFR T790M
  • 46:05resistance mutation on this branch as
  • 46:07well, and the tumor differentiated into
  • 46:12these metastatic tumors and
  • 46:15another metastatic tumor in the pancreas,
  • 46:18and the point here is just that
  • 46:20the cisplatin was
  • 46:21discontinued, and so the proportion of
  • 46:24mutations in subsequent branches is
  • 46:25actually lower out of the total because
  • 46:27new mutations are being added but they
  • 46:29aren't cisplatin related mutations.
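The dilution effect described here is simple arithmetic along a lineage: once the mutagen is withdrawn, new mutations keep accumulating without it, so its cumulative share falls. The branch counts below are hypothetical and chosen only so that the treated branch ends at the roughly one-fifth share mentioned in the talk.

```python
def signature_fraction(branches, signature):
    """Cumulative fraction of mutations attributed to `signature` after each
    successive branch along one lineage of the tree."""
    attributed = 0
    total = 0
    fractions = []
    for counts in branches:
        attributed += counts.get(signature, 0)
        total += sum(counts.values())
        fractions.append(attributed / total)
    return fractions

# Hypothetical mutation counts per branch, root to tip (not the patient's data)
lineage = [
    {"cisplatin": 0, "other": 300},    # trunk, before treatment
    {"cisplatin": 100, "other": 100},  # branch spanning cisplatin therapy
    {"cisplatin": 0, "other": 250},    # later branch, after cisplatin stopped
]

fractions = signature_fraction(lineage, "cisplatin")
print([round(f, 3) for f in fractions])
```

The cisplatin-attributed mutations never disappear from the descendant tumors. Only their proportion shrinks as other processes add to the total.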
  • 46:31So all of this deconvolution and
  • 46:34understanding of the underlying mutagenic
  • 46:36causation occurs during this treatment
  • 46:39process that patients receive.
  • 46:41And we can figure it out.
  • 46:44There's one more point that
  • 46:44I just want to make here,
  • 46:45which is that it turns out,
  • 46:48and I don't have a plot for this,
  • 46:49but the T790M mutation is a mutation
  • 46:52that is very likely to be caused
  • 46:54just like those other ones I
  • 46:57showed you by cisplatin mutation.
  • 46:59So this is a poor ordering clinically
  • 47:02for these treatments to be given
  • 47:04because this cisplatin mutation
  • 47:06creates a bunch of that genetic
  • 47:07variation that is exactly what we
  • 47:09don't want to have if we're going
  • 47:11to put them on erlotinib later.
  • 47:13And very likely they had that mutation
  • 47:14right when they were put on erlotinib,
  • 47:16which is why there's very little
  • 47:19duration of benefit for the patient.
  • 47:22So this is a great example
  • 47:24in terms of a clinical or
  • 47:26exogenous source of mutation,
  • 47:27the cisplatin treatment
  • 47:28that they were receiving.
  • 47:30And let me give you another example that's
  • 47:32about an endogenous change that has an
  • 47:34interesting effect in a very similar way.
  • 47:36So here's another lung cancer case.
  • 47:38This patient was put on erlotinib after resection.
  • 47:41They had a P53 mutation
  • 47:44already after resection,
  • 47:45but over a much longer period of time.
  • 47:47They were never treated
  • 47:48with cisplatin. A much,
  • 47:50much longer time later
  • 47:52they did get an EGFR T790M mutation, and
  • 47:56you can see these plots are solid here
  • 47:59meaning that the mutational process of
  • 48:01interest that I wanted to talk about you
  • 48:03know hasn't happened at all here yet.
  • 48:05And then you can see
  • 48:08later on they were moved
  • 48:09to Avastin. Not unfortunately;
  • 48:11they were moved to Avastin.
  • 48:12It wasn't unfortunate necessarily,
  • 48:15but erlotinib was discontinued.
  • 48:17They were given pemetrexed and
  • 48:19carboplatin late in therapy,
  • 48:22but
  • 48:24their metastatic tumors
  • 48:26started genetically diverging about
  • 48:29two years before their death.
  • 48:30And the thing that I just want to emphasize
  • 48:32is here is a CTNNB1 mutation, S37C,
  • 48:35which is known to induce
  • 48:42defects in homologous recombination
  • 48:44based mutations, and you can see in the
  • 48:47descendant lineages the increased amount
  • 48:49of that kind of mutation occurring after
  • 48:52the CTNNB1 mutation.
  • 48:54So this is an endogenous process that
  • 48:56was started by a mutation that we can
  • 48:58then track again down to the individual
  • 49:00branch where the mutations are occurring
  • 49:03and how many of them are caused.
  • 49:04And then from then on there's a lot of
  • 49:07CTNNB1 mutation in these tumors,
  • 49:09but not in the spleen.
  • 49:10Interesting.
  • 49:11An interesting story here
  • 49:12is that this CTNNB1 mutation
  • 49:15occurred and led to all the metastases
  • 49:18to all these other tissues.
  • 49:20The one tissue that had a
  • 49:21metastasis that was not CTNNB1
  • 49:23mutated was the spleen,
  • 49:25which is very interesting because CTNNB1
  • 49:27mutation causes vascularization.
  • 49:29The spleen is already quite
  • 49:30highly vascularized,
  • 49:31so it may not have been
  • 49:32needed for the spleen,
  • 49:33whereas it may have been more important
  • 49:35to the cancer for the rest of the tumors.
  • 49:38All right,
  • 49:38I've sort of gone through all of
  • 49:40what I wanted to talk about.
  • 49:42today, in terms of introducing you to
  • 49:44this way of actually trying to understand,
  • 49:47you know,
  • 49:47why an individual tumor
  • 49:50has been made oncogenic.
  • 49:52I hope that I've at least been able
  • 49:53to argue that the logic behind what
  • 49:55we're doing is sound and that the
  • 49:57process that we're doing is a sound way
  • 49:59of sort of attributing that cancer effect,
  • 50:01at least as regards those
  • 50:02single nucleotide variants,
  • 50:03which are what we mostly focus on.
  • 50:05But, you know,
  • 50:07we don't know whether that's 10% or 90%
  • 50:09of the reason why, genetically, tumors occur.
  • 50:12We don't know that yet.
  • 50:13But regardless,
  • 50:14if we look at that single nucleotide effect,
  • 50:18we now can sort of deconvolve that.
  • 50:20And I'm very curious if anyone
  • 50:22has thoughts to share with me.
  • 50:24You know, how this information could
  • 50:27be used for the benefit of patients,
  • 50:30for the knowledge of patients,
  • 50:31but also as I mentioned in the later
  • 50:33part of my talk, in understanding
  • 50:35better what our therapies are doing to
  • 50:38patients over time as well and ways
  • 50:40that we can ideally order our therapies.
  • 50:43So that we avoid the evolution of the
  • 50:46resistance that we're trying to avoid,
  • 50:48which is so clearly evidenced in
  • 50:50that one case with the cisplatin
  • 50:53and erlotinib treatment.
  • 50:54Thanks very much for your attention.
  • 50:56I want to thank very much
  • 50:58Vincent Cannataro, who was a postdoc in my
  • 51:00lab and is now a professor at Emmanuel College.
  • 51:02He remains a vibrant and
  • 51:05wonderful collaborator of mine.
  • 51:07I really like working with him.
  • 51:09It's been incredibly productive to continue
  • 51:11to do so and I'm really delighted that
  • 51:13he's been able to do so despite a very
  • 51:16heavy teaching load at Emmanuel College.
  • 51:18Jeff Mandel over here in the audience,
  • 51:21graduate student working on the
  • 51:22cancer effect size calculations
  • 51:24and the machinery underlying that.
  • 51:26Nick Fisk who's worked on a lot
  • 51:28of the tree
  • 51:28based analysis in my lab,
  • 51:30including those last ones that I mentioned.
  • 51:32Everyone else in the Townsend lab,
  • 51:34they're also great.
  • 51:34I didn't specifically mention their work.
  • 51:36We've got interesting work on epistasis
  • 51:38and all kinds of other things that
  • 51:40that I really think is outstanding
  • 51:42and should be really interesting.
  • 51:44But so, all the members,
  • 51:45it's a great group.
  • 51:46Also I want to thank the NIH, NIDCR.
  • 51:48The Yale SPORE in head and neck is
  • 51:50a great community and I really
  • 51:52enjoy being part of it, and it
  • 51:54also provides a substantial amount
  • 51:56of my funding for cancer work.
  • 51:58So thank you very much and I
  • 51:59would love to take any questions
  • 52:01or hear any thoughts, comments,
  • 52:02etcetera from anyone in the audience.
  • 52:11Jeff, I think you actually have
  • 52:13an amazing talent to make really
  • 52:15complex things, explain them in
  • 52:17a very simple and logical way.
  • 52:19Do we have a microphone so,
  • 52:22so that people could actually ask questions?
  • 52:27Oh, OK.
  • 52:37OK. Amazing work.
  • 52:39Jeff, I just wanted to ask if you could
  • 52:42address how you handle co-mutations
  • 52:44when you look at cancer effect size.
  • 52:46And you know I'm thinking of
  • 52:48this finding that we have in head and
  • 52:50neck cancer where P53 mutation is
  • 52:51truncating if you don't have CDKN2A
  • 52:54mutated, and when you have
  • 52:56a mutation in the DNA binding domain,
  • 52:59it seems like you need the second mutation.
  • 53:01So how do you handle that? And then I guess
  • 53:04also when you have P53 mutations or
  • 53:07CDKN2A mutations or whatever,
  • 53:10does that muddy your signatures at all,
  • 53:13you know, due to the mutational effects of
  • 53:18losing control of cell cycle
  • 53:20and DNA repair?
  • 53:23This is a little more straightforward.
  • 53:25Does it muddy things when things get changed?
  • 53:27Yes, it does in the sense that there
  • 53:29is a temporal difference, right,
  • 53:31between what was happening before that
  • 53:33change happened and what happened afterwards.
  • 53:35And our resolution for understanding those
  • 53:38temporal differences is somewhat weak, right.
  • 53:40So generally, if we have a lot of
  • 53:42samples like we had in those two cases,
  • 53:44we can sort of piece apart when things
  • 53:46happened in a nice way and we'll be able
  • 53:48to understand those sorts of differences.
  • 53:50But when we're just looking at
  • 53:52tumorigenesis to resection and we have
  • 53:53this association, then we have to use
  • 53:55very large numbers to get sort of
  • 53:58statistical associations to understand
  • 53:59that sort of ordering process.
  • 54:01Which gets me to your second question, or
  • 54:04your first question, of co-mutation.
  • 54:08I have a strong opinion on everything.
  • 54:09So my strong opinion on this is that
  • 54:12the general genomics approach towards
  • 54:15looking at co-mutation is flawed in a
  • 54:18way that is not apparent when you read
  • 54:21all the papers on it, and the argument that
  • 54:23I want to make is that when
  • 54:24you look at co-mutation,
  • 54:25you're typically looking at a
  • 54:27very observational thing,
  • 54:27which is like how often is this
  • 54:29one mutated and this one mutated.
  • 54:31So for the same reasons that I
  • 54:34outlined in my talk today,
  • 54:36that there are two reasons why
  • 54:38you see things mutated, you know,
  • 54:41underlying mutation or selection.
  • 54:43There are two reasons why things
  • 54:45might be co-mutated.
  • 54:46They might be co-mutated because
  • 54:47when you get one,
  • 54:48the other one is selected and it
  • 54:50really creates a great benefit to
  • 54:52the cell to survive and replicate.
  • 54:53The other reason might be because
  • 54:56they both have the same underlying
  • 54:59mutational process.
  • 55:00And when you have four orders
  • 55:01of magnitude of difference in
  • 55:02mutational process site to site,
  • 55:04that can be a very big reason
  • 55:06why you see co-mutation.
  • 55:07So co-mutation is not the signature we'd
  • 55:09like it to be, to say these things are
  • 55:12selected together, because sometimes
  • 55:13they may not be even though they're
  • 55:15strongly co-mutated in a data set.
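The argument above can be made concrete with a toy calculation. The sketch below assumes a hypothetical cohort in which a shared mutational process is active in only some tumors and two sites are both susceptible to it; there is no selection anywhere in the model. The function name, rates and weights are illustrative assumptions, not values from the talk.

```python
def comutation_without_selection(rates, weights):
    """Pool tumors whose shared mutational process runs at different rates.

    rates[i]   -- per-site mutation probability in tumor group i (hypothetical)
    weights[i] -- fraction of the cohort in group i (should sum to 1)

    Within each tumor the two sites mutate independently, so any excess
    co-occurrence in the pooled cohort comes from mutation alone.
    """
    p_single = sum(w * r for r, w in zip(rates, weights))
    p_both_observed = sum(w * r * r for r, w in zip(rates, weights))
    p_both_if_unrelated = p_single * p_single
    return p_both_observed, p_both_if_unrelated


# Half the cohort has the process nearly silent, half has it highly active.
observed, expected = comutation_without_selection([0.01, 0.5], [0.5, 0.5])
print(f"observed co-mutation frequency:   {observed:.4f}")
print(f"expected if sites were unrelated: {expected:.4f}")
```

In this toy cohort the pooled co-mutation frequency exceeds the product of the two marginal frequencies even though neither mutation is selected, which is the pattern a purely observational co-occurrence test would read as joint selection.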
  • 55:18So then how do I deal with it?
  • 55:20Well,
  • 55:20we can take all of the approaches
  • 55:22I told you and we're working on
  • 55:24you know even more sophisticated
  • 55:26approaches now to try to do this.
  • 55:27I think I have some slides on it,
  • 55:29so I would love to take the time to
  • 55:31just quickly introduce it,
  • 55:33though they're way down here.
  • 55:36Ah.
  • 55:37Yeah.
  • 55:38So this is the point
  • 55:39that I was just making to you,
  • 55:41which is that mutual exclusivity and
  • 55:42co-occurrence are patterns that are
  • 55:44caused by either co-mutation or
  • 55:45what I call selective epistasis.
  • 55:47Again I'm using the terminology from my
  • 55:49background in evolutionary biology,
  • 55:50epistasis meaning one gene is having an
  • 55:52effect on another, or the mutation in
  • 55:54one gene is having an effect on another.
  • 55:56So typical approaches have not
  • 55:57acknowledged the possibility
  • 55:58of co-mutation,
  • 55:59which is a common underlying mutational bias.
  • 56:01That's what I just said to you.
  • 56:02This is a typical slide, and I
  • 56:04don't mean to be, you know, casting
  • 56:05aspersions on this; as I said, this
  • 56:07is what pretty much everyone
  • 56:08does. But they look for whether
  • 56:10cancers have sequential mutations
  • 56:12developed or co-mutation. But we can
  • 56:15actually take those same analyses,
  • 56:17the same data, and deconvolve, with
  • 56:20some fairly sophisticated mathematics
  • 56:22that Jorge Alfaro-Murillo and I did,
  • 56:25the fluxes, mutation rates and scaled
  • 56:26selection coefficients for up to five
  • 56:28genes, and look at what the likelihood
  • 56:30is for individual genes to get
  • 56:32mutated, what the likelihood is that KRAS is
  • 56:34going to be mutated after P53, etcetera.
  • 56:36So we can look at all of these and
  • 56:38figure out how frequently those happen.
  • 56:40So this is the flux which is a
  • 56:42measure of co-mutation,
  • 56:43essentially the underlying mutation
  • 56:44rates and then the scaled selection
  • 56:46coefficient for the new mutation.
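One way to read the relationship between these three quantities is sketched below. It assumes that flux is the product of the underlying mutation rate and the scaled selection coefficient, which is consistent with the way the talk later divides fluxes into mutation rates and selection coefficients but is an assumption here, not a statement of the published model. All numbers and labels are hypothetical.

```python
def scaled_selection_coefficient(flux, mutation_rate):
    """Recover selection from flux, ASSUMING flux = mutation_rate * selection."""
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    return flux / mutation_rate


# Hypothetical transitions in arbitrary units, chosen only for illustration.
transitions = {
    "high mutation rate, modest selection": {"flux": 8.0, "mutation_rate": 4.0},
    "low mutation rate, strong selection": {"flux": 6.0, "mutation_rate": 0.5},
}

selection = {
    name: scaled_selection_coefficient(values["flux"], values["mutation_rate"])
    for name, values in transitions.items()
}

for name, gamma in selection.items():
    print(f"{name}: scaled selection coefficient = {gamma}")
```

The first transition has the larger flux but the smaller selection coefficient, which is why a frequently observed mutation at a highly mutable gene need not be strongly selected.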
  • 56:48So these are how likely is P53,
  • 56:50how likely is KRAS to be mutated after P53,
  • 56:53and then how selected
  • 56:56is it to have KRAS after P53,
  • 56:58which is a separate measure.
  • 56:59So we can basically take all of those
  • 57:01and look at all of those different
  • 57:03things for up to five or six.
  • 57:05And again our constraint
  • 57:06is usually the amount of data.
  • 57:08We need massive amounts of data
  • 57:10to understand,
  • 57:11like three-way effects or four-way effects.
  • 57:13So you need to have examples of every
  • 57:16possible combination in that data set
  • 57:18and that rapidly exhausts our samples.
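The data requirement described above grows quickly with the number of genes. As a rough illustration, the sketch below enumerates every mutated/unmutated combination for a gene set and counts the single-mutation steps between combinations; the gene labels are placeholders, and treating each gene as simply mutated or not is a simplifying assumption.

```python
from itertools import product


def enumerate_genotypes(genes):
    """Every combination of mutated (1) / unmutated (0) across the genes."""
    return [dict(zip(genes, states)) for states in product((0, 1), repeat=len(genes))]


def count_single_step_transitions(genes):
    """From each genotype, any currently unmutated gene can be the next to mutate."""
    return sum(
        sum(1 for state in genotype.values() if state == 0)
        for genotype in enumerate_genotypes(genes)
    )


for n in range(2, 7):
    genes = [f"gene{i}" for i in range(1, n + 1)]  # placeholder labels
    print(
        f"{n} genes: {len(enumerate_genotypes(genes))} combinations, "
        f"{count_single_step_transitions(genes)} single-mutation steps"
    )
```

Each additional gene doubles the number of combinations that need to be represented in the cohort, so the rarest combinations run out of examples first.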
  • 57:19But on the other hand,
  • 57:20we're getting a lot more data
  • 57:22now and so we're able to do this
  • 57:23with more and more data sets.
  • 57:24Now this is lung cancer and we were
  • 57:26able to do it for these five genes.
  • 57:30P53, KRAS, STK11, LRP1B and.
  • 57:33And figure out all their
  • 57:34relations with each other.
  • 57:36This is maybe an easier way to
  • 57:37see this instead of a big table,
  • 57:39which is just what's the
  • 57:40evolutionary trajectory of them.
  • 57:41And again, this is all epistatic,
  • 57:43like it's all taking into account
  • 57:45that co-mutation factor, and the
  • 57:46width of the bar is the flux,
  • 57:48or how frequently you go from normal
  • 57:50to, say, P53 in this particular case,
  • 57:53or LRP1B or KRAS or STK11.
  • 57:56And then you can see that KRAS isn't
  • 57:58actually that frequent as a first mutation,
  • 58:00but if you do get it,
  • 58:01then you're very likely to get LRP1B
  • 58:04or STK11.
  • 58:05If you get P53,
  • 58:06you're very likely to then
  • 58:08get LRP1B as well.
  • 58:10You have, you know, some probability,
  • 58:12but it's not so high, of getting KRAS after that.
  • 58:13You're very likely to get a KRAS
  • 58:16mutation if you have P53 and
  • 58:19LRP1B together, et cetera.
  • 58:21So you can look at what the
  • 58:23likely trajectory for a given patient is.
  • 58:25You could even look at where they are on
  • 58:26this trajectory and we haven't done this,
  • 58:28but presumably you can figure out what
  • 58:29their prognosis was based on where
  • 58:31they were on this diagram etcetera.
  • 58:33And we have basically a map of what's
  • 58:35actually happening to these patients.
  • 58:37And then down below in the smaller diagrams,
  • 58:39I've just divided this up because
  • 58:41this is all the fluxes again,
  • 58:42but let's divide it up into mutation
  • 58:44rates and selection coefficients
  • 58:46and what you see is the mutation
  • 58:47rates here are quite
  • 58:49symmetrical, because we haven't
  • 58:51accounted for things like
  • 58:53CTNNB1 mutation
  • 58:54changing the mutation rate, etcetera,
  • 58:56in this particular analysis,
  • 58:57although in principle we can do
  • 58:59that. And
  • 59:01LRP1B particularly
  • 59:03has a very high mutation rate.
  • 59:05So its relatively high frequency
  • 59:06is not that big a deal,
  • 59:07although it does seem to have
  • 59:09some selective effect as well.
  • 59:10And then over here we see the
  • 59:11selective effects and you can see
  • 59:13there's very strong selection for
  • 59:14P53 initially; it's the major selection,
  • 59:16and that exists after LRP1B
  • 59:19as well. But after P53, or
  • 59:24LRP1B and P53,
  • 59:25then we're very likely to get
  • 59:27this KRAS mutation, etcetera.
  • 59:28So you can really understand what the
  • 59:31relative effect of each of these is.
  • 59:34Trajectories after the sample size.
  • 59:37That's a good question.
  • 59:39I haven't done the study that I'd like
  • 59:41to do to answer that, which would be
  • 59:43like do some very massive analysis.
  • 59:45It's actually a lot of computation to like
  • 59:47do that 1000 times subsampling etcetera.
  • 59:49But what I have done is just do the analysis,
  • 59:51you know, with one data set and then
  • 59:53add more data sets and it seems
  • 59:55quite stable from that perspective.
  • 59:57That's not really the same because
  • 59:58we're not subtracting out the
  • 59:59first data set when we do that.
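The subsampling check described above, which the speaker says has not yet been run, could look something like the following sketch. It repeatedly draws a random half of a cohort and re-estimates a mutation's prevalence; the cohort sizes, prevalences and function name are hypothetical, and a real analysis would re-fit the full model on each subsample rather than a simple prevalence.

```python
import random


def subsample_prevalence(mutated_flags, fraction, n_repeats, seed=0):
    """Re-estimate prevalence on repeated random subsamples (without replacement)."""
    rng = random.Random(seed)
    k = max(1, int(len(mutated_flags) * fraction))
    estimates = []
    for _ in range(n_repeats):
        sample = rng.sample(mutated_flags, k)
        estimates.append(sum(sample) / k)
    return estimates


def relative_spread(estimates, true_prevalence):
    """Range of the estimates as a fraction of the true prevalence."""
    return (max(estimates) - min(estimates)) / true_prevalence


# Hypothetical cohorts of 1000 tumors: one common and one rare mutation.
common = [1] * 400 + [0] * 600
rare = [1] * 20 + [0] * 980

common_estimates = subsample_prevalence(common, fraction=0.5, n_repeats=1000)
rare_estimates = subsample_prevalence(rare, fraction=0.5, n_repeats=1000)

print("common mutation, relative spread:", relative_spread(common_estimates, 0.4))
print("rare mutation, relative spread:  ", relative_spread(rare_estimates, 0.02))
```

Comparing the two relative spreads gives a quick sense of how much less stable estimates are for low-prevalence mutations at a given cohort size.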
  • 01:00:01But it's not like it varies all over
  • 01:00:03the place and the stability of course is
  • 01:00:06proportional to the prevalence, right?
  • 01:00:07Of that particular mutation,
  • 01:00:09the mutations that are really
  • 01:00:10highly prevalent, you know,
  • 01:00:11they stay very stable because we've got a lot
  • 01:00:13of examples of them with the other genes.
  • 01:00:15As soon as you get to lower prevalence,
  • 01:00:17it's a lot iffier.
  • 01:00:19So really this can only be used right
  • 01:00:21now for the most prevalent
  • 01:00:23kinds of mutations that you see.
  • 01:00:25And typically we are for instance
  • 01:00:27assembling all the mutations in a given
  • 01:00:30gene as one kind of mutation because
  • 01:00:32we need that sample size to do that,
  • 01:00:34which is something that in my other
  • 01:00:35research I usually avoid because I think
  • 01:00:37it's really important to understand that
  • 01:00:39different sites have different effects.
  • 01:00:41So
  • 01:00:42one thing that that I didn't
  • 01:00:44see certain probably this.
  • 01:00:46So you can calculate an additional process
  • 01:00:49contribution to to the privatization
  • 01:00:52in particular individual cases.
  • 01:00:54But what happens if you cluster the
  • 01:00:56cases? Obviously you should be able
  • 01:00:57to separate lung cancers related
  • 01:00:59to smoking from those that aren't, and
  • 01:01:01that would be a trivial thing to do.
  • 01:01:03But could you do the same and create
  • 01:01:05a new classification for example for
  • 01:01:07initial cancer, breast cancer that
  • 01:01:09are going to aging and the other?
  • 01:01:11By looking at them separately,
  • 01:01:12you might get some idea about
  • 01:01:15what's actually causing
  • 01:01:17the unknown mutational signature.
  • 01:01:22Yeah, I definitely think you
  • 01:01:23could cluster them. I think,
  • 01:01:25you know, you're reducing the
  • 01:01:28dimensionality of the data when you
  • 01:01:31go from the raw data back to the
  • 01:01:34processes and so you have a reduced
  • 01:01:36dimensionality of that raw data.
  • 01:01:38And then if you were
  • 01:01:40to cluster on the basis of this,
  • 01:01:41you would be taking that reduced
  • 01:01:42dimensionality data and trying to
  • 01:01:43say does that predict something.
  • 01:01:44So I think from a machine learning
  • 01:01:46standpoint you might want to just go
  • 01:01:48back to that raw data in some way,
  • 01:01:49but there might be some way
  • 01:01:51of thinking about it.
  • 01:01:51I say that,
  • 01:01:52but then I also think there's
  • 01:01:53a second part of that,
  • 01:01:54which is that I do think you do better
  • 01:01:57looking at actual biological processes,
  • 01:01:59even if it involves some reduction
  • 01:02:01of the data,
  • 01:02:02because it simplifies the data in
  • 01:02:03a way that means you don't go off
  • 01:02:05on these random tangents of all
  • 01:02:06the noisy stuff you're looking at.
  • 01:02:08So,
  • 01:02:09I guess there's a tension I think
  • 01:02:11you should be wary of in doing that,
  • 01:02:13but I don't see any reason you couldn't
  • 01:02:14do that and and it would probably
  • 01:02:16be highly predictive in some cases.
  • 01:02:17You're probably going to see
  • 01:02:19most skin cancers very easily,
  • 01:02:20you know, predictive that way.
  • 01:02:21Because they're just UV all over the place.
  • 01:02:25Some other cancers are probably
  • 01:02:26quite hard to distinguish one from
  • 01:02:29the other just by the mutational
  • 01:02:31processes that underlie their cause,
  • 01:02:33and so I could imagine doing that.
  • 01:02:36We haven't done anything like that.
  • 01:02:41Any other comments?
  • 01:02:43To ask questions, then the audience on
  • 01:02:46there, there was, I thought,
  • 01:02:48Q&A, but I there it is.
  • 01:02:51We have time. Ohh. Yeah.
  • 01:02:53We've got some questions here,
  • 01:02:54but maybe one more for you and then
  • 01:02:55I'll go to the online questions.
  • 01:02:56Yes, OK. Thank you.
  • 01:02:58Thanks, Jeff. Fantastic work.
  • 01:03:03I think your methodology is on the right
  • 01:03:06track and nothing to worry about at all.
  • 01:03:09The opposite is true.
  • 01:03:12My only concern is availability
  • 01:03:15of data in the future,
  • 01:03:18especially for new types of cancers.
  • 01:03:21Are we asking the right questions?
  • 01:03:24Are we collecting the right data?
  • 01:03:27We, meaning us humans. And
  • 01:03:34I'd like us humans to ensure
  • 01:03:37that this data is available,
  • 01:03:39it's it's open source and it's reliable
  • 01:03:44and what are your thoughts on that?
  • 01:03:47Yeah, so that's a great question.
  • 01:03:49I mean I think that the volume of data
  • 01:03:51sets on, like, the tumorigenesis-to-resection
  • 01:03:53kind of data is going to increase very
  • 01:03:55well on its own like we don't need
  • 01:03:57to pay attention to that question.
  • 01:04:00The data sets that I think I would like
  • 01:04:02to see more of are these multi-sample data
  • 01:04:04sets from individual patients.
  • 01:04:06Back in 2016,
  • 01:04:07I was lucky to be funded by Gilead
  • 01:04:09to actually sequence these large
  • 01:04:10numbers of metastatic and primary
  • 01:04:12tumors, and really
  • 01:04:14the potential of those data
  • 01:04:16sets is really high,
  • 01:04:17especially if they have
  • 01:04:19clinical annotations alongside.
  • 01:04:19So you can map it to to understand what
  • 01:04:21was happening for the patient at the
  • 01:04:23same time as what was happening genetically.
  • 01:04:25That data set though was
  • 01:04:27heterogeneous by cancer type, right?
  • 01:04:29And I haven't seen similar sized data
  • 01:04:31sets on individual cancer types gathered.
  • 01:04:34And it's not, you know,
  • 01:04:36it's a lot of money, like it's a couple
  • 01:04:38million dollars to do that sequencing,
  • 01:04:39but you could do that for
  • 01:04:41every cancer type for.
  • 01:04:43You know,
  • 01:04:43$30 million or something like that.
  • 01:04:45And I think that would be so worth
  • 01:04:47it because we would learn so much
  • 01:04:49about the evolutionary trajectory of
  • 01:04:50each of these cancer types by looking
  • 01:04:52at multi sample data like that.
  • 01:04:53But I haven't managed to sort
  • 01:04:55of put together the argument to
  • 01:04:57get funding to do that.
  • 01:04:58I encourage you to elevate that, you know.
  • 01:05:03Definitely. Yeah. Thanks. Like.
  • 01:05:08Just. Comments. I'm sorry.
  • 01:05:12OK. First, I enjoy your talk.
  • 01:05:14Thank you.
  • 01:05:14But I'm not so sure.
  • 01:05:17Given the tumor heterogeneity.
  • 01:05:20Your math, just the tumor cell.
  • 01:05:22We don't even talk about
  • 01:05:24the microenvironment.
  • 01:05:25Math sequence will really be useful.
  • 01:05:29With all the other tools.
  • 01:05:32You know, otherwise you're going to.
  • 01:05:34For instance,
  • 01:05:35you just mentioned about the cluster.
  • 01:05:37Approach.
  • 01:05:39You can have a mutation in different
  • 01:05:41tumor cells within the tumor mass.
  • 01:05:46When you do the analysis,
  • 01:05:47you put them all together.
  • 01:05:51Does that make sense?
  • 01:05:53I think I might need to talk to you
  • 01:05:55at more length to sort of fully
  • 01:05:56understand your question, but I
  • 01:05:58guess what I would comment is just that
  • 01:06:00when I say this is the kind of data we need,
  • 01:06:02I'm mostly talking about for the kind of
  • 01:06:04work that I'm talking about rather than
  • 01:06:06for everything to solve cancer, of course.
  • 01:06:08But in order to understand the
  • 01:06:09underlying selective coefficients and
  • 01:06:11understand the mutational processes,
  • 01:06:12I do think we need large amounts of
  • 01:06:15tumor resection data, which will
  • 01:06:16be gathered anyway,
  • 01:06:18but also more of this multi-sample data
  • 01:06:20so that we can understand dynamically
  • 01:06:21over time what's happening, which
  • 01:06:24we can do, as I said, in a probabilistic way,
  • 01:06:26but never in a very satisfying way with
  • 01:06:28just the tumorigenesis-to-resection data.
  • 01:06:32It makes the noise that the tumor
  • 01:06:34cellularity differences we bring in and
  • 01:06:37I think it also relates to Yuval Kluger's
  • 01:06:39question about the copy number changes.
  • 01:06:42So how do you adjust,
  • 01:06:43what is that you know if it has
  • 01:06:4517 copies of a mutation it has that
  • 01:06:48signature that will be amplified.
  • 01:06:50And it's not necessarily black
  • 01:06:52would be the actual sometimes
  • 01:06:54higher prevalence of contribution
  • 01:06:55of the particular mutational process,
  • 01:06:58but it's just that the gene.
  • 01:07:01I see these questions about the the.
  • 01:07:04So the adjacent normal tissues
  • 01:07:07requires mutations and they
  • 01:07:09actually introduce noise, right?
  • 01:07:13Yes. So both of those are sources of noise
  • 01:07:17in the sense that on average as we look at,
  • 01:07:21so, say, gene amplification,
  • 01:07:23for instance, is a great example.
  • 01:07:24When you get a gene amplification,
  • 01:07:26you know the the mutation itself may not
  • 01:07:28be contributing the cancer effect size that
  • 01:07:30we analyze when we get this kind of data.
  • 01:07:32But what is true is that those
  • 01:07:34mutations and the amount of copy number
  • 01:07:37amplification that they typically have
  • 01:07:39contributes this amount because we're just
  • 01:07:41looking at whether or not we see these
  • 01:07:44mutations, and whatever other processes
  • 01:07:45are going on, we're averaging over.
  • 01:07:47So the cancer effect size is still, I
  • 01:07:49would say, the measure of how
  • 01:07:51much that mutation is contributing to it.
  • 01:07:54But the means by which it contributes
  • 01:07:55we don't really know from this analysis.
  • 01:07:57It's just that wider question of
  • 01:07:59how much is this variant contributing
  • 01:08:01and and if it needs amplification
  • 01:08:03as part of that process,
  • 01:08:05well then we need to do a more
  • 01:08:07detailed analysis that looks both at
  • 01:08:08amplification and the mutation,
  • 01:08:10and then we'll be able to say like how
  • 01:08:12important that mutation is in terms of
  • 01:08:14cancer effect, and how important the amplification
  • 01:08:15is in terms of cancer effect
  • 01:08:18compared to the mutation itself.
  • 01:08:19That's not something we've
  • 01:08:20been able to do yet,
  • 01:08:21but it's something on our agenda.
  • 01:08:23It's very difficult but I think
  • 01:08:25it's achievable but very difficult.
  • 01:08:29I think I better quickly ask,
  • 01:08:30I feel sorry for the people
  • 01:08:32who ask questions online.
  • 01:08:34The one question is,
  • 01:08:36is mutation a biochemical reaction to
  • 01:08:39TR GRC a substitute of T or G or C?
  • 01:08:42The mutations I'm talking about
  • 01:08:43in this entire study were all
  • 01:08:45single-nucleotide mutations.
  • 01:08:46What I meant by trinucleotide context
  • 01:08:49is the three nucleotides and the central one:
  • 01:08:53how was that mutated to
  • 01:08:54another single nucleotide?
  • 01:08:56There are ways to look at doublets,
  • 01:08:57there are ways to look at
  • 01:08:58some other more complicated indels, which
  • 01:09:00we have in the lab almost implemented,
  • 01:09:03but other mutation types we haven't
  • 01:09:06actually looked at. Yuval
  • 01:09:08Kluger's question, I think, thank you,
  • 01:09:10you have echoed that for me on low.
  • 01:09:13So I believe I answered that.
  • 01:09:18That you know basically it's true
  • 01:09:20that we don't know the specific,
  • 01:09:23you know when we talk about this mutation
  • 01:09:25and how much its cancer effect size is,
  • 01:09:27that's in the context of everything that
  • 01:09:29happens to that mutation in cancers
  • 01:09:30and it's the average across that.
  • 01:09:34But Tim Robinson has a question,
  • 01:09:36which is, can the spectrum of mutations
  • 01:09:38tell us about the chance that the
  • 01:09:40tumor will respond to treatment?
  • 01:09:45It may well, so for instance you know this,
  • 01:09:47the fact that there were cisplatin
  • 01:09:49mutations is going to tell you that it's
  • 01:09:52likely to have an EGFR T790M resistance
  • 01:09:54mutation sort of sitting there waiting
  • 01:09:55to come out when you give erlotinib.
  • 01:09:57So in a sense that spectrum could
  • 01:09:59tell us about the chance that a
  • 01:10:00tumor will respond to treatment.
  • 01:10:02But in general, if I could, I would
  • 01:10:04rather look for EGFR
  • 01:10:06T790M itself directly. For example,
  • 01:10:09if the vast majority of mutations in
  • 01:10:11melanoma are BRAF V600 and
  • 01:10:13the vast majority of cancer
  • 01:10:15causation by mutation is there,
  • 01:10:17does that inform the chance that the tumor
  • 01:10:19will respond to directed therapy to BRAF?
  • 01:10:24Umm, I think the, you know, the number
  • 01:10:27of mutations I don't think does at all.
  • 01:10:28I think that what's important to
  • 01:10:30understand about BRAF V600E and
  • 01:10:32its cancer effect size, which by the
  • 01:10:34way is a very high cancer effect size,
  • 01:10:36is that if you can get a therapy that
  • 01:10:38treats BRAF V600E effectively,
  • 01:10:40it will be a very effective therapy.
  • 01:10:42And there's a good example of that.
  • 01:10:44And there's a caveat to that example also,
  • 01:10:46which is that for BRAF V600E,
  • 01:10:48as many people know, there's vemurafenib,
  • 01:10:50which is a very effective therapy
  • 01:10:52for skin cancer.
  • 01:10:52The only problem is there's
  • 01:10:54very rapid evolution of resistance.
  • 01:10:56Nothing about cancer effect tells you
  • 01:10:58how quickly resistance will be evolved,
  • 01:11:00and in that case this also interplays
  • 01:11:02with CNVs, because at least one
  • 01:11:04of the explanations for why that
  • 01:11:06rapid resistance occurs is that you get
  • 01:11:08amplification of the variant BRAF
  • 01:11:10V600E that basically overwhelms the
  • 01:11:12treatment of vemurafenib,
  • 01:11:14and that's a very fast process.
  • 01:11:17Amplification of a gene in a genome
  • 01:11:19is not hard to do; it has a high mutation
  • 01:11:22rate and happens very quickly.
  • 01:11:23Some cells have
  • 01:11:24more of it, some have less, and
  • 01:11:25those ones with more are selected.
  • 01:11:26It's very easy to select on that basis.
  • 01:11:28So I think it informs you
  • 01:11:30about how likely a treatment is
  • 01:11:32to have a big effect at the moment
  • 01:11:35you apply the treatment.
  • 01:11:36How quickly you evolve resistance
  • 01:11:38is another question.
  • 01:11:41Umm. And already moustaki the
  • 01:11:45sources of mutations (smoking,
  • 01:11:47UV, infection) affect the normal
  • 01:11:50non-transformed tissues. Yes they do.
  • 01:11:52Can you use your approach to calculate
  • 01:11:53the cancer effect that mutations in the tumor
  • 01:11:55microenvironment have on tumorigenesis?
  • 01:11:57One might argue a lot of these mutation
  • 01:11:59sources act on the environment reducing
  • 01:12:00the fitness of a normal cell allowing the.
  • 01:12:02This is a really interesting question.
  • 01:12:03We are working on this.
  • 01:12:05So the bottom line is that
  • 01:12:07and I'll be very quick with this
  • 01:12:09answer that once we are able to
  • 01:12:11figure out these cancer effects,
  • 01:12:13then, to the extent
  • 01:12:15that we have annotated data that
  • 01:12:17this tumor was exposed to this
  • 01:12:19given treatment, this given environment,
  • 01:12:21we can ask how does that environment
  • 01:12:24affect the cancer effect.
  • 01:12:25So we can ask if you're,
  • 01:12:27if you have different ages,
  • 01:12:28not just what mutations are caused by aging,
  • 01:12:30but how much does the cancer effect of a
  • 01:12:32given mutation change as someone ages.
  • 01:12:34So there's ways to do that with
  • 01:12:35the kind of data we have.
  • 01:12:37Again it requires bigger sample
  • 01:12:40sizes in general,
  • 01:12:41but we're looking at that right now with.
  • 01:12:43Regard to smoking,
  • 01:12:44because smoking of course can have a
  • 01:12:46direct effect of mutating individual genes,
  • 01:12:48but it can also have a physiological
  • 01:12:51effect of degrading the normal cells
  • 01:12:53in general in the lung ecosystem.
  • 01:12:55And because you have degraded normal cells,
  • 01:12:57that could increase your chance
  • 01:12:58of getting cancer.
  • 01:12:59Or it could mean that certain
  • 01:13:00mutations are more likely to be
  • 01:13:02able to make cancer proliferate and
  • 01:13:03survive better than other mutations.
  • 01:13:05So the physiology could be very important,
  • 01:13:07and there are ways to get at that.
  • 01:13:09But you need to know this first,
  • 01:13:10and then you can ask the question
  • 01:13:12about physiology affecting things.
  • 01:13:14And I think I'm out of time.