Deep Learning and Cancer Outcomes

June 29, 2020
  • 00:00Support for Yale Cancer Answers
  • 00:02comes from AstraZeneca, working
  • 00:05side by side with leading
  • 00:07scientists to better understand how
  • 00:10complex data can be converted into
  • 00:16innovative treatments. More information at astrazeneca-us.com.
  • 00:17Welcome to Yale Cancer Answers with
  • 00:18your host, Doctor Anees Chagpar.
  • 00:21Yale Cancer Answers features the
  • 00:23latest information on cancer care
  • 00:24by welcoming oncologists and
  • 00:26specialists who are on the
  • 00:27forefront of the battle to fight
  • 00:29cancer. This week it's a
  • 00:31conversation about deep learning
  • 00:33and cancer outcomes with Doctor
  • 00:34Sanjay Aneja. Doctor Aneja is
  • 00:37an assistant professor of
  • 00:38therapeutic radiology at the Yale
  • 00:40School of Medicine, where Doctor
  • 00:42Chagpar is a professor of
  • 00:44surgical oncology.
  • 00:45Maybe we can start
  • 00:47off by you telling us a little
  • 00:49bit about yourself and about
  • 00:50your research?
  • 00:51Sure, I've been in New Haven since 2009.
  • 00:54I was actually a Yale medical
  • 00:56student and I stayed here for
  • 00:57my residency and now on faculty.
  • 00:59Clinically, I am a radiation
  • 01:01oncologist and I primarily treat
  • 01:02tumors of the central nervous system.
  • 01:04So brain tumors as well
  • 01:06as some prostate cancer.
  • 01:07But I also run a large research group which
  • 01:09is primarily focused on applied mathematics.
  • 01:11My background is in applied mathematics, it's
  • 01:13always something that I was very interested in.
  • 01:15And what we're particularly interested
  • 01:17in in my lab is looking at the utility
  • 01:20of machine learning techniques,
  • 01:22specifically deep learning in
  • 01:24improving cancer outcomes and
  • 01:25modeling cancer processes.
  • 01:26I was with you
  • 01:28all the way
  • 01:30up to applied mathematics
  • 01:32and machine learning and deep learning
  • 01:34and all of that sounds really deep.
  • 01:37But can you break it down
  • 01:39into simple terms for us?
  • 01:41What exactly are you doing?
  • 01:44That's a good question,
  • 01:45I think that we're very interested in the fact
  • 01:48that we have such a large amount of healthcare
  • 01:50data currently available to us
  • 01:53that's been kind of cultivated over
  • 01:54years of managing cancer patients,
  • 01:56and what we'd like to do is develop
  • 01:58methods to better model that data so
  • 02:00that we can kind of use that information
  • 02:03to better improve the health care
  • 02:05outcomes for cancer patients moving forward.
  • 02:07Machine learning is just one
  • 02:08way in which we do that.
  • 02:10Traditionally,
  • 02:10the way in which we used to
  • 02:13model cancer as a disease
  • 02:14has always been a very difficult process.
  • 02:16We would look at factors
  • 02:19that physicians thought were important,
  • 02:21and then we put them in a model
  • 02:23and then we kind of look at an
  • 02:25average and that had a couple
  • 02:28different problems associated with it.
  • 02:30One is that physicians aren't really
  • 02:32great at predicting what factors
  • 02:33are actually associated with cancer.
  • 02:35It's a really complex disease.
  • 02:36The second thing is that cancer
  • 02:38is very difficult to model,
  • 02:40and so using some of these techniques that
  • 02:42we were developing a number of years ago,
  • 02:45those techniques weren't
  • 02:46necessarily as effective.
  • 02:47Machine learning is sort of an
  • 02:49advanced form of modeling data.
  • 02:50What it does is it takes in all
  • 02:53types of data,
  • 02:53so it doesn't really require the
  • 02:55physician to make any sort of choices
  • 02:57about what type of data to include,
  • 02:59and it allows us to model very
  • 03:01complex processes like cancer,
  • 03:02and there's been a lot of effort
  • 03:04within our group to show that machine
  • 03:06learning methods are probably the best
  • 03:07way in which we can model cancer outcomes,
  • 03:10and so that's what we're particularly
  • 03:11interested in.
  • 03:13Can you give us an example of how you did that in your lab?
  • 03:18The concept of taking large amounts of data,
  • 03:21variables of various sorts, that physicians may
  • 03:23or may not think are relevant at all,
  • 03:25giving it to a computer and saying,
  • 03:28hey, look at all of this data and see
  • 03:31whether or not any of these things,
  • 03:33or a combination of these things may
  • 03:36actually predict a particular outcome.
  • 03:38Am I on the right track?
  • 03:40Yeah, definitely.
  • 03:41I think the one benefit of machine learning,
  • 03:44compared to more traditional techniques
  • 03:46of modeling cancer data is that it allows
  • 03:48us to look at all the various resources.
  • 03:53An example of a project that
  • 03:54we've done in our lab is trying to
  • 03:57model the outcomes for patients
  • 03:58with early stage lung cancer.
  • 04:00And so what we do in clinical practice
  • 04:02is that we look at various different
  • 04:04things when we're trying to model
  • 04:05early stage lung cancer patients.
  • 04:07We look at certain demographic
  • 04:09variables like age and whether they're
  • 04:11smoking and things of that nature,
  • 04:12we also look at the images to see
  • 04:14how big the tumor looks and whether
  • 04:16or not it's close to any structures
  • 04:18that we're worried about.
  • 04:20And then we also look at
  • 04:22what our treatment plans are
  • 04:23and so how well we can deliver radiation
  • 04:25to treat those and those are kind
  • 04:28of three different data sources of
  • 04:29sorts that we use in clinic in order
  • 04:32to determine whether or not a patient
  • 04:33will have a good outcome or bad outcome.
  • 04:35And currently the only models that
  • 04:37we actually have to tell patients
  • 04:38what their outcomes would be are
  • 04:40using only demographic variables,
  • 04:41and so they're not really using the pictures,
  • 04:43and they're not really using the
  • 04:45treatment planning information,
  • 04:46and the reason for that is because
  • 04:47that data isn't necessarily
  • 04:48something that you can put into
  • 04:50some of those traditional models.
  • 04:52What we did is we developed a deep
  • 04:54learning machine learning algorithm.
  • 04:56So it's an algorithm that takes
  • 04:57the rawest form of the data from
  • 04:59the electronic medical record,
  • 05:00it pulls that demographic data from
  • 05:02the electronic medical record,
  • 05:03it takes every pixel from every
  • 05:05picture of the tumor and analyzes
  • 05:07those pixels in a very unique way.
  • 05:09And then it also looks at every little
  • 05:11part of our radiation treatment
  • 05:12plan down to the pixel level.
  • 05:14It kind of coalesces all that information
  • 05:16and derives a personalized prediction,
  • 05:17which we found was better than
  • 05:20sort of getting an average based
  • 05:21on just the demographic variables
  • 05:23alone. So basically it's
  • 05:25taking all of this information,
  • 05:26the clinical information that
  • 05:28most clinicians would use,
  • 05:31the imaging that they also use,
  • 05:33but that they can't really put
  • 05:35into a model because it's hard to
  • 05:37define, like I see a big tumor,
  • 05:40it looks like it's encasing
  • 05:42some important vessels,
  • 05:43but how do I really put that into
  • 05:45a model? And the treatment
  • 05:47plan. And it can tell you, kind of
  • 05:50trying to be a clinician, because
  • 05:52a clinician will kind of look at
  • 05:55that and have a good gestalt of:
  • 05:58this patient will do well,
  • 05:59this patient won't do so well. And
  • 06:01the computer can kind of give
  • 06:03you that in a more quantitative
  • 06:05way.
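The fusion approach described above, combining demographic variables, raw image pixels, and treatment-plan data into one prediction, can be sketched in code. This is a toy illustration on synthetic data: the array shapes, the simple flatten-and-concatenate fusion, and the logistic-regression stand-in for a deep network are all assumptions, not the actual Yale model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200

# Three synthetic data streams per patient, standing in for the demographics,
# tumor imaging, and radiation treatment plan described in the interview.
demographics = rng.normal(size=(n, 5))          # e.g. age, smoking status, ...
tumor_images = rng.normal(size=(n, 16, 16))     # raw pixels of the tumor
treatment_plans = rng.normal(size=(n, 16, 16))  # pixel-level dose distribution

# Synthetic outcome that depends on all three streams.
signal = (demographics[:, 0]
          + tumor_images.reshape(n, -1).mean(axis=1) * 5
          + treatment_plans.reshape(n, -1).mean(axis=1) * 5)
outcome = (signal > np.median(signal)).astype(int)

# "Coalesce" every stream into one input. A deep model would learn image
# features with convolutional layers instead, but the fusion idea is the same.
X = np.hstack([demographics,
               tumor_images.reshape(n, -1),
               treatment_plans.reshape(n, -1)])

model = LogisticRegression(max_iter=1000).fit(X, outcome)
print(f"training accuracy: {model.score(X, outcome):.2f}")
```

The point of the sketch is only the data flow: each modality is reduced to numbers, concatenated, and modeled jointly rather than one stream at a time.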
  • 06:06Yeah, I think that you're touching
  • 06:07upon one of the big advantages
  • 06:09of these sorts of techniques.
  • 06:11So one is that they're objective,
  • 06:13and so it's not necessarily
  • 06:14utilizing one physician's experiences
  • 06:15or another physician's experiences; it's trying
  • 06:17to use everyone's collective experience
  • 06:18to analyze data in an objective way.
  • 06:20The other thing I think you're kind of
  • 06:22touching upon is this idea of we're
  • 06:24trying to mimic the same set of
  • 06:26predictions that physicians make.
  • 06:28And that's another reason why a lot
  • 06:29of people are very interested in
  • 06:31machine learning is because there
  • 06:33is this component of artificial
  • 06:34intelligence that can be kind of
  • 06:37created when you are able to look
  • 06:39at data sources without
  • 06:41choosing which variables to evaluate,
  • 06:43and so
  • 06:46in this project where you
  • 06:48were looking at outcomes of
  • 06:50early lung cancer and giving a
  • 06:52machine the demographic data,
  • 06:54the smoking data, the imaging data,
  • 06:56the treatment plan data,
  • 06:57did you find that it was able to
  • 07:00predict outcomes in terms of survival
  • 07:03or in terms of recurrence?
  • 07:05Yes, we found that it was able to predict
  • 07:07recurrence in various different ways,
  • 07:09survival, and when we compared it
  • 07:11to maybe just using traditional
  • 07:12methods or just one data stream,
  • 07:14we found that it outperformed all of
  • 07:16those different methods and so this
  • 07:18idea of combining everything together
  • 07:19is very very essential and we
  • 07:21know it's very intuitive for
  • 07:23clinicians to realize you have to do that,
  • 07:25but I think that it's important
  • 07:27to be able to
  • 07:28do that in a mathematical way as well.
  • 07:30Did it outperform the
  • 07:31best guess of a clinician?
  • 07:32So instead of looking at just
  • 07:34traditional models, we know that
  • 07:36clinicians sometimes bring their own
  • 07:37experience and expertise to the equation.
  • 07:39Did you compare the machine
  • 07:40learning to clinicians best guess,
  • 07:42how well patients would do or not do?
  • 07:44Yeah, so one thing that we've done
  • 07:46is we've
  • 07:47done studies that are
  • 07:49looking at that, and similarly,
  • 07:50what we've done is had multiple
  • 07:52physicians do their best guess,
  • 07:53and the first thing that's important to
  • 07:55know is that physicians don't guess the same,
  • 07:58and so there's actually not a gold
  • 07:59standard for a way in which a physician
  • 08:02would actually evaluate a patient.
  • 08:03And we found that it performed at least
  • 08:05as well as an experienced clinician,
  • 08:08and better than
  • 08:09maybe less experienced clinicians.
  • 08:12In a sense,
  • 08:14you're recreating with this machine
  • 08:16learning the predictive
  • 08:17ability of an experienced clinician.
  • 08:19So how is that now being utilized?
  • 08:22Or is it being utilized in the clinic?
  • 08:24I think that one of the
  • 08:27benefits of the platform that
  • 08:29we've developed is that it doesn't
  • 08:31actually require us to pull data and
  • 08:34put it into a calculator of sorts,
  • 08:36which is a lot of what we
  • 08:38see with a lot of predictive
  • 08:40things in cancer.
  • 08:41And so what we're interested in right
  • 08:43now is trying to connect ours to
  • 08:45the electronic medical record.
  • 08:46We've developed an iPhone application
  • 08:47which allows us to basically put
  • 08:50in the medical record number of a patient,
  • 08:52and then it allows us to pull the data
  • 08:54natively and then it allows us to kind
  • 08:56of develop that prediction in the clinic,
  • 08:58and that's the next step
  • 09:00of what we're trying to do.
  • 09:01But I think the other thing that's
  • 09:03really important whenever we're
  • 09:04thinking about these machine learning
  • 09:06algorithms is because they are so
  • 09:08good at modeling healthcare data,
  • 09:09they tend to actually model it
  • 09:11too well, and so that's what we call overfitting
  • 09:13the data set,
  • 09:14and so they sort of are very good
  • 09:16at modeling Yale data for example,
  • 09:18but they might not be so good
  • 09:20at modeling data from
  • 09:21I don't know Chicago,
  • 09:22and what we're trying to do
  • 09:24also is something called external
  • 09:25validation where we send our model
  • 09:27to different cancer
  • 09:28centers across the country and say,
  • 09:30don't even tell us what the outcomes are,
  • 09:32we will tell you what our model is predicting,
  • 09:34and then you tell us how well
  • 09:36our model did. So far,
  • 09:38our collaboration with Jefferson
  • 09:39in Philadelphia has shown that
  • 09:40the model is very predictive
  • 09:42and it's maintaining that same performance,
  • 09:43but it's important to
  • 09:45test these models before we actually
  • 09:47put them into clinical practice.
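The overfitting and external-validation issue described here can be demonstrated with a small synthetic sketch: a flexible model memorizes an "internal" cohort, then loses accuracy on an "external" cohort drawn from a slightly shifted distribution. The data, the decision-tree model, and the distribution shift are all made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

def make_cohort(n, shift=0.0):
    """Synthetic cohort: features plus a site-specific distribution shift."""
    X = rng.normal(loc=shift, size=(n, 10))
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=1.0, size=n) > 0).astype(int)
    return X, y

X_internal, y_internal = make_cohort(300)             # "home institution" data
X_external, y_external = make_cohort(300, shift=0.5)  # shifted "other site"

# An unconstrained tree memorizes the internal cohort, noise included...
model = DecisionTreeClassifier(random_state=0).fit(X_internal, y_internal)

internal_acc = model.score(X_internal, y_internal)  # perfect on training data
external_acc = model.score(X_external, y_external)  # degrades on the new site
print(f"internal: {internal_acc:.2f}, external: {external_acc:.2f}")
```

This is why sending the model's predictions to an outside center and only then comparing against the true outcomes, as described above, is such a strong test.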
  • 09:49How well something can
  • 09:52predict is based on how well it learned and on
  • 09:55the learning set that it had to work with.
  • 09:58It makes sense, however, that the data
  • 10:00that it was getting was objective data.
  • 10:03It wasn't getting
  • 10:05data that may have had a lot of
  • 10:07factors that were subjective, right?
  • 10:10You were looking at imaging.
  • 10:11Well, the image is what it is and if you give
  • 10:15that image to two different radiologists,
  • 10:17they both should say
  • 10:19roughly the same thing,
  • 10:20maybe not exactly, but roughly,
  • 10:22and so you'd think that the
  • 10:26Jefferson images are going to be
  • 10:28very much like Yale images and
  • 10:31so that may account for that close
  • 10:34correlation between the two datasets.
  • 10:37But the next question is OK,
  • 10:39let's suppose that the model, after you
  • 10:41test it, does well, and I don't want to
  • 10:43minimize the utility of making sure
  • 10:45that it's externally generalizable.
  • 10:47Is it even being used here at Yale,
  • 10:50where it was developed and
  • 10:52it does well in terms of predicting
  • 10:55outcomes as well as an experienced clinician?
  • 10:58Is that being used in the clinic?
  • 11:00Are you putting in this data?
  • 11:03You've got now
  • 11:04this iPhone application that can
  • 11:05pull this data into this model.
  • 11:08The model can do its magic and tell you,
  • 11:11this is the recurrence rate.
  • 11:13This is the survival rate.
  • 11:14Are you using that in the clinic,
  • 11:16and if so how?
  • 11:18I think that we're in the process
  • 11:20of developing the application.
  • 11:21One of the big hurdles and with health
  • 11:23care in general is the ability to
  • 11:25actually get access to the electronic
  • 11:27medical record in a way in which
  • 11:29you can make an application kind
  • 11:30of seamlessly integrate into it.
  • 11:32And so it's somewhat difficult
  • 11:33for us to do that.
  • 11:34We're working with
  • 11:36a software engineering firm
  • 11:37to actually help us with that, since it's
  • 11:38beyond the scope
  • 11:40of what our lab typically does.
  • 11:42And so that's where we're
  • 11:44at right now with respect to actually
  • 11:46integrating into clinical practice.
  • 11:47I think that right now what we
  • 11:49have is we have an ability to
  • 11:51kind of look back on patients,
  • 11:53and if there was a patient, for example,
  • 11:55who wanted to have a prediction,
  • 11:57we could actually generate that.
  • 11:58But we cannot do it in the
  • 12:00electronic medical record
  • 12:01as of right now.
  • 12:03And so when you think about
  • 12:05the potential utility of this,
  • 12:06where do you see it
  • 12:07going?
  • 12:08So I think that one thing that I think
  • 12:11is very important is as we're kind of
  • 12:13developing so many different genres
  • 12:14of treatment for cancer patients,
  • 12:16there's this increasing need for us
  • 12:18to develop methods to risk stratify
  • 12:20them and identify the highest-risk
  • 12:21patients who maybe would benefit
  • 12:23from more aggressive treatment,
  • 12:25more aggressive followup,
  • 12:26and similarly,
  • 12:26I think we've found with some types of
  • 12:29cancers that maybe we've been a little
  • 12:31bit too aggressive in our follow-up
  • 12:33or too aggressive with our therapy
  • 12:35and to risk stratify
  • 12:36which of those patients would
  • 12:38most benefit from one intervention
  • 12:40versus another is something
  • 12:42that I think our algorithm
  • 12:43or our platform is very useful for,
  • 12:45specifically for early stage
  • 12:47lung cancer patients.
  • 12:47There's currently a clinical trial
  • 12:49evaluating whether or not those
  • 12:50patients should get radiation and then
  • 12:52additional treatment on top of that.
  • 12:54Because there's this idea that
  • 12:55potentially additional immunotherapy,
  • 12:56for example, would be helpful
  • 12:57for those patients and a large
  • 12:59amount of them may not need that
  • 13:00because they are already going
  • 13:02to have great outcomes anyways,
  • 13:03and a large amount of them
  • 13:05maybe would need that,
  • 13:06and they should get it
  • 13:07maybe
  • 13:09right after treatment,
  • 13:10before we even know what the outcomes are,
  • 13:12and so if we can identify which
  • 13:14patients are those high risk patients
  • 13:15versus those low risk patients,
  • 13:17I think we could potentially tailor
  • 13:19our treatments and better understand
  • 13:22the way in which we can
  • 13:25personalize care based on someone's
  • 13:27images and everything else.
  • 13:29I get that concept of,
  • 13:31you know, risk stratification,
  • 13:33especially for additional therapy.
  • 13:34What would be interesting though,
  • 13:36is to really look at how do
  • 13:39patients do without any therapy?
  • 13:41How do patients do with
  • 13:43therapy X versus therapy Y?
  • 13:45And how can we really
  • 13:47personalize therapies
  • 13:48given the data that we have?
  • 13:50We are going to take a short break for a medical minute,
  • 13:53but when we come back we will
  • 13:55answer those questions.
  • 13:57So stay tuned to learn more
  • 13:59about deep learning in
  • 14:00cancer outcomes with my
  • 14:02guest doctor Sanjay Aneja.
  • 14:04Support for Yale Cancer Answers
  • 14:05comes from AstraZeneca,
  • 14:07a biopharmaceutical business that
  • 14:08is pushing the boundaries of science
  • 14:11to deliver new cancer medicines. More
  • 14:14information at astrazeneca-us.com.
  • 14:15This is a medical minute about
  • 14:17pancreatic cancer, which represents
  • 14:19about 3% of all cancers in the US and
  • 14:22about 7% of cancer deaths.
  • 14:24Clinical trials are currently being
  • 14:27offered at federally designated
  • 14:28comprehensive cancer centers
  • 14:30for the treatment of advanced
  • 14:31stage and metastatic pancreatic
  • 14:33cancer using chemotherapy
  • 14:34and other novel therapies like FOLFIRINOX,
  • 14:36a combination of
  • 14:39different chemotherapy drugs that is one of the
  • 14:40latest advances in the treatment
  • 14:43of metastatic pancreatic cancer,
  • 14:44and research continues
  • 14:46in centers around the world
  • 14:47looking into targeted therapies.
  • 14:49and a recently discovered marker,
  • 14:52hENT1. This has been a medical
  • 14:54minute brought to you as a public
  • 14:57service by Yale Cancer Center.
  • 14:59More information is available at
  • 15:02yalecancercenter.org. You're listening
  • 15:03to Connecticut public radio.
  • 15:05Welcome
  • 15:06back to Yale Cancer Answers.
  • 15:07This is Doctor Anees Chagpar,
  • 15:10and I'm joined tonight by my
  • 15:12guest, Doctor Sanjay Aneja. We're
  • 15:14discussing deep learning in cancer
  • 15:16outcomes and right before the break,
  • 15:18doctor Aneja was telling us
  • 15:20about how he and his lab have
  • 15:23really used machine learning.
  • 15:25That is to say,
  • 15:26applied mathematics and complicated
  • 15:28computational models that take in
  • 15:30lots and lots of data to do what clinicians
  • 15:32do with their usual clinical guess:
  • 15:34predict outcomes for cancer patients.
  • 15:36And Sanjay, the example that
  • 15:39you gave us in early lung cancer
  • 15:42where you said,
  • 15:43we took the demographics,
  • 15:45we took the imaging data,
  • 15:47but then we took the treatment plan,
  • 15:50and we used that and we asked the computer
  • 15:53to look at these things down to the
  • 15:57pixel level and then predict outcomes.
  • 15:59And it was very good at predicting those
  • 16:02outcomes as good as an experienced clinician,
  • 16:05and that's great, but
  • 16:07my question is
  • 16:09what happens if you
  • 16:12don't?
  • 16:12How do you get rid of the treatment
  • 16:16part of that and predict outcomes
  • 16:18without treatment to kind of get at
  • 16:21the idea of, are we over treating
  • 16:23some patients?
  • 16:24Because if the computer doesn't have that,
  • 16:27if all patients are treated,
  • 16:29and that's the basis on which it learned,
  • 16:32how do you take out one part of that model?
  • 16:36That's a good question,
  • 16:37and it's something that I
  • 16:39think is not impossible.
  • 16:40So I think that if we remove the
  • 16:42treatment piece of it and just look at
  • 16:44the images and the demographic data,
  • 16:47basically pretreatment information,
  • 16:48we find the model is actually
  • 16:50quite predictive as well.
  • 16:51It just improves significantly if we
  • 16:52know exactly what types of treatments
  • 16:54we provided for the patients,
  • 16:56and so another example of a study that
  • 16:57we've done which only uses pretreatment
  • 16:59imaging has been evaluating lymph nodes
  • 17:01in head and neck cancer patients.
  • 17:03We were attempting to look
  • 17:04at which lymph nodes we saw
  • 17:06on CT imaging actually had the
  • 17:08presence of cancer and we wanted to
  • 17:10identify that so that maybe you know
  • 17:12what we could do is more tailor
  • 17:13the therapy for head and neck cancer
  • 17:15patients. Oftentimes with head and neck cancer patients
  • 17:17when we think that their lymph
  • 17:18nodes don't have cancer,
  • 17:19we have them undergo surgery
  • 17:21and then we find
  • 17:22those lymph nodes do have cancer,
  • 17:24and they have to get radiation
  • 17:25and chemotherapy as well.
  • 17:26And so if we were able to identify
  • 17:28the patients ahead of time that
  • 17:29have cancer in the lymph nodes, then
  • 17:31what they would have instead is
  • 17:33just chemotherapy and radiation.
  • 17:34They save themselves some surgery and
  • 17:36so that's an example of when we've
  • 17:38used pretreatment imaging to sort of
  • 17:39reduce potential extra care or care
  • 17:41that maybe would not be necessary
  • 17:44or could have been avoided.
  • 17:46Do we have datasets with patients who
  • 17:49were treated in different ways so that
  • 17:52we can predict given pretreatment data,
  • 17:54if you got treatment A,
  • 17:56you will do this well,
  • 17:58if you got treatment B,
  • 17:59you will do that well, and if you got
  • 18:02treatment C, you will do this well.
  • 18:05And if you got no treatment you would
  • 18:08do just as well as any of the above.
  • 18:12Yeah, so that actually kind of
  • 18:13touches upon something that we're
  • 18:15really actively exploring and
  • 18:16something that we're very excited about.
  • 18:18So one thing that
  • 18:19we've kind of indicated,
  • 18:21is that these machine
  • 18:22learning algorithms, these deep
  • 18:23learning algorithms are extremely
  • 18:24good at analyzing pictures,
  • 18:26and so one thing that we've looked at is
  • 18:28this idea of what we call digital twins.
  • 18:30So, based on your pretreatment imaging,
  • 18:32nothing else no demographics,
  • 18:33just what your tumor looks like,
  • 18:35if we could find your digital twin
  • 18:37or someone whose tumor looks exactly
  • 18:38like yours, or a digital family,
  • 18:40which is maybe a group of five
  • 18:42people that are like that,
  • 18:43we can use deep learning to do that.
  • 18:45And then what we can do is we can
  • 18:48see, OK among your digital twins,
  • 18:50or your digital family
  • 18:51who got one type of treatment,
  • 18:53this is what their outcome was
  • 18:55and among your digital family who
  • 18:57got another type of treatment,
  • 18:58this is what their outcome was.
  • 19:00And then they
  • 19:01can make a more informed decision about
  • 19:03what they would actually want to do.
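At its core, the digital-twin idea is a nearest-neighbor search: embed each patient's tumor image, find the most similar prior patients, and summarize their outcomes by treatment. Below is a minimal sketch with random arrays standing in for deep-learning embeddings; every name, shape, and treatment label here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embeddings of prior patients' tumor images (in practice these
# would come from a deep learning model), with treatments and outcomes.
embeddings = rng.normal(size=(100, 32))
treatments = rng.choice(["radiation", "radiation+immunotherapy"], size=100)
recurred = rng.choice([0, 1], size=100)

def digital_family(query, k=5):
    """Return indices of the k prior patients most similar to `query`."""
    # Cosine similarity between the query embedding and every prior patient.
    sims = embeddings @ query / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query))
    return np.argsort(sims)[-k:]  # indices of the top-k most similar

new_patient = rng.normal(size=32)
family = digital_family(new_patient)

# Summarize outcomes within the digital family, per treatment received.
for t in np.unique(treatments[family]):
    mask = treatments[family] == t
    print(t, "recurrence rate:", recurred[family][mask].mean())
```

The per-treatment summary at the end is the "among your digital family who got one type of treatment, this is what their outcome was" step described above.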
  • 19:05We talk a lot on this
  • 19:07show about tumor heterogeneity and
  • 19:09different kinds of cancer and tumor
  • 19:11biology and all kinds of other things
  • 19:13that
  • 19:15seem to really affect
  • 19:17biology, they seem to affect outcomes,
  • 19:19so how is it that you can use that
  • 19:21data without knowing all of the
  • 19:23other things?
  • 19:25I think for one, there's a lot of
  • 19:28evidence to suggest that deep learning
  • 19:29algorithms can actually predict changes
  • 19:31in tumors up to the genomic level,
  • 19:33so genetic mutations in a tumor, based
  • 19:35off of the pictures, because we have to
  • 19:37appreciate that they're really evaluating
  • 19:39every tumor at a very, very small level,
  • 19:41every little pixel, and each pixel has
  • 19:43a variety of different intensities,
  • 19:45and so they're really looking
  • 19:46at the data at a very close level.
  • 19:48So in lung cancer as well as in brain tumors,
  • 19:52and also in some lymphomas there's
  • 19:54been evidence to suggest that deep
  • 19:55learning algorithms on the diagnostic
  • 19:57images can predict genomic changes.
  • 19:59So like driver gene mutations
  • 20:02that you would presumably need
  • 20:05sequencing information for, and so
  • 20:07that suggests that the pictures actually
  • 20:08have a lot more information than we think,
  • 20:11but I do think that you're kind of right
  • 20:14in one way that maybe it's not just the
  • 20:17pictures that tell the whole story.
  • 20:19But the idea is that people
  • 20:21whose tumors look similar,
  • 20:23they likely have similar genomic backgrounds.
  • 20:25I'm still puzzled by
  • 20:27this whole concept, because
  • 20:29when we think about a CT scan or a mammogram
  • 20:33or an MRI, each of these
  • 20:35has its own pitfalls.
  • 20:37There are false positives on these images,
  • 20:39there are false negatives on these images,
  • 20:42so it kind of makes me a little wary to
  • 20:45put so much faith just in the images.
  • 20:49How do you explain
  • 20:50that?
  • 20:52I guess one of the limitations or one
  • 20:54of the important caveats to any sort of
  • 20:57machine learning project is that your
  • 20:59outcomes are only as good as your data,
  • 21:01and so if we have a lot of false
  • 21:03positives in our data set that we have
  • 21:06not addressed and we haven't identified,
  • 21:08then I think that it's really important
  • 21:10that we understand that the machine learning
  • 21:12algorithm will learn those same errors.
  • 21:14So similarly any sort of biases that we have,
  • 21:16maybe we have a bias towards over
  • 21:19imaging or over
  • 21:21diagnosing something on an image,
  • 21:22those similar biases will be promulgated
  • 21:24through our machine learning algorithms.
  • 21:26It's actually somewhat of an interesting
  • 21:28topic in the context of machine learning
  • 21:30outside of healthcare that we
  • 21:31find a lot of machine learning
  • 21:33algorithms mimic the same biases and
  • 21:36discriminatory tendencies
  • 21:37that people have in regular practice,
  • 21:40and so a lot of algorithms that
  • 21:42have been used in law enforcement,
  • 21:44we find that are actually maybe
  • 21:46promulgating some of the parts
  • 21:48of our law enforcement
  • 21:50that we don't want.
  • 21:52One way to kind of get around that
  • 21:55one would think is to use more data,
  • 21:58not just the images,
  • 22:00but get down to the genomic level,
  • 22:02do the biopsy. We've got biopsies on
  • 22:05most tumors before we ever treat them,
  • 22:07so look at the pathologic information,
  • 22:10look at the genomic information.
  • 22:11We can get a lot of
  • 22:14sequencing data these days.
  • 22:16And speaking of which,
  • 22:17it's really hard to understand what all
  • 22:19of these different mutations are when
  • 22:21we think about whole exome sequencing.
  • 22:23I mean, I would think that
  • 22:26machine learning might have
  • 22:27a role to play there too.
  • 22:29Certainly, I think that machine learning
  • 22:31in general is probably one of the more
  • 22:34common approaches to evaluate genomic data.
  • 22:36Now, because the genome is so complex
  • 22:38and so difficult for us to kind of
  • 22:41understand, machine learning
  • 22:44algorithms are maybe the most common
  • 22:46way in which we analyze that sort of
  • 22:49information now.
  • 22:50Specifically with respect to deep learning,
  • 22:52which is what our lab focuses on,
  • 22:55just one specific type
  • 22:56of machine learning method,
  • 22:57it's somewhat difficult to evaluate
  • 22:59genomic information with that technique,
  • 23:00and the reason
  • 23:02for that is
  • 23:05because we don't actually have a
  • 23:07huge data set right now at our
  • 23:09disposal of genomic information,
  • 23:11because in order to do some of these
  • 23:13analyses, we need upwards of 1,000 patients,
  • 23:15and so it's difficult to get 1,000
  • 23:18patients with tumors, images, and
  • 23:19whole exome sequencing,
  • 23:21but it's possible,
  • 23:22and I would
  • 23:24venture that a place
  • 23:26like Yale is the place
  • 23:28that would have the ability to do that,
  • 23:31or some of these cooperative groups, right?
  • 23:34For our listeners,
  • 23:37there are clinical trials that happen
  • 23:39all across the country, sometimes all
  • 23:41across the world with cooperative groups.
  • 23:44These groups of clinicians,
  • 23:45physicians who are all putting their
  • 23:48patients on exactly the same clinical trial
  • 23:50and taking their data,
  • 23:52putting it in a central repository
  • 23:54where all of that can be studied.
  • 23:56Sanjay, I would think that
  • 23:58would be an ideal place for you
  • 24:01to get that data.
  • 24:03Yes, and one effort of our research
  • 24:05group is actually sort of engaging
  • 24:07with the cooperative groups.
  • 24:08There's two that we've
  • 24:10begun engaging with, the NRG
  • 24:12which is a large group that has a
  • 24:14lot of radiation data as well as the
  • 24:17Southwest Oncology Group also
  • 24:19known as SWOG in order to sort of develop
  • 24:21an infrastructure within the
  • 24:23organization to evaluate machine
  • 24:24learning techniques and utilized
  • 24:25machine learning techniques.
  • 24:26Because when a lot of
  • 24:28these clinical
  • 24:29trials and these repositories were designed,
  • 24:31the cooperative groups'
  • 24:33infrastructure wasn't made for
  • 24:34these sorts of analyses, because they
  • 24:36weren't necessarily thinking that
  • 24:38this is something that was going to
  • 24:40come on the horizon.
  • 24:42And so one thing that we're working
  • 24:44on right now with SWOG and NRG is
  • 24:46to develop that sort of infrastructure.
  • 24:48The first process of that is
  • 24:50developing something that allows us
  • 24:52to get the imaging data very easily.
  • 24:54Images are sort of an easy
  • 24:57place for us to evaluate machine
  • 24:59learning methods because, one,
  • 25:01machine learning has been shown to be the most effective
  • 25:03in image analysis across various industries:
  • 25:05healthcare, technology,
  • 25:06etc.
  • 25:06and secondly imaging in healthcare
  • 25:08has a standardized data format.
  • 25:10It's a common data model,
  • 25:12so there's no difficulty of, well,
  • 25:14so-and-so in California stores
  • 25:16their data one way
  • 25:17and we store it a different way.
  • 25:20One thing that you mentioned,
  • 25:23which I still have to go back to, is
  • 25:26you said that you're
  • 25:27interested in deep learning,
  • 25:28which is a type of machine learning that
  • 25:31is particularly well suited to imaging.
  • 25:34Tell us the difference between
  • 25:35deep learning and machine learning.
  • 25:37Yeah, that's a
  • 25:38good question.
  • 25:40The words
  • 25:41artificial intelligence,
  • 25:42machine learning,
  • 25:43and deep learning
  • 25:44sort of get thrown around together,
  • 25:46and it's difficult to parse them out.
  • 25:48I think that machine
  • 25:50learning is a broad
  • 25:51discipline of various types of
  • 25:53mathematical techniques to model data.
  • 25:54Deep learning is just one
  • 25:56of those techniques.
  • 25:57Now, the difference between deep
  • 25:59learning and other, traditional machine
  • 26:00learning techniques is that those other
  • 26:02techniques require
  • 26:03inputs that are called
  • 26:05features, and so they can only handle
  • 26:07data that comes in a featured format:
  • 26:10the predictor variables that
  • 26:11you're interested in, demographic
  • 26:13variables, or variables from
  • 26:14the electronic medical record.
  • 26:15Deep learning is
  • 26:17unique in that it doesn't actually
  • 26:19require those features from a human at all.
  • 26:21It doesn't require any sort
  • 26:22of human interaction.
  • 26:23It can learn those features on
  • 26:25its own as long as it has access
  • 26:27to what they call the sensor,
  • 26:29that is, where the data is generated. As data
  • 26:32is being generated in real time,
  • 26:34deep learning algorithms can analyze it,
  • 26:37identify those features,
  • 26:38those predictors,
  • 26:39that are most important, and then
  • 26:43create predictions.
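To make that distinction concrete, here is a minimal NumPy sketch, not code from Dr. Aneja's lab: the toy feature table and the made-up 8x8 "scan" (and the names `features`, `raw_image`, `W1`, `W2`) are all hypothetical. It contrasts the hand-built feature table a traditional model expects with the raw pixels a deep learning model consumes, where a hidden layer plays the role of the learned features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Traditional machine learning: a human hands the model "features",
# e.g. age, tumor size, stage, as columns of a table.
features = np.array([
    [63, 2.1, 2],   # age, tumor size (cm), stage
    [55, 4.8, 3],
])

# Deep learning: the input is the raw signal itself, here a made-up
# 8x8 grayscale "scan"; the network's early layers learn which
# pixel patterns matter instead of being told.
raw_image = rng.random((8, 8))

# A hidden layer acts as the learned feature extractor:
W1 = rng.standard_normal((8 * 8, 4)) * 0.1   # pixels -> 4 learned features
W2 = rng.standard_normal((4, 1)) * 0.1       # learned features -> prediction

hidden = np.maximum(0, raw_image.reshape(-1) @ W1)   # ReLU: the "learned features"
prediction = 1 / (1 + np.exp(-(hidden @ W2)))        # sigmoid output in (0, 1)
print(prediction.shape)   # (1,)
```

In a real network, `W1` and `W2` would be adjusted during training so the learned features become useful; here they are random, which is enough to show where the features live.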
  • 26:44How exactly does this happen?
  • 26:46Somebody's gotta
  • 26:48program this thing right?
  • 26:50Yes, it is programmed,
  • 26:52typically in Python, and the way
  • 26:55the process works for developing
  • 26:57a deep learning algorithm is, first,
  • 27:00you have a set of training data
  • 27:02and the labels associated with that data.
  • 27:04So you already have data
  • 27:06with outcomes that you know,
  • 27:07and that outcome is what you're
  • 27:09trying to predict. Then what you
  • 27:11do is design your deep learning
  • 27:13algorithm using a complex series
  • 27:14of what they call neural networks,
  • 27:16and we train
  • 27:18the algorithm by looking at each of
  • 27:21those training data set labels to
  • 27:22identify patterns in the data.
  • 27:24And it takes a significant amount of
  • 27:26time and a huge amount of computational
  • 27:28resources to do that.
  • 27:30Deep learning algorithms that
  • 27:31we develop in our lab oftentimes
  • 27:33take weeks to train.
  • 27:34So we just let it run all week,
  • 27:37and with every piece of information
  • 27:39that it gets,
  • 27:42it looks at the
  • 27:43outcome and tries to learn
  • 27:44a little bit more from it.
  • 27:48And so presumably you
  • 27:51design this algorithm, you let it run,
  • 27:53the machine tries to learn what it can to
  • 27:56try and improve its prediction each time,
  • 27:58and then you test it on a separate set.
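The workflow just summarized, labeled training data, an iterative loop that compares predictions to known outcomes and nudges the model each pass, then a final check on a separate held-out set, can be sketched in a few lines of Python. This is a toy illustration only: the synthetic data and the simple logistic model stand in for the real scans and neural networks described above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "labeled training data": 200 cases, 3 predictors each,
# with a known rule hidden inside (the label depends on predictor 0).
X = rng.standard_normal((200, 3))
y = (X[:, 0] > 0).astype(float)

# Hold out a separate test set the model never sees during training.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

w = np.zeros(3)   # model weights, learned from the data
b = 0.0
lr = 0.5          # learning rate

# Each pass over the data, the model compares its predictions to the
# known labels and adjusts its weights to learn a little bit more.
for epoch in range(200):
    p = 1 / (1 + np.exp(-(X_train @ w + b)))       # predicted probabilities
    grad_w = X_train.T @ (p - y_train) / len(y_train)
    grad_b = np.mean(p - y_train)
    w -= lr * grad_w
    b -= lr * grad_b

# Final evaluation on the held-out set, exactly the "separate set"
# mentioned above.
test_pred = (1 / (1 + np.exp(-(X_test @ w + b))) > 0.5).astype(float)
accuracy = float(np.mean(test_pred == y_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

A real deep learning model replaces the single weight vector with millions of parameters across many layers, which is why, as noted above, training can take weeks rather than seconds.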
  • 28:02Yes, exactly.
  • 28:05And in order for this to be utilized,
  • 28:08I'm thinking about how this could
  • 28:11work in the clinic. You know,
  • 28:15one day you may walk into a clinic,
  • 28:18have your CT scan, have your biopsy,
  • 28:21fill out some paperwork on your
  • 28:24demographics, and I can imagine a time
  • 28:27when all of that information is put into
  • 28:31a computer or stored in the computer.
  • 28:34This algorithm runs in the background
  • 28:37and spits out to the clinician:
  • 28:40this patient's prognosis is X,
  • 28:43the ideal treatment out of A, B, and C is B,
  • 28:48and that kind of thing.
  • 28:50I could see that happening
  • 28:51in the future, but I think we're
  • 28:53a little bit far away from
  • 28:55complete automation in that way,
  • 28:57partly because I think
  • 28:58we don't have a good hold on
  • 29:01which data are the most
  • 29:03important, and we don't have a good
  • 29:05way of storing all that information.
  • 29:07But I wouldn't be surprised if it's
  • 29:10something we're doing in 5 to 10 years.
  • 29:13Doctor Sanjay Aneja is an assistant
  • 29:15professor of therapeutic radiology
  • 29:17at the Yale School of Medicine.
  • 29:19If you have questions,
  • 29:21the address is canceranswers@yale.edu
  • 29:22and past editions of the program
  • 29:24are available in audio and written
  • 29:26form at Yalecancercenter.org.
  • 29:28We hope you'll join us next week to
  • 29:30learn more about the fight against
  • 29:32cancer here on Connecticut public radio.