Deep Learning and Cancer Outcomes
June 29, 2020
June 28, 2020
Yale Cancer Center
visit: http://www.yalecancercenter.org
email: canceranswers@yale.edu
call: 203-785-4095
ID5373
- 00:00Support for Yale Cancer Answers
- 00:02comes from AstraZeneca, working
- 00:05side by side with leading
- 00:07scientists to better understand how
- 00:10complex data can be converted into
- 00:16innovative treatments. More information at astrazeneca-us.com.
- 00:17Welcome to Yale Cancer Answers with
- 00:18your host doctor Anees Chagpar.
- 00:21Yale Cancer Answers features the
- 00:23latest information on cancer care
- 00:24by welcoming oncologists and
- 00:26specialists who are on the
- 00:27forefront of the battle to fight
- 00:29cancer. This week it's a
- 00:31conversation about deep learning
- 00:33and cancer outcomes with Doctor
- 00:34Sanjay Aneja. Doctor Aneja is
- 00:37an assistant professor of
- 00:38therapeutic radiology at the Yale
- 00:40School of Medicine, where doctor
- 00:42Chagpar is a professor of
- 00:44surgical oncology.
- 00:45Maybe we can start
- 00:47off by you telling us a little
- 00:49bit about yourself and about
- 00:50your research?
- 00:51Sure, I've been in New Haven since 2009.
- 00:54I was actually a Yale medical
- 00:56student and I stayed here for
- 00:57my residency and now on faculty.
- 00:59Clinically, I am a radiation
- 01:01oncologist and I primarily treat
- 01:02tumors of the central nervous system.
- 01:04So brain tumors as well
- 01:06as some prostate cancer.
- 01:07But I also run a large research group which
- 01:09is primarily focused on applied mathematics.
- 01:11My background is in applied mathematics, it's
- 01:13always something that I was very interested in.
- 01:15And what we're particularly interested
- 01:17in in my lab is looking at the utility
- 01:20of machine learning techniques,
- 01:22specifically deep learning in
- 01:24improving cancer outcomes and
- 01:25modeling cancer processes.
- 01:26I was with you
- 01:28all the way
- 01:30up to applied mathematics
- 01:32and machine learning and deep learning
- 01:34and all of that sounds really deep.
- 01:37But can you break it down
- 01:39into simple terms for us?
- 01:41What exactly are you doing?
- 01:44That's a good question,
- 01:45I think that we're very interested in that
- 01:48we have such a large amount of healthcare
- 01:50data that's currently available to us
- 01:53that's been kind of cultivated over
- 01:54years of managing cancer patients,
- 01:56and what we'd like to do is develop
- 01:58methods to better model that data so
- 02:00that we can kind of use that information
- 02:03to better improve the health care
- 02:05outcomes for cancer patients moving forward.
- 02:07Machine learning is just one
- 02:08way in which we do that.
- 02:10Traditionally,
- 02:10the way in which we used to
- 02:13model cancer as a disease
- 02:14it's always been a very difficult process.
- 02:16We would look at factors
- 02:19that physicians thought were important,
- 02:21and then we put them in a model
- 02:23and then we kind of look at an
- 02:25average and that had a couple
- 02:28different problems associated with it.
- 02:30One is that physicians aren't really
- 02:32great at predicting what factors
- 02:33are actually associated with cancer.
- 02:35It's a really complex disease.
- 02:36The second thing is that cancer
- 02:38is very difficult to model,
- 02:40and so using some of these techniques that
- 02:42we were developing a number of years ago,
- 02:45those techniques weren't
- 02:46necessarily as effective.
- 02:47Machine learning is sort of an
- 02:49advanced form of modeling data.
- 02:50What it does is it takes in all
- 02:53types of data,
- 02:53so it doesn't really require the
- 02:55physician to make any sort of choices
- 02:57about what type of data to include,
- 02:59and it allows us to model very
- 03:01complex processes like cancer,
- 03:02and there's been a lot of effort
- 03:04within our group to show that machine
- 03:06learning methods are probably the best
- 03:07way in which we can model cancer outcomes,
- 03:10and so that's what we're particularly
- 03:11interested in.
- 03:13Can you give us an example of how you did that in your lab?
- 03:18The concept of taking large amounts of data variables
- 03:21of various sorts that physicians may
- 03:23or may not think are relevant at all,
- 03:25giving it to a computer and saying,
- 03:28hey, look at all of this data and see
- 03:31whether or not any of these things,
- 03:33or a combination of these things may
- 03:36actually predict a particular outcome.
- 03:38Am I on the right track?
- 03:40Yeah, definitely.
- 03:41I think the one benefit of machine learning,
- 03:44compared to more traditional techniques
- 03:46of modeling cancer data is that it allows
- 03:48us to look at all the various resources.
- 03:53An example of a project that
- 03:54we've done in our lab is trying to
- 03:57model the outcomes for patients
- 03:58with early stage lung cancer.
- 04:00And so what we do in clinical practice
- 04:02is that we look at various different
- 04:04things when we're trying to model
- 04:05early stage lung cancer patients.
- 04:07We look at certain demographic
- 04:09variables like your age and if they're
- 04:11smoking and things of that nature,
- 04:12we also look at the images to see
- 04:14how big the tumor looks and whether
- 04:16or not it's close to any structures
- 04:18that we're worried about.
- 04:20And then we also look at
- 04:22what our treatment plans are
- 04:23and so how well we can deliver radiation
- 04:25to treat those and those are kind
- 04:28of three different data sources of
- 04:29sorts that we use in clinic in order
- 04:32to determine whether or not a patient
- 04:33will have a good outcome or bad outcome.
- 04:35And currently the only models that
- 04:37we actually have to tell patients
- 04:38how their outcomes would be are
- 04:40using only demographic variables,
- 04:41and so they're not really using the pictures,
- 04:43and they're not really using the
- 04:45treatment planning information,
- 04:46and the reason for that is because
- 04:47that data isn't necessarily
- 04:48something that you can put into
- 04:50some of those traditional models.
- 04:52What we did is we developed a deep
- 04:54learning machine learning algorithm.
- 04:56So it's an algorithm that takes
- 04:57the rawest form of the data from
- 04:59the electronic medical record,
- 05:00it pulls that demographic data from
- 05:02the electronic medical record,
- 05:03it takes every pixel from every
- 05:05picture of the tumor and analyzes
- 05:07those pixels in a very unique way.
- 05:09And then it also looks at every little
- 05:11part of our radiation treatment
- 05:12plan down to the pixel level.
- 05:14It kind of coalesces all that information
- 05:16and derives a personalized prediction,
- 05:17which we found was better than
- 05:20sort of getting an average based
- 05:21on just the demographic variables
- 05:23alone. So basically it's
- 05:25taking all of this information,
- 05:26the clinical information that
- 05:28most clinicians would use,
- 05:31the imaging that they also use,
- 05:33but that they can't really put
- 05:35into a model because it's hard to
- 05:37define, like I see a big tumor,
- 05:40it looks like it's encasing
- 05:42some important vessels,
- 05:43but how do I really put that into
- 05:45a model? And the treatment
- 05:47plan. And it's kind of
- 05:50trying to be a clinician, because
- 05:52a clinician will kind of look at
- 05:55that and have a good gestalt of:
- 05:58This patient will do well.
- 05:59This patient won't do so well and
- 06:01the computer can kind of give
- 06:03you that in a more quantitative
- 06:05way.
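The multimodal fusion Dr. Aneja describes, where demographic variables, tumor-image pixels, and the radiation treatment plan all feed one prediction, can be sketched in a toy Python snippet. Everything here is hypothetical: the numbers are invented, and a simple logistic score stands in for the deep network that would actually learn image features.

```python
import math, random

random.seed(0)

def extract_features(demographics, image_pixels, dose_pixels):
    """Concatenate the three data streams into one input vector.
    A real deep-learning model would learn image features with
    convolutional layers; flattening pixels here is a stand-in."""
    return list(demographics) + list(image_pixels) + list(dose_pixels)

def predict_risk(weights, features):
    """Logistic score standing in for the fused network's output head."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Toy patient: 2 demographic values, a 2x2 tumor image, a 2x2 dose grid.
demographics = [0.7, 1.0]      # e.g. normalized age, smoking status
image = [0.2, 0.9, 0.4, 0.1]   # flattened pixel intensities
dose = [0.5, 0.5, 0.8, 0.3]    # flattened treatment-plan values

features = extract_features(demographics, image, dose)
weights = [random.uniform(-1, 1) for _ in features]
risk = predict_risk(weights, features)
print(risk)  # a single personalized probability between 0 and 1
```

The point of the sketch is only the shape of the idea: all three raw data sources enter one model, which emits one personalized prediction rather than a population average.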
- 06:06Yeah, I think that you're touching
- 06:07upon one of the big advantages
- 06:09of these sorts of techniques.
- 06:11So one is that they're objective,
- 06:13and so it's not necessarily
- 06:14utilizing one physician's experience
- 06:15or another physician's experience; it's trying
- 06:17to use everyone's collective experience,
- 06:18analyzing data in an objective way.
- 06:20The other thing I think you're kind of
- 06:22touching upon is this idea of we're
- 06:24trying to mimic the same set of
- 06:26predictions that physicians make.
- 06:28And that's another reason why a lot
- 06:29of people are very interested in
- 06:31machine learning is because there
- 06:33is this component of artificial
- 06:34intelligence that can be kind of
- 06:37created when you are able to look
- 06:39at data sources without
- 06:41choosing which variables to evaluate,
- 06:43and so
- 06:46in this project where you
- 06:48were looking at outcomes of
- 06:50early lung cancer and giving a
- 06:52machine the demographic data,
- 06:54the smoking data, the imaging data,
- 06:56the treatment plan data,
- 06:57and you found that it was able to
- 07:00predict outcomes in terms of survival
- 07:03or in terms of recurrence?
- 07:05Yes, we found that it was able to predict
- 07:07recurrence in various different ways,
- 07:09survival, and when we compared it
- 07:11to maybe just using traditional
- 07:12methods or just one data stream,
- 07:14we found that it outperformed all of
- 07:16those different methods and so this
- 07:18idea of combining everything together
- 07:19is very very essential and we
- 07:21know it's very intuitive for
- 07:23clinicians to realize you have to do that,
- 07:25but I think that it's important
- 07:27to be able to
- 07:28do that in a mathematical way as well.
- 07:30Did it outperform the
- 07:31best guess of a clinician?
- 07:32So instead of looking at just
- 07:34traditional models, we know that
- 07:36clinicians sometimes bring their own
- 07:37experience and expertise to the equation.
- 07:39Did you compare the machine
- 07:40learning to clinicians best guess,
- 07:42how well patients would do or not do?
- 07:44Yeah, so one thing that we've done
- 07:46is we've
- 07:47done studies that are
- 07:49looking at that, and similarly,
- 07:50what we've done is had multiple
- 07:52physicians do their best guess,
- 07:53and the first thing that's important to
- 07:55know is that physicians don't guess the same,
- 07:58and so there's actually not a gold
- 07:59standard for a way in which a physician
- 08:02would actually evaluate a patient.
- 08:03And we found that it performed at least
- 08:05as well as an experienced clinician,
- 08:08and better than
- 08:09maybe less experienced clinicians.
- 08:12In a sense,
- 08:14you're recreating with this machine
- 08:16learning the predictive
- 08:17ability of an experienced clinician.
- 08:19So how is that now being utilized?
- 08:22Or is it being utilized in the clinic?
- 08:24I think that one of the
- 08:27benefits of the platform that
- 08:29we've developed is that it doesn't
- 08:31actually require us to pull data and
- 08:34put it into a calculator of sorts,
- 08:36which is a lot of what we
- 08:38see with a lot of predictive
- 08:40things in cancer.
- 08:41And so what we're interested in right
- 08:43now is trying to connect ours to
- 08:45the electronic medical record.
- 08:46We've developed an iPhone application
- 08:47which allows us to basically put
- 08:50in the medical record number of a patient,
- 08:52and then it allows us to pull the data
- 08:54natively and then it allows us to kind
- 08:56of develop that prediction in the clinic,
- 08:58and that's the next step
- 09:00of what we're trying to do.
- 09:01But I think the other thing that's
- 09:03really important whenever we're
- 09:04thinking about these machine learning
- 09:06algorithms is because they are so
- 09:08good at modeling healthcare data,
- 09:09they tend to actually model it
- 09:11too well, and so that's what we call overfitting
- 09:13the data set,
- 09:14and so they sort of are very good
- 09:16at modeling Yale data for example,
- 09:18but they might not be so good
- 09:20at modeling data from
- 09:21I don't know Chicago,
- 09:22and what we're trying to do
- 09:24also is something called external
- 09:25validation where we send our model
- 09:27to different cancer
- 09:28centers across the country and say,
- 09:30don't even tell us what the outcomes are,
- 09:32we will tell you what our models are predicting,
- 09:34and then you tell us how good
- 09:36our model did so far.
- 09:38Our collaboration with Jefferson
- 09:39in Philadelphia has shown that
- 09:40the model is very predictive.
- 09:42And it's maintaining that same performance,
- 09:43but it's important to
- 09:45test these models before we actually
- 09:47put them into clinical practice.
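Overfitting and external validation, as described above, can be illustrated with a deliberately tiny sketch. Nothing here reflects the actual model: a one-feature threshold "classifier" stands in, and both cohorts are invented purely to show performance dropping on data from another center.

```python
def train_threshold(xs, ys):
    """Pick the cutoff on a single feature that best separates outcomes
    in the training cohort (a stand-in for fitting a model)."""
    best_t, best_acc = 0.0, 0.0
    for t in sorted(xs):
        acc = sum((x > t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(t, xs, ys):
    """Fraction of patients whose outcome the cutoff predicts correctly."""
    return sum((x > t) == y for x, y in zip(xs, ys)) / len(xs)

# Internal cohort ("home institution"): the feature separates cleanly.
internal_x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
internal_y = [False, False, False, True, True, True]

# External cohort ("another center"): same relationship, shifted scale.
external_x = [0.3, 0.4, 0.5, 0.9, 1.0, 1.1]
external_y = [False, False, False, True, True, True]

t = train_threshold(internal_x, internal_y)
print(accuracy(t, internal_x, internal_y))  # performance on home data
print(accuracy(t, external_x, external_y))  # external validation
```

The cutoff fits the internal cohort perfectly but loses accuracy on the shifted external cohort, which is exactly why sending a frozen model to another center for blinded evaluation is such an informative test.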
- 09:49How well something can
- 09:52predict is based on how well it learned and on
- 09:55the learning set that it had to work with.
- 09:58It makes sense, however, that the data
- 10:00that it was getting was objective data.
- 10:03It wasn't getting
- 10:05data that may have had a lot of
- 10:07factors that were subjective, right?
- 10:10You were looking at imaging.
- 10:11Well, the image is what it is and if you give
- 10:15that image to two different radiologists,
- 10:17they both should say
- 10:19roughly the same thing,
- 10:20maybe not exactly, but roughly,
- 10:22and so you'd think that the
- 10:26Jefferson images are going to be
- 10:28very much like Yale images and
- 10:31so that may account for that close
- 10:34correlation between the two datasets.
- 10:37But the next question is OK,
- 10:39let's suppose that the model after you
- 10:41test it and I don't want to
- 10:43minimize the utility of making sure
- 10:45that it's externally generalizable.
- 10:47Is it even being used here at Yale,
- 10:50where it was developed and
- 10:52it does well in terms of predicting
- 10:55outcomes as well as an experienced clinician?
- 10:58Is that being used in the clinic?
- 11:00Are you putting in this data
- 11:03you've got now,
- 11:04this iPhone application that can
- 11:05pull in this data into this model?
- 11:08The model can do its magic and tell you,
- 11:11this is the recurrence rate.
- 11:13This is the survival rate.
- 11:14Are you using that in the clinic,
- 11:16and if so how?
- 11:18I think that we're in the process
- 11:20of developing the application.
- 11:21One of the big hurdles and with health
- 11:23care in general is the ability to
- 11:25actually get access to the electronic
- 11:27medical record in a way in which
- 11:29you can make an application kind
- 11:30of seamlessly integrate into it.
- 11:32And so it's somewhat difficult
- 11:33for us to do that.
- 11:34We're working with
- 11:36a software engineering firm
- 11:37to actually help us with that, since it's
- 11:38beyond the scope
- 11:40of what our lab does typically.
- 11:42And so that's where we're
- 11:44at right now with respect to actually
- 11:46integrating into clinical practice.
- 11:47I think that right now what we
- 11:49have is we have an ability to
- 11:51kind of look back on patients,
- 11:53and if there was a patient, for example,
- 11:55who wanted to have a prediction,
- 11:57we could actually generate that.
- 11:58But we cannot do it in the
- 12:00electronic medical record.
- 12:01as of right now.
- 12:03And so when you think about
- 12:05the potential utility of this,
- 12:06where do you see it
- 12:07going?
- 12:08So I think that one thing that I think
- 12:11is very important is as we're kind of
- 12:13developing so many different genres
- 12:14of treatment for cancer patients,
- 12:16there's this increasing need for us
- 12:18to develop methods to risk stratify
- 12:20them and identify the highest risk
- 12:21patients, who maybe would benefit
- 12:23from more aggressive treatment,
- 12:25more aggressive followup,
- 12:26and similarly,
- 12:26I think we've found with some types of
- 12:29cancers that maybe we've been a little
- 12:31bit too aggressive in our follow-up
- 12:33or too aggressive with our therapy
- 12:35and to risk stratify
- 12:36which of those patients would
- 12:38benefit most from one intervention
- 12:40versus another is something
- 12:42that I think our algorithm
- 12:43or our platform is very useful for,
- 12:45specifically for early stage
- 12:47lung cancer patients.
- 12:47There's currently a clinical trial
- 12:49evaluating whether or not those
- 12:50patients should get radiation and then
- 12:52additional treatment on top of that.
- 12:54Because there's this idea that
- 12:55potentially additional immunotherapy,
- 12:56for example, would be helpful
- 12:57for those patients and a large
- 12:59amount of them may not need that
- 13:00because they are already going
- 13:02to have great outcomes anyways,
- 13:03and a large amount of them
- 13:05maybe would need that,
- 13:06and they should get it
- 13:07maybe
- 13:09right after treatment,
- 13:10before we even know how the outcomes are,
- 13:12and so if we can identify which
- 13:14patients are those high risk patients
- 13:15versus those low risk patients,
- 13:17I think we could potentially tailor
- 13:19our treatments and better understand
- 13:22the way in which we can
- 13:25personalize care based on someone's
- 13:27images and everything else.
- 13:29I get that concept of,
- 13:31you know, risk stratification,
- 13:33especially for additional therapy.
- 13:34What would be interesting though,
- 13:36is to really look at how do
- 13:39patients do without any therapy?
- 13:41How do patients do with
- 13:43therapy X versus therapy Y?
- 13:45And how can we really
- 13:47personalize therapies
- 13:48given the data that we have?
- 13:50We are going to take a short break for a medical minute,
- 13:53but when we come back we will
- 13:55answer those questions.
- 13:57So stay tuned to learn more
- 13:59about deep learning in
- 14:00cancer outcomes with my
- 14:02guest doctor Sanjay Aneja.
- 14:04Support for Yale Cancer Answers
- 14:05comes from AstraZeneca,
- 14:07a biopharmaceutical business that
- 14:08is pushing the boundaries of science
- 14:11to deliver new cancer medicines. More
- 14:14information at astrazeneca-us.com.
- 14:15This is a medical minute about
- 14:17pancreatic cancer, which represents
- 14:19about 3% of all cancers in the US and
- 14:22about 7% of cancer deaths.
- 14:24Clinical trials are currently being
- 14:27offered at federally designated
- 14:28comprehensive cancer centers
- 14:30for the treatment of advanced
- 14:31stage and metastatic pancreatic
- 14:33cancer using chemotherapy
- 14:34and other novel therapies. FOLFIRINOX,
- 14:36a combination of five
- 14:39different chemotherapies, is among the
- 14:40latest advances in the treatment
- 14:43of metastatic pancreatic cancer,
- 14:44and research continues
- 14:46in centers around the world
- 14:47looking into targeted therapies
- 14:49and a recently discovered marker,
- 14:52hENT1. This has been a medical
- 14:54minute brought to you as a public
- 14:57service by Yale Cancer Center.
- 14:59More information is available at
- 15:02yalecancercenter.org. You're listening
- 15:03to Connecticut public radio.
- 15:05Welcome
- 15:06back to Yale Cancer Answers.
- 15:07This is doctor Anees Chagpar
- 15:10and I'm joined tonight by my
- 15:12guest Doctor Sanjay Aneja. We're
- 15:14discussing deep learning in cancer
- 15:16outcomes and right before the break,
- 15:18doctor Aneja was telling us
- 15:20about how he and his lab have
- 15:23really used machine learning.
- 15:25That is to say,
- 15:26applied mathematics and complicated
- 15:28computational models, to take in
- 15:30lots and lots of data beyond what clinicians
- 15:32use with their usual clinical guess to
- 15:34predict outcomes for cancer patients.
- 15:36And Sanjay, the example that
- 15:39you gave us in early lung cancer
- 15:42where you said,
- 15:43we took the demographics,
- 15:45we took the imaging data,
- 15:47but then we took the treatment plan
- 15:50and we use that and we ask the computer
- 15:53to look at these things down to the
- 15:57pixel level and then predict outcomes.
- 15:59And it was very good at predicting those
- 16:02outcomes as good as an experienced clinician,
- 16:05and that's great, but
- 16:07my question is
- 16:09what happens if you
- 16:12don't?
- 16:12How do you get rid of the treatment
- 16:16part of that and predict outcomes
- 16:18without treatment to kind of get at
- 16:21the idea of, are we over treating
- 16:23some patients?
- 16:24Because if the computer doesn't have that,
- 16:27if all patients are treated,
- 16:29and that's the basis on which it learned,
- 16:32how do you take out one part of that model?
- 16:36That's a good question,
- 16:37and it's something that I
- 16:39think is not impossible.
- 16:40So I think that if we remove the
- 16:42treatment piece of it and just look at
- 16:44the images and the demographic data,
- 16:47basically pretreatment information,
- 16:48we find the model is actually
- 16:50quite predictive as well.
- 16:51It just improves significantly if we
- 16:52know exactly what types of treatments
- 16:54we provided for the patients,
- 16:56and so another example of a study that
- 16:57we've done which only uses pretreatment
- 16:59imaging has been evaluating lymph nodes
- 17:01in head and neck cancer patients.
- 17:03We were attempting to look
- 17:04at which lymph nodes we saw
- 17:06on CT imaging actually had the
- 17:08presence of cancer and we wanted to
- 17:10identify that so that maybe you know
- 17:12what we could do is more tailor
- 17:13the therapy for head and neck cancer
- 17:15patients. Oftentimes with head and neck cancer patients
- 17:17when we think that their lymph
- 17:18nodes don't have cancer,
- 17:19we have them undergo surgery
- 17:21and then we find
- 17:22those lymph nodes have cancer.
- 17:24They have to get radiation
- 17:25and chemotherapy altogether.
- 17:26And so if we were able to identify
- 17:28the patients ahead of time that
- 17:29have cancer in the lymph nodes than
- 17:31what they would have instead is
- 17:33just chemotherapy and radiation.
- 17:34They save themselves some surgery and
- 17:36so that's an example of when we've
- 17:38used pretreatment imaging to sort of
- 17:39reduce potential extra care or care
- 17:41that maybe would not be necessary
- 17:44or could have been avoided.
- 17:46Do we have datasets with patients who
- 17:49were treated in different ways so that
- 17:52we can predict given pretreatment data,
- 17:54if you got treatment a,
- 17:56you will do this well,
- 17:58if you got treatment b
- 17:59you will do that well and if you got
- 18:02treatment c you will do this well.
- 18:05And if you got no treatment you would
- 18:08do just as well as any of the above.
- 18:12Yeah, so that actually kind of
- 18:13touches upon something that we're
- 18:15really actively exploring and
- 18:16something that we're very excited about.
- 18:18So one thing that
- 18:19we've kind of indicated,
- 18:21is that these machine
- 18:22learning algorithms, these deep
- 18:23learning algorithms are extremely
- 18:24good at analyzing pictures,
- 18:26and so one thing that we've looked at is
- 18:28this idea of what we call digital twins.
- 18:30So, based on your pretreatment imaging,
- 18:32nothing else no demographics,
- 18:33just what your tumor looks like,
- 18:35if we could find your digital twin
- 18:37or someone whose tumor looks exactly
- 18:38like yours or digital family,
- 18:40which is maybe a group of five
- 18:42people that are like that,
- 18:43we can use deep learning to do that.
- 18:45And then what we can do is we can
- 18:48see, OK among your digital twins,
- 18:50or your digital family
- 18:51who got one type of treatment,
- 18:53this is what their outcome was
- 18:55and among your digital family who
- 18:57got another type of treatment,
- 18:58this is what their outcome was.
- 19:00And then they
- 19:01can make a more informed decision about
- 19:03what they would actually want to do.
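The "digital twin" idea, matching a new patient to prior patients whose tumors look most similar, is essentially a nearest-neighbor search over image representations. A minimal sketch, with invented 2x2 "images" in place of real scans and learned embeddings:

```python
def pixel_distance(a, b):
    """Euclidean distance between two flattened tumor images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def digital_family(patient_image, cohort, k=2):
    """Return the k prior patients whose tumors look most similar."""
    ranked = sorted(cohort, key=lambda p: pixel_distance(patient_image, p["image"]))
    return ranked[:k]

# Tiny cohort of prior patients: flattened 2x2 "images" plus outcomes.
cohort = [
    {"id": "A", "image": [0.1, 0.9, 0.2, 0.8], "treatment": "radiation", "recurred": False},
    {"id": "B", "image": [0.9, 0.1, 0.8, 0.2], "treatment": "surgery",   "recurred": True},
    {"id": "C", "image": [0.2, 0.8, 0.1, 0.9], "treatment": "radiation", "recurred": False},
]

new_patient = [0.15, 0.85, 0.15, 0.85]
family = digital_family(new_patient, cohort, k=2)
print([p["id"] for p in family])  # the "digital family"
```

In practice a deep network would first compress each scan into a learned embedding and the distance would be computed there rather than on raw pixels; once the family is found, their treatments and outcomes inform the new patient's decision, as described above.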
- 19:05We talk a lot on this
- 19:07show about tumor heterogeneity and
- 19:09different kinds of cancer and tumor
- 19:11biology and all kinds of other things
- 19:13that
- 19:15seem to really affect
- 19:17biology, they seem to affect outcomes,
- 19:19so how is it that you can use that
- 19:21data without knowing all of the
- 19:23other things?
- 19:25I think for one, there's a lot of
- 19:28evidence to suggest that deep learning
- 19:29algorithms can actually predict changes
- 19:31in tumors up to the genomic level,
- 19:33so genetic mutations and tumor is based
- 19:35off of the pictures because we have to
- 19:37appreciate that their really evaluating
- 19:39every tumor at a very, very small level,
- 19:41every little pixel and each pixel has
- 19:43a variety of different intensities,
- 19:45and so they're really looking
- 19:46at the data in a close level.
- 19:48So in lung cancer as well as in brain tumors,
- 19:52and also in some lymphomas there's
- 19:54been evidence to suggest that deep
- 19:55learning algorithms on the diagnostic
- 19:57images can predict genomic changes.
- 19:59So like driver gene mutations
- 20:02that you would presumably need
- 20:05sequencing information for, and so then
- 20:07that suggests that the pictures actually
- 20:08have a lot more information than we think,
- 20:11but I do think that you're kind of right
- 20:14in one way that maybe it's not just the
- 20:17pictures that tell the whole story.
- 20:19But the idea is that people
- 20:21whose tumors look similar,
- 20:23they likely have similar genomic backgrounds.
- 20:25I'm still puzzled by
- 20:27this whole concept because
- 20:29we think about a CT scan or a mammogram,
- 20:33or an MRI and each of these
- 20:35has its own pitfalls.
- 20:37There are false positives on these images,
- 20:39there are false negatives on these images,
- 20:42so it kind of makes me a little wary to
- 20:45put so much faith just in the images.
- 20:49How do you explain
- 20:50that?
- 20:52I guess one of the limitations or one
- 20:54of the important caveats to any sort of
- 20:57machine learning project is that your
- 20:59outcomes are only as good as your data,
- 21:01and so if we have a lot of false
- 21:03positives in our data set that we have
- 21:06not addressed and we haven't identified,
- 21:08then I think that it's really important
- 21:10that we understand that the machine learning
- 21:12algorithm will learn those same errors.
- 21:14So similarly any sort of biases that we have,
- 21:16maybe we have a bias towards over
- 21:19imaging or over
- 21:21diagnosing something on an image,
- 21:22those similar biases will be promulgated
- 21:24through our machine learning algorithms.
- 21:26It's actually somewhat of an interesting
- 21:28topic in the context of machine learning
- 21:30outside of healthcare is that we
- 21:31find that a lot of machine learning
- 21:33algorithms they mimic the same biases and
- 21:36discriminatory abilities
- 21:37that people have in regular practice,
- 21:40and so a lot of algorithms that
- 21:42have been used in law enforcement,
- 21:44we find that are actually maybe
- 21:46promulgating some of the parts
- 21:48of our law enforcement
- 21:50that we don't want.
- 21:52One way to kind of get around that
- 21:55one would think is to use more data,
- 21:58not just the images,
- 22:00but get down to the genomic level,
- 22:02do the biopsy. We've got biopsies on
- 22:05most tumors before we ever treat them,
- 22:07so look at the pathologic information,
- 22:10look at the genomic information.
- 22:11We can get a lot of
- 22:14sequencing data these days.
- 22:16And speaking of which,
- 22:17it's really hard to understand what all
- 22:19of these different mutations are when
- 22:21we think about whole exome sequencing.
- 22:23I mean, I would think that
- 22:26machine learning might have
- 22:27a role to play there too.
- 22:29Certainly, I think that machine learning
- 22:31in general is probably one of the more
- 22:34common approaches to evaluate genomic data.
- 22:36Now, because the genome is so complex
- 22:38and so difficult for us to kind of
- 22:41understand, machine learning
- 22:44algorithms are maybe the most common
- 22:46ways in which we analyze that sort of
- 22:49information now.
- 22:50Specifically with respect to deep learning,
- 22:52which is what our lab is
- 22:55particularly focused on,
- 22:56those specific machine learning methods
- 22:57are somewhat difficult to use to evaluate
- 22:59genomic information,
- 23:00and the reason
- 23:02for that is
- 23:05because we don't actually have a
- 23:07huge data set right now at our
- 23:09disposal of genomic information,
- 23:11because in order to do some of these
- 23:13analysis, we need upwards of 1000 patients,
- 23:15and so it's difficult to get 1000
- 23:18patients with tumors, images, and
- 23:19whole exome sequencing,
- 23:21but it's possible,
- 23:22and I would
- 23:24venture that a place
- 23:26like Yale is the place
- 23:28that would have the ability to do that,
- 23:31or some of these cooperative groups, right?
- 23:34For our listeners,
- 23:37there are clinical trials that happen
- 23:39all across the country, sometimes all
- 23:41across the world with cooperative groups.
- 23:44These groups of clinicians,
- 23:45physicians who are all putting their
- 23:48patients on exactly the same clinical trial
- 23:50and taking their data,
- 23:52putting it in a central repository
- 23:54where all of that can be studied.
- 23:56Sanjay, I would think that
- 23:58would be an ideal place for you
- 24:01to get that data.
- 24:03Yes, and one effort of our research
- 24:05group is actually sort of engaging
- 24:07with the cooperative groups.
- 24:08There's two that we've
- 24:10begun engaging with, the NRG
- 24:12which is a large group that has a
- 24:14lot of radiation data as well as the
- 24:17Southwest Oncology Group also
- 24:19known as SWOG in order to sort of develop
- 24:21an infrastructure within the
- 24:23organization to evaluate machine
- 24:24learning techniques and utilize
- 24:25machine learning techniques.
- 24:26Because with a lot of what
- 24:28we've designed, these clinical
- 24:29trials and these repositories,
- 24:31the cooperative groups'
- 24:33infrastructure wasn't made for
- 24:34these sorts of analyses, because they
- 24:36weren't necessarily thinking that
- 24:38this was something that was going to
- 24:40come on the horizon.
- 24:42And so one thing that we're working
- 24:44on right now is with SWOG and NRG
- 24:46to develop that sort of infrastructure.
- 24:48The first process of that is
- 24:50developing something that allows us
- 24:52to get the imaging data very easily.
- 24:54Images are sort of an easy
- 24:57place for us to evaluate machine
- 24:59learning methods because, one,
- 25:01machine learning has been shown to be most effective
- 25:03in image analysis across various industries,
- 25:05healthcare, technology,
- 25:06etc.
- 25:06and secondly imaging in healthcare
- 25:08has a standardized data format.
- 25:10It's a common data model,
- 25:12so there's no difficulty about well,
- 25:14so and so in California stores
- 25:16their data one way,
- 25:17and then we store it a different way, etc.
- 25:20One thing that you mentioned,
- 25:23which I still have to go back to, is
- 25:26you said that you're
- 25:27interested in deep learning,
- 25:28which is a type of machine learning that
- 25:31is particularly well suited to imaging.
- 25:34Tell us the difference between
- 25:35deep learning and machine learning.
- 25:37Yeah, that's a
- 25:38good question,
- 25:40and the words,
- 25:41artificial intelligence,
- 25:42machine learning,
- 26:43and deep learning
- 25:44sort of get thrown around together,
- 25:46and it's difficult to parse them out.
- 25:48I think that machine
- 25:50learning is a broad
- 25:51discipline of various types of
- 25:53mathematical techniques to model data.
- 25:54Deep learning is just one
- 25:56of those techniques.
- 25:57Now the difference between deep
- 25:59learning and other traditional machine
- 26:00learning techniques is that other
- 26:02machine learning techniques require,
- 26:03you know inputs that are called
- 26:05features and so they can only handle
- 26:07data that comes in a featured format.
- 26:10So sort of predictor variables that
- 26:11you're interested in demographic
- 26:13variables or variables from
- 26:14the electronic medical record.
- 26:15Deep learning is particularly
- 26:17unique in that it doesn't actually
- 26:19require features at all from a human.
- 26:21It doesn't require any sort
- 26:22of human interaction.
- 26:23It can learn those features on
- 26:25its own as long as it has access
- 26:27to what they call the sensor,
- 26:29that is, where the data is generated. And as data
- 26:32is being generated in real time,
- 26:34deep learning algorithms can analyze it,
- 26:37identify those features that
- 26:38are very important,
- 26:39that is, the predictors that matter, and then
- 26:43create predictions.
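The distinction drawn here, between feature-based machine learning and deep learning on raw data, can be caricatured in a few lines. Both "models" below are hypothetical stand-ins: the point is only that one needs a human-supplied feature dictionary while the other consumes raw pixels and derives its own feature internally.

```python
def hand_crafted_model(features):
    """Traditional ML: a human must supply named predictor variables."""
    return 0.6 * features["tumor_size"] + 0.4 * features["age"]

def learned_feature(pixels):
    """Stand-in for a learned convolutional feature. In real deep
    learning this filter is discovered during training, not written."""
    return sum(pixels) / len(pixels)  # e.g. overall tumor intensity

def deep_model(pixels):
    """Deep learning: the raw image goes in; features are internal."""
    return 0.9 * learned_feature(pixels)

features = {"tumor_size": 0.5, "age": 0.7}   # human-chosen inputs
pixels = [0.2, 0.9, 0.4, 0.1]                # raw data from the "sensor"

print(hand_crafted_model(features))
print(deep_model(pixels))
```

In a real deep network, `learned_feature` would be thousands of filters tuned automatically during training, which is precisely what frees the physician from deciding in advance which variables matter.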
- 26:44How exactly does this happen?
- 26:46Somebody's gotta
- 26:48program this thing, right?
- 26:50Yes, it is programmed,
- 26:52typically in Python and so the way
- 26:55that the process works for developing
- 26:57a deep learning algorithm is first
- 27:00you have a set of training data.
- 27:02And the associated labels to that data.
- 27:04So you already have data
- 27:06with outcomes that you know,
- 27:07and that's the process you're
- 27:09trying to predict and then what you
- 27:11do is you design your deep learning
- 27:13algorithm using a complex series
- 27:14of what they call neural networks
- 27:16and what we do is we kind of train
- 27:18the algorithm by looking at each of
- 27:21those training data set labels to
- 27:22identify sort of patterns in the data.
- 27:24And it takes a significant amount of
- 27:26time and a huge amount of computational
- 27:28resources in order to do that.
- 27:30So deep learning algorithms that
- 27:31we developed in our lab oftentimes
- 27:33take weeks to train.
- 27:34And so we just let it run all week,
- 27:37and every single time
- 27:39it gets a piece of information,
- 27:43it looks at the
- 27:44outcome and it tries to learn
- 27:46a little bit more from it.
- 27:48And so presumably you
- 27:51design this algorithm, you let it run,
- 27:53the machine tries to learn what it can to
- 27:56try and improve its prediction each time,
- 27:58and then you test it on a separate set.
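The training process just described, labeled training data, repeated passes that nudge the model toward better predictions, then evaluation on a separate test set, can be sketched with a one-feature logistic model in plain Python. This is a stand-in for the neural networks mentioned, and all data here is invented:

```python
import math

def sigmoid(z):
    """Squash a score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, epochs=200, lr=0.5):
    """Repeatedly compare predictions to training labels and nudge the
    weights to shrink the error: the core loop of training."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            w -= lr * err * x   # gradient step for the weight
            b -= lr * err       # gradient step for the bias
    return w, b

def accuracy(w, b, xs, ys):
    """Fraction of cases where the predicted class matches the label."""
    return sum((sigmoid(w * x + b) > 0.5) == y for x, y in zip(xs, ys)) / len(xs)

# Labeled training data: a single feature and the known outcome.
train_x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
train_y = [0, 0, 0, 1, 1, 1]

# A separate, held-out test set checks that the model generalizes.
test_x = [0.15, 0.25, 0.75, 0.85]
test_y = [0, 0, 1, 1]

w, b = train(train_x, train_y)
print(accuracy(w, b, test_x, test_y))
```

A real deep network replaces the single weight with millions arranged in layers, which is why, as noted above, training can take weeks and substantial computational resources rather than milliseconds.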
- 28:02Yes, exactly.
- 28:05And in order for this to be utilized,
- 28:08because I'm thinking about how this can be
- 28:11utilized in the clinic in terms of you
- 28:15know one day you may walk into a clinic,
- 28:18have your CT scan, have your biopsy,
- 28:21fill out some paperwork on your
- 28:24demographics and I can imagine a time
- 28:27when all of that information is put into
- 28:31a computer or stored in the computer.
- 28:34This algorithm runs in the background
- 28:37and spits out to the clinician,
- 28:40this patient's prognosis is X.
- 28:43The ideal treatment out of A, B, and C is B,
- 28:48and that kind of thing.
- 28:50I could see that happening
- 28:51in the future. I think we're
- 28:53a little bit far away from
- 28:55complete automation in that way,
- 28:57and partly because I think that
- 28:58we don't have a good hold on
- 29:01the data that we think is the most
- 29:03important and we don't have a good
- 29:05way of storing all that information,
- 29:07but I think that it's not something that I
- 29:10would be surprised we're doing in 5-10 years.
- 29:13Doctor Sanjay Aneja is an assistant
- 29:15professor of therapeutic radiology
- 29:17at the Yale School of Medicine.
- 29:19If you have questions,
- 29:21the address is canceranswers@yale.edu
- 29:22and past editions of the program
- 29:24are available in audio and written
- 29:26form at Yalecancercenter.org.
- 29:28We hope you'll join us next week to
- 29:30learn more about the fight against
- 29:32cancer here on Connecticut public radio.