Deep Learning and Cancer Outcomes
June 29, 2020
June 28, 2020
Yale Cancer Center
visit: http://www.yalecancercenter.org
email: canceranswers@yale.edu
call: 203-785-4095
ID5373
- 00:00Support for Yale Cancer Answers
- 00:02comes from AstraZeneca, working
- 00:05side by side with leading
- 00:07scientists to better understand how
- 00:10complex data can be converted into
- 00:16innovative treatments. More information at astrazeneca-us.com.
- 00:17Welcome to Yale Cancer Answers with
- 00:18your host doctor Anees Chagpar.
- 00:21Yale Cancer Answers features the
- 00:23latest information on cancer care
- 00:24by welcoming oncologists and
- 00:26specialists who are on the
- 00:27forefront of the battle to fight
- 00:29cancer. This week it's a
- 00:31conversation about deep learning
- 00:33and cancer outcomes with Doctor
- 00:34Sanjay Aneja. Doctor Aneja is
- 00:37an assistant professor of
- 00:38therapeutic radiology at the Yale
- 00:40School of Medicine, where doctor
- 00:42Chagpar is a professor of
- 00:44surgical oncology.
- 00:45Maybe we can start
- 00:47off by you telling us a little
- 00:49bit about yourself and about
- 00:50your research?
- 00:51Sure, I've been in New Haven since 2009.
- 00:54I was actually a Yale medical
- 00:56student and I stayed here for
- 00:57my residency and now on faculty.
- 00:59Clinically, I am a radiation
- 01:01oncologist and I primarily treat
- 01:02tumors of the central nervous system.
- 01:04So brain tumors as well
- 01:06as some prostate cancer.
- 01:07But I also run a large research group which
- 01:09is primarily focused on applied mathematics.
- 01:11My background is in applied mathematics, it's
- 01:13always something that I was very interested in.
- 01:15And what we're particularly interested
- 01:17in in my lab is looking at the utility
- 01:20of machine learning techniques,
- 01:22specifically deep learning in
- 01:24improving cancer outcomes and
- 01:25modeling cancer processes.
- 01:26I was with you
- 01:28all the way
- 01:30up to applied mathematics
- 01:32and machine learning and deep learning
- 01:34and all of that sounds really deep.
- 01:37But can you break it down
- 01:39into simple terms for us?
- 01:41What exactly are you doing?
- 01:44That's a good question,
- 01:45I think that we're very interested in that
- 01:48we have such a large amount of healthcare
- 01:50data that's currently available to us
- 01:53that's been kind of cultivated over
- 01:54years of managing cancer patients,
- 01:56and what we'd like to do is develop
- 01:58methods to better model that data so
- 02:00that we can kind of use that information
- 02:03to better improve the health care
- 02:05outcomes for cancer patients moving forward.
- 02:07Machine learning is just one
- 02:08way in which we do that.
- 02:10Traditionally,
- 02:10the way in which we used to
- 02:13model cancer as a disease
- 02:14it's always been a very difficult process.
- 02:16We would look at factors
- 02:19that physicians thought were important,
- 02:21and then we put them in a model
- 02:23and then we kind of look at an
- 02:25average and that had a couple
- 02:28different problems associated with it.
- 02:30One is that physicians aren't really
- 02:32great at predicting what factors
- 02:33are actually associated with cancer.
- 02:35It's a really complex disease.
- 02:36The second thing is that cancer
- 02:38is very difficult to model,
- 02:40and so using some of these techniques that
- 02:42we were developing a number of years ago,
- 02:45those techniques weren't
- 02:46necessarily as effective.
- 02:47Machine learning is sort of an
- 02:49advanced form of modeling data.
- 02:50What it does is it takes in all
- 02:53types of data,
- 02:53so it doesn't really require the
- 02:55physician to make any sort of choices
- 02:57about what type of data to include,
- 02:59and it allows us to model very
- 03:01complex processes like cancer,
- 03:02and there's been a lot of effort
- 03:04within our group to show that machine
- 03:06learning methods are probably the best
- 03:07way in which we can model cancer outcomes,
- 03:10and so that's what we're particularly
- 03:11interested in.
- 03:13Can you give us an example of how you did that in your lab?
- 03:18The concept of taking large amounts of data variables
- 03:21of various sorts that physicians may
- 03:23or may not think are relevant at all,
- 03:25giving it to a computer and saying,
- 03:28hey, look at all of this data and see
- 03:31whether or not any of these things,
- 03:33or a combination of these things may
- 03:36actually predict a particular outcome.
- 03:38Am I on the right track?
- 03:40Yeah, definitely.
- 03:41I think the one benefit of machine learning,
- 03:44compared to more traditional techniques
- 03:46of modeling cancer data is that it allows
- 03:48us to look at all the various resources.
- 03:53An example of a project that
- 03:54we've done in our lab is trying to
- 03:57model the outcomes for patients
- 03:58with early stage lung cancer.
- 04:00And so what we do in clinical practice
- 04:02is that we look at various different
- 04:04things when we're trying to model
- 04:05early stage lung cancer patients.
- 04:07We look at certain demographic
- 04:09variables like your age and if they're
- 04:11smoking and things of that nature,
- 04:12we also look at the images to see
- 04:14how big the tumor looks and whether
- 04:16or not it's close to any structures
- 04:18that we're worried about.
- 04:20And then we also look at
- 04:22what our treatment plans are
- 04:23and so how well we can deliver radiation
- 04:25to treat those and those are kind
- 04:28of three different data sources of
- 04:29sorts that we use in clinic in order
- 04:32to determine whether or not a patient
- 04:33will have a good outcome or bad outcome.
- 04:35And currently the only models that
- 04:37we actually have to tell patients
- 04:38how their outcomes would be are
- 04:40using only demographic variables,
- 04:41and so they're not really using the pictures,
- 04:43and they're not really using the
- 04:45treatment planning information,
- 04:46and the reason for that is because
- 04:47that data isn't necessarily
- 04:48something that you can put into
- 04:50some of those traditional models.
- 04:52What we did is we developed a deep
- 04:54learning machine learning algorithm.
- 04:56So it's an algorithm that takes
- 04:57the rawest form of the data from
- 04:59the electronic medical record,
- 05:00it pulls that demographic data from
- 05:02the electronic medical record,
- 05:03it takes every pixel from every
- 05:05picture of the tumor and analyzes
- 05:07those pixels in a very unique way.
- 05:09And then it also looks at every little
- 05:11part of our radiation treatment
- 05:12plan down to the pixel level.
- 05:14It kind of coalesces all that information
- 05:16and derives a personalized prediction,
- 05:17which we found was better than
- 05:20sort of getting an average based
- 05:21on just the demographic variables
- 05:23alone. So basically it's
- 05:25taking all of this information,
- 05:26the clinical information that
- 05:28most clinicians would use,
- 05:31the imaging that they also use,
- 05:33but that they can't really put
- 05:35into a model because it's hard to
- 05:37define, like I see a big tumor,
- 05:40it looks like it's encasing
- 05:42some important vessels,
- 05:43but how do I really put that into
- 05:45a model? And the treatment
- 05:47plan. And it's kind of
- 05:50trying to be a clinician, because
- 05:52a clinician will kind of look at
- 05:55that and have a good gestalt of:
- 05:58This patient will do well.
- 05:59This patient won't do so well and
- 06:01the computer can kind of give
- 06:03you that in a more quantitative
- 06:05way.
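The multimodal fusion Dr. Aneja describes, where demographic variables, tumor-image pixels, and the radiation treatment plan all feed one prediction, can be sketched in a toy Python snippet. Everything here is hypothetical: the numbers are invented, and a simple logistic score stands in for the deep network that would actually learn image features.

```python
import math, random

random.seed(0)

def extract_features(demographics, image_pixels, dose_pixels):
    """Concatenate the three data streams into one input vector.
    A real deep-learning model would learn image features with
    convolutional layers; flattening pixels here is a stand-in."""
    return list(demographics) + list(image_pixels) + list(dose_pixels)

def predict_risk(weights, features):
    """Logistic score standing in for the fused network's output head."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Toy patient: 2 demographic values, a 2x2 tumor image, a 2x2 dose grid.
demographics = [0.7, 1.0]      # e.g. normalized age, smoking status
image = [0.2, 0.9, 0.4, 0.1]   # flattened pixel intensities
dose = [0.5, 0.5, 0.8, 0.3]    # flattened treatment-plan values

features = extract_features(demographics, image, dose)
weights = [random.uniform(-1, 1) for _ in features]
risk = predict_risk(weights, features)
print(risk)  # a single personalized probability between 0 and 1
```

The point of the sketch is only the shape of the idea: all three raw data sources enter one model, which emits one personalized prediction rather than a population average.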
- 06:06Yeah, I think that you're touching
- 06:07upon one of the big advantages
- 06:09of these sorts of techniques.
- 06:11So one is that they're objective,
- 06:13and so it's not necessarily
- 06:14utilizing one physician's experience
- 06:15or another physician's experience; it's trying
- 06:17to use everyone's collective experience,
- 06:18analyzing data in an objective way.
- 06:20The other thing I think you're kind of
- 06:22touching upon is this idea of we're
- 06:24trying to mimic the same set of
- 06:26predictions that physicians make.
- 06:28And that's another reason why a lot
- 06:29of people are very interested in
- 06:31machine learning is because there
- 06:33is this component of artificial
- 06:34intelligence that can be kind of
- 06:37created when you are able to look
- 06:39at data sources without
- 06:41choosing which variables to evaluate,
- 06:43and so
- 06:46in this project where you
- 06:48were looking at outcomes of
- 06:50early lung cancer and giving a
- 06:52machine the demographic data,
- 06:54the smoking data, the imaging data,
- 06:56the treatment plan data,
- 06:57and you found that it was able to
- 07:00predict outcomes in terms of survival
- 07:03or in terms of recurrence?
- 07:05Yes, we found that it was able to predict
- 07:07recurrence in various different ways,
- 07:09survival, and when we compared it
- 07:11to maybe just using traditional
- 07:12methods or just one data stream,
- 07:14we found that it outperformed all of
- 07:16those different methods and so this
- 07:18idea of combining everything together
- 07:19is very very essential and we
- 07:21know it's very intuitive for
- 07:23clinicians to realize you have to do that,
- 07:25but I think that it's important
- 07:27to be able to
- 07:28do that in a mathematical way as well.
- 07:30Did it outperform the
- 07:31best guess of a clinician?
- 07:32So instead of looking at just
- 07:34traditional models, we know that
- 07:36clinicians sometimes bring their own
- 07:37experience and expertise to the equation.
- 07:39Did you compare the machine
- 07:40learning to clinicians best guess,
- 07:42how well patients would do or not do?
- 07:44Yeah, so one thing that we've done
- 07:46is we've
- 07:47done studies that are
- 07:49looking at that, and similarly,
- 07:50what we've done is had multiple
- 07:52physicians do their best guess,
- 07:53and the first thing that's important to
- 07:55know is that physicians don't guess the same,
- 07:58and so there's actually not a gold
- 07:59standard for a way in which a physician
- 08:02would actually evaluate a patient.
- 08:03And we found that it performed at least
- 08:05as well as an experienced clinician,
- 08:08and better than
- 08:09maybe less experienced clinicians.
- 08:12In a sense,
- 08:14you're recreating with this machine
- 08:16learning the predictive
- 08:17ability of an experienced clinician.
- 08:19So how is that now being utilized?
- 08:22Or is it being utilized in the clinic?
- 08:24I think that one of the
- 08:27benefits of the platform that
- 08:29we've developed is that it doesn't
- 08:31actually require us to pull data and
- 08:34put it into a calculator of sorts,
- 08:36which is a lot of what we
- 08:38see with a lot of predictive
- 08:40things in cancer.
- 08:41And so what we're interested in right
- 08:43now is trying to connect ours to
- 08:45the electronic medical record.
- 08:46We've developed an iPhone application
- 08:47which allows us to basically put
- 08:50in the medical record number of a patient,
- 08:52and then it allows us to pull the data
- 08:54natively and then it allows us to kind
- 08:56of develop that prediction in the clinic,
- 08:58and that's the next step
- 09:00of what we're trying to do.
- 09:01But I think the other thing that's
- 09:03really important whenever we're
- 09:04thinking about these machine learning
- 09:06algorithms is because they are so
- 09:08good at modeling healthcare data,
- 09:09they tend to actually model it
- 09:11too well, and so that's what we call overfitting
- 09:13the data set,
- 09:14and so they sort of are very good
- 09:16at modeling Yale data for example,
- 09:18but they might not be so good
- 09:20at modeling data from
- 09:21I don't know Chicago,
- 09:22and what we're trying to do
- 09:24also is something called external
- 09:25validation where we send our model
- 09:27to different cancer
- 09:28centers across the country and say,
- 09:30don't even tell us what the outcomes are,
- 09:32we will tell you what our models are predicting,
- 09:34and then you tell us how good
- 09:36our model did so far.
- 09:38Our collaboration with Jefferson
- 09:39in Philadelphia has shown that
- 09:40the model is very predictive.
- 09:42And it's maintaining that same performance,
- 09:43but it's important to
- 09:45test these models before we actually
- 09:47put them into clinical practice.
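Overfitting and external validation, as described above, can be illustrated with a deliberately tiny sketch. Nothing here reflects the actual model: a one-feature threshold "classifier" stands in, and both cohorts are invented purely to show performance dropping on data from another center.

```python
def train_threshold(xs, ys):
    """Pick the cutoff on a single feature that best separates outcomes
    in the training cohort (a stand-in for fitting a model)."""
    best_t, best_acc = 0.0, 0.0
    for t in sorted(xs):
        acc = sum((x > t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(t, xs, ys):
    """Fraction of patients whose outcome the cutoff predicts correctly."""
    return sum((x > t) == y for x, y in zip(xs, ys)) / len(xs)

# Internal cohort ("home institution"): the feature separates cleanly.
internal_x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
internal_y = [False, False, False, True, True, True]

# External cohort ("another center"): same relationship, shifted scale.
external_x = [0.3, 0.4, 0.5, 0.9, 1.0, 1.1]
external_y = [False, False, False, True, True, True]

t = train_threshold(internal_x, internal_y)
print(accuracy(t, internal_x, internal_y))  # performance on home data
print(accuracy(t, external_x, external_y))  # external validation
```

The cutoff fits the internal cohort perfectly but loses accuracy on the shifted external cohort, which is exactly why sending a frozen model to another center for blinded evaluation is such an informative test.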
- 09:49How well something can
- 09:52predict is based on how well it learned and on
- 09:55the learning set that it had to work with.
- 09:58It makes sense, however, that the data
- 10:00that it was getting was objective data.
- 10:03It wasn't getting
- 10:05data that may have had a lot of
- 10:07factors that were subjective, right?
- 10:10You were looking at imaging.
- 10:11Well, the image is what it is and if you give
- 10:15that image to two different radiologists,
- 10:17they both should say
- 10:19roughly the same thing,
- 10:20maybe not exactly, but roughly,
- 10:22and so you'd think that the
- 10:26Jefferson images are going to be
- 10:28very much like Yale images and
- 10:31so that may account for that close
- 10:34correlation between the two datasets.
- 10:37But the next question is OK,
- 10:39let's suppose that the model after you
- 10:41test it and I don't want to
- 10:43minimize the utility of making sure
- 10:45that it's externally generalizable.
- 10:47Is it even being used here at Yale,
- 10:50where it was developed and
- 10:52it does well in terms of predicting
- 10:55outcomes as well as an experienced clinician?
- 10:58Is that being used in the clinic?
- 11:00Are you putting in this data
- 11:03you've got now,
- 11:04this iPhone application that can
- 11:05pull in this data into this model?
- 11:08The model can do its magic and tell you,
- 11:11this is the recurrence rate.
- 11:13This is the survival rate.
- 11:14Are you using that in the clinic,
- 11:16and if so how?
- 11:18I think that we're in the process
- 11:20of developing the application.
- 11:21One of the big hurdles and with health
- 11:23care in general is the ability to
- 11:25actually get access to the electronic
- 11:27medical record in a way in which
- 11:29you can make an application kind
- 11:30of seamlessly integrate into it.
- 11:32And so it's somewhat difficult
- 11:33for us to do that.
- 11:34We're working with
- 11:36a software engineering firm
- 11:37to actually help us with that, since it's
- 11:38beyond the scope
- 11:40of what our lab does typically.
- 11:42And so that's where we're
- 11:44at right now with respect to actually
- 11:46integrating into clinical practice.
- 11:47I think that right now what we
- 11:49have is we have an ability to
- 11:51kind of look back on patients,
- 11:53and if there was a patient, for example,
- 11:55who wanted to have a prediction,
- 11:57we could actually generate that.
- 11:58But we cannot do it in the
- 12:00electronic medical record.
- 12:01as of right now.
- 12:03And so when you think about
- 12:05the potential utility of this,
- 12:06where do you see it
- 12:07going?
- 12:08So I think that one thing that I think
- 12:11is very important is as we're kind of
- 12:13developing so many different genres
- 12:14of treatment for cancer patients,
- 12:16there's this increasing need for us
- 12:18to develop methods to risk stratify
- 12:20them and identify the highest risk
- 12:21patients, who maybe would benefit
- 12:23from more aggressive treatment,
- 12:25more aggressive followup,
- 12:26and similarly,
- 12:26I think we've found with some types of
- 12:29cancers that maybe we've been a little
- 12:31bit too aggressive in our follow-up
- 12:33or too aggressive with our therapy
- 12:35and to risk stratify
- 12:36which of those patients would
- 12:38benefit most from one intervention
- 12:40versus another is something
- 12:42that I think our algorithm
- 12:43or our platform is very useful for,
- 12:45specifically for early stage
- 12:47lung cancer patients.
- 12:47There's currently a clinical trial
- 12:49evaluating whether or not those
- 12:50patients should get radiation and then
- 12:52additional treatment on top of that.
- 12:54Because there's this idea that
- 12:55potentially additional immunotherapy,
- 12:56for example, would be helpful
- 12:57for those patients and a large
- 12:59amount of them may not need that
- 13:00because they are already going
- 13:02to have great outcomes anyways,
- 13:03and a large amount of them
- 13:05maybe would need that,
- 13:06and they should get it
- 13:07maybe
- 13:09right after treatment,
- 13:10before we even know how the outcomes are,
- 13:12and so if we can identify which
- 13:14patients are those high risk patients
- 13:15versus those low risk patients,
- 13:17I think we could potentially tailor
- 13:19our treatments and better understand
- 13:22the way in which we can
- 13:25personalize care based on someone's
- 13:27images and everything else.
- 13:29I get that concept of,
- 13:31you know, risk stratification,
- 13:33especially for additional therapy.
- 13:34What would be interesting though,
- 13:36is to really look at how do
- 13:39patients do without any therapy?
- 13:41How do patients do with
- 13:43therapy X versus therapy Y?
- 13:45And how can we really
- 13:47personalize therapies
- 13:48given the data that we have?
- 13:50We are going to take a short break for a medical minute,
- 13:53but when we come back we will
- 13:55answer those questions.
- 13:57So stay tuned to learn more
- 13:59about deep learning in
- 14:00cancer outcomes with my
- 14:02guest doctor Sanjay Aneja.
- 14:04Support for Yale Cancer Answers
- 14:05comes from AstraZeneca,
- 14:07a biopharmaceutical business that
- 14:08is pushing the boundaries of science
- 14:11to deliver new cancer medicines. More
- 14:14information at astrazeneca-us.com.
- 14:15This is a medical minute about
- 14:17pancreatic cancer, which represents
- 14:19about 3% of all cancers in the US and
- 14:22about 7% of cancer deaths.
- 14:24Clinical trials are currently being
- 14:27offered at federally designated
- 14:28comprehensive cancer centers
- 14:30for the treatment of advanced
- 14:31stage and metastatic pancreatic
- 14:33cancer using chemotherapy
- 14:34and other novel therapies. FOLFIRINOX,
- 14:36a combination of five
- 14:39different chemotherapies, is among the
- 14:40latest advances in the treatment
- 14:43of metastatic pancreatic cancer,
- 14:44and research continues
- 14:46in centers around the world
- 14:47looking into targeted therapies
- 14:49and a recently discovered marker,
- 14:52hENT1. This has been a medical
- 14:54minute brought to you as a public
- 14:57service by Yale Cancer Center.
- 14:59More information is available at
- 15:02yalecancercenter.org. You're listening
- 15:03to Connecticut public radio.
- 15:05Welcome
- 15:06back to Yale Cancer Answers.
- 15:07This is doctor Anees Chagpar
- 15:10and I'm joined tonight by my
- 15:12guest Doctor Sanjay Aneja. We're
- 15:14discussing deep learning in cancer
- 15:16outcomes and right before the break,
- 15:18doctor Aneja was telling us
- 15:20about how he and his lab have
- 15:23really used machine learning.
- 15:25That is to say,
- 15:26applied mathematics and complicated
- 15:28computational models, to take in
- 15:30lots and lots of data beyond what clinicians
- 15:32use with their usual clinical guess to
- 15:34predict outcomes for cancer patients.
- 15:36And Sanjay, the example that
- 15:39you gave us in early lung cancer
- 15:42where you said,
- 15:43we took the demographics,
- 15:45we took the imaging data,
- 15:47but then we took the treatment plan
- 15:50and we use that and we ask the computer
- 15:53to look at these things down to the
- 15:57pixel level and then predict outcomes.
- 15:59And it was very good at predicting those
- 16:02outcomes as good as an experienced clinician,
- 16:05and that's great, but
- 16:07my question is
- 16:09what happens if you
- 16:12don't?
- 16:12How do you get rid of the treatment
- 16:16part of that and predict outcomes
- 16:18without treatment to kind of get at
- 16:21the idea of, are we over treating
- 16:23some patients?
- 16:24Because if the computer doesn't have that,
- 16:27if all patients are treated,
- 16:29and that's the basis on which it learned,
- 16:32how do you take out one part of that model?
- 16:36That's a good question,
- 16:37and it's something that I
- 16:39think is not impossible.
- 16:40So I think that if we remove the
- 16:42treatment piece of it and just look at
- 16:44the images and the demographic data,
- 16:47basically pretreatment information,
- 16:48we find the model is actually
- 16:50quite predictive as well.
- 16:51It just improves significantly if we
- 16:52know exactly what types of treatments
- 16:54we provided for the patients,
- 16:56and so another example of a study that
- 16:57we've done which only uses pretreatment
- 16:59imaging has been evaluating lymph nodes
- 17:01in head and neck cancer patients.
- 17:03We were attempting to look
- 17:04at which lymph nodes we saw
- 17:06on CT imaging actually had the
- 17:08presence of cancer and we wanted to
- 17:10identify that so that maybe you know
- 17:12what we could do is more tailor
- 17:13the therapy for head and neck cancer
- 17:15patients. Oftentimes with head and neck cancer patients
- 17:17when we think that their lymph
- 17:18nodes don't have cancer,
- 17:19we have them undergo surgery
- 17:21and then we find
- 17:22those lymph nodes have cancer.
- 17:24They have to get radiation
- 17:25and chemotherapy altogether.
- 17:26And so if we were able to identify
- 17:28the patients ahead of time that
- 17:29have cancer in the lymph nodes than
- 17:31what they would have instead is
- 17:33just chemotherapy and radiation.
- 17:34They save themselves some surgery and
- 17:36so that's an example of when we've
- 17:38used pretreatment imaging to sort of
- 17:39reduce potential extra care or care
- 17:41that maybe would not be necessary
- 17:44or could have been avoided.
- 17:46Do we have datasets with patients who
- 17:49were treated in different ways so that
- 17:52we can predict given pretreatment data,
- 17:54if you got treatment a,
- 17:56you will do this well,
- 17:58if you got treatment b
- 17:59you will do that well and if you got
- 18:02treatment c you will do this well.
- 18:05And if you got no treatment you would
- 18:08do just as well as any of the above.
- 18:12Yeah, so that actually kind of
- 18:13touches upon something that we're
- 18:15really actively exploring and
- 18:16something that we're very excited about.
- 18:18So one thing that
- 18:19we've kind of indicated,
- 18:21is that these machine
- 18:22learning algorithms, these deep
- 18:23learning algorithms are extremely
- 18:24good at analyzing pictures,
- 18:26and so one thing that we've looked at is
- 18:28this idea of what we call digital twins.
- 18:30So, based on your pretreatment imaging,
- 18:32nothing else no demographics,
- 18:33just what your tumor looks like,
- 18:35if we could find your digital twin
- 18:37or someone whose tumor looks exactly
- 18:38like yours or digital family,
- 18:40which is maybe a group of five
- 18:42people that are like that,
- 18:43we can use deep learning to do that.
- 18:45And then what we can do is we can
- 18:48see, OK among your digital twins,
- 18:50or your digital family
- 18:51who got one type of treatment,
- 18:53this is what their outcome was
- 18:55and among your digital family who
- 18:57got another type of treatment,
- 18:58this is what their outcome was.
- 19:00And then they
- 19:01can make a more informed decision about
- 19:03what they would actually want to do.
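The "digital twin" idea, matching a new patient to prior patients whose tumors look most similar, is essentially a nearest-neighbor search over image representations. A minimal sketch, with invented 2x2 "images" in place of real scans and learned embeddings:

```python
def pixel_distance(a, b):
    """Euclidean distance between two flattened tumor images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def digital_family(patient_image, cohort, k=2):
    """Return the k prior patients whose tumors look most similar."""
    ranked = sorted(cohort, key=lambda p: pixel_distance(patient_image, p["image"]))
    return ranked[:k]

# Tiny cohort of prior patients: flattened 2x2 "images" plus outcomes.
cohort = [
    {"id": "A", "image": [0.1, 0.9, 0.2, 0.8], "treatment": "radiation", "recurred": False},
    {"id": "B", "image": [0.9, 0.1, 0.8, 0.2], "treatment": "surgery",   "recurred": True},
    {"id": "C", "image": [0.2, 0.8, 0.1, 0.9], "treatment": "radiation", "recurred": False},
]

new_patient = [0.15, 0.85, 0.15, 0.85]
family = digital_family(new_patient, cohort, k=2)
print([p["id"] for p in family])  # the "digital family"
```

In practice a deep network would first compress each scan into a learned embedding and the distance would be computed there rather than on raw pixels; once the family is found, their treatments and outcomes inform the new patient's decision, as described above.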
- 19:05We talk a lot on this
- 19:07show about tumor heterogeneity and
- 19:09different kinds of cancer and tumor
- 19:11biology and all kinds of other things
- 19:13that
- 19:15seem to really affect
- 19:17biology, they seem to affect outcomes,
- 19:19so how is it that you can use that
- 19:21data without knowing all of the
- 19:23other things?
- 19:25I think for one, there's a lot of
- 19:28evidence to suggest that deep learning
- 19:29algorithms can actually predict changes
- 19:31in tumors up to the genomic level,
- 19:33so genetic mutations and tumor is based
- 19:35off of the pictures because we have to
- 19:37appreciate that their really evaluating
- 19:39every tumor at a very, very small level,
- 19:41every little pixel and each pixel has
- 19:43a variety of different intensities,
- 19:45and so they're really looking
- 19:46at the data in a close level.
- 19:48So in lung cancer as well as in brain tumors,
- 19:52and also in some lymphomas there's
- 19:54been evidence to suggest that deep
- 19:55learning algorithms on the diagnostic
- 19:57images can predict genomic changes.
- 19:59So like driver gene mutations
- 20:02that you would presumably need
- 20:05sequencing information for, and so then
- 20:07that suggests that the pictures actually
- 20:08have a lot more information than we think,
- 20:11but I do think that you're kind of right
- 20:14in one way that maybe it's not just the
- 20:17pictures that tell the whole story.
- 20:19But the idea is that people
- 20:21whose tumors look similar,
- 20:23they likely have similar genomic backgrounds.
- 20:25I'm still puzzled by
- 20:27this whole concept because
- 20:29we think about a CT scan or a mammogram,
- 20:33or an MRI and each of these
- 20:35has its own pitfalls.
- 20:37There are false positives on these images,
- 20:39there are false negatives on these images,
- 20:42so it kind of makes me a little wary to
- 20:45put so much faith just in the images.
- 20:49How do you explain
- 20:50that?
- 20:52I guess one of the limitations or one
- 20:54of the important caveats to any sort of
- 20:57machine learning project is that your
- 20:59outcomes are only as good as your data,
- 21:01and so if we have a lot of false
- 21:03positives in our data set that we have
- 21:06not addressed and we haven't identified,
- 21:08then I think that it's really important
- 21:10that we understand that the machine learning
- 21:12algorithm will learn those same errors.
- 21:14So similarly any sort of biases that we have,
- 21:16maybe we have a bias towards over
- 21:19imaging or over
- 21:21diagnosing something on an image,
- 21:22those similar biases will be promulgated
- 21:24through our machine learning algorithms.
- 21:26It's actually somewhat of an interesting
- 21:28topic in the context of machine learning
- 21:30outside of healthcare is that we
- 21:31find that a lot of machine learning
- 21:33algorithms they mimic the same biases and
- 21:36discriminatory abilities
- 21:37that people have in regular practice,
- 21:40and so a lot of algorithms that
- 21:42have been used in law enforcement,
- 21:44we find that are actually maybe
- 21:46promulgating some of the parts
- 21:48of our law enforcement
- 21:50that we don't want.
- 21:52One way to kind of get around that
- 21:55one would think is to use more data,
- 21:58not just the images,
- 22:00but get down to the genomic level,
- 22:02do the biopsy. We've got biopsies on
- 22:05most tumors before we ever treat them,
- 22:07so look at the pathologic information,
- 22:10look at the genomic information.
- 22:11We can get a lot of
- 22:14sequencing data these days.
- 22:16And speaking of which,
- 22:17it's really hard to understand what all
- 22:19of these different mutations are when
- 22:21we think about whole exome sequencing.
- 22:23I mean, I would think that
- 22:26machine learning might have
- 22:27a role to play there too.
- 22:29Certainly, I think that machine learning
- 22:31in general is probably one of the more
- 22:34common approaches to evaluate genomic data.
- 22:36Now, because the genome is so complex
- 22:38and so difficult for us to kind of
- 22:41understand, machine learning
- 22:44algorithms are maybe the most common
- 22:46ways in which we analyze that sort of
- 22:49information now.
- 22:50Specifically with respect to deep learning,
- 22:52which is what our lab is
- 22:55particularly focused on,
- 22:56those specific machine learning methods
- 22:57are somewhat difficult to use to evaluate
- 22:59genomic information,
- 23:00and the reason
- 23:02for that is
- 23:05because we don't actually have a
- 23:07huge data set right now at our
- 23:09disposal of genomic information,
- 23:11because in order to do some of these
- 23:13analysis, we need upwards of 1000 patients,
- 23:15and so it's difficult to get 1000
- 23:18patients with tumors, images, and
- 23:19whole exome sequencing,
- 23:21but it's possible,
- 23:22and I would
- 23:24venture that a place
- 23:26like Yale is the place
- 23:28that would have the ability to do that,
- 23:31or some of these cooperative groups, right?
- 23:34For our listeners,
- 23:37there are clinical trials that happen
- 23:39all across the country, sometimes all
- 23:41across the world with cooperative groups.
- 23:44These groups of clinicians,
- 23:45physicians who are all putting their
- 23:48patients on exactly the same clinical trial
- 23:50and taking their data,
- 23:52putting it in a central repository
- 23:54where all of that can be studied.
- 23:56Sanjay, I would think that
- 23:58would be an ideal place for you
- 24:01to get that data.
- 24:03Yes, and one effort of our research
- 24:05group is actually sort of engaging
- 24:07with the cooperative groups.
- 24:08There's two that we've
- 24:10begun engaging with, the NRG
- 24:12which is a large group that has a
- 24:14lot of radiation data as well as the
- 24:17Southwest Oncology Group also
- 24:19known as SWOG in order to sort of develop
- 24:21an infrastructure within the
- 24:23organization to evaluate machine
- 24:24learning techniques and utilize
- 24:25machine learning techniques.
- 24:26Because with a lot of what
- 24:28we've designed, these clinical
- 24:29trials and these repositories,
- 24:31the cooperative groups'
- 24:33infrastructure wasn't made for
- 24:34these sorts of analyses, because they
- 24:36weren't necessarily thinking that
- 24:38this was something that was going to
- 24:40come on the horizon.
- 24:42And so one thing that we're working
- 24:44on right now is with SWOG and NRG
- 24:46to develop that sort of infrastructure.
- 24:48The first process of that is
- 24:50developing something that allows us
- 24:52to get the imaging data very easily.
- 24:54Images are sort of an easy
- 24:57place for us to evaluate machine
- 24:59learning methods because, one,
- 25:01machine learning has been shown to be most effective
- 25:03in image analysis across various industries,
- 25:05healthcare, technology,
- 25:06etc.
- 25:06and secondly imaging in healthcare
- 25:08has a standardized data format.
- 25:10It's a common data model,
- 25:12so there's no difficulty about well,
- 25:14so and so in California stores
- 25:16their data one way,
- 25:17and then we store it a different way, etc.
- 25:20One thing that you mentioned,
- 25:23which I still have to go back to, is
- 25:26you said that you're
- 25:27interested in deep learning,
- 25:28which is a type of machine learning that
- 25:31is particularly well suited to imaging.
- 25:34Tell us the difference between
- 25:35deep learning and machine learning.
- 25:37Yeah, that's a
- 25:38good question,
- 25:40and the words,
- 25:41artificial intelligence,
- 25:42machine learning,
- 26:43and deep learning
- 25:44sort of get thrown around together,
- 25:46and it's difficult to parse them out.
- 25:48I think that machine
- 25:50learning is a broad
- 25:51discipline of various types of
- 25:53mathematical techniques to model data.
- 25:54Deep learning is just one
- 25:56of those techniques.
- 25:57Now the difference between deep
- 25:59learning and other traditional machine
- 26:00learning techniques is that other
- 26:02machine learning techniques require,
- 26:03you know inputs that are called
- 26:05features and so they can only handle
- 26:07data that comes in a featured format.
- 26:10So sort of predictor variables that
- 26:11you're interested in demographic
- 26:13variables or variables from
- 26:14the electronic medical record.
- 26:15Deep learning is particularly
- 26:17unique in that it doesn't actually
- 26:19require features at all from a human.
- 26:21It doesn't require any sort
- 26:22of human interaction.
- 26:23It can learn those features on
- 26:25its own as long as it has access
- 26:27to what they call the sensor,
- 26:29that is, where the data is generated. And as data
- 26:32is being generated in real time,
- 26:34deep learning algorithms can analyze it,
- 26:37identify those features that
- 26:38are very important,
- 26:39that is, the predictors that matter, and then
- 26:43create predictions.
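The distinction drawn here, between feature-based machine learning and deep learning on raw data, can be caricatured in a few lines. Both "models" below are hypothetical stand-ins: the point is only that one needs a human-supplied feature dictionary while the other consumes raw pixels and derives its own feature internally.

```python
def hand_crafted_model(features):
    """Traditional ML: a human must supply named predictor variables."""
    return 0.6 * features["tumor_size"] + 0.4 * features["age"]

def learned_feature(pixels):
    """Stand-in for a learned convolutional feature. In real deep
    learning this filter is discovered during training, not written."""
    return sum(pixels) / len(pixels)  # e.g. overall tumor intensity

def deep_model(pixels):
    """Deep learning: the raw image goes in; features are internal."""
    return 0.9 * learned_feature(pixels)

features = {"tumor_size": 0.5, "age": 0.7}   # human-chosen inputs
pixels = [0.2, 0.9, 0.4, 0.1]                # raw data from the "sensor"

print(hand_crafted_model(features))
print(deep_model(pixels))
```

In a real deep network, `learned_feature` would be thousands of filters tuned automatically during training, which is precisely what frees the physician from deciding in advance which variables matter.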
- 26:44How exactly does this happen?
- 26:46Somebody's gotta
- 26:48program this thing, right?
- 26:50Yes, it is programmed,
- 26:52typically in Python and so the way
- 26:55that the process works for developing
- 26:57a deep learning algorithm is first
- 27:00you have a set of training data.
- 27:02And the associated labels to that data.
- 27:04So you already have data
- 27:06with outcomes that you know,
- 27:07and that's the process you're
- 27:09trying to predict and then what you
- 27:11do is you design your deep learning
- 27:13algorithm using a complex series
- 27:14of what they call neural networks
- 27:16and what we do is we kind of train
- 27:18the algorithm by looking at each of
- 27:21those training data set labels to
- 27:22identify sort of patterns in the data.
- 27:24And it takes a significant amount of
- 27:26time and a huge amount of computational
- 27:28resources in order to do that.
- 27:30So deep learning algorithms that
- 27:31we developed in our lab oftentimes
- 27:33take weeks to train.
- 27:34And so we just let it run all week,
- 27:37and every single time
- 27:39it gets a piece of information,
- 27:43it looks at the
- 27:44outcome and it tries to learn
- 27:46a little bit more from it.
- 27:48And so presumably you
- 27:51design this algorithm, you let it run,
- 27:53the machine tries to learn what it can to
- 27:56try and improve its prediction each time,
- 27:58and then you test it on a separate set.
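The training process just described, labeled training data, repeated passes that nudge the model toward better predictions, then evaluation on a separate test set, can be sketched with a one-feature logistic model in plain Python. This is a stand-in for the neural networks mentioned, and all data here is invented:

```python
import math

def sigmoid(z):
    """Squash a score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, epochs=200, lr=0.5):
    """Repeatedly compare predictions to training labels and nudge the
    weights to shrink the error: the core loop of training."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            w -= lr * err * x   # gradient step for the weight
            b -= lr * err       # gradient step for the bias
    return w, b

def accuracy(w, b, xs, ys):
    """Fraction of cases where the predicted class matches the label."""
    return sum((sigmoid(w * x + b) > 0.5) == y for x, y in zip(xs, ys)) / len(xs)

# Labeled training data: a single feature and the known outcome.
train_x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
train_y = [0, 0, 0, 1, 1, 1]

# A separate, held-out test set checks that the model generalizes.
test_x = [0.15, 0.25, 0.75, 0.85]
test_y = [0, 0, 1, 1]

w, b = train(train_x, train_y)
print(accuracy(w, b, test_x, test_y))
```

A real deep network replaces the single weight with millions arranged in layers, which is why, as noted above, training can take weeks and substantial computational resources rather than milliseconds.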
- 28:02Yes, exactly.
- 28:05And in order for this to be utilized,
- 28:08because I'm thinking about how this can be
- 28:11utilized in the clinic in terms of you
- 28:15know one day you may walk into a clinic,
- 28:18have your CT scan, have your biopsy,
- 28:21fill out some paperwork on your
- 28:24demographics and I can imagine a time
- 28:27when all of that information is put into
- 28:31a computer or stored in the computer.
- 28:34This algorithm runs in the background
- 28:37and spits out to the clinician,
- 28:40this patient's prognosis is X.
- 28:43The ideal treatment out of A, B, and C is B,
- 28:48and that kind of thing.
- 28:50I could see that happening
- 28:51in the future. I think we're
- 28:53a little bit far away from
- 28:55complete automation in that way,
- 28:57and partly because I think that
- 28:58we don't have a good hold on
- 29:01the data that we think is the most
- 29:03important and we don't have a good
- 29:05way of storing all that information,
- 29:07but I think that it's not something that I
- 29:10would be surprised we're doing in 5-10 years.
- 29:13Doctor Sanjay Aneja is an assistant
- 29:15professor of therapeutic radiology
- 29:17at the Yale School of Medicine.
- 29:19If you have questions,
- 29:21the address is canceranswers@yale.edu
- 29:22and past editions of the program
- 29:24are available in audio and written
- 29:26form at Yalecancercenter.org.
- 29:28We hope you'll join us next week to
- 29:30learn more about the fight against
- 29:32cancer here on Connecticut public radio.