# Using and Improving the Real-World Data Ecosystem in Cancer

January 11, 2023

## Information

Yale Cancer Center Grand Rounds | January 10, 2023

Presentation by: Dr. Jeremy Warner

ID9362


- 00:00So it's a pleasure to introduce Jeremy
- 00:03Warner, who actually I first met
- 00:06this year when I was chairing
- 00:08a panel at ESMO and he was
- 00:11one of the speakers who we invited.
- 00:15Jeremy is the director of the Brown
- 00:18Lifespan Center for Cancer Bioinformatics
- 00:21and Data Science and associate professor,
- 00:24technically pending I guess, at Brown.
- 00:29His clinical focus is malignant
- 00:31hematology and he received
- 00:34his medical degree from Boston
- 00:37University and also a master's in
- 00:41Photonics and Electrical and Computer
- 00:43Engineering from UC San Diego.
- 00:46In addition to his focus on
- 00:48malignant hematology, Dr.
- 00:50Warner is a leading expert in
- 00:51clinical and translational
- 00:54cancer informatics research,
- 00:56including high dimensional data
- 00:58analysis and visualization,
- 00:59natural language processing
- 01:01of narrative oncology texts,
- 01:03and the creation and
- 01:04implementation of health data standards.
- 01:08Before coming to Brown,
- 01:12Jeremy was at Vanderbilt
- 01:14University Medical Center,
- 01:16where he was an associate professor of
- 01:18medicine and biomedical informatics.
- 01:22And I should also note that he
- 01:24is the deputy director of ASCO's
- 01:27Clinical Cancer Informatics journal
- 01:29and a founding director of the new
- 01:31Brown University Center for Cancer
- 01:35Bioinformatics and Data Science.
- 01:37So without further
- 01:38ado, you're going to speak to us
- 01:40about using and improving the real-world
- 01:42data ecosystem in cancer.
- 01:45Thanks. Look forward to it.
- 01:49Thank you. Thank you so much for having me.
- 01:51And if anybody wants to
- 01:52come up to Providence,
- 01:53just one stop away on the Acela, so.
- 01:58It's really nice that we're so close here
- 02:00in New England. So I just have a few
- 02:03disclosures first before I get started.
- 02:05So I have some grant funding,
- 02:07some consulting.
- 02:08I do have ownership in HemOnc.org LLC,
- 02:11but it has no monetary value unless one
- 02:13of you wants to be an angel investor
- 02:15and we can talk after the presentation.
- 02:18So what I'm going to talk
- 02:20about here is you know why,
- 02:22why do we need real world data and real
- 02:24world evidence in oncology and I'm going
- 02:26to focus on electronic health records.
- 02:28There are other sources of
- 02:29real world data of course,
- 02:31but you know most of this talk
- 02:33will really focus on the EHR.
- 02:35I will spend some time talking about
- 02:37a particular interest of mine, which
- 02:39is standardizing systemic anticancer
- 02:41treatment representations, and then
- 02:43I'll spend some time talking about
- 02:46CCC19, our COVID-19 and Cancer Consortium,
- 02:49which is a bit of a culmination,
- 02:50if you will,
- 02:51of some of these thoughts.
- 02:52So there are some learning objectives here.
- 02:55Hopefully this is CME,
- 02:57so we'll cover,
- 02:59you know,
- 03:00some aspects of natural language
- 03:02processing and how it can be used
- 03:04to get information out of EHRs,
- 03:05why we need formal representations
- 03:08for complex concepts
- 03:10such as systemic anticancer therapy,
- 03:12and then learning about how these
- 03:14ideas went and propelled
- 03:16the CCC19 registry.
- 03:20OK, so first of all, you know,
- 03:22probably everybody might be already
- 03:23familiar with these definitions,
- 03:24but I think it's always helpful to go over,
- 03:26you know, what is real world data,
- 03:28what is real world evidence.
- 03:30And you know, it's nebulous a little
- 03:32bit and depending on where you,
- 03:34you know, the, the resource you look at,
- 03:36you'll get a different definition.
- 03:37But this is my definition,
- 03:39which is really based on something
- 03:41called the DIKW pyramid.
- 03:44Has anybody heard about this?
- 03:46Heard of the DIKW? OK, great.
- 03:47So I'll teach you a little bit here.
- 03:49So the idea here is that it's a
- 03:52pyramid where you're climbing
- 03:54levels here from a base of data.
- 03:57The next step is information.
- 03:59The next step is knowledge.
- 04:00The next step is wisdom.
- 04:01You'll note there are five levels here.
- 04:03There's a little tiny level at the top
- 04:05which some people use for understanding.
- 04:08But basically the idea is, no matter
- 04:10where the data comes from,
- 04:13what it is,
- 04:14whether it's from a randomized controlled
- 04:16trial or case control registry,
- 04:19etcetera,
- 04:19the idea is that as you move up this pyramid,
- 04:21you're generating real-world evidence,
- 04:23whereas real world data is really that base.
- 04:26On the right here you see the
- 04:28the sort of traditional pyramid
- 04:30of of evidence based medicine.
- 04:33So if you look at this from another
- 04:35dimension kind of looking from above,
- 04:36when we think about cancer in particular,
- 04:39you know I think about sort of
- 04:41three big aspects of cancer,
- 04:42there's the genotype, the phenotype and
- 04:44then the environment and sort of for
- 04:46each of these you have these layers.
- 04:48So if you think about the
- 04:50data level, for genotype
- 04:52that's just the sequence, right,
- 04:53just the somatic tumor sequence.
- 04:56For phenotype,
- 04:57it might be just a histologic type,
- 05:01a cell, you know, what is that?
- 05:03And for environment,
- 05:04it might be pollutant levels.
- 05:05Now this is data,
- 05:06but it's not really telling you anything,
- 05:08right.
- 05:08So we need to kind of walk up this pyramid.
- 05:10The next level for these three buckets
- 05:13would be, for genotype and environment,
- 05:15you might talk about pathogenicity.
- 05:17What does that change mean in terms of, is it,
- 05:20is it a driver mutation?
- 05:21That's sort of the next level, information.
- 05:24Cancer behavior on the phenotype
- 05:26side, so is it,
- 05:28is it aggressive,
- 05:29is it a high grade malignancy
- 05:31or is it something indolent?
- 05:33Kind of stepping up further.
- 05:35For genotype knowledge,
- 05:36the knowledge level is actionability.
- 05:39What can you do with this information?
- 05:41Can you actually prescribe a medication
- 05:44that will change the outcome for a patient?
- 05:46Phenotype,
- 05:46same,
- 05:47you know,
- 05:47just generally speaking what are
- 05:49the treatment options? And then for
- 05:50environment, are there risk modifications
- 05:52that can be taken? And then really
- 05:53getting to that top level, wisdom,
- 05:55you know, this is
- 05:57really complicated now.
- 05:59So in phenotype you're thinking
- 06:01about what are patient values
- 06:03and preferences and how do those
- 06:05influence what treatment options
- 06:06you might consider for genotype,
- 06:08what's the tumor going to do once
- 06:10it gets exposed to treatments,
- 06:12how is it going to evolve under
- 06:13treatment pressure, and in environment
- 06:15you've got issues about social justice
- 06:16and structural racism.
- 06:17So those are all kind of like
- 06:19the ideas of climbing this pyramid. All right,
- 06:21hopefully I've convinced you of
- 06:22the difference between data and
- 06:23evidence as we kind of step up.
- 06:25Now, why do we need this real-world evidence?
- 06:28Well, clinical trials are wonderful,
- 06:30but they're also expensive,
- 06:32slow to conduct, and they don't
- 06:35always represent the full population
- 06:37at risk. Also, trials,
- 06:40prospective trials,
- 06:41collect some but not all potentially
- 06:44pertinent information.
- 06:45And our space is huge.
- 06:46Oncology,
- 06:46the treatment space in oncology is huge.
- 06:49And then lastly,
- 06:50I think last but not least is that
- 06:51we've got this enormous data source,
- 06:53which is the electronic medical record.
- 06:56So just a few words about each
- 06:57of these items.
- 06:58So when you think about
- 06:59trials and disparities,
- 07:00this is a paper we just published
- 07:02very recently and this one was
- 07:06earlier this year and we just
- 07:07published another one in JAMA
- 07:08Oncology looking at prostate cancer.
- 07:09This one looks at immune checkpoint
- 07:11inhibitors across cancers.
- 07:12And basically the take home message here
- 07:14is that when you look across trials,
- 07:16there is really a lot of disparity
- 07:20in who enrolls in trials.
- 07:22And it can be different by cancer type,
- 07:25but it's pretty consistent across the board.
- 07:27And it's not always underrepresentation.
- 07:29Sometimes it's over representation,
- 07:30as you can see from the bottom row.
- 07:32But you know, essentially the yellow ones,
- 07:35the yellow circles are intersections of,
- 07:40in this case, gender, age,
- 07:44race and ethnicity, and a cancer type
- 07:46where the enrollment is as you'd expect.
- 07:49If it's green, it's sort of more than
- 07:50you'd expect, and if it's red,
- 07:52it's less than you'd expect.
- 07:53So, you know, this,
- 07:55this gets at generalizability, and there
- 07:57might be statistical ways around this,
- 07:59but you know, essentially.
- 08:03Our knowledge from clinical trials
- 08:05is primarily coming from younger
- 08:08white men. OK, so.
- 08:12How about the information that gets left out?
- 08:14So this is, this was amazing.
- 08:17This is the RECOVERY group that
- 08:19really geared up during the early
- 08:21days of the COVID pandemic and found,
- 08:23you know,
- 08:24pragmatic trials that they ran in
- 08:26the UK and they found some really
- 08:29important treatment options for COVID.
- 08:31This is one of their papers.
- 08:32This is probably the most impactful one,
- 08:35showing that dexamethasone could
- 08:38help hospitalized patients with COVID.
- 08:40And I've excerpted a table from that paper.
- 08:45Take a minute and look.
- 08:46So what's missing from this table?
- 08:48So this is a table of previous
- 08:50coexisting diseases in the
- 08:51patients who have COVID.
- 08:52Is there something missing?
- 08:55From this table.
- 08:58Something that's the topic of this talk.
- 09:02Cancer, right.
- 09:03There's no cancer in this table.
- 09:06They did not collect cancer,
- 09:07and/or they didn't report it.
- 09:09Well, we actually went and,
- 09:11you know, got their case report forms;
- 09:14they didn't record cancer.
- 09:15So here they enrolled tens of thousands
- 09:17of patients in these trials.
- 09:18And they don't know if these
- 09:20patients had cancer or not.
- 09:21And so I mean, amazing work,
- 09:23but we're missing a key piece of information.
- 09:27And then sort of the last item, you know,
- 09:29that I mentioned before is
- 09:31this idea that our treatment space
- 09:34is huge but head-to-head comparisons
- 09:36of important drugs are mostly absent.
- 09:39And I'll just give you one example.
- 09:41So this is the space of PD-1 inhibitors,
- 09:44which have obviously changed
- 09:46our field. From our HemOnc knowledge
- 09:48base, which I'll talk about a bit later,
- 09:51we have 137 trials that have been published
- 09:55using 64 different regimens of various
- 09:57PD-1 inhibitors.
- 09:58This includes ex-US, by the way.
- 10:01If you're like, there are not 13 PD-1
- 10:02inhibitors: actually there are,
- 10:04but many of those are only approved in China.
- 10:07So 83 of those are phase three trials.
- 10:11Take-home point is one of those 83
- 10:14actually compared a PD-1 inhibitor
- 10:16to a PD-L1 inhibitor, kind of; it
- 10:19actually compared to cobimetinib and
- 10:21atezolizumab. And a grand total
- 10:23of zero of these trials compared one PD-1
- 10:26inhibitor to another PD-1 inhibitor. So,
- 10:29You know,
- 10:29maybe I'm missing some trials that are
- 10:31ongoing now that have yet to be published.
- 10:32But at this point in time,
- 10:34we don't have any data at all on whether
- 10:37one PD-1 inhibitor is better than another
- 10:40except for indirect treatment comparisons.
- 10:42So
- 10:43hopefully I've convinced you that
- 10:45we should at least think
- 10:47about using real-world data.
- 10:49But
- 10:49they are messy,
- 10:51ambiguous and unpredictable.
- 10:53So let me talk about some
- 10:55challenges that we have once we
- 10:57start delving into the real world.
- 10:59So first of all.
- 11:01This is real-world data from the MEDLINE
- 11:07institution information.
- 11:08OK, so did you know that there were 21
- 11:12clinical trial institutions in New Haven?
- 11:15Did you know that?
- 11:16That's amazing, right? Here they are.
- 11:19Smilow Cancer Center, Smilow Cancer Hospital,
- 11:23smilow cancer hospital at Yale,
- 11:26Smilow Cancer Hospital at Yale University,
- 11:28Yale Cancer Center.
- 11:30Yale Cancer Center and Smilow Cancer
- 11:32Hospital, Yale Medical school.
- 11:33Alright, I think you get the idea, right?
- 11:36So I mean this is real world.
- 11:38I mean you have to do something.
- 11:40I mean, a computer is not going to know,
- 11:41right? I mean,
- 11:42so if you want to use this data in some way,
- 11:46someone's got to do some work
- 11:48to actually fix this, right?
- 11:49That is a big part of working
- 11:52with real world data.
- 11:53Yale New Haven hospital.
- 11:55There's the 21st, OK?
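
The clean-up work described here can be sketched as an alias table that maps each spelling variant to one canonical institution. This is only an illustrative sketch: the canonical label, the choice of which variants to merge, and the function names are assumptions, not something shown in the talk. Names that are not clearly the same entity (for example, the medical school or Yale New Haven Hospital) are deliberately left unmapped so a person can review them.

```python
import re
from typing import Optional

# Hypothetical canonical label; the talk does not say what the merged name should be.
CANONICAL = "Yale Cancer Center / Smilow Cancer Hospital"

# Variants read out in the talk, stored in normalized (lowercase, no punctuation) form.
ALIASES = {
    "smilow cancer center": CANONICAL,
    "smilow cancer hospital": CANONICAL,
    "smilow cancer hospital at yale": CANONICAL,
    "smilow cancer hospital at yale university": CANONICAL,
    "yale cancer center": CANONICAL,
    "yale cancer center and smilow cancer hospital": CANONICAL,
}


def normalize_key(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def canonical_institution(name: str) -> Optional[str]:
    """Return the canonical name, or None when the variant needs manual review."""
    return ALIASES.get(normalize_key(name))
```

Returning `None` for unknown variants, rather than guessing, keeps the human in the loop for the judgment calls the speaker alludes to.
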
- 11:58OK. So how about,
- 12:00so that's bibliometrics to some degree,
- 12:02how about treatments,
- 12:03how many tyrosine kinase
- 12:05inhibitors are there?
- 12:07And so this is a little project that
- 12:10a student of mine undertook where they
- 12:13mapped out how many letters you'd have
- 12:16to switch around or basically misspell.
- 12:19So that one tyrosine kinase inhibitor
- 12:21would actually be another one.
- 12:23And so it's fewer letters
- 12:25than you think and.
- 12:27You know, these drugs get misspelled
- 12:30all the time in pretty amazing ways.
- 12:34I see that
- 12:34there's a little bit of a formatting
- 12:36issue with the next slide, but
- 12:38this is real data
- 12:43from the Vanderbilt University
- 12:45Medical Center.
- 12:46So this is from our text list of medications.
- 12:49Now you might say, oh,
- 12:51let's just, you know,
- 12:52we've got to be able to get these
- 12:53medications from structured data.
- 12:54That may or may not be true.
- 12:56It depends. We can talk more about that.
- 12:58But these are real misspellings
- 13:00of the drug erlotinib. I
- 13:01think you can tell looking at this
- 13:03that all of these are erlotinib.
- 13:05But again,
- 13:06I mean if you don't have some sort of
- 13:08system to harmonize all those misspellings.
- 13:12You're not going to know
- 13:13which patient got what drug.
- 13:15So.
- 13:15So that's you know, that's a real
- 13:18world issue with real world data.
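
One common way to harmonize misspellings like these is edit distance: count how many single-letter changes separate the typed string from each known drug name and accept the closest match only when it is close and unambiguous. The sketch below is an assumption about how such a system could look, not the system used at Vanderbilt; the five-drug vocabulary, the threshold of two edits, and the function names are all illustrative.

```python
from typing import Optional

# Illustrative vocabulary only; a real list of tyrosine kinase inhibitors is much longer.
TKI_VOCABULARY = ["erlotinib", "gefitinib", "imatinib", "nilotinib", "dasatinib"]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def harmonize(raw: str, max_distance: int = 2) -> Optional[str]:
    """Map a free-text drug mention to a vocabulary entry, or None if unsure."""
    token = raw.strip().lower()
    scored = sorted((levenshtein(token, drug), drug) for drug in TKI_VOCABULARY)
    best_distance, best_drug = scored[0]
    if best_distance > max_distance:
        return None  # too far from every known drug
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # equally close to two drugs: do not guess
    return best_drug
```

The tie check matters because, as the student project showed, some drug names are only a few letters apart, so a misspelling can sit equally close to two different drugs.
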
- 13:20This is work that we did some
- 13:22years ago on staging,
- 13:23so cancer staging.
- 13:26Here is what I call manageable ambiguity.
- 13:30All right,
- 13:32so.
- 13:32And again, maybe,
- 13:34you know, during discussion
- 13:35we can talk about the value of
- 13:37structured versus unstructured data,
- 13:39but the idea here is that we would
- 13:41take data from progress notes, from
- 13:44clinical text, all these notes,
- 13:46and figure out if a patient had stage
- 13:491, 2, 3, or 4. So just forget A, B, C.
- 13:51We're just trying to go for the big stages.
- 13:55And you know,
- 13:56the problem that we knew ahead of
- 13:58time is that these things are going
- 14:00to be recorded variably in different
- 14:01notes by different types of doctors.
- 14:03But you know we did a pilot
- 14:06with about 1000 patients with
- 14:07lung cancer with over 460,000
- 14:10clinical documents across them.
- 14:11Now if you pause for a minute and you think
- 14:14about a chart review,
- 14:15think about how long it would take you to
- 14:18go through 460,000 documents, right? So
- 14:20here's my pitch for natural
- 14:22language processing.
- 14:23You can actually automate this kind of
- 14:26thing and do this kind of work at scale.
- 14:30So cutting to the chase a little bit here.
- 14:35First of all, we found that
- 14:36out of those 964 patients,
- 14:3899% had some kind of stage phrase
- 14:41in their note. At least one.
- 14:45And we also had a gold standard
- 14:47which was the tumor registry data.
- 14:48So we were able to compare
- 14:51our system to the subset.
- 14:53You'll notice only 790 out of
- 14:56those 964 had tumor registry data,
- 14:57but we were able to do a comparison and you
- 15:00know our system worked really pretty well.
- 15:03The green, you know,
- 15:04basically the matches are in the green,
- 15:05the big numbers and we got some things wrong,
- 15:08but we didn't usually get things
- 15:10really wrong most of the time.
- 15:11So if it was stage one,
- 15:12we called stage four,
- 15:13that was a big mistake.
- 15:15Only happened once.
- 15:19This shows actually,
- 15:22so again, 460,000 documents.
- 15:24So what we wanted to say is:
- 15:26do you have to look at all
- 15:28of those, or can you just look
- 15:30at notes that were written
- 15:32right after a patient was diagnosed?
- 15:34You know,
- 15:36some of the inspiration for this
- 15:38project came from the QOPI measures.
- 15:39And if any of you have done that work
- 15:41you'll remember, I believe, and they
- 15:43may have changed, but at one point
- 15:45the QOPI measure was: was stage
- 15:46recorded in one of the first two
- 15:48progress notes written after diagnosis?
- 15:50So it kind of makes sense that
- 15:52you would look for stage early on,
- 15:54but if you look at this black line
- 15:56here at the bottom... So does this,
- 15:58are you seeing my...
- 15:59You don't see the arrow, do you?
- 16:01I don't think you're seeing the arrow, OK.
- 16:03If you look at the black line
- 16:04towards the bottom, you'll see that
- 16:06if you look at the notes in the
- 16:08first five weeks from diagnosis,
- 16:09actually there's a pretty high
- 16:11rate of unknown stage.
- 16:12Like we couldn't determine it.
- 16:13It wasn't until we got to five
- 16:15weeks and out that we had enough
- 16:17mentions of stage that we could
- 16:19sort of make that determination.
- 16:21So we saw this kind of inflection point.
- 16:23And so that's another thing just
- 16:25to note when you're working
- 16:27with real-world data is that,
- 16:28you know, time matters,
- 16:30time can matter a lot.
- 16:31And the other thing that really matters is,
- 16:33is ambiguity.
- 16:34So I mentioned we found stage
- 16:36in 99% of the records.
- 16:37What I didn't mention is that
- 16:40most of those, 84%, had more
- 16:42than one stage in their records,
- 16:44OK, and some degree of discordance.
- 16:48So one note might say they have stage one,
- 16:51another note might say they have stage two.
- 16:54Actually when we constructed a network
- 16:56graph, on the right here, you see like
- 16:59every possible combination was present,
- 17:00every possible combination, including,
- 17:02you know, terms that are
- 17:05more generic like early stage,
- 17:07advanced stage.
- 17:08Everything, you know, happens, and
- 17:11on the bottom left
- 17:14here you can see a histogram of
- 17:17co-occurrences of various stage information.
- 17:19So that's
- 17:22really potentially ambiguous.
- 17:23One take-home point from this,
- 17:25though, is that we use a
- 17:27really simple decision rule on,
- 17:28you know, what is the actual stage?
- 17:30We just chose the phrase that showed up
- 17:33the most, OK, and that seems to work.
- 17:35So if stage three shows up
- 17:37in the notes 100 times,
- 17:38and stage one shows up twice,
- 17:40chances are it's stage three. Now,
- 17:42just sort of a practical rule, and it worked.
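
The "most frequent phrase wins" rule can be sketched in a few lines. This is a simplified illustration, not the system from the study: the regular expression, the collapsing of substages (A, B, C) into the main stage, and the function names are assumptions, and a real pipeline would need to handle negation, generic phrases such as "early stage", and false positives such as "at this stage I think".

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# Matches "stage" followed by a Roman numeral I-IV or a digit 1-4,
# with an optional substage letter that is ignored.
STAGE_PATTERN = re.compile(r"\bstage\s+(iv|iii|ii|i|[1-4])[abc]?\b", re.IGNORECASE)
ROMAN_TO_INT = {"i": 1, "ii": 2, "iii": 3, "iv": 4}


def stage_mentions(note: str) -> List[int]:
    """Return every main stage (1-4) mentioned in one note."""
    stages = []
    for match in STAGE_PATTERN.finditer(note):
        token = match.group(1).lower()
        stages.append(ROMAN_TO_INT[token] if token in ROMAN_TO_INT else int(token))
    return stages


def most_frequent_stage(notes: Iterable[str]) -> Optional[int]:
    """Apply the simple decision rule: the stage mentioned most often wins."""
    counts = Counter(stage for note in notes for stage in stage_mentions(note))
    if not counts:
        return None  # stage unknown: no mention found in any note
    return counts.most_common(1)[0][0]
```

For example, given notes that mention stage III three times and stage I once, `most_frequent_stage` returns 3, and it returns `None` when no note contains a stage phrase.
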
- 17:45Now getting back to that, you know,
- 17:48whole idea of.
- 17:49You know unknown or sort of lack of
- 17:51information and missingness which is
- 17:53a major issue with real world data.
- 17:55This is another mini project we
- 17:58did looking at colon cancer and
- 18:00we wanted to say could you find
- 18:03patients with stage 3 colon cancer
- 18:05and this was for the OCM project,
- 18:07the oncology care model.
- 18:08So you know really important as a
- 18:10metric to know if these patients
- 18:11got appropriate treatment within an
- 18:13appropriate period of time.
- 18:14But again what we saw here
- 18:16is this sort of crossover at
- 18:18about seven weeks, at which point,
- 18:22you know, the stage was changing
- 18:23or it was missing in the records,
- 18:26and it wasn't until about seven weeks
- 18:27after diagnosis that you get to a
- 18:29kind of steady state where you can
- 18:31definitively say a patient has stage
- 18:33three or we don't know the stage so.
- 18:37Here's some really interesting
- 18:38work from here, actually from Yale,
- 18:41from the Radiation Oncology department,
- 18:43where they actually looked at
- 18:46missingness as a variable, if you will.
- 18:49So they took the National Cancer Database,
- 18:52the NCDB data, and they split patients
- 18:56into whether they had complete records or
- 18:59had some missing data from their record.
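
Treating missingness as a variable starts with a split like the one described: flag each record as complete or incomplete, then compare outcomes between the two groups. The sketch below shows only that first step, under stated assumptions: the field names, the markers counted as "missing", and the function names are hypothetical, and this is not the Yale group's code or the NCDB schema.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None, empty strings, and the literal 'unknown' as missing (an assumption)."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in {"", "unknown"}


def has_missing_data(record: Record, fields: Sequence[str]) -> bool:
    """True if any of the fields of interest is missing from the record."""
    return any(is_missing(record.get(field)) for field in fields)


def split_by_completeness(
    records: Iterable[Record], fields: Sequence[str]
) -> Tuple[List[Record], List[Record]]:
    """Return (complete, incomplete) cohorts for a later survival comparison."""
    complete: List[Record] = []
    incomplete: List[Record] = []
    for record in records:
        (incomplete if has_missing_data(record, fields) else complete).append(record)
    return complete, incomplete
```

The survival comparison itself is not shown; the point is only that "has missing data" becomes an ordinary binary covariate once records are flagged this way.
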
- 19:02Now the NCDB is not EHR data, right?
- 19:05But it is based on EHR data.
- 19:08So I would call it a real
- 19:09world data source because it's,
- 19:11you know, curated out of EHR data.
- 19:13And you know the punch line here is that
- 19:17missingness is an independent prognostic
- 19:19factor for survival, which is really an
- 19:22interesting thing to think about, right?
- 19:24And it kind of depends on
- 19:27what kind of cancer you have as well.
- 19:29So they found, for instance, on the left,
- 19:31if you have non-small cell lung cancer,
- 19:33it's the nonmetastatic patients
- 19:35who had a real difference in their
- 19:38prognosis if they were missing data.
- 19:40Whereas with prostate cancer it was the
- 19:42metastatic group that sort of split apart.
- 19:44But either way, I mean, this is...
- 19:47Yeah, just think about it for a
- 19:48minute while I get my water bottle.
- 19:49It's
- 19:50interesting.
- 19:58OK. It's certainly not something
- 20:01that we conventionally use as a metric.
- 20:04Certainly not in a clinical trial,
- 20:06because it's not an issue, right?
- 20:09Case report forms are complete,
- 20:11but missingness itself can be
- 20:13informative in real-world data.
- 20:16So what I wanted to do now is actually take
- 20:19us down a little different path briefly,
- 20:21which is a brief diversion into
- 20:22the history of medical records.
- 20:24Anybody know what this is?
- 20:26It's a local...
- 20:31Eli Whitney's mill. OK,
- 20:32so it's kind of cool if you've never been there.
- 20:34It's still there.
- 20:35Doesn't exactly look like this anymore,
- 20:38but you'll see why I'm showing
- 20:39this in a couple slides, so.
- 20:43So this is also a real thing.
- 20:47OK, so this is one of my favorite vehicles
- 20:51from the Lane Motor Museum in Nashville.
- 20:54Which is Doctor Weiner mentioned,
- 20:55I was there for about a decade
- 20:58and so this is a real vehicle.
- 20:59There's they actually have a
- 21:01collection of these and it makes
- 21:02me think of electronic medical
- 21:04records because it it works, right?
- 21:06It it actually this person's
- 21:08actually driving this car.
- 21:10But we don't exactly see propeller driven
- 21:12cars on the roads these days, right?
- 21:15So our ER, but it works.
- 21:17So MR's are functional,
- 21:18but are they fit for the purpose
- 21:19that we want to use them for?
- 21:23I think many of us have, you know,
- 21:25some ideas about that, but you know,
- 21:27when you think about medical records,
- 21:28this is obviously a little bit
- 21:31before the computer, you know,
- 21:33medical records have been around for
- 21:36almost, you know,
- 21:373,500 years in one form or another.
- 21:39But what's interesting to me
- 21:42is that they were primarily
- 21:43used for teaching or didactics.
- 21:45Until very recently,
- 21:46that was the only purpose of medical records.
- 21:50And then sort of the second
- 21:53purpose that arose, if you will,
- 21:56didn't arise until the 1880s.
- 21:57It's not that long ago if you think about it.
- 21:59And that was for legal purposes,
- 22:02legal defense.
- 22:03And, you know,
- 22:05essentially to have a written record
- 22:07of what happened in case there was a
- 22:10lawsuit around medical malpractice.
- 22:12And we'll skip that, and sorry,
- 22:14there are some Mac to
- 22:17PC changes here with the font,
- 22:19so it's a little bit hard
- 22:20to read some of this.
- 22:21But you know, how about billing?
- 22:23Billing is the
- 22:25major driver, right, of how our
- 22:27medical records look today.
- 22:29But that only really happened in the
- 22:311960s, which is really not long ago, and until,
- 22:34you know, not so long ago, physicians
- 22:38were paid with food and lodging.
- 22:41If they were lucky.
- 22:44This is a picture from the Confucian
- 22:47medical system where there's at
- 22:49least some cases where the court
- 22:52physician was basically executed
- 22:53if the emperor did not get better.
- 22:56So that's a pretty harsh payment
- 22:59or penalty if you will.
- 23:00But you know what really changed
- 23:02things was the Medicare Act of 1965,
- 23:05which basically established this profile,
- 23:08you know,
- 23:09quote, usual, customary, and
- 23:10reasonable fees, which
- 23:11drive so much of what we do.
- 23:13And sorry about the font
- 23:14that's messed up here,
- 23:15but there's a quote from the AMA,
- 23:18the American Medical Association,
- 23:20that said that the 1965 Medicare Act
- 23:22was the most deadly challenge ever
- 23:24faced by the medical profession.
- 23:26That's an actual quote.
- 23:29It certainly changed things a lot.
- 23:30And then what I'd argue also changed
- 23:33things, really more recently, was
- 23:35in the 90s, when the physician fee
- 23:36schedule was introduced, and then
- 23:38something called the evaluation and
- 23:40management guidelines, which I think
- 23:41a lot of us know more about than
- 23:43we ever wanted to know,
- 23:44but those really changed how
- 23:47medical records were written.
- 23:51Notice that I haven't yet used
- 23:52the word electronic, right?
- 23:54So now what about patient care,
- 23:57which I think all of us want that to
- 23:59be the primary purpose of medical records.
- 24:02This kind of dates back to
- 24:05the 1800s in some ways.
- 24:06The case records of the
- 24:08Massachusetts General Hospital.
- 24:10Introduced some ideas like
- 24:11history of presenting illness,
- 24:13past medical history and so forth,
- 24:15medical record numbers.
- 24:16The whole idea that you would track a
- 24:19patient by a number was introduced at
- 24:20the Mayo Clinic in the early 1900s,
- 24:22where they also introduced the chief
- 24:25complaint and the review of systems.
- 24:28And then
- 24:30the American College of Surgeons,
- 24:32this is an amazing bit of history
- 24:34if you didn't know: in 1918,
- 24:38there was no federal mandate of any kind.
- 24:40They basically mandated, as
- 24:41a professional organization,
- 24:42that hospitals had to
- 24:45keep records including a discharge
- 24:47summary that basically said, was the,
- 24:50you know, patient,
- 24:50you know, alive or dead at the time they left.
- 24:52And at that time fewer than 20% of
- 24:56physicians kept any kind of record at all,
- 24:58which is like, amazing, right?
- 25:03Now this is tying back to that Eli Whitney.
- 25:05So this is, you know,
- 25:07for those that did take records,
- 25:09this is kind of what they looked
- 25:11like. These are called casebooks.
- 25:12I'm not sure where this one is from,
- 25:14but it's basically handwritten.
- 25:18And what's really interesting about
- 25:20this is that it's physician centered, right?
- 25:23This is not,
- 25:24this is a diary basically. It's not, you know,
- 25:26one patient has one book;
- 25:28this was written as
- 25:30the doctor saw patients,
- 25:31so if you ever wanted to go back and say OK,
- 25:34Mr. Smith or whoever,
- 25:35like put their case together,
- 25:37good luck.
- 25:40So really the most recent innovation,
- 25:42if you will, in medical records was
- 25:44this one and that from Austin from
- 25:46the mid 1960s, which is the problem
- 25:49oriented medical record, which
- 25:51was conceived as a, quote, medical
- 25:53record that guides and teaches.
- 25:55So kind of back to that idea of
- 25:57didactics in a way, and I'm sure
- 25:59everybody's familiar with this,
- 26:00this idea, these SOAP notes, right?
- 26:04What I like from the paper when
- 26:07Dr. Weed introduced this idea,
- 26:09this is a quote which I think
- 26:12actually forecasts the EHR, right? So,
- 26:15and it's worth reading it.
- 26:16It can
- 26:17readily be seen that all narrative
- 26:19data presently in the medical
- 26:21record can be structured,
- 26:22and in the future all narrative data
- 26:24may be entered through a series of
- 26:26displays guaranteeing a thoroughness,
- 26:27retrievability,
- 26:28efficiency and economy important
- 26:30to the scientific analysis of a
- 26:32type of datum that has hitherto
- 26:34been handled in a very unrigorous manner.
- 26:36It's an amazing quote.
- 26:37I mean,
- 26:38this is essentially before any
- 26:41electronic medical record, right?
- 26:42But he basically saw it, saw it coming.
- 26:47I think the most important part of
- 26:49this quote is this: "to be concluded."
- 26:52We're living through the evolution
- 26:54of these electronic medical records.
- 26:57This is actually a two-part paper,
- 26:59that's why it says this.
- 27:00But I think, you know,
- 27:00he could have been like OK,
- 27:02we don't know what's going to happen.
- 27:05It's worth taking a step back
- 27:07and saying what you know what.
- 27:09So now I'm going to say electronic: what
- 27:11is the electronic health record for?
- 27:14And it's got primary uses and secondary uses.
- 27:16So the primary uses are patient
- 27:19care and delivery, financial billing.
- 27:20But when you talk about
- 27:23real-world data and real world evidence,
- 27:24that's a secondary use,
- 27:26as it's conceived here in this model,
- 27:28which the Institute of Medicine put forward.
- 27:31All right.
- 27:31So moving ahead a little bit.
- 27:33So this is where we were in the mid 2000s
- 27:36and this is when I was in medical school.
- 27:38At that time there were issues
- 27:41around funding to, you know, roll out
- 27:44electronic medical records, and
- 27:46what I like on the bottom here is in
- 27:482003 the Mass Medical Society did a
- 27:50survey where 89% of physicians wanted
- 27:52EHR data, but 48% refused to use an EHR.
- 27:56So a little bit of a disconnect there,
- 27:59and by 2004
- 28:01hardly anybody was using electronic medical records.
- 28:04So what changed?
- 28:05Arguably this is, you know,
- 28:08one of the events that really changed things.
- 28:11Is everybody familiar with Katrina
- 28:14and what happened in New Orleans?
- 28:17Does everybody know why the
- 28:20records, so there's a picture there
- 28:21on the right,
- 28:22does everybody know why those
- 28:24were in the basement
- 28:26that flooded?
- 28:27It's that they are so heavy that the
- 28:29building literally would have collapsed
- 28:32under the weight of the paper if
- 28:35they'd been up on higher floors.
- 28:37So that's why they have their
- 28:39medical records in the basement and
- 28:41and they were all destroyed, right?
- 28:42They were all just lost.
- 28:43So, so fast-forward
- 28:45a little bit: the HITECH Act in 2009,
- 28:48which Obama signed. This is what really,
- 28:51you know,
- 28:52gave a lot of money for institutions
- 28:54to really start putting in EMRs.
- 28:56But what is interesting is if
- 28:58you look at sort of the adoption
- 29:01curve and there's a couple,
- 29:03I won't get into the details here.
- 29:04There's a couple ways of like what is an
- 29:06EHR basic versus complete and so forth,
- 29:09but you actually see.
- 29:12You actually see them starting,
- 29:13so here's E&M coming out in the mid 90s.
- 29:16Here's Katrina in 2005.
- 29:19There's the HITECH Act.
- 29:20By the time the HITECH Act comes out,
- 29:22actually we're like well on the
- 29:25adoption curve and so, you know,
- 29:27it definitely helped things along, but you know,
- 29:30the process is already starting.
- 29:32Umm.
- 29:33And then you know where.
- 29:35So this is already five years old,
- 29:37but I think, you know it's it's.
- 29:40And sorry,
- 29:41sorry again,
- 29:41can't see the text there.
- 29:42But you know already by
- 29:44five years ago people were
- 29:46reporting that EHRs were
- 29:47a major driver of burnout.
- 29:49So, so you know, it's problematic.
- 29:51But OK, here's a here's a
- 29:53few other challenges. So.
- 29:55And I'm sure everybody who's
- 29:56clinical knows these things already.
- 29:58But carry-forward or copy-pasting is
- 30:02ubiquitous in medical records and
- 30:04there's just a ton of redundancy.
- 30:06Here's a paper that basically shows that.
- 30:09Umm. You know, large,
- 30:11large portions of any note you
- 30:13particularly look at have been
- 30:16copied forward from other notes.
- 30:18Progress notes.
- 30:19In particular,
- 30:20more than half of progress
- 30:22note material is copied
- 30:24forward from previous notes.
- 30:27This is a different study looking at, you
- 30:31know, how many progress notes have
- 30:33manually entered text versus copied text of any kind.
- 30:34And you can see again, like, very few progress
- 30:38notes have fully written text,
- 30:40Which you would say is fully original, but.
- 30:44So I think it's a legitimate question
- 30:45to say what are we dealing with here?
- 30:47Is it a giant pile of paper
- 30:48or is there actually meaning.
- 30:49So this is a little tiny project
- 30:52I did during fellowship where
- 30:54I basically took one of my patients'
- 30:57charts and I counted up like how
- 30:59many data points are in that chart.
- 31:02And you can see the blue bars are all the
- 31:05structured data elements like billing
- 31:07codes or vital signs or lab values.
- 31:11And then these red bars are the words
- 31:13in the clinical documents and you
- 31:15see that that just drowns out right,
- 31:17the structured data.
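As a rough illustration of the counting exercise described above, the sketch below tallies structured data elements against words of free text for a single chart. The chart layout, field names, and values are invented for the example; they are not a real EHR schema and not the method used in the fellowship project.

```python
# Hypothetical toy chart: field names and values are illustrative only.
chart = {
    "structured": {
        "billing_codes": ["C83.30", "Z51.11"],
        "vital_signs": [("bp", "120/80"), ("hr", 72), ("temp_c", 36.8)],
        "lab_values": [("wbc", 5.1), ("hgb", 12.9)],
    },
    "notes": [
        "Patient with diffuse large B cell lymphoma seen in clinic today.",
        "Tolerating treatment well. Plan to continue current regimen.",
    ],
}


def count_data_points(chart):
    """Return (number of structured elements, number of words in free text)."""
    structured = sum(len(items) for items in chart["structured"].values())
    # Whitespace splitting is a crude word count, good enough for a comparison.
    words = sum(len(note.split()) for note in chart["notes"])
    return structured, words


structured_count, word_count = count_data_points(chart)
print(f"structured elements: {structured_count}, words in notes: {word_count}")
```

Even in this tiny made-up chart the free text outnumbers the structured elements, which is the same shape as the blue and red bars on the slide.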
- 31:18So there's a lot of data there but.
- 31:21It's awesome.
- 31:22There's even more than that, right?
- 31:24So in this chart.
- 31:26And this is small these days, right?
- 31:28So this was more than 10 years ago,
- 31:31there was another 277 pages of scanned
- 31:34documents with 69,000 words in them
- 31:37that were basically inaccessible,
- 31:38but and and the take home point here.
- 31:41Is that this is what it all boils down to,
- 31:44OK?
- 31:46Patient with diffuse large B cell lymphoma
- 31:48is in a complete remission after
- 31:50getting 6 cycles of R-CHOP.
- 31:52I think that's enough for most research.
- 31:54OK now how can we,
- 31:57how can we boil things down like
- 31:59that because that's that's kind of
- 32:00maybe what we're talking about here.
- 32:02So and of course there's more to it right.
- 32:06But you know when you think about what's
- 32:09in EMRs or EHRs and what is not.
- 32:13Umm.
- 32:13You have to know what you're,
- 32:15you have to know what you're going to find,
- 32:16right.
- 32:17So, so let's say you know you've
- 32:19unlocked this medical record,
- 32:21but it's not necessarily going
- 32:22to have what you want.
- 32:23So here's, here's some, you know,
- 32:25basically some big buckets, right.
- 32:28So you're going to find the person's
- 32:30date of birth, no problem, right.
- 32:31But you're not going to find
- 32:33probably where they were born,
- 32:34the circumstances of their birth,
- 32:35were there complications?
- 32:37Very unlikely, because they will have...
- 32:39You know, they won't have lived
- 32:41their whole life
- 32:42within the electronic era,
- 32:43and they won't have all that data.
- 32:45You might find their biologic sex,
- 32:47no problem,
- 32:47but are you going to find
- 32:48their gender orientation,
- 32:49their sexual identity?
- 32:50You'll find race and ethnicity,
- 32:52but are you going to find other
- 32:55social determinants of health?
- 32:56You'll find the medications
- 32:58that they are prescribed,
- 32:59but will you find what they actually took,
- 33:01the medication that they took and the
- 33:04regimens and we're going to get into
- 33:05that next you'll find laboratory tests,
- 33:08but you won't necessarily find images.
- 33:10And so forth.
- 33:12So kind of you know as moving
- 33:15forward. Thinking about what you know,
- 33:17the low hanging fruit.
- 33:20You know it's the cancer type,
- 33:22it's easy like we don't need to create
- 33:23a new system to get cancer type.
- 33:25You can get that from billing codes,
- 33:27registry data.
- 33:28The treatments are hard, like R-CHOP
- 33:32times 6, that's hard. Determining that
- 33:34the patient's in a complete remission,
- 33:37that's really hard.
- 33:37So what I go for is the middle;
- 33:39I go for the middle ground, right,
- 33:40I'm going to tackle the thing in the middle.
- 33:43So now I'm going to switch gears here
- 33:45for a bit and talk about our work on
- 33:49standardizing systemic anticancer treatment.
- 33:51And before I get into that.
- 33:55If you've not seen this XKCD cartoon,
- 33:58it's a classic.
- 33:59And this is a challenge, right?
- 34:01Whenever you decide to create a new
- 34:04standard, are you actually just, you know,
- 34:07just creating more complexity or not?
- 34:09Hopefully we're not.
- 34:10Well, what we did in this space,
- 34:12there really weren't 14 existing standards.
- 34:14There were none.
- 34:15And so as everybody here knows,
- 34:17I could skip past this slide.
- 34:19Chemotherapy regimens are complicated and
- 34:22given in cyclic fashion combinations.
- 34:26This was the standard when
- 34:27we got started on our work.
- 34:30This is, you know,
- 34:31one example of these things called
- 34:32cancer chemotherapy handbooks,
- 34:34kind of recipe books, physical books,
- 34:37right.
- 34:37With some some details here,
- 34:39but maybe not enough.
- 34:41Here's another example from 2005.
- 34:46Which if you kind of look in
- 34:49detail about what's there.
- 34:51There's there's a lot of optionality here,
- 34:54some of the references.
- 34:56Here's a little excerpt from the
- 34:58adenocarcinoma of
- 35:00unknown primary section,
- 35:00but the references are to non-small
- 35:03cell lung cancer so there's sort of a
- 35:05mismatch there in the evidence base.
- 35:07So what we did is we
- 35:09really basically tried to collect
- 35:11all this information and put
- 35:12it into a computable format,
- 35:13which is our HemOnc.org website and
- 35:16the ontology that comes from it.
- 35:19So HemOnc.org is a website with
- 35:22the goal to collect all standard of
- 35:25care systemic anticancer treatment.
- 35:27That's the goal.
- 35:29It's a big goal and the website has
- 35:32grown over more than a decade now
- 35:35to almost 1000 primary content pages,
- 35:37over 7000 references,
- 35:39and a large editorial board,
- 35:41actually members of which are from Yale.
- 35:44And and many page views,
- 35:46so 1.4 million page views last year,
- 35:49we do get visitors from all over
- 35:52the world; we're primarily US-based.
- 35:55I always like to throw in that we've
- 35:58had one visitor from North Korea.
- 36:00I don't know who it is,
- 36:01but I don't think I want to know.
- 36:03So what what can we do with this website?
- 36:06So what we did over time,
- 36:08over the past 11 years is create
- 36:10a structure such that we could
- 36:12actually take the content and
- 36:14develop a formal model.
- 36:15And so this is the model?
- 36:17Or this is part of the model?
- 36:20And I don't have time right
- 36:21now obviously to kind
- 36:22of go through all these details,
- 36:24but it's somewhat complex and large;
- 36:27we have over 100,000 concepts and
- 36:31300,000 ways in which those are
- 36:34interrelated in the latest version.
- 36:36This is yeah, this is basically,
- 36:39this is showing, you know,
- 36:41I don't have time to actually
- 36:43show the website.
- 36:43This is a screenshot from the
- 36:45website showing basically that
- 36:47each regimen on the website is structured
- 36:49in such a way that we can take
- 36:50all those pieces and put them
- 36:52into the data model.
- 36:54And then we can start to do
- 36:55cool things with real world data.
- 36:56So here's a project that we did
- 36:59with some folks in South
- 37:02Korea who basically had access to
- 37:07essentially a medication-level database.
- 37:08And remember I mentioned you know way
- 37:11back when that we might get medications,
- 37:13but to actually understand regimens
- 37:15we have to do something extra.
- 37:17And so what they did is they
- 37:19applied our model and they mapped
- 37:21medications to regimens
- 37:22and they were able to look basically
- 37:25over a decade of time, 2008 to 2018.
- 37:27And you can see here that you know the
- 37:29changing pattern of care in that country.
- 37:31So you see that, for example, you know,
- 37:35bevacizumab wasn't used really until
- 37:372014 and then it started getting popular.
- 37:40And by the year 2018,
- 37:42it's, you know,
- 37:43FOLFOX and bevacizumab Kappa,
- 37:44a good chunk of the treatment regimens,
- 37:48whereas something like fluorouracil
- 37:50monotherapy essentially disappears
- 37:52off the scene by the
- 37:55time you get there.
- 37:57This is much more recent so that
- 37:59that's from a couple of years ago
- 38:01Now we're working with folks at the
- 38:05University of California system, who have a
- 38:08really cool combined database across all
- 38:10the UCs, and California is kind of a,
- 38:12you know, country unto itself, once you
- 38:14start putting all this data together,
- 38:16this is just from UCSF and again
- 38:20we're taking, we're taking medication
- 38:23exposure data including time stamps
- 38:25and we're mapping that to regimens.
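A minimal sketch of what mapping timestamped medication exposures to regimens could look like. This is not the HemOnc model or the method used in the projects described here: the regimen table is a three-entry toy, and `group_into_windows`, `label_regimen`, `map_exposures`, and the 3-day window are hypothetical choices made for the example.

```python
from datetime import date

# Toy regimen definitions (an assumption for illustration, far smaller than a
# real vocabulary). Each regimen is the set of drugs given together.
REGIMENS = {
    "FOLFIRINOX": frozenset({"fluorouracil", "leucovorin", "irinotecan", "oxaliplatin"}),
    "FOLFOX": frozenset({"fluorouracil", "leucovorin", "oxaliplatin"}),
    "Gemcitabine monotherapy": frozenset({"gemcitabine"}),
}


def group_into_windows(exposures, window_days=3):
    """Group (date, drug) pairs into administration windows.

    A new window starts when an exposure falls more than `window_days`
    after the first date of the current window.
    """
    windows = []
    for day, drug in sorted(exposures):
        if windows and (day - windows[-1]["start"]).days <= window_days:
            windows[-1]["drugs"].add(drug)
        else:
            windows.append({"start": day, "drugs": {drug}})
    return windows


def label_regimen(drugs):
    """Return the regimen whose drug set matches exactly, else 'unmapped'."""
    for name, components in REGIMENS.items():
        if components == frozenset(drugs):
            return name
    return "unmapped"


def map_exposures(exposures, window_days=3):
    """Map raw exposures to a list of (window start date, regimen label)."""
    return [
        (w["start"], label_regimen(w["drugs"]))
        for w in group_into_windows(exposures, window_days)
    ]


four_drugs = ["fluorouracil", "leucovorin", "irinotecan", "oxaliplatin"]
exposures = (
    [(date(2022, 1, 3), d) for d in four_drugs]
    + [(date(2022, 1, 17), d) for d in four_drugs]
    # Third cycle without irinotecan, i.e. a de-escalated regimen.
    + [(date(2022, 1, 31), d) for d in four_drugs if d != "irinotecan"]
    # A drug given for a different indication does not match any regimen here.
    + [(date(2022, 2, 10), "leuprolide")]
)

for start, regimen in map_exposures(exposures):
    print(start.isoformat(), regimen)
```

Exact set matching is deliberately strict: anything that does not fit, such as the leuprolide exposure, comes back as "unmapped" and is left for a person with clinical knowledge to review instead of being forced into a regimen.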
- 38:29And you see that,
- 38:32at least nowadays, FOLFIRINOX is
- 38:36the most popular regimen there.
- 38:39And so that's that alone is
- 38:40an interesting thing, right?
- 38:43You also see some funny things, right?
- 38:45Like so I didn't know Leuprolide was a
- 38:47treatment for pancreatic cancer, did you?
- 38:50Is it? Not no, right. No.
- 38:54But these are real patients, right.
- 38:56And they actually have second malignancies.
- 38:57So these are people who have
- 38:59also have prostate cancer and
- 39:00they're also getting leuprolide.
- 39:02So you, you, you kind of have to you know,
- 39:04it's not enough to get that data out.
- 39:06You've got to,
- 39:07you got to apply knowledge, right.
- 39:09You've got to, you've got to determine,
- 39:10you know, am I, what am I looking at?
- 39:12Does it make sense, is it?
- 39:14Is it relevant and and?
- 39:17And so that that that's why we're
- 39:19seeing things like that so.
- 39:21Umm. Here's another kind of.
- 39:24Here's another.
- 39:24Gives you a taste of what we can look at.
- 39:26So this is looking at FOLFIRINOX
- 39:29and then looking at cycle by cycle.
- 39:32What's happening?
- 39:33So one of my long-term interests,
- 39:35as well as Doctor Zach here,
- 39:37is to understand treatment delays,
- 39:40dose reductions.
- 39:42Removals of medications from a regimen drop,
- 39:45you know, dropping a drug and this starts
- 39:47to get at that and you can kind of see,
- 39:50you know,
- 39:50each of those bars represents cycle to cycle,
- 39:52the cycle you see.
- 39:54People.
- 39:55People dropping out, right.
- 39:56And and so and then you can actually see
- 39:59why and you can see on the top here.
- 40:02These, these bands at the top are showing.
- 40:04You know, these these are folks.
- 40:05You don't think that you have
- 40:06a pointer or something.
- 40:07Oh, actually, let's see if this will.
- 40:10Yeah.
- 40:11So,
- 40:11so you see these bands coming across,
- 40:13those are basically patients that are
- 40:15progressing and going on to a
- 40:17second-line treatment. What's not shown here,
- 40:19just to spare you a little
- 40:20bit on the visual side,
- 40:22are patients who are stopping therapy
- 40:24and essentially transitioning to
- 40:26hospice or some sort of end of life
- 40:29care and that's this big bar here.
- 40:32And then some patients, these
- 40:34little ones, they're
- 40:36going to a de-escalated regimen.
- 40:38So they're dropping the
- 40:40irinotecan or the oxaliplatin and
- 40:43so you can really start to see these
- 40:45patterns of care in the real world data so.
- 40:49OK,
- 40:49so.
- 40:50This is my little advertisement for HemOnc.
- 40:54It's available to you.
- 40:56You can download the whole
- 40:59thing and mess around with
- 41:01it if you're an academic or
- 41:04noncommercial user, and just Google
- 41:06HemOnc Dataverse and you'll find it.
- 41:09Or you can use these links.
- 41:11It's also available through something
- 41:14called the OHDSI Athena vocabulary.
- 41:17And and yeah, I mean, we want more users.
- 41:20There's I think a lot more
- 41:22that can be done with it.
- 41:23So along comes a pandemic.
- 41:26So, so now I want to spend the last
- 41:29little bit here talking about the
- 41:32COVID-19 and Cancer Consortium,
- 41:34of which Yale's a member, and this
- 41:36is our mission statement,
- 41:38which has been the same since
- 41:40we were created in March 2020,
- 41:43which is our goal is to collect
- 41:44and disseminate prospective,
- 41:45granular, uniformly organized
- 41:46information on people with cancer
- 41:48who are diagnosed with COVID-19 at
- 41:50scale and as rapidly as possible.
- 41:52But what I want to talk about
- 41:53here for a minute is sort of what
- 41:55I call the ancillary goals of
- 41:56CCC19 or the unwritten goals.
- 41:59So one of those was, you know,
- 42:00can we build a consortium,
- 42:02can we build an airplane while also flying?
- 42:05Just, you know, can we do it?
- 42:06That was the question.
- 42:07Convening a group of stakeholders
- 42:09was really in, you know,
- 42:11a goal including patients,
- 42:12really engaging patients and then.
- 42:15Pertinent to the talk today,
- 42:17can we demonstrate the additive value of
- 42:19real world data elements that are not
- 42:21easily obtained from structured EMR data?
- 42:22We knew that there were other efforts
- 42:24kind of getting rolling that were based
- 42:26on what was in that structured data.
- 42:28If you remember that's the.
- 42:31The tiny little blue bars right
- 42:34on the graph I showed you.
- 42:37So we wanted to, you know,
- 42:38get more than that. So this is.
- 42:43This is back in back in Rhode Island.
- 42:46Alright.
- 42:46Showed you Eli Whitney earlier.
- 42:48This is the this is Slater Mill
- 42:50in Pawtucket which I think I
- 42:53pronounced correctly but I'm
- 42:55getting my New England chops.
- 42:57And and what's interesting to me about
- 43:00this story is that he earned this name,
- 43:02Samuel "the Traitor" Slater, and the
- 43:05reason he was branded as a traitor
- 43:08is that he was accused of stealing
- 43:11the ideas for industrialization
- 43:14from
- 43:16England, where he was born and grew up,
- 43:20and then replicating it in America.
- 43:22So this is really the beginning of
- 43:25the American Industrial Revolution.
- 43:27But what's interesting about that is
- 43:29that he didn't exactly steal the ideas.
- 43:32Like he didn't steal blueprints
- 43:33or things like that.
- 43:34He just like memorized them and
- 43:37brought the knowledge with him.
- 43:39So it's, you know, that's that's what he did.
- 43:42So. I think that that's great actually.
- 43:46And so you know when we think
- 43:48about CCC19, and I certainly don't
- 43:49have time to go through all this,
- 43:51but we have many inspirations.
- 43:55HemOnc,
- 43:56what I just spoke about, is one of them.
- 43:58But in all the domains of CCC19,
- 44:01we are borrowing best ideas,
- 44:04modifying sometimes and putting
- 44:07together this consortium and and
- 44:09this is just sort of a list of that.
- 44:12The other thing I wanted to
- 44:13say about, you know,
- 44:14the CCC19 work and just,
- 44:15you know, research in general
- 44:18is that sponsors are critical.
- 44:20In Samuel Slater's case,
- 44:22he had a sponsor named Moses
- 44:23Brown who basically fronted him
- 44:25the money to build those mills.
- 44:26And our sponsor is Julie Klem at the
- 44:29NCI who didn't front us any money but
- 44:32was very supportive and helped us kind of,
- 44:36you know, surface and socialize our ideas.
- 44:39So this is our.
- 44:40This is our data schema,
- 44:43and what I want to emphasize here related
- 44:45to this talk is that everything in red.
- 44:47Is not available in structured data,
- 44:50so as we sort of built this up.
- 44:53You know,
- 44:53some of these things you can collect,
- 44:55you know, in many different ways.
- 44:56But the red items.
- 44:57And you'll see in a few slides
- 44:59that those turn out to be critical
- 45:01things like ECOG performance status,
- 45:03things like toxicity of
- 45:06cancer treatment pneumonitis.
- 45:08Items like that,
- 45:09that we really wanted to zero in on.
- 45:12I'm going to skip this slide.
- 45:12I'm going to skip this and
- 45:15I'm going to just say that.
- 45:17We've done really pretty well on capturing
- 45:19what I would call elusive variables.
- 45:21So these are kind of the
- 45:23things that are in the EHR,
- 45:25but they're in that unstructured
- 45:27lake of data,
- 45:28but we we got a lot of them.
- 45:31So cancer status: is the patient
- 45:34getting better,
- 45:35getting worse or staying,
- 45:37you know, the same as before,
- 45:38stable disease?
- 45:39We have that in over 95% of the patients.
- 45:41Even smoking status is hard to get right.
- 45:44We have that.
- 45:46Did COVID affect the patient's
- 45:48treatment plans? That's not going to
- 45:50be in structured data necessarily.
- 45:52We have over 90% on that. On the ECOG,
- 45:55which is a notoriously
- 45:57difficult thing to get, and all
- 46:00the various efforts such as
- 46:01Flatiron and so forth have
- 46:03had challenges, and CancerLinQ
- 46:04have had challenges with this,
- 46:06we have ECOG data on 88%,
- 46:09although that includes
- 46:10patients who just didn't have
- 46:12any ECOG recorded, but that
- 46:13knowledge of no ECOG is still
- 46:15knowledge,
- 46:16right?
- 46:18And, you know, getting to
- 46:20our results, again,
- 46:22just focus on the red, and what
- 46:24we found is that these factors,
- 46:26these elusive factors are really important.
- 46:28And so this is unadjusted
- 46:31just kind of descriptive.
- 46:32If you had progressing cancer at
- 46:35baseline you get COVID your 30
- 46:37day mortality is 26% and if you
- 46:39had an ECOG of two or higher your
- 46:41your mortality is extremely high.
- 46:45And we also found that immunosuppression
- 46:48which is a somewhat nebulous
- 46:52definition and we have our
- 46:54definition here which is complex,
- 46:56which includes a lot of things you
- 46:58can't easily get out of structured data.
- 47:00So this is sort of the real
- 47:02world data is a is a huge,
- 47:04is a huge driver of of mortality.
- 47:08And if you look at the right on the right,
- 47:11the, the yellow table basically
- 47:12those are the patients who are
- 47:14immunosuppressed at baseline.
- 47:16And across the board,
- 47:17even younger patients have
- 47:19substantial mortality in our data set.
- 47:24Furthermore, if you add on
- 47:26top of that active cancer.
- 47:28So are they immunosuppressed
- 47:29and they have active cancer.
- 47:30Again, we have our definition for that.
- 47:34Because if you're not immunosuppressed
- 47:36and you have inactive cancer,
- 47:37in our data, at least you have a
- 47:39zero chance of dying in the 30 days,
- 47:42whereas if you're older immunosuppressed,
- 47:43your chance goes all the way up to 30%.
- 47:45So really a huge spread here
- 47:47based on these these data.
- 47:50And then if we start to look at.
- 47:53Multivariable adjusted analysis.
- 47:54Again, we see that these factors
- 47:58like ECOG or cancer status are are
- 48:01highly associated with outcome,
- 48:03both mortality as well as severity,
- 48:05which means hospitalization,
- 48:07intubation and so forth.
- 48:09We saw this as well more recently when
- 48:11we looked at vaccinated patients.
- 48:13So patients who are getting
- 48:15breakthrough COVID-19 after vaccine
- 48:16again we saw things like cancer
- 48:19status really you know being a,
- 48:20you know huge adjusted odds ratio
- 48:23there of six if you had an active and
- 48:26progressing cancer of of dying in 30 days.
- 48:31So I could talk about COVID
- 48:33CCC19 itself for an hour,
- 48:35but I'm going to pause and so I just
- 48:38want to share some parting thoughts.
- 48:40So, first of all, I think I'm,
- 48:43I'm a, you know, I'm, I'm,
- 48:44I'm a believer here that real
- 48:46world data has a great potential
- 48:48to yield real-world evidence if.
- 48:50We approach it with an understanding
- 48:51about the completeness issues,
- 48:52the accuracy issues,
- 48:54and we anticipate them and we come
- 48:56up with either ways to adjust for
- 48:58them or or avoid certain data,
- 49:02certain variables in the first place.
- 49:05We need a so.
- 49:07Yes, we need automated methods, right?
- 49:09Like, wouldn't it be great if NLP
- 49:11and a computer could do everything,
- 49:12but in reality a lot of real world
- 49:14data and real world evidence
- 49:16depends on human curators going
- 49:18into EHRs and pulling out that data.
- 49:20And to do that we need rigorous approaches.
- 49:23We have a paper published earlier
- 49:25this year describing the approach
- 49:26we used in AACR Project GENIE.
- 49:27I encourage you to check that out.
- 49:29It basically gets into.
- 49:31You know you need directives.
- 49:33You need you need two people
- 49:35to to independently curate the
- 49:37same record at a certain rate so
- 49:40you can see if you know there's
- 49:43comparability between their results.
- 49:45And so forth.
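One common way to check comparability between two curators who independently abstract the same records is raw percent agreement plus Cohen's kappa, which corrects for agreement expected by chance. The talk does not say which statistic the published approach uses, so treat the sketch below as an assumption; the curator labels are made-up example values.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of records on which the two curators gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each curator's marginal label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1:
        return 1.0  # both curators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical cancer-status labels from two curators on the same six records.
curator_a = ["progressing", "stable", "responding", "stable", "unknown", "stable"]
curator_b = ["progressing", "stable", "stable", "stable", "unknown", "responding"]

print(f"agreement: {percent_agreement(curator_a, curator_b):.2f}")
print(f"kappa:     {cohens_kappa(curator_a, curator_b):.2f}")
```

Double-curating only a sample of records, at whatever rate the project sets, is usually enough to track these numbers over time and spot a variable whose definition needs tightening.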
- 49:46If there's widespread adoption
- 49:48of standards such as mCODE,
- 49:50HemOnc, OMOP and so forth,
- 49:52that will increase the usefulness
- 49:54of structured data markedly.
- 49:57I think NLP is having a moment.
- 50:01If you pick up the newspaper nowadays,
- 50:03like you're going to see our other paper on,
- 50:05you're going to see an article on
- 50:07ChatGPT, for example,
- 50:08which is generative NLP but sort of
- 50:11the other side of NLP, and then Umm.
- 50:16You know, really important though,
- 50:18and I didn't get to touch on this at
- 50:20all except for the very beginning when
- 50:22I alluded to disparities and bias.
- 50:23There's a lot of concern that.
- 50:26That,
- 50:26you know,
- 50:27working with real-world data might
- 50:29might actually make biases worse that
- 50:32are already present in that data.
- 50:34So we need new approaches to to to.
- 50:38New approaches to deal with that issue.
- 50:41Just have some acknowledgements here.
- 50:42So there's two slides here.
- 50:43So this is my first acknowledgement slide.
- 50:45I acknowledge the HemOnc.org
- 50:47editorial board,
- 50:49others that have worked on
- 50:50it, our funding, and DALL-E,
- 50:52which is the creator of some
- 50:54of those graphics you saw.
- 50:56And here's our acknowledgement for CCC19,
- 50:58which is a huge endeavor that has more
- 51:02than 600 active investigators at this point.
- 51:06And with that,
- 51:07I will pause for questions.
- 51:16So I'll I'll start. And can you see
- 51:21the ones that are online or not?
- 51:23I do see, yes, yes, I see.
- 51:27So I I, I don't for a second dispute
- 51:30the value of real world data in terms
- 51:33of being able to answer questions,
- 51:35but I'm struck by the fact that we
- 51:39have these two extremes we have.
- 51:41Randomized controlled trials where
- 51:43we spend a fortune to collect every
- 51:47last bit of data and you know they
- 51:52cost $15,000 per patient or more.
- 51:56And we get lots of useless data as
- 51:59part of it. And then we then say,
- 52:02well, we can't get everything
- 52:04from randomized controlled trials.
- 52:06So then we go to real world data
- 52:09where everything's pretty messy
- 52:10and you have to make all these
- 52:12assumptions and clean up the data.
- 52:14And the the question is,
- 52:17is there a role for much simpler randomized
- 52:22trials done as part of standard?
- 52:26Practice.
- 52:29I mean sure, yeah,
- 52:30I mean I think the RECOVERY trial,
- 52:32I mean they showed that you
- 52:34can do these huge pragmatic trials
- 52:37in tens of thousands of patients with
- 52:40they didn't spend a lot of money.
- 52:41I mean they used off the shelf drugs,
- 52:43right, dexamethasone, you know, things.
- 52:46Some of the drugs we won't,
- 52:47we won't say the words but you
- 52:48know and things like oxygen right.
- 52:50But when you get into the you
- 52:52know the expensive drugs that
- 52:53are not yet FDA approved,
- 52:54I think that's a whole other area but.
- 52:57I I think that FDA has got to lead
- 53:00the way in some ways here because they
- 53:02and I didn't get to talk about this,
- 53:05but you know,
- 53:06there's a high profile rejection
- 53:08of real-world data within the
- 53:09last month or two that.
- 53:11You know,
- 53:11there was an attempt to get something
- 53:12approved based on some real-world data.
- 53:14And I think they rightly looked at that
- 53:15and they said that this particular
- 53:17set of data is not trustworthy
- 53:18and we're not going to go for it.
- 53:19But I don't think that that should
- 53:21shut down the whole endeavor.
- 53:23I think that they need,
- 53:24they need,
- 53:24we need guidance from them and
- 53:26and and you know about what
- 53:29components should and should not be,
- 53:31you know, collected routinely.
- 53:32I think that might simplify things a lot.
- 53:36Attempt to put together criteria that
- 53:38would allow you to say that this
- 53:41set of real world data is adequate
- 53:44to draw conclusions from, you know,
- 53:47in terms of how much it has to be cleaned up,
- 53:49how large the sample size has to be.
- 53:54I think. It it's such an interesting
- 53:58question and I'm I'm not aware of of
- 54:01anything at this moment but I do you
- 54:03know we are there's this great bias
- 54:05that I just learned about called the
- 54:08informed presence bias which I kind of
- 54:10knew I knew it but not by those words
- 54:12but that basically means that patients
- 54:14who spend a lot of time in the clinic
- 54:17or the medical system have a lot of data
- 54:19right whereas those that don't don't
- 54:21and and and it's and and it's actually
- 54:24an incredibly important source of.
- 54:26The bias? That. That.
- 54:30You know, can you?
- 54:31So if a patient doesn't spend enough
- 54:33time to get enough data generated,
- 54:35that's something we should know.
- 54:37That's something we need to know, right?
- 54:38But that's almost that kind of,
- 54:40you know, descriptor is almost
- 54:42never available in in any real world
- 54:44data study to my knowledge, so.
- 54:47The online version.
- 54:53Yeah.
- 55:07Schedule.
- 55:11What is?
- 55:16COVID-19.
- 55:28Yeah. Yeah. So the question is,
- 55:32it seems to be the case that the
- 55:34patients with pre-existing cancer
- 55:37are having worse outcomes during the COVID
- 55:40era than before and why might that be?
- 55:43I can say from our consortium now we
- 55:44only look at patients who had COVID.
- 55:46So that's a subset, right?
- 55:47Well, as time goes on,
- 55:49it's going to be everybody maybe.
- 55:50But what we do see is that you know at
- 55:53least in our registry 40% of patients
- 55:55have their treatment altered in
- 55:57some way and usually that's a delay.
- 56:00But sometimes they can't get the
- 56:02same treatment that they were getting
- 56:04before a surgery gets cancelled,
- 56:06you know etcetera, etcetera.
- 56:08And we know from, you know, previous work,
- 56:11obviously, that treatment delays don't
- 56:13usually ever work out very well.
- 56:15So we haven't yet systematically
- 56:17evaluated that,
- 56:18but we have you know now several
- 56:20thousands of those patients.
- 56:21So we're going to be looking at
- 56:23that probably in the upcoming year.
- 56:24As far as other patients, well,
- 56:27I mean and there were a lot of practice
- 56:30changes, right, especially in.
- 56:34Especially in China, I think,
- 56:35but also with sort of substituting
- 56:38oral medications whenever possible,
- 56:40even if they were sort of known
- 56:42to be inferior or not, you know,
- 56:44not quite as good so that patients
- 56:46didn't have to come into the.
- 56:47To the clinic.
- 56:48So that's been presented on in
- 56:50in in some settings,
- 56:52but you know I think what we think that
- 56:54those substitutions are are generally OK,
- 56:56I know that.
- 56:58You know a lot of people went on
- 57:01neoadjuvant hormone therapy and
- 57:02instead of going direct to surgery
- 57:04for early stage breast cancer and
- 57:06you know so that they could push
- 57:08this you know during periods of time
- 57:10when quote-unquote
- 57:12elective surgeries were shut down.
- 57:14So all those things probably add up right.
- 57:17But there's but there's absolutely
- 57:19a factor of psychology and patients
- 57:22being afraid to come into the
- 57:24clinic and you know potentially
- 57:26again skipping a a treatment or.
- 57:29So,
- 57:29so I think.
- 57:30To answer your question is that it's
- 57:33quite complex but I think we need to
- 57:35understand it better and of course
- 57:38new diagnoses coming in, which we're
- 57:40starting to get that information.
- 57:42There's clearly a stage migration and and
- 57:44you know to later stage more more advanced,
- 57:46more metastatic disease.
- 57:48Because of delays in screening and so forth.
- 57:51So.
- 57:51So I think we're going to face a
- 57:53we're going to face a challenging
- 57:54decade and I you know I think Ned
- 57:56Sharpless forecast that at the
- 57:57very beginning of the pandemic.
- 57:58I think in the first month or two he
- 58:00wrote a paper in Nature or Science,
- 58:01I think you know modeling out what
- 58:03what that might look like and and
- 58:05and you know that's probably going to
- 58:07that's probably going to come true but.
- 58:10Hopefully COVID ends really soon.
- 58:13So. Um, yeah.
- 58:20Seems like.
- 58:31Yeah so we're over time and
- 58:33I think you know I mean there's there's
- 58:35many strategies to try to mitigate but
- 58:37you can't you can't eliminate bias right.
- 58:39So you you can understand it you
- 58:41can try to mitigate it there's you
- 58:43know there's matching strategies to
- 58:44if you're doing sort of a you know
- 58:46case and control style approach where
- 58:48you try to make the controls as
- 58:50similar to the cases, you know, in
- 58:52everything but the exposure.
- 58:54So you know and and and some of those are,
- 58:56some of those have been around for decades,
- 58:58some of those are kind of
- 59:00emerging at this point.
- 59:01But I don't think we can forget that there's
- 59:04bias in prospective trials as well, right.
- 59:08So I mean I think either side
- 59:10of the of the coin.
- 59:13Yeah, it's just, it's just,
- 59:14it's just one more thing and it's
- 59:15not the only, I mean it's there's
- 59:17also there's ascertainment,
- 59:18but I mean there's a lot of biases,
- 59:20right and. You know, one thing we've
- 59:22worked on with our consortium is developing
- 59:25standardized language around limitations,
- 59:27which I think is critical because you know.
- 59:31I mean, the data is the data are
- 59:34the data use the plural, right? But.
- 59:37But the way it's presented really
- 59:40does influence the reader, right?
- 59:42So. So that's something we're
- 59:43thinking about and might have some.
- 59:45You know, thought pieces or something
- 59:47coming out about about how to handle that.