
Using and Improving the Real-World Data Ecosystem in Cancer

January 11, 2023
  • 00:00So it's a pleasure to introduce Jeremy
  • 00:03Warner, who actually I first met
  • 00:06this year when I was chairing
  • 00:08a panel at ESMO and he was
  • 00:11one of the speakers who we invited.
  • 00:15Jeremy is the director of the Brown
  • 00:18Lifespan Center for Cancer Bioinformatics
  • 00:21and Data Science, and associate professor,
  • 00:24technically pending I guess, at Brown.
  • 00:29His clinical focus is malignant
  • 00:31hematology and he received
  • 00:34his medical degree from Boston
  • 00:37University and also a Master's in
  • 00:41Photonics and Electrical and Computer
  • 00:43Engineering from UC San Diego.
  • 00:46In addition to his focus on
  • 00:48malignant hematology, Dr.
  • 00:50Warner is a leading expert in
  • 00:51the clinical and translational
  • 00:54cancer informatics research,
  • 00:56including high dimensional data
  • 00:58analysis and visualization,
  • 00:59natural language processing
  • 01:01of narrative oncology texts,
  • 01:03and the creation and
  • 01:04implementation of health data standards.
  • 01:08Before coming to Brown,
  • 01:12Jeremy was at Vanderbilt
  • 01:14University Medical Center,
  • 01:16where he was an associate professor of
  • 01:18medicine and biomedical informatics.
  • 01:22And I should also note that he
  • 01:24is the deputy editor of ASCO's
  • 01:27Clinical Cancer Informatics journal
  • 01:29and a founding director of the New
  • 01:31Brown University Center for Cancer
  • 01:35Bioinformatics and Data Science.
  • 01:37So without further
  • 01:38ado, you're going to speak to us
  • 01:40about using and improving the real-world
  • 01:42data ecosystem in cancer.
  • 01:45Thanks. We look forward to it.
  • 01:49Thank you. Thank you so much for having me.
  • 01:51And if anybody wants to
  • 01:52come up to Providence,
  • 01:53just one stop away on the Acela, so
  • 01:58it's really nice that we're so close here
  • 02:00in New England. So I just have a few
  • 02:03disclosures first before I get started.
  • 02:05So I have some grant funding,
  • 02:07some consulting.
  • 02:08I do have ownership in HemOnc.org LLC,
  • 02:11but it has no monetary value unless one
  • 02:13of you wants to be an Angel investor
  • 02:15and we can talk after the presentation.
  • 02:18So what I'm going to talk
  • 02:20about here is you know why,
  • 02:22why do we need real world data and real
  • 02:24world evidence in oncology and I'm going
  • 02:26to focus on electronic health records.
  • 02:28There are other sources of
  • 02:29real world data of course,
  • 02:31but you know most of this talk
  • 02:33will really focus on the EHR.
  • 02:35I will spend some time talking about
  • 02:37a particular interest of mine, which
  • 02:39is standardizing systemic anti-cancer
  • 02:41treatment representations, and then
  • 02:43I'll spend some time talking about
  • 02:46our COVID-19 and Cancer Consortium,
  • 02:49Which is a bit of a culmination,
  • 02:50if you will,
  • 02:51of some of these thoughts.
  • 02:52So there are some learning objectives here.
  • 02:55Hopefully this is a CME,
  • 02:57so we'll cover,
  • 02:59you know,
  • 03:00some aspects of natural language
  • 03:02processing and how it can be used
  • 03:04to get information out of EHR's,
  • 03:05why we need formal representations
  • 03:08for complex concepts.
  • 03:10Such as systemic anti cancer therapy
  • 03:12and then learning about how these
  • 03:14ideas propelled the CCC19 registry.
  • 03:20OK, so first of all, you know,
  • 03:22probably everybody might be already
  • 03:23familiar with these definitions,
  • 03:24but I think it's always helpful to go over,
  • 03:26you know, what is real world data,
  • 03:28what is real world evidence.
  • 03:30And you know, it's nebulous a little
  • 03:32bit and depending on where you,
  • 03:34you know, the, the resource you look at,
  • 03:36you'll get a different definition.
  • 03:37But this is my definition,
  • 03:39which is really based on something
  • 03:41called the DIKW pyramid.
  • 03:44Has anybody heard about this?
  • 03:46Heard of the DIKW? OK, great.
  • 03:47So I'll teach you a little bit here.
  • 03:49So the idea here is that it's a
  • 03:52pyramid where you're climbing
  • 03:54levels here from a base of data.
  • 03:57The next step is information.
  • 03:59The next step is knowledge.
  • 04:00The next step is wisdom.
  • 04:01You'll note there are five levels here.
  • 04:03There's a little tiny level at the top
  • 04:05which some people use for understanding.
  • 04:08But basically the idea is, no matter
  • 04:10where the data comes from,
  • 04:13what it is,
  • 04:14whether it's from a randomized controlled
  • 04:16trial or case-control registry,
  • 04:19etcetera,
  • 04:19the idea is that as you move up this pyramid,
  • 04:21you're generating real-world evidence,
  • 04:23whereas real world data is really that base.
  • 04:26On the right here you see the
  • 04:28sort of traditional pyramid
  • 04:30of evidence-based medicine.
  • 04:33So if you look at this from another
  • 04:35dimension kind of looking from above,
  • 04:36when we think about cancer in particular,
  • 04:39you know I think about sort of
  • 04:41three big aspects of cancer,
  • 04:42there's the genotype, the phenotype and
  • 04:44then the environment and sort of for
  • 04:46each of these you have these layers.
  • 04:48So if you think about the
  • 04:50data level for genotype,
  • 04:52that's just the sequence, right,
  • 04:53just the somatic tumor sequence.
  • 04:56For phenotype,
  • 04:57it might be just a histologic type,
  • 05:01a cell type, you know, what is it?
  • 05:03And for environment,
  • 05:04it might be pollutant levels.
  • 05:05Now this is data,
  • 05:06but it's not really telling you anything,
  • 05:08right.
  • 05:08So we need to kind of walk up this pyramid.
  • 05:10The next level, information, for these three buckets:
  • 05:13for genotype, for example,
  • 05:15you might talk about pathogenicity.
  • 05:17What does that change mean? Is it
  • 05:20a driver mutation?
  • 05:21That's sort of the next level, information.
  • 05:24On the phenotype side, it's cancer behavior:
  • 05:26so is it,
  • 05:28is it aggressive,
  • 05:29is it a high-grade malignancy,
  • 05:31or is it something indolent?
  • 05:33Kind of stepping up further:
  • 05:35for genotype,
  • 05:36the knowledge level is actionability.
  • 05:39What can you do with this information?
  • 05:41Can you actually prescribe a medication
  • 05:44that will change the outcome for a patient?
  • 05:46For phenotype, same thing:
  • 05:47just generally speaking, what are
  • 05:49the treatment options? And then for
  • 05:50environment, are there risk modifications
  • 05:52that can be taken? And then really
  • 05:53getting to that top level, wisdom,
  • 05:55this is
  • 05:57really complicated now.
  • 05:59So in in phenotype you're thinking
  • 06:01about what are patient values
  • 06:03and preferences and how do those
  • 06:05influence what treatment options
  • 06:06you might consider for genotype,
  • 06:08what's the tumor going to do once
  • 06:10it gets exposed to treatments,
  • 06:12how is it going to evolve under
  • 06:13treatment pressure and an environment
  • 06:15you've got issues about social justice.
  • 06:16And structural racism.
  • 06:17So those are all kind of like.
  • 06:19The ideas of climbing this pyramid alright,
  • 06:21hopefully I've convinced you
  • 06:22the difference between data and
  • 06:23evidence as we kind of step up.
  • 06:25Now, why do we need this real-world evidence?
  • 06:28Well, clinical trials are wonderful,
  • 06:30but they're also expensive,
  • 06:32slow to conduct, and they don't
  • 06:35always represent the full population
  • 06:37at risk. Also, trials,
  • 06:40prospective trials,
  • 06:41collect some but not all potentially
  • 06:44pertinent information.
  • 06:45And our space is huge:
  • 06:46the treatment space in oncology is huge.
  • 06:49And then lastly,
  • 06:50last but not least,
  • 06:51we've got this enormous data source,
  • 06:53which is the electronic medical record.
  • 06:56So just a few words about each
  • 06:57of these items.
  • 06:58So when you think about
  • 06:59trials and disparities,
  • 07:00this is a paper we just published
  • 07:02very recently, and this one was
  • 07:06earlier this year, and we just
  • 07:07published another one in JAMA
  • 07:08Oncology looking at prostate cancer.
  • 07:09This one looks at immune checkpoint
  • 07:11inhibitors across cancers.
  • 07:12And basically the take home message here
  • 07:14is that when you look across trials,
  • 07:16there is really a lot of disparity
  • 07:20in who enrolls in trials.
  • 07:22And it can be different by cancer type,
  • 07:25but it's pretty consistent across the board.
  • 07:27And it's not always underrepresentation.
  • 07:29Sometimes it's over representation,
  • 07:30as you can see from the bottom row.
  • 07:32But you know, essentially the yellow ones,
  • 07:35the yellow circles, are intersections of,
  • 07:40in this case, gender, age,
  • 07:44race and ethnicity, and a cancer type
  • 07:46where the enrollment is as you'd expect.
  • 07:49If it's green, it's sort of more than
  • 07:50you'd expect, and if it's red,
  • 07:52it's less than you'd expect.
  • 07:53So you know this,
  • 07:55this gets at generalizability, and there
  • 07:57might be statistical ways around this,
  • 07:59but essentially
  • 08:03our knowledge from clinical trials
  • 08:05is primarily coming from younger
  • 08:08white men. OK, so.
  • 08:12How about the information that gets left out?
  • 08:14So this was amazing.
  • 08:17This is the RECOVERY group that
  • 08:19really geared up during the early
  • 08:21days of the COVID pandemic;
  • 08:23these are, you know,
  • 08:24pragmatic trials that they ran in
  • 08:26the UK, and they found some really
  • 08:29important treatment options for COVID.
  • 08:31This is one of their papers.
  • 08:32This is probably the most impactful one,
  • 08:35showing that dexamethasone could
  • 08:38help hospitalized patients with COVID.
  • 08:40And I've excerpted a table from that paper.
  • 08:45Take a minute and look.
  • 08:46So what's missing from this table?
  • 08:48So this is a table of previous
  • 08:50coexisting diseases in the
  • 08:51patients who have COVID.
  • 08:52Is there something missing?
  • 08:55From this table.
  • 08:58Something that's the topic of this talk.
  • 09:02Cancer, right.
  • 09:03There's no cancer in this table.
  • 09:06They did not collect cancer,
  • 09:07or they didn't report it.
  • 09:09Well, we actually went and,
  • 09:11you know, got their case report forms;
  • 09:14they didn't record cancer.
  • 09:15So here they enrolled tens of thousands
  • 09:17of patients in these trials,
  • 09:18and they don't know if these
  • 09:20patients had cancer or not.
  • 09:21And so I mean, amazing work,
  • 09:23but we're missing a key piece of information.
  • 09:27And then sort of the last item, you know,
  • 09:29that I mentioned before is
  • 09:31this idea that our treatment space
  • 09:34is huge, but head-to-head comparisons
  • 09:36of important drugs are mostly absent.
  • 09:39And I'll just give you one example.
  • 09:41So this is the space of PD-1 inhibitors,
  • 09:44which have obviously changed
  • 09:46our field. From our HemOnc knowledge
  • 09:48base, which I'll talk about a bit later,
  • 09:51we have 137 trials that have been published
  • 09:55using 64 different regimens of various
  • 09:57PD-1 inhibitors.
  • 09:58This includes ex-US drugs, by the way.
  • 10:01If you're like, there are not 13 PD-1
  • 10:02inhibitors out there, well, actually there are,
  • 10:04but many of those are only approved in China.
  • 10:07So 83 of those are phase three trials.
  • 10:11The take-home point is that one of those 83
  • 10:14actually compared a PD-1 inhibitor
  • 10:16to a PD-L1 inhibitor, kind of; it
  • 10:19actually compared against cobimetinib and
  • 10:21atezolizumab. And a grand total
  • 10:23of zero of these trials compared one PD-1
  • 10:26inhibitor to another PD-1 inhibitor. So
  • 10:29You know,
  • 10:29maybe I'm missing some trials that are
  • 10:31ongoing now that have yet to be published.
  • 10:32But at this point in time,
  • 10:34we don't have any data at all on whether
  • 10:37one PD-1 inhibitor is better than another
  • 10:40except for indirect treatment comparisons,
  • 10:42so.
  • 10:43Hopefully I've convinced you that
  • 10:45we should at least think
  • 10:47about using real-world data.
  • 10:49But it is messy,
  • 10:51ambiguous, and unpredictable.
  • 10:53So let me talk about some
  • 10:55challenges that we have once we
  • 10:57start delving into the real world.
  • 10:59So first of all.
  • 11:01This is real-world data from the MEDLINE
  • 11:07institution information.
  • 11:08OK, so did you know that there were 21
  • 11:12clinical trial institutions in New Haven?
  • 11:15Did you know that?
  • 11:16That's amazing, right? Here they are.
  • 11:19Smilow Cancer Center, Smilow Cancer Hospital,
  • 11:23Smilow Cancer Hospital at Yale,
  • 11:26Smilow Cancer Hospital at Yale University,
  • 11:28Yale Cancer Center.
  • 11:30Yale Cancer Center and Smilow Cancer
  • 11:32Hospital, Yale Medical School.
  • 11:33Alright, I think you get the idea, right?
  • 11:36So I mean this is real world.
  • 11:38I mean you have to do something.
  • 11:40I mean, a computer is not going to know,
  • 11:41right? I mean,
  • 11:42so if you want to use this data in some way,
  • 11:46someone's got to do some work
  • 11:48to actually fix this, right?
  • 11:49That is a big part of working
  • 11:52with real world data.
  • 11:53Yale New Haven Hospital.
  • 11:55There's the 21st, OK?
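To make that cleanup step concrete, here is a minimal sketch of one way to harmonize institution name variants like the ones on this slide: normalize the raw string, then look it up in a hand-curated alias map. The alias table, normalization rules, and canonical label are illustrative assumptions, not the actual MEDLINE curation pipeline.

```python
# Minimal sketch: harmonize free-text institution names via normalization + alias lookup.
# The canonical label and alias list are illustrative assumptions.
import re

CANONICAL = "Yale Cancer Center / Smilow Cancer Hospital"
ALIASES = {
    "smilow cancer center": CANONICAL,
    "smilow cancer hospital": CANONICAL,
    "smilow cancer hospital at yale": CANONICAL,
    "smilow cancer hospital at yale university": CANONICAL,
    "yale cancer center": CANONICAL,
    "yale new haven hospital": CANONICAL,
}

def harmonize_institution(raw: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace, then look up an alias."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return ALIASES.get(cleaned, cleaned)  # fall back to the cleaned string if unmapped

print(harmonize_institution("Smilow Cancer Hospital at Yale"))
print(harmonize_institution("Yale-New Haven Hospital"))
```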
  • 11:58OK. So that's bibliometrics to some degree.
  • 12:02How about treatments:
  • 12:03how many tyrosine kinase
  • 12:05inhibitors are there?
  • 12:07And so this is a little project that
  • 12:10a student of mine undertook where they
  • 12:13mapped out how many letters you'd have
  • 12:16to switch around or basically misspell.
  • 12:19So that one tyrosine kinase inhibitor
  • 12:21would actually be another one.
  • 12:23And so it's fewer letters
  • 12:25than you think.
  • 12:27You know, these drugs get misspelled
  • 12:30all the time in pretty amazing ways.
  • 12:34I see that there's a little bit of a formatting
  • 12:36issue with the next slide, but.
  • 12:38So this this is real data.
  • 12:43From the Vanderbilt University
  • 12:45Medical Center.
  • 12:46So this is from our text list of medications.
  • 12:49Now you might say, oh,
  • 12:51let's just, you know,
  • 12:52we've got to be able to get these
  • 12:53medications from structured data.
  • 12:54That may or may not be true.
  • 12:56It depends. We can talk more about that.
  • 12:58But these are real misspellings
  • 13:00of the drug erlotinib.
  • 13:01I think you can tell, looking at this,
  • 13:03that all of these are erlotinib.
  • 13:05But again,
  • 13:06I mean, if you don't have some sort of
  • 13:08system to harmonize all those misspellings,
  • 13:12you're not going to know
  • 13:13which patient got what drug.
  • 13:15So that's, you know, a real-
  • 13:18world issue with real-world data.
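As an illustration of the kind of harmonizing system described here, a minimal sketch using fuzzy string matching to map misspelled drug mentions back to a reference list; the misspellings, the tiny reference list, and the similarity cutoff are assumptions for the example, and a real pipeline would use a full vocabulary (for example RxNorm) plus manual review.

```python
# Minimal sketch: map free-text drug mentions to a reference list with fuzzy matching.
from difflib import get_close_matches

REFERENCE_DRUGS = ["erlotinib", "gefitinib", "afatinib", "imatinib", "dasatinib"]

def harmonize_drug(mention):
    """Return the closest reference drug name, or None if nothing is close enough."""
    hits = get_close_matches(mention.lower(), REFERENCE_DRUGS, n=1, cutoff=0.75)
    return hits[0] if hits else None

for raw in ["erlotnib", "erlotanib", "erlotinab"]:
    print(raw, "->", harmonize_drug(raw))
```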
  • 13:20This is work that we did some
  • 13:22years ago on staging,
  • 13:23so cancer staging.
  • 13:26Here is what I call manageable ambiguity.
  • 13:30All right,
  • 13:32so, and again, maybe, you
  • 13:34know, during the discussion
  • 13:35we can talk about the value of
  • 13:37structured versus unstructured data,
  • 13:39but the idea here is that we would
  • 13:41take data from progress notes, from
  • 13:44clinical text, from all these notes,
  • 13:46and figure out if a patient had stage
  • 13:491, 2, 3, or 4. So just forget A, B, C.
  • 13:51We're just trying to go for the big stages.
  • 13:55And you know,
  • 13:56the problem that we knew ahead of
  • 13:58time is that these things are going
  • 14:00to be recorded variably in different
  • 14:01notes by different types of doctors.
  • 14:03But you know we did a pilot
  • 14:06with about 1000 patients with
  • 14:07lung cancer with over 460,000
  • 14:10clinical documents across them.
  • 14:11Now if you pause for a minute and you think
  • 14:14about a chart review,
  • 14:15think about how long it would take you to
  • 14:18go through 460,000 documents, right? So
  • 14:20here's my pitch for natural
  • 14:22language processing:
  • 14:23you can actually automate this kind of
  • 14:26thing and do this kind of work at scale.
  • 14:30So cutting to the chase a little bit here.
  • 14:35First of all, we found that
  • 14:36out of those 964 patients,
  • 14:3899% had some kind of stage phrase
  • 14:41in their note. At least one.
  • 14:45And we also had a gold standard
  • 14:47which was the tumor registry data.
  • 14:48So we were able to compare
  • 14:51our system to to the subset.
  • 14:53You'll notice only 790 out of
  • 14:56those 964 had tumor registry data,
  • 14:57but we were able to do a comparison and you
  • 15:00know our system worked really pretty well.
  • 15:03The green, you know,
  • 15:04basically the matches are in the green,
  • 15:05the big numbers and we got some things wrong,
  • 15:08but we didn't usually get things
  • 15:10really wrong most of the time.
  • 15:11So if it was stage one,
  • 15:12we called stage four,
  • 15:13that was a big mistake.
  • 15:15Only happened once.
  • 15:19This, this shows actually,
  • 15:22so again 460,000 documents.
  • 15:24So what we wanted to say is.
  • 15:26And you have to look at all
  • 15:28of those or can you just look
  • 15:30at notes that were written?
  • 15:32Right after a patient was diagnosed
  • 15:34you know with if you think of
  • 15:36some of this inspiration for this
  • 15:38project came from the copi measures.
  • 15:39And if any of you have done that work
  • 15:41you'll remember I believe and they
  • 15:43may have changed but at one point
  • 15:45the coping measure was was stage
  • 15:46recorded in one of the first two
  • 15:48progress notes written after diagnosis.
  • 15:50So it kind of makes sense that
  • 15:52you would look for stage early on
  • 15:54but if you look at this black line
  • 15:56here at the bottom. So does this.
  • 15:58Are you seeing my?
  • 15:59You don't see the arrow, are you?
  • 16:01I don't think you're seeing the error, OK.
  • 16:03If you look at the black line
  • 16:04towards the bottom, you'll see that.
  • 16:06If you look at the the notes in the
  • 16:08first five weeks from diagnosis,
  • 16:09actually there's a pretty high
  • 16:11rate of unknown stage.
  • 16:12Like we couldn't determine it.
  • 16:13It wasn't until we got to five
  • 16:15weeks and out that we had enough
  • 16:17mentions of stage that we could
  • 16:19sort of make that determination.
  • 16:21So we saw this kind of inflection point.
  • 16:23And so that's another thing just
  • 16:25to to note when you're working
  • 16:27with real-world data is that,
  • 16:28you know, time matters,
  • 16:30time can matter a lot.
  • 16:31And the other thing that really matters is,
  • 16:33is ambiguity.
  • 16:34So I mentioned we found stage
  • 16:36in 99% of the records.
  • 16:37What I didn't mention is that
  • 16:40most of those, 84%, had more
  • 16:42than one stage in their records,
  • 16:44OK, and some degree of discordance.
  • 16:48So one note might say they have stage one,
  • 16:51another note might say they have stage two.
  • 16:54Actually, when we constructed a network
  • 16:56graph, on the right here you see that
  • 16:59every possible combination was present,
  • 17:00every possible combination, including,
  • 17:02you know, terms that are
  • 17:05more generic, like early stage,
  • 17:07advanced stage.
  • 17:08Everything, you know, happens, and
  • 17:11on the bottom left
  • 17:14here you can see a histogram of co-
  • 17:17occurrences of various stage information.
  • 17:19So that's
  • 17:22really potentially ambiguous.
  • 17:23One take-home point from this,
  • 17:25though, is that we used a
  • 17:27really simple decision rule on,
  • 17:28you know, what is the actual stage:
  • 17:30we just chose the phrase that showed up
  • 17:33the most, OK, and that seems to work.
  • 17:35So if stage three shows up
  • 17:37in the notes 100 times,
  • 17:38and stage one shows up twice,
  • 17:40chances are it's stage three.
  • 17:42That's just sort of a practical rule, and it worked.
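A minimal sketch of that "most frequent phrase wins" rule, assuming plain-text notes and a deliberately simplified regex; this is an illustration, not the production NLP system described in the talk.

```python
# Minimal sketch: pull stage mentions with a regex and pick the most frequent one.
import re
from collections import Counter

STAGE_PATTERN = re.compile(r"\bstage\s+(IV|III|II|I|[1-4])\b", re.IGNORECASE)
ROMAN = {"i": "1", "ii": "2", "iii": "3", "iv": "4"}

def extract_stage(notes):
    """Return the most frequently mentioned overall stage across a patient's notes."""
    mentions = []
    for note in notes:
        for match in STAGE_PATTERN.findall(note):
            token = match.lower()
            mentions.append(ROMAN.get(token, token))
    if not mentions:
        return "unknown"
    return Counter(mentions).most_common(1)[0][0]

notes = [
    "Impression: stage III adenocarcinoma of the lung.",
    "Patient with stage 3 NSCLC, s/p cycle 2.",
    "Earlier note mentioned stage I disease in error.",
]
print(extract_stage(notes))  # -> "3"
```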
  • 17:45Now getting back to that, you know,
  • 17:48whole idea of
  • 17:49unknown or sort of lack of
  • 17:51information, and missingness, which is
  • 17:53a major issue with real-world data.
  • 17:55This is another mini project we
  • 17:58did looking at colon cancer, and
  • 18:00we wanted to ask, could you find
  • 18:03patients with stage 3 colon cancer?
  • 18:05And this was for the OCM project,
  • 18:07the Oncology Care Model.
  • 18:08So, you know, it's really important as a
  • 18:10metric to know if these patients
  • 18:11got appropriate treatment within an
  • 18:13appropriate period of time.
  • 18:14But again, what we saw here
  • 18:16is this sort of crossover at
  • 18:18about seven weeks, at which point,
  • 18:22you know, the stage was changing
  • 18:23or was missing in the records,
  • 18:26and it wasn't until about seven weeks
  • 18:27after diagnosis that you get to a
  • 18:29kind of steady state where you can
  • 18:31definitively say a patient has stage
  • 18:33three or we don't know the stage. So.
  • 18:37Here's some really interesting
  • 18:38work from here, actually from Yale,
  • 18:41from the Radiation Oncology department,
  • 18:43where they actually looked at
  • 18:46missingness as a variable, if you will.
  • 18:49So they took the National Cancer Database,
  • 18:52the NCDB data, and they split patients
  • 18:56into whether they had complete records or
  • 18:59had some missing data in their record.
  • 19:02Now the NCDB is not EHR data, right?
  • 19:05But it is based on EHR data,
  • 19:08so I would call it a real-
  • 19:09world data source, because it's,
  • 19:11you know, curated out of EHR data.
  • 19:13And you know, the punch line here is that
  • 19:17missingness is an independent prognostic
  • 19:19factor for survival, which is really an
  • 19:22interesting thing to think about, right?
  • 19:24And it kind of depends on what kind
  • 19:27of cancer you have as well.
  • 19:29So they found, for instance, on the left,
  • 19:31if you have non-small cell lung cancer,
  • 19:33it's the non-metastatic patients
  • 19:35who had a real difference in their
  • 19:38prognosis if they were missing data,
  • 19:40whereas with prostate cancer it was the
  • 19:42metastatic group that sort of split apart.
  • 19:44But either way, I mean,
  • 19:47yeah, just think about it for a
  • 19:48minute while I get my water bottle.
  • 19:49It's
  • 19:50interesting.
  • 19:58OK. It's certainly not something
  • 20:01that we conventionally use as a metric,
  • 20:04certainly not in a clinical trial,
  • 20:06because there it's not an issue, right?
  • 20:09Case report forms are complete,
  • 20:11but missingness itself can be
  • 20:13informative in real-world data.
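To show how missingness can be treated as a variable of its own, here is a minimal sketch, assuming the lifelines package and synthetic data; the column names and simulated effect sizes are illustrative, and this is not the NCDB analysis itself.

```python
# Minimal sketch: include a "has any missing data" indicator as a Cox model covariate.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "any_missing": rng.integers(0, 2, n),   # 1 if the record had missing fields
    "age": rng.normal(65, 10, n),
    "metastatic": rng.integers(0, 2, n),
})
# Simulated survival times, shorter on average for patients flagged with missing data.
hazard = 0.01 * np.exp(0.6 * df["any_missing"] + 0.02 * (df["age"] - 65))
df["months"] = rng.exponential(1 / hazard.to_numpy())
df["event"] = (df["months"] < 60).astype(int)
df.loc[df["months"] > 60, "months"] = 60        # administrative censoring at 5 years

cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="event")
cph.print_summary()  # hazard ratio for any_missing, adjusted for age and metastatic status
```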
  • 20:16So what I wanted to do now is actually take
  • 20:19us down a little different path briefly,
  • 20:21which is a brief diversion into
  • 20:22the history of medical records.
  • 20:24Anybody know what the what this is?
  • 20:26It's a local local.
  • 20:31Eli Whitney's mill. OK,
  • 20:32so it's kind of cool if you never been there.
  • 20:34It's still there.
  • 20:35Doesn't exactly look like this anymore,
  • 20:38but you'll see why I'm showing
  • 20:39this in a in a couple slides, so.
  • 20:43So this is also this is a real thing.
  • 20:47OK, so this is one of my favorite vehicles
  • 20:51from the Lane Motor Museum in Nashville.
  • 20:54which, as Doctor Weiner mentioned,
  • 20:55is where I was for about a decade.
  • 20:58And so this is a real vehicle;
  • 20:59they actually have a
  • 21:01collection of these, and it makes
  • 21:02me think of electronic medical
  • 21:04records because it works, right?
  • 21:06This person is
  • 21:08actually driving this car.
  • 21:10But we don't exactly see propeller-driven
  • 21:12cars on the roads these days, right?
  • 21:15So our EHRs: they work.
  • 21:17EMRs are functional,
  • 21:18but are they fit for the purpose
  • 21:19that we want to use them for?
  • 21:23I think many of us have, you know,
  • 21:25some ideas about that, but you know,
  • 21:27when you think about medical records,
  • 21:28this is obviously a little bit
  • 21:31before the computer, you know,
  • 21:33medical records have been around for.
  • 21:36Almost, you know,
  • 21:373500 years in one form or another.
  • 21:39But what's interesting to me?
  • 21:42Is that they were primarily
  • 21:43used for teaching or didactics.
  • 21:45Until very recently,
  • 21:46that was the only purpose of medical records.
  • 21:50And then sort of the second
  • 21:53purpose that arose, if you will,
  • 21:56didn't arise until the 1880s.
  • 21:57It's not that long ago if you think about it.
  • 21:59And that was for legal purposes,
  • 22:02legal defense.
  • 22:03And, you know,
  • 22:05essentially to have a written record
  • 22:07of what happens in case there was a
  • 22:10lawsuit around medical malpractice.
  • 22:12And we'll skip that, and sorry,
  • 22:14there are some Mac-to-
  • 22:17PC changes here with the font,
  • 22:19so it's a little bit hard
  • 22:20to read some of this.
  • 22:21But you know, how about billing?
  • 22:23Billing is the
  • 22:25major driver of how our
  • 22:27medical records look today.
  • 22:29But that only really happened in the
  • 22:311960s, which is really not long ago, and until,
  • 22:34you know, not so long ago physicians
  • 22:38were paid with food and lodging.
  • 22:41If they were lucky.
  • 22:44This is a picture from the Confucian
  • 22:47medical system, where there were at
  • 22:49least some cases where the court
  • 22:52physician was basically executed if
  • 22:53the emperor did not get better.
  • 22:56So that's a pretty harsh payment,
  • 22:59or penalty if you will.
  • 23:00But you know what really changed
  • 23:02things was the Medicare Act of 1965,
  • 23:05which basically established this profile of,
  • 23:08you know,
  • 23:09quote, usual, customary, and
  • 23:10reasonable fees, which
  • 23:11drive so much of what we do.
  • 23:13And sorry about the font
  • 23:14that's messed up here,
  • 23:15but there's a quote from the AMA,
  • 23:18the American Medical Association,
  • 23:20that said that the 1965 Medicare Act
  • 23:22was the most deadly challenge ever
  • 23:24faced by the medical profession.
  • 23:26That's an actual quote.
  • 23:29It certainly changed things a lot.
  • 23:30And then what I'd argue also changed
  • 23:33things, really more recently, was
  • 23:35in the 90s, when the physician fee
  • 23:36schedule was introduced and then
  • 23:38something called the evaluation and
  • 23:40management guidelines, which I think
  • 23:41a lot of us know more about than
  • 23:43we ever wanted to,
  • 23:44but those really changed how
  • 23:47medical records were written.
  • 23:51Notice that I haven't yet used
  • 23:52the word electronic, right?
  • 23:54So now what about patient care,
  • 23:57which I think all of us want to
  • 23:59be the primary purpose of medical records?
  • 24:02This kind of dates back to
  • 24:05the 1800s in some ways.
  • 24:06The case records of the
  • 24:08Massachusetts General Hospital.
  • 24:10Introduced some ideas like
  • 24:11history of presenting illness,
  • 24:13past medical history and so forth,
  • 24:15medical record numbers.
  • 24:16The whole idea that you would track a
  • 24:19patient by a number was introduced at
  • 24:20the Mayo Clinic in the early 1900s,
  • 24:22where they also introduced the chief
  • 24:25complaint and the review of systems.
  • 24:28And then the American College of Surgeons,
  • 24:32this is an amazing bit of history
  • 24:34if you didn't know: in 1918,
  • 24:38there was no federal mandate of any kind,
  • 24:40and they basically, as
  • 24:41a professional organization,
  • 24:42mandated that hospitals had to
  • 24:45keep records, including a discharge
  • 24:47summary that basically said whether the,
  • 24:50you know, patient was
  • 24:50alive or dead at the time they left.
  • 24:52And at that time fewer than 20% of
  • 24:56physicians kept any kind of record at all,
  • 24:58which is, like, amazing, right?
  • 25:03Now this is tying back to Eli Whitney.
  • 25:05So this is, you know,
  • 25:07for those that did keep records,
  • 25:09this is kind of what they looked
  • 25:11like; these are called case books.
  • 25:12I'm not sure where this one is from,
  • 25:14but it's basically handwritten.
  • 25:18And what's really interesting about
  • 25:20this is that it's physician-centered, right?
  • 25:23This is not,
  • 25:24this is a diary, basically; it's not, you know,
  • 25:26that one patient has one book.
  • 25:28This was written as
  • 25:30the doctor saw patients,
  • 25:31so if you ever wanted to go back and say, OK,
  • 25:34Mr. Smith or whoever,
  • 25:35like, put their case together,
  • 25:37good luck.
  • 25:40So really the most recent innovation,
  • 25:42if you will, in medical records was
  • 25:44this one, from Dr. Weed in
  • 25:46the mid-1960s, which is the problem-
  • 25:49oriented medical record, which
  • 25:51was conceived as a, quote, medical
  • 25:53record that guides and teaches.
  • 25:55So kind of back to that idea of
  • 25:57didactics, in a way, and I'm sure
  • 25:59everybody's familiar with this
  • 26:00idea, the SOAP note, right?
  • 26:04What I like from the paper where
  • 26:07Dr. Weed introduced this idea
  • 26:09is this quote, which I think
  • 26:12actually forecasts the EHR, right? So
  • 26:15it's worth reading:
  • 26:16"It can readily
  • 26:17be seen that all narrative
  • 26:19data presently in the medical
  • 26:21record can be structured,
  • 26:22and in the future all narrative data
  • 26:24may be entered through a series of
  • 26:26displays guaranteeing a thoroughness,
  • 26:27retrievability,
  • 26:28efficiency and economy important
  • 26:30to the scientific analysis of a
  • 26:32type of datum that has hitherto
  • 26:34been handled in a very unrigorous manner."
  • 26:36It's an amazing quote.
  • 26:37I mean,
  • 26:38this is essentially before any
  • 26:41electronic medical record, right?
  • 26:42But he basically saw it, saw it coming.
  • 26:47I think the most important part of
  • 26:49this quote is this: "to be concluded."
  • 26:52We're living through the evolution
  • 26:54of these electronic medical records.
  • 26:57This is actually a two-part paper;
  • 26:59that's why it says this.
  • 27:00But I think, you know,
  • 27:00he could have been saying, OK,
  • 27:02we don't know what's going to happen.
  • 27:05It's worth taking a step back
  • 27:07and asking, you know,
  • 27:09so now I'm going to say electronic: what
  • 27:11is the electronic health record for?
  • 27:14And it's got primary uses and secondary uses.
  • 27:16So the primary uses are patient
  • 27:19care and delivery, and financial billing.
  • 27:20But when you talk about
  • 27:23real-world data and real-world evidence,
  • 27:24that's a secondary use,
  • 27:26as it's conceived here in this model,
  • 27:28which the Institute of Medicine put forward.
  • 27:31All right.
  • 27:31So moving ahead a little bit.
  • 27:33So this is where we were in the mid-2000s,
  • 27:36and this is when I was in medical school.
  • 27:38At that time there were issues
  • 27:41around funding to, you know, roll out
  • 27:44electronic medical records.
  • 27:46What I like on the bottom here is that in
  • 27:482003 the Massachusetts Medical Society did a
  • 27:50survey where 89% of physicians wanted
  • 27:52EHR data, but 48% refused to use an EHR.
  • 27:56So a little bit of a disconnect there,
  • 27:59and by 2004
  • 28:01hardly anybody was using electronic medical records.
  • 28:04So what changed?
  • 28:05Arguably this this is, you know,
  • 28:08one of the events that really changed things.
  • 28:11Is everybody familiar with Katrina
  • 28:14and what happened in in New Orleans?
  • 28:17Does everybody know why the
  • 28:20records, so there's a picture there
  • 28:21on the right,
  • 28:22does everybody know why those
  • 28:24were in the basement
  • 28:26that flooded?
  • 28:27It's that they are so heavy that the
  • 28:29building literally would have collapsed
  • 28:32under the weight of the paper if
  • 28:35they'd been up on higher floors.
  • 28:37So that's why they had their
  • 28:39medical records in the basement,
  • 28:41and they were all destroyed, right?
  • 28:42They were all just lost.
  • 28:43So, fast-
  • 28:45forward a little bit to the HITECH Act in 2009,
  • 28:48which Obama signed; this is what really,
  • 28:51you know,
  • 28:52gave a lot of money for institutions
  • 28:54to really start putting in EMRs.
  • 28:56But what is interesting is, if
  • 28:58you look at sort of the adoption
  • 29:01curve, and there's a couple,
  • 29:03I won't get into the details here,
  • 29:04there's a couple of ways of defining what an
  • 29:06EHR is, basic versus complete and so forth,
  • 29:09but you actually see,
  • 29:12you actually see them starting.
  • 29:13So here's E&M coming out in the mid-90s.
  • 29:16Here's Katrina in 2005.
  • 29:19There's the HITECH Act.
  • 29:20By the time the HITECH Act comes out,
  • 29:22actually we're, like, well along the
  • 29:25adoption curve, and so, you know, it
  • 29:27definitely helped things along, but, you know,
  • 29:30the process had already started.
  • 29:33And then, you know,
  • 29:35so this is already five years old,
  • 29:37but I think, you know,
  • 29:40and sorry,
  • 29:41sorry again,
  • 29:41you can't see the text there,
  • 29:42but you know, already by
  • 29:44five years ago people were
  • 29:46reporting that EHRs were
  • 29:47a major driver of burnout.
  • 29:49So, you know, it's problematic.
  • 29:51But OK, here are a
  • 29:53few other challenges. So.
  • 29:55And I'm sure everybody who's
  • 29:56clinical knows these things already.
  • 29:58But carry-forward and copy-pasting are
  • 30:02ubiquitous in medical records, and
  • 30:04there's just a ton of redundancy.
  • 30:06Here's a paper that basically shows that,
  • 30:09you know, large
  • 30:11portions of any note you
  • 30:13look at have been
  • 30:16copied forward from other notes.
  • 30:18Progress notes
  • 30:19in particular:
  • 30:20more than half of progress
  • 30:22note material is copied
  • 30:24forward from previous notes.
  • 30:27This is a different study looking at, you
  • 30:31know, how many progress notes have manually
  • 30:33entered text versus copied text of any kind.
  • 30:34And you can see, again, very few progress
  • 30:38notes have fully written text,
  • 30:40which you would say is fully original.
  • 30:44So I think it's a legitimate question
  • 30:45to ask, what are we dealing with here?
  • 30:47Is it a giant pile of paper,
  • 30:48or is there actually meaning?
  • 30:49So this is a little tiny project
  • 30:52I did during fellowship, where
  • 30:54I basically took one of my patients'
  • 30:57charts and I counted up how
  • 30:59many data points are in that chart.
  • 31:02And you can see the blue bars are all the
  • 31:05structured data elements, like billing
  • 31:07codes or vital signs or lab values,
  • 31:11and then these red bars are the words
  • 31:13in the clinical documents, and you
  • 31:15see that just drowns out, right,
  • 31:17the structured data.
  • 31:18So there's a lot of data there.
  • 31:21And actually,
  • 31:22there's even more than that, right?
  • 31:24So in this chart,
  • 31:26and this is small these days, right,
  • 31:28this was more than 10 years ago,
  • 31:31there was another 277 pages of scanned
  • 31:34documents with 69,000 words in them
  • 31:37that were basically inaccessible.
  • 31:38But the take-home point here
  • 31:41is that this is what it all boils down to,
  • 31:44OK?
  • 31:46Patient with diffuse large B-cell lymphoma,
  • 31:48in a complete remission after
  • 31:50getting 6 cycles of R-CHOP.
  • 31:52I think that's enough for most research.
  • 31:54OK, now how can we,
  • 31:57how can we boil things down like
  • 31:59that? Because that's kind of
  • 32:00maybe what we're talking about here.
  • 32:02So, and of course there's more to it, right?
  • 32:06But you know, when you think about what's
  • 32:09in EHRs and what is not,
  • 32:13you have to know what
  • 32:15you're going to find,
  • 32:16right?
  • 32:17So let's say, you know, you've
  • 32:19unlocked this medical record,
  • 32:21but it's not necessarily going
  • 32:22to have what you want.
  • 32:23So here are some, you know,
  • 32:25basically some big buckets, right?
  • 32:28So you're going to find the person's
  • 32:30date of birth, no problem, right?
  • 32:31But you're probably not going to find
  • 32:33where they were born,
  • 32:34the circumstances of their birth,
  • 32:35whether there were complications.
  • 32:37Very unlikely, because,
  • 32:39you know, they won't have lived
  • 32:41their whole
  • 32:42life within the electronic era,
  • 32:43and they won't have all that data.
  • 32:45You might find their biologic sex,
  • 32:47no problem,
  • 32:47but are you going to find
  • 32:48their gender identity,
  • 32:49their sexual orientation?
  • 32:50You'll find race and ethnicity,
  • 32:52but are you going to find other
  • 32:55social determinants of health?
  • 32:56You'll find the medications
  • 32:58that they are prescribed,
  • 32:59but will you find what they actually took,
  • 33:01the medications that they took and the
  • 33:04regimens? We're going to get into
  • 33:05that next. You'll find laboratory tests,
  • 33:08but you won't necessarily find images,
  • 33:10and so forth.
  • 33:12So, kind of, you know, moving
  • 33:15forward, thinking about, you know,
  • 33:17the low-hanging fruit:
  • 33:20you know, it's the cancer type,
  • 33:22it's easy; we don't need to create
  • 33:23a new system to get cancer type.
  • 33:25You can get that from billing codes,
  • 33:27registry data.
  • 33:28The treatments are hard, like R-CHOP
  • 33:32times 6, that's hard. Determining that
  • 33:34the patient is in a complete remission,
  • 33:37that's really hard.
  • 33:37So what do I go for?
  • 33:39I go for the middle ground, right;
  • 33:40I'm going to tackle the thing in the middle.
  • 33:43So now I'm going to switch gears here
  • 33:45for a bit and talk about our work on
  • 33:49standardizing systemic anti-cancer treatment.
  • 33:51And before I get into that:
  • 33:55if you've not seen this XKCD cartoon,
  • 33:58it's a classic.
  • 33:59And this is a challenge, right?
  • 34:01Whenever you decide to create a new
  • 34:04standard, are you actually just, you know,
  • 34:07just creating more complexity or not?
  • 34:09Hopefully we're not.
  • 34:10Well, where we did this work, in this space,
  • 34:12there really weren't 14 existing standards.
  • 34:14There were none.
  • 34:15And so, as everybody here knows,
  • 34:17I could skip past this slide:
  • 34:19chemotherapy regimens are complicated and
  • 34:22given in cyclic fashion, in combinations.
  • 34:26This was the standard when
  • 34:27we got started on our work.
  • 34:30This is, you know,
  • 34:31one example of these things called
  • 34:32cancer chemotherapy handbooks,
  • 34:34kind of recipe books, physical books,
  • 34:37right,
  • 34:37with some details here,
  • 34:39but maybe not enough.
  • 34:41Here's another example from 2005,
  • 34:46which, if you kind of look in
  • 34:49detail at what's there,
  • 34:51there's a lot of optionality here,
  • 34:54and some of the references.
  • 34:56Here's a little excerpt from the
  • 34:58adenocarcinoma of
  • 35:00unknown primary section,
  • 35:00but the references are to non-small
  • 35:03cell lung cancer, so there's sort of a
  • 35:05mismatch there in the evidence base.
  • 35:07So what we did is we
  • 35:09really basically tried to collect
  • 35:11all this information and put
  • 35:12it into a computable format,
  • 35:13which is our HemOnc.org website and
  • 35:16the ontology that comes from it.
  • 35:19So HemOnc.org is a website with
  • 35:22the goal to collect all standard-of-
  • 35:25care systemic anti-cancer treatment.
  • 35:27That's the goal.
  • 35:29It's a big goal, and the website has
  • 35:32grown over more than a decade now
  • 35:35to almost 1,000 primary content pages,
  • 35:37over 7000 references,
  • 35:39and a large editorial board,
  • 35:41members of which are actually from Yale.
  • 35:44And many page views,
  • 35:46so 1.4 million page views last year.
  • 35:49We do get visitors from all over
  • 35:52the world; we're primarily US-based.
  • 35:55I always like to throw in that we've
  • 35:58had one visitor from North Korea.
  • 36:00I don't know who it is,
  • 36:01but I don't think I want to know.
  • 36:03So what can we do with this website?
  • 36:06So what we did over time,
  • 36:08over the past 11 years, is create
  • 36:10a structure such that we could
  • 36:12actually take the content and
  • 36:14develop a formal model.
  • 36:15And so this is the model,
  • 36:17or this is part of the model.
  • 36:20And I don't have time right
  • 36:21now, obviously, to kind
  • 36:22of go through all these details,
  • 36:24but it's somewhat complex and large;
  • 36:27we have over 100,000 concepts and
  • 36:31300,000 ways in which those are
  • 36:34interrelated in the latest version.
  • 36:36This is, yeah, this is basically,
  • 36:39this is showing, you know,
  • 36:41I don't have time to actually
  • 36:43show the website,
  • 36:43but this is a screenshot from the
  • 36:45website showing basically that
  • 36:47each regimen on the website is structured
  • 36:49in such a way that we can take
  • 36:50all those pieces and put them
  • 36:52into the data model.
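A minimal sketch of what a computable regimen representation might look like, using R-CHOP (mentioned earlier in the talk) as the example; the field names and layout are illustrative and greatly simplified relative to the actual HemOnc ontology schema.

```python
# Minimal sketch: a regimen as a structured object of drug components and cycle info.
from dataclasses import dataclass

@dataclass
class Component:
    drug: str
    dose: float
    unit: str      # e.g., "mg/m2"
    days: tuple    # days of the cycle on which the drug is given

@dataclass
class Regimen:
    name: str
    cycle_length_days: int
    cycles: int
    components: list

# Illustrative encoding of standard R-CHOP-21.
r_chop = Regimen(
    name="R-CHOP",
    cycle_length_days=21,
    cycles=6,
    components=[
        Component("rituximab", 375, "mg/m2", (1,)),
        Component("cyclophosphamide", 750, "mg/m2", (1,)),
        Component("doxorubicin", 50, "mg/m2", (1,)),
        Component("vincristine", 1.4, "mg/m2", (1,)),
        Component("prednisone", 100, "mg", (1, 2, 3, 4, 5)),
    ],
)
print([c.drug for c in r_chop.components])
```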
  • 36:54And then we can start to do
  • 36:55cool things with real-world data.
  • 36:56So here's a project that we did
  • 36:59with some folks in South
  • 37:02Korea who basically had access to
  • 37:07an essentially medication-level database.
  • 37:08And remember, I mentioned, you know, way
  • 37:11back when, that we might get medications,
  • 37:13but to actually understand regimens
  • 37:15we have to do something extra.
  • 37:17And so what they did is they
  • 37:19applied our model and they mapped
  • 37:21medications to regimens,
  • 37:22and they were able to look basically
  • 37:25over a decade of time, 2008 to 2018.
  • 37:27And you can see here, you know, the
  • 37:29changing pattern of care in that country.
  • 37:31So you see, for example, you know,
  • 37:35bevacizumab wasn't really used until
  • 37:372014 and then it started getting popular.
  • 37:40And by the year 2018,
  • 37:42it's, you know,
  • 37:43FOLFOX and bevacizumab covering
  • 37:44a good chunk of the treatment regimens,
  • 37:48whereas something like fluorouracil
  • 37:50monotherapy essentially disappears
  • 37:52off the scene by
  • 37:55the time you get there.
  • 37:57This is much more recent;
  • 37:59that was from a couple of years ago.
  • 38:01Now we're working with folks at the
  • 38:05University of California system, which has a
  • 38:08really cool combined database across all
  • 38:10the UCs, and California is kind of a,
  • 38:12you know, country unto itself once you
  • 38:14start putting all this data together.
  • 38:16This is just from UCSF, and again
  • 38:20we're taking medication
  • 38:23exposure data, including timestamps,
  • 38:25and we're mapping that to regimens.
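A minimal sketch of that mapping idea, assuming a small regimen table and a fixed time window for grouping a patient's drug exposures; the window length, drug names, and tie-breaking rule are illustrative assumptions, not the published algorithm.

```python
# Minimal sketch: group timestamped drug exposures into a window and match the set
# of drugs against known regimen definitions, preferring the largest complete match.
from datetime import date

REGIMENS = {
    "FOLFIRINOX": {"fluorouracil", "leucovorin", "irinotecan", "oxaliplatin"},
    "FOLFOX": {"fluorouracil", "leucovorin", "oxaliplatin"},
    "Gemcitabine monotherapy": {"gemcitabine"},
}

def match_regimen(exposures, window_days=7):
    """exposures: list of (date, drug) pairs for one patient."""
    start = min(d for d, _ in exposures)
    drugs = {drug for d, drug in exposures if (d - start).days <= window_days}
    candidates = [(len(combo), name) for name, combo in REGIMENS.items() if combo <= drugs]
    return max(candidates)[1] if candidates else "unmapped"

exposures = [
    (date(2021, 3, 1), "oxaliplatin"),
    (date(2021, 3, 1), "irinotecan"),
    (date(2021, 3, 1), "leucovorin"),
    (date(2021, 3, 2), "fluorouracil"),
]
print(match_regimen(exposures))  # -> "FOLFIRINOX"
```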
  • 38:29And you see that,
  • 38:32at least nowadays, FOLFIRINOX is
  • 38:36the most popular regimen there.
  • 38:39And that alone is
  • 38:40an interesting thing, right?
  • 38:43You also see some funny things, right?
  • 38:45Like, I didn't know leuprolide was a
  • 38:47treatment for pancreatic cancer, did you?
  • 38:50Is it? No, right. No.
  • 38:54But these are real patients, right?
  • 38:56And they actually have second malignancies.
  • 38:57So these are people who
  • 38:59also have prostate cancer, and
  • 39:00they're also getting leuprolide.
  • 39:02So you kind of have to, you know,
  • 39:04it's not enough to get that data out.
  • 39:06You've got to
  • 39:07apply knowledge, right?
  • 39:09You've got to determine,
  • 39:10you know, what am I looking at?
  • 39:12Does it make sense?
  • 39:14Is it relevant?
  • 39:17And so that's why we're
  • 39:19seeing things like that. So.
  • 39:21Here's another one that
  • 39:24gives you a taste of what we can look at.
  • 39:26So this is looking at FOLFIRINOX
  • 39:29and then looking, cycle by cycle, at
  • 39:32what's happening.
  • 39:33So one of my long-term interests,
  • 39:35as well as Doctor Zach's here,
  • 39:37is to understand treatment delays,
  • 39:40dose reductions,
  • 39:42removals of medications from a regimen,
  • 39:45you know, dropping a drug, and this starts
  • 39:47to get at that, and you can kind of see,
  • 39:50you know,
  • 39:50each of those bars represents cycle to cycle,
  • 39:52and cycle to cycle you see
  • 39:55people dropping out, right?
  • 39:56And so then you can actually see
  • 39:59why, and you can see on the top here,
  • 40:02these bands at the top are showing,
  • 40:04you know, these are folks,
  • 40:05don't I have
  • 40:06a pointer or something?
  • 40:07Oh, actually, let's see if this will work.
  • 40:10Yeah.
  • 40:11So,
  • 40:11so you see these bands coming across;
  • 40:13those are basically patients that are
  • 40:15progressing and going on to a second-
  • 40:17line treatment. What's not shown here,
  • 40:19just to spare you a little
  • 40:20bit on the visual side,
  • 40:22are patients who are stopping therapy
  • 40:24and essentially transitioning to
  • 40:26hospice or some sort of end-of-life
  • 40:29care, and that's this big bar here.
  • 40:32And then some patients, these
  • 40:34little ones, they're
  • 40:36going to a de-escalated regimen.
  • 40:38So they're dropping the
  • 40:40irinotecan or the oxaliplatin, and
  • 40:43so you can really start to see these
  • 40:45patterns of care in the real-world data.
  • 40:49OK,
  • 40:49so.
  • 40:50This is my little advertisement for HemOnc.
  • 40:54It's available to you.
  • 40:56You can download the whole
  • 40:59thing and mess around with
  • 41:01it if you're an academic or non-
  • 41:04commercial user; just Google
  • 41:06HemOnc Dataverse and you'll find it,
  • 41:09or you can use these links.
  • 41:11It's also available through something
  • 41:14called the OHDSI Athena vocabulary.
  • 41:17And yeah, I mean, we want more users.
  • 41:20There's, I think, a lot more
  • 41:22that can be done with it.
  • 41:23So along comes a pandemic.
  • 41:26So, so now I want to spend the last
  • 41:29little bit here talking about the
  • 41:32COVID-19 and Cancer Consortium,
  • 41:34of which Yale is a member. And this
  • 41:36is our mission statement,
  • 41:38which has been the same since
  • 41:40we were created in March 2020:
  • 41:43our goal is to collect
  • 41:44and disseminate prospective,
  • 41:45granular, uniformly organized
  • 41:46information on people with cancer
  • 41:48who are diagnosed with COVID-19 at
  • 41:50scale and as rapidly as possible.
  • 41:52But what I want to talk about
  • 41:53here for a minute is sort of what
  • 41:55I call the ancillary goals of
  • 41:56CCC19, or the unwritten goals.
  • 41:59So one of those was, you know,
  • 42:00can we build a consortium,
  • 42:02can we build an airplane while also flying it?
  • 42:05Just, you know, can we do it?
  • 42:06That was the question.
  • 42:07Convening a group of stakeholders
  • 42:09was really, you know,
  • 42:11a goal, including patients,
  • 42:12really engaging patients. And then,
  • 42:15Pertinent to the talk today,
  • 42:17can we demonstrate the additive value of
  • 42:19real world data elements that are not
  • 42:21easily obtained from structured EMR data?
  • 42:22We knew that there were other efforts
  • 42:24kind of getting rolling that were based
  • 42:26on what was in that structured data.
  • 42:28If you remember, that's the
  • 42:31tiny little blue bars right
  • 42:34on the graph I showed you.
  • 42:37So we wanted to, you know,
  • 42:38get more than that. So,
  • 42:43this is back in Rhode Island.
  • 42:46Alright.
  • 42:46I showed you Eli Whitney earlier.
  • 42:48This is Slater Mill
  • 42:50in Pawtucket, which I think I
  • 42:53pronounced correctly, but I'm still
  • 42:55getting my New England chops.
  • 42:57And what's interesting to me about
  • 43:00this story is that he earned this name,
  • 43:02Samuel "Slater the Traitor," and the
  • 43:05reason he was branded as a traitor
  • 43:08is that he was accused of stealing
  • 43:11the ideas for industrialization
  • 43:16from England, where he was born and grew up,
  • 43:22and then replicating them in America.
  • 43:22So this is really the beginning of
  • 43:25the American Industrial Revolution.
  • 43:27But what's interesting about that is
  • 43:29that he didn't exactly steal the ideas.
  • 43:32Like he didn't steal blueprints
  • 43:33or things like that.
  • 43:34He just like memorized them and
  • 43:37brought the knowledge with him.
  • 43:39So it's, you know, that's that's what he did.
  • 43:42So. I think that that's great actually.
  • 43:46And so, you know, when we think
  • 43:48about CCC19, and I certainly don't
  • 43:49have time to go through all this,
  • 43:51but we have many inspirations;
  • 43:55HemOnc,
  • 43:56which I just spoke about, is one of them.
  • 43:58But in all the domains of CCC19,
  • 44:01we are borrowing best ideas,
  • 44:04modifying them sometimes, and putting
  • 44:07together this consortium, and
  • 44:09this is just sort of a list of that.
  • 44:12The other thing I wanted to
  • 44:13say about, you know,
  • 44:14the CCC19 work, and just,
  • 44:15you know, research in general,
  • 44:18is that sponsors are critical.
  • 44:20In a Samuel Slater's case,
  • 44:22he had a sponsor named Moses
  • 44:23Brown who basically fronted him
  • 44:25the money to build those mills.
  • 44:26And our sponsor is Julie Klem at the
  • 44:29NCI who didn't front us any money but
  • 44:32was very supportive and helped us kind of,
  • 44:36you know, surface and socialize our ideas.
  • 44:39So this is our,
  • 44:40this is our data schema,
  • 44:43and what I want to emphasize here, related
  • 44:45to this talk, is that everything in red
  • 44:47is not available in structured data.
  • 44:50So as we sort of built this up,
  • 44:53you know,
  • 44:53some of these things you can collect,
  • 44:55you know, in many different ways.
  • 44:56But the red items,
  • 44:57and you'll see in a few slides
  • 44:59that those turn out to be critical,
  • 45:01are things like ECOG performance status,
  • 45:03things like toxicity of
  • 45:06cancer treatment, pneumonitis,
  • 45:08items like that,
  • 45:09that we really wanted to zero in on.
  • 45:12I'm going to skip this slide,
  • 45:15and I'm going to just say that
  • 45:17we've done really pretty well on capturing
  • 45:19what I would call elusive variables.
  • 45:21So these are kind of the
  • 45:23things that are in the EHR,
  • 45:25but they're in that unstructured
  • 45:27lake of data;
  • 45:28but we got a lot of them.
  • 45:31So cancer status: is the patient
  • 45:34getting better,
  • 45:35getting worse, or staying,
  • 45:37you know, the same as before,
  • 45:38stable disease?
  • 45:39We have that in over 95% of the patients.
  • 45:41Even smoking status is hard to get right.
  • 45:44We have that.
  • 45:46Did COVID affect the patient's
  • 45:48treatment plans? That's not going to
  • 45:50be in structured data necessarily.
  • 45:52We have over 90% on that. On the ECOG,
  • 45:55which is a notoriously
  • 45:57difficult thing to get, and all
  • 46:00the various efforts such as Flatiron
  • 46:01and so forth
  • 46:03and CancerLinQ
  • 46:04have had challenges with this,
  • 46:06we have ECOG data on 88%,
  • 46:09although that includes
  • 46:10patients who just didn't have
  • 46:12any ECOG recorded; but that
  • 46:13knowledge of no ECOG is still
  • 46:15knowledge,
  • 46:16right?
  • 46:18And, you know, getting to our
  • 46:20results, again
  • 46:22just focus on the red, and what
  • 46:24we found is that these factors,
  • 46:26these elusive factors, are really important.
  • 46:28And so this is unadjusted,
  • 46:31just kind of descriptive:
  • 46:32if you had progressing cancer at
  • 46:35baseline and you get COVID, your 30-
  • 46:37day mortality is 26%, and if you
  • 46:39had an ECOG of two or higher,
  • 46:41your mortality is extremely high.
  • 46:45And we also found that immunosuppression,
  • 46:48which is a somewhat nebulous
  • 46:52definition, and we have our
  • 46:54definition here, which is complex
  • 46:56and includes a lot of things you
  • 46:58can't easily get out of structured data,
  • 47:00so this is sort of the real-
  • 47:02world data, is a huge,
  • 47:04is a huge driver of mortality.
  • 47:08And if you look on the right,
  • 47:11the yellow table, basically
  • 47:12those are the patients who were
  • 47:14immunosuppressed at baseline.
  • 47:16And across the board,
  • 47:17even younger patients have
  • 47:19substantial mortality in our data set.
  • 47:24Furthermore, if you add on
  • 47:26top of that active cancer,
  • 47:28so are they immunosuppressed
  • 47:29and do they have active cancer,
  • 47:30again, we have our definition for that:
  • 47:34if you're not immunosuppressed
  • 47:36and you have inactive cancer,
  • 47:37in our data at least, you have a
  • 47:39zero chance of dying in the 30 days,
  • 47:42whereas if you're older and immunosuppressed,
  • 47:43your chance goes all the way up to 30%.
  • 47:45So really a huge spread here
  • 47:47based on these data.
  • 47:50And then if we start to look at
  • 47:53multivariable adjusted analysis,
  • 47:54again we see that these factors,
  • 47:58like ECOG or cancer status, are
  • 48:01highly associated with outcome,
  • 48:03both mortality as well as severity,
  • 48:05which means hospitalization,
  • 48:07intubation, and so forth.
  • 48:09We saw this as well more recently when
  • 48:11we looked at vaccinated patients,
  • 48:13so patients who were getting
  • 48:15breakthrough COVID-19 after vaccination;
  • 48:16again we saw things like cancer
  • 48:19status really, you know, being a
  • 48:20huge adjusted odds ratio
  • 48:23there, of six, of dying in 30 days
  • 48:26if you had an active and progressing cancer.
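For readers who want to see what an "adjusted odds ratio" means operationally, here is a minimal sketch fitting a logistic regression on synthetic data with statsmodels; the variable names, simulated data, and coefficients are illustrative assumptions, not the CCC19 analysis.

```python
# Minimal sketch: adjusted odds ratios for 30-day mortality via logistic regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age_decade": rng.integers(3, 9, n),    # age in decades (illustrative)
    "ecog_2_plus": rng.integers(0, 2, n),   # ECOG >= 2 at baseline
    "progressing": rng.integers(0, 2, n),   # actively progressing cancer
})
# Simulate an outcome so the model has something to fit (illustrative only).
logit_p = -4 + 0.3 * df["age_decade"] + 1.2 * df["ecog_2_plus"] + 1.0 * df["progressing"]
df["died_30d"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

model = smf.logit("died_30d ~ age_decade + ecog_2_plus + progressing", data=df).fit(disp=False)
print(np.exp(model.params))      # adjusted odds ratios
print(np.exp(model.conf_int()))  # 95% confidence intervals
```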
  • 48:31So I could talk about
  • 48:33CCC19 itself for an hour,
  • 48:35but I'm going to pause, and so I just
  • 48:38want to share some parting thoughts.
  • 48:40So, first of all, I think,
  • 48:43you know,
  • 48:44I'm a believer here that real-
  • 48:46world data has great potential
  • 48:48to yield real-world evidence if
  • 48:50we approach it with an understanding
  • 48:51of the completeness issues,
  • 48:52the accuracy issues,
  • 48:54and we anticipate them and we come
  • 48:56up with either ways to adjust for
  • 48:58them or avoid certain data,
  • 49:02certain variables, in the first place.
  • 49:05So, yes, we need automated methods; wouldn't it be great if NLP and a computer could do everything? But in reality, a lot of real-world data and real-world evidence depends on human curators going into EHRs and pulling out that data.
  • 49:20And to do that we need rigorous approaches. We have a paper published earlier this year describing the approach we used in AACR GENIE; I encourage you to check that out.
  • 49:29It basically gets into the fact that you need directives, and you need two people independently curating the same record at a certain rate so you can see whether there's comparability between their results, and so forth.
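To illustrate the dual-curation check just described, here is a minimal, hypothetical sketch of measuring agreement between two curators on an overlapping set of records using Cohen's kappa from scikit-learn; the abstracted field, the labels, and the threshold are invented and are not the actual GENIE curation directives.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical: two curators independently abstract "cancer status at COVID-19 diagnosis"
# for the same 10 overlapping charts (values invented for illustration).
curator_a = ["progressing", "stable", "remission", "stable", "progressing",
             "remission", "stable", "stable", "progressing", "remission"]
curator_b = ["progressing", "stable", "remission", "progressing", "progressing",
             "remission", "stable", "stable", "stable", "remission"]

# Cohen's kappa corrects raw percent agreement for agreement expected by chance.
kappa = cohen_kappa_score(curator_a, curator_b)
raw_agreement = sum(a == b for a, b in zip(curator_a, curator_b)) / len(curator_a)
print(f"raw agreement: {raw_agreement:.2f}, Cohen's kappa: {kappa:.2f}")

# A pre-specified threshold (e.g., kappa >= 0.6, an assumed cutoff) could trigger
# re-training or clarification of the curation directives before further abstraction.
```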
  • 49:46If there's widespread adoption of standards such as mCODE, HemOnc, OMOP, and so forth, that will increase the usefulness of structured data markedly.
  • 49:57I think NLP is having a moment. If you pick up the newspaper nowadays, you're going to see an article on ChatGPT, for example, which is generative NLP, sort of the other side of NLP.
  • 50:16Really important, though, and I didn't get to touch on this at all except at the very beginning when I alluded to disparities and bias: there's a lot of concern that working with real-world data might actually make biases that are already present in that data worse.
  • 50:34So we need new approaches to deal with that issue.
  • 50:41I just have some acknowledgements here, two slides. This is my first acknowledgement slide: I acknowledge the hemonc.org editorial board, the others who have worked on it, our funding, and DALL-E, which created some of the graphics you saw.
  • 50:56And here's our acknowledgement for CCC19, which is a huge endeavor with more than 600 active investigators at this point.
  • 51:06And with that, I will pause for questions.
  • 51:16So I'll start. Can you see the questions that are online, or not?
  • 51:23I do see them, yes.
  • 51:27I don't for a second dispute the value of real-world data in terms of being able to answer questions, but I'm struck by the fact that we have these two extremes.
  • 51:41We have randomized controlled trials, where we spend a fortune to collect every last bit of data, they cost $15,000 per patient or more, and we get lots of useless data as part of it.
  • 51:59And then we say, well, we can't get everything from randomized controlled trials, so we go to real-world data, where everything's pretty messy and you have to make all these assumptions and clean up the data.
  • 52:14The question is: is there a role for much simpler randomized trials done as part of standard practice?
  • 52:29I mean, sure. I think the RECOVERY trial showed that you can do these huge pragmatic trials in tens of thousands of patients without spending a lot of money.
  • 52:41They used off-the-shelf drugs, right: dexamethasone, some of the drugs we won't say the words for, and things like oxygen.
  • 52:50But when you get into the expensive drugs that are not yet FDA approved, I think that's a whole other area.
  • 52:57I think the FDA has got to lead the way in some ways here, because, and I didn't get to talk about this, there was a high-profile rejection of real-world data within the last month or two.
  • 53:11There was an attempt to get something approved based on real-world data, and I think they rightly looked at it and said that this particular set of data is not trustworthy and we're not going to go for it.
  • 53:19But I don't think that should shut down the whole endeavor. We need guidance from them about what components should and should not be collected routinely; I think that might simplify things a lot.
  • 53:36Has there been any attempt to put together criteria that would allow you to say that a given set of real-world data is adequate to draw conclusions from, in terms of how much it has to be cleaned up and how large the sample size has to be?
  • 53:54It's such an interesting question, and I'm not aware of anything at this moment.
  • 54:03But there's this bias I just learned about called informed presence bias, which I kind of knew about, though not by those words. It basically means that patients who spend a lot of time in the clinic or the medical system have a lot of data, whereas those that don't, don't, and it's actually an incredibly important source of bias.
  • 54:31If a patient doesn't spend enough time in the system to get enough data generated, that's something we need to know, right?
  • 54:38But that kind of descriptor is almost never available in any real-world data study, to my knowledge.
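As a hedged sketch of one way to at least surface informed presence, the example below computes a simple per-patient data-density descriptor (encounter count and observed follow-up time) from a hypothetical encounters table; the column names, patient IDs, and dates are invented, and reporting or adjusting for such a descriptor is only one possible mitigation.

```python
import pandas as pd

# Hypothetical encounters table: one row per visit (IDs and dates invented).
enc = pd.DataFrame({
    "patient_id": ["p1", "p1", "p1", "p2", "p3", "p3"],
    "visit_date": pd.to_datetime(
        ["2021-01-05", "2021-03-10", "2021-06-02", "2021-02-20", "2021-01-15", "2021-11-30"]),
})

# A crude "informed presence" descriptor: number of encounters and observed span per patient.
presence = enc.groupby("patient_id")["visit_date"].agg(
    n_encounters="count",
    first_visit="min",
    last_visit="max",
)
presence["followup_days"] = (presence["last_visit"] - presence["first_visit"]).dt.days

# Reporting this distribution alongside study results, or including it as a covariate,
# makes the differential data density across patients visible rather than hidden.
print(presence)
```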
  • 54:47[A question comes in from the online audience; the audio is largely inaudible, and the speaker restates it in his answer.]
  • 55:28Yeah. So the question is that patients with pre-existing cancer seem to be having worse outcomes during the COVID era than before, and why might that be?
  • 55:43I can say that in our consortium we only look at patients who had COVID, so that's a subset, right? Although as time goes on, maybe it will end up being everybody.
  • 55:50But what we do see is that, at least in our registry, 40% of patients had their treatment altered in some way, and usually that's a delay. Sometimes they can't get the same treatment they were getting before, a surgery gets cancelled, etcetera.
  • 56:08And we know from previous work that treatment delays don't usually work out very well.
  • 56:15We haven't yet systematically evaluated that, but we now have several thousand of those patients, so we're probably going to be looking at that in the upcoming year.
  • 56:24As far as other patients, there were a lot of practice changes, especially in China, I think, but also with substituting oral medications whenever possible, even if they were known to be inferior, or not quite as good, so that patients didn't have to come into the clinic.
  • 56:48That's been presented in some settings, and I think we believe those substitutions are generally okay.
  • 56:58I know a lot of people went on neoadjuvant hormone therapy instead of going directly to surgery for early-stage breast cancer, so that they could push things out during periods when elective, quote unquote, surgeries were shut down.
  • 57:14So all those things probably add up, right? But there's absolutely a factor of psychology too, with patients being afraid to come into the clinic and potentially skipping a treatment.
  • 57:29So, to answer your question, it's quite complex, but I think we need to understand it better.
  • 57:35And of course there are new diagnoses coming in, and we're starting to get that information. There's clearly a stage migration to later-stage, more advanced, more metastatic disease because of delays in screening and so forth.
  • 57:51So I think we're going to face a challenging decade, and Ned Sharpless forecast that at the very beginning of the pandemic; in the first month or two he wrote a paper, in Nature or Science I think, modeling out what that might look like, and that's probably going to come true.
  • 58:10But hopefully COVID ends really soon. So, yeah.
  • 58:20[Audience question; audio largely inaudible.]
  • 58:31Yeah, so we're over time, but I think there are many strategies to try to mitigate bias; you can't eliminate it, right? You can understand it and you can try to mitigate it.
  • 58:43For example, there are matching strategies: if you're doing a case-control style approach, you try to make the controls as similar to the cases as possible in everything but the exposure.
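A minimal, hypothetical sketch of the matching idea just described, using 1:1 nearest-neighbor matching of controls to cases on a propensity score in Python (scikit-learn); the cohort, variables, and coefficients are simulated for illustration, and a real analysis would need careful covariate selection and balance diagnostics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 500

# Hypothetical cohort: 'exposed' is the case/control flag; age and ECOG are confounders.
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "ecog": rng.integers(0, 4, n),
})
df["exposed"] = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.04 * df["age"] + 0.3 * df["ecog"]))))

# Propensity score: probability of exposure given the measured confounders.
ps_model = LogisticRegression().fit(df[["age", "ecog"]], df["exposed"])
df["ps"] = ps_model.predict_proba(df[["age", "ecog"]])[:, 1]

cases = df[df["exposed"] == 1]
controls = df[df["exposed"] == 0]

# 1:1 nearest-neighbor matching on the propensity score (with replacement, for simplicity).
nn = NearestNeighbors(n_neighbors=1).fit(controls[["ps"]])
_, idx = nn.kneighbors(cases[["ps"]])
matched_controls = controls.iloc[idx.ravel()]

# Covariate balance check: group means before matching vs. cases and their matched controls.
print(df.groupby("exposed")[["age", "ecog"]].mean())
print(pd.concat([cases[["age", "ecog"]].mean(), matched_controls[["age", "ecog"]].mean()], axis=1))
```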
  • 58:54Some of those strategies have been around for decades, and some of them are only emerging at this point.
  • 59:01But I don't think we can forget that there's bias in prospective trials as well, right? It's on either side of the coin.
  • 59:13Yeah, it's just one more thing, and it's not the only one; there's also ascertainment bias, and there are a lot of biases, right?
  • 59:20One thing we've worked on with our consortium is developing standardized language around limitations, which I think is critical.
  • 59:31I mean, the data are the data, use the plural, right? But the way they're presented really does influence the reader.
  • 59:42So that's something we're thinking about, and we might have some thought pieces or something coming out about how to handle that.