The Informonster Podcast

Episode 6: Three Takeaways from the COVID-19 Pandemic and the Importance of Managing Data Quality

April 15, 2020

On this episode of the Informonster Podcast, Charlie Harp talks about how healthcare evolves from the edges, three core types of patient data, and the three takeaways from the COVID-19 Pandemic that can help us be more prepared for the future.

I’m Charlie Harp, and this is The Informonster Podcast.

On this episode of the Informonster Podcast, I’m going to talk about the COVID-19 pandemic and the three big takeaways relative to data quality. Before I get into the meat of today’s podcast, let me start by creating a reference framework for what I’m going to be going through as I talk about the different takeaways relative to COVID-19 and how we’ve dealt with it. There are three core types of data in healthcare. The first type is patient instance data. Patient instance data is all the information that we accumulate for a given patient that tells us what’s going on with them, and it’s primarily made up of two subtypes. One is unstructured data, which includes things like encounter notes and radiology images. The other is structured data: the lists of conditions, lab results, medications, and all that rich structured data that provides the main driver for decision support, analytics, and things of that nature.

The structured data, which drives a lot of our core activity around healthcare analytics, is made up of reference data. So if you think about the patient’s record, their instance data, as being something like a recipe, it’s got all these bits and pieces, but they’re actually using ingredients that come from the reference data. The reference data is made up of the proprietary codes that live in dictionaries in the E.M.R. and the other systems being used on the front lines of healthcare. And then there are also the standards being developed by folks like Regenstrief, the National Library of Medicine, the World Health Organization, the A.M.A., and others like that. We use the local dictionary reference data to create the patient instance data, and then typically we map it, or bind it, to the standard terminologies so that we can share that information with others, whether it’s a data partner, or the C.D.C., or somebody else.

The last type of data is master data. Whereas reference data are the value sets and code systems we use to articulate what’s going on with the patient, master data are the codes in structured data that we use to describe the topology of an enterprise and its resources. So, “Who are the providers? What are the facilities? How many beds are there? What equipment is there?” and things of that nature. We’re going to call that master data. Those are the three big buckets of data we engage with when we try to create data artifacts for people who are trying to make decisions about what’s going on; in this case, with something like COVID-19. The one additional thing I’m going to say is to remind everybody that I believe healthcare evolves from the edges. Whereas other industries can be driven by the center, healthcare is driven by the edges. The codes that we create and the things that we’re doing are usually encountered at the edges of healthcare before they make it to the center. That means that in order to create patient instance data, enterprises, hospitals, and clinics have to create data, reference data, before it’s created by the center. Before the center comes out with a standard, it grows at the edges first. I hope that makes sense, because this emergent need is one of the things that makes healthcare different from other industries: we have to create data before the center knows what the heck is going on. And COVID is a great example of this, because the facilities that had to figure out how they were going to document what was happening, test results and all these other things, were dealing with it before the folks in the standards organizations had an opportunity to figure out what to do about it, what to call it, and how to create data relative to it. And that’s an important point, because it forces us to think about things a little bit differently.

When I say that, what I mean is this: the first big takeaway from looking at what happened with COVID-19 is that when we create new reference data in an emergency situation or a public health crisis, unless we establish a shared semantic, a shared way of talking about it, we’re going to create well-intentioned chaos and semantic confusion at the edges. It may not fundamentally impact a single facility, but when it comes time to pull that data together so we can figure out what’s going on as an enterprise or as a nation, that lack of a shared semantic creates a whole bunch of impedance. We have to now figure out how to map things. You might’ve called it “2019-nCoV.” They might’ve called it “SARS-Coronavirus-2.” I might’ve called it “COVID-19,” and someone else might’ve called it “SARSCOV2.” And yet someone else might’ve called it “Special Virus Test Number 27.” When we try to pull that data together, when we try to throw that data at the public health laboratories or at the C.D.C., somebody’s got to sit down and figure out how to bind it all appropriately, how to wire it all up appropriately, so that the data makes it to where it needs to go and somebody somewhere can make sense of it. Because the sooner we make sense of it, the sooner we’re able to act upon it and make decisions about it. So the first big takeaway, for me, is that when something like this happens, we should find a way to quickly establish, not a code, because it takes people time to establish a code, but a name: “This is what we’re going to call it. If you’re creating a local lab test, call it this. If you’re creating a local diagnosis, call it this.” If we can agree on a semantic, that’s pretty easy.
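To make the mapping problem concrete, here is a minimal sketch of the kind of lookup that has to happen when locally coined names flow to a central aggregator. The names and the shared label below are illustrative only, borrowed from the examples above, not from any real terminology.

```python
# Hypothetical local names that different facilities might have coined for
# the same test, each mapped to one agreed-upon shared label. In reality
# this table would be curated and maintained by terminology experts.
LOCAL_TO_SHARED = {
    "2019-ncov": "SARS-CoV-2 RNA Detection",
    "sars-coronavirus-2": "SARS-CoV-2 RNA Detection",
    "covid-19": "SARS-CoV-2 RNA Detection",
    "sarscov2": "SARS-CoV-2 RNA Detection",
    "special virus test number 27": "SARS-CoV-2 RNA Detection",
}

def normalize_test_name(local_name: str) -> str:
    """Map a locally coined test name to the shared semantic label,
    or flag it for human review if it has never been seen before."""
    key = local_name.strip().lower()
    return LOCAL_TO_SHARED.get(key, "UNMAPPED: needs review")
```

The point of the sketch is how cheap the mapping is once a shared semantic exists; the expensive part, agreeing on the entries in the table, is exactly what the takeaway argues should happen up front.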

Coding systems… for whatever reason, people get wrapped around the axle sometimes when it comes down to formal code system definitions, and we cannot be hung up by that. And don’t get me wrong; the folks at LOINC, SNOMED, the W.H.O., and the A.M.A. responded pretty quickly, and we got those updates out to everybody as soon as they were hot off the presses. But from February to the end of March, healthcare was dealing with it, and there were no codes for it. So all these folks at the edges brewed up their own codes with their own names, and now we’ve got to figure out, “How do we map that in a reasonable and intelligent way, without taking too long, so we can get the data out there?” The other thing I thought was interesting is that we can focus on what we call the code, but when you think about the lab results, one of the things I encountered when I was talking to a client, and I hadn’t even thought about it, was that they were getting results from all over the place for these qualitative COVID tests. Some of them were “Positive” and “Negative,” which is what LOINC has in their definition, and some were “Detected” and “Not Detected.” And the question came up, “Can we convert Detected to Positive, and Not Detected to Negative?” I’m not clinical, so I’m not going to pretend to answer that, but that’s the kind of thing where, if we had a shared semantic and we said, “Listen, we’re going to call it Coronavirus antigen, and we’re going to result it with Positive and Negative,” and everybody writes that down and everybody builds it that way, then when you go to share the data, the simplest of algorithms can help with the semantic interoperability of mapping that code. So that’s takeaway number one: we need to figure out a way, when we’re dealing with new things, especially in a crisis, to quickly come together and say, “We are going to call it this. We’re going to result it this way. Now, go off and create your codes. And when we have a standard, we’ll tell you what it is.”
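As an illustration of “the simplest of algorithms,” here is a sketch of normalizing qualitative result values. The equivalences in the table are an assumption for demonstration only; as noted above, whether “Detected” can be treated as “Positive” is a clinical and standards question, not a software one.

```python
# Illustrative only: a lookup that treats "Detected" as "Positive" and
# "Not Detected" as "Negative". Whether that equivalence is clinically
# valid is a decision for clinicians and standards bodies, not this code.
RESULT_SYNONYMS = {
    "positive": "Positive",
    "detected": "Positive",
    "negative": "Negative",
    "not detected": "Negative",
}

def normalize_result(raw: str) -> str:
    """Normalize a qualitative result string to the shared value set,
    flagging anything unrecognized rather than guessing."""
    return RESULT_SYNONYMS.get(raw.strip().lower(), "Indeterminate: needs review")
```

Note the design choice: an unrecognized value is flagged for review instead of silently passed through, which is the behavior you want when confidence in the data is the whole point.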

All right, let’s talk about thing number two. Thing number two is a little bit bigger than thing number one, and it goes back to this whole notion of why data quality is important. Now, sometimes when I talk about data quality in healthcare, I get a little passionate about it, and people look at me like, “What’s wrong with you?” The reason I get passionate about it is because, to me, it’s the most obvious thing in the world. If we want to do decision support, artificial intelligence, machine learning, reporting, anything, anything at all, data quality is important. It’s a lifestyle choice. It’s not an optional thing; it’s not a project. So when I see people set it aside for other priorities, from a data or technology perspective, I get a little frustrated, because I just can’t see how they can’t see how important it is. And I think, with COVID-19, when you look at all the confusion that exists around it, part of that comes back to data quality. I’m not saying all of it, because healthcare is complicated, right? But imagine a utopian future where we have optimal data quality, where we completely trust the data that we have on every patient; it’s clean, it’s longitudinal, it’s precise, and something like this happens. Imagine the ability to quickly and easily aggregate that data to a central data warehouse. Because we completely trust the data, and because the data is clean, precise, and accurate, we could start to immediately ask, “Why are these patients going into the ICU? What drugs are they on? What lab tests have they had? What comorbidities do they have? What genetics do they have? What’s their blood type?” If we had high quality data, I’m not saying it would magically tell us, “This is exactly what’s going on with a patient,” but let’s take the term precision medicine and flip it around. Because when we talk about precision medicine, we talk about, “Timmy has these problems. Timmy’s getting decision support that is highly localized to Timmy.” But if we know that much about Timmy, and we flip precision medicine around, then all of a sudden, when we start talking about population health and analytics, we can create a high resolution picture, a quality picture, of all the Timmies: what’s leading people to require a ventilator, and what’s the story with people who are asymptomatic? Is there a pattern there? Is there something we can see? This is not a new lesson, but the lesson is that uncertainty in our data impacts our confidence, and a lack of confidence creates indecision. It stops us from saying, “I know what’s going on. I can see it in the data. I know why some older folks are requiring ventilators. It’s not just that they’re older; it’s that they have these specific comorbidities.”

I’m not saying I know that, by the way; it’s hypothetical. The point is being able to see the data, rely on the data, and have a precise, good picture of what’s happening with the patients. And this, by the way, also goes back to the unstructured data. Unstructured data is the embodiment of uncertainty. When we’re looking at an encounter note, and that encounter note is unstructured, and we’re running N.L.P. against it to try to tease out what the symptoms are or what’s going on, we’re throwing a big cloud of uncertainty into that data too. So we have to find a way to get high quality, high certainty data, even from the encounter, into this analytics environment. We need to make sure the structured data is data that we can trust and rely on. Because if we can do that, we can increase our confidence in our analytics. We can run the kind of reports, and the kind of artificial intelligence and machine learning, that can look for patterns in the high quality data and come back with answers.

This whole idea comes back to the quality of our data, and to high certainty, or high fidelity, interoperability, because that’s the other thing too. We want to have high quality data, which means we have to practice high quality things and live a high quality life when it comes to data. But we also have to remember that, when we’re sharing data with someone else, it’s important that we pass the quality through to them as well, because they’re passing data to us. We don’t want to get bad quality data, so we shouldn’t give out bad quality data. So that’s thing number two. Thing number two is a no-brainer: we really need to focus on the quality of our data so that, when we get hit with a public health crisis, we can act with clarity and certainty and not wonder whether or not what we’re seeing in the data is real.

Number three is something that I think often gets overlooked in healthcare, and that’s the reliability of the master data that we’re dealing with. A lot of times in healthcare, we focus on the patient, we focus on populations, we focus on the things that drive outcomes, and we don’t always focus on, or pay proper attention to, from a data quality perspective, things like the enterprise topology: the facilities, the beds, the rooms, the providers, the equipment, the supplies. One of the big takeaways from what’s been happening with the Coronavirus pandemic is this focus on ventilators, and masks, and personal protective equipment. Who would have thought three or four months ago that we’d have a 24 hour news cycle where they’re mentioning ventilators every 15 minutes; where they’re talking about, “Where are the masks? Who has masks? Do you have masks?”

This is the kind of thing where some organizations, by the way, did a fantastic job. I’m not going to name or shame, but the bottom line is that there are some organizations out there who really had their act together. If you Google around, you can find them. When it came time to understand their ventilators, their equipment, and everything else, they were on it. They had done a really good job mastering their master data, so when the crisis hit, they were able to very quickly figure out what their capacity was and where everything was located, and that’s got to be a good feeling. And there were other organizations, I’m sure, that were not prepared, and they scrambled: “Where are the ventilators? Where are the masks? What are we going to do?” And you end up with clipboards, and running reports, and trying to figure out what the heck is going on. So I think the third takeaway is really about the data for your enterprise. This is an enterprise thing as much as it is a facility thing, because if you’re a single facility, you probably have a decent handle on where your ventilators are.

The reason this is a takeaway for me is that if you’re an enterprise, and you have dozens or hundreds of hospitals and facilities, that’s where you have to get a handle on where all this stuff is, so that you’re not on the phone calling every facility and asking, “How many ventilators do you have?” So the takeaway is that when it comes to this master data, when it comes to looking at your resources and your enterprise topology, that’s another area where you have to live a data quality lifestyle, because you don’t know when a crisis is going to hit. Nobody, before this happened, would have been thinking, “I need to find 4,000 ventilators,” but if you have all this stuff managed, if you’re focusing on data quality, then when something like this happens, you’re not throwing your arms up and running around. You’re calling the people in your B.I. department and saying, “Run a query and just show me where everything is.” That’s my third takeaway. It’s a reminder that data quality is a lifestyle choice, and you need to focus on the quality of the data: not just the patient data, not just the reference data, but also the master data. If you’ve got data quality, and you’re maintaining it, then when you run into a crisis, you won’t be happy, but you’ll be prepared.
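To illustrate what “run a query and just show me where everything is” depends on, here is a sketch of rolling up equipment capacity across facilities. The records and facility names are entirely hypothetical; the point is that the aggregation is trivial only if the master data behind it is complete and well maintained.

```python
from collections import Counter

# Hypothetical master-data records: each row says what equipment lives
# where. In a real enterprise this would come from a mastered, regularly
# reconciled equipment registry, not a hand-typed list.
equipment = [
    {"facility": "North Campus", "type": "ventilator", "count": 12},
    {"facility": "South Campus", "type": "ventilator", "count": 7},
    {"facility": "North Campus", "type": "infusion pump", "count": 40},
    {"facility": "Downtown Clinic", "type": "ventilator", "count": 2},
]

def capacity_by_facility(records, equipment_type):
    """Total one equipment type per facility across the enterprise."""
    totals = Counter()
    for row in records:
        if row["type"] == equipment_type:
            totals[row["facility"]] += row["count"]
    return dict(totals)
```

If the registry is stale or incomplete, the query still runs; it just gives you a confident-looking wrong answer, which is the data quality point of this takeaway.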

All right, so that’s it for this episode of the Informonster Podcast, and I would like to say one more thing. Clinical Architecture, in partnership with Logica Health, MITRE, and Regenstrief, and possibly others, has rolled out the COVID-19 Interoperability Alliance website, where we’re producing data resources like value sets, as well as useful links. I recommend you check it out. We’re also always looking for contributors. If you’d like to be part of what we’re doing to help pull together these data resources, we’d be happy to have you. I’d also like to say thank you to all of the providers, the doctors and nurses, and other folks on the front line of this, as well as all the folks in the essential businesses, whether you’re a firefighter, a police officer, or the person who’s got to go into the grocery store and work the register so the rest of us can eat. I really do appreciate you, and I thank you for being there. Heroes come in all shapes and sizes, and, if nothing else, this current pandemic has shown that normal, everyday people can step up to the plate and do what they have to do. So thanks to everyone, and also to everyone who has sacrificed their social lives and stayed home to avoid spreading this to more folks. This is Charlie Harp from the Informonster Podcast saying, stay well and stay safe. And thank you.
