The Informonster Podcast
Episode 21: Lab Data Interoperability
August 18, 2021
On this episode of the Informonster Podcast, Charlie is on the road talking about the relationship between clinical lab data and interoperability. He talks about LOINC, as well as the use of lab data in AI and machine learning, and some of the challenges faced in these applications.
Hi, I’m Charlie Harp. This is the Informonster Podcast. Now, on today’s podcast, I’m doing something a little bit different. I’m recording this podcast as I drive across the lovely desert between Grand Junction, Colorado, and Los Angeles, California. Right now I believe I’m in Utah. The view is breathtaking. I’m helping my middle son relocate to sunny Los Angeles from the heartland of Indianapolis, Indiana.
Now, what I wanted to talk about today is lab data interoperability, clinical interoperability specifically, because there’s a lot of activity right now with Shield, LIVD, and other efforts aimed at getting lab results and utilizing them in an interoperable fashion at the public health level, which is similar to an enterprise level but obviously on a much grander scale. So first, let’s talk once again about what we mean by interoperable. Interoperability can mean a lot of things. It can mean semantic interoperability, where you’re normalizing to a standard normative terminology like LOINC. It can mean syntactic interoperability, where a message is formatted in a way that lets you unpackage it and make use of the data. There’s canonical interoperability, where the data is logically organized in a way that makes sense so you can use it, which is slightly different from syntactic. And of course there’s physical interoperability, which is getting the information from you to me. That should be the easy part, but when it comes to faxing and things like that, I can get it to you, and it’s still not physically interoperable if it’s not actually data. But I won’t climb up on that particular high horse.
So now that we have a rough, high-level definition of general interoperability, let’s focus in on labs. Let me start out by saying I’m not a clinician. I’m not a subject matter expert when it comes to the clinical aspects of lab data. I am just a simple country programmer who’s been working with lab data since 1987. And no, I didn’t do that while I was in vitro; I was a functioning human adult. I know, I know. Despite my youthful exuberance, I’ve been doing this for a while now. So, lab data can be lumped into buckets. Let’s start by taking microbiology, and things that are like microbiology, out of the picture and just focus on analytical lab. You might think that analytical lab is pretty straightforward when it comes to interoperability: if you can normalize to a normative terminology, you should be able to leverage that data. The challenge with analytical labs is a little more esoteric than that. And by the way, if anybody listening to this podcast wants to chime in, do another podcast to get into it in detail, or debate certain topics with me, I love that kind of thing and I’d be happy to do it. If I’m incorrect or off base, I’m the first person to admit it when I’m provided evidence to that point. So when you think about clinical interoperability for labs, one of the advantages we have is LOINC. LOINC has been around a while and it’s very comprehensive. It has, I think, 14 axes, and some of those axes have multiple axes kind of hidden within them. But the bottom line is there’s a certain amount of specificity and granularity there. They’ve put a lot of work into it, it’s been around a long time, and it’s a well-established standard for reporting and for providing a reference anchor for lab analytes, especially if we stick with general analytical lab: things like chemistry, hematology, and so on.
So you’d think that with that hurdle overcome, having a common normative terminology, we’d be in great shape. And you’re kind of right, because LOINC provides a lot of value, and we’ve put a lot of effort over the last decade into getting people to leverage and use LOINC. Don’t get me wrong, there are some challenges, because LOINC goes to great lengths to be comprehensive, so sometimes the thing you’re mapping to is very specific. One of the challenges people have struggled with is the ontological one: what if I want to roll things up clinically? What if I want to group things and do things of that nature? That’s not what LOINC is, at least not today. It has an ontology, but they don’t really recommend using it, at least the last time I checked, and it’s really just a way to decompose the axes. But even so, that unofficial ontology Regenstrief provides with LOINC is pretty cool. There are a lot of useful things in there, and if you’ve never taken a look at it, it’s worth looking at for the synonymy and all the other little nuggets hidden within it. We could probably do a whole podcast on that; if you’d like us to, let me know. So when it comes to lab data, different people have different opinions, and I’m coming at this from a pragmatic, analytical, engineering point of view. In the olden days, and maybe still today, when I get lab data from somewhere else, my number one concern is being able to take that lab data and display it to the clinician who is trying to make a decision about the patient. So a lot of the mapping that happens in EHRs today for external lab data is about getting it on the chart so the provider can see it and factor it into their calculus when they’re caring for the patient.
But I think one of the things happening in public health, and in enterprise health for lack of a better term, is that we’re trying to use that data to do analytics, look for patterns, and do things like AI and machine learning. The challenge is that if some of that data is super granular, you can’t combine it. You can’t turn that data into information in motion. When you start looking at quantitative data as a vector, or using it for inferencing or other reasoning use cases, you have to resort to value sets, or to logic like “If it’s this or this or this or this or this,” because you don’t have a reasonable way to combine it into a single compatible thing for the use case you’re presented with.
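In code, the “if it’s this or this or this” logic usually collapses into a value-set membership check. This is a minimal sketch that assumes hypothetical placeholder codes (`CODE-X-SERUM` and so on) rather than real LOINC codes, and a hypothetical helper `select_by_value_set`; deciding which codes belong together for a given use case is the hard part and isn’t shown.

```python
# Hypothetical placeholder codes; a real value set would list the specific
# LOINC codes judged equivalent for one particular use case.
ANALYTE_X_VALUE_SET = frozenset({"CODE-X-SERUM", "CODE-X-PLASMA", "CODE-X-METHOD2"})


def select_by_value_set(results, value_set):
    """Keep the values whose code is a member of the value set."""
    return [value for code, value in results if code in value_set]


# (code, value) pairs as they might arrive from different sources
results = [
    ("CODE-X-SERUM", 95.0),
    ("CODE-Y-SERUM", 140.0),
    ("CODE-X-PLASMA", 102.0),
]

print(select_by_value_set(results, ANALYTE_X_VALUE_SET))  # [95.0, 102.0]
```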
There are a couple of examples of this. You might normalize a lab test, say a SARS-CoV-2 positive-or-negative test, but one of the things we may not have is a standard way of resulting that test. In fact, I said in a previous podcast that we found 74 ways to say “positive.” That’s a challenge: standardizing results that are not quantitative, that are ordinal, is hard, because then you have to say, “Well, if they say this or this or this or this,” and that makes building the rules, and making sure they function appropriately, a challenge. When it comes to quantitative results, you have a similar problem. The first question is: are the results in the same unit? You can have tests where the property is, say, mass concentration, but the people doing the testing might use different units of mass concentration. LOINC puts a lot of energy into this and has established these properties, but most laboratories that do testing don’t really deal with the concept of property, because it’s inferred from the unit.
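One way to tame the many ways of saying “positive” is a curated synonym table that maps each observed string to a canonical value. This is a minimal sketch; the table contents and the `normalize_qualitative` name are hypothetical, and unrecognized strings return `None` so they can be reviewed rather than guessed at.

```python
from typing import Optional

# Hypothetical synonym table. In practice it would be curated from the
# result strings actually observed in incoming feeds.
QUALITATIVE_SYNONYMS = {
    "POSITIVE": {"positive", "pos", "detected", "reactive", "+"},
    "NEGATIVE": {"negative", "neg", "not detected", "nonreactive", "non-reactive", "-"},
}


def normalize_qualitative(raw: str) -> Optional[str]:
    """Map a free-text result to a canonical value, or None if unrecognized."""
    key = " ".join(raw.strip().lower().split())  # trim, lowercase, collapse spaces
    for canonical, variants in QUALITATIVE_SYNONYMS.items():
        if key in variants:
            return canonical
    return None  # unrecognized: route to human review instead of guessing


print(normalize_qualitative("  Not   Detected "))  # NEGATIVE
print(normalize_qualitative("POS"))  # POSITIVE
print(normalize_qualitative("indeterminate"))  # None
```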
So say I have a lab test that’s reported in nanograms per deciliter. I’m not necessarily going to point out to you, “this is mass concentration.” I’m saying, “here’s a unit.” So if I’ve got one mass concentration unit, mass over volume, and you’ve got a different one, let’s say your test is reported in pounds per gallon (I know, I’m giving a hyperbolic example), at some point you need a mechanism for taking tests that, for all intents and purposes, might be clinically equivalent and bringing them together by converting the value to a common base unit; in this case, let’s say nanograms per deciliter. So the first stumbling block when you go to combine quantitative lab results is getting them to a common base unit so that you can analyze them, trend them, and graph them using the same unit. That part is pretty straightforward. The thing that’s a little more esoteric is the method of the test, because in some cases, depending on how the test is performed, you might interpret the result differently even when it’s reported in the same unit. That may not always be the case, but when it is, it’s important. So another thing you need to look at is: does the method matter to how I’m clinically evaluating this, or to how I want a reasoning algorithm to clinically evaluate it?
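The base-unit conversion described above amounts to multiplying by a mass factor and dividing by a volume factor. This is a minimal sketch that assumes simplified unit strings of the form `mass/volume` (not UCUM spellings) and a hypothetical `to_ng_per_dl` helper; the pound and US gallon factors are the exact defined values.

```python
# Factors to nanograms (mass) and to deciliters (volume).
# Avoirdupois pound = 453.59237 g; US gallon = 3.785411784 L = 37.85411784 dL.
MASS_TO_NG = {"ng": 1.0, "ug": 1e3, "mg": 1e6, "g": 1e9, "lb": 453.59237e9}
VOLUME_TO_DL = {"dL": 1.0, "mL": 0.01, "L": 10.0, "gal": 37.85411784}


def to_ng_per_dl(value: float, unit: str) -> float:
    """Convert a mass-concentration result written as 'mass/volume' to ng/dL."""
    mass_unit, volume_unit = unit.split("/")
    if mass_unit not in MASS_TO_NG or volume_unit not in VOLUME_TO_DL:
        raise ValueError(f"unsupported unit: {unit}")
    return value * MASS_TO_NG[mass_unit] / VOLUME_TO_DL[volume_unit]


# Two results in different mass-concentration units, put on one scale
print(to_ng_per_dl(5.0, "ng/mL"))  # about 500 ng/dL
print(to_ng_per_dl(1.0, "ug/dL"))  # 1000 ng/dL
```

Converting between mass and molar concentration (for example mg/dL to mmol/L) is a different problem, because it also requires the analyte’s molar mass.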
And I’m really curious about this one. I haven’t spoken to a subject matter expert about it, but as an engineer, as somebody who understands a fair amount of science, one of the things I run into every now and then is people saying, “Well, you can’t combine these test results if they have different reference ranges.” Now, I tend to believe that when it comes to a lab test, there are things that are contextual and there are things that are not. What was the patient eating, what time of day was it, what’s the age of the patient, what’s the weight of the patient? Those things are all relevant to the context of when the test was performed, but the result itself lives outside of that. You might decide, “I’m not going to combine results that have different contexts,” but if I do, I don’t know, an albumin test on serum and I get a result, you could argue that regardless of the context, the result is the result. If it’s the same unit of measure, the result is the result. The context might factor into how I interpret it, but the result is the result: it’s this number in this unit. Now, when it comes to reference ranges, my interpretation is that reference ranges are used by the people performing the test to decide how to flag the result: normal or abnormal, low or high, or a panic value. That might be based upon how they’ve calibrated the instrument they’re running it on. And of course, reference ranges are often also specific to the age and gender of the patient, and possibly to comorbidities or other factors going on with the patient. But the result is the result. So the question is, if I get two lab results from two different places, and they have different reference ranges for the same patient context, should I combine those results into a single vector or not?
Now, I’m not going to put forth an answer to that question; I’m really curious what people think. Well, maybe I will put forth an answer. To me, I would think it’s legitimate as long as the context is the same. Let’s say there are no special circumstances: there’s no time aspect, there’s no post-dosage information. It’s just a regular, run-of-the-mill test, same type of instrument or same methodology, same specimen type, but I’m getting it from two external labs, and the tests are a week apart. And the two labs have different reference ranges for the same age and gender. This is kind of a thought problem, a hypothetical question. Say one lab’s reference range would call a value low, and the other lab’s reference range would say it’s not low. So one of the results is normal under both labs’ reference ranges, though on the low side. The other result is low under the first lab’s reference range and normal under the second’s. I know I’m making this really complicated without a whiteboard, but my question is: can I combine those results? I would say the answer is yes, I can combine those results, as long as I believe that both external labs are valid and are doing a good job testing. They might have different interpretations of a result, but the result is the result. Because if the answer is “no, I can’t combine those results,” then it’s almost as if you can never combine results. You can only evaluate everything as an interpretation of a result, and everything becomes an ordinal result at that point: we say it’s low, we say it’s high. That makes the whole concept of quantitative analysis of lab results kind of squishy.
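Here is the thought problem with made-up numbers, as a minimal sketch. The reference ranges, result values, and the `ReferenceRange` class are all hypothetical, chosen only to reproduce the scenario: both results go into one quantitative vector, while the normal/low flag depends on whose reference range you apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    low: float
    high: float

    def flag(self, value: float) -> str:
        """Return 'L', 'H', or 'N' for a value under this range."""
        if value < self.low:
            return "L"
        if value > self.high:
            return "H"
        return "N"


# Hypothetical ranges for the same analyte, unit, age, and sex at two labs.
lab_a = ReferenceRange(low=3.5, high=5.0)
lab_b = ReferenceRange(low=3.2, high=5.0)

# Two results a week apart, already in the same unit and context.
results = [("lab A", 3.8), ("lab B", 3.4)]

vector = [value for _, value in results]  # the quantitative series
for source, value in results:
    print(source, value, "A:", lab_a.flag(value), "B:", lab_b.flag(value))
# lab A 3.8 A: N B: N
# lab B 3.4 A: L B: N
```

The design choice mirrors the argument: the numbers are kept as-is in `vector`, and each lab’s flag is treated as that lab’s interpretation rather than as part of the result.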
I think this is a relevant question, because when you look at initiatives like Shield and LIVD, we’re moving toward combining results into a continuum, into a set of vectors relative to a patient or a collection of patients, and that depends on being able to trust the result. I have to believe that I can trust the quantitative result in context and make decisions based upon those quantitative results in context. If I can’t, then the whole foundation of public health is in question, I would think. Or we have to more strictly mandate things like reference ranges and how we calibrate lab instruments. So I think that if we can agree to assume that the results are correct, that we can do base unit conversions, and that we can decide which contexts cause results to fall in or out, then we could establish a meaningful way to combine data so that we can do analytics at scale. Those are the things we need to be able to do to support these public health initiatives. As a nation, and as an enterprise, those are the things you have to get your arms around so you know you can trust, normalize, and meaningfully leverage the data you’re collecting from your partners out there doing work in the field.
All right. So that’s what I wanted to talk about on this edition of the Informonster Podcast. As I make my way through the desert, I appreciate you tolerating whatever road noise came through, if I can sneak this past the people who publish the podcast. Enjoy this little personal touch. I look forward to your feedback. I would love to have a panel podcast where we talk about this very topic, so maybe I’m poking the bear with a stick a little bit to get some people to come out of the woodwork and talk with me about it. Because I think in public health, as we look at COVID in the rear view, or hopefully in the rear view, and look to the future at how we can do a better job of biosurveillance, monitoring, and understanding the patterns we see, getting our arms around something that should be solvable is going to be important. To do that, we’re going to have to answer some of these pragmatic questions and decide where we’re drawing the line.
Anyways, I am Charlie Harp, and this has been your “in the wild” edition of the Informonster Podcast. Thank you, and take care.