
The Informonster Podcast

Episode 16: Living On The Fringes with Semantic Ambiguity

March 4, 2021

On this episode of the Informonster Podcast, Charlie Harp discusses the importance of attaching accurate details to terms in a system in order to provide a window into the thought process of the scribe. He discusses treating term definitions as personal notes and how doing so keeps the system from having the same term defined in 73 different ways. He also explains how having this infrastructure in place could mitigate the effects of clinical emergencies like COVID-19.


I’m Charlie Harp, and this is the Informonster Podcast. Today on the Informonster Podcast, we’re going to talk about the concept of semantic dissonance. We’re going to talk about what that is, what forms it takes, and how it can impact the quality of the data that we use every day in healthcare. But first, I wanted to share a story with you that is a great, practical example of why data quality matters.

Now, this story takes place across the sea, in a kingdom on an island called Great Britain. The protagonist of our story is a strapping young journalist named Liam. As you likely know, Great Britain has a National Health Service, or NHS, and every patient in the NHS has a unique identifier and their health information is somewhat standardized. Furthermore, their clinical data is aggregated within regional clinical commissioning groups, or CCGs. It is a veritable healthcare information utopia. Now, during this time of COVID, the NHS decided to use algorithms operating against the data to help determine which citizens should be prioritized for vaccination based on their age and medical conditions. As an aside, as a person who has committed much of their adult life to healthcare information technology, the situation I’m describing is exactly the kind of thing that we strive to enable: enabling software to help make good decisions quickly. So I want to say that I applaud the initiative and the concept, and nothing I say from this point forward should detract from that. Back to our story. Now, the NHS algorithms were run and the invitations to get the vaccine, or the jab as they call it in the UK, were sent out. When our hero Liam got his invitation, he was really confused as to why he was receiving it so early in the process. I mean, after all, he’s in his thirties and he has no chronic conditions. Why was he being prioritized above other people who were obviously more vulnerable than him? Because he had this concern, he contacted the CCG and asked: Why had he been prioritized? What did they know that he didn’t know? Was he a long-lost member of the Royal family? Or did he have some terrible condition that the NHS had not informed him of? Well, it turned out to be the latter: the CCG informed Liam that he was morbidly obese.

Now this was a surprise to Liam, who did not feel morbidly obese. I mean, could he lose a few pounds? Sure. But morbidly obese? “Nope,” the CCG informed him. “You are indeed morbidly obese because you have a body mass index, or BMI, of over 28,000.” 28,000. Now, for those of you who are not familiar with the formula used to calculate BMI, it is your weight in pounds, times a constant of 703, divided by the square of your height in inches. Now at a height of 6’2″, to achieve a BMI of greater than 28,000, Liam would have to weigh in at an impressive 224,000 pounds, which is roughly the weight of a real-world locomotive engine. Liam glanced around his flat, making sure it was not a wheelhouse, and informed them that there must be some kind of mistake. Nope. They confirmed that his weight on record was just a bit over 200 pounds, but that was not the issue. The issue was his diminutive height of 6.2 centimeters. According to the NHS, Liam was two and a half inches tall. So after what was probably a somewhat awkward conversation between Liam and the NHS, he was able to convince them that he was actually not 6.2 centimeters tall. That corrected his BMI, and he was appropriately slotted into a later set of immunizations.
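To make the arithmetic concrete, here is a small Python sketch of the imperial BMI formula Charlie describes, showing how one misrecorded unit of measure blows up the derived value. The function name and the exact figures (200 pounds, 74 inches) are illustrative stand-ins for the story, not NHS data.

```python
# Illustrative sketch (not from the podcast): a single unit-of-measure
# error turns a routine BMI into an absurd one.

def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """BMI from pounds and inches: weight * 703 / height squared."""
    return weight_lb * 703 / height_in ** 2

CM_PER_INCH = 2.54

# Liam as intended: roughly 200 lb at 6'2" (74 inches).
correct = bmi_imperial(200, 74)   # ~25.7 -- "could lose a few pounds"

# The recorded height of 6.2 was treated as centimeters (~2.44 inches),
# which is the kind of slip that produces a five-digit BMI.
buggy = bmi_imperial(200, 6.2 / CM_PER_INCH)

print(f"correct BMI: {correct:.1f}, buggy BMI: {buggy:.0f}")
```

The algorithm downstream never sees the slip; it only sees a number that says "morbidly obese," which is exactly why derived values need unit checks at the point of entry.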

The moral of the story, ladies and gentlemen, is that data quality matters, and it matters even more the minute we ask software to help us make decisions using that data. In this case, it resulted in a whimsical story that conjures adorable images of a tiny, albeit a bit chunky, figure, maybe dressed in a green suit and a top hat with some lucky charms (sorry, Liam), getting an early immunization, but it could also be something more serious. Something that resulted in a missed intervention or a wrong intervention. What I also would like to highlight about this story is that it brings into focus that data quality is not always about terminology. In this scenario, it was a ubiquitous entity that lives between terminology and quantitative data: the unit of measure. The shift in unit of measure from feet and inches to centimeters, this one little mistake, opened the door to algorithmic mayhem. I want to say thanks to metro.co.uk for sharing Liam’s story and providing me with the opportunity to turn it into a cautionary tale, and thanks to Liam for being a thoughtful member of society and thinking of others.

Let’s talk about semantic dissonance. The term “semantic” originates from the ancient Greek adjective “sēmantikos”, meaning significant. Typically, when we talk about semantics, it’s all about the expression of truth or meaning behind the words and tokens we choose to express information. When it comes to healthcare data quality, what we often struggle with is not semantics, per se. We struggle with semantic dissonance, and semantic dissonance is essentially where things don’t mean what you think they mean, or the meaning of things is not clear. Now, today I’m going to focus on two specific types of semantic dissonance. One is semantic ambiguity, and the other is semantic variance. Let’s start with semantic ambiguity.

Semantic ambiguity, well, ironically semantic ambiguity is a relatively semantically certain term. It is what it says it is: it’s where the meaning is ambiguous. It’s essentially when you have a piece of information, which could be a lexical fragment, a word, a phrase, a sentence, a code, or an entire paragraph, that was created to express something meaningful, but either it lacks the necessary specificity or you lack the context to comprehend the meaning within an acceptable level of certainty. Now, we experience this in everyday life. On my first visit to the UK, I was getting settled in at the hotel shortly after arrival, and before turning in for the night, my colleague, who happened to live in the UK, approached me and said, “Charlie, have a good night. I’ll come by your room and knock you up in the morning.” Now, I hadn’t had much sleep on the flight, so I was a little foggy, but even so my inner Inigo Montoya said, “I don’t think that word means what you think it means.” And before I was compelled to contact human resources, rather than go with the inconceivable interpretation of what my colleague said, my semantic background process concluded that he was telling me that he would knock on my door and wake me in the morning, which thankfully turned out to be correct.

So if we encounter this as humans when we’re interacting with one another, then certainly it’s going to happen when we’re sharing information between software platforms. In fact, it’s going to happen in a much worse way because of the way computers operate, right? So when we’re pushing around patient data, I like to think of a given patient’s medical record as a mosaic of sorts: a collection of informative entries that, when combined properly, have the ability to create a picture, a magnificent work of art. Work with me here. Now, did you know the small fragments of material that are used to make a mosaic are called tessera, or tesserae for the plural? So sticking with my mosaic metaphor, when assembling the mosaic for a given patient, we are provided with a variety of tesserae from different domains, representing various types of information and differing degrees of semantic ambiguity. For example, if you have a lab result with a LOINC code, a valid result, and a unit, the meaning of that piece of data, that tessera, is relatively clear or semantically certain. However, if you get a local code for a condition that is abbreviated or truncated and devoid of context, it can result in semantic ambiguity, with the actual meaning uncertain. When you have these tesserae that lack resolution or clarity, and you place them into your mosaic and pass them on, they impact the quality of the overall picture. If you have too many of these ambiguous tesserae in your mosaic, you wouldn’t be able to tell if it was a portrait of your uncle Jim or a bowl of fruit. Now, the origin of these ambiguous tesserae can also vary. It could be that the terms were created by a bad process 10 years ago, by a guy that’s working at the “gas and sip” now. It could be that they were discrete text, hastily entered into a patient’s record so that someone could rush off to deal with an emergency.
It could be the slip of a finger on a mouse button that chose the wrong unit of measure, like with our friend Liam. Regardless of the cause, these ambiguous tesserae will blur the mosaic and wreak havoc on those trying to make sense of it in order to make decisions based upon it, be they human or be they software. Now, we are humans, and as such, we are wired to wrangle uncertainty to a certain degree. We look at a cloud and see a bunny. When we look at a blurry mosaic, we mentally erase the tesserae that confuse the image we believe is there. That’s how we’ve dealt with patient data in healthcare for the last 20 years.

Unfortunately, computers are not as good at seeing the bunny in the clouds. Software requires a certain amount of resolution to see the picture, and the more we try to leverage software to help us in healthcare, the more we experience the impact of the ambiguous tesserae in our patient mosaics. So this whole notion of placing information into the patient data that is unclear, lacking resolution, (and) lacking context is semantic ambiguity, and it wreaks havoc because it increases the likelihood of us changing the story and misinterpreting that mosaic as it goes from place to place to place.

So what about semantic variance? Well, if semantic ambiguity creates confusion because we don’t provide enough information to establish certainty, semantic variance creates confusion because we have too many ways to express the same thing, or the same meaning. So let’s take a look at a very real and poignant example. In 2020, we had an event occur in healthcare. You might’ve heard of it. We had this thing happen called COVID-19. When emergent crises happen in healthcare, like COVID, we often find ourselves unprepared to deal with them in many ways. One of the ways we can be caught flat-footed is not something that will necessarily make the evening news, and that’s when we lack the terminology and the software to express the information we need to collect, move, and display. Now, when this happens, as it did with COVID, the people on the front lines need to create terminology. They need to have the codes to indicate a patient is being tested for the condition, has resulted positive or negative for the condition, has the condition, is being treated for the condition, or has been immunized against the condition. This happens fast, relatively speaking, and in each silo, at each organization, people create these unique tesserae and start putting them into their local mosaics for their patients. Then at some point in the future, they send the mosaics out into the world and hope that everyone that encounters them understands what they meant. Now, in the case of COVID, this happened with each enterprise and testing facility sharing COVID test results with the folks charged with reporting those statistics to our government. Now you might be thinking, “Charlie, how difficult could it be to understand that someone was being tested for COVID, and whether they were positive or negative? It’s just a handful of tests, and they are positive or negative.
Surely it’s not that challenging.” Now I’m going to bypass the obvious “Don’t call me Shirley” joke and answer your question with a question: Assuming that the labs were only sending the results of COVID tests, and further assuming that they were sending only one variety of test results, how many variations would you guess there were saying that the patient was positive for COVID-19?

Now, I’m not saying positive or negative, just positive. How many lexical variations do you think were encountered, when the data was aggregated and analyzed, to say positive as a result code under a COVID-19 test? The answer… Well, wait, wait, wait, write it down. I want you to write it down on a piece of paper and then I’ll tell you. Take your time, go ahead. Did you write it down? All right. I trust you. The answer, ladies and gentlemen, is 73. That’s right, 73. That means that across healthcare, knowledgeable subject matter experts in the trenches, trying to stay ahead of a pandemic, were able to say positive 73 different ways. I’m not complaining. Those folks did nothing wrong. In fact, if you were to take the spreadsheet that I have and review each of the 73 lexical variations, you would likely conclude they’re all correct from the point of view of the person that created them. But the fact remains that this variation caused a nontrivial amount of work when it became necessary to combine all of these tesserae into a mosaic that made sense, and that work is still being done to this day.

So what is the root cause of semantic ambiguity and semantic variance, and semantic dissonance in general? Well, like with many things, my dear earthlings, the problem is us; it’s humans. Don’t get me wrong, I’m pro-human. Many of my dearest friends are humans. I’ve been accused of being a human myself from time to time. But these problems all begin when we humans create information intended to represent concepts in healthcare, or really in any system for any purpose. The human creating the code has something in their mind. They assign a code and select a lexical representation of the concept they want to express in their language of preference. They might have limitations in terms of the length of what they can put in, they might not know how to spell it correctly, they might over- or under-specify what they were thinking, or even though they know exactly what they meant, the words they chose were insufficient to describe what they were thinking without the context. But in their mind, in that moment, it made total sense to them. So you might be wondering: How often does this happen? How often does something like this get created? Well, it happens all the time. It’s happening right now. Organizations are constantly creating terms that are used to describe the things that happen in their enterprise: medical conditions, medications, lab tests, charge items, departments, you name it. You might be creating one right now while you listen to this podcast. So pay attention. Humans using software are prolific creators of terminology, and often they do it in a relatively indiscriminate fashion. It’s easy to create terminology. We can even create terminology accidentally, and in the old days it was relatively harmless, because when we made something terrible, it only hurt us. We reaped what we had sown. But nowadays we create terminologies and send them out into the world in the name of interoperability.
Once in the world, bad terminology choices grin maniacally and create mayhem like the gremlin on the wing of the plane in that Twilight Zone episode. (William Shatner, John Lithgow, whichever one works for you.)

Now, there are several reasons why I chose to talk about semantic dissonance in today’s podcast. The first is just to share the concept of semantic dissonance to make you think about it a little bit. Because awareness is the first step, and because we are creating these issues every day. We’re using them and sending them out to create chaos while William Shatner stares apoplectically at them through the airplane window. Now, if we humans are the root cause of semantic dissonance, then the first step to improve our situation is to understand the impact of our actions on our ecosystem. Once we understand the impact, we can focus on prevention. Now by prevention, what I mean is when you’re creating terminology… So let me talk about what I mean by creating terminology. Creating information is really what I’m talking about. You’re creating this tessera that you’re going to plug into the mosaic, and it could be a term that’s going to be plugged in thousands of places. It could be a discrete text entry that you as a provider, or as a BI person, are plugging into a piece of data, or it’s data entry, where you’re clicking on combinations of things. All of these situations create tesserae that go into the mosaic of patient data and business data that we’re sharing across healthcare. So the first rule of thumb is: be mindful of that. Now, if you’re creating terminology, you want to do a couple of things. First, you want to avoid semantic ambiguity. And to do that, you want the term to be meaningful. You want to say everything that you’re thinking, that you can fit into that term, in a way that a person who’s selecting it, mapping it, (or) seeing it in analytics understands what you mean. Now, you may not always be able to do that, but you should strive to do that so the term itself is not ambiguous. It is chock-full of prima facie meaning.

Now let’s say you can’t do that, and this is something a lot of people don’t think about. Anybody that’s ever built or worked in a content management system knows that there are usually fields for the code and the term, and then there’s another field. You know the one I’m talking about: the one after the name of the term itself. It might be called description, rationale, inclusion/exclusion criteria, or editorial policy; and it’s the thing that’s always blank, because at the moment we create the term we’re like, “I know what I mean.” The problem is when you don’t fill that out; that field is your window into your brain in the moment when you’re creating that thing. So my advice is to fill it in, because when you don’t, it’s just like my notes. If you’re like me, when you’re in a meeting and you take notes, you occasionally will go back to look at your notes and you’ll look down and you’ll see that you wrote the word “disco ball,” or at least it looks like you wrote disco ball. And you’re like, “Why on earth did I write ‘disco ball?’ What was I thinking? What did that mean?” That’s exactly what happens with terminology: you create a term and then, six months later, somebody at your exchange partner, or the registry, or CMS, or somebody says, “So we got this thing. We have this local term from your system. What did you mean by disco ball?” Now, at the time, you knew exactly what you meant; that’s when you should have filled in that description field, because that’s when it comes in handy: when somebody down the road asks you, “What does this mean?” It’s semantically uncertain, semantically ambiguous, and you’re looking at the term and you can’t remember what you meant either.
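As a rough illustration of that advice, here is a hypothetical Python sketch of a local term record that simply refuses to exist without a filled-in description. The class name, field names, and example codes are invented for this example; they are not from any real content management system.

```python
# Illustrative sketch: make the "window into your brain" field mandatory,
# so a term cannot be created without recording the author's intent.
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalTerm:
    code: str          # the local code, e.g. "LAB-CVD-POS" (hypothetical)
    display: str       # what users see when they pick the term
    description: str   # the author's intent, captured at creation time

    def __post_init__(self):
        # Refuse "disco ball" terms: if the intent isn't written down now,
        # nobody will remember it six months from now.
        if not self.description.strip():
            raise ValueError(f"term {self.code!r} needs a description")


term = LocalTerm(
    code="LAB-CVD-POS",
    display="COVID-19 Positive",
    description="Positive result on our in-house SARS-CoV-2 PCR test",
)
```

Attempting `LocalTerm("X1", "Disco ball", "")` raises a `ValueError`, which is the point: the cheapest moment to capture meaning is the moment of creation.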

So that’s just a little piece of advice from me, because I’m a lousy note taker, and it’s always better if you don’t get caught flat-footed; because then you just feel silly. Take advantage of those types of fields, and even if you think a term is semantically certain, that it makes total sense on its own, it’s still worth putting a little information in there for the people that follow you; they might feel differently about what you came up with. So that’s Thing Number One.

Thing Number Two… So that’s semantic ambiguity: try to include the context, and try to have a place where you can put the context if it’s not something that you can fit in the term. When it comes to semantic variance, the answer is: if you put, well, apparently 73 people in a room and say, “Tell me how you want to say that they were positive,” they could come up with 73 different things. And the best thing we can do when we’re coming up with something is see what the universe has already put out there. In some cases, you can go to places like SNOMED or Regenstrief, or CMS; there is a plethora of places. You can even come to Clinical Architecture and say, “Clinical Architecture, how should I say this?” My feeling is, when it comes to things in public health like COVID, when brand new things are coming out and we don’t have names for them, especially if it’s something like a virus, we probably should come up with some public health mechanism where we seek to create a semantic alignment, where we agree that we’re all going to call it “this.” We’re all going to say “Detected” for positive. We’re all going to call it the Novel Coronavirus 2019 for the test name. Now, why is that important? Because you might argue, “Charlie, anybody that looks at detected by this, detected by that, positive by this, a plus sign, anybody that sees that can tell that it is positive. What’s the big deal?” Well, if we can say “positive” 73 ways, imagine how many ways we can come up with to say the next virus name. And you might not think it’s a big deal, but I live on the fringes of interoperability, like the gremlin on the wing.
I live out there and I’m seeing all these things moving, and we at Clinical Architecture, and the people that do the mapping, and the people that live in the bowels of our systems who are responsible for making all this data move and work, we know that there’s a big difference between 73 ways of saying things and four ways of saying things, for example. If we have semantic agreement and we’re calling something the same thing, the effort required to map that, and to feel good about how we’ve mapped or harmonized it, is much, much less than when I’m wrangling 73 different ways of saying something, or when I’m trying to understand your synonymy, or the way you’ve truncated the word Coronavirus, to make sure that it’s not something else. And if it’s something established and you’re creating a local term, I usually say, “Go shopping in the standard terminology. See what they call something.” Even with something like LOINC, what LOINC calls things from a mapping perspective and the name you might be thinking of can be very different. So understanding what the standards call it is great, because the standards are typically what you’re normalizing to. But in a case where you don’t have a standard, try to go out and find some group or some authority that says, “Your best bet is to call it this.” One of the things that I’d like to establish, (if it’s not already established, or to work with a group of people to establish,) is a way to provide that kind of guidance when we’re in a public health emergency and we’re coming up with these terms in a hurry. And that kind of guidance did come out relative to COVID. It just came out a little while after people had already had to start creating the terms to be able to manage the data.

So, ladies and gentlemen, that is my podcast on semantic dissonance, and I hope you enjoyed it. I hope you find it interesting. In fact, I even hope you want to argue with me about it, so don’t hesitate. If you want to engage, drop me a line. I’d love to chat about it. I’m always happy to be corrected. This has been the Informonster Podcast on semantic dissonance. I’m your host, Charlie Harp. Thanks again for listening, and take care of yourself.
