The Informonster Podcast
Episode 4: Charlie Harp talks SNOMED with Shaun Shakib, Andrew Frangleton and Victor Lee
December 20, 2019
In this episode of the Informonster Podcast, Charlie sits down with Clinical Architecture thought leaders Shaun Shakib, MPH PhD, Andrew Frangleton, and Victor Lee, MD, to discuss SNOMED. They will talk about what SNOMED is, how it is used by healthcare applications, the benefits of using SNOMED, potential obstacles and how to avoid them.
Hi, I’m Victor Lee, V.P. of Clinical Informatics.
I am Andrew Frangleton, the Managing Director of the U.K. for Clinical Architecture.
I’m Shaun Shakib, the Chief Informatics Officer.
And today on the Informonster Podcast, we’re going to talk a little bit about the SNOMED C.T. ontology, how you can use it, some of the pitfalls and benefits and things you have to keep an eye on when you do use it. We have some folks here that have a lot of both practical and academic and direct experience working with SNOMED and working on SNOMED. So we thought it’d be a good opportunity to just share some insights for people that are trying to figure out how they can make the best use of what is really a fantastic resource. Let’s start with talking about what’s in SNOMED; in a nutshell, what kind of concepts live in SNOMED?
It is officially a clinical terminology, whereas other code systems out there, like I.C.D., are officially classification systems. The difference between a terminology and a classification system is that the role of a classification system is to create these large buckets to aggregate and categorize data, whereas a terminology’s role is to capture and encode data at the level of granularity that you would expect the clinician to care about.
And the other thing about SNOMED that I’ve always kind of noted is, you know, you have the semantic types that live in SNOMED, and then the different domains that live in SNOMED, but you also have a category of things that are not really meant to be terminology; they’re meant to qualify terminology.
There’s a lot of terms that are in SNOMED, which are to do with the structure of the ontology, rather than being useful clinical terms. And the other thing about SNOMED is, of course it’s covering drug, and lab, and disease, and disorder, and procedure, and other areas. Whereas a lot of the other terminologies and classifications are really focused on one domain of knowledge, like drug or lab or procedure.
And I’ll just make a comment just to kind of keep things really simple, in case we have listeners who need the context. We’re really talking about structured data. There are some advantages to having data that’s in a structured format that can be machine computable. And obviously there are things you can do with that kind of information, as opposed to unstructured text – just narrative, free text – that makes it easier for us to understand the clinical information that we’re trying to capture.
So Victor makes a great point. Any terminology that you see, whether it’s healthcare or somewhere else, instead of just having a note that a human can read, you’re really trying to enable a piece of software to be able to do something useful. And clinical terminologies, like SNOMED, do exactly what Victor described, as they allow software to do things like analytics and decision support, and other useful things that would be much more difficult if we were trying to parse apart unstructured, textual information and deal with it. One of the things that I think about, when I think about SNOMED, is obviously it’s a terminology. For those of you that don’t do a lot of things with terminologies and ontologies, (you should look) to Shaun’s point about how a terminology is designed to describe things at a fairly granular level. The other thing about SNOMED is it’s also an ontology, so it has codes that represent these things that allow you to represent things at a granular level. It has a set of codes that are really descriptive and (used for) supporting things, but it also has a very rich set of relationships that show how these things relate to one another, including this big overarching relationship tree, or graph, in SNOMED, that is kind of, this is a hierarchy (sic). I don’t know if you want to talk about it as a hierarchy.
Sure. And Victor’s point sort of triggers something else, which is SNOMED is “S N O M E D”, not “S N O W M E D”, which is one sure way to tell that you haven’t had experience with the terminology before. And so the way content is related and organized within SNOMED is through relationships and associations that are subsumptive. And by that, I mean there are these parent-child relationships; you can think of those as being like the vertical relationships, starting at the top of a tree taxonomy, and then just going down, you know, from parent to child, those types of associations. And then there are other semantic associations. Those are the horizontal types of relationships and associations, so those would be things that are inbound relationships or associations, or outbound relationships and associations. You can think of those as being the vertical bar versus the horizontal bar. Examples of semantic relationships and associations would be things like, for a medication, the ingredient, the strength, the form, the route.
So it lets you kind of determine what something is in its position in the ontology, who its parent is, who its children are, the thing that is less or is broader than it, and the things that are more specified versions of it, as well as things that it breaks down into or decomposes into, and, in some cases, things that for some reason are related to it.
There are sort of practical ways and there are sort of navigational ways that you can use the ontology. So for example, if I was trying to find a particular group, what I might do is pick one of its members, and then navigate the ontology to go up and find its parent, and then find the thing that had organized or created the group. So if I wanted to find viruses, I might pick a virus like C.M.V., search for it, and then start walking up within SNOMED to its parent, and then to its parent, until I found the larger classification or group that had all the things that I cared about.
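The walk up the hierarchy described here can be sketched in a few lines of Python. This is only an illustration: the `PARENTS` map and the concept names in it are made up for the example and are not actual SNOMED content. Because SNOMED is a polyhierarchy, each concept maps to a set of parents rather than to a single parent.

```python
from collections import deque

# Hypothetical child -> parents map; names and structure are illustrative,
# not actual SNOMED CT content.
PARENTS = {
    "Cytomegalovirus": {"Herpesvirus"},
    "Herpesvirus": {"DNA virus"},
    "DNA virus": {"Virus"},
    "Virus": {"Organism"},
    "Organism": set(),
}


def ancestors(concept, parents):
    """Return every concept reachable by walking parent links upward."""
    seen = set()
    queue = deque(parents.get(concept, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another parent path
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return seen


print(sorted(ancestors("Cytomegalovirus", PARENTS)))
```

Starting from one known member and collecting its ancestors gives you the candidate grouping concepts to choose from.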
Searching things in SNOMED is interesting to think about because SNOMED exists for kind of two reasons. It exists to allow us to find terms as humans, but it’s also trying to organize terms that are structured for computation and data analytics, decision support, and things like that. So as a human, when you start looking at SNOMED, understand the organizational structure of the parent-child relationships, the other ontological relationships, but sometimes things aren’t quite what they seem. And Victor, you and I were talking earlier on today about hypertension, and the fact that, if you looked at hypertension, you looked at the children, you don’t always get what you seem to think.
Yeah. That’s a great classic example. So if we’re trying to, for example, define this concept of hypertension, and enumerate the SNOMED codes that roll up to this concept of hypertension, we might be tempted to say, “Let’s pick the ‘hypertensive disorder’ term and include all of its descendants”. And to your point, and really this is because SNOMED is a polyhierarchy, where certain concepts may live multiple times within SNOMED, we might find common things like primary and secondary hypertension, but also things like pregnancy-related hypertensive disorders like eclampsia, and preeclampsia, and HELLP syndrome, and other obstetric-related hypertensive disorders that we might not classically consider to be what is hypertension in the vernacular. And so we need to be mindful of how we leverage the richness of SNOMED. And sometimes it’s a little bit too rich and inclusive for a particular use case, so you kind of have to pay attention to these hierarchies.
That’s one of the things: as an engineer in healthcare, we go to these data sources and we really want to think they are a magic bullet that’s going to fix the problem. And one of the things you have to be careful about with SNOMED (and this is true with a lot of these assets that are available) is that they were built based upon a policy and approach, and that doesn’t mean it’s universally applicable when you’re trying to do things. You still might need to go in and curate the things that you want. There are certain things where you can use SNOMED out of the box, and it’ll work great depending upon what your objective is, but you can also kind of think of SNOMED as like going to the grocery store. I want to build a value set that has terms that are relevant for my clinical use case, and I may not be able to just grab a node and pull all of its children. I might have to go and, like with your hypertension example, go in and pull some things out because for what I’m trying to do, even though they’re in that part of the hierarchy, it’s not appropriate. And that’s not necessarily saying that the people that maintain the SNOMED content made a mistake, but their approach, their policy included those things, and that’s just the way they wanted it organized.
Again, when you look at that SNOMED content, there’s a lot of overlapping content. We mentioned before that SNOMED is maintained in different domains, like drug, lab, and procedure. There are some finer-grained domains as well, like the observable entities, and findings, and scales and assessments. And the interesting thing about those three areas is, if you were building your value set and you’re looking for maybe a depression score, as an example, there are three places you might find that depression score. You might find it under assessment and scales, which is really the definition of, “This is a depression score”, and it’s just saying, “This is the type of score it is.” You’ll also find it mentioned under the findings where, if that depression score had a value outcome of one to nine, you’ll find the depression score with a value of one, depression score with a value of two, through to nine. But you’ll also find it under the observable entity hierarchy, where it’ll just say it’s depression score, and you’re expected to supply an answer to that question and add a value of one, two, three, four, five, six, seven, eight, nine into your storage when you store that data. So when you’re building these value sets and you’re trying to choose which things go in your value set, you have to be very careful which domain in SNOMED it comes from. And to do that, there’s a construct called the fully specified name. And if you look at the fully specified name, you can always see the semantic tags so that you can understand whether a term is a procedure, or a lab result, or an assessment scale, or any of the other domains that they have.
And this is a good time to point out, and I’m not going to get into the weeds of this, I’m not that guy, but when you look at the way SNOMED is structured, from a data architecture perspective, you have a concept table and the concept table does not have a description. It has a relationship to a series of descriptions that have purposes. Like one is the preferred term for a given language, and then you have the fully specified name, and then you have alternate terms. And SNOMED is designed to be international. The concept is kind of the semantically unique identifier. The descriptions are representations that have been assigned to that unique concept in that given language. There are modules and other things that I don’t really get involved in as an engineer because I’m just concerned with the U.S. extension. So if you’re a developer, for example, and you crack open the SNOMED file, you can go to the concepts table. You’re going to see a big file full of numbers. You have to join that on the descriptions table to be able to start seeing what those concepts actually are. And that was changed from a few years ago, if I remember correctly.
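The “grab a node, then pull some things out” approach can be sketched as follows. This is a toy illustration under stated assumptions: the `CHILDREN` map is a heavily simplified, made-up fragment, and `build_value_set` is a hypothetical helper, not a SNOMED tool or a real product feature.

```python
# Hypothetical parent -> children map; a simplified illustration only.
CHILDREN = {
    "Hypertensive disorder": [
        "Essential hypertension",
        "Secondary hypertension",
        "Pregnancy-related hypertension",
    ],
    "Pregnancy-related hypertension": ["Pre-eclampsia", "Eclampsia"],
}


def descendants(concept, children):
    """Collect all concepts below `concept`, at any depth."""
    found = set()
    stack = list(children.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, ()))
    return found


def build_value_set(root, excluded_branches, children):
    """Take the root and its descendants, then prune the unwanted branches."""
    members = {root} | descendants(root, children)
    for branch in excluded_branches:
        members -= {branch} | descendants(branch, children)
    return members


value_set = build_value_set(
    "Hypertensive disorder", ["Pregnancy-related hypertension"], CHILDREN
)
print(sorted(value_set))
```

Keeping the exclusions as an explicit list documents the curation decision, so it can be reviewed again when the source content changes.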
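Since the semantic tag is the trailing parenthesised part of the fully specified name, pulling it out is a small string operation. The sketch below assumes that convention holds for the names you feed it; the example names are illustrative and not copied from a SNOMED release.

```python
import re

# Matches the last "(...)" group at the very end of the name.
_SEMANTIC_TAG = re.compile(r"\(([^()]+)\)\s*$")


def semantic_tag(fully_specified_name):
    """Return the trailing semantic tag, or None if the name has none."""
    match = _SEMANTIC_TAG.search(fully_specified_name)
    return match.group(1) if match else None


# Illustrative names only.
for name in (
    "Depression score (assessment scale)",
    "Depression score (observable entity)",
    "Depression score of 3 (finding)",
):
    print(name, "->", semantic_tag(name))
```

Filtering candidate value set members on this tag is one simple way to avoid mixing a scale, an observable, and a finding in the same set.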
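A minimal sketch of that join, using in-memory rows instead of the real files. The column names (`id`, `active`, `conceptId`, `typeId`, `term`) follow the release format as I understand it; the identifiers and the `"FSN"` / `"SYNONYM"` type values are placeholders, since the real files use numeric concept identifiers for description types.

```python
# Placeholder rows; real release files are tab-delimited and use numeric IDs.
concepts = [
    {"id": "C1", "active": "1"},
    {"id": "C2", "active": "0"},
]
descriptions = [
    {"id": "D1", "conceptId": "C1", "active": "1", "typeId": "FSN",
     "term": "Example disorder (disorder)"},
    {"id": "D2", "conceptId": "C1", "active": "1", "typeId": "SYNONYM",
     "term": "Example disorder"},
    {"id": "D3", "conceptId": "C1", "active": "0", "typeId": "SYNONYM",
     "term": "Old wording"},
    {"id": "D4", "conceptId": "C2", "active": "1", "typeId": "FSN",
     "term": "Retired thing (finding)"},
]


def describe(concepts, descriptions):
    """Join active concepts to their active descriptions, keyed by concept id."""
    by_concept = {}
    for row in descriptions:
        if row["active"] == "1":
            by_concept.setdefault(row["conceptId"], []).append(row)

    result = {}
    for concept in concepts:
        if concept["active"] != "1":
            continue  # skip inactive concepts
        rows = by_concept.get(concept["id"], [])
        result[concept["id"]] = {
            "fsn": next((r["term"] for r in rows if r["typeId"] == "FSN"), None),
            "synonyms": sorted(r["term"] for r in rows if r["typeId"] == "SYNONYM"),
        }
    return result


print(describe(concepts, descriptions))
```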
So that whole area of just sort of dealing with consuming SNOMED generally is an interesting topic area. Andrew, what you were talking about earlier today was how, for example, we do the SNOMED U.S. Edition versus the U.K. Edition. Now might be an interesting time to hear about it.
Well, can we just talk about how the editions work relative to the core?
It does differ country by country, but if we just take the U.S. and the U.K. as two examples, SNOMED International publishes a clinical core, which has the SNOMED International concepts that are authored by SNOMED International. Each country can publish their own extension to SNOMED, and actually as an organization, you can also publish your own SNOMED extensions as well. When you get SNOMED in the U.S., fortunately for you folk over here, there’s been a U.S. Edition created, which gives you one place to go and you’ll find one concept table, one description table. If you go to the U.K. and you want to use SNOMED in the U.K., structurally it’s the same. But actually there are three components that you need to build the equivalent of a U.K. Edition, and those are the International Core, the U.K. Clinical Edition, and the U.K. Drug Edition. The other major file we haven’t mentioned is the file that maintains and contains the relationships between concepts. So when you’re building the U.K. Edition, you have to take those three sets of data and you have to process them to create a single U.K. Edition. There are then several complications because you end up with similar term descriptions coming from multiple places in American English and British English. So you have to then look at the ref set contents in SNOMED, which contain appropriate pointers to which languages, which spellings, which words we want to use. So the choice between displaying a term as paracetamol or acetaminophen is controlled by these realm ref sets. So when you’re building SNOMED and consuming it out of the box, you need to be very cognizant of the region you’re in and how you build the edition for that particular region.
It’s funny because SNOMED is one of the things that drove us to create the subscription portal. When we built that, it was people just struggling with the process of creating the right set of data so they could consume it and make use of it. Because one of the things that I find, with standards in general, is they create these fantastic resources. And, though this is less true now than it was, say, five years ago, you have to talk to the invisible swordsman and you have to walk through these convoluted steps to be able to get something that you can actually put your hands on and work with. For a lot of people, it can be very daunting and it can be a big barrier to adoption.
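The paracetamol versus acetaminophen choice can be sketched as a lookup through a language ref set. This is a simplified illustration: the field names follow the release format as I understand it, while the ref set names (`GB_ENGLISH`, `US_ENGLISH`), the `PREFERRED` value, and all identifiers are placeholders standing in for the numeric identifiers in the real files. It also ignores description type, which a real build would need to consider.

```python
# Placeholder data: one concept with two spellings of its display term.
descriptions = {
    "D1": {"conceptId": "C1", "term": "Paracetamol"},
    "D2": {"conceptId": "C1", "term": "Acetaminophen"},
}
language_refset = [
    {"refsetId": "GB_ENGLISH", "referencedComponentId": "D1",
     "acceptabilityId": "PREFERRED", "active": "1"},
    {"refsetId": "US_ENGLISH", "referencedComponentId": "D2",
     "acceptabilityId": "PREFERRED", "active": "1"},
]


def preferred_term(concept_id, refset_id, descriptions, language_refset):
    """Return the term marked preferred for this concept in the given ref set."""
    for member in language_refset:
        if (
            member["active"] != "1"
            or member["refsetId"] != refset_id
            or member["acceptabilityId"] != "PREFERRED"
        ):
            continue
        description = descriptions.get(member["referencedComponentId"])
        if description and description["conceptId"] == concept_id:
            return description["term"]
    return None


print(preferred_term("C1", "GB_ENGLISH", descriptions, language_refset))
print(preferred_term("C1", "US_ENGLISH", descriptions, language_refset))
```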
I don’t know that it’s less true.
Well, I think for some things like, we’re not going to talk about the U.M.L.S. Metathesaurus, but the U.M.L.S. Metathesaurus, it’s an adventure to get a copy that you can work with. SNOMED, if you’re getting the raw file, some assembly is required to be able to make use of the content. And then, of course, you have to also do similar work when the content updates come out.
It is kind of funny that, if you look at different browsers that are available out there, you can look at what claims to be the same version of SNOMED, but you’ll find different preferred terms and different representations. It just shows the complexity of doing the build, but there’s also another set of supporting files. I think I mentioned the ref set files, but there’s also several different versions. So (you have to consider) whether you want a snapshot version, which is just SNOMED as it is today, or whether you want a version which includes all the history information for all those terms. Because terms in SNOMED are dynamic, they have a status on them which is active or inactive, and that’s another thing you’d have to manage when you’re applying updates to SNOMED. You have to be very aware of how to use those status flags, as well as just building the real words and relationships.
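The relationship between a history-bearing release and a snapshot can be sketched like this, assuming each row carries `id`, `effectiveTime` (as a `YYYYMMDD` string), and `active`, as I understand the release format. The rows themselves are invented for the example.

```python
# Invented rows: every historical version of two components.
full_rows = [
    {"id": "C1", "effectiveTime": "20020131", "active": "1"},
    {"id": "C1", "effectiveTime": "20150731", "active": "0"},
    {"id": "C2", "effectiveTime": "20100131", "active": "1"},
]


def snapshot(rows):
    """Keep only the most recent version of each component."""
    latest = {}
    for row in rows:
        current = latest.get(row["id"])
        # YYYYMMDD strings sort chronologically when compared as text.
        if current is None or row["effectiveTime"] > current["effectiveTime"]:
            latest[row["id"]] = row
    return latest


def active_ids(rows):
    """Ids whose latest version is flagged active."""
    return {cid for cid, row in snapshot(rows).items() if row["active"] == "1"}


print(sorted(active_ids(full_rows)))
```

Note that `C1` drops out because its latest row is inactive, which is the status flag handling mentioned above.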
When it comes to utilizing SNOMED, one of the things that we do a lot is we build subsets. SNOMED has probably close to or over half a million terms. When people are trying to leverage SNOMED, one of the best things to do, and you’re doing that, is essentially carve out a section of relevant terms. You do that also because there are things that can be conflated. There are qualifiers that have very similar names or similar words to things that are disorders or conditions. If you’re trying to create something, say, for exchange and meaningful use, and they require a SNOMED code and you pass a qualifier code in, when you really should be passing a condition, you might be adhering to the spirit of the law because it’s technically a SNOMED code, but it’s really probably not in the value set that they’re thinking about when they wanted you to deliver a piece of information. Everybody here has built a SNOMED subset at some point or another. Any pointers or tips on that? Victor?
Well, I wonder if it might be helpful, for starters, to maybe define what we mean when we say subset, as well as you and Andrew both mentioned value set. And Shaun, we were talking the other day, and I kind of liked your definitions as you were describing similarities and differences between subsets and value sets. So I don’t know if you wanted to share?
That’s a pretty thorny topic, I would say, because you know, the problem is and the challenge is that it’s our jargon. One thing that’s absolutely true, (about) the terminology space and terminology geeks generally, is we don’t all have the same terminology. So at least when it comes to the way we think about it, a value set can include one to many terminologies as members in that value set. And they’re actually kind of rolling up to these, I mean we refer to these things as elements, but it’s like a variable to be used for other purposes in decision support logic, or some other place. So we create that roll up, we source multiple external terminologies. So SNOMED would be one of those, but I.C.D., C.P.T., RxNorm could be other terminologies that would be sourced for that purpose. For us, “subset” is the grocery store in the way Charlie was talking about it. So I want to go to SNOMED and I want to actually reduce the noise when I’m trying to map to a target; the way I can do it is I can raise these bars up to say, “Yeah, this is the area of SNOMED I’m focused on. I want to map to concepts within this, or I want to use concepts within this area.” So I create a subset, and we create a subset definition that lets me do that. That subset can be (that) I just want to grab anatomic locations from SNOMED and work with those. So I’ll create a definition, grab that subset of terms. But typically it’s not across multiple-sourced terminologies and it’s not really rolling up to something that we would call an element, you know, in our jargon.
If you’re reading through the SNOMED documentation, you’ll find that they call them ref sets.
Yeah, that’s why I talk about-
I think that that was done deliberately so that people using SNOMED could understand (that) this is what we mean by ref set. And there are different kinds of ref set, but one of the kinds of ref set does match onto a subset and/or value set.
When I first joined Clinical Architecture, Charlie had a different name for everything, right? He didn’t use any of the standard informatics names, and it was deliberately to avoid confusion, right? The fact that everybody sort of has their way of interpreting these. Mapping is another great (example of an) overused term in terminologies.
When I think of the patterns – from an engineering perspective, I think about patterns – and there’s a pattern where you have a collection of concepts that all come from the same place, and I tend to think of that as a subset. When I have a collection of things that come from multiple places, I tend to think of that as a “super-set”, but I think we tend to call those value sets. And we use those in places like quality measures and H.E.D.I.S. and things where we’re saying, “I know that in patient files, they might have documented using one of these code systems. I want to be able to recognize that a patient has diabetes mellitus through this editorial policy. And so these are the concepts in those different code systems that I want to be able to recognize as diabetes mellitus,” to your point earlier. I think, when we build those things, we build them for rule processing, whether it’s an inference or a quality measure, or you’re rolling things up to recognize and perform an action. But we also have built these things to create a narrower target when I’m trying to either map or sift something, and land on a particular subset of concepts. We could also do it to create pick lists of things. I mean, when you talk about the grocery store of SNOMED, a demo I’ve done probably a hundred times is regular frequency. Where you want to create a pick list of frequencies, you could recreate everything from scratch, or you could go on to SNOMED and say, “What do they have?” You have the benefit of “A”, not having to do the research and the work; and “B”, having it be based on a standard terminology. Now this is probably a good time to talk about some of the things you have to keep in mind, if you’re using a standard terminology. I always tell people, “Standard terminologies are great, but you have to recognize that you don’t have control.” At some point, your standard provider will make a change or do something.
Your only recourse is to react to that change and find some way to absorb it. And SNOMED actually does a good job of providing guidance on replacements. Not all terminologies do that. Sometimes they give you very good guidance; (it will) say, “This thing was retired, but it’s been replaced with this thing.” And as long as you can follow that process, you’re in good shape. The other thing about using an external terminology is they may not have something. And one of the things that you guys know much better than I do is there’s a process. If there’s a concept that is not in SNOMED or in my extension of SNOMED, and I want somebody to add it, there’s a process for that, right?
Yeah. There is a process and it’s normally controlled by a regional release center. So there’s a slightly different process in the U.S., and the U.K., and in other countries, but normally you can request a term and, if the term is genuinely felt to be useful, that can be put into a local country extension; it’ll be published. And, in the future, that might get promoted into the core and shared between countries, but typically the collation of that is done at a regional level; a country level.
Do those regional extensions ever get promoted to the international core? Does that happen on a regular basis? Are pigeons released? What happens there?
If the term is added into a regional core, it will get sent up to the folks at SNOMED International, and they’ll look at that and decide whether it’s appropriate to put it into the international core. So that can and does happen on a regular basis.
It sounds like an opportunity for one of those Schoolhouse Rock videos, you know, “I’m just a term”? OK, never mind. (laughter)
Well, I mean, the other option is to just create an extension, right? If an area is not well addressed in SNOMED, and you need concepts there, you can create a formal SNOMED extension.
Do you have any obligation to share that with SNOMED, or are you just protecting a numeric range for yourself?
It’s really just giving you a strategy to use SNOMED as your backbone terminology, but yeah, there’s no obligation.
Typically, if you’re going to do something like create an extension, you almost have to have some kind of a protocol for recognizing when the thing you created in your extension has been added to either your regional edition or the core and then have your own replacement logic. If you want to do that –
And that’s going to be your strategy.
Exactly. So, if you’re thinking about creating a SNOMED extension, you have to realize that, to be compatible with SNOMED, you have to be aware that, if they add it, you probably should do your due diligence, and retire your extended term and create a map to the term that was replaced by SNOMED.
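The retire-and-map step can be sketched as a lookup that follows a replacement map until it lands on a code with no further replacement. The `replaced_by` map and the codes in it are hypothetical; in practice the map would come from your own retirement records or from the replacement guidance the terminology publishes.

```python
def resolve(code, replaced_by):
    """Follow replacement links from a retired code to its current code."""
    seen = set()
    while code in replaced_by:
        if code in seen:
            raise ValueError(f"replacement cycle detected at {code!r}")
        seen.add(code)
        code = replaced_by[code]
    return code


# Hypothetical codes: a local extension term later superseded by standard codes.
replaced_by = {
    "LOCAL-001": "STANDARD-100",
    "STANDARD-100": "STANDARD-200",
}

print(resolve("LOCAL-001", replaced_by))
print(resolve("STANDARD-200", replaced_by))
```

The cycle check is a defensive choice for hand-maintained maps, where a mistaken entry could otherwise loop forever.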
So the techie bit is that the identifiers in SNOMED are 64-bit integers, and part of that integer that you get has a namespace sequence in it. If you want to build your own extension, get in contact with SNOMED International, and they’ll be able to give you a namespace-based sequence. That means that the identifiers that you create won’t clash with any other person creating SNOMED codes, and that’s how they can remain unique. But you’re right. Once you’ve created your own code, you really do need to keep an eye on what’s coming downstream. And if that concept is added to the national SNOMED, you’re better off retiring and following the mechanism to adopt the standard code rather than your own local code.
I’ll probably close this down by saying, if you’re thinking about using SNOMED in an application, or you just want to learn more about SNOMED, one of the best things you can do is just Google “SNOMED C.T. Starter Guide”; the I.H.T.S.D.O., the SNOMED International folks, have a really nice set of documentation to help prime you and get you through the process of how to use it. It’s definitely worth learning more about it before you try to use it. It’ll keep you from going down a lot of blind alleys and getting a lot of weirdness out of it.
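To make the namespace idea concrete, here is a rough sketch of splitting an identifier into its parts. It assumes the layout as I understand it from the SNOMED documentation: the last digit is a check digit, the two digits before it are a partition identifier, and when the partition starts with `1` (an extension component) the seven digits before that are the namespace. The extension identifier below is made up, including its namespace, and its check digit is not validated.

```python
def parse_sctid(sctid):
    """Split an identifier string into item, namespace, partition, check digit."""
    if not sctid.isdigit() or not 6 <= len(sctid) <= 18:
        raise ValueError(f"not a plausible identifier: {sctid!r}")
    check_digit = sctid[-1]
    partition = sctid[-3:-1]
    if partition[0] == "1":
        # Long format: item identifier + 7-digit namespace + partition + check.
        namespace = sctid[-10:-3]
        item = sctid[:-10]
    else:
        # Short format: no namespace, as used for core content.
        namespace = None
        item = sctid[:-3]
    return {
        "item": item,
        "namespace": namespace,
        "partition": partition,
        "check_digit": check_digit,
    }


# Made-up extension identifier: item 12345, namespace 9999999, partition 10.
print(parse_sctid("123459999999107"))
# A short-format identifier with no namespace.
print(parse_sctid("73211009"))
```

Every identifier of this shape fits comfortably in a signed 64-bit integer, which matches the storage advice above.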
I think that the starter guide is a great resource. There’s also some online tutorials and some accreditation schemes, and the education around SNOMED is really good.
And as always with the Informonster Podcast, if you ever need assistance, the folks at Clinical Architecture are always standing by to help out. I want to say thank you, gentlemen.
Thanks for listening.
And we’ll see you on the next edition of the Informonster Podcast.