
The Informonster Podcast

Episode 24: Standardizing Health Data with OMOP’s Common Data Model

September 20, 2022

On this episode of the Informonster Podcast, Charlie provides a high-level primer on Observational Medical Outcomes Partnership (OMOP). He also discusses terminologies and adding data into an OMOP data repository. To learn more about OMOP please visit




I’m Charlie Harp, and this is the Informonster Podcast. Today on the Informonster Podcast, we’re going to be talking about the Observational Medical Outcomes Partnership, or OMOP, from the perspective of healthcare information technology professionals. Let’s do it.

OMOP was funded by the US Food and Drug Administration in 2008, and it was focused exclusively on developing new statistical methods for drug adverse event surveillance. Later, in 2014, it was adopted by the Observational Health Data Sciences and Informatics initiative, OHDSI (pronounced “Odyssey”), and today it’s used by over 2,000 collaborators across 74 countries, encompassing approximately 800 million patients.

Now, the whole point of OMOP is to create a mechanism whereby researchers can share data and, in sharing it, apply statistical analyses and data models and look for the things researchers look for in healthcare. It’s relevant right now because of everything happening in public health, and because we’re seeing the payer, life sciences, and provider markets converge from a health data perspective. OMOP seems like a pretty organic place to push data, so that people doing research for whatever purpose, or working on quality measures or analytics, have a common place where both the data and the analytics are standardized. That way, they can apply the same measurements across any population that’s loaded into one of these common data model environments.

Now, the power of something like OMOP and OHDSI is really driven by the fact that, first of all, they have a common model. They basically say, “This is what a patient’s data looks like. These are the elements. You have a patient, you have an occurrence, you have the details of the visit, drugs, procedures, devices, measurements, notes, observations, specimens, and other facts that fit into a standard, structured model.” That’s the first thing. You have this canonical representation of what the patient’s data looks like.
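To make that “canonical representation” a little more concrete, here’s a minimal Python sketch of three OMOP CDM tables (person, visit_occurrence, condition_occurrence) as dataclasses. It’s a simplified subset, assuming CDM v5.x column names; the real tables carry many more columns. The asthma concept ID 9991 and source code “1234” are the made-up examples used later in this episode, not real identifiers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Person:
    person_id: int
    gender_concept_id: int       # a standard OMOP concept, e.g. 8532 = female
    year_of_birth: int


@dataclass
class VisitOccurrence:
    visit_occurrence_id: int
    person_id: int               # links the visit back to the person
    visit_concept_id: int        # e.g. 9202 = outpatient visit
    visit_start_date: date
    visit_end_date: date


@dataclass
class ConditionOccurrence:
    condition_occurrence_id: int
    person_id: int
    condition_concept_id: int    # expected to be a *standard* concept ID
    condition_start_date: date
    visit_occurrence_id: Optional[int] = None
    condition_source_value: Optional[str] = None  # code as it appeared in the source system


patient = Person(person_id=1, gender_concept_id=8532, year_of_birth=1980)
visit = VisitOccurrence(
    visit_occurrence_id=10,
    person_id=patient.person_id,
    visit_concept_id=9202,
    visit_start_date=date(2022, 9, 1),
    visit_end_date=date(2022, 9, 1),
)
# 9991 is the illustrative "SNOMED asthma" concept ID from this episode.
asthma = ConditionOccurrence(
    condition_occurrence_id=100,
    person_id=patient.person_id,
    condition_concept_id=9991,
    condition_start_date=date(2022, 9, 1),
    visit_occurrence_id=visit.visit_occurrence_id,
    condition_source_value="1234",
)
```

Every clinical fact hangs off person_id, and every coded value is a concept ID, which is what lets the same analytics run against any site’s data.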

The second thing is, all of the data that’s being injected into this common model is standardized. Specifically, the terminologies are standardized through the OMOP process. The bottom line is, you have this environment where anybody can take a patient, transform their data into this OMOP common data model, semantically normalize the data elements themselves into OMOP identifiers, and put the data somewhere where it can be utilized for analytics. You can find out all about OMOP and OHDSI if you go to the website.

Now, in the payer space, they’re ingesting data into their data models to do the things payers do. In the provider space, pretty much every EHR platform has its own canonical structure. We have HL7 formats, we have C-CDA formats, and now we have FHIR formats, which create these shareable, canonical models. Really, every provider system has its own way of looking at the patient’s data. What we end up doing is taking things out of our own model, whatever our system’s model is, reformatting them into the messaging model of choice, whether that’s FHIR, HL7, or C-CDA, and throwing them over the fence at somebody else. They receive the data and translate it out of one of these standard formats into their own specific, canonical format.

We do the same thing with terminologies. We take our organic system terminologies and translate them into the standard terminologies called for by something like USCDI, and when we get the data on the other side of the fence, we take that USCDI-coded data and map it into the terminologies we understand. What they’re doing with OMOP is not that different, but it requires that you start with the standard terminology and put the data into their common data model. And when you go to convert the data into OMOP concept IDs, there’s a slight twist that surprised me when I ran into it. I’m a salty dog, so if it surprised me, there’s a decent chance it’ll surprise you. That’s one of the reasons I wanted to talk about this: to make sure you’re aware of this twist. People from OMOP and OHDSI are like, “Charlie, it’s not that surprising,” but it was surprising to me.

Before I talk about the surprise twist of OMOP semantic normalization, let me ask: are you familiar with the UMLS Metathesaurus? It’s not a dinosaur. The Metathesaurus is a big thesaurus that allows people to take healthcare terminologies and, theoretically, normalize them. I like the Metathesaurus, but I think people try to rely on it too much, and it’s a little bit fuzzy. It’s not as concrete as people want it to be. You can use the Metathesaurus, but use it at your own peril.

The bottom line is, the Metathesaurus takes a terminology, or code system, and breaks it down into the UMLS Metathesaurus model. That model says any term that comes in is an atom. If I have an ICD-10 term for asthma, with a code, that term is an atom. They take the string text itself and check whether it already exists anywhere. For that exact string, they assign a string unique identifier, or SUI, which is essentially just a number. Then they look at the lexical form, the normalized version of the string, and ask, “Is there already a unique identifier for this lexical form?” If there is, they reuse it; if not, they assign a new lexical unique identifier, called the LUI. So you’ve got the SUI, the LUI, and of course the atom that came in. Then they look at that term and try to get to its conceptual meaning. What does this term mean? What is its semantic value? They assign something called the concept unique identifier, or CUI. So when something comes into the UMLS Metathesaurus, it’s got an atom, a SUI, a LUI, and a CUI.
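Here’s a toy Python sketch of that identifier-assignment process, just to show how one atom ends up with three separate “dumb” IDs. Several things are assumptions: the `normalize` function is a crude stand-in for the NLM’s real lexical normalization tools, the `meaning` argument stands in for the curation step that decides what a term means, the source labels are simplified, and the class and method names are hypothetical. The ID prefixes only loosely echo UMLS conventions.

```python
import re
from itertools import count


class MiniMetathesaurus:
    """Toy model of UMLS-style identifiers: each atom gets a SUI, LUI, and CUI."""

    def __init__(self) -> None:
        self._next = count(1)                      # one shared counter: IDs carry no meaning
        self.sui_by_string: dict[str, str] = {}    # exact string -> SUI
        self.lui_by_norm: dict[str, str] = {}      # normalized lexical form -> LUI
        self.cui_by_meaning: dict[str, str] = {}   # meaning -> CUI
        self.atoms: dict[str, dict[str, str]] = {}

    @staticmethod
    def normalize(text: str) -> str:
        # Crude stand-in for lexical normalization: lowercase,
        # drop punctuation, and ignore word order.
        words = re.findall(r"[a-z0-9]+", text.lower())
        return " ".join(sorted(words))

    def _get_or_make(self, table: dict[str, str], key: str, prefix: str) -> str:
        # Reuse the existing identifier if we've seen this key; otherwise mint a new one.
        if key not in table:
            table[key] = f"{prefix}{next(self._next):07d}"
        return table[key]

    def add_atom(self, source: str, code: str, text: str, meaning: str) -> dict[str, str]:
        atom = {
            "aui": f"A{next(self._next):07d}",
            "source": source,
            "code": code,
            "sui": self._get_or_make(self.sui_by_string, text, "S"),
            "lui": self._get_or_make(self.lui_by_norm, self.normalize(text), "L"),
            "cui": self._get_or_make(self.cui_by_meaning, meaning, "C"),
        }
        self.atoms[atom["aui"]] = atom
        return atom


meta = MiniMetathesaurus()
# "1234" is the episode's illustrative SNOMED code; J45 is the ICD-10 asthma category.
snomed_asthma = meta.add_atom("SNOMED", "1234", "Asthma", meaning="asthma")
icd_asthma = meta.add_atom("ICD10", "J45", "asthma", meaning="asthma")
```

In this sketch the two atoms get different SUIs (the strings differ in case), but they share a LUI, because they normalize to the same lexical form, and a CUI, because they mean the same thing.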

The value of the Metathesaurus, because it’s a thesaurus, is this: if I come in with a SNOMED atom, code 1234, for asthma, I can ask the Metathesaurus, “Is there anything else that has the exact same string as me? Does it have the same SUI?” I can ask, “Is there something that has the same lexical form as me?” That’s the LUI. Then I can ask, “Is there anything that means the same thing as me?” That’s the CUI. As a thesaurus, the Metathesaurus lets you pivot from one code system, or one atom, to equivalent atoms in other terminologies or code systems. It does all of this through a dumb number, and I don’t mean that in a pejorative way; I mean there’s no intelligence baked into the numbering scheme. You have a CUI that it pivots through.
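The three questions above map neatly onto three lookups. This is a minimal Python sketch over a tiny, entirely hypothetical atom table (the identifiers, the “LOCAL_A”/“LOCAL_B” code systems, and the `pivot` helper are all invented for illustration); it just shows how pivoting on SUI, LUI, or CUI widens the set of equivalent atoms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    source: str   # code system the atom came from
    code: str
    text: str
    sui: str      # same exact string -> same SUI
    lui: str      # same normalized lexical form -> same LUI
    cui: str      # same meaning -> same CUI


# Hypothetical rows; the identifiers are "dumb" and carry no meaning.
ATOMS = [
    Atom("SNOMED", "1234", "Asthma", "S001", "L001", "C001"),
    Atom("ICD10", "J45", "Asthma", "S001", "L001", "C001"),
    Atom("LOCAL_A", "5678", "asthma", "S002", "L001", "C001"),
    Atom("LOCAL_B", "AB12", "Bronchial asthma", "S003", "L002", "C001"),
]


def pivot(start: Atom, level: str) -> list[Atom]:
    """Return atoms from other code systems sharing `level` ('sui', 'lui', or 'cui') with `start`."""
    key = getattr(start, level)
    return [a for a in ATOMS if getattr(a, level) == key and a.source != start.source]


snomed = ATOMS[0]
same_string = pivot(snomed, "sui")    # only the identical string
same_lexical = pivot(snomed, "lui")   # also catches the case variant
same_meaning = pivot(snomed, "cui")   # also catches the synonym
```

Each step up (SUI, then LUI, then CUI) is looser, which is exactly where the “fuzziness” Charlie mentions can creep in if you pivot on CUI without checking the result.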

Now, I explained all of that for the explicit purpose of contrasting it with the way OMOP works, which is different. When I first saw OMOP, I thought, “Oh, it’s just like the Metathesaurus,” but it’s not. Here’s how it’s different. OMOP has a set of vocabulary tables, and they’re pretty basic. There’s a vocabulary table that says, “These are the vocabularies that are part of OMOP.” There are quite a few of them, but let’s stick to some standards we’re aware of, like SNOMED, ICD-9, and ICD-10. You also have domains, and there are 30-some domains. The domains are the things you’d expect: drugs, measurements, basically the kinds of things you have concepts for.

Then you have a concept table. The concept table is where every single code in every single terminology that’s part of the OMOP vocabulary lives. It is a spicy meatball; it’s a lot of stuff. Whenever a term from a vocabulary like SNOMED comes in, it’s assigned an OMOP concept ID. That ID isn’t lexical or conceptual like the identifiers in the UMLS Metathesaurus. Say you give me a SNOMED code: “SNOMED is my code system, here’s asthma, and the code is 1234.” OMOP assigns it an OMOP ID, and let’s just say that ID is 9991. That is the OMOP concept ID for that SNOMED code.

The other thing they do is decide whether or not that is the standard code. Let’s say that for SNOMED asthma, 1234, it is the standard code. They stamp an “S” in the standard concept column, which says, “This is the standard code for asthma in OMOP.”

Now, let’s take the ICD-10 code for asthma, J45. That code gets its own OMOP identifier, because it’s a different code in a different code system; let’s say it’s assigned OMOP ID 9992. But that ICD-10 code is not the standard code, so it doesn’t get the big S, because we already have the standard code for asthma: the SNOMED code. The ICD-10 code has an OMOP ID, but it’s not the standard code. In the concept relationship table, there’s a relationship that says, “OMOP ID 9992, my ICD-10 code, maps to the standard code 9991, my SNOMED code for asthma.” That concept map is really what associates the source terminology with the standard code. What this means is that when you feed data to a common data model repository, they only want the standard codes, because all the analytics, all the rules, all the things they’re doing, look only for standard codes.
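Here’s a minimal Python sketch of the two tables just described: concept rows with a `standard_concept` flag, and concept_relationship rows using the “Maps to” relationship. The column names follow the OMOP vocabulary tables, but the rows are illustrative (9991 and 9992 are the episode’s made-up IDs), and the `to_standard` helper is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: int
    vocabulary_id: str
    concept_code: str
    domain_id: str
    standard_concept: Optional[str]  # "S" for standard, None for non-standard


# Illustrative rows using the episode's made-up concept IDs.
CONCEPTS = {
    9991: Concept(9991, "SNOMED", "1234", "Condition", "S"),
    9992: Concept(9992, "ICD10CM", "J45", "Condition", None),
}

# concept_relationship rows: (concept_id_1, relationship_id, concept_id_2)
CONCEPT_RELATIONSHIP = [
    (9992, "Maps to", 9991),  # non-standard ICD-10 concept -> standard SNOMED concept
    (9991, "Maps to", 9991),  # standard concepts also map to themselves
]


def to_standard(concept_id: int) -> Optional[int]:
    """Follow the 'Maps to' relationship from a concept to its standard concept, if any."""
    for cid1, rel, cid2 in CONCEPT_RELATIONSHIP:
        if cid1 == concept_id and rel == "Maps to" and CONCEPTS[cid2].standard_concept == "S":
            return cid2
    return None
```

This sketch returns the first match for simplicity. In the real vocabularies a single source concept can map to more than one standard concept, so a production ETL would likely collect every “Maps to” target rather than stopping at one.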

The standard code is like the value set. The standard code is like the CUI. If you don’t give them the standard code, the analytics aren’t going to work. If I had a patient with an ICD-10 code for asthma, and I thought, “It has an OMOP ID of 9992, I’m just going to give that to the CDM,” I might as well not bother, because the CDM isn’t going to have any rules or analytics looking for 9992; they’re all looking for 9991. That piece of data you provided is, for all intents and purposes, invisible.
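A tiny Python illustration of that “invisible” effect, assuming a hypothetical shared analytic that selects asthma patients by the standard concept ID only. Real OHDSI cohort definitions are built from concept sets rather than a single hard-coded ID, but the principle is the same: they are expressed in standard concepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionRow:
    person_id: int
    condition_concept_id: int


# Illustrative IDs from the episode: 9991 = standard SNOMED asthma,
# 9992 = non-standard ICD-10 asthma.
ASTHMA_STANDARD_ID = 9991

rows = [
    ConditionRow(person_id=1, condition_concept_id=9991),  # loaded with the standard ID
    ConditionRow(person_id=2, condition_concept_id=9992),  # loaded with the non-standard ID
]


def asthma_cohort(condition_rows: list[ConditionRow]) -> set[int]:
    """Hypothetical shared analytic, written against the standard concept only."""
    return {r.person_id for r in condition_rows if r.condition_concept_id == ASTHMA_STANDARD_ID}


print(asthma_cohort(rows))  # {1}: patient 2 never shows up
```

Patient 2 really does have asthma in the source data, but because the row carries 9992 instead of 9991, the analytic never sees it.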

Now, I wasn’t expecting this. I thought that as long as I gave it an OMOP ID, some function in the analytical tools would do the translation to the standard code. But they don’t. This matters because, let’s say you’re taking your clinical summary data and contributing it for research, for clinical trials, or for clinical trial recruiting, to identify patients who are eligible for trial cohorts, and they’re using OMOP to do that. It’s not enough to format the data into the OMOP CDM. It’s not enough to look up an OMOP ID in Athena, which is one of their tools, and convert the ICD-10 code to that OMOP ID. You have to convert the ICD-10 code to its OMOP ID, and then follow the map from the ICD-10 code’s nonstandard OMOP ID to the standard OMOP ID for asthma, which is the SNOMED code.
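Putting the two steps together, here’s a minimal ETL sketch in Python that turns a source code into a condition_occurrence row. It assumes you’ve already pulled the relevant concept and “Maps to” rows into two lookup dictionaries; those dictionaries and the `build_condition_row` function are hypothetical, and the IDs are the episode’s illustrative ones. The column names (`condition_concept_id`, `condition_source_concept_id`, `condition_source_value`) follow the CDM, and 0 is OMOP’s conventional “no matching concept” ID.

```python
from datetime import date

# Hypothetical vocabulary lookups (e.g. built from vocabulary files downloaded via Athena).
# Keys for step 1 are (vocabulary_id, concept_code); IDs are illustrative only.
SOURCE_CONCEPT_ID = {("ICD10CM", "J45"): 9992, ("SNOMED", "1234"): 9991}
MAPS_TO = {9992: 9991, 9991: 9991}


def build_condition_row(person_id: int, vocabulary_id: str, code: str, start: date) -> dict:
    # Step 1: source code -> its OMOP concept ID, which may be non-standard.
    source_concept_id = SOURCE_CONCEPT_ID.get((vocabulary_id, code), 0)
    # Step 2: follow "Maps to" from that concept to the standard concept ID.
    standard_concept_id = MAPS_TO.get(source_concept_id, 0)
    return {
        "person_id": person_id,
        "condition_concept_id": standard_concept_id,       # what the analytics look at
        "condition_source_concept_id": source_concept_id,  # keeps the original code's concept
        "condition_source_value": code,                    # the raw code as received
        "condition_start_date": start,
    }


row = build_condition_row(2, "ICD10CM", "J45", date(2022, 9, 20))
```

Keeping the non-standard ID in `condition_source_concept_id` preserves provenance, while `condition_concept_id` carries the standard code that shared analytics will actually query. A code with no mapping falls through to 0, which is worth reporting during ETL rather than silently loading.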

It’s one of those things where, if you’re not super familiar with it, it can be missed. It is a little different, but once again, the benefit you get from it is, you can contribute data into these environments where they can take advantage of that.

Now, once you get the data into an OMOP data repository, there are all kinds of tools you can use with it. For example, ATLAS is a publicly available, web-based suite of tools that lets you design and run standardized analytics on patient-level observational data and review the results. And if other folks in the OHDSI community have built standard analytical mechanisms or metrics, as long as they built them using the OMOP common data model and the OMOP standard codes, those are completely portable. That’s not something we’re used to in the provider space, where everything has to be built from scratch.

If you put the data in an OMOP data repository, anybody across the OHDSI community who has developed a set of analytics could share those analytical tools and algorithms with you, and you should be able to apply them to your patient data, assuming you did the data standardization correctly.

Ladies and gentlemen, that is my quick, high-level primer on OMOP. If you want to learn more, I suggest you go to the website and check it out. They’re always looking for new members. Converting provider data in FHIR, C-CDA, and HL7 formats to OMOP is something we at Clinical Architecture are actively working to automate and make easier, and if we can help you with that, we’d be delighted to. Until next time, this has been the Informonster Podcast, and I’m Charlie Harp. I really appreciate you spending these last 15 minutes with me. Thanks.