SNOMED CT – CORE Subset – Quick Overview and Impressions

I received an email from the NLM UMLS Users listserv today with the subject ‘CORE Problem List Subset of SNOMED CT Now Available’. Being a UMLS enthusiast, I quickly downloaded the data and scoped it out. I thought I would share what I found with you.

CORE Problem List Subset, What is it?

It is a subset of the complete SNOMED CT terminology that is designed to help implementers by acting as a starter set of codes.

There are 5182 terms in the core data released today as opposed to the roughly 386,000 terms in the complete SNOMED CT terminology.

Where did it come from?

The data released today is based on the datasets submitted by the following institutions:

  • Beth Israel Deaconess Medical Center
  • Intermountain Healthcare
  • Kaiser Permanente
  • Mayo Clinic
  • Nebraska University Medical Center
  • Regenstrief Institute
  • Hong Kong Hospital Authority

Why was it created?

This new core subset can provide a vendor or institution with a starter set of common terms that are used to record clinical observations (in fact CORE stands for Clinical Observations Recording and Encoding).

What is in the file that you can download?

The terms that were selected are available in a pipe ‘|’ delimited file with the following record format.

Position | Field name | Description
1 | SNOMED_CID | The concept identifier for the term. (If you are a regular SNOMED enthusiast, this is the same ID that you would find in the SCT_CONCEPTS_yyyymmdd.txt file.)
2 | FSN | Fully Specified Name (the term description).
3 | CONCEPT_STATUS | The concept status of the concept ID in the SCT_CONCEPT file. According to the extraction rules for the core, this should always be zero, which means ‘current’. (Which also means you can probably ignore this field.)
4 | UMLS_CUI | Concept Unique Identifier for this SNOMED CT concept in the UMLS Metathesaurus MRCONSO table.
5 | OCCURRENCE | The number of contributing institutions that have the concept in their problem list (currently from 1-7).
6 | USAGE | The sum of the usage of this term divided by 7. (I wonder if this would be better if it were the sum of the usage divided by the occurrence? I will follow up with NLM.)
7 | IS_RETIRED | A field for the future, to support retiring terms. I would assume that the CONCEPT_STATUS field would also reflect that the SNOMED CT concept is no longer current.
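To make the record format concrete, here is a minimal sketch of reading the file in Python. It assumes the seven fields appear in the order listed above and that the first line of the file is a header row; I have not confirmed either detail against the release notes, so treat both as assumptions. The `CoreRecord` and `parse_core_subset` names and the sample rows (including the IDs and values in them) are made up for illustration.

```python
import csv
import io
from typing import NamedTuple


class CoreRecord(NamedTuple):
    """One row of the CORE subset file (field order as described above)."""
    snomed_cid: str
    fsn: str
    concept_status: str
    umls_cui: str
    occurrence: int
    usage: float
    is_retired: str  # kept as raw text; the value format is not assumed


def parse_core_subset(lines, has_header=True):
    """Parse pipe-delimited lines into CoreRecord tuples.

    has_header=True is an assumption: set it to False if the file
    starts directly with data.
    """
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header row
    records = []
    for row in reader:
        if len(row) < 7:
            continue  # skip blank or malformed lines
        records.append(
            CoreRecord(row[0], row[1], row[2], row[3],
                       int(row[4]), float(row[5]), row[6])
        )
    return records


# Made-up sample data, NOT real SNOMED CT content.
sample = io.StringIO(
    "SNOMED_CID|FSN|CONCEPT_STATUS|UMLS_CUI|OCCURRENCE|USAGE|IS_RETIRED\n"
    "111111|Example finding A (finding)|0|C0000001|7|1.5|False\n"
    "222222|Example disorder B (disorder)|0|C0000002|2|0.25|False\n"
)
records = parse_core_subset(sample)
for rec in records:
    print(rec.snomed_cid, rec.fsn, rec.occurrence, rec.usage)
```

To read the real download, pass an open file handle instead of the `io.StringIO` sample.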

Note: I went back and forth on the USAGE field. I thought it was interesting that the sum of the usage was divided by the full count of seven and not by the OCCURRENCE value. When you take the USAGE number, multiply it by seven and divide by the OCCURRENCE number, the result is, in most cases, a much higher value that reflects the usage of the term within the institutions that are actually using it. If you are a big data nerd (like me), the variance in how the terms are ranked depending on which way you look at the usage is interesting. I am also interested in how the original institutional average was calculated. (Once again… nerd.)
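The rescaling described in the note is simple arithmetic, sketched below. The function name `usage_among_users` is my own, and the default of seven institutions comes from the list of contributors above; the input values are invented to keep the arithmetic easy to follow.

```python
def usage_among_users(usage, occurrence, total_institutions=7):
    """Rescale USAGE so it averages over only the institutions
    that actually have the term on their problem list.

    USAGE is (sum of usage) / total_institutions, so multiplying by
    total_institutions recovers the sum, and dividing by OCCURRENCE
    averages it over the institutions that use the term.
    """
    if occurrence <= 0:
        raise ValueError("OCCURRENCE must be at least 1")
    return usage * total_institutions / occurrence


# Invented values: a term used by 2 of the 7 institutions.
print(usage_among_users(0.5, 2))  # 0.5 * 7 / 2 = 1.75
# A term used by all 7 institutions is unchanged.
print(usage_among_users(0.5, 7))  # 0.5
```

When OCCURRENCE is 7 the two numbers are identical, so the difference only shows up for terms that some institutions do not use.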

A Quick Look at the Data

When you take the supplied terms and sort them in order based on the USAGE number, here are the top 25 terms.

[Image: top 25 SNOMED CT terms by USAGE]

When I see this list it seems reasonable to me that these would have a higher usage in a problem or finding list.  All of the terms are at a fairly high level and are the types of things you would expect to have a higher volume of occurrences.
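If you want to reproduce this kind of ranking yourself, the sort can be sketched as follows. This is only an illustration: `top_terms` is a hypothetical helper, the rows are invented rather than taken from the release, and the second ranking uses the OCCURRENCE-based rescaling from the note above to show how the order can change.

```python
def top_terms(rows, n=25, key=lambda row: row["usage"]):
    """Return the n rows with the highest score, highest first.

    By default the score is the raw USAGE value; pass a different
    key function to rank by another measure.
    """
    return sorted(rows, key=key, reverse=True)[:n]


# Invented rows, NOT real CORE subset data.
rows = [
    {"fsn": "Example term A", "occurrence": 7, "usage": 2.0},
    {"fsn": "Example term B", "occurrence": 1, "usage": 1.0},
    {"fsn": "Example term C", "occurrence": 4, "usage": 1.5},
]

# Ranking by raw USAGE (the ordering used for the list above).
by_usage = [row["fsn"] for row in top_terms(rows)]

# Ranking by USAGE rescaled to the institutions that use the term.
by_rescaled = [
    row["fsn"]
    for row in top_terms(
        rows, key=lambda row: row["usage"] * 7 / row["occurrence"]
    )
]

print(by_usage)
print(by_rescaled)
```

With these made-up numbers, the term used by a single institution moves from last place to first once the usage is rescaled, which is the kind of ranking shift the note describes.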


If you are just getting started with SNOMED CT and thinking about using it as a reference terminology for tracking findings and problems in your electronic medical record, this new CORE subset is a great starting point.  Kudos to the NLM and the contributing institutions for providing this information – it should facilitate the implementation of SNOMED CT by providing a place to start.

For more information, check out the full write-up on the NLM website at:
