SNOMED-CT Core Subset – Significant Changes in July File

October 12, 2009

By: Charlie Harp

If you are evaluating the SNOMED-CT Core Subset, be aware that the NLM has made some non-trivial changes to the format and content of the subset file in the latest (second) release, dated 200908 (July). If you have developed a load program, as we have, that uses the subset file to identify the concepts included in the subset, you will likely need to modify that program.

Here is a summary of the changes:

Term Changes:

  • Nine terms were added and eleven terms were retired from the core subset.

New Terms:

208892001 | Closed traumatic dislocation of hip (disorder) | Current
165468009 | Erythrocyte sedimentation rate (ESR) raised (finding) | Current
197321007 | Steatosis of liver (disorder) | Current
40733004 | Infectious disease (disorder) | Current
165346000 | Laboratory test result abnormal (situation) | Current
442234001 | Serum cholesterol borderline high (finding) | Current
442438000 | Influenza due to Influenza A virus (disorder) | Current
442551007 | Dental caries extending into dentine (disorder) | Current
4557003 | Preinfarction syndrome (disorder) | Current

Retired Terms:

41006004 | Depression (finding) | Ambiguous
309158009 | Laboratory finding abnormal (navigational concept) | Current
371330000 | Fatty liver (disorder) | Duplicate
131016008 | Increased thyroid stimulating hormone level (finding) | Duplicate
166829003 | Serum cholesterol borderline (finding) | Ambiguous
191415002 | Communicable disease (navigational concept) | Current
78431007 | Influenza due to Influenza virus, type A, human (disorder) | Ambiguous
416103000 | Elevated erythrocyte sedimentation rate (finding) | Duplicate
50047001 | Compound dental caries (disorder) | Ambiguous
63079007 | Closed traumatic dislocation of hip joint (disorder) | Duplicate
64333001 | Preinfarction angina (disorder) | Duplicate

File Structure Changes:

June Subset | July Subset | Change
Now uses Description instead of Code!!!

New Fields:

New Field | What is it?
FIRST_IN_SUBSET | This is the issue year and month when the concept first appeared in the subset.
LAST_IN_SUBSET | This is the issue year and month when the concept last appeared in the subset as a non-retired concept.
REPLACED_BY | Concept ID of the concept replacing a retired concept.


If you developed a program that loads the core subset file, this update likely broke it.

If you are using a text ODBC/OLEDB driver to load the file, the changes to the column names broke it.

If you are accessing the fields using sequential access and splitting the fields using the pipe delimiter, the insertion of the FIRST_IN_SUBSET field before the IS_RETIRED field will break your load program.
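One way to soften both kinds of breakage is to read the header row, look fields up by name, and fail loudly when an expected column is missing. The following is only a minimal sketch: the sample rows, the column names other than FIRST_IN_SUBSET, the True/False retirement values, and the `load_subset` helper are assumptions made for illustration, not the actual NLM layout.

```python
import csv
import io

# Hypothetical sample in a pipe-delimited layout with a header row.
# Column names and values are assumptions, not copied from the NLM file.
SAMPLE = (
    "SNOMED_CID|SNOMED_FSN|CONCEPT_STATUS|FIRST_IN_SUBSET|IS_RETIRED\n"
    "1001|Example disorder (disorder)|Current|200908|False\n"
    "1002|Example finding (finding)|Duplicate|200906|True\n"
)

# Columns this (hypothetical) loader depends on.
REQUIRED = {"SNOMED_CID", "CONCEPT_STATUS", "IS_RETIRED"}


def load_subset(handle):
    """Read pipe-delimited rows keyed by header name, not by position."""
    reader = csv.DictReader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        # A renamed or dropped column is reported here instead of
        # silently shifting values into the wrong field.
        raise ValueError(f"subset file is missing columns: {sorted(missing)}")
    return list(reader)


rows = load_subset(io.StringIO(SAMPLE))
active_ids = [r["SNOMED_CID"] for r in rows if r["IS_RETIRED"] != "True"]
```

With name-based access, a newly inserted column is simply ignored, and a renamed column stops the load with a clear message rather than corrupting data.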

If you created a function that uses the coded values in the CONCEPT_STATUS field to support your load logic, that is now broken by the switch to the text value. (I don’t understand this change at all.  It seems to run contrary to the move away from free text.  I would change it back…)
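If your load logic keys off the status, one defensive option is to normalize the field so that either form is accepted. This is a rough sketch, not the NLM's scheme: the code-to-description mapping below is my assumption based on the legacy SNOMED CT concept status codes, it covers only the three statuses that appear in this post, and `normalize_status` is a hypothetical helper.

```python
# Assumed mapping of legacy numeric status codes to descriptions.
# Only the statuses seen in this post are listed; verify against your data.
STATUS_BY_CODE = {"0": "Current", "2": "Duplicate", "4": "Ambiguous"}

# Case-insensitive lookup for the description form.
_KNOWN_DESCRIPTIONS = {d.lower(): d for d in STATUS_BY_CODE.values()}


def normalize_status(raw):
    """Return the status description whether given a code or a description."""
    value = raw.strip()
    if value in STATUS_BY_CODE:
        return STATUS_BY_CODE[value]
    try:
        return _KNOWN_DESCRIPTIONS[value.lower()]
    except KeyError:
        raise ValueError(f"unrecognized CONCEPT_STATUS value: {raw!r}") from None
```

The rest of the load program can then compare against a single canonical value, regardless of which release of the file it is reading.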

Needless to say, this update was a painful one for the early adopter. But if you have already created logic based on the inaugural release of the core subset data, an early adopter is what you are, and that is not without risks.

Along with the painful changes that left our load program writhing on the ground, clutching its face and yelling “You broke my nose!” are some new useful additions.

The FIRST_IN_SUBSET, LAST_IN_SUBSET and REPLACED_BY_SNOMED_CID fields are useful lifecycle management additions that will help you manage term availability.
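As an illustration of how the replacement field could be used, here is a minimal sketch that follows replacement links from a retired concept to the concept that currently stands in for it. The concept IDs are made up, the `replaced_by` dictionary stands in for data you would build from the file, and `resolve_replacement` is a hypothetical helper; whether replacements can actually chain across releases is an assumption.

```python
def resolve_replacement(concept_id, replaced_by):
    """Follow replacement links until reaching a concept with no replacement."""
    seen = {concept_id}
    current = concept_id
    while replaced_by.get(current):
        current = replaced_by[current]
        if current in seen:
            # Guard against bad data that points back at itself.
            raise ValueError(f"replacement cycle at concept {current}")
        seen.add(current)
    return current


# Hypothetical data: concept ID -> replacing concept ID ("" if none).
replaced_by = {"1001": "1002", "1002": "1003", "1003": ""}

current_id = resolve_replacement("1001", replaced_by)
```

A concept with no replacement entry resolves to itself, so the same call works for both retired and active terms.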

Patience is a Virtue

If this update frustrated you, I would ask that you focus on the positive and consider that the Core subset is another in a growing line of great, “FREE” work products from our friends at the NLM.

It is also worth noting that as we in the HIT industry leverage SNOMED-CT, RxNorm and LOINC the bar will continue to be raised in terms of update frequency and format stability.  From the interactions I have had with the NLM, I expect that they are paying attention and will be responsive as we evolve and leverage them more.

Free Advice

As someone who worked at a commercial content provider, I would encourage the following with respect to all data products.

1.) Do not change field/column names lightly if they are included in the file, as developers will leverage that with a text driver to load the information.

2.) Avoid inserting fields into a record, as some load programs will operate based on field order. If you append new fields to the end of the record you will be less likely to disrupt the load.

3.) Coded fields are better than text fields…always.

Regardless of the constructive criticism…this is good stuff.  If we at Clinical Architecture can help you better take advantage of it, give us a call!
