By: Charlie Harp
When looking into what makes a good terminology, I would be remiss if I did not mention Dr. James Cimino’s ‘Desiderata for Controlled Medical Vocabularies in the 21st Century’. Dr. Cimino’s body of work is very enlightening, and this particular publication started me on my personal journey into medical informatics.
The characteristics represented in this post stem from what I learned from Dr. Cimino and other mentors that I have had the privilege of working with directly (you know who you are) as well as my personal experience, ideas and pragmatic tendencies.
For the purpose of this post, a terminology is a set of terms with identifiers. The terms collectively are designed to model some facet of a particular domain. This could be a terminology of medications, ingredients, routes of administration, lab tests, units of measure, species, electronic parts, legumes or even Pokémon.
There are some basic characteristics that you should always look for in a terminology:
Characteristic #1 – Unique Identifiers
The identifier for a given term should be unique. The same identifier should never represent two different concepts in the terminology, ever.
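One way a consumer might audit this characteristic is to collect (identifier, term text) pairs from several releases of a terminology and look for identifiers that show up with more than one meaning. The Python sketch below assumes releases can be flattened into such pairs; `find_conflicting_ids` is a hypothetical helper, not part of any published tool. Differing text is only a rough signal (a reworded description is not necessarily a new concept), so any hits still need human review.

```python
from collections import defaultdict


def find_conflicting_ids(rows):
    """Return identifiers that appear with more than one distinct term text.

    `rows` is an iterable of (identifier, term_text) pairs, for example
    gathered from several releases of the same terminology.
    """
    texts_by_id = defaultdict(set)
    for identifier, text in rows:
        # Normalize case and whitespace so trivial formatting changes
        # are not reported as a different concept.
        texts_by_id[identifier].add(" ".join(text.lower().split()))
    return {i: sorted(t) for i, t in texts_by_id.items() if len(t) > 1}


rows = [
    ("1001", "Ibuprofen 200 mg Oral Tablet"),
    ("1002", "Warfarin 5 mg Oral Tablet"),
    ("1001", "ibuprofen  200 mg oral tablet"),     # same concept, new formatting
    ("1002", "Acetaminophen 325 mg Oral Tablet"),  # identifier reused: a violation
]
print(find_conflicting_ids(rows))
# {'1002': ['acetaminophen 325 mg oral tablet', 'warfarin 5 mg oral tablet']}
```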
Characteristic #2 – Stable Identifiers
The identifier for a given term should be persistent. Regardless of the status of the term, the identifier should stick around and never be reused to represent another term. (See Characteristic #1, and YES, I am looking at you, National Drug Code…)
Note: Characteristics #1 and #2 are important because identifiers get stored in electronic records; when that data is accessed later, the record is invalid if the identifier is gone or its meaning has changed.
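A minimal sketch of a registry that honors both characteristics, assuming a single in-memory store (a real system would persist the counter's high-water mark so identifiers survive restarts). `TermRegistry` and its methods are hypothetical names. The key design choice is that retiring a term only changes its status; the identifier is never deleted and the counter never hands it out again.

```python
import itertools


class TermRegistry:
    """Hypothetical registry that issues identifiers once and never reuses them."""

    def __init__(self):
        self._counter = itertools.count(1)  # only ever moves forward
        self._terms = {}                    # identifier -> {"text": ..., "status": ...}

    def add(self, text):
        identifier = f"T{next(self._counter)}"
        self._terms[identifier] = {"text": text, "status": "active"}
        return identifier

    def retire(self, identifier):
        # Retiring changes the status only; the identifier and its meaning stay,
        # so records that stored it can still be interpreted later.
        self._terms[identifier]["status"] = "retired"

    def lookup(self, identifier):
        return self._terms[identifier]


reg = TermRegistry()
a = reg.add("ibuprofen 200 mg oral tablet")
reg.retire(a)
b = reg.add("warfarin 5 mg oral tablet")  # gets a fresh identifier, never "T1"
print(a, b, reg.lookup(a))
# T1 T2 {'text': 'ibuprofen 200 mg oral tablet', 'status': 'retired'}
```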
Characteristic #3 – Dumb Identifiers
An identifier itself should not have meaning. If an identifier is composed of other identifiers that have been combined, the composite identifier is inherently unstable: if the circumstances that related the component identifiers in the first place change, the resulting identifier must also change. For example, drug identifiers with smart numbers based on therapeutic classes become unstable when a class splits or a drug is assigned to another class. It can also become a problem when one part of the composite key outgrows its original bounds (for example, a fixed-width field runs out of digits), which effectively breaks the ability to parse the composite key.
The term ‘dumb number’ was coined as a counterpoint to ‘smart numbers’, numbers with built-in meaning. I use ‘identifier’ rather than ‘number’ because I am not convinced that identifiers need to be numbers. Dr. Cimino has a great name for them: ‘non-semantic identifiers’.
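The two smart-number failure modes described above can be shown with a toy scheme. The sketch assumes a hypothetical layout of a 2-digit therapeutic class followed by a 3-digit sequence; `make_smart_id`, `parse_smart_id` and the sample identifier `D000731` are illustrative only.

```python
def make_smart_id(class_code, seq):
    # Hypothetical "smart" scheme: 2-digit therapeutic class + 3-digit sequence.
    return f"{class_code:02d}{seq:03d}"


def parse_smart_id(smart_id):
    # Parsing relies on the layout: first 2 characters = class, rest = sequence.
    return int(smart_id[:2]), int(smart_id[2:])


# Reclassification: the same drug needs a new identifier when its class changes,
# so records that stored the old identifier now hold a stale value.
print(make_smart_id(12, 5), make_smart_id(14, 5))  # 12005 14005

# Overflow: class 100 no longer fits in 2 digits, so the key parses wrongly.
print(parse_smart_id(make_smart_id(100, 5)))        # (10, 5), not (100, 5)

# A dumb identifier carries no meaning; the class is a separate, changeable attribute.
drug = {"id": "D000731", "therapeutic_class": 12}
drug["therapeutic_class"] = 14  # the identifier is unaffected
```

Keeping the class as an attribute rather than part of the key means reclassification is an ordinary data update instead of an identity change.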
Characteristic #4 – Coverage
The terminology should adequately cover the domain it is meant to model. If the terminology does not have enough terms, the consumer will find themselves wanting or, worse… free texting.
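A rough way to gauge coverage is to take a sample of what users actually recorded in the domain and measure how much of it the terminology can represent. This sketch assumes such a sample exists and uses exact matching after simple normalization, which is only a crude proxy (real evaluations would need synonym handling and expert review); `coverage` and `normalize` are hypothetical helpers.

```python
def normalize(text):
    return " ".join(text.lower().split())


def coverage(terminology_terms, observed_entries):
    """Return (share of observed entries matched by a term, list of misses).

    `observed_entries` is a hypothetical sample of what users actually
    recorded (for example, free-text entries) in the modeled domain.
    """
    known = {normalize(t) for t in terminology_terms}
    misses = [e for e in observed_entries if normalize(e) not in known]
    if not observed_entries:
        return 1.0, misses
    return 1 - len(misses) / len(observed_entries), misses


terms = ["ibuprofen 200 mg oral tablet", "warfarin 5 mg oral tablet"]
observed = [
    "Ibuprofen 200 mg oral tablet",
    "warfarin 5 mg oral tablet",
    "ibuprofen 800 mg oral tablet",  # not in the terminology: would become free text
    "warfarin 5 mg oral tablet",
]
ratio, misses = coverage(terms, observed)
print(ratio, misses)  # 0.75 ['ibuprofen 800 mg oral tablet']
```

The list of misses is usually more useful than the ratio, since it shows exactly where consumers would be pushed into free text.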
Well-Managed Terms
There are several things that you should look for to determine if the terms are well managed or have junk DNA that can pollute the terminology.
Characteristic #5 – Concept Orientation
This is a notion that is described very well in Dr. Cimino’s Desiderata. It means that “terms must correspond to at least one meaning (“nonvagueness”) and no more than one meaning (“nonambiguity”), and that meanings correspond to no more than one term (“nonredundancy”)”. In other words, a term should represent a concept, and that concept should be represented only once as an active term in the terminology. If you lose concept orientation, you end up with a pick list where a term is repeated, or with a term that represents a concept broader than the scope of the terminology. If concept orientation is not well managed, the terminology will look like a mess: terms will be repeated or, worse, some terms will not make sense (for example, an ingredient terminology with the term ‘Powder’).
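Nonredundancy is the easiest of the three properties to check mechanically. The sketch assumes each term carries a hypothetical tuple of defining attributes; two active terms with identical attributes represent the same concept twice. Retired terms are ignored because only active terms should be unique. Vagueness and ambiguity are not checked here; those generally need human judgment.

```python
from collections import defaultdict


def find_redundant(terms):
    """Group active terms that share the same defining attributes.

    Each term is a dict with 'id', 'status' and 'attributes' (a hypothetical
    tuple of the components that define the concept).
    """
    groups = defaultdict(list)
    for term in terms:
        if term["status"] == "active":
            groups[term["attributes"]].append(term["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


terms = [
    {"id": "T1", "status": "active", "attributes": ("ibuprofen", "200 mg", "oral tablet")},
    {"id": "T2", "status": "active", "attributes": ("ibuprofen", "200 mg", "oral tablet")},
    {"id": "T3", "status": "retired", "attributes": ("warfarin", "5 mg", "oral tablet")},
    {"id": "T4", "status": "active", "attributes": ("warfarin", "5 mg", "oral tablet")},
]
print(find_redundant(terms))  # [['T1', 'T2']]
```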
Characteristic #6 – The Controlled Terminology should be controlled.
This states that a terminology should have a focus, and it should stay true to that focus. All too often, the keepers of a terminology may be tempted to introduce a term that is not an appropriate concept for the terminology’s domain because it serves some other purpose. On Sesame Street there was a segment called ‘One of these things (is not like the others)’. That is how it feels to work with a vocabulary that is not well controlled: you come across terms that don’t quite fit. This is especially prevalent in older terminologies, which hopefully have some indicator, or classification, that can help you navigate around them (because telling them apart can be hard, even for a monster).
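When a terminology does provide a classification, the consumer can use it to navigate around the terms that do not belong. This sketch assumes each term has a hypothetical `class` field and that the terminology’s intended focus can be written as a set of allowed classes.

```python
ALLOWED_CLASSES = {"ingredient"}  # hypothetical focus of an ingredient terminology


def split_by_domain(terms, allowed=ALLOWED_CLASSES):
    """Separate terms that fit the terminology's focus from those that don't."""
    in_domain = [t for t in terms if t["class"] in allowed]
    out_of_domain = [t for t in terms if t["class"] not in allowed]
    return in_domain, out_of_domain


terms = [
    {"id": "I1", "text": "ibuprofen", "class": "ingredient"},
    {"id": "I2", "text": "warfarin sodium", "class": "ingredient"},
    {"id": "I3", "text": "Powder", "class": "dose form"},  # one of these things...
]
keep, flagged = split_by_domain(terms)
print([t["text"] for t in keep], [t["text"] for t in flagged])
# ['ibuprofen', 'warfarin sodium'] ['Powder']
```

Without such a classification, the same split has to be made by hand, term by term.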
Characteristic #7 – Consistent Term Structure
The terms themselves should have a consistent structure. When a term describes a granular concept, such as a dispensable medication, the lexical components that make up the term should follow the same ordinal pattern from term to term. You should not see, for example, ‘ibuprofen 200 mg oral tablet’ and later ‘warfarin 5 mg tablet oral’.
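Two complementary ways to keep term structure consistent are to generate term text from its components and to check existing terms against the expected pattern. The sketch assumes the order ingredient, strength, unit, route, dose form (as in ‘ibuprofen 200 mg oral tablet’) and uses tiny, hypothetical route and form lists; a real check would draw those slots from the terminology’s own component lists.

```python
import re

# Hypothetical, heavily abbreviated value lists for the route and dose form slots.
ROUTES = ("oral", "topical", "intravenous")
FORMS = ("tablet", "capsule", "solution")

# Expected ordinal pattern: ingredient, strength, unit, route, dose form.
TERM_PATTERN = re.compile(
    r"^(?P<ingredient>[a-z ]+?) (?P<strength>\d+(?:\.\d+)?) (?P<unit>[a-z]+) "
    r"(?P<route>" + "|".join(ROUTES) + r") (?P<form>" + "|".join(FORMS) + r")$"
)


def build_term(ingredient, strength, unit, route, form):
    # Generating the text from its components always yields the same order.
    return f"{ingredient} {strength} {unit} {route} {form}"


def follows_pattern(term):
    return TERM_PATTERN.match(term) is not None


print(follows_pattern("ibuprofen 200 mg oral tablet"))    # True
print(follows_pattern("warfarin 5 mg tablet oral"))       # False: route and form swapped
print(build_term("warfarin", 5, "mg", "oral", "tablet"))  # warfarin 5 mg oral tablet
```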
Bringing the terminology to life
Making a terminology more robust, or three-dimensional, allows the terminology consumer to leverage the resulting metadata and maximize the utility of the terminology. The following characteristics help breathe life into a terminology.
Characteristic #8 – The Terminology should have a lifecycle
Terminologies evolve. New terms are created; existing terms are split, become obsolete or are replaced. A good terminology provides information on the status of each term so that the terminology consumer can take action when something happens to a term. This can be as simple as a status that indicates whether a term is active or obsolete, or as complex as replacement pointers that help the consumer decide how to transform an obsolete term they are referencing. This matters all the more when the identifiers are stable: since an identifier never gets removed from the terminology, the consumer needs to know when a term is past its ‘sell by’ date so it does not continue to get selected and used in an electronic record.
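A minimal sketch of how a consumer might use status and replacement pointers. It assumes a single hypothetical `replaced_by` field per term; real terminologies may express splits as one-to-many replacements, which this sketch does not handle. The cycle check guards against bad data in the pointers.

```python
TERMS = {
    "T1": {"text": "term A (original wording)", "status": "obsolete", "replaced_by": "T2"},
    "T2": {"text": "term A (revised)", "status": "obsolete", "replaced_by": "T3"},
    "T3": {"text": "term A (current)", "status": "active", "replaced_by": None},
    "T4": {"text": "term B", "status": "obsolete", "replaced_by": None},
}


def resolve(identifier, terms):
    """Follow replacement pointers from `identifier` to an active term.

    Returns None when an obsolete term has no replacement (needs human review).
    """
    seen = set()
    current = identifier
    while terms[current]["status"] != "active":
        if current in seen:
            raise ValueError(f"replacement cycle at {current}")
        seen.add(current)
        current = terms[current]["replaced_by"]
        if current is None:
            return None
    return current


def pick_list(terms):
    # Only active terms should be offered for selection in new records.
    return [i for i, t in terms.items() if t["status"] == "active"]


print(resolve("T1", TERMS), resolve("T4", TERMS), pick_list(TERMS))
# T3 None ['T3']
```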
Characteristic #9 – The Terminology should be part of a well-defined domain ontology
If the terminology represents concepts that are composed of component parts, those parts should be represented by their own terminologies, following the same guidelines listed above, and associated with those terms. In other words, if I have a terminology that describes fully specified lab tests, each term is naturally made up of several components, for example the analyte name, specimen type, method and result unit for the test. Each of the components should be represented by a terminology, and my terms should have an associative relationship to those component terms. This allows the consumer to sift and sort terms and leverage the ontology to get maximum use from the terminology.
Characteristic #10 – Interoperability
A terminology should be associated with a standard interoperable terminology if one is available. When choosing a terminology, the consumer needs the ability to exchange information with other applications. Not all domains have an interoperability standard, but terminologies in those that do, like medications with RxNorm, should participate in those standards appropriately or have links to them.
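A minimal sketch of a local-to-standard link used when sending data out. The codes below are deliberately fake placeholders, not real RxNorm RXCUIs, and the system labels and function names are hypothetical. Unmapped terms fall back to their local code, and a helper lists them so the gaps can be worked on.

```python
# Hypothetical local-to-standard map. The codes are placeholders,
# NOT real RxNorm RXCUIs.
LOCAL_TO_STANDARD = {
    "T1": ("RXNORM", "RXCUI-PLACEHOLDER-1"),
    "T2": ("RXNORM", "RXCUI-PLACEHOLDER-2"),
}


def to_outbound(local_id):
    """Return (system, code) for exchange, falling back to the local code."""
    return LOCAL_TO_STANDARD.get(local_id, ("LOCAL", local_id))


def unmapped(local_ids):
    # Terms that cannot yet be exchanged using the standard.
    return [i for i in local_ids if i not in LOCAL_TO_STANDARD]


print(to_outbound("T1"), to_outbound("T9"), unmapped(["T1", "T2", "T9"]))
# ('RXNORM', 'RXCUI-PLACEHOLDER-1') ('LOCAL', 'T9') ['T9']
```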
Characteristic #11 – Extensibility
No terminology can satisfy all the needs of the consumer. Defining the terminology so that the consumer can extend it lets the consumer add terms, either temporarily, to bridge the gap until the term is added by the source, or permanently, if the term is very local to the consumer. This can be accomplished in several ways and could also depend on how the consumer implements the terminology in their solution.
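One of the several possible approaches is sketched below: local terms live in a separate namespace so they never collide with source identifiers, and when the source later publishes the concept, the local term is pointed at it rather than deleted, preserving earlier records. The class, the `LOCAL-` prefix and the method names are all hypothetical, and the prefix is only assumed never to appear in source identifiers.

```python
class ExtensibleTerminology:
    """Source terms plus consumer-defined local terms in a separate namespace."""

    LOCAL_PREFIX = "LOCAL-"  # hypothetical; assumed never used by the source

    def __init__(self, source_terms):
        self.source = dict(source_terms)  # id -> text, owned by the publisher
        self.local = {}                   # id -> {"text": ..., "replaced_by": ...}
        self._next = 1

    def add_local(self, text):
        local_id = f"{self.LOCAL_PREFIX}{self._next}"
        self._next += 1
        self.local[local_id] = {"text": text, "replaced_by": None}
        return local_id

    def load_release(self, new_terms):
        # A later source release adds or updates publisher-owned terms.
        self.source.update(new_terms)

    def reconcile(self, local_id, source_id):
        # Point the local term at the published one instead of deleting it,
        # so records that stored local_id remain interpretable.
        if source_id not in self.source:
            raise KeyError(source_id)
        self.local[local_id]["replaced_by"] = source_id

    def lookup(self, term_id):
        if term_id in self.source:
            return self.source[term_id]
        return self.local[term_id]["text"]


terms = ExtensibleTerminology({"T1": "ibuprofen 200 mg oral tablet"})
bridge_id = terms.add_local("house-compounded mouthwash")  # needed before the source has it
terms.load_release({"T2": "compounded mouthwash"})
terms.reconcile(bridge_id, "T2")
print(bridge_id, terms.lookup(bridge_id), terms.local[bridge_id]["replaced_by"])
# LOCAL-1 house-compounded mouthwash T2
```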
These are some of the characteristics that make up a good terminology; there may be others, both generic and domain specific. I welcome any comments and wish you luck in your personal informatics journey.