Diagnosis and Treatment of Free Text Syndrome

By: Victor Lee

Overview and Diagnosis

Free Text Syndrome (FTS) is a chronic condition that impacts a large number of health IT systems. Clinical features include the presence of unstructured data in procedure reports, clinical notes, and other types of documents or data fields. Although FTS is often readily apparent upon visual inspection, definitive diagnosis is made by an unsuccessful attempt to leverage unstructured data for reporting, analytics, decision support, or other computational use cases.

The prevalence of FTS has increased with the surge in electronic health record adoption over the last decade, driven by the Meaningful Use provisions of the American Recovery and Reinvestment Act of 2009. FTS arises primarily from the many clinical workflows that involve free text data entry, but there are other contributing factors, such as workflows that import data into health IT systems, particularly when interoperability solutions dumb down structured content into free text, the lowest common denominator. One report estimates that free text accounts for 80% of the data in health IT systems, and that share is growing.

Treatment, Part 1: Early General Approaches

Historical approaches to the treatment of FTS include the use of ICD, SNOMED CT, LOINC, RxNorm, and other standard terminologies to codify diagnoses, laboratory results, medications, and other types of health information. Although these terminologies may contain large numbers of structured and codified terms, a large amount of information still remains devoid of structure and computable meaning. Furthermore, while codification makes sense, the effectiveness of this treatment approach is limited by the pervasive use of local (“home-grown”) terminologies, which present additional challenges to normalizing data for analytics and decision support. While local terminologies can be normalized against standard terminologies through concept mapping, that is perhaps the topic of another blog post. In any case, sometimes it is just not possible to encode the entire patient record with standard or local terminologies.

Another treatment for FTS is structured documentation, which works well for some use cases (e.g., certain structured notes and forms) but, because of usability and efficiency issues, may impose unrealistic time burdens and yield minimal return on investment for others (e.g., reports that are transcribed from dictations). It is well established that physicians feel overburdened by documentation requirements stemming from rules and regulations and are looking for administrative simplification. For example, studies by Arndt et al. (2017) and Sinsky et al. (2016) quantify time allocation among primary care physicians and find that physicians spend more time in front of their EHRs than they do with patients. Therefore, while extremely valuable in many settings, structured documentation is not practical to impose on all clinical workflows with today’s available solutions. That being said, Clinical Architecture is bringing to market a next generation solution called ClinEvolve that addresses the deficiencies found in most commercially available clinical documentation solutions by enabling clinicians to capture structured documentation in a fraction of the time it takes today… but that too is a topic for a separate blog post.

Other efforts to treat FTS involve natural language processing (NLP) technologies. While varied in their specific approaches, the general methods involve attempts to understand entire bodies of unstructured text: every word, sentence, paragraph, etc. If one’s goal is simply to extract certain pieces of knowledge from free text, NLP is usually overkill. Furthermore, it is well known that NLP does not work right out of the box and needs to be adapted to each health care setting before it is able to generate the desired results. Therefore, depending on the use case, NLP solutions are often excessively bulky, slow, and expensive treatments for FTS.

Treatment, Part 2: A Precision Medicine Approach

A new approach to treating FTS is clinical language processing, which Clinical Architecture refers to as Semantic Interpretation of Free Text (SIFT). As the name implies, SIFT is used to extract knowledge from free text and ascribe semantic meaning through linkages to standard terminologies, but it differs from NLP in several important ways:

  • Performance: SIFT is faster than NLP because of its targeted approach to knowledge extraction
  • Focus: SIFT targets specific concepts and provides results with a higher degree of certainty than NLP
  • Control: SIFT finds exactly the items of interest (e.g., ICD, CPT, RxNorm, SNOMED CT concepts) and produces less noise than NLP
  • Implementation: SIFT is quicker to set up than NLP
  • Design: SIFT is designed with “arrays” and deep semantic knowledge as opposed to using traditional NLP technology stacks—after all, clinical language is not “natural” and deserves its own targeted, intelligent, and custom approach

SIFT focuses on identifying concepts (and their associated values if relevant) in free text and correlating these with terms in standard terminologies. The transposition of key information is of great value when it is needed to enhance decision-making, outcomes and performance measurement, reporting, and cost analysis. These benefits are all possible due to the structure and relationships present in clinical terminologies. Terminology provides the knowledge to interpret the semantics found in the free text.

To achieve this goal, SIFT leverages a rich set of healthcare metadata that is finely tuned to specific domains such as diagnoses, procedures, medications, laboratory results, observations, etc. When free text is parsed and aligned with this metadata, a deep understanding of the meaning is achieved so that it can be correlated to the most appropriate standard terms. For example, the medication domain has certain characteristics such as ingredient, dose form, dose strength, dose unit, route, etc. A laboratory domain has characteristics such as analyte, specimen source, scale, method, etc. Breaking down text into these constituent characteristics enables a fine-grained comparison of unstructured text and enables its alignment with the textual descriptions of a standard reference term.
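The idea of breaking text into domain characteristics can be sketched in a few lines of Python. This is only an illustration of the general technique, not how SIFT is implemented: the vocabularies, the `MedicationParts` type, and the `decompose_medication` function are all hypothetical, and a real system would draw on curated terminology metadata instead of hard-coded sets.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny vocabularies for illustration only.
DOSE_FORMS = {"tablet", "capsule", "solution", "suspension"}
ROUTES = {"oral", "topical", "intravenous"}

# A number followed by a unit, e.g. "25 MG" or "25mg".
STRENGTH = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml)\b", re.IGNORECASE
)


@dataclass
class MedicationParts:
    ingredient: Optional[str]
    strength: Optional[float]
    unit: Optional[str]
    dose_form: Optional[str]
    route: Optional[str]


def decompose_medication(text: str) -> MedicationParts:
    """Split a free-text medication string into domain characteristics."""
    lowered = text.lower()
    match = STRENGTH.search(lowered)
    strength = float(match.group("value")) if match else None
    unit = match.group("unit") if match else None
    # Drop the strength so that only words remain to be classified.
    tokens = re.findall(r"[a-z]+", STRENGTH.sub(" ", lowered))
    dose_form = next((t for t in tokens if t in DOSE_FORMS), None)
    route = next((t for t in tokens if t in ROUTES), None)
    # Simplifying assumption: whatever is left over is the ingredient.
    leftover = [t for t in tokens if t not in DOSE_FORMS and t not in ROUTES]
    return MedicationParts(" ".join(leftover) or None, strength, unit, dose_form, route)


local_entry = decompose_medication("metoprolol tartrate tablet 25mg oral")
reference = decompose_medication("Metoprolol Tartrate 25 MG Oral Tablet")
same_drug = local_entry == reference  # compared characteristic by characteristic
```

Because the comparison happens on the constituent characteristics and not on the raw strings, two descriptions that differ in word order, spacing, and capitalization can still be recognized as the same term.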

NLP processes unstructured text in the same way, regardless of the user’s intent—this is akin to prescribing clopidogrel to a patient without knowing how well the drug will be metabolized and whether another antithrombotic agent would be a better choice. In contradistinction, SIFT takes a purposeful and targeted approach to knowledge extraction, as there are different SIFT arrays designed to find ICD-10, CPT, RxNorm, SNOMED CT, or other concepts in unstructured text. Furthering the analogy, SIFT is akin to a precision medicine solution to FTS.

To illustrate a SIFT use case, let’s say that a healthcare organization wants to identify patients for enrollment into a heart failure disease management program. An obvious first step might be to query the clinical data repository for patients with a specified set of standard terms and codes that roll up to the concept of heart failure. However, it is common for a small percentage of patients to have undocumented conditions. One way to close this gap is to infer the presence of heart failure based on the left ventricular ejection fraction (LVEF) from a dictated echocardiogram transcription. SIFT can find the LVEF and return the SNOMED CT code along with the LVEF value and units of measure. Meanwhile, codifying the left ventricular wall thickness or aortic root dimensions might not be so helpful for this use case and would be out of scope.
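A rough sketch of this kind of targeted extraction is shown below, using a plain regular expression as a stand-in for whatever SIFT does internally. The pattern, the `Observation` type, and the `find_lvef` function are assumptions made for illustration, and `LVEF_CONCEPT` is a placeholder, not a real SNOMED CT identifier.

```python
import re
from typing import NamedTuple, Optional

# Placeholder only: the actual SNOMED CT concept identifier should be looked
# up in the terminology itself.
LVEF_CONCEPT = "SNOMED-CT:LVEF-PLACEHOLDER"

LVEF_PATTERN = re.compile(
    r"(?:left\s+ventricular\s+ejection\s+fraction|LVEF|ejection\s+fraction)"
    r"\D{0,40}?"                             # short filler, e.g. " is estimated at "
    r"(?P<low>\d{1,2})"                      # value, or lower bound of a range
    r"(?:\s*(?:-|to)\s*(?P<high>\d{1,2}))?"  # optional upper bound
    r"\s*(?:%|percent)",
    re.IGNORECASE,
)


class Observation(NamedTuple):
    concept: str
    low: int
    high: int
    unit: str


def find_lvef(report: str) -> Optional[Observation]:
    """Return the first LVEF mention as a coded observation, or None."""
    m = LVEF_PATTERN.search(report)
    if m is None:
        return None
    low = int(m.group("low"))
    high = int(m.group("high") or low)  # a single value becomes a zero-width range
    return Observation(LVEF_CONCEPT, low, high, "%")


report = (
    "The left ventricle is normal in size. Wall thickness is 1.1 cm. "
    "LVEF is estimated at 35-40%. Aortic root measures 3.2 cm."
)
lvef = find_lvef(report)
```

Here `lvef` carries the concept, the 35 to 40 range, and the unit, while the wall thickness and aortic root measurements are never parsed at all, which mirrors the scoping described above.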

Figure: SNOMED CT code for left ventricular ejection fraction. Source: American Medical Association, Integrated Health Model Initiative. URL: https://ama-ihmi.org/groups/ama-ihm-community (requires registration)

The same philosophy applies to the detection of undocumented chronic obstructive pulmonary disease (COPD). It would be very useful to know a patient’s post-bronchodilator FEV1/FVC ratio, which is obtained through a pulmonary function test (PFT) and is diagnostic of COPD but unfortunately is typically hidden in unstructured, uncodified text. SIFT can identify the post-bronchodilator FEV1/FVC ratio while at the same time not attempting to derive meaning from other PFT results such as residual volume, maximum voluntary ventilation, or other observations that are not immediately useful for the use case at hand.
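The selectivity described here can be illustrated with a small Python sketch that looks for exactly one observation and ignores everything else in the report. The pattern and function names are hypothetical, the regular expression is a stand-in for the real technology, and the default cut-off of 0.70 is the commonly cited fixed-ratio threshold, which is an assumption added here and not something stated in this post.

```python
import re
from typing import Optional

# Only the post-bronchodilator FEV1/FVC ratio is targeted. Pre-bronchodilator
# values, residual volume, MVV, and other results are deliberately not parsed.
RATIO_PATTERN = re.compile(
    r"post[-\s]?bronchodilator[^.\n]{0,60}?FEV1\s*/\s*FVC[^\d.\n]{0,20}?"
    r"(?P<value>\d?\.\d+|\d{1,3}(?:\.\d+)?\s*%)",
    re.IGNORECASE,
)


def post_bd_ratio(report: str) -> Optional[float]:
    """Return the post-bronchodilator FEV1/FVC ratio as a fraction, or None."""
    m = RATIO_PATTERN.search(report)
    if m is None:
        return None
    raw = m.group("value")
    if raw.endswith("%"):
        return float(raw.rstrip("% ")) / 100.0  # "62%" is normalized to 0.62
    return float(raw)


def suggests_obstruction(ratio: float, cutoff: float = 0.70) -> bool:
    """Flag a ratio below the cut-off (assumed default, see the note above)."""
    return ratio < cutoff


pft = (
    "Pre-bronchodilator FEV1/FVC 0.58. "
    "Post-bronchodilator FEV1/FVC ratio: 0.62. "
    "Residual volume 3.1 L. MVV 65 L/min."
)
ratio = post_bd_ratio(pft)
flagged = ratio is not None and suggests_obstruction(ratio)
```

A flag like this would only nominate a patient for clinical review; it is a screening aid under the stated assumptions, not a diagnosis.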


In summary, FTS is a highly prevalent chronic condition that is easily diagnosed but has been historically challenging to treat. Newer techniques such as SIFT focus on targeted knowledge extraction and can identify concepts and values from free text with resultant linkages to standard terms and codes. This approach facilitates analytics and clinical decision support for organizations that want to leverage their existing data to improve outcomes and the overall value of patient care.

