By: Charlie Harp
Five years ago, a Clinical Architecture client called us and asked the following question about our mapping engine: “How large can a term be?”
“How big do you need it to be?” I responded.
“We were thinking five thousand bytes would do the trick,” said the client.
As it turns out, they were trying to use our mapping engine to determine whether the text of microbiology reports they were receiving from hospitals mentioned the MRSA organism. We told them the mapping engine was not the right tool for this, but we wanted to get to the bottom of what they were trying to do so we could see if we could help them. The use cases they shared made us realize our industry was trying to solve a problem with the entirely wrong set of tools.
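To make the use case concrete: spotting mentions of one specific organism in free text is a much narrower problem than general term mapping. A minimal sketch of that kind of targeted term-spotting might look like the following. The synonym list and function names here are hypothetical, purely for illustration; this is neither the client's code nor our engine.

```python
import re

# Hypothetical synonym patterns for the MRSA organism (illustrative only).
MRSA_PATTERNS = [
    r"\bMRSA\b",
    r"\bmethicillin[- ]resistant\s+staph(?:ylococcus)?\s+aureus\b",
    r"\bmethicillin[- ]resistant\s+S\.?\s*aureus\b",
]
# Compile once so each report can be scanned quickly.
MRSA_RE = re.compile("|".join(MRSA_PATTERNS), re.IGNORECASE)

def mentions_mrsa(report_text: str) -> bool:
    """Return True if the free-text report mentions the MRSA organism."""
    return MRSA_RE.search(report_text) is not None

print(mentions_mrsa("Culture positive for methicillin-resistant Staphylococcus aureus."))
```

Of course, naive matching like this ignores negation and context ("No MRSA isolated" would still match), which hints at why neither a mapping engine nor a simple search is sufficient for real clinical text.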
The use cases outlined were all about extracting usable information from unstructured text. The client had a partnership with a large technology vendor with a fairly robust set of tools, so I asked the first innocent question on this journey: “Why not use the megacorp's NLP capability?”
Issues with Traditional NLP in Healthcare
The short answer was that they had tried and failed. Why did it fail? The reasons were as follows:
- The NLP processing was too slow and became a bottleneck when trying to process a high volume of transactions.
- The NLP process did not actually return anything useful. Typically, the output was a complex XML structure containing UMLS or SNOMED CT concepts that essentially attempted to codify the meaning of each sentence. This was beyond the use cases they were trying to solve.
- The NLP process was designed to support a natural language. The information the client was grappling with was anything but natural. Whether it was a lab report, micro report, procedure note, or encounter note, it simply did not obey the grammar of the English language.
- The NLP process could not be customized or tuned. It was a sealed black box that had to be sent back to the vendor to be “adjusted.” When it was adjusted, there were often unintended consequences that sent it back to the vendor again, and so on.
- The NLP process required training, and since it was expecting “natural language” there was a lot of training to do. The result was that a human ended up doing the work anyway in order to train the NLP process.
Our conclusion was that the NLP solutions available were not working for many healthcare situations. When faced with a use case that is not being addressed with traditional approaches, Clinical Architecture looks into those approaches to try to determine why they are not working.
A Brief History of NLP
NLP began as a twinkle in the eye of Alan Turing in an article titled “Computing Machinery and Intelligence.” Some progress was made in the science of NLP from the 1960s to the late 1980s. Much of the early focus was on machine-driven language translation. During the 1970s, programmers began to create conceptual ontologies designed to give computers a framework for understanding real-world concepts. During this period, most NLP systems relied on complex hard-coded rules. In the late 1980s, thanks to the steady increase in computing power and some changes in the approach to linguistics, the field shifted to machine learning algorithms. This was followed by grammatical parsing, part-of-speech tagging, and probabilistic models in NLP. Today, many of the NLP engines available both commercially and in the public domain leverage a common set of NLP libraries and textual corpora. This means that many of them have, at their core, the same issues. It also explains why there are so many of them out there clamoring for attention.
This is not meant to be a comprehensive history of NLP, but rather a snapshot that shines a light on why NLP has failed healthcare. The root of it lies in the fact that NLP is essentially an artificial intelligence research project focusing on grammar-based human discourse.
SIFT(ing) Unstructured Text
Based on our client's issues and the limitations of traditional NLP, we decided to build something new. We called our approach Semantic Interpretation of Free Text (SIFT), and the design principles were as follows:
- It must be fast.
- It must be something that can be focused on a particular set of concepts to meet a targeted need.
- It must not require a team of programmers to get it to work.
- It must be constructed as information, not as code, so that a subject matter expert can build one, test it, and deploy it.
- It must be tunable by the end user so they can make adjustments to get the results they want.
- It must be grammar agnostic. It needs to cope with the rough terrain of unending variation in healthcare text styles.
- It must be reusable. You should be able to build portable units of understanding that can be integrated into a processing stack.
- It must be able to return concepts and values in a format that can be immediately consumed and processed by an application.
- It must be able to correlate the results back to the original text so that its results can be reviewed and understood by a human.
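Several of the principles above can be sketched in miniature: a dictionary of target concepts compiled once for speed, structured results an application can consume directly, and every match correlated back to a character span in the original text for human review. The concept codes, names, and functions below are invented for illustration; this is a toy sketch of the design goals, not SIFT's actual implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical target dictionary: concept code -> synonym terms.
CONCEPTS = {
    "ORG-001": ["MRSA", "methicillin-resistant staphylococcus aureus"],
    "ORG-002": ["pseudomonas aeruginosa"],
}

@dataclass
class Match:
    code: str   # identifier of the matched concept
    term: str   # text exactly as it appeared in the report
    start: int  # character offset into the original text
    end: int

def build_matchers(concepts):
    """Compile one case-insensitive pattern per concept, up front, for speed."""
    return {code: re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
            for code, terms in concepts.items()}

def scan(text, matchers):
    """Return every concept mention with its span in the source text."""
    hits = []
    for code, pattern in matchers.items():
        for m in pattern.finditer(text):
            hits.append(Match(code, m.group(0), m.start(), m.end()))
    return sorted(hits, key=lambda h: h.start)

matchers = build_matchers(CONCEPTS)
report = "Heavy growth of Pseudomonas aeruginosa; MRSA screen negative."
for hit in scan(report, matchers):
    print(hit)
```

Because each `Match` carries its offsets, a reviewer can highlight exactly what the engine saw in the source document. A real system would still need to handle negation and context (the "MRSA screen negative" above matches but should not count as a positive finding), which is part of what makes this problem hard.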
We started with these principles and spent almost three years in development and two years in private beta. The result is something novel that we are very proud of: the first UnNatural Language Processing Engine for healthcare, built on the most sophisticated enterprise terminology platform in our industry.
We will be demonstrating SIFT at HIMSS16 and showcasing how clients can customize SIFT tooling to meet specific use cases. We will also offer SIFT Services: cloud-based SIFT APIs for medication reconciliation, cardiovascular and pulmonary observations, microorganisms, demographics, and clinical document improvement.
There is too much information locked up in unstructured text to give up on it. If traditional NLP has failed to meet your expectations in the past, stop by booth 721 at HIMSS16 and see what happens when a product is built specifically to solve the problem you have. If you are not going to HIMSS, or just can’t wait, contact us and we would be happy to give you a demo.
I know, I know… “Charlie! You don’t usually try to sell us stuff in your blog!” That’s true, and it’s on purpose. In this case I feel that SIFT is an important piece of the terminology management jigsaw puzzle, and I want to make sure that, before you abandon trying to leverage your free-text assets, you check out what my team has built. I don’t think you will be disappointed.