A Basic Guide to Natural Language Processing


Humans communicate through words. Computers do not, which makes it difficult for them to understand our natural language. They treat our words as a form of “unstructured data” and are unable to process this kind of data effectively.

When a computer is programmed, it is given a set of rules to follow, a “structure” to operate by. When it is fed unstructured data, however, these rules become blurred, abstract and difficult to define.

Breaking Down The Language Barrier.

Humans, as a species, have been writing things down in various forms for thousands of years. In that time, we have gained an enormous amount of experience with natural language and how it works. While a computer may be able to recognise individual words, only humans are able to read full blog posts or news articles and fully understand what they mean.

Let’s be clear: computers are nowhere near having the same intuitive understanding of natural language as humans. They cannot really understand textual content in the way we can.

Natural Language Processing (NLP) is a branch of Artificial Intelligence that aims to narrow that gap. It strives to enable computers to make sense of natural language by allowing unstructured data to be processed and analysed more efficiently.

There have been some encouraging advancements recently. Deep Learning has enabled us to write programs that perform tasks such as language translation, semantic understanding and text summarization. While these technical terms may not mean much to a non-technical person, they do in fact have some real-world applications.

For example…

Natural Language Processing in Real World Applications.

Energy companies want to improve operations and keep their employees safe. When machinery breaks on an oil rig, somebody has to go and fix it, and these repairs are usually dangerous and quite expensive.

Being able to better analyse data will allow companies to improve their operations, saving money and creating a safer workplace for their employees. However, only 20% of the data needed for this analysis is in a structured format such as spreadsheets or databases. This form of data is easy for a computer to use.

The other 80% is in the form of text, such as repair manuals, injury reports and notes jotted down by technicians. This information is obviously extremely valuable. However, due to its size and lack of structure, it has largely been invisible to analytics teams.

Imagine you are searching a database of injury reports for lower body injuries. The search tool will likely return very few results, mostly because it is looking for the exact keyword.

The logical next step would be to use a more specific keyword, such as “foot injuries”. This returns many more search results, but unfortunately not the ones you were looking for. The search tool has instead returned results where “foot” is used in the context of distance. That is not really helpful, and it happens because a foot can be both a body part and a unit of measurement.
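To make the problem concrete, here is a minimal sketch of an exact keyword search in Python. The report texts and the `keyword_search` helper are invented for illustration and are not taken from any real system.

```python
# Hypothetical injury-report snippets, invented for this illustration.
reports = [
    "Technician sprained an ankle stepping off the ladder.",
    "Pipe section dropped one foot from the walkway; nobody was hurt.",
    "Worker broke a toe when a wrench landed on his foot.",
    "Debris landed a foot from the control panel.",
]


def keyword_search(query, documents):
    """Return the documents that contain the query as an exact,
    case-insensitive substring. No understanding of meaning is involved."""
    needle = query.lower()
    return [doc for doc in documents if needle in doc.lower()]


# The phrase "lower body" never appears literally, so nothing is found.
lower_body_hits = keyword_search("lower body", reports)

# "foot" matches three reports, but two of them are about distance.
foot_hits = keyword_search("foot", reports)

print(len(lower_body_hits), len(foot_hits))
```

Note that the ankle injury, which is clearly a lower body injury, is missed by both searches, while two of the three “foot” matches have nothing to do with an injury at all.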

While humans can determine context and know the difference, computers were, until recently, largely stumped. Thanks to Natural Language Processing, however, computers can now better understand textual data.

How Does NLP Work?

NLP algorithms can’t understand text as we do. However, they can look for patterns. They do this by turning huge chunks of text into matrices. The algorithm will first remove words that offer very little value, such as “a”, “the”, “is” and “are”. These are called stop words.
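As a rough illustration of the two steps described above, the Python sketch below drops stop words and then turns a couple of sentences into a small matrix of word counts. The stop-word list is just the four words mentioned here (real lists are much longer), and the function names and sample sentences are made up for this example.

```python
import re

# Only the four stop words named in the text; an assumption for brevity.
STOP_WORDS = {"a", "the", "is", "are"}


def tokenize(text):
    """Lowercase the text, split it into words and drop the stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def document_term_matrix(documents):
    """Build one row per document and one column per distinct word.
    Each cell holds how often that word occurs in that document."""
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    matrix = [[tokens.count(term) for term in vocabulary] for tokens in token_lists]
    return vocabulary, matrix


docs = ["The valve is leaking", "A valve and a pump are leaking"]
vocabulary, matrix = document_term_matrix(docs)

print(vocabulary)  # ['and', 'leaking', 'pump', 'valve']
print(matrix)      # [[0, 1, 0, 1], [1, 1, 1, 1]]
```

Each row is now a list of numbers that a program can compare, sort and count, which is exactly what the original sentences were not.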

From there, the algorithm might split the sentences into groups of words, known as tokens. It then counts how many times each token appears in each document, and how many of the documents contain that token.

Tokens that appear many times across a lot of documents may not mean much. However, tokens that appear frequently in only a few documents tell us that something is going on.
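One common way to combine these two counts is a score known as TF-IDF (term frequency multiplied by inverse document frequency). The article does not name a specific formula, so treat the Python sketch below as one possible reading under that assumption. The sample reports are invented and are assumed to have had their stop words removed already.

```python
import math
from collections import Counter


def word_groups(words):
    """Return single words plus every pair of neighbouring words."""
    pairs = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + pairs


def tf_idf(token_lists):
    """Score each token in each document as
    (count in this document) * log(total documents / documents containing it)."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each token once per document
    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores.append({
            token: count * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return scores


# Hypothetical reports, already split into words with stop words removed.
reports = [
    ["worker", "struck", "falling", "debris"],
    ["worker", "slipped", "wet", "deck"],
    ["falling", "debris", "hit", "worker", "falling", "debris", "again"],
    ["worker", "cut", "hand"],
]
scores = tf_idf([word_groups(report) for report in reports])

# "worker" is in every report, so it scores zero everywhere.
# "falling debris" occurs twice in the third report and in only half of
# the reports overall, so it stands out there.
print(scores[2]["worker"], scores[2]["falling debris"])
```

The logarithm is what makes a token found in every document score exactly zero, matching the intuition that such tokens “may not mean much”.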

By feeding injury reports from all of the planet’s oil wells into this basic algorithm, you might discover that falling debris injuries are clustered around oil wells in the Gulf of Mexico. You might then learn of a new piece of machinery, an environmental factor or something else that is causing injuries, allowing them to be prevented.

Oil and gas operators are now able to ask natural language questions when performing diagnostics before making repairs, gaining more insights along the way. NLP enables energy companies to unlock the value of their unstructured data. Every email and injury report can be turned into actual insights used to drive revenue.

Oil and gas industry work is some of the most dangerous on the planet. So while this may seem like quite a niche application, it is actually a major development in preventing serious injuries and fatalities across the energy sector.

This is just one example of how NLP algorithms can be used. NLP has many applications across a wide range of industries.
