ALL THINGS DATA by 1000ml

The Required Technical Knowledge for NLP Work

Last week we discussed the considerations for deciding whether to build or buy AI, which feeds nicely into this week's topic: the required technical knowledge for NLP work. So this assumes you've gone through the build-or-buy framework and decided, you know what? We can build this, let's do it ourselves.
So now you're in the world of "let's build this" because you've decided you have the staffing resources, or you can bring in consultants, with enough know-how and enough people with the breadth of knowledge required. You've also worked with, or have available, the infrastructure this needs.
Generally, the organization is in good shape, your change management processes are solid, and you understand how to make this blend into the organization well, keeping in mind the total cost of ownership of such a project.
For a very long time, anyone doing serious work in NLP has usually started their journey with the NLTK package. It's largely the go-to; it's the foundation that provides quite a bit of functionality. But there are some things that are critical to know if you're going to do serious NLP, and one of them is stemming: reducing words down to their root form.
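As a concrete illustration, here's a minimal sketch of stemming with NLTK's Porter stemmer (assuming NLTK is installed; the word list is just an example):

```python
# Minimal stemming sketch with NLTK's Porter stemmer.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["running", "runs", "easily", "fairness"]  # example inputs
print([stemmer.stem(w) for w in words])
# -> ['run', 'run', 'easili', 'fair']  (stems are roots, not always dictionary words)
```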
You also need to split the content up: into phrases or sentences, because you may want to analyze at that level, or into paragraphs, but usually into words, so that you can understand the text and use a part-of-speech tagger, which opens up endless opportunities.
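A rough sketch of what that looks like in NLTK, splitting into sentences and words and then tagging parts of speech (this assumes the punkt and averaged_perceptron_tagger data packages have been downloaded with nltk.download()):

```python
# Sentence/word tokenization and part-of-speech tagging with NLTK.
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

text = "NLTK splits text into sentences. Each sentence is then split into words."
sentences = sent_tokenize(text)       # sentence-level split
words = word_tokenize(sentences[0])   # word-level split of the first sentence
print(nltk.pos_tag(words))            # tag each word with its part of speech
# e.g. [('NLTK', 'NNP'), ('splits', 'VBZ'), ('text', 'NN'), ...]
```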
For example, there's the possibility of doing a strictly extractive summary, where you pull existing text out of a document. Imagine an entire thesis with an abstract at the front: running NLP over that whole thesis, you'd probably just pull out something very close to that abstract, and that becomes your summary.
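To make the idea concrete, one way to sketch a strictly extractive summary is to score each sentence by the frequency of the words it contains and keep the top few verbatim. The scoring and the sentence count here are illustrative assumptions, not a production recipe, and it relies on the NLTK punkt and stopwords data being available:

```python
# A toy frequency-based extractive summarizer: keep the highest-scoring
# sentences exactly as they appear in the source document.
from collections import Counter
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

def extractive_summary(text, n_sentences=2):
    stop = set(stopwords.words("english"))
    tokens = [w.lower() for w in word_tokenize(text)
              if w.isalpha() and w.lower() not in stop]
    freq = Counter(tokens)
    sentences = sent_tokenize(text)
    # Score each sentence by the summed frequency of its words.
    score = lambda s: sum(freq[w.lower()] for w in word_tokenize(s))
    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Return the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```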
The other option is to make an abstract yourself, an abstractive summary, where you're looking to take the knowledge in the document, understand it, and then restate it in new words. So you're genuinely summarizing as opposed to pulling specific parts of the content out.
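An abstractive summary is usually produced with a generative model rather than NLTK itself. As one hedged sketch, the Hugging Face transformers library exposes a summarization pipeline that writes new sentences instead of copying existing ones; the sample text and the length limits below are assumptions for illustration:

```python
# Abstractive summarization sketch using the transformers library:
# the model generates a new, shorter restatement of the input text.
from transformers import pipeline

summarizer = pipeline("summarization")  # downloads a default summarization model
document = (
    "The thesis investigates how small manufacturers adopt machine learning. "
    "It surveys twelve firms, compares their tooling choices, and finds that "
    "data readiness, not algorithms, is the main barrier to adoption."
)
result = summarizer(document, max_length=60, min_length=15, do_sample=False)
print(result[0]["summary_text"])
```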
Again, that's higher risk, but it's something we're really good at. You could also use NLP as a sort of input to machine learning and AI: you'd find there are many clusters when you do this, and those clusters can then help lead you to create an AI program built on top of them.
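Here's a rough sketch of handing NLP output to a clustering algorithm: vectorize documents with TF-IDF, then group them with k-means. scikit-learn, the sample documents, and the number of clusters are all assumptions for illustration:

```python
# NLP output feeding machine learning: TF-IDF vectors clustered with k-means.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "Invoice for consulting services rendered in March.",
    "Invoice for annual software licensing fees.",
    "Master services agreement between the two parties.",
    "Non-disclosure agreement covering confidential information.",
]
vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # one cluster id per document, e.g. [0 0 1 1]
```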
Lastly, if you wanted to use NLP output as an input to different programs, you could do that for the kind of work we do: we do a lot of work in document intelligence and contract AI.
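As a final hedged sketch of NLP output feeding another program, here's NLTK's named-entity chunker pulling organizations and places out of a contract-like sentence so they could be passed downstream. The sentence, the entity labels kept, and the required data packages (punkt, averaged_perceptron_tagger, maxent_ne_chunker, words) are assumptions for illustration, not a description of any 1000ml product:

```python
# Extract named entities with NLTK so they can be handed to another system,
# e.g. a document-intelligence or contract-review pipeline.
import nltk

sentence = "Acme Corp entered into an agreement with Globex Inc in Toronto."
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))

entities = [
    " ".join(token for token, tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() in {"ORGANIZATION", "GPE", "PERSON"}
]
print(entities)  # e.g. ['Acme Corp', 'Globex Inc', 'Toronto']
```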

Let's cut through the jargon, myths, and nebulous world of data, machine learning, and AI. Each week we'll be unpacking topics related to the world of data and AI with the award-winning founders of 1000ml. Whether you're in the data world already or looking to learn more about it, this podcast is for you.