Getting started with NLTK

Natural Language Processing (NLP) caught my interest because the ability to extract useful information and actionable insights from heaps of raw, unstructured text is fascinating. It has applications in industries ranging from e-commerce to property management. I've spent the past year dabbling with one text analytics package or another. So far I have worked primarily with spaCy, and for this module I wanted to work with NLTK. Both are heavily used in industry and have a plethora of functionality built in. They also have detailed documentation and a wide variety of tutorials available online. As before, my work in this module will be split in two: one part explaining what I am working on and the approach I am taking, and the other on writing the code itself.

Many tools are available for NLP preprocessing. Some were developed by organizations to build their own NLP applications, while others are open source. Here is a small list of available NLP tools:
• GATE
• spaCy
• OpenNLP
• UIMA
• Stanford toolkit
• Gensim
• Natural Language Toolkit (NLTK)

However, when it comes to ease of use and explanation of the concepts, I felt that NLTK scores really high. NLTK is also a very good learning kit because Python itself is quick to pick up. NLTK covers most common NLP tasks, and it is elegant and easy to work with. For all these reasons, NLTK has become one of the most popular libraries in the NLP community.

Getting started wasn't much of a challenge since I already had Python installed for my previous module. I just had to install the package, and I was ready to get started.
For my first tutorial, I will be analyzing a Twitter dataset. Let's see how that goes.
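
For anyone following along, here is a minimal sketch of how the install could be checked. It assumes NLTK was installed with `pip install nltk`; the helper names `nltk_is_installed` and `tokenize_tweet` and the sample tweet are made up for illustration and are not part of the actual tutorial. It uses NLTK's `TweetTokenizer`, which works without downloading any extra corpora.

```python
import importlib.util


def nltk_is_installed() -> bool:
    """Return True if the nltk package can be found (hypothetical helper)."""
    return importlib.util.find_spec("nltk") is not None


def tokenize_tweet(text: str) -> list:
    """Split a tweet into tokens with NLTK's TweetTokenizer (hypothetical helper)."""
    from nltk.tokenize import TweetTokenizer

    # Lowercase the text, drop @handles, and shorten long character runs.
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
    return tokenizer.tokenize(text)


if nltk_is_installed():
    import nltk

    print("NLTK version:", nltk.__version__)
    # Sample tweet invented for this sketch, not taken from a real dataset.
    print(tokenize_tweet("@user NLTK is sooooo cool!!! #nlp"))
else:
    print("NLTK not found; try: pip install nltk")
```

If the version number and a list of tokens print, the package is ready to use. Other parts of NLTK, such as stop word lists, may need a separate `nltk.download(...)` call first.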
