Posts

Final week's musings

I spent the majority of this week working on the sentiment analysis code. The good news is that it finally works! GitHub: https://github.com/jayeetaroy/Text-analytics-Python/blob/master/Sentiment%20Analysis.ipynb The bad news is that the semester is coming to an end. It has been a very interesting semester. In a way, I think I learned a lot more than I would've learned in a traditional class. This last week was uncharacteristically smooth, with absolutely no chaos! Before starting this exercise, I thought that I wouldn't be able to finish it, or that I wouldn't understand what was happening, or that I might have to scramble to finish all of it. But none of that happened! So surprising! I think this method of self-study is a lot more effective than it gets credit for. Sure, I would've loved to have someone teach me all this in a classroom; that way I would've wasted a lot less time being lost. But then I also wouldn't have this sense of accomplishment...

Sentiment Analysis

Even though the last exercise was a little taxing, I wanted to take another shot at a small project-based approach for this week too. So the mini project I chose for myself was a sentiment analyser. Nowadays companies want to understand what went wrong with their latest products and what users and the general public think about the latest feature. All this information can be quantified with reasonable accuracy using sentiment analysis. Quantifying users' content, ideas, beliefs, and opinions is known as sentiment analysis. Users' online posts, blogs, tweets, and product feedback help businesses reach their target audience and innovate in products and services. Sentiment analysis helps in understanding people in a better and more accurate way. It is not limited to marketing; it can also be used in politics, research, and security. Sentiment analysis is also referred to as opinion mining. It's mostly used on social media and customer review data. There a...

Finally built a text classifier - Part 2

After applying Naive Bayes, I found that the accuracy was only 0.7738980350504514. Now this is the part where I was stuck for a long time. It took a lot of digging through old notes and random blogs to find out how I could use a Support Vector Machine based model to get better accuracy. The advantages of support vector machines are: Effective in high dimensional spaces. Still effective in cases where the number of dimensions is greater than the number of samples. Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient. Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels. The disadvantages of support vector machines include: If the number of features is much greater than the number of samples, avoiding over-fitting when choosing kernel functions and the regularization term is crucial. SVMs do not directly provi...
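
The excerpt does not show the code behind the SVM model, so here is only a rough sketch of the underlying idea: a linear SVM fitted by plain sub-gradient descent on the L2-regularized hinge loss, using the standard library alone. The toy reviews, the function names, the hyperparameters, and the omission of a bias term are all assumptions made for illustration; a real project would more likely use a library implementation with a proper vectorizer.

```python
def bag_of_words(text, vocabulary):
    """Map a text to a vector of word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(samples, labels, epochs=50, learning_rate=0.1, reg=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. No bias term is learned (a simplification).
    """
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * dot(weights, x)
            # Regularization step: shrink every weight slightly.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            # Hinge-loss step: only points inside the margin push the weights.
            if margin < 1.0:
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
    return weights


def predict(weights, x):
    return 1 if dot(weights, x) >= 0 else -1


# Hypothetical toy data: +1 = positive review, -1 = negative review.
TOY_REVIEWS = [
    ("great movie loved it", 1),
    ("loved the great acting", 1),
    ("terrible movie hated it", -1),
    ("hated the terrible plot", -1),
]
VOCABULARY = sorted({word for text, _ in TOY_REVIEWS for word in text.split()})
samples = [bag_of_words(text, VOCABULARY) for text, _ in TOY_REVIEWS]
labels = [label for _, label in TOY_REVIEWS]
weights = train_linear_svm(samples, labels)

positive_guess = predict(weights, bag_of_words("great acting", VOCABULARY))
negative_guess = predict(weights, bag_of_words("terrible plot", VOCABULARY))
```

Only samples that fall inside the margin change the weights, which is the same intuition as the "subset of training points" advantage listed above.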

Finally built a text classifier - Part 1

A text classifier primarily uses machine learning principles to classify text into different categories. These supervised learning based classifiers are used in a variety of places such as spam filters, user comment categorization, etc. Text categorization is the grouping of documents into a fixed number of predefined classes. Each document can be in multiple, exactly one, or no category at all. Using machine learning, the objective is to learn classifiers from examples, which then perform the category assignments automatically. This is a supervised learning problem. Since categories may overlap, each category is treated as a separate group. Supervised Learning for Text Classification. PART-I : Training 1) During training, a feature extractor is used to transform each input value to a feature set. 2) These feature sets capture the basic information about each input that should be used to categorize it. 3) Pairs of feature sets and l...
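
The training steps above can be sketched roughly as follows: a feature extractor turns each text into a bag-of-words feature set, and pairs of feature sets and labels are used to fit a small Naive Bayes model with add-one smoothing. This is a minimal standard-library sketch, not the code from the post; the toy messages, labels, and function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def extract_features(text):
    """Feature extractor: turn raw text into a bag-of-words feature set."""
    return Counter(text.lower().split())


def train(labeled_texts):
    """Training: collect per-label word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        features = extract_features(text)
        label_counts[label] += 1
        word_counts[label].update(features)
        vocabulary.update(features)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Prediction: pick the label with the highest log-probability."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word, count in extract_features(text).items():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            prob = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += count * math.log(prob)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set of (text, label) pairs.
TOY_DATA = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting notes for the project", "ham"),
    ("lunch with the project team", "ham"),
]
model = train(TOY_DATA)
spam_guess = predict(model, "claim your free prize")
ham_guess = predict(model, "project meeting with the team")
```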

Building a text classifier - or more appropriately, trying to!

This exercise took a really long time. I was starting to get bored of running small examples and wanted to really get my hands dirty. I think this semester has helped me understand my inherently impatient and curious nature. One of the things that I learnt about myself through the course of these modules is that I am extremely impatient when it comes to learning. I want to learn everything fast. I keep jumping between topics and skipping videos. In a classroom structure, I can definitely see that I am tied to the pace of the course set by the professor (someone skilled and trained at designing a curriculum). But when left on my own, I keep jumping around and going down rabbit holes. It may seem like productive, fast progress at first, but it definitely leaves gaps in my understanding. I did something similar this week. I decided I had read enough blogs and tutorials that I could build a classifier already. I was confident that in my previous module I did work with classifiers, I will...

Parsing a Web page

The next thing I tried was parsing a web page using BeautifulSoup and NLTK. There is more information on the Internet than any human can absorb in a lifetime. What you need is not access to that information, but a scalable way to collect, organize, and analyze it. For this, we use web scraping. Web scraping automatically extracts data and presents it in a format you can easily make sense of. In this tutorial, we’ll focus on its applications in the financial market, but web scraping can be used in a wide variety of situations. The first step was, as always, setting up the libraries. So, I installed BeautifulSoup and imported urllib (which ships with Python) to start building my scraper. One of the key things to note was that to build a scraper you need a basic knowledge of what an HTML page looks like, because these scrapers use the HTML tags on a website to return text. There are also certain rules that need to be followed when building a web scraper. You should check a website’s Terms an...
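
To make the point about HTML tags concrete, here is a small sketch of tag-based text extraction. It deliberately uses the standard library's html.parser as a stand-in for BeautifulSoup, and it parses an inline sample page instead of fetching a live site; the sample HTML, the class name, and the helper function are all made up for illustration.

```python
from html.parser import HTMLParser

# Made-up sample page; a real scraper would download the HTML first.
SAMPLE_HTML = """
<html><head><title>Sample page</title></head>
<body>
  <h1>Market update</h1>
  <p>Stocks rose today.</p>
  <p>Bonds were flat.</p>
</body></html>
"""


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, target_tag):
        super().__init__()
        self.target_tag = target_tag
        self.inside_target = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            self.inside_target = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.target_tag:
            self.inside_target = False

    def handle_data(self, data):
        # Only keep text that appears between the target tag's open and close.
        if self.inside_target:
            self.texts[-1] += data


def extract_text(html, tag):
    """Return the stripped text of each <tag> element in the HTML string."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    return [text.strip() for text in collector.texts]


paragraphs = extract_text(SAMPLE_HTML, "p")
headings = extract_text(SAMPLE_HTML, "h1")
```

The extracted strings could then be passed on to a tokenizer such as NLTK's for further processing.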

Reflecting on the Meetups

For my previous module, I primarily attended the meetups organized by Women in Data Science and the events organized by Galvanize. For this module, I found a group that specializes in NLP called the Austin Natural Language Processing group. I saw an upcoming event that I was very excited about, which was supposed to cover generating text using deep learning. It did ask for an entry fee, but I thought it would be worth it because the event details showed a very interesting list of topics. But so far they have postponed the event twice, with no information on when it will happen. After this, I was in a fix as to what meetup to attend. So I diversified my search and ended up signing up for PyLadies ATX and Learn to Code - Thinkful Austin. To my surprise, I found that PyLadies hosts a Whiteboard Mock Interview meetup! When I did the first module, I didn't know what meetup to attend for it; this would have been perfect. I thoroughly enj...