Lesson - #539 Natural Language Toolkit - Unigram Tagger


What is a Unigram Tagger?

As the name suggests, a unigram tagger is a tagger that uses only a single word as its context for determining the POS (Part-of-Speech) tag. In simple terms, a unigram tagger is a context-based tagger whose context is a single word, i.e., a unigram.
 
How does it work?
NLTK provides a class named UnigramTagger for this purpose. Before looking at how it works, consider its class hierarchy: UnigramTagger inherits from NgramTagger, which is a subclass of ContextTagger, which in turn inherits from SequentialBackoffTagger.

The working of UnigramTagger is explained in the following steps −
  • Because UnigramTagger inherits from ContextTagger, it implements a context() method. This context() method takes the same three arguments as the choose_tag() method (tokens, index, and history).
  • The result of the context() method is the word token, which is used to build the model. Once the model is built, the word token is also used to look up the best tag.
  • In this way, UnigramTagger builds a context model from the list of tagged sentences.
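
The steps above can be illustrated with a minimal pure-Python sketch of a unigram lookup model. This is not NLTK's implementation: the function names `train_unigram_model` and `tag_sentence` and the toy training data are hypothetical, chosen only to show the idea of mapping each word to its most frequent tag.

```python
from collections import Counter, defaultdict


def train_unigram_model(tagged_sentences):
    """Map each word to the tag it most often carries in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    # Keep only the single most frequent tag for every word (the "context model").
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_sentence(model, words):
    """Look each word up in the model; words never seen in training get None."""
    return [(word, model.get(word)) for word in words]


# Hypothetical toy training data, not taken from the treebank corpus.
toy_train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("run", "NN")],
    [("dogs", "NNS"), ("run", "VBP")],
    [("they", "PRP"), ("run", "VBP")],
]

toy_model = train_unigram_model(toy_train)
toy_tagged = tag_sentence(toy_model, ["the", "dogs", "run", "fast"])
print(toy_tagged)
# [('the', 'DT'), ('dogs', 'NNS'), ('run', 'VBP'), ('fast', None)]
```

Here "run" is tagged VBP because that tag occurs twice in the toy data against once for NN, and "fast" gets None because a pure lookup has no entry for unseen words.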

NLTK's UnigramTagger can be trained by providing a list of tagged sentences at initialization. In the example below, we use the tagged sentences of the treebank corpus, taking the first 2500 sentences from that corpus.
Example
First, import the UnigramTagger class from nltk −
from nltk.tag import UnigramTagger

Next, import the corpus you want to use. Here we are using the treebank corpus −
from nltk.corpus import treebank

Now, take the sentences for training. We use the first 2500 tagged sentences −
train_sentences = treebank.tagged_sents()[:2500]

Next, train UnigramTagger on the training sentences −
Uni_tagger = UnigramTagger(train_sentences)

Take some tagged sentences for testing. Here we take the sentences from index 1500 onward. Note that this range overlaps the training sentences (indices 1500 to 2499), which tends to inflate the score compared with a fully held-out test set −
test_sentences = treebank.tagged_sents()[1500:]
Uni_tagger.evaluate(test_sentences)
 
Output
0.8942306156033808

Here, we got around 89 percent accuracy for a tagger that uses single word lookup to determine the POS tag.
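
The score above is a token-level accuracy: the share of tokens whose predicted tag matches the gold tag. The following is a minimal sketch of that calculation, assuming a hypothetical dictionary `toy_lookup` in place of a trained tagger; the helper names are illustrative and NLTK's own implementation may differ in detail.

```python
def token_accuracy(tag_fn, gold_sentences):
    """Fraction of tokens for which tag_fn predicts the gold tag."""
    correct = 0
    total = 0
    for gold in gold_sentences:
        words = [word for word, _ in gold]
        predicted = tag_fn(words)
        for (_, gold_tag), (_, predicted_tag) in zip(gold, predicted):
            correct += gold_tag == predicted_tag
            total += 1
    return correct / total if total else 0.0


# Hypothetical single-word lookup table standing in for a trained unigram model.
toy_lookup = {"the": "DT", "dog": "NN", "runs": "VBZ"}


def toy_tag(words):
    """Tag each word by plain lookup; unknown words get None."""
    return [(word, toy_lookup.get(word)) for word in words]


# Hypothetical gold-standard sentences for the toy evaluation.
toy_gold = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("cat", "NN")],
]

toy_score = token_accuracy(toy_tag, toy_gold)
print(toy_score)  # 4 of 5 tokens match, so this prints 0.8
```

The one miss is "cat", which is absent from the lookup table; unseen words are the main source of errors for a tagger that relies on single-word lookup.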
 
Complete implementation example
from nltk.tag import UnigramTagger
from nltk.corpus import treebank
# Train on the first 2500 tagged sentences of the treebank corpus
train_sentences = treebank.tagged_sents()[:2500]
Uni_tagger = UnigramTagger(train_sentences)
# Evaluate on the sentences from index 1500 onward
test_sentences = treebank.tagged_sents()[1500:]
Uni_tagger.evaluate(test_sentences)
 
Output
0.8942306156033808