...

Library - Natural Language Toolkit



Lesson - #538 Basics of Part-of-Speech (POS) Tagging


What is POS tagging?

Tagging, a kind of classification, is the automatic assignment of descriptions to tokens. We call each descriptor a 'tag', which represents one of the parts of speech (nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories), semantic information, and so on.

Part-of-Speech (POS) tagging, in particular, can be defined as the process of converting a sentence, given as a list of words, into a list of tuples, where each tuple has the form (word, tag). In other words, POS tagging is the process of assigning one of the parts of speech to each given word.
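Tagged text is often written as word/TAG strings. As a minimal sketch, NLTK's str2tuple() and untag() helpers from nltk.tag can turn such a hand-tagged sentence into the (word, tag) tuples described above and back into a plain word list. The sentence and its tags are only an illustration.

```python
from nltk.tag import str2tuple, untag

# A hand-tagged sentence in the common "word/TAG" text notation (illustrative)
tagged_text = "I/PRP am/VBP going/VBG to/TO school/NN"

# Convert each "word/TAG" token into a (word, tag) tuple
tagged = [str2tuple(token) for token in tagged_text.split()]
print(tagged)
# [('I', 'PRP'), ('am', 'VBP'), ('going', 'VBG'), ('to', 'TO'), ('school', 'NN')]

# untag() goes the other way: it drops the tags and keeps only the words
print(untag(tagged))
# ['I', 'am', 'going', 'to', 'school']
```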

The following table lists the most frequent POS tags used in the Penn Treebank corpus −
Sr.No  Tag   Description
1      NNP   Proper noun, singular
2      NNPS  Proper noun, plural
3      PDT   Predeterminer
4      POS   Possessive ending
5      PRP   Personal pronoun
6      PRP$  Possessive pronoun
7      RB    Adverb
8      RBR   Adverb, comparative
9      RBS   Adverb, superlative
10     RP    Particle
11     SYM   Symbol (mathematical or scientific)
12     TO    to
13     UH    Interjection
14     VB    Verb, base form
15     VBD   Verb, past tense
16     VBG   Verb, gerund/present participle
17     VBN   Verb, past participle
18     WP    Wh-pronoun
19     WP$   Possessive wh-pronoun
20     WRB   Wh-adverb
21     #     Pound sign
22     $     Dollar sign
23     .     Sentence-final punctuation
24     ,     Comma
25     :     Colon, semi-colon
26     (     Left bracket character
27     )     Right bracket character
28     "     Straight double quote
29     ‘     Left open single quote
30     “     Left open double quote
31     ’     Right close single quote
32     ”     Right close double quote


Example
Let us understand it with a Python example −
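import nltk
from nltk import word_tokenize

# One-time download of the tokenizer and tagger models; older NLTK releases
# use the first name of each pair, newer releases use the second
for resource in ('punkt', 'punkt_tab',
                 'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng'):
    nltk.download(resource, quiet=True)

sentence = "I am going to school"
print(nltk.pos_tag(word_tokenize(sentence)))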

Output
[('I', 'PRP'), ('am', 'VBP'), ('going', 'VBG'), ('to', 'TO'), ('school', 'NN')]


Why POS tagging?

POS tagging is an important part of NLP because it serves as a prerequisite for further NLP analysis, such as −
  • Chunking
  • Syntax Parsing
  • Information extraction
  • Machine Translation
  • Sentiment Analysis
  • Grammar analysis & word-sense disambiguation
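To see why tags are a prerequisite for chunking, here is a minimal sketch using NLTK's RegexpParser on an already-tagged sentence. The sentence, its hand-assigned tags and the one-rule noun-phrase grammar are illustrative assumptions; a real pipeline would take the tags from a tagger.

```python
import nltk

# Hand-tagged sentence (illustrative); in practice these tags come from a POS tagger
tagged = [('the', 'DT'), ('little', 'JJ'), ('dog', 'NN'), ('barked', 'VBD'),
          ('at', 'IN'), ('the', 'DT'), ('cat', 'NN')]

# Noun phrase = optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
parser = nltk.RegexpParser(grammar)
tree = parser.parse(tagged)

# Collect the words of every NP chunk found in the parse tree
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['the little dog', 'the cat']
```

The chunk rule is written entirely in terms of tags, which is why chunking cannot start until tagging is done.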

TaggerI - Base class

All the taggers reside in NLTK's nltk.tag package. The base class of these taggers is TaggerI, which means every tagger inherits from this class.
Methods − The TaggerI class provides the following two methods −
  • tag() method − As the name implies, this method takes a list of words as input and returns a list of tagged words as output. Every subclass must implement it.
  • evaluate() method − With the help of this method, we can measure the accuracy of the tagger against hand-tagged (gold-standard) data. TaggerI implements it itself; recent NLTK releases deprecate it in favour of the equivalent accuracy() method.
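A minimal sketch of a custom tagger: subclassing TaggerI and implementing only tag() is enough to inherit the evaluation logic. The CapitalizationTagger class and its rule are hypothetical, for illustration only. Because the method name differs between NLTK releases, the code calls accuracy() when it exists and falls back to evaluate() otherwise.

```python
from nltk.tag.api import TaggerI


class CapitalizationTagger(TaggerI):
    """Hypothetical toy tagger: capitalized words -> NNP, everything else -> NN."""

    def tag(self, tokens):
        # Return one (word, tag) tuple per input token
        return [(tok, "NNP" if tok[:1].isupper() else "NN") for tok in tokens]


tagger = CapitalizationTagger()
print(tagger.tag(["Tutorials", "Point", "rocks"]))
# [('Tutorials', 'NNP'), ('Point', 'NNP'), ('rocks', 'NN')]

# Gold data is a list of hand-tagged sentences
gold = [[("Tutorials", "NNP"), ("Point", "NNP"), ("rocks", "VBZ")]]

# Newer NLTK releases call this accuracy(); older ones only have evaluate()
score = tagger.accuracy(gold) if hasattr(tagger, "accuracy") else tagger.evaluate(gold)
print(score)  # 2 of 3 tags match, so about 0.667
```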



The Baseline of POS Tagging

The baseline, or most basic step, of POS tagging is default tagging, which can be performed using the DefaultTagger class of NLTK. Default tagging simply assigns the same POS tag to every token. It also provides a baseline against which accuracy improvements can be measured.
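The idea of a baseline can be illustrated in plain Python, without NLTK: tag every token with the same tag, score it against a hand-tagged gold sentence, and compare that score with a hypothetical, slightly smarter tagger. This sketches the concept only; it is not NLTK's implementation.

```python
# Hand-tagged gold sentence (Penn Treebank tags, illustrative)
gold = [("I", "PRP"), ("am", "VBP"), ("going", "VBG"), ("to", "TO"), ("school", "NN")]
words = [w for w, _ in gold]


def accuracy(predicted, reference):
    """Fraction of (word, tag) pairs that match the reference exactly."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Baseline: assign the same tag to every token
baseline = [(w, "NN") for w in words]
print(accuracy(baseline, gold))  # 0.2

# Hypothetical improvement: a tiny lookup table that falls back to "NN"
lookup = {"I": "PRP", "to": "TO"}
improved = [(w, lookup.get(w, "NN")) for w in words]
print(accuracy(improved, gold))  # 0.6
```

Any tagger worth using should beat the baseline score; the gap between the two numbers is the measured improvement.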

DefaultTagger class
Default tagging is performed by using the DefaultTagger class, which takes a single argument, i.e., the tag we want to apply.

How does it work?
As mentioned before, all the taggers inherit from the TaggerI class. The DefaultTagger inherits from SequentialBackoffTagger, which is itself a subclass of TaggerI (TaggerI → SequentialBackoffTagger → DefaultTagger). Let us understand it with the following diagram −
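A quick way to confirm this hierarchy in an installed copy of NLTK is Python's built-in issubclass():

```python
from nltk.tag import DefaultTagger, SequentialBackoffTagger
from nltk.tag.api import TaggerI

# Each class in the chain is a subclass of the one above it
print(issubclass(DefaultTagger, SequentialBackoffTagger))  # True
print(issubclass(SequentialBackoffTagger, TaggerI))        # True
```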

As a subclass of SequentialBackoffTagger, the DefaultTagger must implement the choose_tag() method, which takes the following three arguments −
  • The list of tokens
  • The index of the current token
  • The list of tags assigned to the previous tokens, i.e., the history
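A sketch of how choose_tag() is used: the hypothetical SuffixRuleTagger below subclasses SequentialBackoffTagger, tags words ending in "ing" as VBG, and returns None for everything else so that the token is passed on to a DefaultTagger backoff. The suffix rule is only an illustration, not a recommended tagging strategy.

```python
from nltk.tag import DefaultTagger, SequentialBackoffTagger


class SuffixRuleTagger(SequentialBackoffTagger):
    """Hypothetical tagger: '-ing' words -> VBG, otherwise defer to the backoff."""

    def choose_tag(self, tokens, index, history):
        # tokens: the full token list; index: position of the current token;
        # history: tags already assigned to tokens[:index] (unused by this rule)
        if tokens[index].endswith("ing"):
            return "VBG"
        return None  # None means "no decision", so the backoff tagger is asked


tagger = SuffixRuleTagger(backoff=DefaultTagger("NN"))
tagged = tagger.tag(["I", "am", "going", "to", "school"])
print(tagged)
# [('I', 'NN'), ('am', 'NN'), ('going', 'VBG'), ('to', 'NN'), ('school', 'NN')]
```

DefaultTagger's own choose_tag() simply ignores all three arguments and returns its fixed tag, which is why it works well as the last tagger in a backoff chain.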
Example
from nltk.tag import DefaultTagger

# Assign the noun tag 'NN' to every token
exptagger = DefaultTagger('NN')
print(exptagger.tag(['Tutorials', 'Point']))

Output
[('Tutorials', 'NN'), ('Point', 'NN')]

In this example, we chose the noun tag because nouns are the most common type of word. More generally, DefaultTagger is most useful when we choose the most common POS tag.
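One way to pick that most common tag is to count tags in hand-tagged data with collections.Counter and pass the winner to DefaultTagger. The two-sentence sample below is made up for illustration; on real data you would count over a tagged corpus. As with any TaggerI subclass, the score comes from accuracy() on newer NLTK releases and evaluate() on older ones.

```python
from collections import Counter

from nltk.tag import DefaultTagger

# Tiny hand-tagged sample (illustrative, not a real corpus)
gold = [
    [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")],
    [("School", "NN"), ("lunch", "NN"), ("costs", "VBZ"), ("money", "NN")],
]

# Count every tag in the sample and take the most frequent one
tag_counts = Counter(tag for sent in gold for _, tag in sent)
most_common_tag = tag_counts.most_common(1)[0][0]
print(most_common_tag)  # NN

# Use it as the default tag and measure the resulting baseline accuracy
tagger = DefaultTagger(most_common_tag)
score = tagger.accuracy(gold) if hasattr(tagger, "accuracy") else tagger.evaluate(gold)
print(score)  # 0.5
```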