
Library - Natural Language Toolkit




Lesson - #534 Stemming & Lemmatization


#### What is Stemming?

Stemming is a technique used to extract the base form of words by removing affixes from them. It is much like cutting the branches of a tree back to its stem. For example, the stem of the words eating, eats, and eaten is eat. Search engines use stemming to index words: instead of storing every form of a word, a search engine can store only the stems. In this way, stemming reduces the size of the index and improves retrieval, since a query for one form of a word can also match documents that contain its other forms.

#### Different Stemming algorithms

In NLTK, the StemmerI interface, which has a stem() method, is implemented by all the stemmers we will cover next. Let us understand this with the following diagram.
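A minimal sketch of what the shared interface buys you: the helper below accepts any `StemmerI` and only calls `stem()`, so the Porter, Lancaster and regular-expression stemmers covered next can be swapped freely. `StemmerI` is imported from `nltk.stem.api`, where NLTK defines it; the helper name `stem_all`, the word list, and the `'ing$'` pattern are illustrative assumptions, not part of the lesson.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, RegexpStemmer
from nltk.stem.api import StemmerI


def stem_all(stemmer: StemmerI, words):
    """Hypothetical helper: stem every word with any StemmerI implementation."""
    return [stemmer.stem(w) for w in words]


stemmers = {
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
    # Illustrative pattern: strip "ing" only when it ends the word.
    "regexp": RegexpStemmer("ing$"),
}

words = ["eating", "eats", "writing"]
for name, stemmer in stemmers.items():
    # Each stemmer subclasses StemmerI, so the same stem() call works for all.
    assert isinstance(stemmer, StemmerI)
    print(name, stem_all(stemmer, words))
```

With Porter this gives `['eat', 'eat', 'write']`. The anchored `'ing$'` pattern leaves `eats` unchanged and turns `writing` into `writ`, which shows how blunt a single regular expression is compared with Porter's rule set.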


#### Porter stemming algorithm

It is one of the most common stemming algorithms, designed mainly to remove and replace well-known suffixes of English words.

#### PorterStemmer class

NLTK has a PorterStemmer class, with the help of which we can easily apply the Porter Stemmer algorithm to the word we want to stem. This class knows several regular word forms and suffixes, which it uses to transform the input word into a final stem. The resulting stem is often a shorter word with the same root meaning. Let us see an example −

To start with, we need to import the Natural Language Toolkit (nltk).

```python
import nltk
```

Now, import the PorterStemmer class to implement the Porter Stemmer algorithm.

```python
from nltk.stem import PorterStemmer
```

Next, create an instance of the PorterStemmer class as follows −

```python
word_stemmer = PorterStemmer()
```

Now, input the word you want to stem.

```python
word_stemmer.stem('writing')
```

Output

```plaintext
'write'
```

```python
word_stemmer.stem('eating')
```

Output

```plaintext
'eat'
```

#### Lancaster stemming algorithm

It was developed at Lancaster University, and it is another very common stemming algorithm.

#### LancasterStemmer class

NLTK has a LancasterStemmer class, with the help of which we can easily apply the Lancaster Stemmer algorithm to the word we want to stem. Let us see an example −

To start with, we need to import the Natural Language Toolkit (nltk).

```python
import nltk
```

Now, import the LancasterStemmer class to implement the Lancaster Stemmer algorithm.

```python
from nltk.stem import LancasterStemmer
```

Next, create an instance of the LancasterStemmer class as follows −

```python
Lanc_stemmer = LancasterStemmer()
```

#### Regular Expression stemming algorithm

With the help of this stemming algorithm, we can build our own stemmer.

#### RegexpStemmer class

NLTK has a RegexpStemmer class, with the help of which we can easily implement Regular Expression Stemmer algorithms. It takes a single regular expression and removes any part of the word that matches it, whether that is a prefix, a suffix, or something in between. Let us see an example −

To start with, we need to import the Natural Language Toolkit (nltk).

```python
import nltk
```

Now, import the RegexpStemmer class to implement the Regular Expression Stemmer algorithm.

```python
from nltk.stem import RegexpStemmer
```

Next, create an instance of the RegexpStemmer class and provide the suffix or prefix you want to remove from the word as follows −

```python
Reg_stemmer = RegexpStemmer('ing')
```

Now, input the word you want to stem.

```python
Reg_stemmer.stem('eating')
```

Output

```plaintext
'eat'
```

```python
Reg_stemmer.stem('ingeat')
```

Output

```plaintext
'eat'
```

#### What is Lemmatization?

Lemmatization is similar to stemming. The output we get after lemmatization is called a 'lemma', which is a root word rather than a root stem (the output of stemming). After lemmatization, we get a valid word that means the same thing.

NLTK provides the WordNetLemmatizer class, which is a thin wrapper around the wordnet corpus. This class uses the morphy() function of the WordNet CorpusReader class to find a lemma. Let us understand it with an example −

To start with, we need to import the Natural Language Toolkit (nltk). The lemmatizer reads the WordNet data, so download it once if it is not installed yet.

```python
import nltk
nltk.download('wordnet')  # WordNet data used by WordNetLemmatizer
```

Now, import the WordNetLemmatizer class to implement the lemmatization technique.

```python
from nltk.stem import WordNetLemmatizer
```

Next, create an instance of the WordNetLemmatizer class.

```python
lemmatizer = WordNetLemmatizer()
```

Now, call the lemmatize() method and input the word whose lemma you want to find.

```python
lemmatizer.lemmatize('eating')
```

Output

```plaintext
'eating'
```

```python
lemmatizer.lemmatize('books')
```

Output

```plaintext
'book'
```