
Lesson #549 - Natural Language Toolkit: Transforming Trees


There are two reasons to transform trees:
  • To modify a deep parse tree
  • To flatten a deep parse tree

Converting Tree or Subtree to Sentence

The first recipe converts a Tree or subtree back into a sentence or chunk string. It takes a single line: join the words of the tree's leaves with spaces, as the following example shows.
Example
from nltk.corpus import treebank_chunk
# Third chunked sentence; its leaves are (word, tag) tuples
tree = treebank_chunk.chunked_sents()[2]
' '.join([w for w, t in tree.leaves()])

Output
'Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields
PLC , was named a nonexecutive director of this British industrial
conglomerate .'
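
The same join works on any subtree, not only on a whole sentence. The following is a minimal sketch that builds a small chunk tree by hand, so it runs without downloading a corpus. The sentence, its tags, and the helper name leaves_to_string are made up for illustration and are not part of NLTK.

```python
from nltk.tree import Tree

# Hand-built chunk tree (illustrative data); leaves are (word, tag)
# tuples, the same shape that treebank_chunk produces.
tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('quick', 'JJ'), ('fox', 'NN')]),
    ('jumped', 'VBD'),
    Tree('NP', [('the', 'DT'), ('fence', 'NN')]),
])

def leaves_to_string(t):
    """Join the words of a chunk tree or subtree with single spaces."""
    return ' '.join(word for word, tag in t.leaves())

sentence = leaves_to_string(tree)        # whole sentence
first_chunk = leaves_to_string(tree[0])  # first NP chunk only
print(sentence)
print(first_chunk)
```

Indexing the tree (tree[0]) returns a child subtree, so the same helper yields a chunk string for a single phrase.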


Deep tree flattening

Deep trees of nested phrases cannot be used to train a chunker, so we must flatten them first. The following example uses the third parsed sentence from the treebank corpus, which is a deep tree of nested phrases.
Example
To do this, we define a function named deeptree_flat() that takes a single Tree and returns a new Tree that keeps only the lowest-level trees. Most of the work is done by a helper function named childtree_flat().
from nltk.tree import Tree

def childtree_flat(trees):
    children = []
    for t in trees:
        if t.height() < 3:
            # A single tagged word: keep it as a (word, tag) tuple
            children.extend(t.pos())
        elif t.height() == 3:
            # A lowest-level phrase: keep it as a flat subtree
            children.append(Tree(t.label(), t.pos()))
        else:
            # A deeper phrase: recurse into its children
            children.extend(childtree_flat([c for c in t]))
    return children

def deeptree_flat(tree):
    return Tree(tree.label(), childtree_flat([c for c in tree]))

We saved these functions in a file named deeptree.py. Now let us call deeptree_flat() on the third parsed sentence from the treebank corpus.
from deeptree import deeptree_flat
from nltk.corpus import treebank
deeptree_flat(treebank.parsed_sents()[2])

Output
Tree('S', [Tree('NP', [('Rudolph', 'NNP'), ('Agnew', 'NNP')]), (',', ','),
Tree('NP', [('55', 'CD'), ('years', 'NNS')]), ('old', 'JJ'), ('and', 'CC'),
Tree('NP', [('former', 'JJ'), ('chairman', 'NN')]), ('of', 'IN'),
Tree('NP', [('Consolidated', 'NNP'), ('Gold', 'NNP'), ('Fields', 'NNP'), ('PLC', 'NNP')]),
(',', ','), ('was', 'VBD'), ('named', 'VBN'),
Tree('NP-SBJ', [('*-1', '-NONE-')]),
Tree('NP', [('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN')]), ('of', 'IN'),
Tree('NP', [('this', 'DT'), ('British', 'JJ'), ('industrial', 'JJ'), ('conglomerate', 'NN')]),
('.', '.')])
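
To see the three height cases at work without the corpus, here is a small sketch that restates the two functions and applies them to a hand-written bracketed tree. The example sentence is invented for illustration. Tree.fromstring() parses the bracketed string into a Tree.

```python
from nltk.tree import Tree

def childtree_flat(trees):
    children = []
    for t in trees:
        if t.height() < 3:
            children.extend(t.pos())                    # single tagged word
        elif t.height() == 3:
            children.append(Tree(t.label(), t.pos()))   # lowest-level phrase
        else:
            children.extend(childtree_flat(list(t)))    # deeper: recurse
    return children

def deeptree_flat(tree):
    return Tree(tree.label(), childtree_flat(list(tree)))

# Illustrative parse tree: the outer NP nests an NP and a PP.
deep = Tree.fromstring(
    "(S (NP (NP (DT the) (NN dog)) (PP (IN in) (NP (DT the) (NN park))))"
    " (VP (VBD barked)) (. .))"
)
flat = deeptree_flat(deep)
print(flat)
```

The outer NP and the PP disappear because they are taller than 3, while the two inner NPs and the VP survive as flat chunks and the preposition becomes a bare (word, tag) tuple.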


Building Shallow tree

In the previous section, we flattened a deep tree of nested phrases by keeping only the lowest-level subtrees. In this section, we keep only the highest-level subtrees instead, which builds a shallow tree. The following example again uses the third parsed sentence from the treebank corpus, which is a deep tree of nested phrases.
Example
To do this, we define a function named tree_shallow() that eliminates all the nested subtrees by keeping only the top-level subtree labels.
from nltk.tree import Tree

def tree_shallow(tree):
    children = []
    for t in tree:
        if t.height() < 3:
            # A single tagged word: keep it as a (word, tag) tuple
            children.extend(t.pos())
        else:
            # A phrase: keep its label and collapse everything below it
            children.append(Tree(t.label(), t.pos()))
    return Tree(tree.label(), children)

We saved this function in a file named shallowtree.py. Now let us call tree_shallow() on the third parsed sentence from the treebank corpus.
from shallowtree import tree_shallow
from nltk.corpus import treebank
tree_shallow(treebank.parsed_sents()[2])

Output
Tree('S', [Tree('NP-SBJ-1', [('Rudolph', 'NNP'), ('Agnew', 'NNP'), (',', ','),
('55', 'CD'), ('years', 'NNS'), ('old', 'JJ'), ('and', 'CC'), ('former', 'JJ'),
('chairman', 'NN'), ('of', 'IN'), ('Consolidated', 'NNP'), ('Gold', 'NNP'),
('Fields', 'NNP'), ('PLC', 'NNP'), (',', ',')]),
Tree('VP', [('was', 'VBD'), ('named', 'VBN'), ('*-1', '-NONE-'), ('a', 'DT'),
('nonexecutive', 'JJ'), ('director', 'NN'), ('of', 'IN'), ('this', 'DT'),
('British', 'JJ'), ('industrial', 'JJ'), ('conglomerate', 'NN')]), ('.', '.')])

We can see the difference by comparing the heights of the two trees:
from shallowtree import tree_shallow
from nltk.corpus import treebank
tree_shallow(treebank.parsed_sents()[2]).height()

Output
3

from nltk.corpus import treebank
treebank.parsed_sents()[2].height()

Output
9
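
The same height comparison can be tried on a small hand-written tree. This is a sketch under the assumption that tree_shallow() behaves as described above; the function is restated so the block runs on its own, and the example sentence is invented for illustration.

```python
from nltk.tree import Tree

def tree_shallow(tree):
    children = []
    for t in tree:
        if t.height() < 3:
            children.extend(t.pos())                    # single tagged word
        else:
            children.append(Tree(t.label(), t.pos()))   # collapse the phrase
    return Tree(tree.label(), children)

# Illustrative parse tree with a nested NP and PP.
deep = Tree.fromstring(
    "(S (NP (NP (DT the) (NN dog)) (PP (IN in) (NP (DT the) (NN park))))"
    " (VP (VBD barked)) (. .))"
)
shallow = tree_shallow(deep)

deep_height = deep.height()
shallow_height = shallow.height()
print(deep_height, shallow_height)
```

A shallow tree built this way has height 3 whenever the input contains at least one phrase: the root, one layer of phrases, and the tagged words.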


Tree labels conversion

Parse trees contain a variety of Tree label types that are not present in chunk trees. When using parse trees to train a chunker, we may want to reduce this variety by converting some of the Tree labels to more common label types. For example, there are two alternative NP subtrees, NP-SBJ and NP-TMP, and we can convert both of them into NP. The following example shows how to do it.
Example
To do this, we define a function named tree_convert() that takes the following two arguments:
  • The Tree to convert
  • A label conversion mapping

This function returns a new Tree with all matching labels replaced according to the values in the mapping.
from nltk.tree import Tree

def tree_convert(tree, mapping):
    children = []
    for t in tree:
        if isinstance(t, Tree):
            # Convert nested subtrees recursively
            children.append(tree_convert(t, mapping))
        else:
            # Leaves (words) are copied unchanged
            children.append(t)
    # Use the mapped label if there is one, otherwise keep the original
    label = mapping.get(tree.label(), tree.label())
    return Tree(label, children)

We saved this function in a file named converttree.py. Now let us call tree_convert() on the third parsed sentence from the treebank corpus.
from converttree import tree_convert
from nltk.corpus import treebank
mapping = {'NP-SBJ': 'NP', 'NP-TMP': 'NP'}
tree_convert(treebank.parsed_sents()[2], mapping)

Output
Tree('S', [Tree('NP-SBJ-1', [Tree('NP', [Tree('NNP', ['Rudolph']), Tree('NNP', ['Agnew'])]),
Tree(',', [',']), Tree('UCP', [Tree('ADJP', [Tree('NP', [Tree('CD', ['55']),
Tree('NNS', ['years'])]), Tree('JJ', ['old'])]), Tree('CC', ['and']),
Tree('NP', [Tree('NP', [Tree('JJ', ['former']), Tree('NN', ['chairman'])]),
Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('NNP', ['Consolidated']),
Tree('NNP', ['Gold']), Tree('NNP', ['Fields']), Tree('NNP', ['PLC'])])])])]),
Tree(',', [','])]), Tree('VP', [Tree('VBD', ['was']), Tree('VP', [Tree('VBN', ['named']),
Tree('S', [Tree('NP', [Tree('-NONE-', ['*-1'])]), Tree('NP-PRD', [Tree('NP', [Tree('DT', ['a']),
Tree('JJ', ['nonexecutive']), Tree('NN', ['director'])]), Tree('PP', [Tree('IN', ['of']),
Tree('NP', [Tree('DT', ['this']), Tree('JJ', ['British']), Tree('JJ', ['industrial']),
Tree('NN', ['conglomerate'])])])])])])]), Tree('.', ['.'])])
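
Note that NP-SBJ-1 in the output is left unchanged, because the mapping contains only the exact key NP-SBJ. If such indexed labels should be converted too, one possible variant is to strip a trailing numeric index before the lookup. The following is a sketch of that idea only: the name tree_convert_indexed and the suffix pattern are assumptions made for illustration, and the sample tree is hand-written.

```python
import re
from nltk.tree import Tree

def tree_convert_indexed(tree, mapping):
    """Hypothetical variant: ignore a trailing '-<digits>' index on labels."""
    children = [
        tree_convert_indexed(t, mapping) if isinstance(t, Tree) else t
        for t in tree
    ]
    # 'NP-SBJ-1' -> 'NP-SBJ'; labels without a numeric suffix are unchanged.
    base = re.sub(r'-\d+$', '', tree.label())
    # Fall back to the original label when the base label is not mapped.
    label = mapping.get(base, tree.label())
    return Tree(label, children)

# Illustrative tree, not taken from the corpus.
sample = Tree.fromstring(
    "(S (NP-SBJ-1 (NNP Rudolph) (NNP Agnew))"
    " (VP (VBD left) (NP-TMP (NN yesterday))))"
)
mapping = {'NP-SBJ': 'NP', 'NP-TMP': 'NP'}
converted = tree_convert_indexed(sample, mapping)
labels = [st.label() for st in converted.subtrees()]
print(labels)
```

Whether dropping the index is appropriate depends on the task, since treebank uses these indices to link traces such as *-1 to their antecedents.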