NLP | Expanding and Removing Chunks with RegEx
RegexpParser and RegexpChunkRule.fromstring() do not support every RegexpChunkRule subclass, so the unsupported rules have to be created manually.
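For context, here is a minimal sketch of what the string syntax does cover. The three rule strings are illustrative choices, not taken from the article; each is passed to RegexpChunkRule.fromstring() and the class of the returned rule is recorded. The expand and unchunk rules have no such brace notation, which is why they are built by hand in the rest of this article.

```python
from nltk.chunk.regexp import RegexpChunkRule

# Illustrative rule strings written in the brace syntax that fromstring() parses.
rule_strings = [
    "{<DT>?<NN>}",  # braces around a pattern: chunk the match
    "<NN>}{<DT>",   # "}{" between two patterns: split a chunk at that point
    "<NN>{}<NNS>",  # "{}" between two patterns: merge two adjacent chunks
]

# Map each rule string to the name of the rule class that fromstring() returns.
parsed = {s: type(RegexpChunkRule.fromstring(s)).__name__ for s in rule_strings}

for rule_string, class_name in parsed.items():
    print(f"{rule_string:15} -> {class_name}")
```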
This article focuses on three such classes:
ExpandRightRule: It adds chink (unchunked) words to the right of a chunk.
ExpandLeftRule: It adds chink (unchunked) words to the left of a chunk.
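ExpandLeftRule takes the left chink pattern to absorb, followed by the pattern that the beginning of the chunk must match. ExpandRightRule takes the pattern that the end of the chunk must match, followed by the right chink pattern to absorb. Both also take a description string as the last argument.
UnChunkRule: It unchunks any chunk that its pattern matches in full, so the words in that chunk become chinks again.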
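The argument order is easy to mix up, so here is a small sketch of it. The sentence ("fresh sushi rolls") and the helpers `show` and `fresh` are hypothetical, introduced only for this illustration. Each rule is applied directly to a ChunkString so the effect of every call is visible.

```python
from nltk.chunk.regexp import (
    ChunkRule, ChunkString, ExpandLeftRule, ExpandRightRule, UnChunkRule,
)
from nltk.tree import Tree


def show(chunk_string):
    """Return the chunk string with its whitespace normalised (hypothetical helper)."""
    return " ".join(str(chunk_string).split())


def fresh():
    """Build a chunk string for an illustrative sentence and chunk the singular noun."""
    cs = ChunkString(Tree("S", [("fresh", "JJ"), ("sushi", "NN"), ("rolls", "NNS")]))
    ChunkRule("<NN>", "single noun").apply(cs)
    return cs


cs = fresh()
start = show(cs)

# ExpandLeftRule: chink pattern to absorb first, chunk-start pattern second.
ExpandLeftRule("<JJ>", "<NN>", "absorb adjective on the left").apply(cs)
after_left = show(cs)

# ExpandRightRule: chunk-end pattern first, chink pattern to absorb second.
ExpandRightRule("<NN>", "<NNS>", "absorb plural noun on the right").apply(cs)
after_right = show(cs)

# An UnChunkRule pattern that covers only part of the chunk leaves it alone.
UnChunkRule("<NN>", "partial pattern").apply(cs)
after_partial = show(cs)

# A pattern that covers the whole chunk removes the chunk.
UnChunkRule("<JJ><NN.*>+", "whole chunk").apply(cs)
after_unchunk = show(cs)

# With the two ExpandLeftRule patterns swapped, nothing matches.
swapped = fresh()
ExpandLeftRule("<NN>", "<JJ>", "arguments swapped").apply(swapped)
after_swapped = show(swapped)

for label, value in [("start", start), ("expand left", after_left),
                     ("expand right", after_right), ("partial unchunk", after_partial),
                     ("full unchunk", after_unchunk), ("swapped", after_swapped)]:
    print(f"{label:16}: {value}")
```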
Code #1: Chaining the four rules in a RegexpChunkParser

```python
# Loading libraries
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule
from nltk.chunk.regexp import ExpandRightRule, UnChunkRule
from nltk.chunk import RegexpChunkParser

# Initialising ChunkRule: chunk every singular noun
ur = ChunkRule('<NN>', 'single noun')

# Initialising ExpandLeftRule: pull a preceding determiner into the chunk
el = ExpandLeftRule('<DT>', '<NN>', 'get left determiner')

# Initialising ExpandRightRule: pull a following plural noun into the chunk
er = ExpandRightRule('<NN>', '<NNS>', 'get right plural noun')

# Initialising UnChunkRule: undo any chunk made of a determiner plus nouns
un = UnChunkRule('<DT><NN.*>*', 'unchunk everything')

# The parser applies the rules in the order given
chunker = RegexpChunkParser([ur, el, er, un])

sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]

# repr() shows the Tree(...) form used in the output below
print(repr(chunker.parse(sent)))
```
Output:
Tree('S', [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')])
Note: Output is a flat sentence as UnChunkRule undid the chunk created by the previous rules.
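To see the chunk that the first three rules build before it is undone, the same parser can be run without the UnChunkRule. This is a minimal sketch that reuses the article's rules and sentence; it relies on RegexpChunkParser's default chunk label "NP" and root label "S".

```python
from nltk.chunk import RegexpChunkParser
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule, ExpandRightRule
from nltk.tree import Tree

# Same rules as in Code #1, minus the final UnChunkRule.
rules = [
    ChunkRule("<NN>", "single noun"),
    ExpandLeftRule("<DT>", "<NN>", "get left determiner"),
    ExpandRightRule("<NN>", "<NNS>", "get right plural noun"),
]

# Chunks are labelled "NP" and the root "S" by default.
chunker_without_unchunk = RegexpChunkParser(rules)

sent = [("the", "DT"), ("sushi", "NN"), ("rolls", "NNS")]
chunked = chunker_without_unchunk.parse(sent)

# The whole sentence now sits inside a single NP subtree.
print(repr(chunked))
```

Here all three words end up inside one chunk, which is exactly the chunk that the UnChunkRule in Code #1 removes.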
How does it work?
- A chunk is made from the singular noun.
- The determiner on the left is expanded into the chunk that begins with the noun.
- The plural noun on the right is expanded into the chunk that ends with the noun.
- Finally, every chunk made of a determiner followed by nouns (here determiner + noun + plural noun) is unchunked, which gives back the original flat sentence tree.
Code #2: Applying the rules one at a time to show each step

```python
# Loading libraries
from nltk.chunk.regexp import ChunkRule, ExpandLeftRule
from nltk.chunk.regexp import ExpandRightRule, UnChunkRule
from nltk.chunk.regexp import ChunkString
from nltk.tree import Tree

# Same tagged sentence as in Code #1
sent = [('the', 'DT'), ('sushi', 'NN'), ('rolls', 'NNS')]

# A ChunkString is the internal string encoding that the rules rewrite
chunk_string = ChunkString(Tree('S', sent))
print("Chunk String : ", chunk_string)

# Initialising ChunkRule
ur = ChunkRule('<NN>', 'single noun')
ur.apply(chunk_string)
print("\nstep 1 : ", chunk_string)

# Initialising ExpandLeftRule
el = ExpandLeftRule('<DT>', '<NN>', 'get left determiner')
el.apply(chunk_string)
print("step 2 : ", chunk_string)

# Initialising ExpandRightRule
er = ExpandRightRule('<NN>', '<NNS>', 'get right plural noun')
er.apply(chunk_string)
print("step 3 : ", chunk_string)

# Initialising UnChunkRule
un = UnChunkRule('<DT><NN.*>*', 'unchunk everything')
un.apply(chunk_string)
print("step 4 : ", chunk_string)
```
Output:
Chunk String : <DT> <NN> <NNS>

step 1 : <DT> {<NN>} <NNS>
step 2 : {<DT> <NN>} <NNS>
step 3 : {<DT> <NN> <NNS>}
step 4 : <DT> <NN> <NNS>
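The same step-by-step trace can be wrapped in a small reusable function. This is only a sketch: `trace_rules` is a hypothetical helper, not part of NLTK. It applies any list of rules in order and records the chunk string after each one, using each rule's own description (via `descr()`) as the label.

```python
from nltk.chunk.regexp import (
    ChunkRule, ChunkString, ExpandLeftRule, ExpandRightRule, UnChunkRule,
)
from nltk.tree import Tree


def trace_rules(tagged_sentence, rules):
    """Apply rules in order; return (label, chunk string) pairs (hypothetical helper)."""
    chunk_string = ChunkString(Tree("S", tagged_sentence))

    def snapshot():
        # Normalise whitespace so the recorded states are easy to compare.
        return " ".join(str(chunk_string).split())

    states = [("input", snapshot())]
    for rule in rules:
        rule.apply(chunk_string)
        states.append((rule.descr(), snapshot()))
    return states


sent = [("the", "DT"), ("sushi", "NN"), ("rolls", "NNS")]
rules = [
    ChunkRule("<NN>", "single noun"),
    ExpandLeftRule("<DT>", "<NN>", "get left determiner"),
    ExpandRightRule("<NN>", "<NNS>", "get right plural noun"),
    UnChunkRule("<DT><NN.*>*", "unchunk everything"),
]

states = trace_rules(sent, rules)
for label, state in states:
    print(f"{label:22}: {state}")
```

Labelling each state with the rule's description makes it easier to reorder or remove rules and see at a glance which one changed the chunk string.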