Link Extractor class of Scrapy
Scrapy provides the class “scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor” for extracting links from a response object. For convenience, the same class is also available under the shorter name “scrapy.linkextractors.LinkExtractor”.
First, we need to import the LinkExtractor class. There are several ways to import it; one of them is the following:
from scrapy.linkextractors import LinkExtractor
Next, to use the “LinkExtractor” class, create an object as shown below, where “arguments” stands for the optional settings that control which links are extracted:
link_ext = LinkExtractor(arguments)
Now that we have created an object, we can fetch the links with the “extract_links” method of the LinkExtractor class, passing it the response:
links = link_ext.extract_links(response)
The method returns a list of objects of type “scrapy.link.Link”. The attributes of a link object are:
- url : url of the fetched link.
- text : the text used in the anchor tag of the link.
- fragment : the part of the url after the hash (#) symbol.
- nofollow : tells whether the “rel” attribute of the anchor tag contains “nofollow” or not.
Scrapy – Link Extractors
This article covers Link Extractors in Scrapy. “LinkExtractor” is a class provided by Scrapy to extract links from the response we get while fetching a website. It is easy to use, as the steps in this article show.