Link Extractor class of Scrapy

Scrapy has the class “scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor” for extracting links from a response object. For convenience, Scrapy also exposes the same class under the shorter name “scrapy.linkextractors.LinkExtractor”.

First, we need to import LinkExtractor. There are a few ways to import and use the class, and one of them is the following:

from scrapy.linkextractors import LinkExtractor

Next, we use the LinkExtractor class. To do so, create an object as given below, passing any arguments that control which links are extracted:

link_ext = LinkExtractor(arguments)

Now that we have created an object, we can fetch the links with the “extract_links” method of the LinkExtractor class. For that, run the code below:

links = link_ext.extract_links(response)

The links are returned as a list, and each item is of the type “scrapy.link.Link”. The attributes of a Link object are:

url : the URL of the fetched link.
text : the text used in the anchor tag of the link.
fragment : the part of the URL after the hash (#) symbol.
nofollow : tells whether the “rel” attribute of the anchor tag contains “nofollow” or not.

Scrapy – Link Extractors

In this article, we are going to learn about Link Extractors in Scrapy. “LinkExtractor” is a class provided by Scrapy to extract links from the response we get while fetching a website. Link extractors are easy to use, as this article shows.
