Link Extractor class of Scrapy

Scrapy has the class “scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor” for extracting links from a response object. For convenience, Scrapy also provides the same class under the shorter name “scrapy.linkextractors.LinkExtractor”.

First, we need to import LinkExtractor. There are several ways to import and use the class; one of them is the following:

from scrapy.linkextractors import LinkExtractor

Next, we use the “LinkExtractor” class by creating an object, as shown below. Here, “arguments” stands for optional filtering parameters such as “allow” or “deny”:

link_ext = LinkExtractor(arguments) 

Now that we have created an object, we can fetch the links with the “extract_links” method of the LinkExtractor class, passing it the response:

links = link_ext.extract_links(response)

The fetched links are returned as a list, and each item is of the type “scrapy.link.Link”. The attributes of a Link object are:

  1. url : the URL of the fetched link.
  2. text : the text used in the anchor tag of the link.
  3. fragment : the part of the URL after the hash (#) symbol.
  4. nofollow : tells whether the value of the “rel” attribute of the anchor tag is “nofollow” or not.

Scrapy – Link Extractors

In this article, we are going to learn about Link Extractors in Scrapy. “LinkExtractor” is a class provided by Scrapy to extract links from the response we get while fetching a website. It is very easy to use, as we will see in this article.
