Nested Loaders

Nested loaders are useful when parsing related values from a subsection of a document. Without them, we must write out the full XPath or CSS path for every value we want to extract. Consider the following example, which extracts two values from a page footer:

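Python3

from scrapy.item import Item
from scrapy.loader import ItemLoader

# Create loader object. Item stands in for an Item subclass that
# declares the phoneno and map fields. Inside a spider callback,
# pass the response so the XPath expressions have a document to query.
loader = ItemLoader(item=Item(), response=response)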
 
# Item loader method for phoneno,
# mention the field name and xpath expression
loader.add_xpath('phoneno',
                 '//footer/a[@class = "phoneno"]/@href')
 
# Item loader method for map,
# mention the field name and xpath expression
loader.add_xpath('map',
                 '//footer/a[@class = "map"]/@href')
 
# populate the item
loader.load_item()


 
Using nested loaders, we can avoid repeating the footer selector, as follows:

Python3

# Define the Item Loader object by passing the item and,
# inside a spider callback, the response to query
loader = ItemLoader(item=Item(), response=response)
 
# Create nested loader with footer selector
footer_loader = loader.nested_xpath('//footer')
 
# Add phoneno xpath values relative to the footer
footer_loader.add_xpath('phoneno', 'a[@class = "phoneno"]/@href')
 
# Add map xpath values relative to the footer
footer_loader.add_xpath('map', 'a[@class = "map"]/@href')
 
# Call loader.load_item() to populate values
loader.load_item()


Please note the following points about nested loaders:

  • They work with CSS and XPath selectors.
  • They can be nested arbitrarily, so a nested loader can itself create further nested loaders.
  • They can make the code simpler.
    • Do not use them needlessly, or the parser can become difficult to read.

