Handling Responses
When a request made with Scrapy succeeds, the website's server returns a Response object. This object has a number of properties that can be used to extract data about the website and the request-response cycle that was performed. A Scrapy Response object contains the following information:
- the HTTP status code.
- the response headers.
- the response body containing the page data.
Now that we know how to make a Scrapy request to a URL and receive data, let's see how to parse the response so we can extract the information we need.
Python3
import scrapy


class MySpider(scrapy.Spider):
    name = 'scrapy_example'

    def start_requests(self):
        yield scrapy.Request(url='http://www.example.com',
                             callback=self.parse)

    def parse(self, response):
        # Extract the text of all h1 and p tags from the response
        headings = response.css('h1::text').getall()
        paragraphs = response.css('p::text').getall()

        # Print the extracted data
        print("Headings:")
        for heading in headings:
            print(heading)

        print("\nParagraphs:")
        for paragraph in paragraphs:
            print(paragraph)
We've created a spider called MySpider in the code above. When the spider runs, its `start_requests` method is called and issues a request to the given URL. Once the web server returns a response containing the page data, Scrapy passes that response to the `parse()` method.
In the parse method, we extract the text of all h1 and p tags from the response and print it to the console, as shown in the illustration below: