Cleaning the data

You might have noticed that your output contains duplicate news headlines and text contents that aren’t news headlines.

Create a list of all the text elements you want to get rid of:

unwanted = [‘BBC World News TV’, ‘BBC World Service Radio’, ‘News daily newsletter’, ‘Mobile app’, ‘Get in touch’]

Then print text elements only if they are not in this list by putting:

print(x.text.strip())

Below is the implementation:

Python3




import requests
from bs4 import BeautifulSoup
  
url = 'https://www.bbc.com/news'
response = requests.get(url)
  
soup = BeautifulSoup(response.text, 'html.parser')
headlines = soup.find('body').find_all('h3')
unwanted = ['BBC World News TV', 'BBC World Service Radio',
            'News daily newsletter', 'Mobile app', 'Get in touch']
  
for x in list(dict.fromkeys(headlines)):
    if x.text.strip() not in unwanted:
        print(x.text.strip())


Output:



How to get the Daily News using Python

In this article, we are going to see how to get daily news using Python. Here we will use Beautiful Soup and the request module to scrape the data.

Similar Reads

Modules needed

bs4: Beautiful Soup(bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built-in with Python. To install this type the below command in the terminal....

Stepwise Implementation:

Step 1: First of all, make sure to import these libraries....

Cleaning the data

...

Contact Us