How to use BeautifulSoup alongside with lxml parser In Python

For the purpose of reading and writing the xml file we would be using a Python library named BeautifulSoup. In order to install the library, type the following command into the terminal.  

pip install beautifulsoup4

Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser (used for parsing XML/HTML documents). lxml could be installed by running the following command in the command processor of your Operating system:  

pip install lxml

Firstly we will learn how to read from an XML file. We would also parse data stored in it. Later we would learn how to create an XML file and write data to it. 

Reading Data From an XML File

There are two steps required to parse a xml file:-  

  • Finding Tags 
  • Extracting from tags

Example:

XML File used: 

Python3




from bs4 import BeautifulSoup
 
 
# Reading the data inside the xml
# file to a variable under the name
# data
with open('dict.xml', 'r') as f:
    data = f.read()
 
# Passing the stored data inside
# the beautifulsoup parser, storing
# the returned object
Bs_data = BeautifulSoup(data, "xml")
 
# Finding all instances of tag
# `unique`
b_unique = Bs_data.find_all('unique')
 
print(b_unique)
 
# Using find() to extract attributes
# of the first instance of the tag
b_name = Bs_data.find('child', {'name':'Frank'})
 
print(b_name)
 
# Extracting the data stored in a
# specific attribute of the
# `child` tag
value = b_name.get('test')
 
print(value)


OUTPUT:

Writing an XML File

Writing a xml file is a primitive process, reason for that being the fact that xml files aren’t encoded in a special way. Modifying sections of a xml document requires one to parse through it at first. In the below code we would modify some sections of the aforementioned xml document. 

Example: 

Python3




from bs4 import BeautifulSoup
 
# Reading data from the xml file
with open('dict.xml', 'r') as f:
    data = f.read()
 
# Passing the data of the xml
# file to the xml parser of
# beautifulsoup
bs_data = BeautifulSoup(data, 'xml')
 
# A loop for replacing the value
# of attribute `test` to WHAT !!
# The tag is found by the clause
# `bs_data.find_all('child', {'name':'Frank'})`
for tag in bs_data.find_all('child', {'name':'Frank'}):
    tag['test'] = "WHAT !!"
 
 
# Output the contents of the
# modified xml file
print(bs_data.prettify())


Output:

Reading and Writing XML Files in Python

Extensible Markup Language, commonly known as XML is a language designed specifically to be easy to interpret by both humans and computers altogether. The language defines a set of rules used to encode a document in a specific format. In this article, methods have been described to read and write XML files in python.

Note: In general, the process of reading the data from an XML file and analyzing its logical components is known as Parsing. Therefore, when we refer to reading a xml file we are referring to parsing the XML document

In this article, we would take a look at two libraries that could be used for the purpose of xml parsing. They are:

  • BeautifulSoup used alongside the lxml parser 
  • Elementtree library. 

Similar Reads

Using BeautifulSoup alongside with lxml parser

For the purpose of reading and writing the xml file we would be using a Python library named BeautifulSoup. In order to install the library, type the following command into the terminal....

Using Elementtree

...

Contact Us