Generating UnicodeDecodeError for a CSV file

The following code attempts to open the CSV file for processing. Upon execution, it raises a UnicodeDecodeError.

Python3

import pandas as pd

path = "test.csv"

# Read the CSV file at the given path.
# No encoding is passed, so pandas decodes the file as UTF-8 (its default).
file = pd.read_csv(path)

print(file.head())


Output:

 

Understanding the Problem

The error occurred because the read_csv method could not decode the contents of the CSV file using its default encoding, UTF-8: the file is encoded in UTF-16. The encoding of the CSV file therefore has to be passed explicitly when opening it, so that the file can be decoded and processed.
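The mismatch can be reproduced without a file on disk. The following is a minimal sketch, not the article's test.csv: the sample text and variable names are assumptions made for illustration. It encodes a small CSV string as UTF-16 and then tries to decode the resulting bytes as UTF-8.

```python
# Illustrative data only; this is not the content of test.csv.
csv_text = "name,city\nAsha,Pune\n"

# Encoding with "utf-16" prepends a byte order mark (BOM) to the bytes.
utf16_bytes = csv_text.encode("utf-16")

try:
    utf16_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as err:
    # The BOM bytes are not valid in UTF-8, so decoding stops immediately.
    decode_failed = True
    print("UTF-8 decoding failed:", err.reason)

# Decoding with the encoding that produced the bytes succeeds.
round_trip = utf16_bytes.decode("utf-16")
print(round_trip == csv_text)
```

The same thing happens inside read_csv when a UTF-16 file is read with the default UTF-8 decoder.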

Solution

First, the pandas library is imported and the path to the CSV file is specified. The program then calls the read_csv function to read the contents of the CSV file at that path, and also passes the encoding with which the CSV file must be decoded (UTF-16 in this case). Since the encoding passed in the argument is the one with which the CSV file was originally encoded, the file is decoded successfully.

Python3

import pandas as pd

path = "test.csv"

# Read the CSV file at the given path,
# decoding its contents as UTF-16 instead of the default UTF-8.
file = pd.read_csv(path, encoding="utf-16")

# Displaying the contents
print(file.head())


Output:

 

Alternate Method to Solve UnicodeDecodeError

Another way of resolving the issue is to change the encoding of the CSV file itself. To do so, first open the CSV file as a text file (using Notepad or WordPad):

 

Now go to File and select Save As:

 

A dialog appears. From there, select the Encoding option, change it to UTF-8 (the default encoding used by pandas' read_csv), and select Save.
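The manual Save As steps can also be sketched in Python. This is one possible approach rather than part of the original walkthrough: the helper name reencode_to_utf8 and the output file name in the usage comment are hypothetical, and the source encoding is assumed to be UTF-16, as in this article.

```python
def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Read src_path using src_encoding and write the same text to dst_path as UTF-8."""
    # newline="" leaves the file's original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# Hypothetical usage, writing to a new file so the original is kept:
# reencode_to_utf8("test.csv", "test_utf8.csv")
```

Writing to a separate file rather than overwriting test.csv keeps the original available in case the assumed source encoding turns out to be wrong.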

 

 

Now the following code runs without errors. This is because the CSV file was re-saved as UTF-8 before being opened with pandas, and UTF-8 is the default encoding used by pandas.

Python3

import pandas as pd

path = "test.csv"

# Read the CSV file at the given path.
# The file is now saved as UTF-8, so the default decoding works.
file = pd.read_csv(path)

print(file.head())


Output:

 



How to resolve a UnicodeDecodeError for a CSV file in Python?

Several errors can arise when decoding a byte string from a given coding scheme, because not every byte sequence is valid in every scheme. One of the most common errors during these conversions is UnicodeDecodeError, which occurs when a byte string is decoded with an incorrect coding scheme. This article will teach you how to resolve a UnicodeDecodeError for a CSV file in Python.

Why does the UnicodeDecodeError error arise?

The error occurs when a byte sequence is not valid in the coding scheme used to decode it. To solve the issue, the byte string should be decoded using the same coding scheme in which it was encoded; that is, the encoding scheme should be the same when the string is encoded and decoded....
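When the original coding scheme is not known, one rough approach is to try a short list of candidates and keep the first one that decodes without error. The sketch below is an illustration under stated assumptions: the helper name first_working_encoding and the candidate list are hypothetical. A successful decode does not prove the guess is correct, because some encodings accept almost any byte sequence.

```python
def first_working_encoding(raw, candidates=("utf-8", "utf-16")):
    """Return the first candidate encoding that decodes raw without error, or None."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            # This candidate cannot decode the bytes; try the next one.
            continue
    return None

# Illustrative bytes only; a real file would be read with open(path, "rb").
sample = "id,value\n1,2\n".encode("utf-16")
guess = first_working_encoding(sample)
```

The returned name could then be passed as the encoding argument of read_csv, after checking that the decoded text looks sensible.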
