How to Use filecmp in Python
The Python module filecmp offers functions to compare files and directories. Its cmp function compares two files and returns True if they appear identical, otherwise False.
Syntax: filecmp.cmp(f1, f2, shallow=True)
Parameters:
- f1: Name of one file
- f2: Name of another file to be compared
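- shallow: Controls whether matching file signatures are enough to treat the two files as equal.
Note: The default value is True. In that case, files whose os.stat() signatures (file type, size, and modification time) are identical are taken to be equal without their contents being read. With shallow=False, or when the signatures differ, the files are compared by size and content.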
Return Type: Boolean value (True if the files are same otherwise False)
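To illustrate the effect of the shallow parameter, here is a minimal sketch. The file names "a.txt" and "b.txt" and the fixed timestamp are made up for the demonstration. The two files have the same size and are given the same modification time, so their signatures match even though their contents differ.

```python
import filecmp
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical files with the same size but different content
    a = Path(tmp) / "a.txt"
    b = Path(tmp) / "b.txt"
    a.write_text("hello")
    b.write_text("world")

    # Force identical modification times so the os.stat() signatures match
    os.utime(a, (1_000_000_000, 1_000_000_000))
    os.utime(b, (1_000_000_000, 1_000_000_000))

    # Drop any cached results from earlier comparisons
    filecmp.clear_cache()

    # Signatures match, so the shallow comparison reports the files as equal
    shallow_result = filecmp.cmp(a, b, shallow=True)

    # The contents are actually read here, so the difference is detected
    deep_result = filecmp.cmp(a, b, shallow=False)

print(shallow_result, deep_result)
```

This is why shallow=False is the safer choice when looking for true duplicates: a shallow comparison can report a match for files that merely share type, size, and timestamp.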
Example:
For this example, assume that “text_1.txt”, “text_3.txt”, and “text_4.txt” have the same content, and that “text_2.txt” and “text_5.txt” share a different content.
Python3
# Importing libraries
import os
from pathlib import Path
from filecmp import cmp

# List of all documents
DATA_DIR = Path('/path/to/directory')
files = sorted(os.listdir(DATA_DIR))

# List having the classes of documents
# with the same content
duplicateFiles = []

# Comparison of the documents
for file_x in files:
    if_dupl = False

    for class_ in duplicateFiles:
        # Comparing files having the same content using cmp()
        # class_[0] represents a class having the same content
        if_dupl = cmp(
            DATA_DIR / file_x,
            DATA_DIR / class_[0],
            shallow=False
        )
        if if_dupl:
            class_.append(file_x)
            break

    if not if_dupl:
        duplicateFiles.append([file_x])

# Print results
print(duplicateFiles)
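Output (assuming the directory contains only the five files described above):
[['text_1.txt', 'text_3.txt', 'text_4.txt'], ['text_2.txt', 'text_5.txt']]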
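The module's directory-level helpers, mentioned at the start, can be sketched in a similar way. The example below is only an illustration: the directory names "left" and "right" and the file names inside them are invented. It uses filecmp.cmpfiles to compare a list of file names across two directories and filecmp.dircmp to summarize which files the directories have in common.

```python
import filecmp
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical directories to compare
    left = Path(tmp) / "left"
    right = Path(tmp) / "right"
    left.mkdir()
    right.mkdir()

    (left / "same.txt").write_text("identical")
    (right / "same.txt").write_text("identical")
    (left / "changed.txt").write_text("version one")
    (right / "changed.txt").write_text("version 2")
    (left / "only_left.txt").write_text("left only")

    # cmpfiles returns three lists: matching files, differing files,
    # and files that could not be compared (e.g. missing on one side)
    match, mismatch, errors = filecmp.cmpfiles(
        left, right,
        ["same.txt", "changed.txt", "only_left.txt"],
        shallow=False,
    )

    # dircmp computes its attributes lazily, so read them
    # while the temporary directories still exist
    comparison = filecmp.dircmp(left, right)
    common = sorted(comparison.common_files)
    left_only = comparison.left_only

print(match, mismatch, errors)
print(common, left_only)
```

Note that dircmp compares files shallowly by default, so cmpfiles with shallow=False is the more reliable of the two when content equality matters.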