How to Use filecmp in Python

The Python module filecmp offers functions to compare files and directories. Its cmp function compares two files and returns True if they appear identical, otherwise False.
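For the directory side of the module, a minimal sketch using filecmp.cmpfiles is shown here. It compares a given list of file names across two directories and returns three lists: matches, mismatches, and names that could not be compared. The directory layout and file names (same.txt, diff.txt, only_a.txt) are assumptions made up for illustration.

```python
import filecmp
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical directories to compare
    dir_a = Path(tmp) / "a"
    dir_b = Path(tmp) / "b"
    dir_a.mkdir()
    dir_b.mkdir()

    # Same name and same content in both directories
    (dir_a / "same.txt").write_text("hello")
    (dir_b / "same.txt").write_text("hello")
    # Same name, different content
    (dir_a / "diff.txt").write_text("hello")
    (dir_b / "diff.txt").write_text("goodbye, world")
    # Present in only one directory, so it cannot be compared
    (dir_a / "only_a.txt").write_text("hello")

    names = ["same.txt", "diff.txt", "only_a.txt"]
    match, mismatch, errors = filecmp.cmpfiles(
        dir_a, dir_b, names, shallow=False
    )

print(match, mismatch, errors)
```

Here same.txt should land in the first list, diff.txt in the second, and only_a.txt in the third, because it is missing from one of the directories.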

Syntax: filecmp.cmp(f1, f2, shallow=True)

Parameters:

  • f1: Name of one file
  • f2: Name of another file to be compared
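  • shallow: Whether files with identical os.stat() signatures may be treated as equal without comparing their contents.

Note: The default value is True. In that case, two files whose os.stat() signatures (file type, size, and modification time) are identical are taken to be equal without reading them; if the signatures differ, the contents are compared. Pass shallow=False to base the result on the file contents.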

Return Type: Boolean value (True if the files are the same, otherwise False)
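The effect of the shallow parameter can be seen in a small sketch. It assumes two hypothetical files, first.txt and second.txt, that have the same size but different content, and it forces their modification times to match so that their os.stat() signatures are identical. The timestamp value is arbitrary.

```python
import os
import tempfile
from filecmp import cmp
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.txt"
    second = Path(tmp) / "second.txt"
    first.write_text("abcd")
    second.write_text("wxyz")  # same size, different content

    # Force identical modification times so both signatures match
    stamp_ns = 1_600_000_000 * 10**9  # arbitrary timestamp in nanoseconds
    os.utime(first, ns=(stamp_ns, stamp_ns))
    os.utime(second, ns=(stamp_ns, stamp_ns))

    # shallow=True (default): matching signatures are accepted as equal
    shallow_result = cmp(first, second)
    # shallow=False: the contents are read and found to differ
    deep_result = cmp(first, second, shallow=False)

print(shallow_result, deep_result)
```

Under these assumptions the shallow comparison reports True while the content comparison reports False, which is why a duplicate finder would normally pass shallow=False.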

Example:

For this example, we assume that “text_1.txt”, “text_3.txt”, and “text_4.txt” have the same content, and that “text_2.txt” and “text_5.txt” have the same content, which differs from that of the first group.
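To reproduce this setup without preparing files by hand, the five files can be generated in a temporary directory, as in the sketch below. The file contents are made up for illustration; only whether two files match matters. The name data_dir is likewise an assumption and would take the place of the directory path used in the script.

```python
import tempfile
from filecmp import cmp
from pathlib import Path

# Hypothetical contents: files 1, 3, 4 share one text; files 2, 5 share another
contents = {
    "text_1.txt": "first document\n",
    "text_2.txt": "second document, different text\n",
    "text_3.txt": "first document\n",
    "text_4.txt": "first document\n",
    "text_5.txt": "second document, different text\n",
}

# Create the files in a fresh temporary directory (not deleted automatically)
data_dir = Path(tempfile.mkdtemp())
for name, text in contents.items():
    (data_dir / name).write_text(text)

# Spot-check the assumption with content-based comparisons
same = cmp(data_dir / "text_1.txt", data_dir / "text_3.txt", shallow=False)
different = cmp(data_dir / "text_1.txt", data_dir / "text_2.txt", shallow=False)
print(data_dir, same, different)
```

The directory created by tempfile.mkdtemp() stays on disk, so it can be used as the data directory for the script and removed afterwards.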

Python3




# Importing libraries
from pathlib import Path
from filecmp import cmp


# Sorted names of all regular files in the directory
# (subdirectories are skipped, since cmp() only compares files)
DATA_DIR = Path('/path/to/directory')
files = sorted(p.name for p in DATA_DIR.iterdir() if p.is_file())

# Each inner list is a class of files with the same content
duplicateFiles = []

# Compare each file against one representative of every class
for file_x in files:

    if_dupl = False

    for class_ in duplicateFiles:
        # class_[0] is the first member of the class and acts as its
        # representative; shallow=False bases the result on file contents
        if_dupl = cmp(
            DATA_DIR / file_x,
            DATA_DIR / class_[0],
            shallow=False
        )
        if if_dupl:
            class_.append(file_x)
            break

    # No match found: file_x starts a new class
    if not if_dupl:
        duplicateFiles.append([file_x])

# Print results
print(duplicateFiles)


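Output (assuming the directory contains only the five files described above):

[['text_1.txt', 'text_3.txt', 'text_4.txt'], ['text_2.txt', 'text_5.txt']]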
