How to Remove a Large File from Commit History in Git?

Deleting a large file in a new commit does not actually remove it from the repository. Git keeps every version of every file in its history, including deleted ones, so that they can be recovered. The file therefore keeps taking up disk space and is still downloaded with every clone.
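To see this behaviour for yourself, the following minimal sketch builds a throwaway repository, commits a file, deletes it in a second commit, and then lists the largest blob still reachable from history. The temporary directory, the 1 MB random file standing in for gfg.jpg, and the demo identity are all assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a 1 MB stand-in for the large file, then delete it in a second commit.
head -c 1048576 /dev/urandom > gfg.jpg
git add gfg.jpg
git commit -q -m "Add large file"
git rm -q gfg.jpg
git commit -q -m "Delete large file"

# List every blob reachable from any ref as "size path" and keep the largest.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $2, $3 }' \
  | sort -rn \
  | head -n 1)

echo "Largest blob still in history: $largest"
```

Even though gfg.jpg no longer exists in the latest commit, it is still listed, because the first commit continues to reference it. The same pipeline can be run in a real repository to find which files are worth removing.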

We will discuss the various approaches for removing a large file from the commit history in Git:

Table of Contents

  • Using git filter-branch
  • Using git filter-repo
  • Using BFG Repo-Cleaner

Using git filter-branch

Let’s say you committed a large file, like a 15MB photo named gfg.jpg, in a previous commit. To completely erase it from Git’s history, use the following command in your project directory:

git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch path/to/gfg.jpg' \
--prune-empty --tag-name-filter cat -- --all

Explanation of the Command:

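  • git filter-branch: Rewrites the history of the selected refs. The Git documentation now recommends git filter-repo instead, because filter-branch is slow and easy to misuse.
  • --force: Makes filter-branch run even if a backup from an earlier run already exists under refs/original/ or a temporary directory was left behind. It does not affect remote branches.
  • --index-filter: Specifies a command to run on the index (staging area) for each commit, without checking out the working tree.
  • git rm --cached --ignore-unmatch path/to/gfg.jpg: Removes the file from the index of each commit, so that no rewritten commit contains it.
    • --cached: Removes the file from the index only and leaves the working tree alone.
    • --ignore-unmatch: Exits successfully even if the file is not present in a given commit.
  • --prune-empty: Removes commits that become empty after filtering (i.e., commits that only had the removed file).
  • --tag-name-filter cat: Rewrites tags so that they point to the rewritten commits; cat passes each original tag name through unchanged.
  • --: Separates the filter-branch options from the revision options that follow.
  • --all: Applies the filter to all refs, that is, all branches and tags.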

Successful Removal Output:

If the file was successfully removed, you’ll see output similar to:

Rewrite ee94db7633eewdhu7b370512d95ejnd7dnuf6c8cf (2/2)
Ref 'refs/heads/master' was rewritten
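Rewriting the refs does not by itself free any disk space: git filter-branch keeps a backup of the old refs under refs/original/, and the reflog still points at the old commits. The sketch below, run in a throwaway repository with a small stand-in for gfg.jpg (both assumptions for illustration), shows one way to drop those references and garbage collect the old objects, then checks that the blob is really gone.

```shell
#!/bin/sh
set -eu

# Throwaway repository; gfg.jpg here is a small stand-in for the large file.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

head -c 102400 /dev/urandom > gfg.jpg
echo "notes" > notes.txt
git add gfg.jpg notes.txt
git commit -q -m "Add photo and notes"
blob=$(git rev-parse HEAD:gfg.jpg)

# Rewrite history as in the command above. The environment variable only
# silences the warning and delay that newer Git versions print.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch gfg.jpg' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# 1. Drop the backup refs that filter-branch keeps under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ \
  | xargs -n 1 git update-ref -d

# 2. Expire the reflog so the old commits become unreachable.
git reflog expire --expire=now --all

# 3. Garbage collect the unreachable objects.
git gc --quiet --prune=now

# git cat-file -e exits non-zero when the object no longer exists.
if git cat-file -e "$blob" 2>/dev/null; then
  blob_present=yes
else
  blob_present=no
fi
echo "Blob still in object database: $blob_present"
```

In a real repository, make a backup or work in a separate clone first, because these cleanup steps make the old history unrecoverable.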

Preventing Future Issues:

Update your .gitignore file to prevent accidentally committing similar files again.

Force Pushing to Remotes:

If you’ve already pushed the large file to a remote hosting platform (like GitHub), you’ll need to force push so that the remote branches and tags point to the rewritten commits.

Important Note: Force pushing rewrites history on the remote server, which can disrupt other developers who have already pulled that branch. It’s recommended to:

  • Consult with experienced Git users on your team before force pushing.
  • Read about the consequences of force pushing before proceeding.

Force Push Command:

git push origin --force --all
git push origin --force --tags

The second command is needed because --all pushes branches only, while the rewrite above also updated the tags.
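If other people may be pushing to the same branch, git push also offers --force-with-lease, which refuses to overwrite the remote branch when it has moved since you last fetched it. This is an alternative to the plain --force used above, not something the original workflow requires. The sketch below uses a local bare repository as a stand-in for the remote; all paths and commit contents are made up for illustration.

```shell
#!/bin/sh
set -eu

# A local bare repository stands in for the remote (hypothetical setup).
remote=$(mktemp -d)
git init -q --bare "$remote"

work=$(mktemp -d)
cd "$work"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
branch=$(git symbolic-ref --short HEAD)

echo "v1" > file.txt
git add file.txt
git commit -q -m "Original commit"
git remote add origin "$remote"
git push -q origin "$branch"

# Rewrite the pushed commit, so local and remote history diverge.
git commit -q --amend -m "Rewritten commit"

# A plain push is rejected because it is not a fast-forward.
if git push -q origin "$branch" 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

# --force-with-lease overwrites the remote branch only if it still points
# where our remote-tracking ref says it does, i.e. nobody pushed in between.
git push -q --force-with-lease origin "$branch"

local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$remote" rev-parse "refs/heads/$branch")
echo "Plain push: $plain_push"
echo "Remote now matches local: $local_head"
```

The lease only protects against pushes you have not fetched yet; teammates who already pulled the old history still need to re-clone or reset their branches.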

Using git filter-repo

Git filter-repo is a tool used to rewrite Git repository history by applying various filters. It can be used to remove files, change file paths, modify commit messages, and more. This can be useful for cleaning up repository history or preparing a repository for public release.

Install git filter-repo:

pip install git-filter-repo

Command to Remove the File:

git filter-repo --path path/to/gfg.jpg --invert-paths

Explanation of the Command:

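  • git filter-repo: The command to invoke the tool. By default it refuses to run unless the repository is a fresh clone, so it is safest to work in a fresh clone rather than override the check with --force.
  • --path path/to/gfg.jpg: Specifies the path to the file you want to remove.
  • --invert-paths: Inverts the selection, so the specified paths are removed from history instead of being the only ones kept.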

Successful Removal Output:

If the file was successfully removed, you’ll see output similar to:

Parsed 50 commits
New history written in 2.33 seconds;
now repacking/cleaning your repo...
Repacking your repo and cleaning out old unneeded objects
Enumerating objects: 36, done.
Counting objects: 100% (36/36), done.
Delta compression using up to 4 threads
Compressing objects: 100% (18/18), done.
Writing objects: 100% (36/36), done.
Total 36 (delta 10), reused 0 (delta 0)
Ref 'refs/heads/master' was rewritten

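Force Pushing to Remotes:

git filter-repo removes the origin remote after rewriting so that you cannot push to it by accident. Add it back first, replacing <repository-url> with the URL of your remote:

git remote add origin <repository-url>
git push origin --force --all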

Using BFG Repo-Cleaner

The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

Download BFG Repo-Cleaner:

Download the latest version from the official site.

Command to Remove the File:

java -jar bfg.jar --delete-files gfg.jpg

Follow-Up Commands:

git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push origin --force --all

Explanation of the Commands:

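  • java -jar bfg.jar --delete-files gfg.jpg: Runs BFG to remove every file named gfg.jpg from history. BFG matches on the file name rather than the full path, and by default it leaves the contents of your latest commit untouched, so delete the file in a normal commit first.
  • git reflog expire --expire=now --all: Expires all reflog entries immediately, so the old commits are no longer referenced.
  • git gc --prune=now --aggressive: Garbage collects the now-unreachable objects, which physically removes the file from the repository.
  • git push origin --force --all: Force pushes the rewritten history to the remote repository.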

Successful Removal Output:

If the file was successfully removed, you’ll see output similar to:

BFG Repo-Cleaner v1.13.0

Cleaning--------Found 3 commits

Cleaning commits: 100% (3/3)
Cleaning commits completed in 100 ms.

BFG Repo-Cleaner done! Removed files and cleaned commits.

Preventing Future Issues

Update your .gitignore file to prevent accidentally committing similar files again.
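As a small sketch of what that can look like, the script below adds two example patterns to .gitignore in a throwaway repository and uses git check-ignore to confirm which files Git will now skip. The patterns and file names are assumptions; choose ones that match the large files in your own project.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .

# Example patterns only; pick ones that match your own large files.
printf '%s\n' '*.jpg' '*.zip' >> .gitignore

touch gfg.jpg README.md

# git check-ignore exits 0 for an ignored path and 1 for a path that is not ignored.
if git check-ignore -q gfg.jpg; then jpg_ignored=yes; else jpg_ignored=no; fi
if git check-ignore -q README.md; then readme_ignored=yes; else readme_ignored=no; fi

echo "gfg.jpg ignored: $jpg_ignored"
echo "README.md ignored: $readme_ignored"
```

Keep in mind that .gitignore only affects untracked files. A file that is already tracked stays tracked until it is removed from the index.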

Summary

To remove a large file from Git history, you can use:

  • git filter-branch for a traditional but slow and error-prone method, which the Git documentation now discourages.
  • git filter-repo for a modern, faster, and easier method.
  • BFG Repo-Cleaner for a specialized tool designed for this purpose.

Each method requires force-pushing the rewritten history to the remote repository and updating the .gitignore file to prevent future issues.


