How to Use Git Submodules in Git

It often happens that while working on one project, you need to use another project from within it. You can apply the same idea if you have some large files inside your project: manage the files in a separate repository, then use git submodule to pull them into your project in the same way.

Here is a detailed explanation of the process.

Step 1: Create a new repository for your main project. This will contain your project's source code.


Create a separate repository for the large files that you want to include in your project.
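The two repositories described above can be sketched locally as follows. The directory names main-project and large-assets, the file names, and the author identity are placeholders chosen for illustration, not names from this walkthrough. On a hosting service you would create the two repositories there instead.

```shell
#!/bin/sh
set -e

# Scratch directory so the sketch does not touch any real repository.
work=$(mktemp -d)

# Main project repository: holds the source code.
git init -q "$work/main-project"
# Separate repository: holds only the large files.
git init -q "$work/large-assets"

# Placeholder identity so commits work without a global Git configuration.
for repo in main-project large-assets; do
    git -C "$work/$repo" config user.name "Example Author"
    git -C "$work/$repo" config user.email "author@example.com"
done

# One small source file in the main project.
echo 'print("hello")' > "$work/main-project/app.py"
git -C "$work/main-project" add app.py
git -C "$work/main-project" commit -q -m "Add project source"

# A 1 MiB stand-in for a large binary asset in the second repository.
head -c 1048576 /dev/zero > "$work/large-assets/video.bin"
git -C "$work/large-assets" add video.bin
git -C "$work/large-assets" commit -q -m "Add large asset"
```

The second repository needs at least one commit, because a submodule always points at a specific commit of the repository it wraps.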


In your main project repository, use the git submodule add command to add the separate repository as a submodule.

git submodule add YOUR_SECOND_REPOSITORY_URL


After adding the submodule, you will see a new folder in your main project repository that corresponds to the submodule. This folder will contain the files from the separate repository.

Commit the changes to your main project repository, including the addition of the submodule.
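The whole sequence can be tried end to end on a local machine. This is a minimal sketch under stated assumptions: the repository names, the assets folder, and the file names are hypothetical, and a local path stands in for YOUR_SECOND_REPOSITORY_URL. Recent Git versions refuse local-path submodule URLs by default, which is why the sketch passes protocol.file.allow=always. With a normal HTTPS or SSH URL that option should not be needed.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Create both repositories with a placeholder identity.
for repo in main-project large-assets; do
    git init -q "$work/$repo"
    git -C "$work/$repo" config user.name "Example Author"
    git -C "$work/$repo" config user.email "author@example.com"
done

echo "source code lives here" > "$work/main-project/README"
git -C "$work/main-project" add README
git -C "$work/main-project" commit -q -m "Initial commit"

head -c 1048576 /dev/zero > "$work/large-assets/video.bin"
git -C "$work/large-assets" add video.bin
git -C "$work/large-assets" commit -q -m "Add large asset"

cd "$work/main-project"

# Add the second repository as a submodule in the folder "assets".
git -c protocol.file.allow=always submodule add "$work/large-assets" assets

# Git records the submodule's path and URL in .gitmodules.
cat .gitmodules

# "submodule add" already staged .gitmodules and the assets folder; commit them.
git commit -q -m "Add large-assets as a submodule"

# Shows which commit of the submodule the main project now pins.
git submodule status

# A fresh clone needs --recurse-submodules to fetch the submodule's files too.
git -c protocol.file.allow=always clone -q --recurse-submodules \
    "$work/main-project" "$work/fresh-clone"
ls "$work/fresh-clone/assets"
```

Note that the main repository stores only the submodule's URL and a commit reference, not the large files themselves, so a clone without --recurse-submodules leaves the assets folder empty.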


The strategies above assume that keeping all of those binary files is necessary for your project.

Optimizing Git for Large Binary Files

Version Control Systems are a category of software tools that help in recording changes made to files by keeping track of modifications done in the code.

Table of Contents

  • What are large binary files?
  • The Challenge of Large Binary Files in Git
  • Why do we need to optimize binary files in Git?
  • Strategy for optimizing Git for Large Binary Files:
  • Approach 1: Using Git LFS:
  • Approach 2: Using Git-Annex
  • Differences Between Git LFS and Git-Annex:
  • Approach 3: Git-Submodules

Purpose of Version Control System:

  • Multiple people can work simultaneously on a single project. Everyone works on and edits their own copy of the files, and each person decides when to share their changes with the rest of the team.
  • Version control provides access to the historical versions of a project. This is insurance against computer crashes or data loss. If a mistake is made, you can easily roll back to a previous version. It is also possible to undo specific edits without losing the work done in the meantime. You can easily find out when, why, and by whom any part of a file was edited.

Git is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. When you do actions in Git, nearly all of them only add data to the Git database.


What are large binary files?

Large binary files, or binary large objects (BLOBs), are large, complex files: any external assets you add to your projects, such as images, videos, and animated content (.blb, .fbx), as opposed to plain text data that contains only letters and numbers....

The Challenge of Large Binary Files in Git

Git was originally designed to handle primarily text-based files efficiently. While it excels at managing source code, it can struggle with large binary files such as images, videos, compiled binaries, or datasets. These files can significantly increase repository size, making cloning, pushing, and pulling operations slower and more resource-intensive. Moreover, storing large files directly in the Git repository can lead to performance degradation over time, impacting the productivity of development teams....
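One way to observe this growth is to commit a binary file twice and ask Git how much space its object store uses. The sketch below assumes a throwaway repository and a hypothetical file named asset.bin. Random bytes stand in for an incompressible binary such as a video.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example Author"
git -C "$repo" config user.email "author@example.com"

# First version of a 1 MiB binary file.
head -c 1048576 /dev/urandom > "$repo/asset.bin"
git -C "$repo" add asset.bin
git -C "$repo" commit -q -m "Add binary asset"

# Second version: completely different bytes under the same file name.
head -c 1048576 /dev/urandom > "$repo/asset.bin"
git -C "$repo" commit -q -a -m "Replace binary asset"

# Report object count and on-disk size; both versions remain in history.
git -C "$repo" count-objects -vH
```

In this setup the reported size should come out at roughly twice the file size, because the second commit stores a new copy of the file rather than a small difference, and every clone has to download both.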

Why do we need to optimize binary files in Git?

Consider a scenario: suppose you are working on a large-scale, multi-module project, and the project itself contains some large files or generates them in some of its phases....

Strategy for optimizing Git for Large Binary Files:


Approach 1: Using Git LFS:

GitHub limits individual files to 100 MB. Files larger than 50 MB trigger a warning message but can still be pushed....
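Before pushing, it can help to check which files in a working tree would hit these thresholds. The helper below is a hypothetical sketch: the function name check_sizes and the WARN and BLOCKED labels are invented for illustration, and the defaults treat the 50 MB and 100 MB figures as MiB, which is an assumption. The thresholds can be overridden, which the small demo at the end uses to avoid creating genuinely large files.

```shell
#!/bin/sh

# Defaults mirror the limits quoted above (assumed to be MiB).
WARN_BYTES=$((50 * 1024 * 1024))
LIMIT_BYTES=$((100 * 1024 * 1024))

# check_sizes DIR [WARN_BYTES] [LIMIT_BYTES]
# Prints "WARN <file>" for files above the warning threshold and
# "BLOCKED <file>" for files above the hard limit.
check_sizes() {
    dir=$1
    warn=${2:-$WARN_BYTES}
    limit=${3:-$LIMIT_BYTES}
    # Skip Git's own object store; only working-tree files matter here.
    find "$dir" -path '*/.git' -prune -o -type f -size +"${warn}c" -print |
    while IFS= read -r file; do
        size=$(wc -c < "$file" | tr -d ' ')
        if [ "$size" -gt "$limit" ]; then
            echo "BLOCKED $file"
        else
            echo "WARN $file"
        fi
    done
}

# Demo with tiny thresholds (20 and 50 bytes) on throwaway files.
demo=$(mktemp -d)
head -c 10 /dev/zero > "$demo/small.bin"
head -c 30 /dev/zero > "$demo/medium.bin"
head -c 60 /dev/zero > "$demo/large.bin"
check_sizes "$demo" 20 50
```

In the demo, medium.bin is reported as WARN and large.bin as BLOCKED, while small.bin is not listed. Running check_sizes . inside a real repository applies the default thresholds.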

Approach 2: Using Git-Annex

git-annex is a distributed file synchronization system built on top of Git (development began in 2010). It aims to solve the problem of sharing and synchronizing collections of large files independent of a commercial service or even a central server....

Differences Between Git LFS and Git-Annex:

Subject | Git LFS | Git Annex
Definition | Git LFS is an open source Git command line extension. | git-annex is a distributed file synchronization system.
Working Principle | Git LFS stores large files outside of the Git repository, while maintaining references to those files within the repository. | Git Annex also takes large files out of Git’s history, but it handles this by storing symbolic links to the files in .git/annex.
Storage | The files managed by Git LFS are stored as Git objects both in .git/lfs/objects and in the working repository, which can result in duplicated files and increased disk space usage. | The actual data of the large files is stored in a separate backend, such as S3 or rsync.
Protocols | Git LFS supports both SSH and HTTPS protocols for accessing repositories. | Git Annex works only through SSH.
Programming Language | Git LFS is primarily implemented in Go. | Git Annex is written in Haskell....

Approach 3: Using Git-Submodules

It often happens that while working on one project, you need to use another project from within it. You can apply the same idea if you have some large files inside your project: manage the files in a separate repository, then use git submodule to pull them into your project in the same way....

Conclusion:

In conclusion, optimizing large binary files plays a crucial role: by implementing the techniques discussed above, you can improve performance, reduce storage requirements, and enhance overall system efficiency....
