<2023-12-01 Fri> git

Removing unneeded files from git

Sometimes unneeded binary files or other build artefacts end up in your git repo. A prime example is the compiled PDF in a LaTeX repo.

Once they get in, they are painful to deal with. The most visible problem is size: git keeps every committed version of every file, so each recompiled PDF adds another full copy and the repo keeps growing. They also get in the way of everyday operations. A binary file cannot be merged line by line, and even a plain pull can fail when your locally compiled copy differs from the one checked in. The best fix is to remove the file as soon as possible and make sure it does not get back in.

1. Remove the file from the older commits using bfg

There are a few tools that can remove files from a repo's history, and bfg (the BFG Repo-Cleaner) is as good as any. Download its jar file and run it from the root of your repo:

$ cd path/to/repo
$ java -jar <path to bfg.jar> --delete-files paper.pdf .

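This rewrites the commits so that the file is gone, as if it had never been there. But your work is not done yet. By default, bfg does not touch the contents of your latest commit (HEAD), so the file must first be removed from your latest commit (step 2) and bfg run after that. Also, the old copies of the file stay inside .git until the reflog is expired and the repo is garbage-collected.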
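bfg only rewrites commits; the old PDF blobs remain in the object store, still reachable through the reflog, until they are expired and garbage-collected. The BFG docs suggest `git reflog expire --expire=now --all` followed by `git gc --prune=now --aggressive` for this. The sketch below only illustrates the effect in a throwaway repo: `git commit --amend` stands in for bfg's rewrite (bfg itself is not run), and the author identity is a made-up demo value.

```shell
#!/bin/sh
# Sketch: old blobs survive a history rewrite until the reflog is expired and
# the repo is garbage-collected. Throwaway repo; `git commit --amend` stands in
# for bfg's rewrite, and the author identity below is a made-up demo value.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'draft' > main.tex
echo 'not really a pdf' > paper.pdf
git add main.tex paper.pdf
git commit -q -m "Add paper"
blob=$(git rev-parse HEAD:paper.pdf)   # object id of the committed PDF

# Rewrite history so that no branch refers to the PDF any more.
git rm -q --cached paper.pdf
git commit -q --amend --no-edit

# The reflog still points at the old commit, so the blob is still stored.
git cat-file -e "$blob" && echo "before cleanup: $blob still stored"

# The cleanup the BFG docs recommend after a run.
git reflog expire --expire=now --all
git gc --prune=now --aggressive

if git cat-file -e "$blob" 2>/dev/null; then
  echo "after cleanup: $blob still stored"
else
  echo "after cleanup: $blob is gone"
fi
```

In your own repo, only the two cleanup commands are needed, run from path/to/repo once bfg has finished.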

2. Remove tracking of the file

Since the file got in at one point, it is still in git's index, so git keeps tracking it. Remove it from the index (your local copy stays on disk) and commit that:

$ git rm --cached paper.pdf
$ git commit -m "Stop tracking paper.pdf"
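Note that `git update-index --assume-unchanged` does not untrack a file: it only tells git to skip checking it for local changes, and the file stays in the index and in every new commit. A quick way to see the difference from `git rm --cached` is a throwaway repo (temporary paths and a made-up demo identity, both assumptions of this sketch):

```shell
#!/bin/sh
# Sketch: --assume-unchanged vs. git rm --cached, in a throwaway repo.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com      # demo identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'draft' > main.tex
echo 'not really a pdf' > paper.pdf
git add main.tex paper.pdf
git commit -q -m "Add paper"

# --assume-unchanged only hides local edits; the file is still in the index.
git update-index --assume-unchanged paper.pdf
still_tracked=$(git ls-files paper.pdf)
echo "after --assume-unchanged: '$still_tracked' is still tracked"
git update-index --no-assume-unchanged paper.pdf

# git rm --cached drops it from the index but leaves the file on disk.
git rm -q --cached paper.pdf
git commit -q -m "Stop tracking paper.pdf"
echo "after git rm --cached: tracked='$(git ls-files paper.pdf)'"
ls -l paper.pdf   # the working copy is still there
```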

3. Ensure no tracking in the future

Next, add an entry for the file to .gitignore and commit it. This way, a plain git add will not let the file in again by mistake.
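A minimal sketch of the ignore step in a throwaway repo (the temporary path is an assumption; in your repo you would just append the line to the existing .gitignore). A broader pattern such as `*.pdf` also works if no PDF should ever be committed.

```shell
#!/bin/sh
# Sketch: once paper.pdf is listed in .gitignore, `git add .` skips it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'not really a pdf' > paper.pdf
echo 'paper.pdf' >> .gitignore   # or *.pdf to ignore every PDF
git check-ignore -v paper.pdf    # shows which .gitignore line matches
git add .                        # stages .gitignore, silently skips paper.pdf
git status --short
```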

4. Push your changes

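bfg has rewritten the past, but this past now needs to be propagated to the main repo. The rewritten commits no longer match the ones on the remote, so a normal push is rejected and you need a force push. Be careful: a force push overwrites the remote branch, and anything others pushed to it in the meantime is lost.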
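A sketch of the push step against a local bare repo standing in for the main repo. Assumptions: all paths are temporary, `git commit --amend` again stands in for bfg, the branch is called `main`, and `git init -b` needs git 2.28 or newer. It uses `--force-with-lease`, a safer form of `--force` that refuses to overwrite the remote branch if it has moved since you last fetched.

```shell
#!/bin/sh
# Sketch: a rewritten branch needs a force push; a plain push is rejected.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com      # demo identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
remote="$tmp/remote.git"   # stands in for the main repo
work="$tmp/work"
git init -q --bare -b main "$remote"
git init -q -b main "$work"
cd "$work"
git remote add origin "$remote"
echo 'draft' > main.tex
echo 'not really a pdf' > paper.pdf
git add main.tex paper.pdf
git commit -q -m "Add paper"
git push -q origin main

# Stand-in for bfg: rewrite the commit without the PDF.
git rm -q --cached paper.pdf
git commit -q --amend --no-edit

# A plain push is refused because the histories have diverged.
if git push -q origin main 2>/dev/null; then
  echo "unexpected: plain push accepted"
else
  echo "plain push rejected (non-fast-forward)"
fi

# Force push, but only if origin/main is still where we last saw it.
git push -q --force-with-lease origin main
```

If bfg rewrote several branches, each one needs to be pushed; `git push --force --all origin` pushes all local branches at once.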

5. Propagate those changes down

You have the changes and so does the main repo. But we are not done yet! Everyone else with a clone still has the old history, which has now diverged from the rewritten one, and they need to switch over to the new history.

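One simple solution for them is to delete the local folder and clone again. A better solution is to git reset to a commit from before the file got in and then pull. Commits made before the file was added are left unchanged by the rewrite, so from there the pull is a plain fast-forward. I would recommend this, especially if you know when the anomalous file first got in. Keep in mind that git reset --hard throws away any uncommitted local changes.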
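A sketch of the reset-and-pull route for a collaborator, simulated end to end with temporary repos. Assumptions: paths are temporary, `git commit --amend` plus a force push stand in for the bfg rewrite, the branch is `main`, and git is 2.28 or newer. The collaborator uses `git log --diff-filter=A` to find the commit that first added paper.pdf and resets to its parent.

```shell
#!/bin/sh
# Sketch: a collaborator catches up with rewritten history via reset + pull.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com      # demo identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
remote="$tmp/remote.git"; author="$tmp/author"; collab="$tmp/collab"
git init -q --bare -b main "$remote"
git init -q -b main "$author"
cd "$author"
git remote add origin "$remote"
echo 'draft' > main.tex
git add main.tex
git commit -q -m "Start paper"   # clean commit, untouched by the rewrite
echo 'more text' >> main.tex
echo 'not really a pdf' > paper.pdf
git add main.tex paper.pdf
git commit -q -m "Write more (and accidentally add the PDF)"
git push -q origin main

# The collaborator clones while the PDF is still in history.
git clone -q "$remote" "$collab"

# The author rewrites history (stand-in for bfg) and force-pushes.
git rm -q --cached paper.pdf
git commit -q --amend --no-edit
git push -q --force-with-lease origin main

# Collaborator: find the commit that first added the PDF ...
cd "$collab"
added=$(git log --reverse --diff-filter=A --format=%H -- paper.pdf | head -n 1)
# ... reset to its parent, which the rewrite left unchanged, then fast-forward.
git reset -q --hard "$added^"
git pull -q --ff-only
git log --oneline
```

If the file was added in the very first commit, there is no parent to reset to, and re-cloning is the simpler option.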