When I pull submodules using git submodule update --init --force --remote, it creates new files containing a git diff, for example:

diff --git a/app/Services/Payment b/app/Services/Payment
index 72602bc..a726378 160000
--- a/app/Services/Payment
+++ b/app/Services/Payment
@@ -1 +1 @@
-Subproject commit 72602bc5d9e7cef136043791242dfdcfd979370c
+Subproject commit a7263787e5515abe18e7cfe76af0f26d9f62ceb4

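I don't know what these files are or how to get rid of them. When I delete them, the submodule checks out the old commit.

uj5u.com user reply:

TL;DR

Your problem here is the use of --remote. Stop doing that.

Long

You mention, in comments on VonC's answer, that:

when I [run] git status [I get]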

    modified:   app/Services/Notification (new commits)
    modified:   app/Services/Payment (new commits)
    modified:   database/migrations (new commits)

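The (new commits) part means: the commit hash ID that your submodule is actively using (through its current checkout) differs from the commit hash ID that your index (your proposed next commit) says should be used.

There's a lot of jargon here ("submodules", "gitlinks", "index", "commit hash ID") and hence a lot to unpack. We'll get to this in a moment.

Note that the git status output above is a more compact representation of the git diff output you quoted in your original question: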

diff --git a/app/Services/Payment b/app/Services/Payment
index 72602bc..a726378 160000
--- a/app/Services/Payment
+++ b/app/Services/Payment
@@ -1 +1 @@
-Subproject commit 72602bc5d9e7cef136043791242dfdcfd979370c
+Subproject commit a7263787e5515abe18e7cfe76af0f26d9f62ceb4

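What we see here is that for app/Services/Payment, your (main, top-level, or "superproject") repository's index says that this particular submodule should use commit 72602bc5d9e7cef136043791242dfdcfd979370c. But it is actually using commit a7263787e5515abe18e7cfe76af0f26d9f62ceb4 instead. We've just added one more jargon term to define: superproject.

Some initial definitions

Let's start with the definition of a Git repository. A repository is, at its heart, a pair of databases. One is a database of commits and other internal Git objects. The other database holds names: human-readable names, because the names Git uses for its own objects are incomprehensible.

A commit is one of four types of internal objects that Git stores in the first, and usually much larger, database. These commits are numbered, with very large numbers ranging up to 2¹⁶⁰-1. The numbers are expressed in hexadecimal, as in 72602bc5d9e7cef136043791242dfdcfd979370c. (Commits are the only objects you normally interact with in the way we're about to describe, so we'll conveniently ignore the other three, but they are all numbered too.)

The numbers look random, although they are actually the output of a cryptographic hash function and hence entirely non-random. The fact that they come out of a hash function is why we also call them hash IDs. But the real point is that they appear to be totally scrambled, and no human will ever remember them. We need a computer for that.

Fortunately, we have a computer. We just have the computer remember these hash IDs for us, using things like branch names and tag names. Each commit also stores, within itself, the hash ID(s) of some previous commit(s). We don't really need to worry about that here, but it is how branches really work in Git.

So: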

a repository is
a pair of databases, where one database holds commits
which have hash IDs or big ugly numbers.

We and Git use the second database, of names, to find the hash IDs of particular commits, and we use the commits to find more hash IDs of more commits, and so on.

Commits are read-only: the working tree and the index

Now, a crucial thing to know about these commits—and indeed all of Git's internal objects—is that they are all read only. They have to be, because of the hashing trick: the hash ID is a function of every single bit that goes into the internal object, and we find the object by the hash ID, so the hash ID must always match. If the hash ID of some object we extract from the database doesn't match the hash ID we used to find it in the database, Git decides the database is corrupt.¹
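To make the "hash ID is a function of every bit" point concrete, here is a minimal sketch. It uses a blob (file content) object rather than a commit, because a blob is easy to build by hand; commits are hashed the same way with a different header. It assumes a SHA-1 repository format and that sha1sum is installed:

```shell
#!/bin/sh
set -e
# Git hashes a blob as the header "blob <size>", a NUL byte, then the content.
# The content 'hello\n' is 6 bytes long.
by_git=$(printf 'hello\n' | git hash-object --stdin)
by_hand=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)
echo "git hash-object:  $by_git"
echo "sha1 by hand:     $by_hand"

# Change a single character and the hash ID looks completely unrelated.
other=$(printf 'hellp\n' | git hash-object --stdin)
echo "one byte changed: $other"
```

The first two lines printed should be identical, which is why Git can re-hash anything it reads back and detect corruption.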

So the commits are completely read-only. Not only that, but the files inside each commit—we didn't define this earlier, but each commit holds a full snapshot of every file—are in a special Git-only format, compressed and de-duplicated, that only Git can read. (Literally nothing can write over them since everything is read-only.)

What this means is that just to use some commit, we must extract that commit. Git will extract a commit by:

reading the compressed and Git-ified files that are inside the commit;
expanding them into ordinary read/write files; and
writing out those files into a working tree.

This working tree—another bit of jargon—is where we actually do our work. Here, we can see, read, and even write to files. They exist as files, not as read-only, Git-only database entries. So, now we can get work done.

The working tree also enables us to make new commits, but here, Git inserts an extra stumbling block. Before Git will allow us to make a new commit, Git requires that we copy any updated files back into Git.

This step actually makes a certain amount of sense, because the files we see and work on / with in our working tree are not in Git at all. They may have been copied out of Git (out of a commit or one of its supporting objects) but once they are out, they are out.

Git calls the place that Git makes us re-copy updated files by three different names: the index, which as a name makes no sense by itself; the staging area, which refers to how we and Git use the index—and the cache, which is hardly ever used any more but still shows up as the flag in git rm --cached for instance.

The index's role as staging area is pretty straightforward. It takes on an expanded role during merge conflicts, but since we are not worried about these here, we'll just look at how we and Git use it as a staging area.

When we first check out a commit—with git checkout or git switch—Git needs to expand out all the compressed and Git-ified files into our working tree. But Git secretly sticks a "copy" of each of these files into its index / staging-area. I put the word "copy" in quotes here because Git's internal file copies are all de-duplicated. This is why a Git repository doesn't become enormously fat even though every commit stores every file: most commits re-use most files, and in this case, the re-used file takes no space at all, because it's been de-duplicated away.

The same goes for these index "copies": they're duplicates, because the file in question is in the commit. So the index "copies" take no space.² But the key for making a new commit is this: the index copies are exactly what is going to go into the next commit.

In other words, the index holds your proposed next commit. Right now, having done a "clean" checkout of some existing commit, the index matches the commit. But now you can modify some file(s) in the working tree, if you like. Once you have modified a working tree file, you're required to copy it back into Git's index. You do this with git add, which:

reads the working tree copy;
compresses it and otherwise Git-ifies it;
checks to see if the result is a duplicate; and
if it is a duplicate, uses the original (throwing away the temporary Git-ified copy), otherwise uses the new Git-ified file, and uses this to update the index.

The result is that the index now contains your proposed next commit—just as it did before you ran git add. It's just that now, your proposed next commit has been updated.
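You can watch the index do this. The following is only a sketch in a throwaway repository; the file name notes.txt and the demo identity are made up for illustration:

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .

echo 'version 1' > notes.txt
git add notes.txt
git commit -q -m 'first commit'

# Right after a commit, the index entry matches the committed blob.
committed=$(git rev-parse HEAD:notes.txt)
staged=$(git ls-files --stage notes.txt | awk '{print $2}')
echo "committed blob: $committed"
echo "index blob:     $staged"

# Editing the working tree file does not touch the index...
echo 'version 2' > notes.txt
still_staged=$(git ls-files --stage notes.txt | awk '{print $2}')
echo "after edit:     $still_staged"

# ...until git add copies the new content into the proposed next commit.
git add notes.txt
restaged=$(git ls-files --stage notes.txt | awk '{print $2}')
echo "after git add:  $restaged"
```

Only the last line should show a different blob hash ID.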

You repeat this for all files you intend to update: update them in the working tree, then, sooner or later, but always before running git commit, run git add as needed. The add step updates your proposed next commit from whatever you are adding. (Note that a totally-new file goes into the index too, in this same way, it's just that it does not have to kick out some existing de-duplicated copy.)

Hence we now know two things:

The working tree holds the useful copies of your files.
The staging area—or index—holds the proposed next commit, which you update after you update the working tree.

When you do run git commit, Git simply packages up whatever is in the index at that time and puts that into the new commit as the set of Git-ified, read-only, stored-forever, compressed and de-duplicated files.³

¹What we can do at this point is currently rather limited. The most common approach to handling corruption is to throw away the database entirely and clone a new one from a good copy, which works fine since Git is distributed and every repository has thousands of copies "out there". Of course, it stops working if there's no other copy.

²They take a bit of space to hold the file's name, an internal blob hash ID, and a bunch of cache data—that's where the name cache comes in again—which typically amounts to a bit under 100 bytes per file: hardly anything these days.

³If you use git commit -a, note that this is roughly equivalent to running:

git add -u
git commit

That is, all the -a option really does is insert an "update" style git add before committing. Git still builds the new commit out of the (updated-by-add) index. There are several technical complexities here though. These have to do with atomicity and the operation of Git hooks. Putting them all together means that if you do use pre-commit hooks, you must be very clever at writing these pre-commit hooks, and/or avoid using git commit -a. This is not the place for the details, though.

Submodules lead to an explosion of Git repositories

Now that you know:

what a repository is; and
how the index and working tree work

we're just about ready to move on to Git's submodules.

The very shortest definition of a Git submodule is that it is another Git repository. This definition is perhaps a little too short, though. It leaves out a key item, so let's try again: A submodule is:

a Git repository, where
some other Git repository refers to this Git repository; and
some other Git repository exercises some control over this Git repository.

We now know that there must be at least two Git repositories involved, and one repository is put into some sort of supervisory position over the other.

This is how we define the term superproject: a superproject is a Git repository that has a submodule. The superproject is the overseer / supervisor.

One superproject can be the superproject of multiple submodules. (This is the case for you: you have at least three submodules. So you have at least four Git repositories involved.)

A Git repository that is acting as a supervisor—playing the superproject role—can itself be a submodule for another Git repository. In this case, the "middle" repository is both submodule and superproject. I don't know if you have any of these: there's no evidence one way or another in your question.

Now, one thing about most Git repositories is this: they're clones of some other Git repository. We mostly work with a clone. So let's suppose that you have, as your superproject, your clone R1 of some repository R0. If your clone R1 is the superproject for three submodules, those three Git repositories are themselves probably clones of three more repositories. So we're suddenly talking about at least eight Git repositories here, in your basic question!

With eight or more repositories, things can rapidly become quite confusing. There's no longer the repository, the working tree, the index, and so on. Instead, there are eight repositories, four clones on your computer, four working trees, four Git index things, and so on.

We need to be able to talk about each repository, index, and working tree independently, even though they may be somewhat interdependent. This means we need names for each one. To simplify things somewhat, I'm going to use the name R for your superproject git clone, S0 for one of the repositories representing app/Services/Payment, and S1 for another of these.

How this all works

You cloned your superproject repository R from somewhere (from some repository R0), but after that, we can stop thinking about it for a while, so we'll just think about R itself. Your repository R has commits, and these commits contain files and so on.

You selected some commit in R to check out:

git checkout somebranch

The name somebranch resolves to a raw commit hash ID H, and this is the commit your Git fishes out of R to populate the index and working tree so that you can use R.

There are, as yet, no additional repositories. There is, however, a file named .gitmodules that came out of commit H in R. Moreover, commit H lists some gitlinks. A gitlink is a special entry that will go into a commit, and it contains two things:

a path name, in this case app/Services/Payment, and
some commit hash ID S (in this case 72602bc5d9e7cef136043791242dfdcfd979370c).

These gitlinks go into the index in R. We'll just talk about this one particular gitlink.
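If you want to see what such an index entry looks like, here is a small sketch. It plants a gitlink by hand in an empty scratch repository using git update-index --cacheinfo with mode 160000 (the mode that also appears in your diff output). That is not how you would normally create a submodule; it is just the shortest way I know to show the entry:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q .

# Mode 160000 marks the entry as a gitlink: a path plus a commit hash ID,
# with no file content behind it in this repository.
git update-index --add --cacheinfo \
    160000,72602bc5d9e7cef136043791242dfdcfd979370c,app/Services/Payment

entry=$(git ls-files --stage app/Services/Payment)
echo "$entry"
```

In your real superproject, running git ls-files --stage app/Services/Payment should show the same kind of line, with whatever hash ID R currently calls for.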

If you now run git submodule update --init (note the lack of --remote here), your Git commands, operating on repository R, will notice this gitlink in the index. (There are no corresponding files, just the gitlink.)

Your superproject Git commands, executing this git submodule update, will now notice that you haven't yet cloned any submodules, and—because of the --init option—will run a git clone command for you. This git clone command needs a URL. The URL comes out of the .gitmodules file.

The repository that Git clones at this point is repository S0 (perhaps over on GitHub: on some server anyway). The clone gets hidden away,⁴ creating a new repository S1. Your Git software now runs a git checkout operation within S1 so as to copy a commit into a working tree and index.

The index for S1 is hidden away in the repository for S1, but the working tree for S1 is placed into app/Services/Payment: the place you want the files you'll see and work with, from the submodule. So now the ordinary directory (or folder, if you prefer that term) app/Services/Payment is full of ordinary files. These comprise the working tree for S1.

Your submodule S1 is now ready to use. We have three repositories we need to think about: R, S0, and S1. We have two staging areas / index-es: one that goes with R and one that goes with S1. We have two working trees to use, one that goes with R and one that goes with S1. The working tree for S1 is inside the working tree for R, but the R repository won't use it. Only the S1 repository will use it.

⁴In modern Git, the clone's .git directory is stuffed into R in .git/modules/. In ancient versions of Git, submodule clones go into a .git right in the submodule path—in this case app/Services/Payment/.git.
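The whole sequence can be reproduced with throwaway local repositories. This is a sketch under assumptions: the directory names s0, r0, and r and the file pay.txt are invented, and local paths stand in for real URLs. Recent Git versions refuse local-path submodule clones by default, hence the protocol.file.allow setting:

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
allow='protocol.file.allow=always'   # permit file-path submodule clones
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# S0: the repository the submodule will be cloned from.
git init -q s0
echo 'payment code' > s0/pay.txt
git -C s0 add pay.txt
git -C s0 commit -q -m 'first commit in S0'

# R0: an upstream superproject whose commit holds .gitmodules and a gitlink.
git init -q r0
git -C r0 -c "$allow" submodule --quiet add "$work/s0" app/Services/Payment
git -C r0 commit -q -m 'add submodule'

# R: a plain clone gets the gitlink and .gitmodules, but no submodule clone.
git clone -q r0 r
cd r
git -c "$allow" submodule --quiet update --init

gitlink=$(git ls-files --stage app/Services/Payment | awk '{print $2}')
checked_out=$(git -C app/Services/Payment rev-parse HEAD)
echo "gitlink in R's index: $gitlink"
echo "checked out in S1:    $checked_out"
ls -d .git/modules/app/Services/Payment   # where the S1 repository is hidden
ls app/Services/Payment                   # S1's working tree
```

Without --remote, the two hash IDs agree, so git status in R has nothing to report about the submodule.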

`git submodule update --remote`

The --remote flag to git submodule update tells it that instead of obeying the superproject gitlink—remember, this is an entry in the R index, under the name app/Services/Payment, that currently holds hash ID 72602bc5d9e7cef136043791242dfdcfd979370c—your Git software should enter submodule S1 and run:

git fetch origin

This reaches out to repository S0. Repository S0 has its own branch and tag names, and its own commits. Repository S1 was cloned from S0 earlier, but S0 might be updated any time. So the git fetch step reaches out to the Git software that handles S0 and gets, from that Git, any new commits for S0 and puts them in your clone S1. Then, as the final step, git fetch origin within S1 creates or updates all of the remote-tracking names in S1 that go with the branch names from S0.

This updates your (local) origin/master, origin/develop, origin/feature/tall, and so on in your S1 based on the branch names as seen in S0. You now have, in S1, all the commits from S0, and you know which commit they (S0) call the "latest" commit on their master for instance.

What your git submodule update --remote does now is turn your name origin/master into a hash ID. The hash ID your S1 Git gets from this operation is not 72602bc5d9e7cef136043791242dfdcfd979370c. It's actually a7263787e5515abe18e7cfe76af0f26d9f62ceb4.

Your superproject Git now directs your S1 Git to run:

git checkout --detach a7263787e5515abe18e7cfe76af0f26d9f62ceb4

(or the same with git switch; in any case it's all being done internally in the latest versions of Git, though older ones literally run git checkout here).

This populates your S1 index and working tree from commit a7263787e5515abe18e7cfe76af0f26d9f62ceb4. So that's now the current commit in your S1.

Meanwhile, your superproject repository R still calls for commit 72602bc5d9e7cef136043791242dfdcfd979370c. That's what is in the index / staging-area for new commits you will make in R.
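Here is a sketch that reproduces this mismatch with scratch repositories. The names s0 and r, the branch name main, and the empty commits are all assumptions made for the demo. Local paths stand in for URLs, which is why protocol.file.allow is set:

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
allow='protocol.file.allow=always'   # permit file-path submodule clones
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# S0 with one commit on a branch we name main explicitly.
git init -q s0
git -C s0 symbolic-ref HEAD refs/heads/main
git -C s0 commit -q --allow-empty -m 'commit A'

# R records a gitlink to commit A; -b main is the branch --remote will follow.
git init -q r
git -C r -c "$allow" submodule --quiet add -b main "$work/s0" app/Services/Payment
git -C r commit -q -m 'add submodule at commit A'

# S0 moves on after R recorded its gitlink.
git -C s0 commit -q --allow-empty -m 'commit B'

cd r
git -c "$allow" submodule --quiet update --remote

recorded=$(git ls-files --stage app/Services/Payment | awk '{print $2}')
checked_out=$(git -C app/Services/Payment rev-parse HEAD)
echo "R's index calls for: $recorded"
echo "S1 has checked out:  $checked_out"
git status
git diff --submodule=short
```

The diff printed at the end has the same shape as the one in your question: a -Subproject commit line for what R's index calls for, and a +Subproject commit line for what S1 actually has checked out.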

What to do about all this

If you want R to start calling for a7263787e5515abe18e7cfe76af0f26d9f62ceb4, you will simply need to run:

git add app/Services/Payment

while working in R. This directs the R Git to run git rev-parse HEAD inside the S1 Git, which finds the current checked-out commit's hash ID. This hash ID then goes into the R index / staging-area, so that the next commit you make in R will call for that commit by that hash ID.

If you want S1 to have commit 72602bc5d9e7cef136043791242dfdcfd979370c checked out instead, you have a number of options:

(cd app/Services/Payment && git checkout --detach 72602bc5d9e7cef136043791242dfdcfd979370c)

will do it, for instance. Or you can run git submodule update. This command, run in R, tells the R Git to read the commit hash IDs from the R index and run git checkout commands within each submodule, to force the submodule checkout back to the desired commit.
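Both directions can be tried out safely in scratch repositories first. As before, this is only a sketch: s0, r, the branch name main, and the empty commits A and B are invented for the demo, and protocol.file.allow is needed only because local paths stand in for URLs:

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
allow='protocol.file.allow=always'   # permit file-path submodule clones
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q s0
git -C s0 symbolic-ref HEAD refs/heads/main
git -C s0 commit -q --allow-empty -m 'commit A'
a=$(git -C s0 rev-parse HEAD)

git init -q r
git -C r -c "$allow" submodule --quiet add -b main "$work/s0" app/Services/Payment
git -C r commit -q -m 'add submodule at commit A'

git -C s0 commit -q --allow-empty -m 'commit B'
b=$(git -C s0 rev-parse HEAD)

cd r
git -c "$allow" submodule --quiet update --remote   # S1 now sits on B

# Option 1: put S1 back on the commit R's index calls for (A).
git -c "$allow" submodule --quiet update
restored=$(git -C app/Services/Payment rev-parse HEAD)
echo "after plain update, S1 is at: $restored"

# Option 2: move S1 to B again, then tell R to call for B from now on.
git -c "$allow" submodule --quiet update --remote
git add app/Services/Payment
accepted=$(git ls-files --stage app/Services/Payment | awk '{print $2}')
echo "after git add, R calls for:   $accepted"
```

After option 2 you would still need to run git commit in R to make the new gitlink permanent.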

When you run git submodule update --init, if you add --remote, you're directing your R Git to fetch in each submodule and find the latest commit from some branch in the source repository (S0 in our examples here). The chosen branch is defined in various places in R, although it tends to be master or main these days. The same goes for git submodule update without --init. The --init merely means do the initial clone if needed. The --remote part means do the fetch and get the hash ID from a remote-tracking name. The crucial part is always the hash ID. That comes from:

your index, or
some remote-tracking name

and that controls which commit your Git instructs the submodule Git to check out.

The git status and git diff commands, run in R, merely report whether the index (R's index) and working tree (S1's working tree checkout in this case) match. If not, git diff tells you what the difference is, and git status just says "they are different".

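uj5u.com user reply:

A git submodule update should not "generate" any files, besides the submodule folder content.

A git diff in the parent repository might show you what you mention, as seen in "Starting with Submodules":

If you run git diff on that, you see something interesting: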
$ git diff --cached DbConnector
diff --git a/DbConnector b/DbConnector
new file mode 160000
index 0000000..c3f01dc
--- /dev/null
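+++ b/DbConnector
@@ -0,0 +1 @@
+Subproject commit c3f01dc8862123d317dd46284b05b6892c7b29bc
Although DbConnector is a subdirectory in your working directory, Git sees it as a submodule and doesn't track its contents when you're not in that directory.
Instead, Git sees it as a particular commit from that repository.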
