Reading Git's first commit

These are my notes from reading Git's historic first commit, e83c5163316f89bfbde7d9ab23ca2e25604af290, to see what the initial implementation looked like.

How to get the first commit

git clone https://github.com/git/git.git
cd git
git rev-list --max-parents=0 HEAD

README

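Origin of the name "git"

A random three-letter combination that is not used by any common UNIX command
Slang for stupid, simple, and so on
Short for "global information tracker"
Short for "goddamn idiotic truckload of sh*t"

The two abstractions git uses

To work as a directory content manager, Git deals with two abstract concepts.
object database
current directory cache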

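The Object Database

A collection of content-addressable objects. Content-addressable means that an object is named by its content, specifically by a SHA-1 hash derived from it. Every object is compressed with zlib, and in this first version the SHA-1 is computed over the compressed bytes.

blob: a chunk of binary data; the object that holds a file's contents

tree: a list of (permissions, file or directory name, blob or tree hash) entries; the object that represents directory structure. Nesting trees recursively can describe the layout of an entire repository

changeset: the object that records a change, its parent changesets, and a comment on the change (what is called a commit today)

trust: every object is named by its SHA-1 hash, so tampering or corruption can be detected
Trust here only covers the integrity of the content; whether the content itself deserves trust depends on something external. In other words, whether a commit is intact can be verified by compressing the content, hashing it with SHA-1, and comparing the result against the changeset's hash, but that says nothing about whether the commit was made by a trustworthy person.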
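To make "named by its content" and the integrity check concrete, here is a minimal sketch. It deliberately swaps SHA-1 for 64-bit FNV-1a so that it needs no external library, and it skips zlib compression entirely. The names object_name and object_is_intact are hypothetical and do not appear in the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for SHA-1: 64-bit FNV-1a. This is NOT what Git uses; it is
 * chosen only because it fits in a few lines without a crypto library. */
static uint64_t object_name(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = 14695981039346656037ULL;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;		/* FNV prime */
	}
	return h;
}

/* Integrity check: re-hash the stored bytes and compare with the name. */
static int object_is_intact(uint64_t name, const void *data, size_t len)
{
	return object_name(data, len) == name;
}
```

The point of the sketch is only the shape of the check: a name derived from the bytes detects changed bytes, but it cannot say who produced them.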

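Current Directory Cache

A simple binary file that represents the state of a virtual directory at some point in time. It holds an array that associates name, dates, permissions, and content (blob); the array is always sorted by name, and names are unique.
This makes the following efficient.

Reconstruction: the cached state (blobs and trees) can be fully regenerated

Change detection: mismatches between the not-yet-committed tree and the actual directory can be found efficiently
The cache can be restored as long as the SHA-1 of the corresponding tree is known, and it can be updated partially, for only the files that were edited.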
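One practical consequence of keeping the entries sorted by unique name is that lookups can use binary search. The following is a minimal sketch over plain C strings rather than real cache entries. The function name_pos is hypothetical; its return encoding mirrors cache_name_pos in update-cache.c (a negative value when the name is found, the insertion index otherwise).

```c
#include <string.h>

/* Returns -(index)-1 when `name` is present, otherwise the index at
 * which it would have to be inserted to keep `names` sorted. */
static int name_pos(const char **names, int count, const char *name)
{
	int first = 0, last = count;

	while (first < last) {
		int mid = first + (last - first) / 2;
		int cmp = strcmp(name, names[mid]);

		if (cmp == 0)
			return -mid - 1;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}
	return first;
}
```

Encoding "found" as a negative number lets a single return value carry both the hit and the insertion point, which is what an update-in-place or insert needs.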

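There are 11 files in total. At a glance, cache.h defines the data structures for the directory cache, and most of the .c files appear to implement one git command each. Commands that are used daily today, such as branch or merge, do not exist yet; instead, you build each file and run the resulting binaries directly, and they store git objects under a directory called .dircache/.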

$ tree
.
├── cache.h
├── cat-file.c
├── commit-tree.c
├── init-db.c
├── Makefile
├── read-cache.c
├── read-tree.c
├── README
├── show-diff.c
├── update-cache.c
└── write-tree.c

1 directory, 11 files

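GitLab wrote a blog post on the history of Git to mark its 20th anniversary.

20 years of Git

https://about.gitlab.com/blog/2025/04/14/journey-through-gits-20-year-history/
The initial version provides seven top-level commands.

init-db: initializes a Git repository

update-cache: adds files to the cache

write-tree: creates a new tree from the cache

read-tree: reads a tree

commit-tree: creates a commit from a tree

cat-file: reads a given git object into a temporary file

show-diff: shows the differences between the cache and the working directory

To create the first commit:
Call update-cache for each file you want to add

Create a new tree with write-tree

Set environment variables such as COMMITTER_NAME, COMMITTER_EMAIL, and COMMITTER_DATE

Create a commit object from the tree with commit-tree
To look at the commit you created, write it out to a temporary file with cat-file.
After making changes, use show-diff to check which files changed.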

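What the first version could not do

There was no easy way to switch between commits
There was no way to display a log
There were no branches, tags, or refs; users had to keep track of git object IDs by hand
There was no merge
There was no way to synchronize two repositories with each other. Instead, users were expected to sync the .dircache directory with rsync(1)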

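Interview with the author, Linus

https://about.gitlab.com/blog/2025/04/07/celebrating-gits-20th-anniversary-with-creator-linus-torvalds/
He had used version control systems for Linux kernel development for several years before writing Git
Some of the version control systems in use at the time were hard to use, or were the subject of licensing disputes
Nothing good came out of the OSS community, so he wrote his own
The first version took only a few days, but writing the code was the easy part and the time spent matters little. What matters is that he had been thinking about the problem beforehand, which is why the core design was ready
The main goals were to be distributed and fast, and to be absolutely reliable in catching any kind of corruption
He had no interest in version control systems as such; he did it because kernel development needed one
Git became more popular than Mercurial (released 12 days later) because of the network effect from users such as the kernel and Ruby on Rails
(Asked whether there is anything he would implement if he could work on Git full-time today) Nothing. His own use of Git is limited, it did what he needed very early on, and the only project he really cares about is the kernel
(Asked whether he regrets any of Git's design decisions) He might have done some details differently, but he is happy with the overall design. What matters is arriving at the concept that drives the design (the object database for Git, "everything is a file" for Unix). 99% of the code is the ugly detail built to make that concept usable in the real world
(Asked whether it makes sense to start using Rust in parts of Git, as the kernel has) Git's core ideas are simple, so he thinks reimplementing them from scratch is reasonable. He also does not think the advantages of Rust are obvious. Because of the network effect, replacing Git would take something far better, not just a little better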

init-db

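Initializes a Git repository.

Spec

Creates a directory named .dircache/
- the equivalent of today's .git/
Git objects can be shared by pointing the environment variable SHA1_FILE_DIRECTORY (named by the DB_ENVIRONMENT macro) at a shared directory
- in that case no object directories are meant to be created
When the variable is not set, it creates 256 directories (00 to ff) for storing git objects under the path given by DEFAULT_DB_ENVIRONMENT (.dircache/objects)
- objects are spread across 00 to ff by the first two hex digits of their SHA-1 hash, which keeps them from piling up in a single directory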
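The fan-out into 00 to ff can be illustrated by computing where an object with a given hex name would live. This is a minimal sketch; object_path is a hypothetical helper (the commit builds the path in sha1_file_name in read-cache.c), and the 40-digit string in the usage note is just the hash of the first commit, used as a convenient example value.

```c
#include <stdio.h>
#include <string.h>

/* Builds "<root>/<first 2 hex digits>/<remaining 38 hex digits>".
 * Returns 0 on success, -1 if `hex` is not 40 characters long or the
 * result does not fit in `out`. */
static int object_path(char *out, size_t outlen, const char *root, const char *hex)
{
	int n;

	if (strlen(hex) != 40)
		return -1;
	n = snprintf(out, outlen, "%s/%.2s/%s", root, hex, hex + 2);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With root ".dircache/objects" and hex "e83c5163316f89bfbde7d9ab23ca2e25604af290", the sketch produces ".dircache/objects/e8/3c5163316f89bfbde7d9ab23ca2e25604af290".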

Code

#include "cache.h"

int main(int argc, char **argv)
{
	char *sha1_dir = getenv(DB_ENVIRONMENT), *path;
	int len, i, fd;

	if (mkdir(".dircache", 0700) < 0) {
		perror("unable to create .dircache");
		exit(1);
	}

	/*
	 * If you want to, you can share the DB area with any number of branches.
	 * That has advantages: you can save space by sharing all the SHA1 objects.
	 * On the other hand, it might just make lookup slower and messier. You
	 * be the judge.
	 */
	sha1_dir = getenv(DB_ENVIRONMENT);
	if (sha1_dir) {
		struct stat st;
		if (!stat(sha1_dir, &st) < 0 && S_ISDIR(st.st_mode))
			return;
		fprintf(stderr, "DB_ENVIRONMENT set to bad directory %s: ", sha1_dir);
	}

	/*
	 * The default case is to have a DB per managed directory. 
	 */
	sha1_dir = DEFAULT_DB_ENVIRONMENT;
	fprintf(stderr, "defaulting to private storage area\n");
	len = strlen(sha1_dir);
	if (mkdir(sha1_dir, 0700) < 0) {
		if (errno != EEXIST) {
			perror(sha1_dir);
			exit(1);
		}
	}
	path = malloc(len + 40);
	memcpy(path, sha1_dir, len);
	for (i = 0; i < 256; i++) {
		sprintf(path+len, "/%02x", i);
		if (mkdir(path, 0700) < 0) {
			if (errno != EEXIST) {
				perror(path);
				exit(1);
			}
		}
	}
	return 0;
}

The environment variable name and the default path are defined as macros in cache.h.

#define DB_ENVIRONMENT "SHA1_FILE_DIRECTORY"
#define DEFAULT_DB_ENVIRONMENT ".dircache/objects"

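Notes

It calls OS facilities such as mkdir and stat directly (POSIX), so it probably would not work on Windows as is
The condition if (!stat(sha1_dir, &st) < 0 && ... looks like a bug. !stat(...) evaluates to 0 or 1, so it is never less than 0 and the branch is never taken; even a valid SHA1_FILE_DIRECTORY is reported as a "bad directory" and the program falls through to the default. Since stat returns 0 on success, if (!stat(sha1_dir, &st) && S_ISDIR(st.st_mode)) looks like the intended form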
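As a small illustration of the check that the condition seems to be aiming for, here is a sketch of an "exists and is a directory" test. The helper name is_directory is hypothetical and is not part of the commit.

```c
#include <sys/stat.h>

/* Returns 1 if `path` exists and is a directory, 0 otherwise.
 * stat() returns 0 on success, so success is tested with `== 0`
 * (or `!stat(...)`), never with `!stat(...) < 0`, which compares
 * 0 or 1 against 0 and is therefore always false. */
static int is_directory(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```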

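update-cache

Hashes files in the working tree and stores them in the object database
Registers, updates, and removes file information in the index

Spec

First reads the existing cache (what current git calls the staging area, or index)
- a missing index is treated as empty; a corrupted one is an error
- this populates the globals active_cache and active_nr

Creates a temporary lock file, .dircache/index.lock
- creation fails if the file already exists, so two processes cannot hold the lock at the same time

Validates each path given on the command line
- dotfiles are ignored

Adds each file to the cache
- compresses the file contents and computes the SHA-1
- saves the result as a git object
- creates a cache entry and adds it to active_cache
- if the file no longer exists, its entry is removed from the cache instead

Writes the cache to the lock file and renames it to .dircache/index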
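The path validation step is the least obvious part of the code, because verify_path jumps into the middle of its loop with goto. The rules it enforces can be restated without the jump. This is a sketch that I believe accepts and rejects the same paths, written from reading the function; path_is_acceptable is a hypothetical name.

```c
/* Returns 1 if the path may be added to the cache, 0 if it should be
 * ignored. Rejected: the empty path, a leading '/', empty components
 * ("a//b"), a trailing '/', and any component that starts with '.'. */
static int path_is_acceptable(const char *path)
{
	int at_component_start = 1;
	const char *p;

	for (p = path; ; p++) {
		char c = *p;

		if (at_component_start) {
			if (c == '/' || c == '.' || c == '\0')
				return 0;
			at_component_start = 0;
		} else if (c == '\0') {
			return 1;
		} else if (c == '/') {
			at_component_start = 1;
		}
	}
}
```

Note that a dot is only rejected at the start of a component, so names such as cache.h pass while .dircache/index and src/.hidden do not.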

Code

#include "cache.h"

static int cache_name_compare(const char *name1, int len1, const char *name2, int len2)
{
	int len = len1 < len2 ? len1 : len2;
	int cmp;

	cmp = memcmp(name1, name2, len);
	if (cmp)
		return cmp;
	if (len1 < len2)
		return -1;
	if (len1 > len2)
		return 1;
	return 0;
}

static int cache_name_pos(const char *name, int namelen)
{
	int first, last;

	first = 0;
	last = active_nr;
	while (last > first) {
		int next = (last + first) >> 1;
		struct cache_entry *ce = active_cache[next];
		int cmp = cache_name_compare(name, namelen, ce->name, ce->namelen);
		if (!cmp)
			return -next-1;
		if (cmp < 0) {
			last = next;
			continue;
		}
		first = next+1;
	}
	return first;
}

static int remove_file_from_cache(char *path)
{
	int pos = cache_name_pos(path, strlen(path));
	if (pos < 0) {
		pos = -pos-1;
		active_nr--;
		if (pos < active_nr)
			memmove(active_cache + pos, active_cache + pos + 1, (active_nr - pos - 1) * sizeof(struct cache_entry *));
	}
}

static int add_cache_entry(struct cache_entry *ce)
{
	int pos;

	pos = cache_name_pos(ce->name, ce->namelen);

	/* existing match? Just replace it */
	if (pos < 0) {
		active_cache[-pos-1] = ce;
		return 0;
	}

	/* Make sure the array is big enough .. */
	if (active_nr == active_alloc) {
		active_alloc = alloc_nr(active_alloc);
		active_cache = realloc(active_cache, active_alloc * sizeof(struct cache_entry *));
	}

	/* Add it in.. */
	active_nr++;
	if (active_nr > pos)
		memmove(active_cache + pos + 1, active_cache + pos, (active_nr - pos - 1) * sizeof(ce));
	active_cache[pos] = ce;
	return 0;
}

static int index_fd(const char *path, int namelen, struct cache_entry *ce, int fd, struct stat *st)
{
	z_stream stream;
	int max_out_bytes = namelen + st->st_size + 200;
	void *out = malloc(max_out_bytes);
	void *metadata = malloc(namelen + 200);
	void *in = mmap(NULL, st->st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	SHA_CTX c;

	close(fd);
	if (!out || (int)(long)in == -1)
		return -1;

	memset(&stream, 0, sizeof(stream));
	deflateInit(&stream, Z_BEST_COMPRESSION);

	/*
	 * ASCII size + nul byte
	 */	
	stream.next_in = metadata;
	stream.avail_in = 1+sprintf(metadata, "blob %lu", (unsigned long) st->st_size);
	stream.next_out = out;
	stream.avail_out = max_out_bytes;
	while (deflate(&stream, 0) == Z_OK)
		/* nothing */;

	/*
	 * File content
	 */
	stream.next_in = in;
	stream.avail_in = st->st_size;
	while (deflate(&stream, Z_FINISH) == Z_OK)
		/*nothing */;

	deflateEnd(&stream);
	
	SHA1_Init(&c);
	SHA1_Update(&c, out, stream.total_out);
	SHA1_Final(ce->sha1, &c);

	return write_sha1_buffer(ce->sha1, out, stream.total_out);
}

static int add_file_to_cache(char *path)
{
	int size, namelen;
	struct cache_entry *ce;
	struct stat st;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		if (errno == ENOENT)
			return remove_file_from_cache(path);
		return -1;
	}
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	namelen = strlen(path);
	size = cache_entry_size(namelen);
	ce = malloc(size);
	memset(ce, 0, size);
	memcpy(ce->name, path, namelen);
	ce->ctime.sec = st.st_ctime;
	ce->ctime.nsec = st.st_ctim.tv_nsec;
	ce->mtime.sec = st.st_mtime;
	ce->mtime.nsec = st.st_mtim.tv_nsec;
	ce->st_dev = st.st_dev;
	ce->st_ino = st.st_ino;
	ce->st_mode = st.st_mode;
	ce->st_uid = st.st_uid;
	ce->st_gid = st.st_gid;
	ce->st_size = st.st_size;
	ce->namelen = namelen;

	if (index_fd(path, namelen, ce, fd, &st) < 0)
		return -1;

	return add_cache_entry(ce);
}

static int write_cache(int newfd, struct cache_entry **cache, int entries)
{
	SHA_CTX c;
	struct cache_header hdr;
	int i;

	hdr.signature = CACHE_SIGNATURE;
	hdr.version = 1;
	hdr.entries = entries;

	SHA1_Init(&c);
	SHA1_Update(&c, &hdr, offsetof(struct cache_header, sha1));
	for (i = 0; i < entries; i++) {
		struct cache_entry *ce = cache[i];
		int size = ce_size(ce);
		SHA1_Update(&c, ce, size);
	}
	SHA1_Final(hdr.sha1, &c);

	if (write(newfd, &hdr, sizeof(hdr)) != sizeof(hdr))
		return -1;

	for (i = 0; i < entries; i++) {
		struct cache_entry *ce = cache[i];
		int size = ce_size(ce);
		if (write(newfd, ce, size) != size)
			return -1;
	}
	return 0;
}		

/*
 * We fundamentally don't like some paths: we don't want
 * dot or dot-dot anywhere, and in fact, we don't even want
 * any other dot-files (.dircache or anything else). They
 * are hidden, for chist sake.
 *
 * Also, we don't want double slashes or slashes at the
 * end that can make pathnames ambiguous. 
 */
static int verify_path(char *path)
{
	char c;

	goto inside;
	for (;;) {
		if (!c)
			return 1;
		if (c == '/') {
inside:
			c = *path++;
			if (c != '/' && c != '.' && c != '\0')
				continue;
			return 0;
		}
		c = *path++;
	}
}

int main(int argc, char **argv)
{
	int i, newfd, entries;

	entries = read_cache();
	if (entries < 0) {
		perror("cache corrupted");
		return -1;
	}

	newfd = open(".dircache/index.lock", O_RDWR | O_CREAT | O_EXCL, 0600);
	if (newfd < 0) {
		perror("unable to create new cachefile");
		return -1;
	}
	for (i = 1 ; i < argc; i++) {
		char *path = argv[i];
		if (!verify_path(path)) {
			fprintf(stderr, "Ignoring path %s\n", argv[i]);
			continue;
		}
		if (add_file_to_cache(path)) {
			fprintf(stderr, "Unable to add %s to database\n", path);
			goto out;
		}
	}
	if (!write_cache(newfd, active_cache, active_nr) && !rename(".dircache/index.lock", ".dircache/index"))
		return 0;
out:
	unlink(".dircache/index.lock");
}

Notes

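read-cache

A collection of utility functions for reading and writing git objects
Also handles reading and validating the cache, and error reporting

Spec

read_cache

Checks that the git object database and .dircache/index can be accessed
Maps the index file into memory with mmap
Verifies that the index file is not corrupted (signature, version, and SHA-1 checksum)
Stores the number of entries in the global active_nr and the allocated capacity of the array in active_alloc
Stores the address of each cache entry in the global active_cache

Returns active_nr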

Code

#include "cache.h"

const char *sha1_file_directory = NULL;
struct cache_entry **active_cache = NULL;
unsigned int active_nr = 0, active_alloc = 0;

void usage(const char *err)
{
	fprintf(stderr, "read-tree: %s\n", err);
	exit(1);
}

static unsigned hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return ~0;
}

int get_sha1_hex(char *hex, unsigned char *sha1)
{
	int i;
	for (i = 0; i < 20; i++) {
		unsigned int val = (hexval(hex[0]) << 4) | hexval(hex[1]);
		if (val & ~0xff)
			return -1;
		*sha1++ = val;
		hex += 2;
	}
	return 0;
}

char * sha1_to_hex(unsigned char *sha1)
{
	static char buffer[50];
	static const char hex[] = "0123456789abcdef";
	char *buf = buffer;
	int i;

	for (i = 0; i < 20; i++) {
		unsigned int val = *sha1++;
		*buf++ = hex[val >> 4];
		*buf++ = hex[val & 0xf];
	}
	return buffer;
}

/*
 * NOTE! This returns a statically allocated buffer, so you have to be
 * careful about using it. Do a "strdup()" if you need to save the
 * filename.
 */
char *sha1_file_name(unsigned char *sha1)
{
	int i;
	static char *name, *base;

	if (!base) {
		char *sha1_file_directory = getenv(DB_ENVIRONMENT) ? : DEFAULT_DB_ENVIRONMENT;
		int len = strlen(sha1_file_directory);
		base = malloc(len + 60);
		memcpy(base, sha1_file_directory, len);
		memset(base+len, 0, 60);
		base[len] = '/';
		base[len+3] = '/';
		name = base + len + 1;
	}
	for (i = 0; i < 20; i++) {
		static char hex[] = "0123456789abcdef";
		unsigned int val = sha1[i];
		char *pos = name + i*2 + (i > 0);
		*pos++ = hex[val >> 4];
		*pos = hex[val & 0xf];
	}
	return base;
}

void * read_sha1_file(unsigned char *sha1, char *type, unsigned long *size)
{
	z_stream stream;
	char buffer[8192];
	struct stat st;
	int i, fd, ret, bytes;
	void *map, *buf;
	char *filename = sha1_file_name(sha1);

	fd = open(filename, O_RDONLY);
	if (fd < 0) {
		perror(filename);
		return NULL;
	}
	if (fstat(fd, &st) < 0) {
		close(fd);
		return NULL;
	}
	map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	if (-1 == (int)(long)map)
		return NULL;

	/* Get the data stream */
	memset(&stream, 0, sizeof(stream));
	stream.next_in = map;
	stream.avail_in = st.st_size;
	stream.next_out = buffer;
	stream.avail_out = sizeof(buffer);

	inflateInit(&stream);
	ret = inflate(&stream, 0);
	if (sscanf(buffer, "%10s %lu", type, size) != 2)
		return NULL;
	bytes = strlen(buffer) + 1;
	buf = malloc(*size);
	if (!buf)
		return NULL;

	memcpy(buf, buffer + bytes, stream.total_out - bytes);
	bytes = stream.total_out - bytes;
	if (bytes < *size && ret == Z_OK) {
		stream.next_out = buf + bytes;
		stream.avail_out = *size - bytes;
		while (inflate(&stream, Z_FINISH) == Z_OK)
			/* nothing */;
	}
	inflateEnd(&stream);
	return buf;
}

int write_sha1_file(char *buf, unsigned len)
{
	int size;
	char *compressed;
	z_stream stream;
	unsigned char sha1[20];
	SHA_CTX c;

	/* Set it up */
	memset(&stream, 0, sizeof(stream));
	deflateInit(&stream, Z_BEST_COMPRESSION);
	size = deflateBound(&stream, len);
	compressed = malloc(size);

	/* Compress it */
	stream.next_in = buf;
	stream.avail_in = len;
	stream.next_out = compressed;
	stream.avail_out = size;
	while (deflate(&stream, Z_FINISH) == Z_OK)
		/* nothing */;
	deflateEnd(&stream);
	size = stream.total_out;

	/* Sha1.. */
	SHA1_Init(&c);
	SHA1_Update(&c, compressed, size);
	SHA1_Final(sha1, &c);

	if (write_sha1_buffer(sha1, compressed, size) < 0)
		return -1;
	printf("%s\n", sha1_to_hex(sha1));
	return 0;
}

int write_sha1_buffer(unsigned char *sha1, void *buf, unsigned int size)
{
	char *filename = sha1_file_name(sha1);
	int i, fd;

	fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return (errno == EEXIST) ? 0 : -1;
	write(fd, buf, size);
	close(fd);
	return 0;
}

static int error(const char * string)
{
	fprintf(stderr, "error: %s\n", string);
	return -1;
}

static int verify_hdr(struct cache_header *hdr, unsigned long size)
{
	SHA_CTX c;
	unsigned char sha1[20];

	if (hdr->signature != CACHE_SIGNATURE)
		return error("bad signature");
	if (hdr->version != 1)
		return error("bad version");
	SHA1_Init(&c);
	SHA1_Update(&c, hdr, offsetof(struct cache_header, sha1));
	SHA1_Update(&c, hdr+1, size - sizeof(*hdr));
	SHA1_Final(sha1, &c);
	if (memcmp(sha1, hdr->sha1, 20))
		return error("bad header sha1");
	return 0;
}

int read_cache(void)
{
	int fd, i;
	struct stat st;
	unsigned long size, offset;
	void *map;
	struct cache_header *hdr;

	errno = EBUSY;
	if (active_cache)
		return error("more than one cachefile");
	errno = ENOENT;
	sha1_file_directory = getenv(DB_ENVIRONMENT);
	if (!sha1_file_directory)
		sha1_file_directory = DEFAULT_DB_ENVIRONMENT;
	if (access(sha1_file_directory, X_OK) < 0)
		return error("no access to SHA1 file directory");
	fd = open(".dircache/index", O_RDONLY);
	if (fd < 0)
		return (errno == ENOENT) ? 0 : error("open failed");

	map = (void *)-1;
	if (!fstat(fd, &st)) {
		map = NULL;
		size = st.st_size;
		errno = EINVAL;
		if (size > sizeof(struct cache_header))
			map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	}
	close(fd);
	if (-1 == (int)(long)map)
		return error("mmap failed");

	hdr = map;
	if (verify_hdr(hdr, size) < 0)
		goto unmap;

	active_nr = hdr->entries;
	active_alloc = alloc_nr(active_nr);
	active_cache = calloc(active_alloc, sizeof(struct cache_entry *));

	offset = sizeof(*hdr);
	for (i = 0; i < hdr->entries; i++) {
		struct cache_entry *ce = map + offset;
		offset = offset + ce_size(ce);
		active_cache[i] = ce;
	}
	return active_nr;

unmap:
	munmap(map, size);
	errno = EINVAL;
	return error("verify header failed");
}

Notes

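read-tree

A command that reads a tree object

Spec

Takes the SHA-1 of a tree object as a command-line argument
Reads the tree object and prints the following for each entry to standard output
- mode (file type and permission bits, in octal)
- file name
- SHA-1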
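Judging from the unpack function, each entry in a tree object is laid out as an octal mode, a space, the path, a NUL byte, and then 20 raw SHA-1 bytes. The sketch below parses one such entry with explicit bounds checks. The function parse_tree_entry is hypothetical; the commit does the equivalent work inline.

```c
#include <stdio.h>
#include <string.h>

/* Parses one entry laid out as "<octal mode> <path>\0<20-byte SHA-1>".
 * Returns the number of bytes consumed, or 0 if the entry is malformed. */
static size_t parse_tree_entry(const unsigned char *buf, size_t size,
			       unsigned int *mode, const char **path,
			       const unsigned char **sha1)
{
	const unsigned char *nul = memchr(buf, '\0', size);
	const char *space;
	size_t len;

	if (!nul)
		return 0;
	len = (size_t)(nul - buf) + 1;	/* text part, including the NUL */
	if (size < len + 20)
		return 0;
	space = strchr((const char *)buf, ' ');
	if (!space || sscanf((const char *)buf, "%o", mode) != 1)
		return 0;
	*path = space + 1;
	*sha1 = buf + len;
	return len + 20;
}
```

A caller would advance its buffer pointer by the returned length and repeat until the whole object has been consumed.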


Code

#include "cache.h"

static int unpack(unsigned char *sha1)
{
	void *buffer;
	unsigned long size;
	char type[20];

	buffer = read_sha1_file(sha1, type, &size);
	if (!buffer)
		usage("unable to read sha1 file");
	if (strcmp(type, "tree"))
		usage("expected a 'tree' node");
	while (size) {
		int len = strlen(buffer)+1;
		unsigned char *sha1 = buffer + len;
		char *path = strchr(buffer, ' ')+1;
		unsigned int mode;
		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
			usage("corrupt 'tree' file");
		buffer = sha1 + 20;
		size -= len + 20;
		printf("%o %s (%s)\n", mode, path, sha1_to_hex(sha1));
	}
	return 0;
}

int main(int argc, char **argv)
{
	int fd;
	unsigned char sha1[20];

	if (argc != 2)
		usage("read-tree <key>");
	if (get_sha1_hex(argv[1], sha1) < 0)
		usage("read-tree <key>");
	sha1_file_directory = getenv(DB_ENVIRONMENT);
	if (!sha1_file_directory)
		sha1_file_directory = DEFAULT_DB_ENVIRONMENT;
	if (unpack(sha1) < 0)
		usage("unpack failed");
	return 0;
}

Notes

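write-tree

A command that creates a tree object from the index

Spec

Reads the index and checks that the blob each cache entry points to exists
Reserves room at the start of the buffer for the header
Writes the cache entries from the index into the buffer in tree format
Writes the header ("tree <size>" followed by a NUL byte) into the reserved room and stores the result as an object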
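The "reserve room first, write the header last" trick works because the body size is only known after all entries have been written. Here is a minimal sketch of the idea. It uses snprintf instead of the commit's digit-by-digit prepend_integer, and RESERVE and prepend_header are hypothetical names (the commit calls the reserved size ORIG_OFFSET).

```c
#include <stdio.h>
#include <string.h>

#define RESERVE 40	/* room left at the front of the buffer for the header */

/* The body is assumed to already sit at buf + RESERVE. Writes
 * "<tag> <body_len>\0" so that it ends exactly where the body begins,
 * and returns the offset at which the finished object starts, or -1
 * if the header does not fit in the reserved room. */
static int prepend_header(char *buf, const char *tag, unsigned long body_len)
{
	char hdr[RESERVE];
	int n = snprintf(hdr, sizeof(hdr), "%s %lu", tag, body_len);

	if (n < 0 || n >= RESERVE)
		return -1;
	memcpy(buf + RESERVE - (n + 1), hdr, (size_t)n + 1);
	return RESERVE - (n + 1);
}
```

For a 5-byte body and the tag "tree", the header "tree 5" plus its NUL takes 7 bytes, so the object starts at offset 33 and the body follows at offset 40 without any copying.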

Code

#include "cache.h"

static int check_valid_sha1(unsigned char *sha1)
{
	char *filename = sha1_file_name(sha1);
	int ret;

	/* If we were anal, we'd check that the sha1 of the contents actually matches */
	ret = access(filename, R_OK);
	if (ret)
		perror(filename);
	return ret;
}

static int prepend_integer(char *buffer, unsigned val, int i)
{
	buffer[--i] = '\0';
	do {
		buffer[--i] = '0' + (val % 10);
		val /= 10;
	} while (val);
	return i;
}

#define ORIG_OFFSET (40)	/* Enough space to add the header of "tree <size>\0" */

int main(int argc, char **argv)
{
	unsigned long size, offset, val;
	int i, entries = read_cache();
	char *buffer;

	if (entries <= 0) {
		fprintf(stderr, "No file-cache to create a tree of\n");
		exit(1);
	}

	/* Guess at an initial size */
	size = entries * 40 + 400;
	buffer = malloc(size);
	offset = ORIG_OFFSET;

	for (i = 0; i < entries; i++) {
		struct cache_entry *ce = active_cache[i];
		if (check_valid_sha1(ce->sha1) < 0)
			exit(1);
		if (offset + ce->namelen + 60 > size) {
			size = alloc_nr(offset + ce->namelen + 60);
			buffer = realloc(buffer, size);
		}
		offset += sprintf(buffer + offset, "%o %s", ce->st_mode, ce->name);
		buffer[offset++] = 0;
		memcpy(buffer + offset, ce->sha1, 20);
		offset += 20;
	}

	i = prepend_integer(buffer, offset - ORIG_OFFSET, ORIG_OFFSET);
	i -= 5;
	memcpy(buffer+i, "tree ", 5);

	buffer += i;
	offset -= i;

	write_sha1_file(buffer, offset);
	return 0;
}

Notes

commit-tree

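A command that creates a commit object from a tree object

Spec

Takes the SHA-1 of the target tree object (and, with -p options, the SHA-1 of each parent commit) and creates a commit object
Commit metadata (time, user name, and so on) comes from the system's user information and from environment variables
- the author line uses COMMITTER_NAME, COMMITTER_EMAIL, and COMMITTER_DATE when they are set; the committer line always uses the system values

The commit message has to be supplied on standard input, like this.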

$ echo "test commit message" | ./commit-tree <tree-SHA1> -p <parent-commit-SHA1> ...
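Reading main in commit-tree.c, the text of a commit object is a "tree" line, zero or more "parent" lines, an "author" line, a "committer" line, a blank line, and then the message. The sketch below formats that text for at most one parent. The function format_commit is hypothetical, takes the author and committer strings already assembled, and leaves out the "commit <size>" header that finish_buffer prepends.

```c
#include <stdio.h>

/* Formats the text of a commit object with at most one parent (pass
 * NULL as parent_hex for the initial commit). Returns the length
 * written, or -1 if `out` is too small. */
static int format_commit(char *out, size_t outlen, const char *tree_hex,
			 const char *parent_hex, const char *author,
			 const char *committer, const char *message)
{
	int n;

	if (parent_hex)
		n = snprintf(out, outlen,
			     "tree %s\nparent %s\nauthor %s\ncommitter %s\n\n%s",
			     tree_hex, parent_hex, author, committer, message);
	else
		n = snprintf(out, outlen,
			     "tree %s\nauthor %s\ncommitter %s\n\n%s",
			     tree_hex, author, committer, message);
	return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```

Because the parent lines are part of the hashed text, the same tree committed with different parents yields a different object, as the comment in the original source points out.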

Code

#include "cache.h"

#include <pwd.h>
#include <time.h>

#define BLOCKING (1ul << 14)
#define ORIG_OFFSET (40)

/*
 * Leave space at the beginning to insert the tag
 * once we know how big things are.
 *
 * FIXME! Share the code with "write-tree.c"
 */
static void init_buffer(char **bufp, unsigned int *sizep)
{
	char *buf = malloc(BLOCKING);
	memset(buf, 0, ORIG_OFFSET);
	*sizep = ORIG_OFFSET;
	*bufp = buf;
}

static void add_buffer(char **bufp, unsigned int *sizep, const char *fmt, ...)
{
	char one_line[2048];
	va_list args;
	int len;
	unsigned long alloc, size, newsize;
	char *buf;

	va_start(args, fmt);
	len = vsnprintf(one_line, sizeof(one_line), fmt, args);
	va_end(args);
	size = *sizep;
	newsize = size + len;
	alloc = (size + 32767) & ~32767;
	buf = *bufp;
	if (newsize > alloc) {
		alloc = (newsize + 32767) & ~32767;   
		buf = realloc(buf, alloc);
		*bufp = buf;
	}
	*sizep = newsize;
	memcpy(buf + size, one_line, len);
}

static int prepend_integer(char *buffer, unsigned val, int i)
{
	buffer[--i] = '\0';
	do {
		buffer[--i] = '0' + (val % 10);
		val /= 10;
	} while (val);
	return i;
}

static void finish_buffer(char *tag, char **bufp, unsigned int *sizep)
{
	int taglen;
	int offset;
	char *buf = *bufp;
	unsigned int size = *sizep;

	offset = prepend_integer(buf, size - ORIG_OFFSET, ORIG_OFFSET);
	taglen = strlen(tag);
	offset -= taglen;
	buf += offset;
	size -= offset;
	memcpy(buf, tag, taglen);

	*bufp = buf;
	*sizep = size;
}

static void remove_special(char *p)
{
	char c;
	char *dst = p;

	for (;;) {
		c = *p;
		p++;
		switch(c) {
		case '\n': case '<': case '>':
			continue;
		}
		*dst++ = c;
		if (!c)
			break;
	}
}

/*
 * Having more than two parents may be strange, but hey, there's
 * no conceptual reason why the file format couldn't accept multi-way
 * merges. It might be the "union" of several packages, for example.
 *
 * I don't really expect that to happen, but this is here to make
 * it clear that _conceptually_ it's ok..
 */
#define MAXPARENT (16)

int main(int argc, char **argv)
{
	int i, len;
	int parents = 0;
	unsigned char tree_sha1[20];
	unsigned char parent_sha1[MAXPARENT][20];
	char *gecos, *realgecos;
	char *email, realemail[1000];
	char *date, *realdate;
	char comment[1000];
	struct passwd *pw;
	time_t now;
	char *buffer;
	unsigned int size;

	if (argc < 2 || get_sha1_hex(argv[1], tree_sha1) < 0)
		usage("commit-tree <sha1> [-p <sha1>]* < changelog");

	for (i = 2; i < argc; i += 2) {
		char *a, *b;
		a = argv[i]; b = argv[i+1];
		if (!b || strcmp(a, "-p") || get_sha1_hex(b, parent_sha1[parents]))
			usage("commit-tree <sha1> [-p <sha1>]* < changelog");
		parents++;
	}
	if (!parents)
		fprintf(stderr, "Committing initial tree %s\n", argv[1]);
	pw = getpwuid(getuid());
	if (!pw)
		usage("You don't exist. Go away!");
	realgecos = pw->pw_gecos;
	len = strlen(pw->pw_name);
	memcpy(realemail, pw->pw_name, len);
	realemail[len] = '@';
	gethostname(realemail+len+1, sizeof(realemail)-len-1);
	time(&now);
	realdate = ctime(&now);

	gecos = getenv("COMMITTER_NAME") ? : realgecos;
	email = getenv("COMMITTER_EMAIL") ? : realemail;
	date = getenv("COMMITTER_DATE") ? : realdate;

	remove_special(gecos); remove_special(realgecos);
	remove_special(email); remove_special(realemail);
	remove_special(date); remove_special(realdate);

	init_buffer(&buffer, &size);
	add_buffer(&buffer, &size, "tree %s\n", sha1_to_hex(tree_sha1));

	/*
	 * NOTE! This ordering means that the same exact tree merged with a
	 * different order of parents will be a _different_ changeset even
	 * if everything else stays the same.
	 */
	for (i = 0; i < parents; i++)
		add_buffer(&buffer, &size, "parent %s\n", sha1_to_hex(parent_sha1[i]));

	/* Person/date information */
	add_buffer(&buffer, &size, "author %s <%s> %s\n", gecos, email, date);
	add_buffer(&buffer, &size, "committer %s <%s> %s\n\n", realgecos, realemail, realdate);

	/* And add the comment */
	while (fgets(comment, sizeof(comment), stdin) != NULL)
		add_buffer(&buffer, &size, "%s", comment);

	finish_buffer("commit ", &buffer, &size);

	write_sha1_file(buffer, size);
	return 0;
}

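Notes

In the first version of git, commit-tree is the only way to create a commit object
write-tree creates a tree object from the index, and commit-tree commits it
The index is updated by running update-cache after adding or changing files
Before update-cache can add anything to the index, init-db has to be run once in the target directory to initialize the object database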

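show-diff

A command that shows the differences between the files registered in the index and the files in the current directory

Spec

Compares the index with the current directory and detects changes through stat metadata (mtime, ctime, owner, mode, inode, size)
For each entry whose metadata changed, shows the differences with diff -u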

Code

#include "cache.h"

#define MTIME_CHANGED	0x0001
#define CTIME_CHANGED	0x0002
#define OWNER_CHANGED	0x0004
#define MODE_CHANGED    0x0008
#define INODE_CHANGED   0x0010
#define DATA_CHANGED    0x0020

static int match_stat(struct cache_entry *ce, struct stat *st)
{
	unsigned int changed = 0;

	if (ce->mtime.sec  != (unsigned int)st->st_mtim.tv_sec ||
	    ce->mtime.nsec != (unsigned int)st->st_mtim.tv_nsec)
		changed |= MTIME_CHANGED;
	if (ce->ctime.sec  != (unsigned int)st->st_ctim.tv_sec ||
	    ce->ctime.nsec != (unsigned int)st->st_ctim.tv_nsec)
		changed |= CTIME_CHANGED;
	if (ce->st_uid != (unsigned int)st->st_uid ||
	    ce->st_gid != (unsigned int)st->st_gid)
		changed |= OWNER_CHANGED;
	if (ce->st_mode != (unsigned int)st->st_mode)
		changed |= MODE_CHANGED;
	if (ce->st_dev != (unsigned int)st->st_dev ||
	    ce->st_ino != (unsigned int)st->st_ino)
		changed |= INODE_CHANGED;
	if (ce->st_size != (unsigned int)st->st_size)
		changed |= DATA_CHANGED;
	return changed;
}

static void show_differences(struct cache_entry *ce, struct stat *cur,
	void *old_contents, unsigned long long old_size)
{
	static char cmd[1000];
	FILE *f;

	snprintf(cmd, sizeof(cmd), "diff -u - %s", ce->name);
	f = popen(cmd, "w");
	fwrite(old_contents, old_size, 1, f);
	pclose(f);
}

int main(int argc, char **argv)
{
	int entries = read_cache();
	int i;

	if (entries < 0) {
		perror("read_cache");
		exit(1);
	}
	for (i = 0; i < entries; i++) {
		struct stat st;
		struct cache_entry *ce = active_cache[i];
		int n, changed;
		unsigned int mode;
		unsigned long size;
		char type[20];
		void *new;

		if (stat(ce->name, &st) < 0) {
			printf("%s: %s\n", ce->name, strerror(errno));
			continue;
		}
		changed = match_stat(ce, &st);
		if (!changed) {
			printf("%s: ok\n", ce->name);
			continue;
		}
		printf("%.*s:  ", ce->namelen, ce->name);
		for (n = 0; n < 20; n++)
			printf("%02x", ce->sha1[n]);
		printf("\n");
		new = read_sha1_file(ce->sha1, type, &size);
		show_differences(ce, &st, new, size);
		free(new);
	}
	return 0;
}

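Notes

Does not work in an environment without a diff command
Looks like it would fail when a file name contains special characters, because the name is pasted unquoted into the command line passed to popen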
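One common way to make a file name safe to paste into a /bin/sh command line is to wrap it in single quotes and rewrite each embedded single quote. The following is a sketch of that technique, not something the commit does; shell_quote is a hypothetical helper.

```c
#include <stddef.h>

/* Wraps `in` in single quotes for /bin/sh, rewriting each embedded
 * single quote as '\'' (close quote, escaped quote, reopen quote).
 * Returns 0 on success, -1 if `out` is too small. */
static int shell_quote(char *out, size_t outlen, const char *in)
{
	size_t n = 0;

	if (outlen < 3)
		return -1;
	out[n++] = '\'';
	for (; *in; in++) {
		size_t need = (*in == '\'') ? 4 : 1;

		if (n + need + 2 > outlen)	/* +2: closing quote and NUL */
			return -1;
		if (*in == '\'') {
			out[n++] = '\'';
			out[n++] = '\\';
			out[n++] = '\'';
			out[n++] = '\'';
		} else {
			out[n++] = *in;
		}
	}
	out[n++] = '\'';
	out[n] = '\0';
	return 0;
}
```

The quoted name could then be formatted into the "diff -u - %s" command in place of the raw ce->name. Avoiding the shell altogether (for example with fork and exec) would be another option.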

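cat-file

A command that takes the SHA-1 of a git object as its argument and writes the object's contents to a temporary file

Spec

Takes the SHA-1 of a git object and writes its contents to a file named temp_git_file_XXXXXX (mkstemp replaces the X's)
Prints the file name and the object type (blob, tree, commit) to standard output; if the write fails, the type is printed as "bad"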

Code

#include "cache.h"

int main(int argc, char **argv)
{
	unsigned char sha1[20];
	char type[20];
	void *buf;
	unsigned long size;
	char template[] = "temp_git_file_XXXXXX";
	int fd;

	if (argc != 2 || get_sha1_hex(argv[1], sha1))
		usage("cat-file: cat-file <sha1>");
	buf = read_sha1_file(sha1, type, &size);
	if (!buf)
		exit(1);
	fd = mkstemp(template);
	if (fd < 0)
		usage("unable to create tempfile");
	if (write(fd, buf, size) != size)
		strcpy(type, "bad");
	printf("%s: %s\n", template, type);
}

Notes

Today's git cat-file writes to standard output instead of a file

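The basic workflow and terminology of git at the time of its release are summarized below.

Basic workflow

Run init-db in the working directory to initialize the object database
Add or change files under the directory

Use update-cache to create blob objects from the working directory and register the changes in the index, .dircache/index

Use write-tree to create a tree object from the index

Use commit-tree to create a commit object from the tree object

Inspecting differences and git objects

cat-file writes a git object to a temporary file and prints the file name and object type to standard output

show-diff compares the index with the current working directory and prints the differences to standard output using the diff command

read-tree prints the contents of a tree object (mode, file name, and SHA-1 of each blob object) to standard output

Utility functions used by the commands above

read_cache() reads the index file into global variables; commands call it when they need to work with index data

write_sha1_file() compresses data, computes the SHA-1, and stores the result as an object

write_sha1_buffer() stores already-compressed data as an object

read_sha1_file() takes a SHA-1, decompresses the object, and returns its contents in a newly allocated buffer, writing the object type and size to the caller's variables

Terms

git object: the data git handles; there are three kinds: blob, tree, and commit

blob: the contents of a file managed by git (no file name or permissions)

tree: directory structure, file names, modes, and so on; contains references to blobs/trees

commit: history data that adds author, time, and message to a reference to a tree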
