Revisiting tar

tar(1) and tarball

In the Unix world, you often see files with extensions such as .tar or .tar.gz. These are file archives, which bundle multiple files into one, and they are often called tarballs[1]. They play a role similar to ZIP files on MS-Windows.

Tarballs are manipulated with tar(1), which first appeared in Version 7 UNIX in 1979. The name comes from "tape archive": as it suggests, the command appears to have been used at the time to archive files to tape drives.

Although tar(1) feels ubiquitous, it is not a POSIX command[2]. Implementations vary, and the archive format they write may differ from system to system. If you need portability between systems, it might be better to use pax(1), a POSIX command that can also handle tarballs[3].
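
If tar(1) itself is used, the archive format can be pinned explicitly instead of relying on the built-in default. The following is a minimal sketch, assuming GNU tar; the scratch directory and file names are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: write tarballs in the two POSIX-standardized formats with GNU tar.
set -eu
work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"
mkdir demo
echo 'hello' > demo/a.txt

# --format=ustar selects the ustar format,
# --format=pax selects the pax interchange format.
tar --format=ustar -cf demo-ustar.tar demo
tar --format=pax   -cf demo-pax.tar   demo

# Both archives list the same members.
ustar_list=$(tar -tf demo-ustar.tar)
pax_list=$(tar -tf demo-pax.tar)
printf '%s\n' "$ustar_list"
```

Without --format, GNU tar writes whatever format tar --show-defaults reports.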

tar(1) Versions

The versions and default values of tar(1) that I confirmed are as follows:

$ uname -srm
Linux 5.11.0-37-generic x86_64
$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
$ tar --version
tar (GNU tar) 1.30
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
$ tar --show-defaults 
--format=gnu -f- -b20 --quoting-style=escape --rmt-command=/usr/sbin/rmt --rsh-command=/usr/bin/rsh

Basic Usage

tar [ key ] [ name ... ]

The command format for tar(1) is as shown above. The behavior of tar(1) is determined by the key argument. The key is a string made up of exactly one of the letters c, x, t, r, or u, which selects the basic operation, together with letters that specify options. Each operation is introduced below.

Creating an Archive (c)

tar cf archive file...

To create an archive, use the tar c command. Specify one or more files or directories to archive in file.... If a directory is specified in file..., the contents of that directory are archived recursively. The f option specifies that the archive should be output to the file archive[4].

Example of creating an archive
$ tar cf hoge.tar hoge/  # Archive all files under directory hoge into hoge.tar
$ tar cf archive.tar file1.txt file2.txt  # Put file1.txt and file2.txt into archive.tar
$ tar cf - foo bar >foobar.tar  # Specifying - for output writes to stdout

Extracting an Archive (x)

tar xf archive [member...]

To extract an archive, use the tar x command. Specify zero or more files or directories to extract in member.... If a directory is specified in member..., the contents of that directory are extracted recursively. If member... is not specified, all files contained in the archive are extracted. The f option specifies that the archive should be read from the file archive.

Example of extracting an archive
$ tar xf hoge.tar  # Extract all contents of hoge.tar
$ tar xf hoge.tar hoge/fuga/  # Extract everything under directory hoge/fuga/ from hoge.tar
$ tar xf archive.tar file1.txt  # Extract only file1.txt from archive.tar
$ cat foobar.tar | tar xf - foo  # Specifying - for input reads from stdin

Listing Archive Contents (t)

tar tf archive [member...]

To check the contents of an archive (display the filenames contained in the archive), use the tar t command. Specify zero or more files or directories to check in member.... If a directory is specified in member..., the contents of that directory are displayed recursively. If member... is not specified, all files contained in the archive are displayed. An archive may contain multiple versions of the same file. In that case, the filename may be displayed multiple times.

Example of listing archive contents
$ tar tf hoge.tar  # Display all contents of hoge.tar
hoge/
hoge/foo.txt
hoge/bar.dat
hoge/fuga/
hoge/fuga/baz.txt
piyo/
piyo/qux.dat
piyo/quux.txt
hoge/fuga/baz.txt
$ tar tf hoge.tar piyo/  # Display contents under directory piyo/ from hoge.tar
piyo/
piyo/qux.dat
piyo/quux.txt
$ tar tf hoge.tar hoge/fuga/baz.txt  # Display only hoge/fuga/baz.txt from hoge.tar
hoge/fuga/baz.txt
hoge/fuga/baz.txt

Adding to an Archive (r)

tar rf archive file...

To add files to an archive, use the tar r command. It is similar to tar c, but it appends the files to the end of an existing archive instead of creating a new one. This operation cannot be performed on a compressed archive (such as .tar.gz).

Example of adding files to an archive
$ tar tf hoge.tar  # Contents of hoge.tar before addition
hoge/
hoge/foo.txt
hoge/bar.dat
hoge/fuga/
hoge/fuga/baz.txt
piyo/
piyo/qux.dat
piyo/quux.txt
hoge/fuga/baz.txt
$ tar rf hoge.tar piyo  # Add directory piyo to hoge.tar
$ tar tf hoge.tar  # Contents of hoge.tar after addition
hoge/
hoge/foo.txt
hoge/bar.dat
hoge/fuga/
hoge/fuga/baz.txt
piyo/
piyo/qux.dat
piyo/quux.txt
hoge/fuga/baz.txt
piyo/
piyo/qux.dat
piyo/quux.txt
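
Appending is also how an archive ends up holding several versions of the same file. The sketch below (hypothetical file names, run in a scratch directory) appends a modified file and then checks which version extraction leaves on disk.

```shell
#!/bin/sh
# Sketch: an appended member does not replace the earlier one in the archive.
set -eu
work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"

echo 'version 1' > note.txt
tar cf demo.tar note.txt     # archive the first version
echo 'version 2' > note.txt
tar rf demo.tar note.txt     # append the second version; the first stays in place

listing=$(tar tf demo.tar)   # note.txt is listed once per stored copy
printf '%s\n' "$listing"

mkdir out
tar xfC demo.tar out         # members are extracted in order, so the last copy wins
extracted=$(cat out/note.txt)
printf '%s\n' "$extracted"
```

Because members are extracted in archive order, the copy appended last is the one that remains after extraction.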

Updating an Archive (u)

tar uf archive file...

To update an archive, use the tar u command. It is similar to tar r, but it appends only files that are not yet in the archive or that are newer than the copy already stored there. The older copy is not removed; the newer one is appended after it. This operation cannot be performed on a compressed archive (such as .tar.gz).

Example of updating an archive
$ tar tf hoge.tar  # Contents of hoge.tar before update
hoge/
hoge/foo.txt
hoge/bar.dat
hoge/fuga/
hoge/fuga/baz.txt
piyo/
piyo/qux.dat
piyo/quux.txt
hoge/fuga/baz.txt
$ touch hoge/foo.txt  # Update a file
$ touch hoge/fuga/ababa.txt  # Add a file
$ tar uf hoge.tar hoge  # Update the archive
$ tar tf hoge.tar  # Contents of hoge.tar after update
hoge/
hoge/foo.txt
hoge/bar.dat
hoge/fuga/
hoge/fuga/baz.txt
piyo/
piyo/qux.dat
piyo/quux.txt
hoge/fuga/baz.txt
hoge/foo.txt
hoge/fuga/ababa.txt

Useful Options

Confirming Execution (w)

For operations other than tar t, this option prompts for confirmation before performing the action. Entering a word starting with the letter y will execute the operation.

Example of option w
$ tar cfw hoge.tar hoge piyo
add `hoge'?y
add `hoge/foo.txt'?y
add `hoge/bar.dat'?y
add `hoge/fuga'?n
add `piyo'?y
add `piyo/qux.dat'?n
add `piyo/quux.txt'?y

Verbose Output (v)

Normally, tar(1) operates "quietly." Specifying the v option causes it to display the names of files being processed.

Example of option v
$ tar xvf hoge.tar
hoge/
hoge/bar.dat
hoge/foo.txt
hoge/fuga/
hoge/fuga/baz.txt
piyo/
piyo/quux.txt
piyo/qux.dat
hoge/fuga/baz.txt

When specified with the tar t command, the output looks similar to ls -l.

Example of option v with listing
$ tar tvf hoge.tar 
drwxrwxr-x mkn/mkn           0 2021-10-07 20:56 hoge/
-rw-rw-r-- mkn/mkn        2796 2021-10-07 20:57 hoge/bar.dat
-rw-rw-r-- mkn/mkn         446 2021-10-07 20:56 hoge/foo.txt
drwxrwxr-x mkn/mkn           0 2021-10-07 20:57 hoge/fuga/
-rw-rw-r-- mkn/mkn          16 2021-10-07 20:57 hoge/fuga/baz.txt
drwxrwxr-x mkn/mkn           0 2021-10-07 20:55 piyo/
-rw-rw-r-- mkn/mkn          56 2021-10-07 20:55 piyo/quux.txt
-rw-rw-r-- mkn/mkn        1234 2021-10-07 20:55 piyo/qux.dat
-rw-rw-r-- mkn/mkn          24 2021-10-07 20:58 hoge/fuga/baz.txt

Changing the Working Directory (C)

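Using the C option, you can change the working directory during processing. The directory is passed as an argument; when several option letters in the key take arguments, the arguments are given in the same order as the letters.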

Example of option C
$ mkdir dest  # Create a directory
$ tar xfC hoge.tar dest/  # Move to directory dest and then extract hoge.tar
$ ls -l dest
total 8
drwxrwxr-x 3 mkn mkn 4096 10月  7 16:24 hoge
drwxrwxr-x 2 mkn mkn 4096 10月  7 11:44 piyo
$ tar cfC fuga.tar dest/ hoge/fuga/  # Move to directory dest and then archive hoge/fuga/ into fuga.tar
$ tar tf fuga.tar 
hoge/fuga/
hoge/fuga/baz.txt
hoge/fuga/ababa.txt
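
The key form can be hard to read once more than one letter takes an argument. As a minimal sketch (assuming GNU tar; the directory names are hypothetical), the same operations can be spelled with dash-style options, where each argument sits next to its option:

```shell
#!/bin/sh
# Sketch: old-style key versus dash-style options for f and C.
set -eu
work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"
mkdir -p src/hoge dest1 dest2
echo 'data' > src/hoge/foo.txt

# Old-style key: f takes hoge.tar and C takes src, in letter order.
tar cfC hoge.tar src hoge
# Dash-style equivalent.
tar -c -f hoge2.tar -C src hoge

# Extraction, written both ways.
tar xfC hoge.tar dest1
tar -x -f hoge2.tar -C dest2

ls dest1/hoge dest2/hoge
```

In both spellings the archive name is resolved relative to the directory where tar was started, while the member names are resolved relative to the directory given to C.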

Compressing Archives (z, Z, j, J, a)

One way to create a .tar.gz file is to pipe the archive through gzip, as in tar cf - file | gzip, but tar(1) also has built-in compression options. The z option compresses the archive with gzip; similarly, Z uses compress, j uses bzip2, and J uses xz. The a option infers the compression format from the archive's extension. The tar r and tar u operations cannot be used on compressed archives.

Creating compressed archives
$ tar cfz hoge.tar.gz hoge/ piyo/  # gzip
$ tar cfj hoge.tar.bz2 hoge/ piyo/  # bzip2
$ tar cfJ hoge.tar.xz hoge/ piyo/  # xz
$ tar cfZ hoge.tar.Z hoge/ piyo/  # compress (probably rarely used today)
$ tar cfa hoge.tar.gz hoge/ piyo/  # Infer gzip from extension
$ tar cfa hoge.tar.xz hoge/ piyo/  # Infer xz from extension

When extracting or listing the contents of an archive, you do not need to specify these options, because the compression format is detected automatically (though some versions of tar(1) might require them).

Reading compressed archives
$ tar tf hoge.tar.gz  # Works even without the z option
hoge/
hoge/foo.txt
hoge/bar.dat
hoge/fuga/
hoge/fuga/baz.txt
hoge/fuga/ababa.txt
piyo/
piyo/qux.dat
piyo/quux.txt
$ tar xf hoge.tar.gz  # Extraction also works
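
Since tar r and tar u refuse compressed archives, one workaround is to decompress, append, and compress again. The following is a minimal sketch, assuming GNU tar and the gzip/gunzip commands; the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: appending to a gzip-compressed tarball by way of the plain tarball.
set -eu
work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"
echo 'one' > one.txt
echo 'two' > two.txt
tar czf demo.tar.gz one.txt

# Appending directly to the compressed archive is rejected.
if tar rzf demo.tar.gz two.txt 2>/dev/null; then
    direct_append=ok
else
    direct_append=failed
fi

# Workaround: decompress, append to the plain tarball, compress again.
gunzip demo.tar.gz           # leaves demo.tar
tar rf demo.tar two.txt
gzip demo.tar                # back to demo.tar.gz

listing=$(tar tzf demo.tar.gz)
printf '%s\n' "$direct_append" "$listing"
```

For large archives this rewrites the whole file, so it may be simpler to keep the tarball uncompressed until no more changes are expected.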

Excluding Files (--exclude=)

Using the --exclude= option, you can exclude files matching a specific pattern from being archived. This is useful, for example, when turning a git repository into a tarball.

Example of option --exclude=
$ ls -la repo/
total 32
drwxrwxr-x  4 mkn mkn 4096 Sep 21 21:07 .
drwxrwxr-x 14 mkn mkn 4096 Oct  7 21:06 ..
drwxrwxr-x  8 mkn mkn 4096 Sep 21 21:07 .git
-rw-rw-r--  1 mkn mkn  430 Sep 21 20:59 .gitignore
-rw-rw-r--  1 mkn mkn 1066 Sep 21 20:59 LICENSE
-rw-rw-r--  1 mkn mkn  197 Sep 21 21:07 Makefile
-rw-rw-r--  1 mkn mkn  191 Sep 21 20:59 README.md
drwxrwxr-x  2 mkn mkn 4096 Sep 21 20:59 src
$ tar cvf repo.tar --exclude='.git*' repo
repo/
repo/Makefile
repo/README.md
repo/LICENSE
repo/src/
repo/src/main.c
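
Note that the pattern '.git*' also drops .gitignore, as the listing above shows. If only the .git directory should be left out, the plain name can be used as the pattern instead. This is a minimal sketch, assuming GNU tar's default exclude matching; the repository layout is a hypothetical stand-in.

```shell
#!/bin/sh
# Sketch: exclude the .git directory but keep .gitignore.
set -eu
work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"
mkdir -p repo/.git repo/src
echo 'ref: refs/heads/main' > repo/.git/HEAD
echo '*.o' > repo/.gitignore
echo 'int main(void) { return 0; }' > repo/src/main.c

# The pattern .git matches the directory of that name (and everything below it),
# but not .gitignore, because no wildcard is involved.
tar cf repo.tar --exclude='.git' repo

listing=$(tar tf repo.tar)
printf '%s\n' "$listing"
```

The --exclude option is placed before the file names here, as in the original example.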

Conclusion

That's all.

As a side note, the Japanese version of the tar(1) man page on Ubuntu seems to be quite old (22 September 1993!!). It might be better to run LC_ALL=C man tar to read the English version, or to search for "man tar" on Google and read the JM Project page. That said, installing the tar-doc package and reading the Texinfo manual is the best approach.

Bonus: Using tar to copy directory structures

The V7 UNIX man page introduces the following command:

Tar can also be used to move hierarchies with the command

cd fromdir; tar cf - . | (cd todir; tar xf -)

It looks quite tricky. Since cp(1) at that time apparently could not copy directories, this might have been useful as a method for copying directory structures. In modern Unix, recursive directory copying is possible using cp -r.
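
The pipeline can be tried out in a scratch directory. The sketch below is a slight variation on the man page command, not the original: it uses && instead of ; so that tar does not run in the wrong directory if cd fails. The directory and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: copy a directory hierarchy through a tar pipe.
set -eu
work=$(mktemp -d)            # hypothetical scratch directory
cd "$work"
mkdir -p fromdir/sub todir
echo 'payload' > fromdir/sub/file.txt

# The left side writes the archive to stdout; the right side reads it from stdin.
(cd fromdir && tar cf - .) | (cd todir && tar xf -)

copied=$(cat todir/sub/file.txt)
printf '%s\n' "$copied"
```

Each side runs in its own subshell, so the cd commands do not affect the calling shell.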


References

  • TAR(1), UNIX PROGRAMMER'S MANUAL, Seventh Edition, Volume 1, January 1979.
  • Man page of TAR, JM Project, updated 2019-02-04, accessed 2021-10-06.
  • tar, Wikipedia, accessed 2021-10-06.
  • Joel Chandler Harris, Uncle Remus: His Songs and His Sayings, D. Appleton and Company, 1886. (Accessed on Wikisource, 2021-10-06.)
  • Tar-baby, Fukumusume Fairy Tale Collection, accessed 2021-10-06.
  • tar, FreeBSD Manual Pages, The FreeBSD Project, 2020-01-31, accessed 2021-10-17.
Footnotes
  1. The name "tarball" is a joke referencing "The Wonderful Tar-Baby," the second chapter of "Uncle Remus," Joel Chandler Harris's collection of African-American folklore. In this story, a rabbit gets stuck to a tar-baby trap made by a fox, ends up covered in tar, and cannot move. ↩︎

  2. According to the BSDTAR(1) man page, a POSIX standard for tar(1) existed in ISO/IEC 9945-1:1996 (“POSIX.1”) but no longer exists in IEEE Std 1003.1-2001 (“POSIX.1”). However, tar(1) does not appear to have been part of ISO/IEC 9945-1:1996 (“POSIX.1”) either. Please refer to ko1nksm's comment in the Discussion. ↩︎

  3. See also ko1nksm's comment. ↩︎

  4. If the f option is not specified, the TAPE environment variable is checked first. If this environment variable is set, its value is used as the archive filename. Otherwise, a default value built into the tar(1) executable is used. You can check the default value with tar --show-defaults. ↩︎

Discussion

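ko1nksm

The POSIX standard for tar(1) existed in ISO/IEC 9945-1:1996 (“POSIX.1”), but

I happened to be looking into exactly this topic, and I believe tar(1) was probably not in ISO/IEC 9945-1:1996 either, because that document is a standard for the C language API.

ISO/IEC 9945-1:1996
Information technology — Portable Operating System Interface (POSIX) — Part 1: System Application Program Interface (API) [C Language]

According to this page, tar is also not included in POSIX.2 (1992), in which the shell and utilities were first standardized. However, this page says the following, so it is quite likely that the tar format was specified there.

It also defines a format for data interchange.

The SUS (Single UNIX Specification), before it was merged with POSIX (POSIX.1-2001), included tar, but tar was removed before the merge into POSIX.


If you need portability between systems, it might be better to use pax(1), a POSIX command that can also handle tarballs.

What actually matters is which of the several tar formats is used. The ustar format and the pax format used by pax(1) (both are extensions of the historical tar format standardized by POSIX) should also be supported by GNU tar and BSD tar, so I think using tar(1) is fine as long as the format is specified explicitly.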

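KusaReMKN

The POSIX standard for tar(1) existed in ISO/IEC 9945-1:1996 (“POSIX.1”), but

This part was quoted from the BSDTAR(1) man page (I have just realized that I forgot to state that I referred to the BSDTAR(1) man page). Indeed, ISO/IEC 9945-1:1996 does not seem to say anything about tar(1).

As for the tar(5) file format, I looked into it again and found the following wording. Perhaps this is what it refers to.

An early draft of IEEE Std 1003.1-1988 (“POSIX.1”) served as the basis for John Gilmore's pdtar program and for system implementations from the late 1980s through the early 1990s.
(omitted)
IEEE Std 1003.1-1988 (“POSIX.1”) defines a standard tar file format that can be read and written by compliant tar(1) implementations.
TAR(5)


If you need portability between systems,

That is exactly right.

(Not that it matters much any more, but) some tar(1) implementations cannot specify the output format (the SUS tar(1) is one example). My feeling was roughly, “If you want to manipulate archives without thinking hard about the tarball format, pax(1) might be the best choice.” That said, the most widely used tar(1) is surely GNU tar, so problems should not come up very often.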

ko1nksm

I see, it was a quotation from BSDTAR. I had not noticed.

I have one correction on my side.

tar is also not included in POSIX.2 (1992), in which the shell and utilities were first standardized.

It seems that the tar entry itself had been removed from the list I referred to. I found an older list (the Version 2 Interface Tables here), and on checking it I confirmed that tar was included in POSIX.2.

So, regarding the BSDTAR(1) man page …

There is no current POSIX standard for the tar command; it appeared in
ISO/IEC 9945-1:1996 ("POSIX.1") but was dropped from IEEE Std 1003.1-2001
("POSIX.1").

ISO/IEC 9945-1:1996 ("POSIX.1") may be a mistake here, and ISO/IEC 9945-2:1993 ("POSIX.2") may be the correct reference. That aside, it is odd to say that the command "appeared in" an "ISO/IEC" document and was "dropped from" an "IEEE" one.