Chapter 10. Data management

	Warning
	Uncoordinated write access from multiple processes to devices and files that are in active use must be avoided, since it causes race conditions. A file locking mechanism such as flock(1) may be used to prevent this.
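
Here is a minimal sketch of such locking, assuming flock(1) from the util-linux package is installed. The file names "shared.log" and "shared.lock" and the helper function "append_line" are hypothetical and exist only for this illustration.

```shell
# Scratch directory so the demo does not touch real data.
work=$(mktemp -d)
log="$work/shared.log"
lock="$work/shared.lock"
: > "$lock"    # create the (empty) lock file

append_line() {
    # Hold an exclusive lock on the lock file while the append runs.
    flock "$lock" sh -c 'echo "$1" >> "$2"' sh "$1" "$log"
}

# Start several writers in parallel, then wait for all of them.
for i in 1 2 3 4 5; do
    append_line "writer $i" &
done
wait

lines=$(wc -l < "$log")
echo "lines written: $lines"
```

Each writer blocks until it holds the exclusive lock, so only one process writes to the shared file at a time.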

10.1. Sharing, copying, and archiving

The security of the data and its controlled sharing have several aspects.

The creation of data archive
The remote storage access
The duplication
The tracking of the modification history
The facilitation of data sharing
The prevention of unauthorized file access
The detection of unauthorized file modification

These can be realized by using some combination of tools.

Archive and compression tools
Copy and synchronization tools
Network filesystems
Removable storage media
The secure shell
The authentication system
Version control system tools
Hash and cryptographic encryption tools

10.1.1. Archive and compression tools

Here is a summary of archive and compression tools available on the Debian system.

Table 10.1. List of archive and compression tools

package	popcon	size	command	extension	comment
`tar`	http://qa.debian.org/popcon.php?package=tar	2464	tar(1)	`.tar`	the standard archiver (de facto standard)
`cpio`	http://qa.debian.org/popcon.php?package=cpio	920	cpio(1)	`.cpio`	Unix System V style archiver, use with find(1)
`binutils`	http://qa.debian.org/popcon.php?package=binutils	14823	ar(1)	`.ar`	archiver for the creation of static libraries
`fastjar`	http://qa.debian.org/popcon.php?package=fastjar	216	fastjar(1)	`.jar`	archiver for Java (zip like)
`pax`	http://qa.debian.org/popcon.php?package=pax	178	pax(1)	`.pax`	new POSIX standard archiver, compromise between `tar` and `cpio`
`gzip`	http://qa.debian.org/popcon.php?package=gzip	179	gzip(1), zcat(1), …	`.gz`	GNU LZ77 compression utility (de facto standard)
`bzip2`	http://qa.debian.org/popcon.php?package=bzip2	86	bzip2(1), bzcat(1), …	`.bz2`	Burrows-Wheeler block-sorting compression utility with higher compression ratio than gzip(1) (slower than `gzip` with similar syntax)
`lzma`	http://qa.debian.org/popcon.php?package=lzma	144	lzma(1)	`.lzma`	LZMA compression utility with higher compression ratio than gzip(1) (deprecated)
`xz-utils`	http://qa.debian.org/popcon.php?package=xz-utils	383	xz(1), xzdec(1), …	`.xz`	XZ compression utility with higher compression ratio than bzip2(1) (slower than `gzip` but faster than `bzip2`; replacement for LZMA compression utility)
`p7zip`	http://qa.debian.org/popcon.php?package=p7zip	986	7zr(1), p7zip(1)	`.7z`	7-Zip file archiver with high compression ratio (LZMA compression)
`p7zip-full`	http://qa.debian.org/popcon.php?package=p7zip-full	3895	7z(1), 7za(1)	`.7z`	7-Zip file archiver with high compression ratio (LZMA compression and others)
`lzop`	http://qa.debian.org/popcon.php?package=lzop	112	lzop(1)	`.lzo`	LZO compression utility with higher compression and decompression speed than gzip(1) (lower compression ratio than `gzip` with similar syntax)
`zip`	http://qa.debian.org/popcon.php?package=zip	636	zip(1)	`.zip`	InfoZIP: DOS archive and compression tool
`unzip`	http://qa.debian.org/popcon.php?package=unzip	377	unzip(1)	`.zip`	InfoZIP: DOS unarchive and decompression tool

	Warning
	Do not set the "`$TAPE`" variable unless you know what to expect. It changes tar(1) behavior.

	Note
	The gzipped tar(1) archive uses the file extension "`.tgz`" or "`.tar.gz`".

	Note
	The xz-compressed tar(1) archive uses the file extension "`.txz`" or "`.tar.xz`".

	Note
	The popular compression method in FOSS tools such as tar(1) has been moving as follows: `gzip` → `bzip2` → `xz`

	Note
	cp(1), scp(1) and tar(1) may have some limitations for special files. cpio(1) is the most versatile.

	Note
	cpio(1) is designed to be used with find(1) and other commands, and is suitable for creating backup scripts since the file selection part of the script can be tested independently.

	Note
	Internally, OpenOffice data files are "`.jar`" files.

10.1.2. Copy and synchronization tools

Here is a summary of simple copy and backup tools available on the Debian system.

Table 10.2. List of copy and synchronization tools

package	popcon	size	tool	function
`coreutils`	http://qa.debian.org/popcon.php?package=coreutils	14088	GNU cp	locally copy files and directories ("`-a`" for recursive)
`openssh-client`	http://qa.debian.org/popcon.php?package=openssh-client	2246	scp	remotely copy files and directories (client, "`-r`" for recursive)
`openssh-server`	http://qa.debian.org/popcon.php?package=openssh-server	701	sshd	remotely copy files and directories (remote server)
`rsync`	http://qa.debian.org/popcon.php?package=rsync	724	-	1-way remote synchronization and backup
`unison`	http://qa.debian.org/popcon.php?package=unison	1987	-	2-way remote synchronization and backup

Copying files with rsync(1) offers richer features than the other tools.

delta-transfer algorithm that sends only the differences between the source files and the existing files in the destination
quick check algorithm (by default) that looks for files that have changed in size or in last-modified time
"--exclude" and "--exclude-from" options similar to tar(1)
"a trailing slash on the source directory" syntax that avoids creating an additional directory level at the destination.

	Tip
	Execution of the `bkup` script mentioned in Section 10.1.9, “A copy script for the data backup” with the "`-gl`" option under cron(8) should provide functionality very similar to Plan9's `dumpfs` for the static data archive.

	Tip
	Version control system (VCS) tools in Table 10.15, “List of version control system tools” can function as the multi-way copy and synchronization tools.

10.1.3. Idioms for the archive

Here are several ways to archive and unarchive the entire content of the directory "./source" using different tools.

GNU tar(1):

$ tar cvzf archive.tar.gz ./source
$ tar xvzf archive.tar.gz

cpio(1):

$ find ./source -xdev -print0 | cpio -ov --null > archive.cpio; gzip archive.cpio
$ zcat archive.cpio.gz | cpio -i
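
It may be worth checking that an archive can actually be restored before relying on it. The following is a sketch of such a round trip with GNU tar(1) in a scratch directory; all file and directory names other than "./source" and "archive.tar.gz" are made up for the example.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p source/sub
echo "hello" > source/a.txt
echo "world" > source/sub/b.txt

tar czf archive.tar.gz ./source      # create the gzipped archive
tar tzf archive.tar.gz               # list its contents without extracting
mkdir restore
tar xzf archive.tar.gz -C restore    # extract into restore/

# Compare the restored tree with the original one.
if diff -r source restore/source; then
    verify=ok
else
    verify=mismatch
fi
echo "verification: $verify"
```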

10.1.4. Idioms for the copy

Here are several ways to copy the entire content of the directory "./source" using different tools.

Local copy: "./source" directory → "/dest" directory
Remote copy: "./source" directory at local host → "/dest" directory at "user@host.dom" host

rsync(1):

# cd ./source; rsync -av . /dest
# cd ./source; rsync -av . user@host.dom:/dest

You can alternatively use "a trailing slash on the source directory" syntax.

# rsync -av ./source/ /dest
# rsync -av ./source/ user@host.dom:/dest

GNU cp(1) and openSSH scp(1):

# cd ./source; cp -a . /dest
# cd ./source; scp -pr . user@host.dom:/dest

GNU tar(1):

# (cd ./source && tar cf - . ) | (cd /dest && tar xvfp - )
# (cd ./source && tar cf - . ) | ssh user@host.dom '(cd /dest && tar xvfp - )'

cpio(1):

# cd ./source; find . -print0 | cpio -pvdm --null --sparse /dest

You can substitute "." with "foo" for all examples containing "." to copy files from "./source/foo" directory to "/dest/foo" directory.

You can substitute "." with the absolute path "/path/to/source/foo" for all examples containing "." to drop "cd ./source;". These copy files to different locations depending on tools used as follows.

"/dest/foo": rsync(1), GNU cp(1), and scp(1)
"/dest/path/to/source/foo": GNU tar(1), and cpio(1)

	Tip
	rsync(1) and GNU cp(1) have the option "`-u`" to skip files that are newer on the receiver.
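
The effect of giving the source as an absolute path can be tried safely in a scratch directory. This sketch compares GNU cp(1) with a GNU tar(1) pipe; the directory names "dest_cp" and "dest_tar" are hypothetical stand-ins for "/dest".

```shell
work=$(mktemp -d)
mkdir -p "$work/source/foo" "$work/dest_cp" "$work/dest_tar"
echo data > "$work/source/foo/file.txt"

# cp -a recreates only the last component of the absolute source path.
cp -a "$work/source/foo" "$work/dest_cp"

# GNU tar strips the leading "/" (with a warning on stderr, silenced here)
# and recreates the rest of the absolute path below the destination.
tar cf - "$work/source/foo" 2>/dev/null | (cd "$work/dest_tar" && tar xfp -)

cp_result=$(cd "$work/dest_cp" && find . -type f)
tar_result=$(cd "$work/dest_tar" && find . -type f)
echo "cp : $cp_result"
echo "tar: $tar_result"
```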

10.1.5. Idioms for the selection of files

find(1) is used to select files for archive and copy commands (see Section 10.1.3, “Idioms for the archive” and Section 10.1.4, “Idioms for the copy”) or for xargs(1) (see Section 9.5.9, “Repeating a command looping over files”). This can be enhanced by using its command arguments.

Basic syntax of find(1) can be summarized as the following.

Its conditional arguments are evaluated from left to right.
This evaluation stops once its outcome is determined.
"Logical OR" (specified by "-o" between conditionals) has lower precedence than "logical AND" (specified by "-a" or nothing between conditionals).
"Logical NOT" (specified by "!" before a conditional) has higher precedence than "logical AND".
"-prune" always returns logical TRUE and, if the current file is a directory, the search does not descend into it.
"-name" matches the base of the filename with shell glob (see Section 1.5.6, “Shell glob”) but it also matches its initial "." with metacharacters such as "*" and "?". (New POSIX feature)
"-regex" matches the full path with emacs style BRE (see Section 1.6.2, “Regular expressions”) as default.
"-size" matches the file based on the file size (value preceded by "+" for larger, preceded by "-" for smaller)
"-newer" matches the file newer than the one specified in its argument.
"-print0" always returns logical TRUE and prints the full filename (null terminated) on the standard output.

find(1) is often used with an idiomatic style as the following.

# find /path/to \
    -xdev -regextype posix-extended \
    -type f -regex ".*\.cpio|.*~" -prune -o \
    -type d -regex ".*/\.git" -prune -o \
    -type f -size +99M -prune -o \
    -type f -newer /path/to/timestamp -print0

This performs the following actions.

Search all files starting from "/path/to"
Globally limit the search to the starting filesystem and use ERE (see Section 1.6.2, “Regular expressions”) instead of the default emacs style BRE
Exclude files matching the regex ".*\.cpio" or ".*~" from the search by stopping processing
Exclude directories matching the regex ".*/\.git" from the search by stopping processing
Exclude files larger than 99 Megabytes (units of 1048576 bytes) from the search by stopping processing
Print filenames which satisfy the above search conditions and are newer than "/path/to/timestamp"

Please note the idiomatic use of "-prune -o" to exclude files in the above example.
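
The "-prune -o" idiom can be tried without root privileges on a throwaway tree. This is only a sketch: the directory "proj" and the files in it are invented for the demonstration, and GNU find(1) is assumed because of "-regextype".

```shell
work=$(mktemp -d)
mkdir -p "$work/proj/.git" "$work/proj/src"
touch "$work/proj/.git/config" "$work/proj/src/main.c" \
      "$work/proj/src/main.c~" "$work/proj/old.cpio"

# Each "-prune -o" clause drops one class of files; only what reaches
# the final clause is printed.
selected=$(find "$work/proj" -regextype posix-extended \
    -type f -regex '.*\.cpio|.*~' -prune -o \
    -type d -regex '.*/\.git' -prune -o \
    -type f -print | sort)
echo "$selected"
```

Only "src/main.c" survives: the archive and the editor backup file are excluded by the first clause, and nothing below ".git" is visited at all.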

	Note
	For non-Debian Unix-like systems, some options may not be supported by find(1). In such a case, consider adjusting the matching methods and replacing "`-print0`" with "`-print`". You may need to adjust related commands too.
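
The precedence rule for "-o" summarized in this section is a common source of surprises. The following sketch, using made-up file names, shows that an action such as "-print" binds only to the last alternative unless the alternatives are grouped with escaped parentheses.

```shell
work=$(mktemp -d)
cd "$work"
touch a1 a2 b1

# Parsed as: ( -name 'a*' ) -o ( -name 'b*' -a -print )
# so -print applies to the 'b*' branch only.
implicit=$(find . -name 'a*' -o -name 'b*' -print | sort)

# Explicit grouping applies -print to both alternatives.
grouped=$(find . \( -name 'a*' -o -name 'b*' \) -print | sort)

echo "without grouping:"; echo "$implicit"
echo "with grouping:";    echo "$grouped"
```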

10.1.6. Backup and recovery

We all know that computers fail sometimes and that human errors cause damage to systems and data. Backup and recovery operations are an essential part of successful system administration. Every possible failure mode will hit you some day.

	Tip
	Keep your backup system simple and back up your system often. Having backup data is more important than how technically good your backup method is.

There are 3 key factors which determine actual backup and recovery policy.

Knowing what to back up and recover.
- Data files directly created by you: data in "~/"
- Data files created by applications used by you: data in "/var/" (except "/var/cache/", "/var/run/", and "/var/tmp/")
- System configuration files: data in "/etc/"
- Local software: data in "/usr/local/" or "/opt/"
- System installation information: a memo in plain text on key steps (partition, …)
- Proven set of data: confirmed by experimental recovery operations in advance
Knowing how to back up and recover.
- Secure storage of data: protection from overwrite and system failure
- Frequent backup: scheduled backup
- Redundant backup: data mirroring
- Foolproof process: easy single command backup
Assessing risks and costs involved.
- Value of data when lost
- Required resources for backup: human, hardware, software, …
- Failure modes and their likelihood

As for secure storage of data, data should be kept at least on different disk partitions, and preferably on different disks and machines, to withstand filesystem corruption. Important data are best stored on write-once media such as CD/DVD-R to prevent overwrite accidents. (See Section 10.3, “The binary data” for how to write to the storage media from the shell command line. The GNOME desktop GUI environment gives you easy access via the menu: "Places→CD/DVD Creator".)

	Note
	You may wish to stop some application daemons such as MTA (see Section 6.3, “Mail transport agent (MTA)”) while backing up data.

	Note
	You should take extra care with the backup and restoration of identity related data files such as "/etc/ssh/ssh_host_dsa_key", "/etc/ssh/ssh_host_rsa_key", "~/.gnupg/*", "~/.ssh/*", "/etc/passwd", "/etc/shadow", "/etc/fetchmailrc", "popularity-contest.conf", "/etc/ppp/pap-secrets", and "/etc/exim4/passwd.client". Some of these data cannot be regenerated by entering the same input string to the system.

	Note
	If you run a cron job as a user process, you must restore files in "`/var/spool/cron/crontabs`" directory and restart cron(8). See Section 9.5.14, “Scheduling tasks regularly” for cron(8) and crontab(1).

10.1.7. Backup utility suites

Here is a select list of notable backup utility suites available on the Debian system.

Table 10.3. List of backup suite utilities

package	popcon	size	description
`rdiff-backup`	http://qa.debian.org/popcon.php?package=rdiff-backup	704	(remote) incremental backup
`dump`	http://qa.debian.org/popcon.php?package=dump	716	4.4 BSD dump(8) and restore(8) for ext2/ext3 filesystems
`xfsdump`	http://qa.debian.org/popcon.php?package=xfsdump	595	dump and restore with xfsdump(8) and xfsrestore(8) for XFS filesystem on GNU/Linux and IRIX
`backupninja`	http://qa.debian.org/popcon.php?package=backupninja	276	lightweight, extensible meta-backup system
`sbackup`	http://qa.debian.org/popcon.php?package=sbackup	488	simple backup suite for GNOME desktop
`bacula-common`	http://qa.debian.org/popcon.php?package=bacula-common	1083	Bacula: network backup, recovery and verification - common support files
`bacula-client`	http://qa.debian.org/popcon.php?package=bacula-client	53	Bacula: network backup, recovery and verification - client meta-package
`bacula-console`	http://qa.debian.org/popcon.php?package=bacula-console	154	Bacula: network backup, recovery and verification - text console
`bacula-server`	http://qa.debian.org/popcon.php?package=bacula-server	53	Bacula: network backup, recovery and verification - server meta-package
`amanda-common`	http://qa.debian.org/popcon.php?package=amanda-common	7013	Amanda: Advanced Maryland Automatic Network Disk Archiver (Libs)
`amanda-client`	http://qa.debian.org/popcon.php?package=amanda-client	789	Amanda: Advanced Maryland Automatic Network Disk Archiver (Client)
`amanda-server`	http://qa.debian.org/popcon.php?package=amanda-server	920	Amanda: Advanced Maryland Automatic Network Disk Archiver (Server)
`backuppc`	http://qa.debian.org/popcon.php?package=backuppc	1955	BackupPC is a high-performance, enterprise-grade system for backing up PCs (disk based)
`backup-manager`	http://qa.debian.org/popcon.php?package=backup-manager	615	command-line backup tool
`backup2l`	http://qa.debian.org/popcon.php?package=backup2l	86	low-maintenance backup/restore tool for mountable media (disk based)

Backup tools have their specialized focuses.

Mondo Rescue is a backup system that facilitates quick restoration of a complete system from backup CD/DVD etc. without going through the normal system installation process.
The sbackup and keep packages provide easy GUI frontends for desktop users to make regular backups of user data. An equivalent function can be realized by a simple script (Section 10.1.8, “An example script for the system backup”) and cron(8).
Bacula, Amanda, and BackupPC are full featured backup suite utilities which are focused on regular backups over the network.

Basic tools described in Section 10.1.1, “Archive and compression tools” and Section 10.1.2, “Copy and synchronization tools” can be used to facilitate system backup via custom scripts. Such a script can be enhanced by the following.

The rdiff-backup package enables incremental (remote) backups.
The dump package helps to archive and restore the whole filesystem incrementally and efficiently.

	Tip
	See files in "`/usr/share/doc/dump/`" and "Is dump really deprecated?" to learn about the `dump` package.

10.1.8. An example script for the system backup

For a personal Debian desktop system running the unstable suite, I only need to protect personal and critical data. I reinstall the system once a year anyway. Thus I see no reason to back up the whole system or to install a full featured backup utility.

I use a simple script to make a backup archive and burn it onto CD/DVD using a GUI. Here is an example script for this.

#!/bin/sh -e
# Copyright (C) 2007-2008 Osamu Aoki <[email protected]>, Public Domain
BUUID=1000; USER=osamu # UID and name of a user who accesses backup files
BUDIR="/var/backups"
XDIR0=".+/Mail|.+/Desktop"
XDIR1=".+/\.thumbnails|.+/\.?Trash|.+/\.?[cC]ache|.+/\.gvfs|.+/sessions"
XDIR2=".+/CVS|.+/\.git|.+/\.svn|.+/Downloads|.+/Archive|.+/Checkout|.+/tmp"
XSFX=".+\.iso|.+\.tgz|.+\.tar\.gz|.+\.tar\.bz2|.+\.cpio|.+\.tmp|.+\.swp|.+~"
SIZE="+99M"
DATE=$(date --utc +"%Y%m%d-%H%M")
[ -d "$BUDIR" ] || mkdir -p "$BUDIR"
umask 077
dpkg --get-selections \* > /var/lib/dpkg/dpkg-selections.list
debconf-get-selections > /var/cache/debconf/debconf-selections

{
find /etc /usr/local /opt /var/lib/dpkg/dpkg-selections.list \
     /var/cache/debconf/debconf-selections -xdev -print0
find /home/$USER /root -xdev -regextype posix-extended \
  -type d -regex "$XDIR0|$XDIR1" -prune -o -type f -regex "$XSFX" -prune -o \
  -type f -size  "$SIZE" -prune -o -print0
find /home/$USER/Mail/Inbox /home/$USER/Mail/Outbox -print0
find /home/$USER/Desktop  -xdev -regextype posix-extended \
  -type d -regex "$XDIR2" -prune -o -type f -regex "$XSFX" -prune -o \
  -type f -size  "$SIZE" -prune -o -print0
} | cpio -ov --null -O $BUDIR/BU$DATE.cpio
chown $BUUID $BUDIR/BU$DATE.cpio
touch $BUDIR/backup.stamp

This is meant to be an example of a script executed as root.

I expect you to change and execute this as follows.

Edit this script to cover all your important data (see Section 10.1.5, “Idioms for the selection of files” and Section 10.1.6, “Backup and recovery”).
Replace "find … -print0" with "find … -newer $BUDIR/backup.stamp -print0" to make an incremental backup.
Transfer backup files to the remote host using scp(1) or rsync(1) or burn them to CD/DVD for extra data security. (I use GNOME desktop GUI for burning CD/DVD. See Section 12.1.8, “Shell script example with zenity” for extra redundancy.)
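
The incremental selection with "-newer" can be checked on a scratch tree before it is wired into the backup script. In this sketch the file names "old.txt" and "new.txt" are hypothetical, and the timestamps are set explicitly with GNU touch(1) so that the result does not depend on timing.

```shell
work=$(mktemp -d)
mkdir "$work/data"
touch -d '2020-01-01 00:00' "$work/data/old.txt"    # unchanged since last backup
touch -d '2021-01-01 00:00' "$work/backup.stamp"    # time of the last backup
touch -d '2022-01-01 00:00' "$work/data/new.txt"    # modified afterwards

# Only files modified after the stamp are selected.
changed=$(find "$work/data" -type f -newer "$work/backup.stamp" -print)
echo "$changed"
```

One design consideration: updating the stamp file just before a backup run starts, rather than after it finishes, avoids missing files that are modified while the backup is running, at the cost of occasionally archiving a file twice.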

Keep it simple!

	Tip
	You can recover debconf configuration data with "`debconf-set-selections debconf-selections`" and dpkg selection data with "`dpkg --set-selections <dpkg-selections.list`".

10.1.9. A copy script for the data backup

For the set of data under a directory tree, the copy with "cp -a" provides the normal backup.

For the set of large non-overwritten static data under a directory tree such as the one under the "/var/cache/apt/archives/" directory, hardlinks with "cp -al" provide an alternative to the normal backup with efficient use of the disk space.
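
The space saving of "cp -al" comes from both trees sharing the same inodes. A small sketch with GNU cp(1) and stat(1) makes this visible; the names "data", "snapshot" and "pkg.deb" are invented for the example.

```shell
work=$(mktemp -d)
cd "$work"
mkdir data
echo "static content" > data/pkg.deb

cp -al data snapshot     # hardlink copy: no file data is duplicated

inode_orig=$(stat -c %i data/pkg.deb)
inode_snap=$(stat -c %i snapshot/pkg.deb)
links=$(stat -c %h data/pkg.deb)
echo "inodes: $inode_orig $inode_snap, link count: $links"
```

Because both names refer to the same inode, a program that overwrites the file in place changes the "backup" as well. This is why the method suits only static data that is never overwritten.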

Here is a copy script, which I named bkup, for the data backup. This script copies all (non-VCS) files under the current directory to a dated directory in the parent directory or on a remote host.

#!/bin/sh -e
# Copyright (C) 2007-2008 Osamu Aoki <[email protected]>, Public Domain
fdot(){ find . -type d \( -iname ".?*" -o -iname "CVS" \) -prune -o -print0;}
fall(){ find . -print0;}
mkdircd(){ mkdir -p "$1";chmod 700 "$1";cd "$1">/dev/null;}
FIND="fdot";OPT="-a";MODE="CPIOP";HOST="localhost";EXTP="$(hostname -f)"
BKUP="$(basename $(pwd)).bkup";TIME="$(date  +%Y%m%d-%H%M%S)";BU="$BKUP/$TIME"
while getopts gcCsStrlLaAxe:h:T f; do case $f in
g)  MODE="GNUCP";; # cp (GNU)
c)  MODE="CPIOP";; # cpio -p
C)  MODE="CPIOI";; # cpio -i
s)  MODE="CPIOSSH";; # cpio/ssh
t)  MODE="TARSSH";; # tar/ssh
r)  MODE="RSYNCSSH";; # rsync/ssh
l)  OPT="-alv";; # hardlink (GNU cp)
L)  OPT="-av";;  # copy (GNU cp)
a)  FIND="fall";; # find all
A)  FIND="fdot";; # find non CVS/ .???/
x)  set -x;; # trace
e)  EXTP="${OPTARG}";; # hostname -f
h)  HOST="${OPTARG}";; # user@host.dom
T)  MODE="TEST";; # test find mode
\?) echo "use -x for trace."
esac; done
shift $(expr $OPTIND - 1)
if [ $# -gt 0 ]; then
  for x in $@; do cp $OPT $x $x.$TIME; done
elif [ $MODE = GNUCP ]; then
  mkdir -p "../$BU";chmod 700 "../$BU";cp $OPT . "../$BU/"
elif [ $MODE = CPIOP ]; then
  mkdir -p "../$BU";chmod 700 "../$BU"
  $FIND|cpio --null --sparse -pvd ../$BU
elif [ $MODE = CPIOI ]; then
  $FIND|cpio -ov --null | ( mkdircd "../$BU"&&cpio -i )
elif [ $MODE = CPIOSSH ]; then
  $FIND|cpio -ov --null|ssh -C $HOST "( mkdircd \"$EXTP/$BU\"&&cpio -i )"
elif [ $MODE = TARSSH ]; then
  (tar cvf - . )|ssh -C $HOST "( mkdircd \"$EXTP/$BU\"&& tar xvfp - )"
elif [ $MODE = RSYNCSSH ]; then
  rsync -rlpt ./ "${HOST}:${EXTP}-${BKUP}-${TIME}"
else
  echo "Any other idea to backup?"
  $FIND |xargs -0 -n 1 echo
fi

These are meant to be command examples. Please read the script and edit it yourself before using it.

	Tip
	I keep this `bkup` in my "`/usr/local/bin/`" directory. I issue this `bkup` command without any option in the working directory whenever I need a temporary snapshot backup.

	Tip
	For making snapshot history of a source file tree or a configuration file tree, it is easier and space efficient to use git(7) (see Section 10.9.5, “Git for recording configuration history”).

10.1.10. Removable storage device

Removable storage devices may be any one of the following.

USB flash drive
Hard disk drive
Optical disc drive
Digital camera
Digital music player

They may be connected via any one of the following.

Modern desktop environments such as GNOME and KDE can mount these removable devices automatically without a matching "/etc/fstab" entry.

The udisks package provides a daemon and associated utilities to mount and unmount these devices.
D-Bus creates events to initiate automatic processes.
PolicyKit provides the required privileges.

	Tip
	Automounted devices may have the "`uhelper=`" mount option which is used by umount(8).

	Tip
	Automounting under modern desktop environment happens only when those removable media devices are not listed in "`/etc/fstab`".

The mount point under a modern desktop environment is chosen as "/media/<disk_label>", where the disk label can be customized by the following.

mlabel(1) for FAT filesystem
genisoimage(1) with "-V" option for ISO9660 filesystem
tune2fs(8) with "-L" option for ext2/ext3/ext4 filesystem

	Tip
	The choice of encoding may need to be provided as mount option (see Section 8.3.6, “Filename encoding”).

10.1.11. Filesystem choice for sharing data

When sharing data with another system via a removable storage device, you should format it with a common filesystem supported by both systems. Here is a list of filesystem choices.

Table 10.4. List of filesystem choices for removable storage devices with typical usage scenarios

filesystem	description of typical usage scenario
FAT12	cross platform sharing of data on the floppy disk (<32MiB)
FAT16	cross platform sharing of data on the small hard disk like device (<2GiB)
FAT32	cross platform sharing of data on the large hard disk like device (<8TiB, supported by MS Windows95 OSR2 and newer)
NTFS	cross platform sharing of data on the large hard disk like device (supported natively on MS Windows NT and later version, and supported by NTFS-3G via FUSE on Linux)
ISO9660	cross platform sharing of static data on CD-R and DVD+/-R
UDF	incremental data writing on CD-R and DVD+/-R (new)
MINIX filesystem	space efficient unix file data storage on the floppy disk
ext2 filesystem	sharing of data on the hard disk like device with older Linux systems
ext3 filesystem	sharing of data on the hard disk like device with older Linux systems
ext4 filesystem	sharing of data on the hard disk like device with current Linux systems

	Tip
	See Section 9.4.1, “Removable disk encryption with dm-crypt/LUKS” for cross platform sharing of data using device level encryption.

The FAT filesystem is supported by almost all modern operating systems and is quite useful for the data exchange purpose via removable hard disk like media.

When formatting removable hard disk like devices for cross platform sharing of data with the FAT filesystem, the following should be safe choices.

Partitioning them with fdisk(8), cfdisk(8) or parted(8) (see Section 9.3.2, “Disk partition configuration”) into a single primary partition and marking it as follows.
- Type "6" for FAT16 for media smaller than 2GB.
- Type "c" for FAT32 (LBA) for larger media.
Formatting the primary partition with mkfs.vfat(8) with the following.
- Just its device name, e.g. "/dev/sda1" for FAT16
- The explicit option and its device name, e.g. "-F 32 /dev/sda1" for FAT32

When using the FAT or ISO9660 filesystems for sharing data, the following are safe practices.

Archiving files into an archive file first using tar(1), or cpio(1) to retain the long filename, the symbolic link, the original Unix file permission and the owner information.
Splitting the archive file into less than 2 GiB chunks with the split(1) command to protect it from the file size limitation.
Encrypting the archive file to secure its contents from the unauthorized access.
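
The splitting step can be rehearsed on a small file before it is applied to a real archive. This sketch uses a 3 MiB stand-in and 1 MiB pieces so that it runs quickly; for a real archive on FAT, a piece size below 2 GiB (for example "-b 2000M") would be used instead. The file names are assumptions made for the example.

```shell
work=$(mktemp -d)
cd "$work"
# Stand-in for a large archive: 3 MiB of pseudo-random data.
head -c 3145728 /dev/urandom > archive.tar

# Split into 1 MiB pieces named archive.tar.part-aa, -ab, ...
split -b 1M archive.tar archive.tar.part-

# Reassemble (the shell glob sorts the pieces in order) and compare.
cat archive.tar.part-* > rejoined.tar
parts=$(ls archive.tar.part-* | wc -l)
if cmp -s archive.tar rejoined.tar; then
    rejoin=ok
else
    rejoin=mismatch
fi
echo "pieces: $parts, rejoin: $rejoin"
```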

	Note
	By design, the maximum file size on FAT filesystems is `(2^32 - 1) bytes = (4GiB - 1 byte)`. For some applications on older 32 bit OSes, the maximum file size was even smaller: `(2^31 - 1) bytes = (2GiB - 1 byte)`. Debian does not suffer from the latter problem.

	Note
	Microsoft itself does not recommend using FAT for drives or partitions of over 200 MB. Microsoft highlights its shortcomings, such as inefficient disk space usage, in their "Overview of FAT, HPFS, and NTFS File Systems". Of course, we should normally use the ext4 filesystem for Linux.

	Tip
	For more on filesystems and accessing filesystems, please read "Filesystems HOWTO".

10.1.12. Sharing data via network

When sharing data with another system via the network, you should use a common service. Here are some hints.

Table 10.5. List of the network services to choose with the typical usage scenario

network service	description of typical usage scenario
SMB/CIFS network mounted filesystem with Samba	sharing files via "Microsoft Windows Network", see smb.conf(5) and The Official Samba 3.2.x HOWTO and Reference Guide or the `samba-doc` package
NFS network mounted filesystem with the Linux kernel	sharing files via "Unix/Linux Network", see exports(5) and Linux NFS-HOWTO
HTTP service	sharing files between the web server/client
HTTPS service	sharing files between the web server/client with encrypted Secure Sockets Layer (SSL) or Transport Layer Security (TLS)
FTP service	sharing files between the FTP server/client

Although these filesystems mounted over network and file transfer methods over network are quite convenient for sharing data, these may be insecure. Their network connection must be secured by the following.

Encrypt it with SSL/TLS
Tunnel it via SSH
Tunnel it via VPN
Limit it behind the secure firewall

10.1.13. Archive media

When choosing computer data storage media for an important data archive, you should be careful about their limitations. For small personal data backups, I use CD-R and DVD-R from a brand-name company and store them in a cool, shaded, dry, clean environment. (Tape archive media seem to be popular for professional use.)

	Note
	Fire-resistant safes are meant for paper documents. Most computer data storage media have less temperature tolerance than paper. I usually rely on multiple secure encrypted copies stored in multiple secure locations.

Optimistic storage life of archive media seen on the net (mostly from vendor info).

100+ years : Acid free paper with ink
100 years : Optical storage (CD/DVD, CD/DVD-R)
30 years : Magnetic storage (tape, floppy)
20 years : Phase change optical storage (CD-RW)

These figures do not account for mechanical failures due to handling etc.

Optimistic write cycle of archive media seen on the net (mostly from vendor info).

250,000+ cycles : Harddisk drive
10,000+ cycles : Flash memory
1,000 cycles : CD/DVD-RW
1 cycle : CD/DVD-R, paper

	Caution
	The figures for storage life and write cycles here should not be used for decisions on any critical data storage. Please consult the specific product information provided by the manufacturer.

	Tip
	Since CD/DVD-R and paper have only 1 write cycle, they inherently prevent accidental data loss by overwriting. This is an advantage!

	Tip
	If you need fast and frequent backups of a large amount of data, a hard disk on a remote host linked by a fast network connection may be the only realistic option.

10.2. The disk image

Here, we discuss manipulations of the disk image. See Section 9.3, “Data storage tips”, too.

10.2.1. Making the disk image file

The disk image file, "disk.img", of an unmounted device, e.g., the second SCSI drive "/dev/sdb", can be made using cp(1) or dd(1) by the following.

# cp /dev/sdb disk.img
# dd if=/dev/sdb of=disk.img

The disk image of the traditional PC's master boot record (MBR) (see Section 9.3.2, “Disk partition configuration”), which resides on the first sector of the primary IDE disk, can be made by using dd(1) by the following.

# dd if=/dev/hda of=mbr.img bs=512 count=1
# dd if=/dev/hda of=mbr-nopart.img bs=446 count=1
# dd if=/dev/hda of=mbr-part.img skip=446 bs=1 count=66

"mbr.img": The MBR with the partition table
"mbr-nopart.img": The MBR without the partition table
"mbr-part.img": The partition table of the MBR only

If you have a SCSI device (including the new serial ATA drive) as the boot disk, substitute "/dev/hda" with "/dev/sda".

If you are making an image of a disk partition of the original disk, substitute "/dev/hda" with "/dev/hda1" etc.
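
The byte ranges used above can be tried without root privileges on an ordinary file. This sketch substitutes a hypothetical zero-filled "fake-disk.img" for the real device; the 66 bytes of the last image are the 64-byte partition table plus the 2-byte boot signature, which together with the 446 bytes of boot code make up the 512-byte sector.

```shell
work=$(mktemp -d)
cd "$work"
# Fake 1 MiB "disk" so the commands can be run as a normal user.
dd if=/dev/zero of=fake-disk.img bs=512 count=2048 2>/dev/null

dd if=fake-disk.img of=mbr.img        bs=512 count=1 2>/dev/null
dd if=fake-disk.img of=mbr-nopart.img bs=446 count=1 2>/dev/null
dd if=fake-disk.img of=mbr-part.img   bs=1 skip=446 count=66 2>/dev/null

# Report the size in bytes of each extracted image.
size_mbr=$(stat -c %s mbr.img)
size_nopart=$(stat -c %s mbr-nopart.img)
size_part=$(stat -c %s mbr-part.img)
echo "$size_mbr $size_nopart $size_part"
```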

10.2.2. Writing directly to the disk

The disk image file, "disk.img" can be written to an unmounted device, e.g., the second SCSI drive "/dev/sdb" with matching size, by the following.

# dd if=disk.img of=/dev/sdb

Similarly, the disk partition image file, "partition.img" can be written to an unmounted partition, e.g., the first partition of the second SCSI drive "/dev/sdb1" with matching size, by the following.

# dd if=partition.img of=/dev/sdb1

10.2.3. Mounting the disk image file

The disk image "partition.img" containing a single partition image can be mounted and unmounted by using the loop device as follows.

# losetup -v -f partition.img
Loop device is /dev/loop0
# mkdir -p /mnt/loop0
# mount -t auto /dev/loop0 /mnt/loop0
...hack...hack...hack
# umount /dev/loop0
# losetup -d /dev/loop0

This can be simplified as follows.

# mkdir -p /mnt/loop0
# mount -t auto -o loop partition.img /mnt/loop0
...hack...hack...hack
# umount partition.img

Each partition of the disk image "disk.img" containing multiple partitions can be mounted by using the loop device. Since the loop device does not manage partitions by default, we need to reset it as follows.

# modinfo -p loop # verify kernel capability
max_part:Maximum number of partitions per loop device
max_loop:Maximum number of loop devices
# losetup -a # verify nothing using the loop device
# rmmod loop
# modprobe loop max_part=16

Now, the loop device can manage up to 16 partitions.

# losetup -v -f disk.img
Loop device is /dev/loop0
# fdisk -l /dev/loop0

Disk /dev/loop0: 5368 MB, 5368709120 bytes
255 heads, 63 sectors/track, 652 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x452b6464

      Device Boot      Start         End      Blocks   Id  System
/dev/loop0p1               1         600     4819468+  83  Linux
/dev/loop0p2             601         652      417690   83  Linux
# mkdir -p /mnt/loop0p1
# mount -t ext3 /dev/loop0p1 /mnt/loop0p1
# mkdir -p /mnt/loop0p2
# mount -t ext3 /dev/loop0p2 /mnt/loop0p2
...hack...hack...hack
# umount /dev/loop0p1
# umount /dev/loop0p2
# losetup -d /dev/loop0

Alternatively, similar effects can be achieved by using the device mapper devices created by kpartx(8) from the kpartx package as follows.

# kpartx -a -v disk.img
...
# mkdir -p /mnt/loop0p2
# mount -t ext3 /dev/mapper/loop0p2 /mnt/loop0p2
...
...hack...hack...hack
# umount /dev/mapper/loop0p2
...
# kpartx -d disk.img

	Note
	You can also mount a single partition of such a disk image with the loop device by using an offset to skip the MBR etc. But this is more error prone.

10.2.4. Cleaning a disk image file

A disk image file "disk.img" can be cleaned of all removed files into a clean sparse image "new.img" as follows.

# mkdir old; mkdir new
# mount -t auto -o loop disk.img old
# dd bs=1 count=0 if=/dev/zero of=new.img seek=5G
# mount -t auto -o loop new.img new
# cd old
# cp -a --sparse=always ./ ../new/
# cd ..
# umount new.img
# umount disk.img

If "disk.img" is in ext2 or ext3, you can also use zerofree(8) from the zerofree package as follows.

# losetup -f -v disk.img
Loop device is /dev/loop3
# zerofree /dev/loop3
# cp --sparse=always disk.img new.img

10.2.5. Making the empty disk image file

The empty disk image "disk.img" which can grow up to 5GiB can be made using dd(1) as follows.

$ dd bs=1 count=0 if=/dev/zero of=disk.img seek=5G

You can create an ext3 filesystem on this disk image "disk.img" using the loop device as follows.

# losetup -f -v disk.img
Loop device is /dev/loop1
# mkfs.ext3 /dev/loop1
...hack...hack...hack
# losetup -d /dev/loop1
$ du  --apparent-size -h disk.img
5.0G  disk.img
$ du -h disk.img
83M disk.img

For "disk.img", its file size is 5.0 GiB and its actual disk usage is a mere 83MiB. This discrepancy is possible since ext2fs can hold sparse files.

	Tip
	The actual disk usage of a sparse file grows as data is written to it.
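
The difference between the file size and the actual disk usage can be checked without root on a small scale. The following is a minimal sketch, assuming GNU coreutils (dd(1), stat(1)) and a filesystem that supports sparse files; the file name "sparse.img" and the 100MiB size are chosen only for illustration.

```shell
# Work in a scratch directory so that no existing file is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a 100MiB sparse file: only the length is set, no data is written.
dd bs=1 count=0 if=/dev/zero of=sparse.img seek=100M 2>/dev/null

# Apparent size in bytes versus bytes actually allocated on disk.
apparent=$(stat -c %s sparse.img)
allocated=$(( $(stat -c %b sparse.img) * $(stat -c %B sparse.img) ))
echo "apparent=$apparent allocated=$allocated"

# Write 1MiB of data in the middle; the apparent size stays the same.
dd if=/dev/urandom of=sparse.img bs=1M count=1 seek=50 conv=notrunc 2>/dev/null
sync
apparent_after=$(stat -c %s sparse.img)
allocated_after=$(( $(stat -c %b sparse.img) * $(stat -c %B sparse.img) ))
echo "apparent=$apparent_after allocated=$allocated_after"
```

The second "allocated" value is normally larger than the first by about 1MiB, though the exact number depends on the filesystem.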

Using similar operations on devices created by the loop device or the device mapper, as in Section 10.2.3, “Mounting the disk image file”, you can partition this disk image "disk.img" using parted(8) or fdisk(8), and create filesystems on it using mkfs.ext3(8), mkswap(8), etc.

10.2.6. Making the ISO9660 image file

The ISO9660 image file, "cd.iso", from the source directory tree at "source_directory" can be made using genisoimage(1) provided by cdrkit by the following.

#  genisoimage -r -J -T -V volume_id -o cd.iso source_directory

Similarly, the bootable ISO9660 image file, "cdboot.iso", can be made from debian-installer like directory tree at "source_directory" by the following.

#  genisoimage -r -o cdboot.iso -V volume_id \
   -b isolinux/isolinux.bin -c isolinux/boot.cat \
   -no-emul-boot -boot-load-size 4 -boot-info-table source_directory

Here Isolinux boot loader (see Section 3.3, “Stage 2: the boot loader”) is used for booting.

You can calculate the md5sum value and make the ISO9660 image directly from the CD-ROM device as follows.

$ isoinfo -d -i /dev/cdrom
CD-ROM is in ISO 9660 format
...
Logical block size is: 2048
Volume size is: 23150592
...
# dd if=/dev/cdrom bs=2048 count=23150592 conv=notrunc,noerror | md5sum
# dd if=/dev/cdrom bs=2048 count=23150592 conv=notrunc,noerror > cd.iso

	Warning
	You must carefully avoid the ISO9660 filesystem read-ahead bug of Linux, as above, to get the right result.

10.2.7. Writing directly to the CD/DVD-R/RW

	Tip
	DVD is only a large CD to wodim(1) provided by cdrkit.

You can find a usable device by the following.

# wodim --devices

Then insert a blank CD-R into the CD drive and write the ISO9660 image file "cd.iso" to this device, e.g., "/dev/hda", using wodim(1) by the following.

# wodim -v -eject dev=/dev/hda cd.iso

If a CD-RW is used instead of a CD-R, do the following instead.

# wodim -v -eject blank=fast dev=/dev/hda cd.iso

	Tip
	If your desktop system mounts CDs automatically, unmount it with "`sudo umount /dev/hda`" before using wodim(1).

10.2.8. Mounting the ISO9660 image file

If "cd.iso" contains an ISO9660 image, then the following manually mounts it to "/cdrom".

# mount -t iso9660 -o ro,loop cd.iso /cdrom

	Tip
	Modern desktop systems mount removable media automatically (see Section 10.1.10, “Removable storage device”).

10.3. The binary data

Here, we discuss direct manipulations of the binary data on storage media. See Section 9.3, “Data storage tips”, too.

10.3.1. Viewing and editing binary data

The most basic viewing method of binary data is to use "od -t x1" command.
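
As a small illustration of this command, the following sketch dumps a four byte input. The input string is an arbitrary example; the "-An" option of od(1) only suppresses the offset column so that the bytes can be captured in a variable.

```shell
# Dump the bytes of "ABC" plus newline, with the offset column.
printf 'ABC\n' | od -t x1

# Same dump without offsets, squeezed into one hexadecimal string.
hex=$(printf 'ABC\n' | od -An -t x1 | tr -d ' \n')
echo "$hex"
```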

Table 10.6. List of packages which view and edit binary data

package	popcon	size	description
`coreutils`	http://qa.debian.org/popcon.php?package=coreutils	14088	basic package which has od(1) to dump files (HEX, ASCII, OCTAL, …)
`bsdmainutils`	http://qa.debian.org/popcon.php?package=bsdmainutils	558	utility package which has hd(1) to dump files (HEX, ASCII, OCTAL, …)
`hexedit`	http://qa.debian.org/popcon.php?package=hexedit	108	binary editor and viewer (HEX, ASCII)
`bless`	http://qa.debian.org/popcon.php?package=bless	991	full featured hexadecimal editor (GNOME)
`okteta`	http://qa.debian.org/popcon.php?package=okteta	295	full featured hexadecimal editor (KDE4)
`ncurses-hexedit`	http://qa.debian.org/popcon.php?package=ncurses-hexedit	192	binary editor and viewer (HEX, ASCII, EBCDIC)
`beav`	http://qa.debian.org/popcon.php?package=beav	164	binary editor and viewer (HEX, ASCII, EBCDIC, OCTAL, …)

	Tip
	HEX is used as an acronym for hexadecimal format with radix 16. OCTAL is for octal format with radix 8. ASCII is for American Standard Code for Information Interchange, i.e., normal English text code. EBCDIC is for Extended Binary Coded Decimal Interchange Code used on IBM mainframe operating systems.

10.3.2. Manipulating files without mounting disk

There are tools to read and write files without mounting disk.

Table 10.7. List of packages to manipulate files without mounting disk

package	popcon	size	description
`mtools`	http://qa.debian.org/popcon.php?package=mtools	408	utilities for MSDOS files without mounting them
`hfsutils`	http://qa.debian.org/popcon.php?package=hfsutils	236	utilities for HFS and HFS+ files without mounting them

10.3.3. Data redundancy

Software RAID systems offered by the Linux kernel provide data redundancy at the kernel filesystem level to achieve high levels of storage reliability.

There are also tools to add data redundancy to files at the application program level to achieve high levels of storage reliability.

Table 10.8. List of tools to add data redundancy to files

package	popcon	size	description
`par2`	http://qa.debian.org/popcon.php?package=par2	272	Parity Archive Volume Set, for checking and repair of files
`dvdisaster`	http://qa.debian.org/popcon.php?package=dvdisaster	1481	data loss/scratch/aging protection for CD/DVD media
`dvbackup`	http://qa.debian.org/popcon.php?package=dvbackup	392	backup tool using MiniDV camcorders (providing rsbep(1))
`vdmfec`	http://qa.debian.org/popcon.php?package=vdmfec	88	recover lost blocks using Forward Error Correction

10.3.4. Data file recovery and forensic analysis

There are tools for data file recovery and forensic analysis.

Table 10.9. List of packages for data file recovery and forensic analysis

package	popcon	size	description
`testdisk`	http://qa.debian.org/popcon.php?package=testdisk	1153	utilities for partition scan and disk recovery
`magicrescue`	http://qa.debian.org/popcon.php?package=magicrescue	344	utility to recover files by looking for magic bytes
`scalpel`	http://qa.debian.org/popcon.php?package=scalpel	124	frugal, high performance file carver
`myrescue`	http://qa.debian.org/popcon.php?package=myrescue	84	rescue data from damaged harddisks
`recover`	http://qa.debian.org/popcon.php?package=recover	104	utility to undelete files on the ext2 filesystem
`e2undel`	http://qa.debian.org/popcon.php?package=e2undel	244	utility to undelete files on the ext2 filesystem
`ext3grep`	http://qa.debian.org/popcon.php?package=ext3grep	296	tool to help recover deleted files on the ext3 filesystem
`scrounge-ntfs`	http://qa.debian.org/popcon.php?package=scrounge-ntfs	80	data recovery program for NTFS filesystems
`gzrt`	http://qa.debian.org/popcon.php?package=gzrt	68	gzip recovery toolkit
`sleuthkit`	http://qa.debian.org/popcon.php?package=sleuthkit	750	tools for forensics analysis. (Sleuthkit)
`autopsy`	http://qa.debian.org/popcon.php?package=autopsy	1372	graphical interface to SleuthKit
`foremost`	http://qa.debian.org/popcon.php?package=foremost	140	forensics application to recover data
`guymager`	http://qa.debian.org/popcon.php?package=guymager	949	forensic imaging tool based on Qt
`dcfldd`	http://qa.debian.org/popcon.php?package=dcfldd	109	enhanced version of `dd` for forensics and security
`rdd`	http://qa.debian.org/popcon.php?package=rdd	200	forensic copy program

10.3.5. Splitting a large file into small files

When a file is too big to back up as a single file, you can split it into chunks of, e.g., 2000MiB, back up the chunks, and later merge them back into the original file.

$ split -b 2000m large_file
$ cat x* >large_file

	Caution
	Please make sure you do not have any files starting with "`x`" to avoid name clashes.
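
A minimal sketch of the same round trip on a small scale follows. It assumes GNU coreutils; the explicit prefix "chunk_" (instead of the default "x") and the 4KiB chunk size are choices made for this example only. Comparing the rejoined file with the original guards against a missing or misordered chunk.

```shell
# Work in a scratch directory with a 10KiB test file.
workdir=$(mktemp -d)
cd "$workdir"
dd if=/dev/urandom of=large_file bs=1k count=10 2>/dev/null

# Split into 4KiB chunks named chunk_aa, chunk_ab, ...
split -b 4k large_file chunk_
chunks=$(ls chunk_* | wc -l)

# Merge the chunks; the shell expands chunk_* in sorted order.
cat chunk_* > rejoined_file

# Verify that the rejoined file is identical to the original.
if cmp -s large_file rejoined_file; then result=identical; else result=different; fi
echo "$chunks chunks, rejoined file is $result"
```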

10.3.6. Clearing file contents

In order to clear the contents of a file such as a log file, do not use rm(1) to delete the file and then create a new empty file, because the file may still be accessed in the interval between commands. The following is the safe way to clear the contents of the file.

$ :>file_to_be_cleared
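
The reason this is safe can be seen by keeping the file open while it is cleared. The following is a sketch with a hypothetical log file "app.log"; the file descriptor 3 stands in for a long-running program that appends to its log.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# Open the log for appending on file descriptor 3, as a daemon would.
exec 3>> app.log
echo "first entry" >&3
inode_before=$(stat -c %i app.log)

# Clear the contents in place; the file itself is never removed.
: > app.log

# The writer keeps working and its output lands in the same file.
echo "second entry" >&3
inode_after=$(stat -c %i app.log)
content=$(cat app.log)
exec 3>&-

echo "inode $inode_before -> $inode_after, content: $content"
```

With rm(1) followed by re-creation, the writer would instead keep appending to the deleted file, and the new file would stay empty.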

10.3.7. Dummy files

The following commands create dummy or empty files.

$ dd if=/dev/zero    of=5kb.file bs=1k count=5
$ dd if=/dev/urandom of=7mb.file bs=1M count=7
$ touch zero.file
$ : > alwayszero.file

You should find the following files.

"5kb.file" is 5KiB of zeros.
"7mb.file" is 7MiB of random data.
"zero.file" may be a 0 byte file. If it existed, its mtime is updated while its content and its length are kept.
"alwayszero.file" is always a 0 byte file. If it existed, its mtime is updated and its content is reset.

10.3.8. Erasing an entire hard disk

There are several ways to completely erase data from an entire hard disk like device, e.g., USB memory stick at "/dev/sda".

	Caution
	Check your USB memory stick location with mount(8) first before executing commands here. The device pointed to by "`/dev/sda`" may be the SCSI hard disk or serial-ATA hard disk where your entire system resides.

Erase all the disk content by resetting data to 0 with the following.

# dd if=/dev/zero of=/dev/sda

Erase all by overwriting random data with the following.

# dd if=/dev/urandom of=/dev/sda

Erase all by overwriting random data very efficiently with the following.

# shred -v -n 1 /dev/sda

Since dd(1) is available from the shell of many bootable Linux CDs such as Debian installer CD, you can erase your installed system completely by running an erase command from such media on the system hard disk, e.g., "/dev/hda", "/dev/sda", etc.

10.3.9. Erasing unused area of a hard disk

Unused area on a hard disk (or USB memory stick), e.g. "/dev/sdb1", may still contain the erased data itself, since erased files are only unlinked from the filesystem. This data can be cleaned by overwriting it.

# mount -t auto /dev/sdb1 /mnt/foo
# cd /mnt/foo
# dd if=/dev/zero of=junk
dd: writing to `junk': No space left on device
...
# sync
# umount /dev/sdb1

	Warning
	This is usually good enough for your USB memory stick, but it is not perfect. Most parts of erased filenames and their attributes may be hidden and remain in the filesystem.

10.3.10. Undeleting deleted but still open files

Even if you have accidentally deleted a file, as long as that file is still being used by some application (read or write mode), it is possible to recover such a file.

For example, try the following

$ echo foo > bar
$ less bar
$ ps aux | grep ' less[ ]'
bozo    4775  0.0  0.0  92200   884 pts/8    S+   00:18   0:00 less bar
$ rm bar
$ ls -l /proc/4775/fd | grep bar
lr-x------ 1 bozo bozo 64 2008-05-09 00:19 4 -> /home/bozo/bar (deleted)
$ cat /proc/4775/fd/4 >bar
$ ls -l
-rw-r--r-- 1 bozo bozo 4 2008-05-09 00:25 bar
$ cat bar
foo

Execute on another terminal (when you have the lsof package installed) as follows.

$ ls -li bar
2228329 -rw-r--r-- 1 bozo bozo 4 2008-05-11 11:02 bar
$ lsof |grep bar|grep less
less 4775 bozo 4r REG 8,3 4 2228329 /home/bozo/bar
$ rm bar
$ lsof |grep bar|grep less
less 4775 bozo 4r REG 8,3 4 2228329 /home/bozo/bar (deleted)
$ cat /proc/4775/fd/4 >bar
$ ls -li bar
2228302 -rw-r--r-- 1 bozo bozo 4 2008-05-11 11:05 bar
$ cat bar
foo
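
The same recovery can be tried non-interactively. The following is a sketch, assuming a Linux system with "/proc" mounted; here the shell itself keeps the file open on file descriptor 3 in place of less(1).

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
echo foo > bar

# Keep the file open on file descriptor 3, then delete its name.
exec 3< bar
rm bar

# cat(1) inherits descriptor 3, so /proc/self/fd/3 still leads to the data.
cat /proc/self/fd/3 > bar

# Release the deleted file.
exec 3<&-
recovered=$(cat bar)
echo "recovered: $recovered"
```

Once the last process closes the deleted file, its data blocks are freed and this method no longer works.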

10.3.11. Searching all hardlinks

Files with hardlinks can be identified by "ls -li".

$ ls -li
total 0
2738405 -rw-r--r-- 1 root root 0 2008-09-15 20:21 bar
2738404 -rw-r--r-- 2 root root 0 2008-09-15 20:21 baz
2738404 -rw-r--r-- 2 root root 0 2008-09-15 20:21 foo

Both "baz" and "foo" have link counts of "2" (>1), showing that they have hardlinks. They share the inode number "2738404", which means they are the same hardlinked file. If you do not happen to find all hardlinked files by chance, you can search for them by the inode number, e.g., "2738404", as follows.

# find /path/to/mount/point -xdev -inum 2738404
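
A self-contained sketch of this search follows. It assumes GNU stat(1) and find(1); the file names mirror the listing above, and the inode number is read from the file instead of being typed by hand.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# "foo" and "baz" are two names for one file; "bar" is independent.
touch foo bar
ln foo baz

# Read the inode number and the link count of "foo".
inum=$(stat -c %i foo)
links=$(stat -c %h foo)

# List every name under this directory that shares the inode.
names=$(find . -xdev -inum "$inum" | sort)
echo "link count $links:"
echo "$names"
```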

10.3.12. Invisible disk space consumption

All deleted but open files consume disk space although they are not visible from normal du(1). They can be listed with their size by the following.

# lsof -s -X / |grep deleted

10.4. Data security infrastructure

The data security infrastructure is provided by the combination of data encryption tool, message digest tool, and signature tool.

Table 10.10. List of data security infrastructure tools

command	package	popcon	size	description
gpg(1)	`gnupg`	http://qa.debian.org/popcon.php?package=gnupg	4621	GNU Privacy Guard - OpenPGP encryption and signing tool
N/A	`gnupg-doc`	http://qa.debian.org/popcon.php?package=gnupg-doc	4124	GNU Privacy Guard documentation
gpgv(1)	`gpgv`	http://qa.debian.org/popcon.php?package=gpgv	397	GNU Privacy Guard - signature verification tool
paperkey(1)	`paperkey`	http://qa.debian.org/popcon.php?package=paperkey	88	extract just the secret information out of OpenPGP secret keys
cryptsetup(8), …	`cryptsetup`	http://qa.debian.org/popcon.php?package=cryptsetup	648	utilities for dm-crypt block device encryption supporting LUKS
ecryptfs(7), …	`ecryptfs-utils`	http://qa.debian.org/popcon.php?package=ecryptfs-utils	368	utilities for ecryptfs stacked filesystem encryption
md5sum(1)	`coreutils`	http://qa.debian.org/popcon.php?package=coreutils	14088	compute and check MD5 message digest
sha1sum(1)	`coreutils`	http://qa.debian.org/popcon.php?package=coreutils	14088	compute and check SHA1 message digest
openssl(1ssl)	`openssl`	http://qa.debian.org/popcon.php?package=openssl	1079	compute message digest with "`openssl dgst`" (OpenSSL)

See Section 9.4, “Data encryption tips” on dm-crypt and ecryptfs which implement automatic data encryption infrastructure via Linux kernel modules.

10.4.1. Key management for GnuPG

Here are GNU Privacy Guard commands for the basic key management.

Table 10.11. List of GNU Privacy Guard commands for the key management

command	description
`gpg --gen-key`	generate a new key
`gpg --gen-revoke my_user_ID`	generate a revocation certificate for my_user_ID
`gpg --edit-key user_ID`	edit key interactively, "help" for help
`gpg -o file --export`	export all keys to file
`gpg --import file`	import all keys from file
`gpg --send-keys user_ID`	send key of user_ID to keyserver
`gpg --recv-keys user_ID`	recv. key of user_ID from keyserver
`gpg --list-keys user_ID`	list keys of user_ID
`gpg --list-sigs user_ID`	list sig. of user_ID
`gpg --check-sigs user_ID`	check sig. of user_ID
`gpg --fingerprint user_ID`	check fingerprint of user_ID
`gpg --refresh-keys`	update local keyring

Here is the meaning of the trust code.

Table 10.12. List of the meaning of the trust code

code	description of trust
`-`	no owner trust assigned / not yet calculated
`e`	trust calculation failed
`q`	not enough information for calculation
`n`	never trust this key
`m`	marginally trusted
`f`	fully trusted
`u`	ultimately trusted

The following uploads my key "1DD8D791" to the popular keyserver "hkp://keys.gnupg.net".

$ gpg --keyserver hkp://keys.gnupg.net --send-keys 1DD8D791

A good default keyserver set up in "~/.gnupg/gpg.conf" (or old location "~/.gnupg/options") contains the following.

keyserver hkp://keys.gnupg.net

The following obtains unknown keys from the keyserver.

$ gpg --list-sigs --with-colons | grep '^sig.*\[User ID not found\]' |\
  cut -d ':' -f 5| sort | uniq | xargs gpg --recv-keys

There was a bug in OpenPGP Public Key Server (pre version 0.9.6) which corrupted keys with more than 2 subkeys. The newer gnupg (>1.2.1-2) package can handle these corrupted subkeys. See gpg(1) under the "--repair-pks-subkey-bug" option.

10.4.2. Using GnuPG on files

Here are examples for using GNU Privacy Guard commands on files.

Table 10.13. List of GNU Privacy Guard commands on files

command	description
`gpg -a -s file`	sign file into ASCII armored file.asc
`gpg --armor --sign file`	, ,
`gpg --clearsign file`	clear-sign message
`gpg --clearsign file\|mail [email protected]`	mail a clear-signed message to `[email protected]`
`gpg --clearsign --not-dash-escaped patchfile`	clear-sign patchfile
`gpg --verify file`	verify clear-signed file
`gpg -o file.sig -b file`	create detached signature
`gpg -o file.sig --detach-sig file`	, ,
`gpg --verify file.sig file`	verify file with file.sig
`gpg -o crypt_file.gpg -r name -e file`	public-key encryption intended for name from file to binary crypt_file.gpg
`gpg -o crypt_file.gpg --recipient name --encrypt file`	, ,
`gpg -o crypt_file.asc -a -r name -e file`	public-key encryption intended for name from file to ASCII armored crypt_file.asc
`gpg -o crypt_file.gpg -c file`	symmetric encryption from file to crypt_file.gpg
`gpg -o crypt_file.gpg --symmetric file`	, ,
`gpg -o crypt_file.asc -a -c file`	symmetric encryption from file to ASCII armored crypt_file.asc
`gpg -o file -d crypt_file.gpg -r name`	decryption
`gpg -o file --decrypt crypt_file.gpg`	, ,

10.4.3. Using GnuPG with Mutt

Add the following to "~/.muttrc" to keep a slow GnuPG from automatically starting, while allowing it to be used by typing "S" at the index menu.

macro index S ":toggle pgp_verify_sig\n"
set pgp_verify_sig=no

10.4.4. Using GnuPG with Vim

The gnupg plugin lets you run GnuPG transparently for files with the extensions ".gpg", ".asc", and ".pgp".

# aptitude install vim-scripts vim-addon-manager
$ vim-addons install gnupg

10.4.5. The MD5 sum

md5sum(1) provides a utility to make a digest file using the method in RFC 1321 and to verify each file with it.

$ md5sum foo bar >baz.md5
$ cat baz.md5
d3b07384d113edec49eaa6238ad5ff00  foo
c157a79031e1c40f85931829bc5fc552  bar
$ md5sum -c baz.md5
foo: OK
bar: OK

	Note
	The computation for the MD5 sum is less CPU intensive than the one for the cryptographic signature by GNU Privacy Guard (GnuPG). Usually, only the top level digest file is cryptographically signed to ensure data integrity.
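
To see what a failed verification looks like, the following sketch modifies one file after the digest file has been made. It assumes md5sum(1) from coreutils; the file names follow the example above, and the appended text is arbitrary.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
echo foo > foo
echo bar > bar

# Record the digests, then modify one of the files.
md5sum foo bar > baz.md5
echo tampered >> bar

# md5sum -c reports each file and exits non-zero if any digest differs.
if md5sum -c baz.md5; then status=intact; else status=modified; fi
echo "status: $status"
```

The exit status makes this check easy to use from scripts, e.g., before unpacking a downloaded archive.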

10.5. Source code merge tools

There are many merge tools for the source code. The following commands caught my eye.

Table 10.14. List of source code merge tools

command	package	popcon	size	description
diff(1)	`diffutils`	http://qa.debian.org/popcon.php?package=diffutils	1118	compare files line by line
diff3(1)	`diffutils`	http://qa.debian.org/popcon.php?package=diffutils	1118	compare and merge three files line by line
vimdiff(1)	`vim`	http://qa.debian.org/popcon.php?package=vim	1873	compare 2 files side by side in vim
patch(1)	`patch`	http://qa.debian.org/popcon.php?package=patch	218	apply a diff file to an original
dpatch(1)	`dpatch`	http://qa.debian.org/popcon.php?package=dpatch	237	manage series of patches for Debian package
diffstat(1)	`diffstat`	http://qa.debian.org/popcon.php?package=diffstat	45	produce a histogram of changes by the diff
combinediff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	create a cumulative patch from two incremental patches
dehtmldiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	extract a diff from an HTML page
filterdiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	extract or exclude diffs from a diff file
fixcvsdiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	fix diff files created by CVS that patch(1) mis-interprets
flipdiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	exchange the order of two patches
grepdiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	show which files are modified by a patch matching a regex
interdiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	show differences between two unified diff files
lsdiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	show which files are modified by a patch
recountdiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	recompute counts and offsets in unified context diffs
rediff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	fix offsets and counts of a hand-edited diff
splitdiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	separate out incremental patches
unwrapdiff(1)	`patchutils`	http://qa.debian.org/popcon.php?package=patchutils	221	demangle patches that have been word-wrapped
wiggle(1)	`wiggle`	http://qa.debian.org/popcon.php?package=wiggle	203	apply rejected patches
quilt(1)	`quilt`	http://qa.debian.org/popcon.php?package=quilt	814	manage series of patches
meld(1)	`meld`	http://qa.debian.org/popcon.php?package=meld	2023	compare and merge files (GTK)
xxdiff(1)	`xxdiff`	http://qa.debian.org/popcon.php?package=xxdiff	1090	compare and merge files (plain X)
dirdiff(1)	`dirdiff`	http://qa.debian.org/popcon.php?package=dirdiff	224	display differences and merge changes between directory trees
docdiff(1)	`docdiff`	http://qa.debian.org/popcon.php?package=docdiff	692	compare two files word by word / char by char
imediff2(1)	`imediff2`	http://qa.debian.org/popcon.php?package=imediff2	76	interactive full screen 2-way merge tool
makepatch(1)	`makepatch`	http://qa.debian.org/popcon.php?package=makepatch	148	generate extended patch files
applypatch(1)	`makepatch`	http://qa.debian.org/popcon.php?package=makepatch	148	apply extended patch files
wdiff(1)	`wdiff`	http://qa.debian.org/popcon.php?package=wdiff	900	display word differences between text files

10.5.1. Extracting differences for source files

One of the following procedures extracts differences between two source files and creates a unified diff file, "file.patch0" or "file.patch1", depending on the file location.

$ diff -u file.old file.new > file.patch0
$ diff -u old/file new/file > file.patch1
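
When diff(1) is used from a script, note that it exits with status 1 when the files differ, which is not an error. The following is a small sketch with made-up file contents; the grep(1) patterns count changed lines while skipping the "---" and "+++" header lines.

```shell
# Work in a scratch directory with two versions of a file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'one\ntwo\n' > file.old
printf 'one\n2\n' > file.new

# Exit status 0 means identical, 1 means different.
if diff -u file.old file.new > file.patch0; then result=identical; else result=different; fi

# Count added and removed lines in the unified diff.
added=$(grep -c '^+[^+]' file.patch0)
removed=$(grep -c '^-[^-]' file.patch0)
echo "$result: +$added -$removed"
```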

10.5.2. Merging updates for source files

The diff file (alternatively called patch file) is used to send a program update. The receiving party applies this update to another file by the following.

$ patch -p0 file < file.patch0
$ patch -p1 file < file.patch1

10.5.3. Updating via 3-way-merge

If you have three versions of a source code, you can perform 3-way-merge effectively using diff3(1) by the following.

$ diff3 -m file.mine file.old file.yours > file
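
A minimal sketch of such a merge follows, with made-up contents in which "file.mine" and "file.yours" change different lines of "file.old", so that diff3(1) can merge them without conflict.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# Common ancestor and two independently edited versions.
printf 'a\nb\nc\nd\ne\n' > file.old
printf 'A\nb\nc\nd\ne\n' > file.mine
printf 'a\nb\nc\nd\nE\n' > file.yours

# Merge both sets of changes relative to the common ancestor.
diff3 -m file.mine file.old file.yours > file
merged=$(cat file)
echo "$merged"
```

If both versions had changed the same line, the output would contain conflict markers and diff3(1) would exit with status 1.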

10.6. Version control systems

Here is a summary of the version control systems (VCS) on the Debian system.

	Note
	If you are new to VCS, you should start learning with Git, which is growing fast in popularity.

Table 10.15. List of version control system tools

package	popcon	size	tool	VCS type	comment
`cssc`	http://qa.debian.org/popcon.php?package=cssc	2240	CSSC	local	clone of the Unix SCCS (deprecated)
`rcs`	http://qa.debian.org/popcon.php?package=rcs	1101	RCS	local	"Unix SCCS done right"
`cvs`	http://qa.debian.org/popcon.php?package=cvs	4059	CVS	remote	previous standard remote VCS
`subversion`	http://qa.debian.org/popcon.php?package=subversion	4107	Subversion	remote	"CVS done right", the new de facto standard remote VCS
`git`	http://qa.debian.org/popcon.php?package=git	13073	Git	distributed	fast DVCS in C (used by the Linux kernel and others)
`mercurial`	http://qa.debian.org/popcon.php?package=mercurial	369	Mercurial	distributed	DVCS in Python and some C
`bzr`	http://qa.debian.org/popcon.php?package=bzr	65	Bazaar	distributed	DVCS influenced by `tla` written in Python (used by Ubuntu)
`darcs`	http://qa.debian.org/popcon.php?package=darcs	13347	Darcs	distributed	DVCS with smart algebra of patches (slow)
`tla`	http://qa.debian.org/popcon.php?package=tla	881	GNU arch	distributed	DVCS mainly by Tom Lord (Historic)
`monotone`	http://qa.debian.org/popcon.php?package=monotone	5150	Monotone	distributed	DVCS in C++
`tkcvs`	http://qa.debian.org/popcon.php?package=tkcvs	2476	CVS, …	remote	GUI display of VCS (CVS, Subversion, RCS) repository tree
`gitk`	http://qa.debian.org/popcon.php?package=gitk	1045	Git	distributed	GUI display of VCS (Git) repository tree

VCS is sometimes known as revision control system (RCS), or software configuration management (SCM).

Distributed VCS such as Git is the tool of choice these days. CVS and Subversion may still be useful to join some existing open source program activities.

Debian provides free VCS services via Debian Alioth service. It supports practically all VCSs. Its documentation can be found at http://wiki.debian.org/Alioth .

There are a few basics for creating a shared access VCS archive.

Use "umask 002" (see Section 1.2.4, “Control of permissions for newly created files: umask”)
Make all VCS archive files belong to a pertinent group
Enable set group ID on all VCS archive directories (BSD-like file creation scheme, see Section 1.2.3, “Filesystem permissions”)
Make users sharing the VCS archive belong to the group
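
The effect of "umask 002" together with the set group ID bit can be sketched as follows. The directory name "archive" is hypothetical, and the group is simply the current user's own group here; on a real system you would also run chgrp(1) to assign the pertinent group.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# New files become group writable with this umask.
umask 002

# Create the archive directory with the set group ID bit.
mkdir archive
chmod 2775 archive

# Files and subdirectories created inside follow the shared scheme.
touch archive/file
mkdir archive/subdir
dir_mode=$(stat -c %a archive)
file_mode=$(stat -c %a archive/file)
subdir_mode=$(stat -c %a archive/subdir)
echo "archive=$dir_mode file=$file_mode subdir=$subdir_mode"
```

New subdirectories inherit the set group ID bit, so the group ownership propagates down the tree without further chmod(1) calls.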

10.6.1. Comparison of VCS commands

Here is an oversimplified comparison of native VCS commands to provide the big picture. The typical command sequence may require options and arguments.

Table 10.16. Comparison of native VCS commands

CVS	Subversion	Git	function
`cvs init`	`svnadmin create`	`git init`	create the (local) repository
`cvs login`	-	-	login to the remote repository
`cvs co`	`svn co`	`git clone`	check out the remote repository as the working tree
`cvs up`	`svn up`	`git pull`	update the working tree by merging the remote repository
`cvs add`	`svn add`	`git add .`	add file(s) in the working tree to the VCS
`cvs rm`	`svn rm`	`git rm`	remove file(s) in working tree from the VCS
`cvs ci`	`svn ci`	-	commit changes to the remote repository
-	-	`git commit -a`	commit changes to the local repository
-	-	`git push`	update the remote repository by the local repository
`cvs status`	`svn status`	`git status`	display the working tree status from the VCS
`cvs diff`	`svn diff`	`git diff`	diff <reference_repository> <working_tree>
-	-	`git repack -a -d; git prune`	repack the local repository into single pack
`tkcvs`	`tkcvs`	`gitk`	GUI display of VCS repository tree

	Caution
	Invoking a `git` subcommand directly as "`git-xyz`" from the command line has been deprecated since early 2006.

	Tip
	GUI tools such as tkcvs(1) and gitk(1) really help you with tracking the revision history of files. The web interface provided by many public archives for browsing their repositories is also quite useful.

	Tip
	Git can work directly with different VCS repositories such as ones provided by CVS and Subversion, and provides the local repository for local changes with `git-cvs` and `git-svn` packages. See git for CVS users, and Section 10.9.4, “Git for the Subversion repository”.

	Tip
	Git has commands which have no equivalents in CVS and Subversion: "fetch", "rebase", "cherry-pick", …
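
The Git column of the table above can be walked through locally. The following is only a sketch, assuming the git package is installed; the bare repository "remote.git" stands in for a remote server, and the user name and e-mail address are placeholders.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# Create a bare repository playing the role of the remote repository.
git init --quiet --bare remote.git

# Check it out as the working tree.
git clone --quiet remote.git work 2>/dev/null
cd work

# Add a file and commit it to the local repository.
echo hello > README
git add .
git -c user.name="Example User" -c user.email="user@example.org" \
    commit --quiet -m "Add README"

# Update the remote repository from the local repository.
git push --quiet origin HEAD 2>/dev/null

# Count the commits that arrived in the remote repository.
commits=$(git -C ../remote.git rev-list --count --all)
echo "commits in remote: $commits"
```

Unlike "cvs ci" and "svn ci", the commit and the push are separate steps, which is why the table lists them on separate rows.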

10.7. CVS

See the following.

cvs(1)
"/usr/share/doc/cvs/html-cvsclient"
"/usr/share/doc/cvs/html-info"
"/usr/share/doc/cvsbook"
"info cvs"

10.7.1. Configuration of CVS repository

The following configuration allows commits to the CVS repository only by a member of the "src" group, and administration of CVS only by a member of the "staff" group, thus reducing the chance of shooting oneself.

# umask 002; mkdir -p /srv/cvs/project
# export CVSROOT=/srv/cvs/project
# cd $CVSROOT
# chown root:src .
# chmod 2775 .
# cvs -d $CVSROOT init
# cd CVSROOT
# chown -R root:staff .
# chmod 2775 .
# touch val-tags
# chmod 664 history val-tags
# chown root:src history val-tags

	Tip
	You may restrict creation of new project by changing the owner of "`$CVSROOT`" directory to "`root:staff`" and its permission to "`3775`".

10.7.2. Local access to CVS

The default CVS repository is pointed to by "$CVSROOT". The following sets up "$CVSROOT" for the local access.

$ export CVSROOT=/srv/cvs/project

10.7.3. Remote access to CVS with pserver

Many public CVS servers provide read-only remote access to them with account name "anonymous" via pserver service. For example, Debian web site contents are maintained by webwml project via CVS at Debian alioth service. The following sets up "$CVSROOT" for the remote access to this CVS repository.

$ export CVSROOT=:pserver:[email protected]:/cvsroot/webwml
$ cvs login

	Note
	Since pserver is insecure and prone to eavesdropping attacks, write access is usually disabled by server administrators.

10.7.4. Remote access to CVS with ssh

The following sets up "$CVS_RSH" and "$CVSROOT" for the remote access to the CVS repository by webwml project with SSH.

$ export CVS_RSH=ssh
$ export CVSROOT=:ext:[email protected]:/cvs/webwml

You can also use public key authentication for SSH which eliminates the remote password prompt.

10.7.5. Importing a new source to CVS

Create a new local source tree location at "~/path/to/module1" by the following.

$ mkdir -p ~/path/to/module1; cd ~/path/to/module1

Populate a new local source tree under "~/path/to/module1" with files.

Import it to CVS with the following parameters.

Module name: "module1"
Vendor tag: "Main-branch" (tag for the entire branch)
Release tag: "Release-initial" (tag for a specific release)

$ cd ~/path/to/module1
$ cvs import -m "Start module1" module1 Main-branch Release-initial
$ cd ..; rm -Rf module1 # optional

10.7.6. File permissions in CVS repository

CVS does not overwrite the current repository file but replaces it with another one. Thus, write permission to the repository directory is critical. For every new module, e.g., "module1" in the repository at "/srv/cvs/project", run the following to ensure this condition if needed.

# cd /srv/cvs/project
# chown -R root:src module1
# chmod -R ug+rwX   module1
# chmod    2775     module1

10.7.7. Work flow of CVS

Here is an example of typical work flow using CVS.

Check all available modules from CVS project pointed by "$CVSROOT" by the following.

$ cvs rls
CVSROOT
module1
module2
...

Checkout "module1" to its default directory "./module1" by the following.

$ cd ~/path/to
$ cvs co module1
$ cd module1

Make changes to the content as needed.

Check changes by making "diff -u [repository] [local]" equivalent by the following.

$ cvs diff -u

You find that you broke some file "file_to_undo" severely but other files are fine.

Overwrite "file_to_undo" file with the clean copy from CVS by the following.

$ cvs up -C file_to_undo

Save the updated local source tree to CVS by the following.

$ cvs ci -m "Describe change"

Create and add "file_to_add" file to CVS by the following.

$ vi file_to_add
$ cvs add file_to_add
$ cvs ci -m "Added file_to_add"

Merge the latest version from CVS by the following.

$ cvs up -d

Watch out for lines starting with "C filename" which indicate conflicting changes.

Look for unmodified code in ".#filename.version".

Search for "<<<<<<<" and ">>>>>>>" in files for conflicting changes.

Edit files to fix conflicts as needed.

Add a release tag "Release-1" by the following.

$ cvs ci -m "last commit for Release-1"
$ cvs tag Release-1

Edit further.

Remove the release tag "Release-1" by the following.

$ cvs tag -d Release-1

Check in changes to CVS by the following.

$ cvs ci -m "real last commit for Release-1"

Re-add the release tag "Release-1" to updated CVS HEAD of main by the following.

$ cvs tag Release-1

Create a branch with a sticky branch tag "Release-initial-bugfixes" from the original version pointed by the tag "Release-initial" and check it out to "~/path/to/old" directory by the following.

$ cvs rtag -b -r Release-initial Release-initial-bugfixes module1
$ cd ~/path/to
$ cvs co -r Release-initial-bugfixes -d old module1
$ cd old

	Tip
	Use "`-D 2005-12-20`" (ISO 8601 date format) instead of "`-r Release-initial`" to specify particular date as the branch point.

Work on this local source tree having the sticky tag "Release-initial-bugfixes" which is based on the original version.

Work on this branch by yourself … until someone else joins this "Release-initial-bugfixes" branch.

Sync with files modified by others on this branch while creating new directories as needed by the following.

$ cvs up -d

Edit files to fix conflicts as needed.

Check in changes to CVS by the following.

$ cvs ci -m "checked into this branch"

Update the local tree to HEAD of main while removing the sticky tag ("-A") and without keyword expansion ("-kk") by the following.

$ cvs up -d -kk -A

Update the local tree (content = HEAD of main) by merging from the "Release-initial-bugfixes" branch and without keyword expansion by the following.

$ cvs up -d -kk -j Release-initial-bugfixes

Fix conflicts with editor.

Check in changes to CVS by the following.

$ cvs ci -m "merged Release-initial-bugfixes"

Make an archive by the following.

$ cd ..
$ mv old old-module1-bugfixes
$ tar -cvzf old-module1-bugfixes.tar.gz old-module1-bugfixes
$ rm -rf old-module1-bugfixes
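Since "rm -rf" cannot be undone, a more cautious variant removes the directory only after the archive has been read back successfully. The following is a minimal sketch; the function name "archive_and_remove" is hypothetical.

```shell
# Archive a directory and remove it only if the archive can be read back.
# "archive_and_remove" is a hypothetical helper name.
archive_and_remove() {
  dir=$1
  # create the gzip-compressed archive
  tar -czf "$dir.tar.gz" "$dir" || return 1
  # list the archive contents to confirm that it is readable
  tar -tzf "$dir.tar.gz" >/dev/null || return 1
  # only now remove the original directory
  rm -rf "$dir"
}

# Usage (from the parent directory of the renamed working copy):
# archive_and_remove old-module1-bugfixes
```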

	Tip
	The "`cvs up`" command can take the "`-d`" option to create new directories and the "`-P`" option to prune empty directories.

	Tip
	You can check out only a subdirectory of "`module1`" by providing its name as "`cvs co module1/subdir`".

Table 10.17. Notable options for CVS commands (use as first argument(s) to cvs(1))

option	meaning
`-n`	dry run, no effect
`-t`	display messages showing steps of cvs activity

10.7.8. Latest files from CVS

To get the latest files from CVS, export them with the date "tomorrow" by the following.

$ cvs ex -D tomorrow module_name

10.7.9. Administration of CVS

Add module alias "mx" to a CVS project (local server) by the following.

$ export CVSROOT=/srv/cvs/project
$ cvs co CVSROOT/modules
$ cd CVSROOT
$ echo "mx -a module1" >>modules
$ cvs ci -m "Now mx is an alias for module1"
$ cvs release -d .

Now, you can check out "module1" (alias: "mx") from CVS to "new" directory by the following.

$ cvs co -d new mx
$ cd new

	Note
	In order to perform the above procedure, you should have appropriate file permissions.

10.7.10. Execution bit for CVS checkout

When you checkout files from CVS, their execution permission bit is retained.

Whenever you see execution permission problems in a checked out file, e.g. "filename", change its permission in the corresponding CVS repository by the following to fix it.

# chmod ugo-x filename

10.8. Subversion

Subversion is a recent-generation version control system replacing the older CVS. It has most of CVS's features except tags and branches, which it emulates with directory copies.

You need to install subversion, libapache2-svn and subversion-tools packages to set up a Subversion server.

10.8.1. Configuration of Subversion repository

Currently, the subversion package does not set up a repository, so one must set it up manually. One possible location for a repository is in "/srv/svn/project".

Create a directory by the following.

# mkdir -p        /srv/svn/project

Create the repository database by the following.

# svnadmin create /srv/svn/project

10.8.2. Access to Subversion via Apache2 server

If you access the Subversion repository only via the Apache2 server, you just need to make the repository writable only by the WWW server by the following.

# chown -R www-data:www-data /srv/svn/project

Add (or uncomment) the following in "/etc/apache2/mods-available/dav_svn.conf" to allow access to the repository via user authentication.

<Location /project>
  DAV svn
  SVNPath /srv/svn/project
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /etc/subversion/passwd
  <LimitExcept GET PROPFIND OPTIONS REPORT>
    Require valid-user
  </LimitExcept>
</Location>

Create a user authentication file by the following.

# htpasswd2 -c /etc/subversion/passwd some-username

Restart Apache2.

Your new Subversion repository is accessible at URL "http://localhost/project" and "http://example.com/project" from svn(1) (assuming the URL of your web server is "http://example.com/").

10.8.3. Local access to Subversion by group

The following sets up the Subversion repository for local access by a group, e.g. src.

# chmod  2775     /srv/svn/project
# chown -R root:src /srv/svn/project
# chmod -R ug+rwX   /srv/svn/project

Your new Subversion repository is group accessible at URL "file://localhost/srv/svn/project" or "file:///srv/svn/project" from svn(1) for local users belonging to the src group. You must run commands such as svn, svnserve, svnlook, and svnadmin under "umask 002" to ensure group access.
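The effect of the umask on group access can be seen with a small experiment in a temporary directory. This is only an illustration; the file names are arbitrary, the wrapper name "svn_g" is hypothetical, and "stat -c" assumes GNU coreutils.

```shell
# Show how the umask affects the permissions of newly created files.
# Runs in a temporary directory; the file names are arbitrary.
demo_dir=$(mktemp -d)

# A subshell keeps the umask change from leaking into the calling shell.
(umask 022; touch "$demo_dir/with_022")   # group cannot write
(umask 002; touch "$demo_dir/with_002")   # group write permission is kept

# Print the octal mode and name of each file.
stat -c '%a %n' "$demo_dir/with_022" "$demo_dir/with_002"

# Hypothetical wrapper: run svn(1) under umask 002 for one command only.
svn_g() { (umask 002; svn "$@"); }
```

The first file is created with mode 644 and the second with mode 664, which is why files written under "umask 002" remain writable by the other group members.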

10.8.4. Remote access to Subversion via SSH

A group accessible Subversion repository located at "example.com:/srv/svn/project" for SSH can be accessed from svn(1) at URL "svn+ssh://example.com/srv/svn/project".

10.8.5. Subversion directory structure

Many projects use a directory tree similar to the following for Subversion to compensate for its lack of branches and tags.

  ----- module1
    |   |-- branches
    |   |-- tags
    |   |   |-- release-1.0
    |   |   `-- release-2.0
    |   |
    |   `-- trunk
    |       |-- file1
    |       |-- file2
    |       `-- file3
    |
    `-- module2

	Tip
	You must use the "`svn copy …`" command to mark branches and tags. This ensures that Subversion records the modification history of files properly and saves storage space.

10.8.6. Importing a new source to Subversion

Create a new local source tree location at "~/path/to/module1" by the following.

$ mkdir -p ~/path/to/module1; cd ~/path/to/module1

Populate a new local source tree under "~/path/to/module1" with files.

Import it to Subversion with the following parameters.

Module name: "module1"
Subversion site URL: "file:///srv/svn/project"
Subversion directory: "module1/trunk"
Subversion tag: "module1/tags/Release-initial"

$ cd ~/path/to/module1
$ svn import file:///srv/svn/project/module1/trunk -m "Start module1"
$ svn cp --parents file:///srv/svn/project/module1/trunk file:///srv/svn/project/module1/tags/Release-initial

Alternatively, give the local path explicitly by the following.

$ svn import ~/path/to/module1 file:///srv/svn/project/module1/trunk -m "Start module1"
$ svn cp --parents file:///srv/svn/project/module1/trunk file:///srv/svn/project/module1/tags/Release-initial

	Tip
	You can replace URLs such as "`file:///…`" by any other URL formats such as "`http://…`" and "`svn+ssh://…`".
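To rehearse the import and tagging steps without touching "/srv/svn", they can be run against a throwaway repository. The following is a minimal sketch under that assumption; the function name "svn_import_demo", the temporary paths, and the sample file are hypothetical.

```shell
# Practice the import and tagging steps against a throwaway repository.
# All paths are temporary; nothing under /srv/svn is touched.
# "svn_import_demo" is a hypothetical helper name.
svn_import_demo() {
  work=$(mktemp -d) || return 1
  repo="$work/repo"       # throwaway repository
  src="$work/module1"     # throwaway source tree

  svnadmin create "$repo" || return 1
  mkdir -p "$src" && echo "hello" > "$src/file1"

  # Import the tree as module1/trunk (missing parent directories are created).
  svn import -q "$src" "file://$repo/module1/trunk" -m "Start module1" || return 1

  # Tag it; --parents creates the not-yet-existing "tags" directory.
  svn cp -q --parents -m "Tag Release-initial" \
    "file://$repo/module1/trunk" \
    "file://$repo/module1/tags/Release-initial" || return 1

  # Show the resulting repository layout.
  svn list -R "file://$repo"
}

# Usage: svn_import_demo
```

The "-m" option on "svn cp" supplies the log message directly, so no editor is started for the tagging commit.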

10.8.7. Work flow of Subversion

Here is an example of a typical work flow using Subversion with its native client.

	Tip
	Client commands offered by the `git-svn` package may offer an alternative work flow for Subversion using the `git` command. See Section 10.9.4, “Git for the Subversion repository”.

List all available modules of the Subversion project pointed to by URL "file:///srv/svn/project" by the following.

$ svn list file:///srv/svn/project
module1
module2
...

Checkout "module1/trunk" to a directory "module1" by the following.

$ cd ~/path/to
$ svn co file:///srv/svn/project/module1/trunk module1
$ cd module1

Make changes to the content as needed.

Check changes with the equivalent of "diff -u [repository] [local]" by the following.

$ svn diff

You find that you broke some file "file_to_undo" severely but other files are fine.

Overwrite "file_to_undo" file with the clean copy from Subversion by the following.

$ svn revert file_to_undo

Save the updated local source tree to Subversion by the following.

$ svn ci -m "Describe change"

Create and add "file_to_add" file to Subversion by the following.

$ vi file_to_add
$ svn add file_to_add
$ svn ci -m "Added file_to_add"

Merge the latest version from Subversion by the following.

$ svn up

Watch out for lines starting with "C filename", which indicate conflicting changes.

Look for unmodified code in, e.g., "filename.r6", "filename.r9", and "filename.mine".

Search for "<<<<<<<" and ">>>>>>>" in files for conflicting changes.

Edit files to fix conflicts as needed.

Add a release tag "Release-1" by the following.

$ svn ci -m "last commit for Release-1"
$ svn cp file:///srv/svn/project/module1/trunk file:///srv/svn/project/module1/tags/Release-1

Edit further.

Remove the release tag "Release-1" by the following.

$ svn rm file:///srv/svn/project/module1/tags/Release-1

Check in changes to Subversion by the following.

$ svn ci -m "real last commit for Release-1"

Re-add the release tag "Release-1" from updated Subversion HEAD of trunk by the following.

$ svn cp file:///srv/svn/project/module1/trunk file:///srv/svn/project/module1/tags/Release-1

Create a branch with a path "module1/branches/Release-initial-bugfixes" from the original version pointed by the path "module1/tags/Release-initial" and check it out to "~/path/to/old" directory by the following.

$ svn cp file:///srv/svn/project/module1/tags/Release-initial file:///srv/svn/project/module1/branches/Release-initial-bugfixes
$ cd ~/path/to
$ svn co file:///srv/svn/project/module1/branches/Release-initial-bugfixes old
$ cd old

	Tip
	Use "`module1/trunk@{2005-12-20}`" (ISO 8601 date format) instead of "`module1/tags/Release-initial`" to specify a particular date as the branch point.

Work on this local source tree pointing to branch "Release-initial-bugfixes" which is based on the original version.

Work on this branch by yourself … until someone else joins this "Release-initial-bugfixes" branch.

Sync with files modified by others on this branch by the following.

$ svn up

Edit files to fix conflicts as needed.

Check in changes to Subversion by the following.

$ svn ci -m "checked into this branch"

Update the local tree with HEAD of trunk by the following.

$ svn switch file:///srv/svn/project/module1/trunk

Update the local tree (content = HEAD of trunk) by merging from the "Release-initial-bugfixes" branch by the following.

$ svn merge file:///srv/svn/project/module1/branches/Release-initial-bugfixes

Fix conflicts with editor.

Check in changes to Subversion by the following.

$ svn ci -m "merged Release-initial-bugfixes"

Make an archive by the following.

$ cd ..
$ mv old old-module1-bugfixes
$ tar -cvzf old-module1-bugfixes.tar.gz old-module1-bugfixes
$ rm -rf old-module1-bugfixes

	Tip
	You can replace URLs such as "`file:///…`" by any other URL formats such as "`http://…`" and "`svn+ssh://…`".

	Tip
	You can check out only a subdirectory of "`module1`" by providing its name as "`svn co file:///srv/svn/project/module1/trunk/subdir module1/subdir`", etc.

Table 10.18. Notable options for Subversion commands (use as first argument(s) to svn(1))

option	meaning
`--dry-run`	dry run, no effect
`-v`	display detailed messages of svn activity

10.9. Git

Git handles both local and remote source code management. This means that you can record source code changes without network connectivity to the remote repository.

10.9.1. Configuration of Git client

You may wish to set several global configuration values in "~/.gitconfig", such as the name and email address used by Git, by the following.

$ git config --global user.name "Name Surname"
$ git config --global user.email yourname@example.com

If you are too used to CVS or Subversion commands, you may wish to set several command aliases by the following.

$ git config --global alias.ci "commit -a"
$ git config --global alias.co checkout

You can check your global configuration by the following.

$ git config --global --list
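If you prefer to try these settings before changing your real "~/.gitconfig", one possible approach is to point HOME at a temporary directory for the duration of the experiment. This is only a sketch; the variable name "sandbox_home" is hypothetical.

```shell
# Try the settings in a sandbox so that the real ~/.gitconfig is not modified.
# HOME is pointed at a temporary directory inside the subshell only.
sandbox_home=$(mktemp -d)

(
  export HOME="$sandbox_home"
  unset XDG_CONFIG_HOME   # make sure git writes to $HOME/.gitconfig

  git config --global user.name "Name Surname"
  git config --global alias.ci "commit -a"
  git config --global alias.co checkout

  # Each alias expands to the command stored under alias.<name>.
  git config --global --get alias.ci

  # Show everything recorded in the sandboxed global configuration.
  git config --global --list
)
```

With these aliases, "git ci" behaves as "git commit -a" and "git co" as "git checkout".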

10.9.2. Git references

See the following.

manpage: git(1) (/usr/share/doc/git-doc/git.html)
Git User's Manual (/usr/share/doc/git-doc/user-manual.html)
A tutorial introduction to git (/usr/share/doc/git-doc/gittutorial.html)
A tutorial introduction to git: part two (/usr/share/doc/git-doc/gittutorial-2.html)
Everyday GIT With 20 Commands Or So (/usr/share/doc/git-doc/everyday.html)
git for CVS users (/usr/share/doc/git-doc/gitcvs-migration.html)
- This also describes how to set up server like CVS and extract old data from CVS into Git.
Other git resources available on the web
- Git - SVN Crash Course
- Git Magic (/usr/share/doc/gitmagic/html/index.html)

git-gui(1) and gitk(1) commands make using Git very easy.

	Warning
	Do not use a tag string containing spaces even if some tools such as gitk(1) allow you to use it. It may choke some other `git` commands.

10.9.3. Git commands

Even if your upstream uses a different VCS, it may be a good idea to use git(1) for local activity since you can manage your local copy of the source tree without a network connection to the upstream. Here are some packages and commands used with git(1).

Table 10.19. List of git related packages and commands

command	package	popcon	size	description
N/A	`git-doc`	http://qa.debian.org/popcon.php?package=git-doc	8398	official documentation for Git
N/A	`gitmagic`	http://qa.debian.org/popcon.php?package=gitmagic	924	"Git Magic", easier to understand guide for Git
git(1)	`git`	http://qa.debian.org/popcon.php?package=git	13073	Git, the fast, scalable, distributed revision control system
gitk(1)	`gitk`	http://qa.debian.org/popcon.php?package=gitk	1045	GUI Git repository browser with history
git-gui(1)	`git-gui`	http://qa.debian.org/popcon.php?package=git-gui	1666	GUI for Git (No history)
git-svnimport(1)	`git-svn`	http://qa.debian.org/popcon.php?package=git-svn	686	import the data out of Subversion into Git
git-svn(1)	`git-svn`	http://qa.debian.org/popcon.php?package=git-svn	686	provide bidirectional operation between the Subversion and Git
git-cvsimport(1)	`git-cvs`	http://qa.debian.org/popcon.php?package=git-cvs	779	import the data out of CVS into Git
git-cvsexportcommit(1)	`git-cvs`	http://qa.debian.org/popcon.php?package=git-cvs	779	export a commit to a CVS checkout from Git
git-cvsserver(1)	`git-cvs`	http://qa.debian.org/popcon.php?package=git-cvs	779	CVS server emulator for Git
git-send-email(1)	`git-email`	http://qa.debian.org/popcon.php?package=git-email	529	send a collection of patches as email from the Git
stg(1)	`stgit`	http://qa.debian.org/popcon.php?package=stgit	1628	quilt on top of git (Python)
git-buildpackage(1)	`git-buildpackage`	http://qa.debian.org/popcon.php?package=git-buildpackage	2219	automate the Debian packaging with the Git
guilt(7)	`guilt`	http://qa.debian.org/popcon.php?package=guilt	360	quilt on top of git (SH/AWK/SED/…)

	Tip
	With git(1), you work on a local branch with many commits and use something like "`git rebase -i master`" to reorganize change history later. This enables you to make clean change history. See git-rebase(1) and git-cherry-pick(1).
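The idea can be rehearsed in a scratch repository. The following sketch folds three small commits on a topic branch into one. It is not the usual interactive session: GIT_SEQUENCE_EDITOR is set to a sed(1) command (GNU sed assumed for "-i") so that the example runs unattended, and the repository, branch name "topic", and identity are placeholders.

```shell
# Fold three small commits on a topic branch into one, in a sandbox.
# Normally "git rebase -i" opens an editor on the list of commits; here
# GIT_SEQUENCE_EDITOR edits that list with sed so the example runs unattended.
rebase_demo=$(mktemp -d)
cd "$rebase_demo"
git init -q .
git symbolic-ref HEAD refs/heads/master     # name the initial branch "master"
git config user.name "Demo User"            # placeholder identity for the sandbox
git config user.email "demo@example.com"

echo base > file && git add file && git commit -q -m "base"

git checkout -q -b topic
for n in 1 2 3; do
  echo "step $n" >> file
  git commit -q -a -m "work in progress $n"
done

# Keep the first "pick" and turn the later ones into "fixup".
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/fixup/'" git rebase -q -i master

# List the commits on topic that are not on master.
git log --oneline master..topic
```

In everyday use you would leave GIT_SEQUENCE_EDITOR unset and edit the list of commits by hand.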

	Tip
	When you want to go back to a clean working directory without losing the current state of the working directory, you can use "`git stash`". See git-stash(1).
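A minimal sketch of this in a scratch repository follows; the directory, the file name "notes", and the identity are placeholders.

```shell
# Set unfinished work aside with "git stash" and bring it back, in a sandbox.
stash_demo=$(mktemp -d)
cd "$stash_demo"
git init -q .
git config user.name "Demo User"            # placeholder identity for the sandbox
git config user.email "demo@example.com"
echo "committed" > notes && git add notes && git commit -q -m "initial"

echo "unfinished edit" >> notes     # uncommitted work in progress
git stash -q                        # set it aside; the tree is clean again
git status --porcelain              # prints nothing for a clean tree

git stash pop -q                    # bring the unfinished edit back
git diff --stat                     # the edit to "notes" is listed again
```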

10.9.4. Git for the Subversion repository

You can check out a Subversion repository at "svn+ssh://svn.example.org/project/module/trunk" to a local Git repository at "./dest" and commit back to the Subversion repository. E.g.:

$ git svn clone -s -rHEAD svn+ssh://svn.example.org/project/module dest
$ cd dest
... make changes
$ git commit -a
... keep working locally with git
$ git svn dcommit

	Tip
	The use of "`-rHEAD`" enables us to avoid cloning entire historical contents from the Subversion repository.

10.9.5. Git for recording configuration history

You can manually record the chronological history of configuration using Git tools. Here is a simple example for practice that records the "/etc/apt/" contents.

$ cd /etc/apt/
$ sudo git init
$ sudo chmod 700 .git
$ sudo git add .
$ sudo git commit -a

Commit configuration with description.

Make modification to the configuration files.

$ cd /etc/apt/
$ sudo git commit -a

Commit configuration with description and continue your life.

$ cd /etc/apt/
$ sudo gitk --all

You have full configuration history with you.
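The same procedure can be rehearsed without root privileges on a scratch directory instead of the real "/etc/apt/". This is only a sketch; the directory, the file contents, and the identity are placeholders, and "-m" is used so that no editor is started for the descriptions.

```shell
# Rehearse the procedure on a scratch directory instead of /etc/apt/.
# The directory and the file contents below are placeholders.
conf_demo=$(mktemp -d)
cd "$conf_demo"
echo "deb http://deb.debian.org/debian stable main" > sources.list

git init -q .
chmod 700 .git                       # keep the recorded history private
git config user.name "Demo Admin"    # placeholder identity for the sandbox
git config user.email "admin@example.com"
git add .
git commit -q -m "Record initial configuration"

# Change the configuration, then record the change with a description.
echo "deb http://deb.debian.org/debian stable-updates main" >> sources.list
git commit -q -a -m "Enable stable-updates"

# Show one line per recorded change.
git log --oneline
```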

	Note
	sudo(8) is needed to work with configuration data regardless of its file permissions. For user configuration data, you may skip `sudo`.

	Note
	The "`chmod 700 .git`" command in the above example is needed to protect archive data from unauthorized read access.

	Tip
	For more complete setup for recording configuration history, please look for the `etckeeper` package: Section 9.2.10, “Recording changes in configuration files”.