tar and POSIX tar
GNU tar was based on an early draft of the POSIX 1003.1
ustar standard. GNU extensions to tar, such as the
support for file names longer than 100 characters, use portions of the
tar header record which were specified in that POSIX draft as
unused. Subsequent changes in POSIX have allocated the same parts of
the header record for other purposes. As a result, GNU tar is
incompatible with the current POSIX spec, and with tar programs
that follow it.
We plan to reimplement these GNU extensions in a new way which is
upward compatible with the latest POSIX tar format, but we
don't know when this will be done.
In the meantime, there is simply no telling what might happen if you
read a GNU tar archive that uses the GNU extensions with
some other tar program. So if you want to read the archive
with another tar program, be sure to write it using the
`--old-archive' option (`-o').
@FIXME{is there a way to tell which flavor of tar was used to write a particular archive before you try to read it?}
Traditionally, old tars limit file names to 100 characters. GNU
tar attempted two different approaches to overcome this limit,
using and extending a format specified by a draft of P1003.1.
The first approach was not very successful, and involved `@MaNgLeD@'
file names or the like; the second used `././@LongLink'
and other tricks, with better success. In theory, GNU tar
should be able to handle file names of practically unlimited length.
So, if GNU tar fails to dump and retrieve files whose names have more
than 100 characters, then there is indeed a bug in GNU tar.
Strictly following POSIX, however, the limit was still 100 characters.
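As an illustration of the `././@LongLink' approach, here is a minimal sketch that uses Python's standard tarfile module (not GNU tar itself) to write a GNU-format archive in memory and inspect its first 512-byte header block. It assumes that Python's GNU format mirrors what GNU tar writes; the helper name first_header_fields is hypothetical.

```python
import io
import tarfile


def first_header_fields(member_name):
    """Write one empty member in GNU format and return the
    (name field, typeflag) of the first 512-byte header block."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = 0  # empty member, so no file object is needed
        archive.addfile(info)
    header = buf.getvalue()[:512]
    name_field = header[0:100].rstrip(b"\0")  # name field: offset 0, 100 bytes
    typeflag = header[156:157]                # typeflag: offset 156, 1 byte
    return name_field, typeflag


# A name over 100 characters is preceded by an extra pseudo-member
# whose data block carries the real name.
long_name = "d/" * 60 + "file.txt"
print(first_header_fields(long_name))
print(first_header_fields("short.txt"))
```

For the long name, the first header is the `././@LongLink' pseudo-member with type `L', which is why a tar unaware of the extension may extract it as an ordinary file.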
For various other purposes, GNU tar used areas left unassigned
in the POSIX draft. POSIX later revised the P1003.1 ustar format by
assigning previously unused header fields, in such a way that the upper
limit for file name length was raised to 256 characters. However, the
actual POSIX limit varies between 100 and 256, depending on the
precise location of slashes in the full file name (this is rather ugly).
Since GNU tar uses the same fields for quite different purposes,
it became incompatible with the latest POSIX standards.
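To see why the limit depends on where the slashes fall, consider this sketch of the ustar split into a 155-byte prefix field and a 100-byte name field, joined by a slash that is not stored. The function name split_ustar_name is hypothetical, and a real implementation may prefer a particular split point; this one simply takes the first that fits.

```python
NAME_MAX = 100    # bytes available in the ustar "name" field
PREFIX_MAX = 155  # bytes available in the ustar "prefix" field


def split_ustar_name(path):
    """Return (prefix, name) as bytes if path fits a ustar header, else None."""
    raw = path.encode("utf-8")
    if len(raw) <= NAME_MAX:
        return b"", raw
    # Try each slash as the split point; the slash itself is implied.
    for i, byte in enumerate(raw):
        if byte != ord("/"):
            continue
        prefix, name = raw[:i], raw[i + 1:]
        if 0 < len(prefix) <= PREFIX_MAX and 0 < len(name) <= NAME_MAX:
            return prefix, name
    return None


# 155 + 1 + 100 = 256 characters fit only with a slash in the right place.
print(split_ustar_name("p" * 155 + "/" + "n" * 100) is not None)
# A 103-character name with a badly placed slash does not fit at all.
print(split_ustar_name("d/" + "n" * 101))
```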
For longer or non-fitting file names, we plan to use yet another set
of GNU extensions, but this time, complying with the provisions POSIX
offers for extending the format, rather than conflicting with it.
Whenever an archive uses the old GNU tar extension format or POSIX
extensions, whether for very long file names or other specialities,
the archive becomes non-portable to other tar implementations.
In fact, anything can happen. The most forgiving tars will
merely unpack the file under a wrong name, and maybe create another
file named something like `@LongName', with the true file name
in it. tars that do not protect themselves may crash with a segmentation violation!
Compatibility concerns make all of this more difficult, as we
will have to support all these formats together for a while.
GNU tar should be able to produce and read true POSIX format
files, while being able to detect old GNU tar formats, besides
old V7 format, and process them conveniently. It will take years
before this whole area stabilizes...
There are plans to raise this 100-character limit to 256, and yet produce POSIX
conformant archives. Past 256, I do not know yet if GNU tar
will go non-POSIX again, or merely refuse to archive the file.
There are plans for GNU tar to support the latest POSIX format more
fully, while still being able to read old V7 format, GNU (semi-POSIX plus
extension), as well as full POSIX. One may ask if there is any part of
the POSIX format that we still cannot support. This simple question
has a complex answer. On closer inspection, some strong
limitations may pop up, but so far nothing sounds too difficult
(but see below). I only have these few pages of POSIX describing the
`Extended tar Format' (P1003.1-1990 -- section 10.1.1), and there are
references to other parts of the standard I do not have, which should
normally enforce limitations on stored file names (I suspect things
like fixing what / and NUL mean). There are also
some points which the standard does not make clear. Existing practice
will then drive what I should do.
POSIX mandates that, when a file name cannot fit within 100 to
256 characters (the variance comes from the fact that a / is
ideally needed as the 156th character), or a link name cannot
fit within 100 characters, a warning should be issued and the file
not be stored. Unless some --posix option is given
(or POSIXLY_CORRECT is set), I suspect that GNU tar
should disobey this specification, and automatically switch to using
GNU extensions to overcome file name or link name length limitations.
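The behavior suggested above can be sketched as a small decision function. This is only an illustration of the proposal, not GNU tar's code: the names fits_posix, strict_posix_requested and plan_member are hypothetical, and the returned strings are placeholders for whatever actions a real implementation would take.

```python
import os


def fits_posix(name, linkname=""):
    """True if name fits the 100-byte name field (optionally split at a
    slash into a 155-byte prefix) and linkname fits in 100 bytes."""
    raw = name.encode("utf-8")
    if len(linkname.encode("utf-8")) > 100:
        return False
    if len(raw) <= 100:
        return True
    return any(
        0 < i <= 155 and 0 < len(raw) - i - 1 <= 100
        for i, byte in enumerate(raw)
        if byte == ord("/")
    )


def strict_posix_requested(posix_flag, environ=os.environ):
    """Strict mode: a --posix style flag, or POSIXLY_CORRECT in the environment."""
    return bool(posix_flag or "POSIXLY_CORRECT" in environ)


def plan_member(name, linkname="", strict_posix=False):
    """Decide how one archive member would be handled."""
    if fits_posix(name, linkname):
        return "posix"
    if strict_posix:
        return "skip-with-warning"  # what POSIX mandates
    return "gnu-extension"          # the suggested default


print(plan_member("x" * 101))                     # gnu-extension
print(plan_member("x" * 101, strict_posix=True))  # skip-with-warning
```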
There is a problem, however, which I have not yet studied closely.
Given a truly POSIX archive with names having more than 100 characters,
I guess that GNU tar up to 1.11.8 will process it as if it were an
old V7 archive, and be fooled by some fields which are coded differently.
So, the question is to decide if the next generation of GNU tar
should produce POSIX format by default, whenever possible, producing
archives older versions of GNU tar might not be able to read
correctly. I fear that we will have to suffer such a choice one of these
days, if we want GNU tar to go closer to POSIX. We can rush it.
Another possibility is to produce the current GNU tar format
by default for a few years, but have GNU tar versions from some
1.POSIX and up able to recognize all three formats, and let older
GNU tar fade out slowly. Then, we could switch to producing POSIX
format by default, with not much harm to those still having (very old at
that time) GNU tar versions prior to 1.POSIX.
POSIX format cannot represent very long names, volume headers,
splitting of files in multi-volumes, sparse files, and incremental
dumps; these would all be disallowed if --posix is given or
POSIXLY_CORRECT is set. Otherwise, if tar is given long
names, or `-[VMSgG]', then it should automatically go non-POSIX.
I think this is easily granted without much discussion.
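A sketch of that rule follows, under the assumption that each feature requested on the command line is reduced to a label first. The labels, the GNU_ONLY_FEATURES set and the choose_format function are hypothetical names used only for illustration.

```python
# Features the text lists as not representable in POSIX format,
# with the options that would request them.
GNU_ONLY_FEATURES = {
    "long_names",    # names that do not fit the POSIX header
    "volume_label",  # -V
    "multi_volume",  # -M
    "sparse",        # -S
    "incremental",   # -g or -G
}


def choose_format(requested, strict_posix):
    """Return "posix" or "gnu" for a set of requested feature labels.

    Raises ValueError when strict POSIX mode is combined with a
    feature that POSIX format cannot represent.
    """
    needed = set(requested) & GNU_ONLY_FEATURES
    if not needed:
        return "posix"
    if strict_posix:
        raise ValueError("not allowed in POSIX mode: " + ", ".join(sorted(needed)))
    return "gnu"  # automatically go non-POSIX


print(choose_format(set(), strict_posix=False))       # posix
print(choose_format({"sparse"}, strict_posix=False))  # gnu
```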
Another point is that only mtime is stored in POSIX
archives, while GNU tar currently also stores atime
and ctime. If we want GNU tar to go closer to POSIX,
my choice would be to drop atime and ctime support in the
common case. On the other hand, I perceive that full dumps or incremental
dumps need atime and ctime support, so for those special
applications, POSIX has to be avoided altogether.
A few users requested that --sparse (-S) always be active by
default. I think that before replying to them, we have to decide
whether we want GNU tar to go closer to POSIX in the common case when
producing files. My choice would be to go closer to POSIX in the
long run. Apart from the possible double reading, I do not see any reason
not to try saving files as sparse when creating archives which
are neither POSIX nor old-V7, so the current --sparse (-S) behavior would
be selected by default when producing such archives, whatever
the reason is. So, --sparse (-S) alone might be redefined to force
GNU-format archives, and recover its previous meaning from this fact.
GNU-format as it exists now can easily fool other POSIX tars,
as it uses fields which POSIX considers to be part of the file name
prefix. I wonder if it would not be a good idea, in the long run,
to try changing GNU-format so that any added field (like ctime,
atime, file offset in subsequent volumes, or sparse file
descriptions) is wholly and always pushed into an extension block,
instead of using space in the POSIX header block. I could manage
to do that portably between future GNU tars. Other POSIX
tars might then at least be able to provide reasonably correct listings
for the archives produced by GNU tar, even if unable to process
them otherwise.
Using these projected extensions might cause older tars to fail.
We would use the same approach as for POSIX. I'll put out a tar
capable of reading POSIXier, yet extended archives, but it will not produce
this format by default, in GNU mode. In a few years, when newer GNU
tars have displaced tar 1.11.X and earlier, we
could switch to producing POSIXier extended archives, with no real harm
to users, as almost all existing GNU tars will be ready to read
POSIXier format. In fact, I'll do both changes at the same time, in a
few years, and just prepare tar for both changes, without effecting
them, from 1.POSIX. (Both changes: 1--using POSIX convention for
getting over 100 characters; 2--avoiding mangling POSIX headers for GNU
extensions, using only POSIX mandated extension techniques).
So, a future tar will have a --posix
flag forcing the usage of truly POSIX headers, and so producing
archives that previous GNU tar versions will not be able to read.
So, once a pretest release announces that feature, it would be
particularly useful for users to test how exchangeable archives are
between GNU tar with --posix and other POSIX tars.
In a few years, when GNU tar produces POSIX headers by
default, --posix will have a strong meaning and will disallow
GNU extensions. But in the meantime, for a long while, --posix
in GNU tar will not disallow GNU extensions like --label=archive-label (-V archive-label),
--multi-volume (-M), --sparse (-S), or very long file or link names.
However, --posix with GNU extensions will use POSIX
headers with reserved-for-users extensions to headers, and I will be
curious to know how well or badly POSIX tars react to these.
GNU tar prior to 1.POSIX, and after 1.POSIX without
--posix, generates and checks `ustar  ', with two
suffixed spaces. This is sufficient for older GNU tar not to
recognize POSIX archives and, consequently, to wrongly decide those archives
are in old V7 format. It is a useful bug for me, because GNU tar
has other POSIX incompatibilities, and I need to segregate GNU tar
semi-POSIX archives from truly POSIX archives, because GNU tar should
be somewhat compatible with itself, while migrating closer to the latest
POSIX standards. So, I'll be very careful about how and when I will do
the correction.
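The magic check described above can be sketched as follows. In a 512-byte header block the magic field sits at offset 257 (6 bytes) and the version field at offset 263 (2 bytes): POSIX writes `ustar' plus a NUL and then the version `00', while the GNU variant fills both fields with `ustar', two spaces and a NUL. The function name classify_header is hypothetical, and a real reader would also verify the header checksum.

```python
import tarfile


def classify_header(block):
    """Classify one 512-byte tar header block by its magic and version fields."""
    if len(block) < 512:
        raise ValueError("a tar header block is 512 bytes")
    magic = block[257:263]    # 6-byte magic field
    version = block[263:265]  # 2-byte version field
    if magic == b"ustar\0" and version == b"00":
        return "posix"
    if magic + version == b"ustar  \0":  # two spaces, then NUL
        return "gnu"
    return "v7"  # no recognizable magic: assume old V7 format


# Headers generated with Python's tarfile module, for demonstration only.
info = tarfile.TarInfo("hello.txt")
print(classify_header(info.tobuf(format=tarfile.USTAR_FORMAT)))  # posix
print(classify_header(info.tobuf(format=tarfile.GNU_FORMAT)))    # gnu
```

Unlike the older GNU tar behavior described above, this sketch recognizes both magic strings, so a POSIX archive is not mistaken for V7.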