uni.horse / executable tar archives

So an internet person said something about making tarballs executable (the post has since fallen off the internet, it seems), in the same way as e.g. AppImages, and I got Inspired.

There's prior art for this, of course, in the form of AppImage (which works by mounting the file as a filesystem) and shar (which generates a shell script that when executed creates a bunch of files).

These are intended to be completely serious tools that people actually use, which is unacceptable for my purposes.

first pass: a shell script with a tarball attached

We'll start with this shell script, main, as the application we want to package:

#!/bin/bash
echo "hi I'm a tarball"
echo "argv: $*"

$ ./main hello world
hi I'm a tarball
argv: hello world

We can just pack this up in a tarball, then have the end user extract it and execute it. We can automate that by prefixing it with a shell script:

#!/bin/bash
tempdir=$(mktemp -d)
sed '1,/^exit/d' <$0 |tar -x -C $tempdir
$tempdir/main $*
rm -rf $tempdir
exit

The sed command on line 3 has the effect of skipping every line up to, and including, exit on line 6 (which is there both as a convenient "start of archive" marker and to ensure that we don't go off trying to execute the archive as shell commands after the wrapped application exits). We then pass the remaining file contents (the archive) to tar -x.

Note that, since the file suddenly switches from text to binary at this point, there must be exactly one newline at the end to make it work correctly (zero would cause sed to cut off the start of the archive thinking it's part of the exit line, and two or more would mean n - 1 of them get left in before the archive):

$ hexdump header -C -s 0x60
00000060  24 74 65 6d 70 64 69 72  0a 65 78 69 74 0a        |$tempdir.exit.|
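To see the newline sensitivity in isolation, here's a small sketch that uses a plain-text payload in place of the tarball. The function name split_demo and the PAYLOAD string are made up for illustration; the sed expression is the one from the header script, and bash is assumed.

```shell
#!/bin/bash
# split_demo SEP: write a one-line header, then "exit" + SEP + "PAYLOAD",
# and delete everything up to and including the first line starting with "exit".
split_demo() {
    local f
    f=$(mktemp)
    # %b expands backslash escapes in its argument, so '\n' becomes a newline
    printf 'echo header\nexit%bPAYLOAD\n' "$1" >"$f"
    sed '1,/^exit/d' <"$f"
    rm -f "$f"
}

split_demo '\n'      # one newline: prints PAYLOAD
split_demo ''        # zero newlines: prints nothing, the payload shares the exit line
split_demo '\n\n'    # two newlines: prints an empty line, then PAYLOAD
```

With a real archive the stray newline in the third case would land in front of the first tar header, which is enough to confuse tar.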

We can then just concatenate the header with an archive and execute it:

$ cp header executable.tar
$ tar -c main >>executable.tar
$ chmod +x executable.tar
$ ./executable.tar hello world
hi I'm a tarball
argv: hello world

but that's not a tarball

The problem with this is that tar no longer understands this archive on its own, so it must be executed to extract it:

$ tar -xf executable.tar 
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

So, can we do better? The tar file format has no archive-wide header, only per-file headers; i.e. the first thing in the archive is a file header. If we can construct a tar file header that is also a valid shell script, both methods should work.

A file header occupies 500 bytes, padded out to a full 512-byte block. The first 100 bytes are a file name and the last 155 are a prefix (prepended to the name to make 255 total characters). Additionally, there's a 100-character linkname at offset 157, used for symlinks.

struct posix_header
{                              /* byte offset */
  char name[100];               /*   0 */
  char mode[8];                 /* 100 */
  char uid[8];                  /* 108 */
  char gid[8];                  /* 116 */
  char size[12];                /* 124 */
  char mtime[12];               /* 136 */
  char chksum[8];               /* 148 */ /* sum of all header bytes, with this field set to ASCII spaces, then 6-char octal + null + space */
  char typeflag;                /* 156 */
  char linkname[100];           /* 157 */
  char magic[6];                /* 257 */ /* for GNU tar, version+magic = "ustar  \0" */
  char version[2];              /* 263 */
  char uname[32];               /* 265 */
  char gname[32];               /* 297 */
  char devmajor[8];             /* 329 */
  char devminor[8];             /* 337 */
  char prefix[155];             /* 345 */
                                /* 500 */
};

(All these fields are ASCII, even the numbers, which are written in octal; all arrays but version are null-terminated. It's a weird format.)
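As a rough illustration of that layout, the sketch below pulls a few fields out of the first header of a freshly made archive, using dd and the offsets from the struct. The helper name tar_field and the scratch directory are made up for this example, and it assumes bash plus a tar that writes ustar-style headers.

```shell
#!/bin/bash
# tar_field FILE OFFSET LENGTH: print one header field with NUL padding removed.
tar_field() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\0'
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A 20-byte stand-in for the application, packed into an ordinary archive.
printf '#!/bin/bash\necho hi\n' >"$work/main"
tar -cf "$work/normal.tar" -C "$work" main

echo "name:     $(tar_field "$work/normal.tar" 0 100)"
echo "size:     $(tar_field "$work/normal.tar" 124 12) (octal)"
echo "typeflag: $(tar_field "$work/normal.tar" 156 1)"
echo "magic:    $(tar_field "$work/normal.tar" 257 6)"
```

The size comes back as an octal string, so the 20-byte file shows up as a run of zeros ending in 24.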

Our existing header script from above is 110 bytes. We can reduce it to under 100 by using a shorter variable name for $tempdir. We also now know exactly how many bytes to skip (512, because tar headers are 512-byte aligned), so we don't need to do the sed thing:

#!/bin/sh
t=$(mktemp -d)
tail -c+513 $0|tar -x -C $t
$t/main $*
rm -rf $t
exit

This is 79 bytes, which can fit in the name field of a placeholder entry:

00000000  23 21 2f 62 69 6e 2f 73  68 0a 74 3d 24 28 6d 6b  |#!/bin/sh.t=$(mk|
00000010  74 65 6d 70 20 2d 64 29  0a 74 61 69 6c 20 2d 63  |temp -d).tail -c|
00000020  2b 35 31 33 20 24 30 7c  74 61 72 20 2d 78 20 2d  |+513 $0|tar -x -|
00000030  43 20 24 74 0a 24 74 2f  6d 61 69 6e 20 24 2a 0a  |C $t.$t/main $*.|
00000040  72 6d 20 2d 72 66 20 24  74 0a 65 78 69 74 0a 00  |rm -rf $t.exit..|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 30 30 30 30  36 34 34 00 30 31 37 37  |....0000644.0177|
00000070  37 37 36 00 30 31 37 37  37 37 36 00 30 30 30 30  |776.0177776.0000|
00000080  30 30 30 30 30 30 30 00  30 30 30 30 30 30 30 30  |0000000.00000000|
00000090  30 30 30 00 30 32 31 30  36 32 00 20 56 00 00 00  |000.021062. V...|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000100  00 75 73 74 61 72 20 20  00 00 00 00 00 00 00 00  |.ustar  ........|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
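The tail -c+513 in the script depends on the placeholder entry being exactly one 512-byte block, which the dump confirms (it ends at offset 0x1ff). Here's a minimal sketch of that skip on its own, with a block of zeros standing in for the header; skip_header is a hypothetical name, and head -c is assumed to be available.

```shell
#!/bin/bash
# skip_header FILE: print everything after the first 512-byte block.
# tail -c +N starts output at byte N (counting from 1), so +513 drops bytes 1..512.
skip_header() {
    tail -c +513 "$1"
}

f=$(mktemp)
head -c 512 /dev/zero >"$f"          # stand-in for the 512-byte header block
printf 'rest of archive\n' >>"$f"    # stand-in for the real tarball
skip_header "$f"                     # prints: rest of archive
rm -f "$f"
```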

The file type (offset 0x9C) is set to 'V', a GNU extension (volume header, normally set with -V) that has the side effect of being skipped during extraction without causing an error (source). (GNU tar prints it in tar -tvf output; BSD tar silently ignores it.)
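For anyone who would rather not poke at this in a hex editor, here's one way the header block could be assembled. It's a sketch, not necessarily how the header in the dump was made: pad_field, emit_header and make_header are made-up names, the field values are copied from the hexdump, and it assumes bash, od, awk and head -c.

```shell
#!/bin/bash
# pad_field STRING WIDTH: emit STRING followed by NUL bytes up to WIDTH bytes.
# (${#1} counts characters, so this assumes plain ASCII input.)
pad_field() {
    printf '%s' "$1"
    head -c $(( $2 - ${#1} )) /dev/zero
}

# emit_header SCRIPT [DIGITS]: write the 512-byte block. With no DIGITS the
# checksum field is eight spaces, which is how the sum itself is computed.
emit_header() {
    pad_field "$1" 100            # name: the shell script itself
    pad_field 0000644 8           # mode
    pad_field 0177776 8           # uid 65534
    pad_field 0177776 8           # gid 65534
    pad_field 00000000000 12      # size 0
    pad_field 00000000000 12      # mtime 0
    if [ -n "$2" ]; then printf '%s\0 ' "$2"; else printf '        '; fi
    printf 'V'                    # typeflag: GNU volume header
    pad_field '' 100              # linkname
    pad_field 'ustar  ' 8         # magic + version as GNU tar writes them
    pad_field '' 247              # uname, gname, devmajor, devminor, prefix, padding
}

# make_header SCRIPT: two passes, one to sum the bytes and one to emit the result.
make_header() {
    local script=$1 sum
    [ "${#script}" -lt 100 ] || { echo "script must be under 100 bytes" >&2; return 1; }
    sum=$(emit_header "$script" | od -An -v -tu1 |
          awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s }')
    emit_header "$script" "$(printf '%06o' "$sum")"
}

script=$'#!/bin/sh\nt=$(mktemp -d)\ntail -c+513 $0|tar -x -C $t\n$t/main $*\nrm -rf $t\nexit\n'
hdr=$(mktemp)                     # plays the role of executable-tar-header
make_header "$script" >"$hdr"
dd if="$hdr" bs=1 skip=148 count=6 2>/dev/null; echo    # prints 021062
```

The checksum it prints matches the one at offset 0x94 in the dump, which is a decent sign the field values were read off correctly.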

And then we can prepend this to any tar file to make it executable:

$ tar -cf normal.tar main
$ cat executable-tar-header normal.tar >executable.tar 
$ chmod +x executable.tar 

$ file executable.tar 
executable.tar: POSIX tar archive (GNU)

$ tar -tvf executable.tar 
Vrw-r--r-- 65534/65534       0 1969-12-31 19:00 #!/bin/sh\nt=$(mktemp -d)\ntail -c+513 $0|tar -x -C $t\n$t/main $*\nrm -rf $t\nexit\n--Volume Header--
-rwxr-xr-x emily/emily      50 2022-04-04 15:33 main

$ ./executable.tar hello world
hi I'm a tarball
argv: hello world

I take no responsibility for whatever horrifying thing you do with this information.