UU-encoding is a way to encode a file which may contain any characters into a
standard character set that can be reliably sent over diverse networks.
As some transmission mechanisms compress or remove spaces, spaces are changed
into back-quote characters (ASCII 96). (A better scheme might be to use a bias
of 33 so that the space is never created, but this is not done.)
Another, newer and less popular encoding method, called XX-encoding, uses the set:
+-01..89ABC...XYZabc...xyz
In my opinion, XX-encoding is superior to UU-encoding because it uses more
"normal" characters that are less likely to get corrupted. In fact, several of
the special characters in the UU set do not get through an EBCDIC to ASCII
translation correctly. Conversely, an advantage of the UU set is that it does
not use lower case characters. Nowadays both upper and lower case are sent
with no problems; maybe in the communications dark ages there was a problem
with lower case.
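To make the comparison concrete, here is a small sketch that builds both
character sets and counts how many characters in each are neither letters nor
digits. The names UU_SET and XX_SET are made up for this illustration and are
not taken from the programs; the XX ordering is assumed to be exactly the order
listed above.

```python
import string

# UU set: six-bit values 0..63 plus a bias of 32, with the space (value 0)
# replaced by a back-quote.
UU_SET = "`" + "".join(chr(32 + v) for v in range(1, 64))

# XX set, assumed to be in the listed order: "+-", digits, upper, lower case.
XX_SET = "+-" + string.digits + string.ascii_uppercase + string.ascii_lowercase

# Characters that are neither letters nor digits are the ones most at risk
# in character set translations.
uu_specials = [c for c in UU_SET if not c.isalnum()]
xx_specials = [c for c in XX_SET if not c.isalnum()]

print(len(UU_SET), len(XX_SET))            # 64 64
print(len(uu_specials), len(xx_specials))  # 28 2
print(any(c.islower() for c in UU_SET))    # False: UU uses no lower case
```

The counts show the trade-off described above: the UU set leans on 28
punctuation characters but no lower case, while the XX set needs only "+" and
"-" but uses both cases.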
This "UU" encode/decode pair can handle either XX or UU encoding. The encode
program defaults to creating a UU encoded file; but can be run with a "-x"
option to create an XX encoding.
The decode program defaults to autodetect. However, the program can get confused
by comment lines preceding the actual encoded data. The decode mode can be
forced to UU or XX with the "-u" or "-x" parameter.
Another option is for the character mapping table to be inserted at the front of
the file. The format for this is discussed later. The table parameters are
detected and used by this decode program. (A table will override the "-x" or
"-u" parameters.) The encode program can be run with a "-t" option which tells
it to put the table into the encoded file.
A third encode mapping is the one used by Brad Templeton's ABE program. This is
not handled by these programs as the check and control information surrounding
the actual encoded data is in a different form.
From a theoretical view, this encoding is breaking down 24 bits modulo 64. Note
that 64**4 = 2**24. The result is 24 bits in for 32 bits out, a 33% size
increase. Note that 85**5 > 2**32. Also note that there are 94 transmittable
ASCII characters (from 0x21 through 0x7e). Thus modulo 85 encoding (the atob
encoder) transforms 32 bits to 5 ASCII chars or 40 bits for a 25% size increase.
The trade-off in the modulo 85 encoding is that many communications systems do
not reliably transmit 85 ASCII characters. The tilde, caret, brackets, and
sometimes upper or lower case frequently get corrupted.
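The arithmetic behind these size figures can be checked in a few lines. This
is only a sketch; the helper name expansion is invented for the example.

```python
# Base 64 (UU/XX): four six-bit characters carry exactly 24 bits (3 bytes).
assert 64**4 == 2**24

# Base 85 (atob): five characters are enough for 32 bits (4 bytes),
# four characters are not.
assert 85**4 < 2**32 < 85**5

# Transmittable (non-space) ASCII characters, 0x21 through 0x7e.
printable = 0x7E - 0x21 + 1


def expansion(bytes_in, chars_out):
    """Fractional size increase when bytes_in bytes become chars_out characters."""
    return (chars_out - bytes_in) / bytes_in


print(printable)                     # 94
print(round(expansion(3, 4) * 100))  # 33 (percent, base 64)
print(round(expansion(4, 5) * 100))  # 25 (percent, base 85)
```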
This encode program puts a check character at the end of each line. The check
is the sum of all the encoded characters, before adding the mapping, modulo 64.
Note: Horton 9/1/87 UUENCODE has a bug in the line check algorithm; it uses the
sum of the original, not the encoded characters. This decode program accepts
either form of line check character.
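The two forms of line check can be sketched as follows. This is an
illustration, not the code of these programs: the function names are invented,
the sum is assumed to cover only the data characters (not the count
character), the last group is assumed to be padded with zero bytes, and the
Horton variant is assumed to be reduced modulo 64 as well.

```python
def six_bit_values(data):
    """Split bytes into six-bit values, padding the last group with zero bytes."""
    padded = data + b"\0" * (-len(data) % 3)
    values = []
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")
        values += [(n >> 18) & 63, (n >> 12) & 63, (n >> 6) & 63, n & 63]
    return values


def line_check(data):
    """Sum of the encoded six-bit values, before the character mapping, mod 64."""
    return sum(six_bit_values(data)) % 64


def horton_line_check(data):
    """Variant attributed to the Horton encoder: sum of the original bytes, mod 64."""
    return sum(data) % 64


print(six_bit_values(b"Cat"))     # [16, 54, 5, 52]
print(line_check(b"Cat"))         # 63
print(horton_line_check(b"Cat"))  # 24
```

Because the two sums generally differ, a decoder that wants to accept both has
to compute both values and compare the received check character against each.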
In previous versions (4.13 and lower) the line check character was generated by
default by this encode program and was suppressed with the "-L" option. One
reason to suppress it is if the file will be decoded by one of the old Horton
decoders. Most decoders either accept this form of check or simply stop looking
after the line length is exhausted. My feelings are mixed about the line
checksums because errors of this type essentially never occur.
However, with modern, error-free communications systems and with the CRC checks
on the entire file (see below), I have made the default for uuencoding to have
NO line level check characters, effective with version 4.21. The "-L" option on
uuencode turns on generation of line checksums. If you have a really bad
communications system and you want to isolate a problem, turn them on.
Uudecode automatically checks for the presence of line checksums, so the default
for uudecode is to leave line level checks on; if there are some problems the
"-L" option for uudecode turns them off. Sometimes there is junk at the end of
the line which causes spurious line checksum errors.
I have encountered various other ways that encoders end lines. One encoder put
a "M" at both the start and end of the line. Another used a line count
character. This decode program checks all of these. I would not be surprised
if some encoder out there ends lines with astrological symbols. If you
encounter some other weird form of encoded file, let me know.
Done privately and not for profit (freeware). Suggestions appreciated.
The programs are written in Turbo Pascal 5.5 with about 5% TASM for speed. The
source is not public domain. I would entertain consulting contracts for porting
to other hardware platforms. Also, if these programs are to be included in your
for-profit product, please contact me.
Richard Marks
Copyright Richard E. Marks, Bryn Mawr, PA, 1992
THE CHARACTER ENCODING:
The basic scheme is to break groups of 3 eight-bit characters (24 bits) into 4
six-bit characters and then add 32 (a space) to each six-bit character, which
maps it into a readily transmittable character. Another way of phrasing this
is to say that the encoded six-bit characters are mapped into the set:
`!"#$%&'()*+,-./0123456789:;<=>?@ABC...XYZ[\]^_
for transmission over communications lines.
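The 3-into-4 split can be sketched as below. The function names are made up
for this example; the only facts used are the ones stated above (six-bit
groups, a bias of 32, and the back-quote standing in for the space).

```python
def encode_group(b0, b1, b2):
    """Turn 3 eight-bit values into 4 UU characters."""
    values = [
        b0 >> 2,                           # top 6 bits of byte 0
        ((b0 & 0x03) << 4) | (b1 >> 4),    # low 2 of byte 0, top 4 of byte 1
        ((b1 & 0x0F) << 2) | (b2 >> 6),    # low 4 of byte 1, top 2 of byte 2
        b2 & 0x3F,                         # low 6 bits of byte 2
    ]
    # Add the bias of 32; a zero value would be a space, so use a back-quote.
    return "".join("`" if v == 0 else chr(v + 32) for v in values)


def decode_group(chars):
    """Turn 4 UU characters back into 3 bytes (back-quote and space both give 0)."""
    v = [(ord(c) - 32) & 0x3F for c in chars]
    return bytes([
        (v[0] << 2) | (v[1] >> 4),
        ((v[1] & 0x0F) << 4) | (v[2] >> 2),
        ((v[2] & 0x03) << 6) | v[3],
    ])


print(encode_group(*b"Cat"))   # 0V%T
print(decode_group("0V%T"))    # b'Cat'
print(encode_group(0, 0, 0))   # ````
```

Masking with 0x3F on decode is what lets the back-quote (96) and the space
(32) both decode to zero.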
COMPOSING A LINE OF ENCODED CHARACTERS:
A small number of eight-bit characters are encoded into a single line and a
count of those characters is put at the start of the line. (Most lines in an
encoded file carry 45 original characters. When you look at a UU-encoded file,
note that most lines start with the letter "M". "M" is decimal 77 which, minus
the 32 bias, is 45.)
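A line can be put together as sketched here. This is a minimal illustration
with invented names; it assumes a short final group is padded with zero bytes
and it leaves out the optional line check character.

```python
def uu_char(value):
    """Map a six-bit value to its UU character (bias 32, back-quote for zero)."""
    return "`" if value == 0 else chr(value + 32)


def encode_line(chunk):
    """Encode up to 45 bytes as one line: count character, then the data."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a line carries between 1 and 45 bytes")
    padded = chunk + b"\0" * (-len(chunk) % 3)   # assumed zero padding
    out = [uu_char(len(chunk))]                  # count of original bytes
    for i in range(0, len(padded), 3):
        n = int.from_bytes(padded[i:i + 3], "big")
        out += [uu_char((n >> shift) & 63) for shift in (18, 12, 6, 0)]
    return "".join(out)


def encode_lines(data, per_line=45):
    """Split data into per_line-byte chunks and encode each as a line."""
    return [encode_line(data[i:i + per_line])
            for i in range(0, len(data), per_line)]


print(encode_line(b"Cat"))        # #0V%T
lines = encode_lines(bytes(100))  # 100 zero bytes: chunks of 45, 45 and 10
print([line[0] for line in lines])    # ['M', 'M', '*']
print([len(line) for line in lines])  # [61, 61, 17]
```

A full 45-byte line is therefore 61 characters long: the "M" count character
followed by 60 data characters.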
931 Sulgrave Lane
Bryn Mawr, PA 19010
by