BFA7


                       Blowfish Advanced 7



                    File Format Specification
                     and Technical Reference



                        (c)1996 AtmuteSoft






File Format
~~~~~~~~~~~


When you encrypt a file with Blowfish Advanced 7 (BFA7) its
original data will be put in a kind of digital enveloppe.
This enveloppe guarantees save data recovering, secure
password error and data corruption detection and at last an
authentication check. Another design criteria was
serialization, that means that a file's bytes always will
be written one after the other, thus no seek operation will
be necessary at any time. This enables BFA7 to work with
backup utilities, especially streamer tapes, and avoids a
lot of i/o overhead because the read/write heads of harddisks
mustn't be moved too much. But the most important argument
was to develop a format that can be extended in future
versions easily but also be read by older versions if the
new added informations aren't important for an accurate 
decryption. BFA7's successor is already under development
and will become a native Windows(tm) 95 program. Normally the
movement from a 16bit to a 32bit operation system (just think
about the long filenames) is a major undertaking and often
ends in a necessary redesign of file formats. But not to
Blowfish Advanced 7. It has already been prepared for the
future.


A file encrypted BFA7 will rawly look like that :

                           ==============
                           |  loader    |
                           ==============
                           |  header    |
                           ==============          -----
                           | infoblock  |           / \
                           | (filename) |            |
                           ==============            |
                           | data chunk |        encrypted (*)
                           | data chunk |            |
                           |    ...     |            |
                           ==============            |
                           |   tailer   |           \ /
                           ==============          -----

Before I start to explain this scheme please remember that
all data structures are declared in the Borland Pascal
programming language for the 16bit DOS operating system with
no alignment between the structure members. And you'll often
hear about keys and passwords. Both are the same, there's no
difference between them, although a password is mostly a real
word or sentence and a key the generally description for the
corresponding ASCII bytes or the bytes entered in numercial/
mouse/antitap mode


The first unit in a cryptfile is the so called "loader",
in Pascal its structure looks like the following :

        TCryptfileLoader = record
          LatestVersion : Word;
          Signature     : Longint;
          HeaderSize    : Byte;
        end;

The loader is, as you can see, only 7 bytes long, so it can be read
out very quickly. This allows BFA7 to detect unencrypted or unsupported
files (especially in viewing mode) fast and easily.

'LatestVersion' contains the version number of the program necessary
to decrypt this file. Future version may have new features which 
won't be handled correctly with BFA7. The minor version number is
stored in the low, the major version number in the high byte.

'Signature' is a 4byte ID which always contains the hexdecimal 
constant $75191114 (yes, it's a birtday). If BFA7 detects this
value the file will be identified as a cryptfile, otherwise it'll
output the "(not encrypted)" message.

'HeaderSize' contains the size of the following header structure. This
allows future version to store additional data in the header. BFA7 will
read them - and ignore them. Now you see why the loader is called
"loader".


After the loader has been interpreted correctly BFA7 will load
TCrypfileLoader.HeaderSize bytes and treat them as the header
in the following structure :

TCryptfileHeader = record
 	Version         : Word;
	TailerSize      : Byte;
	FileInfChecksum : Word;
	FileInfSize     : Byte;
	CBC_IV_Left     : Longint;
	CBC_IV_Right    : Longint;
end;

The header contains all data necessary to start the decryption
process. Please remember that all bytes following the header are
already encrypted (*).

'Version' stores the version of the program which has encrypted the
actual file. Therefore future version might have the chance to fix
problems or speed up the decryption process.

'TailerSize' contains the length of the cryptfile's tailer. More
about the tailer will follow later.

'FileInfChecksum' contains a CRC16 checksum of the following file
information block (FIB). As already mentioned, the FIB will be
encrypted. This gives BFA7 the possibility to detect wrong passwords
very fast. If a wrong password is given, the FIB won't be decrypted
correctly, the CRC16 check will fail and BFA7 will break with the
"password/algorithms failed" message. Please watch that the CRC16
is only a "fingerprint" of the FIB. If the calculated CRC16 is equal
to the one stored in the header it does NOT mean that the password
is the right one for 100%, but only with a propability of 1:65536!
Let us assume that you have to try out 2^32 passwords because your
estimated (binary) key length is 4 bytes. The CRC16 check will show
success every 65536th password on average. With 2^32 possibilities
at all you will get 2^32/65536 possible passwords. Have fun while
trying them all out! Secure passwords should always be longer, so the
CRC16 check is not a security lack, but a fine method to detect
wrong passwords quickly, even if you want to encrypt your whole
file system (do you really have more that 65536 files on your drive?).
Even if a wrong key passes the CRC16 check BFA7 will detect it
by the global CRC32 check.
Just to be mentioned : BFACRACK, the brute force key searching program
for registered users is trying to get the passwords by decrypting
the FIB and checking the CRC16. This allows you to find passwords you
don't remember completely or you may have typed in incorrectly.
But BFACRACK is no wizzard! And BFA7 has no trapdoors.

'FileInfSize' is the size of the FIB. Future versions will have
larger FIB's, so it's necessary to play the same game as we did
it with the sizable header.

'CBC_IV_Left' and 'CBC_IV_Right' contain the so called Cipher Block
Chaining Initalisation Vector (CBC IV), a 64bit random value that is
splitted into two 32 parts. This encryption start value makes every
cryptfile unique, so even if you encrypt the file with the same password
you will always get different cryptfiles. Because BFA7 encrypts all
data in 64bit blocks the CBC IV must have the same size. All the
following blocks, which contain the encrypted data are linked
together. And the first element in this chain is simply this vector.
You may miss an entry for the algorithm(s) used for encryption.
Well, I decide to resign for that option for two reasons : a)
decryption of multiple files can be managed much faster b) it's an
additional secret information. If someone doesn't know the algorithm(s)
you've choosen (provided that you didn't used those definded in the
BFA.INI) (s)he has to try out all combinations in addition to all
possible passwords. BFA7 now offers 5 different algorithms now, so there
are 25 combinations (20 for mixed and 5 for single encryption).


The file information block (FIB) is placed direcly after the header
and has the following form :

	TCryptfileInfoBlock = record
 	  Attribute    : Byte; 
          DateTime     : Longint; 
          FileNameLen  : Byte; 
          Dummy        : array[1..2] of Byte;
        end;

'Attribute' contains the original attribute of the encrypted file,
which will be restored after decryption.

'DateTime' is the original date+time stamp of the encrypted file
and also be restored after decryption.

'FileNameLen' stores the length of the filename appended to the FIB.
If you forced BFA7 not to store the original filename then the
value is always 0.

'Dummy' aligns the size of TCryptfileInfoBlock to the next 
64bit (= 8 bytes) border, which you can check easily :
		 size of 'Attribute'  	1 byte
	       + size of 'DateTime'  	4 bytes
	       + size of 'FileNameLen'	1 byte
               + size of 'Dummy'	2 bytes
				      = 8 bytes = 64 bits 			
This 8byte border wasn't choosen arbitrary, it depends on how the
used algorithms (Blowfish, GOST, TDES and Cobra) work. They always
take 8 bytes together in a block (that's why they called block
ciphers) and encrypt them. If your data isn't adjustable to an 8byte
border you will have to add some dummy bytes. These dummy bytes can
be everything, but it's better to choose them via a random generator,
as BFA7 does.
The FIB is encrypted, as already mentioned. So nobody who doesn't
know the right password/key will be able to get the original file's
attribute, date+time stamp and the length of the filename.


If you decide to store the original filename in the cryptfile then
the filename's bytes will be adjusted with random bytes to the next
64bit border and after that be encrypted. E.g. with the filename
"test1.txt" BFA7 has to encrypt 9 bytes . First it must add (2*8)-9 =
7 random bytes, then it encrypts these two 8byte blocks and appends
them after the FIB. Now it's obvious why we have to store the original
filename length into the FIB.


The encrypted file data follows right after the FIB and the optional 
stored filename. For easier handling and because BFA7 offers
additional data compression the original data is splitted, encrypted 
and compressed in so called "chunks". A chunk is a block with a small
header and an appended data area. The header is defined by this way :

	TChunkHeader = record
	  Flags     : Byte;  
	  ChunkLen  : Word;   
	  OrigBytes : Word;  
	end;

It's not necessary for BFA7 to encrypt a chunk's header, because it
doesn't contain any private data (this also saves 3 bytes per chunk
since the header musn't be aligned from 4 to 8 bytes).

'Flags' defines a bit mask to describe the header's characteristics.
The following bits are valid, all others are undefined and always set
to zero :        0 0 0 0 0 x x x
			   | | |
                           | | +--- 1 = last, 0 = normal chunk
                   	   | +----- 1 = compressed, 0 = uncompress
                           +------- 1 = 24bit, 0 = 16bit chunk
The first bit is set if no chunk will follow the actual one, so
the decryption routine can prepare itself to read out the waiting
tailer. The second bit is set if the chunk contains compressed data,
if so the decryption routine has to pass the decrypted data through
a decompressor to get the original bytes. Although the third bit is
always set to zero in BFA7 it already has been defined. The Windows(tm) 95
version will, thanks to the 32bit technology, be able to handle
chunks up to the size of 16 megabytes, so this flag will be set then.
Unfortunately 16bit DOS based BFA7 won't be able to decrypt, but
will recognize such extended chunks (and refuse to handle them,
instead of wondering about following strange data).

'ChunkLen' is a 16bit value containing the size of the following
(unadjusted) data. After BFA7 has read out the chunk header this 
value will be round up to the next 8 byte border, telling the program
how much bytes to load and decrypt.

'OrigBytes' contains the number of the original (uncompressed) bytes
that were stored in the chunk. If the bytes were not compressed it's
easy to see that the nuber in 'ChunkLen' is equal to that in 'OrigBytes'.

To make the whole chunking process more clearly let's try an
example : your original file is 80001 bytes large. BFA7 gets
the maximum number of possible bytes (65536-4096=61440).
Let's assume these bytes are compressed to about 45223 bytes, then
the put out chunk will look like this :
	        ------------------------------------- 
		header :  Flags     = 0 0 0 0 0 0 1 0
                	  ChunkLen  = 45223 
			  OrigBytes = 61440
        	-------------------------------------
	        data : 45223+1 bytes (45224 / 8 = 0) 
  		-------------------------------------
Now you've 80001-61440=18561 bytes left. Let's assume that this
rest isn't compressible. BFA7 tries to compress, but the compressed
result will be larger that the original. It's the last chunk, so the
it have look like this now:
	        ------------------------------------- 
		header :  Flags     = 0 0 0 0 0 0 0 1
                	  ChunkLen  = 18561
			  OrigBytes = 18561
        	-------------------------------------
                data : 18561+7 bytes (18568 / 8 = 0) 
          	-------------------------------------
You see that you can't say exactly that a cryptfile of BFA7 is 
compressed or not. It CAN contain compressed chunks. There's
only one exception : if you use the /cf option the program will
even stored compressed data that is larger than the original.
But BFA7 knows nothing about you and this option until it has read out 
the chunk header and interpreted it.


After the final chunk has been added the cryptfile is closed with
the tailer structure :

	TCryptfileTailer = record
	  OrigFileLen  : Longint; 
	  FileChecksum : Longint; 
	end;

The tailer is also encrypted, but doesn't need any additional
bytes to align, because it already fits into a single 64bit element.

'OrigFileLen' contains the number of original bytes. Please
remember that cryptfiles are read and written in a linear way.
Because of that the original file length can only be stored after
all bytes have been read (, compressed) and encrypted.

'FileChecksum' is a CRC32 checksum of the file's original bytes.
It's mainly there for detecting data corruption, e.g. bad sectors
on floppy disks or undetected transmission errors, but it's also
good for the authentication of a file. Because the CRC32 is
encrypted nobody can add or remove some data from your file without
the knowledge of your password.



Algorithms
~~~~~~~~~~

BFA7 offers the user 5 different algorithms. All algorithms
are 64bit block chipers and used in CBC (Cipher Block Chaining)
mode.

Blowfish - The algorithm was designed by Bruce Schneier
           (schneier@counterpane.com) and gave BFA7 its
	   name. Blowfish is a very fast algorithm, performing
           excellent on modern 32bit processors. Another advantage
           is its variable key size up to 448 bits (56 bytes).    
           It was first published in DDJ 4/94. After a year of intensive
           cryptanalyse it was still unbroken.

Blowfish32 - Standard Blowfish encryption works with 16 rounds,
             that means that every 64bit block will pass the
             encryption function 16 times. Blowfish is easy to
             extend to make more or less rounds. Blowfish32 now
             is just Blowfish with 32 instead of 16 rounds. That 
             means that the data will be twice more irrecognizable
	     as with Blowfish16, but the encryption and decryption
             will take twice more time, too. Although Blowfish32 might
             be "overkill" it's a good additional feature.

GOST - An algorithm descending from the former Sovjet Union. It's like
       the russian counterpart to the DES algorithm. Although it has
       been used for a long time there are no known weaknesses. The
       only strange part is a short table of fixed data (the so called
       substitution boxes, s-boxes). These data is exchangable and might
       have influence on the algorithms security. Now you can speculate
       that some institutions might have had "better" s-boxes and some
       "minor" institutions s-boxes easier to crack, but nobody knows.
       The s-boxes used in BFA7 are choosen randomly so there's no
       built-in weakness. GOST isn't as fast as Blowfish, but it encrypts
       data in 32 rounds as standard (however the encryption function
       is simpler than the one of Blowfish). GOST's maximum key length 
       is 32 bytes.

TDES - DES is the standard encryption algorithm, designed by IBM in
       the middle 70es and very often used all over the world. Although
       it has been cryptanalysed for more than 20 years now nobody has
       found a real weakness. The only problem of DES is its short
       key length : 7 bytes. If you have some very fast computers you
       can try out all possible passwords within a few hours ("very
       fast" means machines which will cost about 1 million dollars).
       There are some DES variants, extending the original algorithm
       to a new one which has a larger key. The most popular one is
       triple-DES, where the 64bit data block will be encrypted
       almost 3 times with DES, using 3 different keys.
       Originally I didn't want to implement any DES variant, because
       DES is rather slow (to say nothing of triple-DES), but a lot
       of people trust this proved algorithm.
       To speed things up triple-DES was modified in an uncritical
       way. Before and after the original encryption a 64bit block should
       be permutated (refering to the DES reference papers). Originally
       DES was implemented in hardware, not in software, and these
       permutations are good for loading the block's bytes into the
       hardware registers. But they don't affect DES's security - so
       it's the programmers decision to implement the permutation
       stage or not. BFA7's implementation doesn't use these
       permutations to obtain more speed. I called this "reduced"
       algorithm TDES instead of triple-DES, because it's not the
       original algorithm, of course. The key length is 21 bytes.
       For the experts : TDES gets a block with 8 bytes (with no MSB 
       conversion), encrypts it with the byte 0..6 of the key,
       decrypts it with byte 7..13 of the key and encrypts it with
       byte 17..20 of the key.
          
Cobra - This is a new algorithm, designed by Christian Schneider
        (100542.2132@compuserve.com). It was published in the    
        newsgroup sci.crypt.research in the middle of April 1996. 
        Normally it's silly to implement an unproved algorithm,
        even Blowfish demands for more analyse before you can call
        it really save. But Cobra wasn't designed from the scratch.
        You can describe it rather as a mutation of Blowfish using
        some interesting extending techniques. A real difference is
        its key size : 32 bytes (64bit Cobra) instead of 56 bytes
        (Blowfish). Cobra was originally designed to be a 128bit block 
        chipher with 24 encryption rounds. Due to it's open architecture
        it can be reduced or extended for larger or smaller block
        handling. BFA7 uses Cobra in a 64bit block form with 16 rounds.
        For more information about Cobra please contact Chris via email.

If you combine algorithms (mixed encryption), then BFA7 will work in
the following manner : The 64bit block will first be encrypted with
the algorithm #1. The encrypted block will then be encrypted a second
time with algorithm #2. Decryption will do the whole procedure
in reverse : first decrypt it with algorithm #2, then with algorithm #1.
So two different algorithms will be melted together. CBC vector updating
is done the same way as in single encryption mode.
           


Weak keys
~~~~~~~~~

Weak keys are such passwords for which the encryption algorithm
doesn't produce a secure encrypted output. Only Blowfish and TDES
have known weak keys, but they will be correctly identified by BFA7.
You can test this by giving BFA7 the numerical password 0,0,0,0 for TDES.
The program will then pop up a warning message that a weak key was
detected and give you the chance to quit the program.



Key setup
~~~~~~~~~

As you might already have recognized every algorithm has a different
key size. BFA7 allows you to use a password/key variable in its size
from 3 to the maximum number of possible characters. If you don't
enter a password with the maximum length then it'll be extended by just
repeating the bytes. If you choose e.g. "hello" as your password and
TDES as the encryption algorithm, the password will be extended to
"hellohellohellohellohelloh" (21 characters). Of course there are
other possibilities to extend passwords, with hash functions like
MD5, for example. The problems with hash functions is that you have to
trust them to crunch or expand the original key without additional
weaknesses and that most of them put out a fixed key (MD5 e.g. 16 bytes)
which has to be expanded by other (sometimes risky) techniques.
But the simple repeating method (SRM) used by BFA7 allows a very direct
input of passwords/keys, especially in binary mode (numercial input).
The only disadvantage of BFA7's SRM key setup is that you have to
take care of the password you choose. Let's assume you want to use
"bonbon". 6 characters, 308.915.776 (26^6) possibilities, right?
Wrong! If you only would have typed "bon" it would have been the same
because of the repetition technique. So the real number of possibilities
with "bonbon" is 17.576 (26^3) - your password is too weak! An attacker
doesn't know how long your password is, of course. But (s)he will
try the lower sized passwords first. To avoid insecure encryption
BFA7 checks for such password "clones" and will warn you if you've
entered something weak like "bonbon".



Random generator
~~~~~~~~~~~~~~~~

The random generator BFA7 uses is a pseudo-random-sequence generator
using linear feedback shift registers. Although the program needs random
bytes only to fill unaligned 8byte (64bit) blocks and for generating
the CBC initalisation vectors this method is much saver than the
conventional built-in linear congruential generators of the programming
languages.



Data compression
~~~~~~~~~~~~~~~~

BFA7 uses the LZH (LZ77 with adaptive huffman coding) algorithm to
compress data. The original LZ77 algorithm was coded by Haruhiko
Okumura, the adaptive Huffman coding by Haruyasu Yoshizaki (developer
of the LHA compression program). The translation into a Turbo
Pascal unit was done by Douglas Webb.
The compression speed is really slow in BFA7. This depends on the
pure Pascal implementation. Future version will have an assembly engine
to speed up the compression process.



-end-