BCOS Native File Format Specification
Preliminary Draft
 

Contents

1                Native File Format Header
1.1                File Size
1.2                Compliance String
1.3                Checksum
1.3.1                CRC Calculation
1.4                File Type
2                Compliance Testing



1   Native File Format Header

To improve consistency, all native file formats used by BCOS begin with a generic file header. Depending on the file type, this generic header may be extended with a file type dependent header extension. This specification describes the generic file header only and forms the basis for all other native file formats (see the specification for each specific file type for details on any file type dependent header extension).

Table 1.1: Generic File Header shows the format of the generic file header.

Offset      Size     Description
0x00000000  8 bytes  File size (see Section 1.1: File Size)
0x00000008  8 bytes  Compliance string (see Section 1.2: Compliance String)
0x00000010  4 bytes  Checksum (see Section 1.3: Checksum)
0x00000014  4 bytes  File type (see Section 1.4: File Type)
0x00000018  8 bytes  Reserved (must be zero)
Table 1.1 - Generic File Header

These fields are mainly intended to be used to verify that the file actually does comply with one of the native file formats, and that it hasn't become corrupt or truncated.
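
As an illustration of the layout in Table 1.1, the following is a minimal sketch of decoding the generic header from a byte buffer. The names bcos_header, read_le and bcos_parse_header are hypothetical and not part of this specification. The sketch also assumes little-endian multi-byte fields, which this specification does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BCOS_HEADER_SIZE 32u  /* 0x18 + 8 bytes, per Table 1.1 */

/* Hypothetical in-memory view of the generic header (names are illustrative). */
typedef struct {
    uint64_t file_size;      /* offset 0x00, 8 bytes */
    char     compliance[8];  /* offset 0x08, 8 bytes, not zero-terminated */
    uint32_t checksum;       /* offset 0x10, 4 bytes */
    uint32_t file_type;      /* offset 0x14, 4 bytes */
    uint64_t reserved;       /* offset 0x18, 8 bytes, must be zero */
} bcos_header;

/* ASSUMPTION: fields are little-endian; the specification does not say. */
uint64_t read_le(const unsigned char *p, size_t nbytes) {
    uint64_t v = 0;
    for (size_t i = nbytes; i > 0; i--) {
        v = (v << 8) | p[i - 1];
    }
    return v;
}

/* Decode the generic header; returns 0 on success, -1 if the buffer is too short. */
int bcos_parse_header(const unsigned char *buf, size_t len, bcos_header *out) {
    assert(buf != NULL && out != NULL);
    if (len < BCOS_HEADER_SIZE) {
        return -1;
    }
    out->file_size = read_le(buf + 0x00, 8);
    memcpy(out->compliance, buf + 0x08, 8);
    out->checksum  = (uint32_t)read_le(buf + 0x10, 4);
    out->file_type = (uint32_t)read_le(buf + 0x14, 4);
    out->reserved  = read_le(buf + 0x18, 8);
    return 0;
}
```

Decoding field by field, rather than casting the buffer to a struct, avoids depending on the compiler's struct padding and on the host's byte order.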


1.1   File Size

The file size is included in the generic file header so that it's possible to easily detect if a file has become truncated or extended, and so that it's easier to work with files in RAM.

Note: As this field is a 64-bit field, no native file format will be able to support any file that is equal to or larger than 18,446,744,073,709,551,616 bytes (16 exbibytes). This is equal to the maximum file size for some modern file systems (NTFS, ZFS) and larger than the maximum volume size for other modern file systems (ext4, ReiserFS).


1.2   Compliance String

The compliance string must be the ASCII/UTF-8 string "BCOS_NFF", where the first byte (at offset 0x00000008) is 'B' or 0x42 and the last byte (at offset 0x0000000F) is 'F' or 0x46. There is no terminating zero. Note: There is no difference between ASCII and UTF-8 for these characters.
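
A check for the compliance string might look like the sketch below. The function name bcos_has_compliance_string and the two macros are hypothetical. Because the string is stored without a terminating zero, the comparison uses memcmp over exactly 8 bytes rather than strcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BCOS_COMPLIANCE_OFFSET 0x08u
#define BCOS_COMPLIANCE_LENGTH 8u

/* Returns 1 if the 8 bytes at offset 0x08 are exactly "BCOS_NFF", else 0. */
int bcos_has_compliance_string(const unsigned char *buf, size_t len) {
    /* Listed byte by byte because the stored string has no terminating zero. */
    static const unsigned char expected[BCOS_COMPLIANCE_LENGTH] = {
        'B', 'C', 'O', 'S', '_', 'N', 'F', 'F'
    };
    assert(buf != NULL);
    if (len < BCOS_COMPLIANCE_OFFSET + BCOS_COMPLIANCE_LENGTH) {
        return 0;  /* buffer too short to contain the field */
    }
    return memcmp(buf + BCOS_COMPLIANCE_OFFSET, expected,
                  BCOS_COMPLIANCE_LENGTH) == 0;
}
```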


1.3   Checksum

The checksum field may contain a 32-bit CRC calculated as per Subsection 1.3.1: CRC Calculation, or may be set to zero to indicate that no CRC is present.


1.3.1   CRC Calculation

The checksum is calculated using a standard 32-bit CRC algorithm called "CRC-32". It's the same CRC algorithm used for Ethernet and PKZIP. See Listing 1.1: CRC Reference Implementation for the exact algorithm (expressed in C).

#include <string.h>
#include <stdio.h>

// Input Data (for illustrative purposes)

unsigned char string[] = {"123456789"};

// CRC Lookup Table

unsigned int CRCtable[256];


// Code

static void generateCRCtable(void) {
      int i;
      unsigned int crc;

      for(i = 0; i < 256; i++) {
            crc = 0;
            if( (i & 1) != 0) crc ^= 0x77073096;
            if( (i & 2) != 0) crc ^= 0xEE0E612C;
            if( (i & 4) != 0) crc ^= 0x076DC419;
            if( (i & 8) != 0) crc ^= 0x0EDB8832;
            if( (i & 16) != 0) crc ^= 0x1DB71064;
            if( (i & 32) != 0) crc ^= 0x3B6E20C8;
            if( (i & 64) != 0) crc ^= 0x76DC4190;
            if( (i & 128) != 0) crc ^= 0xEDB88320;
            CRCtable[i] = crc;
      }
}


static unsigned int calculateCRC(unsigned char *string, int length) {
      unsigned int crc;

      crc = 0xFFFFFFFF;
      while(length > 0) {
            crc = (crc >> 8) ^ CRCtable[(crc & 0xFF) ^ *string++];
            length--;
      }
      crc ^= 0xFFFFFFFF;
      return(crc);
}


int main(void) {
      int length;

      generateCRCtable();

      length = strlen((char *)string);
      printf("In: '%s' (%d bytes)\n", string, length);
      printf("Out: 0x%X\n", calculateCRC(string, length));

      return 0;
}
Listing 1.1 - CRC Reference Implementation

When calculating the CRC, the first 3 fields in the generic header (the file size field, the compliance string and the checksum field) are skipped. This means that the CRC is calculated starting from the file type field at offset 0x00000014 in the generic file header.
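
The skipped region can be sketched as follows. The names crc32_bitwise and bcos_file_crc are hypothetical. The bitwise CRC-32 here is a table-free stand-in for the table-driven routine in Listing 1.1, and the sketch assumes the CRC runs from offset 0x00000014 through the last byte of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BCOS_CRC_START 0x14u  /* offset of the file type field */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); produces the same
   result as a table-driven CRC-32 without needing the lookup table. */
uint32_t crc32_bitwise(const unsigned char *data, size_t length) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < length; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* CRC of everything from the file type field to the end of the file image.
   ASSUMPTION: 'file' holds the whole file and 'size' is its length. */
uint32_t bcos_file_crc(const unsigned char *file, size_t size) {
    assert(file != NULL && size >= BCOS_CRC_START);
    return crc32_bitwise(file + BCOS_CRC_START, size - BCOS_CRC_START);
}
```

For the input "123456789", crc32_bitwise returns 0xCBF43926, the standard CRC-32 check value. Changing any byte in the first three header fields leaves the result of bcos_file_crc unchanged.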

If the calculated CRC is zero then it's substituted with the value 0xFFFFFFFF, so that compliance tests know that the CRC is present. This weakens the strength of the CRC slightly (as the value 0xFFFFFFFF in the checksum field may mean that the CRC is either 0xFFFFFFFF or 0x00000000). However, this is negligible, and much better than using a 31-bit CRC with a "present/not present" flag.
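
The substitution rule can be expressed as two small helpers. This is a sketch only; the names bcos_checksum_to_store and bcos_checksum_check, and the return convention of the latter, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value to write into the checksum field for a calculated CRC:
   a CRC of zero is stored as 0xFFFFFFFF so the field never reads "absent". */
uint32_t bcos_checksum_to_store(uint32_t calculated_crc) {
    uint32_t stored = (calculated_crc == 0u) ? 0xFFFFFFFFu : calculated_crc;
    assert(stored != 0u);  /* zero is reserved for "no CRC present" */
    return stored;
}

/* Returns -1 if no CRC is present (field is zero), 1 if the stored value
   matches the calculated CRC, and 0 if it does not match. */
int bcos_checksum_check(uint32_t stored, uint32_t calculated_crc) {
    if (stored == 0u) {
        return -1;
    }
    return stored == bcos_checksum_to_store(calculated_crc);
}
```

With these helpers, a stored value of 0xFFFFFFFF is accepted when the calculated CRC is either 0x00000000 or 0xFFFFFFFF, which is the ambiguity described above.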


1.4   File Type

The file type field in the generic header is a 32-bit number that uniquely identifies the type of the file, and therefore identifies the format to be used for the rest of the file's data (including the file type dependent header extension, if any). These file types are also used by the OS (file systems) for non-native file formats, but for non-native file formats there is no header and the OS needs to use other, less reliable methods to determine the file format.

For a full description of file types and a list of currently defined file type numbers (including links to specific file format specifications for all native file formats), please refer to the BCOS File Type Specification.


2   Compliance Testing

Suggestions for detecting whether a file uses a native file format (and has not become corrupted) include: checking that the value in the file size field is larger than 32 bytes (the size of the generic header); checking that the value in the file size field is not larger than possible (e.g. larger than the file system's maximum file size); checking that the value in the file size field matches the file's size as reported by the file system; checking for the compliance string; and (if the value in the checksum field is nonzero) checking that the value in the checksum field matches the CRC for the file's data.
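
The suggested checks could be combined along the lines of the sketch below. Everything named here (bcos_result, bcos_check, read_le, crc32_bitwise) is hypothetical. The sketch assumes little-endian fields, which this specification does not state, and assumes the whole file is available in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BCOS_HEADER_SIZE 32u

typedef enum {
    BCOS_OK = 0,
    BCOS_TOO_SMALL,       /* file size field not larger than the header */
    BCOS_TOO_LARGE,       /* file size field exceeds the supplied maximum */
    BCOS_SIZE_MISMATCH,   /* file size field differs from the actual size */
    BCOS_BAD_COMPLIANCE,  /* compliance string is not "BCOS_NFF" */
    BCOS_BAD_CHECKSUM     /* nonzero checksum does not match the CRC */
} bcos_result;

/* ASSUMPTION: fields are little-endian; the specification does not say. */
uint64_t read_le(const unsigned char *p, size_t nbytes) {
    uint64_t v = 0;
    for (size_t i = nbytes; i > 0; i--) {
        v = (v << 8) | p[i - 1];
    }
    return v;
}

/* Bitwise CRC-32, a table-free stand-in for the routine in Listing 1.1. */
uint32_t crc32_bitwise(const unsigned char *data, size_t length) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < length; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* 'actual_size' is the size reported by the file system; 'max_file_size'
   is the largest size the caller considers possible. */
bcos_result bcos_check(const unsigned char *file, size_t actual_size,
                       uint64_t max_file_size) {
    assert(file != NULL);
    if (actual_size < BCOS_HEADER_SIZE) return BCOS_TOO_SMALL;

    uint64_t claimed = read_le(file + 0x00, 8);
    if (claimed <= BCOS_HEADER_SIZE) return BCOS_TOO_SMALL;
    if (claimed > max_file_size) return BCOS_TOO_LARGE;
    if (claimed != (uint64_t)actual_size) return BCOS_SIZE_MISMATCH;

    if (memcmp(file + 0x08, "BCOS_NFF", 8) != 0) return BCOS_BAD_COMPLIANCE;

    uint32_t stored = (uint32_t)read_le(file + 0x10, 4);
    if (stored != 0u) {  /* zero means no CRC is present */
        uint32_t crc = crc32_bitwise(file + 0x14, actual_size - 0x14);
        if (crc == 0u) crc = 0xFFFFFFFFu;  /* zero CRC is stored as 0xFFFFFFFF */
        if (stored != crc) return BCOS_BAD_CHECKSUM;
    }
    return BCOS_OK;
}
```

The cheap size and string checks run first so that the CRC, which reads the whole file, is only calculated for files that already look plausible.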

Even though the first 3 fields are not included in the CRC calculation, there is no need to be concerned that they may have been modified because if any of these fields have been modified the file will still fail compliance testing.

General compliance testing code already includes code for calculating a file's CRC (for the purpose of ensuring that a nonzero value in the checksum field matches). If a file has no CRC in its checksum field, it may be convenient for such code to set one.


Generated on Tue Sep 22 05:34:46 2009