BCOS Native Executable File Format Specification
Version 1.0
(Preliminary Draft)
 

Contents

1                Overview
2                General File Structure
3                File Format
3.1                Generic File Header
3.2                Extended Header
3.2.1                Executable Format Version Numbers (Major and Minor)
3.2.2                Numbers (Major, Minor, Revision and Reliability Rating)
3.2.3                UTF-8 Strings
3.2.3.1                Executable Name String
3.2.3.2                User Support Email Address
3.2.3.3                Bug Report Email Address
3.2.3.4                Web Site URL
3.2.3.5                Copyright Owner String
3.2.3.6                Copyright Description String
3.2.4                Executable Flags
3.2.5                Platform ID
4                Platform IDs '8632' and '8664'
4.1                '8632' and '8664' Required and Beneficial CPU Feature Flags
4.2                Executable Area
4.3                Read Only Area
4.4                Uninitialized Area
4.5                Process Space
4.6                Entry Point


Tables

Table 3.1      Extended Header Format
Table 3.2      Reliability Rating Values
Table 3.3      Executable Flag Descriptions
Table 3.4      Platform IDs
Table 4.1      '8632' and '8664' Platform Header



1   Overview

This specification defines the file format used for native executable files. The executable file is loaded "as is" - relocations and dynamic linking aren't supported by the executable file format or the operating system.


2   General File Structure

The basic structure of the file is described in Figure 2.1: File Layout.

  End of file
    Read/Write Data
    Read Only Data
    Executable Code
    UTF-8 Strings
    Platform Header
  Offset 0x40
    Extended Header
  Offset 0x20
    Generic File Header
  Offset 0x00

Figure 2.1 - File Layout


3   File Format

3.1   Generic File Header

This is the native file format header defined in the BCOS Native File Format Specification.


3.2   Extended Header

The Extended Header follows the Generic File Header, and is described in Table 3.1: Extended Header Format.

Offset      Size     Description
0x00000020  1 byte   Executable format version number minor (see Subsection 3.2.1: Executable Format Version Numbers (Major and Minor))
0x00000021  1 byte   Executable format version number major (see Subsection 3.2.1: Executable Format Version Numbers (Major and Minor))
0x00000022  2 bytes  Reserved (must be zero)
0x00000024  1 byte   Reliability rating (see Subsection 3.2.2: Numbers (Major, Minor, Revision and Reliability Rating))
0x00000025  1 byte   Revision number (see Subsection 3.2.2: Numbers (Major, Minor, Revision and Reliability Rating))
0x00000026  1 byte   Minor version number (see Subsection 3.2.2: Numbers (Major, Minor, Revision and Reliability Rating))
0x00000027  1 byte   Major version number (see Subsection 3.2.2: Numbers (Major, Minor, Revision and Reliability Rating))
0x00000028  2 bytes  Offset from start of file for zero terminated UTF-8 executable name string (see Subsection 3.2.3: UTF-8 Strings)
0x0000002A  2 bytes  Offset from start of file for zero terminated UTF-8 user support email address (see Subsection 3.2.3: UTF-8 Strings)
0x0000002C  2 bytes  Offset from start of file for zero terminated UTF-8 bug report email address (see Subsection 3.2.3: UTF-8 Strings)
0x0000002E  2 bytes  Offset from start of file for zero terminated UTF-8 web site URL (see Subsection 3.2.3: UTF-8 Strings)
0x00000030  2 bytes  Offset from start of file for zero terminated UTF-8 copyright owner string (see Subsection 3.2.3: UTF-8 Strings)
0x00000032  2 bytes  Offset from start of file for zero terminated UTF-8 copyright description string (see Subsection 3.2.3: UTF-8 Strings)
0x00000034  4 bytes  Offset for the byte after the last string (see Subsection 3.2.3: UTF-8 Strings)
0x00000038  4 bytes  Executable flags (see Subsection 3.2.4: Executable Flags)
0x0000003C  4 bytes  Platform ID (see Subsection 3.2.5: Platform ID)
Table 3.1 - Extended Header Format
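
As an informal illustration of Table 3.1, the following Python sketch unpacks the Extended Header from a file image. It assumes little-endian byte order for multi-byte fields (this section does not state the byte order), and the names ExtendedHeader and parse_extended_header are hypothetical rather than part of any BCOS tool.

```python
import struct
from collections import namedtuple

# Assumption: multi-byte fields are little-endian (not stated in this section).
# Layout follows Table 3.1: offsets 0x20 to 0x3F inclusive, 32 bytes in total.
EXT_HEADER = struct.Struct("<BBHBBBB6HII4s")

ExtendedHeader = namedtuple("ExtendedHeader", [
    "format_minor", "format_major", "reserved",
    "reliability", "revision", "version_minor", "version_major",
    "name_offset", "support_email_offset", "bug_email_offset",
    "url_offset", "copyright_owner_offset", "copyright_description_offset",
    "strings_end_offset", "flags", "platform_id",
])


def parse_extended_header(data):
    """Unpack the Extended Header that starts at file offset 0x20."""
    if len(data) < 0x20 + EXT_HEADER.size:
        raise ValueError("file is too short to contain an Extended Header")
    header = ExtendedHeader._make(EXT_HEADER.unpack_from(data, 0x20))
    if header.reserved != 0:
        raise ValueError("reserved field at offset 0x22 must be zero")
    return header
```

The platform ID is returned as 4 raw bytes (e.g. b"8632") because it has no zero terminator.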


3.2.1   Executable Format Version Numbers (Major and Minor)

The executable format version numbers allow code to determine which version of this specification an executable file complies with. These fields are encoded in BCD. For display purposes, the version number is displayed as 2 separate decimal numbers separated by a full stop (e.g. "major.minor") where leading zeros are suppressed for the major version number and displayed for the minor version number; and trailing zeros are suppressed for the minor version number and displayed for the major version number. For example, if the major version number is 0x01 and the minor version number is 0x02 then it would be displayed as "1.02"; and if the major version number is 0x10 and the minor version number is 0x20 then it would be displayed as "10.2".

To indicate compliance with this version of the specification, the major version number must be 0x01 and the minor version number must be 0x00 (version 1.0).
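
The display rule above can be sketched in Python as follows. The function names are hypothetical. The sketch also assumes that a minor version number of 0x00 is shown as a single "0" (so that major 0x01 and minor 0x00 is displayed as "1.0", as in the paragraph above), which the display rule does not spell out.

```python
def bcd_digits(value):
    """Return the two decimal digits of a packed BCD byte as a string."""
    high, low = value >> 4, value & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"0x{value:02X} is not valid packed BCD")
    return f"{high}{low}"


def format_version(major, minor):
    """Format the executable format version as "major.minor"."""
    # Leading zeros are suppressed for the major version number.
    major_text = bcd_digits(major).lstrip("0") or "0"
    # Trailing zeros are suppressed for the minor version number.
    # Assumption: an all-zero minor number is displayed as a single "0".
    minor_text = bcd_digits(minor).rstrip("0") or "0"
    return f"{major_text}.{minor_text}"
```

For example, format_version(0x01, 0x02) gives "1.02" and format_version(0x10, 0x20) gives "10.2", matching the examples in the text.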


3.2.2   Numbers (Major, Minor, Revision and Reliability Rating)

All executables have both a version number and a reliability rating. This allows users to decide if they want to run the latest version (e.g. with all the new features), or an older more thoroughly tested version (that lacks newer features). The version numbers are not a "freestyle" string, where a developer can decide to use "Version 1.2.3.4" or "Version 1A". The operating system uses a standardized representation for version numbers, so that software can determine which version of an executable is older or newer than another (which makes it much easier for things like automated software updates).

The major, minor and revision numbers are in BCD. For display purposes, the version number is displayed as the major version number (with leading zeros suppressed), followed by a decimal point, followed by the minor version number (with trailing zeros suppressed), followed by a dash, then the letter r, then the revision (with leading zeros suppressed). For example, if "major = 0x01", "minor = 0x20" and "revision = 0x30", then the full version number would be displayed as "Version 1.2-r30".

The reliability rating provides an indication of how stable the software is meant to be; ranging from 0 (not stable at all) to 255 (extremely stable). As a general guide, see Table 3.2: Reliability Rating Values.

Value/s     Meaning
0 to 63     Developer's unfinished work (typically untested and/or containing known and unknown faults)
64 to 127   Alpha version (potentially containing both known and unknown faults)
128 to 191  Beta version (potentially containing unknown faults)
192 to 223  Stable version (potentially not containing unknown faults)
224 to 254  Mature stable version (thoroughly tested by many users without any faults found)
255         Extremely mature stable version (guaranteed to be free of all possible faults)
Table 3.2 - Reliability Rating Values

If the reliability rating is in the range from 0 to 63, then the text "-developer" will be appended to the version number when it is displayed (e.g. "Version 1.2-r30-developer"). If the reliability rating is in the range from 64 to 127, then the text "-alpha" will be appended to the version number when it is displayed (e.g. "Version 1.2-r30-alpha"). If the reliability rating is in the range from 128 to 191, then the text "-beta" will be appended to the version number when it is displayed (e.g. "Version 1.2-r30-beta"). Finally, if the reliability rating is equal to or higher than 192, then no text will be appended to the version number when it is displayed (e.g. "Version 1.2-r30").
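
The display rules for the version number and the reliability rating can be combined in a short Python sketch. The function names are hypothetical, and showing an all-zero minor or revision number as a single "0" is an assumption that the text does not cover.

```python
def _bcd(value):
    """Return the two decimal digits of a packed BCD byte as a string."""
    high, low = value >> 4, value & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"0x{value:02X} is not valid packed BCD")
    return f"{high}{low}"


def reliability_suffix(rating):
    """Map a reliability rating (0 to 255) to the text appended on display."""
    if not 0 <= rating <= 255:
        raise ValueError("reliability rating must fit in one byte")
    if rating <= 63:
        return "-developer"
    if rating <= 127:
        return "-alpha"
    if rating <= 191:
        return "-beta"
    return ""  # 192 and above: nothing is appended


def display_version(major, minor, revision, rating):
    """Build the full version string, e.g. "Version 1.2-r30-beta"."""
    major_text = _bcd(major).lstrip("0") or "0"        # leading zeros suppressed
    minor_text = _bcd(minor).rstrip("0") or "0"        # trailing zeros suppressed
    revision_text = _bcd(revision).lstrip("0") or "0"  # leading zeros suppressed
    suffix = reliability_suffix(rating)
    return f"Version {major_text}.{minor_text}-r{revision_text}{suffix}"
```

With major 0x01, minor 0x20 and revision 0x30, a rating of 130 gives "Version 1.2-r30-beta" and a rating of 200 gives "Version 1.2-r30".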

In general it's important for developers to estimate the reliability rating as best they can (under-estimating the reliability rating will probably have less effect on the developer's reputation than over-estimating the reliability rating).


3.2.3   UTF-8 Strings

The executable file format includes several zero terminated UTF-8 strings. All of these strings except the copyright description string must be within the first 4 KiB of the file (so that software can load the first 4 KiB of the file and have access to all headers and almost all strings). The "Offset for the byte after the last string" field in the Extended Header determines how much of the file must be loaded to access all strings.

These strings serve a variety of purposes, and are described in the following sections.


3.2.3.1   Executable Name String

This string contains the name for the executable, which is used as the process name when the executable is running. It must be present.


3.2.3.2   User Support Email Address

This string contains the email address that users can use to obtain help or support for the executable. This string is optional (the offset to this string in the extended header may be zero).


3.2.3.3   Bug Report Email Address

This string contains the email address that users and software can use to submit bug reports. The operating system itself may contain software to automatically generate bug reports if the executable crashes and send these automated bug reports to this email address (the intention here is to provide a way for developers to find out about any bugs quickly so that they can be fixed). This string is optional (the offset to this string in the extended header may be zero). If this string is not present, then the user support email address will be used for bug reports. If this string and the user support email address are both not present then no automated bug reports can be sent.
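
As a rough sketch of how software might read these strings and apply the fallback described above, consider the following Python. The function names are hypothetical; an offset of zero is treated as "string not present", as the text describes for the optional strings.

```python
def read_string(data, offset):
    """Return the zero terminated UTF-8 string at offset, or None if absent."""
    if offset == 0:
        return None  # an offset of zero means the string is not present
    end = data.find(b"\x00", offset)
    if end < 0:
        raise ValueError("string is not zero terminated")
    return data[offset:end].decode("utf-8")


def bug_report_address(data, support_offset, bug_offset):
    """Pick the address for bug reports, falling back to user support."""
    address = read_string(data, bug_offset)
    if address is None:
        address = read_string(data, support_offset)
    return address  # None means no automated bug reports can be sent
```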


3.2.3.4   Web Site URL

This string contains the full URL for the executable's web page (e.g. "http://bcos.hopto.org/example" and not "bcos.hopto.org/example"). The intention is that end users can quickly find the web site to (manually) download new versions of the executable, or find out about other products written by the same group of people. This string is optional (the offset to this string in the extended header may be zero).


3.2.3.5   Copyright Owner String

This string contains a short string, intended to include the name of the main copyright holder, and possibly also including the year it was published (or range of years if it was published multiple times) and an abbreviated name of the type of copyright used. For example, this string might contain "Copyright © Brendan Trotter", or "Copyright © 2006-2009 Brendan Trotter", or "Copyright © 2006-2009 Brendan Trotter (GPLv2)". This string is optional (the offset to this string in the extended header may be zero). If this string is not present then it does not mean that there is no copyright - it simply means that the copyright owner hasn't been included in the executable.


3.2.3.6   Copyright Description String

This string contains the full description of the copyright, and is the only string that allows line breaks. In general this should contain the entire copyright notice or licence agreement that is used by the executable (however, it may contain some kind of note saying where the entire copyright notice or licence agreement can be found). This string is optional (the offset to this string in the extended header may be zero). If this string is not present then it does not mean that there is no copyright - it simply means that the copyright description hasn't been included in the executable.

If present, the copyright description string must begin in the first 4 KiB of the executable file, but does not have to end within the first 4 KiB of the executable file (there is no restriction on the size of the copyright description).


3.2.4   Executable Flags

These flags are used to inform the operating system of certain conditions. See Table 3.3: Executable Flag Descriptions for a full description of these flags.

Bit/s    Description
0        If this bit is set then run-time debugging is allowed (otherwise run-time debugging is not allowed)
1 to 31  Reserved (must be zero)
Table 3.3 - Executable Flag Descriptions
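
A loader could check the executable flags along the lines of the following Python sketch. The constant and function names are hypothetical, and rejecting a file whose reserved bits are set is one possible reading of "must be zero".

```python
FLAG_DEBUG_ALLOWED = 1 << 0                             # bit 0 in Table 3.3
RESERVED_FLAGS_MASK = 0xFFFFFFFF & ~FLAG_DEBUG_ALLOWED  # bits 1 to 31


def debugging_allowed(flags):
    """Return True if run-time debugging is allowed by the flags field."""
    if flags & RESERVED_FLAGS_MASK:
        # Assumption: a set reserved bit makes the file invalid.
        raise ValueError("reserved executable flag bits must be zero")
    return bool(flags & FLAG_DEBUG_ALLOWED)
```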


3.2.5   Platform ID

The platform ID is used to determine the format for the executable file. The operating system uses the platform ID to make sure the executable file is suitable for the computer being booted, and to arrange any special support needed. The platform ID is a 4 character string (without a zero terminator).

Defined platform IDs are listed in Table 3.4: Platform IDs, including a reference to the section within this document that describes the format for the Platform Header and executable code and data for each platform ID.

Platform ID  Platform Description  Section
"8632"       32-bit 80x86 systems  Chapter 4: Platform IDs '8632' and '8664'
"8664"       64-bit 80x86 systems  Chapter 4: Platform IDs '8632' and '8664'
Table 3.4 - Platform IDs


4   Platform IDs '8632' and '8664'

The Platform Header follows the Extended Header, and is described in Table 4.1: '8632' and '8664' Platform Header.

Offset      Size      Description
0x00000040  16 bytes  Required CPU feature flags (see Section 4.1: '8632' and '8664' Required and Beneficial CPU Feature Flags)
0x00000050  16 bytes  Beneficial CPU feature flags (see Section 4.1: '8632' and '8664' Required and Beneficial CPU Feature Flags)
0x00000060  8 bytes   Offset for the end of the executable area (see Section 4.2: Executable Area)
0x00000068  8 bytes   Offset for the end of the read only area (see Section 4.3: Read Only Area)
0x00000070  8 bytes   Offset for the end of the uninitialized data area (see Section 4.4: Uninitialized Area)
0x00000078  4 bytes   Size of process space in GiB (see Section 4.5: Process Space)
0x0000007C  4 bytes   Reserved (must be zero)
0x00000080  8 bytes   Entry point (see Section 4.6: Entry Point)
0x00000088  8 bytes   Reserved (must be zero)
Table 4.1 - '8632' and '8664' Platform Header
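
To make the layout in Table 4.1 concrete, the following Python sketch unpacks the Platform Header. It assumes little-endian byte order (natural for 80x86, but not stated here) and treats each 16 byte feature flag field as one 128-bit little-endian integer, which is also an assumption. The name parse_platform_header and the dictionary keys are hypothetical.

```python
import struct

# Assumption: little-endian fields. Layout follows Table 4.1 (0x40 to 0x8F).
PLATFORM_HEADER = struct.Struct("<16s16sQQQIIQQ")


def parse_platform_header(data):
    """Unpack the '8632'/'8664' Platform Header that starts at offset 0x40."""
    if len(data) < 0x40 + PLATFORM_HEADER.size:
        raise ValueError("file is too short to contain a Platform Header")
    (required, beneficial, exec_end, read_only_end, uninit_end,
     process_space_gib, reserved_a, entry_point, reserved_b) = \
        PLATFORM_HEADER.unpack_from(data, 0x40)
    if reserved_a != 0 or reserved_b != 0:
        raise ValueError("reserved Platform Header fields must be zero")
    return {
        # Assumption: bit 0 is the lowest bit of the first byte.
        "required_features": int.from_bytes(required, "little"),
        "beneficial_features": int.from_bytes(beneficial, "little"),
        "executable_end": exec_end,
        "read_only_end": read_only_end,
        "uninitialized_end": uninit_end,
        "process_space_gib": process_space_gib,
        "entry_point": entry_point,
    }
```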


4.1   '8632' and '8664' Required and Beneficial CPU Feature Flags

The required CPU feature flags field and the beneficial CPU feature flags field are bitfields, where each bit in each bitfield corresponds to a certain CPU feature that may or may not be supported by a CPU or emulated by the operating system. If an executable requires a certain CPU feature then the corresponding flag in the required CPU feature flags field must be set, and if an executable can benefit from a certain CPU feature then the corresponding flag in the beneficial CPU feature flags field should be set.

All flags have identical meanings in both the required CPU feature flags field and the beneficial CPU feature flags field (e.g. bit 0 in both fields corresponds to FPU instructions), and these meanings are identical to the bitfields returned by [add a reference to the relevant kernel API function here].

The required CPU feature flags field and the beneficial CPU feature flags field are used by the operating system to determine which CPUs can be used to run the executable and to help to determine which CPUs are the best CPUs to run the executable.

For example, an executable might use the CMPXCHG8B instruction a few times during initialization, might use lots of FPU instructions, and might be capable of using SSE to improve performance if it is supported (and fall back to non-SSE code if SSE isn't supported). In this case both CMPXCHG8B and FPU are required features (and SSE is not required), as these features must be either supported by the CPU/s or emulated by the operating system for the executable to run; and both FPU and SSE are beneficial features because performance is improved if these features are supported by the CPU (and not emulated by the operating system).
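
The example above can be expressed as bitfield tests. In this Python sketch only bit 0 (FPU) comes from the text; the bit positions used for CMPXCHG8B and SSE are made up for illustration, as are the function names. Counting natively supported beneficial features is just one conceivable way to rank CPUs, not something this specification defines.

```python
FPU = 1 << 0        # bit 0 corresponds to FPU instructions (from the text)
CMPXCHG8B = 1 << 1  # hypothetical bit position, for illustration only
SSE = 1 << 2        # hypothetical bit position, for illustration only


def can_run(required, supported_or_emulated):
    """True if every required feature is supported by the CPU or emulated."""
    return (required & ~supported_or_emulated) == 0


def benefit_score(beneficial, native):
    """Count beneficial features that the CPU supports without emulation."""
    return bin(beneficial & native).count("1")


# The executable from the example: CMPXCHG8B and FPU are required,
# FPU and SSE are beneficial.
required = CMPXCHG8B | FPU
beneficial = FPU | SSE
```

A CPU with native FPU, CMPXCHG8B and SSE scores 2 here, while a CPU whose FPU is emulated by the operating system can still run the executable but scores 0.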


4.2   Executable Area

This area is intended to contain executable code; and the start and end of this area are used by the operating system to set protection flags, so that (if supported by the CPU) any attempt to execute code outside of this area results in a protection violation.

The start of this area is determined by the "Offset for the byte after the last string" field in the Extended Header, rounded down to the nearest page boundary. The end of this area is determined by the "Offset for the end of the executable area" field in the Platform Header, rounded up to the nearest page boundary. If the rounding involved matters, then the executable file should include padding to ensure that this area starts and ends on a page boundary.

The "Offset for the end of the executable area" field in the Platform Header may be set to 0xFFFFFFFFFFFFFFFF to allow (almost) everything within process space to be executed. In general, it is recommended that this field be set to the smallest area possible for security purposes (and to improve the chance of detecting errors), however there are cases (e.g. dynamic translators) where this isn't practical.

If the "Offset for the end of the executable area" field in the Platform Header is set to a value that is less than or equal to the "Offset for the byte after the last string" field in the Extended Header, then the executable is not considered valid (e.g. the file is considered corrupt or otherwise unusable).
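
The rules for the executable area can be sketched in Python as below. The page size of 4096 bytes is an assumption (this section does not give one), the function name is hypothetical, and returning None for the end of the area is simply this sketch's way of saying "no upper limit within process space".

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
ALL_EXECUTABLE = 0xFFFFFFFFFFFFFFFF


def executable_area(strings_end, executable_end):
    """Return (start, end) of the executable area; end is None if unlimited.

    strings_end is the "Offset for the byte after the last string" field and
    executable_end is the "Offset for the end of the executable area" field.
    """
    if executable_end <= strings_end:
        raise ValueError("executable is not valid: executable area is empty")
    start = strings_end & ~(PAGE_SIZE - 1)  # round down to a page boundary
    if executable_end == ALL_EXECUTABLE:
        return start, None
    end = (executable_end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)  # round up
    return start, end
```

For instance, executable_area(0x1010, 0x2345) yields (0x1000, 0x3000), which shows how rounding widens the area unless the file is padded to page boundaries.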


4.3   Read Only Area

This area is intended to contain both executable code and read only data; and the start and end of this area are used by the operating system to set protection flags, so that any attempt to write to an address that is within this area results in a protection violation.

The start of this area is 0x0000000000000000. The end of this area is determined by the "Offset for the end of the read only area" field in the Platform Header, rounded down to the nearest page boundary. If this rounding matters, then the executable file should include padding to ensure that this area ends on a page boundary.

The "Offset for the end of the read only area" field in the Platform Header may be set to 0x0000000000000000 to allow (almost) everything within the address space to be modified. In general, it is recommended that this field be set to the largest area possible for security purposes (and to improve the chance of detecting errors).
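
A minimal Python sketch of the read only area check follows, again assuming 4 KiB pages and using hypothetical function names. Because the end is rounded down, a page that is only partly covered by the field stays writable, which is presumably why padding to a page boundary is suggested.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def read_only_end(field_value):
    """Round the "Offset for the end of the read only area" field down."""
    return field_value & ~(PAGE_SIZE - 1)


def is_write_protected(address, field_value):
    """True if a write to address would cause a protection violation."""
    return address < read_only_end(field_value)
```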


4.4   Uninitialized Area

This area is intended to contain read/write data used for uninitialized data (e.g. the executable's ".bss" section). Despite its name, the operating system guarantees that everything in this area appears to be usable RAM that is filled with zeros. Note: The operating system may use "allocation on demand" for this area, where in reality it contains multiple copies of the same page full of zeros mapped as "read only", where the first write to a page causes a new page of RAM to be allocated and mapped as "read/write" to provide the illusion of read/write pages without consuming large amounts of RAM.

The start of this area is determined by the size of the executable file, padded with zeros up to the nearest page boundary. The end of this area is determined by the "Offset for the end of the uninitialized data area" field in the Platform Header, rounded up to the nearest page boundary.

If the "Offset for the end of the uninitialized data area" field in the Platform Header is set to a value that is less than the size of the executable file, then there is no uninitialized data area. If the "Offset for the end of the uninitialized data area" field in the Platform Header is set to 0xFFFFFFFFFFFFFFFF, then all usable pages of process space above the end of the executable file behave as if they are read/write pages full of zeros.
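
The bounds of the uninitialized area might be computed as in this Python sketch. The 4 KiB page size and the function name are assumptions, and None is used as the end of the area to represent "all usable pages of process space above the end of the executable file".

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
ALL_OF_PROCESS_SPACE = 0xFFFFFFFFFFFFFFFF


def round_up(value):
    """Round value up to the nearest page boundary."""
    return (value + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def uninitialized_area(file_size, uninitialized_end):
    """Return (start, end) of the uninitialized area, or None if there is none.

    An end of None means the area extends to the top of usable process space.
    """
    if uninitialized_end < file_size:
        return None  # there is no uninitialized data area
    start = round_up(file_size)
    if uninitialized_end == ALL_OF_PROCESS_SPACE:
        return start, None
    return start, round_up(uninitialized_end)
```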


4.5   Process Space

The entire address space is split into 3 main sections. The highest section is called "kernel space". The size of kernel space depends on the kernel being used; and normal processes have no access to kernel space at all. The remaining 2 sections are called "process space" and "thread space". The size of process space is determined by the "Size of process space in GiB" field in the executable's Platform Header, while the size of thread space is determined by the remainder (e.g. "thread space size = total address space size - kernel space size - process space size").

However, if the "Size of process space in GiB" field in the executable's Platform Header is smaller than the kernel can support then it will be rounded up to the nearest size that the kernel can support (this typically makes process space 1 GiB); and if the "Size of process space in GiB" field in the executable's Platform Header is larger than the kernel can support it will be rounded down to the nearest size that the kernel can support (this typically makes process space 2 GiB on 32-bit kernels, and 131071 GiB on 64-bit kernels). In all cases "process space" is a minimum of 252 MiB, and "thread space" is a minimum of 252 MiB.
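
The rounding described above amounts to clamping the requested size into the range the kernel supports. The Python sketch below assumes the kernel accepts every whole number of GiB between its minimum and maximum, which the text does not state; the function names are hypothetical, and the limits in the usage note are the "typical" values quoted above.

```python
def effective_process_space_gib(requested, kernel_min, kernel_max):
    """Clamp the "Size of process space in GiB" field to what the kernel supports."""
    return max(kernel_min, min(requested, kernel_max))


def thread_space_size(total_size, kernel_size, process_size):
    """Thread space is whatever remains (all sizes in the same unit)."""
    return total_size - kernel_size - process_size
```

With the typical limits mentioned in the text, a request of 0 GiB becomes 1 GiB, a request of 4 GiB on a 32-bit kernel becomes 2 GiB, and a request of 0xFFFFFFFF GiB on a 64-bit kernel becomes 131071 GiB.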


4.6   Entry Point

The "entry point" field in the Platform Header contains an address (not an offset from the start of the file) that the operating system will jump to when starting the executable. In all cases this address must correspond to an address that is considered executable (as determined by Section 4.2: Executable Area), that is also present in the executable file (e.g. the entry point can't refer to uninitialized space that is considered executable).
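
A loader could validate the entry point roughly as follows. This Python sketch assumes the file is loaded starting at address 0, so that file offsets and addresses coincide; that is an inference from the file being loaded "as is" and is not stated outright. The function name is hypothetical, and an executable area end of None stands for "no upper limit".

```python
def entry_point_is_valid(entry, exec_start, exec_end, file_size):
    """Check that the entry point is executable and backed by file contents.

    Assumption: the file image is loaded at address 0, so an address below
    file_size is present in the executable file.
    """
    if entry < exec_start:
        return False
    if exec_end is not None and entry >= exec_end:
        return False
    return entry < file_size
```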


Generated on Mon Nov 9 12:20:50 2009