BCOS Faulty RAM List File Format Specification

Preliminary Draft

Tables:

Table 4-1. Extended Header Format
Table 4-2. Platform IDs
Table 4-3. RAM Test Mode Values
Table 4-4. RAM Test Control Flags
Table 5-1. Faulty RAM List Entry, First Dword
Table 5-2. Faulty RAM List Entry Examples

Chapter 1: Overview

1.1. A Cautionary Tale

In the past one of the computers I used developed faulty RAM. The only symptom was that the web browser occasionally crashed, but I ignored this (the web browser I was using was well known for stability problems, and everything else worked). Several months later, just after defragmenting the file system, I found out the RAM was faulty. The utility I used to defragment the file system loaded data from disk into the faulty RAM and wrote the data back to disk in a different place; which caused around half of my files to become corrupted. Of course I had no easy way to tell which half of the files had become corrupted - I completely reformatted the drive (after doing some testing and replacing the faulty RAM).

There are a few important lessons in this example. The first lesson is that RAM faults can remain undetected for a relatively long time. This may be partly due to the CPU's caches (e.g. if data is written to the cache and the faulty RAM and then read back from the cache, then corrupt data in RAM isn't used) and partly because a relatively large amount of data stored in RAM can be modified with no noticeable problem - a pixel with a slightly different color in some graphics data, a digitized sound with slightly more "noise", etc.

The second lesson is that any RAM testing done by the firmware is almost entirely useless. In my experience it's capable of detecting incorrectly inserted RAM modules and major RAM faults, but subtle errors and intermittent errors are (almost) never detected. To detect if RAM is faulty you need to use a tool designed to test RAM properly (e.g. http://www.memtest.org), but most people don't use a stand-alone tool regularly (either because they simply don't know about them, or because the computer can't be used for anything while the tool is running). Instead, they wait until they suspect problems, and by then it's too late.

If you're lucky you'll never see a RAM failure. If you're unlucky, RAM failures can be one of the most insidious hardware errors possible. The best solution is to use ECC RAM; however (due to petty product differentiation tactics used by Intel) ECC RAM isn't supported on most modern desktop and laptop systems and (partly because it's less common) can be significantly more expensive. For these reasons ECC RAM (which should be a minimum requirement) is rarely used outside of servers, and all of the user's data is at risk.

1.2. The Faulty RAM List File

For the purpose of minimising risk, the operating system is capable of avoiding areas of faulty RAM, and supports various strategies to detect and recover from RAM faults while the OS is running. The "Faulty RAM List" file is used to keep track of areas of RAM that shouldn't be used by the operating system, and to control features intended to detect and recover from RAM faults.

The "Faulty RAM List" file is one of the first files used during boot, so that faulty/unreliable areas of RAM can be avoided from a very early stage. While the OS is running changes to this file may be made automatically by the OS (e.g. if additional faulty areas are detected) or manually by administrators (e.g. if certain fault tolerance features are enabled or disabled). This means that new areas of faulty RAM detected while the operating system is running can be "remembered" and not used after rebooting.

Chapter 2: Specification Change Policy

Any changes made in future versions of this specification are guaranteed to be backward and forward compatible.

Future versions of this specification may increase the size of the extended header, add additional areas between the extended header and the Faulty RAM List Entries or after the Suspect RAM List Entries, or define new meanings for any reserved fields in the extended header. Code written to handle faulty RAM lists that comply with this specification should ignore anything not defined in this version of the specification.

Chapter 3: General File Structure

The basic structure of the file is described in Figure 3-1. File Layout.

Figure 3-1. File Layout
		End of file
	(Optional) Metadata
	Suspect RAM List Entries
	Faulty RAM List Entries
	Extended Header	0x00000030
	Generic File Header	0x00000000

Chapter 4: File Format

4.1. Generic File Header

This is the native file format header defined in BCOS Native File Format Specification. For this specification, the "file type" field in the generic header (at offset 0x00000014) must be set to the value 0xFFFF0010.

4.2. Extended Header

The extended header follows the generic file header, and is described in Table 4-1. Extended Header Format.

Table 4-1. Extended Header Format
Offset	Size	Description
0x00000030	4 bytes	Platform ID (see Subsection 4.2.1. Platform ID)
0x00000034	1 byte	RAM Test Mode (see Subsection 4.2.2. RAM Test Mode)
0x00000035	1 byte	RAM Test Control Flags (see Subsection 4.2.3. RAM Test Control Flags)
0x00000036	2 bytes	Run Time Check Target Frequency (see Subsection 4.2.4. Run Time Check Target Frequency)
0x00000038	2 bytes	Scheduled Boot RAM Test Passes (see Subsection 4.2.5. Scheduled Boot RAM Test Passes)
0x0000003A	2 bytes	Reserved
0x0000003C	4 bytes	32-bit offset within file for start of Faulty RAM List Entries (see Section 4.3. Faulty RAM List Entries)
0x00000040	4 bytes	32-bit offset within file for start of Suspect RAM List Entries (see Section 4.4. Suspect RAM List Entries)
0x00000044	4 bytes	32-bit offset within file for byte after Suspect RAM List Entries (see Section 4.4. Suspect RAM List Entries)
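
The layout in Table 4-1 maps directly onto a fixed-size binary record. The following is a minimal sketch of reading it in Python, not a reference implementation. It assumes little-endian byte order (this specification does not state the byte order here; little-endian is simply the natural choice on 80x86), and the names `ExtendedHeader` and `parse_extended_header` are hypothetical.

```python
import struct
from collections import namedtuple

# Field order follows Table 4-1. The "<" (little-endian, no padding)
# prefix is an assumption, not something this specification states.
EXT_HEADER = struct.Struct("<4sBBHHHIII")
EXT_HEADER_OFFSET = 0x30

ExtendedHeader = namedtuple(
    "ExtendedHeader",
    "platform_id test_mode control_flags check_target_frequency "
    "boot_test_passes reserved faulty_start suspect_start suspect_end",
)


def parse_extended_header(data: bytes) -> ExtendedHeader:
    """Unpack the extended header that follows the generic file header."""
    if len(data) < EXT_HEADER_OFFSET + EXT_HEADER.size:
        raise ValueError("file too short to contain the extended header")
    return ExtendedHeader(*EXT_HEADER.unpack_from(data, EXT_HEADER_OFFSET))
```

The record is 0x18 bytes long, so in this version of the specification the extended header occupies offsets 0x30 to 0x47 inclusive.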

4.2.1. Platform ID

The platform ID is used to determine the format for Faulty RAM List Entries and Suspect RAM List Entries, and other requirements. Boot code that uses the Faulty RAM List file should check to make sure that the platform ID is the correct platform ID for the computer being booted. The platform ID is a 4 character string (without a zero terminator).

Defined platform IDs are listed in Table 4-2. Platform IDs, including a reference to the section within this document that describes the format for Faulty RAM List Entries and Suspect RAM List Entries for each platform ID.

Table 4-2. Platform IDs
Platform ID	Section	Platform Description
"8632"	Chapter 5: Platform ID '8632'	All 80x86 systems (including 64-bit 80x86 systems)

4.2.2. RAM Test Mode

The "RAM Test Mode" field determines the mode that the operating system's RAM testing features will use. Possible modes and their values are shown in Table 4-3. RAM Test Mode Values.

Table 4-3. RAM Test Mode Values
Value	Description
0x00	Performance Mode (see Section 6.1. Performance Mode)
0x40	Background RAM Fault Detection (see Section 6.2. Background RAM Fault Detection)
0x60	Active RAM Fault Detection and Correction (see Section 6.3. Active RAM Fault Detection and Correction)
0x80	ECC Without Software Patrol Scrubbing (see Section 6.4. ECC Without Software Patrol Scrubbing)
0xC0	ECC With Software Patrol Scrubbing (see Section 6.5. ECC With Software Patrol Scrubbing)

4.2.3. RAM Test Control Flags

The "RAM Test Control Flags" field in the extended header controls the RAM testing features of the operating system.

Table 4-4. RAM Test Control Flags
Bit/s	Description
0	Scheduled Boot RAM Test Enable (see Subsubsection 4.2.3.1. Scheduled Boot RAM Test Enable Flag)
1 to 7	Reserved (must be clear)

4.2.3.1. Scheduled Boot RAM Test Enable Flag

This flag (when set) enables a thorough RAM test during boot, where the number of passes performed is specified via the Scheduled Boot RAM Test Passes field (see Subsection 4.2.5. Scheduled Boot RAM Test Passes) in the extended header and none of the testing is postponed.

Note: the Scheduled Boot RAM Test Enable Flag is ignored if an ECC mode is in use.
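
As an illustration of how the RAM Test Mode and the RAM Test Control Flags interact, the sketch below decides whether a Scheduled Boot RAM Test should run. The constant and function names are hypothetical. Reserved flag bits are ignored when reading, which follows the advice in Chapter 2 that code should ignore anything not defined in this version.

```python
# Mode values from Table 4-3 that denote the two ECC modes.
ECC_MODES = (0x80, 0xC0)

# Bit 0 of the RAM Test Control Flags (Table 4-4).
SCHEDULED_BOOT_TEST_ENABLE = 0x01


def scheduled_boot_test_requested(test_mode: int, control_flags: int) -> bool:
    """Return True if boot code should run the Scheduled Boot RAM Test."""
    if test_mode in ECC_MODES:
        # The enable flag is ignored while an ECC mode is in use.
        return False
    # Bits 1 to 7 are reserved; they are masked out rather than rejected.
    return bool(control_flags & SCHEDULED_BOOT_TEST_ENABLE)
```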

It is expected that the operating system will clear this flag and update the "Faulty RAM List" file after the OS has booted (and after the Scheduled Boot RAM Test is done). This allows a normal utility to schedule regular tests (e.g. once per week a utility might set this flag and reboot the OS). The Scheduled Boot RAM Test is (typically) also used when the operating system is first installed (when the OS can't know which areas of RAM are reliable), so that a more permanent "Faulty RAM List" file (that includes any areas of faulty RAM that should be avoided, and has the Scheduled Boot RAM Test Enable Flag cleared) is generated and installed with the OS.

4.2.4. Run Time Check Target Frequency

This field specifies how often all RAM should be checked while the OS is running, expressed as the number of minutes minus one. It ranges from 0x0000 (all RAM should be checked every minute) to 0xFFFF (all RAM should be checked every 65536 minutes, or approximately every 45.5 days).
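
Because the stored value is "minutes minus one", converting the field to an interval is a single addition. A small sketch (the function name is hypothetical):

```python
def check_interval_minutes(field_value: int) -> int:
    """Convert the Run Time Check Target Frequency field to minutes."""
    if not 0 <= field_value <= 0xFFFF:
        raise ValueError("field is a 16-bit unsigned value")
    return field_value + 1


# Largest interval, expressed in days (65536 minutes / 1440 minutes per day).
longest_interval_days = check_interval_minutes(0xFFFF) / (60 * 24)
```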

4.2.5. Scheduled Boot RAM Test Passes

This field specifies the number of times that a full RAM test should be performed during a Scheduled Boot RAM Test. This value is ignored if the Scheduled Boot RAM Test is disabled (see Subsection 4.2.3. RAM Test Control Flags) or if the operating system is operating in an ECC mode (see Subsection 4.2.2. RAM Test Mode).

Non-zero values specify between 1 and 65535 passes. The value zero is used to specify an infinite number of RAM test passes, where RAM testing is done continuously until the computer is turned off or reset by the user (this effectively turns the operating system's boot code into a stand-alone RAM testing utility). In this case, nothing that's normally executed after the RAM test (e.g. the operating system itself) is necessary - it won't be used.

4.3. Faulty RAM List Entries

Faulty RAM List Entries are used to inform boot code of faulty or unreliable areas of RAM, so that these areas can be avoided during boot and after boot.

The offset of the first Faulty RAM List Entry within the file is specified in the extended header (see Table 4-1. Extended Header Format). The extended header does not store a count of entries; the Faulty RAM List Entries end at the offset where the Suspect RAM List Entries start. The format for a Faulty RAM List Entry depends on the platform ID (see Table 4-2. Platform IDs), and is described in the section corresponding to the platform ID.

If there are no Faulty RAM List Entries, then the "32-bit offset within file for start of Faulty RAM List Entries" field must be equal to the "32-bit offset within file for start of Suspect RAM List Entries" field.

To improve search times, all Faulty RAM List Entries must be sorted in order of lowest starting address to highest starting address.
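
Sorting by starting address is what makes a binary search possible. The sketch below shows one way a lookup could use it. It assumes the entries have already been decoded into (starting address, page count) pairs that do not overlap, and it assumes a 4 KiB page size (true for platform ID '8632'). The class name `FaultyAreaIndex` is hypothetical.

```python
import bisect


class FaultyAreaIndex:
    """Binary-search lookup over decoded entries sorted by starting address."""

    def __init__(self, entries, page_size=4096):
        self.entries = list(entries)
        self.page_size = page_size
        self.starts = [start for start, _ in self.entries]
        if self.starts != sorted(self.starts):
            raise ValueError("entries must be sorted by starting address")

    def find(self, address):
        """Return the (start, pages) entry containing address, or None."""
        # Index of the last entry whose start is <= address.
        i = bisect.bisect_right(self.starts, address) - 1
        if i >= 0:
            start, pages = self.entries[i]
            if address < start + pages * self.page_size:
                return self.entries[i]
        return None
```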

4.4. Suspect RAM List Entries

Some RAM faults are transient (e.g. caused by one-time events like cosmic rays). When the operating system detects that area/s of RAM may be faulty it needs to determine if the fault is a transient fault or a permanent fault by extensively testing the area/s of RAM (which can take a while). If the operating system needs to be shut down or rebooted before the operating system has been able to determine if fault/s were transient or permanent, then completing this testing would create (potentially lengthy) delays during shutdown/reboot. To avoid these delays, the operating system can create Suspect RAM List Entries, update the "Faulty RAM List" file, then reboot immediately; and the areas of RAM will be tested properly when the operating system is booted again.

The offset of the first Suspect RAM List Entry within the file and the offset of the byte after the last Suspect RAM List Entry are specified in the extended header (see Table 4-1. Extended Header Format); the extended header does not store a count of entries. The format for a Suspect RAM List Entry is exactly the same as the format used for Faulty RAM List Entries, which is described in the section corresponding to the platform ID.

If there are no Suspect RAM List Entries, then the "32-bit offset within file for start of Suspect RAM List Entries" field must be equal to the "32-bit offset within file for byte after Suspect RAM List Entries" field in the extended header. Also, the "32-bit offset within file for byte after Suspect RAM List Entries" must refer to either the byte after the Suspect RAM List Entries (if there are any) or the byte after any preceding data (so that new Suspect RAM List Entries can always be inserted at the offset).

To improve search times, all Suspect RAM List Entries must be sorted in order of lowest starting address to highest starting address.
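
The three offsets in the extended header imply where each list begins and ends: the Faulty RAM List Entries run from their start offset up to the start of the Suspect RAM List Entries, and an empty list is one whose start and end offsets are equal. The following sketch checks the offsets for consistency and returns both byte ranges. The function name is hypothetical, and the dword-alignment check is an assumption that only applies to platform ID '8632', whose entries are built from 32-bit dwords.

```python
EXTENDED_HEADER_END = 0x48  # first byte after the extended header in this version


def list_spans(faulty_start, suspect_start, suspect_end, file_size):
    """Return ((faulty_start, faulty_end), (suspect_start, suspect_end)).

    End offsets are exclusive, so an empty list has start == end.
    """
    if not (EXTENDED_HEADER_END <= faulty_start <= suspect_start
            <= suspect_end <= file_size):
        raise ValueError("list offsets are out of order or outside the file")
    # Assumption for platform '8632': every entry is a whole number of dwords.
    if (suspect_start - faulty_start) % 4 or (suspect_end - suspect_start) % 4:
        raise ValueError("list length is not a whole number of dwords")
    return (faulty_start, suspect_start), (suspect_start, suspect_end)
```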

Chapter 5: Platform ID '8632'

5.1. Maximum File Size

Due to boot time constraints, for 80x86 systems a "Faulty RAM List" file should not exceed 64 KiB. The size of the file can be reduced (if/when necessary) by finding the Faulty RAM List Entries that have the fewest good pages between them and combining them until enough space has been created.
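
One possible reading of this size-reduction strategy is a greedy loop: while the list is too large, merge the two neighbouring areas separated by the fewest good pages. The sketch below works on (starting page number, page count) pairs and measures size in dwords using the entry encoding described in Section 5.2. The function names and the greedy approach itself are assumptions; this specification does not mandate a particular algorithm.

```python
def entry_dwords(start_page, pages):
    """Dwords needed to encode one area (areas over 4294969343 pages not handled)."""
    dwords = 1
    if start_page >= (1 << 20):  # address at or above 4 GiB needs a high dword
        dwords += 1
    if pages >= 2048:            # page count no longer fits in 11 bits
        dwords += 1
    return dwords


def shrink_area_list(areas, max_dwords):
    """Merge closest neighbours until the encoded list fits in max_dwords."""
    areas = sorted(areas)  # assumes the areas do not overlap
    while len(areas) > 1 and sum(entry_dwords(s, n) for s, n in areas) > max_dwords:
        # Good pages between each area and the next one.
        gaps = [areas[i + 1][0] - (areas[i][0] + areas[i][1])
                for i in range(len(areas) - 1)]
        i = gaps.index(min(gaps))
        start = areas[i][0]
        end = areas[i + 1][0] + areas[i + 1][1]
        areas[i:i + 2] = [(start, end - start)]
    return areas
```

Note that merging marks the good pages in the gap as faulty, so some usable RAM is given up in exchange for a smaller file.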

5.2. Faulty RAM List Entries

For 80x86 systems the operating system manages 4 KiB pages, and each Faulty RAM List Entry contains the address for the start of the first page containing faulty RAM and the number of sequential pages containing faulty RAM. A Faulty RAM List Entry is encoded as between one and three 32-bit dwords to save space.

The first dword in a Faulty RAM List Entry has the format shown in Table 5-1. Faulty RAM List Entry, First Dword.

Table 5-1. Faulty RAM List Entry, First Dword
Bit/s	Description
0 to 10	Area Size (see Subsection 5.2.1. Area Size)
11	Area Starting Address Size Flag (see Subsection 5.2.2. Area Starting Address)
12 to 31	Area Starting Address Low (see Subsection 5.2.2. Area Starting Address)

5.2.1. Area Size

If the Area Size field in the first dword is non-zero, then the Area Size field in the first dword specifies the number of faulty pages of RAM at the starting address. Otherwise (if the Area Size field in the first dword is zero), the last dword in the Faulty RAM List Entry contains the number of faulty pages of RAM at the starting address minus 2048.

5.2.2. Area Starting Address

If the Area Starting Address Size Flag in the first dword is clear, then the starting address of the first page of faulty RAM is a 32-bit address that is entirely contained within the Area Starting Address Low field of the first dword of the Faulty RAM List Entry. Otherwise (if the Area Starting Address Size Flag in the first dword is set) the starting address of the first page of faulty RAM is a 64-bit address, where the next dword in the Faulty RAM List Entry contains the highest (most significant) bits of the starting address and the Area Starting Address Low field of the first dword of the Faulty RAM List Entry contains bits 12 to 31 of the starting address.

Note: Because pages start on 4 KiB boundaries the least significant 12 bits of a starting address are always zero.

5.2.3. Faulty RAM List Entry Examples

The following table provides examples of Faulty RAM List Entries.

Table 5-2. Faulty RAM List Entry Examples
Entry Data	Description
0x76543001	One faulty page (4 KiB) at 0x0000000076543000
0x76543400	1024 faulty pages (4 MiB) at 0x0000000076543000
0x76543801, 0xFEDCBA98	One faulty page (4 KiB) at 0xFEDCBA9876543000
0x76543000, 0x00000000	2048 faulty pages (8192 KiB) at 0x0000000076543000
0x76543800, 0xFEDCBA98, 0x00012345	76613 faulty pages (306452 KiB) at 0xFEDCBA9876543000
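
The rules in Subsections 5.2.1 and 5.2.2 can be checked against the examples in Table 5-2 with a short decoder. This is a sketch only: the function name is hypothetical and little-endian byte order is assumed (the natural order for 80x86, but not stated in this section).

```python
import struct


def decode_entries(data: bytes):
    """Decode '8632' entries into a list of (starting address, page count)."""
    if len(data) % 4:
        raise ValueError("entry data must be a whole number of dwords")
    dwords = struct.unpack("<%dI" % (len(data) // 4), data)
    entries = []
    i = 0
    while i < len(dwords):
        first = dwords[i]
        i += 1
        pages = first & 0x7FF             # bits 0 to 10: Area Size
        address = first & 0xFFFFF000      # bits 12 to 31: address bits 12 to 31
        if first & 0x800:                 # bit 11: 64-bit starting address
            if i >= len(dwords):
                raise ValueError("truncated entry (missing high address dword)")
            address |= dwords[i] << 32
            i += 1
        if pages == 0:                    # size is stored in the last dword
            if i >= len(dwords):
                raise ValueError("truncated entry (missing size dword)")
            pages = dwords[i] + 2048
            i += 1
        entries.append((address, pages))
    return entries
```

For example, decoding the three dwords 0x76543800, 0xFEDCBA98, 0x00012345 yields a starting address of 0xFEDCBA9876543000 and 76613 pages, matching the last row of Table 5-2.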

This encoding allows a "single dword" Faulty RAM List Entry to refer to up to 2047 faulty pages (8188 KiB) at a 32-bit starting address. A "double dword" Faulty RAM List Entry with a 32-bit starting address, or a "triple dword" Faulty RAM List Entry with a 64-bit starting address, can refer to up to 4294969343 faulty pages (a little more than 16 TiB).

For areas of faulty RAM that are larger than 4294969343 pages, multiple entries should be used such that the areas defined by the first entry/entries cover exactly 4294969343 pages each and the last entry covers any remainder. For example, for 50 TiB of faulty RAM (13421772800 pages) the first 3 entries would each cover 4294969343 pages and the last entry would cover the remaining 536864771 pages.
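
The splitting rule can be sketched as an encoder that emits as many full-size entries as needed and then one entry for the remainder. The function name is hypothetical, and the choice to set the Area Starting Address Size Flag only when the address does not fit in 32 bits is an assumption made to keep entries short.

```python
MAX_PAGES_PER_ENTRY = 0xFFFFFFFF + 2048  # 4294969343 pages


def encode_area(address: int, pages: int):
    """Return the dwords encoding one faulty area, split into entries if needed."""
    if address % 4096 or pages <= 0 or address + pages * 4096 > 1 << 64:
        raise ValueError("area must be page aligned, non-empty and within 64 bits")
    dwords = []
    while pages > 0:
        count = min(pages, MAX_PAGES_PER_ENTRY)
        first = address & 0xFFFFF000
        high = address >> 32
        if high:
            first |= 0x800            # 64-bit starting address follows
        if count < 2048:
            first |= count            # small sizes fit in the first dword
        dwords.append(first)
        if high:
            dwords.append(high)
        if count >= 2048:
            dwords.append(count - 2048)
        address += count * 4096
        pages -= count
    return dwords
```

Applied to a 13421772800-page area starting at address 0, this produces four entries: three whose size dword is 0xFFFFFFFF (4294969343 pages each) and a final entry whose size dword encodes the remaining 536864771 pages.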

Chapter 6: Operating System Behaviour

This information is provided to help people understand how the Faulty RAM List affects the behaviour of the OS.

There are 5 different modes (determined by Subsection 4.2.2. RAM Test Mode), and common behaviour that occurs during boot regardless of mode.

During boot, the Faulty RAM List is read from the boot device and checked. If the Faulty RAM List has become corrupt and/or fails to comply with this specification, boot is aborted.

After checking the Faulty RAM List itself, the boot code initialises a physical memory manager by combining information from the boot environment (e.g. firmware's memory map) with information from the Faulty RAM List to determine which pages are usable RAM (excluding pages that are marked as either faulty or suspect in the Faulty RAM List).
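
Conceptually, this step subtracts the faulty and suspect areas from the RAM ranges reported by the boot environment. A minimal sketch of that subtraction follows, using half-open (start, end) address ranges. The function name and the range representation are assumptions made for illustration, not part of this specification.

```python
def subtract_ranges(usable, excluded):
    """Return the parts of the usable ranges not covered by any excluded range."""
    result = []
    excluded = sorted(excluded)
    for start, end in sorted(usable):
        cursor = start
        for ex_start, ex_end in excluded:
            if ex_end <= cursor:
                continue              # excluded range lies before the cursor
            if ex_start >= end:
                break                 # remaining excluded ranges lie after this one
            if ex_start > cursor:
                result.append((cursor, ex_start))
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result
```

For instance, removing a single faulty page at 0x00150000 from a usable range of 0x00100000 to 0x00200000 leaves two usable ranges, one on each side of the faulty page.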

The boot code also creates a list of any areas of RAM that were relied on before the physical memory manager was initialised. This list is called the "trusted RAM list".

Note: The "trusted RAM list" includes RAM that the boot code itself used, and also includes RAM that the boot code knows the boot environment has used. Sadly, in many cases the boot code is unable to determine everything that the boot environment relied on before the boot code was started.

Then the boot code checks the RAM Test Mode (Subsection 4.2.2. RAM Test Mode) to determine how it should proceed.

6.1. Performance Mode

For this mode, the boot code checks if any areas in the "trusted RAM list" (e.g. areas that were relied on) were marked as faulty in the Faulty RAM List. If RAM that was relied on was marked as faulty then boot code can't assume that anything (including itself) hasn't been affected, and boot is aborted.
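
The check described here amounts to asking whether any trusted range overlaps any faulty range. A small sketch, assuming both lists are held as half-open (start, end) address ranges (the function name and representation are hypothetical):

```python
def relied_on_faulty_ram(trusted, faulty):
    """Return True if any trusted range overlaps any faulty range."""
    return any(
        t_start < f_end and f_start < t_end
        for t_start, t_end in trusted
        for f_start, f_end in faulty
    )
```

Boot code following this section would abort when the function returns True.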

Other than the Scheduled Boot RAM Test (if enabled) no other faulty RAM checking is done by boot code or kernel. However, the operating system may refuse to execute software that has reliability requirements on a computer that is operating in this mode.

6.2. Background RAM Fault Detection

After attempting to ensure its own reliability, boot code checks if the Scheduled Boot RAM Test is enabled (see Subsubsection 4.2.3.1. Scheduled Boot RAM Test Enable Flag). If the Scheduled Boot RAM Test is enabled it performs the number of passes requested (see Subsection 4.2.5. Scheduled Boot RAM Test Passes), which leaves free pages filled with known contents. If the Scheduled Boot RAM Test is disabled the boot code does a minimal RAM test (designed to only test address lines), then fills free pages with known contents.

For the remainder of boot (up until the kernel's memory management takes over from the boot code's physical memory management); pages are tested when they are allocated (including a check to see if the page still contains the known contents set previously), and are filled with known contents if/when they are freed.

When the kernel takes control of memory management it marks all allocated pages as "not tested yet" and puts all free pages into a pool of free pages that haven't been tested. When CPU/s are idle they are used to test free pages and shift them from the untested free page pool to the tested free page pool, and used to re-allocate/replace allocated pages that are still marked as "not tested yet". When pages are allocated they are taken from the tested free page pool if possible, and otherwise taken from the untested free page pool and tested when allocated. In addition, suspect pages are re-tested (as often as deemed necessary for the kernel to be satisfied that they are either faulty or not) during this time.

When all pages have been tested the kernel waits for the time determined by the Run Time Check Target Frequency field to pass (if necessary). Note that at this point the kernel may adjust the priority of its run time checking based on whether the time has already elapsed (priority can be increased to meet the target frequency) or if it needs to wait (priority can be lowered). Then the kernel shifts all pages from the tested free page pool back into the untested free page pool, and marks all allocated pages as "not tested yet" again.

The "trusted RAM list" is also used (and checked when pages are found faulty) while the OS is running; so that users/administrators can be warned if a page that is relied on during boot becomes faulty. This gives users/administrators a chance to make alternative arrangements before the computer needs to be rebooted (e.g. order some replacement RAM, backup/shift any data to other computers, etc). The OS itself may also (automatically) shift data to other computers in the cluster if it knows it will have trouble when it is rebooted.

6.3. Active RAM Fault Detection and Correction

The main weakness with Background RAM Fault Detection mode is that (under normal/expected conditions) most RAM is allocated and nothing checks that data in allocated pages hasn't been corrupted - transient faults go unnoticed, and permanent faults aren't noticed until they're discovered during the run time check. The Active RAM Fault Detection and Correction mode is designed to avoid this weakness.

For this mode, the boot code behaves the same as it does in the Background RAM Fault Detection mode. The differences are when the kernel takes control.

When the kernel takes control of memory management it marks all allocated pages as "not tested yet" and generates information for each allocated page that is used to check (and possibly correct) data in allocated pages, and puts all free pages into a pool of free pages that haven't been tested. When CPU/s are idle they are used to test free pages and shift them from the untested free page pool to the tested free page pool, and used to check (and if found corrupted, re-allocate/replace and possibly correct) allocated pages that are still marked as "not tested yet". When pages are allocated they are taken from the tested free page pool if possible, and otherwise taken from the untested free page pool and tested when allocated. In addition, suspect pages are re-tested (as often as deemed necessary for the kernel to be satisfied that they are either faulty or not) during this time.

In addition the kernel marks all virtual pages possible as "not present", checks (and possibly corrects) the page's contents when a page fault occurs, changes the page to "present", adds it to a list of pages that have been touched, then checks if the list of pages that have been touched has grown too large (e.g. larger than the CPU's caches). If the list of pages that have been touched has grown too large the kernel removes the oldest page from this list after updating the information that is used to check and possibly correct the data in the page if the page was modified, and resetting the page back to "not present". During task switches the kernel does the same (update the information that's used to check and possibly correct the data in the page if the page was modified and reset the page back to "not present") for all pages on the list of pages that have been touched to empty the list.

In this way data in allocated pages is likely to be checked (and possibly corrected) when it is first brought into the CPU's cache (where it is immune to RAM faults and only affected by CPU cache faults).

When all pages have been checked or tested the kernel waits for the time determined by the Run Time Check Target Frequency field to pass (if necessary). Note that at this point the kernel may adjust the priority of its run time checking based on whether the time has already elapsed (priority can be increased to meet the target frequency) or if it needs to wait (priority can be lowered). Then the kernel shifts all pages from the tested free page pool back into the untested free page pool, and marks all allocated pages as "not tested yet" again.

If any pages are found faulty or suspect, the kernel sets a flag to indicate that the Faulty RAM List (on the boot device) is stale and needs to be updated, and tries to get a utility to update the Faulty RAM List on the boot device. The "trusted RAM list" is also used (and checked when pages are found faulty) while the OS is running; so that users/administrators can be warned if a page that is relied on during boot becomes faulty.

6.4. ECC Without Software Patrol Scrubbing

In this mode, once the common boot behaviour described at the start of this chapter is complete, the boot code and the kernel do no RAM testing or checking. Instead it is assumed that the hardware does this itself, and reports any corrected or uncorrected errors found to the motherboard driver (which informs the kernel).

When informed of corrected or uncorrected RAM errors by the motherboard driver, the kernel still sets a flag to indicate that the Faulty RAM List (on the boot device) is stale and needs to be updated, and tries to get a utility to update the Faulty RAM List on the boot device. The "trusted RAM list" is also used (and checked when pages are found faulty) while the OS is running; so that users/administrators can be warned if a page that is relied on during boot becomes faulty.

For this mode, "suspect" is used if a page had one corrected error and "faulty" is used if a page had an uncorrected error or if a page had a corrected error while it was marked as "suspect". Pages that are "suspect" are still used by the OS, and if/when they are used for an adequate amount of time without further problems the OS decides that the pages are no longer suspect. The intent is to reduce the chance of uncorrected errors and to reduce/avoid any hardware overhead that corrections cause.
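
The "suspect" and "faulty" rules for ECC modes form a small state machine, sketched below. The state and event names are hypothetical, as is the "clean_period" event, which stands for a page having been used for an adequate amount of time without further problems. Treating "faulty" as permanent is also an assumption; this section does not describe a way for a faulty page to recover.

```python
def next_page_state(state: str, event: str) -> str:
    """Return the new state of a page after an ECC event.

    state: "ok", "suspect" or "faulty"
    event: "corrected", "uncorrected" or "clean_period"
    """
    if state not in ("ok", "suspect", "faulty"):
        raise ValueError("unknown state: %r" % state)
    if event not in ("corrected", "uncorrected", "clean_period"):
        raise ValueError("unknown event: %r" % event)
    if state == "faulty":
        return "faulty"               # assumption: faulty pages stay faulty
    if event == "uncorrected":
        return "faulty"
    if event == "corrected":
        # A second corrected error while suspect makes the page faulty.
        return "faulty" if state == "suspect" else "suspect"
    return "ok"                       # clean_period clears the suspect status
```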

It should be impossible for end users to enable ECC Without Software Patrol Scrubbing mode themselves using normal utilities.

If the motherboard driver, motherboard/chipset and RAM support ECC, and the motherboard/chipset does patrol scrubbing in hardware, then the motherboard driver should:

ask the kernel for the run time check target frequency and attempt to configure the hardware's patrol scrubbing to suit if possible
ask the kernel to change the mode to ECC Without Software Patrol Scrubbing and change the Run Time Check Target Frequency to match hardware's patrol scrubbing (so that, if it is different to the current settings in the Faulty RAM List, the kernel can update the Faulty RAM List file on the boot device to reflect the hardware)

6.5. ECC With Software Patrol Scrubbing

This mode is mostly the same as ECC Without Software Patrol Scrubbing, except that after the kernel takes control of memory management it does patrol scrubbing in software.

When the kernel takes control of memory management it marks all allocated pages as "not scrubbed yet". When CPU/s are idle they are used to read from each location in each allocated page that is marked as "not scrubbed yet" (so that the hardware has a chance to correct any correctable errors before they become uncorrectable), and after they are scrubbed allocated pages are marked as "scrubbed".

When all pages have been scrubbed the kernel waits for the time determined by the Run Time Check Target Frequency field to pass (if necessary). Note that at this point the kernel may adjust the priority of its run time checking based on whether the time has already elapsed (priority can be increased to meet the target frequency) or if it needs to wait (priority can be lowered). Then the kernel marks all allocated pages as "not scrubbed yet" again.

Free pages are not scrubbed (as they contain no useful data); and when free pages are allocated the pages are marked as "scrubbed" (as they're typically filled with data or zeros when allocated).

It should be impossible for end users to enable ECC With Software Patrol Scrubbing mode themselves using normal utilities.

If the motherboard driver, motherboard/chipset and RAM support ECC, and the motherboard/chipset doesn't do patrol scrubbing in hardware, then the motherboard driver should ask the kernel to change the mode to ECC With Software Patrol Scrubbing (so that, if it is different to the current settings in the Faulty RAM List, the kernel can update the Faulty RAM List file on the boot device to reflect the hardware).