ebc.asm

EFI Byte Code Assembler macroinstructions for fasmg assembly engine

still under development

ebc.asm is a set of macroinstructions, structures, and constants that define a full-featured assembly framework for the EFI Byte Code (EBC) Virtual Machine on top of the flat assembler g (fasmg) assembly engine.

INTRODUCTION

How to use

  1. Include 'ebc.asm' as the first line of source code.
  2. Define the output UEFI image type with the image [type] directive, which supports one of the following types:
    • efi application
    • efi boot service driver
    • efi runtime driver
  3. Add one or more section definitions for executable code and data.
  4. Add the mandatory efi_main label to a section containing executable code. This label defines the image entry point: image execution starts from it.
  5. Add instructions and data definitions.
  6. Compile and run!

Below is a simple example of EFI application image source code.

include 'ebc.asm'

      image efi application


section '.text' code readable executable

efi_main:

      movn r6, @r0(+1, +16)
      movn r5, @r6(EFI_SYSTEM_TABLE.ConOut)
      movrel r4, msg_hello
      pushn r4
      pushn r5
      callex @r5(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL.OutputString)
      movnw r0, r0(+2, +0)
      ret


section '.data' data readable writeable

msg_hello wchar 'Hi, UEFI!',13,10,0

Save it as hello.asm and compile.

How to compile

Source code can be compiled with fasmg from the command line under Windows, Linux, or macOS.

Syntax

fasmg [source] [output]

Example

> fasmg hello.asm hello.efi

How to run

TODO: put info on how to install and configure QEMU here

1 ASSEMBLER DIRECTIVES

All assembler directives are case-insensitive.

1.1 Data Allocation and Initialization

Data allocation and initialization directives are used to allocate and optionally initialize storage space.

ebc.asm defines several types of data allocation and initialization directives:

Table 1.1 - Common unaligned data types

Data Type  Size, bytes  Alignment, bytes  Description
BYTE       1            0                 1-byte value
WORD       2            0                 2-byte value
DWORD      4            0                 4-byte value
QWORD      8            0                 8-byte value

Table 1.2 - Naturally aligned data types

Data Type  Size, bytes  Alignment, bytes  Description
BOOLEAN    1            1                 1 = true, 0 = false
INT8       1            1                 1-byte signed value
UINT8      1            1                 1-byte unsigned value
INT16      2            2                 2-byte signed value
UINT16     2            2                 2-byte unsigned value
INT32      4            4                 4-byte signed value
UINT32     4            4                 4-byte unsigned value
INT64      8            8                 8-byte signed value
UINT64     8            8                 8-byte unsigned value
EFI_LBA    8            8                 Logical block address

Table 1.3 - Naturally aligned data types of native width

Data Type   Size, bytes   Alignment, bytes  Description
INTN        machine word  machine word      Signed value of native width
UINTN       machine word  machine word      Unsigned value of native width
EFI_EVENT   machine word  machine word      Handle to an event structure
EFI_HANDLE  machine word  machine word      A collection of related interfaces
EFI_PTR     machine word  machine word      Memory pointer
EFI_STATUS  machine word  machine word      Status code
EFI_TPL     machine word  machine word      Task priority level

Table 1.4 - String data types

Data Type  Size, bytes  Alignment, bytes  Description
CHAR       1            0                 ASCII or UTF-8 string
CHAR8      1            1                 ASCII or UTF-8 string
WCHAR      2            0                 Unaligned wide (UCS-2) string
CHAR16     2            2                 Naturally aligned wide (UCS-2) string

Table 1.5 - Complex data types

Data Type  Size, bytes  Alignment, bytes  Description
EFI_GUID   16           8                 128-bit buffer containing GUID

Allocating one uninitialized data unit

Syntax

{name} [datatype]
{name} [datatype] ?

Examples

Allocate one byte

BYTE
BYTE ?

Allocate two named variables

image_handle   EFI_HANDLE
system_table   EFI_PTR

Allocating n uninitialized data units

Syntax

{name} [datatype] [[n]]

Examples

Allocate four uninitialized dwords as a named variable foo

foo DWORD [4]

It is allowed to allocate 0 data units for a variable. In this case no storage cells will be reserved.

foo BYTE [0]
bar BYTE ?

In the above example both foo and bar will point to the same memory address.

Allocating initialized data units

Syntax

{name} [datatype] initializer {, initializer ...}

Examples

Initialize named UINTN variable foo with value 0xDEADBEEF

foo UINTN 0xDEADBEEF

Initialize named UINT8 variable bar with values 0xBE and 0xEF

bar UINT8 0xBE, 0xEF

Quoted UTF-8 text can also be given as an initializer, provided it does not exceed the size of one data unit:

WORD 'MZ'     ; 0x5a4d
DWORD 'PE'    ; 0x00004550
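
The values in the comments above follow from little-endian storage: the first character lands in the lowest byte. The following Python sketch models that packing so the numbers can be checked by hand. The helper name pack_text is hypothetical and not part of ebc.asm, and zero padding of the unused high bytes is an assumption inferred from the DWORD 'PE' example.

```python
def pack_text(text, unit_size):
    """Return the integer value of quoted text stored in one little-endian data unit."""
    raw = text.encode("utf-8")
    if len(raw) > unit_size:
        raise ValueError("text does not fit in the data unit")
    # Assumption: unused high bytes are zero, as the DWORD 'PE' example suggests.
    return int.from_bytes(raw.ljust(unit_size, b"\x00"), "little")


print(hex(pack_text("MZ", 2)))  # WORD 'MZ'
print(hex(pack_text("PE", 4)))  # DWORD 'PE'
```

Passing text longer than the unit (for example three characters into a WORD) raises an error in this model, mirroring the size restriction stated above.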

EFI_GUID

To initialize variables of EFI_GUID type, use the same format for initializer parameter as given in the UEFI Specification:

EFI_GUID \
  {0x48ecb431, 0xfb72, 0x45c0, \
  {0xa9, 0x22, 0xf4, 0x58, 0xfe, 0x04, 0x0b, 0xd5}}
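
For reference, the 16 bytes that such an initializer is expected to produce can be modeled in Python. This is a sketch of the in-memory EFI_GUID layout defined by the UEFI Specification (a little-endian 32-bit field, two little-endian 16-bit fields, then 8 raw bytes); it is not taken from ebc.asm itself, and pack_guid is a hypothetical helper name.

```python
import struct


def pack_guid(data1, data2, data3, data4):
    """Pack GUID fields in the in-memory EFI_GUID layout."""
    # "<" selects little-endian with no padding: 4 + 2 + 2 + 8 = 16 bytes.
    return struct.pack("<IHH8B", data1, data2, data3, *data4)


guid = pack_guid(0x48ecb431, 0xfb72, 0x45c0,
                 [0xa9, 0x22, 0xf4, 0x58, 0xfe, 0x04, 0x0b, 0xd5])
print(len(guid), guid.hex())
```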

Strings

String data types accept quoted text as initializer. Either single- or double quote marks are accepted in pairs.

msg_hello   WCHAR 'Hello, UEFI!',13,10,0

Size of data type

The size of each defined data type can be obtained with the sizeof prefix directive.

For naturally aligned data types of native width, sizeof returns the maximum supported machine word length (8 bytes).

Syntax

sizeof.[datatype]

Examples

sizeof.byte       ; = 1
sizeof.dword      ; = 4
sizeof.uintn      ; = 8

__size and __length properties

Every named variable exposes the size of its data type through the __size property and the total number of allocated bytes through the __length property.

Examples

foo BYTE   0x0F, 0x00, 0x00
; foo.__size          = 1
; foo.__length        = 3

msg_hello   WCHAR 'Hi there!', 0
; msg_hello.__size    = 2
; msg_hello.__length  = 20
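
The msg_hello figures can be reproduced with a short Python sketch: each WCHAR unit is 2 bytes, so nine characters plus the terminating 0 occupy 20 bytes. The function name wchar_props is hypothetical, and the sketch assumes every character fits in a single UCS-2 unit.

```python
def wchar_props(text, *extra_units):
    """Return (__size, __length) for a WCHAR definition such as: WCHAR 'text', 13, 10, 0."""
    unit = 2  # size of one WCHAR unit in bytes (Table 1.4)
    data = text.encode("utf-16-le")
    for value in extra_units:
        data += value.to_bytes(unit, "little")
    return unit, len(data)


print(wchar_props("Hi there!", 0))
print(wchar_props("Hello, UEFI!", 13, 10, 0))
```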

1.2 Data Alignment

The align directive aligns the next variable or instruction to an address that is a multiple of n, using value bytes for padding (0x00 by default).

Syntax

align n {, value}

Examples

Align to 8 bytes and pad with 0x00 bytes (default)

align 8

Align to a 4-byte boundary and pad with 0xCC bytes

align 4, 0xCC
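
The amount of padding emitted depends on the current offset. The Python sketch below models that calculation; align_padding is a hypothetical name, and the only assumption is that n is a positive integer.

```python
def align_padding(offset, n, value=0x00):
    """Return the padding bytes needed to bring offset up to a multiple of n."""
    pad = (-offset) % n  # 0 when offset is already aligned
    return bytes([value]) * pad


print(align_padding(5, 8))        # three 0x00 bytes
print(align_padding(6, 4, 0xCC))  # two 0xCC bytes
print(align_padding(16, 8))       # already aligned, no padding
```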

1.3 Data Structures

struct directive declares a structure type having the specified field_declarations. Each field must be a valid data definition.

Syntax

struct [name]
  field_declarations
end struct

Examples

struct EFI_TABLE_HEADER
  Signature             UINT64
  Revision              UINT32
  HeaderSize            UINT32
  CRC32                 UINT32
  Reserved              UINT32
end struct

struct EFI_SYSTEM_TABLE
  Hdr                   EFI_TABLE_HEADER
  FirmwareVendor        EFI_PTR
  FirmwareRevision      UINT32
  ConsoleInHandle       EFI_HANDLE
  ConIn                 EFI_PTR
  ConsoleOutHandle      EFI_HANDLE
  ConOut                EFI_PTR
  StandardErrorHandle   EFI_HANDLE
  StdErr                EFI_PTR
  RuntimeServices       EFI_PTR
  BootServices          EFI_PTR
  NumberOfTableEntries  UINTN
  ConfigurationTable    EFI_PTR
end struct

Allocating and Initializing Structures

Allocating structure variables and arrays

Structures can be allocated and initialized in a similar way to any other data type.

Allocating structure EFI_TABLE_HEADER as a named variable tbl_header

tbl_header EFI_TABLE_HEADER

Allocating 8 structures EFI_MEMORY_DESCRIPTOR as a named variable mem_descriptors

struct EFI_MEMORY_DESCRIPTOR
  Type            UINT32
  PhysicalStart   UINT64
  VirtualStart    UINT64
  NumberOfPages   UINT64
  Attribute       UINT64
end struct

mem_descriptors EFI_MEMORY_DESCRIPTOR [8]
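
To see how natural alignment affects field offsets, the following Python sketch lays out fixed-width fields the way a C compiler would. It assumes that ebc.asm pads structure fields according to the alignments in Table 1.2; the source does not state this explicitly, so treat the padding as an assumption. The layout helper is hypothetical.

```python
SIZES = {"UINT8": 1, "UINT16": 2, "UINT32": 4, "UINT64": 8}


def layout(fields):
    """Return ({field: offset}, total_size) for naturally aligned fixed-width fields."""
    offsets, offset, max_align = {}, 0, 1
    for name, dtype in fields:
        size = SIZES[dtype]          # for these types, alignment equals size
        offset += (-offset) % size   # pad up to the field's alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, size)
    offset += (-offset) % max_align  # tail padding, as a C compiler would add
    return offsets, offset


EFI_MEMORY_DESCRIPTOR = [
    ("Type", "UINT32"), ("PhysicalStart", "UINT64"), ("VirtualStart", "UINT64"),
    ("NumberOfPages", "UINT64"), ("Attribute", "UINT64"),
]
print(layout(EFI_MEMORY_DESCRIPTOR))
```

Under this assumption, PhysicalStart starts at offset 8 rather than 4, and one descriptor occupies 40 bytes, so the 8-element array would take 320 bytes.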

Initializing structure elements

Structure elements can be initialized with values as a comma-separated key:value list. It is allowed to initialize all or some of the elements.

mem_descriptor EFI_MEMORY_DESCRIPTOR \
  Type: 1, PhysicalStart: 0x00100000, VirtualStart: 0x00100000

There are two ways to initialize an element that is an array.

1) Give the value as a comma-separated list between < and > characters. With this method, item values must be given in sequence, starting from the first item; skipping items is not allowed. However, values may be provided for only the first m items of the array, where m <= array_length.

struct SOME_LIST
  Item   UINT8 [8]
end struct

my_list SOME_LIST Item:<0,1,2,3,4>

2) Initialize each array item separately. To do this, follow the element name with the item index in square brackets ([ ]):

my_list SOME_LIST Item[0]:0, Item[4]:4, Item[7]:7

To initialize an element with a value of a different data type, use type casting: give the value between < and > characters, preceded by the desired data type. The total data length must not exceed the amount reserved for the element.

Initialize SOME_LIST.Item array with 2 UINT32 values

my_list SOME_LIST Item:<UINT32 0x11111111, UINT32 0x22222222>

Initialize SOME_LIST.Item array with WCHAR string

my_list SOME_LIST Item:<WCHAR 'UEFI'>

A cast with a sequence of values can be applied to arrays only. For example, it is not allowed to initialize a single UINT32 element as a sequence of UINT8 values:

struct BAR
  Number UINT32
end struct

; the following is not allowed
foo BAR Number:<UINT8 0x01, UINT8 0x02, UINT8 0x03, UINT8 0x04>

; however, the following are allowed
foo BAR Number:<UINT8 0x01>
foo BAR Number:<UINT16 0xBEEF>
foo BAR Number:<WCHAR "Hi">

Initializing EFI_GUID elements

Structure elements of type EFI_GUID cannot be initialized directly. Two conditions should be met:

  1. GUID must be declared as a constant in code
  2. Type casting must be used in structure element initialization

struct EFI_CONFIGURATION_TABLE
  VendorGuid    EFI_GUID
  VendorTable   EFI_PTR
end struct

EFI_ACPI_20_TABLE_GUID equ \
  {0x8868e871,0xe4f1,0x11d3, \
  {0xbc,0x22,0x00,0x80,0xc7,0x3c,0x88,0x81}}

cfg_tbl EFI_CONFIGURATION_TABLE \
  VendorGuid:<EFI_GUID EFI_ACPI_20_TABLE_GUID>

Initializing arrays of structures

Let's say we have the following structures declared:

struct FOO
  a   UINT16
  b   UINT16
end struct

struct BAR
  x   FOO [4]
  y   UINT32
end struct

It is possible to initialize all or only some elements of the BAR.x array. To do this, array element index in square brackets ([ ]) should be given after the array element name.

my_struc BAR x[0].a: 0x000a, x[0].b: 0x000b,\
             x[1].a: 0x001a, x[1].b: 0x001b

If variable itself is an array of structures, then its element names should be preceded with a dot (.) followed by array index in square brackets ([ ]):

my_struc BAR [2]  .[0].x[0].a: 0x000a, .[0].x[0].b: 0x000b,\
                  .[0].x[1].a: 0x001a, .[0].x[1].b: 0x001b,\
                  .[1].x[0].a: 0x010a, .[1].x[0].b: 0x010b,\
                  .[1].x[3].a: 0x013a, .[1].x[3].b: 0x013b,\
                  .[1].y: 0x12345678

1.4 Data Unions

A union is a data structure in which all members share the same memory location. This means that at any given time a union can contain no more than one object from its list of members. It also means that no matter how many members a union has, it always uses only enough memory to store the largest member.

union directive declares a union of one or more data types. The field_declarations must be valid data definitions.

Syntax

union
  field_declarations
end union

Examples

struct EFI_CAPSULE_BLOCK_DESCRIPTOR
  Length                  UINT64
  union
    DataBlock             UINT64
    ContinuationPointer   UINT64
  end union
end struct
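
The effect of the union on the structure layout can be sketched in Python: all union members start at the same offset, and the union contributes only the size of its largest member. The helper name struct_with_union is hypothetical, and the model covers only this simple case of one leading field followed by one union.

```python
def struct_with_union(leading_size, union_members):
    """Return ({member: offset}, total_size) for a field followed by a union."""
    union_offset = leading_size  # the union starts right after the leading field
    offsets = {name: union_offset for name in union_members}
    total = union_offset + max(union_members.values())
    return offsets, total


# Length is a UINT64 (8 bytes); both union members are UINT64 as well.
print(struct_with_union(8, {"DataBlock": 8, "ContinuationPointer": 8}))
```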

1.5 Output File Format

UEFI uses a subset of the PE32+ image format with a modified header signature. The modification to the signature value in the PE32+ image is done to distinguish UEFI images from normal PE32 executables. The “+” addition to PE32 provides the 64-bit relocation fix-up extensions to standard PE32 format.

ebc.asm currently does not support relocation fix-ups.

UEFI Image Type

The image directive defines the UEFI image type for the output file.

Syntax

image [type]

The following types of image are supported:

  • efi application
  • efi boot service driver
  • efi runtime driver

Example

Define output file format as UEFI Application Image

image efi application
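
In a PE32+ header, the image type is recorded in the Subsystem field. The Python sketch below lists the subsystem values that the PE/COFF specification assigns to the three UEFI image types, along with the machine type for EFI Byte Code. That ebc.asm writes exactly these values is an assumption, and subsystem_for is a hypothetical helper.

```python
# Values from the PE/COFF specification.
EFI_SUBSYSTEM = {
    "efi application": 10,          # IMAGE_SUBSYSTEM_EFI_APPLICATION
    "efi boot service driver": 11,  # IMAGE_SUBSYSTEM_EFI_BOOT_SERVICE_DRIVER
    "efi runtime driver": 12,       # IMAGE_SUBSYSTEM_EFI_RUNTIME_DRIVER
}
IMAGE_FILE_MACHINE_EBC = 0x0EBC


def subsystem_for(image_type):
    """Map an image type to its PE subsystem value (directives are case-insensitive)."""
    return EFI_SUBSYSTEM[image_type.strip().lower()]


print(subsystem_for("efi application"), hex(IMAGE_FILE_MACHINE_EBC))
```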

PE Sections

UEFI Image must include one or more sections for executable code and initialized or uninitialized data.

section directive defines PE section.

Syntax

section [name] [flags]

ebc.asm defines the following flags:

Flag                Description
code                The section contains executable code
data                The section contains initialized data
udata               The section contains uninitialized data
uninitialized data  The section contains uninitialized data (the same as udata)
discardable         The section can be discarded as needed
nocache             The section cannot be cached
nopage              The section is not pageable
shareable           The section can be shared in memory
executable          The section can be executed as code
readable            The section can be read
writeable           The section can be written to

Example

section '.text' code readable executable

section '.data' data readable writeable
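
These flags correspond to bits in the Characteristics field of a PE section header. The following Python sketch combines them using the constants from the PE/COFF specification. The mapping from ebc.asm flag names to those constants is an assumption based on the flag descriptions, and section_characteristics is a hypothetical helper.

```python
# Bit values from the PE/COFF specification (IMAGE_SCN_* constants).
SECTION_FLAGS = {
    "code": 0x00000020,
    "data": 0x00000040,
    "udata": 0x00000080,
    "uninitialized data": 0x00000080,
    "discardable": 0x02000000,
    "nocache": 0x04000000,
    "nopage": 0x08000000,
    "shareable": 0x10000000,
    "executable": 0x20000000,
    "readable": 0x40000000,
    "writeable": 0x80000000,
}


def section_characteristics(flags):
    """OR together the characteristic bits for a list of section flags."""
    value = 0
    for flag in flags:
        value |= SECTION_FLAGS[flag.lower()]
    return value


print(hex(section_characteristics(["code", "readable", "executable"])))
print(hex(section_characteristics(["data", "readable", "writeable"])))
```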

Entry Point

While UEFI drivers do not require an entry point, one is mandatory for UEFI application images.

The efi_main label, placed inside an executable code section, defines the entry point of the UEFI image.

Example

image efi application

section '.text' code readable executable

  {instructions}
  
efi_main:
  [instructions]

2 EBC INSTRUCTION SET

2.1 Instruction Operands

The VM supports an EBC instruction set that performs data movement, data manipulation, branching, and other miscellaneous operations typical of a simple processor. Most instructions operate on two operands, and have the general form:

INSTRUCTION operand1, operand2

Typically, instruction operands will be one of the following:

  • Direct operands
  • Indirect operands
  • Indirect with index operands
  • Immediate operands

The following subsections explain these operands.

Direct Operands

When a direct operand is specified for an instruction, the data to operate upon is contained in one of the VM general-purpose registers R0-R7. Syntactically, an example of direct operand mode could be the ADD instruction:

ADD64 R1, R2

This form of the instruction utilizes two direct operands. For this particular instruction, the VM would take the contents of register R2, add it to the contents of register R1, and store the result in register R1.

Some instructions allow specifying register and immediate data as direct operands. In these cases the immediate data is considered a signed value and is added to the register contents such that Operand = Register + Immediate.

ADD32 R1, R2(+50)

For the above instruction, the VM would take immediate value 50, add it to the contents of register R2, then add the result to the contents of register R1, and store the result in register R1.

Immediate data for direct operands must directly follow the register and be enclosed in round brackets ( ).
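
The arithmetic of ADD32 R1, R2(+50) can be modeled in Python as a sanity check. This is a sketch rather than a description of the VM implementation: it assumes that the 32-bit form truncates the result to 32 bits and leaves the upper half of the destination register clear. The function name add32_direct is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add32_direct(r1, r2, imm=0):
    """Model ADD32 R1, R2(imm) with both operands direct."""
    operand2 = r2 + imm               # Operand = Register + Immediate (signed)
    return (r1 + operand2) & MASK32   # assumption: result truncated to 32 bits


print(add32_direct(10, 20, +50))
print(add32_direct(0xFFFFFFFF, 0, +1))  # wraps around
```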

Indirect Operands

When an indirect operand is specified, a VM register contains the address of the operand data. This is sometimes referred to as register indirect, and is indicated by prefixing the register operand with "@". Syntactically, an example of an indirect operand mode could be this form of the ADD instruction:

ADD32 R1, @R2

For this instruction, the VM would take the 32-bit value at the address specified in R2, add it to the contents of register R1, and store the result in register R1.

Indirect with Index Operands

When an indirect with index operand is specified, the address of the operand is computed by adding the contents of a register to a decoded natural index that is included in the instruction. Typically with indexed addressing, the base address will be loaded in the register and an index value will be used to indicate the offset relative to this base address.

Indexed addressing takes the form

@R1(+n,+c)

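where:

  • R1 is one of the general-purpose registers (R0-R7) and contains the base address
  • +n is a count of natural units; this part of the offset is computed at runtime as n * sizeof.UINTN
  • +c is a constant offset in bytes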

The values of n and c can be either positive or negative, though they must both have the same sign. These values are encoded in the indexes associated with EBC instructions. Indexes can be 16, 32, or 64 bits wide, depending on the instruction. An example of indirect with index syntax would be:

ADD32 R1, @R2(+1, +8)

This instruction would take the address in register R2, add (8 + 1 * sizeof.UINTN), read the 32-bit value at the address, add the contents of R1 to the value, and store the result back to R1.
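
Because the natural unit depends on the machine the VM runs on, the same index resolves to different byte offsets on 32-bit and 64-bit hosts. The Python sketch below works through the calculation described above; index_offset is a hypothetical helper, and native_width stands for the host pointer size in bytes.

```python
def index_offset(n, c, native_width):
    """Return the byte offset for an index (n, c): c + n * native word size."""
    if n * c < 0:
        raise ValueError("n and c must have the same sign")
    return c + n * native_width


print(index_offset(+1, +8, 8))   # @R2(+1, +8) on a 64-bit host
print(index_offset(+1, +8, 4))   # the same index on a 32-bit host
print(index_offset(+1, +16, 8))  # @r0(+1, +16) from the introductory example, 64-bit host
```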

ebc.asm allows using struct paths instead of raw n and c values for indexes. For example:

MOVn R1, @R2(EFI_SYSTEM_TABLE.ConOut)

Immediate Operands

Some instructions support an immediate operand, which is simply a value included in the instruction encoding. The immediate value may or may not be sign extended, depending on the particular instruction. One instruction that supports an immediate operand is MOVI. An example usage of this instruction is:

MOVIww R1, 0x1234

This instruction moves the immediate value 0x1234 directly into VM register R1. The immediate value is contained directly in the encoding for the MOVI instruction.

2.2 Arithmetic Instructions

Arithmetic instructions include the following: ADD, AND, ASHR, DIV, DIVU, EXTNDB, EXTNDD, EXTNDW, MOD, MODU, MUL, MULU, NEG, NOT, OR, SHL, SHR, SUB, XOR. Each of these instructions can operate on either 32-bit or 64-bit operands, depending on the size specifier. For example:

ADD32 operand1, operand2
ADD64 operand1, operand2

ebc.asm allows the size specifier to be omitted for arithmetic instructions. If it is omitted, the operation is performed on 64-bit operands.

2.3 Control Transfer Instructions

Control transfer instructions include: CALL, JMP, and RET.

CALL and JMP instructions can operate on either 32-bit or 64-bit operands (depending on the size specifier) and can target either an IP-relative or an absolute address.

Table 2.1 lists all CALL instruction forms supported by ebc.asm.

Table 2.1

Mnemonic   Description
CALL       Call to EBC code within a given application. ebc.asm will choose the best option between a 32-bit and a 64-bit call to an IP-relative address
CALLEX     Call to external code. ebc.asm will choose the best option between a 32-bit and a 64-bit call to an absolute address
CALLa      Call to EBC code within a given application. ebc.asm will choose the best option between a 32-bit and a 64-bit call to an absolute address
CALLEXa    Call to external code. ebc.asm will choose the best option between a 32-bit and a 64-bit call to an absolute address
CALL32     32-bit call to EBC code within a given application to IP-relative address
CALL32EX   32-bit call to external code to IP-relative address
CALL32a    32-bit call to EBC code within a given application to absolute address
CALL32EXa  32-bit call to external code to absolute address
CALL64     64-bit call to EBC code within a given application to absolute address
CALL64EX   64-bit call to external code to absolute address
CALL64a    64-bit call to EBC code within a given application to absolute address
CALL64EXa  64-bit call to external code to absolute address

Table 2.2 lists all JMP instruction forms supported by ebc.asm.

Table 2.2

Mnemonic       Description
JMP{cc|cs}     ebc.asm will choose the best option among short jump and 32- or 64-bit jump to IP-relative address
JMP{cc|cs}a    ebc.asm will choose the best option between 32- and 64-bit jump to absolute address
JMP8{cc|cs}    Short jump within a range of -128 to +127 16-bit words
JMP32{cc|cs}   32-bit jump to IP-relative address
JMP32{cc|cs}a  32-bit jump to absolute address
JMP64{cc|cs}   64-bit jump to IP-relative address
JMP64{cc|cs}a  64-bit jump to absolute address
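
Whether a target is reachable with JMP8 can be checked with a small calculation. The Python sketch below assumes, following the EBC instruction description in the UEFI Specification, that JMP8 is 2 bytes long and that its displacement is counted in 16-bit words from the instruction that follows it. How ebc.asm itself selects a jump form is not described here; jmp8_displacement is a hypothetical helper.

```python
JMP8_SIZE = 2  # assumption: opcode byte plus one displacement byte


def jmp8_displacement(jmp_addr, target_addr):
    """Return the signed word displacement for JMP8, or None if the target is out of reach."""
    delta = target_addr - (jmp_addr + JMP8_SIZE)
    if delta % 2:
        return None  # displacement is counted in whole 16-bit words
    words = delta // 2
    return words if -128 <= words <= 127 else None


print(jmp8_displacement(0x100, 0x102))  # next instruction
print(jmp8_displacement(0x100, 0x200))  # farthest forward target
print(jmp8_displacement(0x100, 0x202))  # one word too far
```

When this model returns None, one of the 32-bit or 64-bit forms from Table 2.2 would be needed instead.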

2.4 Data Transfer Instructions

2.5 Data Comparison Instructions

2.6 Stack Manipulation Instructions

2.7 Other Instructions