x86 DOS EXE MZ File

The x86 DOS EXE file, also called EXE MZ or MZ file, is the executable file format of the x86 DOS family of operating systems.

There are several EXE formats, including MZ, NE, LE and PE. MZ is the only one native to the MS-DOS era; NE is used by 16-bit Windows and PE by 32/64-bit Windows.

An EXE MZ file can only be executed natively in MS-DOS and Windows 9x/Me.

EXE MZ File Overview

EXE MZ files:

Structure: A header with information about the program’s layout, a relocation table and the code/data segments.
Memory: Can occupy more memory than .COM files.
Size: Can be larger than .COM files (that is, larger than 64 KB).
Mode: Designed for real mode; protected mode is possible, typically through a DOS extender.
Use: Suitable for larger, more complex programs.
Base address / Map offset: Depends on the header.

EXE MZ File Characteristics

EXE (MZ) debuted in PC-DOS 1.0, and then MS-DOS 2.0, to overcome some limitations of COM files. EXE files are also more complex than their predecessor.

They are called EXE (MZ) because the first two bytes contain the ASCII characters “MZ” (4Dh 5Ah), the initials of Mark Zbikowski, a developer of MS-DOS.

EXE (MZ) programs can target real mode or protected mode, but the format was initially designed for real mode.

Unlike COM files, the program can be spread over several segments. This removes the 64 KiB program size limit that COM files have. Nevertheless, in standard programs all the segments are loaded contiguously, as adjacent segments in main memory. This can only be avoided with the use of DOS overlays or extenders.

The developer must define different logical segments (zones) within the assembly code, so that code instructions and data are separated. Each logical segment uses a different memory segment. There can be multiple code and data segments, though usually there is only a single stack segment.

CS points to the code segment being executed. DS usually points to the most frequently used data segment. ES points to another data segment. SS points to the stack segment.

EXE MZ File Structure

The EXE (MZ) file is composed of these main parts:

Header
Relocation table
Code/data segments

All these structures are created by the linker from the assembled code and data and from the segment pseudo-ops in the source code.

The header contains the program metadata. The header has a variable size, and its minimum is 28 bytes.

The relocation table tells the DOS loader which locations in the program hold segment values that must be adjusted once the load address is known, so that the program can address the different logical segments into which it is divided. It is an array of segment:offset entries, so each relocation entry is 4 bytes (a 2-byte offset followed by a 2-byte segment).

The header gives the file offset at which the relocation table starts (e_lfarlc) and also the number of relocation entries (e_crlc).
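The header and relocation table described above can be read with a few lines of Python. This is a minimal sketch, not a full parser: the field names follow the commonly used e_* naming, the 28-byte layout of 14 little-endian words is assumed, and the sample bytes are made up for illustration.

```python
import struct
from collections import namedtuple

# Assumed layout: 14 little-endian 16-bit words (28 bytes), common e_* names.
MZ_FIELDS = ("e_magic e_cblp e_cp e_crlc e_cparhdr e_minalloc e_maxalloc "
             "e_ss e_sp e_csum e_ip e_cs e_lfarlc e_ovno")
MZHeader = namedtuple("MZHeader", MZ_FIELDS)
MZ_STRUCT = struct.Struct("<14H")


def parse_mz_header(data):
    """Decode the fixed part of the header and check the 'MZ' signature."""
    if len(data) < MZ_STRUCT.size:
        raise ValueError("file too short for an MZ header")
    header = MZHeader(*MZ_STRUCT.unpack_from(data, 0))
    if header.e_magic != 0x5A4D:  # bytes 'M' (4Dh), 'Z' (5Ah), little-endian
        raise ValueError("missing MZ signature")
    return header


def read_relocations(data, header):
    """Return the relocation entries as (segment, offset) pairs."""
    entries = []
    for index in range(header.e_crlc):
        position = header.e_lfarlc + 4 * index
        offset, segment = struct.unpack_from("<HH", data, position)
        entries.append((segment, offset))
    return entries


# Hypothetical file start: a 28-byte header followed by one relocation entry.
sample = MZ_STRUCT.pack(0x5A4D, 0x41, 2, 1, 2, 0, 0xFFFF,
                        0, 0x200, 0, 0, 0x21, 0x1C, 0)
sample += struct.pack("<HH", 0x0001, 0x0001)  # offset 0001h, segment 0001h

header = parse_mz_header(sample)
relocations = read_relocations(sample, header)
print(header.e_crlc, hex(header.e_lfarlc), relocations)
```

To inspect a real file, the same functions could be applied to the bytes returned by open(path, "rb").read().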

Developing an EXE File

No offset required

In the case of EXE files, offsets remain the same throughout the whole program.

This is different from COM files, which need to take into account that the program will be run at an offset of 0x100.

The segment values, on the other hand, will vary depending on where DOS loads the program.

Linking an EXE File

The linker assumes that the program is going to be loaded at the beginning of memory (0000h:0000h), though this never happens in reality because this address contains the Interrupt Vector Table (IVT). The assumed segment is known as the load segment.

The linker converts all pseudo-op references into absolute addresses for segments and offsets. Thus, all segment fixups are relative to this load segment.

Whenever the linker identifies that a segment value is being referenced (e.g., the SEG operator is used), it adds an entry to the relocation table recording the segment:offset of the location whose value must be fixed up when the program is loaded.

The linker is responsible for inserting padding between segments. Padding is typically done using 00h bytes (null bytes). This ensures that the next segment starts at a valid paragraph boundary (i.e., an address divisible by 16).
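The padding arithmetic can be sketched as follows. The segment sizes are hypothetical (a 200h-byte stack, a 14-byte data segment and a 17-byte code segment), and real linkers also honor other alignment types, so treat this only as an illustration of paragraph alignment.

```python
PARAGRAPH = 16  # a paragraph is 16 bytes


def padding_to_paragraph(size):
    """Number of pad bytes needed so that the next segment starts on a paragraph."""
    return (-size) % PARAGRAPH


def lay_out_segments(sizes):
    """Return (start_paragraph, padding_after) for each segment, in order."""
    layout = []
    position = 0
    for size in sizes:
        pad = padding_to_paragraph(size)
        layout.append((position // PARAGRAPH, pad))
        position += size + pad
    return layout


# Hypothetical sizes in bytes: stack, data, code.
layout = lay_out_segments([0x200, 14, 17])
for start, pad in layout:
    print(f"starts at paragraph {start:04X}h, followed by {pad} pad byte(s)")
```

The start paragraph of each segment is the value the linker would use as its segment address relative to the load segment.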

Running an EXE File

DOS performs these steps when running an EXE File:

Identify load segment
Read EXE header
Allocate memory
Read relocation table
Append PSP
Append code/data
Initialize registers
Jump to entry point

Read EXE header

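DOS reads the header fields: the signature, the size of the program, the location and size of the relocation table, the memory requirements and the initial values of CS, IP, SS and SP.

The EXE header is not copied into memory; it is only used to load the program into memory.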

Allocate memory

DOS determines the size of the code/data segments. It calculates the memory needed by the program as the sum of the PSP, the code/data segments and the stack segment.

Once it knows the size, it finds a suitable place in memory.

It allocates (i.e. reserves) memory for the program. The location of the allocated block determines the load segment, to which the CS and SS values in the header are relative.
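As a rough illustration of this calculation, the sketch below derives the image size and the minimum allocation from header fields. It assumes the usual meaning of the fields (e_cp counts 512-byte pages, e_cblp is the number of bytes used in the last page, e_cparhdr is the header size in paragraphs, e_minalloc is the extra memory required in paragraphs); the sample values are hypothetical.

```python
def image_size(e_cp, e_cblp, e_cparhdr):
    """Bytes of code/data stored in the file after the header."""
    total = e_cp * 512
    if e_cblp:  # a value of 0 means the last page is completely used
        total -= 512 - e_cblp
    return total - e_cparhdr * 16


def minimum_paragraphs(e_cp, e_cblp, e_cparhdr, e_minalloc):
    """Paragraphs needed: PSP + image (rounded up) + extra memory requested."""
    psp_paragraphs = 0x10  # the PSP is 100h bytes long
    image_paragraphs = (image_size(e_cp, e_cblp, e_cparhdr) + 15) // 16
    return psp_paragraphs + image_paragraphs + e_minalloc


# Hypothetical 577-byte file with a 32-byte header and no extra memory.
size = image_size(e_cp=2, e_cblp=0x41, e_cparhdr=2)
needed = minimum_paragraphs(e_cp=2, e_cblp=0x41, e_cparhdr=2, e_minalloc=0)
print(size, hex(needed))
```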

Read Relocation Table

If the file has no relocation entries (e_crlc is 0), there are no segment values to adjust, so the image is loaded as is, much like a COM file, and execution starts at the entry point given in the header.

The relocation table is not copied into memory; it is only used to load the program into memory.

Append PSP

DOS adds the PSP right at the beginning of the allocated memory, within the load segment.

The program payload (without the header and relocation table) is loaded 0x10 paragraphs after the start of the PSP, because the PSP is 0x100 bytes long, i.e. 0x10 paragraphs.

Append code/data

DOS then copies the program image from the EXE file into memory.

Because the image is loaded at its own segment, right after the PSP, the offsets inside the program need no adjustment; only segment values do.

For each entry in the relocation table, DOS locates the 16-bit value at that segment:offset within the loaded image (a segment value that the linker identified as part of a far address) and updates it by adding the load segment value to the existing segment value.

After this, all addresses will have been updated.

It is important to understand that the relative position within the segments remains the same within the linked machine code (EXE file) and the program loaded in memory.
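The fixup pass can be sketched in Python as below. This is an illustration under stated assumptions rather than the actual DOS loader code: the image is a made-up 19-byte fragment, the relocation entry (segment 0001h, offset 0001h) and the load segment 1010h are hypothetical, and apply_relocations is a name chosen for this example.

```python
import struct


def apply_relocations(image, relocations, load_segment):
    """Add load_segment to every 16-bit word named by a (segment, offset) entry."""
    patched = bytearray(image)
    for segment, offset in relocations:
        position = segment * 16 + offset  # position inside the loaded image
        (value,) = struct.unpack_from("<H", patched, position)
        struct.pack_into("<H", patched, position, (value + load_segment) & 0xFFFF)
    return bytes(patched)


# One paragraph of padding, then 'mov ax,0020h' (B8 20 00) in the next segment.
image = bytes(16) + bytes([0xB8, 0x20, 0x00])
relocations = [(0x0001, 0x0001)]  # the operand word of the mov instruction

patched = apply_relocations(image, relocations, load_segment=0x1010)
print(patched[16:19].hex())
```

Only the operand word changes; the opcode and every offset stay where the linker placed them, which is the point made in the paragraph above.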

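Initialize registers

The registers CS, IP, SS and SP are set from the values in the header; the load segment is added to the segment values CS and SS.

DS and ES point to the PSP when MS-DOS starts the program, not to the program’s data, and the remaining registers should be treated as undefined. It is the developer’s responsibility to initialize these values within the program, if required.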

Jump to entry point

The segment containing the entry point does not necessarily have to be the first segment of the program. It is the linker that decides the order.

The entry point is identified from the CS and IP values, which were loaded from the header.

As the CS in the header assumes that the program starts at segment 0h, the actual CS value is the sum of the header CS and the starting segment chosen by MS-DOS.

Actual CS = Header CS + Starting segment chosen by MS-DOS

Actual IP = Header IP
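A worked example of these two formulas, assuming a hypothetical PSP at segment 1000h and a header with CS = 0021h and IP = 0000h. The starting segment is taken to be the PSP segment plus 10h paragraphs, as described in the PSP section.

```python
def entry_point(psp_segment, header_cs, header_ip):
    """Return (actual CS, actual IP, 20-bit linear address) of the entry point."""
    load_segment = psp_segment + 0x10  # the image starts right after the PSP
    actual_cs = (load_segment + header_cs) & 0xFFFF
    linear = (actual_cs * 16 + header_ip) & 0xFFFFF
    return actual_cs, header_ip, linear


# Hypothetical values for illustration only.
cs, ip, linear = entry_point(psp_segment=0x1000, header_cs=0x0021, header_ip=0x0000)
print(f"{cs:04X}:{ip:04X} -> {linear:05X}h")
```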

EXE MZ “Hello, world!” Example

MYSTACK  SEGMENT STACK 'STACK'  ; Open stack segment
         DW 100h DUP (?)        ; 100h words in stack
MYSTACK  ENDS                   ; Close stack segment

MYDATA   SEGMENT 'DATA'         ; Open data segment
Msg      DB 'Hello, world!$'    ; Message to print
MYDATA   ENDS                   ; Close data segment

MYCODE   SEGMENT 'CODE'         ; Open code segment
ASSUME CS:MYCODE, DS:MYDATA, SS:MYSTACK

Entry    PROC                   ; Open procedure 'Entry'
         mov  ax,MYDATA         ; Segment value for MYDATA

         mov  ds,ax             ; DS now points to MYDATA
         mov  dx,OFFSET Msg     ; DS:DX = string for service 9

         mov  ah,9              ; Specify service 9
         int  21h               ; Call service 9: print string

         mov  ax,4c00H          ; Service 4Ch, return value 0
         int  21H               ; Call service 4Ch: return to DOS

Entry    ENDP                   ; Close the procedure 'Entry'

MYCODE   ENDS                   ; Close the 'MYCODE' segment
         END Entry              ; Set entry point to 'Entry'

External References

OSdev.org community; “MZ”; OSdev.org