The x86 DOS EXE, EXE MZ or MZ file is an executable file format for the x86 DOS family of operating systems.
There are several EXE formats, including MZ, NE, LE and PE. MZ was the only one used in the MS-DOS era, while NE is for 16-bit Windows and PE for 32/64-bit Windows.
An EXE MZ file can only be executed natively in MS-DOS and Windows 9x/Me.
EXE MZ File Overview
EXE MZ files:
- Structure: A header containing information about the program’s structure, followed by a relocation table and the code/data segments.
- Memory: Can occupy more memory than .COM files.
- Size: Can be larger than .COM files, i.e. larger than 64 KB.
- Mode: Supports real mode or protected mode.
- Use: Suitable for larger, more complex programs.
- Base address / Map offset: Depends on the header.
EXE MZ File Characteristics
EXE (MZ) debuted in PC-DOS 1.0, and then MS-DOS 2.0, to overcome some limitations of COM files. The format is also more complex than its predecessor.
They are called EXE (MZ) because the first two bytes contain the ASCII characters “MZ”, the initials of Mark Zbikowski, a developer of MS-DOS.
EXE (MZ) programs can be written for real mode or protected mode, but the format was initially designed for real mode.
Unlike COM files, the program can be split across different segments. This removes the 64 KiB program size limit that COM files have. Nevertheless, in standard programs all the segments are placed contiguously in main memory. This can only be avoided with the use of DOS overlays or extenders.
The developer must define different logical segments or zones within the assembly code, so that instructions and data are separated. Each logical segment uses a different memory segment. There can be multiple code and data segments, though usually there is only a single stack segment.
CS points to the code segment being executed. DS usually points to the most frequently used data segment. ES points to another data segment. SS points to the stack segment.
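As a reminder of how these segment registers are combined with offsets in real mode, the following Python sketch converts a segment:offset pair into a linear address. The function name `linear_address` is only illustrative, and the 20-bit wraparound assumes 8086-style behaviour.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits (assumed 8086 behaviour)."""
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs can name the same byte in memory.
print(hex(linear_address(0x1234, 0x0010)))  # 0x12350
print(hex(linear_address(0x1235, 0x0000)))  # 0x12350
```

Because segments overlap every 16 bytes, contiguous logical segments only need to start on a paragraph boundary to be addressable with offset 0.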
EXE MZ File Structure
The EXE (MZ) file is composed of these main parts:
- Header
- Relocation table
- Code/data segments
The linker creates all of these structures from the assembled code and data and from the segment pseudo-ops in the source code.
The header contains the program metadata. The header has a variable size, and its minimum is 28 bytes.
The relocation table tells the DOS loader which locations in the program hold segment values that must be adjusted once the load address is known, so that the program can address the different logical segments into which it is divided. It is an array of segment:offset entries, each 4 bytes long (a 2-byte offset followed by a 2-byte segment).
The header gives the file offset at which the relocation table starts (e_lfarlc) and the number of relocation entries (e_crlc).
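To make the layout concrete, here is a minimal Python sketch that reads the 28-byte header and the relocation table from a byte string. The field names follow the common `e_*` naming already used above for e_lfarlc and e_crlc; the class `MZHeader` and the function `parse_mz` are hypothetical helpers, not part of any library.

```python
import struct
from typing import NamedTuple


class MZHeader(NamedTuple):
    e_magic: int     # signature, 0x5A4D ("MZ" read as a little-endian word)
    e_cblp: int      # bytes used in the last 512-byte page
    e_cp: int        # number of 512-byte pages in the file
    e_crlc: int      # number of relocation entries
    e_cparhdr: int   # header size in 16-byte paragraphs
    e_minalloc: int  # minimum extra paragraphs required
    e_maxalloc: int  # maximum extra paragraphs requested
    e_ss: int        # initial SS, relative to the load segment
    e_sp: int        # initial SP
    e_csum: int      # checksum
    e_ip: int        # initial IP
    e_cs: int        # initial CS, relative to the load segment
    e_lfarlc: int    # file offset of the relocation table
    e_ovno: int      # overlay number


def parse_mz(data: bytes):
    """Return (header, relocations); each relocation is an (offset, segment) pair."""
    header = MZHeader(*struct.unpack_from("<14H", data, 0))
    if header.e_magic != 0x5A4D:
        raise ValueError("not an MZ executable")
    relocations = [
        struct.unpack_from("<HH", data, header.e_lfarlc + 4 * i)
        for i in range(header.e_crlc)
    ]
    return header, relocations
```

A typical call would be `parse_mz(open("HELLO.EXE", "rb").read())`, where `HELLO.EXE` is a placeholder file name.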
Developing an EXE File
No offset required
In the case of EXE files, the offset remains the same through the whole program.
This differs from COM files, where the developer must take into account that the program will run at an offset of 0x100.
On the other hand, the segments will vary.
Linking an EXE File
The linker assumes that the program is going to be loaded at the beginning of memory (0000h:0000h), though this never happens in reality because this address contains the Interrupt Vector Table (IVT). The assumed segment is known as the load segment.
The linker converts all pseudo-op references into absolute addresses for segments and offsets. Thus, all segment fixups are relative to this load segment.
Whenever the linker identifies that a different segment is being accessed (e.g., the directive seg is used), it adds an entry to the relocation table giving the segment:offset of the location whose address must be adjusted when the program is loaded.
The linker is responsible for inserting padding between segments. Padding is typically done using 00h bytes (null bytes). This ensures that the next segment starts at a valid paragraph boundary (i.e., an offset divisible by 16).
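The padding rule can be sketched in a few lines of Python. The function `layout_segments` is a hypothetical helper, and it assumes every segment is paragraph-aligned and that the load segment is 0, as the linker does; the segment sizes in the example are made up for illustration.

```python
PARAGRAPH = 16  # bytes per paragraph


def layout_segments(sizes):
    """Return (segment_value, padding_bytes) for each segment, in link order.

    segment_value is relative to an assumed load segment of 0;
    padding_bytes is the number of 00h bytes inserted before the segment.
    """
    placed = []
    position = 0  # current byte position in the load image
    for size in sizes:
        padding = -position % PARAGRAPH  # bytes needed to reach the next boundary
        position += padding
        placed.append((position // PARAGRAPH, padding))
        position += size
    return placed


# Illustrative sizes: a 0x200-byte stack, 14 bytes of data, 17 bytes of code.
print(layout_segments([0x200, 14, 17]))  # [(0, 0), (32, 0), (33, 2)]
```

In this example the data segment ends 2 bytes short of a paragraph boundary, so two null bytes are inserted before the code segment.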
Running an EXE File
DOS performs these steps when running an EXE File:
- Identify load segment
- Read EXE header
- Allocate memory
- Read relocation table
- Append PSP
- Append code/data
- Initialize registers
- Jump to entry point
Read EXE header
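DOS reads the header fields, such as the size of the program, the location of the relocation table and the initial values for CS:IP and SS:SP.
The EXE header is not copied into memory; it is only used to load the program.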
Allocate memory
DOS determines the size of the code/data segments. It calculates the memory needed by the program as the sum of the PSP, the code/data segments and the stack segment.
Once it knows the size, it finds a suitable place in memory.
It allocates (i.e. reserves) memory for the program. The position of this block determines the load segment, and the CS and IP values within the header are interpreted relative to it.
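The size calculation can be sketched from the header fields. This is a simplified model, assuming the usual meaning of e_cp, e_cblp, e_cparhdr and e_minalloc; the function names are hypothetical and the real loader also takes e_maxalloc and the available memory into account.

```python
PSP_PARAGRAPHS = 0x10  # the PSP is 0x100 bytes long


def load_image_size(e_cp: int, e_cblp: int, e_cparhdr: int) -> int:
    """Bytes of code/data to copy into memory (file size minus header size)."""
    file_bytes = e_cp * 512
    if e_cblp:  # last page only partly used
        file_bytes -= 512 - e_cblp
    return file_bytes - e_cparhdr * 16


def min_paragraphs(e_cp: int, e_cblp: int, e_cparhdr: int, e_minalloc: int) -> int:
    """Smallest memory block, in paragraphs, that can hold PSP + image + extra memory."""
    image_paragraphs = (load_image_size(e_cp, e_cblp, e_cparhdr) + 15) // 16
    return PSP_PARAGRAPHS + image_paragraphs + e_minalloc


# Illustrative header: 2 pages, 0x41 bytes in the last one, 2-paragraph header.
print(load_image_size(2, 0x41, 2))    # 545
print(min_paragraphs(2, 0x41, 2, 0))  # 51
```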
Read Relocation Table
If the file has no relocation entries (e_crlc is 0), there is nothing to adjust: the image is copied into memory as is, much like a COM file, but the entry point still comes from the header.
The relocation table is not copied into memory; it is only used to load the program.
Append PSP
DOS adds the PSP right at the beginning of the allocated memory, within the load segment.
The program payload (without the header and relocation table) is loaded 0x10 paragraphs after the start of the PSP, because the PSP is 0x100 bytes long, i.e. 0x10 paragraphs.
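A short Python sketch of this arithmetic, assuming DOS placed the PSP at a hypothetical segment 0x1000 (the helper names are illustrative):

```python
PSP_PARAGRAPHS = 0x10  # 0x100 bytes / 16 bytes per paragraph


def load_segment_for(psp_segment: int) -> int:
    """Segment where the program payload starts, given the PSP segment."""
    return psp_segment + PSP_PARAGRAPHS


def to_linear(segment: int) -> int:
    """Linear address of offset 0 in the given segment."""
    return segment << 4


psp = 0x1000  # assumed value, chosen by DOS at load time
print(hex(load_segment_for(psp)))             # 0x1010
print(hex(to_linear(psp)))                    # 0x10000 (start of PSP)
print(hex(to_linear(load_segment_for(psp))))  # 0x10100 (start of payload)
```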
Append code/data
DOS copies the content of the EXE file (without the header and relocation table) into memory, right after the PSP.
The image starts in its own segment after the PSP, so the offsets within each segment need no adjustment.
For each entry in the relocation table, DOS locates the word at that segment:offset within the loaded image (a location that the linker identified as holding the segment part of a far address) and adds the load segment value to the segment value stored there.
After this, all addresses will have been updated.
It is important to understand that the relative position within the segments remains the same within the linked machine code (EXE file) and the program loaded in memory.
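The fixup pass can be sketched as follows. This is a simplified model of the loader, not DOS code: `apply_relocations` is a hypothetical helper, the image is a `bytearray` holding the payload only, and each relocation is an (offset, segment) pair relative to the start of that payload.

```python
import struct


def apply_relocations(image: bytearray, relocations, load_segment: int) -> None:
    """Add load_segment to every 16-bit word named by the relocation table."""
    for offset, segment in relocations:
        position = segment * 16 + offset  # byte position inside the image
        (value,) = struct.unpack_from("<H", image, position)
        struct.pack_into("<H", image, position, (value + load_segment) & 0xFFFF)


# "mov ax, 0020h" is encoded as B8 20 00; the segment value sits at offset 1.
image = bytearray(b"\xB8\x20\x00")
apply_relocations(image, [(1, 0)], load_segment=0x1010)
print(image.hex())  # b83010
```

Only the operand changes (0020h becomes 1030h); the instruction stays at the same position within its segment.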
Initialize registers
The registers CS, IP, SS and SP are updated with the values from the header.
DS and ES point to the PSP rather than to the program’s data, and the remaining registers should be treated as undefined when MS-DOS loads the program. It is the developer’s responsibility to initialize these values within the program, if required.
Jump to entry point
The segment containing the entry point does not necessarily have to be the first one in the program. It is the linker that decides the order.
The entry point is identified from the CS and IP values, which were loaded from the header.
As the CS in the header assumes that the program starts in segment 0h, the actual CS value is the sum of the header CS and the starting segment chosen by MS-DOS.
Actual CS = Header CS + Starting segment chosen by MS-DOS
Actual IP = Header IP
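The same arithmetic as a small Python sketch. The function `initial_registers` is hypothetical; it also adjusts SS in the same way as CS, which is an addition to the formulas above based on the header SS being relative to the load segment as well.

```python
def initial_registers(load_segment: int, e_cs: int, e_ip: int, e_ss: int, e_sp: int) -> dict:
    """Register values at the entry point, given the header fields."""
    return {
        "CS": (load_segment + e_cs) & 0xFFFF,  # Actual CS = Header CS + starting segment
        "IP": e_ip,                            # Actual IP = Header IP
        "SS": (load_segment + e_ss) & 0xFFFF,  # assumed: relocated like CS
        "SP": e_sp,
    }


# Illustrative values: code in the third segment (21h), stack in the first (0h).
regs = initial_registers(load_segment=0x1010, e_cs=0x21, e_ip=0, e_ss=0, e_sp=0x200)
print({name: hex(value) for name, value in regs.items()})
# {'CS': '0x1031', 'IP': '0x0', 'SS': '0x1010', 'SP': '0x200'}
```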
EXE MZ “Hello, world!” Example
MYSTACK SEGMENT STACK 'STACK' ; Open stack segment
DW 100h DUP (?) ; Reserve 100h words for the stack
MYSTACK ENDS ; Close stack segment
MYDATA SEGMENT 'DATA' ; Open data segment
Msg DB 'Hello, world!$' ; Message to print, '$'-terminated for service 9
MYDATA ENDS ; Close data segment
MYCODE SEGMENT 'CODE' ; Open code segment
ASSUME CS:MYCODE, DS:MYDATA, SS:MYSTACK
Entry PROC ; Open procedure 'Entry'
mov ax,MYDATA ; Segment value for MYDATA (fixed up by the loader)
mov ds,ax ; DS now addresses MYDATA, where Msg lives
mov dx,OFFSET Msg ; DS:DX = address of the string for service 9
mov ah,9 ; Specify service 9
int 21h ; Call service 9: print string
mov ax,4c00H ; Service 4Ch, return value 0
int 21H ; Call service 4Ch: return to DOS
Entry ENDP ; Close the procedure 'Entry'
MYCODE ENDS ; Close the 'MYCODE' segment
END Entry ; Set entry point to 'Entry'
External References
- OSdev.org community; “MZ”; OSdev.org