Microsoft Relocatable Object Module Formats

7.1 Introduction

This chapter presents the object record formats that define the relocatable object language for the 8086, 80186, and 80286 microprocessors. The 8086 object language is the output of all language translators that target an 8086-family processor and whose output will be linked by the Microsoft linker. It is also the input and output format for object language processors such as linkers and librarians, and it is used in the XENIX, PC-DOS, and MS-DOS operating systems.

The 8086 object module formats let you specify relocatable memory images that may be linked together. These formats also allow efficient use of the memory mapping facilities of the 8086 family of microprocessors.

The following table lists the record formats that Microsoft supports, each of which is described in this chapter (the abbreviations appear in parentheses):

Table 7.1 Object Module Record Formats
________________________________________________________________

Symbol Definition Records
    Public Names Definition Record (PUBDEF)
    Communal Names Definition Record (COMDEF)
    Local Symbols Record (LOCSYM)
    External Names Definition Record (EXTDEF)
    Line Numbers Record (LINNUM)

Data Records
    Logical Enumerated Data Record (LEDATA)
    Logical Iterated Data Record (LIDATA)

T-Module Header Record (THEADR)
L-Module Header Record (LHEADR)
List of Names Record (LNAMES)
Segment Definition Record (SEGDEF)
Group Definition Record (GRPDEF)
Fixup Record (FIXUPP)
Module End Record (MODEND)
Comment Record (COMENT)
________________________________________________________________

________________________________________________________________
Note

If an object module contains any undefined values, the behavior of the Microsoft linker is undefined. All undefined values should be considered reserved by Microsoft for future use.
________________________________________________________________

7.1.1 Definition of Terms

The following terms are fundamental to 8086 relocation and linkage:

OMF - Object Module Formats

MAS - Memory Address Space

The 8086 MAS is one megabyte (1,048,576 bytes). Note that the MAS is distinguished from actual memory, which may occupy only a portion of the MAS.

Module

A module is an "inseparable" collection of object code and other information produced by a translator.

T-Module

A T-module is a module created by a translator, such as a Pascal or FORTRAN compiler.

The following restrictions apply to object modules:

o  Every module should have a name. Translators provide default names (possibly filenames or null names) for T-modules if neither the source code nor the user specifies otherwise.

o  Every T-module in a collection of linked modules should have a different name so that symbolic debugging systems can distinguish the various line numbers and local symbols. The Microsoft linker does not require or enforce this restriction.

Frame

A frame is a contiguous region of 64K of memory address space (MAS), beginning on a paragraph boundary (that is, on a multiple of 16 bytes) or, on the 80286 processor, on a selector. This concept is useful because the contents of the four 8086 segment registers define four (possibly overlapping) frames; no 16-bit address in 8086 code can access a memory location outside the current four frames.

LSEG (Logical Segment)

A logical segment (LSEG) is a contiguous region of memory whose contents are determined at translation time (except for address binding). Neither the size nor the location in MAS is necessarily determined during translation: the size, although partially fixed, may not be final because the linker may combine the LSEG with other LSEGs when linking, forming a single LSEG.
So that it can fit in a frame, an LSEG must not be larger than 64K. Thus, a 16-bit offset from the base of a frame that covers the LSEG can address any byte in that LSEG.

PSEG (Physical Segment)

This term is equivalent to frame. Some prefer "PSEG" to "frame" because the terms PSEG and LSEG reflect the "physical" and "logical" nature of the underlying segments.

Frame Number

Every frame begins on a paragraph boundary. The "paragraphs" in MAS can be numbered from 0 through 65,535. These numbers, each of which defines a frame, are called frame numbers.

Group

A group is a collection of LSEGs defined at translation time, whose final locations in MAS have been constrained so that at least one frame exists that covers (contains) every LSEG in the collection.

The notation "Gr A(X,Y,Z)" means that LSEGs X, Y, and Z form a group named A. That X, Y, and Z are all LSEGs in the same group does not imply any ordering of X, Y, and Z in MAS, nor does it imply any contiguity between X, Y, and Z.

The Microsoft linker does not currently allow an LSEG to be a member of more than one group.

Canonic

On the 8086 processor, any location in MAS is contained in exactly 4096 distinct frames, but one of these frames can be distinguished because it has the highest frame number. This frame is called the canonic frame of the location. In other words, the canonic frame of a given byte is the frame chosen so that the byte's offset from the base of that frame lies in the range 0 to 15 (decimal). For example, suppose FOO is a symbol defining a memory location. You would then refer to the canonic frame of that location as the "canonic frame of FOO." Similarly, if S is any set of memory locations, then a unique frame exists that has the lowest frame number in the set of canonic frames of the locations in S. This unique frame is called the canonic frame of the set S. You might refer similarly to the canonic frame of an LSEG or of a group of LSEGs.
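The definitions of frame number and canonic frame above reduce to simple arithmetic on a 20-bit address. The following Python sketch is an illustration only; the function names are hypothetical, and it assumes real-mode 8086 addressing (frame base = frame number times 16) with no wraparound at the top of MAS.

```python
MAS_SIZE = 1 << 20       # the 8086 MAS: 1,048,576 bytes
PARAGRAPH = 16           # frames begin on 16-byte (paragraph) boundaries
FRAME_SIZE = 1 << 16     # a frame spans 64K


def canonic_frame(address):
    """Return (frame number, offset) of the canonic frame of one location.

    The canonic frame is the frame for which the offset is in 0..15.
    """
    if not 0 <= address < MAS_SIZE:
        raise ValueError("address outside the 8086 MAS")
    return address // PARAGRAPH, address % PARAGRAPH


def canonic_frame_of_set(addresses):
    """Return the lowest frame number among the canonic frames of a set."""
    return min(canonic_frame(a)[0] for a in addresses)


def frame_covers(frame_number, address):
    """True if the frame contains the address (assumption: no wraparound)."""
    base = frame_number * PARAGRAPH
    return base <= address < base + FRAME_SIZE
```

For instance, canonic_frame(0x12345) gives frame number 0x1234 with offset 5, and the canonic frame of the set {0x12345, 0x12000, 0x1FFFF} is frame 0x1200, which covers all three locations.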
Segment Name

LSEGs are assigned segment names at translation time. These names serve two purposes:

o  During linking, they play a role in determining which LSEGs are combined with other LSEGs.

o  They are used in assembly source code to specify membership in groups.

Class Name

The translator may optionally assign class names to LSEGs during translation. Classes define a partition on LSEGs: two LSEGs are in the same class if they have the same class name.

The Microsoft linker applies the following semantics to class names: the class name "CODE", or any class name whose suffix is "CODE", implies that all segments of that class contain only code and may be considered read-only. Such segments may be overlaid if you specify the module containing the segment as part of an overlay.

Overlay Name

The linker may optionally assign an overlay name to LSEGs. The overlay name of an LSEG is ignored by Microsoft language linkers for version 3.00 and later languages, but the standard MS-DOS linker supports it.

Complete Name

The complete name of an LSEG consists of the segment name, class name, and overlay name. The linker combines LSEGs from different modules if their complete names are identical.

7.2 Module Identification and Attributes

A module header record, which provides a module name, is always the first record in a module.

In addition to having a name, a module may represent a main program and may have a specified starting address. When you link multiple modules together, only one module should have the main attribute. If more than one main module appears, the first takes precedence.

In summary, modules may or may not be main and may or may not have a starting address.

7.2.1 Segment Definition

A module is a collection of object code defined by a sequence of records that a translator produces.
The object code represents contiguous regions of memory whose contents are determined at translation time. These regions are LSEGs. A module defines the attributes of each LSEG. The segment definition record (SEGDEF) carries all LSEG information (name, length, memory alignment, and so on). The linker requires the LSEG information when it combines multiple LSEGs and when it establishes segment addressability. The SEGDEF records must follow the first header record.

7.2.2 Addressing a Segment

The 8086 addressing mechanism provides segment base registers, each of which lets you address a 64K-byte region of memory (a frame). There is one code segment base register (CS), two data segment base registers (DS, ES), and one stack segment base register (SS).

The possible number of LSEGs that may make up a memory image far exceeds the number of available base registers. Thus, base registers may require frequent loading. This would be the case in a modular program with many small data and/or code LSEGs.

Since such frequent loading of base registers is undesirable, it is a good strategy to collect many small LSEGs together into a single unit that will fit in one memory frame. Then all the LSEGs may be addressed using the same base register value. This addressable unit is a group, as defined in Section 7.1.1, "Definition of Terms."

To establish addressability of objects within a group, you must explicitly define each group in the module. The group definition record (GRPDEF) lists constituent segments by their segment names.

The GRPDEF records within a module must follow all SEGDEF records because GRPDEF records reference SEGDEF records in defining a group. The GRPDEF records must also precede all other records except header records, which the linker must process first.
7.2.3 Symbol Definition

The Microsoft linker supports three different types of records belonging to the class of symbol definition records: public names definition records (PUBDEFs), communal names definition records (COMDEFs), and external names definition records (EXTDEFs). You use these record types to define globally visible procedures and data items and to resolve external references.

7.2.4 Indices

"Index" fields appear throughout this chapter. An index is an integer that selects a particular item from a collection of items; for example: name index, segment index, group index, external index, type index, and so on.

________________________________________________________________
Note

An index is normally a positive number. The index value zero is reserved, and may carry a special meaning depending on the type of index (for example, a segment index of zero specifies the "Unnamed" absolute pseudo-segment; a type index of zero specifies the "Untyped" type).
________________________________________________________________

In general, indices must be able to assume values that are quite large (that is, much larger than 255). Nevertheless, a great number of object files contain no indices with values greater than 50 or 100. Therefore, indices are encoded in one or two bytes, as required.

The high-order (left-most) bit of the first (and possibly the only) byte determines whether the index occupies one byte or two. If the bit is 0, the index is a number between 0 and 127, occupying one byte. If the bit is 1, the index is a number between 0 and 32K-1, occupying two bytes, and is determined as follows: the low-order eight bits are in the second byte, and the high-order seven bits are in the first byte.
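The one-or-two-byte index encoding described above can be sketched in Python as follows. This is a minimal illustration, not the linker's code; the function names decode_index and encode_index are hypothetical. The encoder chooses the one-byte form whenever the value fits, which is an assumption about what a translator would normally emit.

```python
def decode_index(data, pos=0):
    """Decode an index starting at data[pos].

    Returns (index value, position of the byte following the index).
    """
    first = data[pos]
    if first & 0x80 == 0:
        # High-order bit clear: one-byte index in the range 0..127.
        return first, pos + 1
    # High-order bit set: the remaining seven bits of the first byte are
    # the high-order bits; the second byte holds the low-order eight bits.
    return ((first & 0x7F) << 8) | data[pos + 1], pos + 2


def encode_index(value):
    """Encode an index in one byte if it fits, otherwise in two bytes."""
    if not 0 <= value <= 0x7FFF:
        raise ValueError("index must be between 0 and 32K-1")
    if value <= 0x7F:
        return bytes([value])
    return bytes([0x80 | (value >> 8), value & 0xFF])
```

For example, the index 300 (12CH) does not fit in seven bits, so it is encoded as the two bytes 81H 2CH; decoding those two bytes yields 300 again.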
7.3 Conceptual Framework for Fixups

A fixup is a modification to object code, requested by a translator and performed by the linker, that achieves address binding.

________________________________________________________________
Note

This is the linker's definition of fixup. Nevertheless, the linker can modify object code (make a "fixup") in ways that do not conform to this definition. For example, binding code to either hardware or software floating-point subroutines is a modification to an operation code, which is treated as an address. The previous definition of fixup is not intended to disallow or discourage modifications to the object code.
________________________________________________________________

8086-family translators need four kinds of data to specify a fixup:

o  The place and type of a Location to be fixed up.

o  One of two possible fixup modes.

o  A target, which is the memory address that Location must refer to.

o  A frame that defines a context in which the reference takes place.

Location: There are five types of Locations: a pointer, a base, an offset, a hibyte, and a lobyte.

The vertical alignment of the following figure illustrates four points (remember that the high-order byte of a word in 8086 memory is the byte with the higher address):

o  A base is the high-order word of a pointer (the linker doesn't care whether the low-order word of the pointer is present).

o  An offset is the low-order word of a pointer (the linker doesn't care whether the high-order word follows).

o  A hibyte is the high-order half of an offset (the linker doesn't care whether the low-order half precedes).

o  A lobyte is the low-order half of an offset (the linker doesn't care whether the high-order half follows).
          +----+----+----+----+
Pointer:  |                   |
          +----+----+----+----+
                    +----+----+
Base:               |         |
                    +----+----+
          +----+----+
Offset:   |         |
          +----+----+
               +----+
Hibyte:        |    |
               +----+
          +----+
Lobyte:   |    |
          +----+

Figure 7.1 Location Types

A Location is specified by two kinds of data: the Location type and the position of the Location. The Location type is specified by the LOC field in the FIXUPP record's LOCAT field; the position of the Location is specified by the Data Record Offset field in the same LOCAT field.

Mode: The Microsoft linker supports two kinds of fixups: self-relative and segment-relative.

Self-relative fixups support the 8-bit and 16-bit offsets used in CALL, JUMP, and SHORT-JUMP instructions. Segment-relative fixups support all other addressing modes of the 8086.

Target: The target is the location in MAS that is being referenced. (More explicitly, the target is considered to be the lowest byte in the object being referenced.) A target is specified by one of six methods. There are three "primary" methods and three "secondary" ones. Each primary method of specifying a target uses two kinds of data: an index number X and a displacement D.

________________________________________________________________

(T0)  X is a segment index. The target is the Dth byte in the LSEG that the segment index identifies.

(T1)  X is a group index. The target is the Dth byte in the LSEG that the group index identifies.

(T2)  X is an external index. The external index identifies the external name that (eventually) gives the address of a byte. The Dth byte following this byte is the target.

Each secondary method of specifying a target uses only one item of data, the index number X; this assumes an implicit displacement equal to zero.

________________________________________________________________

(T4)  X is a segment index.
      The target is the 0th (first) byte in the LSEG that the segment index identifies.

(T5)  X is a group index. The target is the 0th (first) byte in the LSEG in the specified group located (eventually) lowest in MAS.

(T6)  X is an external index. The target is the byte whose address is the external name that the external index identifies.

The following nomenclature describes a target:

________________________________________________________________

Target: SI(segment name), displacement    [T0]
Target: GI(group name), displacement      [T1]
Target: EI(symbol name), displacement     [T2]
Target: SI(segment name)                  [T4]
Target: GI(group name)                    [T5]
Target: EI(symbol name)                   [T6]

The following examples illustrate how this notation is used:

________________________________________________________________

Target: SI(CODE), 1024
    The 1025th byte in the segment CODE.

Target: GI(DATAAREA)
    The location in MAS of a group called DATAAREA.

Target: EI(SIN)
    The address of the external subroutine SIN.

Target: EI(PAYSCHEDULE), 24
    The 24th byte following the location of an external data structure called PAYSCHEDULE.

Frame: Every 8086 memory reference is to a location contained within a frame. This frame is designated by the content of a segment register. For the linker to form a correct, usable memory reference, it must know what the target is and to which frame the reference is being made. Thus, every fixup specifies such a frame, by one of several methods. Some methods use an item of data, X, which is an index number. Other methods require no data.

The methods of specifying frames are as follows:

________________________________________________________________

(F0)  X is a segment index. The frame is the canonic frame of the LSEG that the segment index defines.

(F1)  X is a group index. The frame is the canonic frame defined by the group (that is, the canonic frame defined by the LSEG in the group located (eventually) lowest in MAS).
(F2)  X is an external index. The frame is determined when the linker finds the external name's public definition. There are two cases:

      (F2a)  The symbol is defined relative to some LSEG, and there is no associated group. The frame is the canonic frame of that LSEG.

      (F2c)  Regardless of how the symbol is defined, there is an associated group. The frame is the canonic frame of the group. (The Group Index field of the PUBDEF record specifies the group.)

(F4)  No X. The frame is the canonic frame of the LSEG that contains Location.

(F5)  No X. The target determines the frame. There are three cases:

      (F5a)  The target specifies a segment index: in this case, the frame is determined as in (F0).

      (F5b)  The target specifies a group index: in this case, the frame is determined as in (F1).

      (F5c)  The target specifies an external index: in this case, the frame is determined as in (F2).

The nomenclature that describes frames is similar to the above nomenclature for targets.

________________________________________________________________

Frame: SI(segment name)    [F0]
Frame: GI(group name)      [F1]
Frame: EI(symbol name)     [F2]
Frame: Location            [F4]
Frame: target              [F5]
Frame: None                [F6]

For an 8086 memory reference, the frame specified by a self-relative reference is usually the canonic frame of the LSEG that contains Location. Also, the frame specified by a segment-relative reference is the canonic frame of the LSEG that contains the target.
7.3.1 Self-Relative Fixup

A self-relative fixup works as follows: Location implicitly defines a memory address, namely the address of the byte following Location (because at the time of a self-relative reference, the 8086 IP (Instruction Pointer) is pointing to the byte following the reference).

For 8086 self-relative references, if either the Location or the target is outside the specified frame, the linker gives a warning. Otherwise, there is a unique 16-bit displacement that, when added to the address implicitly defined by Location, yields the relative position of the target in the frame.

o  If Location is an offset, the linker adds the displacement to Location (modulo 65,536) and reports no errors.

o  If Location is a lobyte, the displacement must be within the range -128 to 127; otherwise, the linker gives a warning. The linker adds the displacement to Location (modulo 256).

o  If Location is a base, pointer, or hibyte, it is unclear what the translator intended, so the linker's action is undefined.

7.3.2 Segment-Relative Fixup

A segment-relative fixup operates as follows: a nonnegative 16-bit number, FBVAL, is defined as the frame number of the frame or selector value that the fixup specifies. A signed 20-bit number, FOVAL, is defined as the distance from the base of the frame to the target. If this signed 20-bit number is less than 0 or greater than 65,535, the linker reports an error. Otherwise, the linker uses FBVAL and FOVAL to fix up Location in the following fashion:

o  If Location is a pointer, the linker adds FBVAL (modulo 65,536) to the high-order word of the pointer, and adds FOVAL (modulo 65,536) to the low-order word of the pointer.

o  If Location is a base, the linker adds FBVAL (modulo 65,536) to the base and ignores FOVAL.

o  If Location is an offset, the linker adds FOVAL (modulo 65,536) to the offset and ignores FBVAL.
o  If Location is a hibyte, the linker adds (FOVAL/256) (modulo 256) to the hibyte and ignores FBVAL. (The division indicated is integer division; that is, the linker discards the remainder.)

o  If Location is a lobyte, the linker adds (FOVAL modulo 256) (modulo 256) to the lobyte and ignores FBVAL.

7.4 Record Sequence

An object code file must contain a sequence of one or more modules, or be a library containing zero or more modules. The following syntax shows the valid record ordering necessary to form a module. In addition, the given semantic rules provide information about how to interpret the record sequence.

________________________________________________________________
Note

The description language used in the following syntax is defined in WIRTH: CACM, November 1977, vol. 20, no. 11, pp. 822-823. The character strings represented by capital letters are not literals but identifiers, and are further defined in the record format section.
________________________________________________________________

object file   = tmodule
tmodule       = {THEADR | LHEADR} seg_grp {component} modtail
seg_grp       = {LNAMES} {SEGDEF} {EXTDEF | GRPDEF}
component     = data | debug_record
data          = content_def | thread_def | PUBDEF | EXTDEF | COMDEF | LOCSYM
debug_record  = LINNUM
content_def   = data_record {FIXUPP}
thread_def    = FIXUPP (containing only Thread fields)
data_record   = LIDATA | LEDATA
modtail       = MODEND

The following rules apply:

o  A FIXUPP record always refers to the previous data record.

o  All LNAMES, SEGDEF, GRPDEF, and EXTDEF records must precede all records that refer to them.

o  Comment records may appear anywhere in a file, except as the first or last record in a file or module, or within a content_def.

7.5 Introducing the Record Formats

The following pages present diagrams of record formats in schematic form.
Here is a sample record format that illustrates the various conventions:

7.5.1 Sample Record Format (SAMREC)

---------------------///---------||||-----------
|     |          |          |            |     |
| REC |  Record  |   Name   |   Number   | CHK |
| TYP |  Length  |          |            | SUM |
| xxH |          |          |            |     |
|     |          |          |            |     |
---------------------///---------||||-----------
                 |          |
                 +---rpt----+

The Title and Official Abbreviation

At the top of the figure is the name of the record format described, with its official abbreviation. To promote uniformity among various programs, including translators and debuggers, use the abbreviation in both code and documentation. The abbreviation of the record format is always six letters.

The Boxes

Each format is drawn with boxes of two sizes. The narrow boxes represent single bytes. The wide boxes each represent two bytes. The wide boxes with three slashes in the top and bottom represent a variable number of bytes, one or more, depending upon content. The wide boxes with four vertical bars in the top and bottom represent four-byte fields.

RECTYP

The first byte in each record contains a value between 0 and 255, indicating the record type.

Record Length

The second field in each record contains the number of bytes in the record, exclusive of the first two fields. The field is a 16-bit number: a low byte followed by a high byte.

Name

Any field that indicates a name has the following internal structure: the first byte contains a number between 0 and 127, inclusive, indicating the number of remaining bytes in the field. The remaining bytes are interpreted as a byte string.

Most translators constrain the character set to a subset of the ASCII character set.

Number

A four-byte number field represents a 32-bit unsigned integer, where the first eight bits (least-significant) are stored in the first byte (lowest address), the next eight bits are stored in the second byte, and so on.
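The RECTYP, Record Length, Name, and Number conventions above can be illustrated with a short Python sketch. The helper names (read_record_header, read_name, read_number) are hypothetical and chosen only for this example; each helper returns the decoded value together with the position of the next unread byte.

```python
import struct


def read_record_header(data, pos=0):
    """Return (RECTYP, Record Length, position of the first content byte)."""
    rectyp = data[pos]
    # Record Length is a 16-bit number: low byte first, then high byte.
    (length,) = struct.unpack_from("<H", data, pos + 1)
    return rectyp, length, pos + 3


def read_name(data, pos):
    """Return (name bytes, next position) for a length-prefixed Name field."""
    length = data[pos]
    if length > 127:
        raise ValueError("name length byte must be between 0 and 127")
    end = pos + 1 + length
    if end > len(data):
        raise ValueError("name runs past the end of the data")
    return data[pos + 1:end], end


def read_number(data, pos):
    """Return (value, next position) for a four-byte Number field."""
    # Least-significant byte is stored first (lowest address).
    (value,) = struct.unpack_from("<I", data, pos)
    return value, pos + 4
```

For example, the bytes 03H 'F' 'O' 'O' decode as the name FOO, and the bytes 78H 56H 34H 12H decode as the number 12345678H.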
Repeated or Conditional Fields

Some portions of a record format contain a field or series of fields that may be repeated one or more times. Such portions are indicated by the "repeated" or "rpt" brackets below the boxes.

Similarly, some portions of a record format are present only if some given condition is true; these fields are indicated by similar "conditional" or "cond" brackets below the boxes.

CHKSUM

The last field in each record is a checksum, which contains the two's complement of the sum (modulo 256) of all other bytes in the record. Therefore, the sum (modulo 256) of all bytes in the record is zero.

Bit Fields

Sometimes descriptions of contents of fields are at the bit level. Boxes with vertical lines drawn through them represent bytes or words; the vertical lines indicate bit boundaries. Thus, the following byte representation has three bit fields of three, one, and four bits, respectively.

---------------------------------
|   |   |   |   |   |   |   |   |
|           |   |               |
|   |   |   |   |   |   |   |   |
---------------------------------
      3       1         4

7.5.2 T-Module Header Record (THEADR)

---------------------///-----------
|     |          |          |     |
| REC |  Record  |    T-    | CHK |
| TYP |  Length  |  Module  | SUM |
| 80H |          |   Name   |     |
|     |          |          |     |
---------------------///-----------

T-Module Name

The T-Module Name field contains the name for the T-module.

7.5.3 L-Module Header Record (LHEADR)

---------------------///-----------
|     |          |          |     |
| REC |  Record  |    L-    | CHK |
| TYP |  Length  |  Module  | SUM |
| 82H |          |   Name   |     |
|     |          |          |     |
---------------------///-----------

L-Module Name

The L-Module Name field contains the name for the L-module.

Every module output from a translator must have a T-module or L-module header record. The linker requires a THEADR or LHEADR record to come first in the module and ignores any others.
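As an illustration of the header record layout and the CHKSUM rule, the following Python sketch assembles a complete THEADR record. The function names (checksum, build_record, build_theadr) are hypothetical and are not part of any Microsoft tool. The sketch assumes that Record Length counts every byte after the length field, including the CHKSUM byte, as stated in Section 7.5.1.

```python
def checksum(record_bytes):
    """Two's complement of the sum (modulo 256) of the given bytes."""
    return (-sum(record_bytes)) & 0xFF


def build_record(rectyp, contents):
    """Assemble RECTYP, Record Length, contents, and CHKSUM into one record."""
    length = len(contents) + 1              # contents plus the CHKSUM byte
    head = bytes([rectyp]) + length.to_bytes(2, "little")
    body = head + contents
    return body + bytes([checksum(body)])


def build_theadr(module_name):
    """Build a THEADR record (type 80H) for a module name given as bytes."""
    if len(module_name) > 127:
        raise ValueError("a Name field holds at most 127 bytes")
    name_field = bytes([len(module_name)]) + module_name
    return build_record(0x80, name_field)


def record_is_valid(record):
    """True if all bytes of the record sum to zero modulo 256."""
    return sum(record) % 256 == 0
```

For a module named HELLO, build_theadr(b"HELLO") produces a 10-byte record whose Record Length field is 7 (one length byte, five name bytes, and the checksum), and record_is_valid confirms that all bytes sum to zero modulo 256.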
The LHEADR record is identical to the THEADR record, except that it has a record type of 82H.

7.5.4 List of Names Record (LNAMES)

---------------------///-----------
|     |          |          |     |
| REC |  Record  |   Name   | CHK |
| TYP |  Length  |          | SUM |
| 96H |          |          |     |
|     |          |          |     |
---------------------///-----------
                 |          |
                 +---rpt----+

The LNAMES record contains a list of names that the following SEGDEF and GRPDEF records may use as the names of segments, classes, and/or groups.

The order of LNAMES records in a module and the order of names within each LNAMES record imply a mapping of these names to numbers: 1, 2, 3, and so on. These numbers are used as "Name Indices" in the Segment Name Index, Class Name Index, and Group Name Index fields of the SEGDEF and GRPDEF records.

Name

This repeatable field provides a name, which may have zero length.

7.5.5 Segment Definition Record (SEGDEF)

-----------------///-----------------///-----///---///------
|   |        |         |         |         |     |     |   |
|REC| Record | Segment | Segment | Segment |Class|Over |CHK|
|TYP| Length |  ATTR   | Length  |  Name   |Name |lay  |SUM|
|98H|        |         |         |  Index  |Index|Name |   |
|   |        |         |         |         |     |Index|   |
-----------------///-----------------///-----///---///------

Segment index values 1 through 32,767, which are used in other record types to refer to specific LSEGs, are defined implicitly by the sequence in which SEGDEF records appear in the object file.

SEG ATTR

The SEG ATTR field provides information on various attributes of a segment, and has the following format:

------------------------
|     |          |     |
|ACBP |  Frame   |Off- |
|     |  Number  |set  |
|     |          |     |
------------------------
      |                |
      +---conditional--+

The ACBP byte contains four numbers: the A, C, B, and P attribute specifications.
This byte has the following format:

---------------------------------
|   |   |   |   |   |   |   |   |
|     A     |     C     | B | P |
|   |   |   |   |   |   |   |   |
---------------------------------

A (Alignment) is a 3-bit subfield that specifies the alignment attribute of the LSEG. The semantics are defined as follows:

________________________________________________________________

A=0   SEGDEF describes an absolute LSEG.
A=1   SEGDEF describes a relocatable, byte-aligned LSEG.
A=2   SEGDEF describes a relocatable, word-aligned LSEG.
A=3   SEGDEF describes a relocatable, paragraph-aligned LSEG.
A=4   SEGDEF describes a relocatable, page-aligned (256-byte) LSEG.

If A=0, the Frame Number and Offset fields are present. With the Microsoft linker, you may use absolute segments for addressing only; for example, to define the starting address of a ROM and to define symbolic names for addresses within the ROM. The linker ignores any data that belongs to an absolute LSEG, and issues a warning if absolute segments are defined for a program that runs in protected mode.

C (Combination) is a 3-bit subfield that specifies the combination attribute of the LSEG. Absolute segments (A=0) must have combination zero (C=0). For relocatable segments, the C field encodes a number (0, 1, 2, 3, 4, 5, 6, or 7) that indicates how the segment can be combined. One way to interpret this attribute is to consider how two LSEGs are combined. For example, suppose that X and Y are LSEGs, and that Z is the LSEG resulting from the combination of X and Y. Let LX and LY be the lengths of X and Y, and let MXY denote the maximum of LX and LY. Now, to accommodate the alignment attribute of Y, let G be the length of any gap required between the X and Y components of Z. Then, let LZ denote the length of the (combined) LSEG, Z; let dx (0\(<=dx