MS-DOS

This is a continuation of my series of articles on the history of personal computers. It follows my fifth CP/M article.

The series starts with my first CP/M article. If you have not read the previous articles, I recommend that you read them first.


Looking at 8 bits

Our story started with the Intel 8080 CPU (and its clone, the Zilog Z80 CPU), an 8 bit CPU with 8 bit bytes, meaning that both the smallest and the largest unit it could address in memory were 8 bits wide. Its general purpose registers, meant to store data loaded from such a unit in memory, were also 8 bits wide.

But the 8080 did have 16 bit registers as well. Most prominently, the program counter and the registers used to address memory were 16 bits wide, because the 8080 had a 16 bit address bus. Without bank switching it could address 64 KB of memory.

The 8080's 8 bit registers are paired to form 16 bit registers. The registers are nominally general purpose registers, which apparently means that programmers are supposed to use them or at least know of them. But convention and limitations of the CPU's instruction set do assign certain purposes to each register. Generally, those purposes are these:

  1. Accumulator: originally the only register of a CPU, this is the register which usually contains the number one works with
  2. Flags register: this register is not used as a register but contains (for example 8) bits that are either on or off to switch on or off certain modes or mechanisms of the CPU or the computer; it is listed here only because it is paired with the 8 bit accumulator of the 8080
  3. Address register: such a register contains a memory address, usually a destination address (meaning an address the programmer sets) or a source address (meaning an address returned by a call to a calling routine)
  4. Counter: a counter
  5. System call register: used to store the system call number or a system call parameter
  6. Not normally used: a register or view of a register that isn't used for anything but exists anyway for technical reasons

All of these registers except the system call registers can be 8 or 16 bits wide in the Intel 8080 and Zilog Z80.

I am getting to why this is relevant for MS-DOS shortly. Stay with me.

The 8080's general purpose registers are the 8 bit accumulator and three 16 bit registers consisting of two 8 bit registers each:

  • A (8 bit accumulator)
  • HL (source address) consisting of H (not normally used) and L (not normally used)
  • BC (16 bit counter) consisting of B (8 bit counter) and C (system call number)
  • DE (destination address) consisting of D (not normally used) and E (system call parameter)
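A minimal sketch of what "paired" means, written in Python rather than assembler so it can be run anywhere. The RegisterPair class and its attribute names are invented for this illustration and do not come from any real emulator.

```python
# Sketch: two 8 bit registers viewed as one 16 bit register pair.
# RegisterPair is a hypothetical helper, not part of any real tool.

class RegisterPair:
    """A 16 bit register made of a high and a low 8 bit half."""

    def __init__(self, high=0, low=0):
        self.high = high & 0xFF   # e.g. H, B or D
        self.low = low & 0xFF     # e.g. L, C or E

    @property
    def value(self):
        # The high half supplies bits 8..15, the low half bits 0..7.
        return (self.high << 8) | self.low

    @value.setter
    def value(self, word):
        self.high = (word >> 8) & 0xFF
        self.low = word & 0xFF


hl = RegisterPair()
hl.value = 0x1234                 # load the pair as a whole
print(hex(hl.high), hex(hl.low))  # the two halves of the same 16 bits

bc = RegisterPair(high=0x00, low=0x09)  # C holds 9, e.g. a system call number
print(hex(bc.value))
```

Writing to the pair changes both halves and writing to one half changes the pair, which is why a register such as C can serve as a system call register while BC as a whole serves as a counter.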

Note that registers used for system calls by the operating system have not technically been designed for that purpose, at least at first. The OS vendor just chose those because they were available.

This was the standard for CP/M computers, and when Intel designed a 16 bit CPU to take over the market, it tried to make it easy to bring software already written for the old CPU over to the new one.


Enter 16 bits

The Intel 8086 CPU, which followed the 8080-compatible Intel 8085 CPU, paid lip service to 8080-compatibility. But it also added several new features.

The 8080 CPU was an 8 bit CPU and could address a minimum and maximum unit of 8 bits in memory. The new 8086 CPU could address a minimum unit of 8 bits (the CPU's "byte") but a maximum unit of 16 bits (the CPU's "word") in memory. The old 8080 CPU could address up to 2^16 bytes (the minimum unit) of memory; the new 8086 CPU could address 2^20 bytes. I am sure there are other differences but those are the ones we have to worry about here. Apparently the new CPU was also faster.
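The arithmetic behind those two figures can be checked with a few lines of Python. This is only a sketch; the function name addressable_bytes is made up for the example.

```python
# Sketch: how many bytes a CPU can address with a given address width.
# addressable_bytes is a hypothetical helper name.

KB = 1024

def addressable_bytes(address_bits):
    """Each additional address bit doubles the number of addressable bytes."""
    return 2 ** address_bits

print(addressable_bytes(16) // KB, "KB")  # 8080: 16 bit addresses
print(addressable_bytes(20) // KB, "KB")  # 8086: 20 bit addresses
```

Four extra address bits mean 2^4 = 16 times as much memory: 64 KB becomes 1024 KB, or 1 MB.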

I'll ignore the new 20 bit memory addressing scheme for the moment.

The 8086's general purpose registers are the following 16 bit registers consisting of two 8 bit registers each:

  • AX (16 bit accumulator) consisting of AH (MS-DOS system call number) and AL (8 bit accumulator)
  • BX (source address) consisting of BH and BL
  • CX (16 bit counter) consisting of CH and CL
  • DX (destination address) consisting of DH and DL (system call parameter)

Note that the 8086's flags register is not paired with a general purpose register and hence doesn't appear in this list.

The 8086 machine language is sufficiently similar to the 8080's that assembler programs could be translated automatically (and then re-assembled for the 8086 CPU).
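One small part of such a translation is renaming registers. The following Python sketch derives its table from the two register lists above (A to AL, HL to BX, BC to CX, DE to DX, and the halves accordingly). The function names are invented, and a real translator would also have to rewrite mnemonics, flags handling and addressing modes, none of which is attempted here.

```python
# Sketch: renaming 8080 registers to their 8086 counterparts.
# REGISTER_MAP follows the register lists in this article; the function
# names are hypothetical and this is not a complete translator.

REGISTER_MAP = {
    "A": "AL",
    "H": "BH", "L": "BL", "HL": "BX",
    "B": "CH", "C": "CL", "BC": "CX",
    "D": "DH", "E": "DL", "DE": "DX",
}

def translate_register(name):
    """Return the 8086 register that takes over the role of an 8080 register."""
    return REGISTER_MAP[name.upper()]

def translate_operands(operands):
    """Rename registers in an operand list; leave everything else untouched."""
    return [REGISTER_MAP.get(op.upper(), op) for op in operands]

print(translate_register("hl"))
print(translate_operands(["E", "A"]))
print(translate_operands(["C", "9"]))  # the constant 9 is passed through
```

Because every 8080 register has exactly one counterpart, the renaming is a plain table lookup, which is a large part of why automatic translation was feasible.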


Enter MS-DOS

MS-DOS, originally marketed as PC DOS and originally named 86-DOS, is a CP/M clone. The term "clone" here does not mean "identical genes" as it does in biology but that the "clone" behaves like the "cloned" software, that is MS-DOS behaves like CP/M. It is important to note that a clone is not a copy. It is a piece of software written separately to conform to certain standards defined by another piece of software.

Such a clone might not just behave like the original software but might happily add new features, especially to support new hardware which the original software was never designed to support. The creation of clones in the software industry is as old as the creation of industry standards. Linux is a clone of UNIX, MS-DOS is a clone of CP/M.

MS-DOS quickly became better than CP/M. (I have heard that Linux has become better than UNIX.)

Back in 1980 Microsoft was a developer of developer tools, a compiler writer. All the example programs written for the previous articles in this series were created with Microsoft's products for CP/M. When IBM designed a new personal computer based around the Intel 8088 CPU (an 8086 with an 8 bit external data bus), IBM went to Microsoft to buy developer tools for it and ultimately an operating system as well. That operating system was then sold by Microsoft for less money per licence than the 16 bit version of CP/M, which was finished later than Microsoft's 86-DOS.

MS-DOS quickly became the market leader in the new 16 bit world.

Ignoring the new memory features of (later versions of) MS-DOS and the 8086 CPU, MS-DOS looked and felt pretty much like CP/M, naturally, with the following notable differences:

  • The command interpreter was called command.com instead of ccp.com.
  • The command interpreter's prompt was "A:>" rather than "A>".
  • The command to copy a file was an internal command copy rather than CP/M's external pip.com.
  • The copy command followed the syntax of Unix's cp command, i.e. copy file1 file2 to copy from file1 to file2.
  • File sizes were counted in bytes rather than blocks, so text files no longer had to be padded with ^Z.
  • MS-DOS ran on IBM PCs and compatibles using an 8086 (or 8088) CPU.
  • MS-DOS eventually supported subdirectories (as did later 16 bit versions of CP/M).
  • MS-DOS eventually supported hard disks (as did later 16 bit versions of CP/M).

But apart from a few differences like those, MS-DOS and CP/M were very similar. They also ran the same programs, if translated and re-assembled to 8086 machine code or written in C or some other high-level language. There was no binary compatibility between the two CPUs or between MS-DOS and 16 bit CP/M for the same CPU.


Enter the IBM PC

The IBM PC and the 8086 CPU could address 2^20 bytes or 1 MB of RAM.

This is what the world looks like in an IBM PC:

[Image: ibmmem (memory map of the IBM PC)]

1 MB of addressable memory is divided into two zones: 640 KB of "conventional memory" and 384 KB of ROM, video RAM, and other stuff like that. A typical IBM PC had 64 KB of memory plus ROM and, presumably, some sort of video memory. I assume this gives the programmer more memory than he had on a CP/M computer since the BIOS doesn't have to be in that same 64 KB.
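The split can be written down as a short Python sketch. The constant and function names are invented for the illustration, and the upper zone is treated as a single block even though it actually holds several different things.

```python
# Sketch: the two zones of the IBM PC's 1 MB address space.
# Names are hypothetical; the upper zone is not broken down further.

KB = 1024
TOTAL = 2 ** 20                    # 1 MB of addressable memory
CONVENTIONAL = 640 * KB            # "conventional memory"
RESERVED = TOTAL - CONVENTIONAL    # ROM, video RAM and the like

def region(address):
    """Name the zone a physical address falls into."""
    if not 0 <= address < TOTAL:
        raise ValueError("address outside the 1 MB address space")
    if address < CONVENTIONAL:
        return "conventional memory"
    return "ROM, video RAM and other reserved areas"

print(RESERVED // KB, "KB reserved")
print(hex(CONVENTIONAL), "is the first address above conventional memory")
print(region(0x00100))
print(region(0xF0000))
```

The boundary at 640 KB corresponds to address A0000h; everything below it is available for RAM that programs and the operating system can use.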

If the computer had more than 64 KB of memory, the operating system itself could also be outside the 64 KB. And if the computer had a multiple of 64 KB of memory, the program itself could occupy more than 64 KB of memory. I will come to that later. For our purposes here there are 64 KB of Transient Program Area somewhere in the Conventional Memory, the operating system is also somewhere and for some reason all addresses we are using are relative to the start of our Transient Program Area. This is actually literally true.

The CP/M-like Transient Program Area of a running program can be located anywhere in the first 640 KB.
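A sketch of what "relative to the start of our Transient Program Area" amounts to. The base address used below is made up, the function name is invented, and how the CPU actually keeps track of the base is deliberately left out here.

```python
# Sketch: program addresses are offsets from wherever the TPA happens to start.
# tpa_base values are hypothetical; physical_address is an invented helper.

KB = 1024
CONVENTIONAL = 640 * KB   # the TPA must fit below this limit
TPA_SIZE = 64 * KB        # a CP/M-like 64 KB area

def physical_address(tpa_base, offset):
    """Turn an address used by the program into an address in the 1 MB space."""
    if not 0 <= offset < TPA_SIZE:
        raise ValueError("offset does not fit into a 64 KB area")
    if not 0 <= tpa_base <= CONVENTIONAL - TPA_SIZE:
        raise ValueError("TPA would not fit into conventional memory")
    return tpa_base + offset

# The same program address 0100h lands in different places,
# depending on where the TPA was put.
print(hex(physical_address(0x10000, 0x0100)))
print(hex(physical_address(0x35000, 0x0100)))
```

The program only ever sees the offset, so it behaves as if it had a CP/M-style 64 KB machine to itself regardless of where in the first 640 KB it was loaded.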

Hello, IBM world

MS-DOS supports the same system calls as CP/M. In general. For the most part. Sometimes.

I wrote about CP/M that system calls can be made by calling address 05h in the Zero Page.

The instruction at address 05h calls the BDOS and allows a transient program to make a system call without having to know where in memory the operating system actually is. System calls are made by configuring registers for the call number and parameters and then calling address 05h. (In DOS jargon this is referred to as a "call 5" system call. It is supported by both CP/M and MS-DOS.)

(See my article on CP/M at the start of this series.)

This theoretically still holds true. Allegedly and according to the 86-DOS documentation, MS-DOS does support making system calls the same way CP/M does. I never got this to work. The computer simply hangs when I try.

However, making system calls does work using the native 86-DOS way of doing it. Instead of call 5, MS-DOS defines a software interrupt 21h that does the job and is invoked as int 21h. This actually executes an interrupt: the CPU looks up entry 21h in the interrupt vector table at the very start of memory, where the address of the operating system's handler is stored, and jumps there.
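The table lookup can be sketched in Python. On the 8086 each interrupt has a 4 byte entry in a table starting at address 0, holding a 16 bit offset followed by a 16 bit segment; the segment half belongs to the 20 bit addressing scheme that is not covered yet. The handler address stored below is made up, and the function names are invented for the illustration.

```python
# Sketch: where the CPU finds the handler for "int 21h".
# The stored handler address is a made-up example value.
import struct

VECTOR_SIZE = 4  # per entry: 16 bit offset, then 16 bit segment, little-endian

def vector_address(interrupt_number):
    """Address of the table entry for a given interrupt number."""
    return interrupt_number * VECTOR_SIZE

def read_vector(memory, interrupt_number):
    """Return (segment, offset) of the handler stored in the table."""
    offset, segment = struct.unpack_from(
        "<HH", memory, vector_address(interrupt_number))
    return segment, offset

memory = bytearray(256 * VECTOR_SIZE)  # the table: 256 entries at address 0
struct.pack_into("<HH", memory, vector_address(0x21), 0x1460, 0x0019)

print(hex(vector_address(0x21)))
print([hex(part) for part in read_vector(memory, 0x21)])
```

As with call 5, the program does not need to know where the operating system lives; it only needs to know the interrupt number.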

The system call number goes in (8 bit) register AH, a system call parameter goes into register DL, and a memory address to be used by the system call goes into (16 bit) register DX.

A "Hello, world" for MS-DOS thus looks very similar to a "Hello, world" program for CP/M.

[Image: hello1 (source listing of the program)]

You can copy and paste the source from here: hello1.asm
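For readers who cannot get at the listing, here is a rough sketch of what such a program boils down to, built byte by byte in Python with the corresponding assembler instructions in the comments. It is not the hello1.asm linked above and may differ from it. It assumes a .COM style program loaded at offset 100h, DOS function 09h (print a string terminated by "$") and int 20h to end the program; the message text and the function name build_hello are made up for the example.

```python
# Sketch: a tiny "Hello, world" .COM program assembled by hand.
# Not the author's hello1.asm; message and names are illustrative.

LOAD_OFFSET = 0x100                  # a .COM program starts at offset 100h
MESSAGE = b"Hello, world!\r\n$"      # function 09h prints up to the '$'

def build_hello(message):
    """Return the bytes of a .COM program that prints the message."""
    code_size = 9                            # length of the instructions below
    msg_addr = LOAD_OFFSET + code_size       # the text follows the code
    code = bytes([
        0xB4, 0x09,                          # mov ah, 09h   ; system call number
        0xBA, msg_addr & 0xFF, msg_addr >> 8,  # mov dx, msg ; address of the text
        0xCD, 0x21,                          # int 21h       ; call MS-DOS
        0xCD, 0x20,                          # int 20h       ; end the program
    ])
    assert len(code) == code_size
    return code + message

program = build_hello(MESSAGE)
print(len(program), "bytes")
print(program.hex(" "))
```

Writing the returned bytes to a file with a .com extension should give something that can be tried in an emulator such as DOSBox. The shape matches the description above: the call number goes into AH, the address into DX, and int 21h does the rest.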

[Image: hellodos (screenshot of the program's output)]

This appears to work.

To be continued…


Sources

MS-DOS/DR DOS system calls: http://spike.scu.edu.au/~barry/interrupts.html


Useful Software

DOSBox DOS/x86 emulator: http://www.dosbox.com

NASM x86 assembler for Unix: http://www.nasm.us

 © Andrew Brehm 2016