Note: This tutorial assumes a working knowledge of binary and hex. If you don't have that background and you are interested in learning assembly, let me know and I'll write a brief introduction to binary. For now, until I can do some testing, this tutorial only works on Windows and MS-DOS computers. If anyone with a Linux box is willing to test a routine for me, please let me know.
The first thing you need to write assembly is an assembler, the tool that does the job a compiler does for higher-level languages. I use TASM, which is made by Borland and can be found in their C++ Builder 5. All of the code provided here is written for TASM. If you don't feel like buying the Borland C++ compiler, you can also use NASM or FASM. I will try to point out how to change the code so it assembles under these two, but I won't promise anything because I may forget.
There are also some references that will prove very important. The first is Ralf Brown's Interrupt List. The second is the Intel software developer's manuals, but these are not needed at this point.
Unlike higher level languages, assembly does not really have variables. It has two kinds of data storage units. The first, registers, you can actually perform operations on. The second, labels, are simply placeholders for information, like pointers in a higher level language.
The 386 and newer chips have four basic general-purpose registers. They are the accumulator register, ax, the base register, bx, the count register, cx, and the data register, dx.
Each register is subdivided into two smaller registers, the high register (ah, bh, ch, or dh) and the low register (al, bl, cl, or dl). The full register contains 16 bits, while the sub-registers contain 8 bits each. There is also the extended form of the register, which simply adds an "e" to the beginning of the full register's name. Thus we have eax, ebx, ecx, and edx. These registers contain 32 bits, and the 16-bit register is the low half of its extended form. (See Figure 1)
Figure 1: The registers using the accumulator as an example
|-------------------------------EAX-------------------------------|
[][][][][][][][] [][][][][][][][] [][][][][][][][] [][][][][][][][]
                                  |--------------AX---------------|
                                  [][][][][][][][] [][][][][][][][]
                                  |------AH------| |------AL------|
                                  [][][][][][][][] [][][][][][][][]
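To make Figure 1 concrete, here is a small sketch in the same TASM style as the sample program further down. It is only an illustration: it prints nothing, and the values are arbitrary ones picked to show how al, ah, ax, and eax overlap. The comments give the register contents you should see in a debugger after each instruction.

```asm
.MODEL SMALL
.STACK 200h
.386

.CODE

Start:
    mov eax, 12345678h  ;eax = 12345678h, so ax = 5678h, ah = 56h, al = 78h
    mov al, 0FFh        ;only the lowest byte changes: eax = 123456FFh
    mov ah, 00h         ;only the second byte changes: eax = 123400FFh
    mov ax, 0ABCDh      ;only the low 16 bits change: eax = 1234ABCDh

    mov ax, 4c00h       ;ask DOS to close the program
    int 21h
End Start
```

Note that a hex number that begins with a letter needs a leading 0 (0FFh, 0ABCDh) so the assembler does not mistake it for a name.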
There are also segment registers. They are ds, the data segment, cs, the code segment, es, the extra segment, and ss, the stack segment. The 386 added two more segment registers, fs and gs. Then there are the index and pointer registers: si, the source index, di, the destination index, bp, the base pointer, sp, the stack pointer, and ip, the instruction pointer.
Data labels appear in the data segment. They are simply names that stand for a spot in memory. You use a directive to place data at that spot, and then use the label to access it. For most work, though, you copy the data into a register, operate on it there, and copy the result back, because no ordinary instruction can take two memory locations as its operands.
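As a rough illustration of a label in use, the sketch below reserves one byte, loads it into a register, changes it there, and stores it back. The label name Counter and the values 5 and 3 are made up for this example; the directives and instructions are explained with the sample program later on.

```asm
.MODEL SMALL
.STACK 200h

.DATA

Counter db 5            ;reserve one byte named Counter, holding 5

.CODE

Start:
    mov ax, seg Counter ;ds must point at the data segment
    mov ds, ax          ;before the label can be used

    mov al, [Counter]   ;copy the byte at Counter into al: al = 5
    add al, 3           ;do the arithmetic in the register: al = 8
    mov [Counter], al   ;store the result back at the label

    mov ax, 4c00h       ;ask DOS to close the program
    int 21h
End Start
```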
It is also important to understand memory management. In the real-mode addressing that DOS uses, RAM is divided into 64 KB segments. A new segment starts every 16 bytes, so the segments overlap. This means that the last segments in RAM don't have the full 64 KB of memory. Individual bytes are addressed by their offset from the start of the segment. To illustrate this, let's look at segment 1000 (all of the numbers here are in hex) and start at the first offset. This is addressed as 1000:0000. If we go to offset 16, we can address it in two ways: 1000:0010 or 1001:0000. What this boils down to is the offset from the beginning of RAM. If you place a 0 at the end of the segment to get 10000, and then add the offset, 0010, you get the offset from the beginning of RAM: 10010. Doing the same with 1001:0000 gives 10010 + 0000, which is again 10010. I know this probably isn't that clear, but really, all you need to know is that memory is addressed by segment and offset. (See Figure 2)
Figure 2: Segment and offset Style Memory
Segment: 1000
[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
Segment: 1001 (But it is 16 bytes past 1000 so it is also 1000:0010)
[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
Segment: 1002 (But it is 32 bytes past 1000 so it is also 1000:0020)
[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
Segment: 1003 (But it is 48 bytes past 1000 so it is also 1000:0030)
[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
Segment: 1004 (But it is 64 bytes past 1000 so it is also 1000:0040)
[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
Segment: 1005 (But it is 80 bytes past 1000 so it is also 1000:0050)
[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
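If you want to see the overlap for yourself, a sketch like the following reads the same byte through both addresses. It assumes a real-mode DOS environment, and the choice of es, bx, dl, and dh is mine; any free registers would do. Both reads should return the same value because both addresses work out to 10010h from the beginning of RAM.

```asm
.MODEL SMALL
.STACK 200h

.CODE

Start:
    mov ax, 1000h
    mov es, ax          ;es = segment 1000
    mov bx, 0010h       ;bx = offset 0010
    mov dl, es:[bx]     ;read the byte at 1000:0010 (10000h + 0010h = 10010h)

    mov ax, 1001h
    mov es, ax          ;es = segment 1001
    mov bx, 0000h       ;bx = offset 0000
    mov dh, es:[bx]     ;read the byte at 1001:0000 (10010h + 0000h = 10010h)

    ;dl and dh should now hold the same value

    mov ax, 4c00h       ;ask DOS to close the program
    int 21h
End Start
```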
And now the moment you have all been waiting for: the sample code. Copy the code below to a blank text document, then follow the instructions to compile it.
---Begin Copy Next Line---
1: .MODEL SMALL ;define the type of program
2: .STACK 200h ;define the stack size
3: .386 ;state the use of 386 instructions
4:
5: .DATA ;define the data segment
6:
7: Message db "Hello World$" ;the data
8:
9: .CODE ;define the code segment
10:
11: Start: ;start the program
12: mov ax, seg Message ;move the segment of Message to the accumulator
13: mov dx, offset Message ;move the offset of Message to dx, the data register
14: mov ds, ax ;move segment of Message to data segment register
15: mov ah, 09h ;move 9 to the high byte of the accumulator
16: int 21h ;perform interrupt 33
17:
18: mov ah, 00h ;move 0 to the high byte of the accumulator
19: int 16h ;perform interrupt 22
20:
21: mov ax, 4c00h ;move the close program directive to the accumulator
22: int 21h ;perform interrupt 33
23: End Start ;the program ends
---End Copy on Previous Line---
Okay, now it's time to compile:
Instructions for everyone
1) First, remove the line numbers; they are for reference only
2) Create a new folder on the C: drive and call it "asm"
3) Save the file as "Hello.asm" in the newly created folder
TASM instructions
1) Open up the command prompt
2) Change to the directory that contains TASM
3) Type in "tasm c:\asm\hello.asm, c:\asm\hello.obj" and hit enter
4) Type in "tlink c:\asm\hello.obj, c:\asm\hello.exe" and hit enter
5) To run the program, type in "c:\asm\hello.exe" and hit enter
FASM instructions
1) Change line 1 to "format MZ"
2) Change line 2 to "stack 200h"
3) Change line 3 to "entry cod:Start"
4) Change line 5 to "segment dat"
5) Change line 9 to "segment cod"
6) Change line 12 to "mov ax, dat"
7) Change line 13 to "mov dx, Message"
8) Remove line 23
9) Save the file
10) Open up the command prompt
11) Change to the directory that contains FASM
12) Type in "fasm c:\asm\hello.asm c:\asm\hello.exe" and hit enter
13) To run the program type in "c:\asm\hello.exe" and hit enter
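For reference, if the FASM steps above are applied exactly as written, the file should end up looking roughly like the listing below. This is just the sample program with those changes made and the line numbers removed; the comments on the changed lines are my own wording.

```asm
format MZ                       ;produce a DOS .EXE file
stack 200h                      ;define the stack size
entry cod:Start                 ;the program starts at Start in segment cod

segment dat                     ;define the data segment

Message db "Hello World$"       ;the data

segment cod                     ;define the code segment

Start:                          ;start the program
mov ax, dat                     ;move the segment of Message to the accumulator
mov dx, Message                 ;move the offset of Message to dx
mov ds, ax                      ;move segment of Message to data segment register
mov ah, 09h                     ;move 9 to the high byte of the accumulator
int 21h                         ;perform interrupt 33

mov ah, 00h                     ;move 0 to the high byte of the accumulator
int 16h                         ;perform interrupt 22

mov ax, 4c00h                   ;move the close program directive to the accumulator
int 21h                         ;perform interrupt 33
```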
NASM Instructions
Forthcoming
For the explanation of the above code:
The first three lines set up the format of the file.
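Line 1 sets the overall layout of the program. In TASM, .MODEL SMALL selects the small memory model, with one code segment and one data segment; in the FASM version, format MZ asks for a DOS .EXE file.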
Line 2 sets up the stack, a special storage area that will be covered in a later part.
Line 3 tells TASM that we want access to all of the instructions a 386 has. In the FASM version, this line sets the entry point instead.
Lines 5 through 8 set up the data:
Line 5 defines the data segment.
Line 7 uses our first piece of assembly, "db" (define byte). Strictly speaking, db is a directive to the assembler rather than a processor instruction. Db says to insert data at this point in memory, stored and accessed as bytes.
Lines 9 through 23 contain the code:
Line 9 defines the code segment.
Line 11 sets the start of the code.
Line 12 introduces the "mov" instruction. Mov copies data from one place to another; frequently one of those places is a register. This line moves the segment number of the data segment into the accumulator.
Line 13 moves the offset of Message within the data segment into dx, the data register.
Line 14 moves the contents of the accumulator (the segment number of the data segment) into the data segment register. The reason it had to go into the accumulator first is that a segment register cannot be loaded with a number directly, only through another register or from memory.
Line 15 moves 9 into the high byte of the accumulator.
Line 16 is another assembly instruction, "int". Int performs a software interrupt, a call to a special routine provided by DOS or the BIOS. If you downloaded Ralf's interrupt list, open up the program to view the interrupts and open the interrupt list. Scroll down until you see "2109" in the left column. Click on this interrupt. Notice that when you perform interrupt 21h for this function, writing a string to the screen, ah must be 9h, which we set up. Ds:dx must also point to the segment and offset of the string, which it does. The string must end with a "$", which, if you look at Message, it does.
Line 18 moves 0 into the high byte of the accumulator.
Line 19 performs interrupt 22 (16h), which, when ah holds function 0, waits for a keypress before continuing.
Line 21 moves 4c00, which is a hex number, into the accumulator. This puts 4c, the function number, in ah and 00, the return code, in al.
Line 22 calls the DOS interrupt again; this time the function exits the program.
Line 23 states the end of the code.
The "h" after a number means that the number is in hexadecimal. Without the h, the number would be assumed to be in decimal. The ";" on a line begins a comment. Everything that comes after a ; on a line is ignored by the assembler.
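To show that the h suffix changes only the notation, here is a sketch of the same kind of program with the numbers written in decimal. It leaves out the keypress wait to stay short. 9 is 09h, 33 is 21h, and 19456 is 4c00h, so it should behave the same as the hex version.

```asm
.MODEL SMALL
.STACK 200h

.DATA

Message db "Hello World$"

.CODE

Start:
    mov ax, seg Message
    mov dx, offset Message
    mov ds, ax
    mov ah, 9           ;same as mov ah, 09h
    int 33              ;same as int 21h

    mov ax, 19456       ;same as mov ax, 4c00h
    int 33              ;same as int 21h
End Start
```

Hex is still the better habit, since the interrupt list and most references give these numbers in hex.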
Review:
Registers
Memory
MOV instruction
DB directive
INT instruction
Coming up next week:
Basic arithmetic
Explanation of the stack
Questions? Comments? Something you'd like to see? Let me know and I'll add it in.

