"Hello World" in Assembly

Everyone writes "Hello World" programs. But because sensible people write these programs for commonly used operating systems, I won't. Mostly because it's barely assembly at that point. I mean, look at the example that some assembler comes with for a Windows program:

; example of simplified Windows programming using complex macro features
include 'win32ax.inc' ; you can simply switch between win32ax, win32wx, win64ax and win64wx here
.code
  start:
    invoke  MessageBox,HWND_DESKTOP,"May I introduce myself?",invoke GetCommandLine,MB_YESNO
    .if eax = IDYES
        invoke  MessageBox,HWND_DESKTOP,"Hi! I'm the example program!","Hello!",MB_OK
    .endif
    invoke  ExitProcess,0
.end start

Seriously, this is like one step away from being Visual Basic.

On this page, we write a "hello world" for an OS that has been outdated for decades without using any macros or other simplifications. Just raw x86 assembly.

Once we've done that, we tackle the fact that an OS is overrated anyways in another document.

Operating system in use

I use Windows as my OS of choice, but be aware that you can do this on other operating systems too. The assembler I'm going to use can run on Windows, Linux, and even DOS.

Assembler used in this document

I use Flat assembler (FASM) for my asm projects. It's small and portable, which makes it ideal for some quick asm hacking, and yet it's still quite powerful. It is self-hosted and can assemble itself. You can use any other assembler you want, but be aware that some of them require extra work to produce some output formats. Also note that some commands differ between assemblers. Your OS doesn't matter for FASM. All versions of FASM can assemble for all targets FASM supports. In other words, the DOS version of FASM can assemble 64-bit Linux executables and vice versa.

Assembly uses a semicolon for comments. Anything after a semicolon up to the end of the line will be ignored by the assembler. You can use this to add notes to your program, or to quickly disable some lines without deleting them.

Numbers

Numbers in use by the computer or your assembly program are generally given in hexadecimal format. If your program is not working, check if you accidentally used decimal notation instead. Hexadecimal numbers in Intel assembly usually use the h suffix instead of the 0x prefix, but it depends on your assembler. FASM supports both methods. Hexadecimal numbers in FASM have to start with a digit if you use the h suffix, which means EAh would need to be written as 0EAh.
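To make the notations concrete, here is a small sketch in FASM syntax. Every line is meant to emit the same single byte (EA), so you can assemble it and compare the output in a hex viewer. The binary notation with the b suffix is not mentioned above; I'm adding it only for comparison.

```asm
; Four ways to write the same value. Each line emits the byte EA.
db 234          ; decimal
db 0EAh         ; hexadecimal, h suffix (needs the leading 0)
db 0xEA         ; hexadecimal, 0x prefix
db 11101010b    ; binary, b suffix
```

Writing EAh without the leading zero would make FASM look for a symbol named EAh instead of a number.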

Finally, don't forget that Intel is little endian. We're only dealing with individual bytes, so it has no effect on us, but it's important to know once you start dealing with bigger numbers.
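A quick sketch of what little endian means in practice, using the dw (2 byte) and dd (4 byte) data directives of FASM. The values are arbitrary examples I picked, not something the hello world program needs.

```asm
; Multi-byte values are stored lowest byte first.
dw 1234h        ; stored in the file as 34 12
dd 12345678h    ; stored in the file as 78 56 34 12
db 12h,34h      ; individual bytes stay in the order you wrote them: 12 34
```

The same rule applies to addresses inside instructions, so a 16-bit address like 0105h appears in the binary as 05 01.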

DOS

DOS is a great target for these small text-processing-only projects. Modern DOS versions like FreeDOS make running DOS applications on a real machine easy. DOSBox is a good DOS emulator that runs even if your machine has the processing power of a toaster.

Expectations

I expect you know how to work with a command line. Be aware that the DOS prompt is absolutely braindead compared to modern terminals.

.COM vs .EXE

There are two types of executables for DOS: raw ".COM" executables, and the more modern "MZ" executables, which usually have the ".EXE" extension. We pick the .COM format because none of its limitations matter to us.

Things you have to be aware of for the .COM format:

Without using any trickery, limited to 64K of memory only.
The first 256 bytes are used by the PSP, which contains some information you can use (like the command line arguments)
No isolation between code, stack and data. Put too much onto the stack and you overwrite code in memory.
The .COM binary file is raw program code without any header.
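As a sketch of how the PSP can be used, the following program prints its own command line arguments. It assumes the standard PSP layout, where the byte at offset 80h holds the length of the argument text, the text itself starts at 81h, and a CR byte follows it. It also uses the DOS print function (AH=09h, INT 21h) that is explained further down, so treat it as a preview rather than something you need to understand right now.

```asm
org 100h
    mov bl, [80h]           ; BL = length of the argument text
    xor bh, bh              ; BX = same length as a 16-bit value
    mov byte [bx+81h], '$'  ; replace the CR after the text with the $ terminator
    mov dx, 81h             ; DX = address of the first argument character
    mov ah, 9h              ; DOS function: print $ terminated string
    int 21h
    ret                     ; back to DOS
```

If the arguments themselves contain a $, the output stops there, because that is how function 09h detects the end of a string.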

Printing a string in DOS

All we need to do to get our string printed is to have it in memory and tell DOS to print it. Getting the string into memory happens automatically for .COM executables, because the entire file is simply loaded at offset 100h.

Printing strings is a built-in function in DOS.

DOS may seem like an incredibly simple OS (and it is), but it has a ton of functions readily accessible for every application. Most of these hide behind INT 21h. The list is kinda messy and big, but searching for "STRING" will give us function 09h, which prints a $ terminated string to the screen. The page also tells us that the string location goes into the DX register.

Tasks

Tell FASM to create a .COM executable
Store the string in our binary
Get 09h into the AH register
Get the string address into the DX register
Call the DOS interrupt 21h
Properly exit our application

Step 1: Creating a .COM

Telling FASM that it's a DOS .COM executable is done via org 100h. This line essentially tells the assembler that it has to add 256 to every address we use in our assembly code. This is necessary because as mentioned, we don't start at address 0h, but at address 100h. FASM realizes by itself that this should result in a .COM and will use the appropriate file extension automatically.

The "org" line should generally appear in your code before any actual x86 commands.

If you don't specify this line, your addresses will be off by 256, and the compiled binary will have .bin instead of .com extension.

To compute addresses we can either calculate them manually, which sucks, or we can use labels. In FASM, a label is an alphanumeric string that ends in :. It may start with a dot. .str: is a label for example. The label can be on its own line, or it can be at the beginning of a line before another assembly command. The effect is the same.

When you try to put a label into a register, FASM computes the address of that label, and replaces the label text with the address in the resulting binary.
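Here is a minimal sketch of how labels and org 100h work together. The label names start and msg are made up for this example. The byte counts in the comments assume 16-bit code, which is what FASM generates by default for this kind of output.

```asm
org 100h
start:              ; label on its own line, address 100h
    mov dx, msg     ; 3 bytes at 100h-102h, assembled as BA 04 01
    ret             ; 1 byte at 103h
msg: db "$"         ; label on the same line as a command, address 104h
```

Without the org 100h line, FASM would count from 0, msg would be at address 4, and the mov would be assembled as BA 04 00.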

Step 2: Getting our string into memory

Using the db command, you can tell FASM to put a sequence of bytes into the binary as-is. For example db 1h,33h,7h will put the byte sequence 013307 into the binary at whatever location you've put the db instruction. This instruction also supports quoted strings as arguments.
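A short sketch of db with numbers, strings, and a mix of both. The values are arbitrary examples.

```asm
db 1h,33h,7h        ; emits 01 33 07
db "AB"             ; emits 41 42 (the ASCII codes of A and B)
db "AB",0Dh,"C"     ; numbers and strings can be mixed: 41 42 0D 43
```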

db "Hello, World!$" will therefore do what we want. We should also add the label .str: before it, since we need to reference this location for the DOS command later.

Note that there is no way for us to tell DOS or the CPU that this is a string and not supposed to be run as code. To avoid this being interpreted as code, we will put all our CPU instructions before this string, and make sure our program terminates before the CPU reaches the string.

This means our assembly program will be laid out like this:

org 100h
;Your code goes here
;...
;Terminate here
.str:
    db "Hello, World!$"

Step 3+4: Getting values into registers

To put data into registers, we use the MOV instruction. "MOV" of course stands for "COPY". You're not removing the data from the source, so moving stuff in assembly really copies it.

The Intel assembly syntax for moving stuff is mov Destination, From, in other words, data is moved from right to left. With a few limitations you can:

Move constant numbers into registers
Move data from RAM into registers and vice versa
Move data between registers
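The three cases above can be sketched like this. The label value and the constant 1234h are made up for illustration. Square brackets mean "the memory at this address" in FASM.

```asm
org 100h
    mov ax, 1234h       ; constant number into a register
    mov [value], ax     ; register into RAM
    mov bx, [value]     ; RAM into a register, BX is now 1234h
    mov cx, bx          ; register into another register
    ; One of the limitations: x86 has no RAM to RAM mov,
    ; so something like mov [value], [other] is rejected by the assembler.
    ret
value:
    dw 0                ; two bytes of storage for the example
```

Another limitation is that both sides must have the same size, so you can't move the 16-bit BX into the 8-bit AL.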

According to the documentation of the string print function, we put 9h into the ah register, and the address of our string into the dx register.

Our assembly program so far looks like this:

org 100h
mov ah, 9h
mov dx, .str
.str:
db "Hello, World!$"

This will not do anything useful so far and will crash if you run it, but FASM should assemble it regardless into a total of only 19 bytes of code. If the size differs, you're likely using a different "Hello, World!$" string. If you get an error message instead, make sure you did not miss an important symbol like a comma or the colon at the end of the label.

The binary will consist of these bytes:

B4 09 BA 05 01 48 65 6C 6C 6F 2C 20 57 6F 72 6C  ´.º..Hello, Worl
64 21 24                                         d!$

As you can clearly see, the string we want to print needs a lot more space than the 5 bytes for the instructions so far.

If you wonder why it says "2 passes", it's because when we move the ".str" label into the register, FASM hasn't yet encountered the label, so it doesn't know what address this label is at yet. FASM uses a placeholder value instead, and then on the second pass (when it knows where ".str" is), replaces the fake value with the real address of ".str".

Step 5: Calling an interrupt

An interrupt does what the name says: It interrupts your program so the processor can do something else. Interrupts may be called by the processor itself, for example when you divide by zero, but they can also be triggered manually.

By calling int 21h we tell the CPU that it should fire interrupt 21h. There is a table where all actions for interrupts are noted. In x86, this is called the interrupt vector table (IVT). The CPU looks up entry 21h and then runs whatever program this entry points to in memory. After the interrupt program is done, it tells the CPU to return to your program. This happens instantly.

The instruction in your program that directly follows your int command will be run after the called routine has been fully executed. So when you call an interrupt, be sure to set all necessary registers to the appropriate values before calling the interrupt.

There are 256 possible entries in this table. Intel reserves entries 00h-1Fh for the CPU, and entries from 20h to the end of the table are user definable. DOS uses entry 21h for most functions.
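To see that the table is just ordinary memory, here is a sketch that reads the entry for interrupt 21h. It assumes real mode as used by DOS, where the table starts at address 0000:0000 and every entry is 4 bytes long: first the offset, then the segment of the routine. The program prints nothing, so you'd need a debugger to look at the registers afterwards.

```asm
org 100h
    xor ax, ax
    mov es, ax              ; ES = 0000h, the segment where the table starts
    mov bx, [es:21h*4]      ; offset part of entry 21h
    mov cx, [es:21h*4+2]    ; segment part of entry 21h
    ; CX:BX is now the address the CPU jumps to when we use int 21h
    ret
```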

Step 6: Exiting our application

You should always properly exit your applications. With modern operating systems, this is less of an issue because they clean up for you, but DOS doesn't do this very well.

A DOS .COM executable is really easy to terminate. By simply using the ret instruction, we can exit. This instruction is intended to return from functions, but it works here because DOS pushes a zero onto the stack before starting our program, so ret jumps to offset 0 of the PSP, where DOS has placed an int 20h instruction. As an alternative, you can call int 20h yourself to terminate your program.
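The exit options can be sketched side by side. The third one, function 4Ch of INT 21h, is not used in this document; I'm listing it as an addition because it also lets you pass an exit code back to DOS. Only one option is needed, so the other two are commented out.

```asm
org 100h
    ; Option 1: return to offset 0 of the PSP (1 byte)
    ;ret

    ; Option 2: call the terminate interrupt directly (2 bytes)
    ;int 20h

    ; Option 3: terminate with an exit code (5 bytes)
    mov ax, 4C00h   ; AH = 4Ch (terminate), AL = 00h (exit code)
    int 21h
```

For our purposes ret is the obvious choice because it's the shortest.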

Our code should now look like this:

(Including comments to remind you of what these instructions do)

;Tell FASM to create a .COM executable.
;They start at offset 256.
;The first 256 bytes are where DOS puts some information (the PSP),
;so they're unavailable to us
org 100h
;0x9 is the "print-string-up-to-but-excluding-$" function.
mov ah, 9h
;DX contains the address of the first character to print
mov dx, .str
;Tell DOS to execute the function
int 21h
;End our program. (Returning from .COM this way is legitimate)
ret
;Anything that follows now is not executed by the CPU because we ended our program.
;So this is essentially the section where we put predefined stuff into memory.

;This is our string. Giving it a label makes it addressable.
.str:
    db "Hello, World!$"

To get this to a real DOS machine you likely have to use a floppy disk. For DOSBox, run MOUNT C: C:\Path\that\Contains\our\hello\program to make the directory accessible from the DOS machine.

Assembling this and running it on a DOS machine/emulator will yield the requested string. As you can see though, it's kinda wedged between the two prompts. This is because we're not printing a final line break like other programs do. As you see in the image, there's always a completely blank line before the C:\DOS> prompt appears again.

We can easily fix this by inserting a line break ourselves. The line break is made up of the CR character, followed by the LF character. For us this means byte 0Dh followed by 0Ah.

We can just insert this into our string because db supports multiple parameters:

db "Hello, World!",0Dh,0Ah,"$"

Note that we have to preserve the $ because DOS uses this to know where the string ends.
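Put together, the complete program with the line break looks like the sketch below. The byte values in the comments are what I'd expect FASM to produce for 16-bit code. Note that the string now sits at address 108h, because 8 bytes of instructions come before it.

```asm
org 100h
    mov ah, 9h      ; B4 09     DOS function: print $ terminated string
    mov dx, .str    ; BA 08 01  address of the string (108h, low byte first)
    int 21h         ; CD 21     tell DOS to execute the function
    ret             ; C3        end our program

;This is our string, followed by CR, LF and the $ terminator.
.str:
    db "Hello, World!",0Dh,0Ah,"$"
```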

Conclusion

The final result at this point is an executable that is exactly 24 bytes long. 8 bytes are for the instructions, the rest is the "Hello, World!" string with the line break.

This is remarkably small for an executable. Sure, it's not doing a lot, and it uses an operating system to print our string to the screen, but it was pretty much zero effort to do this in DOS without using any libraries. All you have to know is that addressing starts at 100h, and what the DOS interrupt function is to print a string. Finding documentation for DOS is still very easy.

You may say that this doesn't count, so screw this idea that we need an operating system.

Writing "Hello, World!" without an OS