This book is for sale at http://leanpub.com/OS-DEV
This version was published on 2021-07-28
* * * * *
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do.
* * * * *
We all have used an operating system, whether you are using a mobile phone or a computer or any electronic devie. All of this devices have software that works close to the hardware.
Most of the operating system are split into two main components which are the kernel and the shell. Kernel of an operating system is what communicates with the hardware, manages memory etc and do all other main stuff etc… The main aim of the kernel is to provide an abstration level to the shell of that operating system which is the software that the user directly communicates with. Shell of an operating system is what you basically see on the screen and what you directly interact with. When you commands the shell to do a specific thing such as creating a file, the shell request for that service to the kernel and kernel does all of the things needed to create a file. So basically shell is only a middle stander between you and your device.
Let’s learn these basic theories later, Let me say the reason which led me to write this book. OS develoment is a topic which is seen at an angle so that most people think it can be done only by a small section of geeks. The reason they say is that “Low level stuff is hard to learn and only the brighest ones could do that”. And that argument is totally wrong as there are many people including children and teenagers who do the electronic stuff(Including programming), These people will only get the smallest level of abstraction when developing their projects and all other low level stuff should be done their own.
But when coming to the side of creating an operating system(Specifically on x86
architecture), People say that it’s too hard, but it’s not.
Your question now may be “Then why does only a small section of people do that”, and the answer of that question is that “There is only a small level of documentation for beginners to get started”, According to the current situation, People who are not too old in software development area could not do this things as all of the documentation are hard to understand for beginners.
Also currently there is no book giving a complete guide to this area apart from some websites which is meant for the “Too old people”.
I will promise you that, even if you are starting in this field or know only some part of this area, you could read this book. We will start learning from programming in c(Which is a basic need for os development), and assembly, and then forward, we will dive into real os development on the intel's x86
architecture which is where operating systems such as windows and linux mainly run on.
So for now, you need to take away all things in your mind about os development and focus mainly on learning. I will explain each and every section deeply and also explain how the code given in this book works fully.
Learning os development is fun, But before starting, we need some tools which are essential for development and testing. I will explain you fully how to set everything to get started working on the project. We will use Windows as operating system to develop our project, But the methods discussed in this book can also be implemented in Linux. There is nothing stopping you from doing that.
To run the operating system we develop, we have a total of three options.
Using an emulator or virtual machine works somewhat alike(From an outside perspective). But they are not the same(From an internal perspective) which is why they are called different as emulators and virtual machines.
Virtual machine softwares are used to run software which the processor could directly execute to a great extend. But the purpose of emulators is so that it could enable us running software which is intended to run on different architecures, but also it could emulate the architecture which it is currenly running on.
The final option to run the os we develop is to run in the real hardware, by burning the raw binary to a cd or usb and booting it on the real hardware. If you want to run the os we develop on the real hardware, please ensure to disable Secure Boot option provided in latest computers from bios. We don’t really encourage you running the os in real hardware unless you clearly know what you are doing. This is because of many reasons one of which is that you could corrupt your hard disk if you program the os incorrectly, So what you need to do is, First try running it on an emulator or virtual machine and study it’s behaviour and when you confirm it works safe, You could try running on real hardware.
Please note that i will not be responsible for any damages you do to your system by using unsafe approches in any manner. If you want to run the os on real hardware, you could try running it on a secondary system which you won’t mind if it gets harmed.
For now, We will try running the os that we are going to develop on an Emulator named QEMU. You could find in on https://www.qemu.org/
The installation of qemu is straight forward and you could carry it your own.
You could install any Hex Editor in your machine if You know how to use it.
But i will follow with a hex editor named Hex Editor Neo found at https://www.hhdsoftware.com/free-hex-editor
. If you don’t know what a hex editor is, we will discuss it in later chapters but for now, you could download it.
We will use Notepad++ as our ide to develop the project. You could find in on https://notepad-plus-plus.org/
Nasm is a widely used assembler. We will learn what an assembler is, Why and how to use it and every point you need to know to get started later. You could find it on https://www.nasm.us/
SASM is an ide which helps build assembly programs easily. There is no special need to download sasm for programming in assembly as we have already discussed installing nasm as an assembler, But for now for the sake of simplicity in learning developing applications in assembly, we could use this. If you currently know developing assembly programs, You could skip this step and also skip the chapter teaching programming in assembly.
By using sasm, You can avoid learning the concept of linking for some time and it also provides functions to print strings and numbers easily, Else you need to write your own routine or download special ones to print to screen. And the main part is that you could run and test your programs in one windows. You could download it from https://sasm.software.informer.com/download/
MinGw is a collection of tools which could be used to develop application in windows platform. We will use the c compiler provided by MinGw to develop our os. C is a mid level language best suited to speed up os development. C was initially developed to develop the unix operating system, We could use great features provided by c to get our os running. If you are new to Programming in c, we will learn necessary thing in upcoming chapters. I will only use the methods that you could easily follow to develop the os in c. If you currently know how to program in c and/or assembly, You could catch os development faster, but others don’t need to worry, i will carry you along with the journey.
To install MinGw, you need to go to https://sourceforge.net/projects/mingw/
Check the Following option after starting the installer:
And finally, Install it.
Finally, We have downloaded all tools that we need to get started in this learning process. But we need to access three of these softwares from the command line(cmd.exe
).
For that, go to command prompt by typing cmd in the search bar and type gcc
and hit enter. You will get an error message. Also type nasm
and qemu-system-i386
in it. It Also will show an error message.
We need to access these softwares by typing it’s name in the command prompt, to do that, search ‘Edit environment variables’ in the search box and select the first option
Now select ‘Environment Variables…’ button in the ‘Advanced’ tab,
In the new window, click on Path in the System variables section and click edit
You will now get a window like the one below(Please note that all system may not be same like this as different systems could have different settings, but it will look somewhat like this):
Now you need to add the location to the bin folder of three of the softwares that we downloaded by clicking New button and pasting it there. You need to set the location of Qemu , Nasm and MinGw. In My system the following is the location to those folders.
1)Qemu : C:\Program Files\qemu
2)MinGw : C:\MinGW\bin
3)Nasm : C:\Program Files (x86)\NASM
Now, After adding all these locations, the window looks somewhat like this(Please not that, the location of these application i show here might not be the case for you, It will depend on the settings you adjusted when installing those softwares):
Now close all of the existing windows only by pressing OK.
Now you could type gcc
, nasm
and qemu-system-i386
in the cmd and you will get some messages(Probably like ‘Input file missing’ , but thats ok)
We are now ready to start gettings our hands dirty by doing , LETS GOOOO!!!!!!
C
is generally called as a high level language, but some of them consider it as a mid level language. The reason being that it do not provide as lot of abstraction which is generally given by languages such as java
, python
and other new trendy languages.
There is a great chance for your application to get crashed probably because of memory corruption or wrong system calls etc when using c. If you are new to c, don’t be scared as everything can be learned by doing. Learning advanced c will make you more knowledgable about the working of computers. In this book, we will only use very basic concepts of c, We will not use any advanced concepts and will try to keep it the simplest as possible. This is so that, by using this simple methods , there is a great chance for you to not get demotivated.
A common misconception about c is that, it is a program which converts human readable source code to binary. But c is only a programming standard which defines how the compiler should accept instructions. What c does is only translate your source code to it’s assembly equivalent of a specific processor architecture. We will discuss it and more in later chapters.
If you know basic c concepts, You could skip this chapter, But if you are not being in touch with c for a long time and needs a way to refresh the concepts, you could read this chapter, But i will try to keep it as simple as possible so that even beginners could start.
If you are interested, you should also master the c programming language along with the development of your own os. This is the technique i used to learn c. I learned c by making a small 2d game. Learning by doing and asking questions is a great way to practice. Asking question is not a bad thing , everyone who knows programming or any skill used to ask questions during their learning days.
Note that this chapter won’t give you full coverage of c programming, but we will discuss everything necessary which can’t be avoided when reading this book.
So lets dive into THE C PROGRAMMING LANGUAGE!!!!.
Hello , World
is the first program used in most tutorials to introduce programming.
What this does is only print some text to the screen(Commonly “Hello , World” itself).
Lets see how we could print “Hello , World” to the terminal using c.
First, open Notepad++
and type or copy paste the following code and save it:
1
#include
<stdio.h>
2
3
int
main
(){
4
printf
(
"Hello , World"
);
5
}
Please note that you need to give .c
as file extension.
Compile the code by opening cmd
, going to the directory where you saved the code using the cd
command and finally compile it using this command:
gcc code.c
You need to replace code.c
with the file name you saved.
Now you will see an executable file named a.exe
Now type a.exe
in that command prompt itself to start it.
You will see something like this in your command prompt:
This is a simple program to display a message to the screen, You could replace content in the printf function inside quotes for example if you put that line like:
printf("Hello David");
It will print Hello David
to screen.
Here the computer tries to print whatever is present in the “ “ Quotes.printf
is a function which is predefined in the stdio.h
header file which we included at the very top. We will learn what a header file is later.
A thing to note is that after every statement in a line, we need to put a ;
at the end.
You could also see a portion:
int main(){
}
This is where the computer starts execution, So every program needs this main function to start execution.
Please note that you cannot call the printf function to display to the screen when developing the os as what this printf function does requires an operating system to display it. And as the only operating system in our virtual environment when testing the os is our os, We need to build some code so that when it is called, The text would be printed. We will see that later.
Lets now start learning the basics of c.
Data types
are a concept introduced to hold data. different data types have different storage capacities. Every data type have its own use.
Basic data types include int
, char
, float
, double
etc. These data types are generally used to hold arithmetic values. We can prefix signed
and unsigned
keywords before these keyword to alter its data holding limit.
We will learn about int
and char
data types.
int
is the data type specifically used to store larger numbers. The size of int data type is mostly the bit length of the cpu it is running on.
unsigned int
can hold a value from 0
to 65,535
or 0
to 4,294,967,295
signed int
can hold a value from -32,768
to 32,767
or -2,147,483,648
to 2,147,483,64
. By default integers are signed so, You do not need to put signed keyword before int keyword to make it signed.
Lets see how we can assign and print some value from both signed
and unsigned int
.
1
#include
<stdio.h>
2
3
int
main
(){
4
5
unsigned
int
ui
=
12345
;
6
signed
int
si
=
-
12345
;
7
8
printf
(
"Value of ui is %d and value of si is %d
\n
"
,
ui
,
si
);
9
10
}
compile and run the program, we will see something like this:
You can see that both the negative and positve numbers get printed. But the main effect of signed and unsigned keywords work during branching operations. We will see that later.
Here, We first assigned some values to the variables with the =
operator.
Then you can see the characters %d
in printf
function two times. When executing the program, computer will replace the first %d
with the value of ui
and the second %d
with the value of si
. %d
helps us printing integer values which is passed after ‘,’ token.
The characters \n
is put so that it puts a line break in the output.
Please try assigning values to both signed and unsigned int which crosses its limit and try printing it. Look what the result will be and try studying the reason.
char
data type is specifially used to store ascii
characters.
it is a one byte
data type. signed char
could store value from -128
to 127
. unsigned char
could store value from 0
to 255
1
#include
<stdio.h>
2
3
int
main
(){
4
5
signed
char
sc
=
-
64
;
6
unsigned
char
uc
=
64
;
7
8
printf
(
"Value of sc is %d and value of uc is %d
\n
"
,
sc
,
uc
);
9
printf
(
"Ascii value of sc is %c and Ascii value of uc is %c"
,
sc
,
uc
);
10
11
}
Output will be like this when compiled and executed
You could see in the source code that we used characters %c
in it. %c
is used to print the ascii character of value specified to it after the ‘,’ token
void
is a data type used to represent absense
of value. We will see the use of it later.
Derived data types
are data types which are derived from fundamental data types like int
, char
etc…
Derived data types include:
1) Pointers
2) Arrays
3) Structures
4) Unions
5) Functions
A pointer
is a data type which holds memory addresses
. This could be used to access data in a specific location. Every data in computers memory have an address, If we know the address of some specific data that we want, we could use pointers to access it.
Pointer data type do not stand on its own. We need some other keywords to work on pointers
We could first try declaring a char pointer
. Have a look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
char
*
Pointer
=
"Pointers are powerfull
\n
"
;
6
printf
(
"%s"
,
Pointer
);
7
8
}
Have a look at the output:
We have declared a pointer by prefixing a *
before a variable name.
The one we declared is a character pointer. If you want to declare a char pointer with addr
as pointer name, we could declare it as:
char *addr
To assign it with the address of a string such as "Pointers are useful"
, Just declare it like this:
char *addr = "Pointers are useful";
You could see that we used the character %s
in the code. %s
accepts an address to a string. We passed the address by typing the name Pointer after ‘,’
We could now study pointers deeply. Look at the code up above. Here, what it does is, it declared a character pointer named addr and assigned it with the address of the string "Pointers are useful"
. With that code we are only assigning the address of that string, not the string itself.
I will say why we used the keyword char to declare the pointer, Have a look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
char
*
Pointer
=
"COMPUTERS are powerfull
\n
"
;
6
printf
(
"%c"
,
*
Pointer
);
7
8
Pointer
++
;
9
printf
(
"%c"
,
*
Pointer
);
10
11
Pointer
++
;
12
printf
(
"%c"
,
*
Pointer
);
13
14
Pointer
++
;
15
printf
(
"%c"
,
*
Pointer
);
16
17
Pointer
++
;
18
printf
(
"%c"
,
*
Pointer
);
19
20
Pointer
++
;
21
printf
(
"%c"
,
*
Pointer
);
22
23
Pointer
++
;
24
printf
(
"%c"
,
*
Pointer
);
25
26
Pointer
++
;
27
printf
(
"%c"
,
*
Pointer
);
28
29
Pointer
++
;
30
printf
(
"%c"
,
*
Pointer
);
31
32
}
Lets see the output:
Here, we have printed the string "COMPUTERS"
on screen. At the very top, we have declared a variable and assigned it with the address of a string. Then we have printed a character like this:
printf("%c" , *Pointer);
Here, we included a *
before the variable name. What this does is to get whatever is present in the address specified in the variable Pointer.
If we didn’t put the star before it, it will print the data contained in the variable Pointer which is an address, but as we put a star before it, it will take the data in the variable named Pointer as an address and returns whatever is located at that address.
By default the variable Pointer points to the first character of the string. So after printing the first character, We issued the command:
Pointer++;
What it does is only increment
the address in Pointer by one byte
(which is the size of data type char
defined for the pointer). Now if we print the data pointed by the Pointer variable, We will print the second character of the string and this process continues with each increment and printing to the screen.
We could see how to get the address of a variable, for example the address of an integer and print whatever is in it:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
num
=
2021
;
6
int
*
pnum
=
&
num
;
7
printf
(
"address at pnum is %d and value at pnum is %d"
,
pnum
,
*
pnum
);
8
9
}
Output:
Here, we first declared an integer named num
and assigned it with 2021
. Then we issued the following command:
int *pnum = #
This will declare an integer pointer
named pnum
and assign it with the address of value in num
with the &
symbol. The &
symbol will return the address of value in variable num
. Then we issued the following command:
printf("address at pnum is %d and value at pnum is %d" , pnum , *pnum);
First we printed whatever is present in the variable pnum
(As we assigned it with the address of num
, this will print the memory address
of num
). Then the command *pnum
will take whatever is present in the variable pnum
as an address and tries to print the data pointed by that address which in this case is the value 2021
.
Let’s now see how we could assign values to a specific memory address using pointers:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
iv
=
0
;
6
7
int
*
address
=
&
iv
;
8
9
*
address
=
3377
;
10
11
printf
(
"%d
\n
"
,
iv
);
12
13
}
Output
First, We created an integer named iv
and assigned it with 0
. And in the second line, we created an integer pointer named address
and assigned it with the address of variable iv
.
The next line is as follows : *address = 3377;
. Here what it does is that, it first takes the value in variable address
as a memory address and assigns the value 3377
to that address.
Here, as we previously assigned the address of variable iv
to the variable named address
, The line : *address = 3377;
will assign 3377
to the memory location of the variable iv
.
We confirmed that the value got assigned by printing the value in iv
.
So, This way we could use pointers to also ASSIGN values.
Arrays
are a collection of specific data type. An integer array
means a collection of integers
and char array
means a collection of characters
. Arrays
are always one after other
. We can access each element of array with its index
. For example if we want to access the first
element of an array we pass 0
as the index, and to get the second
element, we pass 1
as the index and this process goes on…..
Lets see how we could declare and access an array of integers:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
num
[
5
];
6
7
num
[
0
]
=
12
;
8
num
[
1
]
=
23
;
9
num
[
2
]
=
34
;
10
num
[
3
]
=
2020
;
11
num
[
4
]
=
8281
;
12
13
printf
(
"%d %d %d %d %d"
,
num
[
0
]
,
num
[
1
]
,
num
[
2
]
,
num
[
3
]
,
num
[
4
]);
14
15
}
Output:
Here, we declared an integer array
of 5
elements with this code:
int num[5];
We can access or assign all of the elements with num[0]
to num[4]
. Please note that the index of an array starts from 0
to size_of_array - 1
. Here, it is from 0
to 4
.
Please look how we assigned values to each element and printed it.
Now let me reveal a secret. The arrays work somewhat as same as pointers. The only difference is that pointers use *
and arrays use []
.
Let me show how we could access the array with pointers, Have a look at the code below:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
num
[
5
];
6
7
num
[
0
]
=
5
;
8
num
[
1
]
=
4
;
9
num
[
2
]
=
3
;
10
num
[
3
]
=
2
;
11
num
[
4
]
=
1
;
12
13
printf
(
"%d %d %d %d %d"
,
*
(
num
+
0
),
*
(
num
+
1
),
*
(
num
+
2
),
*
(
num
+
3
),
*
(
num
+
4
));
14
15
}
Output:
Here, We declared the array and assigned the value using the technique we learned just before this.
But have a look at how we printed it, It is different from what we have previously done. We’ve used the folowing technique:
*(num + 1)
What this does is like this : It first does thing which is inside the brackets, It increments 4
to the num
variable(We have actually incremented 1
, but as the size
of datatype of the variabe num
is 4 bytes
, incrementing one will practically increments four bytes
, incrementing two
will increase eight bytes
and so on), So now the num
variable points to the second integer
which is just after *(num + 0)
.
And as we have put a *
before (num + 1)
, it will take (num + 1)
as an address
and take whatever is present in that address. This will practically access the second integer
, likewise *(num + 2)
will access the third integer
.
I will request you to play with these concepts and do some more research so that you could learn more about this.
We will use pointers extensively when developing our operating system such as when we make the display driver. In 32 bit protected mode
(We will explain that later), we need to put the data we want to print to the screen to a specifc address in memory, Which the video card will render to the screen, But for now, as i have requested , please work with pointers for some time to get the full idea of pointers
Lets now look at character array and work with pointers on it.
1
#include
<stdio.h>
2
3
int
main
(){
4
5
char
ca
[
5
];
6
7
ca
[
0
]
=
'G'
;
8
ca
[
1
]
=
'A'
;
9
ca
[
2
]
=
'M'
;
10
ca
[
3
]
=
'E'
;
11
ca
[
4
]
=
'S'
;
12
13
printf
(
"%c %c %c %c %c"
,
*
(
ca
+
0
),
*
(
ca
+
1
),
*
(
ca
+
2
),
*
(
ca
+
3
),
*
(
ca
+
4
));
14
15
}
Output:
Here, the only difference is that, Instead of printing numbers, We just printed characters. You can try studying this code and move further.
I believe you are getting clear about the concept of pointers. You should further study pointers your own, there are many more concepts relating to pointers such as pointer to pointer etc….
Functions are a block of code which could be called and does a specific task. You have already used a function which is the main function:
int main(){
}
This is an example of a function and it is named main
. We could create our own functions if we need. Look at the following example to get to know about functions:
1
#include
<stdio.h>
2
3
int
add
(
int
a
,
int
b
);
4
int
sub
(
int
a
,
int
b
);
5
6
int
main
(){
7
8
int
num
=
add
(
150
,
50
);
9
printf
(
"%d
\n
"
,
num
);
10
11
int
num2
=
sub
(
200
,
50
);
12
printf
(
"%d"
,
num2
);
13
}
14
15
int
add
(
int
a
,
int
b
){
16
int
c
=
a
+
b
;
17
return
c
;
18
}
19
20
int
sub
(
int
a
,
int
b
){
21
int
c
=
a
-
b
;
22
return
c
;
23
}
Output:
Let me explain it, Take a look at the add
function that we have just created:
int add(int a , int b){
int c = a + b;
return c;
}
We have defined three integer variables here : a
, b
and c
.a
and b
are in the round brackets ()
. This means that the value of a
and b
will be given when calling it.
we called this function from the main function like this:
int num = add(150 , 50);
Here, what the computer will do at runtime is that it passes the number 150
to the integer variable named a
in add
function and 50
to the integer variable named b
in the add
function.
After passing these arguments the computer starts executing the code in add
function.
In that function, the code adds the values in a
and b
and stores it in c
. Then it returns the value in c
to the variable num
in main function. Then the main function prints it. We have also seen another function named sub
.
int sub(int a , int b){
int c = a - b;
return c;
}
There is no special difference in the working of the code when this function is called with:
int num2 = sub(200 , 50);
The only difference is that, the two numbers which got passed will go under a substraction operation and it returns the result to the num2
variable in main function.
At the very top of the code you could see the following code:
int add(int a , int b);
int sub(int a , int b);
What this does is only telling the compiler that we will include code for those function later under the main function. If you do not do that before the main function, the compiler would stuck at what it is doing when it encounters the function call to add
and sub
functions because it haven’t seen a function like that to jump to it.
You could try removing that two lines from the top and it will print some error message like this:
Lets learn something about the return type
of a function. You can see that i typed the add
function like this:
int add(int a , int b){
int c = a + b;
return c;
}
Not like this:
add(int a , int b){
int c = a + b;
return c;
}
The int
keyword we gave before the function name add
is used so that the compiler could get an idea about the data type of variable which that function will return using the return
keyword. Here we have returned a variable named c
which is of int
type, This is the reason why we prefix
the int
keyword before the function name add
.
The same rule applies also to the sub
function.
You cannot define a certain data type as the return type and return variable of another data type. Like this:
1
#include
<stdio.h>
2
3
char
add
(
int
a
,
int
b
);
4
5
int
main
(){
6
7
int
num
=
add
(
150
,
50
);
8
printf
(
"%d
\n
"
,
num
);
9
10
}
11
12
char
add
(
int
a
,
int
b
){
13
int
c
=
a
+
b
;
14
return
c
;
15
}
Some compilers will show error when doing something like this but our compiler haven’t shown any error , but you could see when executing that program that the output will be wrong
Output:
It is possible so that you can make a function which do not return any value, for that, you could use the void
keyword.
1
#include
<stdio.h>
2
3
void
printHello
();
4
5
int
main
(){
6
printHello
();
7
}
8
9
void
printHello
(){
10
11
printf
(
"Hello!!"
);
12
13
}
Output:
Here, we have returned no value. We just jumped to a function and printed some characters to the screen from there. Then the control goes back to the main function and when at the end of the main function, the program itself terminates.
MORE INFO
You could return from any function(with or without return type) from anywhere in the code with the return keyword. You only need to type return; and if that code gets executed , the program control will go back to the function which called it.
You could make a function(with or without return type) With many , less or no arguments Like this:
void print();
int add();
void add(int a , int b);
int add (int a , int b);
You could call any type of function(with or without arguments) Like this:
PrintMessage();
Add(a , b);
Please note that the name of the function and/or variable is of your choice. I used these names just for example
I will suggest you to play with the concept of functions and also the overall concepts that we have covered this far before jumping to the next section.
Structures help us to combine a collection of data type into one packet. These data types may or may not be of the same type, Let’s see an example:
1
#include
<stdio.h>
2
3
struct
car
{
4
int
numberOfTyres
;
5
int
price
;
6
char
topSpeed
;
7
};
8
9
int
main
(){
10
struct
car
goldenCar
;
11
struct
car
yellowCar
;
12
13
goldenCar
.
numberOfTyres
=
4
;
14
goldenCar
.
price
=
2000
;
15
goldenCar
.
topSpeed
=
100
;
16
17
yellowCar
.
numberOfTyres
=
2
;
18
yellowCar
.
price
=
4000
;
19
yellowCar
.
topSpeed
=
110
;
20
21
printf
(
"GOLDEN CAR : %d , %d , %d
\n
"
,
goldenCar
.
numberOfTyres
,
goldenCar
.
price
,
gol
\
22
denCar
.
topSpeed
);
23
printf
(
"YELLOOW CAR : %d , %d , %d
\n
"
,
yellowCar
.
numberOfTyres
,
yellowCar
.
price
,
ye
\
24
llowCar
.
topSpeed
);
25
}
Output
Here, We first created a structure named car
with the struct
keyword.
Then we included three variables named numberOfTyres
, price
and topSpeed
. After closing the structure with }
, we put a ;
denoting the end of that structure.
In the main
function, we created two instances of the structure named car
with it’s name as goldenCar
and yellowCar
.
We did it with the command struct car goldenCar;
and struct car yellowCar;
This command will allocate memory space to hold values in the structure named car
and we could access those areas with its name (goldenCar
and yellowCar
).
Both goldenCar
and yellowCar
will have separate space to hold all of the three variables in the structure car
.
Then we assigned values to variables in both of the instances of structure car
by combining the instance name and variable name with a dot(.
).
In the command : goldenCar.numberOfTyres = 4;
, we assigned the value 4 to the variable numberOfTyres
of instance goldenCar
of car
structure.
The next two lines assigned values to the rest of the variables in the instance of structure car
.
Later, we assigned values to variables in the instance named yellowCar
of the car
structure.
At last we printed all of the variables in both goldenCar
and yellowCar
instance of car
structure.
Branching
is a concept which is used so that you could execute some piece of code based on some conditions. The keyword used for this is the 'if'
Keyword.
Lets look at an example:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
i
=
2
;
6
7
if
(
i
==
2
){
8
printf
(
"The number in i is 2"
);
9
}
10
11
}
Output:
Here, We first assigned the value 2
to the integer variable named i
. Then we checked in the if
condition whether the value in i
is 2
or not with the ==
operator.
As the value in the variable i
is 2
, The program enters inside the condition and prints the message.
There are different operator which we could use in the if
condition which are:
1) == : Equals to
2) != : Not equal to
3) >= : Greater than or equal to
4) <= : Less than or equal to
5) > : Greater than
6) < : Less than
You could use any of these operators in the if condition. Lets see another example:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
i
=
2
;
6
7
if
(
i
!=
2
){
8
printf
(
"The number in i is not 2"
);
9
}
10
11
}
Output:
You could see that, no message is printed, This is because we initialized i
with 2
, Then we checked in the condition if the value in i
is not 2
. But actually as the value in it is 2
, The printf
function will not execute and the program will exit doing nothing.
Lets now see how to make that program running by making only a small change:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
i
=
4
;
6
7
if
(
i
!=
2
){
8
printf
(
"The number in i is not 2"
);
9
}
10
11
}
Output:
Here, We initialized i
with 4
. Then we checked if value in i
is not 2
and as the value in i
is not 2
, The printf
function got executed.
We also have another keyword to work with branching which is the else
keyword. Code in this else part
will get executed if the condition checked with the if
keyword is not true
. Have a look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
i
=
2
;
6
7
if
(
i
!=
2
){
8
printf
(
"The number in i is not 2"
);
9
}
10
else
{
11
printf
(
"The number is 2"
);
12
}
13
14
}
Output:
Here we initialized i
with 2
. Then we checked in the if condition if the value in i
is not 2
. But as the value in i
is 2
, The else part gets executed and the message got printed.
Now we will learn about 'else if'
. 'else if'
helps us check another condition if the condition we checked in the previous if
keyword is false.
Have a look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
i
=
2
;
6
7
if
(
i
>=
3
){
8
printf
(
"Value in i is greater than or equal to 3"
);
9
}
10
else
if
(
i
==
2
){
11
printf
(
"Value in i is 2"
);
12
}
13
}
Output:
At here, We first initialized i
with 2
. Then we checked if the value in i
is greater than or equal to 3
. But as the condition is false , the program will start evaluating the condition in 'else if'
part. Now as the 'else if'
condition is true, It will execute the code inside that 'else if'
part and will print the message.
MORE INFO
You could use the && operator to check whether two or more condition in the if keyword are true or not. Like this:
if(i >= 2 && i <= 5){
}
You could use || as an operator to check if any of two or more conditions specified are true or not. Like this:
if(i == 2 || i == 3){
}
In this case, the computer will start execution of code inside the if statement if any of the condition is true(If i is equal to 2 and/or i is equal to 3)
You could use nested if cases where one or more if cases are inside other if case Like this:
if(i == 2){
if(y < 4){
printf("Both conditions are true");
}
}
We will now learn about the keyword 'switch'
:
switch
is also a keyword which allows you to check conditions. This is used instead of the 'if'
keyword when we have to check a lot of conditions
. Have a look at the following:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
i
=
3
;
6
7
switch
(
i
){
8
case
1
:
9
printf
(
"Value is 1"
);
10
break
;
11
case
2
:
12
printf
(
"Value is 2"
);
13
break
;
14
case
3
:
15
printf
(
"Value is 3"
);
16
break
;
17
case
4
:
18
printf
(
"Value is 4"
);
19
break
;
20
case
5
:
21
printf
(
"Value is 5"
);
22
break
;
23
}
24
25
}
Output:
Here, We first declared a variable named i
and initialized it with 3
.
The next line: 'switch(i)'
will say to compiler to check the value in variable named i
.
Inside the switch statement after {
, we put the 'case'
statement like this:
case 1:
This line will check if the value in i
is 1
or not. If its 1
, then it will execute every code until a break;
keyword.
In our case, it will only execute the following line:
printf("Value is 1");
But , in the program we developed , we initialized variable i
with 3
. So the compiler will jump directly to the this line:
case 3:
And will execute every code inside this until it see a break;
And the only code between 'case 3:'
and 'break'
is 'printf("Value is 3");'
, the processor will only print the string "Value is 3"
.
There is no limit that you could only call the printf
function there. You could do any operation you want for eg : calling a function , doing some other branching or anything you want.
After the cpu run any of the case inside the switch statement and after the break;
keyword, cpu will jump to the code after the switch statement( code after '}'
).
Here, You could also try changing the value in i
to get different results.
Please take the time to play with switch keyword.
You should now try playing with all of the concepts that we covered till here. This will help you attain more experience.
Looping
is a concept which makes it practically possible to run same code repeatedly until a condition turns to be false.
To do that, we will use the keyword 'while'
. Have a look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
i
=
0
;
6
7
while
(
i
<=
5
){
8
printf
(
"*"
);
9
i
=
i
+
1
;
10
}
11
12
}
Output:
Here, We have defined a variable named i
and initialized it with 0
. We then declared the while keyword and gave the following as condition:
i <= 5
This line checks whether the value in i
is less than
or equal to 5
.
Computer will first check this condition when entering the loop and if its true, it will execute whatever is inside that loop. At the end of the loop at }
, The computer will go to the top of that while loop and checks the condition and if it’s true again, it will continue this work until the condition becomes false.
You could use every conditional operator that works in if
keyword also at the while keyword. That conditional operators include ==
, !=
, >=
, <=
, >
and <
.
You could use the &&
and ||
operators which we discussed for 'if'
keyword here.
You could also use nested loops which is a loop inside another loop : There is nothing preventing you to implement something like that. Lets see an example:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
i
=
0
;
6
int
y
;
7
while
(
i
<=
150
){
8
y
=
0
;
9
while
(
y
<=
100
){
10
printf
(
"*"
);
11
y
++
;
12
}
13
printf
(
"
\n
"
);
14
i
++
;
15
}
16
17
}
Output:
Here y++;
is same as y = y + 1;
and i++;
is same as i = i + 1;
Look at how fast your computer executes this program, This is the POWER and the USE of computers. You should now try to implement programs to make patterns like these. This is a very interesting job and it will help you a lot in this journey.
MORE INFO
You could use break; as a keyword so that when it executes, the cpu will exit from the loop it is currenlty running on
You could also use continue; as a keyword so that, when it executes, cpu will jump to the top of the current loop, which practically checks if the condition specified in that loop is true and if it's true, it will continue execution of code inside that loop
Type casting
is a feature which allows us changing one data type to other. Also we could use this to change the behaviour of a special data type to the behaviour of another datatype. We do not have any special keyword to do this but we could do it. Have a look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
number
[
5
];
6
7
int
address
=
&
number
[
0
];
8
int
address2
=
&
number
[
0
]
+
1
;
9
int
address3
=
&
number
[
0
]
+
2
;
10
11
printf
(
"%d %d %d"
,
address
,
address2
,
address3
);
12
13
}
Output:
Here you could see some warnings, You could forget about it for now. The output we got is the following "6422272 6422276 6422280"
. You could see that each of this address is incremented with 4
. This means that address of number[0]
is 6422272
,
Address of number[0] + 1
(or number[1]
) is 6422276
and address of number[0] + 2
(or number[2]
) is 6422280
. This is a valid output as the size of an int is 4
bytes, incrementing it with one
will always add four
.
The &
symbol here helps us get address of the variable.
But what if you want to increment the address by only one byte
which helps you obtain each bytes of a number instead of getting the whole number. You could use casting like this:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
number
[
5
];
6
7
int
address
=
(
char
*
)
&
number
[
0
];
8
int
address2
=
(
char
*
)
&
number
[
0
]
+
1
;
9
int
address3
=
(
char
*
)
&
number
[
0
]
+
2
;
10
11
printf
(
"%d %d %d"
,
address
,
address2
,
address3
);
12
13
}
Output:
Here, You could see in the output that each address is incremented by only one
, not 4
.
What we does special in this code is that we added (char*)
. This says to the compiler that we should only increment the value with the size of char
(which is one
byte), and this will result so that only 1
is added to the address for each increment.
We added *
to the (char)
so that it tells the compiler that what we do is an operation on memory address so it looks like this now: (char*)
.
If you did not understand this, Please try reading again, This is not a big deal.
I will show another example:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
char
*
string
=
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
;
6
7
char
a
=
*
(
string
+
0
);
8
char
b
=
*
(
string
+
1
);
9
char
c
=
*
(
string
+
2
);
10
11
printf
(
"%c %c %c
\n\n
"
,
a
,
b
,
c
);
12
13
char
x
=
*
((
int
*
)
string
+
0
);
14
char
y
=
*
((
int
*
)
string
+
1
);
15
char
z
=
*
((
int
*
)
string
+
2
);
16
17
printf
(
"%c %c %c"
,
x
,
y
,
z
);
18
}
Output:
Lets study the code now:
The first line is as follows : char *string = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
This line will declare a char pointer named string
and assign it with the address of : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
in memory.
The next three lines are as follows :
char a = *(string + 0);
char b = *(string + 1);
char c = *(string + 2);
Here the line: char a = *(string + 0);
will add 0
to the address specified in the variable string
. The *
before the variable named string
will result in returning the data pointed by the address in string
variable to the character variable a
.
This assigns the first character pointed by the variable string
to the character a
;
The only difference in the second line is that instead of adding 0
to the string
variable , we added 1
to it. This will result in returning the second character pointed by string
variable to the character variable b
.
The same follows also in the third line , adding two will result in returning the third character to the variable named c
.
In three of the cases , You saw that incrementing the value in string variable by one
result in returning the consecutive characters.
Then we printed the values in a
, b
and c
with: printf(“%c %c %c\n\n” , a , b , c);
This prints the consecutive characters "A B C"
to the terminal.
Then we does another approach like this:
char x = *((int*) string + 0);
char y = *((int*) string + 1);
char z = *((int*) string + 2);
Here we applied an integer casting
with (int*)
.
The first line result in returning the character 'A'
To the variable x
.
Then , in the second line we added 1
to the variable string
.
Here as we casted it with (int*)
, adding 1
will actually result in adding 4
which is the size of integer(int
). And the outer most *
in *((int*) string + 1)
will obtain the character 'E'
pointed by address in string + 1
and return it to the variable y
.
The same applies to the third line and it obtains the character 'I'
. As we added 2
to the variable string
, it will return the character at 2 * size of int
which is 2 * 4
which is 8
.
There are more details we could add to the topic Type casting
, But for this tutorial, This much is enough.
NOTE
It is a good idea to learn more on this topic which will help you in advanced implementations
We have already covered one of the Arithmetic Operations which is the addition operation. We used this to implement examples programs. Here you could learn more:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
base
=
4
;
6
7
int
a
=
base
+
2
;
8
int
b
=
base
-
2
;
9
int
c
=
base
*
2
;
10
int
d
=
base
/
2
;
11
12
printf
(
"%d %d %d %d %d"
,
base
,
a
,
b
,
c
,
d
);
13
14
15
}
Output:
The code and the output says it all, But let me explain it.
Addition could be done as:a = x + y;
Substraction could be done as:a = x - y;
Multplication could be done as:a = x * y;
Division could be done as:a = x / y;
Now, Have a look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
a
=
2
;
6
int
b
=
4
;
7
int
c
=
6
;
8
int
d
=
8
;
9
10
a
+=
2
;
11
b
-=
2
;
12
c
*=
2
;
13
d
/=
2
;
14
15
printf
(
"%d %d %d %d"
,
a
,
b
,
c
,
d
);
16
17
18
}
Output:
I assume that you have already figured out what this code does, But let me explain it:
Here, a += 2
is same as a = a + 2
b -= 2
is same as b = b - 2
c *= 2
is same as c = c * 2
And finally d /= 2
is same as d = d / 2
We finally have one more operator, Which is the modulus
operator. Modulus
operator returns a remainder after integer division
. Have a look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
a
=
12
;
6
7
int
b
=
a
%
10
;
8
a
%=
10
;
9
10
printf
(
"%d %d"
,
b
,
a
);
11
12
}
Output:
Here, int b = a % 10;
first divides the value in a
with 10
and returns the remainder to b
. 12 / 10
returns 2
as remainder. so value in b
will be 2
.
a %= 10;
is same as a = a % 10
, This will return 2
as remainder and it will be assigned to a
.
Look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
a
=
4
;
6
int
b
=
8
;
7
8
a
++
;
9
b
--
;
10
11
printf
(
"%d %d"
,
a
,
b
);
12
13
}
Output:
Here, a++
is the increment operation and it is same as a = a + 1
and a += 1
.b--
is the decrement operation and it is same as b = b - 1
and b -= 1
.
You can try comparing this with the output.
Both the AND
and OR
operators do pure binary operations. Let’s first learn Bitwise AND
operation.
Take the binary value 11010
and 11011
as two variables. Doing an AND operation on these values looks like this:
11010
11011
———
11010
Doing AND operation on 11010
and 11011
gives us 11010
as result.
In an AND
operation, The cpu takes each bit in both of the variables and generates a corresponding binary value 1
if both of the bits are 1
and generates 0
if both or any of the bit is 0
.
The Bitwise OR
operation looks like this:
11010
11011
———
11011
Here, The cpu takes each bit in both of the variables and generates a corresponding binary value 1
if any of the bit is 1
and generates 0
if both of the bits are 0
.
In c, The symbol &
could be used to perform AND
operation and the symbol |
could be used to perform OR
operation.
Lets See an example:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
char
aa
=
3
;
6
char
ab
=
5
;
7
8
char
AND
=
aa
&
ab
;
9
char
OR
=
aa
|
ab
;
10
11
printf
(
"%d %d"
,
AND
,
OR
);
12
13
}
Output
Both left shift
and right shift
operators shifts bits in a binary value to the specified side, the specified number of times.
Take binary value 00010
for example. Left shifting it by one
will change that value to 00100
, Right shifting the value 00010
it by one will change it to 00001
.
You could shift the bits to any number of position, But as an example here , we will shift by one
.
In c, The symbol <<
could be used as left shift operator and >>
could be used as right shift operator.
Lets look at an example:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
char
left
=
3
;
6
char
right
=
3
;
7
8
left
=
left
<<
1
;
9
right
=
right
>>
1
;
10
11
printf
(
"%d %d"
,
left
,
right
);
12
13
}
Output
Macros
or pre-processors
are a feature provide by the c compiler to do things at the compile time.
From the start of this book, we read about Data types
, branching
, looping
etc. These things will take effect only when we run the executable.
Macros
provide us the ability to do things at the compile time(when we compile the code with commands like : gcc code.c
).
We will learn about two Macros here which are the #include
and #define
directives.
#include
The #include
directive is used to include other c source files into the program we develop so that we could call function in that file and more.
All of the examples we have covered this far have #include <stdio.h>
in the very top.
This will include the file named stdio.h
into our source file. The <>
brackets tells to the compiler that to look for the stdio.h
file in the c standard library folder
.
When developing our operating system, it is advised not to include any of these standard libraries as if any of the functions we call in this files contains platform specific calls(Specifically calls to the operating system), It will result in run time errors.
But we can include our own files into the source files we create. Imagine if you want to create a file named expl.h
and call a function we made in that file. To do this , create a file with that name and put the following code into it so that we could call it:
1
void out(){
2
printf("Hello");
3
}
Now create a file named func.c
(You can name it anything you want) and put the following code:
1
#include
<stdio.h>
2
#include
"expl.h"
3
4
int
main
(){
5
out
();
6
}
Now compile and run func.c
file to get the output:
You can see that the message got printed. We called a function named out
, in expl.h
file to print something to the screen. This way you could include any file you need.
For that we used the #include
directive with filename in ""
at the top of the file.
As what we does is printing to the screen, don’t forget to include the stdio.h
file
Also note that two of this files should be in the same folder to get compiled.
Now, Look at the following case:
The main file func.c
is in a folder named src
and you want to include the file expl.h
which is contained in a sub folder named header
(which is also inside the src folder), To do that, include the following line in the func.c
file.
#include "header/expl.h"
You should now go to the src
folder using command prompt using cd
command and compile the func.c
file
Please note that you should put .h
as extension for the files included by the #include
directive.
#define
The best way to explain #define
directive is to show an example, Have a look at the following:
1
#include
<stdio.h>
2
3
#define Message "WE ALMOST COMPLETED OUR C PROGRAMMMING TUTORIAL"
4
5
int
main
(){
6
printf
(
Message
);
7
}
Output:
Here, what the line:
#define Message "WE ALMOST COMPLETED OUR C PROGRAMMMING TUTORIAL"
Does is only make the compiler to copy paste where ever it see the word 'Message'
with the string "WE ALMOST COMPLETED OUR C PROGRAMMMING TUTORIAL"
In the printf
function We passed 'Message'
inside ()
. What this does is only replace the string 'Message'
with the string "WE ALMOST COMPLETED OUR C PROGRAMMMING TUTORIAL"
at the compile time.
So it will look like this when compiling(But we can’t see this):
printf("WE ALMOST COMPLETED OUR C PROGRAMMMING TUTORIAL");
Hexadecimal
is just a number system which allows us with ease of work with computers.
We Humans generally use decimal
as the number system which have a base of 10
.
Base of 10
means we have a total of 10
number ranging from 0
to 9
.
The base 10
of decimal
number system is said to be derived from the fact that humans have a total of 10 fingers. But anyway, when we work with computers, the efficient way is to use hexadecimal number system.
This system have a base of 16
which comprises of values from 0 - 9
and a - f
Hexadecimal value of 1
to 9
is same as that of decimal
.
For eg: Hex of decimal value 1
is 1
,
Hex of decimal value 9
is 9
.
But the hex value of 10
is a
.
11 = b
12 = c
13 = d
14 = e
15 = f
Now the hex of decimal value 16
is 10
17 = 11
18 = 12
19 = 13
20 = 14
21 = 15
22 = 16
23 = 17
24 = 18
25 = 19
26 = 1a
27 = 1b
28 = 1c
29 = 1d
30 = 1e
31 = 1f
32 = 20
And this process goes on.
Hex of 255
is ff
.255
or ff
is the highest value that a byte
could have.
A byte
means 8 bits
. so in the value ff
, The first f
is took from the first four bits
of the byte
and the next f
is took from the remaining four bits
So, The main advantage of the hexadecimal
format allows us to get a single hex digit
from four bits
.
Each group of 4 bits
in binary
is a single digit
in the hexadecimal
system. This makes it really easy to convert binary
to hexadecimal
numbers.
We could use hexadecimal
number system in c
by prefixing 0x
on a hex value, for eg: if we define it as 0xa
, this will get converted to the decimal value 10
.0xff
will be converted to decimal value 255
.
Two hexadecimal characters
make one byte
, Four
of them make 2 bytes
, six
of them make 3 bytes
. eg: 0xff
is one byte
, 0xffaa
is two bytes
and 0xffaa11
is three bytes
.
Have a look at the following code:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
a
=
0x4
;
6
int
b
=
0xa
;
7
int
c
=
0xff
;
8
int
d
=
0xff12
;
9
10
printf
(
"%d %d %d %d"
,
a
,
b
,
c
,
d
);
11
}
Output:
You could now do further studies relating to the hexadecimal number system if you want.
Commenting
is a feature provided by almost every language which let us include documentation or other helpfull information along with the code.
Comments would be skipped at compile time.
We have two ways to comment inside our code in c
. We could comment on a single line with //
and comment on multiple lines with /*
and */
, Have a look at the following:
1
#include
<stdio.h>
2
3
int
main
(){
4
5
/*
6
This is an example of multi line comment.
7
This allows us include multiple lines.
8
The main aim of comments is to document working of the code
9
*/
10
11
printf
(
"Hello"
);
// This is an example of single line comment
12
13
}
When compiled and executed, We won’t have any errors, and the code would work fine.
Output:
This section will help you to put what you have learned this far and implement it.
This is the challenge for you:
You have a variable named i
, You can assign it with any value , but first assign it with 5
and try to print one star on the first line , two stars in the second line and , three stars on the third line and continue until the number of lines reaches the value of i
.
If you assign i
with 5
, This will be the output:
Now, Try to find a solution to this problem.
Solution
1
#include
<stdio.h>
2
3
int
main
(){
4
5
int
i
=
5
;
6
7
int
y
=
1
;
8
int
z
;
9
10
while
(
y
<=
i
){
11
z
=
0
;
12
while
(
z
<
y
){
13
printf
(
"*"
);
14
z
++
;
15
}
16
printf
(
"
\n
"
);
17
y
++
;
18
}
19
}
Every computers have a common way of working which follows: Input –> Processing –> Output. Processing is the powerfull and resource intensive area.
The Hardware which processes instruction in normal Desktop or Laptop computers is commonly called as a Micro Processor. Micro processors does three thing in common which are : Fetch –> Decode –> Execute.
The processor first takes an instruction from the memory, Then it will decodes the instruction which makes the internal circuitry ready for the execution of that single instruction and finally, it will do the operation specified by the fetched instruction.
I am not willing to make you mad by explaining theories, But i will explain only the necessary things which you can’t skip.
Keeping it simple, Assembly language is a low level language which helps us teach computers do a task by explaining it in a step by step manner.
An assembler is a software which converts the program’s we wrote in assembly language to pure binary which the processor could directly execute.
We have special instructions in assembly language to do the task we need. The addition operation will look something like this in assembly:
add x , 5
This instruction will simply add 5 to x
. Copying values will look something like this:
mov numb , x
This will simply copy the value in x
to numb
.
Now, Lets see what the main role of an assembler is:
The processor won’t be able to do things that we write in english or other human readable language.
Take the addition instruction
add x , 5
as an example: The assembler will convert this human readable code to binary. For this example, Please take the following hex values into consideration(Please don’t take it as exact , this is only used to convey the idea):
0x83 0x45ff 0x5
The assembler will convert the add instruction to the hex value 0x83
, Then to represent the address of x
, it converts it to the hex value 0x45ff
(Please note that this is not the exact value everytime, it will depend on the address of variable in focus). Now, to denote the addition of value 5, it will be converted to 0x5
During the execution, The processor will first take the byte 0x83
.
It is predefined in the processor circuitry so that when it see the hex value 0x83
, to prepare the processor to fetch some of the next binary values to perform the addition operation.
Now, The processor fetches these values 0x45ff 0x5
Finally, The processor adds 0x5
( 5 in decimal ) to the value located at address 0x45ff
.
Likewise, the processor have a table of binary values like these which represent each operation.
These values are called machine code.
There are different micro processor architectures in the market. Each processor architecture will have its own unique table of values(machine code). This means that programs written for a special processor architecture won’t work in another processor architecture.
The Processor architecture we are going to work with is intel’s x86 architecture.
This is the mostly used architecture currently and it is where Windows and Linux Mainly run on. Even though x86 is developed by intel, Companies like AMD also manufactures the same.
So at the conclusion, Assembly language programs are a processor architecture specific, low-level, Human readable routine which would be translated to a non-human-readable(binary) form also known as machine language, by a software commonly named as assembler.
The main purpose of a compiler is not to convert the source code to binary.
We have discussed about the add
and mov
commands in x86 assembly language. There are many instructions that we can use to make our program in assembly.
To know what a compiler is, Please save the following c code to a file, name it code.c
:
1
#include
<stdio.h>
2
3
int
main
(){
4
printf
(
"Hello , World"
);
5
}
Now, Go to where you saved the file with command prompt and type the following command
gcc -S code.c
You will now get a new file in the same directory. In my system, I got a file named code.s
Have a look at its content:
1
.file "code.c"
2
.def ___main; .scl 2; .type 32; .endef
3
.section .rdata,"dr"
4
LC0:
5
.ascii "Hello , World\0"
6
.text
7
.globl _main
8
.def _main; .scl 2; .type 32; .endef
9
_main:
10
LFB10:
11
.cfi_startproc
12
pushl %ebp
13
.cfi_def_cfa_offset 8
14
.cfi_offset 5, -8
15
movl %esp, %ebp
16
.cfi_def_cfa_register 5
17
andl $-16, %esp
18
subl $16, %esp
19
call ___main
20
movl $LC0, (%esp)
21
call _printf
22
movl $0, %eax
23
leave
24
.cfi_restore 5
25
.cfi_def_cfa 4, 4
26
ret
27
.cfi_endproc
28
LFE10:
29
.ident "GCC: (MinGW.org GCC-6.3.0-1) 6.3.0"
30
.def _printf; .scl 2; .type 32; .endef
This is an assembly code equivalent to the c code we wrote. So this is the point, a compiler is only a program which translates the code we write in a higher level language like c, to a lower level language like assembly. Then the c compiler will call its own assembler to assemble the assembly code which the compiler generated.
You don’t need to worry about the complexity of assembly code you saw here. We will not write assembly programs like this. We will only use a neat style to learn this. And mainly we are not going to use the assembler which comes with the compiler. This is why we have downloaded nasm as our assembler.
You just saw the amount of lines created just for our hello world program. This is why no one these days write code in assembly and it is also the reason behind the development of languages like c.
Assembly language is processor specific, and if we write code in assembly for a specific processor, we wont be able to make it work on other processor architectures. If you want to do that, You may need to write that program in assembly language of the processor you want to port to from scratch. This is where a compiler come. If we write our os in c or similar language, we will be able to port the program to other processor architectures if the processor vendor provide a c compiler for their processor. But we wont be able to port assembly programs like this.
This is why we write our os in c, But we surely need to write some part of the code in assembly. When porting your os to other processor architecture, You may need to change every assembly code you wrote for your os to the equivalent code for the processor architecture you want to port to.
The x86 processor defines different data sizes with its name and its size. Here is the list of it:
Word: a 2-byte data item
Doubleword: a 4-byte (32 bit) data item
Quadword: an 8-byte (64 bit) data item
Paragraph: a 16-byte (128 bit) area
Kilobyte: 1024 bytes
Megabyte: 1,048,576 bytes
We are now going to use the software named sasm that we have downloaded. First open it and create a new project and include the following code into it:
1
%include
"io.inc"
2
3
section
.
data
4
msg
db
'
Hello
,
world
!!
'
,
0
5
6
section
.
text
7
global
CMAIN
8
CMAIN
:
9
PRINT_STRING
msg
10
ret
Run it by pressing the play button and we will get the following output:
We will explain the working later. For now, We could jump to the next section.
Now, we will learn some theories.
The main function of a processor is to process data. These data’s are mainly stored in main memory(also known as RAM or Ramdom Access Memory). As the job of a processor is to process data, It may need to access memory often.
Here, We have to think about a factor, which is speed. The processor is very much faster than RAM and accessing it frequently will decrease it’s processing speed to a greater extend. It is also an inefficient way as accessing it frequently will put a lot of traffic on the system bus.
This is where the concept of register’s come. Registers are the memory inside the processor. The technology used to implement registers are so advanced so that the processor could execute at a maximum speed.
But we can’t use the technology used to implement registers to build a main memory. If this is practical, We could think of an architecture with no registers, With only a main memory. But this idea will only be in theories. The registers in the processor these days are either 32 bit or 64 bit in size. But when talking about main memory, We will atleast need 2 or 4 GB.
If we use the technology used to implement registers to build main memory, It will cost you as much as money which only a few could handle. This is the reason why we implement computers with RAM as a slow memory –> cache as a faster memory that ram –> And finally registers as an even faster memory than cache.
The following is the list of devices in computer with increasing amount of speed and decreasing amount of capacity:
1)Hard disks – Permanent storage
2)Ram – Data goes off when the computer is off
3)Cache – Data goes off when the computer is off
4)Registers – Data goes off when the computer is off
The registers are grouped into three categories:
1)General registers
2)Control registers
3)Segment registers
The general registers are further divided into the following groups:
1)Data registers
2)Pointer registers
3)Index registers
There are four data registers used for arithmetic and other operations. These are the 64 bit data registers:
1)RAX
2)RBX
3)RCX
4)RDX
Lower halves of the 64 bit data registers can be used as 32 bit data registers which are:
1)EAX
2)EBX
3)ECX
4)EDX
Lower halves of the 32 bit data registers can be used as 16 bit data registers which are:
1)AX
2)BX
3)CX
4)DX
Higher and Lower halves of 16 bit data registers can be used as 8 bit data registers which are:
1)AH , AL
2)BH , BL
3)CH , CL
4)DH , DL
Have a look at the following images which shows division of 32 bit data registers:
The AX is the primary accumulator: It is used in input/output and most arithmetic operations.
The BX is known as base register: It could be used in indexed addressing(We will see this later).
The CX is known as the count register: It is mainly used to store loop count.
The DX is known as data register: It is also used in input/output operations. It is also used along with the AX register for arithmetic operations.
There are three pointer registers. 64 bit ones are named:
1)RIP
2)RSP
3)RBP
32 bit ones are named:
1)EIP
2)ESP
3)EBP
16 bit ones are named:
1)IP
2)SP
3)BP
Instruction Pointer (IP) - This register stores the address of next instruction to be executed. x86 architecture use segmented addressing. IP
is associated with the CS
register. CS:IP
gives the complete address of the next instruction to be executed(We will see this later).
Stack Pointer (SP) - Mainly used to point to the top of stack. SS:SP
Gives the complete address in segmented addressing(We will see this later).
Base Pointer (BP) - This is mainly used to access memory relatively. BP - A special offset gives access to variables in memory. SS:BP
gives the complete address in segmented addressing(We will see this later).
There are 2 index registers. 64 bit ones are named:
1)RSI
2)RDI
32 bit ones are named:
1)ESI
2)EDI
16 bit ones are named:
1)SI
2)DI
Source Index (SI) - This is mainly used as source index for string operations.
Destination Index (DI) - This is mainly used as destination index for string operations.
Many instructions in x86 architecture involves comparisions and mathematical operations. This will change the status of some flags and some other conditional instructions test the values of these flags to influence the control flow.
The common flag bits include:
1)Overflow Flag (OF)
2)Direction Flag (DF)
3)Interrupt Flag (IF)
4)Trap Flag (TF)
5)Sign Flag (SF)
5)Zero Flag (ZF)
6)Auxiliary Carry Flag (AF)
7)Parity Flag (PF)
8)Carry Flag (CF)
I am not going to explain and overwhelm you with lot of theories about this. If you need more information about this, You could google about x86 control registers.
Not learning this deeply wont affect you for starting in this field. But when you are planning to go advanced, It’s a good idea to learn about this.
Segments are specific areas in a program for containing Data , Code and Stack. x86 initially was a 16 bit architecture. So we could only access data in memory with 16 bit address. This will limit the size of ram that could be implemented. Segmented memory is used to allow us access more memory. Segment registers are the solution used to solve this problem.
The Registers include:
1)Code Segment (CS): Used to point to executable code. This register stores the starting address of the code segment.
2)Data Segment (DS): Used to point to Data. This stores the starting address of Data Segment.
3)Stack Segment (SS): Used to point to the starting of the stack segment. Stack segment is where variables which are local to functions are stored.
Processor Endianess defines how a multi-byte
data type is stored in memory.
The early computers implemented a 4 bit
or 8 bit
processor architecture.
x86 architecture was a 16 bit architecture and data items in it’s registers qualify to be named as multi-byte
data type.
The two types of endianess which exists are named as Little Endian
and Big Endian
. x86 uses Little Endian
architecture. Little Endian
architecture stores multi byte data in a reverse order in memory.
By reversing, it doesn’t mean the first bit
is stored last and last bit
is stored first. The reversing get affected at the byte
level which means that first byte of a multi-byte data item get stored last, Second byte get stored second last and goes until the last byte gets stored first.
When copying the data from memory to any register, another byte reversing will be done which practically makes the number in the real form.
For Eg: Consider 0xaabb
as a multi-byte data. In memory, It will be stored as 0xbbaa
. After copying it to any register, It will be back in normal form as 0xaabb
.
Consider the following 32 bit
data :
0xaabbccdd
This will be in memory as 0xddccbbaa
.
It will be in normal form like this:
0xaabbccdd
When in register.
When talking about the C
Programming language in x86 architecture, The keyword char
won’t be affected by Endianess as char
is a single byte data type, But the int
keyword will always be affected by Endianess as int
keyword always uses 4 - 8 bytes.
The other architecture which exists is the Big Endian
architecture. This architecture uses the normal way of storing which stores the first byte first and continues untill the last byte get stored last.
Most of the operations we do invloves registers. So, We have different commands to work with registers. We will discuss about the important ones.
The mov command is used to copy values from one location to other. Following is the basic syntax:
mov destination , source
Consider a situation where you want to copy the value 10 to the eax register. The code will look like this:
mov eax , 10
Here’s how we could print it in sasm.
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
mov
eax
,
10
7
PRINT_DEC
4
,
eax
8
ret
Output
You only need to look at the code after CMAIN:
The command mov eax , 10
only copies the value 10 to the eax register.
In PRINT_DEC 4 , eax
, PRINT_DEC
is not a command. It is only a routine defined in the io.inc
file included at the top of the code. Here, It helps us convert binary value in eax
register to decimal form and print it as an output.
4
in PRINT_DEC 4 , eax
tells the routine that the operation is done in a four byte data item(Here it is the eax
register).
You do not need to focus on this printing function as we wont have this routine when developing our os. We will build our own routine for that.
The very last command ret
will practically terminate that program, The function of ret
command is not to terminate a process. We will discuss that later.
We could do this operation in more register, Look at the following:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
10
8
PRINT_DEC
4
,
eax
9
NEWLINE
10
11
mov
ebx
,
20
12
PRINT_DEC
4
,
ebx
13
NEWLINE
14
15
mov
ecx
,
30
16
PRINT_DEC
4
,
ecx
17
NEWLINE
18
19
mov
edx
,
40
20
PRINT_DEC
4
,
edx
21
NEWLINE
22
23
ret
Output
Here, We tested the mov
command with different registers and it worked!
The line NEWLINE
will put a line break in the output. It is not a command in x86 architecture. NEWLINE
is pre defined in the io.inc
file we included at the top.
As the name say’s, add command is a command to perform the addition operation.
The syntax is as follows:
add destination , source
Here’s the implementation:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
10
8
PRINT_DEC
4
,
eax
9
NEWLINE
10
11
add
eax
,
10
12
PRINT_DEC
4
,
eax
13
NEWLINE
14
15
ret
Output
Here, we first copied the value 10 to the eax
register and printed it. Then we issued the command add eax , 10
. This command adds 10 to the value in eax
register and stores the result in the eax
register itself.10 + 10 = 20
, so the next string printed to screen is 20.
As you have guessed, the sub
command is used to perform the substraction operation.
The syntax is as follows:
sub destination , source
Here’s the implementation:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
25
8
PRINT_DEC
4
,
eax
9
NEWLINE
10
11
sub
eax
,
5
12
PRINT_DEC
4
,
eax
13
NEWLINE
14
15
ret
Output
The sub
instruction here will substract 5 from 25 and the value 20 is printed.
Stack is an area in the main memory(RAM) where we could store variables and other data items. The two operations that we could do on stack memory is the push
and pop
operations. Push
basically means to store a value in memory and Pop
means to take a value from memory.
On the x86 architecture, pushing a value to stack means to put a value in lower memory address. When we attempt another push operation, The processor will put the value specified to it in memory just BEFORE the previous one. This means that the stack grows to lower memory addresses.
Popping values(Or taking value) from memory means taking value from the lowest address of the stack. Another pop attempt will take the value just AFTER the previous one. This means that pop instruction goes to higher memory addresses.
We will see an example image, Here’s an image of the stack with some initial values in it:
Here is the image of the stack memory when a value 0xFFFFFFFF
is pushed onto it:
At this stage, If we issue a pop
command, The stack memory will look like this:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
2001
8
9
push
eax
10
11
pop
ebx
12
13
PRINT_DEC
4
,
ebx
14
15
ret
Output
Here , We first initialized eax
with 2001
. Then we pushed the value in eax
to the stack with the command push eax
. This will place the value 2001 to the stack memory.
Then we issued the following command, pop ebx
. This will take the value that we last pushed to the stack to the ebx
register. So now, the ebx
register have the value 2001.
Then we printed the value in ebx
. This will print the value 2001 to the screen.
This is what the push and pop command basically does.
But how does the processor know what the last item pushed to the stack is?
This is where the esp
register comes. The esp
register holds address to the last item in stack memory.
For eg: If we push a value in eax
register to the stack, What the processor does is first decrement the value in esp
by four bytes(Which is the size of the eax
register), Then it pushes the value in eax
register to the address pointed by the esp
register.
Likewise, when we pop a value to a register for eg: to the ebx
register, The processor first takes the value in memory pointed by the esp
register and copy it to register ebx
and increases the value in esp
by four bytes(Which is the size of ebx
register).
So at conclusion, The push
command decreases the value in esp
register and the pop
command increses the value in esp
register.
Lets see it practically:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
PRINT_DEC
4
,
esp
8
NEWLINE
9
10
push
eax
11
12
PRINT_DEC
4
,
esp
13
NEWLINE
14
15
pop
eax
16
17
PRINT_DEC
4
,
esp
18
ret
Output
Here, We First printed the initial value of the esp
register and we got the value 6422284
. This is the address of the data which is at the top of stack.
Then we pushed the value in eax
register to the stack with the push eax
command.
This will practically decrease the value in esp register by four bytes. To confirm this, We printed the value in esp
register after that push
instruction.
And we got this value : 6422280
.
We can clearly see that the value decreased by four bytes. This value is the address of data at top of the stack.
Finally we issued the pop
command like this: pop eax
. This will copy the value at top of the stack to the eax
register and also increases the value in esp
register by four bytes and we printed it. The value of esp
register now is 6422284
. This is how the stack and commands to work with stack works.
Let’s look at another example:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
100
8
mov
ebx
,
200
9
10
push
eax
11
push
ebx
12
13
add
esp
,
4
14
15
pop
ecx
16
17
PRINT_DEC
4
,
ecx
18
ret
Output
Here, We first moved the number 100
to the eax
register and then moved the value 200
to the ebx
register.
Then we pushed the value in eax
and after that we pushed the value in ebx
register to the stack.
At this stage, The value at the top of the stack is 200
(Value of ebx
register). And if we issue a pop
command and print the value, We will get 200
as output.
But before popping and printing the value, We Increased the value in esp
register by four bytes with the command : add esp , 4
. Now, As you have guessed, The esp register points to the value 100
which we first pushed.
And if we pop the value at top of stack with the pop
command and print it, We will get 100
as output, not 200
.
You could try removing the add esp , 4
line and differentiate the results.
I showed this example so that you could understand how all of the things incorporate.
What if you want to store values in all register to stack temporarily so that you could load new values to those register and later assign the previous values from stack back to registers?
We have some special commands to do that in x86 assembly.
To temporarily store the values in 32 bit general purpose registers to the stack, You could use the command pushad
and to retrieve the value back to registers, You could use the popad
command.
When working on 16 bit register, You could use pusha
and popa
commands for these operations.
Have a look at an example:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
12
8
mov
ebx
,
24
9
10
PRINT_DEC
4
,
eax
11
NEWLINE
12
PRINT_DEC
4
,
ebx
13
NEWLINE
14
NEWLINE
15
16
pushad
17
18
mov
eax
,
100
19
mov
ebx
,
200
20
21
PRINT_DEC
4
,
eax
22
NEWLINE
23
PRINT_DEC
4
,
ebx
24
NEWLINE
25
NEWLINE
26
27
popad
28
29
PRINT_DEC
4
,
eax
30
NEWLINE
31
PRINT_DEC
4
,
ebx
32
33
ret
Output
I believe you could describe it yourself. But i will say how it works:
Here, We first moved some values to both the eax
and ebx
register and then we printed it. Now comes our command in focus: We put the pushad
command there, This will push the values in all general purpose registers to the stack. Practically the values in eax
and ebx
registers also will be pushed.
Then we moved the value 100 to the eax
register and 200 to ebx
register and printed it. Here, the previous values in eax
and ebx
register got overwritten by new values.
Then we issued the popad
command. This command will copy the previous values we pushed to the stack with pushad
command, Back to registers. So the values we copied to eax
and ebx
registers before the pushad
command also will be back to those registers.
See the following code:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
PRINT_DEC
4
,
esp
8
NEWLINE
9
10
pushad
11
PRINT_DEC
4
,
esp
12
NEWLINE
13
14
popad
15
PRINT_DEC
4
,
esp
16
NEWLINE
17
18
ret
Output
Here, We first printed the initial value of esp
register. Then we issued the pushad
command. This will push whatever data is present in the general registers to the stack. We confirmed it by printing the value of esp
after the pushad
operation, We saw that the value of esp
decreased from that of the initial value.
Then, We issued the popad
command, This will pop the values pushed to the stack back to registers, We confirmed it by printing the value in esp
and we saw that the value came back to the initial value.
Both inc
and dec
commands are simple. The inc
command increments the value in a register by 1
and the dec
command decrements the value in a register by 1
Have a look at the following code:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
1
8
mov
ebx
,
2
9
10
PRINT_DEC
4
,
eax
11
NEWLINE
12
13
PRINT_DEC
4
,
ebx
14
NEWLINE
15
NEWLINE
16
17
inc
eax
18
dec
ebx
19
20
PRINT_DEC
4
,
eax
21
NEWLINE
22
23
PRINT_DEC
4
,
ebx
24
NEWLINE
25
26
ret
Output
The output itself explains all!!
jmp
is just a command used to jump to a specified section of code, Let’s see an example:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
100
8
PRINT_DEC
4
,
eax
9
NEWLINE
10
11
jmp
sec
12
13
mov
eax
,
200
14
PRINT_DEC
4
,
eax
15
NEWLINE
16
17
sec
:
18
19
mov
eax
,
300
20
PRINT_DEC
4
,
eax
21
22
ret
Output
Here, We first moved the value 100
to the eax
register and printed it to the screen along with a newline. Then we issued the command jmp sec
which transfers execution control to the sec
section.
The sec
section is defined with the line sec:
, When cpu executes the command jmp sec
, It jumps to the first code in the sec
section. This results in the execution of code which moves the value 300
to the eax
register and prints it.
You also saw a part of code which tries to moves the value 200
to the eax
register and print it, But practically, That code won’t be executed as we jumped to another section of code before reaching that line.
The thing to note here is that, if you want to create a new section, you need to suffix the section name with :
, which means that if you want to create a section named ADD
, You need to define it as ADD:
. And to jump to that section, you need to call it like jmp ADD
.
call
is also a command which could be used to jump to other section but with an added capability. Let’s make a small modification to the code we wrote for the jmp
command :
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
100
8
PRINT_DEC
4
,
eax
9
NEWLINE
10
11
call
sec
12
13
mov
eax
,
200
14
PRINT_DEC
4
,
eax
15
NEWLINE
16
17
ret
18
19
sec
:
20
21
mov
eax
,
300
22
PRINT_DEC
4
,
eax
23
NEWLINE
24
25
ret
Output
In this code, We first moved the value 100
to eax
register and printed it.
Then, we took another approach to jump to the sec
section using the call
instruction.
Here, the difference from the jmp
instruction is that, Before jumping to the section named sec
, The cpu pushes the address of instruction after the call
instruction to the stack before jumping to the sec
section.
In our code, When the cpu executes the call instruction, It first pushes the address of the instruction mov eax , 200
to the stack and jumps to the sec
section.
In that section, We printed the number 300
to the screen.
The last code in that section is ret
, The ret
instruction always pops the value at the top of the stack and stores it in the eip
register(This is the register where the location of next instruction to be executed is present).
As the call
instruction executed before, pushed an address to the stack, That pushed address itself will be popped to the eip
register when the ret
instruction is met. This results in the execution of the instruction mov eax , 200
and then the printing of that number along with a newline. The next instruction to be executed is also ret
. When this code is executed, The cpu will jump to code section which called our code which we defined as CMAIN:
. This will result in execution of code to terminate the process.
Have a look at the following code with no ret
instruction and you can see that the program crashed:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
100
8
PRINT_DEC
4
,
eax
9
NEWLINE
Output
The actual reason for the crash is that as we haven’t included ret
instruction, The processor will start executing every data in memory after our code.
The processor will practically take everything in memory as executable code and the chance is that it might run code which it doesn’t know or do unnecessary things.
Now, Look at the following code:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
PRINT_DEC
4
,
esp
8
NEWLINE
9
10
call
special
11
12
ret
13
14
special
:
15
16
PRINT_DEC
4
,
esp
17
NEWLINE
18
19
ret
Output
In this code, We first printed the value in esp
register. This is the address of the data which is at the top of the stack.
Then, We called a section of code named special
with the call
instruction.
This will push the address of the instruction after the call
instruction to the stack and jumps to the code section named special
.
In the special
section, we printed the value in esp
and you can see that the value changed. This is because the call instruction pushed an address to the stack. This changed value is the address of value at the top of the stack, which here is the address to instruction after the call
instruction. After returning from the section special
, The value in esp will be back to normal : You could try printing it.
I believe you have learned a lot till this. We will learn some more concepts before learning OS-DEVELOPMENT
. Don’t forget to have fun learning!!
The cmp
command is mostly used to compare two values to do a conditional execution.
The if
command we used in C
makes use of the cmp
instruction.
The cmp
instruction makes changes to some Flag registers so that a logical jump can be made with other instructions.
See the following code:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
10
8
mov
ebx
,
10
9
10
cmp
eax
,
ebx
11
je
DONE
12
13
PRINT_DEC
4
,
0
14
ret
15
16
DONE
:
17
18
PRINT_DEC
4
,
10
19
ret
Output
Here, We first copied the value 10
to both eax
and ebx
registers.
Then, With the command cmp eax , ebx
, We compared the values in both the eax
and ebx
register.
The next command we issued is the je
command. It stands for Jump Equal
or Jump If Equal
.
When executing the je DONE
command, The processor checks the comparison we done using the cmp
command and if the values in both eax
and ebx
registers are equal, Then it will jump to the code section named DONE
.
If the values are not equal, It will continue execution of command from PRINT_DEC 4 , 0
. This means that the cpu will not jump to the section DONE
if the values are not equal.
Now look at the following code:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
mov
eax
,
10
8
mov
ebx
,
12
9
10
cmp
eax
,
ebx
11
je
DONE
12
13
PRINT_DEC
4
,
0
14
ret
15
16
DONE
:
17
18
PRINT_DEC
4
,
10
19
ret
Output
Here, We applied only a small change, We moved the number 12
to the ebx
register. So now, The values in eax
and ebx
registers are completely different.
So when executing the je DONE
command, The cpu realises that values in those registers are completely different, So it will not execute the instructions in the DONE
section and executes the instruction PRINT_DEC 4 , 0
and finally terminates.
Here, We have used je
as conditional operator, but we have more commands that can be used after the cmp
command. They are:
1)JE : Jump Equal
2)JNE : Jump Not Equal
3)JG : Jump Greater
4)JGE : Jump Greater Than Or Equal
5)JL : Jump Less
6)JLE : Jump Less Than Or Equal
There are more branching commands to use with the cmp
command, You could refer to other sources to learn more, But for this book, This much is enough.
I also recommend you to master x86 assembly. You could do it after completing this book or along with this book. It’s all upto you.
Nasm provides us with features to store variables in areas other that in the stack segment. define directives help us allocate these locations.
Have a look at the following:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
6
CMAIN
:
7
PRINT_DEC
4
,
av
8
ret
9
10
section
.
data
11
av
db
'A'
Output
Here you could see a section code starting with section .data
and in that section, We declared a variable with av db 'A'
.
Here av
is the name of the variable, db
says to allocate a byte and 'A'
will be assigned to the location reserved for av
.
The output you saw on the screen is the ascii value of the character assigned to that variable.
You could also create an array of characters(String) with the db
command itself.
Eg: msg db 'Hello, world!', 0
Here, some memory will be allocated to store the string Hello, world!
ending with a null character(0
). And it can be accessed by referencing with the variable name msg
.
We have more keywords to allocate variables of different length. The full options are given below:
1)db : Allocates 1 byte
2)dw : Allocates 2 bytes(Word)
3)dd : Allocates 4 bytes(Doubleword)
4)dq : Allocates 8 bytes(Quadword)
5)dt : Allocates 10 bytes
In the c programming language, We used pointers to access memory directly. Accessing memory directly is also possible in assembly and there need not be a doubt for that.
Lets see an example:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
6
CMAIN
:
7
mov
eax
,
msg
8
mov
dl
,
[
eax
]
9
10
PRINT_DEC
1
,
dl
11
NEWLINE
12
13
inc
eax
14
mov
dl
,
[
eax
]
15
16
PRINT_DEC
1
,
dl
17
NEWLINE
18
19
inc
eax
20
mov
dl
,
[
eax
]
21
22
PRINT_DEC
1
,
dl
23
NEWLINE
24
25
inc
eax
26
mov
dl
,
[
eax
]
27
28
PRINT_DEC
1
,
dl
29
NEWLINE
30
31
ret
32
33
section
.
data
34
msg
db
'
ABCDEFGH
'
Output
Here, At the very last of the program, We declared a variable named msg
as:
msg db 'ABCDEFGH'
Whenever we reference something with the name msg
, We will get the address of the string ABCDEFGH
.
The very first line in the code is:
mov eax , msg
This line moves the address of the string ABCDEFGH
to the eax
register.
The next line is as follows:
mov dl , [eax]
Here, the Square Brackets in [eax]
is used so that computer when at execution, will take the value in eax
register as an address and take the data in that address to the dl
register. So Here, Square Brackets are used as a sign to access memory.
The processor will only fetch one byte to the dl
register as the size of dl
register is only one byte.
Then we printed the value in dl
register with the command PRINT_DEC 1 , dl
.
This will print the ascii value of character A
, as the first character in the variable msg
is A
itself.
Then we incremented address and printed each character’s ascii values.
Here, We used the command mov dl , [eax]
to copy the first character’s ascii. We accessed the next value after incrementing the address with inc
command. if we want to access the second character’s ascii without using the inc
command, You could use something like mov dl , [eax + 1]
. This will give us the second character’s ascii, likewise the command mov dl , [eax + 2]
will give us third character’s ascii. Further increment will give us further values in the msg
variable.
Think of another case where we want to access the data in the address 61ff0c
specifically. Then, We could use the command mov eax , [0x61ff0c]
. This will always move the value in address 61ff0c
to the eax
register. But accessing a specific address like this wont work every time as when we are working on a pre made operating system(Windows here), the operating system may deny access to that memory which might crash the process. This is beacuse in windows and most other current operating systems, some memory parts are protected. But if the address we specify is not protected, We will get the data.
But when running the os WE made, We will get access to every locations as there is nothing preventing us from doing that. The kernel of the operating system have access to all memory locations.
We could use the commenting feature in nasm to document our code, We could do it using ';'
Example:
1
%include
"io.inc"
2
3
section
.
text
4
global
CMAIN
5
CMAIN
:
6
7
PRINT_DEC
4
,
100
;
This
line
prints
the
number
100
to
the
screen
8
ret
9
10
;
The
program
ends
Output
Finally, You have completed everything to get started into developing your own os.
Now we will start learning operating system development, which you was eagerly waiting for. I think you have enjoyed this far, but the upcoming chapter are more interesting. Let’s GOOOOOO!!!
We have travelled a long road till here. And we are now at the Main and Interesting topic. Before beginning, We have to know what an operating system is.
The early computers(At the very begining of computing), didn’t have any operating system. In those days, computers are mainly used for mathematical calculation and for research purposes. As there is no operating system those days, The developers who want to solve a mathematical or other problems need to include extra code to handle memory, input/output devices etc along with the code to solve the problem they intend to solve.
As this is a repetitive job to include extra code along with the code to solve thier problem every time, They started the idea of Operating Systems
. So, they made some code which manages memory , io devices etc and put it into the computers they ran. Later, If they need to solve a particular problem, They only need to focus on the code to solve the problem they want to solve as the Operating System
they previously made will do all other tasks.
We all are lucky these days, We have plenty of operating systems to use and we have even writen programs for one. When writing and running your own operating system, It’s you, who have the supreme power on your computer. You will manage all io devices and everything will be in your control. You could even write a GAME and run it on your computer with no operating system. You will get direct access to your video memory which lets you create your own graphics, Control it with keyboard and mouse and MAINLY with all of the processor power for your program only.
Anyway, We are going to start it, Lets Go!!
Boot Sector
is the location in Hard Disk where we place code that we want to run. Most of the times, It contains code to load the operating system installed in that machine. But the code in boot sector is not the first code to get executed.
When we power on our computer, It starts copying some piece of code named Bios
from ROM
(Read Only memory) to main memory(RAM
).
Then the cpu executes it. The main function of bios is to check if all connected devices are working properly and finally Load some code from the from first 512
bytes in the hard disk(Or #1 Bootable device) to Memory and execute it.
The location of this first 512
bytes is named as Boot Sector. We could write some programs and place it into the boot sector to execute it.
We are going to run our operating system in QEMU
for now. Type the following command in cmd and hit enter:
qemu-system-i386
You will get a window like the following:
It says no bootable device found
, But that’s OK as we didn’t give it with a file to boot, It will keep saying that.
Let’s create a 512
byte file using nasm
and give it to QEMU
so that it could execute it.
First create a file named boot-sect0.asm
and include the following code:
1
times 512 db 0
Here, what the code does is create a 512
byte long file and initialize all of the bytes with 0
. You already know this command: The db
command(Or define byte
).
The db
command defines a byte
, But as we included the command times 512
in times 512 db 0
, It will define 512 bytes
, not 1 byte
. The 0
at the end is used to initializes all of the bytes with 0
.
assemble it using the following command:
nasm boot-sect0.asm -f bin -o boot.bin
In this command, boot.bin
will be the name of the 512 byte
file which the assembler will create.
Open this file with the Hex Editor
that we previously downloaded, You will see the following output:
You could see that the file is 512
bytes long and all of the bytes are initialized with 0
.
Now, we could boot this file using QEMU
. For that, Type the following command in cmd:
qemu-system-i386 -drive format=raw,file=boot.bin
Here, boot.bin
is the name of file to be booted.
Output
It is showing that no bootable device is found. There is nothing wrong here, It won’t boot everything which is fed into it. For the computer to boot our operating system, It need to know whether the file is actually an executable. This is implemented so that, The computer will not try to take any piece of data as executable until we say that it is an executable.
We need to make small changes to the code so that the computer can confirm that it is an executable.
For that, put the following code into the source file and assemble it from cmd using nasm(Use the previous command which we used to assemble the source code using nasm):
1
loop
:
2
jmp
loop
3
4
times
510
-(
$
-
$$
)
db
0
5
dw
0xaa55
Now, Boot the file with qemu, We will see the following:
Congratulations!!!!, You have successfully created something that the computer could direcly execute!!!
Lets learn how this works, Open the binary file we just booted, using the Hex Editor and look at its content.
Look at the very last two bytes here:
These bytes are called Magic Number
, when our computer tries to boot from hard disk, It first checks if the last two bytes of the 512 byte
boot sector is the this magic number or not, When it finds that the last two bytes are this magic number, It will start to execute the code from the 0’th location. In our case it executes the jmp
instruction. This is an endless jump as it jumps every time to this jmp instruction itself so this program will not terminate.
We applied this jmp
instruction here so that the computer will not execute the remaining bytes which we initialized it with 0
. There is a chance for the computer to crash as it executes the data that is not intended to be executed.
We applied the magic number with the command dw 0xaa55
. This command will attach the bytes aa
and 55
to the end of the file generated.
But when looking at the Hex Editor, we can see that the bytes are 55
and aa
, not aa
and 55
.
This is because of the endianess
property we discussed in the assembly programming section. The x86 compilers and assemblers will put multi-byte data type in reverse order. This is what that happened here.
You can also see a section in our code : times 510-($-$$) db 0
.
The job of this command is to fill the content between the bytes representing the jmp
instruction and dw
instruction with 0
.
It fills the content after the byte representation of the jmp
instruction to the 510'th
byte with 0
.
The final dw 0xaa55
command generates those bytes and places it to the very last of the file.
So finally, The size of the file becomes 512
bytes.
When creating further programs, We could replace the jmp
instruction with whatever instruction we want, But we shouldn’t take away the dw
command which places the magic number at the last two byte locations. if we do that, The program won’t be executed by the computer.
So now, Lets try printing something to the screen.
Save and assemble the following code:
1
mov ah, 0x0e
2
3
mov al , 'H'
4
int 0x10
5
6
mov al , 'E'
7
int 0x10
8
9
mov al , 'L'
10
int 0x10
11
12
mov al , 'L'
13
int 0x10
14
15
mov al , 'O'
16
int 0x10
17
18
mov al , ' '
19
int 0x10
20
21
mov al , 'W'
22
int 0x10
23
24
mov al , 'O'
25
int 0x10
26
27
mov al , 'R'
28
int 0x10
29
30
mov al , 'L'
31
int 0x10
32
33
mov al , 'D'
34
int 0x10
35
36
jmp $
37
38
times 510-($-$$) db 0
39
dw 0xaa55
Boot the assembled file and we could see this:
We have successfully printed HELLO WORLD
to the screen. Let’s study the code:
Please note that, the computer will be initially in 16 bit mode(16 Bit Real Mode) after boot-up. So, We could only use 16 bit registers such as ax , bx or lower parts of it such as ah , al , bh , dl etc..
We will later learn how to switch to 32 bit mode so that we could use 32 bit registers such as eax , ebx and also lower parts of it such as ax , bx etc..
Here, we printed these characters by calling BIOS
with int 0x10
. The bios will do neccessary thing to print the characters to the screen.
We first copied the hex value 0e
to the ah register with the command : mov ah, 0x0e
. This value is named as The Scrolling Teletype
. Then we copied the character H
to the al
register using the command mov al, ’H’
. Finally we called Bios using int 0x10
int
is a command in assembly known as Interrupt
. This command is used to pause the current execution and execute another code which is defined in a section named as Interrupt Descriptor Table
.
int 0x10
jumps to code section which have a list of display related routines. Bios selects which routine among them to be executed by looking at the values we passed to the registers.
When the Bios gets this interrupt, It first checks the value in ah
register.
Here, as we copied the value 0e
to the ah
register before the interrupt command, The bios realises that we want to implement a scrolling teletype. A scrolling teletype advances the cursor position to the next coloumn in the screen after printing a character.
The Bios knows which character to print to the screen by refering to the value in al
register. As we previously passed the character H
to al
register, The bios prints the character H
to the screen and advances the cursor to the next coloumn in screen.
After Bios prints that character, It passes the control of execution to the code we created. So practically, In our code, It will execute code which moves the character E
to the al register and continue printing the character. With this method, we have printed the string HELLO WORLD
fully to the screen.
The last command("jmp $"
) is and endless jump which jumps to that line itself.
This is so that the computer(As explained earlier), Will not try to execute data which is not intended to be executed.
This section is just for FUN!!!, Lets fill the screen with some data.
I implemented it with the following code:
1
mov ah, 0x0e
2
3
Noend:
4
5
mov al , 'H'
6
int 0x10
7
jmp Noend
8
9
times 510-($-$$) db 0
10
dw 0xaa55
Output
Bios also allows us to fill the screen with colours, Take a look at the following code:
1
mov ah , 0x0b
2
mov bh , 0x0
3
mov bl , 0xff
4
int 0x10
5
6
mov ah, 0x0e
7
8
mov al , 'H'
9
int 0x10
10
11
mov al , 'E'
12
int 0x10
13
14
mov al , 'L'
15
int 0x10
16
17
mov al , 'L'
18
int 0x10
19
20
mov al , 'O'
21
int 0x10
22
23
mov al , ' '
24
int 0x10
25
26
mov al , 'W'
27
int 0x10
28
29
mov al , 'O'
30
int 0x10
31
32
mov al , 'R'
33
int 0x10
34
35
mov al , 'L'
36
int 0x10
37
38
mov al , 'D'
39
int 0x10
40
41
jmp $
42
43
times 510-($-$$) db 0
44
dw 0xaa55
Output
Look at the first few lines of the above code:
mov ah , 0x0b
mov bh , 0x0
mov bl , 0xff
int 0x10
This few lines is the code which changed the color of screen. Try changing the value copied to bl
register and you could see different colours.
You can goto the following link to get values for different colours.
https://en.wikipedia.org/wiki/BIOS_color_attributes
This is an optional study. If you are interested, Please follow the following link to get information about other display related bios routines.
https://en.wikipedia.org/wiki/INT_10H
Till here, We wrote programs in assembly language to interact with computer. Writing programs in assembly is both risky and time consuming. Writing the os in c is a good option(But some part of it always need assembly).
Before writing and running programs in C, We will switch our processor to 32 bit mode
(Or 32 bit protected mode
).
Switching to 32 bit mode allows us protect some memory locations from other user mode programs trying to access those location. If a user mode program could access these special locations where the Kernel of the os resides, Those programs could take control of the cpu so our OS would not have any control.
We can define which part of the memory area to be protected by defining it in a table named GDT
or Global Descriptor Table
. It is necessary to Set The GDT Before switching to 32 bit protected mode. After defining and loading the GDT, We could switch the cpu to 32 bit mode.
Before beginning this, Please know that we won’t be able to use bios to do anything such as printing to screen after we define the GDT, Load it and finally Make the switch to Protected mode. We will make our own display related functions Later in this book.
Bios also gives other necessary functions such as for accessing i/o devices , Hard disks etc…. These too also will be unavailable after the switch.
Lets see where our code and other resources will be present in memory:
From this image, You can see that the location where our Boot sector is present, is at 0x7c00
.
This means that Bios will load our Boot sector to 0x7c00
in memory.
Before switching the processor to 32 bit mode, We need to define our Global Descriptor Table(GDT).
GDT is a special data Structure which the processor directly validates.
It is little complex to understand for the first time. But this this can’t be avoided when trying to switch to protected mode. Let’s First define GDT and switch to 32 bit mode:
The Following code will define the gdt and Make the switch to 32 bit mode:
(Alternatively, You can go to https://github.com/TINU-2000/OS-DEV/blob/main/Beginning-OS-DEV/GDT.asm
to get a fine copy of this code.)
1
[org 0x7c00]
2
3
[bits 16]
4
5
;Switch To Protected Mode
6
cli ; Turns Interrupts off
7
lgdt [GDT_DESC] ; Loads Our GDT
8
9
mov eax , cr0
10
or eax , 0x1
11
mov cr0 , eax ; Switch To Protected Mode
12
13
jmp CODE_SEG:INIT_PM ; Jumps To Our 32 bit Code
14
;Forces the cpu to flush out contents in cache memory
15
16
[bits 32]
17
18
INIT_PM:
19
mov ax , DATA_SEG
20
mov ds , ax
21
mov ss , ax
22
mov es , ax
23
mov fs , ax
24
mov gs , ax
25
26
mov ebp , 0x90000
27
mov esp , ebp ; Updates Stack Segment
28
29
30
31
32
33
jmp $ ;Hang , Remove This To Start Doing Things In 32 bit Mode
34
35
36
37
38
39
GDT_BEGIN:
40
41
GDT_NULL_DESC: ;The Mandatory Null Descriptor
42
dd 0x0
43
dd 0x0
44
45
GDT_CODE_SEG:
46
dw 0xffff ;Limit
47
dw 0x0 ;Base
48
db 0x0 ;Base
49
db 10011010b ;Flags
50
db 11001111b ;Flags
51
db 0x0 ;Base
52
53
GDT_DATA_SEG:
54
dw 0xffff ;Limit
55
dw 0x0 ;Base
56
db 0x0 ;Base
57
db 10010010b ;Flags
58
db 11001111b ;Flags
59
db 0x0 ;Base
60
61
GDT_END:
62
63
GDT_DESC:
64
dw GDT_END - GDT_BEGIN - 1
65
dd GDT_BEGIN
66
67
CODE_SEG equ GDT_CODE_SEG - GDT_BEGIN
68
DATA_SEG equ GDT_DATA_SEG - GDT_BEGIN
69
70
times 510-($-$$) db 0
71
dw 0xaa55
You can try assembling and running this, But it won’t show any special output as what we have done is only a switch to 32 bit mode.
Let’s now understand what this code does:
The very first line [org 0x7c00]
Says to the assembler to make adjustments to the program so that it could calculate 0x7c00
as the address which it will be loaded to.
The second line [bits 16]
says to produce 16 bit opcodes for instructions given below it.
The line cli
turns the interrupt services off. We previously made some interrupts to bios with the int
command. from now onwards, we will not be able to use that feature. Please note, not to avoid this instruction when switching to 32 bit mode.
Next, We loaded the GDT we defined at the bottom of the code with: lgdt [GDT_DESC]
Now, focus on the following line in the code we made:
GDT_BEGIN:
GDT_NULL_DESC: ;The Mandatory Null Descriptor
dd 0x0
dd 0x0
Both GDT_BEGIN
and GDT_NULL_DESC
gives us the address to that location which we could later refer.
The next two lines starting with dd
defines two null pointer(0x0). This is a mandatory null descriptor when defining the gdt. These locations will be used by the processor when working on things with the gdt.
THE GDT IS A COMPLEX CONCEPT TO UNDERSTAND FOR THE FIRST TIME. AS THE MOTIVE OF THIS BOOK IS ONLY TO INTRODUCE OS DEVELOPMENT, WE WONT DELVE MORE INTO THIS TOPIC. THIS TOPIC MAY POSSIBLY DEMOTIVATE YOU
.
THIS TOPIC IS INCLUDED ONLY BECAUSE IT CAN'T BE AVOIDED WHEN SWITCHING TO PROTECTED MODE.
YOU DON'T NEED TO WORRY ABOUT THE GDT WHEN YOU ARE STARTING THE JOURNEY. YOU CAN JUST COPY PASTE THIS CODE, WHAT WE MAINLY FOCUS IS AT THE PRACTICAL SIDE OF OS DEVELOPMENT SUCH AS ACCESSING HARD DISK, KEYBOARD, MOUSE, CREATING GRAPHCAL USER INTERFACE ETC AND MORE. WE WILL GO INTO THAT IN LATER CHAPTERS
.
WHEN YOU THINK YOU WANT TO MASTER GDT, THE FOLLOWING ARTICLES WILL HELP:
https://wiki.osdev.org/Global_Descriptor_Table
https://wiki.osdev.org/GDT_Tutorial
Look at the following lines:
GDT_CODE_SEG:
dw 0xffff ;Limit
dw 0x0 ;Base
db 0x0 ;Base
db 10011010b ;Flags
db 11001111b ;Flags
db 0x0 ;Base
These lines tells to the cpu about the code segment. We defined the base
, limit
and some flags. This values could be referenced using the name GDT_CODE_SEG
.
Now lets see how we defined the data segment:
GDT_DATA_SEG:
dw 0xffff ;Limit
dw 0x0 ;Base
db 0x0 ;Base
db 10010010b ;Flags
db 11001111b ;Flags
db 0x0 ;Base
We could refer to this content in memory with the name GDT_DATA_SEG
.
The line GDT_END:
helps us refer to that name later to get the ending location of this GDT Entry.
The next line is the "Informer"
of the GDT Entry we defined:
GDT_DESC:
dw GDT_END - GDT_BEGIN - 1
dd GDT_BEGIN
The line dw GDT_END - GDT_BEGIN - 1
stores the size of our GDT Entry. We have reduced 1
from GDT_END - GDT_BEGIN
, This is because the size of GDT(According to the x86 cpu), is always one less than the real size.
The next line dd GDT_BEGIN
Stores the beginning address of our gdt entry.
We passed the address of GDT_DESC
to the lgdt
command at top so that it could access the size
and beginning location
of our GDT.
The next two lines:
CODE_SEG equ GDT_CODE_SEG - GDT_BEGIN
DATA_SEG equ GDT_DATA_SEG - GDT_BEGIN
defines the CODE SEGMENT
and DATA SEGMENT
locations so that we could later refer to them. In the first line, the equ
command assigns the Reslut of GDT_CODE_SEG - GDT_BEGIN
(Substraction operation) to CODE_SEG
. The same applies to the second line also.
Now, go to the top of the code at:
mov eax , cr0
or eax , 0x1
mov cr0 , eax
This code sets the cr0
bit. This will switch the processor to 32 bit protected mode
.
We finally need to implement a far jump
to fully utilize 32 bit mode. We implemented it with jmp CODE_SEG:INIT_PM
. This will clear the contents in cache memory and jump to The section named INIT_PM
. Here is the INIT_PM
section:
[bits 32]
INIT_PM:
mov ax , DATA_SEG
mov ds , ax
mov ss , ax
mov es , ax
mov fs , ax
mov gs , ax
mov ebp , 0x90000
mov esp , ebp
We started it with the line [bits 32]
. This tells to the assembler to generate 32 bit opcodes
for the instructions following it. This is necessary as we previously switched to 32 bit mode by setting the cr0
bit.
later we used the mov
command to move the address of data segment to all of the Segment Registers
.
Finally we set the esp
and ebp
registers. This also is very necessary as the C Programs we later make will surely push values to the stack.
The final line jmp $
is used to stop processor from executing further code. This is technically a jump to that jmp
instruction itself.
IT'S FINE IF YOU ARE NOT VERY CLEAR ABOUT THIS TOPIC. YOU WILL BE ABLE TO UNDERSTAND THESE LATER.
Now, as we switched to 32 bit mode, We could start writing our code in C, But first we need a way to call it from assembly. initially, Only 512 bytes are loaded to memory by bios. When we make a C program and compile it with the boot sector, It’s size will always be greater that 512 bytes.
So we need to add code to the boot sector program which loads the remaining part of the code to memory. A BOOT LOADER
is a piece of code in Boot Sector which loads the remaining part of the os to memory.
Let’s make a Boot loader:
You could download the full code for this section and the next section(Calling Our C Kernel) from here :
https://github.com/TINU-2000/OS-DEV/tree/main/Beginning-OS-DEV/Entering%20c
ALSO NOTE TO INCLUDE ALL OF THE SOURCE FILES IN SAME FOLDER
Boot Loader is only a small program. So, we will be able to include the boot loader program with the GDT defining program we just made before.
Lets add some code to it:
Alternatively , you could go to the following link to get the specific code we are going to discuss in this section :
https://github.com/TINU-2000/OS-DEV/blob/main/Beginning-OS-DEV/Entering%20c/Boot.asm
1
[org 0x7c00]
2
[bits 16]
3
4
;Boot Loader
5
mov bx , 0x1000 ; Memory offset to which kernel will be loaded
6
mov ah , 0x02 ; Bios Read Sector Function
7
mov al , 30 ; No. of sectors to read(If your kernel won't fit into 30 sectors , \
8
you may need to provide the correct no. of sectors to read)
9
mov ch , 0x00 ; Select Cylinder 0 from harddisk
10
mov dh , 0x00 ; Select head 0 from hard disk
11
mov cl , 0x02 ; Start Reading from Second sector(Sector just after boot sector)
12
13
int 0x13 ; Bios Interrupt Relating to Disk functions
14
15
16
;Switch To Protected Mode
17
cli ; Turns Interrupts off
18
lgdt [GDT_DESC] ; Loads Our GDT
19
20
mov eax , cr0
21
or eax , 0x1
22
mov cr0 , eax ; Switch To Protected Mode
23
24
jmp CODE_SEG:INIT_PM ; Jumps To Our 32 bit Code
25
;Forces the cpu to flush out contents in cache memory
26
27
[bits 32]
28
29
INIT_PM:
30
mov ax , DATA_SEG
31
mov ds , ax
32
mov ss , ax
33
mov es , ax
34
mov fs , ax
35
mov gs , ax
36
37
mov ebp , 0x90000
38
mov esp , ebp ; Updates Stack Segment
39
40
41
call 0x1000
42
jmp $
43
44
45
46
47
48
GDT_BEGIN:
49
50
GDT_NULL_DESC: ;The Mandatory Null Descriptor
51
dd 0x0
52
dd 0x0
53
54
GDT_CODE_SEG:
55
dw 0xffff ;Limit
56
dw 0x0 ;Base
57
db 0x0 ;Base
58
db 10011010b ;Flags
59
db 11001111b ;Flags
60
db 0x0 ;Base
61
62
GDT_DATA_SEG:
63
dw 0xffff ;Limit
64
dw 0x0 ;Base
65
db 0x0 ;Base
66
db 10010010b ;Flags
67
db 11001111b ;Flags
68
db 0x0 ;Base
69
70
GDT_END:
71
72
GDT_DESC:
73
dw GDT_END - GDT_BEGIN - 1
74
dd GDT_BEGIN
75
76
CODE_SEG equ GDT_CODE_SEG - GDT_BEGIN
77
DATA_SEG equ GDT_DATA_SEG - GDT_BEGIN
78
79
80
times 510-($-$$) db 0
81
dw 0xaa55
Here, We added the bootloader at the very begining of the code:
mov bx , 0x1000
; Memory offset to which kernel will be loadedmov ah , 0x02
; Bios Read Sector Functionmov al , 30
; No. of sectors to read(If your kernel won’t fit into 30 sectors , you may need to provide the correct no. of sectors to read)mov ch , 0x00
; Select Cylinder 0 from harddiskmov dh , 0x00
; Select head 0 from hard diskmov cl , 0x02
; Start Reading from Second sector(Sector just after boot sector)
int 0x13
; Bios Interrupt Relating to Disk functions
This code will load 30 sectors(512 * 30 bytes) to the memory location 0x1000
which we copied to the bx
register. This 30 sectors are excluding the boot sector(which bios loaded).
We requested for this service to the bios with the int 0x13
interrupt.
Then, after the code where we switched to 32 bit mode, We called the code at location 0x1000
with the command call 0x1000
.
This will result in the execution of code we just loaded to memory using our bootloader. We will see how to compile a C program and place it into the location 0x1000
in order to execute it.
YOU SHOULD SAVE THIS CODE TO A FILE, LETS NAME IT: Boot.asm
In order to execute the c program we make, We must know where our entry c function is in memory. This may be hard to find as in the future, if me add more code to our c program, the offset where the entry function will be present might change.
We could solve this problem by using the linker
to calculate automatically where the entry function will be.
NOTE: WE WON'T DISCUSS ANYTHING ABOUT LINKER SCRIPTS, WE WILL WORK ONLY ON THE COMMAND LINE
Open a new file and add the following code:
Alternate link :
https://github.com/TINU-2000/OS-DEV/blob/main/Beginning-OS-DEV/Entering%20c/Kernel_Entry.asm
1
START
:
2
[
bits
32
]
3
[
extern
_start
]
4
call
_start
5
jmp
$
Save this file and name it Kernel_Entry.asm
Here, we declared an extern function named _start
with the command [extern _start]
When assembling this code, the assembler will leave a hint to the linker that there will be a function named start
, in an outside source file which we will call with the code call _start
.
There is an interesting matter here. We declared the function as _start
and expect the linker to link with start
.
This is very true in this case. for eg: If you want to call a function named enter
, you should declare it as _enter
.
As the name of our entry c function is start
, we should call it as _start
.
Now, we will make our c program, Open a new file and add the following code:
Alternate link :
https://github.com/TINU-2000/OS-DEV/blob/main/Beginning-OS-DEV/Entering%20c/main.c
1
int start(){
2
char* video_memory = (char*) 0xb8000;
3
*video_memory = 'K';
4
}
Name this file as main.c
Now, We have a lot of job to assemble , compile and link all of these files.
Typing lot of commands for every build is boring and time consuming. So we will include all of the commands to assemble , compile and link together in a .bat
file.
After including all of the commands and saving it, We could double click on it and the build process will be done automatically.
For this, please create a folder named bin
in the directory where our codes exist.
Now, include the following code in a new file named compile.bat
(Don’t save it in the bin
folder, save it where our other programs are present):
Alternate link :
https://github.com/TINU-2000/OS-DEV/blob/main/Beginning-OS-DEV/Entering%20c/compile.bat
1
nasm Boot.asm -f bin -o bin\bootsect.bin
2
nasm Kernel_Entry.asm -f elf -o bin\Entry.bin
3
4
gcc -m32 -ffreestanding -c main.c -o bin\kernel.o
5
6
ld -T NUL -o bin\Kernel.img -Ttext 0x1000 bin\Entry.bin bin\kernel.o
7
8
objcopy -O binary -j .text bin\kernel.img bin\kernel.bin
9
copy /b /Y bin\bootsect.bin+bin\kernel.bin bin\os-image
10
11
qemu-system-i386 -drive format=raw,file=bin\os-image
Executing this file will Do the Build process and launch the final executable.
You could reuse this program every time. we will now be able to include more code to the c file, save it and finally click on the .bat
file to automatically build and execute it.
All of the binary files will be generated to the bin
folder
At the very last of our .bat
file, we called qemu with: os-image
as file to be booted.
Here, os-image
is the final executable that we could distibute to others for use.
Run the .bat
file and we will get the following output in qemu
The K
at the top left of the screen ensures us that, the c program we made is working.
You can try changing the c program to print other characters.
The Fun Begins
from the next chapter!!!!
NOTE: IT'S A GOOD IDEA TO CHECK IF THE .bat FILE WHEN LAUNCHED PRODUCES ANY ERROR MESSAGES TO THE TERMINAL. NO ERROR WILL BE PRESENT WHEN COMPILING THE ABOVE PROGRAM. BUT MAKE SURE TO CHECK WHEN SOME MODIFICATIONS ARE MADE.
Displaying something to screen is the crucial part when developing an operating system. As 90% of a computers output uses the Display system. Even during development, We may need to print the values in variables to debug any issue arising.
In this section, We are going to discuss about Text Mode User Interface
. But this mode also allows printing colours to the screen.
Normally after the boot up, The computers video card will be in Text Mode
. We could Print something to the screen by Poking memory location starting from 0xb8000
.
We could write directly to this address in memory to print characters and colours.
The Text Mode interface allows 80
characters wide and 25
characters lines per screen.
This means that the computer will be in 80 x 25
mode.
Let’s learn how this works!!
NOTE: IT'S A GOOD IDEA TO USE A CROSS COMPILER WHEN DEVELOPING YOUR OS, AS OUR C COMPILER TOOLSET MINGW LACKS SUPPORT FOR RAW BINARY GENERATION HELPFUL IN WRITING OS. WINDOWS LACKS A GOOD NATIVE CROSS COMPILER. SO IT WILL BE PRACTICAL TO USE LINUX WHEN DEVELOPING THIS. LINUX GIVES ACCESS TO GOOD CROSS COMPILERS. YOU COULD INSTALL ALL OF THE MAIN TOOLS DESCRIBED IN THIS BOOK ON LINUX AND CONTINUE DEVELOPMENT.
SOME PROBLEMS WILL OCCUR WHEN USING GOLOBAL VARIBLES ETC... BUT IN THIS BOOK, WE WILL ONLY USE METHODS THAT YOU COULD FOLLOW IN WINDOWS. BUT IN FUTURE, WHEN TRYING TO DO SERIOUS PROJECTS, CONSIDER USING CROSS COMPILERS ON LINUX SPECIFICALLY i386 cross compilers OR YOU COULD EVEN TRY "WINDOWS SUBSYSTEM FOR LINUX"
We saw that the computer will be initially in 80 x 25
mode. The video memory in text mode starts from 0xb8000
. Consider a situation where you want to print the character J to the Top Left of the screen, What you should do to implement this is to write the Ascii value of J
To the location 0xb8000
.
In c, It will look like this:
char* aa = (char*) 0xb8000;
*aa = 'J';
Here we declared a character pointer named aa
and assigned it with the address 0xb8000
. Then in the next line, We took the value in aa
as an address and assigned it with the Ascii value of J
. This will print the character J to the Top Left of screen. Now, if you want to add background and foreground colour to that character, You could poke the next memory location(0xb8001
) to give that character a colour.
For representing the colour, The first four bits in the byte represents background colour and next four bits represent foreground colour.
To give the first character a colour, You can use the following c program:
char* aa = (char*) 0xb8001;
*aa = 0xb5;
Here, we copied the hex value b5
. b
will be the background colour and 5
will be the foreground colour.
This means that the video card uses two bytes for representing a single character.
Suppose If you only want to Print the string AS
to the screen with no colour. You could copy the character A
to 0xb8000
and S
to 0xb8002
.
So the total size of Text Mode Video Memory is 2 * 80 * 25 bytes
.
Lets see how to fill the screen with a special colour:
1
int start(){
2
char* cell = (char*) 0xb8000;
3
int i = 1;
4
while(i < (2 * 80 * 25)){
5
*(cell + i) = 0xd5;
6
i += 2;
7
}
8
}
Save this program in the C Source file we created in the previous chapter. It will give the following output when executed:
Let’s now see an example where the first row of the screen fills with a character:
1
int start(){
2
char* cell = (char*) 0xb8000;
3
int i = 0;
4
while(i < (2 * 80)){
5
*(cell + i) = 'J';
6
*(cell + i + 1) = 0x15;
7
i += 2;
8
}
9
}
Output
You could even make a GAME
or SMALL SCALE GUI
with this mode. In later chapters, we will learns how to switch to Graphics mode to display content to screen pixel by pixel.
You could skip this section if you want, This is just for fun!!!!
Program:
1
int start(){
2
char* TM_START = (char*) 0xb8000;
3
int i = 1;
4
char co = 0;
5
while(i < 2 * 80 * 25){
6
*(TM_START + i) = co;
7
i += 2;
8
co++;
9
}
10
}
Output:
Program:
1
int start(){
2
char* TM_START = (char*) 0xb8000;
3
int i = 1;
4
char obj = 0;
5
while(i < 2 * 80 * 25){
6
*(TM_START + i) = obj;
7
i += 2;
8
obj += 15;
9
}
10
}
Output:
Program:
1
int start(){
2
char* TM_START = (char*) 0xb8000;
3
int i = 1;
4
char obj = 0;
5
while(i < 2 * 80 * 25){
6
*(TM_START + i) = obj;
7
i += 2;
8
obj += 100;
9
}
10
}
Output:
Program:
1
int start(){
2
char* TM_START = (char*) 0xb8000;
3
int i = 0;
4
char obj = 0;
5
while(i < 2 * 80 * 25){
6
*(TM_START + i) = obj;
7
i++;
8
obj++;
9
}
10
}
Output:
In future, it’s hard for us to print strings to the screen by looking at every details of Text Mode Video Graphics. So, we are going to implement our own "printf"
function that we could later call to do the print operation.
Link to project files for this section :
https://github.com/TINU-2000/OS-DEV/tree/main/Video%20Graphics0
1
void cls();
2
void setMonitorColor(char);
3
4
void printString(char*);
5
void printChar(char);
6
7
void scroll();
8
9
void printColorString(char* , char);
10
void printColorChar(char , char);
11
12
void getDecAscii(int);
13
14
char* TM_START;
15
char NumberAscii[10];
16
int CELL;
17
18
int start(){
19
TM_START = (char*) 0xb8000;
20
CELL = 0;
21
22
cls();
23
setMonitorColor(0xa5);
24
25
char Welcome[] = "Welcome To OS0 : Copyright 2021\n";
26
char Welcome2[] = "Command Line Version 1.0.0.0\n\n";
27
char OSM[] = "OS0 > ";
28
29
printString(Welcome);
30
printString(Welcome2);
31
printColorString(OSM , 0xa8);
32
}
33
34
35
void cls(){
36
int i = 0;
37
CELL = 0;
38
while(i < (2 * 80 * 25)){
39
*(TM_START + i) = ' '; // Clear screen
40
i += 2;
41
}
42
}
43
44
void setMonitorColor(char Color){
45
int i = 1;
46
while(i < (2 * 80 * 25)){
47
*(TM_START + i) = Color;
48
i += 2;
49
}
50
}
51
52
void printString(char* cA){
53
int i = 0;
54
while(*(cA + i) != '\0'){
55
printChar(*(cA + i));
56
i++;
57
}
58
}
59
60
void printChar(char c){
61
if(CELL == 2 * 80 * 25)
62
scroll();
63
if(c == '\n'){
64
CELL = ((CELL + 160) - (CELL % 160));
65
return;
66
}
67
*(TM_START + CELL) = c;
68
CELL += 2;
69
}
70
71
void scroll(){
72
int i = 160 , y = 0;
73
while(i < 2 * 80 * 25){
74
*(TM_START + y) = *(TM_START + i);
75
i += 2;
76
y += 2;
77
}
78
CELL = 2 * 80 * 24;
79
i = 0;
80
while(i < 160){
81
*(TM_START + CELL + i) = ' ';
82
i += 2;
83
}
84
}
85
86
void printColorString(char* c , char co){
87
int i = 0;
88
while(*(c + i) != '\0'){
89
printColorChar(*(c + i) , co);
90
i++;
91
}
92
}
93
94
void printColorChar(char c , char co){
95
if(CELL == 2 * 80 * 25)
96
scroll();
97
if(c == '\n'){
98
CELL = ((CELL + 160) - (CELL % 160));
99
return;
100
}
101
*(TM_START + CELL) = c;
102
*(TM_START + CELL + 1) = co;
103
CELL += 2;
104
}
105
106
void getDecAscii(int num){
107
if(num == 0){
108
NumberAscii[0] = '0';
109
return;
110
}
111
char NUM[10];
112
int i = 0 , j = 0;
113
while(num > 0){
114
NUM[i] = num % 10;
115
num /= 10;
116
i++;
117
}
118
i--;
119
while(i >= 0){
120
NumberAscii[j] = NUM[i];
121
i--;
122
j++;
123
}
124
NumberAscii[j] = 'J';
125
j = 0;
126
while(NumberAscii[j] != 'J'){
127
NumberAscii[j] = '0' + NumberAscii[j];
128
j++;
129
}
130
NumberAscii[j] = 0;
131
}
Here, you could call the printString
function by passing it with an address of null terminated character array. This will print the string to the screen.
printColorString
can be called with the address of null terminated character array as first argument and a byte representing background and foreground colour as second argument.
printChar
can be called with the character to be printed as the first argument.
printColorChar
can be called with the first argument as the character to be printed and a byte representing background and foreground color as second argument.
These functions will also take care of the scrolling activity by calling the scroll()
function.
We do not need to call the scroll()
function from our side. These things will be handled by the printing functions.
if you need to turn an integer variable to a string which could later be used to print using the printString
function, You could use the getDecAscii
function. Pass it with the integer to be turned to string and later refer to the array NumberAscii
to print it. Call the printString
function with NumberAscii
as argument to print the number generated by getDecAscii
function.
Try printing some coloured and non coloured string!!
Output of the Program we developed for this section:
In this section, We are not going to discuss or implement modern type video players, But we will cover the basic concept of video playing.
Videos
are actually moving pictures. When we see multiple pictures during a small amount of time, Our brain takes it as moving or gives us a sense of The so called "Video"
.
What the computer do to present us with a video is by poking the video memory.
Changing the data contained in the video memory continuously gives us this feel.
But the processor is very faster than our video card and the monitor. So it won’t be able to display every frame the processor generate very fastly. So the solution is to minimize the frame updating procedure depending on the refresh rate of the monitor.
But the computer must also be fast enough to play about 60
frames during a second.
Human eyes see upto 60
frames per second. So theres no point in generating video with frames more that 60
per second.
Let’s make a program which gives us with a VIDEO
feel.
1
int start(){
2
char* TM_START = (char*) 0xb8000;
3
int i;
4
char obj = 0;
5
while(1){
6
i = 0;
7
while(i < (2 * 80 * 25)){
8
*(TM_START + i) = obj;
9
i++;
10
obj++;
11
}
12
}
13
}
One Frame Of Output During Video Playing:
The program is in an infinite loop with while(1)
.
Keyboard is the primary input device for a computer. Working with keyboard is a non avoidable factor in os development.
How does the computer know when a key is being pressed?
This is an interesting question. We could try to implement it by asking keyboard everytime if there is any input or not. But this is not a good idea.
Keyboard is very slower than the processor. Asking it everytime will affect the processing speed badly. And it also will put lot of traffic in the system bus.
We need to avoid that. So the solution implemented by the x86 chip manufacturer uses interrupts.
When ever a key is being pressed, The cpu will call a fuction that we pre define to it. The processor won’t do the keyboard logic, but it will let us execute some code whenever a key is being pressed.
We need to make some code to handle the keyboard input, and pass its address to the interrupt descriptor table and say to the processor to load it.
After loading all the parameters and the location of our keyboard handling code, It will call that code whenever a key is being pressed.
When we get this interrupt, We could try reading from the keyboard which gives us the key being pressed.
What keyboard gives as the value for pressed key is not ascii. It is named as scan codes
. We need to write our own program to convert it to ascii or other text encoding schemes.
As said earlier, we need to know the values given by keyboard(scan codes) in order to convert it to ascii.
The following link lists the scan codes:
https://wiki.osdev.org/PS/2_Keyboard
You can also take this link as an alternate way to know more about keyboard working
Modern keyboards use usb
interface to communicate to the computer. This section covers working of PS/2
keyboards. But usb keyboards will also work here as usb keyboard emulates the older PS/2
keyboards.
PIC
or Programmable Interrupt Controller
is a chip in the computer whose main job is to generate interrupts. When a key in keyboard is pressed, The Chip inside keyboard tells to the pic chip inside our computer to generate a #1 Interrupt. The pic chip will then decide the time to notify the cpu about the interrupt. When the cpu gets the message which says a key is being pressed, It executes a set of code which we told earlier to the cpu to execute when a key is pressed.
We can make use of the in
assembly instruction to read the key being pressed.
It is very necessary to do this read as a failure in reading from keyboard will prevent the keyboard from giving further keyboard press events.
The full source code for this section could be downloaded from:
https://github.com/TINU-2000/OS-DEV/tree/main/Keyboard
Have a quick look at the code here:
main.c
1
#define PIC1_C 0x20
2
#define PIC1_D 0x21
3
#define PIC2_C 0xa0
4
#define PIC2_D 0xa1
5
6
#define ICW1_DEF 0x10
7
#define ICW1_ICW4 0x01
8
#define ICW4_x86 0x01
9
10
void cls();
11
void setMonitorColor(char);
12
13
void printString(char*);
14
void printChar(char);
15
16
void scroll();
17
18
void printColorString(char* , char);
19
void printColorChar(char , char);
20
21
void getDecAscii(int);
22
23
void initIDT();
24
extern void loadIdt();
25
extern void isr1_Handler();
26
void handleKeypress(int);
27
void pressed(char);
28
void picRemap();
29
30
unsigned char inportb(unsigned short);
31
void outportb(unsigned short , unsigned char);
32
33
char* TM_START;
34
char NumberAscii[10];
35
int CELL;
36
37
struct IDT_ENTRY{
38
unsigned short base_Lower;
39
unsigned short selector;
40
unsigned char zero;
41
unsigned char flags;
42
unsigned short base_Higher;
43
};
44
45
struct IDT_ENTRY idt[256];
46
extern unsigned int isr1;
47
unsigned int base;
48
49
int start(){
50
TM_START = (char*) 0xb8000;
51
CELL = 0;
52
base = (unsigned int)&isr1;
53
54
cls();
55
setMonitorColor(0xa5);
56
57
char Welcome[] = "Welcome To OS0 : Copyright 2021\n";
58
char Welcome2[] = "Command Line Version 1.0.0.0\n\n";
59
char OSM[] = "OS0 > ";
60
61
printString(Welcome);
62
printString(Welcome2);
63
printColorString(OSM , 0xa8);
64
65
initIDT();
66
}
67
68
69
void cls(){
70
int i = 0;
71
CELL = 0;
72
while(i < (2 * 80 * 25)){
73
*(TM_START + i) = ' '; // Clear screen
74
i += 2;
75
}
76
}
77
78
void setMonitorColor(char Color){
79
int i = 1;
80
while(i < (2 * 80 * 25)){
81
*(TM_START + i) = Color;
82
i += 2;
83
}
84
}
85
86
void printString(char* cA){
87
int i = 0;
88
while(*(cA + i) != '\0'){
89
printChar(*(cA + i));
90
i++;
91
}
92
}
93
94
void printChar(char c){
95
if(CELL == 2 * 80 * 25)
96
scroll();
97
if(c == '\n'){
98
CELL = ((CELL + 160) - (CELL % 160));
99
return;
100
}
101
*(TM_START + CELL) = c;
102
CELL += 2;
103
}
104
105
void scroll(){
106
int i = 160 , y = 0;
107
while(i < 2 * 80 * 25){
108
*(TM_START + y) = *(TM_START + i);
109
i += 2;
110
y += 2;
111
}
112
CELL = 2 * 80 * 24;
113
i = 0;
114
while(i < 160){
115
*(TM_START + CELL + i) = ' ';
116
i += 2;
117
}
118
}
119
120
void printColorString(char* c , char co){
121
int i = 0;
122
while(*(c + i) != '\0'){
123
printColorChar(*(c + i) , co);
124
i++;
125
}
126
}
127
128
void printColorChar(char c , char co){
129
if(CELL == 2 * 80 * 25)
130
scroll();
131
if(c == '\n'){
132
CELL = ((CELL + 160) - (CELL % 160));
133
return;
134
}
135
*(TM_START + CELL) = c;
136
*(TM_START + CELL + 1) = co;
137
CELL += 2;
138
}
139
140
void getDecAscii(int num){
141
if(num == 0){
142
NumberAscii[0] = '0';
143
return;
144
}
145
char NUM[10];
146
int i = 0 , j = 0;
147
while(num > 0){
148
NUM[i] = num % 10;
149
num /= 10;
150
i++;
151
}
152
i--;
153
while(i >= 0){
154
NumberAscii[j] = NUM[i];
155
i--;
156
j++;
157
}
158
NumberAscii[j] = 'J';
159
j = 0;
160
while(NumberAscii[j] != 'J'){
161
NumberAscii[j] = '0' + NumberAscii[j];
162
j++;
163
}
164
NumberAscii[j] = 0;
165
}
166
167
void initIDT(){
168
idt[1].base_Lower = (base & 0xFFFF);
169
idt[1].base_Higher = (base >> 16) & 0xFFFF;
170
idt[1].selector = 0x08;
171
idt[1].zero = 0;
172
idt[1].flags = 0x8e;
173
174
picRemap();
175
176
outportb(0x21 , 0xfd);
177
outportb(0xa1 , 0xff);
178
179
loadIdt();
180
}
181
182
unsigned char inportb(unsigned short _port){
183
unsigned char rv;
184
__asm__ __volatile__ ("inb %1, %0" : "=a" (rv) : "dN" (_port));
185
return rv;
186
}
187
188
void outportb(unsigned short _port, unsigned char _data){
189
__asm__ __volatile__ ("outb %1, %0" : : "dN" (_port), "a" (_data));
190
}
191
192
extern void isr1_Handler(){
193
handleKeypress(inportb(0x60));
194
outportb(0x20 , 0x20);
195
outportb(0xa0 , 0x20);
196
}
197
198
void handleKeypress(int code){
199
char Scancode[] = {
200
0 , 0 , '1' , '2' ,
201
'3' , '4' , '5' , '6' ,
202
'7' , '8' , '9' , '0' ,
203
'-' , '=' , 0 , 0 , 'Q' ,
204
'W' , 'E' , 'R' , 'T' , 'Y' ,
205
'U' , 'I' , 'O' , 'P' , '[' , ']' ,
206
0 , 0 , 'A' , 'S' , 'D' , 'F' , 'G' ,
207
'H' , 'J' , 'K' , 'L' , ';' , '\'' , '`' ,
208
0 , '\\' , 'Z' , 'X' , 'C' , 'V' , 'B' , 'N' , 'M' ,
209
',' , '.' , '/' , 0 , '*' , 0 , ' '
210
};
211
212
if(code == 0x1c)
213
printChar('\n');
214
else if(code < 0x3a)
215
pressed(Scancode[code]);
216
}
217
218
void pressed(char key){
219
printChar(key);
220
}
221
222
void picRemap(){
223
unsigned char a , b;
224
a = inportb(PIC1_D);
225
b = inportb(PIC2_D);
226
227
outportb(PIC1_C , ICW1_DEF | ICW1_ICW4);
228
outportb(PIC2_C , ICW1_DEF | ICW1_ICW4);
229
230
outportb(PIC1_D , 0);
231
outportb(PIC2_D , 8);
232
233
outportb(PIC1_D , 4);
234
outportb(PIC2_D , 2);
235
236
outportb(PIC1_D , ICW4_x86);
237
outportb(PIC2_D , ICW4_x86);
238
239
outportb(PIC1_D , a);
240
outportb(PIC2_D , b);
241
}
Kernel_Entry.asm
1
START
:
2
[
bits
32
]
3
[
extern
_start
]
4
call
_start
5
jmp
$
IDT.asm
1
extern _idt
2
extern _isr1_Handler
3
global _isr1
4
global _loadIdt
5
6
idtDesc:
7
dw 2048
8
dd _idt
9
10
_isr1:
11
pushad
12
call _isr1_Handler
13
popad
14
iretd
15
16
_loadIdt:
17
lidt[idtDesc]
18
sti
19
ret
compile.bat
1
nasm Boot.asm -f bin -o bin\bootsect.bin
2
nasm Kernel_Entry.asm -f elf -o bin\Entry.bin
3
nasm IDT.asm -f elf -o bin\IDT.bin
4
5
gcc -m32 -ffreestanding -c main.c -o bin\kernel.o
6
7
ld -T NUL -o bin\Kernel.img -Ttext 0x1000 bin\Entry.bin bin\kernel.o bin\IDT.bin
8
9
objcopy -O binary -j .text bin\kernel.img bin\kernel.bin
10
copy /b /Y bin\bootsect.bin+bin\kernel.bin bin\os-image
11
12
qemu-system-i386 -drive format=raw,file=bin\os-image
Boot.asm
1
[org 0x7c00]
2
[bits 16]
3
4
;Boot Loader
5
mov bx , 0x1000 ; Memory offset to which kernel will be loaded
6
mov ah , 0x02 ; Bios Read Sector Function
7
mov al , 30 ; No. of sectors to read(If your kernel won't fit into 30 sectors , \
8
you may need to provide the correct no. of sectors to read)
9
mov ch , 0x00 ; Select Cylinder 0 from harddisk
10
mov dh , 0x00 ; Select head 0 from hard disk
11
mov cl , 0x02 ; Start Reading from Second sector(Sector just after boot sector)
12
13
int 0x13 ; Bios Interrupt Relating to Disk functions
14
15
16
;Switch To Protected Mode
17
cli ; Turns Interrupts off
18
lgdt [GDT_DESC] ; Loads Our GDT
19
20
mov eax , cr0
21
or eax , 0x1
22
mov cr0 , eax ; Switch To Protected Mode
23
24
jmp CODE_SEG:INIT_PM ; Jumps To Our 32 bit Code
25
;Forces the cpu to flush out contents in cache memory
26
27
[bits 32]
28
29
INIT_PM:
30
mov ax , DATA_SEG
31
mov ds , ax
32
mov ss , ax
33
mov es , ax
34
mov fs , ax
35
mov gs , ax
36
37
mov ebp , 0x90000
38
mov esp , ebp ; Updates Stack Segment
39
40
41
call 0x1000
42
jmp $
43
44
45
46
47
48
GDT_BEGIN:
49
50
GDT_NULL_DESC: ;The Mandatory Null Descriptor
51
dd 0x0
52
dd 0x0
53
54
GDT_CODE_SEG:
55
dw 0xffff ;Limit
56
dw 0x0 ;Base
57
db 0x0 ;Base
58
db 10011010b ;Flags
59
db 11001111b ;Flags
60
db 0x0 ;Base
61
62
GDT_DATA_SEG:
63
dw 0xffff ;Limit
64
dw 0x0 ;Base
65
db 0x0 ;Base
66
db 10010010b ;Flags
67
db 11001111b ;Flags
68
db 0x0 ;Base
69
70
GDT_END:
71
72
GDT_DESC:
73
dw GDT_END - GDT_BEGIN - 1
74
dd GDT_BEGIN
75
76
CODE_SEG equ GDT_CODE_SEG - GDT_BEGIN
77
DATA_SEG equ GDT_DATA_SEG - GDT_BEGIN
78
79
80
times 510-($-$$) db 0
81
dw 0xaa55
Also note to create a folder named bin
in the same directory.
Lets look at how this works:
In the main.c
file, in the start()
function, Look at the last line: We called the initIDT()
function.
The job of this function is to set the Interrupt Descriptor Table so that when a key is pressed, It will call the isr1_Handler()
function.
Look at the very top of this file;
struct IDT_ENTRY{
unsigned short base_Lower;
unsigned short selector;
unsigned char zero;
unsigned char flags;
unsigned short base_Higher;
};
struct IDT_ENTRY idt[256];
extern unsigned int isr1;
unsigned int base;
The struct IDT_ENTRY
could be used to define the idt
. There are a total of 256
possible idt entries. So we created 256 of them with struct IDT_ENTRY idt[256];
The next command extern unsigned int isr1;
says to the compiler to implement compilation so that it knows that there will be a section named isr1
in an outside source file. We could find this section in the IDT.asm
file.
In the start()
function in main.c
, we copied the address of the isr1
section to the variable base
with base = (unsigned int)&isr1;
(We used the & Symbol to get the address).
In the initIDT()
function, we mapped the #1 interrupt(Keyboard interrupt(IRQ 1)) with:
idt[1].base_Lower = (base & 0xFFFF);
idt[1].base_Higher = (base >> 16) & 0xFFFF;
idt[1].selector = 0x08;
idt[1].zero = 0;
idt[1].flags = 0x8e;
Here we set the location of isr1
section in the first two lines using the value in base
variable.
The next three variables sets the arguments necessary for the idt entry.
Then we prepared the pic chip to handle interrupts and finally, we called theloadIdt()
function. At the very top of the code, we said to the compiler that the loadIdt()
function wil be outside the c source file with extern void loadIdt();
When the loadIdt()
code is executed, it jumps to that section we defined in the IDT.asm
file.
_loadIdt:
lidt[idtDesc]
sti
ret
lidt
is a x86 command used to make the processor load the Interrupt Descriptor Table. Here, it loads the idt entry Which the idtDesc
section points to.
The command sti
is used to enable the interrupts. From now onwards , we will get every interrupts.
From now onwards, the _isr1
section in IDT.asm file will be called when ever a keyboard key is being pressed.
The _isr1
section calls the _isr1_Handler
function we defined in the main.c
file.
When ever a key is pressed, the fuction isr1_Handler()
will be called. We could implement what ever to do in that function.
According to our implementation we first obtained the scan code of key being pressed with inportb(0x60)
and passed it to the handleKeypress()
function with handleKeypress(inportb(0x60));
In the handleKeypress()
function, we have an array of values named Scancode[]
.
We could take the scan code as an index to point to the Scancode[]
array.
The arrangement of values in Scancode[]
gives us the ascii representation of the scan code.
Finally we called a function to print the character being pressed.
THE inportb AND outportb FUNCTIONS ARE FUNCTIONS THAT HELP US COMMUNICATE WITH EXTERNAL DEVICES. in COMMAND ACCEPTS INPUT FROM EXTERNAL DEVICES AND out COMMAND OUTPUTS COMMANDS AND DATA TO EXTERNAL DEVICES.
Compile and run the code by executing the compile.bat
file and try pressing any character on keyboard, The program simply prints it to the screen.
https://wiki.osdev.org/8259_PIC
https://wiki.osdev.org/Interrupts
https://www.osdev.org/howtos/2/
https://wiki.osdev.org/Interrupt_Descriptor_Table
In this section, We will try to implement everything we have learned this far.
We will make a program which changes the colour of screen depending on the command it receives , play videos , print string etc…. This chapter just gives you an idea about practical implementation.
Use the following link to get the full source code for this section:
https://github.com/TINU-2000/OS-DEV/tree/main/First%20Prototype
main.c
1
#include
"extra.h"
2
3
#define PIC1_C 0x20
4
#define PIC1_D 0x21
5
#define PIC2_C 0xa0
6
#define PIC2_D 0xa1
7
8
#define ICW1_DEF 0x10
9
#define ICW1_ICW4 0x01
10
#define ICW4_x86 0x01
11
12
void
cls
();
13
void
setMonitorColor
(
char
);
14
15
void
printString
(
char
*
);
16
void
printChar
(
char
);
17
18
void
scroll
();
19
20
void
printColorString
(
char
*
,
char
);
21
void
printColorChar
(
char
,
char
);
22
23
void
getDecAscii
(
int
);
24
25
void
initIDT
();
26
extern
void
loadIdt
();
27
extern
void
isr1_Handler
();
28
void
handleKeypress
(
int
);
29
void
pressed
(
char
);
30
void
picRemap
();
31
32
unsigned
char
inportb
(
unsigned
short
);
33
void
outportb
(
unsigned
short
,
unsigned
char
);
34
35
36
char
NumberAscii
[
10
];
37
int
CELL
;
38
39
40
char
COMMAND
[
21
];
41
int
i
=
0
;
42
43
struct
IDT_ENTRY
{
44
unsigned
short
base_Lower
;
45
unsigned
short
selector
;
46
unsigned
char
zero
;
47
unsigned
char
flags
;
48
unsigned
short
base_Higher
;
49
};
50
51
struct
IDT_ENTRY
idt
[
256
];
52
extern
unsigned
int
isr1
;
53
unsigned
int
base
;
54
55
int
start
(){
56
TM_START
=
(
char
*
)
0xb8000
;
57
CELL
=
0
;
58
base
=
(
unsigned
int
)
&
isr1
;
59
60
cls
();
61
setMonitorColor
(
0xa5
);
62
63
char
Welcome
[]
=
"Welcome To OS0 : Copyright 2021
\n
"
;
64
char
Welcome2
[]
=
"Command Line Version 1.0.0.0
\n\n
"
;
65
char
OSM
[]
=
"OS0 > "
;
66
67
printString
(
Welcome
);
68
printString
(
Welcome2
);
69
printColorString
(
OSM
,
0xa8
);
70
71
initIDT
();
72
}
73
74
75
void
cls
(){
76
int
i
=
0
;
77
CELL
=
0
;
78
while
(
i
<
(
2
*
80
*
25
)){
79
*
(
TM_START
+
i
)
=
' '
;
// Clear screen
80
i
+=
2
;
81
}
82
}
83
84
void
setMonitorColor
(
char
Color
){
85
int
i
=
1
;
86
while
(
i
<
(
2
*
80
*
25
)){
87
*
(
TM_START
+
i
)
=
Color
;
88
i
+=
2
;
89
}
90
}
91
92
void
printString
(
char
*
cA
){
93
int
i
=
0
;
94
while
(
*
(
cA
+
i
)
!=
'\0'
){
95
printChar
(
*
(
cA
+
i
));
96
i
++
;
97
}
98
}
99
100
void
printChar
(
char
c
){
101
if
(
CELL
==
2
*
80
*
25
)
102
scroll
();
103
if
(
c
==
'\n'
){
104
CELL
=
((
CELL
+
160
)
-
(
CELL
%
160
));
105
return
;
106
}
107
*
(
TM_START
+
CELL
)
=
c
;
108
CELL
+=
2
;
109
}
110
111
void
scroll
(){
112
int
i
=
160
,
y
=
0
;
113
while
(
i
<
2
*
80
*
25
){
114
*
(
TM_START
+
y
)
=
*
(
TM_START
+
i
);
115
i
+=
2
;
116
y
+=
2
;
117
}
118
CELL
=
2
*
80
*
24
;
119
i
=
0
;
120
while
(
i
<
160
){
121
*
(
TM_START
+
CELL
+
i
)
=
' '
;
122
i
+=
2
;
123
}
124
}
125
126
void
printColorString
(
char
*
c
,
char
co
){
127
int
i
=
0
;
128
while
(
*
(
c
+
i
)
!=
'\0'
){
129
printColorChar
(
*
(
c
+
i
)
,
co
);
130
i
++
;
131
}
132
}
133
134
void
printColorChar
(
char
c
,
char
co
){
135
if
(
CELL
==
2
*
80
*
25
)
136
scroll
();
137
if
(
c
==
'\n'
){
138
CELL
=
((
CELL
+
160
)
-
(
CELL
%
160
));
139
return
;
140
}
141
*
(
TM_START
+
CELL
)
=
c
;
142
*
(
TM_START
+
CELL
+
1
)
=
co
;
143
CELL
+=
2
;
144
}
145
146
void
getDecAscii
(
int
num
){
147
if
(
num
==
0
){
148
NumberAscii
[
0
]
=
'0'
;
149
return
;
150
}
151
char
NUM
[
10
];
152
int
i
=
0
,
j
=
0
;
153
while
(
num
>
0
){
154
NUM
[
i
]
=
num
%
10
;
155
num
/=
10
;
156
i
++
;
157
}
158
i
--
;
159
while
(
i
>=
0
){
160
NumberAscii
[
j
]
=
NUM
[
i
];
161
i
--
;
162
j
++
;
163
}
164
NumberAscii
[
j
]
=
'J'
;
165
j
=
0
;
166
while
(
NumberAscii
[
j
]
!=
'J'
){
167
NumberAscii
[
j
]
=
'0'
+
NumberAscii
[
j
];
168
j
++
;
169
}
170
NumberAscii
[
j
]
=
0
;
171
}
172
173
void
initIDT
(){
174
idt
[
1
].
base_Lower
=
(
base
&
0xFFFF
);
175
idt
[
1
].
base_Higher
=
(
base
>>
16
)
&
0xFFFF
;
176
idt
[
1
].
selector
=
0x08
;
177
idt
[
1
].
zero
=
0
;
178
idt
[
1
].
flags
=
0x8e
;
179
180
picRemap
();
181
182
outportb
(
0x21
,
0xfd
);
183
outportb
(
0xa1
,
0xff
);
184
185
loadIdt
();
186
}
187
188
unsigned
char
inportb
(
unsigned
short
_port
){
189
unsigned
char
rv
;
190
__asm__
__volatile__
(
"inb %1, %0"
:
"=a"
(
rv
)
:
"dN"
(
_port
));
191
return
rv
;
192
}
193
194
void
outportb
(
unsigned
short
_port
,
unsigned
char
_data
){
195
__asm__
__volatile__
(
"outb %1, %0"
:
:
"dN"
(
_port
),
"a"
(
_data
));
196
}
197
198
extern
void
isr1_Handler
(){
199
handleKeypress
(
inportb
(
0x60
));
200
outportb
(
0x20
,
0x20
);
201
outportb
(
0xa0
,
0x20
);
202
}
203
204
void
handleKeypress
(
int
code
){
205
char
OSM
[]
=
"
\n
OS0 > "
;
206
char
Scancode
[]
=
{
207
0
,
0
,
'1'
,
'2'
,
208
'3'
,
'4'
,
'5'
,
'6'
,
209
'7'
,
'8'
,
'9'
,
'0'
,
210
'-'
,
'='
,
0
,
0
,
'Q'
,
211
'W'
,
'E'
,
'R'
,
'T'
,
'Y'
,
212
'U'
,
'I'
,
'O'
,
'P'
,
'['
,
']'
,
213
0
,
0
,
'A'
,
'S'
,
'D'
,
'F'
,
'G'
,
214
'H'
,
'J'
,
'K'
,
'L'
,
';'
,
'\''
,
'`'
,
215
0
,
'\\'
,
'Z'
,
'X'
,
'C'
,
'V'
,
'B'
,
'N'
,
'M'
,
216
','
,
'.'
,
'/'
,
0
,
'*'
,
0
,
' '
217
};
218
219
if
(
code
==
0x1c
){
220
COMMAND
[
i
]
=
'\0'
;
221
i
=
0
;
222
strEval
(
COMMAND
);
223
printString
(
OSM
);
224
}
225
else
if
(
code
<
0x3a
)
226
pressed
(
Scancode
[
code
]);
227
}
228
229
void
pressed
(
char
key
){
230
if
(
i
!=
20
){
231
COMMAND
[
i
]
=
key
;
232
i
++
;
233
printChar
(
key
);
234
}
235
else
{
236
blink
();
237
}
238
}
239
240
void
picRemap
(){
241
unsigned
char
a
,
b
;
242
a
=
inportb
(
PIC1_D
);
243
b
=
inportb
(
PIC2_D
);
244
245
outportb
(
PIC1_C
,
ICW1_DEF
|
ICW1_ICW4
);
246
outportb
(
PIC2_C
,
ICW1_DEF
|
ICW1_ICW4
);
247
248
outportb
(
PIC1_D
,
0
);
249
outportb
(
PIC2_D
,
8
);
250
251
outportb
(
PIC1_D
,
4
);
252
outportb
(
PIC2_D
,
2
);
253
254
outportb
(
PIC1_D
,
ICW4_x86
);
255
outportb
(
PIC2_D
,
ICW4_x86
);
256
257
outportb
(
PIC1_D
,
a
);
258
outportb
(
PIC2_D
,
b
);
259
}
extra.h
1
void setMonitorColor(char);
2
void cls();
3
void printString(char*);
4
void vid();
5
6
char* TM_START;
7
8
void blink(){
9
setMonitorColor(0x59);
10
int TIME_OUT = 0x10fffff;
11
while(--TIME_OUT);
12
setMonitorColor(0xa5);
13
}
14
15
char strcmp(char* sou , char* dest){
16
int i = 0;
17
while(*(sou + i) == *(dest + i)){
18
if(*(sou + i) == 0 && *(dest + i) == 0)
19
return 1;
20
i++;
21
}
22
return 0;
23
}
24
25
void strEval(char* CMD){
26
char cmd1[] = "CLS";
27
char cmd2[] = "COLORA";
28
char cmd3[] = "COLORB";
29
char cmd4[] = "COLORC";
30
char cmd5[] = "COLORDEF";
31
char cmd6[] = "VID";
32
char cmd7[] = "HI";
33
34
char msg1[] = "\nHELLO , HAVE A GOOD JOURNEY LEARNING\n";
35
36
if(strcmp(CMD , cmd1))
37
cls();
38
39
else if(strcmp(CMD , cmd2))
40
setMonitorColor(0x3c);
41
42
else if(strcmp(CMD , cmd3))
43
setMonitorColor(0x5a);
44
45
else if(strcmp(CMD , cmd4))
46
setMonitorColor(0x2a);
47
48
else if(strcmp(CMD , cmd5))
49
setMonitorColor(0xa5);
50
51
else if(strcmp(CMD , cmd6))
52
vid();
53
else if(strcmp(CMD , cmd7))
54
printString(msg1);
55
}
56
57
void vid(){
58
char clr = 'A';
59
while(1){
60
int i = 0;
61
while(i < 2 * 80 * 25){
62
*(TM_START + i) = clr;
63
clr++;
64
i++;
65
}
66
}
67
}
We have implemented a small amount of application in this code. Here is the list:
After compiling and executing the code, We will be able to type to the screen. If the number of characters in a line becomes 20, Further key presses will generate a blinking in the screen. We did this by setting the colour of the screen for a small amount of time and changing it back to the orginal color.
hitting enter key will allow us to write to the next line.
Typing HI
and hitting enter will display a pre defined text to the screen.
Typing COLORA
, COLORB
, COLORC
and COLORDEF
will change the color of the screen.
Typing CLS
will clear the screen.
Typing VID
will generate a video feeling look. Here you need to relaunch the executable to do further experiments. This is because as the computer is fully working on video playing, it will not have time to exit from it when we say to it for eg: when pressing a key.
We could easily solve this problem by using Threading
, But currently, we are not at that topic.
The function strEval()
in extra.h
is the function which evaluates the commands and performs necessary operation.
We made the call to this function from the function named handleKeypress()
in main.c
inside an if
condition :
if(code == 0x1c){
}
This is so that strEval()
will only be called when the enter key is pressed.
TRY MAKING YOUR OWN IMPLEMENTATION OF THIS PROGRAM.
OR YOU CAN TRY MAKING SOME 2D GAME WHICH YOU CAN CONTROL WITH THE KEYBOARD.
YOU WILL BE VERY CONFIDENT IN THIS TOPIC AFTER YOU INITIATE YOUR OWN PROJECTS.
In later chapters, we will discuss about Creating pixel by pixel drawing. But This Text Mode
graphics can also be used to implement a gui as we have the option to draw coloumn based colors. Earlier computers even used text mode graphics capabilities to make a GUI.
Hard Disk
or the Primary "Secondary Storage Device"
is also one of the important part of a computer. Along with storage uses, it’s abilities are also used to maximize the use of Primary storage device or RAM. Concepts such as paging make use of this.
When implementing a hard disk driver , we usually use the in
and out
assembly commands. Before developing a hard disk driver, we need to know the types of hard disks and how the it is organized.
The two main types of hard disk available today are HDD
and SSD
. HDD Stands for Hard Disk Drive and SSD stands for Solid State Drive.
Hard Disk Drive or HDD is a mechanical hard disk with a rotating platter and a moving head. Data is stored into it in magnetic form. The Platter(Where data is stored) , rotates during read or write operations and the Head(The part of hard disk which do the read/write operation) performs the desired operation on the platter.
Even though the RPM of modern hard disks are pretty huge, It is little slower when comparing with the speed of Main Memory or RAM, as every read or write operation needs the head to move to desired locations.
SSD or Solid State Drive is a non mechanical type of hard disk where the data is stored typically using flash memory technology. As it is non mechanical , SSD’s are much faster(But costly too). SSD’s will give us a faster boot time , load time and even increase the process execution speed. This is because the paging operation will be much faster. Paging is a concept used to utilize main memory to its fullest by putting “less often used” data temporarily to hard Hard disk and later copy it back to memory when needed.
Let’s now learn how the HDD is organized to store data.
We said that the data is stored in magnetic form in platters. Typically hard disks have one to four platters stacked together. All of this platters will have separate Heads to read/write data.
All of this platters are further divided into tracks
and sectors
. There are number of tracks in a hard disk. All of this tracks will have a set of sectors in it. Usually the size of one sector is 512
bytes.
Another term associated with the hard disk is cylinder
.
We already said that hard disks have number of platters and each platters are further divided into tracks. A cylinder is formed by joining the same tracks in all of the platters. For eg: Track 1
of Platter 1
+ Track 1
of Platter 2
+ Track 1
of Platter 3
……… Forms one cylinder.
The full source code for this section is at:
https://github.com/TINU-2000/OS-DEV/tree/main/Hard%20Disk%20-%20ATA
ata.asm
1
[bits 32]
2
3
extern _blockAddr
4
extern _At
5
global _read
6
global _write
7
8
_read:
9
10
pushfd
11
and eax , 0x0FFFFFFF
12
push eax
13
push ebx
14
push ecx
15
push edx
16
push edi
17
18
mov eax , [_blockAddr]
19
mov cl , 1
20
mov edi , _At
21
22
mov ebx , eax
23
24
mov edx , 0x01F6
25
shr eax , 24
26
or al , 11100000b ; Sets bit 6 in al for LBA mode
27
out dx , al
28
29
mov edx , 0x01F2 ; Port to send number of sectors
30
mov al , cl ; Get number of sectors from CL
31
out dx , al
32
33
mov edx , 0x1F3 ; Port to send bit 0 - 7 of LBA
34
mov eax , ebx ; Get LBA from EBX
35
out dx , al
36
37
mov edx , 0x1F4 ; Port to send bit 8 - 15 of LBA
38
mov eax , ebx ; Get LBA from EBX
39
shr eax , 8 ; Get bit 8 - 15 in AL
40
out dx , al
41
42
43
mov edx , 0x1F5 ; Port to send bit 16 - 23 of LBA
44
mov eax , ebx ; Get LBA from EBX
45
shr eax , 16 ; Get bit 16 - 23 in AL
46
out dx , al
47
48
mov edx , 0x1F7 ; Command port
49
mov al , 0x20 ; Read with retry.
50
out dx , al
51
52
.loop1:
53
in al , dx
54
test al , 8
55
jz .loop1
56
57
mov eax , 256 ; Read 256 words , 1 sector
58
xor bx , bx
59
mov bl , cl ; read CL sectors
60
mul bx
61
mov ecx , eax
62
mov edx , 0x1F0
63
rep insw ; copy to [RDI]
64
65
pop edi
66
pop edx
67
pop ecx
68
pop ebx
69
pop eax
70
popfd
71
ret
72
73
74
_write:
75
pushfd
76
and eax , 0x0FFFFFFF
77
push eax
78
push ebx
79
push ecx
80
push edx
81
push edi
82
83
mov eax , [_blockAddr]
84
mov cl , 1
85
mov edi , _At
86
87
mov ebx , eax
88
89
mov edx , 0x01F6
90
shr eax , 24
91
or al , 11100000b ; Set bit 6 in al for LBA mode
92
out dx , al
93
94
mov edx , 0x01F2 ; Port to send number of sectors
95
mov al , cl ; Get number of sectors from CL
96
out dx , al
97
98
mov edx , 0x1F3 ; Port to send bit 0 - 7 of LBA
99
mov eax , ebx ; Get LBA from EBX
100
out dx , al
101
102
mov edx , 0x1F4 ; Port to send bit 8 - 15 of LBA
103
mov eax , ebx ; Get LBA from EBX
104
shr eax , 8 ; Get bit 8 - 15 in AL
105
out dx , al
106
107
108
mov edx , 0x1F5 ; Port to send bit 16 - 23 of LBA
109
mov eax , ebx ; Get LBA from EBX
110
shr eax , 16 ; Get bit 16 - 23 in AL
111
out dx , al
112
113
mov edx , 0x1F7 ; Command port
114
mov al , 0x30 ; Write with retry.
115
out dx , al
116
117
.loop2:
118
in al , dx
119
test al , 8
120
jz .loop2
121
122
mov eax , 256 ; Read 256 words , 1 sector
123
xor bx , bx
124
mov bl , cl ; write CL sectors
125
mul bx
126
mov ecx , eax
127
mov edx , 0x1F0
128
mov esi , edi
129
rep outsw ; go out
130
131
pop edi
132
pop edx
133
pop ecx
134
pop ebx
135
pop eax
136
popfd
137
ret
extra.h
1
void setMonitorColor(char);
2
void cls();
3
void printString(char*);
4
void vid();
5
void put();
6
void get();
7
8
extern void read();
9
extern void write();
10
11
12
int blockAddr;
13
char At[1024];
14
15
char* TM_START;
16
17
void blink(){
18
setMonitorColor(0x59);
19
int TIME_OUT = 0x10fffff;
20
while(--TIME_OUT);
21
setMonitorColor(0xa5);
22
}
23
24
char strcmp(char* sou , char* dest){
25
int i = 0;
26
while(*(sou + i) == *(dest + i)){
27
if(*(sou + i) == 0 && *(dest + i) == 0)
28
return 1;
29
i++;
30
}
31
return 0;
32
}
33
34
void strEval(char* CMD){
35
char cmd1[] = "CLS";
36
char cmd2[] = "COLORA";
37
char cmd3[] = "COLORB";
38
char cmd4[] = "COLORC";
39
char cmd5[] = "COLORDEF";
40
char cmd6[] = "VID";
41
char cmd7[] = "HI";
42
char cmd8[] = "PUT";
43
char cmd9[] = "GET";
44
45
char msg1[] = "\nHELLO , HAVE A GOOD JOURNEY LEARNING\n";
46
47
if(strcmp(CMD , cmd1))
48
cls();
49
50
else if(strcmp(CMD , cmd2))
51
setMonitorColor(0x3c);
52
53
else if(strcmp(CMD , cmd3))
54
setMonitorColor(0x5a);
55
56
else if(strcmp(CMD , cmd4))
57
setMonitorColor(0x2a);
58
59
else if(strcmp(CMD , cmd5))
60
setMonitorColor(0xa5);
61
62
else if(strcmp(CMD , cmd6))
63
vid();
64
else if(strcmp(CMD , cmd7))
65
printString(msg1);
66
else if(strcmp(CMD , cmd8)){
67
blockAddr = 0;
68
int i = 0;
69
70
while(i < 511){
71
At[i] = 'J'; // Fill with J
72
i++;
73
}
74
At[i] = 0; // Null character
75
76
put(); // Writes to Hard disk
77
78
i = 0;
79
while(i < 511){
80
At[i] = 0; // Clears the content
81
i++;
82
}
83
}
84
else if(strcmp(CMD , cmd9)){
85
blockAddr = 0;
86
get();
87
printString(At);
88
}
89
90
}
91
92
void vid(){
93
char clr = 'A';
94
while(1){
95
int i = 0;
96
while(i < 2 * 80 * 25){
97
*(TM_START + i) = clr;
98
clr++;
99
i++;
100
}
101
}
102
}
103
104
void put(){
105
write();
106
}
107
108
void get(){
109
read();
110
}
compile.bat
1
nasm Boot.asm -f bin -o bin\bootsect.bin
2
nasm Kernel_Entry.asm -f elf -o bin\Entry.bin
3
nasm IDT.asm -f elf -o bin\IDT.bin
4
nasm ata.asm -f elf -o bin\ata.bin
5
6
gcc -m32 -ffreestanding -c main.c -o bin\kernel.o
7
8
ld -T NUL -o bin\Kernel.img -Ttext 0x1000 bin\Entry.bin bin\kernel.o bin\IDT.bin bin\
9
\ata.bin
10
11
objcopy -O binary -j .text bin\kernel.img bin\kernel.bin
12
copy /b /Y bin\bootsect.bin+bin\kernel.bin bin\os-image
13
14
qemu-system-i386 -drive format=raw,file=bin\os-image
Here, we implemented a hard disk driver for the ATA
technology. ATA or Advanced Technology Attachment
allows hard disks and CD-ROMs to be internally connected to the motherboard and perform input/output functions.
There are different ways to read/write to hard disk. What we have used is LBA
mode.
This is the easiest way to read/write to hard disk , all we need to do is pass the Block address of sector. Passing 0 will give us access to the first sector(Boot sector).
Please note not to write to the 0’th sector as this can make your computer non bootable, but you could always copy a boot loader to that sector.
Let’s now understand the code:
The two new commands we added to the strEval
function are GET
and PUT
.
When we type and enter the PUT
command, it first copies the number 0
to blockAddr
and then proceeds to initialize every cell in At
character array with 'J'
and finally adds a null character. Then it calls the put
function which calls another function named write
.
The write
function is defined in ata.asm
file. In that function, look at the following section:
mov eax , [_blockAddr]
mov cl , 1
mov edi , _At
Here , the program copies the value we copied to the blockAddr
variable in strEval
function to the eax register. This value represents which sector to write to hard disk. Then we copied the value 1
to cl
register , this represents the number of sectors to write. As we give 1
to that register, it writes 1 sector(512 bytes). After that, we copied the address of the array At
we defined in extra.h
file. This is so that the processor will write the data in that array to hard disk. The rest of the code communicates with the hard disk to write to it, and after the operation ends, it returns to our code in C
.
The same thing happens when we give the GET
command. But instead of the writing process , it does the reading process. In the code , we first copied the value 0
to blockAddr
variable, Then we called the get
function which calls the read
function in ata.asm
file. That function reads the 0
‘th sector to the At
array and returns to the code in C
. and finally prints the content in that array.
Now , when running it, try giving the GET
command first. We will get some random characters as output.
Now , try giving the PUT
command and later call the GET
command. This will print a lot a J
to the screen.
This is because , when we give the PUT
command , it copies the string of J
’s to At
array and writes it to the hard disk. After that , when we call the GET
command , it reads from the hard disk and outputs the string of J
’s we previously copied to it.
We done all of these operations to the 0
‘th sector. We could change the value in blockAddr
variable to read/write to other sectors.
Now, as we developed the Hard disk driver, lets create a File System so that we could do some file operations.
https://wiki.osdev.org/Category:ATA
https://wiki.osdev.org/ATA_PIO_Mode
A File System
is a system which lets us use the secondary storage device to store files , edit , rename , delete and do all other file operations. An ideal file system should be able to use the capacity of hard disk to the most. Fat , ntfs , ext etc… are examples of file systems.
We need to work on a good implementation so that it enables all of the file operations. The file system should be ideal enough so that even a single byte shouldn’t be unused.
This is technically impossible to make use of every byte, But what it mean is that We should be able to use our hard disk to the most
Here , we will discuss how to implement a simple file system.
File system implementation is an advanced concept as it involves use of Data Structures.
We will discuss about implementing a simple file system which will be able to only save and retrieve some data. It won’t be able to delete , rename or do other complicated stuff.
This is because as this is a beginner level book , we don’t want the readers to be overwhelmed with big details.
We are going to build a file system which allows us to give names to a file and store upto 512 byte data in that file.
Heres how we are going to implement it:
After our os is launched , before trying to create a file , The user needs to format the hard disk. This is usually done when the user enters the command FORMAT
and presses enter key.
The formatting operation first stores four random bytes(Which the developer can choose) in the beginning of sector 0.
Then , the developer needs to code so that the remaining bytes in sector 0 and the rest of sectors initialize with 0. This is because , that we don’t want the default data in hard disk to combine with the data we will store in it in the future. So it’s a good idea to clear it. This is not so necessary but you may end up with problems depending upon your implementation.
Please note that the sector 0 is always used to store the boot loader. Writing other data to sector 0 will make the system un-bootable. if you are storing and booting the os from real hardware , please choose other sector.
The Sector 0 Where we stored the random bytes will be called as the file allocation table here. This sector could be used to store names of file that we will create in the future.
Now we will enable the users to type a command to create a file(You could choose any name for the command) and after obtaining the name for the file we do the following operations:
- We check the first four bytes stored in sector 0 and if it is same as what we stored in the formatting section , we do the next operation.
We do this check to see if we have previously formatted the disk. If the check fails , say to the user to enter the command to format it.
- Now we store the name of file in sector 0 just after the name of last file we created. Name of the very first file can be stored just after the first four bytes.
Please note that you need to manage the implementation so that we could differentiate between different files
To save some data to a file , the user first needs to create a file using the command you chose at the previous section. After creating the file , the user could attempt the SAVE
operation we will discuss here.
Lets use the command SAVE
to do the save operation. What the save operation does here is copying a specified string of text to hard disk.
After the user enters the save command , we will also obtain a file name and a string of text to save to hard disk. Now we will do a check to see whether the user previously created that file by looking to the 0’th sector and if it succeeds , we do the following operation
- We obtains the index of the file specified. We will get 1 as index if the file specfied is the first file defined at sector 0. Like that , we will get 2 as index if the file specified is the second file defined at sector 0.
- Now we take this index as the sector count where the string should be saved to.
if the index is 1 , we save the string of text to sector 1 , and if the index is 2 , we save the string to sector 2. and this process goes on.
Please note that we will only be able to store string of upto 512 bytes as the size of a sector is always 512 bytes. Storing a large file needs different implementation.
I assume you already know how to access the sectors in hard disk. Heres how to do it (Assuming you already implemented the hard disk driver said in the previous chapter):
- Store the index of sector to read/write to the variable
blockAddr
defined atextra.h
. Copying 0 will give access to first sector(Boot sector) copying 1 will give access to 2’nd sector , copying 2 will give access to 3’rd sector………- Now to do the save operation , call the
put()
function and to retrieve the content from hard disk , call theget()
function inextra.h
header file.
To read the data in saved file , We will do things similar to what we done at Save
Section. We first obtains the name of file , then searches the sector 0 to get index of that file and use that index to point to the sector and finally do the read operation.
The file system we discussed here is actually worth nothing. But i think it gives some ideas about how it should be done. An ideal file system should always support creating file of variable size , rename , delete , edit and other advanced things such as using algorithms to find best space to store a file. Finding best place to store a file is so crucial so that we could use the hard disk to the most.
Every file allocation methods have it’s own pros and cons. But it’s our responsibility to implement a system which best suit for our needs.
https://wiki.osdev.org/File_Systems
https://wiki.osdev.org/FAT
We this far have used Text Mode graphics to output to the screen. Even if it allows to print colours , we are not able to do it in a pixel by pixel manner. After switching to graphics mode , we will be able to draw pixel. We will now choose the screen resolution we want and do the video operations.
The Drawback of switching to Graphics mode is that , we need to design special programs to render text to the screen. We should first create a font and use it to render to screen whenever we want. Most Hobbyist operating systems use Text Mode graphics as it is the best solution to learn it. But in order to get the most of the systems abilities such as displaying icons , pictures etc… needs this switch.
Not all modes allow rendering full colours , we will choose from a list of modes which have different abilities.
When talking about switching to Graphics mode , we have different video modes to choose from. These different modes have its own abilities with its own supported resolution and colours.
The following table shows the supported modes and its representation hex value:
This is taken from:
https://wiki.osdev.org/Drawing_In_Protected_Mode
00
text 40*25 16 color (mono)01
text 40*25 16 color02
text 80*25 16 color (mono)03
text 80*25 16 color04
CGA 320*200 4 color05
CGA 320*200 4 color (m)06
CGA 640*200 2 color07
MDA monochrome text 80*2508
PCjr09
PCjr0A
PCjr0B
reserved0C
reserved0D
EGA 320*200 16 color0E
EGA 640*200 16 color0F
EGA 640*350 mono10
EGA 640*350 16 color11
VGA 640*480 mono12
VGA 640*480 16 color13
VGA 320*200 256 color
We need to choose one from this list and note its associated hex value.
For now we will take VGA 320*200 256 color so its representation number hex value is 13
.
We will now switch the video mode from our current Text Mode to Graphics Mode.
Switching to graphics mode is done using bios. But as we are in 32 bit Protected Mode , we won’t be able to use bios. Bios is only available in 16 bit mode. So we will leave our current source codes for now and start a new project from scratch to show the development in graphics mode. After we switch to Graphics mode , we could switch to 32 bit protected mode.
Please note that we will use our current source codes in later chapters. But for this chapter , we will create a sample from scratch.
The following assembly statements switches to VGA 320*200 256 color Mode:
mov ah , 0x00
mov al , 0x13
int 0x10
Here , we first copied the value 0x00
to the ah
register. This value represents the option in bios to switch video modes.
We then copied the value 0x13
to the al
register, And finally called the bios with int 0x10
.
Bios will now switch to VGA 320*200 256 color mode which we represented with 0x13
We Could place this assembly statements in Boot.asm
file just after the the Boot loader code.
We wont be able to switch to graphics mode in 32 bit protected mode. So the best place to put code to switch to Graphics mode in our case is just after the Boot Loader in Boot.asm
file.
Please obtain the source code from:https://github.com/TINU-2000/OS-DEV/tree/main/Graphics%20Mode0
Boot.asm
1
[org 0x7c00]
2
[bits 16]
3
4
;Boot Loader
5
mov bx , 0x1000 ; Memory offset to which kernel will be loaded
6
mov ah , 0x02 ; Bios Read Sector Function
7
mov al , 30 ; No. of sectors to read(If your kernel won't fit into 30 sectors , \
8
you may need to provide the correct no. of sectors to read)
9
mov ch , 0x00 ; Select Cylinder 0 from harddisk
10
mov dh , 0x00 ; Select head 0 from hard disk
11
mov cl , 0x02 ; Start Reading from Second sector(Sector just after boot sector)
12
13
int 0x13 ; Bios Interrupt Relating to Disk functions
14
15
;Switch to Graphics Mode
16
mov ah , 0x00
17
mov al , 0x13
18
int 0x10
19
20
;Switch To Protected Mode
21
cli ; Turns Interrupts off
22
lgdt [GDT_DESC] ; Loads Our GDT
23
24
mov eax , cr0
25
or eax , 0x1
26
mov cr0 , eax ; Switch To Protected Mode
27
28
jmp CODE_SEG:INIT_PM ; Jumps To Our 32 bit Code
29
;Forces the cpu to flush out contents in cache memory
30
31
[bits 32]
32
33
INIT_PM:
34
mov ax , DATA_SEG
35
mov ds , ax
36
mov ss , ax
37
mov es , ax
38
mov fs , ax
39
mov gs , ax
40
41
mov ebp , 0x90000
42
mov esp , ebp ; Updates Stack Segment
43
44
45
call 0x1000
46
jmp $
47
48
49
50
51
52
GDT_BEGIN:
53
54
GDT_NULL_DESC: ;The Mandatory Null Descriptor
55
dd 0x0
56
dd 0x0
57
58
GDT_CODE_SEG:
59
dw 0xffff ;Limit
60
dw 0x0 ;Base
61
db 0x0 ;Base
62
db 10011010b ;Flags
63
db 11001111b ;Flags
64
db 0x0 ;Base
65
66
GDT_DATA_SEG:
67
dw 0xffff ;Limit
68
dw 0x0 ;Base
69
db 0x0 ;Base
70
db 10010010b ;Flags
71
db 11001111b ;Flags
72
db 0x0 ;Base
73
74
GDT_END:
75
76
GDT_DESC:
77
dw GDT_END - GDT_BEGIN - 1
78
dd GDT_BEGIN
79
80
CODE_SEG equ GDT_CODE_SEG - GDT_BEGIN
81
DATA_SEG equ GDT_DATA_SEG - GDT_BEGIN
82
83
84
times 510-($-$$) db 0
85
dw 0xaa55
Kernel_Entry.asm
1
START
:
2
[
bits
32
]
3
[
extern
_start
]
4
call
_start
5
jmp
$
main.c
1
int start(){
2
3
}
compile.bat
1
nasm Boot.asm -f bin -o bin\bootsect.bin
2
nasm Kernel_Entry.asm -f elf -o bin\Entry.bin
3
4
gcc -m32 -ffreestanding -c main.c -o bin\kernel.o
5
6
ld -T NUL -o bin\Kernel.img -Ttext 0x1000 bin\Entry.bin bin\kernel.o
7
8
objcopy -O binary -j .text bin\kernel.img bin\kernel.bin
9
copy /b /Y bin\bootsect.bin+bin\kernel.bin bin\os-image
10
11
qemu-system-i386 -drive format=raw,file=bin\os-image
Please note to create a folder named bin
in the same folder as the compile.bat
file saves the compiled files to it.
Look at the Boot.asm
file to see changes.
Compile and run the program and you will get a black screen with no content. This is the desired output at this stage. We will now see how to draw to that screen.
Please also note that the start()
function in main.c
will be blank , we will now create code to render to the screen in that function.
In Text Mode graphics , we wrote to a special location in ram to make things display to the screen. Graphics Mode Graphical operations also make use of memory to draw to the screen. But in this case , we will use 0xA0000
as address.
The following program renders one pixel to the Top Left of the screen by poking the very first video memory buffer byte:
main.c
1
int start(){
2
char* vbuff = (char*) 0xA0000;
3
*(vbuff) = 0x55;
4
}
Output
The pixel displayed here represents the hex value 0x55
which we copied to the video memory.
The supported values that can be copied to one cell is from 0x00
to 0xFF
or 0
to 255
The rest of the pixels can be rendered by poking the rest of the video memory.
For eg: poking 0xA0000 + 1
renders the second pixel , poking 0xA0000 + 2
renders third pixel and this process goes on.
Look at the following program to poke every cell in first row:
main.c
1
int start(){
2
char* vbuff = (char*) 0xA0000;
3
char colr = 0x00;
4
int i = 0;
5
while(i < 320){
6
*(vbuff + i) = colr;
7
colr++;
8
i++;
9
}
10
}
Output
Now lets fill the screen:
main.c
1
int start(){
2
char* vbuff = (char*) 0xA0000;
3
char colr = 0x00;
4
int i = 0;
5
while(i < 320 * 200){
6
*(vbuff + i) = colr;
7
colr++;
8
i++;
9
}
10
}
Output
We have discussed about the base of creating a pixel by pixel drawing. Creating a good user interface needs deep knowledge in different colours and its values. Let me give a sample user interface , for that i will create a user interface looking like a text editor:
main.c
1
int start(){
2
char* vbuff = (char*) 0xA0000;
3
int y = 0;
4
while(y < 200){
5
int x = 0;
6
while(x < 75){
7
*(vbuff + (y * 320) + x) = 0x07;
8
x++;
9
}
10
while(x < 245){
11
*(vbuff + (y * 320) + x) = 0x0B;
12
x++;
13
}
14
while(x < 320){
15
*(vbuff + (y * 320) + x) = 0x07;
16
x++;
17
}
18
y++;
19
}
20
}
Output
Lets see another one:
main.c
1
int start(){
2
char* vbuff = (char*) 0xA0000;
3
int y = 0;
4
while(y < 200){
5
int x = 0;
6
while(x < 75){
7
*(vbuff + (y * 320) + x) = 0x04;
8
x++;
9
}
10
while(x < 245){
11
*(vbuff + (y * 320) + x) = 0x0F;
12
x++;
13
}
14
while(x < 320){
15
*(vbuff + (y * 320) + x) = 0x04;
16
x++;
17
}
18
y++;
19
}
20
}
Output
Here , we could take the center part as the Text Editing field. This is just a sample , we wont be able to render text to the screen with any trick. We need to create our own font and make some program to render it to the screen. This may take some time , but it’s a good idea to keep working on this mode and develop an amazing UI.
Also note to work with other video modes.
https://wiki.osdev.org/Drawing_In_Protected_Mode
https://wiki.osdev.org/VGA_Resources
Mouse
is very essential when working with a Graphical Operating system. Although most hobbyist operating systems do not use a mouse as they work on text modes. But we need to implement it for those who are trying for a graphical OS.
For that , We will implement a driver for PS2 mouse.
A PS2
mouse generates IRQ12
. Once we initialize a mouse , it sends 3 or 4 byte packets to communicate mouse movement, and mouse button press/release events.
We used IRQ1
to get data from the keyboard. likewise we will be able to get mouse input by defining IRQ12
We could give an address of a function we create so that it will be called during every mouse input.
The first 3 bytes which the mouse give always have the same format. The second byte which it generate signifies movement in X
axis and the third byte signifies mouse movement in Y
axis.
When talking about the first byte , it is a combination of 8 bits
where every bit signifies different things.
The 6'th
bit of first byte is set if the middle button is clicked. 7'th
bit represents the click of Right button and 8'th
bit represents the click of Left button.
You can use the bit shift
and logical and
, or
operators in c to evaluate each of this bits.
The PS2 mouse won’t give you special data to represent double clicks. The best way to check if a double click occur is by looking at the time difference between first and second click. If the time difference is less(According to your need) , you could treat it as a double click.
Please obtain the full source code from:
https://github.com/TINU-2000/OS-DEV/tree/main/Mouse
1
unsigned char inportb(unsigned short _port){
2
unsigned char rv;
3
__asm__ __volatile__ ("inb %1, %0" : "=a" (rv) : "dN" (_port));
4
return rv;
5
}
6
7
void outportb(unsigned short _port, unsigned char _data){
8
__asm__ __volatile__ ("outb %1, %0" : : "dN" (_port), "a" (_data));
9
}
10
11
void irq_install_handler(){
12
13
// Initialize IRQ12 in IDT and load it
14
15
}
16
17
void mouse_wait(unsigned char a_type){
18
int _time_out=100000;
19
if(a_type==0){
20
while(_time_out--){
21
if((inportb(0x64) & 1) == 1){
22
return;
23
}
24
}
25
return;
26
}
27
else{
28
while(_time_out--){
29
if((inportb(0x64) & 2) == 0){
30
return;
31
}
32
}
33
return;
34
}
35
}
36
37
void ProcessMouse(){
38
39
}
40
41
void mouse_write(unsigned char a_write){
42
mouse_wait(1);
43
outportb(0x64, 0xD4);
44
45
mouse_wait(1);
46
outportb(0x60, a_write);
47
}
48
49
unsigned char mouse_read(){
50
mouse_wait(0);
51
return inportb(0x60);
52
}
53
54
void initMouse(){
55
unsigned char status;
56
mouse_wait(1);
57
outportb(0x64, 0xA8);
58
59
60
mouse_wait(1);
61
outportb(0x64, 0x20);
62
63
mouse_wait(0);
64
status = (inportb(0x60) | 2);
65
mouse_wait(1);
66
outportb(0x64, 0x60);
67
mouse_wait(1);
68
outportb(0x60, status);
69
70
// Use default settings
71
mouse_write(0xF6);
72
mouse_read();
73
74
mouse_write(0xF4);
75
mouse_read();
76
77
irq_install_handler(12 , &ProcessMouse);
78
}
79
80
int start(){
81
initMouse();
82
}
Here , the start()
function calls a function to initialize the mouse , and in that initialize function , We communicated with the mouse to make it start working considering the PS2 specifications. At the end of that function , we called the irq_install_handler
function which initializes IRQ12 and load it. This function will be same as the one we used to initialize the keyboard and the only difference is the IRQ number.
After all of this is complete , whenever the mouse generates an interrupt , the cpu will start executing the ProcessMouse()
function.
We will be able to use the inportb
function to read from mouse and do the operation we want. So it is left to you for your own implementation!!!
https://wiki.osdev.org/Mouse_Input
https://forum.osdev.org/viewtopic.php?t=10247
Generating audio is an interesting part when doing your project. We actually give some integer value(Representing the frequency) which will turn to an audible effect.
We directly communicate with the sound card using out
command and the sound card work on further stuff.
Let’s directly start working on it!!
Let’s Generate a small beep
:
main.c
1
unsigned char inportb(unsigned short _port){
2
unsigned char rv;
3
__asm__ __volatile__ ("inb %1, %0" : "=a" (rv) : "dN" (_port));
4
return rv;
5
}
6
7
void outportb(unsigned short _port, unsigned char _data){
8
__asm__ __volatile__ ("outb %1, %0" : : "dN" (_port), "a" (_data));
9
}
10
11
void play(unsigned int fr) {
12
unsigned int Div;
13
unsigned char tmp;
14
15
Div = 1193180 / fr;
16
outportb(0x43, 0xb6);
17
outportb(0x42, (unsigned char) (Div) );
18
outportb(0x42, (unsigned char) (Div >> 8));
19
20
tmp = inportb(0x61);
21
if (tmp != (tmp | 3)) {
22
outportb(0x61, tmp | 3);
23
}
24
}
25
26
void stop() {
27
unsigned char fr = inportb(0x61) & 0xFC;
28
outportb(0x61, fr);
29
}
30
31
void scream(int fr) {
32
play(fr);
33
int Delay = 10000;
34
while(Delay--);
35
stop();
36
}
37
38
int start(){
39
scream(2020);
40
}
Before compiling , add the following argument when calling qemu in compile.bat
:
-soundhw pcspk
compile.bat
1
nasm Boot.asm -f bin -o bin\bootsect.bin
2
nasm Kernel_Entry.asm -f elf -o bin\Entry.bin
3
4
gcc -m32 -ffreestanding -c main.c -o bin\kernel.o
5
6
ld -T NUL -o bin\Kernel.img -Ttext 0x1000 bin\Entry.bin bin\kernel.o
7
8
objcopy -O binary -j .text bin\kernel.img bin\kernel.bin
9
copy /b /Y bin\bootsect.bin+bin\kernel.bin bin\os-image
10
11
qemu-system-i386 -soundhw pcspk -drive format=raw,file=bin\os-image
Here , we called the scream
function with scream(2020)
. We could try giving different value as an argument to the scream function to generate different sound.
Please obtain the full source code for this section from:
https://github.com/TINU-2000/OS-DEV/tree/main/Audio
Let’s include the sound generating program in our previous code. This lets us generate sound when entering a command.
This time we will feel it like a music , lets see:
extra.h
1
void setMonitorColor(char);
2
void cls();
3
void printString(char*);
4
void vid();
5
void put();
6
void get();
7
8
void scream(int);
9
10
extern void read();
11
extern void write();
12
13
unsigned char inportb(unsigned short);
14
void outportb(unsigned short , unsigned char);
15
16
17
18
int blockAddr;
19
char At[1024];
20
21
char* TM_START;
22
23
void blink(){
24
setMonitorColor(0x59);
25
int TIME_OUT = 0x10fffff;
26
while(--TIME_OUT);
27
setMonitorColor(0xa5);
28
}
29
30
char strcmp(char* sou , char* dest){
31
int i = 0;
32
while(*(sou + i) == *(dest + i)){
33
if(*(sou + i) == 0 && *(dest + i) == 0)
34
return 1;
35
i++;
36
}
37
return 0;
38
}
39
40
void strEval(char* CMD){
41
char cmd1[] = "CLS";
42
char cmd2[] = "COLORA";
43
char cmd3[] = "COLORB";
44
char cmd4[] = "COLORC";
45
char cmd5[] = "COLORDEF";
46
char cmd6[] = "VID";
47
char cmd7[] = "HI";
48
char cmd8[] = "PUT";
49
char cmd9[] = "GET";
50
char cmd10[] = "PLAY";
51
52
char msg1[] = "\nHELLO , HAVE A GOOD JOURNEY LEARNING\n";
53
54
if(strcmp(CMD , cmd1))
55
cls();
56
57
else if(strcmp(CMD , cmd2))
58
setMonitorColor(0x3c);
59
60
else if(strcmp(CMD , cmd3))
61
setMonitorColor(0x5a);
62
63
else if(strcmp(CMD , cmd4))
64
setMonitorColor(0x2a);
65
66
else if(strcmp(CMD , cmd5))
67
setMonitorColor(0xa5);
68
69
else if(strcmp(CMD , cmd6))
70
vid();
71
else if(strcmp(CMD , cmd7))
72
printString(msg1);
73
else if(strcmp(CMD , cmd8)){
74
blockAddr = 0;
75
int i = 0;
76
77
while(i < 511){
78
At[i] = 'J'; // Fill with J
79
i++;
80
}
81
At[i] = 0; // Null character
82
83
put(); // Writes to Hard disk
84
85
i = 0;
86
while(i < 511){
87
At[i] = 0; // Clears the content
88
i++;
89
}
90
}
91
else if(strcmp(CMD , cmd9)){
92
blockAddr = 0;
93
get();
94
printString(At);
95
}
96
else if(strcmp(CMD , cmd10)){
97
int stone = 15000;
98
while(stone--){
99
scream(stone);
100
}
101
}
102
103
}
104
105
void vid(){
106
char clr = 'A';
107
while(1){
108
int i = 0;
109
while(i < 2 * 80 * 25){
110
*(TM_START + i) = clr;
111
clr++;
112
i++;
113
}
114
}
115
}
116
117
void put(){
118
write();
119
}
120
121
void get(){
122
read();
123
}
124
125
void play(unsigned int fr) {
126
unsigned int Div;
127
unsigned char tmp;
128
129
Div = 1193180 / fr;
130
outportb(0x43, 0xb6);
131
outportb(0x42, (unsigned char) (Div) );
132
outportb(0x42, (unsigned char) (Div >> 8));
133
134
tmp = inportb(0x61);
135
if (tmp != (tmp | 3)) {
136
outportb(0x61, tmp | 3);
137
}
138
}
139
140
void stop() {
141
unsigned char fr = inportb(0x61) & 0xFC;
142
outportb(0x61, fr);
143
}
144
145
void scream(int fr) {
146
play(fr);
147
int stone = 0xffff;
148
while(stone--);
149
stop();
150
}
compile and run the program. Then type PLAY
to play the sound.
Please note that you need to include the previous compile.bat
file for the audio to generate. Audio will not be played if qemu is not called with -soundhw pcspk
.
https://wiki.osdev.org/PC_Speaker
https://wiki.osdev.org/Sound
https://wiki.osdev.org/Sound_Blaster_16
This section is dedicated for advanced development. We will give some small descriptions for each topic and relevant reference links. Please note that this won’t give you a complete idea as each topic usually requires an entire Book.
The technology used to communicate with CD-ROM
is ATAPI
. It uses the Packet Interface of the ATA6
or higher standard command set. ATAPI
give commands needed for controlling CD-ROM
drive and Tape
drive.
The command set for ATAPI
includes one for Reading and Writing Sectors , Additional features like setting CD SPEED
, Read Track Information etc and more….
https://wiki.osdev.org/ATAPI
USB
was introduced with the intention of replacing custom Hardware interfaces, and to simplify the configuration of these devices. The official USB
specification might be hard to read and may demotivate you from implementing a driver for that. Anyway it is a Must-To-Implement
feature if you have the plan to support USB Drives or the “Pen Drives”.
USB
uses serial data transfer methods (And so the name Universal Serial Bus
) and it’s different version support different transfer speeds.
https://en.wikipedia.org/wiki/USB
https://wiki.osdev.org/USB
Networking
also is a large topic as it involves communicating between Network interfaces(We need to implement a lot of drivers for different Interfaces including Wired and Wireless).
When a driver for these Network cards are developed , we need to implement some Program for the communication protocol like TCP
, UDP
etc….
https://wiki.osdev.org/Intel_Ethernet_i217
https://en.wikipedia.org/wiki/Transmission_Control_Protocol
https://en.wikipedia.org/wiki/User_Datagram_Protocol
Paging
is a feature provided by x86 cpu which allows use of hard disk to temporarily store Data intended to be stored in ram and when needed take it back. This practically allows a program to utilize memory more than what a system have.
But this operation is little slower as a hard disk actually works at low speed. A solution for this is to use an SSD
.
https://en.wikipedia.org/wiki/Memory_paging
https://wiki.osdev.org/Paging
Global Descriptor Table
or GDT
is a table which contains information about memory areas and who can access those. This is very essential as an absense of this feature could allow user mode programs to do things like editing the kernel routines which results in complete system takedown.
We have already implemented gdt with the lgdt
assembly instruction.
https://en.wikipedia.org/wiki/Global_Descriptor_Table
https://wiki.osdev.org/Global_Descriptor_Table
https://wiki.osdev.org/GDT_Tutorial
Interrupt Descriptor Table
or IDT
is a mechanism implemented in x86
cpu which allows an operating system to capture events and do necessary operations. The events include Keyboard Input , Mouse Input etc… and also includes software defined interrupts which in x86 cpu occurs during things like when executing the int
command.
An absense of feature like Interrupt Descriptor Table
will make the system inefficient at working. This is because in this case , our only solution will be to ask for events every time to these devices.
https://wiki.osdev.org/Interrupts
https://wiki.osdev.org/Interrupt_Descriptor_Table
https://wiki.osdev.org/Interrupt_Service_Routines
Programmable Interval Timer
or PIT
is a counter that generates an interrupt when it reaches a programmed count. Things such as a Delay Operations
could be implemented using this.
https://en.wikipedia.org/wiki/Programmable_interval_timer
https://wiki.osdev.org/Programmable_Interval_Timer
https://en.wikipedia.org/wiki/GNU_GRUB
https://wiki.osdev.org/GRUB
https://www.howtogeek.com/56958/htg-explains-how-uefi-will-replace-the-bios/
https://wiki.osdev.org/UEFI
https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface
There are many resources online to move further. You now have an idea how things works , You can utilize it to move further. Here’s some resources
https://littleosbook.github.io/book.pdf
http://www.brokenthorn.com/Resources/OSDevIndex.html
http://wyoos.org/Sources/index.php
We have travelled a small road till here. This book aims for beginners to get started in os development and to give an idea about how things work basically. Getting good fluency in this topic need doing some projects.
Everyone have to put effort when learning a new topic. Not understanding for the first time is not a bad thing. Consistency and Patience will surely lead you to success.
At the end of time , i want every one to be successful in the Journey.
Wishing You Good Luck!!!
Thank You!!