Time to poke around the kernel and do a simple bit of reversing.
Whether you are a sysadmin, penetration tester, or reverse engineer, if you don't know about Solaris DTrace you will want to. It allows low-overhead, dynamic instrumentation of the whole system. This includes function boundary tracing (FBT) of pretty much any function in the kernel and any application running on the server.
People have been able to solve complex problems in applications on other platforms simply because the application also ran on Solaris and they could use DTrace on that port of the application.
You can inspect memory in the kernel and application based on events, and much more. You can even get the kernel to lie to userland processes (with filtering to control which processes it lies to).
But we are getting ahead of ourselves.
As an example, of which I'll cover only a very small part here, I had an application with multiple serious issues that even the vendor was struggling to solve.
To identify the problem areas, the most powerful tool I wrote was a single DTrace script. It traced each individual request from the arrival of the initial connection and the three-way handshake using kernel FBT (including when Q = Qmax), through the system call (syscall tracing), on to the listening thread within the Java app, and then the handover to the worker thread (user FBT). It also kept an eye on all Java safepoints and told me what was blocking the threads (filesystem issues – sometimes closing a file descriptor took several seconds). All these stages had issues, and sometimes the Java safepoints aggravated the situation, but the picture it painted was invaluable: we could prove the cause before fixing it, rather than merely point at a probable cause, and we could do it directly on the live system without impacting the application.
To start we will look at monitoring the incoming connection (the first SYN packet), and whilst we are in the listener perimeter (mutual exclusion of the listener in interrupt context) we will report on the value of q0/q/qmax; i.e. we know if a connection is ignored because we can prove q == qmax. Using ndd only gives you a point in time; here we provably show what the settings are at the moment that specific connection is evaluated within the kernel.
It helps to have the Solaris source code (or a version of it) – see https://github.com/illumos/illumos-gate.
Our first DTrace scripts
In source file usr/src/stand/lib/tcp/tcp.c we have tcp_conn_request(). Its first parameter is a tcp_t, which is defined in /usr/include/inet/tcp.h. As a reverse engineer we could figure this out without the source code, but I will leave that as an exercise.
The tcp_t has tcp_conn_req_cnt_q0, tcp_conn_req_cnt_q, and tcp_conn_req_max for Q0, Q and Qmax respectively.
Within this structure we also have a pointer to a struct conn_s (tcp_connp), which is defined in /usr/include/inet/ipclassifier.h. Within tcp_connp there is a union of structs, such that the local port is at u_port.tcpu_ports.tcp_lport.
So, the script would be:
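Something along these lines – a sketch reconstructed from the structure members above, deliberately embedding the assumption (revisited shortly) that arg0 is the tcp_t:

```d
#!/usr/sbin/dtrace -qs

fbt::tcp_conn_request:entry
{
	/* ASSUMPTION: arg0 is the tcp_t of the listener */
	self->tcpq = (tcp_t *)arg0;

	printf("tcp_t = %p\n", (void *)self->tcpq);
	printf("lport = %d\n", self->tcpq->tcp_connp->u_port.tcpu_ports.tcp_lport);
	printf("q0    = %d\n", self->tcpq->tcp_conn_req_cnt_q0);
	printf("q     = %d\n", self->tcpq->tcp_conn_req_cnt_q);
	printf("qmax  = %d\n", self->tcpq->tcp_conn_req_max);
}
```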
If we run the script and then create a new connection to 22/tcp (e.g. using nc or telnet with a destination port of 22) we see this:
So, what is wrong? First, let's comment out all the printfs except the one that prints the address of arg0 and re-try:
If we then look for the address using ndd we see that this doesn't look right:
After a bit of diagnostics it appears that we may have made a false assumption. After all, we are basing the functionality on an open-source version of Solaris, a major release of Solaris sees many significant changes between updates, and I'm running update 9 (on Intel).
If we look at the open-source code we see that all versions have the following as the first test. It is therefore reasonable to expect that the closed-source version also performs that test first (or nearly first).
So, let's run up a kernel debugger and have a look at the assembly of tcp_conn_request(). Here is the truncated and annotated output.
It appears that arg0 may not be a tcp_t after all, but if we look at offset 0x28 from arg0 we will find a pointer to one. There may or may not be a public structure this maps to; I will leave that as an exercise for the reader.
Let's rewrite the setting of self->tcpq to be the following and try again.
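A sketch of that change; the 0x28 offset comes from the disassembly above, so treat it as specific to this particular kernel build:

```d
fbt::tcp_conn_request:entry
{
	/* arg0 is not the tcp_t itself; a pointer to the tcp_t
	 * sits at offset 0x28 from arg0 (found via the debugger). */
	self->tcpq = *(tcp_t **)((uintptr_t)arg0 + 0x28);

	printf("tcp_t = %p\n", (void *)self->tcpq);
	printf("q0    = %d\n", self->tcpq->tcp_conn_req_cnt_q0);
	printf("q     = %d\n", self->tcpq->tcp_conn_req_cnt_q);
	printf("qmax  = %d\n", self->tcpq->tcp_conn_req_max);
}
```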
This time it looks better:
So, Qmax etc. look good, but conn_lport doesn't.
We know from ndd that the tcp_t address is right, and from tcp_lookup_listener_ipv4() that we are looking in the right place for the local port, so it may be that the metadata DTrace is using doesn't match the actual structure (something is different). Let's go back to the kernel debugger to see if we can find something that could be a "port 22" in the conn_s structure.
This is at offset 266 (0x10a); so let's change this value to the following and re-run.
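That raw read could look like this (a sketch; both offsets were recovered from the debugger session above and are build-specific):

```d
fbt::tcp_conn_request:entry
{
	/* pointer to the tcp_t at offset 0x28 from arg0 */
	self->tcpq = *(tcp_t **)((uintptr_t)arg0 + 0x28);

	/* read the raw 16-bit value at offset 0x10a into conn_s,
	 * bypassing DTrace's (apparently stale) structure metadata */
	self->lport = *(uint16_t *)((uintptr_t)self->tcpq->tcp_connp + 0x10a);

	printf("lport = %d\n", self->lport);
}
```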
This time we get a value, but it doesn't look right:
As we are on Intel and we are printing a network port, there is a good chance this is in network byte order. Simple to test; we pass it through htons().
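In D that is a one-line change, using the htons() subroutine to swap the bytes for display:

```d
	/* the raw value is in network byte order; swap for display */
	self->lport = *(uint16_t *)((uintptr_t)self->tcpq->tcp_connp + 0x10a);
	printf("lport = %d\n", htons(self->lport));
```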
This time all the values look right:
As this is event-driven and we haven't added any predicates, we can simply try another port to confirm things look OK, e.g. 111/tcp:
And a quick check with ndd confirms this:
A quick note on why q0 is not at least one: we are evaluating the parameters on entry to the function that will update them.
If we take a step back and think about what we have done: we are now dynamically tracing incoming SYN packets caused by a network interrupt, whilst within the mutual exclusion of the listener perimeter in interrupt context, on a running (live) system. We are therefore certain of the values of q0/q/qmax at that point in time. How awesome is that?
In the next part we will update the script so we also report on the origin of the packet. We will also add some predicates to only look at a particular listener.