Following on from Part 1,
we will further enhance our script to trace incoming connections through the
kernel to the application.
Before doing that we will tweak the script to
provide a CSV output. We do this using a BEGIN
probe to print the header and then update the probes. We will also split the
probes. Where there is a match, multiple probes are triggered in the order
listed in the script. We can therefore say we are only interested in a specific
port and then do the heavy lifting later on.
It is also important that you clear variables
once you finish with them so as not to fill up the kernel buffers (you will
loose traces if you don't; given enough probes firing).
#!/usr/sbin/dtrace -Cs
#pragma D option quiet
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/ip.h>
BEGIN
{
printf("TS,Date,Action,Source IP,Source Port,Dest
Port,Q0,Q,Qmax\n");
}
fbt:ip:tcp_conn_request:entry
{
self->tcpq =
*(tcp_t**)((char*)arg0+0x28);
self->lport =
ntohs(*(uint16_t*)((char*)self->tcpq->tcp_connp+0x10a));
}
fbt:ip:tcp_conn_request:entry
/self->lport == $1/
{
this->srcAddr = 0;
this->srcPort = 0;
printf("%d,%Y,%s,0x%08x,%d,%d,%d,%d,%d\n",
timestamp,
walltimestamp, "syn",
this->srcAddr, this->srcPort,
self->lport,
self->tcpq->tcp_conn_req_cnt_q0,
self->tcpq->tcp_conn_req_cnt_q,
self->tcpq->tcp_conn_req_max
);
}
fbt:ip:tcp_conn_request:return
/self->lport/
{
self->tcpq = 0;
self->lport = 0;
}
Whilst in tcp_conn_request()
we don't actually have a connection from the client; the kernel is about to
build one.
It is the message in the call that is actually
the incoming packet to process. IP has dealt with it; now it is TCP's turn.
Remember that the tcpq we have used so far is the listener's connection (the
one that is in the listen state), not the eager's (new connection); so looking
at that is not going to be productive.
We could probe another function or we could
decode the message; tcp_accept_comm()
is called by tcp_conn_request()
to accept the connection providing a
few things are satisfied such as Qmax not been reached. If we are only
interested in connections that aren't dropped due to, e.g. Q==Qmax, then this
would be a good place. However, if we also want to capture failed connections,
we need to do the processing ourselves.
So, let's decode the message :D
Looking at the source for tcp_conn_request()
we can see that arg1 (mp,
the mblk_t)
is the message. Not surprising as this is a STREAMS module. Further down we can
see where the data is and the format of the data. i.e. we have a pointer to the
IP header here.
ipha = (struct ip *)mp->b_rptr;
As mblk_t
is a standard structure we can easily map this within our DTrace script. We can
therefore change the setting of srcAddr
to the following:
this->mblk = (mblk_t
*)arg1;
this->iphdr =
(struct ip*)this->mblk->b_rptr;
this->srcAddr =
this->iphdr->ip_src.s_addr;
If we then run the script and connect to the
server running on 172.16.170.133 from client 172.16.170.1 we get the following:
root@sol10-u9-t4# ./tcpq_ex7.d 22
TS,Date,Action,Source IP,Source Port,Dest Port,Q0,Q,Qmax
180548302609318,2016 Aug 11 14:37:02,syn,0x85aa10ac,0,22,0,0,8
As you can see, the hex address is that of the
destination and not the source. Given that an IP header is a standard format
(defined in rfc791)
this is a bit surprising. So, what is going on?
A
Slight Detour
The IP header is defined in /usr/include/netinet/ip.h.
It is as expected. Whilst I won't go into all the steps to find the cause, here
is the quick version.
The DTrace compiler uses the C pre-processor,
but compiles it's own code for the kernel to use. DTrace should be able to
import C headers to allow you to use names, etc, but clearly something has gone
wrong.
One test I did is to create my own header using
the constituent parts and a key tweak; one struct doesn't use fields (bitmasks)
but both are the same size.
#ifndef MYIP_H
#define MYIP_H
#include <sys/types.h>
typedef struct {
uchar_t c1;
uchar_t c2,c3,c4;
ushort_t s1,s2,s3,s4;
uint32_t src,dst;
} myip_t;
typedef struct {
uchar_t c1_1:4,c1_2:4;
uchar_t c2,c3,c4;
ushort_t s1,s2,s3,s4;
uint32_t src,dst;
} myip2_t;
#endif
Then, if we run this through a simple C program
that just prints the sizes all looks good:
#include <stdio.h>
#include "myip.h"
int main(int argc, char *argv[]) {
printf("myip_t :
%u\n", sizeof(myip_t));
printf("myip2_t:
%u\n", sizeof(myip2_t));
return 0;
}
Which gives the following output:
root@sol10-u9-t4# ./a.out
myip_t : 20
myip2_t: 20
Then within the BEGIN probe of the DTrace we do
the same (not forgetting to include the header):
printf("myip_t :
%d\n", sizeof(myip_t));
printf("myip2_t:
%d\n", sizeof(myip2_t));
Which gives the following output:
root@sol10-u9-t4# ./tcpq_ex8.d -I `pwd` 22
TS,Date,Action,Source IP,Source Port,Dest Port,Q0,Q,Qmax
myip_t : 20
myip2_t: 24
This is the problem. A few more tests show that
the DTrace compiler cannot handle fields (bitmasks) in the C structs correctly,
at least in the situations I've tested (me'thinks a bug). I will leave it as an
exercise for the reader to analyse this further.
Back
to the Task at Hand
As the IP header is well-defined we can just
define the offset without further analysis.
this->srcAddr = *(uint32_t*)((char*)(this->iphdr)+0xc);
This yields the correct IP address:
root@sol10-u9-t4# ./tcpq_ex9.d 22
TS,Date,Action,Source IP,Source Port,Dest Port,Q0,Q,Qmax
181523558976884,2016 Aug 11 14:53:17,syn,0x01aa10ac,0,22,0,0,8
As the header length of the IP header and the
TCP header has the bitmask after the port, we should be able to extract the
port quite easily (setting srcPort
to the result):
this->ihl = 4 *
((int)*((char*)(this->iphdr)) & 0xf);
this->tcphdr =
(struct tcphdr*)((char*)(this->iphdr)+(this->ihl));
this->srcPort =
ntohs(this->tcphdr->th_sport);
This yields the following output:
root@sol10-u9-t4# ./tcpq_ex10.d 22
TS,Date,Action,Source IP,Source Port,Dest Port,Q0,Q,Qmax
181903328660708,2016 Aug 11 14:59:37,syn,0x01aa10ac,40950,22,0,0,8
Which is confirmed via netstat:
root@sol10-u9-t4# netstat -an | fgrep 40950
172.16.170.133.22
172.16.170.1.40950 29312
0 49232 0 ESTABLISHED
Completion
of the handshake
As you can guess, there are various locations
that we can probe the kernel in order to mark events. In the case of the
completion of the three-way handshake, I'm going to go for tcp_send_conn_ind()
in usr/src/uts/common/inet/tcp/tcp_tpi.c.
The key function is actual tcp_rput_data()
in usr/src/stand/lib/tcp/tcp.c
where the state is changed to EST.
Anyway, in tcp_send_conn_ind()
arg0 is the listener and arg1 is a mblk_t
containing a pointer to a struct
T_conn_ind. However, as we have already processed the
original syn packet, we also have an eager tcp_t.
We can extract that, and use the offsets we know to get the address and port of
the peer.
We can combine probes where there is common
processing. In this particular kernel arg0 is the same.
fbt:ip:tcp_conn_request:entry,
fbt:ip:tcp_send_conn_ind:entry
{
self->tcpq =
*(tcp_t**)((char*)arg0+0x28);
self->lport =
htons(*(uint16_t*)((char*)self->tcpq->tcp_connp+0x10a));
}
We can now predicate all the other probes
within the kernel thread to where the port maps what we are after. This
includes a simple way to label the action. We can also change the variable
context from 'this' (lexical scope) to 'self' (thread scope) to pass the
parameters that are in different probes but the same thread.
First, tcp_conn_request(),
which should be familiar:
fbt:ip:tcp_conn_request:entry
/self->lport == $1/
{
self->action =
"syn";
this->mblk =
(mblk_t *)arg1;
this->iphdr =
(struct ip*)this->mblk->b_rptr;
this->ihl = 4 *
((int)*((char*)(this->iphdr)) & 0xf);
this->tcphdr =
(struct tcphdr*)((char*)(this->iphdr)+(this->ihl));
self->srcAddr =
*(uint32_t*)((char*)(this->iphdr)+0xc);
self->srcPort =
ntohs(this->tcphdr->th_sport);
}
Then tcp_send_conn_ind():
fbt:ip:tcp_send_conn_ind:entry
/self->lport == $1/
{
self->action =
"syn-aa";
this->mblk =
(mblk_t *)arg1;
this->tconnind =
(struct T_conn_ind*)(this->mblk->b_rptr);
this->tcpp =
*(tcp_t**)(((char*)this->tconnind)+(this->tconnind->OPT_offset));
self->srcPort =
ntohs(*(uint16_t*)((char *)this->tcpp->tcp_connp+0x108));
self->srcAddr =
*(uint32_t*)((char*)this->tcpp->tcp_connp+0x104);
}
The printout then is common to both, since all
of it is data driven from the previous stages:
fbt:ip:tcp_conn_request:entry,
fbt:ip:tcp_send_conn_ind:entry
/self->lport == $1/
{
printf("%d,%Y,%s,0x%08x,%d,%d,%d,%d,%d\n",
timestamp,
walltimestamp,
self->action,
self->srcAddr,
self->srcPort,
self->lport,
self->tcpq->tcp_conn_req_cnt_q0,
self->tcpq->tcp_conn_req_cnt_q,
self->tcpq->tcp_conn_req_max
);
}
Finally we clear the buffers to tidy up:
fbt:ip:tcp_conn_request:return,
fbt:ip:tcp_send_conn_ind:return
/self->lport/
{
self->tcpq = 0;
self->lport = 0;
self->action = 0;
self->srcAddr = 0;
self->secPort = 0;
}
If we run this on a connection we can see the
two parts. Notice the value of Q0. On entry it is one as we have an embryonic
connection which is about to transition to Q.
root@sol10-u9-t4# ./tcpq_ex11.d 22
TS,Date,Action,Source IP,Source Port,Dest Port,Q0,Q,Qmax
185255847646301,2016 Aug 11 15:55:29,syn,0x01aa10ac,41010,22,0,0,8
185255847845639,2016 Aug 11 15:55:29,syn-aa,0x01aa10ac,41010,22,1,0,8
If we fill up Q so it equals Qmax (using my
test script from a previous blog) and retry, accepting a connection on the
queue part way through to free the kernel backlog we get this, clearly showing
the remote servers tcp retries.
root@sol10-u9-t4# ./tcpq_ex11.d 2000
TS,Date,Action,Source IP,Source Port,Dest Port,Q0,Q,Qmax
185458485779874,2016
Aug 11 15:58:52,syn,0x01aa10ac,50066,2000,0,2,2
185459488194513,2016 Aug 11
15:58:53,syn,0x01aa10ac,50066,2000,0,2,2
185461492294103,2016 Aug 11 15:58:55,syn,0x01aa10ac,50066,2000,0,2,2
185465496397317,2016 Aug 11
15:58:59,syn,0x01aa10ac,50066,2000,0,1,2
185465496517076,2016
Aug 11 15:58:59,syn-aa,0x01aa10ac,50066,2000,1,1,2
As you can see, the max'ed out queue resulted
in the connection taking 185465496517076 minus 185458485779874 = 7010737202 ns
(or 7.01 seconds) to establish the connection.
In the next part we will look at extending this
into the application, using both syscall tracing and userspace tracing.
As always, please can you provide feedback so I
can improve the blog.
No comments:
Post a Comment