Monday, March 12, 2012

Identifying computers behind NAT with plotpcap

Following on from my last post, Identifying computers behind NAT with pyflag, I've made a standalone script, plotpcap, that can produce similar graphs without needing to install pyflag.

The results aren't as pretty, and you miss out on some of pyflag's analytical tools (such as filtering streams by user agent). On the other hand, you gain the ability to filter your output with tcpdump-style filter strings, and with a little pcap preprocessing in tshark you can perform almost all of the same comparisons.

plotpcap requires the Python modules dpkt, pcap (from pypcap) and matplotlib. I used the versions available from the Ubuntu 10.04 repository, but other versions will probably work too.
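The core idea (read each packet, pull out the IPv4 Identification field, pair it with the packet number) can be sketched without any third-party modules. This is not plotpcap's code: instead of dpkt/pypcap it parses the classic microsecond-resolution libpcap file format directly with `struct`, and it assumes an Ethernet (linktype 1) capture, skipping VLAN-tagged and non-IPv4 frames.

```python
import struct
from typing import Iterator, Tuple

def read_pcap(data: bytes) -> Iterator[bytes]:
    """Yield raw frames from a classic (microsecond) libpcap capture held in memory."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":      # written on a little-endian machine
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":    # written on a big-endian machine
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    linktype = struct.unpack_from(endian + "I", data, 20)[0]
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet (linktype 1) captures")
    offset = 24                            # skip the 24-byte global header
    while offset + 16 <= len(data):
        # per-record header: ts_sec, ts_usec, incl_len, orig_len
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len

def ipids(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (packet_number, ipid) for each IPv4 frame, numbering from 1 like Wireshark."""
    for number, frame in enumerate(read_pcap(data), start=1):
        # EtherType 0x0800 (bytes 12-13) means IPv4; 14 + 20 bytes covers both headers
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        # the Identification field is bytes 4-5 of the IPv4 header, big-endian
        yield number, struct.unpack_from("!H", frame, 14 + 4)[0]
```

To get a plot like the ones in this post, you could pass the resulting (number, ipid) pairs to matplotlib's `scatter`.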

Here's some output generated from the same example data as the last post:
IPID versus Packet Number (note that without stream highlighting it gets a bit hard to read)
IPID versus Packet Number after excluding packets with TCP timestamp options (ipid2)
TCP Timestamps versus Packet Number
If you want to reproduce some of the tricks from the last post, you can apply Wireshark display filters to the pcap and then run the result through plotpcap. For example:

tshark -r test.pcap -w test_chrome.pcap -R "http.user_agent contains Chrome"
python plotpcap.py test_chrome.pcap number ipid


Produces something like:
IPID versus Packet Number after matching the wireshark display filter "http.user_agent contains Chrome"

Monday, February 20, 2012

Identifying computers behind NAT with pyflag

I've been a bit busy recently as I'm preparing to move across the world to the US to work at a small Internet company in the SF Bay Area. In the meantime, though, my current employer has been kind enough to let me contribute some of the code we have written back to the pyflag project (the link goes to my GitHub page, which has a fork of the project, as the upstream site pyflag.net is down right now). Update: an alternate version (without the feature described below) is available on Google Code.

The new features centre on identifying computers that are lumped together behind a network address translation (NAT) gateway. The idea is that if you can identify the computers behind the NAT gateway, you can attribute traffic to a specific system rather than only to the network itself. The implementation consists of visualisation tools in pyflag that let you plot certain packet header fields against packet number or time.

Here's an example:
IPID field plotted against PCAP packet number
The plot takes the Identification field from each packet's IP header and plots it against the PCAP packet number (pyflag also supports plotting against time). It looks like a big mess, but you can see some lines and perhaps some patterns in there. The IPID field is used to group the fragments of a packet together for reassembly, and it is generally left untouched by NAT gateways. Usefully, different networking stacks have different strategies for picking IPID values.

In my anecdotal (non-scientifically determined) experience:

  • Windows machines start at 0 when the computer boots and increment the value for each packet sent, up to 2^16, and then start again. In some cases it seems to wrap at 2^15, which suggests a signed integer problem to me, but I haven't conclusively figured out which versions it happens on. Additionally, I've read (but not seen) that some versions of Windows send the field in host byte order rather than network byte order.
  • Linux machines pick a random number for the start of the connection and then increment the value for each subsequent packet of the connection. I've heard (but again not seen) that packets with the Don't Fragment bit set get their IPID set to 0 on Linux.
  • BSD machines (including Mac OS X) pick a random number for every packet.
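To make the three behaviours concrete, here is a rough sketch of how one might score a run of IPIDs taken from a single flow or host. The thresholds and the "small forward step" rule are my own assumptions for illustration, not pyflag's logic: counter-based stacks (the Windows and per-connection Linux cases) mostly step forward by a small amount modulo 2^16, while per-packet random IPIDs (the BSD case) jump all over the range.

```python
from typing import Sequence

def forward_delta(prev: int, cur: int, modulus: int = 1 << 16) -> int:
    """Distance from prev to cur moving forward, allowing for wraparound."""
    return (cur - prev) % modulus

def looks_sequential(ipids: Sequence[int], max_step: int = 64,
                     threshold: float = 0.8, modulus: int = 1 << 16) -> bool:
    """Guess whether a run of IPIDs comes from an incrementing counter.

    max_step and threshold are arbitrary illustrative values.
    """
    if len(ipids) < 2:
        return False
    steps = [forward_delta(a, b, modulus) for a, b in zip(ipids, ipids[1:])]
    small = sum(1 for s in steps if 0 < s <= max_step)
    return small / len(steps) >= threshold
```

The `modulus` parameter can be set to `1 << 15` for the Windows versions that seem to wrap early.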
So looking back at our example, we can see a haze of short lines and also a couple of longer lines, which suggests that we might be looking at one or more Linux boxes along with one or more Windows boxes. To test this theory I looked for user-agent strings in web traffic and found the following:

User-Agent strings present in the sample PCAP file
Based on those user-agent strings, it looks like there is at least one Ubuntu system and one Windows system. Also of note is the presence of Java user-agent strings, as well as Transmission (the Ubuntu BitTorrent client).

If we revisit our previous IPID plot and tell pyflag to colour all streams with a Chrome/Windows user-agent string blue, we get the following:

IP ID versus PCAP number with Chrome on Windows streams highlighted
From this it becomes clear that there are two distinct lines of IPID growth, which implies that there are two Windows systems behind this NAT gateway: one was active for longer and even sent enough packets that its IPID value wrapped. Knowing the shape of these lines means you can associate other traffic (perhaps traffic with no distinguishing application-layer features, such as encrypted streams) with a specific computer, and with any metadata gleaned from other application protocols (like HTTP).
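One simple way to do that association is to track the last IPID seen for each known host and hand each new packet to whichever host's counter it most plausibly continues. The sketch below is a hypothetical greedy tracker of my own, not pyflag code; `max_gap` is an assumed tolerance for packets the capture missed or that the host sent elsewhere, and hosts are seeded from packets already attributed by other means (such as a User-Agent string).

```python
from typing import Dict, Optional

class IpidTracker:
    """Greedily attribute packets to hosts with incrementing IPID counters."""

    def __init__(self, max_gap: int = 256) -> None:
        self.max_gap = max_gap
        self.last: Dict[str, int] = {}   # host label -> most recent IPID

    def seed(self, host: str, ipid: int) -> None:
        """Register a host using a packet already attributed to it."""
        self.last[host] = ipid

    def assign(self, ipid: int) -> Optional[str]:
        """Return the host whose counter this IPID most closely follows, if any."""
        best, best_gap = None, self.max_gap + 1
        for host, prev in self.last.items():
            gap = (ipid - prev) % (1 << 16)   # forward distance, wrapping at 2^16
            if 0 < gap < best_gap:
                best, best_gap = host, gap
        if best is not None:
            self.last[best] = ipid            # advance that host's line
        return best
```

Packets that match no line (a `None` result) are candidates for the random-IPID or per-connection stacks rather than the Windows hosts.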

To make this even clearer, there's another header field to consider, this time in the TCP header. There is an optional TCP header field, the timestamp option (defined by RFC 1323), which is used to measure packet round-trip times. By default Windows systems omit this option while most other systems include it (I've read that Windows can be configured to send timestamps, and that in some cases it will use them if the client connecting to it does). This means that if we exclude packets that carry a TCP timestamp, we should be left with only Windows traffic (assuming we exclude non-TCP traffic as well).
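RFC 1323 defines the timestamp as TCP option kind 8 with a length of 10 bytes: a 4-byte TSval followed by a 4-byte TSecr. A minimal sketch of the "does this packet carry a timestamp?" test, given the raw TCP header bytes, might look like this; it's my own illustration rather than pyflag's parser.

```python
import struct
from typing import Optional, Tuple

def tcp_timestamp(tcp_header: bytes) -> Optional[Tuple[int, int]]:
    """Return (TSval, TSecr) from a TCP header's options, or None if absent."""
    data_offset = (tcp_header[12] >> 4) * 4     # header length in bytes
    options = tcp_header[20:data_offset]
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                 # End of Option List
            break
        if kind == 1:                 # No-Operation (padding)
            i += 1
            continue
        if i + 1 >= len(options):
            break                     # truncated option
        length = options[i + 1]
        if length < 2:
            break                     # malformed; stop rather than loop forever
        if kind == 8 and length == 10 and i + 10 <= len(options):
            return struct.unpack("!II", options[i + 2:i + 10])
        i += length
    return None
```

Keeping only TCP packets for which `tcp_timestamp()` returns `None` gives the candidate Windows traffic described above.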

IPID versus PCAP number for Chrome user-agents, minus packets that have a TCP timestamp
Filtering out packets with the TCP timestamp option removes most of the background packets. The remaining packets that don't fall on the lines are likely parser failures, or packets generated by a Linux box that lack a timestamp value for one reason or another (more investigation is required).

So we're convinced that there are two Windows systems on the network and some as-yet undetermined number of Linux systems. If we change our filter to highlight Firefox on Linux and then plot IPID, we get something that looks like this:

IPID versus PCAP number for Firefox sessions on Linux
The things to note here are that the IPID values change dramatically between connections, that HTTP traffic generally seems to make up a minority of the non-Windows traffic, and finally that we're no closer to determining how many Linux systems are present. However, if we consider the TCP timestamp field for a moment, we learn that it's generally determined as:

From: Identifying hosts with natfilterd
The interesting part here is that wallclock - boottime should be unique to each host that uses the TCP timestamp option, and it should increase in a predictable fashion. So if we graph the TCP timestamp value of each packet against its PCAP number, we get:

TCP Timestamp value versus PCAP packet number (Firefox/Linux traffic highlighted)
Again we can see that the Firefox traffic accounts for only a minority of packets, and we also see two distinct lines in the first half of the plot. These two lines suggest that there are two Linux systems, and the line fragment at the end probably represents either a reboot of one of the systems or the appearance of a new one. It isn't a wrap: the timestamp values are 32-bit numbers, and the values we see peak at around 2^18.
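Both halves of that reasoning can be checked numerically. If we read the relationship as roughly TSval = rate * (wallclock - boottime) for some unknown per-host tick rate (my simplification; real stacks can add a constant offset, and some newer ones randomise it per connection), then a least-squares line through one host's (capture time, TSval) points gives the tick rate as the slope and a boot time where the line crosses zero. Treat that boot time as a label for the line rather than a literal boot time. The two helper functions cover the wrap arithmetic; the tick rates you feed them are assumptions, since I didn't measure the rate in this capture.

```python
from typing import List, Tuple

def fit_clock(points: List[Tuple[float, int]]) -> Tuple[float, float]:
    """Least-squares fit of TSval = rate * (t - boot) for one suspected host.

    points are (capture_time_seconds, tsval) pairs; returns (rate_hz, boot_time).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    if var == 0:
        raise ValueError("points need distinct capture times")
    rate = cov / var                      # slope: timestamp ticks per second
    if rate == 0:
        raise ValueError("TSval does not advance over these points")
    intercept = mean_v - rate * mean_t
    boot = -intercept / rate              # capture time at which the line hits TSval 0
    return rate, boot

def days_until_wrap(rate_hz: float, bits: int = 32) -> float:
    """Days for a counter ticking at rate_hz to overflow an unsigned field of this width."""
    return (1 << bits) / rate_hz / 86400

def minutes_to_reach(tsval: int, rate_hz: float) -> float:
    """Minutes of counting from zero needed to reach tsval at rate_hz."""
    return tsval / rate_hz / 60
```

With illustrative rates, `days_until_wrap(1000)` is about 49.7 days and `minutes_to_reach(2**18, 100)` is about 43.7 minutes, so values peaking around 2^18 are nowhere near a 32-bit wrap. Two hosts booted at different times should give similar fitted rates but clearly different boot values.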

So at this point I'm convinced that there are two Linux systems and two Windows systems, that most of the Windows packets are HTTP traffic (using Chrome), and that while there is Linux HTTP traffic, it accounts for only a small fraction of the Linux-related packets. For the remainder of the Linux traffic, I'd guess that at least one of the systems is transferring files using BitTorrent, based on the Transmission user-agent seen earlier. Maybe if we plot the traffic with the Transmission user-agent we'll be able to tell which computers were running BitTorrent:


TCP Timestamp versus PCAP Packet Number for the user-agent "Transmission"
At first this looks good: the line with the lower timestamp values is associated with Transmission and the higher one is not. Unfortunately, this plot is ambiguous because the third line section is also associated with Transmission traffic, and that line could easily belong to the top line section (after a reboot). If instead we ask pyflag to generate a table containing only traffic that is not to or from ports 80 or 53 (to eliminate HTTP and DNS), we're left with a lot of connections between high ports transferring lots of encrypted-looking data to our NAT gateway address, which fits the hypothesis of BitTorrent traffic. When we plot the timestamp values again and highlight any packet from our not-HTTP/not-DNS table, we get the following:

TCP Timestamp versus PCAP number with non-HTTP/non-DNS traffic highlighted
Combining this plot with some analysis of the ports and stream sizes seen, I'm now reasonably confident that both observed Linux hosts are downloading files over BitTorrent. I'm equally convinced that the Windows systems are not using BitTorrent, or at least that they don't generate a significant level of BitTorrent traffic during this packet capture.
The little demo above is contrived, but I have found this kind of analysis really useful for characterising how a network is used. The example was constructed from five virtual machines: two running Windows XP, two running Ubuntu 10.04, and a NAT gateway running Ubuntu 10.04 that used iptables/netfilter to do the NATing. And just in case you were wondering, the Windows machines were watching YouTube (in particular Nyan Cat and Techno Viking) while the Ubuntu systems were each using BitTorrent to download Ubuntu images (12.04 alpha for different architectures).


Future Work
  • Spring cleaning of the pyflag source (it's a little annoying to build and use right now)
  • More options on what to graph (maybe a system for generically plotting table information)
  • Ability to choose what to highlight based on the reverse side of a stream
  • Implementing a minimal version of this visualisation outside of pyflag (Done! See Identifying computers behind NAT with plotpcap)

Related Work and Further Reading
Now that I've got the links handy, I thought I'd also point to Michael Cohen's work. Michael is one of the authors of pyflag (project lead is probably a better description), and it's his ideas that led to the implementation of IP ID processing in pyflag.

Monday, January 2, 2012

Yet Another First Ascension Post

I was going through the pages of an old defunct blog of mine, saw this image and thought that I would repost it for old times' sake. This is one of my proudest computer gaming moments of all time (from October 2009).


Tuesday, September 6, 2011

Something you should know about talloc

Talloc is an excellent memory management system for C that provides hierarchical memory pools with other cool tricks like destructors. It's written by Tridge for Samba and I really like it. If you are writing a complex system in C you could do worse than to replace your calls to malloc with calls to talloc.

So that's talloc, but the thing you really should know about talloc is right there at the bottom of the project page. In particular:


when using talloc_enable_leak_report(), giving directly NULL as a parent context implicitly refers to a hidden "null context" global variable, so this should not be used in a multi-threaded environment without proper synchronization.

I recently spent many days hunting down a bug that would have been much easier to find if I had read the line above. Without realising it, I was sharing contexts between threads all over the place, and very, very rarely there'd be a synchronization problem that led to a null pointer dereference.

By the way, talloc_enable_leak_report() is an excellent feature of talloc. Excellent.

Monday, August 8, 2011

Authenticode and Antivirus Detection part 2

After Shane's comments on the Authenticode and Antivirus Detection post, I thought I'd run some more tests. I wanted to figure out how much of the observed detection difference was because some extra bytes had been added and how much was due to special handling of signed binaries.

I found an archive of malware online and created four sets of samples. Set one was the malware without any changes, set two was the binaries signed with the TEST1 certificate, set three was signed with a TEST2 certificate that was similar to TEST1 but valid only from 1975 to 2009 (so it had already expired), and set four had a random 32-byte blob appended to the end. Using the VirusTotal API and Bryce Boe's Python script, I ran each set against the VirusTotal antivirus suite.

The resulting statistics are here, showing the number of AV positives in the format:
 "HASH [SET1, SET2, SET3, SET4] [SET1 - SET2, SET1 - SET4]"

And here are the first 10 entries (ordered by decreasing "SET1 - SET2" value):

DB1D5E...34573 ['28', '10', '16', '22'] ['18', '6']
FDFB86...1FE0C ['34', '18', '18', '27'] ['16', '7']
6D48A7...F4880 ['36', '20', '22', '33'] ['16', '3']
CA9C3E...ED072 ['31', '16', '16', '23'] ['15', '8']
8798FA...8755B ['35', '20', '20', '32'] ['15', '3']
1011ED...0DB18 ['35', '20', '20', '33'] ['15', '2']
DA01D0...C899D ['31', '17', '16', '28'] ['14', '3']
CC3B7D...228D1 ['37', '23', '23', '34'] ['14', '3']
CADD90...CE9C4 ['35', '21', '21', '31'] ['14', '4']
B6BBE8...8CD10 ['32', '18', '18', '29'] ['14', '3']
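The two difference columns are per-sample subtractions, and the table is ordered by the first one. A minimal sketch of that bookkeeping (my reconstruction for illustration, not Bryce Boe's script or the code that produced the table) using integer counts rather than the quoted strings shown above:

```python
from typing import Dict, List, Tuple

Row = Tuple[str, List[int], List[int]]

def summarise(results: Dict[str, List[int]]) -> List[Row]:
    """Turn per-hash [SET1, SET2, SET3, SET4] positive counts into report rows.

    Each row is (hash, counts, [SET1 - SET2, SET1 - SET4]), sorted by decreasing
    SET1 - SET2, i.e. the drop in detections caused by signing with TEST1.
    """
    rows: List[Row] = []
    for digest, counts in results.items():
        set1, set2, _set3, set4 = counts
        rows.append((digest, counts, [set1 - set2, set1 - set4]))
    # sort is stable, so ties keep their input order
    rows.sort(key=lambda row: row[2][0], reverse=True)
    return rows
```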

General observations:
  • Adding either an Authenticode signature or random data defeated several engines
  • The signing certificate's validity very rarely influenced the score
  • For some reason, adding the random data occasionally resulted in more detections, and since the same data was added to each sample, I'm not sure what happened there.
  • This primarily tests the AV signature engines, not their runtime or heuristic scanners
  • The VirusTotal API limit of 20 requests each 5 minutes sounds like a lot until you run tests like this.
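Staying under a limit like 20 requests per 5 minutes is easiest with a small client-side throttle. This is a generic sliding-window sketch of my own, not part of Bryce Boe's script or the VirusTotal API; the limit values are the ones quoted above and may have changed since. Call `wait()` before each request. The clock and sleep functions are injectable so it can be tested without actually waiting.

```python
import time
from collections import deque
from typing import Callable, Deque

class RateLimiter:
    """Block so that at most max_calls happen in any rolling window of period seconds."""

    def __init__(self, max_calls: int = 20, period: float = 300.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()   # start times of recent calls

    def wait(self) -> None:
        """Return once another call is allowed, recording it in the window."""
        while True:
            now = self.clock()
            # forget calls that have aged out of the window
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # sleep until the oldest call in the window expires
            self.sleep(self.period - (now - self.calls[0]))
```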
Really, what I've learnt from this is that AV signatures are even more fragile than I realised. To get a proper look at how AV treats Authenticode-signed binaries, I think I'd need to evaluate all of each AV product's modules, not just the signature engine.

Saturday, August 6, 2011

Tavis Ormandy's Sophail paper

On the topic of antivirus, Tavis Ormandy has recently released a paper looking into the internals of Sophos. It's quite scathing and very interesting; check it out: Sophail: A Critical Analysis of Sophos Antivirus

Authenticode and Antivirus Detection

It turns out that many antivirus engines whitelist Authenticode-signed binaries regardless of the trustworthiness of the signature. Here's an experiment I performed; feel free to play along at home (remember to be careful when working with malware).

Step 1: Find some malware
This was actually the most time-consuming step. A lot of places talk about malware and offer large archives of malware samples to download, but even so it took me a good 15 minutes to find a malicious Windows executable that I could download without a password, registration or other nonsense. In the end I found a site that lists live drive-by download sites, and I grabbed an EXE before that particular malware host went down. Sadly I can't find the link to the index site I was using, but I'm sure a little bit of Googling will let you retrace my steps.

I ended up with freedom.exe md5sum: ba87b562c829b7095bfb9e0db7a39890
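If you retrace these steps, it's worth confirming the hash of whatever you download before going further (and never executing it). `md5sum` does the job on Linux; here is an equivalent minimal sketch using Python's `hashlib`, reading in chunks so large files don't have to fit in memory.

```python
import hashlib

def md5_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_file("freedom.exe") == "ba87b562c829b7095bfb9e0db7a39890"` would only be true for the same sample I used.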

Step 2: Confirm that it is detected by Antivirus
For this to work you need to know that your malware sample is detected by antivirus engines, so I recommend submitting it to VirusTotal or a similar service. Alternatively, if you have the resources, run it against your local battery of antivirus installs.

Freedom.exe was detected under a variety of names; Microsoft Security Essentials calls it Trojan:Win32/Danginex. In total, 36 of the 43 engines (83.7%) considered Freedom.exe malware.


Step 3: Generate a code signing certificate
I don't have a proper code signing certificate handy, so I generated a self-signed certificate for the test. I used makecert.exe and pvk2pfx.exe from the Windows SDK 7.1 with the following commands:


makecert -r -pe -$ individual -n CN=TEST1 -sv test1.pvk test1.cer
pvk2pfx -pvk test1.pvk -spc test1.cer -pfx test1.pfx


Step 4: Sign the malware sample
Copy the sample to a new filename and then use signtool.exe to add an Authenticode signature saying that TEST1 is responsible for this file.


signtool sign /f test1.pfx freedom-signed-test1.exe
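To confirm that signing actually added a signature, you can look at the PE header. The signature lives in the certificate table, which is data directory entry 4 (the "Security" directory), and unlike the other directory entries its address is a raw file offset rather than an RVA. Here is a stdlib-only sketch for locating it; it's my own helper, not a Windows SDK tool, and it does not validate the signature in any way.

```python
import struct
from typing import Optional, Tuple

SECURITY_DIR = 4   # IMAGE_DIRECTORY_ENTRY_SECURITY

def authenticode_location(pe: bytes) -> Optional[Tuple[int, int]]:
    """Return (file_offset, size) of the PE certificate table, or None if unsigned."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ header")
    pe_offset = struct.unpack_from("<I", pe, 0x3C)[0]        # e_lfanew
    if pe[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20                                  # skip signature + COFF header
    magic = struct.unpack_from("<H", pe, opt)[0]
    if magic == 0x10B:        # PE32: data directories start 96 bytes in
        dirs = opt + 96
    elif magic == 0x20B:      # PE32+: data directories start 112 bytes in
        dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")
    count = struct.unpack_from("<I", pe, dirs - 4)[0]        # NumberOfRvaAndSizes
    if count <= SECURITY_DIR:
        return None
    offset, size = struct.unpack_from("<II", pe, dirs + 8 * SECURITY_DIR)
    return (offset, size) if size else None
```

On a signed file you would typically expect `offset + size` to land at or near the end of the file, since the certificate table is appended rather than mapped into memory.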






Step 5: See what AV thinks of this new file
Submit your new file to VirusTotal and see what happens. In the case of Freedom.exe the detection rate fell from 83.7% to 27.9% (12/43). Most of the big names in the AV community (with a couple of notable exceptions) were quite happy to ignore Freedom.exe once it had been signed.


The antivirus engines that changed their minds about freedom.exe (24 in total) are:
AhnLab-V3, AVG, BitDefender, CAT-QuickHeal, Comodo, Emsisoft,
F-Secure, Fortinet, Ikarus, K7AntiVirus, McAfee,
McAfee-GW-Edition, Microsoft, Norman, nProtect, PCTools, Rising,
Sophos, Symantec, TheHacker, TrendMicro, TrendMicro-HouseCall,
VIPRE, ViRobot

Notably, Kaspersky flagged both the original and the signed samples as Trojan-Clicker.Win32.Agent.shx, while ClamAV was among the 7 engines that flagged neither sample.

Conclusion: What have we learnt?
Signed executables are more likely to be considered benign by antivirus engines. They are probably excluded by policy for performance reasons, but it is possible (though unlikely) that the addition of the Authenticode block at the end of the file is instead disrupting the signatures used by the engines. I hope that in future, vendors that exclude signed binaries will at least check whether the certificate used to sign the binary is trusted.

Thursday, May 26, 2011

GitHub additions!

I've ported some of my old projects over to git and uploaded them to github.
A much better solution than hosting raw source files on my web server!


The projects that have been ported:
talklikewarren - A Twitter bot that posts things that sound like Warren Ellis.
fakemiddleman - A Twitter bot that posts things that sound like The Middleman.
hottest100 - A Python script that created a live music video channel out of The Triple J Hottest 100.
top1m - A Squid redirector that prevents clients from visiting sites outside of the Alexa top 1 million.
twitbot - An example of how one might use Twitter as a channel for command and control of malware.



Saturday, May 14, 2011

py360 - A speed increase (mostly)

I've updated py360 (from the patch note):

Changed the Partition class from preprocessing the entire partition during its constructor, instead it now will resolve files and directories on demand and store the results for later. Basically trading precomputation for memoization. gamertags.py runs about 90x faster, report360.py runs about the same (since it touches every file) and mounting is about 100x faster. These improvements are at the cost of all first time reads being slightly slower but no wasted preprocessing is done.


What isn't mentioned is that this change also makes it much more likely that corrupted partitions will mount (which is why I started looking at it in the first place). gamertags.py and report360.py have also been updated to work with the new behaviour. The main difference is that partition.allfiles no longer necessarily returns all files, only the ones processed so far; use partition.walk() to get all files. The old behaviour is still available by passing precache=True to the Partition constructor.
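The trade of precomputation for memoisation looks roughly like this. The class below is a toy, hypothetical stand-in (not py360's real Partition class); `read_dir` is an assumed callback that returns (name, is_dir) pairs for a directory, where a real parser would decode directory entries from disk. It shows why `allfiles` only holds what has been touched so far, why `walk()` sees everything, and what `precache=True` restores.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Entry = Tuple[str, bool]   # (name, is_dir)

class LazyPartition:
    """Toy partition that resolves directories on demand and memoises the results."""

    def __init__(self, read_dir: Callable[[str], List[Entry]],
                 precache: bool = False) -> None:
        self.read_dir = read_dir
        self.cache: Dict[str, List[Entry]] = {}   # path -> memoised listing
        self.allfiles: Dict[str, bool] = {}       # only entries processed so far
        if precache:
            for _ in self.walk():                 # old behaviour: touch everything up front
                pass

    def listdir(self, path: str) -> List[Entry]:
        """Parse a directory the first time it's requested; later calls hit the cache."""
        if path not in self.cache:
            entries = self.read_dir(path)
            self.cache[path] = entries
            for name, is_dir in entries:
                self.allfiles[f"{path.rstrip('/')}/{name}"] = is_dir
        return self.cache[path]

    def walk(self, path: str = "/") -> Iterator[str]:
        """Yield every file path below path, resolving directories as needed."""
        for name, is_dir in self.listdir(path):
            full = f"{path.rstrip('/')}/{name}"
            if is_dir:
                yield from self.walk(full)
            else:
                yield full
```

A corrupted directory deep in the tree only causes trouble when something actually reaches it, which is one plausible reason lazy resolution helps damaged partitions mount.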

This is a fairly experimental patch so let me know if it doesn't work for you. Next on the dev list remains better output from report360.py (and STFS / XDBF).

I also hope to write some posts unrelated to py360 soon too!

Tuesday, May 10, 2011

py360 - Update!

Thanks to the feedback and test data provided by some excellent people (Thanks Juri, Matt and DC) I've managed to fix several bugs in py360 and it should now be a smoother experience.

The biggest fix is in the STFS parser, which naively assumed that all file listing blocks were contiguous; with this fix, STFS files containing large numbers of files (e.g. heavily used profiles) now work. report360.py should now run cleanly on (hopefully) all images, and the only errors you will see will come from trying to parse deleted files (they are recognisable by the ~ that precedes their name).
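The underlying mistake is a common one: computing the nth block as start + n instead of following the chain of block links. A generic sketch of the fix, with a hypothetical `next_block` lookup standing in for however the container records the next block (this is neither py360's code nor the exact STFS layout, and the end-of-chain marker is an assumption):

```python
from typing import Callable, List

END_OF_CHAIN = 0xFFFFFF   # hypothetical end-of-chain marker for this sketch

def follow_chain(start: int, count: int, next_block: Callable[[int], int]) -> List[int]:
    """Collect up to count block numbers by following next_block links.

    The buggy approach amounts to range(start, start + count), which only works
    when the blocks happen to be allocated contiguously.
    """
    blocks: List[int] = []
    seen = set()
    block = start
    while len(blocks) < count and block != END_OF_CHAIN:
        if block in seen:                       # guard against corrupt, looping chains
            raise ValueError(f"loop in block chain at block {block:#x}")
        seen.add(block)
        blocks.append(block)
        block = next_block(block)
    return blocks
```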

Next up on the dev list is investigating the Unicode parsing problems that occasionally cause extraneous bytes (usually nulls) to appear in the output. Worst case, I plan to change report360.py to remove the null bytes; best case, I find the underlying cause and sort that out.
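Stray nulls like these often come from fixed-width string fields being decoded in full rather than up to their terminator. A minimal sketch of a safer cleanup, assuming the fields are fixed-width UTF-16 big-endian (an assumption for illustration; I haven't confirmed that this is the cause here):

```python
def decode_utf16_field(raw: bytes, encoding: str = "utf-16-be") -> str:
    """Decode a fixed-width UTF-16 string field, dropping everything from the first NUL."""
    if len(raw) % 2:
        raw = raw[:-1]                    # ignore a dangling odd byte
    text = raw.decode(encoding, errors="replace")
    return text.split("\x00", 1)[0]       # stop at the first NUL character
```

Truncating after decoding matters: deleting every 0x00 byte from the raw data would corrupt the text, because in UTF-16 each ASCII character is itself half zero bytes.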

If you would like to help with py360, I'd like to hear from you. I'm really interested in hearing about your experiences with py360, and especially in any errors or inconsistencies that you encounter. If you're a programmer and really keen, feel free to contribute code (especially example programs and bugfixes); I won't turn you away!