Saturday, November 17, 2012

TextHole Source Code

A quick update.
 The source code to TextHole is now available from my github account

Monday, November 5, 2012

Tavis Ormandy's (second) Sophail paper

Tavis has done it again with another paper about the failings of Sophos. This time with several interesting bugs and a working exploit.

Sophail: Applied attacks against Sophos Antivirus

Full Disclosure Post (including link to exploit):
[Full-disclosure] multiple critical vulnerabilities in sophos products

Saturday, September 15, 2012


To experiment with Google Appengine I've created a simple text repository application called TextHole.

TextHole is a basic text repository with the following features:

  • Anonymous uploads and read access
  • Optional Google OAuth2 authentication to allow you to delete or edit your uploads
  • A simple JSON interface makes it easy to post and download text from other sites
To download text via JSON make a GET request to[mesage_id]
The reply will be a JSON dictionary with the following keys:

body: the text body of the message
editable: whether the requestor can modify the text
creation: Creation time of the text
expiry: unix timestamp of the expiry of the text
message_id: the message id of this text
status: True if the request succeeded
error: If status is false more details here

Note: Only status and message_id fields are guaranteed

To upload text via JSON make a POST request to providing a JSON dictionary via the data form field. 

Possible actions are:
New message: The body key is required
Delete: The delete key is required
Edit message: The body and overwrite keys are required

Request dictionary keys:
body: the text body of the new/modified message
delete: the message id of the message to delete
overwrite: the message id of the message to edit
authenticated: if set attribute the new message to the user
expiry: number of seconds (max 1yr) the text is to be valid for

Notes: One of body and delete is required. Overwrite and delete require a valid cookie to be sent with the request.

Reply dictionary keys: 
message_id: the message id of the new/edited/deleted text
status: True if the request succeeded
error: If status is false more details here
expiry: unix timestamp of the expiry of the text
user: username of the owner of the text ("None" for anonymous)

TextHole is missing the following features (maybe coming soon):
  • An index of available texts
  • Text search
  • A javascript client library to make it even easier to integrate with TextHole
  • A way to authenticate via the JSON library
Please play around with TextHole and send me any bugs or ideas that you find. Please remember that everything in TextHole is public, I can see it, and so can everyone else. Finally, please don't use TextHole for evil.

Friday, September 7, 2012

Google acquires VirusTotal

VirusTotal, the online service that will scan uploaded files against dozens of AV engines has been acquired by Google. Here's the announcement. I think this is great, I'm a big fan of VirusTotal and I am looking forward to what Google and VT can come up with together.

Monday, July 30, 2012

Owning Ubisoft

Tavis Ormandy is at it again, this time offhandedly revealing a drive-by code execution vulnerability in Ubisoft's Uplay platform. A malicious website could cause the Uplay browser plugin to execute arbitrary commands on the victim's computer. The attack takes advantage of a feature that allows a visited website to launch a Ubisoft game but does not check that the command that the website issues corresponds to a legitimate game. The issue has been patched in an emergency update from Ubisoft.

Full details:

Sunday, June 17, 2012

Mapping the relationship between YouTube videos

I've been playing John Robertson's YouTube choose your own adventure game The Dark Room and I've been having a great time. However, I need a little help navigating the room (you see, it's dark in there) and so I wrote a program to do a little cartography and create a map of the game.

An abbreviated map of The Dark Room (you can make a complete one with
The map shows the videos that comprise The Dark Room (abbreviated here for space and to limit the spoilers) with the size of each node proportional to the number of views the video has and the colour signifying the number of outbound links from the video. The map was generated by from my ytmap repository and is created by processing the YouTube annotations. Sadly, the annotations aren't available from the YouTube GData API so I process the annotations with regular expressions. The map provides a huge boon in navigating The Dark Room but does not make escaping trivial (it's like John anticipated this kind of analysis).

After creating I realised that this approach could also be used to help me discover YouTube content by seeing who my favourite film makers and musicians linked to and in turn who they linked to. So I created and started by plotting the people in Lindsey Stirling's YouTube video social network and ended up with a giant mess of relationships that quickly got out of control. After adjusting my scripts to build in some limits I ended up with this diagram of her closest neighbors.
Lindsey Stirling's YouTube collaboration social network 
While not the most useful analysis tool I've ever built I've been having fun with it and you should too!

Check out ytmap on github! 

If you like spoilers here is a complete (as far as I know) version of The Dark Room map (use your browser's zoom function to navigate it better). 

Finally, here is a large version of Lindsey Stirling's network and a large version of GeekandSundry's network (Felica Day and Will Wheaton's YouTube channel).

Edit: Viewing the images directly makes them clickable, so that they can take you directly to the YouTube user or video directly.

Monday, March 12, 2012

Identifying computers behind NAT with plotpcap

Following on from my last post Identifying computers behind NAT with pyflag I've made a stand alone script plotpcap that can produce similar graphs without needing to install pyflag.

The results aren't as pretty and you miss out on some of pyflag's analytical tools (such as filtering streams by user agents). On the other hand you do gain the ability to filter your output by tcpdump style filter strings and with a little bit of pcap preprocessing from tshark you can perform almost all the same comparisons.

plotpcap requires the python modules dpkt, pcap (from pypcap) and matplotlib. I used the versions available from the Ubuntu 10.04 repository but other versions are probably good too.

Here's some output generated from the same example data as the last post:
IPID versus Packet Number (note that without stream highlighting it gets a bit hard to read)
IPID versus Packet Number after excluding packets with TCP timestamp options (ipid2)
TCP Timestamps versus Packet Number
If you wanted to do some of the tricks from the last post you can apply wireshark display filters to the pcap and then run it through plotpcap. For example:

tshark -r test.pcap -w test_chrome.pcap -R "http.user_agent contains Chrome"
python test_chrome.pcap number ipid

Produces something like:
IPID versus Packet Number after matching the wireshark display filter "http.user_agent contains Chrome"

Monday, February 20, 2012

Identifying computers behind NAT with pyflag

I've been a bit busy recently as I'm preparing to move across the world to the US to work at a small Internet company in the SF Bay Area. In the mean time though my current employer has been kind enough to let me contribute back some of the code we have written for the pyflag project (the link goes to my github page which has a fork of the project as the upstream site is down right now). Update: An alternate version (without the feature described below) is available on google code

The new features centre around identifying computers that are all lumped together behind a network address translation gateway (NAT). The idea is if you can identify the computers behind the NAT gateway you can attribute traffic to a specific system rather than only down to the network itself. The implementation is some visualisation tools in pyflag that allow you to plot certain packet headers fields against packet numbers or time.

Here's an example:
IPID field plotted against PCAP packet number
The plot takes the IP Identification field from the IP header and plots it sequentially against the PCAP packet number (pyflag also supports plotting against time). It looks like a big mess but you can see some lines and maybe some patterns in there. The IPID field is used to associate fragmented packets together for reassembly and it is generally left untouched by NAT gateways. Usefully different networking stacks have different strategies for picking IPID values.

In my anecdotal (non-scientifically determined) experience:

  •  Windows machines start at 0 when the computer is booted and increment for each packet sent up until 2^16 and then start again. In some cases it seems to wrap at 2^15 which to me suggests a signed integer problem but I haven't conclusively figured out on what versions it happens on. Additionally, I've read (but not seen) that some versions of Windows send the field in host order rather than network byte order.
  • Linux machines pick a random number for the start of the connection and then increment the value for each subsequent packet of the connection. I've heard (but again not seen) that packets with the Don't Fragment bit set get their IPID set to 0 on Linux.
  • BSD machines (including Mac OS X) pick a random number for every packet.
So looking back at our example we can see a haze of small lines and also a couple of longer lines which suggests that we might be looking at one or more Linux boxes along with one or more Windows boxes. To test this theory I looked for any user-agent strings in web traffic and found the following:

User-Agent strings present in the sample PCAP file
Based on those user agent strings it looks like there is at least one Ubuntu system and one Windows system. Also of note is the presence of Java user agent strings as well as Transmission (the Ubuntu Bittorrent client).

If we revisit our previous IPID plot and tell pyflag to colour all the Chrome/Windows user agent string related streams blue we get the following:

IP ID versus PCAP number with Chrome on Windows streams highlighted
From this it becomes clear that there are two distinct lines of IPID growth which implies that behind this NAT gateway are two Windows systems, one which was active for longer and even sent enough packets that the IPID value wrapped. Knowing the shape of these lines means that you can associate other traffic (perhaps traffic with no distinguishing application layer features such as encrypted streams) to a specific computer and any metadata gleamed from other application protocols (like HTTP).  

To make this even clearer there's another header field to consider, this time in the TCP header. There is an optional header in TCP called the timestamp value (defined by RFC1323) which is used to measure packet round trip times. By default Windows systems omit this value while most other systems include it (I've read that Windows can be configured to send timestamps and that in some cases will use timestamps if the client connecting to it uses timestamps). This means that if we exclude packets that have a TCP timestamp we should be left with all Windows traffic (assuming we exclude non-TCP traffic as well).

IPID versus PCAP number for Chrome user-agents, minus packets that have a TCP timestamp
After excluding packets with the TCP timestamp option set most of the background packets have been excluded. The remaining packets that don't fall on the lines are likely parser failures or packets generated by a Linux box that do not have a timestamp value for one reason or another (more investigation is required).

So we're convinced that there are two Windows system on the network and some yet to be determined number of Linux systems, if we change our filter to highlight Firefox on Linux and then plot IPID we get something that looks like this:

IPID versus PCAP number for Firefox sessions on Linux
 The things to note here is that the IPID values change dramatically between connections, also that in general HTTP traffic seems to be in the minority of the non-Windows traffic and finally that we're no closer to determining how many Linux systems are present. However, if we consider the TCP timestamp field for a moment we learn that it's generally determined as:

From: Identifying hosts with natfilterd
The interesting part in this case is that wallclock - boottime should be unique among the hosts that use the TCP timestamp option and it should increment in a predictable fashion. So if we graph the TCP timestamp value of packets versus their PCAP number we get:

TCP Timestamp value versus PCAP packet number (Firefox/Linux traffic highlighted)
Again we can see that the Firefox traffic accounts for only a minority of packets and we also see that there're two distinct lines for the first half on the plot. These two lines suggests that there are two Linux systems and the line fragment at the end probably represents a reboot (and not wrapping because the timestamp values are 32 bit numbers and the values we see are around 2^18 at their highest) of one of the systems or the appearance of a new one.

So at this point I'm convinced that there are two Linux systems and two Windows system and that most of the Windows packets are HTTP traffic (using Chrome) and that while there is HTTP traffic it accounts for only a small amount of the Linux related packets. For the remainder of the Linux traffic I'd guess that at least one of the systems is transferring files using BitTorrent based on the Transmission user-agent that was present before. Maybe if we plot the traffic with the Transmission user-agent we'll be able to tell which computers were running BitTorrent:

TCP Timestamp versus PCAP Packet Number for the user-agent "Transmission"
 At first this looks good, the line with the lower timestamp values is associated with Transmission and the higher one is not. Unfortunately this plot is ambiguous because the third line section is also associated with Transmission traffic and that line could easily belong to the top line section (after a reboot). If instead we ask pyflag to generate a table with only traffic that is not to or from ports 80 or 53 (to eliminate HTTP and DNS) we're left with a lot of connections between high ports transferring lots of  encrypted (looking) data to our NAT gateway address which fits the hypothesis of BitTorrent traffic. When we plot the timestamp values again and highlight any packet from our Not-HTTP/Not-DNS table we get the following:

TCP Timestamp versus PCAP number with non-HTTP/non-DNS traffic highlighted
At this point I'm reasonably confident that both the observed Linux hosts are downloading files over BitTorrent once I combine this plot with some analysis of the ports / stream sizes seen while I'm equally convinced that the Windows systems are not using BitTorrent or at least that there isn't a significant level of BitTorrent traffic observed during this packet capture.
The above little demo is contrived but I have found that this kind of analysis can be really useful in characterising the use of a network. This example was constructed from 5 virtual machines, 2 running Windows XP, 2 running Ubuntu 10.04 and a NAT gateway running Ubuntu 10.04 and using iptables/netfilter to do the NATing. Also, just in case you were wondering the Windows machines were watching youtube (in particular nyan cat and techno viking) while the Ubuntu systems were each using BitTorrent to download ubuntu images (12.04 alpha for different architectures). 

Future Work
  • Spring cleaning of the pyflag source (it's a little annoying to build and use right now)
  • More options on what to graph (maybe a system for generically plotting table information)
  • Ability to choose what to highlight based off the reverse side of a stream
  • Implementing a minimal version of this visualisation outside of pyflag Done! Identifying computers behind NAT with plotpcap

Related Work and Further Reading
Now that I've got the links handy I thought I'd also point at Michael Cohen's work. Michael is one of the authors of pyflag (project lead is probably a better description), and it's his ideas and that lead to the implementation of IP ID processing in pyflag.

Monday, January 2, 2012

Yet Another First Ascension Post

I was going through the pages of an old defunct blog of mine and I saw this image and thought that I would repost it for old times sake. This is one of my proudest computer gaming moments of all time (from October 2009).