PhD Thesis Finally Done

If you’ve seen any of my presentations over the past few years, you’ll know I’ve been introduced as a “PhD Student at Dalhousie University finishing ‘soon’” for quite a while. Finally, ‘soon’ actually happened!

You can see my complete thesis, entitled “A Framework for Embedded Hardware Security Analysis”, on the DalSpace website. It’s been a ton of fun doing the PhD, and I’ve had a lot of help over the years, which I’m very grateful for. For the foreseeable future I’ll be continuing to spin up NewAE Technology Inc. and keeping my ChipWhisperer project alive.

Black Hat Slides – PIN-Protected HD Enclosure / MB86C311A Research

This is a quick post to link to slides from my Black Hat USA 2016 work.

This work builds directly on the work done by Joffrey Czarny & Raphaël Rigo, presented at HardWear.io last year (2015). They discovered the issues with the stream-mode cipher used by all manufacturers on the MB86C311A, and the fact that secrets are stored on the HD itself. Their work is available at:

They have some newer work coming out which looks very interesting, so keep an eye out for that. Anyway, on to my work. The following is a link to my slides:

Brute-Forcing Lockdown Harddrive PIN Codes [Slides]

 

A Low-Cost X-Y Scanner Using a 3D Printer

This summer, our intern Greg d’Eon built an X-Y scanner from a 3D printer as a quick project (by ‘quick’, I mean it took him less than 2 days!). You can see the source code up on GitHub. 3D printers make a nice base for this, as they offer fairly high resolution at fairly low cost. Here’s a quick video:

We’re using it to measure EM emission frequencies over a PCB, but you could also use it for side-channel emissions or fault injection. While the resolution might not be high enough to target specific features on a chip surface, it can still be used for general positioning.
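The scanner’s own source code is on GitHub and isn’t reproduced here. As a rough illustration of the idea only, assuming the printer runs Marlin-style firmware that accepts standard G-code (G90 for absolute positioning, G1 for moves, M400 to wait until a move finishes), a serpentine raster over a board could be generated like this. The function name and parameters are hypothetical:

```python
def raster_gcode(x0, y0, width, height, step, feed=3000):
    """Return G-code that visits a grid of probe positions in a serpentine order.

    Assumes Marlin-style firmware: G90 selects absolute coordinates, G1 is a
    linear move (feed rate F in mm/min), and M400 waits until the move has
    finished so a measurement can be taken with the head at rest.
    """
    nx = int(round(width / step)) + 1
    ny = int(round(height / step)) + 1
    lines = ["G90"]
    for row in range(ny):
        y = y0 + row * step
        # Reverse X on every other row so the head never makes a long return trip
        cols = range(nx) if row % 2 == 0 else range(nx - 1, -1, -1)
        for col in cols:
            x = x0 + col * step
            lines.append("G1 X%.2f Y%.2f F%d" % (x, y, feed))
            lines.append("M400")  # wait for the move to finish, then measure
    return lines


# Example: a 20 x 10 mm area at 1 mm pitch, starting at (50, 50)
program = raster_gcode(50, 50, 20, 10, 1)
print(len(program))  # 463
```

The serpentine order is a design choice to keep moves short; a simple row-by-row scan works too but spends more time travelling back to the start of each row.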

With your EM emissions, you can graph X-Y position vs. amplitude. Here I’ve constrained the range to get an idea of where the 96 MHz emissions are concentrated. It would probably have been more interesting to use a 2D plot with colour overlaid on the PCB design:

em_plot

You can also plot frequency vs. position, with the strength of the signal given by colour. In the following graph the X position is fixed, and only the Y position is varied. You can see the 96 MHz oscillator of the SAM3U microcontroller on the ChipWhisperer-Lite, for example:

650MHz_05
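If you log one spectrum per scan position, turning the data into an X-Y amplitude map at a single frequency (such as 96 MHz) is mostly bookkeeping. A minimal sketch, assuming a hypothetical data layout of one (x, y) position and one amplitude array per point, all sharing the same frequency axis:

```python
import numpy as np


def amplitude_map(points, freqs, spectra, target_hz):
    """Build a 2-D map of amplitude at the frequency bin nearest target_hz.

    points  : list of (x, y) positions visited by the scanner (hypothetical format)
    freqs   : 1-D array of bin frequencies in Hz
    spectra : array of shape (len(points), len(freqs)) of amplitudes
    Returns (xs, ys, grid) where grid[iy, ix] is the amplitude at (xs[ix], ys[iy]).
    """
    freqs = np.asarray(freqs, dtype=float)
    spectra = np.asarray(spectra)
    k = int(np.argmin(np.abs(freqs - target_hz)))  # nearest frequency bin
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    grid = np.full((len(ys), len(xs)), np.nan)      # NaN where nothing was measured
    for (x, y), spectrum in zip(points, spectra):
        grid[ys.index(y), xs.index(x)] = spectrum[k]
    return xs, ys, grid


def hottest_point(xs, ys, grid):
    """Position with the largest amplitude, ignoring unmeasured (NaN) cells."""
    iy, ix = np.unravel_index(np.nanargmax(grid), grid.shape)
    return xs[ix], ys[iy]
```

The resulting grid can be handed to a colour plot such as matplotlib’s `pcolormesh` and overlaid on an image of the PCB layout.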

 

Low-Cost SMD Soldering Setup

This blog post shows some details of my SMD soldering process. It’s based on a longer video I did (linked below) showing the entire soldering process.

Video of Soldering Setup

The following shows me soldering a complete board with a BGA device.

Equipment Used

In the above video, several pieces of equipment are used. The following are some of the important ones.

Reflow Oven

I’m using a T962A reflow oven. I recommend it over the T962, which is a smaller version: the T962A has 3 heat lamps, so it has a more even heat distribution. Be aware that you can’t use the full surface area. I find roughly the middle half works well, though this depends a little on the complexity of the PCB.

I purchased mine from this seller on AliExpress, but check other sellers, as prices change over time. You might turn it on quickly to confirm it works, but before using it much there are some important fixes to make:

  • Remove the masking tape and replace it with Kapton (polyimide) tape. See the instructables post for details.
  • Fix missing ground connections. Some versions have poor grounding between the outer (metal!) case and the wall plug. See the wiki page for a photo of this fix.
  • Update the firmware and add a cold-junction sensor. This is the most complex task: it requires soldering a DS18B20 to the mainboard, then using a USB-Serial adapter to reflash the firmware. See the front page of the T962-Improvements GitHub repo, which has links to the required soldering. There is also an optional fix to quiet the very noisy small fan.
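Why the cold-junction sensor matters: a thermocouple only measures the temperature difference between its hot (measuring) junction and the cold junction where it connects to the board, so the controller has to know the board temperature to get an absolute reading. The sketch below illustrates the correction under stated assumptions: a K-type thermocouple with a linear ~41 uV/°C sensitivity (real firmware would use NIST reference tables), and the DS18B20 reading used as the cold-junction temperature. The names are hypothetical:

```python
# Sketch of cold-junction compensation (CJC), assuming a K-type thermocouple.
# The ~41 uV/degC slope is a linear approximation near room temperature;
# production code would use NIST reference tables or polynomials instead.

K_TYPE_UV_PER_C = 41.0  # approximate K-type sensitivity


def hot_junction_temp(thermocouple_uv, cold_junction_c):
    """Estimate the hot-junction temperature in degC.

    thermocouple_uv : measured thermocouple voltage in microvolts
    cold_junction_c : cold-junction (board) temperature in degC, e.g. from a DS18B20
    """
    return cold_junction_c + thermocouple_uv / K_TYPE_UV_PER_C


# The same thermocouple voltage gives a different answer depending on the
# cold-junction temperature, so an unmeasured board temperature becomes an error:
reading_uv = 8200.0
print(hot_junction_temp(reading_uv, 25.0))  # 225.0
print(hot_junction_temp(reading_uv, 45.0))  # 245.0
```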

Fume Hood

I built a fume hood out of the following:

  • 2×4’s for frame.
  • Thick plastic drop-sheet.
  • Powerful vent booster fan with variable speed control.
  • Active charcoal oven range hood filter (mounted in top of fume hood).
  • Active charcoal filter for car cabin (mounted in cardboard box used as exhaust).

You could also improvise one from an oven range hood. See the video for general fume hood construction.

Manual Pick-n-Place

FIG4

This requires three things:

  1. Vacuum pump, which you can make from a Tetra Whisper pump (see instructables link). Get some of the nice silicone tubing at the same time (about $3 from Amazon).
  2. Syringe with a hole drilled into the body. You can get syringes (you don’t need the pointy bit!) from a pharmacy, or order them from Newark/Digikey. When you cover the hole, you force the vacuum through the tip, picking up the part. Release your finger from the hole to drop the part. See the above video for details.
  3. The tips for pick and place, which are “Luer Lock” needles bent slightly (for small parts) or commercially available tips (for larger parts).

The tips are the only somewhat tricky part. I had a good selection from a previous SMD picker tool, something like this kit (Chip Quik Inc. part #V8910). These tips use the same “Luer Lock” fitting as syringes; check eBay for cheaper kits:

V8910 Chip Quik Inc. | V8910-ND DigiKey Electronics

You can also buy Chip Quik Inc. part #VCS-9-B, which has a bunch of these tips. It’s not the cheapest way, but it will do if you are in a hurry! However, all of these tips are for larger parts (SOT23-3 is probably the smallest). If you get into chip resistors, you need to go smaller.

For small parts, you can slightly bend “needle tips”. You can buy packs of 50 from Digikey (search “Luer Lock”), but it might be cheaper to get individual ones from medical supply places, or from products that use them. For example, some static-safe squeeze bottles come with a few tips. Again, the expensive but easy route is Chip Quik part #SMDTA200, which has a bunch of different sized tips.

http://media.digikey.com/Photos/Apex%20Tool%20Photos/KDS2312P.JPG

Stencils

There are three main options for stencils:

  1. Laser cut stainless steel.
  2. Third-party cut Kapton film.
  3. Self-cut Kapton/Mylar film.

Laser-cut stainless steel stencils can typically be ordered along with your PCBs. For example, 3PCB.com and Dirty PCBs offer them very cheaply (~$25) when ordering PCBs. This is almost always the best choice, as stainless steel stencils are very reliable, and I’ve had great success with BGA devices.

You can also use third-party services to cut Mylar or Kapton film for you. OSHStencils is one example of a supplier.

Finally, you can make your own. You’ll need some practice to cut BGA parts, but it’s quite easy to cut stencils for less demanding applications. I have a previous blog post on my method.

I’ve been making my own stencils with this process for some time with great success.

FIG3

Solder Paste/Squeegee

I purchased the squeegee from Dirty PCBs. There are some other blog posts on squeegee options you might look at.

I generally just buy solder paste from Digikey. Digikey does a great job of cheaply shipping to Canada, and the paste comes in an awesome cold pack thingy that keeps it cool during the trip. Chip Quik (again with the Chip Quik, sorry; I don’t have a connection with them but just end up using their stuff!) sells some nice small syringes. Be aware that paste has a shelf life… I’ve used paste about 6-12 months past that date, but you will eventually see issues (balling, flux separation). I recommend sticking to the suggested date to avoid the headache of discovering your paste is bad after you’ve tried soldering your PCB. Your parts probably cost a lot more than replacing the paste.

 

SEC-T 2015 Talk Slides

On Friday at 14:15 I’m giving a talk at SEC-T about ChipWhisperer, my open-source power analysis and glitching project. Here are some useful links if you watched the presentation:

See www.ChipWhisperer.com for information about the entire project. The video will be posted online at some point too.

ESC SV 2015 – USSSSSB: Talking USB From Python

At ESC 2015 SV I gave a talk on using USB from Python; see the talk description here. This blog post serves as a placeholder so I can update links to the software used during the live demo.

For SuperCon 2015, there is a Project Page with these details too. You can also ask questions on the project page.

Download Slides

There are two versions of the slides. Use the SuperCon slides, but I’ve left a copy of the ESC ones here in case you want the original for some reason.

Download Slides from Hackaday SuperCon 2015 (Newer Version for SAMD21) [PDF, 10MB]

Download Slides from ESC2015 (Older version for SAMD11) [PDF, 10MB]

Tools to Install

  1. Atmel Studio 6.2
  2. WinPython-2.7
  3. libusb-win32-devel-filter (NB: No need to open the filter install wizard when done)
  4. USBView

SAMD11 Errata

For ESC I used a SAMD11 device, which needs a bit of a hack.

There is a bit of an “oopsie” in the SAMD11 devices. This bug isn’t in the official errata yet, and I’ve been told it’s limited to engineering sample devices (which were used in some of the early dev boards).

Basically, the 48MHz oscillator calibration byte is wrong, and you need to tune it manually. You’ll know you have this problem if the device isn’t detected by Windows:

usbbad

The work-around isn’t super fun. First, use the programming interface to read the starting value of the DFLL48M_COARSE_CAL fuse:

pllbad

Next, search the source code for references to the dfll_conf.coarse_value variable. You will find where it is set up, and you can override the value there. You’ll have to experiment a bit to find a working value:

pllhack
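One way to make the experimenting systematic is to start at the value read back from the fuse and walk outward one step at a time. A small hypothetical helper that produces that search order; it assumes the coarse field is 6 bits wide (0 to 63), which you should confirm in the datasheet for your part:

```python
def coarse_candidates(start, bits=6):
    """Hypothetical helper: order in which to try DFLL48M coarse values.

    Starts at the factory value read from the fuse and walks outward
    (start, start+1, start-1, start+2, ...), staying within the field width.
    Assumption: the coarse field is 6 bits (0..63).
    """
    max_val = (1 << bits) - 1
    order = [start]
    for delta in range(1, max_val + 1):
        for value in (start + delta, start - delta):
            if 0 <= value <= max_val:
                order.append(value)
    return order


# Example: first few values to try if the fuse read back as 10
print(coarse_candidates(10)[:5])  # [10, 11, 9, 12, 8]
```

For each candidate you would override dfll_conf.coarse_value, rebuild, reprogram, and check whether the device now enumerates (for example with USBView).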

Reference Code

USB Test – Slide 87

import usb.core
dev = usb.core.find(idVendor=0x03eb, idProduct=0x2402)
print(dev)  # prints None if no matching device was found

If you get “None”, make sure you installed the “Filter Driver” using the LibUSB tools!

Control Endpoint Read – Slide 94

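import usb.core
dev = usb.core.find(idVendor=0x03eb, idProduct=0x2402)
dev.set_configuration()

# bmRequestType 0b10100001: device-to-host, class request, interface recipient
# bRequest 0x01 (HID class: GET_REPORT), wValue 3<<8 (report type 3 = Feature, report ID 0)
# wIndex 0 (interface 0), read 4 bytes
data = dev.ctrl_transfer(0b10100001, 0x01, 3<<8, 0, 4)

print(data)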

If you get a “device is not functioning” error just skip this one…
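The magic number 0b10100001 in the control read packs three fields of the setup packet’s bmRequestType byte (USB 2.0 specification, section 9.3): bit 7 is the direction, bits 6..5 the request type and bits 4..0 the recipient. A small pure-Python decoder, with names of my own choosing, makes that visible. Assuming the interface is a HID interface (as the input/output reports in this demo suggest), a class request with bRequest 0x01 is GET_REPORT, and the high byte of wValue (3<<8) selects report type 3, a Feature report:

```python
# Decode the bmRequestType byte of a USB setup packet (USB 2.0 spec, section 9.3).
DIRECTIONS = {0: "host-to-device (OUT)", 1: "device-to-host (IN)"}
TYPES = {0: "standard", 1: "class", 2: "vendor", 3: "reserved"}
RECIPIENTS = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}


def decode_bm_request_type(bm):
    direction = DIRECTIONS[(bm >> 7) & 0x1]            # bit 7
    req_type = TYPES[(bm >> 5) & 0x3]                  # bits 6..5
    recipient = RECIPIENTS.get(bm & 0x1F, "reserved")  # bits 4..0
    return direction, req_type, recipient


print(decode_bm_request_type(0b10100001))
# ('device-to-host (IN)', 'class', 'interface')
```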

Sending Output Report

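import usb.core
dev = usb.core.find(idVendor=0x03eb, idProduct=0x2402)
print(dev)

dev.set_configuration()

# 8-byte output report sent to OUT endpoint 0x02
# (the GUI example below uses '1','1' for LED on and '0','1' for blinking)
data = [ord('1'), ord('1'), 0, 0, 0, 0, 0, 0]
dev.write(0x02, data)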

Receiving Input Data (Press button to see change)

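import usb.core
dev = usb.core.find(idVendor=0x03eb, idProduct=0x2402)
print(dev)

dev.set_configuration()

# Read 10 reports from IN endpoint 0x81, retrying each read until it succeeds
for i in range(0, 10):
    while True:
        try:
            test = dev.read(0x81, 8, timeout=50)
            break
        except usb.core.USBError as e:
            # A timeout only means no report was ready yet; backends word it differently
            msg = str(e).lower()
            if "timeout" in msg or "timed out" in msg:
                pass
            else:
                raise IOError("USB Error: %s" % str(e))

    print(test)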

Full GUI Example

#Public domain - simple USB GUI Example by Colin O'Flynn

from PySide.QtCore import *
from PySide.QtGui import *
import usb.core
import sys

class USBForm(QDialog):
    def __init__(self, parent=None):
        super(USBForm, self).__init__(parent)
        self.setWindowTitle("ESC 2015 Demo")

        layout = QVBoxLayout()
        self.setLayout(layout)

        self.pbConnect = QPushButton("Connect")
        self.pbConnect.clicked.connect(self.con)
        self.isConnected = False

        self.pbLED = QPushButton("LED Blinking")
        self.pbLED.setCheckable(True)
        self.pbLED.clicked.connect(self.changeLED)
        self.pbLED.setEnabled(False)

        layout.addWidget(self.pbConnect)
        layout.addWidget(self.pbLED)

        self.swStatus = QLineEdit()
        self.swStatus.setReadOnly(True)
        layout.addWidget(self.swStatus)

        self.butTimer = QTimer(self)
        self.butTimer.timeout.connect(self.pollButton)


    def con(self):
        if self.isConnected == False:
            #Do USB Connect Here
            self.dev = usb.core.find(idVendor=0x03eb, idProduct=0x2402)
            if self.dev is None:
                self.swStatus.setText("Device not found")
                return
            self.dev.set_configuration()

            #Sync changeLED
            self.changeLED()
            
            self.isConnected = True
            self.pbConnect.setText("Disconnect")
            self.pbLED.setEnabled(True)
            self.butTimer.start(100)
        else:
            self.isConnected = False
            self.pbConnect.setText("Connect")
            self.pbLED.setEnabled(False)
            self.butTimer.stop()

    def changeLED(self):
        if self.pbLED.isChecked():
            #Send command to make LED on
            self.dev.write(0x02, [ord('1'), ord('1'), 0, 0, 0, 0, 0, 0])

            self.pbLED.setText("LED On")            
        else:
            #Send command to make LED blink
            self.dev.write(0x02, [ord('0'), ord('1'), 0, 0, 0, 0, 0, 0])
            self.pbLED.setText("LED Blinking")

    def pollButton(self):
        try:
            data = self.dev.read(0x81, 8, timeout=50)
            if data[0]:
                self.swStatus.setText("Button Pressed")
            else:
                self.swStatus.setText("Button Released")
                
        except usb.core.USBError as e:
            # Timeouts just mean the button state hasn't been reported yet
            msg = str(e).lower()
            if "timeout" in msg or "timed out" in msg:
                pass
            else:
                raise IOError("USB Error: %s" % str(e))


if __name__ == "__main__":
    app = QApplication(sys.argv)
    form = USBForm()
    form.show()
    sys.exit(app.exec_())

Side-Channel Power Analysis of AES Core in Project Vault

What is Project Vault

You can read a quick overview on various news sites, but basically Project Vault gives you a cryptographic module that you have complete control over. This means *you* decide whether to trust the module, even to the point of being able to access the implementation details of the crypto cores.

Project Vault is a way to avoid having unknown backdoors in your hardware. Rather than having to trust some vendor of security modules, you can make sure things are done correctly.

About the AES Core

The crypto modules have a nice description, which you can read here. Of interest to us is this statement:

The AES core can encrypt and decrypt blocks of data in three modes of operation: AES-128, AES-192, and AES-256.  The design is based upon a purely gate logic implementation of the forward and reverse sboxes due to the work of Boyar and Peralta.  This avoids differential power attacks present in purely lookup table or SRAM based sbox implementation.

The paper in question isn’t written by Andy Samberg’s character on Brooklyn Nine-Nine, but instead is referencing A small depth-16 circuit for the AES S-Box (that link is the unpaywalled version, the paper was published in the SEC 2012 proceedings).

This is a problematic statement, as side-channel power leakage doesn’t have one simple fix. In this case there is effectively no difference from an unprotected implementation as far as side-channel power analysis goes. More on that in a moment.

Side-Channel Power Analysis

It’s worth pointing out that I’m looking at a single small part of the entire device. There may be additional protocol-layer protections that would significantly complicate the analysis I perform; I just don’t know, as I haven’t looked into that.

Realistically, side-channel power analysis might be a threat. A leaking core on its own might be impossible or very difficult to exploit, depending on how the device is used. But it might form part of a larger attack (e.g. someone is able to take control of the core using a different attack method).

Side-channel power analysis (or Differential Power Analysis, DPA) also requires that the device is operating with the key we are attacking. You cannot use DPA on an encrypted hard drive sitting on the table, for example; you could only use it to recover the encryption key while the drive is decrypting or encrypting something. If the encryption key comes from the user, this means DPA is useless against an encrypted drive you recovered from someone.

Because of these caveats, I like to stress that this isn’t some master attack. In fact, the only thing that makes it noteworthy is that the documentation claimed some level of DPA resistance. Anyway, on with the attack…

Attack Theory

DPA attacks are based on small variations in power consumption being correlated with either data values or changes in data. With the S-Box design from the paper referenced earlier, the claimed protection is that the input and output of the S-Box are never put into the same register.

This means we could never directly see the number of bits flipped between the input and output of the S-Box, so a power analysis attack on the S-Box itself would fail. That is normally an easy leak to stop, but it’s not the only one.

Looking at the source code, we see the following Verilog lines during the encryption (similar lines for decryption):

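            begin
                // The state register is overwritten with the new round state every round
                state <= state_new;
                if(round == round_max)
                begin
                    // Last round: output the result and clear busy
                    data_o <= state_new;
                    busy_o <= 0;
                end
            end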

This is problematic, as the 128-bit AES state is held in a register that is overwritten on each round. In particular, look at the last round (this figure is based on one shamelessly stolen from Frank Gürkaynak’s Thesis), and note the transition from “old value” to “new value”:

aes_state_desc

ShiftRows is easily reversed (it just moves bytes to different positions). This means the input and output of the S-Box are effectively written into the same register, giving us an easy way to count bit flips (the Hamming distance). We can then correlate the expected number of bit transitions with the measured power, as in a standard DPA attack.
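To make the model concrete, here is a sketch of the last-round Hamming-distance hypothesis for one key byte. It assumes the FIPS-197 byte ordering (state filled column by column) and a state register that holds the round-9 state and is then overwritten with the ciphertext, as in the Verilog above. The S-Box is generated from its GF(2^8) definition so the block is self-contained; `SRC[i]` is the position a ciphertext byte occupied before ShiftRows. All names are my own:

```python
# Sketch: last-round Hamming-distance (HD) model for an iterative AES-128 core.


def _xtime(a):
    """Multiply by x in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def _gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = _xtime(a)
        b >>= 1
    return result


def _gf_inv(a):
    """Multiplicative inverse as a^254; AES maps 0 to 0."""
    if a == 0:
        return 0
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = _gf_mul(result, base)
        base = _gf_mul(base, base)
        e >>= 1
    return result


def _rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


# Forward S-Box: field inverse followed by the AES affine transform.
SBOX = [b ^ _rotl8(b, 1) ^ _rotl8(b, 2) ^ _rotl8(b, 3) ^ _rotl8(b, 4) ^ 0x63
        for b in (_gf_inv(a) for a in range(256))]
INV_SBOX = [0] * 256
for _a, _s in enumerate(SBOX):
    INV_SBOX[_s] = _a

# SRC[i]: index (r + 4*c) in the previous state that ShiftRows moves to index i.
SRC = [r + 4 * ((c + r) % 4) for c in range(4) for r in range(4)]


def last_round_hd(ciphertext, i, key_guess):
    """Predicted bit flips in the register byte whose S-Box output lands in ciphertext byte i."""
    before = INV_SBOX[ciphertext[i] ^ key_guess]  # round-9 state byte, held at position SRC[i]
    after = ciphertext[SRC[i]]                    # same register position after the last round
    return bin(before ^ after).count("1")
```

In an attack you would evaluate `last_round_hd` for all 256 guesses of each last-round key byte and every captured ciphertext, giving the hypothesis matrix that gets correlated against the power traces.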

Attack Test

In theory there’s no real need to test this, but nobody believes hand-waving. So I used a SAKURA-G FPGA board (Spartan 6 LX75), which I happen to have around, with my OpenADC and ChipWhisperer software:

You could easily use my ChipWhisperer-Lite with any other FPGA board instead of the SAKURA-G. The SAKURA-G makes power measurement easier, otherwise you can use some H-Field probes etc.

There’s not a lot to this: I ripped out just the AES core (i.e. everything in this directory in the Git repository). It’s easy to interface to the existing FPGA code provided with the SAKURA-G, as the interface is almost exactly the same (key in, block in, block out, clk, go command).

There were a few cycles of synchronization error for some reason, but I used the “resync by sum of absolute difference” preprocessing in my ChipWhisperer software. Here is what the raw power traces look like after resync:

aes-power
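Resync by sum of absolute difference (SAD) is conceptually simple: pick a reference trace and a window around a distinctive feature, slide each trace by a few samples, and keep the shift that minimises the SAD over that window. The following is a minimal re-implementation of the idea in NumPy, not the ChipWhisperer code; the function name and the `window` and `max_shift` defaults are placeholders you would tune to your own traces:

```python
import numpy as np


def resync_sad(traces, ref_index=0, window=(100, 200), max_shift=10):
    """Align traces to a reference by minimising the sum of absolute differences.

    For each trace, every integer shift in [-max_shift, max_shift] is tried and
    the shifted trace is compared with the reference over `window`. np.roll wraps
    samples around the ends, so keep the window away from the trace edges.
    Returns (aligned_traces, shifts).
    """
    traces = np.asarray(traces, dtype=float)
    start, stop = window
    ref = traces[ref_index, start:stop]
    aligned = np.empty_like(traces)
    shifts = []
    for n, trace in enumerate(traces):
        best_shift = min(
            range(-max_shift, max_shift + 1),
            key=lambda s: np.abs(np.roll(trace, s)[start:stop] - ref).sum(),
        )
        aligned[n] = np.roll(trace, best_shift)
        shifts.append(best_shift)
    return aligned, shifts
```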

Running an attack targeting the last-round state difference of AES gives a nice figure where the known encryption key bytes (in red) filter to the top of the “most likely encryption keys”, here with 2000 traces:

aes-pge
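The attack itself is standard correlation power analysis: for each key guess, correlate the predicted leakage (such as a last-round Hamming distance) with every sample of the power traces. Sorting key guesses by their peak correlation gives the “most likely keys” ordering, and the position of the correct guess in that list is the partial guessing entropy. A minimal NumPy sketch; the function names are my own, and it assumes no key guess has a constant hypothesis across all traces (which would divide by zero):

```python
import numpy as np


def cpa(traces, hypotheses):
    """Pearson correlation between every key guess and every trace sample.

    traces     : (N, S) array, N power traces of S samples each
    hypotheses : (N, 256) array, predicted leakage for each trace and key guess
    Returns a (256, S) array of correlation coefficients.
    """
    t = np.asarray(traces, dtype=float)
    h = np.asarray(hypotheses, dtype=float)
    t = t - t.mean(axis=0)  # centre each sample point
    h = h - h.mean(axis=0)  # centre each key guess
    num = h.T @ t
    den = np.sqrt(np.outer((h ** 2).sum(axis=0), (t ** 2).sum(axis=0)))
    return num / den


def rank_key(corr, true_key):
    """Rank of true_key when guesses are sorted by peak |correlation| (0 = best),
    plus the sample index where the true key's correlation peaks."""
    peaks = np.abs(corr).max(axis=1)
    order = np.argsort(-peaks)
    rank = int(np.nonzero(order == true_key)[0][0])
    peak_sample = int(np.abs(corr[true_key]).argmax())
    return rank, peak_sample
```

The second value returned by `rank_key` is the sample where the correct guess leaks most, which is the kind of information a leakage-location plot shows.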

You can check where the leakage occurs too. In the following figure the “correct” byte value is highlighted in red. Around sample ~342 there is the largest absolute correlation peak, and it rises above all the wrong guesses. This corresponds to roughly the last round (based on the power dips in the earlier waveform):

aes-location

That’s it! It’s really a standard Hamming-distance attack against AES. The special S-Box design didn’t make my attack any harder. Again, this was done in a controlled environment, so it’s quite possible there are higher-level protections that make this attack much, much harder.

Considering the device will (presumably) only have the encryption keys loaded when the user is doing stuff, it’s a pretty small risk. An attacker would have to monitor the power while you are using the device to deduce your keys… and if they are that close, they might just try seducing you instead.

 

Experiments with Seek Thermal Camera

A while back I got a Seek thermal camera, as I wanted to use it for measuring electronic component temperatures. As part of a course I’m teaching at Dal, I did a few experiments that I wanted to post here. These photos were taken with a macro lens, shown here:

seeker

To get that lens, I purchased a 20mm diameter ZnSe lens with a 50.8mm/2″ focal length off eBay for about $20. I ended up getting both a 100mm and a 50mm focal length to try both. Then you need a holder; I used one I found on Thingiverse. If printing it again I’d enlarge the space for the lens, as I had to use a knife to considerably carve down the inside step. In fact I’d remove the middle ‘ridge’ that holds the lens in, and epoxy the lens in place instead.

The Tests

I’m using a TO-220 5 ohm resistor, which lets me reliably control the power being dissipated by the device. The part number is PF2205-5R which you can get at Digikey.

The first test compares mounting the resistor horizontally and vertically. To do this I’ve put two into a breadboard:

restest_overview

These are powered at constant power using my supply:

restest_powerin

We can then see the difference in temperature between the two:

restest_results

So what gives? There are (expected to be) two reasons for the temperature difference:

  1. The vertical mount naturally causes airflow over the large back tab: heat rises, and as heat comes off the tab, it causes a small amount of natural airflow.
  2. The horizontally mounted package is close to the table surface, which further restricts airflow.

The majority of the difference comes from #1, but people will complain if I don’t mention #2. If your heatsink has lots of fins, it’s worthwhile to ensure the natural airflow due to heat flows easily up the heatsink.

Also note the temperature rise is about what is expected of a TO-220 package, which typically has about 62°C/W junction-to-ambient thermal resistance in free air. Ambient is around 20°C, so with 1W of power in we expect 20°C + 62°C = 82°C case temperature. Cool!
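The estimate above, plus the supply setting needed to put 1W into the 5 ohm resistor (P = V²/R, so about 2.24V at 0.45A), can be written as two small helpers. This is a first-order steady-state estimate that lumps everything into a single thermal resistance, using the ~62°C/W free-air figure; the function names are my own:

```python
import math


def supply_for_power(power_w, resistance_ohm):
    """Voltage and current needed to dissipate power_w in a resistor (P = V^2/R = I^2*R)."""
    return math.sqrt(power_w * resistance_ohm), math.sqrt(power_w / resistance_ohm)


def steady_state_temp(ambient_c, power_w, theta_c_per_w):
    """First-order estimate: T = T_ambient + P * theta."""
    return ambient_c + power_w * theta_c_per_w


volts, amps = supply_for_power(1.0, 5.0)
print(round(volts, 2), round(amps, 2))  # 2.24 0.45
print(steady_state_temp(20, 1, 62))     # 82
```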

The second test tries several ways of mounting TO-220 packages on a PCB. The PCB setup is shown here:

restest_more

First, let’s look at the vertically mounted device. Here is the thermal image once it reaches steady-state:

restest_vert

What the hell happened? We still had 1W of input per resistor, but it’s 14°C cooler than the other test!

In this case the PCB is dissipating some of the heat: the entire top and bottom are solid copper pours, each side connected to one of the pins. This is almost ideal for heat transfer.

Next, let’s look at the other two resistors. The following shows both details of the mounting and the steady-state temperatures:

restest_double

I should mention that the reflective areas have low thermal emissivity, so the Seek camera doesn’t read them correctly. Thus the tab and the PCB areas without solder mask aren’t actually cooler.

Anyway you can see that mounting the package *close* to a good heatsink but without actually touching it is worse than free-space mounting. Having a good connection (in this case soldering) as expected further reduces the case temperature by allowing the PCB to dissipate more of the heat.

The “close but not touching” comes up a lot – for example if you make a simple metal shield for your device, you might think it a good idea to have the shield come close to heat-producing devices. But unless it actually makes good contact, you are probably hampering the natural convection air currents!

When I get some more time, I plan on buying a few different heatsinks from Digikey and comparing my measured temperature rise with the theoretical temperature rise.

USB Inrush Testing

The USB spec has limits on ‘inrush current’, which are designed to prevent you from having 2000uF of capacitance that must be suddenly charged when your board is plugged into the USB port.

The limit works out to around 10uF of capacitance. Your board might have much, much more, so you’ll have to switch portions of your board on later with FETs as a soft-start.

For the ChipWhisperer-Lite, I already switch the FPGA + analog circuitry on and off in order to meet the 2.5 mA suspend current. Thus I only have to ensure the 3.3V supply for the SAM3U2C meets the inrush limits, which is a fairly easy task. This blog post describes how I did this testing.

The official USB test specs for inrush current testing describe using the Tektronix TCP202 current probe, which costs about $2000 and which I don’t think I’d use much otherwise. Thus I’m describing my cheaper/easier method.

First, I used a differential probe (part of the ChipWhisperer project, so you can see the schematics) to measure the voltage across a 0.22 ohm shunt resistor. The value was selected because I happened to have one around… you might even want a smaller value (say 0.1 ohm), as the voltage drop across the shunt reduces the voltage to your device. The differential probe has enough gain to give your scope a fairly clean signal. This shows my test board, where the differential probe is plugged into a simple 2-pin header:

P1080537

From the bottom, you can see where I cut the USB shield to bring the +5V line through the shunt:

P1080538

To calibrate the shunt plus the gain of the diff-probe, I just used some test loads, measuring the current flowing through them with a DMM. You can then work out the equation for converting the scope measurement to a current in amps.

P1080539
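The calibration is a straight-line fit. For a shunt R and probe gain G the scope sees V = I × R × G, so I = V / (R × G) plus whatever offset the probe has; fitting the test-load readings directly absorbs both the gain and the offset instead of relying on nominal values. A minimal sketch using NumPy’s `polyfit`, with hypothetical function names:

```python
import numpy as np


def fit_calibration(scope_volts, dmm_amps):
    """Least-squares fit of current = gain * scope_voltage + offset.

    scope_volts : scope readings (V) for each test load
    dmm_amps    : matching DMM current readings (A)
    """
    gain, offset = np.polyfit(scope_volts, dmm_amps, 1)
    return gain, offset


def to_current(scope_volts, gain, offset):
    """Convert a captured voltage trace to amps using the fitted calibration."""
    return gain * np.asarray(scope_volts, dtype=float) + offset
```

The fitted gain and offset are what a scope math channel of the form current = gain × voltage + offset needs.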

Finally, we plug in our actual board. Here I’ve plugged in the ChipWhisperer-Lite prototype. The following figure shows the measurement after I’ve used a math channel in PicoScope to convert the voltage to a current, and I’ve annotated where some of these spikes come from:

usb_power

Saving the data, we can run it through the USB Electrical Analysis Tool 2.0 to get a test result. The USB-IF tool assumes your scope saves files with time in seconds and current in amps. The PicoScope .csv files have time in milliseconds, so you need to import the file into Excel, divide the time column by 1000, and save the file again. Finally you should get something like this:

compliance_results
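If you’d rather script the Excel step, the conversion is only a few lines of Python. This is a sketch under assumptions: the export is a two-column (time in ms, current in A) CSV, the number of header rows (`header_lines`) is a guess you should check against your own file, and the USB-IF tool is assumed to accept a plain two-column file with time in seconds. The function names are hypothetical:

```python
import csv


def ms_to_s_rows(lines, header_lines=3):
    """Parse CSV lines of (time in ms, current in A) into (time in s, current) pairs.

    header_lines is an assumption about how many header/units rows precede the
    data; blank rows are skipped.
    """
    rows = []
    for n, row in enumerate(csv.reader(lines)):
        if n < header_lines or not row:
            continue
        rows.append((float(row[0]) / 1000.0, float(row[1])))
    return rows


def convert_file(src_path, dst_path, header_lines=3):
    """Write a header-less two-column CSV with time in seconds."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(ms_to_s_rows(src, header_lines))
```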

Note the inrush charge is > 50 uC, but there is an automatic waiver for anything < 150 uC. While the system would be OK due to the waiver, I would prefer to stay under the 50 uC limit. In this case there’s an easy solution: I can delay USB enumeration slightly after processor power-on, which limits the inrush to only the charging of the capacitors (done by ~15 ms). This results in about 47 uC, which leaves about 100 uC of headroom before I exceed the 150 uC waiver limit!

This extra headroom is needed in case of differences due to my use of the shunt for example.
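The inrush charge the tool reports is the integral of current over time, and the 10uF figure translates directly into charge: Q = CV = 10uF × 5V = 50uC. For a quick sanity check of a capture before running the official tool, a trapezoidal integration of the converted trace gets you in the right ballpark. This is my own rough check with a hypothetical function name, not the USB-IF tool’s algorithm, which may window or process the data differently:

```python
import numpy as np


def inrush_charge(time_s, current_a, t_start=None, t_stop=None):
    """Integrate current (A) over time (s) with the trapezoidal rule; returns coulombs.

    t_start/t_stop optionally restrict the window, e.g. to the
    capacitor-charging period right after plug-in.
    """
    t = np.asarray(time_s, dtype=float)
    i = np.asarray(current_a, dtype=float)
    mask = np.ones(t.shape, dtype=bool)
    if t_start is not None:
        mask &= t >= t_start
    if t_stop is not None:
        mask &= t <= t_stop
    t, i = t[mask], i[mask]
    return float(np.sum(0.5 * (i[1:] + i[:-1]) * np.diff(t)))


# Example: a constant 1 A for 50 us is 50 uC
t = np.linspace(0, 50e-6, 51)
print(inrush_charge(t, np.ones_like(t)) * 1e6)  # approximately 50.0 (uC)
```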

In addition, I should adjust the soft-start FET gate resistor to reduce the size of that huge soft-start spike. Ideally the capacitor charging shouldn’t draw more than the 500mA I claim when I enumerate, so that’s a little out of spec as-is! If I don’t want to change hardware, I could even consider using PWM on the FET gate…