This blog post was authored by Troy Fridley and Omar Santos of Cisco PSIRT.

On March 9, 2015, the Project Zero team at Google published new research on the known issue in DDR3 memory referred to as “Row Hammer”. Row Hammer is an industry-wide issue that has been discussed publicly since at least 2012.

The new research by Google shows that these types of errors can be introduced in a predictable manner. A proof-of-concept (POC) exploit that runs on the Linux operating system has been released. Successful exploitation leverages the predictability of these Row Hammer errors to modify the memory of an affected device. An authenticated, local attacker with the ability to execute code on the affected system could elevate their privileges to those of a superuser or “root” account. The proof of concept does this by gaining kernel-level access, also known as Ring 0. Code that runs in Ring 0 can modify anything on the affected system.

Prior Research

Dynamic random-access memory (DRAM) contains a two-dimensional array of cells. This is illustrated in the figure below.

[Figure: DRAM organized as a two-dimensional array of cells]

Within each cell there is a capacitor and an access transistor. The two binary values are represented by the capacitor being either fully charged or fully discharged.

Memory disturbance errors can occur when there is an abnormal interaction between two circuit components that should be isolated from each other. Historically, these memory disturbance errors have been demonstrated by repeatedly accessing (opening, reading, and closing) the same row of memory. This is discussed in detail in the research paper titled “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors”.

The paper also notes that this is not the only access pattern that induces errors, but it is the most typical.
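The access pattern described above can be illustrated with a toy model. This is only a sketch for intuition, not a model of real DRAM physics: the class name `ToyDram`, the `threshold` parameter, and the rule that both adjacent rows are disturbed are all assumptions made for illustration.

```python
from collections import defaultdict


class ToyDram:
    """Toy model of disturbance errors (illustrative assumptions only).

    A row that is activated `threshold` times between two refreshes
    is assumed to disturb the rows directly above and below it.
    """

    def __init__(self, rows, threshold):
        self.rows = rows
        self.threshold = threshold           # hypothetical disturbance threshold
        self.activations = defaultdict(int)  # activations since the last refresh
        self.flipped = set()                 # rows that suffered a bit flip

    def activate(self, row):
        """Open, read, and close one row (one 'hammer')."""
        self.activations[row] += 1
        if self.activations[row] >= self.threshold:
            for victim in (row - 1, row + 1):
                if 0 <= victim < self.rows:
                    self.flipped.add(victim)

    def refresh(self):
        """A refresh recharges the cells, so the activation counts start over."""
        self.activations.clear()
```

In this model, hammering row 5 of an 8-row array up to the threshold marks rows 4 and 6 as flipped, while a refresh that arrives before the threshold is reached leaves every row intact. Note that the row being accessed is never the one that is corrupted.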

When row hammer conditions are triggered, unrecoverable memory errors could occur that may cause an affected device to “crash”. Memory manufacturers and chipset vendors began building mitigations for “Row Hammer” into their parts.

A number of patents have been filed for technologies and methods to provide better ways to test and track the “Row Hammer” issue. The following are some examples:

Privilege Escalation Potential

Google’s demonstration of the kernel privilege escalation leverages row hammering to induce a bit flip in a page table entry (PTE) which forces the PTE to point to a physical page containing a page table of the attacking process.
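To see why a single flipped bit matters here, the following sketch shows the arithmetic only. It assumes the x86-64 layout for a 4 KiB page, in which bits 12 through 51 of a PTE hold the physical frame number and the low 12 bits hold flags. The PTE value and the helper names are hypothetical and chosen purely for illustration.

```python
PAGE_SHIFT = 12                      # 4 KiB pages
FLAGS_MASK = (1 << PAGE_SHIFT) - 1   # low 12 bits: present, writable, user, ...
PFN_MASK = ((1 << 52) - 1) & ~FLAGS_MASK  # bits 12..51: physical frame address


def frame_number(pte):
    """Physical frame number that the PTE points to."""
    return (pte & PFN_MASK) >> PAGE_SHIFT


def flip_bit(value, bit):
    """Return `value` with one bit inverted, as a disturbance error would do."""
    return value ^ (1 << bit)


# Hypothetical PTE: frame 0x12345 with present, writable, user, accessed flags.
pte = (0x12345 << PAGE_SHIFT) | 0x27

# Flipping bit 14 of the PTE changes bit 2 of the frame number.
flipped = flip_bit(pte, 14)
```

Here `frame_number(pte)` is `0x12345` and `frame_number(flipped)` is `0x12341`, while the flag bits are unchanged. The entry remains valid and writable but now refers to a different physical page, which is the effect the demonstration relies on.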

The research uses memory spraying with mmap(), the POSIX system call that maps files or devices into memory. The attacker could fill most of physical memory with page tables by repeatedly calling mmap() on a single file.
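The idea of mapping one file many times can be sketched with Python's standard `mmap` module. This only shows that repeated mappings of the same file are independent views onto the same contents. It does not reproduce the research, and the function names, the temporary file, and the small mapping count are assumptions for illustration.

```python
import mmap
import os
import tempfile


def map_file_repeatedly(path, count):
    """Return `count` independent read-only mappings of the same file."""
    mappings = []
    with open(path, "rb") as f:
        for _ in range(count):
            # Length 0 maps the whole file; each call creates a new mapping.
            mappings.append(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ))
    return mappings


def demo(count=8):
    """Map a one-page temporary file `count` times and read each view."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * mmap.PAGESIZE)
        path = tmp.name
    try:
        mappings = map_file_repeatedly(path, count)
        first_bytes = [m[0:1] for m in mappings]
        for m in mappings:
            m.close()
        return first_bytes
    finally:
        os.remove(path)
```

Each mapping occupies its own range of virtual addresses, so the operating system has to create page table entries for every one of them even though they all refer to the same file data.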

There are two caveats that Google highlights in their research:

Our exploit runs in a normal Linux process. More work may be required for this to work inside a sandboxed Linux process (such as a Chromium renderer process).

We tested on a machine with low memory pressure. Making this work on a heavily-loaded machine may involve further work.

The tests were done with non-ECC memory using the CLFLUSH instruction with a “random address selection” methodology also described in their post.

Mitigations and Fix

This vulnerability exists within hardware and cannot be mitigated by just upgrading software. The following are widely known mitigations for the Row Hammer issue:

  • Two times (2x) refresh – a mitigation that has been commonly implemented on server-based chipsets from Intel since the introduction of Sandy Bridge and is the suggested default. This shortens the memory controller's row refresh interval from 64 ms to 32 ms and shrinks the potential window for a row hammer, or other gate pass type memory error, to be introduced.
  • Pseudo Target Row Refresh (pTRR) – available in modern memory and chipsets. pTRR does not introduce any performance or power impact.
  • Increased Patrol Scrub timers – systems that are equipped with ECC memory will often have a BIOS option that allows the administrator to set an interval at which the CPU uses the checksum data stored on each ECC DIMM to verify that the contents of memory are valid and to correct any bit errors that may have been introduced. The number of correctable errors will vary based on architecture and ECC variant. Administrators may consider reducing the patrol scrub interval from the standard 20 minutes to a lower value.
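The effect of 2x refresh can be estimated with simple arithmetic. The sketch below assumes a row cycle time of 50 ns, a round figure used here only for illustration. Real values depend on the memory part, and the function name is hypothetical.

```python
def max_activations(refresh_interval_ms, row_cycle_ns):
    """Upper bound on row activations that fit in one refresh interval."""
    interval_ns = refresh_interval_ms * 1_000_000  # 1 ms = 1,000,000 ns
    return interval_ns // row_cycle_ns


ASSUMED_ROW_CYCLE_NS = 50  # assumption for illustration, not a vendor figure

standard = max_activations(64, ASSUMED_ROW_CYCLE_NS)  # 64 ms refresh interval
doubled = max_activations(32, ASSUMED_ROW_CYCLE_NS)   # 32 ms refresh interval
```

Under this assumption, a row can be activated at most 1,280,000 times within a 64 ms interval and at most 640,000 times within a 32 ms interval. Halving the interval halves the number of accesses available before the neighboring rows are recharged.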

Server-based chipsets starting with the Intel Ivy Bridge (IVB) chipset provide support for pTRR. Subsequently, Haswell (HSW) and Broadwell (BSW) server chipsets from Intel also included support for the Joint Electron Device Engineering Council (JEDEC) Targeted Row Refresh (TRR) algorithm. TRR is an improved version of the previously implemented pTRR algorithm. Many vendors have also engineered protections into high-density DDR3 memory parts that can mitigate Row Hammer directly. The three main DRAM vendors that Cisco contracts with to supply memory for Cisco UCS devices have developed and integrated such protections, which fully resolve or mitigate Row Hammer issues.
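The general idea behind a targeted refresh can be sketched as a per-row activation counter. This is a generic illustration and not the pTRR or JEDEC TRR algorithm, whose details are implementation specific. The class name, the threshold, and the choice to refresh only the two adjacent rows are assumptions.

```python
from collections import Counter


class TargetedRefreshSketch:
    """Counter-based sketch: refresh a row's neighbors when it is accessed often."""

    def __init__(self, rows, threshold):
        self.rows = rows
        self.threshold = threshold  # hypothetical activation limit per row
        self.counts = Counter()     # activations since the last refresh
        self.refreshed = []         # log of rows that were refreshed early

    def on_activate(self, row):
        """Record one activation and refresh the neighbors if the limit is hit."""
        self.counts[row] += 1
        if self.counts[row] >= self.threshold:
            for neighbor in (row - 1, row + 1):
                if 0 <= neighbor < self.rows:
                    self.refreshed.append(neighbor)
            self.counts[row] = 0

    def on_periodic_refresh(self):
        """The regular refresh recharges every row, so all counters reset."""
        self.counts.clear()
```

Compared with refreshing all of memory more often, this approach spends extra refreshes only on rows next to one that is being accessed heavily.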

In addition to rigorous memory testing and sourcing parts from vendors known to be hardened against Row Hammer events, Cisco-designed memory controllers and custom ASICs used in line cards and other devices implement a number of features to help prevent memory errors of all types, including those that may be triggered by a gate bypass-type issue. The following are a few examples:

  • System refresh toggling
  • Buffering
  • Bit/address scrambling

Additional information about affected Cisco products and mitigations can be obtained from Cisco’s Security Advisory at:

http://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20150309-rowhammer