# d3d10warp.dll

Multiple integer overflows leading to an OOB write triggerable remotely.

| id     | Address                                        |
| -----: | :--------------------------------------------- |
| top    | UMContext::CopyImmediateData+0x2DC             |
| bottom | UMContext::CopyImmediateData+0x2EE             |
| left   | UMContext::CopyImmediateData+0x2D2             |
| right  | UMContext::CopyImmediateData+0x2E6             |
| offset | code jitted by JITCopyContext::CompileJITCopy  |

Offsets are given according to d3d10warp.dll version 10.0.19041.84, on which we performed our analyses. The bugs have also been confirmed on version 10.0.22533.1001 and others.

# Platform

Confirmed on Windows 10 and Windows 11:

- 22533.1001.amd64fre.rs_prerelease.220107-2122
- 22000.1.amd64fre.co_release.210604-1628
- 19041.1.amd64fre.vb_release.191206-1406

# Class

- CWE-190 Integer Overflow or Wraparound;
- CWE-787 Out-of-bounds Write.

# Bounty Program

Microsoft Windows Insider Preview.

# Summary

WARP is a software rasterizer introduced by the Direct3D 11 runtime available on Windows Vista and up. On a machine **without 3D acceleration** the corresponding DLL, d3d10warp.dll, is loaded by most, if not all, processes interacting with a GUI:

- explorer.exe
- msedge.exe and msedgewebview2.exe
- winword.exe and other MS Office executables
- mstsc.exe, MS RDP client
- ApplicationFrameHost.exe and ShellExperienceHost.exe
- SearchHost.exe
- StartMenuExperienceHost.exe
- SystemSettings.exe
- ...

When adding an image to a framebuffer, WARP incorrectly checks that the image coordinates are within the framebuffer bounds, leading to an out-of-bounds write.

This OOB write can be remotely triggered in mstsc.exe from a malicious RDP server. Considering the ubiquity of d3d10warp.dll on machines without 3D acceleration, it may also be reachable in other scenarios.

Note that d3d10warp.dll is also loaded by some processes (most notably explorer.exe) on machines _with_ 3D acceleration.
# Description

## Integer Overflows

The integer overflows occur in `d3d10warp!UMContext::CopyImmediateData`:

```
void __fastcall UMContext::CopyImmediateData(
    UMContext *this,
    const struct D3D10_DDIARG_SUBRESOURCE_UP *,
    struct UMResource *,
    unsigned int,
    bool,
    const struct D3D10_DDI_BOX *)
```

This function copies data from D3D10_DDIARG_SUBRESOURCE_UP to UMResource at the coordinates defined in D3D10_DDI_BOX, as illustrated in the following table:

|     | 0 | - | `l` | `r` | - | `w` |
|:---:|:-:|:-:|:---:|:---:|:-:|:---:|
| 0   | . | . | .   | .   | . |     |
| \|  | . | . | .   | .   | . |     |
| \|  | . | . | .   | .   | . |     |
| \|  | . | . | .   | .   | . |     |
| `t` | . | . | X   | .   | . |     |
| \|  | . | . | X   | .   | . |     |
| `b` | . | . | .   | .   | . |     |
| `h` |   |   |     |     |   |     |

Where `w` and `h` are the width and height of UMResource, and `l`, `r`, `t`, and `b` are respectively the left, right, top, and bottom coordinates of D3D10_DDI_BOX as defined in d3d10umddi.h:

```c
typedef struct D3D10_DDI_BOX {
    long left;   // left side of the box on the x-axis
    long top;    // top of the box on the y-axis
    long front;  // front of the box on the z-axis
    long right;  // right side of the box on the x-axis (width = right - left)
    long bottom; // bottom of the box on the y-axis (height = bottom - top)
    long back;   // back of the box on the z-axis (depth = back - front)
} D3D10_DDI_BOX;
```

Before copying data, WARP checks that D3D10_DDI_BOX is inside the resource:

```c
// The four products below are computed on 32 bits and may wrap around.
if ( line_size < item_size * D3D10_DDI_BOX.left
  || full_size < line_size * D3D10_DDI_BOX.top
  || line_size < item_size * D3D10_DDI_BOX.right
  || full_size < line_size * D3D10_DDI_BOX.bottom
  || D3D10_DDI_BOX.front >= depth
  || D3D10_DDI_BOX.back > depth )
{
    goto FAIL;
}
else
{
    goto COPY_DATA;
}
```

where:

- `item_size` is the size in bytes of an item, derived from `CD3D10FormatHelper::FORMAT_DETAIL` and `UMResource`;
- `line_size` is the size in bytes of a line of the 2D array, obtained from `UMResource`, and equal to `w * item_size`;
- `full_size` is the size in bytes of the 2D array, obtained
from `UMResource`, and equal to `w * h * item_size`.

`front` and `back` are correctly checked directly against the corresponding bounds. The other values are checked after a 32-bit multiplication that may overflow. As a reminder, all fields of `D3D10_DDI_BOX` are 32 bits wide.

Overflowing `line_size * D3D10_DDI_BOX.bottom` allows one to write OOB at regular intervals controllable with `l`, `r`, and `w`:

|     | 0 | - | l | r | - | w |
|:---:|:-:|:-:|:-:|:-:|:-:|:-:|
| 0   | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| t   | . | . | X | . | . |   |
| \|  | . | . | X | . | . |   |
| h   |   |   | X |   |   |   |
| -   |   |   | X |   |   |   |
| -   |   |   | X |   |   |   |
| -   |   |   | X |   |   |   |
| -   |   |   | X |   |   |   |
| b   |   |   |   |   |   |   |

The constraints to pass the bound check with an integer overflow are:

- `w * b * item_size >= 2^32`
- `(w * b * item_size) % (2^32) <= w * h * item_size`

assuming that `w * h * item_size` is smaller than `2^32`.

One can limit the drawback of having to write at regular intervals by carefully choosing `w` and `h`. Another option would have been to overflow both `line_size * D3D10_DDI_BOX.bottom` and `line_size * D3D10_DDI_BOX.top` (but this particular scenario is mitigated by another integer overflow in the jitted code, as we will see in the next subsection):

|     | 0 | - | l | r | - | w |
|:---:|:-:|:-:|:-:|:-:|:-:|:-:|
| 0   | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| \|  | . | . | . | . | . |   |
| h   |   |   |   |   |   |   |
| -   |   |   |   |   |   |   |
| -   |   |   |   |   |   |   |
| -   |   |   |   |   |   |   |
| t   |   |   | X |   |   |   |
| b   |   |   |   |   |   |   |

The constraints are now:

- `w * t * item_size >= 2^32`
- `w * b * item_size >= 2^32`
- `(w * t * item_size) % (2^32) <= w * h * item_size`
- `(w * b * item_size) % (2^32) <= w * h * item_size`

Overflowing `l` or `r` is harder, as they are multiplied by smaller values.
Still, they can also be used to write outside of the 2D array or offer flexibility to the `t` and `b` overflows:

|       | 0 | - | - | - | - | `w` |   | `l` | `r` |
|:-----:|:-:|:-:|:-:|:-:|:-:|:---:|:-:|:---:|:---:|
| 0     | . | . | . | . | . |     |   |     |     |
| \|    | . | . | . | . | . |     |   |     |     |
| \|    | . | . | . | . | . |     |   |     |     |
| \|    | . | . | . | . | . |     |   |     |     |
| \|    | . | . | . | . | . |     |   |     |     |
| \|    | . | . | . | . | . |     |   |     |     |
| `t`   | . | . | . | . | . |     |   |     |     |
| `h/b` |   |   | X |   |   |     |   |     |     |

## OOB Writes

Once the checks are bypassed, data is copied out of bounds:

`UMContext::CopyImmediateData` --> `Task_Copy` --> `JITCopyContext::ExecuteResourceCopy` --> code jitted by `JITCopyContext::CompileJITCopy`

There is another integer overflow in the jitted code. The offset to the first line, `w * t * item_size`, is also computed on 32 bits, resulting in a value inside the bounds of the buffer (since the previous checks guaranteed that `(w * t * item_size) % (2^32) <= w * h * item_size`). Addresses of subsequent lines are computed by adding `w * item_size` to the 64-bit address of the current line, leading to an OOB write if `b - t` is large enough.

## Remote trigger

As already discussed, d3d10warp.dll can be loaded by numerous processes. To demonstrate that the bug can be triggered remotely, we describe here how mstsc.exe, the MS RDP client, can suffer from an OOB write when connecting to a malicious RDP server.

The graphics pipeline extension of the Remote Desktop Protocol ([RDPGFX](https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-rdpegfx/)) is used to encode graphics display data, in particular:

- `RDPGFX_CREATE_SURFACE_PDU` for the creation of surfaces with controlled 16-bit width and height;
- `RDPGFX_SOLIDFILL_PDU` for plotting a rectangle of controlled 32-bit color (with transparency) and 16-bit coordinates on a chosen surface.
An item is an ARGB pixel, thus `item_size` is 4, and the coordinates are 16 bits wide, thus the multiplications involving `left` and `right` cannot overflow. Still, with full control of the 16 least significant bits of `w`, `h`, `t`, and `b`, we can overflow one or both of the remaining multiplications with some flexibility. The previous constraints become:

- `w > 2^14`
- `b >= 2^30 / w`
- `(w * b) % (2^30) <= w * h`

and similarly for `t`. `l` and `r` offer some flexibility for a precise OOB write.

As already mentioned, the first line cannot start out of bounds because of the integer overflow in the jitted code. In order to ensure that the last line is out of bounds we need:

- `w * (b - t) > w * h - (w * t) % (2^30)`

As an example, one can choose:

- `w = 2**16 - 1`
- `h = 2**14`
- `t = 2**14`
- `b` any value between `2**14 + 1` and `2**16 - 1` (inclusive)
- `l` any value between `0` and `2**16 - 2` (inclusive)
- `r = l + 1`

in order to write a chosen dword every `w * 4 = 2**18 - 4` bytes, up to almost 16 GB after the buffer defined by the `w * h` surface. By choosing `b = 2**14 + 1` it is possible to write a single dword at a chosen position in the `2**18 - 4` byte line just after the buffer defined by the `w`x`h` surface. This example leads to a 4 GB allocation on the client side during the processing of `RDPGFX_CREATE_SURFACE_PDU`.

To be perfectly precise, the transparency byte is replaced with `0xFF`, thus only three out of four bytes can be chosen arbitrarily.

An arbitrary buffer can be written by combining several `RDPGFX_SOLIDFILL_PDU` containing one or more `RDPGFX_RECT16` in the same message.

## Suggested Fix

Check the bounds directly, as is done for `front` and `back`, if they are available; or perform the multiplications on 64 bits, as in `ResourceShape::PreDistributeOriginal` for `4*w*h`. This last solution might not be sufficient if the full 32 bits of the fields of `D3D10_DDI_BOX` can be controlled by an attacker.
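The second suggestion can be sketched as follows. This is a minimal model under stated assumptions, not WARP's code nor the actual fix: the names `box2d`, `surface_sizes`, `box_accepted_u32`, and `box_accepted_u64` are hypothetical, the box fields are modeled as unsigned (the real `D3D10_DDI_BOX` fields are `long`), and the `front`/`back` checks are left out. It only contrasts the wrapping 32-bit products with the same comparisons widened to 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 2D fields of D3D10_DDI_BOX (unsigned here). */
typedef struct {
    uint32_t left, top, right, bottom;
} box2d;

/* Byte sizes of a w x h surface; the text assumes they fit in 32 bits. */
static void surface_sizes(uint32_t w, uint32_t h, uint32_t item_size,
                          uint32_t *line_size, uint32_t *full_size)
{
    uint64_t full = (uint64_t)w * h * item_size;
    assert(full <= UINT32_MAX);
    *line_size = w * item_size;
    *full_size = (uint32_t)full;
}

/* Model of the vulnerable check: every product wraps modulo 2^32. */
static bool box_accepted_u32(uint32_t item_size, uint32_t line_size,
                             uint32_t full_size, box2d box)
{
    return !(line_size < item_size * box.left
          || full_size < line_size * box.top
          || line_size < item_size * box.right
          || full_size < line_size * box.bottom);
}

/* Same comparisons, with one operand widened so the products cannot wrap. */
static bool box_accepted_u64(uint32_t item_size, uint32_t line_size,
                             uint32_t full_size, box2d box)
{
    return !(line_size < (uint64_t)item_size * box.left
          || full_size < (uint64_t)line_size * box.top
          || line_size < (uint64_t)item_size * box.right
          || full_size < (uint64_t)line_size * box.bottom);
}
```

With the example values given earlier (`w = 2**16 - 1`, `h = 2**14`, `t = 2**14`, `b = 2**14 + 1`, `l = 0`, `r = 1`, `item_size = 4`), the 32-bit model accepts the box because `line_size * bottom` wraps to `0x2FFFC`, while the 64-bit model rejects it. The sketch does not address the caveat above about attacker-controlled full-width (possibly negative) fields.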
# Proof of Concept

The PoC is a malicious server built from a modified version of FreeRDP.

Run `make` under the `PoC` directory, then connect to port 33890 using `mstsc.exe` from a machine without 3D acceleration.

Tested on an Ubuntu Desktop 20.04 host.

## Detailed explanation

The `PoC` directory contains three files:

- `rdpgfx.patch` is a patch for FreeRDP 2.6.0 that appends an `RDPGFX_CREATE_SURFACE_PDU` and an `RDPGFX_SOLIDFILL_PDU` to every message sent from the RDP server on the GFX channel;
- `Dockerfile` is used to build a malicious version of `freerdp-shadow-cli`;
- `Makefile` builds and runs the docker container, redirecting port 33890 of the host to port 3389 of the container, and exposing the host X server socket to the container (since it does not run an X server).

Running `make` exposes an RDP server on port 33890 of the host. Connecting to this RDP server using `mstsc.exe` from a machine without 3D acceleration leads to an OOB write.

In order to set up a virtual machine without 3D acceleration:

- VirtualBox: Machine -> Settings... -> Display -> Screen -> ensure that `Enable 3D Acceleration` is unchecked
- or VMware Workstation: VM -> Settings... -> Display -> ensure that `Accelerate 3D graphics` is unchecked

Note that 3D acceleration is deactivated by default on VirtualBox and that guest additions must be installed for the checkbox to have any effect.

Any machine using a generic VGA driver like `Microsoft Basic Display Adapter`, effectively deactivating 3D acceleration, will also be vulnerable.

## Expected Result

The connection should be closed by the client with the following error message:

```
Because of a protocol error, this session will be disconnected.
Please try connecting to the remote computer again.
```

## Observed Result

The client crashes:

```
(14fc.19e8): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
SYMSRV:  BYINDEX: 0x6
         C:\ProgramData\Dbg\sym
         d3d10warp.pdb
         4DA5E2EC579EEF00AFA4EA26BC80C5C71
SYMSRV:  PATH: C:\ProgramData\Dbg\sym\d3d10warp.pdb\4DA5E2EC579EEF00AFA4EA26BC80C5C71\d3d10warp.pdb
SYMSRV:  RESULT: 0x00000000
DBGHELP: d3d10warp - public symbols
        C:\ProgramData\Dbg\sym\d3d10warp.pdb\4DA5E2EC579EEF00AFA4EA26BC80C5C71\d3d10warp.pdb
00007df4`8a451122 890a            mov     dword ptr [rdx],ecx ds:0000017e`17a3cccc=????????
0:014> r ecx
ecx=ff424242
```

## Traces

The attached partial [tenet](https://github.com/gaasedelen/tenet) traces correspond to d3d10warp.dll version 10.0.19041.84 (the bug has also been confirmed on version 10.0.22533.1001 and others):

- `poc-overflow-oob.tenet`: `w=0x8080`, `h=1`, `t=0x7F81`, `b=0xFF01`, crashes;
- `poc-overflow-noerror.tenet`: `w=0x8000`, `h=1`, `t=0x8000`, `b=0x8001`, writes inside the bounds and continues;
- `poc-overflow-error.tenet`: `w=0x8000`, `h=1`, `t=0x8000`, `b=0x8002`, reports an error and closes the connection.
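
The three outcomes can be reproduced with a simplified arithmetic model of the description above. This is only a sketch under assumptions: `classify` and `trace_outcome` are hypothetical names, `item_size` is fixed to 4 as in the RDP scenario, only the `top` and `bottom` products are modeled (the `left`/`right` products cannot overflow with 16-bit coordinates), and "out of bounds" uses the last-line condition stated in the Remote trigger section rather than the exact bytes touched by the jitted code.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TRACE_REJECTED, TRACE_IN_BOUNDS, TRACE_OOB } trace_outcome;

/* Simplified model of the bound check followed by the jitted copy. */
static trace_outcome classify(uint32_t w, uint32_t h, uint32_t t, uint32_t b)
{
    const uint32_t item_size = 4; /* ARGB pixel */
    uint64_t full64 = (uint64_t)w * h * item_size;
    assert(full64 <= UINT32_MAX && t < b); /* assumptions made by the text */

    uint32_t line_size = w * item_size;
    uint32_t full_size = (uint32_t)full64;

    /* Bound check: both products wrap modulo 2^32. */
    uint32_t top_off = line_size * t;
    uint32_t bottom_off = line_size * b;
    if (full_size < top_off || full_size < bottom_off)
        return TRACE_REJECTED;

    /* Copy: wrapped 32-bit first offset, then 64-bit steps of line_size. */
    uint64_t end = (uint64_t)top_off + (uint64_t)line_size * (b - t);
    return end > full_size ? TRACE_OOB : TRACE_IN_BOUNDS;
}
```

For instance, `classify(0x8080, 1, 0x7F81, 0xFF01)` yields `TRACE_OOB`: both products wrap (to `0x10200` and `0x200`), so the check passes, and the `0x7F80` copied lines then run far past the single-line buffer. The two other parameter sets yield `TRACE_IN_BOUNDS` and `TRACE_REJECTED` respectively, matching the traces.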