adding Irix (and, to a lesser extent, Solaris) userland emulation to QEMU
Alexey G 7fb394ad8a xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings
Under certain circumstances normal xen-mapcache functioning may be broken
by guest's actions. This may lead to either QEMU performing exit() due to
a caught bad pointer (and with QEMU process gone the guest domain simply
appears hung afterwards) or actual use of the incorrect pointer inside
QEMU address space -- a write to unmapped memory is possible. The bug is
hard to reproduce on an i440 machine as multiple DMA sources are required
(though it's possible in theory, using multiple emulated devices), but can
be reproduced somewhat easily on a Q35 machine using an emulated AHCI
controller -- each NCQ queue command slot may be used as an independent
DMA source, e.g. using the READ FPDMA QUEUED command, so a single storage
device on the AHCI controller port will be enough to produce multiple DMAs
(up to 32). The detailed description of the issue follows.

Xen-mapcache provides the ability to map parts of guest memory into
QEMU's own address space to work with.

There are two types of cache lookups:
 - translating a guest physical address into a pointer in QEMU's address
   space, mapping a part of guest domain memory if necessary (while trying
   to reduce a number of such (re)mappings to a minimum)
 - translating a QEMU pointer back to its physical address in guest RAM

These lookups are managed via two linked lists of structures.
MapCacheEntry is used for forward cache lookups, while MapCacheRev is used
for reverse lookups.

Every guest physical address is broken down into 2 parts:
    address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
    address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);

MCACHE_BUCKET_SHIFT depends on the system (32/64-bit) and is equal to 20
for a 64-bit system (which is assumed for the rest of this description).
Basically, this means that we deal with 1MB chunks and offsets within
those 1MB chunks. All mappings are created with 1MB granularity, i.e.
1MB/2MB/3MB etc. Most DMA transfers are typically smaller than 1MB;
however, if a transfer crosses any 1MB border(s), then the nearest larger
mapping size will be used, so e.g. a 512-byte DMA transfer with the start
address 700FFF80h will actually require a 2MB range.
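
The arithmetic above can be checked with a small standalone sketch. It
assumes the 64-bit value MCACHE_BUCKET_SHIFT = 20 given in the text;
mapping_size() is a hypothetical helper written for this illustration and
is not a function from xen-mapcache.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a 64-bit system, as stated in the text above. */
#define MCACHE_BUCKET_SHIFT 20
#define MCACHE_BUCKET_SIZE  (1ULL << MCACHE_BUCKET_SHIFT)

/* Index of the 1MB bucket that contains phys_addr. */
static uint64_t bucket_index(uint64_t phys_addr)
{
    return phys_addr >> MCACHE_BUCKET_SHIFT;
}

/* Offset of phys_addr within its 1MB bucket. */
static uint64_t bucket_offset(uint64_t phys_addr)
{
    return phys_addr & (MCACHE_BUCKET_SIZE - 1);
}

/*
 * Hypothetical helper: size of the mapping needed to cover len bytes
 * starting at phys_addr, rounded up to a whole number of 1MB buckets.
 */
static uint64_t mapping_size(uint64_t phys_addr, uint64_t len)
{
    uint64_t span = bucket_offset(phys_addr) + len;

    assert(len > 0);
    return (span + MCACHE_BUCKET_SIZE - 1) & ~(MCACHE_BUCKET_SIZE - 1);
}
```

For the 512-byte transfer at 700FFF80h, the offset within the bucket is
FFF80h, so offset + length is 100180h, which rounds up to 2MB.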

The current implementation assumes that MapCacheEntries are unique for a
given address_index and size pair and that a single MapCacheEntry may be
reused by multiple requests -- in this case the 'lock' field will be larger
than 1. On the other hand, each requested guest physical address (with the
'lock' flag) is described by its own MapCacheRev. So there may be multiple
MapCacheRev entries corresponding to a single MapCacheEntry. The
xen-mapcache code uses MapCacheRev entries to retrieve the address_index &
size pair, which in turn is used to find the related MapCacheEntry. The
'lock' field within a MapCacheEntry structure is actually a reference
counter which shows the number of corresponding MapCacheRev entries.
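
A simplified model of the two structures and of the two-step reverse lookup
may help here. The struct layouts are reduced to the fields named in this
description, the lists are plain singly linked lists, and find_rev() and
find_entry() are hypothetical names for this sketch, not the real
xen-mapcache functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified forward-lookup entry (assumed layout). */
typedef struct MapCacheEntry {
    uint64_t paddr_index;        /* address_index of the mapped range */
    uint64_t size;               /* mapping size, a multiple of 1MB */
    uint8_t *vaddr_base;         /* start of the mapping in QEMU's space */
    unsigned lock;               /* count of MapCacheRev entries using it */
    struct MapCacheEntry *next;
} MapCacheEntry;

/* Simplified reverse-lookup entry: one per locked request (assumed layout). */
typedef struct MapCacheRev {
    uint8_t *vaddr_req;          /* pointer that was handed to the caller */
    uint64_t paddr_index;
    uint64_t size;
    struct MapCacheRev *next;
} MapCacheRev;

/* Reverse lookup, step 1: find the MapCacheRev recorded for ptr. */
static const MapCacheRev *find_rev(const MapCacheRev *revs, const uint8_t *ptr)
{
    while (revs && revs->vaddr_req != ptr) {
        revs = revs->next;
    }
    return revs;
}

/* Reverse lookup, step 2: FIRST entry with the same index and size. */
static MapCacheEntry *find_entry(MapCacheEntry *head, const MapCacheRev *rev)
{
    assert(rev != NULL);
    while (head && (head->paddr_index != rev->paddr_index ||
                    head->size != rev->size)) {
        head = head->next;
    }
    return head;
}
```

Step 2 identifies an entry only by its (address_index, size) pair, so it is
correct only as long as that pair is unique within the list.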

The bug lies in the guest's ability to indirectly manipulate the
xen-mapcache MapCacheEntries list via a special sequence of DMA
operations, typically for storage devices. In order to trigger the bug,
the guest needs to issue DMA operations in a specific order and timing.
Although xen-mapcache is protected by a mutex lock, this doesn't help
in this case, as the bug is not due to a race condition.

Suppose we have 3 DMA transfers, namely A, B and C, where
- transfer A crosses 1MB border and thus uses a 2MB mapping
- transfers B and C are normal transfers within 1MB range
- and all 3 transfers belong to the same address_index

In this case, if all these transfers are executed one by one
(without overlaps), no special treatment is necessary -- each transfer's
mapping lock will be set and then cleared on unmap before starting
the next transfer.
The situation changes when DMA transfers overlap in time, e.g. like this:

  |===== transfer A (2MB) =====|

              |===== transfer B (1MB) =====|

                          |===== transfer C (1MB) =====|
 time --->

In this situation the following sequence of actions happens:

1. transfer A creates a mapping to a 2MB area (lock=1)
2. transfer B (1MB) tries to find an available mapping but cannot find one
   because transfer A is still in progress, and its entry has a 2MB size
   and a non-zero lock. So transfer B creates another mapping -- same
   address_index, but 1MB size.
3. transfer A completes, making the 1st mapping entry available by setting
   its lock to 0
4. transfer C starts and tries to find an available mapping entry and sees
   that the 1st entry has lock=0, so it uses this entry but remaps the
   mapping to a 1MB size
5. transfer B completes and by this time
  - there are two locked entries in the MapCacheEntry list with the SAME
    values for both address_index and size
  - the entry for transfer B actually resides farther down the list while
    transfer C's entry is first
6. xen_ram_addr_from_mapcache() for transfer B gets the correct
   address_index and size pair from the corresponding MapCacheRev entry,
   but then it starts looking for a MapCacheEntry with these values and
   finds the first entry -- which belongs to transfer C.
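
The sequence above can be replayed on a toy model. This is a sketch under
stated assumptions, not the QEMU code: Entry is a reduced stand-in for
MapCacheEntry, mappings are fake integer addresses that are never
dereferenced, and map_locked(), find_entry() and replay_overlap() are
hypothetical names. The lookup rule is the one described in steps 2 and 4:
skip entries that are locked and do not match, reuse (and remap) the first
unlocked one, append a new entry otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB (1ULL << 20)

/* Toy stand-in for MapCacheEntry. */
typedef struct Entry {
    uint64_t index;
    uint64_t size;
    uint64_t vaddr_base;     /* fake address, 0 means "not mapped" */
    unsigned lock;
    struct Entry *next;
} Entry;

static Entry pool[4];                    /* pool[0] is the resident list head */
static size_t pool_used = 1;
static uint64_t next_vaddr = 0x100000;   /* every (re)mapping gets a new address */

static void remap(Entry *e, uint64_t index, uint64_t size)
{
    e->index = index;
    e->size = size;
    e->vaddr_base = next_vaddr;
    next_vaddr += size;
}

/* Lookup rule before the fix, as described in the steps above. */
static Entry *map_locked(uint64_t index, uint64_t size)
{
    Entry *e = &pool[0], *prev = NULL;

    while (e && e->lock && e->vaddr_base &&
           (e->index != index || e->size != size)) {
        prev = e;
        e = e->next;
    }
    if (!e) {
        assert(pool_used < sizeof(pool) / sizeof(pool[0]));
        e = &pool[pool_used++];
        prev->next = e;
        remap(e, index, size);
    } else if (!e->lock &&
               (!e->vaddr_base || e->index != index || e->size != size)) {
        remap(e, index, size);
    }
    e->lock++;
    return e;
}

/* Reverse-path search: first entry matching the (index, size) pair. */
static Entry *find_entry(uint64_t index, uint64_t size)
{
    Entry *e = &pool[0];

    while (e && (e->index != index || e->size != size)) {
        e = e->next;
    }
    return e;
}

/* Replays steps 1-6; returns 1 if B's lookup lands on C's entry. */
static int replay_overlap(void)
{
    const uint64_t index = 0x700;
    Entry *a = map_locked(index, 2 * MB);       /* step 1 */
    Entry *b = map_locked(index, 1 * MB);       /* step 2: new entry */
    a->lock--;                                  /* step 3: A completes */
    Entry *c = map_locked(index, 1 * MB);       /* step 4: A's entry remapped */
    Entry *found = find_entry(index, 1 * MB);   /* step 6: lookup for B */

    return c == a && found == c && found != b &&
           found->vaddr_base != b->vaddr_base;
}
```

In this model the entry returned for transfer B is the list head, whose
vaddr_base belongs to transfer C's mapping.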

At this point the following (bad) consequences are possible:

1. xen_ram_addr_from_mapcache() will use a wrong entry->vaddr_base value
   in this statement:

   raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
       ((unsigned long) ptr - (unsigned long) entry->vaddr_base);

resulting in an incorrect raddr value returned from the function. The
(ptr - entry->vaddr_base) expression may produce both positive and negative
numbers and its actual value may differ greatly as many map/unmap
operations take place. If the value is beyond guest RAM limits then a
"Bad RAM offset" error will be triggered and logged, followed by exit()
in QEMU.
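
To make the effect of a wrong vaddr_base concrete, here is a small sketch
of the same arithmetic with made-up addresses. Pointers are modelled as
64-bit integers so nothing is dereferenced; raddr_from(), raddr_in_ram()
and raddr_example() are hypothetical helpers, and the bounds check only
stands in for whatever test produces the "Bad RAM offset" error.

```c
#include <assert.h>
#include <stdint.h>

#define MCACHE_BUCKET_SHIFT 20

/* Same expression as the statement above, on plain integers. */
static uint64_t raddr_from(uint64_t paddr_index, uint64_t ptr,
                           uint64_t vaddr_base)
{
    return (paddr_index << MCACHE_BUCKET_SHIFT) + (ptr - vaddr_base);
}

/* Hypothetical bounds check standing in for the "Bad RAM offset" test. */
static int raddr_in_ram(uint64_t raddr, uint64_t ram_size)
{
    return raddr < ram_size;
}

/* Worked example with made-up mapping addresses. */
static void raddr_example(void)
{
    const uint64_t base_b = 0x7f0000300000ULL;   /* transfer B's mapping */
    const uint64_t base_c = 0x7f0000400000ULL;   /* transfer C's mapping */
    const uint64_t ptr = base_b + 0x1234;        /* pointer given to B */

    /* Correct entry: offset 1234h within bucket 700h. */
    assert(raddr_from(0x700, ptr, base_b) == 0x70001234ULL);
    /* Wrong entry: the difference is negative and wraps around. */
    assert(raddr_from(0x700, ptr, base_c) == 0x6FF01234ULL);
}
```

In this example the wrong result still falls inside a guest with more than
about 1.75GB of RAM, so it would go unnoticed; with a small paddr_index the
same wrap-around produces a huge value that fails the bounds check.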

2. If the raddr value doesn't exceed guest RAM boundaries, the same
sequence of actions will be performed for xen_invalidate_map_cache_entry()
on DMA unmap, resulting in the wrong MapCacheEntry being unmapped while the
DMA operation which uses it is still active. The above example must
be extended by one more DMA transfer in order to allow unmapping, as the
first mapping in the list is sort of resident.

The patch changes how MapCacheEntry structures are added to the list,
avoiding duplicates.
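
One possible way to avoid duplicates, shown on a toy model, is to scan the
whole list for an exact (address_index, size) match before reusing any
unlocked entry. This is only an illustration of the idea and should not be
read as the patch itself; Entry, map_locked_dedup() and replay_dedup() are
hypothetical names, and mappings are fake integer addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB (1ULL << 20)

/* Toy stand-in for MapCacheEntry. */
typedef struct Entry {
    uint64_t index;
    uint64_t size;
    uint64_t vaddr_base;     /* fake address, 0 means "not mapped" */
    unsigned lock;
    struct Entry *next;
} Entry;

static Entry pool[4];                    /* pool[0] is the resident list head */
static size_t pool_used = 1;
static uint64_t next_vaddr = 0x100000;

static void remap(Entry *e, uint64_t index, uint64_t size)
{
    e->index = index;
    e->size = size;
    e->vaddr_base = next_vaddr;
    next_vaddr += size;
}

/* Prefer an exact match anywhere in the list; reuse a free entry only
 * when no match exists; append only when there is no free entry either. */
static Entry *map_locked_dedup(uint64_t index, uint64_t size)
{
    Entry *e = &pool[0], *prev = NULL, *free_entry = NULL;

    while (e && e->vaddr_base && (e->index != index || e->size != size)) {
        if (!free_entry && !e->lock) {
            free_entry = e;              /* remember the first unlocked entry */
        }
        prev = e;
        e = e->next;
    }
    if (!e && free_entry) {
        e = free_entry;
    }
    if (!e) {
        assert(pool_used < sizeof(pool) / sizeof(pool[0]));
        e = &pool[pool_used++];
        prev->next = e;
        remap(e, index, size);
    } else if (!e->lock &&
               (!e->vaddr_base || e->index != index || e->size != size)) {
        remap(e, index, size);
    }
    e->lock++;
    return e;
}

/* Replays transfers A, B, C; returns 1 if C shares B's entry. */
static int replay_dedup(void)
{
    const uint64_t index = 0x700;
    Entry *a = map_locked_dedup(index, 2 * MB);
    Entry *b = map_locked_dedup(index, 1 * MB);
    a->lock--;                                   /* A completes */
    Entry *c = map_locked_dedup(index, 1 * MB);
    int matches = 0;

    for (Entry *e = &pool[0]; e; e = e->next) {
        matches += (e->index == index && e->size == 1 * MB);
    }
    return c == b && b->lock == 2 && matches == 1;
}
```

With this rule transfer C finds transfer B's existing 1MB entry and only
raises its lock count, so the (address_index, size) pair stays unique.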

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2017-07-21 17:37:06 -07:00
accel tcg: Pass generic CPUState to gen_intermediate_code() 2017-07-19 14:45:16 -07:00
audio audio: st_rate_flow exist a infinite loop 2017-07-17 11:08:59 +02:00
backends memory: Rename memory_region_init_ram() to memory_region_init_ram_nomigrate() 2017-07-14 17:59:42 +01:00
block block/vpc: fix uninitialised variable compiler warning 2017-07-21 15:00:07 +01:00
bsd-user bsd-user/main.c: Fix unused variable warning 2017-07-21 15:01:09 +01:00
chardev * gdbstub fixes (Alex) 2017-07-14 12:16:09 +01:00
contrib
crypto crypto: hmac: add af_alg-backend hmac support 2017-07-19 10:11:05 +01:00
default-configs configure: Use an explicit CONFIG_IVSHMEM rather than CONFIG_EVENTFD 2017-07-20 14:58:19 +01:00
disas disas/microblaze: Add missing 'const' attributes 2017-07-04 09:22:20 +02:00
docs -----BEGIN PGP SIGNATURE----- 2017-07-18 20:29:36 +01:00
dtc@558cd81bdd
fpu softfloat: define floatx80_round() 2017-06-29 20:27:39 +02:00
fsdev block: remove timer canceling in throttle_config() 2017-07-18 15:14:35 +02:00
gdb-xml s390x/gdb: add gs registers 2017-07-14 12:29:49 +02:00
hw xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings 2017-07-21 17:37:06 -07:00
include configure: Drop ancient Solaris 9 and earlier support 2017-07-21 15:04:05 +01:00
io Merge I/O 2017/07/18 v1 2017-07-19 09:11:38 +01:00
libdecnumber
linux-headers linux header sync against v4.13-rc1 2017-07-18 10:55:16 +02:00
linux-user Replace 'struct ucontext' with 'ucontext_t' type 2017-07-20 10:10:28 +01:00
migration migration: check global caps for validity 2017-07-18 17:36:26 +02:00
nbd nbd: Fix server reply to NBD_OPT_EXPORT_NAME of older clients 2017-07-17 17:06:46 -05:00
net net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len 2017-07-17 20:13:53 +08:00
pc-bios keymaps: fr-ca: more fixups 2017-07-20 09:25:44 +02:00
pixman@87eea99e44
po
qapi block/qapi: Add qdev device name to query-block 2017-07-18 15:14:35 +02:00
qga test-qga: add test for guest-get-osinfo 2017-07-18 05:49:01 -05:00
qobject json: learn to parse uint64 numbers 2017-06-20 14:31:31 +02:00
qom x86 and machine queue, 2017-07-17 2017-07-18 15:24:11 +01:00
replay
roms Pull request 2017-07-17 15:05:29 +01:00
scripts git orderfile and editorconfig for 2.10 2017-07-20 12:04:05 +01:00
slirp slirp: Handle error returns from sosendoob() 2017-07-15 14:28:25 +02:00
stubs
target MIPS patches 2017-07-21 2017-07-21 13:28:51 +01:00
tcg tcg/tci: enable bswap16_i64 2017-07-19 14:45:16 -07:00
tests Final CI updates for soft-freeze 2017-07-21 11:44:53 +01:00
trace exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state 2017-07-17 13:11:05 +01:00
ui vnc: Set default kbd delay to 10ms 2017-07-17 11:35:27 +02:00
util util/oslib-posix.c: Avoid warning on NetBSD 2017-07-21 10:32:19 +01:00
.dir-locals.el
.editorconfig add editorconfig 2017-07-20 09:56:56 +02:00
.exrc
.gdbinit
.gitignore coccinelle: ignore ASTs pre-parsed cached C files 2017-07-19 14:45:15 -07:00
.gitmodules
.mailmap
.shippable.yml shippable: add win32/64 targets 2017-07-18 10:58:36 +01:00
.travis.yml travis: move make -j flag out of script 2017-07-18 09:39:19 +01:00
CODING_STYLE
COPYING
COPYING.LIB
Changelog
HACKING
LICENSE
MAINTAINERS MAINTAINERS: Add entries for MPS2 board 2017-07-17 13:36:09 +01:00
Makefile configure: Don't build ivshmem tools unless CONFIG_IVSHMEM is set 2017-07-20 14:58:19 +01:00
Makefile.objs configure: Don't build ivshmem tools unless CONFIG_IVSHMEM is set 2017-07-20 14:58:19 +01:00
Makefile.target tcg: add the CONFIG_TCG into Makefiles 2017-07-05 09:12:44 +02:00
README
VERSION
arch_init.c
atomic_template.h
balloon.c
block.c qemu-img: Check for backing image if specified during create 2017-07-18 15:27:37 +02:00
blockdev-nbd.c
blockdev.c blockdev: move BDRV_O_NO_BACKING option forward 2017-07-18 15:27:20 +02:00
blockjob.c fix: avoid an infinite loop or a dangling pointer problem in img_commit 2017-06-26 14:54:46 +02:00
bootdevice.c Makefile: Move bootdevice.o to common-obj-y 2017-07-04 14:39:27 +02:00
bt-host.c
bt-vhci.c
configure configure: Drop ancient Solaris 9 and earlier support 2017-07-21 15:04:05 +01:00
cpus-common.c
cpus.c Convert error_report() to warn_report() 2017-07-13 13:49:58 +02:00
device-hotplug.c
device_tree.c
disas.c
dma-helpers.c
dump.c
exec.c cpu: Convert to DEFINE_PROP_LINK 2017-07-14 12:04:43 +02:00
gdbstub.c Use qemu_tolower() and qemu_toupper(), not tolower() and toupper() 2017-07-21 10:32:41 +01:00
hax-stub.c
hmp-commands-info.hx s390x/migration: Monitor commands for storage attributes 2017-07-14 12:29:47 +02:00
hmp-commands.hx s390x/kvm/migration/cpumodel: fixes, enhancements and cleanups 2017-07-14 14:19:35 +01:00
hmp.c block: List anonymous device BBs in query-block 2017-07-18 15:14:36 +02:00
hmp.h hmp: add hmp analogue for qmp-chardev-change 2017-07-14 11:04:34 +02:00
ioport.c
iothread.c
memory.c memory.h: Add memory_region_init_{ram, rom, rom_device}() handling migration 2017-07-14 17:59:42 +01:00
memory_ldst.inc.c
memory_mapping.c
module-common.c
monitor.c s390x/kvm/migration/cpumodel: fixes, enhancements and cleanups 2017-07-14 14:19:35 +01:00
numa.c memory: Rename memory_region_init_ram() to memory_region_init_ram_nomigrate() 2017-07-14 17:59:42 +01:00
os-posix.c
os-win32.c
qapi-schema.json vnc: Clarify documentation of QMP command change 2017-07-20 09:25:06 +02:00
qdev-monitor.c
qdict-test-data.txt
qemu-bridge-helper.c
qemu-doc.texi docs: document encryption options for qcow, qcow2 and luks 2017-07-11 17:44:57 +02:00
qemu-ga.texi
qemu-img-cmds.hx qemu-img: Check for backing image if specified during create 2017-07-18 15:27:37 +02:00
qemu-img.c qemu-img: Check for backing image if specified during create 2017-07-18 15:27:37 +02:00
qemu-img.texi qemu-img: Check for backing image if specified during create 2017-07-18 15:27:37 +02:00
qemu-io-cmds.c block: Add PreallocMode to blk_truncate() 2017-07-11 17:45:01 +02:00
qemu-io.c block: rip out all traces of password prompting 2017-07-11 17:44:56 +02:00
qemu-nbd.c nbd: Implement NBD_INFO_BLOCK_SIZE on client 2017-07-14 12:04:42 +02:00
qemu-nbd.texi
qemu-option-trace.texi
qemu-options-wrapper.h
qemu-options.h
qemu-options.hx -----BEGIN PGP SIGNATURE----- 2017-07-18 09:16:43 +01:00
qemu-seccomp.c
qemu-tech.texi
qemu.nsi
qemu.sasl
qmp.c qmp: Include parent type on 'qom-list-types' output 2017-07-17 15:41:30 -03:00
qtest.c char: add backend hotswap handler 2017-07-14 11:04:33 +02:00
replication.c
replication.h
rules.mak
softmmu_template.h
thunk.c
tpm.c
trace-events trace: [trivial] Statically enable all guest events 2017-07-17 13:11:13 +01:00
user-exec-stub.c
user-exec.c Replace 'struct ucontext' with 'ucontext_t' type 2017-07-20 10:10:28 +01:00
version.rc
vl.c * gdbstub fixes (Alex) 2017-07-14 12:16:09 +01:00

README

         QEMU README
         ===========

QEMU is a generic and open source machine & userspace emulator and
virtualizer.

QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).

QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.

QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.

QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.


Building
========

QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, OS-X, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:

  mkdir build
  cd build
  ../configure
  make

Additional information can also be found online via the QEMU website:

  http://qemu-project.org/Hosts/Linux
  http://qemu-project.org/Hosts/Mac
  http://qemu-project.org/Hosts/W32


Submitting patches
==================

The QEMU source code is maintained under the GIT version control system.

   git clone git://git.qemu-project.org/qemu.git

When submitting patches, the preferred approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the HACKING and CODING_STYLE files.

Additional information on submitting patches can be found online via
the QEMU website:

  http://qemu-project.org/Contribute/SubmitAPatch
  http://qemu-project.org/Contribute/TrivialPatches


Bug reporting
=============

The QEMU project uses Launchpad as its primary upstream bug tracker. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:

  https://bugs.launchpad.net/qemu/

If using QEMU via an operating system vendor pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect latest upstream code, it can also be
reported via launchpad.

For additional information on bug reporting consult:

  http://qemu-project.org/Contribute/ReportABug


Contact
=======

The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC:

 - qemu-devel@nongnu.org
   http://lists.nongnu.org/mailman/listinfo/qemu-devel
 - #qemu on irc.oftc.net

Information on additional methods of contacting the community can be
found online via the QEMU website:

  http://qemu-project.org/Contribute/StartHere

-- End