From: Avi K. <av...@qu...> - 2008-04-29 22:35:31
Glauber Costa wrote:
> Hi. This is a proposal for reducing the impact of kvm functions in core qemu
> code. This is by all means not ready, but I felt like posting it, so a discussion
> on it could follow.
>
> The idea in this patch is to replace the specific kvm details from core qemu files
> like vl.c, with driver_yyy() functions. When kvm is not running, those functions would
> just return (most of time), absolutely reducing the impact of kvm code.
>
> As I wanted to test it, in this patch I changed the kvm functions to be called driver_yyy(),
> but that's not my final goal. I intend to use a function pointer schema, similar to what the linux
> kernel already do for a lot of its subsystem, to isolate the changes.
>
> Comments deeply welcome.
>

While I would be very annoyed if someone referred to kvm as a qemu
accelerator, I think accelerator_yyy() is more descriptive than
driver_yyy().

I did not see any references to kqemu, but I imagine you mean this to
abstract kqemu support as well.

Other than that, looks really good.

--
Any sufficiently difficult bug is indistinguishable from a feature.

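The "function pointer schema" mentioned in the quoted proposal could look roughly like the sketch below. This is only an illustration under assumptions: DemoCPU, AccelOps, accel_select() and the demo_* callbacks are hypothetical names invented here, not code from the patch or from qemu. Core code calls accelerator_yyy() wrappers unconditionally, and a table of no-op callbacks stands in when no accelerator is active.

```c
#include <stddef.h>

/* Hypothetical CPU state; stands in for qemu's CPUState. */
typedef struct DemoCPU {
    int regs_synced;
} DemoCPU;

/* Hypothetical table of operations an accelerator (kvm, kqemu, ...) provides. */
typedef struct AccelOps {
    const char *name;
    void (*save_registers)(DemoCPU *cpu);
    void (*load_registers)(DemoCPU *cpu);
} AccelOps;

/* No-op callback used when no accelerator is selected. */
static void noop_regs(DemoCPU *cpu) { (void)cpu; }

/* Toy accelerator callbacks that only record that they ran. */
static void demo_save(DemoCPU *cpu) { cpu->regs_synced = 1; }
static void demo_load(DemoCPU *cpu) { cpu->regs_synced = 0; }

static const AccelOps none_ops = { "none", noop_regs, noop_regs };
static const AccelOps demo_ops = { "demo", demo_save, demo_load };

static const AccelOps *current_accel = &none_ops;

/* Select an accelerator; NULL falls back to the no-op table. */
void accel_select(const AccelOps *ops)
{
    current_accel = ops ? ops : &none_ops;
}

/* Core code calls these without testing whether an accelerator is enabled. */
void accelerator_save_registers(DemoCPU *cpu)
{
    current_accel->save_registers(cpu);
}

void accelerator_load_registers(DemoCPU *cpu)
{
    current_accel->load_registers(cpu);
}
```

With a table like this, call sites need no "if (kvm_enabled())" test, and a second backend such as kqemu would only have to supply its own table.
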
From: Marcelo T. <mto...@re...> - 2008-04-29 22:34:05
Hi Anthony,

How is -no-kvm-irqchip working with the patch?

On Tue, Apr 29, 2008 at 09:28:14AM -0500, Anthony Liguori wrote:
> This patch eliminates the use of sigtimedwait() in the IO thread. To avoid the
> signal/select race condition, we use a pipe that we write to in the signal
> handlers. This was suggested by Rusty and seems to work well.
>
> +static int kvm_eat_signal(CPUState *env, int timeout)
>  {
>      struct timespec ts;
>      int r, e, ret = 0;
>      siginfo_t siginfo;
> +    sigset_t waitset;
>
> +    sigemptyset(&waitset);
> +    sigaddset(&waitset, SIG_IPI);
>      ts.tv_sec = timeout / 1000;
>      ts.tv_nsec = (timeout % 1000) * 1000000;
> -    r = sigtimedwait(&waitset->sigset, &siginfo, &ts);
> +    qemu_kvm_unlock();
> +    r = sigtimedwait(&waitset, &siginfo, &ts);
> +    qemu_kvm_lock(env);
> +    cpu_single_env = env;

This assignment seems redundant now.

>      if (r == -1 && (errno == EAGAIN || errno == EINTR) && !timeout)
>          return 0;
>      e = errno;
> -    pthread_mutex_lock(&qemu_mutex);
>      if (env && vcpu)
>          cpu_single_env = vcpu->env;

And this one too.

>
> @@ -263,12 +238,8 @@ static void pause_all_threads(void)
>          vcpu_info[i].stop = 1;
>          pthread_kill(vcpu_info[i].thread, SIG_IPI);

Make sure the IO thread has SIG_IPI blocked (those are for APIC vcpu
initialization only).

> +static void sig_aio_fd_read(void *opaque)
> +{
> +    int signum;
> +    ssize_t len;
> +
> +    do {
> +        len = read(kvm_sigfd[0], &signum, sizeof(signum));
> +    } while (len == -1 && errno == EINTR);

What is the reason for this loop instead of a straight read? It's
alright to be interrupted by a signal.

> +    signal(SIGUSR1, sig_aio_handler);
> +    signal(SIGUSR2, sig_aio_handler);
> +    signal(SIGALRM, sig_aio_handler);
> +    signal(SIGIO, sig_aio_handler);
> +
> +    if (pipe(kvm_sigfd) == -1)
> +        abort();

perror() would be nice.

> -        kvm_eat_signal(&io_signal_table, NULL, 1000);
>          pthread_mutex_lock(&qemu_mutex);
> -        cpu_single_env = NULL;
> -        main_loop_wait(0);
> +        main_loop_wait(10);

Increase that to 1000 or something. Will make it easier to spot bugs.

Similarly in qemu_kvm_aio_wait().

From: Anthony L. <an...@co...> - 2008-04-29 22:25:49
Avi Kivity wrote:
> Anthony Liguori wrote:
>
>> This patch allows VMA's that contain no backing page to be used for guest
>> memory. This is a drop-in replacement for Ben-Ami's first page in his direct
>> mmio series. Here, we continue to allow mmio pages to be represented in the
>> rmap.
>>
>
> I like this very much, as it only affects accessors and not the mmu core
> itself.
>
> Hollis/Xiantao/Carsten, can you confirm that this approach works for
> you? Carsten, I believe you don't have mmio, but at least this
> shouldn't interfere.
>
>>
>>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
>>  {
>> -    return pfn_to_page(gfn_to_pfn(kvm, gfn));
>> +    pfn_t pfn;
>> +
>> +    pfn = gfn_to_pfn(kvm, gfn);
>> +    if (pfn_valid(pfn))
>> +        return pfn_to_page(pfn);
>> +
>> +    return NULL;
>>  }
>>
>
> You're returning NULL here, not bad_page.
>

My thinking was that bad_page indicates that the gfn is invalid. This
is a different type of error though. The problem is that we are trying
to kmap() a page that has no struct page associated with it.

I'm not sure what the right thing to do here is. Perhaps we should be
replacing consumers of gfn_to_page() with copy_to_user() instead?

Regards,

Anthony Liguori

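The two failure modes discussed above (an invalid gfn versus a valid gfn whose pfn has no struct page) can be told apart by returning different values. The following is a toy userspace model only; every toy_* name and constant is made up for illustration and none of it is kernel code or the kernel's actual behaviour.

```c
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's definitions. */
typedef unsigned long toy_gfn_t;
typedef unsigned long toy_pfn_t;

struct toy_page {
    int unused;
};

#define TOY_RAM_PAGES   4UL       /* pfns 0..3 have a struct toy_page */
#define TOY_MMIO_GFN    4UL       /* valid gfn backed by an mmio pfn */
#define TOY_MMIO_PFN    0x1000UL  /* pfn with no struct toy_page behind it */
#define TOY_INVALID_PFN (~0UL)

static struct toy_page toy_mem_map[TOY_RAM_PAGES];
static struct toy_page toy_bad_page;  /* sentinel: the gfn itself is invalid */

/* Translate a guest frame number to a host frame number. */
static toy_pfn_t toy_gfn_to_pfn(toy_gfn_t gfn)
{
    if (gfn < TOY_RAM_PAGES)
        return gfn;               /* identity-mapped RAM */
    if (gfn == TOY_MMIO_GFN)
        return TOY_MMIO_PFN;
    return TOY_INVALID_PFN;
}

/* Nonzero if the pfn has a struct toy_page in toy_mem_map. */
static int toy_pfn_valid(toy_pfn_t pfn)
{
    return pfn < TOY_RAM_PAGES;
}

/* Returns a real page, &toy_bad_page for an invalid gfn,
 * or NULL when the pfn exists but has no page structure (mmio). */
struct toy_page *toy_gfn_to_page(toy_gfn_t gfn)
{
    toy_pfn_t pfn = toy_gfn_to_pfn(gfn);

    if (pfn == TOY_INVALID_PFN)
        return &toy_bad_page;
    if (toy_pfn_valid(pfn))
        return &toy_mem_map[pfn];
    return NULL;
}
```

A caller in this model has to check for both the sentinel and NULL before touching the page, which is the extra burden on consumers that the question about copy_to_user() is getting at.
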
From: Avi K. <av...@qu...> - 2008-04-29 22:19:21
Anthony Liguori wrote:
> This patch allows VMA's that contain no backing page to be used for guest
> memory. This is a drop-in replacement for Ben-Ami's first page in his direct
> mmio series. Here, we continue to allow mmio pages to be represented in the
> rmap.
>

I like this very much, as it only affects accessors and not the mmu core
itself.

Hollis/Xiantao/Carsten, can you confirm that this approach works for
you? Carsten, I believe you don't have mmio, but at least this
shouldn't interfere.

>
>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
>  {
> -    return pfn_to_page(gfn_to_pfn(kvm, gfn));
> +    pfn_t pfn;
> +
> +    pfn = gfn_to_pfn(kvm, gfn);
> +    if (pfn_valid(pfn))
> +        return pfn_to_page(pfn);
> +
> +    return NULL;
>  }
>

You're returning NULL here, not bad_page.

--
Any sufficiently difficult bug is indistinguishable from a feature.

From: Avi K. <av...@qu...> - 2008-04-29 21:47:43
David Miller wrote:
> Should I create the list(s) now? If so, please let me know the
> names they should have.
>

I sent an email a couple of days ago to pos...@vg...:

> Hi, please create the following lists for kvm:
>
> kvm (x86 and general discussion)
> kvm-ppc (powerpc, managed by Hollis Blanchard)
> kvm-ia64 (ia64)
> kvm-commits (read-only, tracks commits to kvm git HEAD)
>
> Thanks.

Thanks.

--
Any sufficiently difficult bug is indistinguishable from a feature.

From: Hollis B. <ho...@us...> - 2008-04-29 20:51:00
Acked-by: Hollis Blanchard <ho...@us...>

Avi, please apply for 2.6.26.

--
Hollis Blanchard
IBM Linux Technology Center

From: andrzej z. <ba...@gm...> - 2008-04-29 19:51:27
On 29/04/2008, Glauber Costa <gc...@re...> wrote:
> This patch goes towards the direction of increasing general modularity of the
> code. Code in vl.c that used to live inside target ifdefs, are moved to inside the
> target directories, in a new file called machine.c. They are the cpu save/load and machine
> registration

Good idea, I had a similar patch to move cpu save/load to
target-*/helper.c but I postponed it because it would make the
libqemu.a depend on vl.c.

> ---
>  Makefile.target        |    5 +-
>  hw/boards.h            |    1 +
>  target-arm/machine.c   |  211 +++++++++++++++++
>  target-cris/machine.c  |    7 +
>  target-i386/machine.c  |  264 +++++++++++++++++++++
>  target-m68k/machine.c  |    9 +
>  target-mips/machine.c  |   21 ++
>  target-ppc/machine.c   |   20 ++
>  target-sh4/machine.c   |    8 +
>  target-sparc/machine.c |  102 ++++++++
>  vl.c                   |  616 ------------------------------------------------
>  11 files changed, 647 insertions(+), 617 deletions(-)
>  create mode 100644 target-arm/machine.c
>  create mode 100644 target-cris/machine.c
>  create mode 100644 target-i386/machine.c
>  create mode 100644 target-m68k/machine.c
>  create mode 100644 target-mips/machine.c
>  create mode 100644 target-ppc/machine.c
>  create mode 100644 target-sh4/machine.c
>  create mode 100644 target-sparc/machine.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 5ac29a7..a530ee5 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -303,6 +303,9 @@ gen-op.h: op.o $(DYNGEN)
>  op.o: op.c
>  	$(CC) $(OP_CFLAGS) $(CPPFLAGS) -c -o $@ $<
>
> +machine.o: machine.c
> +	$(CC) $(OP_CFLAGS) $(CPPFLAGS) -c -o $@ $<
> +
>  # HELPER_CFLAGS is used for all the code compiled with static register
>  # variables
>  ifeq ($(TARGET_BASE_ARCH), i386)
> @@ -481,7 +484,7 @@ endif #CONFIG_DARWIN_USER
>  # System emulator target
>  ifndef CONFIG_USER_ONLY
>
> -OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o
> +OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o
>  ifdef CONFIG_WIN32
>  OBJS+=block-raw-win32.o
>  else
> diff --git a/hw/boards.h b/hw/boards.h
> index affcaa6..ada4664 100644
> --- a/hw/boards.h
> +++ b/hw/boards.h
> @@ -18,6 +18,7 @@ typedef struct QEMUMachine {
>  } QEMUMachine;
>
>  int qemu_register_machine(QEMUMachine *m);
> +void register_machines(void);
>
>  /* Axis ETRAX. */
>  extern QEMUMachine bareetraxfs_machine;
> diff --git a/target-arm/machine.c b/target-arm/machine.c
> new file mode 100644
> index 0000000..d8de189
> --- /dev/null
> +++ b/target-arm/machine.c
> @@ -0,0 +1,211 @@
> +#include "hw/hw.h"
> +#include "hw/boards.h"
> +
> +void register_machines(void)
> +{
> +    qemu_register_machine(&integratorcp_machine);
> +    qemu_register_machine(&versatilepb_machine);
> +    qemu_register_machine(&versatileab_machine);
> +    qemu_register_machine(&realview_machine);
> +    qemu_register_machine(&akitapda_machine);
> +    qemu_register_machine(&spitzpda_machine);
> +    qemu_register_machine(&borzoipda_machine);
> +    qemu_register_machine(&terrierpda_machine);
> +    qemu_register_machine(&palmte_machine);
> +    qemu_register_machine(&lm3s811evb_machine);
> +    qemu_register_machine(&lm3s6965evb_machine);
> +    qemu_register_machine(&connex_machine);
> +    qemu_register_machine(&verdex_machine);
> +    qemu_register_machine(&mainstone2_machine);
> +}

This list is a bit outdated and the new files lack licenses.

Regards

From: Glauber C. <gc...@re...> - 2008-04-29 19:48:43
Hi. This is a proposal for reducing the impact of kvm functions in core qemu code. This is by all means not ready, but I felt like posting it, so a discussion on it could follow. The idea in this patch is to replace the specific kvm details from core qemu files like vl.c, with driver_yyy() functions. When kvm is not running, those functions would just return (most of time), absolutely reducing the impact of kvm code. As I wanted to test it, in this patch I changed the kvm functions to be called driver_yyy(), but that's not my final goal. I intend to use a function pointer schema, similar to what the linux kernel already do for a lot of its subsystem, to isolate the changes. Comments deeply welcome. --- qemu/exec.c | 11 +-- qemu/gdbstub.c | 8 +- qemu/hw/vmport.c | 6 +- qemu/monitor.c | 3 +- qemu/qemu-kvm.c | 210 +++++++++++++++++++++++++++++++++++++++++++++++++++++- qemu/qemu-kvm.h | 1 + qemu/vl.c | 187 +++++------------------------------------------- 7 files changed, 239 insertions(+), 187 deletions(-) diff --git a/qemu/exec.c b/qemu/exec.c index b82d26d..7a16c78 100644 --- a/qemu/exec.c +++ b/qemu/exec.c @@ -1150,8 +1150,7 @@ int cpu_breakpoint_insert(CPUState *env, target_ulong pc) return -1; env->breakpoints[env->nb_breakpoints++] = pc; - if (kvm_enabled()) - kvm_update_debugger(env); + driver_update_debugger(env); breakpoint_invalidate(env, pc); return 0; @@ -1175,8 +1174,7 @@ int cpu_breakpoint_remove(CPUState *env, target_ulong pc) if (i < env->nb_breakpoints) env->breakpoints[i] = env->breakpoints[env->nb_breakpoints]; - if (kvm_enabled()) - kvm_update_debugger(env); + driver_update_debugger(env); breakpoint_invalidate(env, pc); return 0; @@ -1196,8 +1194,7 @@ void cpu_single_step(CPUState *env, int enabled) /* XXX: only flush what is necessary */ tb_flush(env); } - if (kvm_enabled()) - kvm_update_debugger(env); + driver_update_debugger(env); #endif } @@ -1246,7 +1243,7 @@ void cpu_interrupt(CPUState *env, int mask) env->interrupt_request |= mask; if 
(kvm_enabled() && !qemu_kvm_irqchip_in_kernel()) - kvm_update_interrupt_request(env); + kvm_update_interrupt_request(env); /* if the cpu is currently executing code, we must unlink it and all the potentially executing TB */ diff --git a/qemu/gdbstub.c b/qemu/gdbstub.c index 2252084..c574686 100644 --- a/qemu/gdbstub.c +++ b/qemu/gdbstub.c @@ -895,7 +895,7 @@ static int gdb_handle_packet(GDBState *s, CPUState *env, const char *line_buf) #if defined(TARGET_I386) env->eip = addr; if (kvm_enabled()) - kvm_load_registers(env); + driver_load_registers(env); #elif defined (TARGET_PPC) env->nip = addr; #elif defined (TARGET_SPARC) @@ -923,7 +923,7 @@ static int gdb_handle_packet(GDBState *s, CPUState *env, const char *line_buf) #if defined(TARGET_I386) env->eip = addr; if (kvm_enabled()) - kvm_load_registers(env); + driver_load_registers(env); #elif defined (TARGET_PPC) env->nip = addr; #elif defined (TARGET_SPARC) @@ -976,7 +976,7 @@ static int gdb_handle_packet(GDBState *s, CPUState *env, const char *line_buf) break; case 'g': if (kvm_enabled()) - kvm_save_registers(env); + driver_save_registers(env); reg_size = cpu_gdb_read_registers(env, mem_buf); memtohex(buf, mem_buf, reg_size); put_packet(s, buf); @@ -987,7 +987,7 @@ static int gdb_handle_packet(GDBState *s, CPUState *env, const char *line_buf) hextomem((uint8_t *)registers, p, len); cpu_gdb_write_registers(env, mem_buf, len); if (kvm_enabled()) - kvm_load_registers(env); + driver_load_registers(env); put_packet(s, "OK"); break; case 'm': diff --git a/qemu/hw/vmport.c b/qemu/hw/vmport.c index c09227d..a519152 100644 --- a/qemu/hw/vmport.c +++ b/qemu/hw/vmport.c @@ -59,8 +59,7 @@ static uint32_t vmport_ioport_read(void *opaque, uint32_t addr) uint32_t eax; uint32_t ret; - if (kvm_enabled()) - kvm_save_registers(s->env); + driver_save_registers(s->env); eax = s->env->regs[R_EAX]; if (eax != VMPORT_MAGIC) @@ -77,8 +76,7 @@ static uint32_t vmport_ioport_read(void *opaque, uint32_t addr) ret = 
s->func[command](s->opaque[command], addr); - if (kvm_enabled()) - kvm_load_registers(s->env); + driver_load_registers(s->env); return ret; } diff --git a/qemu/monitor.c b/qemu/monitor.c index 4ee0b19..bd538d9 100644 --- a/qemu/monitor.c +++ b/qemu/monitor.c @@ -286,8 +286,7 @@ static CPUState *mon_get_cpu(void) mon_set_cpu(0); } - if (kvm_enabled()) - kvm_save_registers(mon_cpu); + driver_save_registers(mon_cpu); return mon_cpu; } diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c index 45fddd3..f3a7758 100644 --- a/qemu/qemu-kvm.c +++ b/qemu/qemu-kvm.c @@ -20,6 +20,11 @@ int kvm_irqchip = 1; #include <pthread.h> #include <sys/utsname.h> #include <sys/syscall.h> +#include <sys/mman.h> + +int hpagesize = 0; +unsigned int kvm_shadow_memory = 0; +extern char *mem_path; extern void perror(const char *s); @@ -114,16 +119,16 @@ static int pre_kvm_run(void *opaque, int vcpu) return 0; } -void kvm_load_registers(CPUState *env) +void driver_load_registers(CPUState *env) { if (kvm_enabled()) kvm_arch_load_regs(env); } -void kvm_save_registers(CPUState *env) +void driver_save_registers(CPUState *env) { if (kvm_enabled()) - kvm_arch_save_regs(env); + kvm_arch_save_regs(env); } int kvm_cpu_exec(CPUState *env) @@ -628,6 +633,11 @@ int kvm_update_debugger(CPUState *env) return kvm_guest_debug(kvm_context, env->cpu_index, &dbg); } +int driver_update_debugger(CPUState *env) +{ + if (kvm_enabled()) + kvm_update_debugger(env); +} /* * dirty pages logging @@ -774,3 +784,197 @@ void kvm_cpu_destroy_phys_mem(target_phys_addr_t start_addr, { kvm_destroy_phys_mem(kvm_context, start_addr, size); } + +/* FIXME: make it all beautiful when kvm is off, make room for other hypervisors, etc */ + +void decorate_application_name(char *appname, int max_len) +{ + if (kvm_enabled()) + { + int remain = max_len - strlen(appname) - 1; + + if (remain > 0) + strncat(appname, "/KVM", remain); + } +} + +static int gethugepagesize(void) +{ + int ret, fd; + char buf[4096]; + char *needle = "Hugepagesize:"; + 
char *size; + unsigned long hugepagesize; + + fd = open("/proc/meminfo", O_RDONLY); + if (fd < 0) { + perror("open"); + exit(0); + } + + ret = read(fd, buf, sizeof(buf)); + if (ret < 0) { + perror("read"); + exit(0); + } + + size = strstr(buf, needle); + if (!size) + return 0; + size += strlen(needle); + hugepagesize = strtol(size, NULL, 0); + return hugepagesize; +} + + +void *alloc_mem_area(unsigned long memory, const char *path) +{ + char *filename; + void *area; + int fd; + + if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1) + return NULL; + + hpagesize = gethugepagesize() * 1024; + if (!hpagesize) + return NULL; + + fd = mkstemp(filename); + if (fd < 0) { + perror("mkstemp"); + free(filename); + return NULL; + } + unlink(filename); + free(filename); + + memory = (memory+hpagesize-1) & ~(hpagesize-1); + + if (ftruncate(fd, memory) == -1) { + perror("ftruncate"); + close(fd); + return NULL; + } + + area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); + if (area == MAP_FAILED) { + perror("mmap"); + close(fd); + return NULL; + } + + return area; +} + +void *qemu_alloc_physram(unsigned long memory) +{ + void *area = NULL; + + if (mem_path) + area = alloc_mem_area(memory, mem_path); + if (!area) + area = qemu_vmalloc(memory); + + return area; +} + + +void driver_cpu_save_end(QEMUFile *f, CPUState *env) +{ + int i; + if (kvm_enabled()) { + for (i = 0; i < NR_IRQ_WORDS ; i++) { + qemu_put_be32s(f, &env->kvm_interrupt_bitmap[i]); + } + qemu_put_be64s(f, &env->tsc); + } +} + +int driver_cpu_load(QEMUFile *f, CPUState *env, int version_id) +{ + int i; + if (kvm_enabled()) { + /* when in-kernel irqchip is used, HF_HALTED_MASK causes deadlock + because no userspace IRQs will ever clear this flag */ + env->hflags &= ~HF_HALTED_MASK; + for (i = 0; i < NR_IRQ_WORDS ; i++) { + qemu_get_be32s(f, &env->kvm_interrupt_bitmap[i]); + } + qemu_get_be64s(f, &env->tsc); + driver_load_registers(env); + } + return 0; +} + +int driver_allowed_page(target_ulong addr) +{ + 
if (kvm_enabled() && (addr>=0xa0000) && (addr<0xc0000)) /* do not access video-addresses */ + return 0; + return 1; +} + +int driver_main_loop(void) +{ + if (kvm_enabled()) { + kvm_main_loop(); + cpu_disable_ticks(); + return 0; + } + return -1; +} + +void driver_init_context(void) +{ +#if USE_KVM + if (kvm_enabled()) { + if (kvm_qemu_init() < 0) { + extern int kvm_allowed; + fprintf(stderr, "Could not initialize KVM, will disable KVM support\n"); +#ifdef NO_CPU_EMULATION + fprintf(stderr, "Compiled with --disable-cpu-emulation, exiting.\n"); + exit(1); +#endif + kvm_allowed = 0; + } + } +#endif +} + +int driver_init() +{ +#if defined(TARGET_I386) || defined(TARGET_X86_64) +#define KVM_EXTRA_PAGES 3 +#else +#define KVM_EXTRA_PAGES 0 +#endif + if (kvm_enabled()) { + phys_ram_size += KVM_EXTRA_PAGES * TARGET_PAGE_SIZE; + if (kvm_qemu_create_context() < 0) { + fprintf(stderr, "Could not create KVM context\n"); + exit(1); + } +#ifdef KVM_CAP_USER_MEMORY + { + int ret; + + ret = kvm_qemu_check_extension(KVM_CAP_USER_MEMORY); + if (ret) { + phys_ram_base = qemu_alloc_physram(phys_ram_size); + if (!phys_ram_base) { + fprintf(stderr, "Could not allocate physical memory\n"); + exit(1); + } + } + } +#endif + return 1; + } + return 0; +} + +void driver_smp_init(void) +{ + if (kvm_enabled()) + kvm_init_ap(); +} diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h index 8e45f30..7953f4a 100644 --- a/qemu/qemu-kvm.h +++ b/qemu/qemu-kvm.h @@ -81,6 +81,7 @@ int handle_powerpc_dcr_write(int vcpu,uint32_t dcrn, uint32_t data); extern int kvm_allowed; extern kvm_context_t kvm_context; +extern unsigned int kvm_shadow_memory; #define kvm_enabled() (kvm_allowed) #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context) diff --git a/qemu/vl.c b/qemu/vl.c index a59f71c..4df410f 100644 --- a/qemu/vl.c +++ b/qemu/vl.c @@ -234,9 +234,7 @@ int nb_option_roms; int semihosting_enabled = 0; int autostart = 1; int time_drift_fix = 0; -unsigned int kvm_shadow_memory = 0; const char 
*mem_path = NULL; -int hpagesize = 0; const char *cpu_vendor_string; #ifdef TARGET_ARM int old_param = 0; @@ -259,17 +257,6 @@ static int event_pending = 1; #define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR) -void decorate_application_name(char *appname, int max_len) -{ - if (kvm_enabled()) - { - int remain = max_len - strlen(appname) - 1; - - if (remain > 0) - strncat(appname, "/KVM", remain); - } -} - /***********************************************************/ /* x86 ISA bus support */ @@ -6544,8 +6531,7 @@ void cpu_save(QEMUFile *f, void *opaque) uint32_t hflags; int i; - if (kvm_enabled()) - kvm_save_registers(env); + driver_save_registers(env); for(i = 0; i < CPU_NB_REGS; i++) qemu_put_betls(f, &env->regs[i]); @@ -6632,12 +6618,7 @@ void cpu_save(QEMUFile *f, void *opaque) #endif qemu_put_be32s(f, &env->smbase); - if (kvm_enabled()) { - for (i = 0; i < NR_IRQ_WORDS ; i++) { - qemu_put_be32s(f, &env->kvm_interrupt_bitmap[i]); - } - qemu_put_be64s(f, &env->tsc); - } + driver_cpu_save_end(f, env); } #ifdef USE_X86LDOUBLE @@ -6780,17 +6761,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id) /* XXX: compute hflags from scratch, except for CPL and IIF */ env->hflags = hflags; tlb_flush(env, 1); - if (kvm_enabled()) { - /* when in-kernel irqchip is used, HF_HALTED_MASK causes deadlock - because no userspace IRQs will ever clear this flag */ - env->hflags &= ~HF_HALTED_MASK; - for (i = 0; i < NR_IRQ_WORDS ; i++) { - qemu_get_be32s(f, &env->kvm_interrupt_bitmap[i]); - } - qemu_get_be64s(f, &env->tsc); - kvm_load_registers(env); - } - return 0; + return driver_cpu_load(f, opaque, version_id); } #elif defined(TARGET_PPC) @@ -7126,7 +7097,7 @@ static int ram_load_v1(QEMUFile *f, void *opaque) if (qemu_get_be32(f) != phys_ram_size) return -EINVAL; for(i = 0; i < phys_ram_size; i+= TARGET_PAGE_SIZE) { - if (kvm_enabled() && (i>=0xa0000) && (i<0xc0000)) /* do not access video-addresses */ + if (!driver_allowed_page(i)) continue; ret = 
ram_get_page(f, phys_ram_base + i, TARGET_PAGE_SIZE); if (ret) @@ -7262,7 +7233,7 @@ static void ram_save_live(QEMUFile *f, void *opaque) target_ulong addr; for (addr = 0; addr < phys_ram_size; addr += TARGET_PAGE_SIZE) { - if (kvm_enabled() && (addr>=0xa0000) && (addr<0xc0000)) /* do not access video-addresses */ + if (!driver_allowed_page(addr)) continue; if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG)) { qemu_put_be32(f, addr); @@ -7282,7 +7253,7 @@ static void ram_save_static(QEMUFile *f, void *opaque) if (ram_compress_open(s, f) < 0) return; for(i = 0; i < phys_ram_size; i+= BDRV_HASH_BLOCK_SIZE) { - if (kvm_enabled() && (i>=0xa0000) && (i<0xc0000)) /* do not access video-addresses */ + if (!driver_allowed_page(i)) continue; #if 0 if (tight_savevm_enabled) { @@ -7355,7 +7326,7 @@ static int ram_load_static(QEMUFile *f, void *opaque) if (ram_decompress_open(s, f) < 0) return -EINVAL; for(i = 0; i < phys_ram_size; i+= BDRV_HASH_BLOCK_SIZE) { - if (kvm_enabled() && (i>=0xa0000) && (i<0xc0000)) /* do not access video-addresses */ + if (!driver_allowed_page(i)) continue; if (ram_decompress_buf(s, buf, 1) < 0) { fprintf(stderr, "Error while reading ram block header\n"); @@ -7846,6 +7817,11 @@ void main_loop_wait(int timeout) } +int driver_enabled(void) +{ + return 1; +} + static int main_loop(void) { int ret, timeout; @@ -7854,12 +7830,8 @@ static int main_loop(void) #endif CPUState *env; - - if (kvm_enabled()) { - kvm_main_loop(); - cpu_disable_ticks(); - return 0; - } + if (driver_enabled() && (ret = driver_main_loop() < 0)) + return ret; cur_cpu = first_cpu; next_cpu = cur_cpu->next_cpu ?: first_cpu; @@ -7902,15 +7874,16 @@ static int main_loop(void) if (reset_requested) { reset_requested = 0; qemu_system_reset(); - if (kvm_enabled()) - kvm_load_registers(env); + driver_load_registers(); ret = EXCP_INTERRUPT; } + if (powerdown_requested) { powerdown_requested = 0; qemu_system_powerdown(); ret = EXCP_INTERRUPT; } + if (ret == EXCP_DEBUG) { 
vm_stop(EXCP_DEBUG); } @@ -8564,87 +8537,6 @@ void qemu_get_launch_info(int *argc, char ***argv, int *opt_daemonize, const cha *opt_incoming = incoming; } - -static int gethugepagesize(void) -{ - int ret, fd; - char buf[4096]; - char *needle = "Hugepagesize:"; - char *size; - unsigned long hugepagesize; - - fd = open("/proc/meminfo", O_RDONLY); - if (fd < 0) { - perror("open"); - exit(0); - } - - ret = read(fd, buf, sizeof(buf)); - if (ret < 0) { - perror("read"); - exit(0); - } - - size = strstr(buf, needle); - if (!size) - return 0; - size += strlen(needle); - hugepagesize = strtol(size, NULL, 0); - return hugepagesize; -} - -void *alloc_mem_area(unsigned long memory, const char *path) -{ - char *filename; - void *area; - int fd; - - if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1) - return NULL; - - hpagesize = gethugepagesize() * 1024; - if (!hpagesize) - return NULL; - - fd = mkstemp(filename); - if (fd < 0) { - perror("mkstemp"); - free(filename); - return NULL; - } - unlink(filename); - free(filename); - - memory = (memory+hpagesize-1) & ~(hpagesize-1); - - if (ftruncate(fd, memory) == -1) { - perror("ftruncate"); - close(fd); - return NULL; - } - - area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); - if (area == MAP_FAILED) { - perror("mmap"); - close(fd); - return NULL; - } - - return area; -} - -void *qemu_alloc_physram(unsigned long memory) -{ - void *area = NULL; - - if (mem_path) - area = alloc_mem_area(memory, mem_path); - if (!area) - area = qemu_vmalloc(memory); - - return area; -} - int main(int argc, char **argv) { #ifdef CONFIG_GDBSTUB @@ -9355,19 +9247,7 @@ int main(int argc, char **argv) } #endif -#if USE_KVM - if (kvm_enabled()) { - if (kvm_qemu_init() < 0) { - extern int kvm_allowed; - fprintf(stderr, "Could not initialize KVM, will disable KVM support\n"); -#ifdef NO_CPU_EMULATION - fprintf(stderr, "Compiled with --disable-cpu-emulation, exiting.\n"); - exit(1); -#endif - kvm_allowed = 0; - } - } -#endif + 
driver_init_context(); if (pid_file && qemu_create_pidfile(pid_file) != 0) { if (daemonize) { @@ -9463,33 +9343,7 @@ int main(int argc, char **argv) /* init the memory */ phys_ram_size = ram_size + vga_ram_size + MAX_BIOS_SIZE; - /* Initialize kvm */ -#if defined(TARGET_I386) || defined(TARGET_X86_64) -#define KVM_EXTRA_PAGES 3 -#else -#define KVM_EXTRA_PAGES 0 -#endif - if (kvm_enabled()) { - phys_ram_size += KVM_EXTRA_PAGES * TARGET_PAGE_SIZE; - if (kvm_qemu_create_context() < 0) { - fprintf(stderr, "Could not create KVM context\n"); - exit(1); - } -#ifdef KVM_CAP_USER_MEMORY -{ - int ret; - - ret = kvm_qemu_check_extension(KVM_CAP_USER_MEMORY); - if (ret) { - phys_ram_base = qemu_alloc_physram(phys_ram_size); - if (!phys_ram_base) { - fprintf(stderr, "Could not allocate physical memory\n"); - exit(1); - } - } -} -#endif - } else { + if (!driver_init()) { phys_ram_base = qemu_vmalloc(phys_ram_size); if (!phys_ram_base) { fprintf(stderr, "Could not allocate physical memory\n"); @@ -9637,8 +9491,7 @@ int main(int argc, char **argv) qemu_mod_timer(display_state.gui_timer, qemu_get_clock(rt_clock)); } - if (kvm_enabled()) - kvm_init_ap(); + driver_smp_init(); #ifdef CONFIG_GDBSTUB if (use_gdbstub) { -- 1.5.0.6 |
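
One detail in the main_loop() hunk of the patch above may deserve a second look. In "if (driver_enabled() && (ret = driver_main_loop() < 0))" the "<" operator binds tighter than "=", so ret receives the result of the comparison (0 or 1) rather than the return value of driver_main_loop(). The standalone sketch below shows the difference; fake_main_loop(), as_written() and with_parens() are hypothetical stand-ins, not functions from the patch.

```c
/* Stand-in for driver_main_loop(): pretend it failed with -1. */
static int fake_main_loop(void)
{
    return -1;
}

/* Same shape as the expression in the patch:
 * ret = (fake_main_loop() < 0), so ret becomes 1. */
int as_written(void)
{
    int ret = 0;

    if ((ret = fake_main_loop() < 0))
        return ret;
    return 0;
}

/* Extra parentheses make ret hold the function's return value, here -1. */
int with_parens(void)
{
    int ret = 0;

    if ((ret = fake_main_loop()) < 0)
        return ret;
    return 0;
}
```

Whether the caller depends on the exact value of ret is a separate question; the sketch only shows what the expression evaluates to.
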
From: Glauber C. <gc...@re...> - 2008-04-29 19:19:36
This patch goes towards the direction of increasing general modularity of the code. Code in vl.c that used to live inside target ifdefs, are moved to inside the target directories, in a new file called machine.c. They are the cpu save/load and machine registration --- Makefile.target | 5 +- hw/boards.h | 1 + target-arm/machine.c | 211 +++++++++++++++++ target-cris/machine.c | 7 + target-i386/machine.c | 264 +++++++++++++++++++++ target-m68k/machine.c | 9 + target-mips/machine.c | 21 ++ target-ppc/machine.c | 20 ++ target-sh4/machine.c | 8 + target-sparc/machine.c | 102 ++++++++ vl.c | 616 ------------------------------------------------ 11 files changed, 647 insertions(+), 617 deletions(-) create mode 100644 target-arm/machine.c create mode 100644 target-cris/machine.c create mode 100644 target-i386/machine.c create mode 100644 target-m68k/machine.c create mode 100644 target-mips/machine.c create mode 100644 target-ppc/machine.c create mode 100644 target-sh4/machine.c create mode 100644 target-sparc/machine.c diff --git a/Makefile.target b/Makefile.target index 5ac29a7..a530ee5 100644 --- a/Makefile.target +++ b/Makefile.target @@ -303,6 +303,9 @@ gen-op.h: op.o $(DYNGEN) op.o: op.c $(CC) $(OP_CFLAGS) $(CPPFLAGS) -c -o $@ $< +machine.o: machine.c + $(CC) $(OP_CFLAGS) $(CPPFLAGS) -c -o $@ $< + # HELPER_CFLAGS is used for all the code compiled with static register # variables ifeq ($(TARGET_BASE_ARCH), i386) @@ -481,7 +484,7 @@ endif #CONFIG_DARWIN_USER # System emulator target ifndef CONFIG_USER_ONLY -OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o +OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o ifdef CONFIG_WIN32 OBJS+=block-raw-win32.o else diff --git a/hw/boards.h b/hw/boards.h index affcaa6..ada4664 100644 --- a/hw/boards.h +++ b/hw/boards.h @@ -18,6 +18,7 @@ typedef struct QEMUMachine { } QEMUMachine; int qemu_register_machine(QEMUMachine *m); +void register_machines(void); /* Axis ETRAX. 
*/ extern QEMUMachine bareetraxfs_machine; diff --git a/target-arm/machine.c b/target-arm/machine.c new file mode 100644 index 0000000..d8de189 --- /dev/null +++ b/target-arm/machine.c @@ -0,0 +1,211 @@ +#include "hw/hw.h" +#include "hw/boards.h" + +void register_machines(void) +{ + qemu_register_machine(&integratorcp_machine); + qemu_register_machine(&versatilepb_machine); + qemu_register_machine(&versatileab_machine); + qemu_register_machine(&realview_machine); + qemu_register_machine(&akitapda_machine); + qemu_register_machine(&spitzpda_machine); + qemu_register_machine(&borzoipda_machine); + qemu_register_machine(&terrierpda_machine); + qemu_register_machine(&palmte_machine); + qemu_register_machine(&lm3s811evb_machine); + qemu_register_machine(&lm3s6965evb_machine); + qemu_register_machine(&connex_machine); + qemu_register_machine(&verdex_machine); + qemu_register_machine(&mainstone2_machine); +} + +void cpu_save(QEMUFile *f, void *opaque) +{ + int i; + CPUARMState *env = (CPUARMState *)opaque; + + for (i = 0; i < 16; i++) { + qemu_put_be32(f, env->regs[i]); + } + qemu_put_be32(f, cpsr_read(env)); + qemu_put_be32(f, env->spsr); + for (i = 0; i < 6; i++) { + qemu_put_be32(f, env->banked_spsr[i]); + qemu_put_be32(f, env->banked_r13[i]); + qemu_put_be32(f, env->banked_r14[i]); + } + for (i = 0; i < 5; i++) { + qemu_put_be32(f, env->usr_regs[i]); + qemu_put_be32(f, env->fiq_regs[i]); + } + qemu_put_be32(f, env->cp15.c0_cpuid); + qemu_put_be32(f, env->cp15.c0_cachetype); + qemu_put_be32(f, env->cp15.c1_sys); + qemu_put_be32(f, env->cp15.c1_coproc); + qemu_put_be32(f, env->cp15.c1_xscaleauxcr); + qemu_put_be32(f, env->cp15.c2_base0); + qemu_put_be32(f, env->cp15.c2_base1); + qemu_put_be32(f, env->cp15.c2_mask); + qemu_put_be32(f, env->cp15.c2_data); + qemu_put_be32(f, env->cp15.c2_insn); + qemu_put_be32(f, env->cp15.c3); + qemu_put_be32(f, env->cp15.c5_insn); + qemu_put_be32(f, env->cp15.c5_data); + for (i = 0; i < 8; i++) { + qemu_put_be32(f, 
env->cp15.c6_region[i]); + } + qemu_put_be32(f, env->cp15.c6_insn); + qemu_put_be32(f, env->cp15.c6_data); + qemu_put_be32(f, env->cp15.c9_insn); + qemu_put_be32(f, env->cp15.c9_data); + qemu_put_be32(f, env->cp15.c13_fcse); + qemu_put_be32(f, env->cp15.c13_context); + qemu_put_be32(f, env->cp15.c13_tls1); + qemu_put_be32(f, env->cp15.c13_tls2); + qemu_put_be32(f, env->cp15.c13_tls3); + qemu_put_be32(f, env->cp15.c15_cpar); + + qemu_put_be32(f, env->features); + + if (arm_feature(env, ARM_FEATURE_VFP)) { + for (i = 0; i < 16; i++) { + CPU_DoubleU u; + u.d = env->vfp.regs[i]; + qemu_put_be32(f, u.l.upper); + qemu_put_be32(f, u.l.lower); + } + for (i = 0; i < 16; i++) { + qemu_put_be32(f, env->vfp.xregs[i]); + } + + /* TODO: Should use proper FPSCR access functions. */ + qemu_put_be32(f, env->vfp.vec_len); + qemu_put_be32(f, env->vfp.vec_stride); + + if (arm_feature(env, ARM_FEATURE_VFP3)) { + for (i = 16; i < 32; i++) { + CPU_DoubleU u; + u.d = env->vfp.regs[i]; + qemu_put_be32(f, u.l.upper); + qemu_put_be32(f, u.l.lower); + } + } + } + + if (arm_feature(env, ARM_FEATURE_IWMMXT)) { + for (i = 0; i < 16; i++) { + qemu_put_be64(f, env->iwmmxt.regs[i]); + } + for (i = 0; i < 16; i++) { + qemu_put_be32(f, env->iwmmxt.cregs[i]); + } + } + + if (arm_feature(env, ARM_FEATURE_M)) { + qemu_put_be32(f, env->v7m.other_sp); + qemu_put_be32(f, env->v7m.vecbase); + qemu_put_be32(f, env->v7m.basepri); + qemu_put_be32(f, env->v7m.control); + qemu_put_be32(f, env->v7m.current_sp); + qemu_put_be32(f, env->v7m.exception); + } +} + +int cpu_load(QEMUFile *f, void *opaque, int version_id) +{ + CPUARMState *env = (CPUARMState *)opaque; + int i; + + if (version_id != ARM_CPU_SAVE_VERSION) + return -EINVAL; + + for (i = 0; i < 16; i++) { + env->regs[i] = qemu_get_be32(f); + } + cpsr_write(env, qemu_get_be32(f), 0xffffffff); + env->spsr = qemu_get_be32(f); + for (i = 0; i < 6; i++) { + env->banked_spsr[i] = qemu_get_be32(f); + env->banked_r13[i] = qemu_get_be32(f); + env->banked_r14[i] = 
qemu_get_be32(f); + } + for (i = 0; i < 5; i++) { + env->usr_regs[i] = qemu_get_be32(f); + env->fiq_regs[i] = qemu_get_be32(f); + } + env->cp15.c0_cpuid = qemu_get_be32(f); + env->cp15.c0_cachetype = qemu_get_be32(f); + env->cp15.c1_sys = qemu_get_be32(f); + env->cp15.c1_coproc = qemu_get_be32(f); + env->cp15.c1_xscaleauxcr = qemu_get_be32(f); + env->cp15.c2_base0 = qemu_get_be32(f); + env->cp15.c2_base1 = qemu_get_be32(f); + env->cp15.c2_mask = qemu_get_be32(f); + env->cp15.c2_data = qemu_get_be32(f); + env->cp15.c2_insn = qemu_get_be32(f); + env->cp15.c3 = qemu_get_be32(f); + env->cp15.c5_insn = qemu_get_be32(f); + env->cp15.c5_data = qemu_get_be32(f); + for (i = 0; i < 8; i++) { + env->cp15.c6_region[i] = qemu_get_be32(f); + } + env->cp15.c6_insn = qemu_get_be32(f); + env->cp15.c6_data = qemu_get_be32(f); + env->cp15.c9_insn = qemu_get_be32(f); + env->cp15.c9_data = qemu_get_be32(f); + env->cp15.c13_fcse = qemu_get_be32(f); + env->cp15.c13_context = qemu_get_be32(f); + env->cp15.c13_tls1 = qemu_get_be32(f); + env->cp15.c13_tls2 = qemu_get_be32(f); + env->cp15.c13_tls3 = qemu_get_be32(f); + env->cp15.c15_cpar = qemu_get_be32(f); + + env->features = qemu_get_be32(f); + + if (arm_feature(env, ARM_FEATURE_VFP)) { + for (i = 0; i < 16; i++) { + CPU_DoubleU u; + u.l.upper = qemu_get_be32(f); + u.l.lower = qemu_get_be32(f); + env->vfp.regs[i] = u.d; + } + for (i = 0; i < 16; i++) { + env->vfp.xregs[i] = qemu_get_be32(f); + } + + /* TODO: Should use proper FPSCR access functions. 
*/ + env->vfp.vec_len = qemu_get_be32(f); + env->vfp.vec_stride = qemu_get_be32(f); + + if (arm_feature(env, ARM_FEATURE_VFP3)) { + for (i = 0; i < 16; i++) { + CPU_DoubleU u; + u.l.upper = qemu_get_be32(f); + u.l.lower = qemu_get_be32(f); + env->vfp.regs[i] = u.d; + } + } + } + + if (arm_feature(env, ARM_FEATURE_IWMMXT)) { + for (i = 0; i < 16; i++) { + env->iwmmxt.regs[i] = qemu_get_be64(f); + } + for (i = 0; i < 16; i++) { + env->iwmmxt.cregs[i] = qemu_get_be32(f); + } + } + + if (arm_feature(env, ARM_FEATURE_M)) { + env->v7m.other_sp = qemu_get_be32(f); + env->v7m.vecbase = qemu_get_be32(f); + env->v7m.basepri = qemu_get_be32(f); + env->v7m.control = qemu_get_be32(f); + env->v7m.current_sp = qemu_get_be32(f); + env->v7m.exception = qemu_get_be32(f); + } + + return 0; +} + + diff --git a/target-cris/machine.c b/target-cris/machine.c new file mode 100644 index 0000000..cbfa645 --- /dev/null +++ b/target-cris/machine.c @@ -0,0 +1,7 @@ +#include "hw/hw.h" +#include "hw/boards.h" + +void register_machines(void) +{ + qemu_register_machine(&bareetraxfs_machine); +} diff --git a/target-i386/machine.c b/target-i386/machine.c new file mode 100644 index 0000000..703c820 --- /dev/null +++ b/target-i386/machine.c @@ -0,0 +1,264 @@ +#include "hw/hw.h" +#include "hw/boards.h" +#include "hw/pc.h" +#include "hw/isa.h" + +#include "exec-all.h" + +void register_machines(void) +{ + qemu_register_machine(&pc_machine); + qemu_register_machine(&isapc_machine); +} + +static void cpu_put_seg(QEMUFile *f, SegmentCache *dt) +{ + qemu_put_be32(f, dt->selector); + qemu_put_betl(f, dt->base); + qemu_put_be32(f, dt->limit); + qemu_put_be32(f, dt->flags); +} + +static void cpu_get_seg(QEMUFile *f, SegmentCache *dt) +{ + dt->selector = qemu_get_be32(f); + dt->base = qemu_get_betl(f); + dt->limit = qemu_get_be32(f); + dt->flags = qemu_get_be32(f); +} + +void cpu_save(QEMUFile *f, void *opaque) +{ + CPUState *env = opaque; + uint16_t fptag, fpus, fpuc, fpregs_format; + uint32_t hflags; + int i; 
+ + for(i = 0; i < CPU_NB_REGS; i++) + qemu_put_betls(f, &env->regs[i]); + qemu_put_betls(f, &env->eip); + qemu_put_betls(f, &env->eflags); + hflags = env->hflags; /* XXX: suppress most of the redundant hflags */ + qemu_put_be32s(f, &hflags); + + /* FPU */ + fpuc = env->fpuc; + fpus = (env->fpus & ~0x3800) | (env->fpstt & 0x7) << 11; + fptag = 0; + for(i = 0; i < 8; i++) { + fptag |= ((!env->fptags[i]) << i); + } + + qemu_put_be16s(f, &fpuc); + qemu_put_be16s(f, &fpus); + qemu_put_be16s(f, &fptag); + +#ifdef USE_X86LDOUBLE + fpregs_format = 0; +#else + fpregs_format = 1; +#endif + qemu_put_be16s(f, &fpregs_format); + + for(i = 0; i < 8; i++) { +#ifdef USE_X86LDOUBLE + { + uint64_t mant; + uint16_t exp; + /* we save the real CPU data (in case of MMX usage only 'mant' + contains the MMX register */ + cpu_get_fp80(&mant, &exp, env->fpregs[i].d); + qemu_put_be64(f, mant); + qemu_put_be16(f, exp); + } +#else + /* if we use doubles for float emulation, we save the doubles to + avoid losing information in case of MMX usage. It can give + problems if the image is restored on a CPU where long + doubles are used instead. 
*/ + qemu_put_be64(f, env->fpregs[i].mmx.MMX_Q(0)); +#endif + } + + for(i = 0; i < 6; i++) + cpu_put_seg(f, &env->segs[i]); + cpu_put_seg(f, &env->ldt); + cpu_put_seg(f, &env->tr); + cpu_put_seg(f, &env->gdt); + cpu_put_seg(f, &env->idt); + + qemu_put_be32s(f, &env->sysenter_cs); + qemu_put_be32s(f, &env->sysenter_esp); + qemu_put_be32s(f, &env->sysenter_eip); + + qemu_put_betls(f, &env->cr[0]); + qemu_put_betls(f, &env->cr[2]); + qemu_put_betls(f, &env->cr[3]); + qemu_put_betls(f, &env->cr[4]); + + for(i = 0; i < 8; i++) + qemu_put_betls(f, &env->dr[i]); + + /* MMU */ + qemu_put_be32s(f, &env->a20_mask); + + /* XMM */ + qemu_put_be32s(f, &env->mxcsr); + for(i = 0; i < CPU_NB_REGS; i++) { + qemu_put_be64s(f, &env->xmm_regs[i].XMM_Q(0)); + qemu_put_be64s(f, &env->xmm_regs[i].XMM_Q(1)); + } + +#ifdef TARGET_X86_64 + qemu_put_be64s(f, &env->efer); + qemu_put_be64s(f, &env->star); + qemu_put_be64s(f, &env->lstar); + qemu_put_be64s(f, &env->cstar); + qemu_put_be64s(f, &env->fmask); + qemu_put_be64s(f, &env->kernelgsbase); +#endif + qemu_put_be32s(f, &env->smbase); +} + +#ifdef USE_X86LDOUBLE +/* XXX: add that in a FPU generic layer */ +union x86_longdouble { + uint64_t mant; + uint16_t exp; +}; + +#define MANTD1(fp) (fp & ((1LL << 52) - 1)) +#define EXPBIAS1 1023 +#define EXPD1(fp) ((fp >> 52) & 0x7FF) +#define SIGND1(fp) ((fp >> 32) & 0x80000000) + +static void fp64_to_fp80(union x86_longdouble *p, uint64_t temp) +{ + int e; + /* mantissa */ + p->mant = (MANTD1(temp) << 11) | (1LL << 63); + /* exponent + sign */ + e = EXPD1(temp) - EXPBIAS1 + 16383; + e |= SIGND1(temp) >> 16; + p->exp = e; +} +#endif + +int cpu_load(QEMUFile *f, void *opaque, int version_id) +{ + CPUState *env = opaque; + int i, guess_mmx; + uint32_t hflags; + uint16_t fpus, fpuc, fptag, fpregs_format; + + if (version_id != 3 && version_id != 4) + return -EINVAL; + for(i = 0; i < CPU_NB_REGS; i++) + qemu_get_betls(f, &env->regs[i]); + qemu_get_betls(f, &env->eip); + qemu_get_betls(f, &env->eflags); + 
qemu_get_be32s(f, &hflags); + + qemu_get_be16s(f, &fpuc); + qemu_get_be16s(f, &fpus); + qemu_get_be16s(f, &fptag); + qemu_get_be16s(f, &fpregs_format); + + /* NOTE: we cannot always restore the FPU state if the image come + from a host with a different 'USE_X86LDOUBLE' define. We guess + if we are in an MMX state to restore correctly in that case. */ + guess_mmx = ((fptag == 0xff) && (fpus & 0x3800) == 0); + for(i = 0; i < 8; i++) { + uint64_t mant; + uint16_t exp; + + switch(fpregs_format) { + case 0: + mant = qemu_get_be64(f); + exp = qemu_get_be16(f); +#ifdef USE_X86LDOUBLE + env->fpregs[i].d = cpu_set_fp80(mant, exp); +#else + /* difficult case */ + if (guess_mmx) + env->fpregs[i].mmx.MMX_Q(0) = mant; + else + env->fpregs[i].d = cpu_set_fp80(mant, exp); +#endif + break; + case 1: + mant = qemu_get_be64(f); +#ifdef USE_X86LDOUBLE + { + union x86_longdouble *p; + /* difficult case */ + p = (void *)&env->fpregs[i]; + if (guess_mmx) { + p->mant = mant; + p->exp = 0xffff; + } else { + fp64_to_fp80(p, mant); + } + } +#else + env->fpregs[i].mmx.MMX_Q(0) = mant; +#endif + break; + default: + return -EINVAL; + } + } + + env->fpuc = fpuc; + /* XXX: restore FPU round state */ + env->fpstt = (fpus >> 11) & 7; + env->fpus = fpus & ~0x3800; + fptag ^= 0xff; + for(i = 0; i < 8; i++) { + env->fptags[i] = (fptag >> i) & 1; + } + + for(i = 0; i < 6; i++) + cpu_get_seg(f, &env->segs[i]); + cpu_get_seg(f, &env->ldt); + cpu_get_seg(f, &env->tr); + cpu_get_seg(f, &env->gdt); + cpu_get_seg(f, &env->idt); + + qemu_get_be32s(f, &env->sysenter_cs); + qemu_get_be32s(f, &env->sysenter_esp); + qemu_get_be32s(f, &env->sysenter_eip); + + qemu_get_betls(f, &env->cr[0]); + qemu_get_betls(f, &env->cr[2]); + qemu_get_betls(f, &env->cr[3]); + qemu_get_betls(f, &env->cr[4]); + + for(i = 0; i < 8; i++) + qemu_get_betls(f, &env->dr[i]); + + /* MMU */ + qemu_get_be32s(f, &env->a20_mask); + + qemu_get_be32s(f, &env->mxcsr); + for(i = 0; i < CPU_NB_REGS; i++) { + qemu_get_be64s(f, 
&env->xmm_regs[i].XMM_Q(0)); + qemu_get_be64s(f, &env->xmm_regs[i].XMM_Q(1)); + } + +#ifdef TARGET_X86_64 + qemu_get_be64s(f, &env->efer); + qemu_get_be64s(f, &env->star); + qemu_get_be64s(f, &env->lstar); + qemu_get_be64s(f, &env->cstar); + qemu_get_be64s(f, &env->fmask); + qemu_get_be64s(f, &env->kernelgsbase); +#endif + if (version_id >= 4) + qemu_get_be32s(f, &env->smbase); + + /* XXX: compute hflags from scratch, except for CPL and IIF */ + env->hflags = hflags; + tlb_flush(env, 1); + return 0; +} diff --git a/target-m68k/machine.c b/target-m68k/machine.c new file mode 100644 index 0000000..fbdcac9 --- /dev/null +++ b/target-m68k/machine.c @@ -0,0 +1,9 @@ +#include "hw/hw.h" +#include "hw/boards.h" + +void register_machines(void) +{ + qemu_register_machine(&mcf5208evb_machine); + qemu_register_machine(&an5206_machine); + qemu_register_machine(&dummy_m68k_machine); +} diff --git a/target-mips/machine.c b/target-mips/machine.c new file mode 100644 index 0000000..ba01070 --- /dev/null +++ b/target-mips/machine.c @@ -0,0 +1,21 @@ +#include "hw/hw.h" +#include "hw/boards.h" + +void register_machines(void) +{ + qemu_register_machine(&mips_machine); + qemu_register_machine(&mips_malta_machine); + qemu_register_machine(&mips_pica61_machine); + qemu_register_machine(&mips_mipssim_machine); +} + +void cpu_save(QEMUFile *f, void *opaque) +{ +} + +int cpu_load(QEMUFile *f, void *opaque, int version_id) +{ + return 0; +} + + diff --git a/target-ppc/machine.c b/target-ppc/machine.c new file mode 100644 index 0000000..be0cbe1 --- /dev/null +++ b/target-ppc/machine.c @@ -0,0 +1,20 @@ +#include "hw/hw.h" +#include "hw/boards.h" + +void register_machines(void) +{ + qemu_register_machine(&heathrow_machine); + qemu_register_machine(&core99_machine); + qemu_register_machine(&prep_machine); + qemu_register_machine(&ref405ep_machine); + qemu_register_machine(&taihu_machine); +} + +void cpu_save(QEMUFile *f, void *opaque) +{ +} + +int cpu_load(QEMUFile *f, void *opaque, int 
version_id) +{ + return 0; +} diff --git a/target-sh4/machine.c b/target-sh4/machine.c new file mode 100644 index 0000000..2d78aae --- /dev/null +++ b/target-sh4/machine.c @@ -0,0 +1,8 @@ +#include "hw/hw.h" +#include "hw/boards.h" + +void register_machines(void) +{ + qemu_register_machine(&shix_machine); + qemu_register_machine(&r2d_machine); +} diff --git a/target-sparc/machine.c b/target-sparc/machine.c new file mode 100644 index 0000000..0e7a23e --- /dev/null +++ b/target-sparc/machine.c @@ -0,0 +1,102 @@ +#include "hw/hw.h" +#include "hw/boards.h" + +#include "exec-all.h" + +void register_machines(void) +{ +#ifdef TARGET_SPARC64 + qemu_register_machine(&sun4u_machine); +#else + qemu_register_machine(&ss5_machine); + qemu_register_machine(&ss10_machine); + qemu_register_machine(&ss600mp_machine); + qemu_register_machine(&ss20_machine); + qemu_register_machine(&ss2_machine); + qemu_register_machine(&voyager_machine); + qemu_register_machine(&ss_lx_machine); + qemu_register_machine(&ss4_machine); + qemu_register_machine(&scls_machine); + qemu_register_machine(&sbook_machine); + qemu_register_machine(&ss1000_machine); + qemu_register_machine(&ss2000_machine); +#endif +} + +void cpu_save(QEMUFile *f, void *opaque) +{ + CPUState *env = opaque; + int i; + uint32_t tmp; + + for(i = 0; i < 8; i++) + qemu_put_betls(f, &env->gregs[i]); + for(i = 0; i < NWINDOWS * 16; i++) + qemu_put_betls(f, &env->regbase[i]); + + /* FPU */ + for(i = 0; i < TARGET_FPREGS; i++) { + union { + float32 f; + uint32_t i; + } u; + u.f = env->fpr[i]; + qemu_put_be32(f, u.i); + } + + qemu_put_betls(f, &env->pc); + qemu_put_betls(f, &env->npc); + qemu_put_betls(f, &env->y); + tmp = GET_PSR(env); + qemu_put_be32(f, tmp); + qemu_put_betls(f, &env->fsr); + qemu_put_betls(f, &env->tbr); +#ifndef TARGET_SPARC64 + qemu_put_be32s(f, &env->wim); + /* MMU */ + for(i = 0; i < 16; i++) + qemu_put_be32s(f, &env->mmuregs[i]); +#endif +} + +int cpu_load(QEMUFile *f, void *opaque, int version_id) +{ + CPUState 
*env = opaque; + int i; + uint32_t tmp; + + for(i = 0; i < 8; i++) + qemu_get_betls(f, &env->gregs[i]); + for(i = 0; i < NWINDOWS * 16; i++) + qemu_get_betls(f, &env->regbase[i]); + + /* FPU */ + for(i = 0; i < TARGET_FPREGS; i++) { + union { + float32 f; + uint32_t i; + } u; + u.i = qemu_get_be32(f); + env->fpr[i] = u.f; + } + + qemu_get_betls(f, &env->pc); + qemu_get_betls(f, &env->npc); + qemu_get_betls(f, &env->y); + tmp = qemu_get_be32(f); + env->cwp = 0; /* needed to ensure that the wrapping registers are + correctly updated */ + PUT_PSR(env, tmp); + qemu_get_betls(f, &env->fsr); + qemu_get_betls(f, &env->tbr); +#ifndef TARGET_SPARC64 + qemu_get_be32s(f, &env->wim); + /* MMU */ + for(i = 0; i < 16; i++) + qemu_get_be32s(f, &env->mmuregs[i]); +#endif + tlb_flush(env, 1); + return 0; +} + + diff --git a/vl.c b/vl.c index 61eb191..9289982 100644 --- a/vl.c +++ b/vl.c @@ -6292,557 +6292,6 @@ void do_info_snapshots(void) } /***********************************************************/ -/* cpu save/restore */ - -#if defined(TARGET_I386) - -static void cpu_put_seg(QEMUFile *f, SegmentCache *dt) -{ - qemu_put_be32(f, dt->selector); - qemu_put_betl(f, dt->base); - qemu_put_be32(f, dt->limit); - qemu_put_be32(f, dt->flags); -} - -static void cpu_get_seg(QEMUFile *f, SegmentCache *dt) -{ - dt->selector = qemu_get_be32(f); - dt->base = qemu_get_betl(f); - dt->limit = qemu_get_be32(f); - dt->flags = qemu_get_be32(f); -} - -void cpu_save(QEMUFile *f, void *opaque) -{ - CPUState *env = opaque; - uint16_t fptag, fpus, fpuc, fpregs_format; - uint32_t hflags; - int i; - - for(i = 0; i < CPU_NB_REGS; i++) - qemu_put_betls(f, &env->regs[i]); - qemu_put_betls(f, &env->eip); - qemu_put_betls(f, &env->eflags); - hflags = env->hflags; /* XXX: suppress most of the redundant hflags */ - qemu_put_be32s(f, &hflags); - - /* FPU */ - fpuc = env->fpuc; - fpus = (env->fpus & ~0x3800) | (env->fpstt & 0x7) << 11; - fptag = 0; - for(i = 0; i < 8; i++) { - fptag |= ((!env->fptags[i]) << i); - } 
- - qemu_put_be16s(f, &fpuc); - qemu_put_be16s(f, &fpus); - qemu_put_be16s(f, &fptag); - -#ifdef USE_X86LDOUBLE - fpregs_format = 0; -#else - fpregs_format = 1; -#endif - qemu_put_be16s(f, &fpregs_format); - - for(i = 0; i < 8; i++) { -#ifdef USE_X86LDOUBLE - { - uint64_t mant; - uint16_t exp; - /* we save the real CPU data (in case of MMX usage only 'mant' - contains the MMX register */ - cpu_get_fp80(&mant, &exp, env->fpregs[i].d); - qemu_put_be64(f, mant); - qemu_put_be16(f, exp); - } -#else - /* if we use doubles for float emulation, we save the doubles to - avoid losing information in case of MMX usage. It can give - problems if the image is restored on a CPU where long - doubles are used instead. */ - qemu_put_be64(f, env->fpregs[i].mmx.MMX_Q(0)); -#endif - } - - for(i = 0; i < 6; i++) - cpu_put_seg(f, &env->segs[i]); - cpu_put_seg(f, &env->ldt); - cpu_put_seg(f, &env->tr); - cpu_put_seg(f, &env->gdt); - cpu_put_seg(f, &env->idt); - - qemu_put_be32s(f, &env->sysenter_cs); - qemu_put_be32s(f, &env->sysenter_esp); - qemu_put_be32s(f, &env->sysenter_eip); - - qemu_put_betls(f, &env->cr[0]); - qemu_put_betls(f, &env->cr[2]); - qemu_put_betls(f, &env->cr[3]); - qemu_put_betls(f, &env->cr[4]); - - for(i = 0; i < 8; i++) - qemu_put_betls(f, &env->dr[i]); - - /* MMU */ - qemu_put_be32s(f, &env->a20_mask); - - /* XMM */ - qemu_put_be32s(f, &env->mxcsr); - for(i = 0; i < CPU_NB_REGS; i++) { - qemu_put_be64s(f, &env->xmm_regs[i].XMM_Q(0)); - qemu_put_be64s(f, &env->xmm_regs[i].XMM_Q(1)); - } - -#ifdef TARGET_X86_64 - qemu_put_be64s(f, &env->efer); - qemu_put_be64s(f, &env->star); - qemu_put_be64s(f, &env->lstar); - qemu_put_be64s(f, &env->cstar); - qemu_put_be64s(f, &env->fmask); - qemu_put_be64s(f, &env->kernelgsbase); -#endif - qemu_put_be32s(f, &env->smbase); -} - -#ifdef USE_X86LDOUBLE -/* XXX: add that in a FPU generic layer */ -union x86_longdouble { - uint64_t mant; - uint16_t exp; -}; - -#define MANTD1(fp) (fp & ((1LL << 52) - 1)) -#define EXPBIAS1 1023 -#define 
EXPD1(fp) ((fp >> 52) & 0x7FF) -#define SIGND1(fp) ((fp >> 32) & 0x80000000) - -static void fp64_to_fp80(union x86_longdouble *p, uint64_t temp) -{ - int e; - /* mantissa */ - p->mant = (MANTD1(temp) << 11) | (1LL << 63); - /* exponent + sign */ - e = EXPD1(temp) - EXPBIAS1 + 16383; - e |= SIGND1(temp) >> 16; - p->exp = e; -} -#endif - -int cpu_load(QEMUFile *f, void *opaque, int version_id) -{ - CPUState *env = opaque; - int i, guess_mmx; - uint32_t hflags; - uint16_t fpus, fpuc, fptag, fpregs_format; - - if (version_id != 3 && version_id != 4) - return -EINVAL; - for(i = 0; i < CPU_NB_REGS; i++) - qemu_get_betls(f, &env->regs[i]); - qemu_get_betls(f, &env->eip); - qemu_get_betls(f, &env->eflags); - qemu_get_be32s(f, &hflags); - - qemu_get_be16s(f, &fpuc); - qemu_get_be16s(f, &fpus); - qemu_get_be16s(f, &fptag); - qemu_get_be16s(f, &fpregs_format); - - /* NOTE: we cannot always restore the FPU state if the image come - from a host with a different 'USE_X86LDOUBLE' define. We guess - if we are in an MMX state to restore correctly in that case. 
*/ - guess_mmx = ((fptag == 0xff) && (fpus & 0x3800) == 0); - for(i = 0; i < 8; i++) { - uint64_t mant; - uint16_t exp; - - switch(fpregs_format) { - case 0: - mant = qemu_get_be64(f); - exp = qemu_get_be16(f); -#ifdef USE_X86LDOUBLE - env->fpregs[i].d = cpu_set_fp80(mant, exp); -#else - /* difficult case */ - if (guess_mmx) - env->fpregs[i].mmx.MMX_Q(0) = mant; - else - env->fpregs[i].d = cpu_set_fp80(mant, exp); -#endif - break; - case 1: - mant = qemu_get_be64(f); -#ifdef USE_X86LDOUBLE - { - union x86_longdouble *p; - /* difficult case */ - p = (void *)&env->fpregs[i]; - if (guess_mmx) { - p->mant = mant; - p->exp = 0xffff; - } else { - fp64_to_fp80(p, mant); - } - } -#else - env->fpregs[i].mmx.MMX_Q(0) = mant; -#endif - break; - default: - return -EINVAL; - } - } - - env->fpuc = fpuc; - /* XXX: restore FPU round state */ - env->fpstt = (fpus >> 11) & 7; - env->fpus = fpus & ~0x3800; - fptag ^= 0xff; - for(i = 0; i < 8; i++) { - env->fptags[i] = (fptag >> i) & 1; - } - - for(i = 0; i < 6; i++) - cpu_get_seg(f, &env->segs[i]); - cpu_get_seg(f, &env->ldt); - cpu_get_seg(f, &env->tr); - cpu_get_seg(f, &env->gdt); - cpu_get_seg(f, &env->idt); - - qemu_get_be32s(f, &env->sysenter_cs); - qemu_get_be32s(f, &env->sysenter_esp); - qemu_get_be32s(f, &env->sysenter_eip); - - qemu_get_betls(f, &env->cr[0]); - qemu_get_betls(f, &env->cr[2]); - qemu_get_betls(f, &env->cr[3]); - qemu_get_betls(f, &env->cr[4]); - - for(i = 0; i < 8; i++) - qemu_get_betls(f, &env->dr[i]); - - /* MMU */ - qemu_get_be32s(f, &env->a20_mask); - - qemu_get_be32s(f, &env->mxcsr); - for(i = 0; i < CPU_NB_REGS; i++) { - qemu_get_be64s(f, &env->xmm_regs[i].XMM_Q(0)); - qemu_get_be64s(f, &env->xmm_regs[i].XMM_Q(1)); - } - -#ifdef TARGET_X86_64 - qemu_get_be64s(f, &env->efer); - qemu_get_be64s(f, &env->star); - qemu_get_be64s(f, &env->lstar); - qemu_get_be64s(f, &env->cstar); - qemu_get_be64s(f, &env->fmask); - qemu_get_be64s(f, &env->kernelgsbase); -#endif - if (version_id >= 4) - qemu_get_be32s(f, 
&env->smbase); - - /* XXX: compute hflags from scratch, except for CPL and IIF */ - env->hflags = hflags; - tlb_flush(env, 1); - return 0; -} - -#elif defined(TARGET_PPC) -void cpu_save(QEMUFile *f, void *opaque) -{ -} - -int cpu_load(QEMUFile *f, void *opaque, int version_id) -{ - return 0; -} - -#elif defined(TARGET_MIPS) -void cpu_save(QEMUFile *f, void *opaque) -{ -} - -int cpu_load(QEMUFile *f, void *opaque, int version_id) -{ - return 0; -} - -#elif defined(TARGET_SPARC) -void cpu_save(QEMUFile *f, void *opaque) -{ - CPUState *env = opaque; - int i; - uint32_t tmp; - - for(i = 0; i < 8; i++) - qemu_put_betls(f, &env->gregs[i]); - for(i = 0; i < NWINDOWS * 16; i++) - qemu_put_betls(f, &env->regbase[i]); - - /* FPU */ - for(i = 0; i < TARGET_FPREGS; i++) { - union { - float32 f; - uint32_t i; - } u; - u.f = env->fpr[i]; - qemu_put_be32(f, u.i); - } - - qemu_put_betls(f, &env->pc); - qemu_put_betls(f, &env->npc); - qemu_put_betls(f, &env->y); - tmp = GET_PSR(env); - qemu_put_be32(f, tmp); - qemu_put_betls(f, &env->fsr); - qemu_put_betls(f, &env->tbr); -#ifndef TARGET_SPARC64 - qemu_put_be32s(f, &env->wim); - /* MMU */ - for(i = 0; i < 16; i++) - qemu_put_be32s(f, &env->mmuregs[i]); -#endif -} - -int cpu_load(QEMUFile *f, void *opaque, int version_id) -{ - CPUState *env = opaque; - int i; - uint32_t tmp; - - for(i = 0; i < 8; i++) - qemu_get_betls(f, &env->gregs[i]); - for(i = 0; i < NWINDOWS * 16; i++) - qemu_get_betls(f, &env->regbase[i]); - - /* FPU */ - for(i = 0; i < TARGET_FPREGS; i++) { - union { - float32 f; - uint32_t i; - } u; - u.i = qemu_get_be32(f); - env->fpr[i] = u.f; - } - - qemu_get_betls(f, &env->pc); - qemu_get_betls(f, &env->npc); - qemu_get_betls(f, &env->y); - tmp = qemu_get_be32(f); - env->cwp = 0; /* needed to ensure that the wrapping registers are - correctly updated */ - PUT_PSR(env, tmp); - qemu_get_betls(f, &env->fsr); - qemu_get_betls(f, &env->tbr); -#ifndef TARGET_SPARC64 - qemu_get_be32s(f, &env->wim); - /* MMU */ - for(i = 0; i < 
16; i++) - qemu_get_be32s(f, &env->mmuregs[i]); -#endif - tlb_flush(env, 1); - return 0; -} - -#elif defined(TARGET_ARM) - -void cpu_save(QEMUFile *f, void *opaque) -{ - int i; - CPUARMState *env = (CPUARMState *)opaque; - - for (i = 0; i < 16; i++) { - qemu_put_be32(f, env->regs[i]); - } - qemu_put_be32(f, cpsr_read(env)); - qemu_put_be32(f, env->spsr); - for (i = 0; i < 6; i++) { - qemu_put_be32(f, env->banked_spsr[i]); - qemu_put_be32(f, env->banked_r13[i]); - qemu_put_be32(f, env->banked_r14[i]); - } - for (i = 0; i < 5; i++) { - qemu_put_be32(f, env->usr_regs[i]); - qemu_put_be32(f, env->fiq_regs[i]); - } - qemu_put_be32(f, env->cp15.c0_cpuid); - qemu_put_be32(f, env->cp15.c0_cachetype); - qemu_put_be32(f, env->cp15.c1_sys); - qemu_put_be32(f, env->cp15.c1_coproc); - qemu_put_be32(f, env->cp15.c1_xscaleauxcr); - qemu_put_be32(f, env->cp15.c2_base0); - qemu_put_be32(f, env->cp15.c2_base1); - qemu_put_be32(f, env->cp15.c2_mask); - qemu_put_be32(f, env->cp15.c2_data); - qemu_put_be32(f, env->cp15.c2_insn); - qemu_put_be32(f, env->cp15.c3); - qemu_put_be32(f, env->cp15.c5_insn); - qemu_put_be32(f, env->cp15.c5_data); - for (i = 0; i < 8; i++) { - qemu_put_be32(f, env->cp15.c6_region[i]); - } - qemu_put_be32(f, env->cp15.c6_insn); - qemu_put_be32(f, env->cp15.c6_data); - qemu_put_be32(f, env->cp15.c9_insn); - qemu_put_be32(f, env->cp15.c9_data); - qemu_put_be32(f, env->cp15.c13_fcse); - qemu_put_be32(f, env->cp15.c13_context); - qemu_put_be32(f, env->cp15.c13_tls1); - qemu_put_be32(f, env->cp15.c13_tls2); - qemu_put_be32(f, env->cp15.c13_tls3); - qemu_put_be32(f, env->cp15.c15_cpar); - - qemu_put_be32(f, env->features); - - if (arm_feature(env, ARM_FEATURE_VFP)) { - for (i = 0; i < 16; i++) { - CPU_DoubleU u; - u.d = env->vfp.regs[i]; - qemu_put_be32(f, u.l.upper); - qemu_put_be32(f, u.l.lower); - } - for (i = 0; i < 16; i++) { - qemu_put_be32(f, env->vfp.xregs[i]); - } - - /* TODO: Should use proper FPSCR access functions. 
*/ - qemu_put_be32(f, env->vfp.vec_len); - qemu_put_be32(f, env->vfp.vec_stride); - - if (arm_feature(env, ARM_FEATURE_VFP3)) { - for (i = 16; i < 32; i++) { - CPU_DoubleU u; - u.d = env->vfp.regs[i]; - qemu_put_be32(f, u.l.upper); - qemu_put_be32(f, u.l.lower); - } - } - } - - if (arm_feature(env, ARM_FEATURE_IWMMXT)) { - for (i = 0; i < 16; i++) { - qemu_put_be64(f, env->iwmmxt.regs[i]); - } - for (i = 0; i < 16; i++) { - qemu_put_be32(f, env->iwmmxt.cregs[i]); - } - } - - if (arm_feature(env, ARM_FEATURE_M)) { - qemu_put_be32(f, env->v7m.other_sp); - qemu_put_be32(f, env->v7m.vecbase); - qemu_put_be32(f, env->v7m.basepri); - qemu_put_be32(f, env->v7m.control); - qemu_put_be32(f, env->v7m.current_sp); - qemu_put_be32(f, env->v7m.exception); - } -} - -int cpu_load(QEMUFile *f, void *opaque, int version_id) -{ - CPUARMState *env = (CPUARMState *)opaque; - int i; - - if (version_id != ARM_CPU_SAVE_VERSION) - return -EINVAL; - - for (i = 0; i < 16; i++) { - env->regs[i] = qemu_get_be32(f); - } - cpsr_write(env, qemu_get_be32(f), 0xffffffff); - env->spsr = qemu_get_be32(f); - for (i = 0; i < 6; i++) { - env->banked_spsr[i] = qemu_get_be32(f); - env->banked_r13[i] = qemu_get_be32(f); - env->banked_r14[i] = qemu_get_be32(f); - } - for (i = 0; i < 5; i++) { - env->usr_regs[i] = qemu_get_be32(f); - env->fiq_regs[i] = qemu_get_be32(f); - } - env->cp15.c0_cpuid = qemu_get_be32(f); - env->cp15.c0_cachetype = qemu_get_be32(f); - env->cp15.c1_sys = qemu_get_be32(f); - env->cp15.c1_coproc = qemu_get_be32(f); - env->cp15.c1_xscaleauxcr = qemu_get_be32(f); - env->cp15.c2_base0 = qemu_get_be32(f); - env->cp15.c2_base1 = qemu_get_be32(f); - env->cp15.c2_mask = qemu_get_be32(f); - env->cp15.c2_data = qemu_get_be32(f); - env->cp15.c2_insn = qemu_get_be32(f); - env->cp15.c3 = qemu_get_be32(f); - env->cp15.c5_insn = qemu_get_be32(f); - env->cp15.c5_data = qemu_get_be32(f); - for (i = 0; i < 8; i++) { - env->cp15.c6_region[i] = qemu_get_be32(f); - } - env->cp15.c6_insn = 
qemu_get_be32(f); - env->cp15.c6_data = qemu_get_be32(f); - env->cp15.c9_insn = qemu_get_be32(f); - env->cp15.c9_data = qemu_get_be32(f); - env->cp15.c13_fcse = qemu_get_be32(f); - env->cp15.c13_context = qemu_get_be32(f); - env->cp15.c13_tls1 = qemu_get_be32(f); - env->cp15.c13_tls2 = qemu_get_be32(f); - env->cp15.c13_tls3 = qemu_get_be32(f); - env->cp15.c15_cpar = qemu_get_be32(f); - - env->features = qemu_get_be32(f); - - if (arm_feature(env, ARM_FEATURE_VFP)) { - for (i = 0; i < 16; i++) { - CPU_DoubleU u; - u.l.upper = qemu_get_be32(f); - u.l.lower = qemu_get_be32(f); - env->vfp.regs[i] = u.d; - } - for (i = 0; i < 16; i++) { - env->vfp.xregs[i] = qemu_get_be32(f); - } - - /* TODO: Should use proper FPSCR access functions. */ - env->vfp.vec_len = qemu_get_be32(f); - env->vfp.vec_stride = qemu_get_be32(f); - - if (arm_feature(env, ARM_FEATURE_VFP3)) { - for (i = 0; i < 16; i++) { - CPU_DoubleU u; - u.l.upper = qemu_get_be32(f); - u.l.lower = qemu_get_be32(f); - env->vfp.regs[i] = u.d; - } - } - } - - if (arm_feature(env, ARM_FEATURE_IWMMXT)) { - for (i = 0; i < 16; i++) { - env->iwmmxt.regs[i] = qemu_get_be64(f); - } - for (i = 0; i < 16; i++) { - env->iwmmxt.cregs[i] = qemu_get_be32(f); - } - } - - if (arm_feature(env, ARM_FEATURE_M)) { - env->v7m.other_sp = qemu_get_be32(f); - env->v7m.vecbase = qemu_get_be32(f); - env->v7m.basepri = qemu_get_be32(f); - env->v7m.control = qemu_get_be32(f); - env->v7m.current_sp = qemu_get_be32(f); - env->v7m.exception = qemu_get_be32(f); - } - - return 0; -} - -#else - -//#warning No CPU save/restore functions - -#endif - -/***********************************************************/ /* ram save/restore */ static int ram_get_page(QEMUFile *f, uint8_t *buf, int len) @@ -7988,71 +7437,6 @@ static void read_passwords(void) } } -/* XXX: currently we cannot use simultaneously different CPUs */ -static void register_machines(void) -{ -#if defined(TARGET_I386) - qemu_register_machine(&pc_machine); - 
qemu_register_machine(&isapc_machine); -#elif defined(TARGET_PPC) - qemu_register_machine(&heathrow_machine); - qemu_register_machine(&core99_machine); - qemu_register_machine(&prep_machine); - qemu_register_machine(&ref405ep_machine); - qemu_register_machine(&taihu_machine); -#elif defined(TARGET_MIPS) - qemu_register_machine(&mips_machine); - qemu_register_machine(&mips_malta_machine); - qemu_register_machine(&mips_pica61_machine); - qemu_register_machine(&mips_mipssim_machine); -#elif defined(TARGET_SPARC) -#ifdef TARGET_SPARC64 - qemu_register_machine(&sun4u_machine); -#else - qemu_register_machine(&ss5_machine); - qemu_register_machine(&ss10_machine); - qemu_register_machine(&ss600mp_machine); - qemu_register_machine(&ss20_machine); - qemu_register_machine(&ss2_machine); - qemu_register_machine(&voyager_machine); - qemu_register_machine(&ss_lx_machine); - qemu_register_machine(&ss4_machine); - qemu_register_machine(&scls_machine); - qemu_register_machine(&sbook_machine); - qemu_register_machine(&ss1000_machine); - qemu_register_machine(&ss2000_machine); -#endif -#elif defined(TARGET_ARM) - qemu_register_machine(&integratorcp_machine); - qemu_register_machine(&versatilepb_machine); - qemu_register_machine(&versatileab_machine); - qemu_register_machine(&realview_machine); - qemu_register_machine(&akitapda_machine); - qemu_register_machine(&spitzpda_machine); - qemu_register_machine(&borzoipda_machine); - qemu_register_machine(&terrierpda_machine); - qemu_register_machine(&palmte_machine); - qemu_register_machine(&lm3s811evb_machine); - qemu_register_machine(&lm3s6965evb_machine); - qemu_register_machine(&connex_machine); - qemu_register_machine(&verdex_machine); - qemu_register_machine(&mainstone2_machine); -#elif defined(TARGET_SH4) - qemu_register_machine(&shix_machine); - qemu_register_machine(&r2d_machine); -#elif defined(TARGET_ALPHA) - /* XXX: TODO */ -#elif defined(TARGET_M68K) - qemu_register_machine(&mcf5208evb_machine); - 
qemu_register_machine(&an5206_machine); - qemu_register_machine(&dummy_m68k_machine); -#elif defined(TARGET_CRIS) - qemu_register_machine(&bareetraxfs_machine); -#else -#error unsupported CPU -#endif -} - #ifdef HAS_AUDIO struct soundhw soundhw[] = { #ifdef HAS_AUDIO_CHOICE -- 1.5.0.6 |
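The x86 cpu_save()/cpu_load() pair moved by the patch above folds the x87 top-of-stack index into bits 11 to 13 of the saved status word and stores the tag bits inverted (a set bit in the file means "register valid", while a set entry in env->fptags[] means "register empty"). The following is a minimal stand-alone sketch of that round trip, not QEMU code: struct fpu_sketch, fpu_sketch_pack() and fpu_sketch_unpack() are hypothetical names made up for illustration, and only the bit manipulation is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the x87 fields of CPUState that the patch saves. */
struct fpu_sketch {
    uint16_t fpus;       /* status word with the TOP field (bits 11-13) cleared */
    unsigned fpstt;      /* top-of-stack index, 0..7 */
    uint8_t  fptags[8];  /* 1 = register empty, 0 = register valid */
};

/* Mirrors the save side: fold TOP into the status word, invert the tags. */
static void fpu_sketch_pack(const struct fpu_sketch *s,
                            uint16_t *fpus, uint16_t *fptag)
{
    int i;

    assert(s->fpstt < 8);
    *fpus = (uint16_t)((s->fpus & ~0x3800u) | ((s->fpstt & 0x7u) << 11));
    *fptag = 0;
    for (i = 0; i < 8; i++)
        *fptag |= (uint16_t)((!s->fptags[i]) << i);
}

/* Mirrors the load side: split TOP back out, undo the tag inversion. */
static void fpu_sketch_unpack(struct fpu_sketch *s,
                              uint16_t fpus, uint16_t fptag)
{
    int i;

    s->fpstt = (fpus >> 11) & 7;
    s->fpus = (uint16_t)(fpus & ~0x3800u);
    fptag ^= 0xff;
    for (i = 0; i < 8; i++)
        s->fptags[i] = (fptag >> i) & 1;
}
```

Packing and then unpacking should give back the same fpus, fpstt and fptags[] values, which is the property the save and load functions rely on.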
From: Anthony L. <ali...@us...> - 2008-04-29 19:10:22
|
This patch allows VMAs that contain no backing page to be used for guest
memory. This is a drop-in replacement for Ben-Ami's first patch in his direct
mmio series. Here, we continue to allow mmio pages to be represented in the
rmap.

Since v1, I've taken into account Andrea's suggestion to use VM_PFNMAP
instead of VM_IO and changed the BUG_ON to a return of bad_page.

Signed-off-by: Anthony Liguori <ali...@us...>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1d7991a..64e5efe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -532,6 +532,7 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 	struct page *page[1];
 	unsigned long addr;
 	int npages;
+	pfn_t pfn;

 	might_sleep();

@@ -544,19 +545,35 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 	npages = get_user_pages(current, current->mm, addr, 1, 1, 1, page,
 				NULL);

-	if (npages != 1) {
-		get_page(bad_page);
-		return page_to_pfn(bad_page);
-	}
+	if (unlikely(npages != 1)) {
+		struct vm_area_struct *vma;

-	return page_to_pfn(page[0]);
+		vma = find_vma(current->mm, addr);
+		if (vma == NULL || addr < vma->vm_start ||
+		    !(vma->vm_flags & VM_PFNMAP)) {
+			get_page(bad_page);
+			return page_to_pfn(bad_page);
+		}
+
+		pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+		BUG_ON(pfn_valid(pfn));
+	} else
+		pfn = page_to_pfn(page[0]);
+
+	return pfn;
 }

 EXPORT_SYMBOL_GPL(gfn_to_pfn);

 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
-	return pfn_to_page(gfn_to_pfn(kvm, gfn));
+	pfn_t pfn;
+
+	pfn = gfn_to_pfn(kvm, gfn);
+	if (pfn_valid(pfn))
+		return pfn_to_page(pfn);
+
+	return NULL;
 }

 EXPORT_SYMBOL_GPL(gfn_to_page);
@@ -569,7 +586,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);

 void kvm_release_pfn_clean(pfn_t pfn)
 {
-	put_page(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		put_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);

@@ -594,21 +612,25 @@ EXPORT_SYMBOL_GPL(kvm_set_page_dirty);

 void kvm_set_pfn_dirty(pfn_t pfn)
 {
-	struct page *page = pfn_to_page(pfn);
-	if (!PageReserved(page))
-		SetPageDirty(page);
+	if (pfn_valid(pfn)) {
+		struct page *page = pfn_to_page(pfn);
+		if (!PageReserved(page))
+			SetPageDirty(page);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);

 void kvm_set_pfn_accessed(pfn_t pfn)
 {
-	mark_page_accessed(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);

 void kvm_get_pfn(pfn_t pfn)
 {
-	get_page(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		get_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_get_pfn);
|
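To make the fallback path in gfn_to_pfn() easier to follow, here is a minimal user-space sketch of the address-to-pfn arithmetic for a VM_PFNMAP-style mapping. It is an illustration only, not kernel code: struct sketch_vma, sketch_addr_to_pfn() and the SKETCH_* constants are hypothetical stand-ins, the flag value is a placeholder, and a 4 KiB page size is assumed. In the kernel, find_vma() returns the first VMA whose vm_end lies above the address, so only the lower bound needs checking there; the sketch has no such lookup and therefore checks both bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12          /* assumes 4 KiB pages */
#define SKETCH_VM_PFNMAP  0x1u        /* placeholder value, not the kernel's flag */
#define SKETCH_BAD_PFN    UINT64_MAX  /* stands in for "return bad_page" */

/* Hypothetical, reduced version of a VMA: just the fields the lookup uses. */
struct sketch_vma {
    uint64_t vm_start;  /* first address covered */
    uint64_t vm_end;    /* first address past the mapping */
    uint64_t vm_pgoff;  /* for a PFNMAP mapping: pfn that backs vm_start */
    unsigned vm_flags;
};

/* Translate an address inside a PFNMAP-style mapping to a page frame number. */
static uint64_t sketch_addr_to_pfn(const struct sketch_vma *vma, uint64_t addr)
{
    /* Reject addresses outside the mapping and mappings of the wrong kind. */
    if (vma == NULL || addr < vma->vm_start || addr >= vma->vm_end ||
        !(vma->vm_flags & SKETCH_VM_PFNMAP))
        return SKETCH_BAD_PFN;

    assert(vma->vm_end > vma->vm_start);

    /* Page index within the mapping, offset by the pfn of its first page. */
    return ((addr - vma->vm_start) >> SKETCH_PAGE_SHIFT) + vma->vm_pgoff;
}
```

For a mapping that starts at 0x10000 with vm_pgoff 0xfe000, an address two pages in (for example 0x12abc) maps to pfn 0xfe002, while an address below vm_start takes the bad-pfn path.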
From: Anthony L. <an...@co...> - 2008-04-29 18:17:15
|
Laurent Vivier wrote:
> On Tuesday 29 April 2008 at 11:41 -0500, Anthony Liguori wrote:
>
>> Guillaume Thouvenin wrote:
>>
>>> Hello,
>>>
>>> This patch should solve the problem observed during protected mode
>>> transitions that appears for example during the installation of
>>> openSuse-10.3. Unfortunately there is an issue that crashes
>>> kvm-userspace. I'm not sure if it's a problem introduced by the
>>> patch or if the patch is good and raises a new issue.
>>>
>>>
>> You still aren't emulating the instructions correctly I think. Running
>> your patch, I see:
>>
>> [ 979.755349] Failed vm entry (exit reason 0x21) invalid guest state
>> [ 979.755354] emulation at (46e4b) rip 6e0b: ea 10 6e 18
>> [ 979.755358] successfully emulated instruction
>> [ 979.756105] Failed vm entry (exit reason 0x21) invalid guest state
>> [ 979.756109] emulation at (46e50) rip 6e10: 66 b8 20 00
>> [ 979.756111] successfully emulated instruction
>> [ 979.756749] Failed vm entry (exit reason 0x21) invalid guest state
>> [ 979.756752] emulation at (46e54) rip 6e14: 8e d8 8c d0
>> [ 979.756755] successfully emulated instruction
>> [ 979.757427] Failed vm entry (exit reason 0x21) invalid guest state
>> [ 979.757430] emulation at (46e56) rip 6e16: 8c d0 81 e4
>> [ 979.757433] successfully emulated instruction
>> [ 979.758074] Failed vm entry (exit reason 0x21) invalid guest state
>> [ 979.758077] emulation at (46e58) rip 6e18: 81 e4 ff ff
>>
>>
>> The corresponding gfxboot code is:
>>
>> 16301 00006E0B EA[106E]1800 jmp pm_seg.prog_c32:switch_to_pm_20
>> 16302 switch_to_pm_20:
>> 16303
>> 16304 bits 32
>> 16305
>> 16306 00006E10 66B82000 mov ax,pm_seg.prog_d16
>> 16307 00006E14 8ED8 mov ds,ax
>> 16308
>> 16309 00006E16 8CD0 mov eax,ss
>> 16310 00006E18 81E4FFFF0000 and esp,0ffffh
>>
>>
>> The VT state should be correct after executing the instruction at RIP 6E16
>> (mov eax, ss). The next instruction should not cause a vmentry
>>
>
> Are you sure? It is Intel notation (opcode dst,src), so it updates
> eax, not ss. Guillaume gives us (with gdb notation, opcode src,dst):
>

You're right, it's a fair bit down the code before the ss move happens.

Regards,

Anthony Liguori

> 0x0000000000046e53: ljmp $0x18,$0x6e18
>
> 0x0000000000046e58: mov $0x20,%ax
>
> %EAX = 0x20
>
> 0x0000000000046e5c: mov %eax,%ds
>
> %DS = 0x20
>
> 0x0000000000046e5e: mov %ss,%eax
>
> %EAX = %SS = 0x53E1 (in this particular case)
>
> For me the issue is with instructions with "dst.byte = 0".
> for instance:
>
> 0x0000000000046e66: shl $0x4,%eax
>
> [82768.003174] emulation at (46e66) rip 6e26: c1 e0 04 01
> [82768.035153] writeback: dst.byte 0
> [82768.055174] writeback: dst.ptr 0x0000000000000000
> [82768.087177] writeback: dst.val 0x53e1
> [82768.111178] writeback: src.ptr 0x0000000000006e28
> [82768.143157] writeback: src.val 0x4
>
> So my questions are:
>
> Why dst.val is not 0x53e10 ?
> Why dst.byte is 0 ?
>
>
>> failure. The fact that it is for you indicates that you're not updating
>> guest state correctly.
>>
>> My guess would be that load_segment_descriptor is not updating the
>> values within the VMCS.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>
> Regards
> Laurent
>
|
From: Anthony L. <an...@co...> - 2008-04-29 18:16:59
|
Guillaume Thouvenin wrote: > Hello, > > It's strange because handle_vmentry_failure() is not called. I'm trying > to see where is the problem, any comments are welcome > [ 979.761321] handle_exception: unexpected, vectoring info 0x80000306 intr info 0x80000b0d Is the error I'm seeing. Regards, Anthony Liguori > Regards, > Guillaume > > > > arch/x86/kvm/vmx.c | 68 +++++++++++++++++++++++++++ > arch/x86/kvm/vmx.h | 3 + > arch/x86/kvm/x86.c | 12 ++-- > arch/x86/kvm/x86_emulate.c | 112 +++++++++++++++++++++++++++++++++++++++++++-- > include/asm-x86/kvm_host.h | 4 + > 5 files changed, 190 insertions(+), 9 deletions(-) > > --- > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 79cdbe8..a0a13b8 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -1272,7 +1272,9 @@ static void enter_pmode(struct kvm_vcpu *vcpu) > fix_pmode_dataseg(VCPU_SREG_GS, &vcpu->arch.rmode.gs); > fix_pmode_dataseg(VCPU_SREG_FS, &vcpu->arch.rmode.fs); > > +#if 0 > vmcs_write16(GUEST_SS_SELECTOR, 0); > +#endif > vmcs_write32(GUEST_SS_AR_BYTES, 0x93); > > vmcs_write16(GUEST_CS_SELECTOR, > @@ -2635,6 +2637,66 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) > return 1; > } > > +static int invalid_guest_state(struct kvm_vcpu *vcpu, > + struct kvm_run *kvm_run, u32 failure_reason) > +{ > + u16 ss, cs; > + u8 opcodes[4]; > + unsigned long rip = vcpu->arch.rip; > + unsigned long rip_linear; > + > + ss = vmcs_read16(GUEST_SS_SELECTOR); > + cs = vmcs_read16(GUEST_CS_SELECTOR); > + > + if ((ss & 0x03) != (cs & 0x03)) { > + int err; > + rip_linear = rip + vmx_get_segment_base(vcpu, VCPU_SREG_CS); > + emulator_read_std(rip_linear, (void *)opcodes, 4, vcpu); > + printk(KERN_INFO "emulation at (%lx) rip %lx: %02x %02x %02x %02x\n", > + rip_linear, > + rip, opcodes[0], opcodes[1], opcodes[2], opcodes[3]); > + err = emulate_instruction(vcpu, kvm_run, 0, 0, 0); > + switch (err) { > + case EMULATE_DONE: > + printk(KERN_INFO "successfully emulated 
instruction\n"); > + return 1; > + case EMULATE_DO_MMIO: > + printk(KERN_INFO "mmio?\n"); > + return 0; > + default: > + kvm_report_emulation_failure(vcpu, "vmentry failure"); > + break; > + } > + } > + > + kvm_run->exit_reason = KVM_EXIT_UNKNOWN; > + kvm_run->hw.hardware_exit_reason = failure_reason; > + return 0; > +} > + > +static int handle_vmentry_failure(struct kvm_vcpu *vcpu, > + struct kvm_run *kvm_run, > + u32 failure_reason) > +{ > + unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); > + > + printk(KERN_INFO "Failed vm entry (exit reason 0x%x) ", failure_reason); > + switch (failure_reason) { > + case EXIT_REASON_INVALID_GUEST_STATE: > + printk("invalid guest state \n"); > + return invalid_guest_state(vcpu, kvm_run, failure_reason); > + case EXIT_REASON_MSR_LOADING: > + printk("caused by MSR entry %ld loading.\n", exit_qualification); > + break; > + case EXIT_REASON_MACHINE_CHECK: > + printk("caused by machine check.\n"); > + break; > + default: > + printk("reason not known yet!\n"); > + break; > + } > + return 0; > +} > /* > * The exit handlers return 1 if the exit was handled fully and guest execution > * may resume. 
Otherwise they set the kvm_run parameter to indicate what needs > @@ -2696,6 +2758,12 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) > exit_reason != EXIT_REASON_EPT_VIOLATION)) > printk(KERN_WARNING "%s: unexpected, valid vectoring info and " > "exit reason is 0x%x\n", __func__, exit_reason); > + > + if ((exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY)) { > + exit_reason &= ~VMX_EXIT_REASONS_FAILED_VMENTRY; > + return handle_vmentry_failure(vcpu, kvm_run, exit_reason); > + } > + > if (exit_reason < kvm_vmx_max_exit_handlers > && kvm_vmx_exit_handlers[exit_reason]) > return kvm_vmx_exit_handlers[exit_reason](vcpu, kvm_run); > diff --git a/arch/x86/kvm/vmx.h b/arch/x86/kvm/vmx.h > index 79d94c6..2cebf48 100644 > --- a/arch/x86/kvm/vmx.h > +++ b/arch/x86/kvm/vmx.h > @@ -238,7 +238,10 @@ enum vmcs_field { > #define EXIT_REASON_IO_INSTRUCTION 30 > #define EXIT_REASON_MSR_READ 31 > #define EXIT_REASON_MSR_WRITE 32 > +#define EXIT_REASON_INVALID_GUEST_STATE 33 > +#define EXIT_REASON_MSR_LOADING 34 > #define EXIT_REASON_MWAIT_INSTRUCTION 36 > +#define EXIT_REASON_MACHINE_CHECK 41 > #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 > #define EXIT_REASON_APIC_ACCESS 44 > #define EXIT_REASON_EPT_VIOLATION 48 > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 578a0c1..9e5d687 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -3027,8 +3027,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) > return 0; > } > > -static void get_segment(struct kvm_vcpu *vcpu, > - struct kvm_segment *var, int seg) > +void get_segment(struct kvm_vcpu *vcpu, > + struct kvm_segment *var, int seg) > { > kvm_x86_ops->get_segment(vcpu, var, seg); > } > @@ -3111,8 +3111,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, > return 0; > } > > -static void set_segment(struct kvm_vcpu *vcpu, > - struct kvm_segment *var, int seg) > +void set_segment(struct kvm_vcpu *vcpu, > + struct kvm_segment *var, int seg) > { > 
kvm_x86_ops->set_segment(vcpu, var, seg); > } > @@ -3270,8 +3270,8 @@ static int load_segment_descriptor_to_kvm_desct(struct kvm_vcpu *vcpu, > return 0; > } > > -static int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, > - int type_bits, int seg) > +int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, > + int type_bits, int seg) > { > struct kvm_segment kvm_seg; > > diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c > index 2ca0838..f6b9dad 100644 > --- a/arch/x86/kvm/x86_emulate.c > +++ b/arch/x86/kvm/x86_emulate.c > @@ -138,7 +138,8 @@ static u16 opcode_table[256] = { > /* 0x88 - 0x8F */ > ByteOp | DstMem | SrcReg | ModRM | Mov, DstMem | SrcReg | ModRM | Mov, > ByteOp | DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov, > - 0, ModRM | DstReg, 0, Group | Group1A, > + DstMem | SrcReg | ModRM | Mov, ModRM | DstReg, > + DstReg | SrcMem | ModRM | Mov, Group | Group1A, > /* 0x90 - 0x9F */ > 0, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, ImplicitOps | Stack, ImplicitOps | Stack, 0, 0, > @@ -152,7 +153,8 @@ static u16 opcode_table[256] = { > ByteOp | ImplicitOps | Mov | String, ImplicitOps | Mov | String, > ByteOp | ImplicitOps | String, ImplicitOps | String, > /* 0xB0 - 0xBF */ > - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > + 0, 0, 0, 0, 0, 0, 0, 0, > + DstReg | SrcImm | Mov, 0, 0, 0, 0, 0, 0, 0, > /* 0xC0 - 0xC7 */ > ByteOp | DstMem | SrcImm | ModRM, DstMem | SrcImmByte | ModRM, > 0, ImplicitOps | Stack, 0, 0, > @@ -168,7 +170,7 @@ static u16 opcode_table[256] = { > /* 0xE0 - 0xE7 */ > 0, 0, 0, 0, 0, 0, 0, 0, > /* 0xE8 - 0xEF */ > - ImplicitOps | Stack, SrcImm|ImplicitOps, 0, SrcImmByte|ImplicitOps, > + ImplicitOps | Stack, SrcImm | ImplicitOps, ImplicitOps, SrcImmByte | ImplicitOps, > 0, 0, 0, 0, > /* 0xF0 - 0xF7 */ > 0, 0, 0, 0, > @@ -1511,14 +1513,90 @@ special_insn: > break; > case 0x88 ... 
0x8b: /* mov */ > goto mov; > + case 0x8c: { /* mov r/m, sreg */ > + struct kvm_segment segreg; > + > + if (c->modrm_mod == 0x3) > + c->src.val = c->modrm_val; > + > + switch ( c->modrm_reg ) { > + case 0: > + get_segment(ctxt->vcpu, &segreg, VCPU_SREG_ES); > + break; > + case 1: > + get_segment(ctxt->vcpu, &segreg, VCPU_SREG_CS); > + break; > + case 2: > + get_segment(ctxt->vcpu, &segreg, VCPU_SREG_SS); > + break; > + case 3: > + get_segment(ctxt->vcpu, &segreg, VCPU_SREG_DS); > + break; > + case 4: > + get_segment(ctxt->vcpu, &segreg, VCPU_SREG_FS); > + break; > + case 5: > + get_segment(ctxt->vcpu, &segreg, VCPU_SREG_GS); > + break; > + default: > + printk(KERN_INFO "0x8c: Invalid segreg in modrm byte 0x%02x\n", > + c->modrm); > + goto cannot_emulate; > + } > + c->dst.val = segreg.selector; > + c->dst.bytes = 2; > + c->dst.ptr = (unsigned long *)decode_register(c->modrm_rm, c->regs, > + c->d & ByteOp); > + break; > + } > case 0x8d: /* lea r16/r32, m */ > c->dst.val = c->modrm_ea; > break; > + case 0x8e: { /* mov seg, r/m16 */ > + uint16_t sel; > + > + sel = c->src.val; > + switch ( c->modrm_reg ) { > + case 0: > + if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_ES) < 0) > + goto cannot_emulate; > + break; > + case 1: > + if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0) > + goto cannot_emulate; > + break; > + case 2: > + if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_SS) < 0) > + goto cannot_emulate; > + break; > + case 3: > + if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_DS) < 0) > + goto cannot_emulate; > + break; > + case 4: > + if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_FS) < 0) > + goto cannot_emulate; > + break; > + case 5: > + if (load_segment_descriptor(ctxt->vcpu, sel, 1, VCPU_SREG_GS) < 0) > + goto cannot_emulate; > + break; > + default: > + printk(KERN_INFO "Invalid segreg in modrm byte 0x%02x\n", > + c->modrm); > + goto cannot_emulate; > + } > + > + c->dst.type = OP_NONE; /* Disable 
writeback. */ > + break; > + } > case 0x8f: /* pop (sole member of Grp1a) */ > rc = emulate_grp1a(ctxt, ops); > if (rc != 0) > goto done; > break; > + case 0xb8: /* mov r, imm */ > + goto mov; > case 0x9c: /* pushf */ > c->src.val = (unsigned long) ctxt->eflags; > emulate_push(ctxt); > @@ -1657,6 +1735,34 @@ special_insn: > break; > } > case 0xe9: /* jmp rel */ > + jmp_rel(c, c->src.val); > + c->dst.type = OP_NONE; /* Disable writeback. */ > + break; > + case 0xea: /* jmp far */ { > + uint32_t eip; > + uint16_t sel; > + > + switch (c->op_bytes) { > + case 2: > + eip = insn_fetch(u16, 2, c->eip); > + eip = eip & 0x0000FFFF; /* clear upper 16 bits */ > + break; > + case 4: > + eip = insn_fetch(u32, 4, c->eip); > + break; > + default: > + DPRINTF("jmp far: Invalid op_bytes\n"); > + goto cannot_emulate; > + } > + sel = insn_fetch(u16, 2, c->eip); > + if (load_segment_descriptor(ctxt->vcpu, sel, 9, VCPU_SREG_CS) < 0) { > + DPRINTF("jmp far: Failed to load CS descriptor\n"); > + goto cannot_emulate; > + } > + > + c->eip = eip; > + break; > + } > case 0xeb: /* jmp rel short */ > jmp_rel(c, c->src.val); > c->dst.type = OP_NONE; /* Disable writeback. */ > diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h > index 4baa9c9..7a0846a 100644 > --- a/include/asm-x86/kvm_host.h > +++ b/include/asm-x86/kvm_host.h > @@ -495,6 +495,10 @@ int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, > int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr, > unsigned long value); > > +void set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); > +void get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); > +int load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, > + int type_bits, int seg); > int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason); > > void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0); > |
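For readers decoding the handle_exception line quoted at the top of this message: assuming the usual VMX interruption-information layout (vector in bits 7:0, event type in bits 10:8, error-code-valid in bit 11, valid in bit 31), the two values can be unpacked as in the sketch below. The struct and function names are invented for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Fields of a VMX interruption-information word (layout assumed above). */
struct sketch_intr_info {
    unsigned vector;         /* bits 7:0 */
    unsigned type;           /* bits 10:8; 3 means hardware exception */
    int      has_error_code; /* bit 11 */
    int      valid;          /* bit 31 */
};

static struct sketch_intr_info sketch_decode_intr_info(uint32_t info)
{
    struct sketch_intr_info d;

    d.vector = info & 0xff;
    d.type = (info >> 8) & 0x7;
    d.has_error_code = (info >> 11) & 0x1;
    d.valid = (info >> 31) & 0x1;
    return d;
}
```

Under that layout, intr info 0x80000b0d reads as vector 13 (#GP) with an error code, and vectoring info 0x80000306 as vector 6 (#UD) without one, both of type 3. That would be consistent with the "exception 13" reported by kvm-userspace elsewhere in the thread.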
From: Laurent V. <Lau...@bu...> - 2008-04-29 17:22:20
|
On Tuesday 29 April 2008 at 19:09 +0200, Laurent Vivier wrote:
> On Tuesday 29 April 2008 at 11:41 -0500, Anthony Liguori wrote:
> > Guillaume Thouvenin wrote:
> > > Hello,
> > >
> > > This patch should solve the problem observed during protected mode
> > > transitions that appears for example during the installation of
> > > openSuse-10.3. Unfortunately there is an issue that crashes
> > > kvm-userspace. I'm not sure if it's a problem introduced by the
> > > patch or if the patch is good and raises a new issue.
> > >
> >
> > You still aren't emulating the instructions correctly I think. Running
> > your patch, I see:
> >
> > [ 979.755349] Failed vm entry (exit reason 0x21) invalid guest state
> > [ 979.755354] emulation at (46e4b) rip 6e0b: ea 10 6e 18
> > [ 979.755358] successfully emulated instruction
> > [ 979.756105] Failed vm entry (exit reason 0x21) invalid guest state
> > [ 979.756109] emulation at (46e50) rip 6e10: 66 b8 20 00
> > [ 979.756111] successfully emulated instruction
> > [ 979.756749] Failed vm entry (exit reason 0x21) invalid guest state
> > [ 979.756752] emulation at (46e54) rip 6e14: 8e d8 8c d0
> > [ 979.756755] successfully emulated instruction
> > [ 979.757427] Failed vm entry (exit reason 0x21) invalid guest state
> > [ 979.757430] emulation at (46e56) rip 6e16: 8c d0 81 e4
> > [ 979.757433] successfully emulated instruction
> > [ 979.758074] Failed vm entry (exit reason 0x21) invalid guest state
> > [ 979.758077] emulation at (46e58) rip 6e18: 81 e4 ff ff
> >
> >
> > The corresponding gfxboot code is:
> >
> > 16301 00006E0B EA[106E]1800 jmp pm_seg.prog_c32:switch_to_pm_20
> > 16302 switch_to_pm_20:
> > 16303
> > 16304 bits 32
> > 16305
> > 16306 00006E10 66B82000 mov ax,pm_seg.prog_d16
> > 16307 00006E14 8ED8 mov ds,ax
> > 16308
> > 16309 00006E16 8CD0 mov eax,ss
> > 16310 00006E18 81E4FFFF0000 and esp,0ffffh
> >
> >
> > The VT state should be correct after executing the instruction at RIP 6E16
> > (mov eax, ss). The next instruction should not cause a vmentry
>
> Are you sure? It is Intel notation (opcode dst,src), so it updates
> eax, not ss. Guillaume gives us (with gdb notation, opcode src,dst):
>
> 0x0000000000046e53: ljmp $0x18,$0x6e18
>
> 0x0000000000046e58: mov $0x20,%ax
>
> %EAX = 0x20
>
> 0x0000000000046e5c: mov %eax,%ds
>
> %DS = 0x20
>
> 0x0000000000046e5e: mov %ss,%eax
>
> %EAX = %SS = 0x53E1 (in this particular case)
>
> For me the issue is with instructions with "dst.byte = 0".
> for instance:
>
> 0x0000000000046e66: shl $0x4,%eax
>
> [82768.003174] emulation at (46e66) rip 6e26: c1 e0 04 01
> [82768.035153] writeback: dst.byte 0
> [82768.055174] writeback: dst.ptr 0x0000000000000000
> [82768.087177] writeback: dst.val 0x53e1
> [82768.111178] writeback: src.ptr 0x0000000000006e28
> [82768.143157] writeback: src.val 0x4
>
> So my questions are:
>
> Why dst.val is not 0x53e10 ?

I can answer this one myself:

emulate_2op_SrcB("sal", c->src, c->dst, ctxt->eflags);

does nothing if dst.byte == 0

So the next question is the real question...

> Why dst.byte is 0 ?
>
> > failure. The fact that it is for you indicates that you're not updating
> > guest state correctly.
> >
> > My guess would be that load_segment_descriptor is not updating the
> > values within the VMCS.
> >
> > Regards,
> >
> > Anthony Liguori
>
> Regards
> Laurent
-- 
------------- Lau...@bu... ---------------
"The best way to predict the future is to invent it."
- Alan Kay
|
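The observation in this message, that the shift leaves dst.val at 0x53e1 when the operand width is 0, can be reproduced with a small model. This is a sketch only, assuming the emulator dispatches on the operand width with a switch, as Laurent's description of emulate_2op_SrcB suggests; struct sketch_operand and sketch_shl() are invented names, not the kvm code.

```c
#include <stdint.h>

/* Simplified stand-in for the emulator's operand descriptor. */
struct sketch_operand {
    unsigned bytes; /* operand width in bytes: 1, 2, 4 or 8 */
    uint64_t val;
};

/* Shift dst left by count (count < 64), dispatching on operand width. */
static void sketch_shl(struct sketch_operand *dst, unsigned count)
{
    switch (dst->bytes) {
    case 1:
        dst->val = (uint8_t)(dst->val << count);
        break;
    case 2:
        dst->val = (uint16_t)(dst->val << count);
        break;
    case 4:
        dst->val = (uint32_t)(dst->val << count);
        break;
    case 8:
        dst->val = dst->val << count;
        break;
    /* bytes == 0 matches no case, so dst->val is left untouched */
    }
}
```

With bytes set to 4, shifting 0x53e1 by 4 gives the expected 0x53e10; with bytes left at 0 the value stays 0x53e1, which matches the writeback trace.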
From: Laurent V. <Lau...@bu...> - 2008-04-29 17:09:39
|
On Tuesday 29 April 2008 at 11:41 -0500, Anthony Liguori wrote:
> Guillaume Thouvenin wrote:
> > Hello,
> >
> > This patch should solve the problem observed during protected mode
> > transitions that appears for example during the installation of
> > openSuse-10.3. Unfortunately there is an issue that crashes
> > kvm-userspace. I'm not sure if it's a problem introduced by the
> > patch or if the patch is good and raises a new issue.
> >
>
> You still aren't emulating the instructions correctly I think. Running
> your patch, I see:
>
> [ 979.755349] Failed vm entry (exit reason 0x21) invalid guest state
> [ 979.755354] emulation at (46e4b) rip 6e0b: ea 10 6e 18
> [ 979.755358] successfully emulated instruction
> [ 979.756105] Failed vm entry (exit reason 0x21) invalid guest state
> [ 979.756109] emulation at (46e50) rip 6e10: 66 b8 20 00
> [ 979.756111] successfully emulated instruction
> [ 979.756749] Failed vm entry (exit reason 0x21) invalid guest state
> [ 979.756752] emulation at (46e54) rip 6e14: 8e d8 8c d0
> [ 979.756755] successfully emulated instruction
> [ 979.757427] Failed vm entry (exit reason 0x21) invalid guest state
> [ 979.757430] emulation at (46e56) rip 6e16: 8c d0 81 e4
> [ 979.757433] successfully emulated instruction
> [ 979.758074] Failed vm entry (exit reason 0x21) invalid guest state
> [ 979.758077] emulation at (46e58) rip 6e18: 81 e4 ff ff
>
>
> The corresponding gfxboot code is:
>
> 16301 00006E0B EA[106E]1800 jmp pm_seg.prog_c32:switch_to_pm_20
> 16302 switch_to_pm_20:
> 16303
> 16304 bits 32
> 16305
> 16306 00006E10 66B82000 mov ax,pm_seg.prog_d16
> 16307 00006E14 8ED8 mov ds,ax
> 16308
> 16309 00006E16 8CD0 mov eax,ss
> 16310 00006E18 81E4FFFF0000 and esp,0ffffh
>
>
> The VT state should be correct after executing the instruction at RIP 6E16
> (mov eax, ss). The next instruction should not cause a vmentry

Are you sure? It is Intel notation (opcode dst,src), so it updates
eax, not ss. Guillaume gives us (with gdb notation, opcode src,dst):

0x0000000000046e53: ljmp $0x18,$0x6e18

0x0000000000046e58: mov $0x20,%ax

%EAX = 0x20

0x0000000000046e5c: mov %eax,%ds

%DS = 0x20

0x0000000000046e5e: mov %ss,%eax

%EAX = %SS = 0x53E1 (in this particular case)

For me the issue is with instructions with "dst.byte = 0".
for instance:

0x0000000000046e66: shl $0x4,%eax

[82768.003174] emulation at (46e66) rip 6e26: c1 e0 04 01
[82768.035153] writeback: dst.byte 0
[82768.055174] writeback: dst.ptr 0x0000000000000000
[82768.087177] writeback: dst.val 0x53e1
[82768.111178] writeback: src.ptr 0x0000000000006e28
[82768.143157] writeback: src.val 0x4

So my questions are:

Why dst.val is not 0x53e10 ?
Why dst.byte is 0 ?

> failure. The fact that it is for you indicates that you're not updating
> guest state correctly.
>
> My guess would be that load_segment_descriptor is not updating the
> values within the VMCS.
>
> Regards,
>
> Anthony Liguori

Regards
Laurent
-- 
------------- Lau...@bu... ---------------
"The best way to predict the future is to invent it."
- Alan Kay
|
From: David M. <dm...@ma...> - 2008-04-29 16:55:59
|
Guillaume Thouvenin wrote: > Hello, > > This patch should solve the problem observed during protected mode > transitions that appears for example during the installation of > openSuse-10.3. Unfortunately there is an issue that crashes > kvm-userspace. I'm not sure if it's a problem introduced by the > patch or if the patch is good and raises a new issue. > > Here is what I'm doing: > > 1) Remove the SS patching that modifies SS_SELECTOR in enter_pmode() > to see vmentry failure. > 2) Add the handler that catches the VMentry failure. It is called > handle_vmentry_failure() > 3) while CS.RPL != SS.RPL, emulate the instruction. > 4) Add the emulation of "ljmp", "mov r, imm", "mov sreg, r/m16" and > "mov r/m16, sreg" that have respectively opcode 0xea, 0xb8, 0x8e and > 0x8c. > > Normally, it should be sufficient to boot openSuse-10.3 because > instructions that need to be emulated are: > > 0x0000000000046e53: ljmp $0x18,$0x6e18 > 0x0000000000046e58: mov $0x20,%ax > 0x0000000000046e5c: mov %eax,%ds > 0x0000000000046e5e: mov %ss,%eax > 0x0000000000046e60: and $0xffff,%esp > 0x0000000000046e66: shl $0x4,%eax > 0x0000000000046e69: add %eax,%esp > 0x0000000000046e6b: mov $0x8,%ax > 0x0000000000046e6f: mov %eax,%ss > > At this point, cs.rpl is equal to ss.rpl. 
> > I added trace in handle_vmentry_failure() and also in writeback() to > see what functions are emulated and I observe: > <snip trace> > > So everything seems ok but after the emulation of "mov %eax,%ss" > instruction, it seems that cs.rpl == ss.rpl but the guest is still in a > VT-unfriendly state because I have the following error in kvm-userspace: > > [guill@enterprise][~/local/kvm-userspace.git/bin]$ ./qemu-system-x86_64 > -hda ~/disk_images/hd_50G.qcow2 > -cdrom /images_iso/openSUSE-10.3-GM-x86_64-mini.iso -boot d -s -m 1024 > > exception 13 (33) > rax 0000000000000673 rbx 0000000000800000 rcx 0000000000000000 > rdx 00000000000013ca rsi 0000000000055e1c rdi 0000000000055e1d > rsp 00000000fffa0080 rbp 000000000000200b r8 0000000000000000 > r9 0000000000000000 r10 0000000000000000 r11 0000000000000000 > r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 > r15 0000000000000000 rip 000000000000b071 rflags 00033092 > cs 4004 (00040040/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > ds 4004 (00040040/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > es 00ff (00000ff0/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > ss ff11 (000ff110/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > fs 3002 (00030020/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0) > tr 0000 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0) > ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0) > gdt 40920/47 idt 0/ffff cr0 10 cr2 0 cr3 0 cr4 0 cr8 0 efer 0 > code: 17 06 29 4b 01 18 eb 18 a8 25 aa 19 28 4c 01 28 4d 01 01 17 --> > 0f 17 0f 01 17 0f 17 12 01 17 2c 25 4b 19 21 00 02 17 1a 94 0a 76 67 61 > 3d 30 78 25 78 20 Aborted My memory of x86 protected mode is flaky so I apologise if this is wasted time. Are we looking at the runtime registers for the VM or the registers for the host? Isn't PE clear in CR0 (which I think is real mode and there should be no cpl or rpl). 
If this is in protected mode (or cpl/rpl are carried over as a side effect of big real mode), are you sure cs.rpl == ss.rpl? I think I read cs.rpl == 0 and ss.rpl == 1. The opcode with the exception is pop %ss I believe (assuming 32 bit code). Is the value dumped for ss the value loaded by the pop or the value from before the pop? I think cpl is zero and I thought it was ok for code at some cpl to use selectors with rpls equal to its cpl or lower (higher rpl number). That made me wonder if the loaded ss is not the value shown but the value that would have been loaded by the pop. In which case I wonder if it would be a selector for an invalid descriptor. It's a shame we don't see the stack. Beyond that I risk confusion so I'll leave it there, I hope it helps. --- David Mair. |
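The readings in this message can be checked against the register dump with a few bit operations. A minimal sketch, assuming the standard x86 conventions (RPL is the low two bits of a selector, a real-mode segment base is the selector shifted left by 4, and CR0 bit 0 is PE); the function names are invented for illustration.

```c
#include <stdint.h>

#define SKETCH_CR0_PE 0x1u /* CR0 bit 0: protection enable */

/* Requested privilege level: low two bits of a segment selector. */
static unsigned sketch_selector_rpl(uint16_t sel)
{
    return sel & 0x3;
}

/* Segment base as computed in real mode: selector * 16. */
static uint32_t sketch_real_mode_base(uint16_t sel)
{
    return (uint32_t)sel << 4;
}

/* Nonzero when CR0 indicates protected mode. */
static int sketch_protected_mode(uint32_t cr0)
{
    return (cr0 & SKETCH_CR0_PE) != 0;
}
```

Applied to the dump: cs 4004 has RPL 0 and ss ff11 has RPL 1, as David reads them; the bases 00040040 and 000ff110 equal the selectors shifted left by 4; and cr0 10 has PE clear.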
From: Anthony L. <an...@co...> - 2008-04-29 16:41:49
|
Guillaume Thouvenin wrote:
> Hello,
>
> This patch should solve the problem observed during protected mode
> transitions that appears for example during the installation of
> openSuse-10.3. Unfortunately there is an issue that crashes
> kvm-userspace. I'm not sure if it's a problem introduced by the
> patch or if the patch is good and raises a new issue.
>

You still aren't emulating the instructions correctly I think. Running
your patch, I see:

[ 979.755349] Failed vm entry (exit reason 0x21) invalid guest state
[ 979.755354] emulation at (46e4b) rip 6e0b: ea 10 6e 18
[ 979.755358] successfully emulated instruction
[ 979.756105] Failed vm entry (exit reason 0x21) invalid guest state
[ 979.756109] emulation at (46e50) rip 6e10: 66 b8 20 00
[ 979.756111] successfully emulated instruction
[ 979.756749] Failed vm entry (exit reason 0x21) invalid guest state
[ 979.756752] emulation at (46e54) rip 6e14: 8e d8 8c d0
[ 979.756755] successfully emulated instruction
[ 979.757427] Failed vm entry (exit reason 0x21) invalid guest state
[ 979.757430] emulation at (46e56) rip 6e16: 8c d0 81 e4
[ 979.757433] successfully emulated instruction
[ 979.758074] Failed vm entry (exit reason 0x21) invalid guest state
[ 979.758077] emulation at (46e58) rip 6e18: 81 e4 ff ff

The corresponding gfxboot code is:

16301 00006E0B EA[106E]1800 jmp pm_seg.prog_c32:switch_to_pm_20
16302 switch_to_pm_20:
16303
16304 bits 32
16305
16306 00006E10 66B82000 mov ax,pm_seg.prog_d16
16307 00006E14 8ED8 mov ds,ax
16308
16309 00006E16 8CD0 mov eax,ss
16310 00006E18 81E4FFFF0000 and esp,0ffffh

The VT state should be correct after executing the instruction at RIP 6E16
(mov eax, ss). The next instruction should not cause a vmentry failure.
The fact that it is for you indicates that you're not updating guest state
correctly.

My guess would be that load_segment_descriptor is not updating the values
within the VMCS.

Regards,

Anthony Liguori
|
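To see which segment register each of the two mov instructions in the listing touches, the ModRM byte that follows the opcode can be split into its three fields. A minimal sketch, assuming the standard encoding (mod in bits 7:6, reg in bits 5:3, rm in bits 2:0) and the usual segment-register numbering; the struct and function names are invented for illustration.

```c
#include <stdint.h>

/* The three fields of an x86 ModRM byte. */
struct sketch_modrm {
    unsigned mod; /* bits 7:6; 3 means register-direct operand */
    unsigned reg; /* bits 5:3; selects the segment register for 8C/8E */
    unsigned rm;  /* bits 2:0; selects the general-purpose register */
};

static struct sketch_modrm sketch_decode_modrm(uint8_t b)
{
    struct sketch_modrm m;

    m.mod = (b >> 6) & 0x3;
    m.reg = (b >> 3) & 0x7;
    m.rm = b & 0x7;
    return m;
}

/* Segment register selected by the reg field, or "invalid" for 6 and 7. */
static const char *sketch_sreg_name(unsigned reg)
{
    static const char *const names[] = { "es", "cs", "ss", "ds", "fs", "gs" };

    return reg < 6 ? names[reg] : "invalid";
}
```

For 8E D8 the reg field is 3, so the instruction writes ds; for 8C D0 the reg field is 2, so ss is the source and is only read.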
From: Jerone Y. <jy...@us...> - 2008-04-29 16:39:41
|
On Tue, 2008-04-29 at 10:06 -0500, Hollis Blanchard wrote:
> On Monday 28 April 2008 16:23:04 Jerone Young wrote:
> > +/* This function is to manipulate a cell with multiple values */
> > +void dt_cell_multi(void *fdt, char *node_path, char *property,
> > + uint32_t *val_array, int size)
> > +{
> > +
> > + int offset;
> > + int ret;
>
> Could you please be more careful with your whitespace?

Hmmm... I'm looking at the patch on my local machine and it doesn't have
any whitespace damage. If there is whitespace damage it was caused by
something else (like hg email is doing something). I've attached the
original patch to this email.

>
|
From: Jan K. <jan...@si...> - 2008-04-29 16:09:49
|
Joerg Roedel wrote:
> On Tue, Apr 29, 2008 at 03:07:25PM +0200, Jan Kiszka wrote:
>> Hi,
>>
>> looks like we are getting better and better here in hitting yet
>> unsupported corner-case features of KVM :). This time our guest fiddles
>> with hardware debugging registers, but quickly gets unhappy as they do
>> not yet have the expected effect.
>
> KVM is mostly tested with guests that run with paging. So a 16 bit
> protected mode guest is not tested very well :)

Yes, we know (we also had a bit of fun with stock QEMU in corner cases).
But that may change now... :)

>
>> Joerg, I found you SVM-related patch series in the archive which does
>> not seem to have raised much responses. Is this general direction OK?
>> Does it allow self-debugging of guests? But how are conflicts resolved
>> if both guest and host need the physical registers (host debugging the
>> guest which is debugging itself)?
>
> I sent a patchset in the past to enable guest debugging for SVM which
> means debugging the guest from outside using gdb. But I was not able to
> test these patches because the userspace side of guest debugging is
> broken in the kvm-qemu.
> Debugging in the guest should work without problems. The debug registers
> are switched between guest and host if the guest uses them. So there
> should be no problems when the guest and the host using the debug
> registers.

I'm currently digging my way through the current VMX code, but I cannot
confirm this. Not sure what SVM does, but as far as I understood the VMX
side, only DR7 is saved/restored in hardware. The rest is KVM's job.
Unfortunately the access to the real debug registers only happens "if
(vcpu->guest_debug.enabled)". And as all DR accesses of the guest are
trapped, but the desired transfers to/from guest registers are nops,
this cannot work yet, at least on VMX.

This still leaves me with the question of how to handle the case when the
host sets and arms some debug registers to debug the guest and the
latter does the same to debug itself. Guest access will be trapped, OK,
but KVM will then have to decide which value should actually be
transferred into the registers. Hmm, does SVM virtualize all debug
registers, leaving the real ones to the host?

>
>> I would try to dig into the VMX side if the general architecture is
>> -mostly- clear. [ Sorry, Joerg, someone put the latter type of HW on my
>> desk :->. Hope I can once check our stuff against SVM as well! ]
>
> With some debug output from SVM I can better help to debug your
> problems ;-)

I'm sure :). But I guess this topic has a few common aspects to be
solved, too. So we may ideally end up with a single series of
debug-enabling patches for KVM (maybe even fixing userland - we are not
totally unfamiliar with the gdbstub here).

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
|
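One way to picture the conflict raised in this message is a shadow copy per vcpu: trapped guest accesses go to the shadow instead of being dropped, and a policy decides what reaches the hardware on entry. The sketch below is purely hypothetical and is not what KVM does; it arbitrarily lets the host debugger win. Every name in it (struct sketch_vcpu, sketch_set_dr, sketch_get_dr, sketch_load_dr) is invented, and hw_dr is a plain array standing in for the physical registers.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_DR 8

struct sketch_vcpu {
    uint64_t guest_dr[SKETCH_NR_DR]; /* what the guest believes it wrote */
    uint64_t host_dr[SKETCH_NR_DR];  /* values an external debugger wants */
    uint64_t hw_dr[SKETCH_NR_DR];    /* stand-in for the physical registers */
    int host_debugging;              /* nonzero while the host debugs the guest */
};

/* Trapped "mov to DRn": record the value instead of dropping it. */
static int sketch_set_dr(struct sketch_vcpu *v, unsigned n, uint64_t val)
{
    if (n >= SKETCH_NR_DR)
        return -1;
    v->guest_dr[n] = val;
    return 0;
}

/* Trapped "mov from DRn": always answer from the guest's shadow copy. */
static uint64_t sketch_get_dr(const struct sketch_vcpu *v, unsigned n)
{
    return n < SKETCH_NR_DR ? v->guest_dr[n] : 0;
}

/* Before guest entry: host values win while the host debugger is active. */
static void sketch_load_dr(struct sketch_vcpu *v)
{
    const uint64_t *src = v->host_debugging ? v->host_dr : v->guest_dr;

    memcpy(v->hw_dr, src, sizeof(v->hw_dr));
}
```

Under this policy the guest always reads back what it wrote, but its breakpoints stay inactive for as long as the host is debugging it. Other policies, such as merging both sets into the available registers, are equally conceivable.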
From: Andrea A. <an...@qu...> - 2008-04-29 16:04:00
|
On Tue, Apr 29, 2008 at 10:50:30AM -0500, Robin Holt wrote: > You have said this continually about a CONFIG option. I am unsure how > that could be achieved. Could you provide a patch? I'm busy with the reserved ram patch against 2.6.25 and latest kvm.git that is moving from pages to pfn for pci passthrough (that change will also remove the page pin with mmu notifiers). Unfortunately reserved-ram bugs out again in the blk-settings.c on real hardware. The fix I pushed in .25 for it, works when booting kvm (that's how I tested it) but on real hardware sata b_pfn happens to be 1 page less than the result of the min comparison and I'll have to figure out what happens (only .24 code works on real hardware..., at least my fix is surely better than the previous .25-pre code). I've other people waiting on that reserved-ram to be working, so once I've finished, I'll do the optimization to anon-vma (at least the removal of the unnecessary atomic_inc from fork) and add the config option. Christoph if you've interest in evolving anon-vma-sem and i_mmap_sem yourself in this direction, you're very welcome to go ahead while I finish sorting out reserved-ram. If you do, please let me know so we don't duplicate effort, and it'd be absolutely great if the patches could be incremental with #v14 so I can merge them trivially later and upload a new patchset once you're finished (the only outstanding fix you have to apply on top of #v14 that is already integrated in my patchset, is the i_mmap_sem deadlock fix I posted and that I'm sure you've already applied on top of #v14 before doing any more development on it). Thanks! |
From: Amit S. <ami...@qu...> - 2008-04-29 16:02:46
|
On Tuesday 29 April 2008 20:14:16 Glauber Costa wrote:
> Amit Shah wrote:
> > +static struct kvm_pv_dma_map*
> > +find_pci_pv_dmap(struct list_head *head, dma_addr_t dma)
> > +{
>
> might be better to prefix those functions with kvm? Even though they are
> static, it seems to be the current practice.

The function names are long enough already. Prefixing everything with kvm_ could hurt the eye as well.

> > +        host_page = gfn_to_page(vcpu->kvm, page_gfn);
>
> you need mmap_sem held for read to use gfn_to_page.

Yes; it's going to trickle down soon.

> > +        /* FIXME: guest should send the direction */
> > +        r = dma_ops->map_sg(NULL, sg, npages, PCI_DMA_BIDIRECTIONAL);
> > +        if (r) {
> > +                r = npages;
> > +                *hcall_page = sg[0].dma_address | (*hcall_page & ~PAGE_MASK);
> > +        }
> > +
> > + out_unmap:
> > +        if (!r)
> > +                *hcall_page = bad_dma_address;
> > +        kunmap(host_page);
> > + out:
> > +        ++vcpu->stat.hypercall_map;
> > +        return r;
> > + out_unmap_sg_dmap:
> > +        kfree(dmap);
> > + out_unmap_sg:
> > +        kfree(sg);
> > +        goto out_unmap;
>
> those backwards goto are very clumsy. Might be better to give it a
> further attention in order to avoid it.

It does keep everything nicely in one place though. You're right though. Some more attention is needed.

> > +static int free_dmap(struct kvm_pv_dma_map *dmap, struct list_head *head)
> > +{
> > +        int i;
> > +
> > +        if (!dmap)
> > +                return 1;
>
> that's ugly.
>
> it's better to keep the free function with free-like semantics: just a
> void function that plainly returns if !dmap, and check in the caller.

I was lazy and used the return value from here to propagate it down further. But this kicked me to modify that.

> > +        if (is_error_page(host_page)) {
> > +                printk(KERN_INFO "%s: gfn %p not valid\n",
> > +                       __func__, (void *)page_gfn);
> > +                r = -1;
>
> r = -1 is not really informative. Better use some meaningful error.

The error's going to the guest. The guest, as we know, has already done a successful DMA allocation. Something went wrong in the hypercall, and we don't know why (bad page). Any kind of error here isn't going to be intelligible to the guest anyway. It's mostly a host thing if we ever hit this.

> > +        if (find_pci_pt_dev(&vcpu->kvm->arch.pci_pt_dev_head,
> > +                            &pci_pt_info, 0, KVM_PT_SOURCE_ASSIGN))
> > +                r++; /* We have assigned the device */
> > +
> > +        kunmap(host_page);
>
> better use atomic mappings here.

We can't use atomic mappings for guest pages. They can be swapped out. |
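The "free-like semantics" point raised in the review can be illustrated with a small userspace sketch. This is not the patch's code: `struct dma_map`, `dma_map_alloc()`, `dma_map_free()` and `release_mapping()` are hypothetical names, and `malloc`-family calls stand in for the kernel allocators. The idea shown is only that the free helper returns nothing and tolerates NULL, while the caller owns the "was there a mapping?" decision and the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kvm_pv_dma_map. */
struct dma_map {
    int npages;
    int *pages;
};

/* Hypothetical allocator, present only so the helpers can be exercised. */
static struct dma_map *dma_map_alloc(int npages)
{
    struct dma_map *dmap;

    assert(npages > 0);
    dmap = calloc(1, sizeof(*dmap));
    if (!dmap)
        return NULL;
    dmap->pages = calloc((size_t)npages, sizeof(*dmap->pages));
    if (!dmap->pages) {
        free(dmap);
        return NULL;
    }
    dmap->npages = npages;
    return dmap;
}

/* Free-like semantics: void return, and NULL is silently accepted,
 * the same contract as free() itself. */
static void dma_map_free(struct dma_map *dmap)
{
    if (!dmap)
        return;
    free(dmap->pages);
    free(dmap);
}

/* The caller checks for a missing mapping and chooses the error code;
 * the free helper is not used to carry status. */
static int release_mapping(struct dma_map *dmap)
{
    if (!dmap)
        return -ENOENT;
    dma_map_free(dmap);
    return 0;
}
```

With this split, a lookup that finds nothing can report `-ENOENT` (or whatever the caller prefers) without the free routine needing a return value at all.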
From: Robin H. <ho...@sg...> - 2008-04-29 15:50:51
|
> I however doubt this will bring us back to the same performance of the
> current spinlock version, as the real overhead should come out of
> overscheduling in down_write in anon_vma_link. Here an initially
> spinning lock would help but that's gray area, it greatly depends on
> timings, and on very large systems where a cacheline wait with many
> cpus forking at the same time takes more than scheduling a semaphore
> may not slow down performance that much. So I think the only way is a
> configuration option to switch the locking at compile time, then XPMEM
> will depend on that option to be on, I don't see a big deal and this
> guarantees embedded isn't screwed up by totally unnecessary locks on UP.

You have said this continually about a CONFIG option. I am unsure how that could be achieved. Could you provide a patch?

Thanks,
Robin |
From: Andrea A. <an...@qu...> - 2008-04-29 15:31:54
|
On Mon, Apr 28, 2008 at 06:28:06PM -0700, Christoph Lameter wrote:
> On Tue, 29 Apr 2008, Andrea Arcangeli wrote:
>
> > Frankly I've absolutely no idea why rcu is needed in all rmap code
> > when walking the page->mapping. Definitely the PG_locked is taken so
> > there's no way page->mapping could possibly go away under the rmap
> > code, hence the anon_vma can't go away as it's queued in the vma, and
> > the vma has to go away before the page is zapped out of the pte.
>
> zap_pte_range can race with the rmap code and it does not take the page
> lock. The page may not go away since a refcount was taken but the mapping
> can go away. Without RCU you have no guarantee that the anon_vma is
> existing when you take the lock.

There's some room for improvement, like using down_read_trylock: if that succeeds we don't need to increase the refcount and we can keep the rcu_read_lock held instead.

Secondly we don't need to increase the refcount in fork() when we queue the vma-copy in the anon_vma. You should init the refcount to 1 when the anon_vma is allocated, remove the atomic_inc from all code (except when down_read_trylock fails) and then change anon_vma_unlink to:

        up_write(&anon_vma->sem);
        if (empty)
                put_anon_vma(anon_vma);

While the down_read_trylock surely won't help in AIM, the second change will reduce a bit the overhead in the VM core fast paths by avoiding all refcounting changes by checking the list_empty the same way the current code does. I really like how I designed the garbage collection through list_empty and that's efficient and I'd like to keep it.

I however doubt this will bring us back to the same performance of the current spinlock version, as the real overhead should come out of overscheduling in down_write in anon_vma_link. Here an initially spinning lock would help but that's gray area, it greatly depends on timings, and on very large systems where a cacheline wait with many cpus forking at the same time takes more than scheduling a semaphore may not slow down performance that much. So I think the only way is a configuration option to switch the locking at compile time, then XPMEM will depend on that option to be on, I don't see a big deal and this guarantees embedded isn't screwed up by totally unnecessary locks on UP. |
From: Anthony L. <ali...@us...> - 2008-04-29 15:20:28
|
This patch eliminates the use of sigtimedwait() in the IO thread. To avoid the signal/select race condition, we use a pipe that we write to in the signal handlers. This was suggested by Rusty and seems to work well.

There are a lot of cleanup/simplification opportunities with this but I've limited it just to the signal masking/eating routines. We've got at least one live lock left in the code that I haven't yet identified. My goal is to get this all a lot simpler though so that it's easier to fix the remaining lock-ups.

I'm looking for some feedback that this is a sane direction. I haven't tested this enough yet so please don't apply it.

Signed-off-by: Anthony Liguori <ali...@us...>

diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 9a9bf59..46d7425 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -7,6 +7,9 @@
  */
 #include "config.h"
 #include "config-host.h"
+#include "qemu-common.h"
+#include "block.h"
+#include "console.h"
 
 int kvm_allowed = 1;
 int kvm_irqchip = 1;
@@ -38,14 +41,6 @@ __thread struct vcpu_info *vcpu;
 
 static int qemu_system_ready;
 
-struct qemu_kvm_signal_table {
-    sigset_t sigset;
-    sigset_t negsigset;
-};
-
-static struct qemu_kvm_signal_table io_signal_table;
-static struct qemu_kvm_signal_table vcpu_signal_table;
-
 #define SIG_IPI (SIGRTMIN+4)
 
 struct vcpu_info {
@@ -169,53 +164,37 @@ static int has_work(CPUState *env)
     return kvm_arch_has_work(env);
 }
 
-static int kvm_process_signal(int si_signo)
-{
-    struct sigaction sa;
-
-    switch (si_signo) {
-    case SIGUSR2:
-        pthread_cond_signal(&qemu_aio_cond);
-        break;
-    case SIGALRM:
-    case SIGIO:
-        sigaction(si_signo, NULL, &sa);
-        sa.sa_handler(si_signo);
-        break;
-    }
-
-    return 1;
-}
-
-static int kvm_eat_signal(struct qemu_kvm_signal_table *waitset, CPUState *env,
-                          int timeout)
+static int kvm_eat_signal(CPUState *env, int timeout)
 {
     struct timespec ts;
     int r, e, ret = 0;
     siginfo_t siginfo;
+    sigset_t waitset;
 
+    sigemptyset(&waitset);
+    sigaddset(&waitset, SIG_IPI);
     ts.tv_sec = timeout / 1000;
     ts.tv_nsec = (timeout % 1000) * 1000000;
-    r = sigtimedwait(&waitset->sigset, &siginfo, &ts);
+    qemu_kvm_unlock();
+    r = sigtimedwait(&waitset, &siginfo, &ts);
+    qemu_kvm_lock(env);
+    cpu_single_env = env;
     if (r == -1 && (errno == EAGAIN || errno == EINTR) && !timeout)
         return 0;
     e = errno;
-    pthread_mutex_lock(&qemu_mutex);
     if (env && vcpu)
         cpu_single_env = vcpu->env;
     if (r == -1 && !(errno == EAGAIN || errno == EINTR)) {
         printf("sigtimedwait: %s\n", strerror(e));
         exit(1);
     }
-    if (r != -1)
-        ret = kvm_process_signal(siginfo.si_signo);
+    ret = 1;
 
     if (env && vcpu_info[env->cpu_index].stop) {
         vcpu_info[env->cpu_index].stop = 0;
         vcpu_info[env->cpu_index].stopped = 1;
         pthread_kill(io_thread, SIGUSR1);
     }
-    pthread_mutex_unlock(&qemu_mutex);
 
     return ret;
 }
@@ -224,24 +203,20 @@ static int kvm_eat_signal(struct qemu_kvm_signal_table *waitset, CPUState *env,
 static void kvm_eat_signals(CPUState *env, int timeout)
 {
     int r = 0;
-    struct qemu_kvm_signal_table *waitset = &vcpu_signal_table;
 
-    while (kvm_eat_signal(waitset, env, 0))
+    while (kvm_eat_signal(env, 0))
         r = 1;
     if (!r && timeout) {
-        r = kvm_eat_signal(waitset, env, timeout);
+        r = kvm_eat_signal(env, timeout);
         if (r)
-            while (kvm_eat_signal(waitset, env, 0))
+            while (kvm_eat_signal(env, 0))
                 ;
     }
 }
 
 static void kvm_main_loop_wait(CPUState *env, int timeout)
 {
-    pthread_mutex_unlock(&qemu_mutex);
     kvm_eat_signals(env, timeout);
-    pthread_mutex_lock(&qemu_mutex);
-    cpu_single_env = env;
     vcpu_info[env->cpu_index].signalled = 0;
 }
 
@@ -263,12 +238,8 @@ static void pause_all_threads(void)
         vcpu_info[i].stop = 1;
         pthread_kill(vcpu_info[i].thread, SIG_IPI);
     }
-    while (!all_threads_paused()) {
-        pthread_mutex_unlock(&qemu_mutex);
-        kvm_eat_signal(&io_signal_table, NULL, 1000);
-        pthread_mutex_lock(&qemu_mutex);
-        cpu_single_env = NULL;
-    }
+    while (!all_threads_paused())
+        main_loop_wait(10);
 }
 
 static void resume_all_threads(void)
@@ -391,18 +362,6 @@ static void *ap_main_loop(void *_env)
     return NULL;
 }
 
-static void qemu_kvm_init_signal_table(struct qemu_kvm_signal_table *sigtab)
-{
-    sigemptyset(&sigtab->sigset);
-    sigfillset(&sigtab->negsigset);
-}
-
-static void kvm_add_signal(struct qemu_kvm_signal_table *sigtab, int signum)
-{
-    sigaddset(&sigtab->sigset, signum);
-    sigdelset(&sigtab->negsigset, signum);
-}
-
 void kvm_init_new_ap(int cpu, CPUState *env)
 {
     pthread_create(&vcpu_info[cpu].thread, NULL, ap_main_loop, env);
@@ -411,28 +370,12 @@ void kvm_init_new_ap(int cpu, CPUState *env)
         pthread_cond_wait(&qemu_vcpu_cond, &qemu_mutex);
 }
 
-static void qemu_kvm_init_signal_tables(void)
-{
-    qemu_kvm_init_signal_table(&io_signal_table);
-    qemu_kvm_init_signal_table(&vcpu_signal_table);
-
-    kvm_add_signal(&io_signal_table, SIGIO);
-    kvm_add_signal(&io_signal_table, SIGALRM);
-    kvm_add_signal(&io_signal_table, SIGUSR1);
-    kvm_add_signal(&io_signal_table, SIGUSR2);
-
-    kvm_add_signal(&vcpu_signal_table, SIG_IPI);
-
-    sigprocmask(SIG_BLOCK, &io_signal_table.sigset, NULL);
-}
-
 int kvm_init_ap(void)
 {
 #ifdef TARGET_I386
     kvm_tpr_opt_setup();
 #endif
     qemu_add_vm_change_state_handler(kvm_vm_state_change_handler, NULL);
-    qemu_kvm_init_signal_tables();
 
     signal(SIG_IPI, sig_ipi_handler);
     return 0;
@@ -450,8 +393,67 @@ void qemu_kvm_notify_work(void)
  * while processing in main_loop_wait().
  */
 
+static int kvm_sigfd[2] = {-1, -1};
+static void (*sigalrm_handler)(int signo);
+static void (*sigio_handler)(int signo);
+
+static void sig_aio_handler(int signum)
+{
+    if (kvm_sigfd[1] != -1) {
+        ssize_t len;
+
+        len = write(kvm_sigfd[1], &signum, sizeof(signum));
+    }
+}
+
+static void sig_aio_fd_read(void *opaque)
+{
+    int signum;
+    ssize_t len;
+
+    do {
+        len = read(kvm_sigfd[0], &signum, sizeof(signum));
+    } while (len == -1 && errno == EINTR);
+
+    if (len != 4)
+        abort();
+
+    switch (signum) {
+    case SIGUSR2:
+        pthread_cond_signal(&qemu_aio_cond);
+        break;
+    case SIGALRM:
+        sigalrm_handler(signum);
+        break;
+    case SIGIO:
+        sigio_handler(signum);
+        break;
+    }
+}
+
 int kvm_main_loop(void)
 {
+    struct sigaction sa;
+
+    sigaction(SIGALRM, NULL, &sa);
+    sigalrm_handler = sa.sa_handler;
+
+    sigaction(SIGIO, NULL, &sa);
+    sigio_handler = sa.sa_handler;
+
+    signal(SIGUSR1, sig_aio_handler);
+    signal(SIGUSR2, sig_aio_handler);
+    signal(SIGALRM, sig_aio_handler);
+    signal(SIGIO, sig_aio_handler);
+
+    if (pipe(kvm_sigfd) == -1)
+        abort();
+
+    fcntl(kvm_sigfd[0], F_SETFL, O_NONBLOCK);
+    fcntl(kvm_sigfd[1], F_SETFL, O_NONBLOCK);
+
+    qemu_set_fd_handler2(kvm_sigfd[0], NULL, sig_aio_fd_read, NULL, NULL);
+
     io_thread = pthread_self();
     qemu_system_ready = 1;
     pthread_mutex_unlock(&qemu_mutex);
@@ -459,10 +461,8 @@ int kvm_main_loop(void)
     pthread_cond_broadcast(&qemu_system_cond);
 
     while (1) {
-        kvm_eat_signal(&io_signal_table, NULL, 1000);
         pthread_mutex_lock(&qemu_mutex);
-        cpu_single_env = NULL;
-        main_loop_wait(0);
+        main_loop_wait(10);
         if (qemu_shutdown_requested())
             break;
         else if (qemu_powerdown_requested())
@@ -834,10 +834,7 @@ void qemu_kvm_aio_wait(void)
     CPUState *cpu_single = cpu_single_env;
 
     if (!cpu_single_env) {
-        pthread_mutex_unlock(&qemu_mutex);
-        kvm_eat_signal(&io_signal_table, NULL, 1000);
-        pthread_mutex_lock(&qemu_mutex);
-        cpu_single_env = NULL;
+        main_loop_wait(10);
     } else {
         pthread_cond_wait(&qemu_aio_cond, &qemu_mutex);
         cpu_single_env = cpu_single;
@@ -864,3 +861,17 @@ void kvm_cpu_destroy_phys_mem(target_phys_addr_t start_addr,
 {
     kvm_destroy_phys_mem(kvm_context, start_addr, size);
 }
+
+void qemu_kvm_lock(CPUState *env)
+{
+    pthread_mutex_lock(&qemu_mutex);
+    cpu_single_env = env;
+}
+
+CPUState *qemu_kvm_unlock(void)
+{
+    CPUState *env = cpu_single_env;
+    pthread_mutex_unlock(&qemu_mutex);
+    return env;
+}
+
diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
index 024a653..2c2b0be 100644
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -74,6 +74,9 @@ int qemu_kvm_get_dirty_pages(unsigned long phys_addr, void *buf);
 
 void qemu_kvm_system_reset_request(void);
 
+void qemu_kvm_lock(CPUState *env);
+CPUState *qemu_kvm_unlock(void);
+
 #ifdef TARGET_PPC
 int handle_powerpc_dcr_read(int vcpu, uint32_t dcrn, uint32_t *data);
 int handle_powerpc_dcr_write(int vcpu,uint32_t dcrn, uint32_t data);
@@ -97,4 +100,16 @@ extern kvm_context_t kvm_context;
 #define qemu_kvm_pit_in_kernel() (0)
 #endif
 
+static inline void kvm_pre_select(void)
+{
+    if (kvm_enabled())
+        qemu_kvm_unlock();
+}
+
+static inline void kvm_post_select(void)
+{
+    if (kvm_enabled())
+        qemu_kvm_lock(NULL);
+}
+
 #endif
diff --git a/qemu/vl.c b/qemu/vl.c
index 74be059..cf7677d 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -7919,7 +7919,9 @@ void main_loop_wait(int timeout)
     }
 #endif
  moreio:
+    kvm_pre_select();
     ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
+    kvm_post_select();
     if (ret > 0) {
         IOHandlerRecord **pioh;
         int more = 0; |
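The pipe technique the patch description relies on can be shown in isolation. The following is a minimal standalone sketch, not code from the patch: `sig_pipe`, `sig_to_pipe()`, `sig_pipe_init()` and `sig_pipe_wait()` are hypothetical names, and a single signal is handled for brevity. The handler only writes the signal number into a non-blocking pipe; the waiting side includes the read end in its `select()` set, so a signal that arrives just before `select()` blocks is not lost, because the byte is already sitting in the pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical names throughout; the patch calls its pipe kvm_sigfd. */
static int sig_pipe[2] = { -1, -1 };

/* Signal handler: only forwards the signal number into the pipe.
 * write() is async-signal-safe; errno is saved and restored. */
static void sig_to_pipe(int signum)
{
    int saved_errno = errno;
    ssize_t len;

    if (sig_pipe[1] != -1) {
        len = write(sig_pipe[1], &signum, sizeof(signum));
        (void)len; /* if the pipe is full the notification is dropped */
    }
    errno = saved_errno;
}

/* Create the non-blocking pipe and route signum through it. */
static int sig_pipe_init(int signum)
{
    struct sigaction sa;

    if (pipe(sig_pipe) == -1)
        return -1;
    if (fcntl(sig_pipe[0], F_SETFL, O_NONBLOCK) == -1 ||
        fcntl(sig_pipe[1], F_SETFL, O_NONBLOCK) == -1)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sig_to_pipe;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Wait up to timeout_ms for a forwarded signal.
 * Returns the signal number, 0 on timeout, -1 on error. */
static int sig_pipe_wait(int timeout_ms)
{
    struct timeval tv;
    fd_set rfds;
    ssize_t len;
    int signum, r;

    assert(sig_pipe[0] != -1); /* sig_pipe_init() must have succeeded */

    do {
        FD_ZERO(&rfds);
        FD_SET(sig_pipe[0], &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        r = select(sig_pipe[0] + 1, &rfds, NULL, NULL, &tv);
    } while (r == -1 && errno == EINTR); /* the handler may interrupt select */
    if (r <= 0)
        return r;

    do {
        len = read(sig_pipe[0], &signum, sizeof(signum));
    } while (len == -1 && errno == EINTR);

    return len == (ssize_t)sizeof(signum) ? signum : -1;
}
```

In a real main loop the read end would simply be one more descriptor in the existing `select()` set, which is the role `qemu_set_fd_handler2()` plays in the patch. The sketch restarts the full timeout after `EINTR`, a simplification a production loop may not want.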
From: Anthony L. <ali...@us...> - 2008-04-29 15:15:39
|
Andrea Arcangeli wrote:
> On Tue, Apr 29, 2008 at 09:32:09AM -0500, Anthony Liguori wrote:
>
>> +                vma = find_vma(current->mm, addr);
>> +                if (vma == NULL) {
>> +                        get_page(bad_page);
>> +                        return page_to_pfn(bad_page);
>> +                }
>>
>
> Here you must check vm_start address, find_vma only checks addr <
> vm_end but there's no guarantee addr >= vm_start yet.
>

Indeed.

>> +
>> +                BUG_ON(!(vma->vm_flags & VM_IO));
>>
>
> For consistency we should return bad_page and not bug on, VM_IO and
> VM_PFNMAP can theoretically not be set at the same time, otherwise
> get_user_pages would be buggy checking against VM_PFNMAP|VM_IO. I
> doubt anybody isn't setting VM_IO before calling remap_pfn_range but
> anyway...
>
> Secondly the really correct check is against VM_PFNMAP. This is
> because PFNMAP is set at the same time of vm_pgoff = pfn. VM_IO is not
> even if in theory if a driver uses ->fault instead of remap_pfn_range,
> shouldn't set VM_IO and it should only set VM_RESERVED. VM_IO is about
> keeping gdb/coredump out as they could mess with the hardware if they
> read, PFNMAP is about remap_pfn_range having been called and pgoff
> pointing to the first pfn mapped at vm_start address.
>

Good point. I've updated the patch. Will send out again once I've gotten to test it.

Regards,

Anthony Liguori

> Patch is in the right direction, way to go!
> |
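The two checks discussed in this exchange can be modelled in userspace. The sketch below is an illustration under stated assumptions, not kernel code: `struct vma`, `find_vma_model()`, `addr_to_pfn_model()`, `VM_PFNMAP_MODEL`, `PAGE_SHIFT_MODEL` and `BAD_PFN` are all hypothetical stand-ins, and the areas are assumed to be sorted by address and non-overlapping. It mirrors the described behaviour that the lookup only guarantees `addr < vm_end`, so the caller still has to test `addr >= vm_start`, and it returns a bad-pfn marker instead of asserting when the area is not a PFNMAP one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a memory area. */
struct vma {
    unsigned long vm_start; /* inclusive */
    unsigned long vm_end;   /* exclusive */
    unsigned long vm_pgoff; /* for a PFNMAP area: pfn mapped at vm_start */
    unsigned long vm_flags;
};

#define VM_PFNMAP_MODEL  0x1UL   /* stand-in for the kernel's VM_PFNMAP */
#define PAGE_SHIFT_MODEL 12
#define BAD_PFN          (~0UL)  /* stand-in for returning bad_page */

/* Same contract as described for find_vma(): the first area whose
 * vm_end lies above addr. The area may start above addr as well. */
static const struct vma *find_vma_model(const struct vma *areas, size_t n,
                                        unsigned long addr)
{
    size_t i;

    assert(areas != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (addr < areas[i].vm_end)
            return &areas[i];
    return NULL;
}

static unsigned long addr_to_pfn_model(const struct vma *areas, size_t n,
                                       unsigned long addr)
{
    const struct vma *vma = find_vma_model(areas, n, addr);

    /* The lookup did not prove addr is inside the area: check vm_start. */
    if (vma == NULL || addr < vma->vm_start)
        return BAD_PFN;

    /* Fail softly rather than BUG when the area is not a pfn mapping. */
    if (!(vma->vm_flags & VM_PFNMAP_MODEL))
        return BAD_PFN;

    return vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT_MODEL);
}
```

An address that falls in a hole between two areas is the interesting case: the lookup returns the higher area, and only the `vm_start` comparison rejects it.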
From: Ryan H. <ry...@us...> - 2008-04-29 15:10:53
|
* Anthony Liguori <ali...@us...> [2008-04-28 17:30]:
> We hold qemu_mutex while machine->init() executes, which issues a VCPU create.
> We need to make sure to not return from the VCPU creation until the VCPU
> file descriptor is valid to ensure that APIC creation succeeds.
>
> However, we also need to make sure that the VCPU thread doesn't start running
> until the machine->init() is complete. This is addressed today because the
> VCPU thread tries to grab the qemu_mutex before doing anything interesting.
> If we release qemu_mutex to wait for VCPU creation, then we open a window for
> a race to occur.
>
> This patch introduces two wait conditions. The first lets the VCPU create
> code that runs in the IO thread to wait for a VCPU to initialize. The second
> condition lets the VCPU thread wait for the machine to fully initialize before
> running.
>
> An added benefit of this patch is it makes the dependencies now explicit.
>
> Signed-off-by: Anthony Liguori <ali...@us...>

This patch passed the same tests I used on the other: 64 1-VCPU guest launches, 1 second apart, and a 16-way SMP guest boot.

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ry...@us... |
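The two wait conditions described in the quoted patch can be sketched with plain pthreads. This is a reduced model under assumptions, not the patch itself: `big_lock`, `vcpu_created`, `system_ready`, `vcpu_thread()` and `machine_init_and_run()` are hypothetical names, one thread stands in for the IO thread and one for a VCPU, and integer flags stand in for "the VCPU file descriptor is valid" and "machine init is complete". The condition variable names follow the `qemu_vcpu_cond` and `qemu_system_cond` identifiers visible elsewhere in the thread.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t vcpu_cond = PTHREAD_COND_INITIALIZER;   /* VCPU -> IO thread */
static pthread_cond_t system_cond = PTHREAD_COND_INITIALIZER; /* IO thread -> VCPU */
static int vcpu_created;        /* models "VCPU fd is valid" */
static int system_ready;        /* models "machine init finished" */
static int vcpu_saw_ready = -1; /* what the VCPU observed before running */

static void *vcpu_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);

    /* First condition: tell the creator that this VCPU now exists. */
    vcpu_created = 1;
    pthread_cond_signal(&vcpu_cond);

    /* Second condition: do not run until the machine is initialized. */
    while (!system_ready)
        pthread_cond_wait(&system_cond, &big_lock);
    vcpu_saw_ready = system_ready; /* guest execution would start here */

    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Plays the IO thread: create the VCPU, wait for it, finish init. */
static int machine_init_and_run(void)
{
    pthread_t tid;

    pthread_mutex_lock(&big_lock);
    if (pthread_create(&tid, NULL, vcpu_thread, NULL) != 0) {
        pthread_mutex_unlock(&big_lock);
        return -1;
    }

    /* Waiting drops big_lock, yet the VCPU still cannot start running:
     * it is parked on system_cond until system_ready is set below. */
    while (!vcpu_created)
        pthread_cond_wait(&vcpu_cond, &big_lock);
    assert(vcpu_saw_ready == -1);

    /* The rest of machine init (for example APIC creation) goes here. */
    system_ready = 1;
    pthread_cond_broadcast(&system_cond);
    pthread_mutex_unlock(&big_lock);

    pthread_join(tid, NULL);
    return vcpu_saw_ready;
}
```

Both waits sit in `while` loops on their flag, so a spurious wakeup or an early signal does not break the ordering; that is the explicit dependency the patch description refers to.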