Restore stops and never finishes

Help
2010-02-24
2013-04-05
  • Steven Salinas

    Steven Salinas - 2010-02-24

    Hello,
    I have been able to create images with no issues, but when I try to restore, Partclone starts and the restore begins. Everything seems to be running fine, but then it just stops. Right now it has stopped at 36%. There are no errors; I just see the Partclone screen with the status bar stuck at 36 percent. Any ideas?

     
  • Steven Salinas

    Steven Salinas - 2010-02-25

    I am using Clonezilla SE. I have tried to restore different images and I have tried using different switches. I still end up with the same result. The restore starts, then randomly stops: sometimes during the sda1 restore, sometimes during sda3, all at random percentages of completion.

     
  • Steven Shiau

    Steven Shiau - 2010-02-27

    Since it's random, I guess the problem is in the hardware? Maybe it's the RAM? Please try a memtest first.

    Steven.

     
  • Søren F. Nielsen

    I'm experiencing a similar problem except it doesn't stop at random percentages. It's always the exact same percentage. It doesn't matter if I try cloning in multicast, broadcast or unicast mode.

    I'm using a Clonezilla server w/ Ubuntu 10.04, Clonezilla 2.3.6-19 and DRBL 1.9.6-15. The netcard used for cloning is a Realtek r8169 Gigabit w/ the driver from Realtek's website*. It's connected to a Gigabit switch and the machine I'm trying to clone is a newer Sony Vaio (can't remember the exact model, but will post it if relevant) with a Marvell Yukon 88E8059 PCI-E Gigabit Ethernet controller. Since this netcard isn't supported until kernel 2.6.33, I've installed 2.6.34 from http://kernel.ubuntu.com/~kernel-ppa/mainline/.

    Everything goes well until a fixed percentage into the largest partition from the image. I've tried 3 different images all with the same result - it freezes at a fixed percentage.
    1) 18 GB partition freezes at 89.95%
    2) 24 GB partition freezes at ~53% (can't remember the exact percentage)
    3) 27 GB partition freezes at 51.00%

    A snippet from the server syslog:

    Jun  1 13:29:43 clonezilla-server-02 mountd[1463]: authenticated mount request from 192.168.200.101:893 for /tftpboot/node_root (/tftpboot/node_root)
    Jun  1 13:29:48 clonezilla-server-02 mountd[1463]: authenticated mount request from 192.168.200.101:756 for /usr (/usr)
    Jun  1 13:29:50 clonezilla-server-02 mountd[1463]: authenticated mount request from 192.168.200.101:689 for /home (/home)
    Jun  1 13:29:50 clonezilla-server-02 mountd[1463]: authenticated mount request from 192.168.200.101:962 for /images (/images)
    Jun  1 13:30:02 clonezilla-server-02 udpcast[3624]: New connection from 192.168.200.101  (#0)
    Jun  1 13:30:02 clonezilla-server-02 udpcast[3624]: first connection: min wait[0] secs - max wait[0] - min clients[1]
    Jun  1 13:30:04 clonezilla-server-02 udpcast[3624]: min receivers[1] reached: starting
    Jun  1 13:30:04 clonezilla-server-02 udpcast[3624]: Starting transfer: file[] pipe[] port[2232] if[eth1] participants[1]
    Jun  1 13:30:04 clonezilla-server-02 udpcast[3624]: Transfer complete.
    Jun  1 13:30:04 clonezilla-server-02 udpcast[3624]: Disconnecting #0 (192.168.200.101)
    Jun  1 13:30:27 clonezilla-server-02 udpcast[3720]: New connection from 192.168.200.101  (#0)
    Jun  1 13:30:27 clonezilla-server-02 udpcast[3720]: first connection: min wait[0] secs - max wait[0] - min clients[1]
    Jun  1 13:30:29 clonezilla-server-02 udpcast[3720]: min receivers[1] reached: starting
    Jun  1 13:30:29 clonezilla-server-02 udpcast[3720]: Starting transfer: file[] pipe[] port[2234] if[eth1] participants[1]
    Jun  1 13:41:03 clonezilla-server-02 udpcast[3720]: dropped client #0 because of timeout
    Jun  1 13:41:03 clonezilla-server-02 udpcast[3720]: Disconnecting #0 (192.168.200.101)
    Jun  1 13:41:03 clonezilla-server-02 udpcast[3720]: Transfer complete.
    Jun  1 13:43:37 clonezilla-server-02 mountd[1463]: Caught signal 15, un-registering and exiting.
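
For anyone reading a log like the one above, here is a minimal Python sketch that groups the udpcast lines by process ID and reports which transfer ended in a timeout and how long it ran before that. It is not part of Clonezilla, DRBL or udpcast; the function name `summarize_udpcast`, the `SAMPLE` list and the `year` default are assumptions made for illustration (syslog timestamps carry no year), and it assumes English month abbreviations.

```python
import re
from datetime import datetime

# Syslog prefix, host name, then "udpcast[PID]: message", as in the snippet above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"udpcast\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def summarize_udpcast(lines, year=2010):
    """Group udpcast syslog lines by PID; record start, end and timeouts."""
    sessions = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a udpcast line (for example mountd)
        # Collapse the padding in "Jun  1" before parsing the timestamp.
        stamp = " ".join(m["ts"].split())
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        session = sessions.setdefault(
            m["pid"], {"start": None, "end": None, "timed_out": False}
        )
        msg = m["msg"]
        if msg.startswith("Starting transfer"):
            session["start"] = ts
        elif msg.startswith("dropped client") and "timeout" in msg:
            session["timed_out"] = True
            session["end"] = ts
        elif msg.startswith("Transfer complete"):
            session["end"] = ts
    return sessions


# A few lines copied from the syslog snippet above.
SAMPLE = [
    "Jun  1 13:30:04 clonezilla-server-02 udpcast[3624]: Starting transfer: file[] pipe[] port[2232] if[eth1] participants[1]",
    "Jun  1 13:30:04 clonezilla-server-02 udpcast[3624]: Transfer complete.",
    "Jun  1 13:30:29 clonezilla-server-02 udpcast[3720]: Starting transfer: file[] pipe[] port[2234] if[eth1] participants[1]",
    "Jun  1 13:41:03 clonezilla-server-02 udpcast[3720]: dropped client #0 because of timeout",
    "Jun  1 13:41:03 clonezilla-server-02 udpcast[3720]: Transfer complete.",
]

if __name__ == "__main__":
    for pid, s in summarize_udpcast(SAMPLE).items():
        seconds = (s["end"] - s["start"]).total_seconds()
        print(pid, "timed out" if s["timed_out"] else "ok", f"{seconds:.0f}s")
```

On these sample lines the second transfer (PID 3720) is the one flagged as timed out, roughly ten and a half minutes after it started, which matches the point where the client appears frozen.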
    

    Perhaps I should try using the kernel driver for the r8169 netcard instead of the official Realtek, since it's no longer needed (see below). I am however experiencing very good transfer rates (usually around 2.2GB/min, but I've seen it at 3.78GB/min.) with the official Realtek driver.

    I've seen similar behavior on an older Clonezilla server we had set up, but the freezing percentages were random, and we narrowed the issue down to flaky netcards in the laptops we were cloning back then.

    Can you recommend other things to try to resolve my issue?

    *) I'm using the official Realtek driver in an attempt to avoid a problem with having both an r8169 and an r8168 card in the server (resulting in a Page Allocation Failure). I've since removed the r8168 netcard and replaced it with an r8139 100Mbps card, since it's only used for administration via SSH, web browsing and upgrades.

     
  • Søren F. Nielsen

    I forgot one question:
    Is there a way to check the log files on the client?

     
  • Søren F. Nielsen

    I just tested one of the partition archives using the tip from this thread. It came out OK, so my freezing issue is not because of image corruption.

     
  • Steven Shiau

    Steven Shiau - 2010-06-02

    I guess the problem is a kernel/hardware support issue.
    In this kind of case, unless you find the right driver and make sure the hardware (especially the network card) is not buggy, it will happen randomly.
    As for how to check the log on the client, you can log in to the client over SSH as root to check that. Remember to set the root password for the client machines when you run "drblpush -i", or run:
    /opt/drbl/sbin/drbl-client-root-passwd
    to set that.

    Steven.

     
  • Søren F. Nielsen

    Thanks again for your reply.

    I've ruled out the net card on the server side, as I've been able to restore a similar sized image to another (older) laptop with a different net card. The problematic laptop is a Sony VPCEB1M1E with a Marvell 88E8059 net card. The Linux kernel first got support for this card in version 2.6.33, but I'm experiencing this issue with both 2.6.34 and 2.6.35_rc1 (this version had a few fixes to the driver in question, but none that solved my problem).

    My efforts to SSH to the laptop have all failed with "Permission denied, please try again.". I've tried adding SSH as a startup service with:

    /opt/drbl/sbin/drbl-client-service ssh on
    

    and tried to reset the client root password with

    /opt/drbl/sbin/drbl-client-root-passwd
    

    , but I still get "Permission denied". What am I doing wrong here? Shall I reconfigure with /opt/drbl/sbin/drblpush to set the root password properly?

     
  • Steven Shiau

    Steven Shiau - 2010-06-05

    Which mode of DRBL/Clonezilla do you use?
    Please run /opt/drbl/bin/drbl-bug-report and post the generated file.
    Thanks.

    Steven.

     
  • Søren F. Nielsen

    Hi Steven

    I think I've made some progress. I managed to successfully clone the troublesome laptop. It seems my problems aren't related to the netcards or drivers at all. I wanted to try the official driver for the netcard in the Sony laptop, so I downloaded the driver from Marvell's website. I couldn't make the driver compile with the later kernels (2.6.34/35) due to various changes to the netdevice structures, so I went with a 2.6.32 kernel. I set it as the active kernel for Clonezilla and tried to clone. It failed with an error about being unable to find the hard disk. I started to suspect the hard disk might have been damaged, so to test this theory I went back to the latest kernel (2.6.35). I tried starting Clonezilla with some extra options to facilitate error finding/debugging.
    I ended up with this command:

    /opt/drbl/sbin/drbl-ocs -b -g auto -e1 auto -e2 -x -v -nogui -r -e -j2 -f -p choose --clients-to-wait 1 -l en_US.UTF-8 startdisk multicast_restore <image_name> sda
    

    To my big surprise it successfully cloned the laptop. It no longer stopped and froze at a fixed percentage. I was puzzled, but I tried different combinations of options and managed to single out the one that made a difference. If I omit "-nogui" it freezes, but if it's included it works just fine.
    Furthermore, I found out why I couldn't SSH to the client. I had "PermitRootLogin no" in /etc/ssh/sshd_config for increased security. This was of course meant for the SSH service on the server, not for the client. I didn't realize most files for the boot image were simply copied from the server.
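
A quick way to catch this kind of surprise is to check which PermitRootLogin value an sshd_config actually sets before booting the clients. The following is a minimal Python sketch under stated assumptions: the function name `effective_permit_root_login` is hypothetical, it only looks at the global section (it stops at the first Match block), and it relies on sshd using the first value it finds for a keyword. It does not know where DRBL keeps the client copy of the file, so the text has to be read from whichever path applies on your setup.

```python
def effective_permit_root_login(config_text):
    """Return the first global PermitRootLogin value, or None if unset."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # sshd_config allows "Keyword value" as well as "Keyword=value".
        parts = line.replace("=", " ", 1).split()
        if not parts:
            continue
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break  # conditional blocks are outside the scope of this sketch
        if keyword == "permitrootlogin" and len(parts) > 1:
            return parts[1].lower()
    return None


if __name__ == "__main__":
    example = "# hardened server settings\nPort 22\nPermitRootLogin no\n"
    print(effective_permit_root_login(example))
```

A result of "no" on the copy the client boots from would explain a "Permission denied" for root even when the password is correct.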

    In case you want to look into the freezing problem (when omitting "-nogui"), this is the output of drbl-bug-report:

    Some info about the DRBL environment (PLEASE DO NOT EDIT THEM!):
    ===
    OS version: Ubuntu 10.04
    Server arch: i686
    Server CPU:  Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
    Server memory size: 2060864 kB
    Server Kernel version: 2.6.35-020635rc1-generic
    Installed DRBL-related packages:  drbl-1.9.6-15 clonezilla-2.3.6-19 mkswap-uuid-0.1.1-1 drbl-partimage-0.6.8-1drbl drbl-ntfsprogs-2.0.0-4 partclone-0.2.9-1drbl drbl-chntpw-0.0.20040818-7 drbl-lzop-1.02-0.8drbl pigz-2.1.5-1drbl pbzip2-1.0.5-1ubuntu1 mkpxeinitrd-net-1.5-3 udpcast-20100130-1drbl drbl-etherboot-5.4.3-2 gpxe-1.0.0-1drbl freedos-1.0-14drbl
    Client kernel version: 2.6.35-020635rc1-generic
    Client kernel arch: i586
    NICs with private IP address in server: eth0 eth1
    Private IP address in server: 10.1.0.73 192.168.200.1
    Total client no: 32
    Client IP address: 192.168.200.100 192.168.200.101 192.168.200.102 192.168.200.103 192.168.200.104 192.168.200.105 192.168.200.106 192.168.200.107 192.168.200.108 192.168.200.109 192.168.200.110 192.168.200.111 192.168.200.112 192.168.200.113 192.168.200.114 192.168.200.115 192.168.200.116 192.168.200.117 192.168.200.118 192.168.200.119 192.168.200.120 192.168.200.121 192.168.200.122 192.168.200.123 192.168.200.124 192.168.200.125 192.168.200.126 192.168.200.127 192.168.200.128 192.168.200.129 192.168.200.130 192.168.200.131
    ===
    

    Thank you for your help and your amazing work - it's highly appreciated.

    Kind regards

    Neurosis

     
  • Steven Shiau

    Steven Shiau - 2010-06-09

    So maybe the option "nogui" of partclone does not work very well with kernel 2.6.35.
    Could you please tell us where you downloaded kernel 2.6.35? Is it a vanilla kernel or something else?

    Steven.

     
  • Søren F. Nielsen

    I downloaded the kernel from here.

    A minor correction: the option "nogui" is needed to make partclone work in my case, but only with this particular Sony laptop (so far). I somehow doubt that kernel 2.6.35 is the culprit, as I had the same problem on various 2.6.34 kernels.
    I successfully cloned another laptop (MSI X-One; I can figure out the exact model if needed) with the exact same setup, but without the "nogui" option. The only obvious difference I can think of between the two is the netcards. I'm not sure what netcard the MSI has, but it was the netcard in the Sony that forced me to upgrade the kernel in the first place. The Sony has a Marvell Yukon2 88E8059 Gigabit and the MSI only has a 100 Mbps netcard.

    I don't know how the GUI of partclone is implemented, but I (probably wrongly) assume that it's multi-threaded, so one thread can read the data from the netcard while the other thread updates the GUI. Perhaps some weird race condition stops the data-reader thread. Or perhaps it isn't multi-threaded in the first place, and the GUI part of the code is buggy and causes an endless loop, effectively stopping the data reading and causing a timeout.

    All this is purely wild speculation, but hopefully it'll trigger some neurons that'll lead to a solution. Please let me know, if I can provide any other information that could help solve this problem.

     
  • beazer

    beazer - 2010-06-11

    When running Clonezilla to create the clone image file, it ran fine for a while. It then suddenly went into an error message loop about hard drive block errors (I think); the error messages were pretty cryptic.
    In any event it never proceeded beyond that point but kept repeating the same four error messages over and over until canceled.

    I was hoping that this utility would run at least as smoothly as Ghost8. What might I be doing wrong?
    Perhaps there are startup parameters that I need to know about?

    Thanks.

     
  • Søren F. Nielsen

    @beazer697
    You should create a new thread, as your problem is unrelated to the issue being discussed in this thread.

     
  • rob macdonald

    rob macdonald - 2010-08-23

    Try running dcs with -v so you actually get usable info. I'm having horrific timeout problems with Latitude 2110s, and adding -v at least lets me see what's going wrong, though I have no idea why. I have tried different slice-size and max-bitrate settings, and even tried broadcast and no full duplex, but nothing seems to help. As the session progresses, more and more clients time out until the timeout notAnswered section shows all clients timing out, and then it sits until I turn them off :(

     
  • Søren F. Nielsen

    @phishy420:
    Not sure if you're answering beazer697, me or both of us, but assuming you're directing your answer at me:

    Thank you for your answer.

    I'm not actively working on the system ATM, since I only set it up and provide support for it. It is however being used (by a colleague) regularly without major problems. I circumvented my problem (see above) by using the "-nogui" option. I haven't got time to investigate the partclone gui problem further, and since the "-nogui" option circumvents it, I can't justify any time spent on solving the issue.
    If and when I look into the problem again, I will remember to increase verbosity level to help diagnose the actual problem.

     
