Fast SSH file transfers with HPN patches

This is a common problem: you have some big files (for example a disk image) to transfer over a Gigabit Ethernet link, and it takes too long with SCP/SFTP. You probably don’t want to bother installing an FTP server either, so what’s the answer?

It’s called HPN-SSH, and it’s a patchset you can apply on top of OpenSSH. It basically provides dynamic window sizing, the none cipher and the multi-threaded MT-AES-CTR cipher. Obviously with the none cipher the transferred data is not encrypted, but that’s not a problem for a point-to-point Gigabit Ethernet link.

First I tested the maximum network speed I could achieve using FTP: 111.5 MB/s (0.87 Gb/s)

1706560496 bytes received in 14,6 secs (1,1e+05 Kbytes/sec)
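
Both figures follow from the FTP output by plain arithmetic. Here is a minimal sketch, assuming awk is available and that the prefixes are binary (1 MB = 1048576 bytes, 1 Gb = 1073741824 bits), which is what reproduces the 111.5 and 0.87 quoted above. The throughput function name is made up for illustration, and the comma decimal in the ftp output has to be typed as a dot.

```shell
# throughput BYTES SECONDS
# Hypothetical helper: prints the transfer rate in MiB/s and Gib/s.
throughput() {
    # LC_ALL=C keeps awk's decimal separator a dot regardless of locale.
    LC_ALL=C awk -v bytes="$1" -v secs="$2" 'BEGIN {
        rate = bytes / secs    # bytes per second
        printf "%.1f MiB/s (%.2f Gib/s)\n", rate / 1048576, rate * 8 / 1073741824
    }'
}

# Values taken from the ftp output above.
throughput 1706560496 14.6
```

With decimal prefixes the same transfer would come out at roughly 117 MB/s and 0.94 Gb/s, so it is worth stating which convention is used when comparing against other benchmarks.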

Then I tested the speed with SCP and the none cipher: 95.7 MB/s (0.75 Gb/s)

scp -4 -o NoneSwitch=yes -o NoneEnabled=yes /mnt/ram/big root@<ip>:/mnt/ram/big

Finally the speed with SFTP and the none cipher: 81.4 MB/s (0.64 Gb/s)

sftp -4 -o NoneSwitch=yes -o NoneEnabled=yes root@<ip>

I used a randomly generated 1.6 GB file for the tests, and all transfers were from RAM to RAM to avoid disk bottlenecks.

mount -t ramfs -o size=1640m ramfs /mnt/ram
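
For reference, a test file like this can be produced by reading from /dev/urandom. This is a minimal sketch, not necessarily how the file for these tests was made: make_test_file is a hypothetical helper name, and head -c is assumed to be available (it is in GNU coreutils).

```shell
# make_test_file PATH SIZE_MIB
# Hypothetical helper: writes SIZE_MIB mebibytes of random data to PATH.
make_test_file() {
    # Random data is incompressible, so compression cannot inflate the numbers.
    head -c "$(( $2 * 1048576 ))" /dev/urandom > "$1"
}

# Example (after mounting the ramfs as root), roughly the size used here:
#   make_test_file /mnt/ram/big 1627
```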

If you use Debian Squeeze amd64 you can easily install HPN-SSH using my repository. Then add the following to your /etc/apt/preferences (replace the obfuscated “(a.t.)” with @ and “(d.o.t.)” with a dot):

Package: openssh-client openssh-server
Pin: release o=Niccolo Belli <darkbasic(a.t.)linuxsystems(d.o.t.)it>
Pin-Priority: 1001

To allow the use of the none cipher add NoneEnabled yes to your /etc/ssh/sshd_config, then restart ssh.
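
The server-side change can also be scripted. This is a minimal sketch, assuming a Debian-style init script at /etc/init.d/ssh. enable_none_cipher is a hypothetical helper name, and NoneEnabled is only understood by an sshd built with the HPN patches.

```shell
# enable_none_cipher CONFIG_FILE
# Hypothetical helper: appends "NoneEnabled yes" unless it is already set.
enable_none_cipher() {
    if ! grep -Eiq '^[[:space:]]*NoneEnabled[[:space:]]+yes' "$1"; then
        printf 'NoneEnabled yes\n' >> "$1"
    fi
}

# Example (as root):
#   enable_none_cipher /etc/ssh/sshd_config
#   /etc/init.d/ssh restart
```

sshd uses the first value it finds for a keyword, so an existing NoneEnabled no line has to be edited by hand; appending after it has no effect.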

10 comments to Fast SSH file transfers with HPN patches

  • Astara

    Note…with samba and win7, I get 125MB/s writing to the server
    and 119MB/s reading from the server.

    That’s with a 1Gb connection (no encryption)….but with samba overhead.

    However the story is different with a 20G connection: on writes, the smbd process on the other end tops out at 100% cpu and limits me to 300-400MB/s. Reads are in a similar range, max in the low 400s.

    Needless to say — I forgot about trying to get performance over ssh with 1Gb, as 125MB/s writes is the maximum theoretical!…

    Now with a 20G connect (2x10G bonded), I’m getting a fraction of the bandwidth… so looking at sshd again— just need to find a windows client…. or try to get the ssh rpm to compile under cygwin…urg…

    Not a lot of luck so far…

    • darkbasic

      Except for RAM-to-RAM tests, I’m pretty sure 2x10G bonded is pretty useless for file transfers because of the hard disk bottleneck. Why do you need it?

  • Astara

    ??HD Bottle Neck?

    Local HD speeds are 1.1GB/s R/W (max, linear, pre-allocated space, w/smaller R/W+seeks causing degradation below that), which means 10G could theoretically satisfy those limits, but for whatever reason, using only 1 of the two 10G channels (dual cards are a small increment over single IF cards) dropped perf to barely better than 1Gb cards.

    I have no illusions that any part of my stack is tuned correctly for 10Gb cards, let alone 2x10Gb, but I regularly get 200-300MB/s in peak file transfers to or from Windows in non-benchmark usage.

    My fastest local HD is a 3-stripe wide RAID0 of 4-stripe wide RAID5s (aka a 12 stripe wide (15 disks, w/parity) RAID50) w/2GB SATA’s. My slowest linear, but fastest seek RAID uses a 2-stripe RAID5 of 15K-SAS that I limit to a 50% short-stroke (using only the 1st half of the disks, to cut the seek time), which I use for the OS.

    I’m pretty sure I don’t know how to configure the 2x10Gb’s for optimal performance, but I see the write bottleneck being in how samba is configured: smb/cifs uses 1 process to handle all of a client’s R/W’s, and the smb process on my client-write test hits 100% cpu and sticks there for the duration of the write. The linux samba code doesn’t lend itself easily to using more than 1 core/client.

    Trying to do testing with more than one client writer from the same machine is hindered by Win7 only allowing 1 person to be logged in on the desktop. I might be able to get multiple writers if I used multiple userid’s and had some ssh’d into the win7 machine using cygwin, but that’s so far from my normal work case, it seems pointless to test.

    So I need to find out if there is any way to cut the cpu usage of the single-client writer AND find out if there is any way of optimizing the single-reader case, as the Samba server process is only about 20-30% cpu bound in that case.

    Was hoping to upgrade disk subsystem, but dollar has fallen as fast or faster than disk prices over the past 4-5 years due to the Bush/Fed Bank Bailout-giveaway. ;-( As a result disk prices haven’t fallen at their previous rates (usually dropping about 50%/GB storage capacity / 18 months) — instead, it’s been fairly flat for the past 4 years, maybe increasing a bit.

  • Rob Fantini

    to check where the bottleneck is, try using atop

    # aptitude install atop

    then
    # atop

    and
    # man atop

    we just started using a self compiled hpn-ssh on wheezy, and atop has started to show us the source of slowness on transfers. for some nodes it is disks. others are on a slower switch.

  • Astara

    I isolated my speed test, APART from the disks.

    Disks are always blamed for slowness, so my own benching removes them from the equation (using reads from /dev/zero and writes to /dev/null on both ends).

    I wasn’t able to find “atop”… ntop, top, iotop, htop and a few more…but no atop.
    Also, I’m using direct connect cables.

    What I really want, aside from the fast file transfers that don’t seem to be that fast, is to be able to run a linux 3D graphical desktop over the net. Before you say impossible, at least consider that a 20Gb bus bandwidth, while not hot by today’s standards, isn’t bad when compared to normal remote-desktop bandwidth… It should be possible to do a lot more, but it seems like latency is a big issue. 2nd, the interrupt driven nature is a problem (really need the card to DMA I/O directly into buffs like disk drivers do)…

    My guess, though with the corps trying to move people to the clouds where they can be more easily controlled and monitored, we aren’t likely to see much work on making high-speed local connections faster in terms of latency…

    As far as samba goes… I turned up the optimization, turned on profiling, and created a version that used that profile to optimize the code. It made nearly no difference … but on the positive side, it was a lot of work to make work! ;-/

    :-)

  • Ben

    Hi, I was wondering if it would be possible for you to update the HPN in your repo to OpenSSH 6.0. It would help me greatly as I really need HPN and I haven’t been able to compile a working deb with HPN and the debian patches myself. Thanks, Ben.
