Fast SSH file transfers with HPN patches

This is a common problem: you have some big files (a disk image, for example) to transfer over a Gigabit Ethernet link, and it takes too long with SCP/SFTP. You probably don’t want to bother installing an FTP server either, so what’s the answer?

It’s called HPN-SSH, a patchset you can apply on top of OpenSSH. It basically provides dynamic window sizing, the none cipher and the multi-threaded MT-AES-CTR cipher. Obviously with the none cipher you get no encryption, but that’s not a problem on a point-to-point Gigabit Ethernet link.

First I tested the maximum network speed I could achieve using FTP: 111.5 MB/s (0.87 Gbit/s)

1706560496 bytes received in 14,6 secs (1,1e+05 Kbytes/sec)
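
The figures quoted in this post can be reproduced from the byte count and elapsed time in the FTP output. The sketch below assumes that MB means 2^20 bytes and that the Gbit/s value is 1024-based, which is what makes the numbers match; the function name throughput is only illustrative.

```shell
# throughput BYTES SECONDS: print the rate in MB/s (2^20 bytes) and Gbit/s (1024-based).
# LC_ALL=C keeps the decimal point a dot regardless of the system locale.
throughput() {
    LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN {
        mib = b / s / 1048576      # bytes per second -> MB/s
        gbit = mib * 8 / 1024      # MB/s -> Gbit/s
        printf "%.1f MB/s (%.2f Gbit/s)\n", mib, gbit
    }'
}

# Values taken from the FTP output above.
throughput 1706560496 14.6   # prints: 111.5 MB/s (0.87 Gbit/s)
```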

Then I tested the speed with SCP and the none cipher: 95.7 MB/s (0.75 Gbit/s)

scp -4 -o NoneSwitch=yes -o NoneEnabled=yes /mnt/ram/big root@<ip>:/mnt/ram/big

Finally the speed with SFTP and the none cipher: 81.4 MB/s (0.64 Gbit/s)

sftp -4 -o NoneSwitch=yes -o NoneEnabled=yes root@<ip>

I used a randomly generated 1.6 GB file for the tests, and all transfers were from RAM to RAM to avoid disk bottlenecks.

mount -t ramfs -o size=1640m ramfs /mnt/ram
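
The post does not show how the test file was created. One way to do it, assuming GNU dd and /dev/urandom as the random source (both are assumptions, as is the function name make_random_file), could be:

```shell
# make_random_file PATH SIZE_MIB: fill PATH with SIZE_MIB MiB of random data.
# iflag=fullblock (GNU dd) makes sure every 1 MiB block is read completely.
make_random_file() {
    dd if=/dev/urandom of="$1" bs=1M count="$2" iflag=fullblock 2>/dev/null
}
```

Running make_random_file /mnt/ram/big 1627 would give a file of roughly the size used in the tests, written straight into the RAM-backed mount.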

If you use Debian Squeeze amd64 you can easily install HPN-SSH using my repository. Then add the following to your /etc/apt/preferences (replace “at” and “dot”):

Package: openssh-client openssh-server
Pin: release o=Niccolo Belli <darkbasic(a.t.)linuxsystems(d.o.t.)it>
Pin-Priority: 1001

To allow the use of the none cipher, add NoneEnabled yes to your /etc/ssh/sshd_config, then restart ssh.
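
As a rough sketch of that last step, the helper below appends the option only if no NoneEnabled line is present yet, so it can be run more than once. The function name enable_none_cipher is made up for illustration, and an existing NoneEnabled no line is left untouched and would need to be edited by hand.

```shell
# enable_none_cipher CONFIG: append "NoneEnabled yes" to CONFIG unless the
# file already contains a NoneEnabled line (whatever its value).
enable_none_cipher() {
    grep -qi '^[[:space:]]*NoneEnabled' "$1" || echo 'NoneEnabled yes' >> "$1"
}
```

As root this would be called as enable_none_cipher /etc/ssh/sshd_config, followed by a restart of the ssh service (on Debian Squeeze, presumably /etc/init.d/ssh restart).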

3 comments to Fast SSH file transfers with HPN patches

  • Astara

    Note: with samba and Win7, I get 125MB/s writing to the server and 119MB/s reading from the server.

    That’s with a 1Gb connection (no encryption), but with samba overhead.

    However the story is different with a 20G connection: on writes, the smbd process on the other end tops out at 100% CPU and limits me to 300-400MB/s. Reads are in a similar range, with a max in the low 400s.

    Needless to say, I forgot about trying to get performance over ssh with 1Gb, as 125MB/s writes is the theoretical maximum!

    Now with a 20G connection (2x10G bonded), I’m getting a fraction of the bandwidth, so I’m looking at sshd again. I just need to find a Windows client, or try to get the ssh rpm to compile under cygwin… urg…

    Not a lot of luck so far…

    • darkbasic

      Except for RAM-to-RAM tests, I’m pretty sure 2x10G bonded is pretty useless for file transfers because of the hard disk bottleneck. Why do you need it?

  • Astara

    ??HD Bottle Neck?

    Local HD speeds are 1.1GB/s R/W (max, linear, pre-allocated space; smaller R/Ws plus seeks cause degradation below that), which means 10G could theoretically satisfy those limits. But for whatever reason, using only 1 of the two 10G channels (dual cards are a small increment over single-interface cards) dropped perf to barely better than 1Gb cards.

    I have no illusions that any part of my stack is tuned correctly for 10Gb cards, let alone 2x10Gb, but I regularly get 200-300MB/s in peak file transfers to or from Windows in non-benchmark usage.

    My fastest local HD is a 3-stripe wide RAID0 of 4-stripe wide RAID5s (aka a 12-stripe wide RAID50; 15 disks with parity) w/2GB SATAs. My slowest-linear but fastest-seek RAID uses a 2-stripe RAID5 of 15K SAS disks that I limit to a 50% short-stroke (using only the 1st half of the disks, to cut the seek time), which I use for the OS.

    I’m pretty sure I don’t know how to configure the 2x10Gb’s for optimal performance, but I see the write bottleneck as being in how samba is configured (smb/cifs uses 1 process to handle all of a client’s R/Ws, and the smb process in my client-write test hits 100% CPU and sticks there for the duration of the write). The Linux samba code doesn’t lend itself easily to using more than 1 core per client.

    Trying to do testing with more than one client writer from the same machine is hindered by Win7 only allowing 1 person to be logged in on the desktop. I might be able to get multiple writers if I used multiple userids and had some ssh’d into the Win7 machine using cygwin, but that’s so far from my normal work case that it seems pointless to test.

    So I need to find out if there is any way to cut the CPU usage of the single-client writer AND find out if there is any way of optimizing the single-reader case, as the Samba server process is only about 20-30% CPU bound in that case.

    Was hoping to upgrade disk subsystem, but dollar has fallen as fast or faster than disk prices over the past 4-5 years due to the Bush/Fed Bank Bailout-giveaway. ;-( As a result disk prices haven’t fallen at their previous rates (usually dropping about 50%/GB storage capacity / 18 months) — instead, it’s been fairly flat for the past 4 years, maybe increasing a bit.
