Monday, September 12, 2011

Torrent download Cloud appliance

The Why



A friend of mine, who's a Linux systems geek as well, was tasked with building a library of Linux distro ISOs. This involved downloading tens of ISOs, many of which were only offered in torrent form. Even if that were not the case, it would still be good practice to download such large binaries over BitTorrent to avoid loading any single mirror too much. Anyway, we were chatting about it, and since bandwidth (especially upload) is a scarce resource where I live, he was considering paying some service to download the torrents he needed and convert them to HTTP!

I mentioned I could build something to do just that in about an hour! It wouldn't even be complex. Armed with Ensemble, I can very simply launch an EC2 instance, install rtorrent (my fav CLI torrent client) and rtgui (an rtorrent web UI), and have it ready to crunch through any torrenting needs. We both became interested in seeing how well that would work, and so here we go...

The How


I'll assume you already know how to get started with Ensemble. Let's see what it takes to deploy my torrent appliance:

$ bzr branch lp:~kim0/+junk/rtgui
$ ensemble bootstrap
# Wait for ec2 to catch up (2~5 mins)
$ ensemble deploy --repository . rtgui
$ ensemble expose rtgui

That is basically all you need to "use" this appliance! Give the rtgui appliance another few minutes to boot, install and configure itself. You can check its status with:

$ ensemble status 
2011-09-12 13:23:53,868 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-50-19-19-234.compute-1.amazonaws.com, instance-id: i-2c37ee4c}
  1: {dns-name: ec2-107-20-96-125.compute-1.amazonaws.com, instance-id: i-1c28f17c}
services:
  rtgui:
    exposed: true
    formula: local:rtgui-9
    relations: {}
    units:
      rtgui/0:
        machine: 1
        open-ports: [80/tcp, 55556/tcp, 55557/tcp, 55558/tcp, 55559/tcp, 55560/tcp,
          6881/udp]
        relations: {}
        state: started
2011-09-12 13:24:01,334 INFO 'status' command finished successfully

The important bit to watch for is "state: started". If it's something else, the EC2 instance is still being configured. Note which ports have been opened: 55556-55560, since rtorrent is configured to use those ports; port 80 for the web UI; and UDP port 6881 for the DHT network. I am in no way a torrent expert, so this could be completely unoptimized, but hey, it seems to work.
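If you'd rather not re-run the status command by hand, the check can be scripted. This is only a minimal sketch: it assumes `ensemble status` prints the YAML shown above and that matching the literal text "state: started" is good enough. The function names `unit_started` and `wait_for_started` are my own, not part of Ensemble.

```shell
#!/bin/bash
# Sketch: wait until an Ensemble unit reports "state: started".
# Assumption: the status output looks like the listing in this post.

# Reads `ensemble status` output on stdin.
# Succeeds if at least one line contains "state: started".
unit_started() {
    grep -q 'state: started'
}

# Polls `ensemble status` every $1 seconds (default: 15) until a unit is started.
wait_for_started() {
    local interval="${1:-15}"
    # 2>&1 so the match works whether the YAML goes to stdout or stderr.
    until ensemble status 2>&1 | unit_started; do
        echo "Not started yet, retrying in ${interval}s..."
        sleep "$interval"
    done
    echo "Unit is started."
}
```

Calling `wait_for_started 30` would then block until the unit is up. With more than one unit deployed, this returns as soon as any of them is started, so treat it as a convenience rather than a proper health check.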

Ready to test? Machine 1 runs rtgui, so go ahead and visit it in a browser. For me that's http://ec2-107-20-96-125.compute-1.amazonaws.com/rtgui (replace that DNS name with the right one for your instance, and don't forget the trailing /rtgui like I always do). Click "Add torrent" and pass it the URL of a torrent file. I'm gonna be testing with the Ubuntu 11.10 beta1 amd64 torrent file. Once the torrent is added, click the green play button to start it. Since EC2 instances have quite some bandwidth available to them, this Ubuntu torrent downloaded in just a few seconds. I am shipping a default rtorrent configuration that limits upload speed to 100 KB/s (since you're paying for bandwidth), but you can change that from the web UI. Here's how the whole thing looks


Once a torrent has finished downloading, you can fetch its files over HTTP from http://ec2-107-20-96-125.compute-1.amazonaws.com/complete (again, replace the machine DNS name with the correct one for your instance).
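Pulling a finished file down to your own machine can be scripted too. The sketch below assumes each completed file sits directly under /complete with its original filename, which I haven't shown here, so check the listing in your browser first. `complete_url` and `fetch_completed` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: build the HTTP URL for a finished download and fetch it.
# Assumption: files are served as http://<host>/complete/<filename>.

# Prints the URL for file $2 under the /complete path on host $1.
# Note: no URL-encoding is done, so filenames with spaces will need care.
complete_url() {
    local host="$1" file="$2"
    printf 'http://%s/complete/%s\n' "$host" "$file"
}

# Downloads file $2 from host $1 into the current directory.
# --continue lets wget resume a partially downloaded file.
fetch_completed() {
    wget --continue "$(complete_url "$1" "$2")"
}
```

For example, `fetch_completed ec2-107-20-96-125.compute-1.amazonaws.com some-file.iso`, with the host and filename swapped for your own.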

A single torrent appliance is of course not limited to a single torrent! You can keep adding as many as you want, but eventually you're going to hit some limit (disk IO, network IO, disk space ...etc). So if you're after downloading a really large number of torrents, you may need to "scale up" this torrent download appliance (well, it's a cloud for God's sake!). If that's what you wish for, you only need to run:
$ ensemble add-unit rtgui

Simple as that, as with everything Ensemble! So now you know how to download your 11.10 copy without loading Ubuntu's servers. Actually, you'd be helping them and all the millions of Ubuntu users if you use this method on release day. Once you're done playing with the appliance, you need to destroy it (to stop paying Amazon for the machines):

$ ensemble destroy-environment
WARNING: this command will destroy the 'sample' environment (type: ec2).
This includes all machines, services, data, and other resources. Continue [y/N]y
2011-09-12 13:53:33,018 INFO Destroying environment 'sample' (type: ec2)...
2011-09-12 13:53:36,641 INFO Waiting on 2 EC2 instances to transition to terminated state, this may take a while
2011-09-12 13:54:18,617 INFO 'destroy_environment' command finished successfully

Want to improve it?


Things I wish I had time to improve:
  • Once a file is downloaded, upload it to S3. You can then terminate the appliance, and still download the files at your own pace
  • Parameterize rtorrent rc configuration file, such that you can pass it parameters from Ensemble (such as upload rate...etc)
  • Integrate notification upon download completion (SMS me, email me, IM me ...etc)
  • Add an auto-redirect to /rtgui :)
  • Figure out a way to download completed files from within the rtgui web UI
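As a starting point for the second item, the rtorrent rc file could be generated from parameters instead of shipped as a fixed file. This is a rough sketch and not what the appliance does today: `render_rtorrent_rc` is a hypothetical function, the option names follow rtorrent's classic rc syntax as I understand it, and the download directory is a made-up placeholder. The defaults mirror the ports and upload limit mentioned earlier in this post.

```shell
#!/bin/bash
# Sketch: render an rtorrent rc file from parameters.
# Assumptions: classic rtorrent option names; /var/www/complete is a placeholder path.

# Usage: render_rtorrent_rc [UPLOAD_KB] [PORT_RANGE] [DHT_PORT] [DOWNLOAD_DIR]
# Prints the rc file to stdout; redirect it wherever rtorrent reads its config.
render_rtorrent_rc() {
    local upload_kb="${1:-100}"
    local port_range="${2:-55556-55560}"
    local dht_port="${3:-6881}"
    local download_dir="${4:-/var/www/complete}"
    cat <<EOF
# Generated file; values come from the caller.
upload_rate = ${upload_kb}
port_range = ${port_range}
dht = auto
dht_port = ${dht_port}
directory = ${download_dir}
EOF
}
```

Something like `render_rtorrent_rc 50 > ~/.rtorrent.rc` would then cap uploads at 50 KB/s while keeping the other defaults. How those values get passed in from Ensemble is left open here.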

If you're interested in improving this appliance, drop by #ubuntu-ensemble on Freenode IRC and ping me (kim0) or any of the friendly folks around.

What kind of skills do you need to hack on this project? Just bash shell scripting foo! The feature I love most about Ensemble as a cloud orchestration tool is that it doesn't twist you into using some abstracted syntax. You get to write in whatever language you feel like using; for me that's bash. You can find the script that does all of the above right here.

Interested in learning more about Ensemble and automating Ubuntu server deployments in the cloud or on physical servers?
Want to hack on this torrent appliance, or do something similar?
Have comments or a better idea?

Let me know about it! Just drop me a comment right here. You can also grab me (kim0) on Freenode IRC.
