AWS Glacier, Boto and Me

Glacier Released! a while ago…

When AWS Glacier launched, I was excited. It was phenomenally cheap at $0.01 per GB per month (USD). Finally I could afford to store my precious memories (in photo / video form) very cheaply in the cloud.

So I instantly got to work on a simple archive solution. But as I built and tested it, I kept realising that my original archive idea would be complex to manage, and that because of that complexity a backup approach would be simpler. It was about this time I realised that being lazy and using CrashPlan would solve my issues, and remove the burden of data management from my shoulders…

So what was I left with?

I now had a small, defunct project: https://github.com/davidrfaulkner/pyceage-glacier

But worse, I had a bunch of test archives loaded into AWS vaults. If I had deleted the vaults then, using my program's structured metadata, there would have been no issue. But that was not to be: the project was forgotten.

The Generosity of Amazon

Fast forward a year or so, and I remembered that Amazon was probably billing me for these data sets. So, after logging into the console, I discovered that they had been billing me about 2c per month. However, I'm guessing that because of the challenges of billing 1c, they just accrue it for a few months and then it gets “adjusted” out of existence. Thank you, Amazon 🙂 (no sarcasm implied)

Now, in the console we can't just delete the vaults, as they have active archives associated with them. We need to delete the archives first. However, by design, Glacier requires you to know the archive IDs to delete them… which I had, of course, misplaced…

 

Enter Boto – The AWS Python binding for easily accessing services

So I needed API access to recover my archive IDs, so I could delete them.

Step 1: Install Boto (included in ubuntu repositories)

$ sudo apt-get install python-boto

Step 2: Test it works

$ python

Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto
>>> dir(boto)
['BUCKET_NAME_RE', …]

Now we can write a really inefficient program in a few minutes, to find out what our archives are called 🙂

Bear in mind it may take about 4 hours for this to finish, as that’s how long it may take for Amazon to return the inventory.

import time
from boto.glacier.layer2 import Layer2

access_key = "#############"
secret_key = "####################################"

C = Layer2(aws_access_key_id=access_key, aws_secret_access_key=secret_key)

# Start an inventory retrieval job for every vault we own
jobs = []
for vault in C.list_vaults():
    print vault
    jobs.append((vault, vault.retrieve_inventory_job().id))

# Poll until every job has completed and printed its inventory
while jobs:
    print "wait 10 minutes"
    time.sleep(600)
    vault, job_id = jobs[0]
    job = vault.get_job(job_id)   # re-fetch the job so the completed flag is current
    if job.completed:
        print job.get_output()
        jobs.pop(0)

Once it returns the archive IDs, I can delete the archives, and then the vaults.
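
Roughly, the clean-up looks like the sketch below. This is a minimal illustration rather than the exact code I used: saved_inventories is a made-up placeholder for wherever you kept the inventory JSON from the jobs above (Glacier inventories list archives under 'ArchiveList', each with an 'ArchiveId').

from boto.glacier.layer2 import Layer2

C = Layer2(aws_access_key_id=access_key, aws_secret_access_key=secret_key)

for vault in C.list_vaults():
    # 'saved_inventories' is a placeholder: the inventory JSON from the completed
    # jobs above, keyed by vault name
    inventory = saved_inventories[vault.name]
    for archive in inventory['ArchiveList']:
        vault.delete_archive(archive['ArchiveId'])   # remove each archive by ID
    C.delete_vault(vault.name)                       # then remove the (now empty) vault

One wrinkle to be aware of: Glacier will only delete a vault once its inventory shows it as empty, and inventories are only refreshed periodically, so the delete_vault call may need to be retried a day or so after the archives are gone.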

Misusing Docker Containers, and Hypervisors

Have you tried Docker yet? Is it what you expected?

I read with interest that someone has made containers cool again. I was mildly interested in LXC when I realised that even small scale-out ARM servers would benefit from some form of virtualisation, but that is a niche requirement. For the rest of us, mature hypervisors have really steamrolled container technology.

Yes, there is OpenVZ / Parallels Virtuozzo, but that has mainly been limited to the traditional web hosting market as a more cost-effective virtual private server platform. People often avoid these VPS options due to weaker isolation from hosting neighbours, and a habit of providers overcommitting resources.

Enter Docker, the cool container technology. For a start, this is a tool aimed at developers / DevOps, not infrastructure people. And, in my opinion, it's not targeting what virtualisation is good at. Instead it's targeting an age-old battle between developers and dependencies.

From the website:

Docker is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.
Docker containers can encapsulate any payload, and will run consistently on and between virtually any server. The same container that a developer builds and tests on a laptop will run at scale, in production*, on VMs, bare-metal servers, OpenStack clusters, public instances, or combinations of the above.

Notice the clear goal is NOT to compete with traditional hypervisors. It simply gives developers the tools to create a maintainable micro-environment for their application, with versioning, that is portable between dev / test / prod / etc. It can be considered a tool for making micro custom PaaS platforms: PaaS platforms that contain whatever dependencies / languages you want.

How does it work?

1. Have Ubuntu 14.04 (64-bit; 32-bit isn’t currently supported), as then you can just

sudo apt-get install docker.io

2. Have a look at the command options (after you check the docker service is running)

sudo docker.io

3. Search for images, then pull the Docker-built ubuntu one anyway

sudo docker.io search something
sudo docker.io pull ubuntu

4. See how image layering with tags can give you different versions space-efficiently; we now have many ubuntu images.

sudo docker.io images

REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
ubuntu              13.10               5e019ab7bf6d        5 weeks ago         180 MB
ubuntu              saucy               5e019ab7bf6d        5 weeks ago         180 MB
ubuntu              12.04               74fe38d11401        5 weeks ago         209.6 MB
ubuntu              precise             74fe38d11401        5 weeks ago         209.6 MB
ubuntu              12.10               a7cf8ae4e998        6 weeks ago         171.3 MB
ubuntu              quantal             a7cf8ae4e998        6 weeks ago         171.3 MB
ubuntu              14.04               99ec81b80c55        6 weeks ago         266 MB
ubuntu              latest              99ec81b80c55        6 weeks ago         266 MB
ubuntu              trusty              99ec81b80c55        6 weeks ago         266 MB
ubuntu              raring              316b678ddf48        6 weeks ago         169.4 MB
ubuntu              13.04               316b678ddf48        6 weeks ago         169.4 MB
ubuntu              10.04               3db9c44f4520        6 weeks ago         183 MB
ubuntu              lucid               3db9c44f4520        6 weeks ago         183 MB


5. Get confused at the prospect of running containers with just one command….

sudo docker.io run -i -t ubuntu /bin/bash

Docker is designed to run ONE application per container. We just cheated by running a shell… But if you run top, you will see that it is the ONLY thing running in the container. When we exit the shell, our container will “stop”, and if we did start a service from the terminal, it is now very stopped.
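
If you do want a container to keep doing something on its own, the usual pattern is to hand docker run a single long-running command and detach it with -d. The busy loop below is just an illustrative stand-in for a real service; check on it with logs, substituting the ID or generated name that docker prints.

sudo docker.io run -d ubuntu /bin/sh -c "while true; do echo hello; sleep 1; done"
sudo docker.io logs <container_id_or_name>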

6. Get confused at the difference between Images and Containers….

sudo docker.io run -i -t ubuntu /bin/bash

Does it seem logical to run this after we stopped our container? It's not. That's an image. Each time we run it, we get a new container: a clean copy of the image with a writable layer above the source image. And because we are not specifying container names, they will have cool names like “thirsty_wozniak” and “distracted_feynman”, as well as UUIDs. So let's try the below instead.

sudo docker.io ps -a

CONTAINER ID IMAGE        COMMAND   CREATED    STATUS        PORTS                  NAMES
6b3b35d35b7b eaef1fd5434f /bin/bash 2 days ago Exit Status 0                        thirsty_wozniak 

sudo docker.io start thirsty_wozniak
sudo docker.io attach thirsty_wozniak (if you want terminal access again)

to “detach” ctrl-p ctrl-q – this stops it from uh… stopping

cowsay "You should be running an application component, not a terminal"

7. Expose some kind of network service to the world

sudo docker.io commit thirsty_wozniak newubuntuimage
sudo docker.io run -i -t -p 8080:8080 newubuntuimage /bin/bash

Hmmm… that's a bit annoying. We do need to plan what ports we want exposed to the outside world BEFORE creating our container (by running an image). So we commit a new image based on our container, so we can make new containers based on our image, linked to a host port…

8. Start misusing things

After a bit of reading you will see we are already misusing things by running a shell and committing non-image data to images. My first real effort with Docker is currently deploying an old Python web app into a container, and I have put both MariaDB and the app in one container. That is the wrong way 🙂 But it has been great for my development efforts, and has stopped me from installing all kinds of dependencies directly on my dev system, and from needing to use virtualenv etc.

I have only scratched the surface, and there are other exciting ways to deal with networking (including container linking, and a separate advanced tool known as pipework). You also have the option to run a private image index, or use a hosted private index from Docker, instead of the public index. There are also save / export functions to make images / containers portable as tar archives.

Also bear in mind that our images / containers use an AUFS layered file system, which adds some overhead and has limitations, so we really want to isolate data to separate volumes not backed by the layered file system.
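
As a rough example (the host path here is invented), the -v option mounts a host directory into the container as a volume, which keeps that data out of the AUFS layers:

sudo docker.io run -i -t -v /srv/appdata:/data ubuntu /bin/bash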

Also, infrastructure guys will have a bit more to worry about, as this technology really allows Linux dev teams to utilise a single dev VM very heavily by running MANY containers…

Docker is a great tool that by design encourages separating application components and restricting network access, and lets us move our application from a notebook with Vagrant, to a workstation with Linux, to an Amazon instance, to a VMware VM, keeping our application environment consistent, versioned and portable. I recommend trying Docker out and doing the online tutorial.

Enjoy!

 

 

The Future of VLANs

In designing a lab using Open vSwitch, I quickly found that this excellent virtual switch, with VLAN, VXLAN, GRE tunnel, and OpenFlow support, has terse documentation. Also, due to rapid development, a great deal of the blog-based documentation is out of date.

Scott Lowe has posted many articles on KVM/OVS, which I have found to be the most useful documentation source (although some are also out of date), so I recommend you check out his blog.

http://blog.scottlowe.org/

He also wrote a nice piece on why this (SDN/OVS) is rapidly becoming important. Basically, these technologies are aimed at solving the issues we have incorrectly been throwing VLANs at.

http://blog.scottlowe.org/2013/06/20/thinking-out-loud-the-future-of-vlans/

 

KVM – First impressions

I have been following KVM developments since it was first mainlined, but had never actually got around to deploying or configuring it. Well, I finally have, and before I publish a full account of the process and its integration with Open vSwitch (OVS), I thought I would publish my initial impressions – although to be fair, these are more observations about virt-manager, as that’s the easiest way to dive into KVM quickly.

  • A lot of work has been done to make networking easier than in the earlier versions I have read about. The bridging / NAT setup is now as easy as on any other hypervisor.
  • However, this all explodes a bit as soon as we introduce OVS, or if we want to do VLAN tagging, etc. It then becomes a bit harder.
  • The performance, from early observation, is exceptionally good (especially with the virtio paravirtualised drivers). And VMs as processes really appeal to me.
  • Storage configuration in virt-manager is also a bit underwhelming, from my impressions so far, but I will be exploring that further.
  • The configuration options. VMs can be configured to a ridiculous level (e.g. CPU feature masking, emulated hardware, networking backends). KVM VMs are incredibly customisable, with all of the configuration possibilities easily exposed.

From a quick look, KVM is incredibly easy to get running on a modern Linux distribution (I used Ubuntu 13.04), but requires more preparation and expertise for more complex network functionality (although it's still not that bad, for the advanced FREE features this gets you). It quickly became obvious why this is the preferred OpenStack hypervisor, and also that it's an area where IT companies can add value due to the specific expertise required.

An example of how easy it is to get running on Ubuntu 13.04:

Check that you have Intel VT or AMD-V.

egrep -c '(vmx|svm)' /proc/cpuinfo

Then install KVM and related items

sudo apt-get install qemu-kvm libvirt-bin bridge-utils

And if you wish, install virt-manager for an easy GUI for managing KVM with libvirt.

sudo apt-get install virt-manager

Restart, and you have a KVM host, with NAT networking, and an easy bridged network method (macvtap).
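
If you want to sanity-check the host from the command line rather than virt-manager, something like the following virt-install invocation should create a test VM (the VM name, disk path/size and ISO are just example values):

sudo virt-install --name testvm --ram 1024 --vcpus 1 \
  --disk path=/var/lib/libvirt/images/testvm.img,size=8 \
  --cdrom ~/Downloads/ubuntu-13.04-server-amd64.iso \
  --graphics vnc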

One Bug I ran into quickly….

If you plan to run Ubuntu Server 13.04 VMs, change the display driver from cirrus to vga. Currently the Ubuntu 13.04 boot hangs with the cirrus driver.

 

 

 

Object Storage IS Cloud Storage

I plan to write a piece on cloud storage gateways (and global distributed NAS), and on OpenStack Swift / S3 / Glacier, in a little bit. So I thought it prudent to have a bit of a cloud storage rant / primer.

Cloud storage is one of those things that can be hard to communicate clearly to end users, even users of a technical nature. And that is simply because humans are resistant to change (over the short haul) and cling to familiar concepts. Why would we change our storage paradigm to something that seems harder, and is limited in terms of traditional use cases? Why is accessing storage using HTTP/HTTPS better than a C: drive or a network drive?

http://www.rackspace.com/blog/storage-systems-overview/

First things first, why do I say object storage is cloud storage? Simply, it is the only storage technology (or family of technologies, as there is no standard API beyond de facto AWS S3 compatibility) that easily meets the definition of a cloud service. Traditional block / file storage systems cannot “easily” cope with latent connections, multitenancy, and self-service. They also find it difficult (but possible) to deliver elastic scaling. That said, cloud services require traditional storage. We still need fast block / file solutions to store production information for our cloud services. But this storage is part of cloud IaaS (Infrastructure as a Service), not a distinct Storage as a Service offering.

The NIST definition lists five essential characteristics of cloud computing: on-demand self-service, broad network access, resource pooling, rapid elasticity or expansion, and measured service. It also lists three “service models” (software, platform and infrastructure), and four “deployment models” (private, community, public and hybrid) that together categorize ways to deliver cloud services. The definition is intended to serve as a means for broad comparisons of cloud services and deployment strategies, and to provide a baseline for discussion from what is cloud computing to how to best use cloud computing.

So if this is the case, why do vendors have trouble selling object storage, and why do end users reject it?

http://www.theregister.co.uk/2013/04/30/a_failure_to_launch/

Well, for a start, vendors are doing it wrong. With Amazon transforming object storage into a commodity, you cannot simply rename an archive platform and sell it to every business. The existing archive market still exists, but is being cannibalised by NAS storage with archive features, or even software-only solutions. The cloud service providers are the next choice, but be ready for stiff competition, and an expectation of commodity pricing.

[Image: Amazon Object Count]

And then there is the user problem. The cloud provider has to explain to each customer that this “cloud storage” is not like CIFS, NFS, SAN, a C: drive, etc., and after that, sell a solution. But this is exactly what it was like pitching VMware and SAN to customers when they were “new” in the customer's eyes. It's merely a challenge to overcome.

http://www.theregister.co.uk/2013/06/27/intel_chipping_away_at_objects/

The importance of object storage technology is that it enables us to build solutions less dependent on the speed of light. The storage can't bend the rules of latency; it is still slow… but it knows it's slower. Its rules are based on being slower, so we don't mind compressing/encrypting in flight, for example. It's also not transactional, and far less “chatty”. This means fewer round trips than traditional protocols, once again reducing the latency issue. Object storage is also by nature far easier to scale, and possible to run on commodity hardware (this scale-out model also renders RAID unnecessary in most cases). This allows the storage to be priced in a disruptive way. This technology forms the foundation for new cost-effective solutions designed to deal with distance/latency.
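
To make the access model concrete, here is a minimal boto sketch of what “storage over HTTP” looks like against an S3-compatible endpoint. The bucket and key names are invented for illustration: a small object is written or read whole, with far fewer round trips than a chatty block or file protocol.

import boto
from boto.s3.key import Key

access_key = "#############"      # placeholders, as in the Glacier example
secret_key = "####################################"

conn = boto.connect_s3(aws_access_key_id=access_key,
                       aws_secret_access_key=secret_key)

bucket = conn.create_bucket("my-photo-archive-example")   # example bucket name

k = Key(bucket)
k.key = "photos/2013/holiday-001.jpg"                     # example object key
k.set_contents_from_filename("holiday-001.jpg")           # essentially one HTTP PUT
k.get_contents_to_filename("holiday-001-copy.jpg")        # essentially one HTTP GET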

What does this mean?

Besides developers and niche users, most users do not want object storage. But they are  very interested in intelligent solutions built on top of object storage.

[Image: AWS Storage Gateway architecture]

Also, it would be nice if there were a standard for object storage APIs, besides the current de facto champion, Amazon S3.

OpenStack is not vSphere… It’s Automation.

Just to be clear for everyone, as it can be confusing based on how the media talks about these technologies.

OpenStack is an automation platform for managing hypervisors (and, most importantly, how networking and storage work with compute) in order to deliver cloud services. So the comparison is OpenStack against vCloud and System Center (or CloudStack and OpenNebula).

This does get confusing, as OpenStack is typically considered alongside its “A” class hypervisor, KVM. However, OpenStack can also work with Xen, VMware, Hyper-V, LXC, and others.

A full explanation for vSphere admins can be read here:

http://cloudarchitectmusings.com/2013/06/24/openstack-for-vmware-admins-nova-compute-with-vsphere-part-1/

But more importantly, the video below is a great demonstration of how OpenStack can currently be used with vSphere to deliver an IaaS cloud service with vSphere-class HA/DRS.

Microsoft: Someone gave us shot in the ARM by swallowing Surface tabs

The Register burns Microsoft on Windows RT customer story… Funny though…

But something like the Wehrmacht at Stalingrad receiving parachuted crates of condoms rather than ammo or warm food, the hapless IT employees of ARM Holdings’ IT, quality assurance, marketing and sales teams have received crate upon crate of Surface RTs.