Working with Azure Blob Storage, some notes
I’m working on building a snazzy Shiny app that a) drops the inputs/parameter values into blob storage and b) uses Stream Analytics to query the values and present back what people are saying at the moment. This’ll be a fab tool for my pre-con next month if I can get it working in time!
Getting it working does, however, mean using the Azure Blob Storage API in R, which I confess is much harder than expected, especially after the ease of using the Visual Studio Online API for tfsR. To that end, I thought I’d write up some of my findings before I do a bigger write-up that illustrates how to do everything (in R).
I’m working my way through an intro to Azure Storage on the (hopefully reasonable) expectation that more knowledge will make it easier to work with. There’s also the online reference, although I found the VSO REST API documentation easier to understand and get started with.
Necessary info
To connect to an Azure Blob storage container you need to know your account name, your access key, and the container that you want to interact with.
- the account name is the name of the blob storage account
- the access key is either the primary or the secondary access key (either one is fine). It’s worth noting that the key is handed to you already encoded in base64, so you need to decode it before you can use it to sign anything
- the container name is the container you want to put blobs in
Constructing a signature
It’s a bit weird (in my eyes anyway) but to authenticate your request to use blob storage you have to send the request basically twice: once as the main request, and a second time as an encoded signature of that same request, carried in the headers of the main request.
When you’re constructing your request you need to:
Fix a date value to be used twice: once in the signature and once in the request headers. This has to be in a specific format (RFC 1123, with English day and month names, in GMT) and in R I create it via:
x_ms_date <- format(Sys.time(), "%a, %d %b %Y %H:%M:%S GMT", tz = "GMT")
Construct a “Canonicalized Header” containing the x-ms-date and x-ms-version headers, with lowercase names, sorted alphabetically, one per line, in the format
headertitle:headervalue
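As a rough sketch of that step in base R: the helper below (canonicalize_headers is my own hypothetical name, not part of any package) keeps only the x-ms- headers, lowercases and sorts the names, and ends each line with a line feed. The version string and date are just example values.

```r
# Hypothetical helper: build the canonicalized header block
# from a named character vector of request headers.
canonicalize_headers <- function(headers) {
  nms  <- tolower(trimws(names(headers)))
  vals <- trimws(unname(headers))

  # Only the x-ms-* headers belong in this block
  keep <- startsWith(nms, "x-ms-")
  nms  <- nms[keep]
  vals <- vals[keep]

  # method = "radix" sorts in C-locale order, independent of the session locale
  ord <- order(nms, method = "radix")

  # One "name:value" per line, each terminated by a line feed
  paste0(nms[ord], ":", vals[ord], "\n", collapse = "")
}

# Example values only
cat(canonicalize_headers(c(
  "x-ms-version" = "2015-04-05",
  "x-ms-date"    = "Mon, 01 Feb 2016 12:00:00 GMT"
)))
# x-ms-date:Mon, 01 Feb 2016 12:00:00 GMT
# x-ms-version:2015-04-05
```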
Similarly, build a “Canonicalized Resource” holding
/accountname/container
and all query parameters that appear in your API call, i.e. everything after the ?. These parameters need to be in alphabetical order and separated with line breaks, in the format parametername:parametervalue
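A minimal sketch of the resource part, again in base R. canonicalize_resource is a hypothetical helper name, and myaccount / mycontainer are placeholders; the query parameters are the ones used to list the blobs in a container. It assumes each parameter appears only once and that values are already URL-decoded.

```r
# Hypothetical helper: build the canonicalized resource string
# from the account, the container and a named vector of query parameters.
canonicalize_resource <- function(account, container, query = character()) {
  resource <- paste0("/", account, "/", container)
  if (length(query) == 0) return(resource)

  nms <- tolower(names(query))
  ord <- order(nms, method = "radix")  # C-locale alphabetical order

  # Each parameter goes on its own line as name:value
  paste0(resource,
         paste0("\n", nms[ord], ":", unname(query)[ord], collapse = ""))
}

# Placeholder account and container names
cat(canonicalize_resource("myaccount", "mycontainer",
                          c(restype = "container", comp = "list")))
# /myaccount/mycontainer
# comp:list
# restype:container
```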
Construct a string of:
- the API call verb
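- 12 line breaks (\n): one after the verb, then one for each of the 11 standard headers (Content-Type, Date and so on) that are left empty here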
- the “Canonicalized Header”
- the “Canonicalized Resource”
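Putting those pieces together might look something like the sketch below. build_string_to_sign is a hypothetical helper; it assumes the canonicalized header block already ends with a line feed and that the 11 standard header slots (as I understand the Shared Key scheme) are all empty unless you pass values in.

```r
# Hypothetical helper: assemble the string that gets signed.
# standard_headers are the 11 slots, in order: Content-Encoding,
# Content-Language, Content-Length, Content-MD5, Content-Type, Date,
# If-Modified-Since, If-Match, If-None-Match, If-Unmodified-Since, Range.
build_string_to_sign <- function(verb, canonical_headers, canonical_resource,
                                 standard_headers = rep("", 11)) {
  stopifnot(length(standard_headers) == 11)
  paste0(
    toupper(verb), "\n",                            # the API call verb
    paste0(standard_headers, "\n", collapse = ""),  # 11 slots, one per line
    canonical_headers,                              # must end with "\n"
    canonical_resource                              # no trailing line break
  )
}

# Example values only
string_to_sign <- build_string_to_sign(
  "GET",
  "x-ms-date:Mon, 01 Feb 2016 12:00:00 GMT\nx-ms-version:2015-04-05\n",
  "/myaccount/mycontainer\ncomp:list\nrestype:container"
)
cat(string_to_sign)
```

If a request does carry a body (a PUT of some JSON, say), the Content-Length and Content-Type slots would presumably need filling in rather than being left empty.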
Encoding the signature
Using the decoded access key (the raw bytes, not the base64 text) as the key for this next bit, you have to:
- Make sure your signature string is UTF-8 encoded
- Hash this with HMAC using the SHA-256 algorithm (strictly speaking this is a keyed hash, not encryption)
- Encode this into base64
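Here is one way those three steps could look, assuming the openssl package is installed (digest plus base64enc would be another option). sign_request is a hypothetical name, and the key in the usage line is a made-up placeholder, not a real access key.

```r
library(openssl)  # base64_decode(), base64_encode(), sha256()

# Hypothetical helper: sign a string with a base64-encoded access key.
sign_request <- function(string_to_sign, access_key_base64) {
  key_raw <- base64_decode(access_key_base64)      # key as raw bytes
  msg_raw <- charToRaw(enc2utf8(string_to_sign))   # UTF-8 bytes of the string
  mac     <- sha256(msg_raw, key = key_raw)        # HMAC-SHA256 when key is given
  base64_encode(mac)                               # signature as base64 text
}

# Placeholder key for illustration only
fake_key  <- base64_encode("not-a-real-key")
signature <- sign_request("GET\n...", fake_key)
```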
Sending the signature
Phew! After all that work, the encoded signature string (in the Authorization header, in the form SharedKey accountname:signature), the x-ms-date and the x-ms-version all need to be sent in the headers of the request.
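A small sketch of assembling those headers in base R. build_request_headers is a hypothetical helper and every value below is a placeholder; the commented-out call shows how the headers might be handed to httr, assuming that package is what you use to make the request.

```r
# Hypothetical helper: the three headers that accompany the request.
build_request_headers <- function(account, signature, x_ms_date, x_ms_version) {
  c(
    "Authorization" = paste0("SharedKey ", account, ":", signature),
    "x-ms-date"     = x_ms_date,     # must match the date used in the signature
    "x-ms-version"  = x_ms_version   # must match the version used in the signature
  )
}

# Placeholder values for illustration only
hdrs <- build_request_headers(
  account      = "myaccount",
  signature    = "base64-signature-goes-here",
  x_ms_date    = "Mon, 01 Feb 2016 12:00:00 GMT",
  x_ms_version = "2015-04-05"
)

# With httr installed, sending could look like this (not run here):
# httr::GET(
#   "https://myaccount.blob.core.windows.net/mycontainer?restype=container&comp=list",
#   httr::add_headers(.headers = hdrs)
# )
```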
My next steps
I have work-in-progress R functions for this task in my Rtraining package. I’m currently working on sending JSON to blob storage and verifying how this needs to be handled in my signature string.
If you see any inaccuracies, misunderstandings, potential improvements, or rational explanations for the stuff above, please let me know by commenting!