Blameless post-mortem

Nope, my new position is not dead yet, thank you very much.
What I mean by this title is the meeting, common in any IT service after a major incident has been resolved, where all the team members who worked on the incident gather and discuss what went wrong, and how to improve tools and processes to do better next time.

I specify blameless, as avoiding finger pointing is a very good practice in general, and particularly in these meetings. If you want people to be honest and share their best insights, you have to keep in mind that these post-mortems have to cultivate an atmosphere of trust. The aim is really to find out how the events unfolded, what information was gathered, what went wrong, which steps were smart, which ones did not work properly, etc.
For more information about that, I recommend some DevOps sessions and talks, like this one from @Jasonhand from VictorOps : It’s Not Your Fault – Blameless Post-mortems

But my point today is to write about another kind of post-mortem which I discussed with a friend a few months back.
The methodology of a post-mortem could and should be used in settings other than just IT infrastructure incidents. It should be extended to sales, whether you win or lose the deal. It could be applied personally to any job interview, even if there are usually not that many people involved. And it could be used after any major event in your life, personal or professional.

The main focus for me right now would be the sales post-mortem. In most companies I have worked for, the sales pipeline strategy is mostly to respond to as many RFPs as possible. Statistically, it makes sense, as you are doomed to succeed every once in a while. In terms of smart strategy… let’s say I am not completely convinced. I tend to prefer a targeted answer for the cases where my team/company can bring real value and help the customer, while bringing attractive projects to our team. I usually do not hesitate to forgo any RFP where there is nothing interesting, or that puts us in jeopardy without bringing any value, or sexiness, to our job.
When you have time to focus on very interesting cases and invest in those, you will usually find this time well spent, in both the short and long term. And you should take time, whether you win or lose, to have this post-mortem meeting with your team. It is good to get the feelings and insights about the outcome from everyone involved. And I mean everyone. The first stakeholder you should at least get feedback from is the customer. I try to build a relationship of trust with a potential customer during the RFP process, so we can exchange honest points of view about our positioning and the project expectations. During the process, this helps everyone stay on the right track. And afterwards, it helps to know why you have not been chosen.

Beyond knowing, the most important aspect of these post-mortems is to implement some changes to your process, to be more relevant and have a better chance of success the next time around.

And that’s it for early morning musings, ’til next time!

IoT Challenges

After a long summer break, getting back to writing is a bit difficult, so here is a first post for a new era. I’ll be switching jobs early September, so there might be a slight variation in the subjects I’ll write about.

As highlighted in the Gartner 2018 Hype Cycle study, IoT is now a mature tech and we will see more and more large-scale projects being deployed in the wild. I would like to expand a bit on what it entails to start an IoT initiative, whether it is to design a new product to sell, or to gain some insight and improve your own processes.
The steps are familiar to anyone who has ever come close to a project in his/her life:

1. Design the solution
2. Gather the requirements
3. Choose the components, protocols
4. Build all the processes (logistics, operations, IT, support)
5. Market and sell
6. Maintain and deliver new functionalities

In terms of project management, there is nothing to learn here. I just wanted to highlight the specifics of an IoT project for these steps. There are some particularities due to the type of project, and some points that should be obvious but are often forgotten.

Design the solution

What I mean here is a high level design, functional, that will describe what you are aiming to deliver to your users or customers. Nothing fancy, nothing technical, just plain business.

Gather the requirements

Nothing new here, just make sure you include the future functions and the way you are going to develop. For example, if you start with an MVP (Minimum Viable Product) and build from there in short cycles, you need a long-term plan/strategy that will keep everything on track. And this plan should help you define your long-term requirements.

Choose the components/protocols

This is a technical step, and rather complex to execute today, as there are so many solutions to a single question out there. And you have to keep in mind the current state of the art, along with what you expect this state to be in 3, 5 or even 10 years.

Build all the processes (logistics, operations, IT, support)

From my experience, this is an often disregarded step, even by some companies that have been in the industry for decades. The simple question is : you are going to deliver a (physical) product to your users. What happens when the product breaks? Who are they going to call (and no, the answer has nothing to do with an 80s movie 🙂 )? How are you going to manage your replacements, stock, warranties, etc.? How do you handle servicing the device? Remotely, using your current support team, or locally? One specific suggestion, coming from experience : remember to include the ability to remotely upgrade your firmware 😉
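To make that last suggestion a bit more concrete, here is a minimal, hypothetical sketch (in Python) of what a device-side update check could look like. Everything in it is made up (the update URL, the manifest format, the staging path); the point is only the pattern : poll, download, verify, then flash.

import hashlib
import json
import urllib.request

# Hypothetical endpoint and version; replace with your own update service.
UPDATE_URL = "https://updates.example.com/device/firmware.json"
CURRENT_VERSION = "1.2.0"

def check_for_update():
    """Poll the update service and return its manifest, or None if unreachable."""
    try:
        with urllib.request.urlopen(UPDATE_URL, timeout=10) as resp:
            return json.load(resp)
    except OSError:
        return None  # stay on the current firmware if the service cannot be reached

def apply_update(manifest):
    """Download the new image, verify its checksum, then stage it for the bootloader."""
    with urllib.request.urlopen(manifest["image_url"], timeout=60) as resp:
        image = resp.read()
    if hashlib.sha256(image).hexdigest() != manifest["sha256"]:
        raise ValueError("checksum mismatch, refusing to flash")
    with open("/tmp/firmware.bin", "wb") as f:  # a real device would use a dedicated staging partition
        f.write(image)

manifest = check_for_update()
if manifest and manifest["version"] != CURRENT_VERSION:
    apply_update(manifest)

The real thing would also keep the previous image around, so the bootloader can roll back if the new firmware does not come up properly.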

Market and sell

Nothing to declare here. This should be rather standard. One word of advice : most IoT projects that succeed build on their ecosystem and the integration of new functionalities. You should probably add that to your strategy, and to the marketing materials.

Maintain and deliver new functionalities

This point relates both to the maintenance and support I have raised earlier, and to the lifecycle of your product.
Think about the many products we have seen with an incredible start in sales or customer acquisition, that dropped off the board after a few weeks, because nothing happened beyond the first wow effect. There is nothing more infuriating, as an end user, than a product with no bugfixes, or without any new functionality beyond what came out of the box. For example, take a mobile game, Pokemon Go : they had an amazing start, with millions of daily users. But as the hype faded, rumored functions and abilities did not come out, and the game’s statistics went down.
https://www.wandera.com/pokemon-go-data-analysis-popular-game/

The short version is : a connected product is a product, a physical one, with all the requirements that should be included in such a project. Do not go too fast when your Proof of Concept works. Think long term, and try not to be dazzled by a partner or consultant who shows off what a POC platform does on a demo screen 😉

Managed Kubernetes and security

Almost a sponsored post today, or better : a shared announcement.

You probably know that I am following Kubernetes rather closely, especially managed Kubernetes services (AKS, EKS or OpenShift for example). One domain where these offerings have been lacking is networking and security.

It is still a very sensitive subject for our customers, for container-related projects, and for public cloud projects in general. Security and networking teams have trouble adapting to public cloud paradigms and architectures. There is some fear of losing control, some basic fear of the unknown, and some real worry about how to handle networking and security.
Kubernetes (and the other orchestrators) adds another abstraction layer on top of the existing public cloud platforms, which does nothing to alleviate that fear, to say nothing of complexity and transparency.

There are some very good solutions out there to manage network overlays in Kubernetes. My favourite is Calico, but you may prefer any of the others. I’ll stick with Calico for a simple reason, which you will see below.
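To show what the security part looks like in practice, here is a small, hypothetical sketch using the official Kubernetes Python client (the namespace and labels are invented). The NetworkPolicy object itself is plain Kubernetes API; Calico, or any other CNI that supports network policies, is what actually enforces it.

from kubernetes import client, config

# Assumes a kubeconfig pointing at a cluster whose CNI enforces NetworkPolicy.
config.load_kube_config()

policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "allow-frontend-only", "namespace": "shop"},
    "spec": {
        # Protect the backend pods...
        "podSelector": {"matchLabels": {"app": "backend"}},
        "policyTypes": ["Ingress"],
        # ...and only accept traffic coming from pods labelled app=frontend.
        "ingress": [{"from": [{"podSelector": {"matchLabels": {"app": "frontend"}}}]}],
    },
}

client.NetworkingV1Api().create_namespaced_network_policy(namespace="shop", body=policy)

This is exactly the kind of rule that reassures a security team : for the selected pods, ingress becomes deny-by-default, and you whitelist only the flows you actually want.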

Microsoft and AWS are both working hard to provide a network overlay in their managed Kubernetes offerings. They each chose their own path, but both will get to approximately the same point within a short time.

Thanks to Jean Poizat, here are the two announcements :
1) From Calico for Azure : https://www.tigera.io/tigera-calico-coming-to-azure-kubernetes-service-aks/
2) For AWS : https://itnext.io/kubernetes-is-hard-why-eks-makes-it-easier-for-network-and-security-architects-ea6d8b2ca965

The summary is that Calico will be integrated into AKS in a few weeks/months, and EKS will include AWS CNI.
And that is exactly what we were waiting for, along with our customers : managed Kubernetes, with security!

Designing your own job

Depending on how you count, this is the third time it has happened to me.
Being able to design your own job, within certain limits, is an amazing opportunity.
I will not go into too many details as some of it is work in progress, but the process was amazingly energizing and I wanted to share a bit of that energy.
For my current job, I met my future boss on the recommendation of a former colleague. We discussed many things, from ITIL to Managed Services, and also public cloud and the need to bring dev and ops teams closer. We went through those kinds of talks several times, at least four if memory serves. We went from a job which looked like an Ops engineer/ITIL practitioner, to something closer to an Azure tech lead.
In my previous position I was also offered a promotion, and was able to discuss some of the content and responsibilities of the future role. I was also able to step down when the time came for me to admit that it was not an ideal position, for me or for the company. Which was really appreciated, at least on my part.

And once again, a few weeks ago, I was called out of the blue by a colleague’s boss. He started to discuss his own future and what he was trying to design. He wanted to build something new, and was searching for a partner to build it with. And in that scheme, he described a position very similar to my dream job, and offered it to me.
I almost fell off my chair.
At that point I was ready to accept, without having any more details about the exact role and responsibilities, or even the salary. That’s where my future boss started to ask me what I would include or exclude from that job description, and how I could make it my own. My mind just froze.
It took some time for me to recover and start thinking again. After some lame jokes, we discussed the position, and what we would like to build together. It took us several meetings and calls to see through the fog, as we are really going to build something new together, and we cannot rely much on what exists around us.
The last funny thing to happen was that my next interview was with the CEO of the company, who was convinced by the both of us in less than 35 minutes. I could not believe my luck in getting there.
Anyway, that’s it for the bragging post. I really needed to write that down to make it real (even if I signed and will start by the end of the summer 🙂 )

Autonomous versus autonomic systems

This is a difficult topic. I have to admit I am still not completely comfortable with all the concepts and functions.
However, the thinking is amazingly interesting, and I will take some time to ingest everything.
First things first, I will use this post to summarize what I have learned so far.

How did I end up reading that kind of work, you ask? Weeeellll, that’s easy 🙂
Brendan Burns, in one of his Ignite ’17 sessions, used the comparison “autonomous vs autonomic” to discuss Kubernetes.
This got me thinking about the actual comparison, and aided by our trusted friend Google, I found a NASA paper about it (https://www.researchgate.net/publication/265111077_Autonomous_and_Autonomic_Systems_with_Applications_to_NASA_Intelligent_Spacecraft_Operations_and_Exploration_Systems). I started to read it, but it was a bit obscure for me, and scientific English applied to space research was a bit too hard for an introduction to the topic of autonomic systems.
Some more research, helped by my beloved wife, led to a research thesis, in French, by Rémi Sharrock (https://www.linkedin.com/in/tichadok/). The thesis is available right there : https://tel.archives-ouvertes.fr/tel-00578735/document. It covers the same topic, but applied to distributed software and infrastructure, which ends up being way more familiar to me 🙂

The point where I am right now is just over getting the definitions and concepts right.
I will try to describe what I understand here about automated, autonomous and autonomic systems.
There is some progression from the first to the second, and from the second to the third concept.
Let’s start with automated. An automated system is just like an automaton in the old world : something that will execute a series of commands, on the order of a human (or another system). For example, you have a thermostat at home that sends the inside and outside temperatures to the heater controller.

There is no brain in there, or almost none.
The next step up is an autonomous system. This one is able to take decisions and act on the data it captures. To continue with the thermostat example, you have a heater controller which will take the current temperature, from both inside and outside, and decide whether to start heating the house, and how.
The short version is that the system is able to execute a task by itself.

And then we have an autonomic system. This one is able to take a higher view of its environment, and should be able to ensure that it will always be able to execute its tasks. The heater example does not stretch that far, so let’s take a smart mower. Its first degree of autonomicity is the way it controls its battery level and returns to its base station to recharge, in order to ensure that it will be able to continue its task, which is mowing the lawn.
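Here is a toy sketch, in Python, of the progression between the three levels, reusing the thermostat, heater and mower examples. It is obviously a caricature, but it helped me fix the vocabulary in my head.

class AutomatedThermostat:
    """Automated : reports measurements, takes no decision."""
    def read(self):
        return {"inside": 18.5, "outside": 7.0}  # would come from real sensors

class AutonomousHeater:
    """Autonomous : decides and acts on the data it receives."""
    def __init__(self, target=20.0):
        self.target = target
    def step(self, reading):
        return "heat" if reading["inside"] < self.target else "idle"

class AutonomicMower:
    """Autonomic : also watches over its own ability to keep doing the job."""
    def __init__(self):
        self.battery = 100
    def step(self):
        if self.battery < 20:
            return "return to base and recharge"  # preserve the ability to mow later
        self.battery -= 5
        return "mow"

heater = AutonomousHeater()
print(heater.step(AutomatedThermostat().read()))  # -> "heat"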
There are multiple pillars of autonomicity. Rémi Sharrock described four in his thesis, and I tend to trust him on this.

Each of these four pillars can be implemented in the system, to various degrees.
I am not yet comfortable enough to describe the four pillars precisely, but that will come in a future post!

Going back to my (our) roots

Yes, another post with an obscure reference for a title.

After some time discussing tech subjects, I was of a mind to go back to something that has often been misread in the past by IT teams and IT management. And by that I mean : business. Yes, again.

Do not misunderstand me, I am still a technologist, and I love learning about technology, finding out the limits and possibilities of any new tech that is coming out. I am not a sales person, nor a marketing person. However, I have been exposed to many well-crafted presentations and talks over the years, and what often came out of even the most interesting ones was this : “our tech is fantastic, buy it!”

All right, I love that tech. Be it virtualisation, SAN, vSAN, public cloud, containers, CI/CD, DevOps… choose whatever you like. But technology is not an end in itself in our day-to-day world. What matters is what you will do with it for your company or customers.

I will take an example. An easy shot at someone I admire. Mark Russinovich, CTO of Azure, and longtime Windows expert (I would use a stronger term if I knew one 🙂 ). A few months ago, during a conference, he had a demo running where he could spin up thousands of container instances in a few seconds, with a simple command.

First reaction : “Wow!”

Second reaction : “Wooooooowwww!”

Third reaction : “How can we do the same?”

Fourth reaction (probably the sanest one) : “Wait, what’s the point?”

And there we go. What was the point? For me, Mark’s point was to show how good Azure tech is. Which is his job, and this demo made that very clear. But Mark did go further, as he usually does, and encouraged everyone to think about the usages. Unfortunately, most of the people I have discussed it with seem to miss the point. They see the wow effect, and want to share it. But few of us decide to sit down and think about what the use case could be.

And that is the difficult, and probably multi-million dollar question : how to turn amazing technology into a business benefit.

Never forget that, apart from some very lucky people, we are part of a company that is trying to make money, and our role is to contribute to that goal. We should always think about our customers, internal or external, and how we can help them. If doing that involves playing with some cool toys and being able to brag about it, go for it! But it does not work the other way around.

PS : to give one answer to how we could use Azure Container Instances in the real world, especially the kubelet version of ACI, try to think about batch computing, where you would periodically need to spin up dozens or hundreds of container instances for a very short time. Does that ring a bell?
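For illustration only, here is a rough sketch of that batch pattern, simply shelling out to the Azure CLI from Python. The resource group, image and chunk names are all made up; in a Kubernetes setup, the kubelet connector mentioned just above would take care of the scheduling for you instead.

import subprocess

# All names here are invented; the point is the pattern, not the exact values.
RESOURCE_GROUP = "rg-batch-demo"
IMAGE = "myregistry.azurecr.io/worker:latest"

def run_batch(chunks):
    """Spin up one short-lived container instance per chunk of work."""
    for i, chunk in enumerate(chunks):
        subprocess.run(
            [
                "az", "container", "create",
                "--resource-group", RESOURCE_GROUP,
                "--name", f"worker-{i}",
                "--image", IMAGE,
                "--restart-policy", "Never",  # run once, then stop
                "--environment-variables", f"CHUNK={chunk}",
            ],
            check=True,
        )

run_batch(["2018-01", "2018-02", "2018-03"])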

PPS : I could not find the exact session from Mark that I am describing here, but there is an almost identical session from Corey Sanders and Rick Claus there : Azure Container Instances: Get containers up and running in seconds

My very first public presentation – feedback

There we are, I have finally given my talk about Kubernetes and Azure.

It was both more and less than I expected.

It was easier than I expected, once I got there, to settle into the position of a speaker. My fellow speakers were very kind and supportive, which helped with the pre-stage flutters 🙂 It was also easier because the room was of a reasonable size, and I was not on stage in front of 500 people.

And it was less of a deep dive than I expected, which also allowed me to relax a bit. I could not get a feel for the audience before going there, which left me in the dark regarding their needs and expectations.


Let’s set the stage. The event took place at Microsoft’s Building 20, which is a Reactor (https://developer.microsoft.com/en-us/reactor/). So the building is definitely designed to host events comfortably. That helped a lot, as we even had someone from the A/V team to help us and ensure all the screens and microphones would be working correctly. And yes, the free coffee might also have been a huge help 🙂

The room was large, without any raised platform for the speaker, but with multiple repeat screens all around.

I was the third speaker, so I definitely had some time to review my slides and demo setup a few times.

I did set up the demo environment the night before, to avoid any deployment issue at the last minute (which did happen two days before, while I was practicing). Once again, having a scripted demo ensured that I would not forget any step, or mess up some command-line options.


I did have a few issues during the talk. First, the mic stopped working at some point, due to a failed battery. I kept on speaking without it, as the room was small enough to let me speak louder for a short time and still be heard. The support guy came shortly after to replace the battery, so no big issue there.

My remote clicker worked perfectly, but not the pointer part. That’s a shame, because it made it more difficult to point at a precise section of a slide or demo. Afterwards I found out why, and I should be able to avoid that particular issue in the future.

I did not get as much interaction as I hoped I would. I think that it was mostly due to my anxiety, which prevented me from behaving like my normal self and being engaging.


What would I change for the future? First, for a set event like this one, I would practice in front of a camera, or a mirror, to actually see and listen to my own speech. That would probably ensure that I keep the correct pace and articulation, and also make sure that the flow of slides is comprehensible.

Second, I would work more on knowing the expectations of the audience. It turns out that my talk was way too technical and faster than it should have been. While discussing with the attendees afterwards, I realized that I did not get many of the points through, probably because I went over them too fast. This brings me back to the interaction point above : had I been more comfortable and interactive, I could have caught that during the session and corrected it.

Third, I should probably think about learning a bit more about controlling my voice and projecting it. I realized that during the week leading to the event, as I had to speak in a loud environment, and present/discuss the same kind of subjects.


Labs

A word on the hands-on labs we had in the afternoon. I was just glad to have stayed for that part.

First, because I had never been on the proctor side before, and it’s really fascinating to see a problem through the eyes of someone with a different mindset and culture. I really learned a lot, and realized a lot, during these 2 hours.

Second, because it showed me the areas where my presentation had been lacking, and how much I had not been clear enough to be understood by everyone. I think these discussions with the attendees were the deepest feedback and best improvement tips I could get.

For the record, the container labs we used are there : https://github.com/Azure/blackbelt-aks-hackfest/

That’s it for now. This first talk has unlocked something and made me realize that I should talk at every occasion I can, and that I love it, at least when it’s done 😉


My very first public presentation – preparation

I’m writing this a bit ahead of time, as I plan to write a follow-up to compare what is planned against what will have happened.


As the title suggests, I will be hosting my very first public session on the 21st of April. I am taking part in the Global Azure Bootcamp, a worldwide community event where experts from around the world gather locally to share their experience and knowledge of Azure. I would probably have preferred to be involved in an event in France, however I am in Seattle that week, so my event of choice will be directly @Microsoft in Redmond.

This will be an occasion for multiple first times for me : first time on my own as a public speaker, first participation in Global Azure Bootcamp, first time presenting fully in English, and first time presenting in Redmond of course 🙂 So, big step far out of my comfort zone.


The aim of this post, as stated above, is to record what I did to prepare for the event, and afterwards, to write down what went right and wrong, and how I can progress and do better.


I have chosen the topic of containers & Kubernetes on Azure for two reasons : first, I am rather comfortable with the subject, and second, a colleague, Jean Poizat, https://www.linkedin.com/in/jean-poizat-0a97bb/, had already built a slide deck and demo which I could expand from.

Obvious first step then : I chose familiar ground and existing material, to limit the amount of work needed. This however presented a challenge : starting from slides which I did not write, and getting familiar with them, before rearranging & completing them to suit my purpose and comfort.

A word on how I got out of my comfort zone : a nice kick in the back end! I saw on some social networks a few friends and colleagues getting ready for GAB in France, which prompted me to start collaborating, at least to give a hand. Once I realized I would be in Seattle at that time, I contacted the local event owner Manesh Raveendran, https://www.linkedin.com/in/maneshraveendran/, to offer my help, in broad terms. It took me a while to be able to suggest the session I will be presenting, and I almost chickened out a few times. But once Manesh wrote me in, that was it, I had to make this work!

The next step was to get very familiar with the presentation and with the associated demos. I started presenting to myself, but out loud and standing. This allowed me to work on my speech, content and speed, and fine-tune the slides. I also quickly incorporated the demos, to work out how to time things, and how to work around a failing demo.

I started 10 days before the set date, with the slides & demo mostly ready. I planned a minimum of one deck run every two days, which I would then adjust depending on my comfort and accuracy.

During these dry runs, I would keep a piece of paper next to me, to write down whatever thoughts, questions or clarifications were needed. These would affect the speech, the slides, or even the demo.

In between these runs, I would review the slides as much as I could every day.

I did not spend as much time reviewing the demo, as Jean had provided me with a solid script that would mostly run by itself, on my cue. The few manual demos were quite simple, and worked every time.

I was also lucky enough to meet with several architects during that time, who were kind enough to give me their feedback on my slides, and even to let me rehearse in front of them, and give me their impressions and advice. That was a big help, and a great comfort as showtime loomed closer 🙂

I am now a few hours from the actual session; I will submit this post and start writing the follow-up right after the session.

Stay tuned!


PS : the program for the Redmond event is there : https://www.azurecommunityevents.com/#/event?181C8806-AFB7-4142-B0D3-B1858E9E8956

IoT everywhere, for everyone

Today is another attempt to explain part of the Microsoft Azure catalog of solutions.

As I did write about the different flavors of containers in Azure, I feel that it’s time for a little explanation about the different ways of running your IoT solution in Azure.

There are three major ways of running an IoT platform in Azure : build your own, Azure IoT Suite and IoT Central.

There are some sub-versions of those, which I will mention as I go along, but these are the main players. I have listed them in a specific order, on purpose :

There you have it, I actually do not have to write another word 🙂


Alright, some words anyway. At the far end of the figure, you have what has always existed in the cloud and before. If you want a software stack, you just build it. You will probably use some third-party software, unless you really want to write everything from the ground up. Let’s assume you will at least use a DBMS, probably a queuing system, etc. You might go as far as to use some PaaS components from Azure (IoT Hub is a good candidate obviously, along with Stream Analytics). Long story short, you will have complete control over the stack and how you use it. But with great power… etc. It is a costly solution, in terms of time, money and people. And not only as an upfront investment : you will also have to maintain that whole stack, and even provide your users with some kind of SLA.


Let’s say you are not ready to invest two years of R&D into your platform, and want to be able at least to get your pilot on track in a few days. Here comes Azure IoT Suite. It is a prepackaged set of Azure PaaS components that are ready to use. There are several use cases fully ready to deploy : Remote Monitoring, Predictive Maintenance, Connected Factory. You can start with one of those, and customize it for your own use. Once it is deployed, you have full access to each Azure component and you may evolve the model to suit your own needs. There are some very good training materials available, with device simulators. You can start playing with a suite in a few hours, and see the messages and data go back and forth. You still have to manage the components once they are deployed, but since they are PaaS, the management overhead is rather limited. It remains your responsibility to operate them, though.
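To give an idea of how little it takes to see those messages flow, here is a small device simulator. I am assuming the azure-iot-device Python package and a device connection string copied from the IoT Hub that the suite deployed for you; any other device SDK would do just as well.

import json
import random
import time

from azure.iot.device import IoTHubDeviceClient, Message

# Placeholder : paste the connection string of a device registered in your IoT Hub.
CONNECTION_STRING = "HostName=...;DeviceId=...;SharedAccessKey=..."

client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
client.connect()
try:
    for _ in range(10):
        telemetry = {"temperature": 20 + random.random() * 5,
                     "humidity": 60 + random.random() * 10}
        client.send_message(Message(json.dumps(telemetry)))  # visible in the hub within seconds
        time.sleep(5)
finally:
    client.disconnect()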


At the other end of the spectrum, we have Azure IoT Central. IoT Central is a very recent solution to help you start your IoT project. We have been lucky enough to discuss the solution early on, and I have to admit I was convinced very early by the product and the team behind it. So, the point is : you have a business to run, and you might not want to invest millions to build and run something that is not your core business. Start your IoT Central solution, configure a few devices, sensors, users and applications, and you’re done. Pilot in minutes, production in a few hours.

And like a good SaaS solution, you do not operate anything, you do not even have to know what is under the hood.


To conclude, I’ll say that the SaaS, PaaS and IaaS subtitles on the figure are there to remind you that the same choice principles apply here as anywhere in the cloud world : it is a trade-off you have to make between control and responsibility.

Azure SLAs

Another quite short post today, but for a complex topic.

I have had this discussion several times with our customers, and more recently with several Microsoftees and MS partners.
The discussion boils down to “SLAs for Azure are complex, and you might not get what you think”.
And I’ll add “you might get better or worse than you are used to on-premises”.

Quick reminder, the official SLA website is here : https://azure.microsoft.com/en-us/support/legal/sla/
They are adapted quite frequently and what I write today might be proven wrong very soon. Yes, it happens, sometimes I am right for a long time 🙂

Back to our SLAs. I will focus on one service, but the idea can be expanded to almost all services.

Some service SLAs are quite easy to figure out. Take Virtual Machines (Azure or not) for example. You just have to decide what metric proves that a VM is alive (a ping reply for example), and measure that. Do some computation at the end of the month, and you’re done.
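For the sake of illustration, that end-of-month computation could be as simple as the sketch below. The credit tiers are made-up values to show the mechanics, not the actual Azure figures; always check the SLA page for the real ones.

# One availability sample per minute over the month (True = the ping reply came back).
minutes_in_month = 43_200
samples = [True] * 43_100 + [False] * 100  # 100 minutes of downtime

downtime = samples.count(False)
uptime_pct = 100 * (minutes_in_month - downtime) / minutes_in_month
print(f"monthly uptime: {uptime_pct:.3f}%")  # 99.769%

# Illustrative credit tiers only.
if uptime_pct < 99.0:
    credit = 25
elif uptime_pct < 99.9:
    credit = 10
else:
    credit = 0
print(f"service credit: {credit}%")  # 10% in this example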

With backups, the official SLA is a monthly uptime percentage, which does not mean much to me when speaking of backups. Luckily, there is a definition of “downtime” :
“Downtime” is the total accumulated Deployment Minutes across all Protected Items scheduled for Backup by Customer in a given Microsoft Azure subscription during which the Backup Service is unavailable for the Protected Item. The Backup Service is considered unavailable for a given Protected Item from the first Failure to Back Up or Restore the Protected Item until the initiation of a successful Backup or Recovery of a Protected Item, provided that retries are continually attempted no less frequently than once every thirty minutes.

Meaning basically that the “backup service” has to be available at all times, whether you try to back up or restore. But, and there are actually two buts, there is no hard commitment there. Microsoft will give you back a service credit if the service is not provided, up to a limit of a 25% credit. In the worst case, you could get no service at all for a month, and you would get a 25% service credit. And the second, more important, but : there is absolutely nothing about a guarantee on your data. You could lose all of your data, and at most get a 25% service credit.
Some people would then point you to the storage SLA, stating that once the backup is stored, the SLA that applies is the one from storage. Another but here, as we are in the same situation : no commitment about your data.

One note : I never looked closely at the SaaS services’ SLAs (Office 365 for example), but I remember someone from Microsoft IT saying that it was too difficult, and expensive, even for them, to build the infrastructure and services to compete with what Office 365 offers. So yes, you might dig into their SLAs and find that they commit to rather little… but think hard about what you could do yourself, and how much it would cost you 🙂

Do not get me wrong, Microsoft does a quite good job with its SLAs, and from my experience, a way better job than most companies can do internally or for their customers. I worked for a hosting company, and I can assure you that we could write down an SLA about backups, and even commit to it. We could pray that we would be right, and prepare the compensations in case we were at fault, but that was it. There was no way for us to economically handle a complete guarantee.