Al Issa's Blog: 2012

Saturday, December 1, 2012

If nothing else, be authentic

This last week I attended re:Invent, the Amazon Web Services conference, in Vegas. It was a good conference in a good venue, and I learned a lot of good technology stuff. One non-technical session I really enjoyed was a keynote address in the style of a fireside chat with Jeff Bezos (you can see it here).

Bezos is an intense, really smart guy. What he has done at Amazon is incredible (interesting side note: he does not have a degree in business, but a degree in Electrical Engineering and Computer Science). What struck me most about his talk was his authenticity. He would admit when he didn't know something, and talk about mistakes he has made. He talked about working customer service calls and sweeping the warehouse floors at Amazon. Not a lot of pretension there.

Authenticity is key in a leadership role. Engineers are especially good at picking up phoniness, and once they sense you are phony, they lose respect for you; and without their respect, you can't lead. Besides, it requires too much negative energy to keep pretending to be something you are not. If you don't know, say you don't. If you have doubts, say you have doubts - even if they are about yourself. If someone else knows better, ask their opinion.

Authenticity requires a fair amount of self-awareness, which few people are really willing to spend time on. You have to know who you are, what you know and what you don't, and, most importantly, be willing to put your ego aside and not be the smartest guy in the room.

Authenticity also means that you recognize that people "under" you are just like you. You are no better than them, and you are all working for a common purpose. You also need to recognize that in many (if not most) cases, those working for you contribute more than you do to the success of what you are working on.

I love this quote from Jack Welch, from his book Winning (hat tip to TechCrunch):

"When I was at GE, we would occasionally encounter a very successful executive who just could not be promoted to the next level. In the early days, we would struggle with our reasoning. The person demonstrated the right values and made the numbers, but usually his people did not connect with him. What was wrong? Finally, we figured out that these people always had a certain phoniness about them. They pretended to be something they were not — more in control, more upbeat, more savvy than they really were. They didn’t sweat. They didn’t cry. They squirmed in their own skin, playing a role of their own inventing. A leader in times of crisis can’t have an iota of fakeness in him. He has to know himself — and like himself — so that he can be straight with the world, energize followers, and lead with the authority born of authenticity."

If you want to have success and really lead, lay aside the ego and start to be authentic. As counterintuitive as it may sound, those that work for you will have more respect for you, approach you more, and enjoy working with you more. Oh yeah - and your team will have more success.

Tuesday, October 30, 2012

How we got through the October AWS Service Degradation

After planning and practicing for Amazon Web Services outages at BuildLinks, we got to exercise our plans when AWS had an East Coast service degradation this month.

The summary - AWS had a single zone failure in their US East region. We initiated our plan to evacuate the failed zone and we were able to continue to deliver services to our clients from the non-affected AWS zones in the US East. I am very proud of our infrastructure team and how they jumped on this and made sure we continued to deliver service.

We build, plan and practice for this stuff; it doesn't just magically happen. We went through a drill for this very scenario just a week before the AWS service degradation. Also, as I have mentioned in a previous blog post, if you are too attached to an EC2 instance, you are probably doing something wrong.

The thing about product infrastructure and security is this: no one really appreciates it until something goes wrong. As a CTO, it's part of my job to make sure we can deliver product to our customers who run their businesses on our platform. I have to champion this all the time - "Yes, it is worth our time and effort to build a robust and secure platform, not just add features to the product".

We have some principles that govern how we deliver services on AWS, which helped us in this last service degradation.

Front end server EC2 instances are stateless. If one goes down, there is another to take its place and customers don't get affected.
Front end server EC2 instances are templates. They don't store configurations and application code. When a front end server starts up, it is told what it is, finds its configuration and application code in a pre-configured S3 bucket, copies these locally, and starts up. This allows us to run as many of these as we need, in multiple availability zones, and bring them up and down without issues. We can also modify the config and code in one place and have the front end servers pick up the new versions easily.
Front end services run in multiple availability zones fronted by Elastic Load Balancers (ELBs). If a zone goes down, the ELB moves traffic to non-affected zones. If ELBs have trouble, we can bypass them and shard traffic directly to the instances by reconfiguring DNS (we had to do this for the last service degradation).
Back end databases are stateful (of course), but they are actively replicated to multiple availability zones. When one zone goes offline, we fail over to the non-affected zone (again, we had to do this for the last service degradation). For paranoia, we back up the databases to RackSpace.
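
To make the "template" idea concrete, here is a minimal sketch of what a front end server might do at startup. It is an illustration under assumptions, not the actual BuildLinks code: the bucket is simulated with a plain dictionary instead of a real S3 client, and the role names, key layout, and the `bootstrap` function are all hypothetical.

```python
from pathlib import Path

# Hypothetical stand-in for the pre-configured S3 bucket: object key -> contents.
# A real server would list and download these objects with an S3 client instead.
FAKE_BUCKET = {
    "web/config/app.conf": b"port=8080\n",
    "web/code/app.py": b"print('hello')\n",
    "worker/config/app.conf": b"queue=jobs\n",
}


def bootstrap(role: str, bucket: dict, dest: Path) -> list:
    """Copy every object stored under the role's prefix into dest.

    The instance itself holds no configuration or code; it is only told
    its role, and everything else comes from the bucket.
    Returns the list of local paths that were written.
    """
    prefix = f"{role}/"
    written = []
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue  # belongs to a different role
        target = dest / key[len(prefix):]
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(bucket[key])
        written.append(target)
    if not written:
        raise ValueError(f"no configuration found for role {role!r}")
    return written
```

Because the instance carries nothing of its own, any number of identical servers can be started in any zone, and a change to the bucket contents is picked up the next time a server starts.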

There is a lot more stuff we do, but these principles helped us withstand this last set of issues. We will continue to refine and work on product delivery. For example, we are interested in having our services replicated across regions (East to West coast, for example). There are tools to do this or we can roll our own. We have toyed with this, but need to dedicate more time to it.

Security and service delivery are a process, not a destination. We learned a lot and continue to refine and get better, but the moral is that you have to think about this and dedicate time to it - and never really feel that you are "done".

Tuesday, July 31, 2012

Why are software delivery estimates so hard to get right?

A day after Mountain Lion came out for the Mac, I upgraded. It works great and I am very happy with it. As you may have noticed, Apple didn't let anyone know the exact release date until the day before it was released. We still don't know when iOS 6 will be released by Apple - only that it is coming out sometime in the fall. This is now the norm with software vendors.

When I was getting my graduate degree in CS, I spent a lot of time thinking about software complexity, which was the subject of my master's thesis and project. I built a cool software model around Function Points, A. J. Albrecht's work for measuring complexity.

Software complexity is important, because understanding complexity helps you understand how long it may take you to build something. If you know that a software system has, say, n complexity before you start building it, and you know how long it takes to build something of n complexity (based on historical data), then you can estimate how long it will take to build that software system.
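
The arithmetic behind that idea is simple, and a small sketch makes it explicit. This assumes complexity is expressed as a single number (for example a function point count) and that effort scales linearly with it, which is a simplification; the `estimate_weeks` function and the sample history are made up for illustration.

```python
def estimate_weeks(complexity: float, history: list) -> float:
    """Estimate build time from historical (complexity, weeks) pairs.

    Uses the overall historical rate (total weeks / total complexity)
    and applies it to the new system's complexity.
    """
    total_complexity = sum(points for points, _ in history)
    total_weeks = sum(weeks for _, weeks in history)
    if total_complexity <= 0:
        raise ValueError("history must contain some completed work")
    weeks_per_point = total_weeks / total_complexity
    return complexity * weeks_per_point


# Invented history: two past projects as (complexity, weeks taken).
PAST_PROJECTS = [(100, 10), (200, 30)]

# 40 weeks over 300 points of past work, applied to a 150-point system.
print(round(estimate_weeks(150, PAST_PROJECTS), 1))
```

The formula is the easy part. The estimate is only as good as the complexity number fed into it, and that number is exactly what is hard to know up front.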

The fundamental problem is that you never really know upfront how complex a software system is, and hence how long it will take you to build. This is something that people with production-minded backgrounds have a really difficult time getting their minds around. It's not like manufacturing printers or building houses - software is not constrained by the laws of physics, has many more moving parts, and can be changed at the very last minute. As a matter of fact, in a good software development process, you delay as many decisions as possible - because you are always learning things as you are putting product together. Some people say developing software is more like writing a book or producing a movie, and that sounds about right to me - you can and should change things at the last minute, as long as you understand the costs.

Also - the larger the project, the bigger the estimate risk. I personally like breaking up large projects into smaller deliverables, which gives you insight into how well the product development is going. In our current product, I like (and use) two-week sprints and 10-week release cycles. That means that every 10 weeks we deliver new product like clockwork. If something will take more than 10 weeks, we break it up so that deliverables fit into that window. If a 10-week release cycle deliverable is not useful to the end user, we keep it in-house - but interestingly that is rarely the case. We can usually find value to deliver every 10 weeks.

A time-constrained release cycle also gives you a time box for delivering value. Many times we will constrain the delivery of a feature to a release cycle. This forces us to make decisions about what is really important. There is a lot that can get left behind, but we often find those things were not that important.

It also really helps to have senior software developers with deep experience. Even so, it's non-trivial to estimate how complex a software system is before you build it. But if you break up your product into small deliverables, and time constrain those deliverables, you have a good chance of hitting delivery estimates.

Wednesday, July 25, 2012

It's an app!

This week we delivered an app for iOS devices. It's a mobile portal to our enterprise operations platform, and it actually is pretty cool, robust and functions well. It connects to our EC2-based infrastructure over a secure connection. You wouldn't use it unless you were a customer.

Our app is non-trivial - it interacts with our cloud-based systems to do schedule calculations and manage complex tasks. It took us a while to develop and I think we got the first pass right. Like any other piece of software, there are things we can improve on, bugs we will find, and workflow improvements we will make over the next few release cycles, but I am proud of what our team has accomplished.

This is not my first iOS app, but I learned some things again in the process.

First - having a good iOS developer in-house to prime the pump is crucial. Once the pump is primed, good software developers with deep experience can jump in, pick it up, and contribute. We did the real thing - Objective-C using Xcode. I have not been impressed with third party iOS development packages, and they lock you into their framework.

Second - the app store approval process from Apple is still a pain. They filter out incompetence by the complexity of the process. Maybe it's on purpose.

Third - Mobile user interfaces are very different from web portals (no surprise). We didn't look to re-invent the wheel here. We looked at what others were doing and followed similar UI patterns. No need to complicate things.

The iOS ecosystem is cool. I don't mind Objective-C and Xcode - I just wish approval and deployment were easier.

Wednesday, July 18, 2012

Agile: dropping the Scrum training wheels

Many of us have been using parts of the Agile Manifesto for many years. What you find in the agile movement is a rollup of good practices that many of us long-time software development practitioners have found to work. I am a big fan of agile principles, specifically short development iterations, the focus on working code, and the focus on human interactions over documentation.

A couple of years ago we introduced scrum at my current company. Scrum enforces good practices that agile talks about, by introducing daily stand ups, sprints, demos, etc. It also introduces a Scrum Master - a facilitator who enables communications and makes sure that decisions are made. A good Scrum Master also keeps the process wheels well lubricated to make sure everything moves along well.

We went all "in" on scrum - story boards, story pointing, estimates - everything pretty much by the book. We bought Jira + GreenHopper to help manage everything. All went pretty well the first year. Every sprint and release cycle we got better at the process. People acted like adults and owned things. Estimates were made, product managers were brought into the loop. It also helped tremendously that I hired a Scrum Master who had lots of deep experience in Scrum, and who is a fantastic communicator and facilitator.

Then, we started dropping parts of scrum. Scrum helped us stick to agile principles, but over time, we started dropping some scrum things that we grew out of. Scrum purists would scream foul, but I have always believed that process should adapt to your people and circumstances, not the other way around. I believe we do agile, but not scrum anymore.

Here are some things we dropped:

A full time Scrum Master. Our Scrum Master is now our director of Software Development. He still helps make sure that the process works well, but by and large the role of scrum master has been taken over by the software development teams themselves. We have four development teams of about two developers and one test engineer. They know enough about the process to move things along themselves. If they need help, they ask and we get it for them.
Story Point estimating. Story point estimating blows. It's hard to get estimates right. You have to keep a history, but when people move around teams, the estimates get all messed up. Now, we still develop stories and talk about them upfront. We still do exit criteria on stories, we prioritize stories, but we just don't story point them. How do we estimate how much can get done in a release cycle or sprint? We do rough estimates with everyone involved ("I think we will get this far"). You know what? Those work as well as all the time we put in story pointing. Sometimes we are off, but more often than not we are pretty much on.
Sprint retrospectives. We do retrospectives, but not that often. If the teams are always talking (and they should be), there should be no surprises.

We have kept stand ups, demos, sprints (two weeks) and we like Jira. The key is that we are free to evolve the process as we need to, and don't feel we need to be held back by the scrum model.

There are plenty of things I wish we could do better. We struggle to communicate business workflows well, and some developers try to be too heroic and do too much, with a tendency to leave the test team behind. But, here is the key: we talk about these things and we try to do better every sprint.

I know this is heretical to many of my scrum-devoted friends, but I say: make the process work for you and drop the scrum training wheels if you feel you need to.

Monday, July 2, 2012

Software Development: You don't have a technology problem, you have a people problem

"There is nothing new under the sun" - Ecclesiastes 1:9

We rarely invent anything new in commercial software development. Most of what we do is to apply well known patterns and technologies to the domain we are working in. Now, this does not mean what we do is trivial or easy, but generally, we are not inventing anything new.

So then, why are so many software projects late, bloated, over budget or failures? According to Tom DeMarco, in his classic book, Peopleware, it's not because of technology issues, it's because of people issues.

That's right, you don't have a technology problem, you have a people problem.

Bad communication, egos, ignorance, office politics, inexperience, weird personalities all are bigger problems than technology issues. These things all get in the way of success. Unfortunately, the vast majority of technology leadership struggles with dealing with people issues.

It's easier to deal with a build problem, or to fix a bug, than it is to figure out why Sally and Mike are not getting along on their project. Technology problems are discrete and deterministic; people are messy and emotional.

I have found that finding leadership that is both technical and people-oriented is very hard. When you have someone who has both, hold on to them, but you also have to build these soft skills in your existing staff. I do this following some loose principles:

Technology leadership needs to be technical. This is critical - you can't lead people unless you know what they do so that you can participate, lead and provide input. You have to know how to code.
Focus on "doing the right thing", not pleasing some internal stakeholder. When you focus on doing the right thing, politics and bickering take a back seat.
Don't worry about who gets the credit. This is easy to say, hard to do, but when you have a team that really believes this, great things can happen.
Remember that software development is a team sport. Empower the team to make decisions and allow them to be responsible for their actions. Treat them like adults.
Take time to talk and listen. Spending an hour listening to a person who is struggling with a decision, giving input (not telling them what to do), and helping them make a decision is a critical management responsibility. I make sure I have time every day to talk to people and listen.
Treat people like adults, not like children. Management is not like parenting. The people who work for you are adults; treat them as such and respect what they do.

It is true, there is a lot of malpractice in this industry related to technology, but the issues all start with people. People design, implement and test software. Focus on your people problems first and your technology problems second, and you will have a lot more success.

Wednesday, June 27, 2012

Cloud-based Infrastructure: If you want to be available, be transient

So, there was an outage a couple of weeks ago in the Amazon Web Services east coast region - a power problem affecting one of their zones. They have four (or now five) zones on the east coast, and apparently the others were unaffected.

What was amazing to me was not that there was an outage, but that so many users on the AWS forums had messages which essentially said "Help! I can't connect to my instance!". Well, yes, the zone was offline for a few hours so instances running in that zone were offline. But, if you rely so much on a single instance being available, trouble is coming your way.

Cloud-based infrastructure needs to be thought of as transient - that is, like saying, "I like you but I am not committed to you". If an instance or service is down, you should be able to shrug your shoulders and move on.

At BuildLinks, we are very heavy users of AWS. We don't even own a single piece of hardware (except for laptops, a couple of hubs and wireless routers). We are all "in" on AWS, but we continue to work hard and design our systems such that we are not committed to a particular zone or instance (heck, we even back up our databases to RackSpace outside of AWS, just to be paranoid). If an instance goes offline, there is another to take its place. AWS provides some services for doing this (like ELB), and some you have to build yourself. For example, we use multi-zone web server instances fronted by ELB, but we had to string together database replication ourselves - our database is actively replicated to another zone in the east coast and also to the AWS west coast region. We back up our database to S3 and RackSpace every few minutes too.
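
Not being committed to a zone or instance comes down to a simple rule: traffic only ever goes to instances that are healthy and outside any failed zone. Here is a minimal sketch of that rule, under assumptions: the `Instance` type, the `dns_targets` function, and the sample fleet are hypothetical, and in practice ELB health checks or a DNS update would do this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    zone: str
    healthy: bool = True


def dns_targets(instances, failed_zones):
    """Return the names of instances that should still receive traffic.

    An instance is dropped if it is unhealthy or if it lives in a zone
    we have decided to evacuate. No single instance is special.
    """
    return [
        inst.name
        for inst in instances
        if inst.healthy and inst.zone not in failed_zones
    ]


# Illustrative fleet spread over three zones.
FLEET = [
    Instance("web-1", "us-east-1a"),
    Instance("web-2", "us-east-1b"),
    Instance("web-3", "us-east-1b", healthy=False),
    Instance("web-4", "us-east-1c"),
]

# Evacuating one zone leaves the healthy instances in the other zones.
print(dns_targets(FLEET, {"us-east-1a"}))
```

The point of the sketch is that losing a zone changes a list, not the design. If the remaining list is ever empty, that is the cue that the system was committed to one zone after all.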

Yes, we could experience an outage if an entire region goes offline, ELB fails or something of the sort, but we would not be completely wiped out, given the replication and redundancy we took time to design and build, and continue to build and add to with every product release. Cloud-based infrastructure reliability needs to be an ongoing effort.

If you want to use cloud-based infrastructure, you need to take time to design and iterate your services for robustness - not just throw instances up with the assumption that they will be there forever. It will also keep your blood pressure down and help you sleep. :)

If you have questions, drop me a line. I am glad to share.

Thursday, June 7, 2012

Peopleware

I believe that people who are learning and growing at work are going to be more productive and positive. To that end, the BuildLinks technology leadership team is reading Peopleware, by Tom DeMarco and Tim Lister. It's a classic book on technology team management. Tom DeMarco has written several great books and articles and I am a big fan.

I read Peopleware when it first came out years ago. Re-reading it now has reminded me how deeply it has influenced my thinking on technology management. I don't agree with everything in the book, but over my career I have found that most of their insights are spot on.