Archive for the ‘Engineering’ Category

Forget CMMI!

Tuesday, November 15th, 2011

This is probably the most important blog entry I’ve ever posted.

The video is the longest video I’ve ever posted on the blog, and for that reason, I’ll keep the text content to a minimum. 

Here’s why you should watch the video:  CMMI may be entirely wrong for you, and you may not know it!

The video explains an epically crucial reality about CMMI that many agile (and other) teams are not aware of, leading them unknowingly down a path of self-defeat and damage.  All of which could be avoided with this one super-critical piece of knowledge.

You’ll thank me later.

Backstory:

The lure of seemingly limitless opportunities can be quite strong, obviously.  And, especially in tough economic times, succumbing to that lure can cause even the best of businesses to act unwisely.  Such is the lure of CMMI ratings.

Well, anything that’s very alluring can cause unwise behavior, I suppose.  Whether it’s as apparently harmless as indulging in a luscious dessert, spending money on unnecessary luxuries, or any of the equally limitless opportunities to make bad choices, doing what we want instead of doing what’s right shows up even when working with CMMI.

This blog is full of examples of such bad CMMI choices, but there’s one bad choice I haven’t mentioned much about.  That’s the choice to even try to use CMMI.

When working with a knowledgeable, concerned, trustworthy CMMI consultant, an organization should be steered away from CMMI when its circumstances don’t align well with model-based improvement using CMMI.  In some cases, it may be a matter of steering towards the right CMMI constellation (e.g., for Development, or, for Services).  However, just as whether or not CMMI is right for an organization ought to be discovered before too much energy is put into it, so should the decision about a particular maturity level within the constellation.

No CMMI constellation should be attempted if/when the organization doesn’t control the work that it does.  That is, when the work it does is controlled by another organization, such as a customer.  Or, put the other way, CMMI should only be used if/when the processes used by the people doing the work are controlled by the same organization using CMMI to improve them.

At Maturity Level 2 (ML2), almost any type of work can use the practices in that level to improve its performance and to demonstrate that the practices are in place.  However, at Maturity Level 3 (ML3), you have to be doing the type of work in the particular constellation in order to be able to use the practices in it.  If you’re not doing that type of work, the practices will be irrelevant.  Attempting to use the practices when there’s no such work being done will only cause the practices to get in the way and add nothing but frustration.

In particular, if you’re not doing work that involves structured engineering analysis, CMMI for Development at ML3 will be truly unwieldy.

Adding practices for work you’re not doing is an example of the bad behavior many organizations exhibit when they’re chasing a level rating rather than hot on the trail of performance improvements.  It’s these sorts of behaviors that are somehow rationalized as being beneficial when, in fact, they are unequivocally anything but beneficial.  They are a colossal waste of time and money and detrimental to morale and productivity.

You really need to carve out about 11 minutes to watch the video.

Counting Change

Monday, May 2nd, 2011

A more appropriate title would have been "counting changes" but it would have hardly been as interesting.  :-)

Change happens.  And often.

In particular, when a product is in its operation and maintenance ("O&M") phase, changes are constant.  (Note: O&M is frequently called "production", and this simple choice of words may also be part of the issue.)  But, too often, changes to products are handled as afterthoughts.  When handled as "afterthoughts", product features and functions receive far less discipline and attention than the magnitude of the change would have warranted had the new or different feature/functionality been introduced during the original product development phase.

In other words, treating real development as one would treat a simple update just because the development is happening while the product is in production is a mistake.  However, it’s a mistake that can be easily defused and reversed.

O&M work has technical effort involved.  Just because you’re "only" making changes to existing products that have already been designed, does not mean that there aren’t design-related tasks necessary to make the changes in the O&M requests.  Ignoring the engineering perspective of changes just because you didn’t do the original design or because the original (lion’s share of) design, integration and verification work were done a while back doesn’t mean you don’t have engineering tasks ahead of you.

In O&M, analysis is still needed to ensure there really aren’t more serious changes or impacts resulting from the changes.  In O&M, technical information needs to be updated so that it is current with the product.  In business process software, much of the O&M has to do with forms and reports.  Even when creating/modifying forms, while there may not be any technical work, per se, there is design work in the UI: the form or report itself.  And even if you didn’t do that UI design work, you still need to ensure that the new form can accept the data being rendered to it (or vice-versa: the data can be fit into the report).

It’s frightening, when you think about it, how much of the products we use every day — and many more products that we don’t know about that are used by government and industry 24/7 — are actually "developed" while in the "O&M" phase of the product life cycle when the disciplines of new product development are often tossed out the door with the packing material the original product came in.  Get that?  Many products are developed while in the "official" O&M phase, but when that happens they’re not created with the same technical acumen as when the product is initially developed.

(I have more on this topic, and how to deal with business operations for products in the O&M phase, in this Advisor article from the Cutter Consortium.)

In a sadly high number of operations I’ve encountered, once a product is put into production, i.e., is in O&M, the developers assigned to work on it aren’t top-notch.  Even in those organizations where such deleterious decision-paths aren’t chosen, the common experience is that the developers are relied upon even more for their intimate knowledge of the product and the product’s documented functionality, as would otherwise have been captured in designs, specifications, tests and similar work artifacts of new product development.  In these organizations, the only way to know the current state of the product is to know the product.  And, the only way to fix things when they go wrong is to pull together enough people who retain knowledge of the product and sift through their collective memories.  The common work artifacts of new product development are frequently left to rot once the product is in O&M, and what’s worse is that the people working on the new/changed features and functionality don’t do the same level of review or analysis that would have been done had the functionality or other changes been in work when the product was originally developed.  Of course, it’s rather challenging to conduct reviews or analysis when the product definition only exists as distributed among people’s heads.  Can you begin to see the compounding technical debt this is causing?

I’ve actually heard developers working on legacy products question the benefits of technical analysis and reviews for their product!  As though their work is any more immune to defect-inducing mistakes than the work of the new product developers.  What’s worse is that without the reviews and analyses, defect data to support such a rose-colored view seldom exists!  It’s entirely likely, instead, that were such data about in-process defects (e.g., mistakes in logic, design, failing to account for other consequences) to be collected and analyzed, it would uncover a strong concentration of defects resulting from insufficient analysis that should have happened before the O&M changes were made.
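
To make "collected and analyzed" a little more concrete, here is a minimal sketch of what tallying in-process defects by origin might look like.  The records, the change-request IDs, and the origin categories are all invented for illustration; a real organization would pull them from its own review and defect-tracking records.

```python
from collections import Counter

# Hypothetical in-process defect log for O&M changes.
# Every record, ID, and category name here is invented for illustration.
defect_log = [
    {"change": "CR-101", "origin": "insufficient analysis"},
    {"change": "CR-101", "origin": "coding slip"},
    {"change": "CR-102", "origin": "insufficient analysis"},
    {"change": "CR-103", "origin": "insufficient analysis"},
    {"change": "CR-104", "origin": "missed side effect"},
]


def defects_by_origin(log):
    """Count how many logged defects trace back to each origin."""
    return Counter(entry["origin"] for entry in log)


def share_of(log, origin):
    """Fraction of all logged defects attributed to the given origin."""
    counts = defects_by_origin(log)
    total = sum(counts.values())
    return counts[origin] / total if total else 0.0


if __name__ == "__main__":
    for origin, count in defects_by_origin(defect_log).most_common():
        print(f"{origin}: {count}")
    print(f"share from insufficient analysis: "
          f"{share_of(defect_log, 'insufficient analysis'):.0%}")
```

Nothing about the tally is sophisticated.  The point is that without reviews there are no records to count, so the rose-colored view can be neither confirmed nor refuted.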

Except in cases where O&M activities are fundamentally not making any changes to form, fit, feature, appearance, organization, integrity, performance, complexity, usability or function of the product, there should be engineering analysis.  For that matter, what the heck are people doing in O&M if they’re not making any changes to form, fit, feature, appearance, organization, integrity, performance, complexity, usability or function of the product?!

If anyone still believes O&M work doesn’t involve engineering, then they might need to check their definition of O&M.  Changes to the product are happening and they’d better be counted because, if not, such thinking fools organizations into believing their field failures aren’t related to this.  Changes count as technical work and should be treated as such.

(I have more on this topic, including how to help treat O&M and development with more consistent technical acumen, in this Advisor article from the Cutter Consortium.)

Verification, Validation, & the iPhone 4

Wednesday, July 7th, 2010

Apple, Inc. learned the hard way what happens when engineering isn’t complete.  In particular, when verification and/or validation aren’t performed thoroughly.

Verification is ensuring that what you’re up to meets requirements.  “ON PAPER.”  BEFORE you commit to making the product.  It’s that part where you do some analysis to figure out whether what you think will work, will actually do what you expect it to do.  Such as, walking through an algorithm or an equation by hand to make sure the logic is right or that the math is right.  Or, stepping through some code to see what’s going on before you assume that it is behaving.  Just because something you built passes tests, doesn’t mean it is verified.  All passing tests means is just that: you passed tests.  Passing tests assumes the tests are correct.  If you’re going to rely on tests, then the tests need to be verified if you’re not going to verify the requirements or the design, etc.  Another problem with tests is that too many organizations only test at the end.  Verification looks a lot more like incremental testing.  Hey wait!  Where’ve we seen that sort of stuff before?

Had Apple’s verification efforts been more robust, they would have caught the algorithm error that incorrectly displays the signal strength (a.k.a., “number of bars”) on the iPhone 4.  This is why peer review is so central to most verification steps.  The purpose of peer review, and of verification, is to catch defective thinking.  OK, that’s a bit crude and rude… it’s not that people’s thinking is defective, per se, but that thinking alone didn’t catch everything, which is why we like to have other people looking at our thinking.  Even Albert Einstein submitted his work for peer review.
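
Here is a rough sketch of why "it passes the tests" and "it is verified" are different claims.  To be clear, this is not Apple’s algorithm: the threshold values, the names, and the idea of a written spec table are all assumptions made up for the example.  A test whose expected values are copied from the implementation passes happily, while a reviewer who walks the same signal levels through the spec by hand finds the disagreement.

```python
import bisect

# Hypothetical spec: minimum signal level (dBm) needed to show 1..5 bars.
# These numbers are invented for illustration, not taken from any real phone.
SPEC_THRESHOLDS_DBM = [-110, -100, -90, -80, -70]

# Hypothetical implementation table containing a mistake: the steps are too
# close together, so mid-range signals display more bars than the spec allows.
IMPL_THRESHOLDS_DBM = [-110, -105, -100, -95, -90]


def bars(signal_dbm, thresholds):
    """Bars shown = number of thresholds the signal meets or exceeds."""
    return bisect.bisect_right(thresholds, signal_dbm)


def golden_test(levels):
    """A 'test' whose expected values were recorded from the implementation.

    It can only confirm that the code still does what it already did.
    """
    expected = {lvl: bars(lvl, IMPL_THRESHOLDS_DBM) for lvl in levels}
    return all(bars(lvl, IMPL_THRESHOLDS_DBM) == expected[lvl] for lvl in levels)


def review_against_spec(levels):
    """Walk each level through both tables and list the disagreements."""
    mismatches = []
    for lvl in levels:
        shown = bars(lvl, IMPL_THRESHOLDS_DBM)
        required = bars(lvl, SPEC_THRESHOLDS_DBM)
        if shown != required:
            mismatches.append((lvl, shown, required))
    return mismatches


if __name__ == "__main__":
    levels = list(range(-120, -60, 5))
    print("golden test passes:", golden_test(levels))
    for lvl, shown, required in review_against_spec(levels):
        print(f"{lvl} dBm: shows {shown} bars, spec says {required}")
```

The review function is doing nothing more than a peer would do with a pencil: checking the thinking against an independent statement of what was intended.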

Validation is ensuring the product will work as intended when placed in the users’ environments.  In other words, it’s as simple as asking, “when real users use our product, how will they use it, and will our product work like we/they expect it to work?”  Sometimes this is not something that can be done on paper, and you need some sort of “real” product, so you build a prototype.  Just as often it’s not something that can be done “for real” because you don’t get an opportunity (yet) to take your product into orbit before it has to go into orbit to work.  Sometimes you only get one shot, and so you do what you can to best approximate the real working environment.  But neither of these extreme conditions can be used by Apple as excuses for not validating whether or not the phone will work as expected while being held by the user to make calls.

Had Apple’s validation been operating on all bars, they likely would have caught this while in the lab.  When sitting in its sterile, padded vice, in some small anechoic chamber, after taking great care to ensure there are no unintended signals and nothing metallic touching the case, someone might’ve noticed, “gee, do you think our users might actually make calls this way?”  And, instead of responding, “that’s not what we’re testing here”, someone might’ve stepped up and said, “hey, does our test plan have anything in it where we’re running this test while someone’s actually using the phone?”

Again, testing isn’t enough.  Why not!?  After all, isn’t putting it in a lab with or without someone holding the phone a test?   True…  However, I go back to the same issue we saw when using testing as the primary means of performing verification… Testing is too often at the end.  Validating at the end is too late.  You need to validate along the way.  In fact, it’s entirely possible that Apple *did* do validation “tests” of the case separately from the complete system, and that in *those* tests (where the case/antenna were mere components being tested in the lab) they performed fine, and that only when the unit was assembled and tested as a complete system would the issue have been found.  In such a scenario we learn that component testing (elsewhere known as “unit testing”) is not enough.  We also need system testing (in the lab) and user testing (in real life).  Back we go to iterative and incremental…
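
The component/system/user distinction can be sketched in a few lines.  Everything below is a toy model under loud assumptions: the classes, the usable-signal floor, and the 20 dB “grip loss” are invented numbers, not measurements of any real device.  What it shows is only the shape of the problem: the same part passes in isolation and in the fixture, and fails once a hand is part of the system.

```python
from dataclasses import dataclass

# Hypothetical floor below which a call is assumed to drop (invented value).
MIN_USABLE_DBM = -100.0


@dataclass
class Antenna:
    gain_db: float = 0.0

    def received(self, incoming_dbm: float) -> float:
        """Signal level delivered by the antenna on its own."""
        return incoming_dbm + self.gain_db


@dataclass
class Phone:
    antenna: Antenna

    def received(self, incoming_dbm: float, grip_loss_db: float = 0.0) -> float:
        """Signal level for the assembled phone, minus any loss from a hand."""
        return self.antenna.received(incoming_dbm) - grip_loss_db


def component_test(antenna: Antenna, incoming_dbm: float) -> bool:
    """Antenna alone, on the bench."""
    return antenna.received(incoming_dbm) >= MIN_USABLE_DBM


def system_test(phone: Phone, incoming_dbm: float) -> bool:
    """Assembled phone in a lab fixture, nothing touching the case."""
    return phone.received(incoming_dbm) >= MIN_USABLE_DBM


def user_test(phone: Phone, incoming_dbm: float, grip_loss_db: float) -> bool:
    """Assembled phone held the way a user might hold it."""
    return phone.received(incoming_dbm, grip_loss_db) >= MIN_USABLE_DBM


if __name__ == "__main__":
    antenna = Antenna()
    phone = Phone(antenna)
    print("component:", component_test(antenna, -95.0))
    print("system:   ", system_test(phone, -95.0))
    print("user:     ", user_test(phone, -95.0, grip_loss_db=20.0))
```

Each level of test asks a different question, which is why passing the earlier ones says nothing about the later ones, and why all three belong in the plan from the start rather than at the end.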

So you see… we have a lot we can apply from ordinary engineering, from agile, and from performance improvement.  Not only does this… uh… validate(?) that “agile” and “CMMI” can work together but that for some situations, others can learn from applying both.

In full disclosure, as a new owner of an iPhone 4, I am very pleased with the device.  I can really see why people love it and become devotees of Apple’s products.  Honestly, it kicks the snot out of my prior “smart” phone in every measurable and qualitative way.  And, just so I’m not leaving anything out, the two devices are pretty much equally balanced in functionality (web, email, social, wifi, etc.)  – even with the strange behaviors that are promised to be fixed.  For a few years, this iPhone will rule the market and I’ll be happy to use it.

Besides being embarrassing, this will be an expensive couple of engineering oversights for Apple to fix.  And, they were entirely avoidable for an up-front investment in engineering at an infinitesimal fraction of the cost/time it will take to fix.  For even less than one day of their engineering and deployment team’s salary, AgileCMMI can make this never happen again.

Apple, look me up.  I’m easy to find.