Wednesday, March 16, 2011

Introducing BDD

Last Thursday, I delivered a presentation about "Behavior Driven Development with Cucumber" to my team. Considering my team's agile experience, which is none, I started with the history of Agile.

The main storyline of the history goes like this:
* 1956, the concept of a bug shows up; bug hunting involves both testing and debugging
* 1979, testing is separated from debugging, a movement pushed by Glenford J. Myers, the author of "The Art of Software Testing". Testers start to grow in their own positions and organizations, and communicate with developers through documents and meetings. The W-model evolves.
* 1990, lightweight software development shows up, to reduce the waste in traditional methodologies
* 1995, lightweight software development methodologies - Scrum/XP/Lean development - show up
* 2001, the lightweight pioneers group together, call their process Agile, and the Agile Manifesto shows up.

After the history, we started to discuss "the waste in the traditional development process". Using the W-model as a baseline, my team came up with the following waste:
* duplication
** requirements duplicate acceptance test cases,
** function specifications duplicate system test cases,
** design/component specifications duplicate integration test cases,
** and documents themselves hide bugs too
* [Alex] one team is idle
** on the W's left part, developers are busy with design/coding, while testers only review, staying almost idle.
** on the W's right part, testers are busy with all kinds of test cycles, while developers only fix bugs, staying almost idle.
* bad material generated
** documents generated on the W's left part do not benefit customers,
** most documents cannot be re-used
* [yueche] waterfall makes change hard
** the later a requirement change or code change (like a late bug fix) happens, the more it costs

Then we moved on to "How we reduce the waste" with the BDD method. Using the outside-in methodology (a small example follows the list),
* we could replace requirements with automated acceptance tests, which are user behavior,
* we could replace function specifications with system test cases, which are system behavior,
* we could replace design/component specifications with integration test cases, which are component behavior.
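
For instance, an automated acceptance test in Cucumber reads as user behavior. This is a hedged sketch with a made-up login feature, not one from our product:

Feature: User login
  Scenario: Successful login            # user behavior, readable by the customer
    Given a registered user "alice" with password "secret"
    When "alice" logs in with password "secret"
    Then she should see her dashboard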

Using outside-in, after each component is done, we can test the component itself, then integrate it into the system to do system testing, then deploy it to do acceptance testing, and all of this is automated. It looks quite like the way continuous integration works.

After the methodology, I did a short demo of Cucumber driving one of our components.

At last, my favourite part - Q&A. I love to see how different people react to new ideas, what makes them excited and what concerns them. Here are some interesting questions that I collected.

[brady] Continuous integration looks good, but what if I commit 1 line of code and it takes 2 hours to test it through?
If the committed code passes unit testing and component testing, it is proven to be good code. Then waiting 2 hours to see whether it can be integrated into the system is a worthy deal. In any case it is better than verifying it manually.

[brady] How do you know Given/When/Then can cover all scenarios?
Robert C. Martin, the author of "Clean Code", comes up with the answer to this one. Every programming language we use in our team - Perl, Java, Ruby - is a state transition language: a program has an entrance and an exit, with different states in between, and it transfers from one state to another, driven by different outside/inside actions.
Given/When/Then is a state transition language too: Given is the initial state, Then is the final state, and When is the action. So at least for the programs in our team, it should fit well.
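
To make that mapping concrete, here is a hedged sketch of Cucumber step definitions in Ruby - the Calculator class and the step wording are invented for illustration: Given establishes the initial state, When performs the transition, Then asserts the final state.

# features/step_definitions/calculator_steps.rb - hypothetical example
Given /^a calculator showing (\d+)$/ do |start|
  @calculator = Calculator.new(start.to_i)       # initial state
end

When /^I add (\d+)$/ do |amount|
  @calculator.add(amount.to_i)                   # the action that transitions the state
end

Then /^the display should show (\d+)$/ do |expected|
  # final state; raise to fail the scenario if it does not match
  raise "expected #{expected}, got #{@calculator.display}" unless @calculator.display == expected.to_i
end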

[jian] The outside-in method might not work for low-level interface changes
In a traditional waterfall team, this might not work, because waterfall usually schedules 2 months for requirement analysis, another 2 months for the design phase, then another 2 for coding. But in an Agile team, each of our stories lasts at most 2 weeks; we complete requirement, design, coding and testing within those 2 weeks, and any change limited to those 2 weeks is accepted. That's why a traditional team takes 1 year to upgrade from 2.1.0 to 2.2.0, while an Agile team takes 1 month to release from 1.094320940239 to 5.1203143294.

[jin] How do we benefit from it?
The major benefit is that we will have automation to verify the component/system/user behavior, so we can free testers from verification testing for exploratory testing. Verification testing means we know how the system should behave, but we do not know whether it actually does so - tasks like "input 1, and 1, press "+", the result is 2" are better automated, right? Exploratory testing means we do not know what is out there, what is missing in our requirements or our thoughts, and we could all take a moment to go and see.

Monday, January 10, 2011

A real story of system testing

I've done system testing something like 10 times. It's been 5 years since I joined my current team; we release twice a year, and before every release we do system testing. Well, actually I do. The bad part about my system testing is that we have no requirements, so nobody knows what to do in system testing. This is also the good part: nobody can tell whether I have done a good job or not.

Do not get me wrong. I work hard. All right?! Every time before I do system testing, I google for it, I read about it, I go through all the testing results from the past. But there are too many guidelines, rules, practices and checklists in these system testing papers, and none of them includes a real story. Take a look at system testing on Wikipedia, the section "Types of testing to include in system testing": it includes almost every testing technique I have ever heard of. How am I supposed to finish all these? I usually get 2-3 weeks, with 2-3 developers as temporary help.

Another resource I look into is my own system testing results from the past. These results show how my understanding of system testing has evolved. For the first 2 years, I was a big fan of test cases. I grouped all test cases from all components together, combined them into bigger cases - every case had to involve at least 2 components - then I executed them one by one. This was my definition of end-to-end testing. The problem with this method was that it failed to reveal new problems. All the bugs I found were bugs that had escaped from component regression testing.

For the next 2 years, I became a fan of variable checklists and scenario testing. While planning, I divided our system into different roles, including normal user, manager, admin, etc., then found their interests and their tasks. With that, I designed user scenarios. A scenario would look like:
  • manager
    • interest: manage users
    • tasks: assign the user a task, assign the user some resources, check on his/her status
Using variable checklists, e.g. the Big Var List, two developers and I defined complex steps and data. This did help us find a lot of bugs, even some security bugs using fuzzed characters. The problem with this method was that these were still function tests; all the bugs we found could have been found in component regression tests.

I just finished my system testing for 2010. This time I used a different method - risk analysis. While reading "Pragmatic Software Testing" by Rex Black, I became a big fan of risk analysis. In chapter 4, he introduces nearly 20 kinds of risks - ways that a product can fail. I drew a mind map of all these risks and took it everywhere I tested. But mapping all these risks onto our system would generate too many test ideas, so the two developers and I decided to set several filters.

The first filter is based on change: we assume all new features carry more risk than old features. Within all new features, we set a second filter based on interaction: we assume that the more components a feature interacts with, the more risk it carries. Within that feature list, we set our third filter, massive impact: we assume that the more users a feature impacts, the more risk it carries.

With that list, we labeled 3 priorities (a rough sketch of the filters follows the list):
priority C: all new features
priority B: all new features that interact with more than 1 component, environment or platform
priority A: all priority B features that might impact massive numbers of users
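
Here is a hedged Ruby sketch of how the three filters stack; the feature attributes are invented for illustration:

# Hypothetical sketch of the three filters; the field names are made up.
def priority(feature)
  return nil unless feature[:new]                 # filter 1: only new features are labeled
  if feature[:interactions] > 1                   # filter 2: components / environment / platform
    return feature[:massive_impact] ? "A" : "B"   # filter 3: massive user impact
  end
  "C"
end

priority(:new => true, :interactions => 3, :massive_impact => true)   # => "A"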

For priority A features, we consider not only functions, but also throughput - many users accessing this feature concurrently, volume - many user logs/data generated, and load - long periods of high usage, etc.

For priority B features, we consider failure modes - access component A from B while A is stopped, error handling - shut down the database during a commit, and recovery - reboot the system when the disk is full, because we know normal reboot is covered somewhere else.

For priority C, well, we didn't actually get to priority C, because priorities A & B consumed all of our resources over the past 3 weeks. But we consider that a good thing: we did find new ways to find new problems, we did test from different perspectives, and most importantly, we did spend most of our resources on the highest risks.

For example, one of our features might increase our online user accounts, doubling or tripling the current number. So we re-configured our performance test, raised the number of concurrent sessions, re-ran the performance case, and our system crashed - no new sessions were accepted. I opened a severity 2 bug for it. This is just one of the interesting bugs that we found.

I have to say I like this method. But still, there is 1 problem. While reviewing our testing results, we found that several components were not covered at all, because of the priorities. So next time, I might combine all 3 methods together, just to make sure that everything is covered - even covered once is better, right?!

Wednesday, December 22, 2010

code coverage - what's to cover

Attend a software testing conference somewhere - anywhere, actually - and you always hear the magic words "code coverage".
"Our automation can cover 90% of the code", someone says.
"Our code coverage rose from 20% to 50% this year", another one says.

When you get back, you can't wait to start your own code coverage detection: deploy some code coverage detector, execute your automation, collect the data. Then, bang! The number shoots you down. It will probably be somewhere between 10% and 40% if you have never done this before. Oh, damn! You start working hard: read the code, talk to the developers, write new tests, do whatever you can to push the little line chart up.

But what do you get when code coverage is rising? Better code? Ah... no. Unit testing provides better code, because unit testing requires refactoring - eliminating global variables, static methods, etc. - just to provide testability at the code level. That is how code gets better.
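
Here is a hedged before/after sketch in Ruby of the kind of refactoring unit testing pushes you toward; the OrderProcessor class and the $mailer global are invented, not from any real project:

# Before: hard to unit test, because the collaborator is a global (hypothetical example)
class OrderProcessor
  def complete(order)
    $mailer.send_receipt(order)
  end
end

# After: the collaborator is injected, so a unit test can pass in a fake mailer
class OrderProcessor
  def initialize(mailer)
    @mailer = mailer
  end

  def complete(order)
    @mailer.send_receipt(order)
  end
end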

Unfortunately, most of us are not using unit testing to drive automation. We drive automation through the UI or API, using TCL/Expect to call command line interfaces, using Selenium to call web interfaces, etc. When the code doesn't provide much testability - and code not covered by the specification usually has less testability - we come up with very fragile ways like record & replay to automate it, just to raise code coverage. Eventually, we end up with a lot of hard-to-maintain automation scripts, and the code still sucks.

Do you make the customer happy when code coverage is rising? Ah... no. XP or TDD makes the customer happy because XPers write automated acceptance tests before they write code. They define what the code should do first. If the code fails to match the acceptance test, that becomes a bug that needs to be fixed.

Unfortunately, most of us are working on traditional projects. We write tests following the function specification to describe what the code should do. When these tests do not cover 100% of the code, we rush out to read the code and follow the code to write our tests. That is white box testing. The problem with white box testing is that it tests what the code is doing, not what the code should do. And that seldom makes the customer happy.

Now let's sum it up and see how we can do it right, like a tester:

* high code coverage at the unit test level is good, for both code quality and testability

Just remember to refactor your tests while refactoring your code. (xUnit Test Patterns is a great book to start with.)

* if you are trying to raise code coverage with component/integration level automation
** evaluate testability first, for both driving the interface and checking the results
** ask for the developers' help

For example, say component A only accepts authenticated and valid requests. For all other requests - requests with overly long variables, invalid variables, unauthenticated requests - component A just closes the connection, returns nothing and logs nothing. This is a typical design that lacks testability.

When testing component A, we send several invalid requests and check that A does nothing. The problem is that when component A is down, our checks also pass. That's a false positive. To prevent it, we need to verify that component A is working before sending each invalid request, which means longer automation execution time and more scripts. All of this could be avoided if component A returned specific responses to different requests. And for that, we need the developers' help.
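
Here is a hedged sketch of that pre-check in a test; the ComponentDriver class and its methods are invented for illustration:

require 'test/unit'

# Hypothetical sketch: ComponentDriver is an invented helper, not a real library.
class InvalidRequestTest < Test::Unit::TestCase
  def test_invalid_request_is_rejected
    driver = ComponentDriver.new("A")
    assert driver.alive?, "A must be up first, or 'no response' proves nothing"
    response = driver.send_request(:too_long_variable)
    assert_nil response, "A should close the connection and return nothing"
  end
end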

* when you have the driving and checking interfaces, write tests to describe what the code should do

The reason that I like XP/TDD is that XPers write tests before they write code, then write code to pass the tests. Without this kind of check, developers always implement what is easier to do, not what should be done. This bad habit has almost become a practice - "there's no design that can be completely implemented", developers say. Let me re-phrase and put it this way: "there is no design that can be completely implemented without testing".

Before following the code to write tests, understand why the system needs this piece of code, then write your tests to describe how it should behave.

* A test should always describe what the code should do, not what the code is doing; therefore it is the code that needs to cover the test, not the test that covers the code.

Monday, December 6, 2010

applying risk analysis on testing planning

Give a tester 3 features A, B and C, where feature A has 100 test cases, B has 50 and C has 15. Ask him to schedule a testing cycle for all 3 features, and assume each test case takes the same time to finish. Here's the schedule you'd probably get: 100 test cases -> 10 days, 50 test cases -> 5 days, 15 test cases -> 1.5 days.


What if C is a feature heavily used in the production environment, where even a single failure would impact thousands of users and cause millions of dollars in financial losses? Does the tester get this information when he schedules testing? Does he even try to?


When we treat all features the same way, we're neglecting one thing - risk. A simple definition of risk: a way a program can fail, how likely the program is to fail that way, and how serious the impact will be when it does. As Rex Black calls it in his book, technical risk is how likely the program is to fail that way, and business risk is how serious the impact will be. Analyze the risks of features while planning a testing cycle, because a feature with high technical risk and high business risk should get more testing resources than other features.


Now how do we label the level of risk for a feature? For a brand-new project, this might need to involve developers, testers and sales. Usually, the more complex the code structure and the production environment are, the higher the technical risk. Sales or marketing folks can help with business risk, because business risk is sometimes related to the number of stakeholders and sometimes to the positions of the stakeholders.


For a mature project that has been running for years, like mine, it's quite straightforward to define risk levels for each feature (a small sketch follows the list):

* Line of Code - count code lines for each feature; more code, higher technical risk
* Bug Density - count bugs for each feature; more bugs, higher technical risk
* Support Case - count support cases for each feature; more cases, higher business risk
* User Behavior - count user logs for each feature; more usage, higher business risk
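
As a hedged Ruby sketch of how these counts could be turned into risk labels - the weights, thresholds and field names are all invented:

# Hypothetical sketch: turn the four counts into technical/business risk levels.
def risk_levels(feature)
  technical = feature[:lines_of_code] + 10 * feature[:bug_count]
  business  = feature[:support_cases] + feature[:usage_count]
  { :technical => technical > 500 ? :high : :low,
    :business  => business > 100 ? :high : :low }
end

risk_levels(:lines_of_code => 1200, :bug_count => 8,
            :support_cases => 30, :usage_count => 90)
# => {:technical=>:high, :business=>:high}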


In my project, our bug density report shows that:
* features that need to interact with the environment usually generate more bugs
* features that need to interact with other components usually generate more bugs


In his book, Rex Black mentions that after defining the levels of technical risk and business risk for a feature, testers can then define the testing effort for each feature: whether to do deep and broad testing, broad-only testing, a quick trial, etc.


But when we have solid data covering LC, BD, SC and UB, we can do more than that. With a technical risk list of features, we can define our next step of automation: which feature should have more automated tests, which feature already has enough. With a business risk list of features, we can define our next step of development: at least we know which bugs affect more users, so those bugs should get higher priority in the to-be-fixed list.


With these lists, we can stop working on priorities based on hunches. I am tired of reading emails saying "this is our priority, let's do it" or "this one looks urgent, do it first". The problem with hunches is that people usually choose the easier items, intentionally or not. At least, I know I would. :)

Wednesday, October 20, 2010

On Mentoring Software Testing Interns - Clear Assignments

Having been a software tester for 5 years, I have mentored half a dozen interns. All of them are good kids - smart, passionate about new techniques, without much experience on a real project, and sometimes careless. But the later ones are more productive. The interns who joined us this year can deliver a tool in a week or two, while it took last year's interns almost a month. There are several factors that make this difference, factors that change across the years. And it seems they are all about me.

The first thing that makes the difference is how I assign a task. Take my latest assignment as an example. In my team, we have a set of automated tests that currently needs an enhancement. I want to assign this to an intern who just got on board a month ago.

In the old days, I would book a meeting room for a whole afternoon, introducing and demoing our framework to her. Since our framework is based on an internal automation system, I have to introduce that too. The system is written in TCL - include that too. Then she finds out that our test cases are not written in TCL; oh yeah, we have adopted data-driven testing, so we have to include that concept as well. It ends up with me talking too much and her writing down too much.


When she gets back, she feels a little bit uncertain about this framework or about data-driven testing, so she starts googling and gets lost in the exploding information. A week later, I check on her status: "I'm still having a little trouble with the framework. I see several other frameworks online, better than ours, easier to understand. Do we have plans to migrate?"

The problem with the old way is the huge gap between concept and experience, the gap between knowing and doing. When you are handling a tremendous amount of information, hundreds of keywords and concepts, sometimes you just get lost.

What do I do now? I split this assignment into 4-5 days.


  • The first day's like this:
info: internal automation system help page
todo: write a test case in it, execute it, make it pass and fail once


  • The second day:
info: our product document
todo: make it work to add/delete


  • The third day
todo: write a case in automation system to call our product to add/delete


  • The fourth day
todo: abstract the data out of the scripts, so a new case is just a data copy


  • The fifth day
todo: the enhancement

Done this way, each assignment only takes me 10 minutes to introduce and 10 minutes to check up on. I divide the task into several domains, where each one is either separate or builds on the previous one, then define a clear goal for each domain. A goal like "get familiar with this automation system" is not clear. She may wander around in the help pages just trying to understand the concepts - what a schedule is, tasks, test cases, results, log levels, etc. On the other hand, "write a test case, execute it, make it pass, then make it fail" is better. She knows what to look for and what to accomplish. After doing it, after seeing the thing work, she can just figure it out: a job contains test cases, and a test case contains several checkpoints.


With this solid connection between information and experience, she learns much faster by herself, and adapts faster in her next assignment.

Monday, October 11, 2010

Testing dojo session 8 - confront with the right tone

Imagine you see another tester testing a text field. He inputs valid & invalid values, special chars, then moves on to other elements. Wait, you think, he missed the empty value and the max length of the field. You want to tell him. What will you start with?
"Hey, you missed the empty value, and the max length." No, that's too weak.
"You made a mistake." Better. But not serious enough.

Eventually you come up with this one: "You are a bad tester. I can't believe you are testing so carelessly. Do you understand how important your job is? You are the last defense of quality! If you miss 1 case, it will affect the whole product, and of course customer satisfaction." Here you go.

You start by explaining how important this product is, how important his role is, how a simple mistake would affect the whole team, blah, blah, blah. Because you want your voice to sound serious and authoritative - heard once and remembered forever. But let's be real: how many times did that work?



Now imagine you are the one testing a product on a tight schedule, after weeks of hard work. Right at this freaking busy and tired moment, someone stops by and says, "You are a bad tester." What's going to happen next? Of course you start to defend whatever you are doing.

"Oh, empty value? I covered it in a previous text field. I'm on a tight schedule; I can't cover every type of variable in every text field."
or "I'm using pairwise."
or "Of course we can cover all types for all elements, as soon as our automation is ready."

You might not have covered the empty value at all, but you would still say so. Because you work hard, and you can't just let him judge your job like this. How dare he call you a bad tester?


I find myself having a hard time confronting people in both situations. I seldom succeed, and even when I do, it costs too much time and energy, and I piss someone off.

I looked for answers from others, and found they divide into 2 groups. Group A loves to confront: it makes them feel more powerful and authoritative, so they do it as much as they can. Group B feels it is boring and produces nothing, so they simply shut up. Neither group solves the problem.


Then we started the testing dojo. Confronting others becomes a must for every session; the performer is always challenged. It gives me a close look at how people confront each other, how they state their opinions, and how they receive them. I find that the more emotional, personal and judgmental an opinion sounds, the harder the defense will be. And even though everyone knows we should take nothing personally, it is quite easy to make a conversation personal. See if you say or hear these a lot:

"You are making a mistake."
"Clearly you didn't understand how important this is to us, if you do, you wouldn't do such a bad job."
"You should improve your testing skills."
"Why you didn't test that?"
"How could you miss such a simple case?"


In the dojo, we challenge people to help them understand what they could do better. But "you made a mistake" doesn't give them a clue; they still don't know how to make things better. On the other hand, a precise challenge, like "we still don't know the boundary of this variable, let's find out", is quite helpful. It is easier to receive. People who get this kind of challenge don't feel offended; they consider right away whether it makes sense, and give it a try.


After realizing this, I started paying attention to the language I use in the dojo. When I'm an auditor, I say, "maybe we could try an empty value and see how the system handles it?" Less "you should..." and "I would...", more "what if we..." and "can we...".

And when I'm a performer and getting judgmental, blaming challenges, I reply with "Tell me, what do we do next?" or "What exactly should we improve here?"


Even though I'm trying hard, it is still a little hard for those who attend a dojo for the first time. Two weeks ago, we organized a testing dojo with developers, trying to improve their testing skills. 3 developers were involved; 2 of them refused to take the performer role, and the last one took the job unwillingly.

The performer was quite defensive from the beginning; he asked a lot of questions before he started testing. He said, "if I don't get it clearly, you will blame me later for not doing it." No matter how many times I told him I wasn't there to judge or blame, but to help, he still wore this "I'm angry, be nice" face the whole time.


In the book "The Fifth Discipline", the author suggests an exercise to master your confrontation skills. Draw a vertical line on a piece of paper to record a conversation. In the left column, write down the kind of tone you need; in the right column, write down the real conversation. Then see if you are who you think you are. I tried it once in a dojo, and I was surprised how much I talked just to convince someone they were wrong.

Thursday, October 7, 2010

Ruby Quiz - LCD Numbers - have you done so in code review?

After I posted my answer to the Ruby Quiz "LCD Numbers", I asked my developer friends whether there was anything to improve in my code. Yes, I was asking for a code review. Two weeks later, I still haven't gotten any comments.

I guess the major reason is that they are not familiar with Ruby, and a minor reason is that they have never been as critical of code as I am. For them, the only standard of good code is that it passes the compiler and builds. They seldom ask questions like "why did you name this variable like that?" or "I don't understand this part, can you make it simpler?"

On the other hand, as an automation tester, I have to work with testers who do not code much. I do not need 10-level-deep loops or a super-complex design pattern. For every single line, I just need it to be simple, clean, clear, and easy to understand. That's why I started coding, and refactoring.

Since I didn't get any comments, I decided to try a code review myself. According to the best practice of writing - put your work away for 6 weeks after you've finished it, then pick it up as a stranger, and you'll find as many problems as a stranger would - I guess 2 weeks should work for these 200 lines of code.

So here it goes:
  • the first question I ask: does init have to do so many things?
def initialize(number, size)
  @number = number
  @size = size
  @width = @size + 2
  @heighth = @size*2 + 3
  init_picture
  init_points
  init_sticks
end
It would be better to make init simple. So I move init_picture, init_sticks and init_points into the method paint_sticks, which makes paint_sticks look like this:
def paint_sticks()
   init_picture
   init_sticks
   init_points
   if [0,2,3,5,6,7,8,9].include? @number
     @top_stick.paint_in_picture(@picture)
   end
   if [2,3,4,5,6,8,9].include? @number
     @middle_stick.paint_in_picture(@picture)
   end
   if [0,2,3,5,6,8,9].include? @number
     @bottom_stick.paint_in_picture(@picture)
   end
   if [0,4,5,6,8,9].include? @number
     @upper_left_stick.paint_in_picture(@picture)
   end
   if [1,2,3,4,7,8,9,0].include? @number
     @upper_right_stick.paint_in_picture(@picture)
   end
   if [2,6,8,0].include? @number
     @lower_left_stick.paint_in_picture(@picture)
   end
   if [1,3,4,5,6,7,8,9,0].include? @number
     @lower_right_stick.paint_in_picture @picture
   end
 end

  • then I ask: can I switch the sequence of these 3 function calls?
def paint_sticks
   init_picture
   init_points
   init_sticks
It looks like I can. But I can't, because init_sticks depends on init_points. So I move init_points into the method init_sticks, leaving init_picture and init_sticks there; of course those two can be switched. This makes paint_sticks become init_picture, init_sticks, followed by some messy code. A better way to express it is:
def paint
   init_picture
   init_sticks
   paint_sticks_in_picture
 end

  • Then I ask: what if I do a = NumberPicture.new; a.paint(); a.paint(); a.paint()? All calculation and drawing are done in the first paint; once painted, nothing can be changed, so the following paint calls don't have to do anything. So I update paint like this:
def paint
  if !@painted
    init_picture
    init_sticks
    paint_sticks_in_picture
    @painted = true
  end
end
Here comes an interesting finding. For the method paint, even if you call it only once in unit testing, you will get 100% code coverage. But you are not done: you need to call it again to test the implicit "else do nothing" part. The challenging part is how to test that. Well, I don't know. Please leave a comment if you do.
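
One idea - just a hedged sketch, not a definitive answer: stub init_picture on that one object after the first paint, so the test fails loudly if the second call ever re-enters the painting path.

require 'test/unit'

# A sketch, assuming the NumberPicture class above is loaded.
# If paint forgot to check @painted, the singleton stub below would raise.
class NumberPictureTest < Test::Unit::TestCase
  def test_second_paint_does_nothing
    picture = NumberPicture.new(8, 2)
    picture.paint
    def picture.init_picture                 # stub, only on this one object
      raise "should not re-initialize once painted"
    end
    assert_nothing_raised { picture.paint }
  end
end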

  • after review, the code looks like this:
require 'pp'
require 'point'
require 'stick'

class NumberPicture
   EMPTY = " "
   attr_reader :width, :heighth, :size, :painted
   attr_accessor :picture
 
   def initialize(number, size)
     if size.is_a?(Integer) and size>0 
       @size = size
     else
       raise "size #{size} must be integer and >0"
     end
     @number = number
     @width = @size + 2
     @heighth = @size*2 + 3
     @painted = false
   end
    def paint()
     if !@painted
       # TODO: how to unit test this?
       init_picture
       init_sticks
       paint_sticks_in_picture
       @painted = true
     end
   end
 
   private
 
   def init_picture()
     @picture = Array.new()
     (0..@heighth-1).each do |i|
       @picture[i] = Array.new(@width-1, 0)
       (0..@width-1).each do |j|
         @picture[i][j] = EMPTY
       end
     end
   end
 
   def init_points()
     @top_left = Point.new(0, 0)
     @top_right = Point.new(0, @width-1)
     @middle_left = Point.new(@size+1, 0)
     @middle_right = Point.new(@size+1, @width-1)
     @bottom_left = Point.new(@heighth-1, 0)
   end
 
   def init_sticks()
     init_points
     @top_stick = HorizontalStick.new(@top_left, @size)
     @middle_stick = HorizontalStick.new(@middle_left, @size)
     @bottom_stick = HorizontalStick.new(@bottom_left, @size)
     @upper_left_stick = VerticalStick.new(@top_left, @size)
     @upper_right_stick = VerticalStick.new(@top_right, @size)
     @lower_left_stick = VerticalStick.new(@middle_left, @size)
     @lower_right_stick = VerticalStick.new(@middle_right, @size)
   end
 
   def paint_sticks_in_picture
     # TODO: eliminate process redundant
     if [0,2,3,5,6,7,8,9].include? @number
       @top_stick.paint_in_picture(@picture)
     end
     if [2,3,4,5,6,8,9].include? @number
       @middle_stick.paint_in_picture(@picture)
     end
     if [0,2,3,5,6,8,9].include? @number
       @bottom_stick.paint_in_picture(@picture)
     end
     if [0,4,5,6,8,9].include? @number
       @upper_left_stick.paint_in_picture(@picture)
     end
     if [1,2,3,4,7,8,9,0].include? @number
       @upper_right_stick.paint_in_picture(@picture)
     end
     if [2,6,8,0].include? @number
       @lower_left_stick.paint_in_picture(@picture)
     end
     if [1,3,4,5,6,7,8,9,0].include? @number
       @lower_right_stick.paint_in_picture @picture
     end
   end
 end
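
As a side note on the remaining TODO about redundancy in paint_sticks_in_picture, here is one table-driven shape I might try next - just a sketch, meant to live inside the class and reuse the same instance variables:

# A sketch: drive the painting from a table of stick name => digits that use it.
STICK_DIGITS = {
  :top_stick         => [0,2,3,5,6,7,8,9],
  :middle_stick      => [2,3,4,5,6,8,9],
  :bottom_stick      => [0,2,3,5,6,8,9],
  :upper_left_stick  => [0,4,5,6,8,9],
  :upper_right_stick => [0,1,2,3,4,7,8,9],
  :lower_left_stick  => [0,2,6,8],
  :lower_right_stick => [0,1,3,4,5,6,7,8,9]
}

def paint_sticks_in_picture
  STICK_DIGITS.each do |stick, digits|
    instance_variable_get("@#{stick}").paint_in_picture(@picture) if digits.include?(@number)
  end
end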

As a summary of this post, I ask questions including:
  • does init have to do so many things?
  • these function calls do not show a clear dependency; can I change their sequence?
  • what if I call this method again?
  • this method looks like do_A, do_B, then a mess; can I make it do_A, do_B, do_with_A&B?
  • how could this part be unit tested?
And the code does look better than when I found it. :-)