Wednesday, February 17, 2010

Regression Testing...

What is Regression Testing?

If a piece of Software is modified for any reason, testing needs to be done to ensure that it works as specified and that the modification has not negatively impacted any functionality it offered previously. This is known as Regression Testing.

Regression Testing attempts to verify:

1. That the application works as specified even after the changes/additions/modifications were made to it

2. That the original functionality continues to work as specified even after changes/additions/modifications to the software application

3. That the changes/additions/modifications to the software application have not introduced any new bugs

When is Regression Testing necessary?

Regression Testing plays an important role in any scenario where a change has been made to previously tested software code. Regression Testing is hence an important aspect in various Software Methodologies where software changes and enhancements occur frequently.

Any Software Development Project is invariably faced with requests for changing Design, code, features or all of them.

Some Development Methodologies embrace change.

For example, the ‘Extreme Programming’ Methodology advocates applying small incremental changes to the system based on end-user feedback.

Each change implies more Regression Testing needs to be done to ensure that the System meets the Project Goals.

Why is Regression Testing important?

Any Software change can cause existing functionality to break.

Changes to a Software component could impact dependent Components.

It is commonly observed that a Software fix could cause other bugs.

All this affects the quality and reliability of the system. Regression Testing, since it aims to verify exactly these aspects, is therefore very important.

Making Regression Testing Cost Effective:

Every time a change occurs, one or more of the following may happen:

- More Functionality may be added to the system

- More complexity may be added to the system

- New bugs may be introduced

- New vulnerabilities may be introduced in the system

- System may tend to become more and more fragile with each change

After the change the new functionality may have to be tested along with all the original functionality.

With each change Regression Testing could become more and more costly.

To make Regression Testing cost effective and yet ensure good coverage, one or more of the following techniques may be applied:

- Test Automation: If the test cases are automated, they may be executed using scripts after each change is introduced in the system. Executing test cases in this way helps eliminate oversights and human errors. It may also result in faster and cheaper execution of test cases. However, there is a cost involved in building the scripts. (A minimal sketch of an automated check appears after this list.)

- Selective Testing: Some teams choose to execute the test cases selectively. They do not execute all the test cases during Regression Testing; they test only what they decide is relevant. This helps reduce the testing time and effort.
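To make the automation option concrete, here is a minimal sketch of an automated regression check written with Python's built-in unittest module. The calculate_discount function and its expected behaviour are hypothetical and exist only for illustration; in a real project the function under test would be imported from the application code that was changed.

    import unittest

    # Hypothetical function under test; in practice this would be imported
    # from the application module that was modified.
    def calculate_discount(order_total, is_member):
        """Return the discount rate applicable to an order."""
        discount = 0.0
        if order_total > 100:
            discount = 0.05
        if is_member:
            discount += 0.02
        return discount

    class RegressionTests(unittest.TestCase):
        """Re-run these cases after every change to guard the original behaviour."""

        def test_small_order_no_membership(self):
            self.assertEqual(calculate_discount(50, False), 0.0)

        def test_large_order_no_membership(self):
            self.assertEqual(calculate_discount(150, False), 0.05)

        def test_large_order_with_membership(self):
            self.assertAlmostEqual(calculate_discount(150, True), 0.07)

    if __name__ == "__main__":
        unittest.main()

Because the whole suite runs from a single command, it can be executed after every change, which is exactly the repeatability that makes automated Regression Testing cost effective.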

Regression Testing – What to Test?

Since Regression Testing verifies the software application after a change has been made, everything that may be impacted by the change should be tested during Regression Testing. Generally, the following areas are covered during Regression Testing:

- Any functionality that was addressed by the change

- Original Functionality of the system

- Performance of the System after the change was introduced

Regression Testing – How to Test?

Like any other testing, Regression Testing needs proper planning.

For effective Regression Testing, the following ingredients are necessary:

- Create a Regression Test Plan: The Test Plan identifies Focus Areas, Strategy, and Test Entry and Exit Criteria. It can also outline Testing Prerequisites, Responsibilities, etc.

- Create Test Cases: Test Cases that cover all the necessary areas are important. They describe what to Test, Steps needed to test, Inputs and Expected Outputs. Test Cases used for Regression Testing should specifically cover the functionality addressed by the change and all components affected by the change. The Regression Test case may also include the testing of the performance of the components and the application after the change(s) were done.

- Defect Tracking: As in all other testing levels and types, it is important that defects are tracked systematically; otherwise the testing effort is undermined.

Tuesday, February 16, 2010

Correlation & Regression

Ref: Nayyar Raza Kazmi...

Correlation

1. Shows strength of association between two variables
2. Tells us how much the two variables are associated with each other
3. However, does not assume CAUSATION
4. Simply tells us whether the two variables are positively or negatively correlated

Regression

If there is a strong correlation between two variables, Regression is used to determine Y (dependent variable) from the X's (Independent variables on which Y depends)

Types of regressions: Linear, Multiple...

You use the Coefficient of Correlation (r) to measure the strength of the relationship between two variables

r is also called Pearson's Coefficient.

r can range from -1 to +1

Values of -1 or +1 indicate perfect correlation (-1 is a perfect negative, or inverse, relationship and +1 is a perfect positive relationship)

Values close to 0.0 indicate weak correlation
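As an illustration, here is a minimal sketch of computing r (and a simple linear regression) using Python with numpy. The hours-studied and exam-score figures are made up purely for the example.

    import numpy as np

    # Hypothetical paired observations: hours studied (X) and exam score (Y)
    hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    score = np.array([52, 55, 61, 60, 68, 72, 75, 80])

    # Pearson's coefficient of correlation r
    r = np.corrcoef(hours, score)[0, 1]
    print(f"r = {r:.3f}")  # close to +1, i.e. a strong positive correlation

    # Since the correlation is strong, fit a simple linear regression
    # to determine Y (score) from X (hours)
    slope, intercept = np.polyfit(hours, score, 1)
    print(f"predicted score for 9 hours of study = {slope * 9 + intercept:.1f}")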

Sunday, February 14, 2010

The third variable problem

In statistics, two variables may be correlated (a simple correlation or a causal relationship), or not related at all. When the value of one variable (the dependent variable) changes according to a change in the other variable (the independent variable), there is a likelihood of correlation. The correlation may be positive or negative. When an increase in one variable is accompanied by a concurrent increase in the other, the two variables are positively related (e.g. number of hours of study leading to higher exam scores). However, if an increase in one variable causes the other to fall, the two are negatively related (e.g. resistance to disease declining with advancing age).

Two “sympathetic” variables may have a mere correlation or they may have a causal relationship. Consider the case of inflation and unemployment. It is generally agreed that when inflation is high, unemployment tends to be high too, and when inflation is low, unemployment also tends to be low. In this case, one variable is taken to cause the other. When one variable causes the other to vary, the relationship is causal.

Now consider another example. In a country it is found that when ice cream consumption goes up, the number of drownings goes up too. Here we cannot assume that ice cream consumption leads to a greater number of drownings. In fact, an unnoticed third variable causes a “random and coincidental” relationship between the two variables. Such a situation, where a third variable causes a “random and coincidental” relationship between two variables, is called “the third variable problem”.
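A small simulation can make the third variable problem visible. In the sketch below, a hidden variable (daily temperature, which plausibly drives both ice cream sales and swimming, and hence drownings) generates both observed variables; they come out strongly correlated even though neither causes the other. All numbers are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(42)

    # Hidden third variable: daily temperature over a season
    temperature = rng.uniform(15, 40, size=200)

    # Both observed variables depend on temperature, not on each other
    ice_cream_sales = 10 * temperature + rng.normal(0, 20, size=200)
    drownings = 0.3 * temperature + rng.normal(0, 2, size=200)

    # The observed variables appear strongly correlated...
    print(np.corrcoef(ice_cream_sales, drownings)[0, 1])

    # ...but after removing the effect of temperature from each variable
    # (correlating the residuals), the association largely disappears.
    resid_ice = ice_cream_sales - np.polyval(np.polyfit(temperature, ice_cream_sales, 1), temperature)
    resid_drown = drownings - np.polyval(np.polyfit(temperature, drownings, 1), temperature)
    print(np.corrcoef(resid_ice, resid_drown)[0, 1])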

Saturday, February 13, 2010

CMMI High Maturity

High Maturity in CMMI for Development - Part 1

Reference: http://www.connect2hcb.com/tiki-index.php

The high maturity concept in CMMI is centered around four Process Areas (a Process Area is a set of practices aligned together for achieving a defined purpose):

  1. OPP (Organizational Process Performance)
  2. QPM (Quantitative Project Management)
  3. OID (Organizational Innovation and Deployment)
  4. CAR (Causal Analysis and Resolution)


OPP is about establishing organization-level process performance baselines and predictive/forecasting models

QPM is about planning and managing a project in quantitative terms

OID is about planning and implementing organization-wide improvement initiatives (in quantitative terms)

CAR is about systematically identifying and addressing root causes behind the problems encountered in performing a process

High Maturity in CMMI for Development - Part 2

Here are links to some excellent articles and reference material on CMMI High Maturity from the SEI website:

Note: Links no longer work. Try searching using the document names…

High Maturity in CMMI for Development - Part 3

Organizations that intend to achieve CMMI High Maturity should be able to demonstrate the following key practices:

- Declaration of process performance objectives in quantitative terms. The objectives must be set up for at least the "critical" processes and sub-processes. The way to do this effectively is to determine the organization's business goals, identify the processes and sub-processes critical to business success (in terms of achievement of business goals), define parameters and metrics for measuring process performance, and finally assign performance targets or goals for the measurement parameters.

- Clear quantitative understanding of the performance of processes and sub-processes selected as "critical", in terms of their central tendency (mean, etc.) and dispersion (standard deviation, etc.). It may be a good idea to also understand how symmetric the distribution is (skewness) and how peaked it is (kurtosis), at least for the "super-critical" processes and sub-processes. This statistical characterization represents the "Process Performance Baselines".

- Process composition based on quantitative understanding of the performance of the available processes and sub-processes. Selection of processes and sub-processes for a specific project should be based on comparing the performance objectives that must be achieved against what will, with a good likelihood, be achieved by following a selected set of processes and sub-processes.

- Usage of prediction models (or "Process Performance Models"), in the form Y = f(x), for projecting or predicting the likely outcome that will be achieved by actually performing a process. The predicted performance of the selected parameter (the dependent Y factor) is based on actual values of some influencing parameters (the independent X factors). As actual data on the X factors becomes available while the project moves forward, the projected or predicted value of the Y factor should be progressively calibrated and refined.

- Usage of control charts (SPC theory) for monitoring and controlling process performance on a real-time basis

- Usage of statistical tools such as Hypothesis Testing (t-test, etc.) for making decisions based on solid quantitative understanding of variation

- Planning and execution of improvement projects using rigorous statistical tools and techniques. This can be done using a 4-step approach as listed below (a minimal sketch of steps 1 and 2 follows the list):

  1. Establish in quantitative terms the current level of performance
  2. Determine the statistical significance of the difference between the current and the desired level of performance. If the difference is significant, statistically speaking, plan and implement improvement actions; otherwise, no action is required.
  3. In case improvement actions are taken, establish in quantitative terms the achieved level of performance
  4. Determine the statistical significance of the difference between the achieved and the desired level of performance. If the difference is significant, statistically speaking, plan and implement further improvement actions; otherwise, stop.
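As an illustration of steps 1 and 2, here is a minimal sketch using Python with numpy and scipy that baselines the current performance of a hypothetical review process and tests whether it differs significantly from a desired target. The defect-density figures and the target value are invented for the example.

    import numpy as np
    from scipy import stats

    # Hypothetical current performance data: defect density (defects per KLOC)
    # observed across recent projects
    current = np.array([4.1, 3.8, 4.5, 4.0, 3.6, 4.3, 4.7, 3.9, 4.2, 4.4])

    # Step 1: establish the current level of performance in quantitative terms
    print("mean     =", np.mean(current))
    print("std dev  =", np.std(current, ddof=1))
    print("skewness =", stats.skew(current))
    print("kurtosis =", stats.kurtosis(current))

    # Step 2: is the difference from the desired level statistically significant?
    desired = 3.5  # hypothetical target defect density
    t_stat, p_value = stats.ttest_1samp(current, desired)
    if p_value < 0.05:
        print("Difference is significant: plan and implement improvement actions")
    else:
        print("Difference is not significant: no action required")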

High Maturity in CMMI for Development - Part 4

One of the most difficult (supposedly so) concepts in CMMI High Maturity is the development of "valid" and "usable" statistical models for predicting, and hence quantitatively managing, the outcome of a process - commonly known as Prediction or Predictive Models.

The outcome of a process can be measured in terms of certain variables of interest. For example, the quality of work performed can be measured by the metric ‘rework effort as a percentage of total effort spent on performing the work’ – this metric can hence be taken as a variable of interest. Knowing the “future” value of this variable right at the beginning of the work can be used effectively for quantitative management (in this case, for example, proactive defect prevention).

Computing the metric after the work is completed provides only a lagging indicator of the variable of interest and as such leaves limited or no scope for effective proactive action. The ability to predict the “future” value of the variables of interest gives the power to quantitatively manage the process towards predefined desired goals.

A Prediction Model, as the name suggests, is meant to be used before the associated process has been performed to a significant degree of completion. At the beginning, the variable of interest has an unknown value and hence prediction makes business sense. After the process reaches a significant degree of completion, it may not be cost-effective to make predictions, and hence no prediction should be carried out.

Hence, a prediction model should operate in two stages:

Stage 1: Right at the starting point, it should provide an initial predicted value

Stage 2: At intermediate points, it should provide progressively refined predicted value as actual data on certain variables becomes known. The refinement should also consider any non-random events that may occur so that the prediction can be relied upon.
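A minimal sketch of this two-stage usage, assuming a model of the form Y = f(x1, x2) has already been built (the coefficients and parameter names below are invented for illustration):

    # Hypothetical fitted model: predicted rework % = a + b1*size + b2*review_effort
    a, b1, b2 = 2.0, 0.8, -1.5

    def predict_rework(size_kloc, review_effort_ratio):
        return a + b1 * size_kloc + b2 * review_effort_ratio

    # Stage 1: at the starting point, planned values of the X factors are used
    print(predict_rework(size_kloc=10.0, review_effort_ratio=0.12))

    # Stage 2: at an intermediate point, actual values replace the planned ones
    # and the predicted value of Y is refined accordingly
    print(predict_rework(size_kloc=11.5, review_effort_ratio=0.09))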

High Maturity in CMMI for Development - Part 5

Statistical models based on multiple (linear) regression can be easily built to serve as prediction models as required by CMMI High Maturity. A prediction model in this case is, simply speaking, nothing but a multiple (linear) regression equation.

A regression equation takes the form Y = f(x1, x2, …, xn), where,

* Y = dependent variable (representing the variable of interest that has to be predicted)

* x1, x2, …, xn = set of independent variables (representing the variables whose value is known and is fed into the model to obtain the predicted value of Y)

The steps to build a prediction model are as follows – the details of the statistical methods to be used for these steps are easily available and hence have not been duplicated here; a minimal sketch in Python follows the list:

* Check the data for basic sanity – erroneous data, missing data, etc.

* Determine the outlier values in the data and treat them in an appropriate manner. Outlier values are typically wayward data and may be retained or removed based on certain considerations

* Test the variables Y, x1, x2, …, xn for Normality. Most of the commonly used statistical results assume the underlying distribution to be a Normal Distribution. In case the distribution is non-normal, an appropriate transformation is required

* Understand the statistical behavior of the variables. This can be done by computing the measures of central tendency (typically mean) and measures of dispersion (typically standard deviation). A good starting point for this is to draw a histogram which will additionally give insight into the shape of the distribution

* Build the regression equation in the form Y = a + b1x1 + b2x2 + … + bnxn

* Analyze the statistical results (p-value, etc.) for the regression equation for the significance of the overall regression fit (test for R-square) and the significance of the regression coefficients (test b1, b2, …, bn)

* Validate the regression equation for its prediction power. The preferred method is to use the regression equation on a different set of values than the ones that were used to build the model
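Here is the minimal sketch referred to above, using Python with numpy and statsmodels. The project data (review effort ratio, average team experience, and rework percentage) is entirely invented; in practice the organization's own baselined data would be used.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical historical project data
    x1 = np.array([0.05, 0.08, 0.10, 0.12, 0.07, 0.15, 0.09, 0.11, 0.06, 0.13])  # review effort ratio
    x2 = np.array([2.0, 3.5, 4.0, 5.0, 2.5, 6.0, 3.0, 4.5, 2.2, 5.5])            # avg team experience (years)
    y = np.array([14.0, 11.5, 10.0, 8.5, 12.5, 6.0, 11.0, 9.0, 13.5, 7.5])       # rework % (variable of interest)

    # Build the regression equation Y = a + b1*x1 + b2*x2
    X = sm.add_constant(np.column_stack([x1, x2]))
    model = sm.OLS(y, X).fit()

    # Significance of the overall fit and of the individual coefficients
    print("R-squared:           ", model.rsquared)
    print("Overall fit p-value: ", model.f_pvalue)
    print("Coefficient p-values:", model.pvalues)

    # Use the model to predict Y for a new project; the leading 1.0 is the
    # constant term matching the intercept column added above
    new_project = np.array([[1.0, 0.10, 3.8]])
    print("Predicted rework %:  ", model.predict(new_project))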

If both the overall regression fit and the regression coefficients are significant, the regression equation can be confidently used as a Prediction Model.

If that is not the case, however, alternative approaches need to be considered and adopted to develop the Prediction Model. Keep watching this space for future posts discussing the alternative approaches that can be considered.

Wednesday, February 10, 2010

ISO 12207

ISO 12207 is an ISO standard for software lifecycle processes. It aims to be 'the' standard that defines all the tasks required for developing and maintaining software.

The ISO 12207 standard establishes a lifecycle process for software, including processes and activities applied during the acquisition and configuration of the services of the system. Each Process has a set of outcomes associated with it. There are 23 Processes, 95 Activities, 325 Tasks and 224 Outcomes (the new "ISO/IEC 12207:2008 Systems and software engineering -- Software life cycle processes" defines 43 system and software processes).

The standard's main objective is to supply a common structure so that the buyers, suppliers, developers, maintainers, operators, managers and technicians involved with software development use a common language. This common language is established in the form of well defined processes. The structure of the standard was conceived in a flexible, modular way so as to be adaptable to the needs of whoever uses it. The standard is based on two basic principles: modularity and responsibility. Modularity means processes with minimum coupling and maximum cohesion. Responsibility means establishing a responsibility for each process, facilitating the application of the standard in projects where many people can be legally involved.

The set of processes, activities and tasks can be adapted according to the software project. These processes are classified into three types: basic, support and organizational. The support and organizational processes must exist independently of the organization and the project being executed. The basic processes are instantiated according to the situation. [Source: Wikipedia]

If we already have automation, what's the need for Agents?

“Automation” and “agent” sound similar — but they solve very different classes of problems. Automation = Fixed Instruction → Fixed Outcome ...