Generative AI & Artificial General Intelligence (AGI)

Wednesday, December 12, 2007

Process Capability Baseline...

PCB specifies, based on data of past projects, what the performance of a process is. That is, what a project can expect by following a process. The performance factors of a process are primarily those that relate to quality and productivity. PCBs define - Productivity, Quality, Effort Distribution, Defect Distribution, Defect Injection Rate, CoQ, etc.

Using the capability baseline, a project can predict at a gross level the effort that will be needed for various stages, the defects likely to be observed during various development activities, and quality and productivity of the project.

Tuesday, December 11, 2007

Process Assets & Process Database...

Process Assets: Process Assets form the repository to facilitate dissemination of engagement learnings across an organization. A process asset could be any information from an engagement, which can be re-used by future engagements. Typically these include project plans, CM plans, requirements docs, design docs, test plans, standards, checklists, CAR reports, utilities, etc.

Process Database: Process Database is a s/w engineering database to study the processes in an organization with respect to productivity and quality. Its intents are:

a. To aid estimation of efforts and defects.
b. To get the productivity and quality data on different types of projects.
c. To aid in creating process capability baselines (PCBs).

Friday, November 30, 2007

NFT - Non functional testing...

Courtesy: Ahamed. http://softwaretesting.blogspot.com

Smoke Test

Smoke testing Refers to the first test made after repairs or first assembly to provide some assurance that the system under test will not catastrophically fail. After a smoke test proves that the pipes will not leak, the keys seal properly, the circuit will not burn, or the software will not crash outright, the assembly is ready for more stressful testing.

· In plumbing, a smoke test forces actual smoke through newly plumbed pipes to find leaks, before water is allowed to flow through the pipes.

In computer programming and software testing, smoke testing is a preliminary to further testing, which should reveal simple failures severe enough to reject a prospective software release. In this case, the smoke is metaphorical.

Smoke test covers all the functionality in less time to ensure the application works fine. These are a narrow set of tests which determine if more extensive testing of the product is warranted. For example, in an OOP framework, one might instantiate an object from each class in the End User API and arbitrarily invoke a single method from each.

Daily Build and Smoke Test

If you want to create a simple computer program consisting of only one file, you merely need to compile and link that one file. On a typical team project involving dozens, hundreds, or even thousands of files, however, the process of creating an executable program becomes more complicated and time consuming. You must "build" the program from its various components.
A common practice at Microsoft and some other shrink-wrap software companies is the "daily build and smoke test" process. Every file is compiled, linked, and combined into an executable program every day, and the program is then put through a "smoke test," a relatively simple check to see whether the product "smokes" when it runs.

BENEFITS. This simple process produces several significant benefits.
It minimizes integration risk. One of the greatest risks that a team project faces is that, when the different team members combine or "integrate" the code they have been working on separately, the resulting composite code does not work well. Depending on how late in the project the incompatibility is discovered, debugging might take longer than it would have if integration had occurred earlier, program interfaces might have to be changed, or major parts of the system might have to be redesigned and reimplemented. In extreme cases, integration errors have caused projects to be cancelled. The daily build and smoke test process keeps integration errors small and manageable, and it prevents runaway integration problems.
It reduces the risk of low quality. Related to the risk of unsuccessful or problematic integration is the risk of low quality. By minimally smoke-testing all the code daily, quality problems are prevented from taking control of the project. You bring the system to a known, good state, and then you keep it there. You simply don't allow it to deteriorate to the point where time-consuming quality problems can occur.

It supports easier defect diagnosis. When the product is built and tested every day, it's easy to pinpoint why the product is broken on any given day. If the product worked on Day 17 and is broken on Day 18, something that happened between the two builds broke the product.
It improves morale. Seeing a product work provides an incredible boost to morale. It almost doesn't matter what the product does. Developers can be excited just to see it display a rectangle! With daily builds, a bit more of the product works every day, and that keeps morale high.

USING THE DAILY BUILD AND SMOKE TEST. The idea behind this process is simply to build the product and test it every day. Here are some of the ins and outs of this simple idea.
Build daily. The most fundamental part of the daily build is the "daily" part. As Jim McCarthy says (Dynamics of Software Development, Microsoft Press, 1995), treat the daily build as the heartbeat of the project. If there's no heartbeat, the project is dead. A little less metaphorically, Michael Cusumano and Richard W. Selby describe the daily build as the sync pulse of a project (Microsoft Secrets, The Free Press, 1995). Different developers' code is allowed to get a little out of sync between these pulses, but every time there's a sync pulse, the code has to come back into alignment. When you insist on keeping the pulses close together, you prevent developers from getting out of sync entirely.

Some organizations build every week, rather than every day. The problem with this is that if the build is broken one week, you might go for several weeks before the next good build. When that happens, you lose virtually all of the benefit of frequent builds.
Check for broken builds. For the daily-build process to work, the software that's built has to work. If the software isn't usable, the build is considered to be broken and fixing it becomes top priority.

Each project sets its own standard for what constitutes "breaking the build." The standard needs to set a quality level that's strict enough to keep showstopper defects out but lenient enough to dis-regard trivial defects, an undue attention to which could paralyze progress.

At a minimum, a "good" build should

· compile all files, libraries, and other components successfully;
· link all files, libraries, and other components successfully;
· not contain any showstopper bugs that prevent the program from being launched or that make it hazardous to operate; and
· pass the smoke test.

Smoke test daily. The smoke test should exercise the entire system from end to end. It does not have to be exhaustive, but it should be capable of exposing major problems. The smoke test should be thorough enough that if the build passes, you can assume that it is stable enough to be tested more thoroughly.

The daily build has little value without the smoke test. The smoke test is the sentry that guards against deteriorating product quality and creeping integration problems. Without it, the daily build becomes just a time-wasting exercise in ensuring that you have a clean compile every day.
The smoke test must evolve as the system evolves. At first, the smoke test will probably test something simple, such as whether the system can say, "Hello, World." As the system develops, the smoke test will become more thorough. The first test might take a matter of seconds to run; as the system grows, the smoke test can grow to 30 minutes, an hour, or more.
Establish a build group. On most projects, tending the daily build and keeping the smoke test up to date becomes a big enough task to be an explicit part of someone's job. On large projects, it can become a full-time job for more than one person. On Windows NT 3.0, for example, there were four full-time people in the build group (Pascal Zachary, Showstopper!, The Free Press, 1994).

Add revisions to the build only when it makes sense to do so. Individual developers usually don't write code quickly enough to add meaningful increments to the system on a daily basis. They should work on a chunk of code and then integrate it when they have a collection of code in a consistent state-usually once every few days.

Create a penalty for breaking the build. Most groups that use daily builds create a penalty for breaking the build. Make it clear from the beginning that keeping the build healthy is the project's top priority. A broken build should be the exception, not the rule. Insist that developers who have broken the build stop all other work until they've fixed it. If the build is broken too often, it's hard to take seriously the job of not breaking the build.
A light-hearted penalty can help to emphasize this priority. Some groups give out lollipops to each "sucker" who breaks the build. This developer then has to tape the sucker to his office door until he fixes the problem. Other groups have guilty developers wear goat horns or contribute $5 to a morale fund.

Some projects establish a penalty with more bite. Microsoft developers on high-profile projects such as Windows NT, Windows 95, and Excel have taken to wearing beepers in the late stages of their projects. If they break the build, they get called in to fix it even if their defect is discovered at 3 a.m.

Build and smoke even under pressure. When schedule pressure becomes intense, the work required to maintain the daily build can seem like extravagant overhead. The opposite is true. Under stress, developers lose some of their discipline. They feel pressure to take design and implementation shortcuts that they would not take under less stressful circumstances. They review and unit-test their own code less carefully than usual. The code tends toward a state of entropy more quickly than it does during less stressful times.

Against this backdrop, daily builds enforce discipline and keep pressure-cooker projects on track. The code still tends toward a state of entropy, but the build process brings that tendency to heel every day.

Who can benefit from this process? Some developers protest that it is impractical to build every day because their projects are too large. But what was perhaps the most complex software project in recent history used daily builds successfully. By the time it was released, Microsoft Windows NT 3.0 consisted of 5.6 million lines of code spread across 40,000 source files. A complete build took as many as 19 hours on several machines, but the NT development team still managed to build every day (Zachary, 1994). Far from being a nuisance, the NT team attributed much of its success on that huge project to their daily builds. Those of us who work on projects of less staggering proportions will have a hard time explaining why we aren't also reaping the benefits of this practice.

Courtesy: http://www.stevemcconnell.com/, http://www.wikipedia.org/

Saturday, September 08, 2007

Verification & Validation Techniques

Verification and validation techniques are of two types: Formal and Informal as shown in the diagram below.

Formal V&V Techniques

Formal V&V techniques are based on formal mathematical proofs of correctness. If attainable, a formal proof of correctness is the most effective means of model V&V. Unfortunately, “if attainable” is the sticking point. Current formal proof of correctness techniques cannot even be applied to a reasonably complex simulation; however, formal techniques can serve as the foundation for other V&V techniques. The most commonly known techniques are briefly described below.

Induction, Inference, and Logical Deduction are simply acts of justifying conclusions on the basis of premises given. An argument is valid if the steps used to progress from the premises to the conclusion conform to established rules of inference. Inductive reasoning is based on invariant properties of a set of observations; assertions are invariants because their value is defined to be true. Given that the initial model assertion is correct, it stands to reason that if each path progressing from that assertion is correct and each path subsequently progressing from the previous assertion is correct, then the model must be correct if it terminates. Birta and Ozmizrak (1996) present a knowledge-based approach for M&S validation that uses a validation knowledge base containing rules of inference.

Inductive Assertions assess model correctness based on an approach that is very close to formal proof of model correctness. It is conducted in three steps.

Input-to-output relations for all model variables are identified
These relations are converted into assertion statements and are placed along the model execution paths so that an assertion statement lies at the beginning and end of each model execution path
Verification is achieved by proving for each path that, if the assertion at the beginning of the path is true and all statements along the path are executed, then the assertion at the end of the path is true

If all paths plus model termination can be proved, by induction, the model is proved to be correct.

Lambda Calculus is a system that transforms the model into formal expressions by rewriting strings. The model itself can be considered a large string. Lambda calculus specifies rules for rewriting strings to transform the model into lambda calculus expressions. Using lambda calculus, the modeler can express the model formally to apply mathematical proof of correctness techniques to it.

Predicate Calculus provides rules for manipulating predicates. A predicate is a combination of simple relations, such as completed_jobs > steady_state_length. A predicate will be either true or false. The model can be defined in terms of predicates and manipulated using the rules of predicate calculus. Predicate calculus forms the basis of all formal specification languages.

Predicate Transformation verifies model correctness by formally defining the semantics of the model with a mapping that transforms model output states to all possible model input states. This representation is the basis from which model correctness is proved.

Formal Proof of Correctness expresses the model in a precise notation and then mathematically proves that the executed model terminates and satisfies the requirements with sufficient accuracy. Attaining proof of correctness in a realistic sense is not possible under the current state of the art. The advantage of realizing proof of correctness is so great, however, that, when the capability is realized, it will revolutionize V&V.

Informal V&V Techniques

Informal techniques are among the most commonly used. They are called informal because they rely heavily on human reasoning and subjectivity without stringent mathematical formalism. The informal label should not imply, however, a lack of structure or formal guidelines in their use. In fact, these techniques should be applied using well-structured approaches under formal guidelines. They can be very effective if employed properly.

The following techniques are discussed in the paragraphs below:

Audit
Desk Checking / Self-inspection
Face Validation
Inspection
Review
Turing Test
Walkthroughs
Inspection vs Walkthrough vs Review

1. Audit: An audit is a verification technique performed throughout the development life cycle of a new model or simulation or during modification made to legacy models and simulations. An audit is a staff function that serves as the "eyes and ears of management". An audit is undertaken to assess how adequately a model or simulation is used with respect to established plans, policies, procedures, standards, and guidelines. Auditing is carried out by holding meetings and conducting observations and examinations. The process of documenting and retaining sufficient evidence about the substantiation of accuracy is called an audit trail. Auditing can be used to establish traceability within the simulation. When an error is identified, it should be traceable to its source via its audit trail.

2. Desk Checking / Self-inspection: Desk checking, or self-inspection, is an intense examination of a working product or document to ensure its correctness, completeness, consistency, and clarity. It is particularly useful during requirements verification, design verification, and code verification. Desk checking can involve a number of different tasks, such as those listed in the table below.

Typical Desk Checking Activities:

· Syntax review
· Cross-reference examination
· Convention violation assessment
· Detailed comparison to specifications
· Code reading
· Control flowgraph analysis
· Path sensitizing

To be effective, desk checking should be conducted carefully and thoroughly, preferably by someone not involved in the actual development of the product or document, because it is usually difficult to see one’s own errors.

3. Face Validation: The project team members, potential users of the model, and subject matter experts (SMEs) review simulation output (e.g., numerical results, animations, etc.) for reasonableness. They use their estimates and intuition to compare model and system behaviors subjectively under identical input conditions and judge whether the model and its results are reasonable.

Face validation is regularly cited in V&V efforts within the Department of Defense (DoD) M&S community. However, the term is commonly misused as a more general term and misapplied to other techniques involving visual reviews (e.g., inspection, desk check, review). Face validation is useful mostly as a preliminary approach to validation in the early stages of development. When a model is not mature or lacks a well-documented VV&A history, additional validation techniques may be required.

4. Inspection: Inspection is normally performed by a team that examines the product of a particular simulation development phase (e.g., M&S requirements definition, conceptual model development, M&S design). A team normally consists of four or five members, including a moderator or leader, a recorder, a reader (i.e., a representative of the Developer) who presents the material being inspected, the V&V Agent; and one or more appropriate subject matter experts (SMEs).

Normally, an inspection consists of five phases:

1. Overview
2. Preparation
3. Inspection
4. Rework
5. Follow up

5. Review: A review is intended to evaluate the simulation in light of development standards, guidelines, and specifications and to provide management, such as the User or M&S PM, with evidence that the simulation development process is being carried out according to the stated objectives. A review is similar to an inspection or walkthrough, except that the review team also includes management. As such, it is considered a higher-level technique than inspection or walkthrough.

A review team is generally comprised of management-level representatives of the User and M&S PM. Review agendas should focus less on technical issues and more on oversight than an inspection. The purpose is to evaluate the model or simulation relative to specifications and standards, recording defects and deficiencies. The V&V Agent should gather and distribute the documentation to all team members for examination before the review.

The V&V Agent may also prepare a checklist to help the team focus on the key points. The result of the review should be a document recording the events of the meeting, deficiencies identified, and review team recommendations. Appropriate actions should then be taken to correct any deficiencies and address all recommendations.

6. Turing Test: The Turing test is used to verify the accuracy of a simulation by focusing on differences between the system being simulated and the simulation of that system. System experts are presented with two blind sets of output data, one obtained from the model representing the system and one from the system, created under the same input conditions and are asked to differentiate between the two. If they cannot differentiate between the two, confidence in the model’s validity is increased. If they can differentiate between them, they are asked to describe the differences. Their responses provide valuable feedback regarding the accuracy and appropriateness of the system representation.

7. Walkthrough: The main thrust of the walkthrough is to detect and document faults; it is not a performance appraisal of the Developer. This point must be made to everyone involved so that full cooperation is achieved in discovering errors. A typical structured walkthrough team consists of:

Coordinator, often the V&V Agent, who organizes, moderates, and follows up the walkthrough activities
Presenter, usually the Developer
Recorder
Maintenance oracle, who focuses on long-term implications
Standards bearer, who assesses adherence to standards
Accreditation Agent, who reflects the needs and concerns of the UserAdditional reviewers such as the M&S PM and auditors

Inspection vs Walkthrough vs Review:

Inspections differ significantly from walkthroughs. An inspection is a five-step, formalized process. The inspection team uses the checklist approach for uncovering errors. A walkthrough is less formal, has fewer steps, and does not use a checklist to guide or a written report to document the team’s work. Although the inspection process takes much longer than a walkthrough, the extra time is justified because an inspection is extremely effective for detecting faults early in the development process when they are easiest and least costly to correct.

Inspections and walkthroughs concentrate on assessing correctness. Reviews seek to ascertain that tolerable levels of quality are being attained. The review team is more concerned with design deficiencies and deviations from the conceptual model and M&S requirements than it is with the intricate line-by-line details of the implementation. The focus of a review is not on discovering technical flaws but on ensuring that the design and development fully and accurately address the needs of the application. For this reason, the review process is effective early on during requirements verification and conceptual model validation.