Most test automation focuses on functional (or regression) testing, repeating the same sequence of actions to find unexpected behaviour. Although this approach has many advantages, it also has limitations, and serious defects often remain in our software. “Test monkeys” can help fill the gaps that traditional tests do not reach.
The term “monkey testing” refers to the process of running automated tests at random using a tool.
This article was submitted by John A. Fodeh; I have chosen to keep it in its original English.
B-K Medical has successfully utilised monkey test tools in several projects, on both embedded and Windows applications. This paper shares the experiences gained in applying those tools.
1. Traditional Test Automation
Test automation is often perceived as the magical solution to the bottleneck of software development: testing. Expecting to accomplish more and better testing in the limited time available, many organisations make huge investments in sophisticated test automation tools. Applied in immature test regimes and facing unrealistic expectations, these tools often end up as “shelfware”.
The traditional test automation approaches concentrate on high-level regression tests using a capture/replay tool. Utilised properly, these approaches can undoubtedly enhance the test organisation’s capabilities. However, traditional test automation has certain flaws and limitations one needs to be aware of. These shortcomings are closely related to the characteristics of traditional test automation. Traditional test automation is:
Static
Each time a test is run, the same actions are executed in the same order. Of course, repeating previously executed tests is the nature of regression testing. Nevertheless, this means that, when executed, the tests traverse the same path in the software. Unless the software underneath is changed, the probability of finding new defects is low. Repeating the test won’t yield additional code coverage.
Simple
Because the tests have to be reused and maintained, most automated tests consist of short and simple sequences of commands. The tests use a limited range of inputs and combinations, most of which are positive.
As is well known, negative testing is very effective in finding defects. Many defects reside in the error handling routines of the software (if any) and are often revealed when using illegal input and actions or combinations of these. Due to the limited negative input, many automated tests fail to detect critical defects in the software.
Moreover, automated tests are characterised by frequent “initialisation”; after the short command sequence is executed, the application under test is typically reset to a known start state (in many cases, the application is simply shut down and restarted).
Furthermore, a “clean up” is performed to restore databases, delete created files, etc. This “use-pattern” varies significantly from the way real users use the application under test.
Real users rarely “initialise” and “clean up” after completing a certain command sequence. Instead, they perform many different and long sequences (sometimes running on multiple instances of the application), thus creating extensive and composite use patterns.
Synchronised with application
The playback of the automated tests is synchronised with the application under test. This feature, which is incorporated in most capture/replay tools, means that the tool will halt the playback and wait for a certain synchronisation point to appear (for example the appearance of a specific window, push-button, etc.) before resuming the test. The tool will eventually time out if the control does not appear within a predefined timeout period. This feature is very useful when running regression tests, but it also prevents some testing: while waiting for a synchronisation point, no testing is done!
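The synchronisation behaviour described above can be pictured as a polling loop with a timeout. The sketch below is only an illustration and not the API of any particular capture/replay tool; `wait_for_sync_point` and the `is_present` callback are hypothetical names.

```python
import time
from typing import Callable


def wait_for_sync_point(is_present: Callable[[], bool],
                        timeout_s: float = 10.0,
                        poll_interval_s: float = 0.1) -> bool:
    """Block until is_present() returns True or the timeout expires.

    is_present is a hypothetical callback that reports whether the
    expected window or control is currently visible.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_present():
            return True              # synchronisation point reached, resume playback
        time.sleep(poll_interval_s)  # nothing is tested while we wait here
    return False                     # timed out: the control never appeared
```

The `time.sleep` call is where the cost mentioned in the text shows up: during that wait the tool sends no input to the application.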
Applied late in development process
To automate regression tests, the application under test must have a reasonable level of stability. Furthermore, the graphical user interface (GUI) should be established (and preferably unchanging). If these prerequisites are not met, automating the tests becomes impractical if not impossible. In many cases, these prerequisites are not fulfilled until the late stages of the development process.
Consequently, the automated tests become a bottleneck in an already squeezed test plan.
Vulnerable to changes
As is well known, software undergoes many changes during its lifecycle. To keep up with changes in the software and the requirements, the corresponding tests need to be maintained.
Automated tests are closely linked to the GUI of the application. Changes in the GUI may affect the automated tests, in the worst case making large parts of the tests obsolete or useless. For this reason, special attention must be paid to promoting the automated tests’ maintainability and reducing their vulnerability.
Vulnerable tests are costly to maintain. Applying good software development practices can assist in developing maintainable tests.
1.1 Quality attributes of traditional test automation
Fewster and Graham (ref. [3]) describe four attributes for defining the quality of a test. A test should be:
• Effective: Has a reasonable probability of detecting errors.
• Exemplary: Practical and with low redundancy.
• Evolvable: Easy to maintain.
• Economic: Inexpensive to develop and perform.
It is possible to visualise the quality attributes of a test using a Kiviat diagram.
Figure 1 shows a manual test that has been automated. Initially, the automated test is less economic and evolvable than the manual test. When the automated test is repeated many times, it becomes more economic. As illustrated, the automated test is only feasible if repeated many times.
2. Test Monkeys
Monkey testing refers to the process of randomly exercising a software program. A test monkey is an automated tool for random testing.
The primary function of the test monkey is to transfer input to the application under test, register the performed actions and monitor the system response (see Figure 2).
The input to the application under test is specified in an action list script (preferably a plain text file, so it can be created and edited in any text editor). When executing a script, the test monkey picks commands from it at random and passes them to the interface of the application under test.
The test monkey monitors the response from the application under test and detects if the application crashes or hangs. If the application becomes non-responsive, the test monkey restarts the application under test and resumes the test. During execution the test monkey records the performed commands in a log file. Additional test information and the corresponding “post mortem dump”, if the application breaks down, are also registered in the log.
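The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tool used at B-K Medical: the `Application` adapter with `send`, `is_alive` and `restart`, the one-command-per-line script format, and the `#` comment convention are all hypothetical.

```python
import random
from typing import List, Protocol


class Application(Protocol):
    """Hypothetical adapter around the application under test."""
    def send(self, command: str) -> None: ...
    def is_alive(self) -> bool: ...
    def restart(self) -> None: ...


def load_action_list(text: str) -> List[str]:
    """Parse an action list script: one command per line, '#' lines are comments."""
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


def run_monkey(app: Application, actions: List[str], operations: int,
               log: List[str], seed: int = 0) -> int:
    """Send randomly chosen commands and return the number of breakdowns seen."""
    rng = random.Random(seed)        # a fixed seed makes the run repeatable
    breakdowns = 0
    for _ in range(operations):
        command = rng.choice(actions)
        log.append(command)          # log first, so the fatal command is not lost
        app.send(command)
        if not app.is_alive():       # crashed or hanging: count it and recover
            breakdowns += 1
            log.append("!! breakdown, restarting")
            app.restart()
    return breakdowns
```

Logging each command before it is sent means the log still ends with the command that triggered the breakdown.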
2.1 Basic features
A test monkey should possess some basic features. It should be able to:
• Select randomly from input range: This is the main idea of the random test. Ideally, the input range would cover all possible input for the application under test.
• Enter input to the application under test: The test monkey should enter the input through the normal interface as used by the real users (e.g. the application’s GUI). It is not desirable to utilise any backdoor interfaces (for example entries made directly in the database instead of going through the GUI).
• Detect “life signs” of the application under test: This is a very important feature, as the test monkey should sense if the application under test is non-responsive, hanging or has crashed.
• Log the performed commands: Since it is important later on to be able to reproduce the found defects, a test monkey should have a robust logging facility that can preserve the log in case of serious application breakdowns. The log should even survive a “blue screen of death”. To reproduce a defect, the test monkey should be able to play back the log.
• Restart and initialise the application: When the application under test breaks down, the test monkey should be able to restart and initialise the application, so the test can be resumed. With embedded systems this can be achieved by turning the power off then on again.
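The logging and playback features in the list above might look like the following sketch. `CommandLog` and `play_back` are hypothetical names, and the one-command-per-line file format is an assumption. Flushing and calling `os.fsync` after every entry reduces the chance of losing the last commands in a crash, but whether the log really survives a system-level failure depends on the operating system and the hardware.

```python
import os
from typing import Callable


class CommandLog:
    """Append-only command log that pushes each entry to disk immediately."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "a", encoding="utf-8")

    def record(self, command: str) -> None:
        self._file.write(command + "\n")
        self._file.flush()              # empty Python's own buffer
        os.fsync(self._file.fileno())   # ask the OS to write the data to disk

    def close(self) -> None:
        self._file.close()


def play_back(path: str, send: Callable[[str], None]) -> int:
    """Re-send every logged command in its original order; return the count."""
    count = 0
    with open(path, encoding="utf-8") as log_file:
        for line in log_file:
            command = line.rstrip("\n")
            if command:
                send(command)
                count += 1
    return count
```

Syncing on every command is slow, so a real tool might trade some safety for speed. The sketch favours the requirement that the log be preserved.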
3. Implementing Test Monkeys
It is possible to implement test monkeys in various environments and platforms. For testing Windows applications, the test monkey can be implemented using a capture/replay tool. For embedded systems the test monkey can be implemented as a standalone tool using an external interface (e.g. RS232).
Developing a test monkey with the basic features described above is generally uncomplicated (at least technically). The main challenge is to determine the level of application knowledge the test monkey should possess. Test monkeys with limited application knowledge are cheap to develop and maintain, but overlook many defects in the application. On the other hand, test monkeys with wide application knowledge can be very effective in finding defects, but are costly to develop and maintain.
While it is questionable how much application knowledge a test monkey should have, it is indisputable that the test monkey should have good environment awareness. This is needed for detecting “life signs” as well as for restarting and initialising the application under test.
4. Using Test Monkeys
Test monkeys can be used throughout the development process by developers and testers.
Test monkeys do not require an established GUI, and can therefore be applied as soon as the first versions of the application under test are built.
Developers use the test monkeys to get valuable feedback on the state of the application under test. The test monkey can be used as a part of the “daily build and smoke test” to verify the stability of the latest software build. If the stability suddenly drops, the developer can investigate the newest changes implemented in the software and make the necessary corrections before building an official (labelled) version of the application.
When approaching the release date, the test monkeys provide vital information about the release candidates’ readiness for release. Testers can use test monkeys to establish an entry criterion (e.g. for the system test phase). If the application under test fails to pass a certain number of random operations, the software is rejected and returned to the developers (for more details, see the Metrics section).
Test monkeys can be utilised whenever the test systems are not used for other purposes and are ideal for overnight and weekend tests. Test monkeys can operate in parallel on multiple systems and run unattended for many hours or days, until a predefined limit is reached or until the test is interrupted manually.
Logs of previously found crashes can be used as a part of the automated test suite to ensure that those defects do not emerge again (the notorious “Haven’t I seen this bug before?” phenomenon).
When new functionality is implemented in the application under test, it is possible to direct the test monkey to a specific area of interest. This can be accomplished by:
• Adding multiple entries of a particular command in the action list script, thus increasing the probability of executing this command during testing.
• Excluding specific commands from the action list script (e.g. commands that exit the area of interest).
• Running combinations of sequential and random tests. A sequence of commands brings the test monkey into the area of interest; afterwards the test monkey starts the random test.
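The three techniques in the list above can be combined in one command generator. This is a sketch under assumptions: `directed_commands` and the command names in the usage example are hypothetical, and integer weights stand in for listing a command several times in the action list script.

```python
import random
from typing import Dict, Iterable, Iterator, Optional, Set


def directed_commands(weights: Dict[str, int],
                      excluded: Set[str],
                      prefix: Iterable[str],
                      random_operations: int,
                      seed: Optional[int] = None) -> Iterator[str]:
    """Yield a fixed command sequence, then weighted random commands."""
    # Sequential part: steer the monkey into the area of interest.
    yield from prefix

    # Drop excluded commands (e.g. those that leave the area of interest).
    allowed = {c: w for c, w in weights.items() if c not in excluded and w > 0}
    commands = list(allowed)
    command_weights = [allowed[c] for c in commands]

    # Random part: a weight of n acts like n entries in the action list.
    rng = random.Random(seed)
    for _ in range(random_operations):
        yield rng.choices(commands, weights=command_weights, k=1)[0]
```

For example, `directed_commands({"zoom": 5, "pan": 1, "exit": 1}, {"exit"}, ["open menu", "select imaging"], 1000)` would first open the hypothetical imaging area, then pick `zoom` about five times as often as `pan`, and never pick `exit`.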
Test monkeys are good at constructing complex combinations and sequences (especially those that were not considered during design). Test monkeys are therefore suitable for testing medium to large event-driven systems, systems with complex application environment settings and systems with many running and interacting applications. It may not be feasible to use test monkeys on simple embedded systems or systems that require a large amount of correlated data to be entered in a specific order (the test monkey will then spend most of its time in illegal states).
4.1 Quality attributes of test monkeys
Considering the four quality attributes described previously, a monkey test (repeated many times) can be plotted in a Kiviat diagram as shown in Figure 3.
Compared to manual testing, a monkey test is less effective in finding defects (as it misses many apparent defects typically found in manual testing). Also, a monkey test is less exemplary than a manual test (as there is often high redundancy between monkey tests). On the other hand, a monkey test repeated many times is more evolvable and economic than manual testing.
4.2 Metrics
As mentioned earlier, test monkeys deliver vital information about the reliability of the application under test. A useful metric is the mean number of random operations between failures. This metric is analogous to the widely used Mean Time Between Failures (MTBF) metric (e.g. used for hardware reliability testing). This measure is calculated by counting the total number of system breakdowns encountered and the number of random operations performed. The following formula is used:
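The formula is not shown in this copy of the text. Read literally from the description above, it would be the ratio below. The symbols are chosen here for illustration and are not taken from the original.

```latex
\[
  \text{mean operations between failures}
  = \frac{N_{\text{operations}}}{N_{\text{breakdowns}}}
\]
% N_operations: number of random operations performed
% N_breakdowns: total number of system breakdowns encountered (must be > 0)
```

For instance, a hypothetical run of 120 000 random operations with 8 breakdowns would give 120 000 / 8 = 15 000 operations between failures.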

