Saturday, 5 December 2020

Refactoring a Messy Test Suite in Python

Recently, I reviewed a colleague's code. We've created 100+ PySpark jobs, and each job has its own test cases. When I looked at the file, it was super long. Let's take a look at the main function first. It is a pretty standard test suite entry point.

if __name__ == "__main__":
    loader = unittest.TestLoader()
    suite = create_unit_suite()
    runner = unittest.TextTestRunner(verbosity=2)
    runner.run(suite)

However, when I dove into the create_unit_suite function, I was utterly dumbfounded.

def create_unit_suite():
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromModule(tests.test_job_1))
    suite.addTests(loader.loadTestsFromModule(tests.test_job_2))
    suite.addTests(loader.loadTestsFromModule(tests.test_job_3))
    # ...
    suite.addTests(loader.loadTestsFromModule(tests.test_job_100))
    suite.addTests(loader.loadTestsFromModule(tests.test_job_101))
    suite.addTests(loader.loadTestsFromModule(tests.test_job_102))
    # ...

And the import statements look like this:

import tests.test_job_1
import tests.test_job_2
import tests.test_job_3
# ...
import tests.test_job_100
import tests.test_job_101
import tests.test_job_102
# ...

Every time a new job is created, the developer needs to import its test module and add a new addTests line. The real folder structure is far more complicated, and there is no guarantee that every developer remembers to register the tests. You can't tell which jobs are missing test cases simply by looking at this file.
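
To make the "which jobs are missing tests" problem concrete, here is a minimal sketch that compares job files against test files. It assumes a naming convention that is not stated in this post: each job lives in a file such as jobs/job_1.py and its tests in a file named test_job_1.py somewhere under tests/. The function name and both directory names are hypothetical.

```python
import os


def jobs_without_tests(jobs_dir, tests_dir):
    """Return job names that have no matching test_<job>.py under tests_dir.

    Assumes (hypothetically) one job per .py file in jobs_dir.
    """
    job_names = {
        os.path.splitext(name)[0]
        for name in os.listdir(jobs_dir)
        if name.endswith(".py") and not name.startswith("_")
    }

    tested = set()
    # Walk the whole tests tree, since test files may sit in sub-folders
    for _dirpath, _dirnames, filenames in os.walk(tests_dir):
        for name in filenames:
            if name.startswith("test_") and name.endswith(".py"):
                tested.add(name[len("test_"):-len(".py")])

    return sorted(job_names - tested)
```

A check like this could run in CI, but it only detects missing files, not missing or weak test cases inside them.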

Thankfully, unittest has a feature called test discovery, introduced in Python 3.2. It can be used through TestLoader.discover() or from the command line. The documentation states:

Unittest supports simple test discovery. In order to be compatible with test discovery, all of the test files must be modules or packages (including namespace packages) importable from the top-level directory of the project (this means that their filenames must be valid identifiers).

In order to expose the test files, we need to make the directories that contain them importable packages. We can do that simply by adding an __init__.py file to each test directory.
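
With a deep folder structure, adding those files by hand is easy to get wrong. The following is a small sketch, not part of the original project, that creates an empty __init__.py in a directory and in every sub-directory that lacks one. The helper name ensure_init_files is made up for illustration.

```python
import os


def ensure_init_files(root):
    """Create an empty __init__.py in root and each sub-directory missing one.

    Returns the list of files that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into cache or hidden directories
        dirnames[:] = [
            d for d in dirnames
            if d != "__pycache__" and not d.startswith(".")
        ]
        if "__init__.py" not in filenames:
            init_path = os.path.join(dirpath, "__init__.py")
            open(init_path, "a").close()  # create an empty file
            created.append(init_path)
    return created
```

Calling ensure_init_files("tests") once from the project root would cover the whole test tree. Running it again creates nothing new.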

Here's what's left. Specify the start directory and the file pattern for unittest to discover. No more long import statements or addTests calls.

import unittest


if __name__ == "__main__":
    # Discovery imports the matching test modules itself,
    # so no explicit imports of the tests package are needed.
    testsuite = unittest.TestLoader().discover("tests", pattern="test_*.py")
    result = unittest.TextTestRunner(verbosity=2).run(testsuite)

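Simply run the script to get the result. Alternatively, running unittest from the project root with no arguments starts test discovery directly from the command line: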

python -m unittest
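
To see discovery pick up test files without any manual registration, here is a self-contained sketch that builds a throwaway package and counts the tests found in it. The package name demo_tests, the file names, and both helper functions are invented for this demonstration. It passes top_level_dir so the generated modules are imported under the package name.

```python
import os
import unittest

# Source written into each generated test module (one test per file)
TEST_SOURCE = (
    "import unittest\n"
    "\n"
    "class JobTest(unittest.TestCase):\n"
    "    def test_runs(self):\n"
    "        self.assertEqual(1 + 1, 2)\n"
)


def build_demo_package(root, names=("test_job_1", "test_job_2")):
    """Create root/demo_tests with an __init__.py and one file per name."""
    package_dir = os.path.join(root, "demo_tests")
    os.makedirs(package_dir)
    open(os.path.join(package_dir, "__init__.py"), "w").close()
    for name in names:
        with open(os.path.join(package_dir, name + ".py"), "w") as handle:
            handle.write(TEST_SOURCE)
    return package_dir


def count_discovered(root):
    """Discover tests under root/demo_tests and return how many were found."""
    suite = unittest.TestLoader().discover(
        os.path.join(root, "demo_tests"),
        pattern="test_*.py",
        top_level_dir=root,
    )
    return suite.countTestCases()
```

Adding a third name to build_demo_package raises the count without touching any import list, which is the property the refactoring relies on.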

However, running all the test cases may take a great deal of time. For more flexibility, the runner should also be able to run only the tests in a specific module.

We can use argparse to accept a --module flag that determines which module's test cases to run.

import argparse

Then put the logic in main():

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--module", action="store", default="all", help="the module of a collection of test cases to be executed")
    args = parser.parse_args()
    target_module = args.module
    # "all" discovers from the top of tests/; anything else is a sub-folder of tests/
    target_path = "tests" if target_module == "all" else "tests/{}".format(target_module)
    testsuite = unittest.TestLoader().discover(target_path, pattern="test_*.py")
    result = unittest.TextTestRunner(verbosity=2).run(testsuite)

if __name__ == "__main__":
    main()
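
One possible extension, sketched here under assumptions of my own, is to reject a --module value that does not exist and to return a non-zero exit code when tests fail, so a CI job can detect failures. The names resolve_target and run_and_report are hypothetical. TestResult.wasSuccessful() is the standard unittest call used to decide the exit code.

```python
import argparse
import os
import unittest


def resolve_target(module, base="tests"):
    """Map the --module value to a directory, failing early if it is missing."""
    path = base if module == "all" else os.path.join(base, module)
    if not os.path.isdir(path):
        raise ValueError("no such test directory: {}".format(path))
    return path


def run_and_report(suite, stream=None):
    """Run the suite and return 0 on success, 1 on any failure or error."""
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    return 0 if result.wasSuccessful() else 1


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--module", default="all")
    args = parser.parse_args(argv)
    target_path = resolve_target(args.module)
    suite = unittest.TestLoader().discover(target_path, pattern="test_*.py")
    return run_and_report(suite)

# In the runner script, the return value would be passed to sys.exit(main()).
```

Note that when discover() is given a sub-directory and no top_level_dir, the start directory is treated as the top level, so test modules that rely on package-relative imports may need top_level_dir passed explicitly.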
