Tutorial
This tutorial demonstrates some useful functionality built into the DecLog library, as well as how to write your own database backend and extend the BaseLogger.
Say, for example, that we want to overhaul the way we record analysis results at a research lab.
Our research lab has an extensive analysis library, providing well tested and documented functions and classes for performing common analysis routines on our experimental data. Each experiment involves firing a 'shot' on one of the machines, and a number of measurements are made depending on the interaction between the machine and the load.
Let's create an object representing all the possible machines at the facility. Using an Enum, we can store each machine name with its corresponding folder number:
from enum import Enum


class FacilityMachines(Enum):
    """Used to generate filepath of metadata directories. Machine number corresponds to
    folder number in /data"""

    M2 = 2
    M3 = 3
Setting custom unique keys
The most basic customisation one can make to the BaseLogger is to add custom keys. In fact, all the DefaultLogger does is inherit from the BaseLogger, set the database to the PickleDatabase, and set the database keys to the function name and datetime.
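To make that concrete, an equivalent logger can be assembled by hand. The sketch below is based on that description; the PickleDatabase import path is an assumption, and the mixins that supply function_name and datetime are covered at the end of this tutorial:

from declog.logger import BaseLogger
from declog.logger.mixins import DateTimeMixin, FunctionNameMixin
from declog.database import PickleDatabase  # assumed import path


class MyDefaultLogger(FunctionNameMixin, DateTimeMixin, BaseLogger):
    """A hand-rolled equivalent of DefaultLogger, per the description above."""

    db = PickleDatabase()
    unique_keys = "function_name datetime".split()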
At our research lab, every experiment is conducted on one of several machines and is assigned a shot number. So, for example, two separate experiments could be conducted on Machine A and Machine B but share the same shot number.
We need to differentiate between these experiments, so we set the keys to make this distinction explicit:
from declog.logger import BaseLogger
from declog.database import BaseDatabase


class FacilityLogger(BaseLogger):
    db = BaseDatabase()
    unique_keys = "machine shot_number function_name datetime".split()
Now, when the FacilityLogger is applied to a function, a new entry will be created in the underlying database every time the function is run, using the captured values in the order specified in unique_keys.
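To illustrate, after a single call the database is nested by those key values in order. The layout below is a sketch; the exact datetime format and leaf contents are assumptions for illustration:

# db[machine][shot_number][function_name][datetime] -> captured values
{
    FacilityMachines.M2: {
        1234: {
            "my_analysis_function": {
                "2024-01-01 12:00:00": {
                    "machine": FacilityMachines.M2,
                    "shot_number": 1234,
                },
            },
        },
    },
}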
Manually setting key values
The intention of this library is to reduce work for developers by automatically logging all the arguments supplied to an analysis function. However, due to the key system, some variables must be supplied for logging even if they are not actually required by the function. One option is to make the key a redundant argument to the function, but this is inconsistent with PEP 8 and will not pass flake8 checks. Instead, one can use the set() class method from the BaseLogger.
from examples.facility_example.logger import FacilityLogger, FacilityMachines


@FacilityLogger
def my_processing_function_for_any_machine(machine: FacilityMachines, shot_number: int):
    return machine.value * shot_number


@FacilityLogger.set(machine=FacilityMachines.M3)
def my_processing_function_for_m3(shot_number: int):
    return shot_number
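Calling the two functions then produces equivalently keyed entries; in the second case the machine key is supplied by set() rather than by an argument. A minimal usage sketch:

my_processing_function_for_any_machine(FacilityMachines.M3, 1234)
my_processing_function_for_m3(1234)  # machine=M3 is filled in by the decorator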
Using a custom database
So far, our Logger uses the BaseDatabase, which is an abstract type that defines how a database is stored in memory, but doesn't specify any persistent file format. There are a couple of built-in options, but this example will demonstrate how to create your own.
This research lab already has its own data storage protocol: each machine has a JSON file which is used to log shot variables for that machine. This layout is illustrated as a tree diagram below.
.
└── data/
├── 02_M2/
│ ├── metadata/
│ │ └── shot_db.json
│ ├── raw_data/
│ │ ├── s0001
│ │ ├── s0002
│ │ └── ...
│ └── processed_data/
│ ├── s0001
│ ├── s0002
│ └── ...
└── 03_M3/
└── ...
To translate between the database interface used by the logger and our own custom storage format, we need to create a custom database:
from pathlib import Path

# Assumed import paths: JSONDatabase and PersistentDatabase are expected to
# live in declog.database, alongside BaseDatabase.
from declog.database import JSONDatabase, PersistentDatabase


class FacilityDatabase(PersistentDatabase):
    def __init__(self):
        db_paths = self.gen_db_paths()
        self.dbs = {k: JSONDatabase(v) for k, v in db_paths.items()}

    def __getitem__(self, item):
        if isinstance(item, FacilityMachines):
            return self.dbs[item]
        else:
            raise TypeError(
                "The first key when using the FacilityDatabase must be a machine."
            )

    def __setitem__(self, key, value):
        if isinstance(key, FacilityMachines):
            self.dbs[key] = value

    def __repr__(self):
        print_dict = {}
        for machine, db in self.dbs.items():
            print_dict[machine] = db
        return repr(print_dict)

    def write(self):
        # Flush each machine's in-memory database to its JSON file.
        for db in self.dbs.values():
            db.write()

    @staticmethod
    def gen_db_paths():
        # Map each machine to data/<NN>_<name>/metadata/shot_db.json,
        # matching the directory tree shown above.
        machine_databases = {}
        data_path = Path(__file__).parent / "data"
        for machine in FacilityMachines:
            db_path = Path(f"{machine.value:02d}_{machine.name}/metadata/shot_db.json")
            machine_databases[machine] = data_path / db_path
        return machine_databases
Finally, we need to supply this as the database to our logger:
from declog.logger import BaseLogger
from examples.facility_example.logger import FacilityDatabase


class FacilityLogger(BaseLogger):
    db = FacilityDatabase()
    unique_keys = "machine shot_number function_name datetime".split()
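With everything wired up, a decorated call now lands in the JSON file for the relevant machine. In the sketch below, beam_energy is a made-up analysis function, and we assume the logger flushes entries to disk via the database's write() method:

@FacilityLogger
def beam_energy(machine: FacilityMachines, shot_number: int):
    return machine.value * shot_number


beam_energy(FacilityMachines.M2, 7)
# The entry for this call is stored in data/02_M2/metadata/shot_db.json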
Capturing information about the environment
So far, we have seen that we are able to log values using the log function and that the function arguments are captured automatically by the Logger. But what if we want to automatically capture information about the environment the function is being called in?
Let's go back to our example research facility. We would like to know what version of the code we are running at the time of each call. Given the examples so far, we have the options of making a log call in every function, or of making the version an argument to all functions, neither of which is particularly maintainable.
DecLog gives us the option of using logged_properties.
from declog import logged_property
from declog.logger import DefaultLogger


class MyLogger(DefaultLogger):
    @logged_property
    def meaning_of_life(self):
        return 42


assert MyLogger(lambda: None).meaning_of_life == 42
These behave just like normal properties, and can be accessed from an instance at any time. Whenever a decorated function is called, any logged properties are added to the database entry, and can be used as keys if desired.
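Returning to our research facility, a logged property lets us capture the code version automatically on every call. The git lookup below, via subprocess, is one illustrative way to obtain a version string; it is not part of DecLog:

import subprocess

from declog import logged_property
from declog.logger import DefaultLogger


class VersionedLogger(DefaultLogger):
    @logged_property
    def code_version(self):
        # Illustrative only: record the short hash of the current git commit.
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
        )
        return result.stdout.strip()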
Logged properties are also how the DefaultLogger captures the name of the function and the time of the function call. If we look at the class definition, we find that it inherits the FunctionNameMixin and DateTimeMixin, each of which defines one logged_property method. A few commonly used logged_property methods are defined in declog.logger.mixins. It is worth taking a look at these mixins, as they may be useful in your logger.
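As a closing illustration, writing your own mixin follows the same pattern. The UserMixin below is a sketch, not part of declog.logger.mixins; it bundles a reusable logged property that records the calling user:

import getpass

from declog import logged_property
from declog.logger import DefaultLogger


class UserMixin:
    """Adds the calling user's name to every database entry."""

    @logged_property
    def user(self):
        return getpass.getuser()


class MyUserAwareLogger(UserMixin, DefaultLogger):
    """Mirrors how DefaultLogger mixes in FunctionNameMixin and DateTimeMixin."""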