Architecture
This section describes how the components work together. It focuses on the interfaces, not the inner workings.
Set up an experiment by modifying the training config in trainer.py
and starting trainer.py.
During the initialization of each experiment, a folder for saving
TensorBoard logs, models and config files is created in a results
folder at the project root level.
It is also possible to reload previous experiments by passing
the base_experiment key in the config.
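
The experiment setup described above could look roughly like the following sketch. Only the results folder and the base_experiment key come from this README. The helper name init_experiment, the config.json file name, the subfolder names and the other config keys are assumptions made for illustration, not the project's actual layout.

```python
import json
from pathlib import Path


def init_experiment(name, config, project_root):
    """Create a results folder for one experiment and store its config.

    Hypothetical helper: folder and file names are assumptions.
    """
    results_dir = Path(project_root) / "results"

    # Reload the config of a previous experiment if base_experiment is set.
    base = config.get("base_experiment")
    if base is not None:
        with open(results_dir / base / "config.json") as f:
            base_config = json.load(f)
        # Keys given in the new config override the reloaded ones.
        config = {**base_config, **config}

    experiment_dir = results_dir / name
    for sub in ("tensorboard", "models"):
        (experiment_dir / sub).mkdir(parents=True, exist_ok=True)
    with open(experiment_dir / "config.json", "w") as f:
        json.dump(config, f, indent=2)
    return experiment_dir, config
```

In this sketch, a second experiment started with {"base_experiment": "exp_a"} inherits every key of exp_a that it does not override itself.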
The environment orchestrator handles the creation of, and communication
with, the number_envs environments by spawning each environment
in its own process using Python's built-in multiprocessing package.
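
A minimal sketch of this one-process-per-environment pattern is shown below. It assumes pipes are used for communication and uses a made-up command protocol ("step" and "close"). The names env_worker and Orchestrator are hypothetical and may not match the project's orchestrator.

```python
import multiprocessing as mp


def env_worker(conn, env_id):
    """Run one environment in its own process and answer commands."""
    step_count = 0
    while True:
        command, payload = conn.recv()
        if command == "step":
            step_count += 1
            # A real environment would advance the simulation here.
            conn.send({"env_id": env_id, "action": payload, "step": step_count})
        elif command == "close":
            conn.close()
            break


class Orchestrator:
    """Hypothetical orchestrator: one process and one pipe per environment."""

    def __init__(self, number_envs):
        self.conns, self.procs = [], []
        for env_id in range(number_envs):
            parent_conn, child_conn = mp.Pipe()
            proc = mp.Process(target=env_worker, args=(child_conn, env_id), daemon=True)
            proc.start()
            self.conns.append(parent_conn)
            self.procs.append(proc)

    def step(self, actions):
        # Send all actions first so the environments run in parallel.
        for conn, action in zip(self.conns, actions):
            conn.send(("step", action))
        return [conn.recv() for conn in self.conns]

    def close(self):
        for conn in self.conns:
            conn.send(("close", None))
        for proc in self.procs:
            proc.join()


if __name__ == "__main__":
    orchestrator = Orchestrator(number_envs=2)
    print(orchestrator.step([0.1, -0.3]))
    orchestrator.close()
```

Sending every action before collecting any reply is what lets the environments work concurrently. Receiving right after each send would serialize them.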
Environments (currently environment_robot_task.py) are composed
of a robot and a task, which the user can combine in arbitrary ways.
During initialization, the environments return the observation
and action spaces to the trainer, which initializes
the agent using these. Additionally, the success criterion and the reward
function are passed back to the trainer from the respective task.
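
The initialization handshake described above can be sketched as follows. All class names, fields and the describe method are assumptions for illustration. Spaces are represented as plain shape tuples to keep the example free of dependencies, and the project may well use a different representation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Robot:
    """Hypothetical robot: defines what can be observed and commanded."""
    name: str
    observation_shape: Tuple[int, ...]
    action_shape: Tuple[int, ...]


@dataclass
class Task:
    """Hypothetical task: owns the reward function and success criterion."""
    name: str
    reward_function: Callable[[float], float]
    success_criterion: Callable[[float], bool]


class Environment:
    """Combines any robot with any task."""

    def __init__(self, robot, task):
        self.robot, self.task = robot, task

    def describe(self):
        # What the environment reports back to the trainer on initialization.
        return {
            "observation_space": self.robot.observation_shape,
            "action_space": self.robot.action_shape,
            "reward_function": self.task.reward_function,
            "success_criterion": self.task.success_criterion,
        }


class Agent:
    """Placeholder agent that only stores the spaces it was built for."""

    def __init__(self, observation_space, action_space):
        self.observation_space = observation_space
        self.action_space = action_space


def reach_reward(distance):
    # Closer to the goal means a higher (less negative) reward.
    return -distance


def reach_success(distance):
    return distance < 0.05


# Example: a reaching task combined with a six-joint arm.
reach = Task("reach", reach_reward, reach_success)
arm = Robot("arm", observation_shape=(12,), action_shape=(6,))
info = Environment(arm, reach).describe()
agent = Agent(info["observation_space"], info["action_space"])
```

Because the robot only contributes the spaces and the task only contributes the reward and success functions, any robot can be paired with any task in this sketch.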
The success criterion is used to determine the test_success_ratio,
which is the number of concluded test runs with success_criterion=True
divided by the total number of tests for the current test period.
If the previous best ratio is outperformed, the agent is saved to the
model directory.
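
As a small illustration of the rule above, the sketch below computes the ratio for one test period and triggers a save only when the previous best is exceeded. The names compute_test_success_ratio and BestModelTracker are hypothetical, and starting the best ratio at 0.0 is an assumption.

```python
def compute_test_success_ratio(test_results):
    """Fraction of concluded test runs whose success criterion was True."""
    if not test_results:
        return 0.0
    return sum(test_results) / len(test_results)


class BestModelTracker:
    """Calls save_fn only when the ratio beats the previous best."""

    def __init__(self, save_fn):
        self.best_ratio = 0.0  # assumption: a ratio of 0 never triggers a save
        self.save_fn = save_fn

    def update(self, test_results):
        ratio = compute_test_success_ratio(test_results)
        if ratio > self.best_ratio:
            self.best_ratio = ratio
            self.save_fn(ratio)
            return True
        return False
```

A tie with the previous best does not trigger a save here, since the text says the best ratio has to be outperformed.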
The success criterion and reward function are set in the specific
task file. This file, in conjunction with the task meta class,
specifies everything except for the communication with the simulated robot.
Task objects and robot files are saved in URDF format in the data folder
and read by PyBullet, which composes the simulation environment.
Agent files are stored in the agents folder; test scripts are also included.