Using Amazon SageMaker Debugger on DeepRacer stack — Part 1
My initial goal was to try out Amazon SageMaker Debugger and see whether it could give me useful information beyond what the DeepRacer stack already provides.
However, after a lot of trial and error, I found that it’s not as easy as AWS’s sample code makes it look. Still, I think my journey is a good example of how to make SageMaker Debugger work in a customised environment.
What is Amazon SageMaker Debugger?
Amazon SageMaker Debugger is a tool for debugging ML training. It does much of the heavy lifting for us, such as collecting data, monitoring the training process and detecting abnormal behaviour.
How does Amazon SageMaker Debugger work?
Amazon SageMaker Debugger consists of 2 parts: Collections/Hooks and Rules.
Collections/Hooks
Collections are groups of artifacts (a.k.a. tensors) generated during training, such as the tensors that store the model’s losses or weights.
To debug a model, we need to retrieve those artifacts. SageMaker Debugger therefore uses hooks to emit the tensors from the SageMaker training container to external storage (most commonly, S3).
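To make the collection/hook idea concrete, here is a toy sketch. `ToyHook`, its `record` method and the one-JSON-file-per-step layout are hypothetical illustrations, not the smdebug API: the hook groups tensors into named collections and, on every `save_interval`-th step, writes each collection’s tensors to a local directory that stands in for S3. Tensors outside any collection (here `grad`) are never emitted.

```python
import json
import os
import tempfile


class ToyHook:
    """Hypothetical, simplified stand-in for a Debugger hook (not the smdebug API).

    Tensors are grouped into named collections and, every `save_interval`
    steps, written as JSON files under `out_dir` (a local stand-in for S3).
    """

    def __init__(self, out_dir, collections, save_interval=10):
        self.out_dir = out_dir
        # Map collection name -> set of tensor names that belong to it.
        self.collections = {name: set(names) for name, names in collections.items()}
        self.save_interval = save_interval

    def record(self, step, tensors):
        """Emit the tensors that belong to a collection, on save steps only."""
        if step % self.save_interval != 0:
            return
        for coll_name, members in self.collections.items():
            payload = {k: v for k, v in tensors.items() if k in members}
            if not payload:
                continue
            path = os.path.join(self.out_dir, coll_name, f"step_{step:06d}.json")
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                json.dump(payload, f)


# Simulate 25 training steps with a shrinking loss.
out_dir = tempfile.mkdtemp()
hook = ToyHook(out_dir, {"losses": ["loss"], "weights": ["dense/kernel"]}, save_interval=10)
for step in range(25):
    hook.record(step, {
        "loss": 1.0 / (step + 1),
        "dense/kernel": [0.1 * step, 0.2],
        "grad": [0.0],  # not in any collection, so never emitted
    })

print(sorted(os.listdir(os.path.join(out_dir, "losses"))))
# ['step_000000.json', 'step_000010.json', 'step_000020.json']
```

In Debugger itself, hooks are framework-specific (TensorFlow, PyTorch and so on) and are configured from the SageMaker Python SDK, but the shape is similar: pick the collections, pick a save interval, and let the hook write the tensors out for later analysis.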
Rules
Besides the training job itself, SageMaker spins up a separate processing job if you attach SageMaker Debugger rules to that training job. That processing job evaluates the rules against the tensors the hooks have emitted.
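A rule is essentially a check run over the saved tensors. The function below is a hypothetical, simplified sketch in the spirit of Debugger’s built-in loss-not-decreasing rule; it is not that rule’s implementation or API. It scans a series of saved loss values and reports the first index at which the loss has failed to improve for `patience` consecutive saves.

```python
from typing import Optional, Sequence


def loss_not_decreasing(losses: Sequence[float], patience: int = 3,
                        min_delta: float = 0.0) -> Optional[int]:
    """Hypothetical toy rule, not the Debugger built-in.

    Returns the index of the saved step at which the loss has failed to
    improve on its best value by more than `min_delta` for `patience`
    consecutive saves, or None if no such stall is found.
    """
    best = float("inf")
    stale = 0  # consecutive saves without improvement
    for i, loss in enumerate(losses):
        if loss < best - min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return i  # the rule "triggers" here
    return None


healthy = [1.0, 0.8, 0.6, 0.45, 0.3]
stalled = [1.0, 0.8, 0.6, 0.61, 0.62, 0.6]
print(loss_not_decreasing(healthy))  # None
print(loss_not_decreasing(stalled))  # 5
```

Running such checks in a separate job, rather than inside the training loop, means a slow or failing rule does not interfere with training itself; it only needs read access to where the tensors were saved.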