Training requires a lot more memory: you keep the gradients plus the optimizer's gradient statistics for every parameter, and you need higher-precision weights for the optimizer step. It's also much more parallelizable. But inference is essentially a subroutine of training.
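To put rough numbers on that, here's a minimal back-of-the-envelope sketch, assuming mixed-precision training with Adam (the standard per-parameter accounting, e.g. from the ZeRO paper); the 7B parameter count is just an illustrative assumption, not something from the parent comment:

```python
def inference_bytes_per_param(dtype_bytes: int = 2) -> int:
    """Inference only needs the weights, e.g. fp16 -> 2 bytes/param."""
    return dtype_bytes

def training_bytes_per_param() -> int:
    """Mixed-precision training with Adam keeps, per parameter:
    fp16 weight (2) + fp16 gradient (2) + fp32 master weight (4)
    + fp32 Adam first moment (4) + fp32 Adam second moment (4).
    """
    return 2 + 2 + 4 + 4 + 4  # = 16 bytes/param

n_params = 7e9  # hypothetical 7B-parameter model
print(f"inference: ~{n_params * inference_bytes_per_param() / 1e9:.0f} GB")
print(f"training:  ~{n_params * training_bytes_per_param() / 1e9:.0f} GB")
# inference: ~14 GB, training: ~112 GB -- roughly an 8x gap,
# and that's before counting activation memory for backprop.
```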
Did you custom-build those language processors for this task, or did you repurpose something that already exists? I've never heard anyone use the term 'language processor' before.