...While this additional information gives us more to work with, it also requires different network architectures and often adds larger memory and computational demands. We won’t use any optical flow images. This reduces model complexity and training time, and spares us a whole whack load of hyperparameters we’d otherwise have to worry about.

Every video will be subsampled down to 40 frames. So a 41-frame video and a 500-frame video will both be reduced to 40 frames, with the 500-frame video essentially being fast-forwarded (see the sketch below).

We won’t do much preprocessing, either. A common preprocessing step for video classification is subtracting the mean, but we’ll keep the frames pretty raw from start to finish.
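Here’s a minimal sketch of that subsampling idea: given all of a video’s extracted frames, keep 40 evenly spaced ones. The function name, the frame-path format, and the 40-frame default are illustrative assumptions, not a fixed API.

```python
def rescale_list(frames, target_size=40):
    """Return target_size frames, evenly spaced across the original list.

    Illustrative sketch: assumes the video has at least target_size frames.
    """
    assert len(frames) >= target_size

    # Step through the list at a fixed stride, so a 500-frame video gets
    # "fast-forwarded" while a 41-frame video is barely thinned at all.
    skip = len(frames) // target_size
    sampled = [frames[i] for i in range(0, len(frames), skip)]

    # Integer division can leave a few extras; trim to exactly target_size.
    return sampled[:target_size]


if __name__ == "__main__":
    long_video = [f"frame_{i:04d}.jpg" for i in range(500)]   # hypothetical paths
    short_video = [f"frame_{i:04d}.jpg" for i in range(41)]
    print(len(rescale_list(long_video)), len(rescale_list(short_video)))  # 40 40
```

Both videos end up as 40-frame sequences of the same shape, which is what lets us batch them together downstream without worrying about variable-length inputs.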