It's a single value per pixel, but each pixel has a different color filter in front of it, so effectively each pixel records only one of R, G, or B.
So, for a 3x3 image, the input data would be 9 values like:
R G B B R G G B R ?
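To make the "one value per pixel, one color per pixel" idea concrete, here is a minimal sketch in Python. It assumes the nine values are stored row by row and that the filter layout is exactly the sequence guessed above; a real sensor's layout may well differ. The names `FILTERS`, `split_channels`, and the sample numbers in `raw` are made up for illustration.

```python
# Hypothetical filter layout for a 3x3 sensor, read row by row from the
# sequence "R G B B R G G B R" (an assumption, not a real sensor layout).
FILTERS = [
    ["R", "G", "B"],
    ["B", "R", "G"],
    ["G", "B", "R"],
]


def split_channels(raw, filters):
    """Split a single-value-per-pixel image into three sparse color planes.

    Each plane holds the raw value where that pixel's filter matches the
    color, and None everywhere else (the color was not sampled there).
    """
    planes = {c: [[None] * len(row) for row in raw] for c in "RGB"}
    for y, row in enumerate(raw):
        for x, value in enumerate(row):
            planes[filters[y][x]][y][x] = value
    return planes


# Made-up raw readings: one number per pixel, nine in total.
raw = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

planes = split_channels(raw, FILTERS)
for color in "RGB":
    print(color, planes[color])
```

Each plane ends up with only three of the nine positions filled. The missing entries are what would have to be estimated from neighboring pixels to get a full RGB value at every position.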