Sensors are the spine of every camera: the hardware that captures light and converts photons into electrons. You don’t need to understand how a sensor works to use one, but doing so opens up different creative choices when selecting lenses and helps explain why particular cameras produce certain aesthetic characteristics. The complexity runs deep, so this article confines itself to the basics of how an image sensor shapes the final image. Restricting the discussion to core features is not meant to limit understanding; in video production there are countless other aspects that matter just as much. That said, people often ask for video specifications or purchase cameras without understanding why specific qualities are important, and are easily misled into thinking ‘the bigger the better’. Every camera system renders an image differently, and over time certain sensors come to dominate the market because of their look; we become accustomed to the visuals those image sensors produce. Understanding the pipeline from photon capture to digital output is therefore essential, because each stage influences the final image quality, noise characteristics, and the creative possibilities available to filmmakers and photographers.
Sensor design
The illustration demonstrates the image sensor’s role in capturing light and movement: the sensor is the first point where light is captured. The sensor’s build determines how much light is captured and how that charge is converted from voltage into digital values, at which point the image and video data becomes measurable in bits.

Camera sensors work by converting light into digital information through a sophisticated process that directly impacts image quality. When light hits the camera sensor, photons strike individual photosites (pixels), where they generate electrical charges through the photoelectric effect. This electrical signal is then processed through a Bayer filter array, which separates the light into red, green, and blue colour information using a mosaic pattern of coloured filters over each photosite. The relationship between sensor size and megapixel count determines the physical size of each photosite: larger photosites can capture more photons, resulting in better signal-to-noise ratio, improved dynamic range, and superior low-light performance. Smaller photosites, created when more megapixels are packed onto the same sensor size, may provide higher resolution but often at the cost of light-gathering efficiency. After the sensor captures this light information, the Image Signal Processor (ISP) converts the analog electrical signals to digital values, applies various corrections and processing algorithms, then sends the processed data through the Input/Output interface for storage or display.
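The signal-to-noise advantage of larger photosites can be sketched with photon shot noise: photon arrival is random (Poisson), so noise grows as the square root of the signal, and SNR therefore also grows as the square root of photons collected. The photon counts below are illustrative assumptions, not measurements from any specific sensor.

```python
import math

def shot_noise_snr(photons: float) -> float:
    """Photon arrival follows Poisson statistics, so noise = sqrt(signal);
    SNR = signal / noise = sqrt(photons collected)."""
    return photons / math.sqrt(photons)

# A photosite with 4x the area collects ~4x the photons under the same
# illumination (illustrative numbers only).
small_site = shot_noise_snr(2_500)   # e.g. a small smartphone-class pixel
large_site = shot_noise_snr(10_000)  # a photosite with 4x the area

print(f"small photosite SNR: {small_site:.0f}")  # 50
print(f"large photosite SNR: {large_site:.0f}")  # 100 -> 2x better, not 4x
```

Note the square-root behaviour: quadrupling the light-gathering area doubles, rather than quadruples, the shot-noise-limited SNR.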
Digital image sensors have existed for decades, and CMOS image sensor chips are now used in over 85% of smartphones for image and video capture. Sony, which holds an estimated 40-60% of the market share, manufactures the image sensors that dominate our digital world today.
The diagram shows a single-chip sensor, which is common in mobile imaging, consumer, and cinema cameras. In a single-chip sensor, a Bayer filter is normally used to capture red, green, and blue through filters arranged in a 2×2 grid pattern (typically two green, one red, and one blue). This layout enhances sensitivity to green light, which contributes significantly to perceived image sharpness. Three-chip systems instead use a beam splitter to divide incoming light into separate red, green, and blue channels, directing each to its own dedicated image sensor. Three-chip designs are more likely to be found in larger broadcast cameras, such as those used in electronic news gathering (ENG), due to their bulkier construction and higher performance requirements.
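The 2×2 Bayer tile described above can be sketched as a simple function mapping a photosite’s row and column to its filter colour. This assumes the common RGGB arrangement; manufacturers also ship variants such as GRBG or quad-Bayer layouts.

```python
def bayer_color(row: int, col: int) -> str:
    """Return the Bayer filter colour over a photosite, assuming the
    common RGGB layout: R G / G B repeated across the sensor."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"

# Print a 4x4 corner of the mosaic: two green filters per 2x2 tile.
for r in range(4):
    print(" ".join(bayer_color(r, c) for c in range(4)))
# R G R G
# G B G B
# R G R G
# G B G B
```

Each photosite records only one colour; the missing two channels per pixel are interpolated from neighbours later (demosaicing), which is why the filter layout matters to perceived sharpness.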
Sensor design innovation doesn’t stop there. Sony in particular marked a new era of sensor illumination with one major advancement: the development of BSI (Back-Side Illuminated) and stacked CMOS sensors. In a BSI sensor, the photodiode is positioned closer to the light source by relocating the circuitry to the back of the sensor, improving light sensitivity and signal strength. A well-known consumer example is the Sony A7R II, among other camera models.
Modern CMOS sensors feature numerous innovations: improved circuitry layouts, advanced noise reduction designs, better light sensitivity, refined component organization, and upgraded materials. All of these elements work together to enhance image quality.
The following sections highlight other technical qualities, but clearly sensor technology, design philosophy, and engineering execution often determine whether your footage looks professional or amateur.
What are Pixels…
A term ubiquitous in digital video and imaging: simply put, a pixel is a picture element. A digital image consists of thousands of pixels; what you see is just a list of picture elements. That’s right, an image is just an array of pixels: sets of numerical values representing pictorial information. If that isn’t exciting enough, an image array is also a matrix, commonly in the RGB (Red, Green, Blue) or YUV (Luminance, Chrominance) colour spaces. It is an enlightening idea, when thinking about sculpting light as an artistic endeavour, that the digital representation of reality is just an array of numbers.
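The “image as an array of numbers” idea can be made concrete with a tiny sketch: a 2×2 RGB image represented as nested lists, where each pixel is just three 8-bit values.

```python
# A tiny 2x2 RGB image as nested lists: each pixel is three numbers
# (red, green, blue), each 0-255 in an 8-bit image.
image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],  # blue pixel, white pixel
]

height = len(image)
width = len(image[0])
print(f"{width}x{height} image, {width * height} pixels")  # 2x2 image, 4 pixels
# The "picture" is nothing more than this matrix of numbers.
```

A real 24 MP frame is the same structure scaled up: roughly 24 million of these numeric triplets.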
The pixels on a camera sensor are counted in the millions, and together those millions of pixels represent light and movement. You’ll typically see the megapixel count listed (e.g. 24 MP) for a camera sensor, but this isn’t the only aspect that determines image quality or a sensor’s capacity to maximise its photon capture.
More pixels can mean more sampling of the light across the frame, and manufacturers today find many ways to capture more photons. However, what matters more is how well those pixels collect photons. To improve light capture, modern sensors use techniques like pixel binning, where clusters of adjacent pixels are grouped together, or ‘binned’, to form larger ‘superpixels.’ This boosts light sensitivity and reduces noise, especially in low-light conditions.
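Pixel binning can be sketched as summing each 2×2 cluster of readings into one superpixel value. This is a simplified model on a monochrome array with made-up numbers; real binning happens on-sensor or in the ISP, usually per colour channel of the Bayer mosaic.

```python
def bin_2x2(mono: list[list[int]]) -> list[list[int]]:
    """Sum each 2x2 cluster of pixel values into one 'superpixel'.
    A sketch on a monochrome array with illustrative values."""
    h, w = len(mono), len(mono[0])
    return [
        [mono[r][c] + mono[r][c + 1] + mono[r + 1][c] + mono[r + 1][c + 1]
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]

# Four dim readings combine into one stronger superpixel, at the cost
# of halving resolution in each dimension.
frame = [
    [10, 12, 9, 11],
    [11, 13, 10, 8],
    [9, 10, 12, 12],
    [12, 11, 11, 13],
]
print(bin_2x2(frame))  # [[46, 38], [42, 48]]
```

The trade-off is visible in the output shape: a 4×4 frame becomes 2×2, but each value carries roughly four times the signal.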
Pixel pitch
Another key factor is pixel pitch: the distance from the centre of one photodiode to the centre of its neighbour, usually measured in microns (µm), which indicates how densely photosites are packed and how much light each one can capture. A larger pixel pitch generally means better light-gathering ability. Innovations in sensor design, such as optimising the pixel layout or improving the materials used, have enabled even smaller sensors, like those in smartphones, to produce impressive image quality. All of this relies on the photoelectric effect: photons strike the sensor’s crystalline silicon, knocking electrons into the conduction band and creating an electrical signal.
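Pixel pitch can be approximated from two numbers a spec sheet always gives you: sensor width and horizontal pixel count. The sketch below assumes square photosites with no gaps, which is close enough for comparison purposes.

```python
def pixel_pitch_um(sensor_width_mm: float, horizontal_pixels: int) -> float:
    """Approximate pixel pitch: sensor width divided by the number of
    photosites across it, converted from mm to microns."""
    return sensor_width_mm / horizontal_pixels * 1000

# A full-frame sensor (36 mm wide) at 8192 pixels across:
print(f"{pixel_pitch_um(36.0, 8192):.2f} um")  # 4.39 um
```

This is why packing more megapixels onto the same sensor size shrinks the pitch: the denominator grows while the width stays fixed.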
Sensor size
When reading about sensor size in videography, the conversation can feel like a muddle of photography and cinema camera terminology. This is because videography evolved as video camera technology became accessible to the wider population and camera manufacturers heard the cries for greater aesthetic choice. Drawing from both worlds, there is a variety of image sensor sizes, and of nomenclature used to describe them.
Cameras with the dual function of stills and video, the popular hybrids now used for a lot of social media content, b-roll, documentary, news reportage and web content, use photographic sensor sizing. For example, the Full Frame format (36 × 24 mm) is the commercial reference. Originating in film photography, the 36 mm × 24 mm frame is also called 35 mm, after the width of the film strip it was exposed on. Sensors of this size were not achievable in the early days of digital photography, so manufacturers used smaller photographic formats, such as the Advanced Photo System (APS).
Photographers often speak about small, medium and large format; these are terms from analog photography, with medium format referring to roll film roughly 60 mm wide and large format to sheet film of around 4 × 5 inches and up. When digital sensors replaced film, manufacturers kept the familiar size classifications (full frame = 35 mm film size, medium format = larger than full frame, etc.) even though we’re now talking about sensor dimensions rather than film dimensions. Camera phone sensors, by contrast, are described by an inch fraction equal to roughly 1.5× the sensor’s diagonal; the 1.5 factor is a holdover from the glass envelope that surrounded the old video camera tubes these designations originally described.
Here is a simple chart comparing sensor area and sensor dimensions in mm. These consumer video cameras should by no means be excluded from professional discussions about video production; they are formidable image acquisition machines.

Sensor size impacts depth of field and field of view; the difference comes down to how the sensor’s dimensions interact with the lens in front of it.
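The usual way to compare how different sensor sizes frame the world through the same lens is the crop factor: the full-frame diagonal (~43.3 mm) divided by the sensor’s diagonal. A sketch, using Super 35-style dimensions as an illustrative input:

```python
import math

FULL_FRAME_DIAG_MM = math.hypot(36.0, 24.0)  # ~43.27 mm reference diagonal

def crop_factor(width_mm: float, height_mm: float) -> float:
    """Crop factor = full-frame diagonal / this sensor's diagonal."""
    return FULL_FRAME_DIAG_MM / math.hypot(width_mm, height_mm)

# A Super 35-style sensor (illustrative dimensions):
s35 = crop_factor(27.99, 19.22)
print(f"crop factor: {s35:.2f}")                      # ~1.27
print(f"50 mm lens frames like: {50 * s35:.0f} mm")   # ~64 mm equivalent
```

A smaller sensor crops into the lens’s image circle, narrowing the field of view for a given focal length, which in practice also changes the depth of field you get for an equivalent framing.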
Resolution
The digital revolution in film and video has fundamentally changed how we define and measure image capture, moving away from traditional film format terminology toward pixel-based measurements that more accurately reflect modern imaging technology. Today’s cinema cameras and high-end video equipment are commonly described using resolution nomenclature such as 2K, 4K, 6K, and even 8K, where the number refers to the approximate horizontal pixel count of the captured image. This system makes practical sense because digital video no longer consists of scan lines—a vertical measurement from analog television—but rather discrete pixels that can be quantified in both width and height. When cinematographers discuss 4K video, they’re referring to footage that is 4,096 pixels wide, while consumer 4K (UHD) measures 3,840 pixels wide. This pixel-based naming convention remains consistent regardless of aspect ratio changes, making it far more logical than the confusing array of film format names that previously dominated the industry.
However, it’s crucial to understand that resolution numbers alone don’t tell the complete story of image quality or sensor performance. As imaging experts emphasize, resolution simply describes the ability to distinguish the smallest discernible details—such as individual wood grains—before contrast is completely lost, but resolution is not synonymous with sharpness. A sensor’s actual output dimensions, such as 7952 × 5304 pixels for a 42-megapixel sensor, represent the raw data capture capability, but the perceived quality depends heavily on factors like contrast, noise performance, and optical quality. Our perception of resolution is intrinsically linked to image contrast, meaning a low-contrast 8K image may appear softer and less detailed than a high-contrast 4K image. This distinction becomes particularly important in professional filmmaking, where the combination of sensor size, pixel density, optical performance, and post-production workflow ultimately determines whether a camera system can deliver the cinematic quality that modern audiences expect, regardless of its headline resolution specifications.
Below is a table comparing several different brands, to put these specifications in context.
| Camera | Pixel Pitch | Megapixels | Resolution | Sensor Size |
|---|---|---|---|---|
| Sony FX3 | ~8.40 µm | 12 MP | 4240 × 2832 | Full-frame (35.6 × 23.8 mm) |
| Canon EOS R5 | 4.39 µm | 45 MP | 8192 × 5464 | Full-frame (36 × 24 mm) |
| ARRI ALEXA 35 | ~6.07 µm | ~14.6 MP | 4608 × 3164 | Super 35 (27.99 × 19.22 mm) |
| Apple iPhone 16 Pro | ~1.22 µm | 48 MP (main) | 8064 × 6048 | ~1/1.28″ (~9.6 × 7.2 mm) |
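The pixel pitch column can be cross-checked from the other two columns: pitch is approximately sensor width divided by horizontal pixel count. A quick sketch using three of the table’s entries:

```python
# Cross-check: pitch (um) ~= sensor width (mm) / horizontal pixels * 1000.
cameras = {
    "Sony FX3": (35.6, 4240),
    "Canon EOS R5": (36.0, 8192),
    "ARRI ALEXA 35": (27.99, 4608),
}

for name, (width_mm, h_pixels) in cameras.items():
    pitch = width_mm / h_pixels * 1000
    print(f"{name}: {pitch:.2f} um")
# Sony FX3: 8.40 um
# Canon EOS R5: 4.39 um
# ARRI ALEXA 35: 6.07 um
```

Notice how the 12 MP FX3 ends up with nearly twice the pitch of the 45 MP R5 on a similar sensor area, which is exactly the resolution-versus-light-gathering trade-off discussed earlier.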
Bits and Digital Imaging
Understanding bit depth is crucial for video makers because it fundamentally determines the creative flexibility and technical quality of your footage throughout the entire production pipeline. An 8-bit recording provides only 256 possible values for each colour channel, which might seem adequate for basic content, but becomes severely limiting when you need to perform colour corrections, match different shots, or recover details from shadows and highlights. Professional video work often requires 10-bit or 12-bit recording specifically because these higher bit depths provide exponentially more data.
When you’re shooting, the bit depth your camera captures directly affects how much tonal information is preserved from the sensor through to your final deliverable. Bit depth refers to the number of bits used to represent each colour component (e.g., red, green, blue); it’s also known as bit depth per channel. In digital imaging, bits determine how much information is stored for each pixel and colour channel, which directly affects image quality, especially in post-production, where colour grading and tonal adjustments rely heavily on the depth of this data. Bits are not to be confused with bytes: a byte consists of 8 bits and can represent values from 0 to 255 (2⁸ – 1). Higher bit depths provide smooth gradations and prevent the visible banding that occurs when 8-bit footage is pushed during post-production.
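The jump in tonal information between bit depths is simply a power of two per channel, which is why 10-bit is not “2 more” than 8-bit but four times the values:

```python
def tonal_levels(bit_depth: int) -> int:
    """Number of distinct values per colour channel: 2 ** bit_depth."""
    return 2 ** bit_depth

for bits in (8, 10, 12):
    print(f"{bits}-bit: {tonal_levels(bits):>5} levels per channel")
# 8-bit:   256 levels per channel
# 10-bit:  1024 levels per channel
# 12-bit:  4096 levels per channel
```

Those extra steps are what keep shadow gradients smooth when a grade stretches a small slice of the tonal range across the whole output.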
The practical impact becomes immediately apparent during colour grading and finishing work. When you lift shadows, adjust exposure, or apply creative colour treatments, higher bit depth footage maintains smooth tonal transitions, while lower bit depth material quickly breaks apart into visible steps and artifacts. The analog-to-digital converter determines how much of the light information captured by your sensor gets preserved as usable data. If you’re creating content for broadcast, streaming platforms, or any professional application, choosing a higher bit depth will prevent defects and artifacts from emerging when editing in post. That isn’t to say 8-bit or 10-bit footage can’t hold up on the big screen; rather, know that a higher bit depth allows better colour grading and more malleable images for VFX.
Knowledge is power
Understanding image sensors provides the technical foundation that separates informed filmmakers from those who simply chase specifications and marketing claims. While you don’t need to be a sensor engineer to create compelling video content, this knowledge empowers you to make deliberate creative and technical decisions rather than falling victim to the “bigger numbers are always better” mentality that pervades camera marketing. In an industry where technology constantly evolves, understanding these core principles provides the lasting knowledge that remains relevant regardless of which new camera hits the market next year.
