You first have to read the image header, an optional ICC profile, and finally a portion of the first frame. This first frame might actually be a preview generated by the encoder, but that should be fine for our purpose, and it's not hard to seek to subsequent frames anyway. The frame itself contains its own header and the offsets to all per-frame sections (the "TOC"), and there is always one LfGlobal section that contains a heavily downscaled (8x or more) image in the modular bitstream, even when the frame itself uses VarDCT.
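To make that concrete, here is a rough model in C of what those initial reads leave you with. The struct names and fields are made up for illustration; they don't correspond to the actual J40 or libjxl data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state a low-fidelity decoder has after the
 * initial reads; names and fields are illustrative, not the real J40 or
 * libjxl structures. */
typedef struct {
    uint32_t width, height;   /* from the image header */
    int      has_icc;         /* the optional ICC profile precedes the first
                                 frame and is itself entropy-coded, so it has
                                 to be decoded rather than skipped */
} image_header_t;

typedef struct {
    size_t offset, size;      /* location of one per-frame section */
} toc_entry_t;

typedef struct {
    int          is_preview;      /* the first frame may be an encoder preview */
    int          uses_vardct;     /* LfGlobal is modular-coded either way */
    toc_entry_t *sections;        /* the TOC: offsets/sizes of all sections */
    size_t       num_sections;
    size_t       lf_global_index; /* which TOC entry is the LfGlobal section */
} frame_header_t;
```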
Any higher resolution would require some support from the encoder. The prime mechanism relevant here is a version of the modified Haar transform named Squeeze, which generates two half-sized images from one source image. As each output image is placed in a distinct section, only one of the two output images is needed for low-fidelity decoding. If the encoder didn't apply any transformation, however (often the case in VarDCT images), then all sections would be required regardless of the target resolution.
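For a feel of how that split works, here is a simplified 1-D version in C. The real Squeeze is a modified Haar transform with an extra smoothing ("tendency") predictor, so treat this as an illustration of the idea rather than the spec's transform:

```c
#include <stdio.h>

/* Simplified 1-D "Squeeze"-style step: split one row into a half-sized row
 * of averages and a half-sized row of residuals. */
static void squeeze_step(const int *src, int n, int *avg, int *res) {
    for (int i = 0; i < n / 2; i++) {
        int a = src[2 * i], b = src[2 * i + 1];
        avg[i] = (a + b) >> 1;   /* low-pass half: enough for a downscaled view */
        res[i] = a - b;          /* high-pass half: only needed for full fidelity */
    }
}

int main(void) {
    int row[8] = {10, 12, 20, 22, 30, 28, 40, 44};
    int avg[4], res[4];
    squeeze_step(row, 8, avg, res);

    /* A low-fidelity decoder can stop after the averages: they alone form a
     * 2x-downscaled row, while the residuals live in later sections. */
    for (int i = 0; i < 4; i++) printf("%d ", avg[i]);
    printf("\n");
    return 0;
}
```

Applying such a step repeatedly is what lets a decoder stop early and still have a coherent, progressively less downscaled image; without it, there is no natural cut point in the data.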
Therefore it is technically possible, and in fact libjxl does support partial decoding by rendering a partial bitstream, but anything more than that would be surprisingly complex. For example, how many bytes are needed to ensure that we have at least an 8x downscaled image? Answering that generally requires the TOC, and yet a pathological encoder could put the LfGlobal section at the very end of the frame to mess with decoders (though no such encoder is known at the moment). Any transformation, not just Squeeze, also has to be accounted for to ensure that all of them will produce the desired resolution once combined. Since the ICC profile and the TOC already require most of the entropy coding machinery except for meta-adaptive trees, even calculating the number of required bytes already needs about a third to a half of the full decoder, in my estimate from building J40.
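As a sketch of just the byte-counting part, here is what the arithmetic looks like once you already have a decoded TOC in hand. The section layout and struct below are made up for illustration (the real TOC is itself entropy-coded); the point is how a back-loaded LfGlobal inflates the required prefix:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical TOC model: each section has a size, data follows in TOC order,
 * and LfGlobal may appear anywhere.  Not the actual wire format. */
typedef struct { const char *name; size_t size; int is_lf_global; } section_t;

/* Bytes of frame data (after the TOC) needed so that LfGlobal is fully
 * available, assuming sections are stored in TOC order. */
static size_t bytes_for_lf_global(const section_t *toc, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += toc[i].size;
        if (toc[i].is_lf_global) return total;  /* everything up to and including it */
    }
    return total;  /* malformed: no LfGlobal found */
}

int main(void) {
    section_t typical[] = {
        {"LfGlobal", 2000, 1}, {"LfGroup", 5000, 0}, {"HfGlobal", 3000, 0},
    };
    section_t pathological[] = {
        {"LfGroup", 5000, 0}, {"HfGlobal", 3000, 0}, {"LfGlobal", 2000, 1},
    };
    printf("typical: %zu bytes\n", bytes_for_lf_global(typical, 3));      /* 2000 */
    printf("pathological: %zu bytes\n", bytes_for_lf_global(pathological, 3)); /* 10000 */
    return 0;
}
```

And this is the easy part: getting to the point where you have those sizes at all already means decoding the header, the ICC profile, and the TOC itself.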
That said, I'm not sure this complexity could have been radically reduced without introducing inefficiency in the first place. In fact, I've just described what I wanted when I started to build J40! I think there was an informal agreement that the ICC profile could have been made skippable, but you still need all the same machinery for decoding the TOC anyway. Transformations are a vital part of the compression and can't easily be removed or replaced. So such a tool would definitely be possible to build, but necessarily complicated.