There is one situation where 44k/24 bit and 88k/24 bit CAN sound appreciably different, and that's when aliasing is introduced into the recording, mixing, or the sample rate conversion.
If proper precautions are not taken during the recording/mixing/mastering phases, aliasing artifacts can be heard in the recording. This may account for the differences that some people hear when comparing the two. Higher sample rates are more forgiving of aliasing: there is more room above the audible band before components fold back into it, so the artifacts are less perceptible and you're less likely to hear them.
The artifacts of aliasing manifest as inharmonic distortion that starts in the top octaves and then folds back into lower frequencies as the effect is intensified. Most listeners can easily perceive it once it is pointed out to them. It is not a pleasant effect like low-order (second- or third-harmonic) distortion. It does not complement the record at all.
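For anyone who wants to see the fold-back in numbers, here is a minimal sketch. It assumes a hypothetical distortion effect that generates harmonics of a 7 kHz tone and no anti-alias filtering at all; the function name and the tone frequency are illustrative choices, not anything from a real plugin.

```python
def alias_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a tone at f_hz shows up after unfiltered sampling."""
    nyquist = sample_rate_hz / 2.0
    folded = f_hz % sample_rate_hz
    # Anything between Nyquist and the sample rate mirrors back below Nyquist.
    return sample_rate_hz - folded if folded > nyquist else folded


# Harmonics 1..6 of a 7 kHz tone (assumed output of some distortion effect).
for n in range(1, 7):
    harmonic = 7000.0 * n
    at_44k = alias_frequency(harmonic, 44100.0)
    at_88k = alias_frequency(harmonic, 88200.0)
    print(f"harmonic {n}: {harmonic:.0f} Hz -> {at_44k:.0f} Hz @44.1k, {at_88k:.0f} Hz @88.2k")
```

At 44.1 kHz the 4th, 5th, and 6th harmonics land at 16100, 9100, and 2100 Hz, none of which is a multiple of 7 kHz, and each higher harmonic lands lower. At 88.2 kHz the same harmonics stay where they are, above the audible band.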
That said, if proper precautions are taken to mitigate aliasing artifacts during the record-making process, then a listener shouldn't perceive any difference between a 44k and an 88k record. The best-case scenario is often a record that's recorded, mixed, and mastered at high sample rates, even if it's ultimately down-sampled to CD quality (44.1 kHz).
If you think of this "aliasing" the way it occurs in 3D graphics, it becomes easier to understand. What those anti-aliasing filters do is either remove information with blur (FXAA) or use extra samples that are not present in the final image (MSAA and derivatives).
In audio recording, sampling at 88k would be like rendering an MSAA x2 image: it can be displayed with higher fidelity even though the output resolution is the lower 44k sampling rate.
Aliasing can actually occur in the capture stage if the setup isn't right. Think of moire patterns in a video with highly textured subjects. It can happen whenever a signal is sampled.
The mastering discussed higher up in the thread is going on ahead of time at the studio, not on your playback system. The whole mastering pipeline starts with some initial capture resolution from microphones/cameras. The studio processes these original raw captures into a combined form and prepares the distribution format, i.e. a planned audio/video stream resolution. The studio can use different resolutions during capture, processing, and final distribution.
Generally speaking, the highest rates would be easiest to work with and avoid perceptible artifacts. But practical tradeoffs are made to save cost whether in processing, transfer, or storage.
The point is that because of sampling, order of operations can matter. So having an 88k file -> apply an effect -> downsample to 44k can sound different than having an 88k file -> downsample to 44k -> apply an effect.
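A rough numerical sketch of that ordering difference, under loud assumptions: the "effect" is a made-up squaring nonlinearity, the downsampler is an idealized brick-wall filter done with an FFT (real resamplers are not this clean), and the 15 kHz test tone is arbitrary. It needs NumPy.

```python
import numpy as np

FS_HI = 88_200     # working sample rate in Hz
N = 8_820          # 0.1 s of audio, so FFT bins sit exactly 10 Hz apart
TONE_HZ = 15_000   # arbitrary test tone for this sketch


def effect(x):
    """Hypothetical nonlinear effect: squaring adds a component at twice the input frequency."""
    return x ** 2


def downsample_by_2(x):
    """Halve the sample rate using an ideal low-pass applied in the frequency domain."""
    n_out = len(x) // 2
    spectrum = np.fft.rfft(x)[: n_out // 2 + 1].copy()
    spectrum[-1] = 0.0                             # drop the bin at the new Nyquist frequency
    return np.fft.irfft(spectrum, n=n_out) / 2.0   # rescale for the halved length


def amplitude_at(x, freq_hz, fs):
    """Amplitude of the sinusoidal component at freq_hz (assumed to fall on an FFT bin)."""
    k = round(freq_hz * len(x) / fs)
    return 2.0 * abs(np.fft.rfft(x)[k]) / len(x)


t = np.arange(N) / FS_HI
x_hi = np.sin(2 * np.pi * TONE_HZ * t)

effect_then_down = downsample_by_2(effect(x_hi))   # 88k -> effect -> 44k
down_then_effect = effect(downsample_by_2(x_hi))   # 88k -> 44k -> effect

# The 30 kHz product of the effect folds to 44100 - 30000 = 14100 Hz at 44.1 kHz.
alias_hz = FS_HI // 2 - 2 * TONE_HZ
a = amplitude_at(effect_then_down, alias_hz, FS_HI // 2)
b = amplitude_at(down_then_effect, alias_hz, FS_HI // 2)
print(f"alias at {alias_hz} Hz: effect-first {a:.6f}, downsample-first {b:.6f}")
```

With the effect applied first, the 30 kHz product is still representable at 88.2 kHz and the downsampling filter removes it. With the downsample first, the same product has nowhere to go at 44.1 kHz and shows up as an audible 14.1 kHz tone.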
This is an important point. The main reason that pro audio gear pushes bit depth and sample rate higher than 16/44.1 audio is that when you start doing the floating point math to mix and apply effects to audio, you can end up with audible differences when multitrack recording. In this case (and I still think it’s optional for all but the most demanding recording of live performance) higher sample rates can help, and to a lesser degree bit depth can give you more dynamic range.
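To put a number on the dynamic range part, here is a small sketch using the textbook formula for an ideal N-bit quantizer driven by a full-scale sine wave (roughly 6.02 dB per bit plus 1.76 dB). Real converters fall short of this, so treat the results as upper bounds rather than measurements.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical signal-to-quantization-noise ratio for a full-scale sine, in dB."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


for bits in (16, 24):
    print(f"{bits}-bit: about {dynamic_range_db(bits):.1f} dB")
```

That works out to roughly 98 dB for 16-bit and roughly 146 dB for 24-bit.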
I give that long preamble to say that once a record is done and mastered, having more than 16/44.1 kHz is wasted bandwidth.
Being pedantic here, but since it's on topic: this only applies to non-linear processing, which is most of what we want to do when mixing music, but not all of it.
Where would “taking a 44k track and up-sampling it to 88k (audio DLSS?), applying effects, and then downsampling back to 44k” fall between those two points in the spectrum?
The downsampled 44k that went through a half-rate filter might actually sound better, for that matter. The speakers won't try to reproduce the content above 22 kHz then.
Even in professional audio environments, monitor speakers' electronics are generally designed to filter out such high frequencies.
The "frequency response" spec listed on speakers will tell you what range they are designed to reproduce. Typically, it's approximately 20Hz to 20,000Hz to match human hearing, perhaps with a higher floor if the speaker is designed to be paired with a subwoofer. This range is usually a deliberate (and sensible) limitation imposed by the electronics, not necessarily the materials or the magnet design, etc.
Some speaker manufacturers will list abnormally low or high range numbers on the spec sheet in an attempt to attract customers who mistakenly believe a wider range means the speaker is better. But even those speakers have a steep roll off curve at the extreme ends of the range, so it barely makes any difference.
Even pro audio manufacturers don't seem to low-pass their loudspeakers. Combined with aluminium dome tweeters having a breakup resonance around 25 kHz, the problem /could/ exist; but since we don't hear above 20 kHz (and not much above 18~19 kHz for most of us), the only possibility would be some kind of mythical modulation that I've never seen measured in a real-world design.
About 20 years ago I bought some live concert DVDs but the band did not release an audio CD version of it. I wanted to listen to it in my car so I ripped the audio from the DVD but when I made a CD it sounded a little off.
Then I learned that DVD was 48 kHz and not 44.1 kHz so the conversion program I used didn't account for this. I went back and used a polyphase filter to adjust the sampling rate. It sounded normal again but there were some audio glitches at various times.
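One plausible reading of "a little off", assuming the first tool simply relabeled the 48 kHz samples as 44.1 kHz without resampling (the post does not say exactly what it did): everything would play slower and lower. The helper name below is made up for illustration.

```python
import math


def relabel_error(source_rate_hz: float, playback_rate_hz: float):
    """Speed factor and pitch shift when samples are played at the wrong rate."""
    speed = playback_rate_hz / source_rate_hz
    semitones = 12 * math.log2(speed)   # negative means lower in pitch
    return speed, semitones


speed, semitones = relabel_error(48_000, 44_100)
print(f"speed x{speed:.5f}, pitch {semitones:+.2f} semitones")
```

Under that assumption the audio would run at about 92% speed and sit about one and a half semitones flat.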
I went back about 10 years ago and ripped it again and converted it to FLAC which supports multiple sampling rates like 44.1, 48, 96, whatever and now everything sounds good.
Aliasing is also icky if you go from 44.1 kHz to 8 kHz (for phone systems); a 48 kHz source would be better for that.
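The arithmetic behind that, as a quick sketch: the conversion ratio reduces to "upsample by L, downsample by M", and 48 kHz gives much friendlier numbers. This only shows the ratios; it says nothing about how a particular resampler handles them, and a well-designed one can cope with either.

```python
from fractions import Fraction


def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Reduced ratio dst/src, i.e. upsample by the numerator, downsample by the denominator."""
    return Fraction(dst_hz, src_hz)


for src, dst in [(48_000, 8_000), (44_100, 8_000), (48_000, 44_100)]:
    r = resample_ratio(src, dst)
    print(f"{src} -> {dst}: up {r.numerator}, down {r.denominator}")
```

So 48 kHz to 8 kHz is a plain divide-by-6, while 44.1 kHz to 8 kHz is 80/441, and 48 kHz to 44.1 kHz is 147/160.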
But 44.1kHz worked better in the lead up to CDs (works for modulating onto video tape in PAL and monochrome NTSC), so it won until DVD audio brought 48kHz to the masses.
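For the curious, the usual explanation of where 44.1 kHz comes from is that early digital audio was stored on video recorders with three samples per usable video line. The line and field counts below are the commonly cited figures, quoted from memory rather than from anything in this thread, so treat them as an assumption.

```python
SAMPLES_PER_LINE = 3   # commonly cited figure for video-based PCM recorders

# (usable lines per field, fields per second)
FORMATS = {
    "NTSC monochrome": (245, 60),
    "PAL": (294, 50),
}

for name, (lines_per_field, fields_per_second) in FORMATS.items():
    rate_hz = SAMPLES_PER_LINE * lines_per_field * fields_per_second
    print(f"{name}: {rate_hz} samples per second")
```

Both come out to 44100, which is why the same rate could be carried on either video standard.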