And if you have any actual evidence that the other groups you name are being measurably affected, make your data known, so we can make corrections in those cases as well.
That can be trivially done for the reason I mentioned before: you can divide people into an infinite number of groups based on an infinite number of traits. Showing that the software is discriminatory thus becomes a pointless exercise in inventing arbitrary categorization schemes.
The objective should be to reduce how many people in total die as a result of the software's flaws. Showing which particular groups are "measurably affected", meaning they have traits that the software handles less well, does not provide any valuable information. It ignores the whole to focus on an arbitrarily elevated part deemed more important than the other parts. If the software fails in 7% of cases when development prioritizes reducing racial disparity, and in 5% of cases when development prioritizes population-wide performance, you are sacrificing more people of other groups to get better results with a favored group.
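To put rough numbers on that comparison, here is a minimal sketch. Only the 7% and 5% failure rates come from my example above; the population of 100,000 cases and the function names are made up purely for illustration.

```python
def failures(cases: int, failure_percent: int) -> int:
    """Number of failed cases at a given whole-number failure percentage."""
    return cases * failure_percent // 100


def extra_failures(cases: int, disparity_focused_percent: int,
                   overall_focused_percent: int) -> int:
    """Additional failures incurred by the higher-failure-rate priority."""
    return (failures(cases, disparity_focused_percent)
            - failures(cases, overall_focused_percent))


# Hypothetical population size, chosen only to make the arithmetic concrete.
HYPOTHETICAL_CASES = 100_000

print(extra_failures(HYPOTHETICAL_CASES, 7, 5))  # 2000
```

Under those assumed figures, the two-point gap is 2,000 additional failures per 100,000 cases, regardless of which groups they fall in.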
As for unfairness: we can slice and dice the statistics to show more or less disparity between the average and a disadvantaged group by selectively constructing group categorizations (a group can be anything: people with dark skin, people with wide-set eyes, people with small chins). It's impossible for the software to perform equally well with all people unless it is perfect. Perfection is reached more efficiently by focusing on improving the statistics for the whole population rather than for any subset of it.
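The slicing point can be sketched with a toy simulation. Everything in it is an assumption for illustration: the traits are assigned by coin flip and are independent of the failures by construction, so it only shows how apparent gaps arise by chance as more groupings are tried. It says nothing about any real group or real software.

```python
import random


def largest_apparent_gap(n_people=2000, n_traits=50, failure_rate=0.05, seed=0):
    """Return (overall failure rate, largest group-minus-overall gap found).

    Each hypothetical trait splits people at random, independent of failure.
    """
    rng = random.Random(seed)
    # Simulated failures: every person has the same underlying failure rate.
    failed = [rng.random() < failure_rate for _ in range(n_people)]
    overall = sum(failed) / n_people

    worst_gap = 0.0
    for _ in range(n_traits):
        # An arbitrary trait: roughly half the population has it.
        members = [rng.random() < 0.5 for _ in range(n_people)]
        group_size = sum(members)
        if group_size == 0:
            continue
        group_rate = sum(f for f, m in zip(failed, members) if m) / group_size
        worst_gap = max(worst_gap, group_rate - overall)
    return overall, worst_gap


overall, gap = largest_apparent_gap()
print(f"overall failure rate: {overall:.3f}, largest apparent gap: {gap:.3f}")
```

With the same seed, trying more traits can only keep or widen the largest gap found, since the search is over a superset of groupings.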
So to summarize: choosing race or skin color as the basis for categorization is not objectively any more moral than choosing any other trait, and trying to find the groups that are exceptionally disadvantaged is an impossible feat, because an infinite number of groups can be created from an infinite number of trait combinations.
I can't show real-world numbers when controlled experiments can't be run on the phenomenon in question, and I don't need real-world numbers to make the case for the logical soundness of a principle: in this case, that prioritizing improvement of a metric other than overall performance will generally yield smaller overall performance improvements than prioritizing overall performance itself.