I did mean hamming. I group by length as insertions and deletions are not expected. I convert to a one-hot vector so I can xor to get the differences.
Thanks for the suggestions. I hadn't thought about splitting the the strings into 3 roughly equal sizes though, that would definitely help narrow the search pool. FWIW, the strings use the protein alphabet (20 values instead of 4).
Thanks for the suggestions. I hadn't thought about splitting the the strings into 3 roughly equal sizes though, that would definitely help narrow the search pool. FWIW, the strings use the protein alphabet (20 values instead of 4).