After a bit more investigation, I found that if you replace the following code:
result.append(begin == eos ? "" : String(cs[begin..<end.successor()]))
with this:
if begin == eos {
    result.append("")
} else if let str = String(cs[begin..<end.successor()]) {
    result.append(str)
}
runtime goes down from ~3 seconds to ~2.2 seconds.
This is due to a rather insidious API design decision:
init?(_ view: String.UTF16View)
constructs a String out of a UTF-16 view, but it can fail. If it's used in a context where the result type is inferred to be non-optional, overload resolution silently falls back to the following generic, reflection-based init instead:
init<T>(_ instance: T)
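To see the trap concretely, here's a minimal sketch (the variable names are mine; this illustrates the Swift 2-era overload behavior described above):

let line = "a,b,c"
let view = line.utf16

// Optional context: the failable init?(_ view: String.UTF16View) is chosen.
let fast: String? = String(view)

// Non-optional context: the failable init can't produce a plain String,
// so the compiler silently picks the generic, reflection-based
// init<T>(_ instance: T) instead, which is far slower.
let slow: String = String(view)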
I'm going to bring this up on the list and see if there are better ways of doing things.
As far as I can tell, most of the rest of the time is spent in Swift's native Unicode-to-UTF-16 decoding machinery and in NSCharacterSet.
OK, I found the cause of that weirdness: I had slightly changed my test code since I posted it here weeks ago. After undoing that change, I'm seeing the same thing you do.
But this raises more questions, because what I changed is the test code generation. I'm now generating a million different strings instead of adding the same string a million times (note the \(i) at the end of the string):
let N = 1_000_000  // "a million different strings", per the discussion above

func generateTestData() -> [String] {
    var a = [String]()
    for i in 0..<N {
        a.append(",,abc, 123 ,x, , more more more,\u{A0}and yet more, \(i)")
    }
    return a
}
The running time of generateTestData() itself isn't what we measure, but apparently the performance improvement you found only holds if the same string is used every time; otherwise performance drops.
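For context, the measurement loop is roughly like this (a sketch: the splitAndTrim signature and the timing code are my assumptions, not the original harness):

import Foundation

let data = generateTestData()  // data construction happens outside the timed region

let start = NSDate()
for s in data {
    _ = splitAndTrim(s)        // splitAndTrim(String) -> [String] is the code under test
}
let elapsed = NSDate().timeIntervalSinceDate(start)
print("split/trim over \(N) strings took \(elapsed)s")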
One thing I've noticed is that the string scanning operation itself is relatively cheap: if the splitAndTrim code is modified to avoid Strings and return a [String.UTF16View] instead (see the sketch below), the runtime is around 1.2 seconds. It's the process of building Strings out of those UTF16 views that is destroying performance.
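Here's a minimal sketch of what that variant might look like (the name splitAndTrimViews and the exact loop are my assumptions, written against the Swift 2-era index APIs used above):

import Foundation

func splitAndTrimViews(input: String) -> [String.UTF16View] {
    let comma: UInt16 = 44  // UTF-16 code unit for ","
    let ws = NSCharacterSet.whitespaceCharacterSet()
    let cs = input.utf16

    // Trim whitespace code units off both ends of cs[start..<end].
    func trimmed(start: String.UTF16View.Index, _ end: String.UTF16View.Index) -> String.UTF16View {
        var lo = start
        var hi = end
        while lo != hi && ws.characterIsMember(cs[lo]) { lo = lo.successor() }
        while hi != lo && ws.characterIsMember(cs[hi.predecessor()]) { hi = hi.predecessor() }
        return cs[lo..<hi]
    }

    var result = [String.UTF16View]()
    var begin = cs.startIndex
    var i = cs.startIndex
    while i != cs.endIndex {
        if cs[i] == comma {
            result.append(trimmed(begin, i))  // a slice, no String built
            begin = i.successor()
        }
        i = i.successor()
    }
    result.append(trimmed(begin, cs.endIndex))
    return result
}

Each element of the result is just a slice referencing the original string's storage, so no per-field String is ever built.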
I still don't know why changing the way the input data are constructed would have that effect, except to guess that the underlying representation is different somehow. I'll file a ticket.
This looks to me like memory allocation / reference counting is at least part of the problem. Slicing a UTF16View to get another UTF16View most likely doesn't involve any dynamic memory allocation at all.