I found the link to Stroustrup's "Learning Standard C++ as a New Language" [edit...

saurik · on March 23, 2014

Writing tac in C++ is trivial:

    std::deque<std::string> lines;
    for (std::string line; std::getline(std::cin, line); )
        lines.push_front(line);
    for (const auto &line : lines)
        std::cout << line << std::endl;

Note that I typed this quickly off the top of my head: I'm willing to believe there is a trivial typo the compiler would catch, but the overall implementation should be fine ;P.

As for ohce, you have defined a very hard problem, and it does not seem you realize quite how hard it is: if you take a sequence of Unicode code points and reverse their order, you do not end up with a string of reversed characters. Not in the general case, and not even for some reasonable encodings of seemingly simple cases like an accented letter e.
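To make the accented-e case concrete, here is a minimal sketch. It assumes the text is already held as UTF-32 code points and that "é" is stored decomposed, as `e` followed by U+0301 COMBINING ACUTE ACCENT; `reverse_code_points` is a name made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Naive reversal: flips the order of the code points and knows nothing
// about which code points combine into one user-visible character.
std::u32string reverse_code_points(std::u32string s) {
    std::reverse(s.begin(), s.end());
    return s;
}
```

Calling `reverse_code_points(U"cafe\u0301")` gives back U+0301 followed by `e`, `f`, `a`, `c`: the combining accent now sits at the front with no base letter before it, instead of staying attached to the `e`.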

Like, I challenge you to provide a working version of ohce in Python (2 or 3: your choice). Virtually no language actually provides a string type that makes this problem reasonable. It simply isn't fair to pick on C++ in this regard when no language "gets this right": at least C++ is being honest about the lack of guarantees it is making about string manipulation.

For more information, I recommend reading this article:

http://mortoray.com/2013/11/27/the-string-type-is-broken/

e12e · on March 24, 2014

> Like, I challenge you to provide a working version of ohce in Python (2 or 3: your choice).

Not a full implementation, but wouldn't this approach actually work? (Note: it doesn't work in Python 2.)

    $ python3
    Python 3.2.3 (default, Feb 20 2013, 14:44:27) 
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "abc"[::-1]
    'cba'
    >>> "øæåにほ言"[::-1]
    '言ほにåæø'
    [edit: with accents]:
    >>> "eẽêëèøæåにほ言"[::-1]
    '言ほにåæøèëêẽe'

    [edit2: formatting, indentation]

There might very well be problems with this, but I'm not aware of any?
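One reason the slice does not carry over to Python 2 is that a plain `str` there is a sequence of bytes, so `[::-1]` also reverses the bytes inside each multi-byte UTF-8 sequence. A rough C++ sketch of reversing UTF-8 by code point rather than by byte, assuming the input is valid UTF-8 (`is_continuation` and `reverse_utf8_code_points` are hypothetical names, and nothing is validated beyond one assert):

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Reverse a UTF-8 string code point by code point.
// Assumption: `utf8` is valid UTF-8.
std::string reverse_utf8_code_points(std::string utf8) {
    // Step 1: reverse all bytes. Every multi-byte sequence is now backwards:
    // its continuation bytes come first and its lead byte comes last.
    std::reverse(utf8.begin(), utf8.end());

    // Step 2: flip each backwards sequence so its lead byte is first again.
    auto seq_start = utf8.begin();
    for (auto it = utf8.begin(); it != utf8.end(); ++it) {
        if (!is_continuation(static_cast<unsigned char>(*it))) {
            // *it is a lead (or ASCII) byte, so it closes the current sequence.
            std::reverse(seq_start, it + 1);
            seq_start = it + 1;
        }
    }

    // For valid UTF-8 the last byte after step 1 is a lead byte,
    // so no dangling continuation bytes are left over.
    assert(seq_start == utf8.end());
    return utf8;
}
```

This only gets as far as the Python 3 slice does: it keeps each code point intact, but it still separates a combining mark from its base letter.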

[edit4: This is indeed broken for the ligature (baﬄe) case in Python 3. I'm not sure that is an entirely fair test (but it is very interesting). I would argue that the ligature should probably be a replacement done for display/print, not in a text file. Just like the reverse of "æ" isn't a reverse composition of "e" and "a" (even if "æ" might be seen as a composition of "a" and "e").

I'm not sure how it deals with changing direction (left-to-right, right-to-left) -- comments welcome.]

[edit3, sorry for the many edits]

To be clear, I do not wish to "pick on C++", nor do I think the example is trivial. I do think it probably should be trivial: it is something that should be supported in a canonical way by a standard library/implementation.

Working with graphemes is a very fundamental part of working with text. The fact that half(?) of developers have been able to hide behind ASCII isn't a good excuse for not fixing it. How would one implement an editor if you can't access graphemes in a reasonable way?

And more importantly, how would you test for palindromes? ;-)

mpyne · on March 23, 2014

For the "reversing Unicode text" problem the easiest C++ solution is probably to use an external library such as QtCore or ICU (Qt uses ICU internally).

Unfortunately, even in UTF-16, grapheme clusters do not correspond 1:1 with Unicode code points, so you wouldn't be able to just reverse a list of them. But Qt can split up a QString into its grapheme clusters (a quick example I had made a couple of months ago):

    #include <algorithm>            // std::reverse
    #include <QString>
    #include <QTextBoundaryFinder>

    static QString reverse(QString src)
    {
        auto src_nfc = src.normalized(QString::NormalizationForm_C);
        QChar *start = src_nfc.data();
        int length = src_nfc.length();

        QTextBoundaryFinder finder(QTextBoundaryFinder::Grapheme, start, length);
        finder.toStart();

        // Reverse the code units inside each grapheme cluster that spans more
        // than one code unit (possible even in UCS-4, because of combining
        // marks), so the final whole-string reverse puts them back in order.
        while(finder.position() < src_nfc.length()) {
            int oldPos = finder.position();
            finder.toNextBoundary();
            int newPos = finder.position();

            if(newPos - oldPos > 1) {
                std::reverse(start + oldPos, start + newPos);
            }
        }

        std::reverse(start, start + length);
        return src_nfc;
    }
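The two-pass trick in that function (reverse inside each cluster, then reverse everything) can be sketched without Qt. This is a deliberately simplified illustration, not a substitute for `QTextBoundaryFinder`: it assumes the text is already a sequence of UTF-32 code points and pretends that a cluster is just one base code point plus any following marks in the Combining Diacritical Marks block (U+0300..U+036F). `is_combining_mark` and `reverse_clusters` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Simplifying assumption: only U+0300..U+036F extend the preceding cluster.
// Real grapheme segmentation (Unicode UAX #29) has many more rules.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

std::u32string reverse_clusters(std::u32string s) {
    // Pass 1: reverse the code points inside each cluster in place.
    std::size_t start = 0;
    while (start < s.size()) {
        std::size_t end = start + 1;
        while (end < s.size() && is_combining_mark(s[end])) {
            ++end;
        }
        std::reverse(s.begin() + start, s.begin() + end);
        start = end;
    }

    // Pass 2: reverse the whole string. This reverses the order of the
    // clusters and restores the original order inside each one.
    std::reverse(s.begin(), s.end());
    return s;
}
```

With this, `reverse_clusters(U"cafe\u0301")` keeps the accent on the `e` and yields `e`, U+0301, `f`, `a`, `c`. Anything outside that one block of marks (emoji sequences, Hangul jamo, and so on) would still need a real segmentation library such as ICU or Qt.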

yati · on March 23, 2014

This. I am very happy with how C++ has evolved and how expressive it has become today, but decent Unicode support, at least in the stdlib, is something any programmer would (hopefully) look for in a modern language. Thanks for the interesting links, though.

e12e · on March 23, 2014

FWIW I've submitted an improvement request to the C++ FAQ along these lines.

a8da6b0c91d · on March 23, 2014

There's ICU and Boost.Locale, but Unicode in C++ truly remains a pain in the ass.

I feel like C++ is still better as the "fast parts" language under a layer of Perl or whatever for shoveling data and text around.

e12e · on March 23, 2014

Any canonical examples to go with those two? As I understand it now, I can pretty much get away with UTF-8 and some locale code, as long as I stick to (possibly some subset of distributions of) Linux. Which really is fine for my use case, but it's not really a very nice stance to take (it's all fun and games until you need to work in an environment where you for some reason or other can't change the OS, and need that clever utility that wasn't quite as standard/cross-platform as it maybe should've been...).