I'm toying with implementing O(1) copies by using a reference counted pointer to basic_string instead of storing it by value within the class. It would mean a small memory overhead and another level of indirection, but probably worth doing. Thoughts?
libc++ (the LLVM/Clang project's C++ standard library) doesn't use COW for std::string, and I'm pretty sure GNU libstdc++ will also drop it the next time they have to break ABI. C++11 move semantics and the 'small string optimisation', as it's known, blow away any performance benefit of COW for most sane uses.
I don't have a copy of the final 2011 standard, but as far as I can see there's nothing in the latest C++11 draft that expressly prohibits COW.
[21.4.1.4] does say "Every object of type basic_string<charT, traits, Allocator> shall use an object of type Allocator to allocate and free storage for the contained charT objects as needed", but that "as needed" most definitely leaves the door open for a COW implementation.
The only other part I can see that may preclude a COW implementation is the postconditions specified for copy operations [21.4.2], which say data() returns a pointer which "points at the first element of an allocated copy of the array whose first element is pointed at by str.data()". Again though, "allocated copy" doesn't necessarily mean "a copy I just allocated". When I go and get a copy of a book I don't literally go and copy it.
In fact the standard specifies that the move constructor leave the source value "in a valid state with an unspecified value"... which again suggests you could use COW and have the source argument return the same value it had before you moved from it.
In the latest C++14 draft though (N3690), you're right, it is explicitly prohibited because "Invalidation is subtly different with reference-counted strings".
21.4.1 p6 states that invalidation of iterators/references is only allowed for:
— as an argument to any standard library function taking a reference to non-const basic_string as an argument.
— Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.
The non-const operator[] in a COW string has to make a copy of the string if the ref count is > 1, which invalidates existing references and so violates this paragraph.
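To make that concrete, here's a rough sketch of the problem. CowString and reference_survives_write are hypothetical names I made up for illustration; this is a toy built on std::shared_ptr, not how any real standard library implements it, and it ignores thread safety entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy COW string, for illustration only (single-threaded).
class CowString {
public:
    explicit CowString(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}

    // The implicit copy constructor copies the shared_ptr, so copies are O(1)
    // and share one buffer.

    // Const access never needs to detach.
    const char& operator[](std::size_t i) const { return (*data_)[i]; }

    // Non-const access hands out a writable reference, so it must detach
    // (copy the buffer) whenever the buffer is shared.
    char& operator[](std::size_t i) {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_);
        assert(data_.use_count() == 1);
        return (*data_)[i];
    }

private:
    std::shared_ptr<std::string> data_;
};

// Returns true if a pointer taken from `a` still refers to a's first
// character after a later call to the non-const operator[].
bool reference_survives_write() {
    CowString a("hello");
    const char* before = &a[0];  // sole owner here, so no detach
    CowString b = a;             // a and b now share the buffer
    a[0] = 'J';                  // shared, so a detaches onto a fresh buffer
    const CowString& ca = a;
    const char* after = &ca[0];  // const access, no further detach
    return before == after;      // `before` still points into b's buffer
}
```

In this sketch the earlier pointer ends up referring to b's characters rather than a's, which is exactly the kind of invalidation by operator[] that the paragraph rules out.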
Hmm true. This is what happens when you let the user violate the iterator abstraction for performance and rely on contiguity and raw refs/ptrs. It's kind of a shame, because the moment you're mutating characters in a string you're often doing something silly anyway.
The class is movable already. Reference counting will allow the copy constructor and copy assignment to avoid copying the string object and just increment the reference count instead.
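Roughly what I have in mind, as a minimal sketch rather than the real class: SharedString and its members are placeholder names, and I'm assuming std::shared_ptr as the reference-counted pointer and an immutable payload so the invalidation issue above doesn't come up.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical wrapper: holds the string behind a reference-counted pointer
// instead of by value. The payload is const, so sharing it is always safe.
class SharedString {
public:
    explicit SharedString(std::string s)
        : p_(std::make_shared<const std::string>(std::move(s))) {}

    // Implicit copy constructor / copy assignment copy the shared_ptr:
    // O(1), one reference-count increment, no character data copied.

    // Read access costs one extra indirection through the pointer.
    const std::string& str() const { return *p_; }

    // Number of SharedString objects currently sharing this buffer.
    long owners() const { return p_.use_count(); }

private:
    std::shared_ptr<const std::string> p_;
};
```

Usage would look like `SharedString a(std::string(1000, 'x')); SharedString b = a;`, after which a.str().data() and b.str().data() are the same pointer. The cost is the control block and the extra hop on every read, which is the overhead mentioned at the top.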