> note: in java, everything(except basic types) is a pointer. so you rather should ask the opposite: why do I need simple objects?
I think this is by far the most important piece of advice in that thread. It gets right to the heart of one of the most basic areas of confusion that programmers coming from garbage collected environments to ones with manual memory management have.
They need to understand that they have NOT been working in an environment without pointers, as they are often told to believe, but one in which nearly everything is a pointer. Understanding this is fundamental to understanding these differences and a 10,000 word essay about C++ allocation policies will not make it any clearer without dealing with that first.
But the fact that they are programming in an environment where everything is a pointer is hidden from them. In C++ one needs to explicitly use pointers, whereas in Java all of the pointers are handled behind the scenes.
Not really. In Java it's a leaky abstraction - it's true that the JVM does a lot of micromanagement for you, but if you are careless you start getting exceptions (best case scenario) or start to get mysterious changes in your data (worst case scenario).
I agree with grandparent that understanding what a pointer is, and that almost everything in a Java program is a pointer, is critical for a Java programmer.
I think that might be too broad a takeaway. I hate to be a pedant, but C# allows user-defined value types, and I'm sure there are other GC'd languages that aren't as reference-type heavy as Java. I think it would be a fairer statement if you replaced "garbage collected environments" with Java. I say "fairer" because I don't think I know any serious Java devs who don't understand the difference between reference and value types. I could just be lucky with who has crossed my path, though.
EDIT: After thinking about it a little, most of the Java people I know have at least some embedded or native background. This could be a reason. I don't know what the market over all is like.
Consider the other direction from Java, into the fully dynamic languages. Many of those give you even less ability to control heap vs. stack allocation than Java does, and often present an everything-is-a-reference facade even for things that are frequently optimized into stack values (e.g. Fixnum in Ruby).
But yes, a java programmer with embedded or native background would have a better understanding of this than people who learned Java first, which is what I'm really addressing here. And in the OP on SE you can clearly see that this is a big part of that person's confusion.
Most of them are recent high school and college dropouts. What did you expect them to know?
They learned programming by reading blogs and pirated PDF files on BitTorrent. Of course they don't have the level of understanding most hackers here have.
If you want a good programmer, you need 15+ years of experience plus a good understanding of computer science; recent high school and college dropouts don't have that.
If you hire a cheaper labor source, you won't get the level of quality a more expensive one would give you. You get what you pay for!
Well, to be fair, I don't have 15 years of experience (professional experience anyways). But I do believe the fundamentals are important and they seem to be glossed over.
There are some very good answers to that question. Related to that question, I still fail to see any advantage to using shared_ptr, weak_ptr, unique_ptr, auto_ptr, etc. If you are doing C++ right, nearly everything should be allocated on the stack, and the things that aren't should be encapsulated in a class, RAII style. If you can't design your program so that the ownership and lifetime of your objects are clear, then you're using the wrong language, and shared_ptr won't save you.
The point of the smart pointers is that they are RAII classes. If you have an object that manages more than a single resource, you open yourself up to all kinds of exception safety and thread safety issues.
For example, let's say you have a class that news up an object in its constructor. Presumably, the corresponding delete lives in the destructor. But if that constructor throws an exception, the destructor will never be called. If it throws the exception after the "new," then you have a memory leak.
So basically, you need a class whose only job is to manage a single pointer. You could certainly roll your own smart pointer, but it would have to work pretty much exactly like one of the ones in the standard library.
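To make the failure mode concrete, here's a minimal sketch (Resource, Leaky, and Safe are illustrative names, not anything from the thread): a constructor that throws after a raw new leaks, while a unique_ptr member is destroyed during stack unwinding, so nothing leaks.

```cpp
#include <memory>
#include <stdexcept>

// Instrumented resource so the leak is observable: `live` counts
// Resource objects that have been new'd but not yet deleted.
static int live = 0;
struct Resource {
    Resource()  { ++live; }
    ~Resource() { --live; }
};

// Leak-prone: the constructor throws after the "new", so ~Leaky never
// runs and the Resource is never deleted.
class Leaky {
    Resource* r_;
public:
    Leaky() : r_(new Resource) {
        throw std::runtime_error("init failed");  // r_ leaks here
    }
    ~Leaky() { delete r_; }  // never reached
};

// Safe: the fully constructed unique_ptr member is destroyed during
// stack unwinding, so the Resource is freed even though we throw.
class Safe {
    std::unique_ptr<Resource> r_;
public:
    Safe() : r_(std::make_unique<Resource>()) {
        throw std::runtime_error("init failed");  // r_ is cleaned up
    }
};
```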
The word RAII by itself is too non-specific. People typically mean: use (i) a scoped/auto pointer or (ii) a shared pointer (that's C++-speak for reference-counted pointers). These are good rules of thumb to rely on, but by themselves they are very inadequate.
First, an auto or scoped pointer gets you a stack discipline. If your objects aren't very big, you might as well create them on the stack itself: more efficient and better suited to multithreading. If the lifetime does not follow the lexical stack, then you are out of luck with this style of RAII.
Next, reference-counted pointers: cycles are clearly a problem. The traditional mantra is: use weak pointers to break cycles, but then you are back to error-prone manual management (of cycles). If I am to be trusted with using weak pointers correctly, I wouldn't be too bad with manual memory management either. Pure functional languages with eager semantics do not allow cycles, so are ref counts a good idea in such cases?
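A sketch of the cycle problem and the weak-pointer fix (Node, WeakNode, and the leak counter are illustrative names added so the leak is observable):

```cpp
#include <memory>

// `nodes` counts live node objects so we can observe what leaks.
static int nodes = 0;

struct Node {
    std::shared_ptr<Node> next;   // strong reference in both directions
    Node()  { ++nodes; }
    ~Node() { --nodes; }
};

struct WeakNode {
    std::shared_ptr<WeakNode> next;  // strong in one direction...
    std::weak_ptr<WeakNode> prev;    // ...weak in the other breaks the cycle
    WeakNode()  { ++nodes; }
    ~WeakNode() { --nodes; }
};

// Builds a two-node cycle with strong pointers; both nodes leak because
// each keeps the other's refcount above zero after the locals die.
void make_strong_cycle() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->next = a;
}

// Same shape, but the back edge is weak, so both nodes are reclaimed.
void make_weak_cycle() {
    auto a = std::make_shared<WeakNode>();
    auto b = std::make_shared<WeakNode>();
    a->next = b;
    b->prev = a;
}
```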
Although many do not think about it that way, a reference-counted system is a _garbage collector_, just a particularly simple one. The fact that one understands how it works does not disqualify it from being a garbage collection system. It is just not a very efficient garbage collector for cases where running time is sensitive to cache locality, which is true fairly often.
I am quite excited about Rust's regions. I don't think programming with regions is easy, and you would want to have a garbage collector anyway. I hope this style of programming sees more mainstream visibility. Something should come out of all the MLton work on this. EDIT: I indeed meant MLKit, but it draws enough from MLton that I thought HNers would be able to relate.
There are also D's scope guards, but I am not yet familiar with what they do.
[Regarding Rust:]
> I dont think programming with regions is easy and you would want to have a garbage collector anyway.
I agree and I've been wanting to emphasize this too for a while.
I think it would help Rust's adoption a lot if it got a good garbage collector while it's still young. Otherwise I think a lot of programmers will give up after stumbling over borrow-checker errors while doing things that are pretty trivial in other languages.
If we had a garbage collector, we could worry about regions only in the hot spots; that would ease many people's way into the language a lot, IMO.
I actually think the moves Rust has made away from an expectation of GC as part of the language will only help its adoption down the road. There are about 8 billion languages occupying some facet of the C++ With Garbage Collector niche and we don't really need another one. We need a better C++, and it's not a better C++ if you can't use it without a garbage collector, it's just another Java/D/Go/so on and on and on.
I agree it should be usable without a garbage collector, for sure.
But some types of algorithms which are allocation heavy (with dynamically sized parts) will just run much faster with a good collector, and are easier to write that way.
Also I've found that with regions, things get really complicated when structures contain borrows (and not every structure will logically own what it references), and then the structures become parameterized over the lifetimes, which multiply quickly.
For cases like that I'd rather start with a GC reference, and gradually remove GC references where time permits and profiling dictates.
I think Rust can really have the best of both worlds here.
BTW: I did manage to finish a fairly involved analysis program in Rust without using any GC, and I'm really very happy with the results. Being able to GC a few references would have made things easier though.
> For cases like that I'd rather start with a GC
> reference, and gradually remove GC references where time
> permits and profiling dictates.
Long ago this was one of the original theses of Rust, that you'd favor GC'd objects at first and then we'd provide easy ways to transition your code to use linear lifetimes where profiling deemed necessary.
It took us a long time and boatloads of iteration to finally realize that this just wasn't feasible. The semantics of GC'd pointers and linear types are just too drastically different, especially with how they interact with mutability and inter-task communicability. We "officially" abandoned GC in the language itself last October, and we're still trying to wrest the last vestiges of its use out of the compiler itself. Ask pcwalton how much of his time has been spent doing exactly that in the past four months. Just yesterday one of our most tenacious community contributors proudly announced that an all-nighter had resulted in removing two more instances of GC'd pointers from the compiler, which required touching "only" a few thousand lines. It's a bad scene.
So yes, while Rust will still offer GC'd pointers in the standard library, we've long since learned not to encourage users to reach for them without great justification. You must understand that once you introduce garbage collection into your code, it will be very difficult to return from that path.
We've also, in practice, tended to point people toward RC'd pointers (with weak pointers to handle cycles) whenever anyone needs multiple references to the same piece of memory. Thanks to linear types, reference counting in Rust is actually surprisingly cheap (many instances where other languages would need to bump the refcount are simply achieved by moving the pointer in Rust, since we can statically guarantee that that's safe).
No, RAII is a general technique in which an object owns a resource; limiting it to those two things is just wrong. When using std::shared_ptr, cycles are rarely a problem. Also, I don't think MLton ever implemented anything relating to regions; perhaps you are thinking of ML Kit.
Not disagreeing, you missed the "typically". Well, I find manual memory management to be rarely a problem, but when it is ...
EDIT: @detrino Ah! Our typical sets are different then. How about "destructor-dependent cleanup"? Call it what we may, that's the underlying mechanism, but it's too limiting.
Obj-C has been a (non-automatic) refcounted language since its inception, save for a very short period of more advanced GC on OSX (the GC never made it to iOS).
Also, ARC is not a refcounting GC, the ARC calls are injected statically in the binary as if they'd been inserted manually in the source.
They have no real reason to exist. At best, they are syntactic sugar (auto_ptr), at worst they actually encourage poor design (shared_ptr). They are basically a crutch for programmers who grew up with managed languages and are afraid of raw pointers.
That's just false. The automated cleanup could be done manually, but the type safety is invaluable. Change how you want to handle ownership of a particular object as part of a refactoring. Now, find everywhere that assumes the previous model - if you use bare pointers, it's a pain in the ass. USE TYPES TO HELP YOU WRITE CORRECT CODE. I do a lot of this in my C code.
I happen to agree with the grandparent post. C++ is overbloated with features, and when STL and templates came out, not all compilers even supported all their features. Not to say that templates themselves are bad, but there are too many things in C++. For example, copy constructors vs. initializers, (String)foo vs. String(foo), etc. And now with C++0x and 11 we have even more. Lambdas? Concepts were about to be in there? Etc.
A programming language is supposed to be small so that programmers can share code. The rest can be handled with libraries. That's why C is better for huge projects. Linus agrees with me :)
In addition, why encourage reference counting pointers? If you really want to manage objects on the heap, have a garbage collected reference counted pointer.
How did we ever survive without them, before boost showed up? I seem to remember doing just fine. They are a symbol of the useless bloating of C++, a language trying to be all things to all people. A half-assed halfway house between managed and unmanaged. You're either a competent C++ coder, in which case you shouldn't need them, or you're new to the language, in which case they are obfuscating the details that you need to learn.
> transfer ownership
Then transfer ownership. The object that owns them encapsulates a container. When it is destroyed, it iterates the container and destroys the objects that it owns. So to transfer, take it out of that container and pass it to the new owner. It isn't rocket science.
> unknown lifetimes
Then you have a lazy, poor design. And that's the point really, managing your memory properly is an intellectual rigour - it forces you to improve your design, to be acutely aware of these issues. Not just spawn shared_ptrs and hope for the best. We have nice forgiving languages where you can be lazy. Coding in C# is like taking a holiday. C++ is meant to be hard.
I thought C++ was meant to be a lot of things but I don't think it being "hard" was one of the key points of its inception.
Just because you're using a "hard" language doesn't make you a better programmer. If the language can be made easier and more productive to write in, with fewer problems and bugs due to memory leaks and inaccuracies, while not impacting performance, why wouldn't you want that?
That's like asking a marathon runner why he doesn't just hop in a car.
I use many languages. I use C# to knock out a GUI. VBA for a bit of spreadsheet work. Q/KDB+ for database/functional work. R for stats. All of these are far more productive than C++. If you want fancy abstractions, other languages have them by the bucketload.
I don't use C++ to be productive, and I don't expect to be. I use it to get close to the metal and have full control. Everything you code will take 4-5 times longer to write. And longer to compile. You will be careful because you have to be careful, and that's a good thing because with that care comes quality.
> That's like asking a marathon runner why he doesn't just hop in a car.
Most C++ coders I've talked to (including all of the ones who get paid for doing it) are using C++ to get useful stuff built, not for the sake of 1337ness.
You are aware that recompiling with C++11 can improve the speed of existing STL code bases, right? (Some of that "bloat", namely move constructors, is exactly what makes transferring a unique_ptr free [edit: as in no overhead over doing it manually].) And you are aware that 95% of the code should not be performance critical, and should be made easy to write? And you are aware that before standardized smart pointers, everybody wrote their own by hand, to take advantage of RAII?
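A tiny sketch of the unique_ptr point (the transfer function is an illustrative name): moving a unique_ptr is just a pointer copy plus nulling out the source, with no refcount traffic and no allocation.

```cpp
#include <memory>
#include <utility>

// Ownership transfer via move semantics: the caller's pointer is left
// empty and the callee (or return value) becomes the sole owner.
std::unique_ptr<int> transfer(std::unique_ptr<int> p) {
    return p;  // moved out implicitly; no copy is ever made
}
```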
Can someone explain RAII to me? In the way that it's commonly used, it seems like it really means "stack variables that go out of scope are deallocated", which seems kind of, well, duh? I don't really understand why it's treated like a big deal.
To better understand the importance of RAII, you might consider what some other languages offer to solve similar problems.
Consider a C example:
    int f()
    {
        int ret = -1;
        handle * a = acquire_handle(0);
        if (a == NULL) goto fail1;
        handle * b = acquire_handle(1);
        if (b == NULL) goto fail2;
        handle * c = acquire_handle(2);
        if (c == NULL) goto fail3;
        // use a, b, c
        ret = 0;
        release_handle(c);
    fail3:
        release_handle(b);
    fail2:
        release_handle(a);
    fail1:
        return ret;
    }
Consider a C# example:
    void f()
    {
        using (handle a = new handle(0))
        using (handle b = new handle(1))
        using (handle c = new handle(2))
        {
            // use a, b, c
        }
    }
Now a C++ example using RAII:
    void f()
    {
        handle a{0};
        handle b{1};
        handle c{2};
        // use a, b, c
    }
These examples are mostly equivalent (although the C#/C++ versions assume exceptions instead of error codes).
The C#/C++ examples are far more structured and less error prone than the C example.
The advantage of C++'s RAII over C#'s using statement is that cleanup is tied to the object rather than a statement. This means that RAII is both composable and non-leaky as an abstraction. You cannot forget to destruct an object in C++, and you don't have to care that its destructor frees resources. When you have an IDisposable member in C# you must manually mark your class as IDisposable and then implement the Dispose method yourself. Clients must also be aware that your class is IDisposable in order to use it correctly.
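A sketch of that composability claim (the logging handle is an assumption added here so the ordering is observable; it mirrors the `handle` class from the examples above): a type holding RAII members needs no destructor of its own, and no client-visible Dispose contract.

```cpp
#include <string>

// Records open/close events so the destruction order can be checked.
static std::string log_;

// A toy RAII handle: "acquires" in the constructor, "releases" in the
// destructor.
class handle {
    int id_;
public:
    explicit handle(int id) : id_(id) { log_ += "open" + std::to_string(id_); }
    ~handle() { log_ += "close" + std::to_string(id_); }
    handle(const handle&) = delete;
    handle& operator=(const handle&) = delete;
};

// Composability: this struct declares no destructor at all. Its RAII
// members are destroyed automatically, in reverse declaration order.
struct two_handles {
    handle a{0};
    handle b{1};
};

void use() {
    two_handles t;  // opens 0, then 1
}                   // closes 1, then 0, with no Dispose boilerplate
```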
Stylistic side note for C#: If you're nesting using blocks like that you can leave out the braces in all but the deepest instance which reads a little nicer, imho:
    using (handle a = new handle(0))
    using (handle b = new handle(1))
    using (handle c = new handle(2)) {
        // use a, b, c
    }
It gets rid of excessive nesting when you need multiple resources allocated one after another (e.g. SqlConnection, SqlCommand, etc.). You can do
    using (handle a = new handle(0), b = new handle(1), c = new handle(2)) {
        // use a, b, c
    }
instead, too, but that obviously only works with equal types.
I'm usually not a fan of too-deep nesting (the worst thing I've seen in our codebase was 65 spaces deep), and in C# you already have one level for the class, one for the namespace (possibly), and another for the method. No need to add two more if you can help it.
Right, I understand that it removes excessive nesting, but the first example just looked ... wrong to me. Your example above looks somewhat better.
The worst problem is that the destructor should not throw an exception (AFAIK). Thus, C++ RAII is not perfect: you can't use it for things that could fail. edit: some people use it for closing/flushing files; bad idea.
edit 2: I judge a language in part by seeing whether it can actually define nested/chained exceptions well. If it can't, its designers are probably not too serious about exception handling...
Out of curiosity, how would you recommend handling a situation in which a file close fails? My instinct is to have an RAII class whose destructor catches any exceptions, adds the error to some list somewhere, and doesn't rethrow anything.
What's an example of a language you like that does exceptions in a better way?
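One way to sketch that instinct (checked_file and close_errors are hypothetical names, not a standard idiom): the destructor records a failed close instead of throwing, while an explicit close() remains available to callers who want to handle the failure themselves.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Failed closes are recorded here instead of being thrown from a
// destructor (throwing destructors interact badly with unwinding).
static std::vector<std::string> close_errors;

class checked_file {
    std::FILE* f_;
    std::string name_;
public:
    checked_file(const char* name, const char* mode)
        : f_(std::fopen(name, mode)), name_(name) {}

    // Explicit close for callers who want to handle failure themselves.
    bool close() {
        if (!f_) return true;           // already closed: nothing to do
        bool ok = std::fclose(f_) == 0;
        f_ = nullptr;
        return ok;
    }

    // The destructor never throws; a failed close is logged, not raised.
    ~checked_file() {
        if (!close()) close_errors.push_back(name_ + ": close failed");
    }

    std::FILE* get() { return f_; }

    checked_file(const checked_file&) = delete;
    checked_file& operator=(const checked_file&) = delete;
};
```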
By the way, I posted on the GOLang list a while ago my suggestion for error handling. It's like exception handling, but it's all done through function calls, (no try catch and jumping forward). Example:
    ret, e = _some_function()
    # e is non-null if an exception/error happened

    ret = some_function()
    # on error, an exception is raised and caught by the first
    # parent function that used _call() syntax (leading underscore)
So basically, _func() means call func(), and have an extra return value that is an exception or Null if no error. All one has to implement is either a _func or func, not both, the compiler handles the conversion.
So bottom line, you can write real exception safe code and not have to worry about a panic/exception breaking the flow, by using _func() calls, or you can bail by using func() calls, it's your choice.
edit: not sure how this ties in to RAII, if at all. Guess I should get back to the drawing board... :-)
Like everyone, I'm still waiting for the perfect language. That said, due to this issue and the fact that writing exception-safe code in C++ scares me (due to data structures potentially being left in a weird state), I have never used exceptions in C++.
I'd like to try C#, to see if it has the power of Python's unchecked exceptions and with-blocks but with more helpful static typing for more 'mission critical' stuff. But I'm quite torn because, as my name suggests, I like working on Linux...
Stack variables that go out of scope are deallocated; that's one of the points of RAII (an advantage over dynamic allocation, especially because scope is also terminated by exceptions).
But crucially, their destructor is called (and any destructors of their member variables, etc.) The destructor might perform cleanup work beyond freeing up memory. For example, std::istream will close the file handle when it goes out of scope, and boost::lock_guard will release the lock.
RAII isn't rocket science, but there are some advantages over plain C where all cleanup has to be explicit and it's fairly easy to get resource deallocation wrong if you don't know what you're doing. And as the Wikipedia article says, RAII is vital in writing exception-safe code in C++ if you don't want to litter try/catch blocks all over your code.
> it seems like it really means "stack variables that go out of scope are deallocated"
And the destructor, if any, is called. That means you can bundle automatic cleanup action in the destructor, no need for goto: cleanup, @try/@finally or some other sort of "manual" cleanup, the cleanup is implicitly set up by, well, the setup.
The ability itself is by no means unique to C++ (even amongst oldies, Smalltalk has BlockClosure#ensure: and Common Lisp has unwind-protect) but a surprising number of languages still don't have such a capability, and it's a pretty neat consequence of C++'s memory model.
Right, that's implied by a C++ object being deallocated. I understand how it works - it's just that it's really obvious if you know anything at all about how the stack works (and C++ destructors).
> it's just that it's really obvious if you know anything at all about how the stack works (and C++ destructors).
Which does not change it being an elegant solution to the problem of resource scoping does it?
Also the semantics of C++ destructor (and their relation to stack deallocation) had to be defined to get RAII, and it was, back in the very early 80s when the idea of resource scoping was not exactly well known.
It's using automatic storage duration and destructors to manage dynamically allocated memory (or some other resource which requires manual management). So yeah, that's it.
The "non obvious" part that makes it RAII is simply that the stack variable in question owns another resource besides itself. The stack variable is being used to wrap something else, be it a chunk of memory, a file handle, one or more other dynamically allocated objects, etc. The key in RAII is that when the stack variable wrapper goes out of scope, the wrapped resource (which is in addition to the stack variable) is deallocated, freed, released, etc.
    mutex.lock();
    doStuff(); /* throws an exception, mutex is never unlocked */
    mutex.unlock();

    ScopedLock lock(mutex);
    doStuff(); /* mutex unlocked by destructor in all cases */
It guarantees that resources are freed no matter how the scope exits.
RAII is "Resource Acquisition Is Initialization", and it means "tie the lifetime of a resource to the lifetime of an object" because we have ways of making sure objects are released when they are no longer wanted (of which stack allocation is the easiest, when it's applicable).
The crucial idea is that what's pointed to by those variables is also cleaned up. When a POD pointer goes out of scope, it doesn't help you that the pointer is deallocated, you need what's pointed to to be deallocated.
Beyond pointers, it applies to many things that have a "C-style" API. Without RAII, anything that needs to be explicitly cleaned up needs to be monitored. For example, if you declare a pthread mutex, it doesn't help you that the mutex variable itself goes out of scope, because it's just a pointer and that will not clean up the actual mutex.
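A sketch of wrapping that pthread mutex (scoped_mutex and scoped_lock are illustrative names): the pthread_mutex_t itself needs explicit init/destroy calls, so scope exit alone doesn't clean it up, but a wrapper class does.

```cpp
#include <pthread.h>

// RAII wrapper: init in the constructor, destroy in the destructor, so
// the underlying mutex is cleaned up whenever the wrapper leaves scope.
class scoped_mutex {
    pthread_mutex_t m_;
public:
    scoped_mutex()  { pthread_mutex_init(&m_, nullptr); }
    ~scoped_mutex() { pthread_mutex_destroy(&m_); }
    scoped_mutex(const scoped_mutex&) = delete;
    scoped_mutex& operator=(const scoped_mutex&) = delete;
    void lock()   { pthread_mutex_lock(&m_); }
    void unlock() { pthread_mutex_unlock(&m_); }
};

// A lock guard over the wrapper: unlocks on every exit path, including
// exceptions, like the ScopedLock example elsewhere in the thread.
class scoped_lock {
    scoped_mutex& m_;
public:
    explicit scoped_lock(scoped_mutex& m) : m_(m) { m_.lock(); }
    ~scoped_lock() { m_.unlock(); }
    scoped_lock(const scoped_lock&) = delete;
    scoped_lock& operator=(const scoped_lock&) = delete;
};

// Increment a counter under the lock; the mutex is released even if the
// body were to throw.
int counted = 0;
void bump(scoped_mutex& m) {
    scoped_lock g(m);
    ++counted;
}
```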
And as people have said, without RAII it's impossible to write exception-safe code.
"And as people have said, without RAII it's impossible to write exception-safe code."
Assuming that's hyperbole, I agree with the sentiment. It's possible to write exception safe code with try & catch - it's just almost as ugly as checking exit statuses and easier to miss something.
> "stack variables that go out of scope are deallocated"
That's all it is.
But, it guarantees that a given piece of code (an object's destructor) is always executed when execution leaves the current scope, either through normal execution, by calling `return`, or by throwing an exception.
In fact, RAII is the thing that allows exceptions to be remotely usable in C++; the lack of RAII in Obj-C is the reason why exceptions are not common in Obj-C.
It's used for regular memory management, but it's also used for managing access to other resources, such as files, mutexes, etc. E.g., you can use it ensure that a file handle is automatically closed when you leave scope.
To my mind it's like stack objects - not only do you not have to worry about freeing them, the destructor gets called and so do subordinate destructors for member objects. It was a bit of a revelation when I did some 'real' C++ for the first time a couple of years back.
In some ways it is just another of the myriad of implicit behaviours in C++ that make it hard to work with, though.
I find it interesting that nobody mentions iterators anywhere in this discussion. One of the main uses of pointers is to iterate through data. Pointers are the most basic form of iterators in C++. STL iterators are based on, and often implemented in terms of, pointers.
One interesting thing is that the Qt framework prefers the use of pointers over reference variables. There's actually some reasonable logic here. Pointer syntax clearly shows you are acting on something in a different way than you act on an "ordinary" object. Reference syntax is the same as ordinary object syntax, but you are making changes that will go beyond the local scope, and that is thus less obvious.
With this approach, you can still allocate objects on the stack - you then take their address and pass that to a subroutine.
Presumably, this pointer vs. reference debate is only relevant in C and no longer has a place in C++ with the use of unique_ptr, shared_ptr, std::move, etc?
C has no references -- 'type& param' -- so there is no debate. This is a topic exclusive to C++, and isn't affected by smart pointers, which overload all the pointer operators in order to act like regular pointers, only with some sort of lifetime management.
In the Qt idiom, these smart pointers are irrelevant (as mentioned by the other poster). Here, when you are passing a pointer, it's going to be a pointer to an object on the stack, which is always going to be valid - the pointer argument shows you that you are dealing with something outside the function's local scope, the variable allows you to pass information back, etc.
Qt has also used its own smart pointers for a while, but these are generally declared within some container class, so the idiom basically allows you to do allocation entirely within containers and thus never use the new operator directly.
Now, there may be situations where you have a complex chain of multiple connections that doesn't fit an existing container. The sloppy way then is to use new; the good way is to write a custom container. The idiom can also break down when dealing with external libraries that allocate memory directly or somehow demand that you use the new operator.
They don't even bother to teach pointers, the stack, or the heap. It just confuses the students, they claim. They also teach students to store the value of pi in an Integer, and to use DrJava and Java 6 and some bloated JAR library that makes Windows crash.
When I studied computer science in 1986, the very first things we learned were how memory and binary worked, plus what a pointer was, why it should be used, and how to manage memory so it would not crash and lock up. We learned how to debug as well.
Most comp sci graduates that I talk to know almost nothing of how a pointer works, or the stack or heap, and binary might as well be Tagalog to them.
I think that over the years as the demand for IT workers increased, they relaxed the standards they used in teaching computer science and programming.
Any good MOOC should teach:
Pointers
The Stack
The Heap
Binary
Hexadecimal
How to debug code
Assembly Language instructions the code gets converted into and how to debug that