Nov 2003

C++ jams

or why I think C++ is a complex language
(*) especially good




Nested friend

O: In the design there are 2 main classes: Master and Slave. A Slave object is visible to the world (it is legal to have a pointer to it), but no one except Master can create, use, or destroy it. So it is natural to make all Slave members private and make class Master a friend. In the following code

        class Slave{
          friend class Master;

          int x;
          Slave():x(0){}
        };

        class Master{
          struct Servant{
            Slave a;
          };
        public:

          int f(){ 
            Slave s; 
            return s.x; 
          }

          int g(){ 
            Servant c;
            return c.a.x;
          }

        };
the only complication is that Master has a nested class Servant, and Servant has a Slave member. In the function f() a Slave object is created, accessed, and destroyed. In the function g() a Servant object is created, accessed, and destroyed; here access and destruction are allowed, but creation is forbidden, because the implicit public constructor of Servant cannot call the Slave constructor: Master::Servant is not a friend of Slave. It looks somewhat strange that, if we were able to construct a Servant, we would have full access to its internals. For example,
        class Slave{
          friend class Master;

          int x;
        //  Slave():x(0){}
          void i(){x=0;}
        };

...
          int g(){ 
            Servant c;
            c.a.i();
            return c.a.x;
          }
...
The problem, again, is that we can access any member and call any member function of the Slave subobject except the constructor. The constructor can be called only from within the Servant constructor; hence, it is forbidden inside g().

The first straightforward fix would be to declare Master::Servant to be a friend to Slave:

        class Slave{
          friend class Master;
          friend class Master::Servant;

          int x;
          Slave():x(0){}
        };
This is an error, though, because the Master class is not yet defined. The second idea might be to move Slave below Master, so that Slave would see the class Master::Servant. That is an error too, because Servant declares a Slave as its member, so Slave must be fully defined before Master (we cannot split the definition of a class).

The next solution may be to declare the Slave constructor public. That fixes the problem, but it undermines the initial idea that a Slave can be created only by Master.

Another solution is to define the Servant class in the outer scope; then it would be possible to declare both Master and Servant to be friends of Slave. The drawback in this case is that Servant becomes a global class (same scope as Master and Slave), although it is never used outside Master's scope. In fact, Master and Slave are quite big and elaborate classes, and Servant is just a small ancillary struct. And it looks somewhat illogical to create an additional class just to avoid an access rights problem.

Well, my question is: are there any other possible solutions to this problem, good or maybe not so good?


B: At the heart of your problem is the fact that these two are fundamentally contradictory:
> no one except Master can create [Slave]
> Servant has a Slave member.
I guess the design solution lies in understanding *why* no-one except Master should be able to create a Slave, and what Servant's role is. It seems that either:
(a) Servant's role is as a container of Slaves on behalf of Master - in which case why not give Servant the unique power to construct Slaves?
(b) Servant's role is to do something useful to Master's slaves - in which case Servant doesn't need a Slave as its member, it just needs a pointer to Slave's public interface, and the Slaves themselves should be created and owned by Master.


O: I saw Servant as a part of class Master: not a standalone class, just a useful struct. Like in

        #include <set>

        class Slave{

          friend class Master;

          int x;
          Slave():x(0){}
        };

        class Master{

          typedef std::pair<Slave,int> Servant1; //this will work

          struct Servant2{      // this will not
            Slave s;
            int i;
          };
                
          std::set<Slave> sl;
          std::set<Servant1> sr1;
          std::set<Servant2> sr2;
        
        public:
          void f(){
            sl.insert(Slave());
            sr1.insert(Servant1());
            sr2.insert(Servant2()); // error
          }
        };
sl are Slaves, and sr1 and sr2 are Slaves with an associated number. [Obviously, the line "sr1.insert(Servant1());" works because Slave's copy constructor is public.]

Your response gave me another idea: to put Slave inside the Master class. Then apparently everything would be solved, except maybe for the weird name Master::Slave.

        class Master{

          struct Servant; // forward decl of Master::Servant
          class Slave{

            friend class Master;
            friend struct Servant; // friend Master::Servant !
        
            int x;
            Slave():x(0){}
          };

          struct Servant{
            Slave a;
          };
        
        public:
        
          int f(){ 
            Slave s; 
            return s.x; 
          }

          int g(){ 
            Servant c; // OK
            return c.a.x;
          }

        };
        
        using Master::Slave;
        
        Slave * p;
Does this now look better?


B: Well, beauty is in the eye of the beholder, as they say. :-/

I personally am not a fan of nesting classes, period. I don't usually find that the reduction in readability and simplicity is justified by the little extra encapsulation.

It's hard to comment without knowing the detail of the design, but in your example I'm still not sure why, if Master needs to be the only one able to create Slave objects, everyone else (Servant included) shouldn't simply deal in pointers to an abstract Slave interface. That way Servant (and anyone else) can store, associate and operate on Slave objects, but only Master can create new instances of these objects.

Slave - pure abstract interface class
  ^
  |
ConcreteSlave - has private constructor and has Master as a friend.

struct Servant {
  Slave *s;
  int i;
};
- of course, then Slave objects are always created on the heap and you have to deal with ownership and lifespan of the objects Master creates, but that's in keeping with the original notion that 'only Master shall be able to create these objects'.


O: I am not a fan of nested classes either. However, I am also reluctant to introduce an abstract interface without necessity.






Sizeof and template

Do you think this code is legal

	template<int I>
	struct A{
		char a[I];
	};

	A<sizeof(A<2>)> a;
?






Template specialization with a template argument

O: In our code we have the following (simplified):

        template<class T> class B{};

        template<class T> class A{};

        template<> 
        template<class T> 
        class B< A<T> > {};

        //B< A<int> > b;
which compiles and works successfully with GCC* under Solaris and Windows.

The third template is a specialization of the first template with a template argument.

This does not compile with the MS* compiler (and some other compilers).

My question is: is this code legal according to the standard, so that the other compilers do not conform, OR is it some non-standard way to express this which only GCC* can understand? Any ideas?

O: The specialization as written above is illegal. The correct one would be without 'template<>'. That still does not help with the MS* compiler.
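
For reference, a minimal sketch of the legal form mentioned in the last reply: a partial specialization with a single template header and no leading 'template<>'. The member function which() is hypothetical and added here only so that the chosen specialization can be observed; B is written as a struct for the same reason.

```cpp
#include <string>

// Primary templates, as in the question.
template<class T> class A {};

template<class T> struct B {
    // hypothetical helper: reports which version of B was selected
    static std::string which() { return "primary"; }
};

// Partial specialization of B for any A<T>.
// Note the single template header: no 'template<>' in front of it.
template<class T> struct B< A<T> > {
    static std::string which() { return "specialized for A<T>"; }
};
```

With these definitions, B<int>::which() selects the primary template and B< A<int> >::which() selects the partial specialization.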






Typo in Stroustrup?

PB: In the section on Function Objects (18.4) Stroustrup gives an example of using a 'function object' to total up all the numbers in a list. But as written, it doesn't appear to work:

	#include <algorithm>
	#include <iostream>
	#include <string>
	#include <vector>

	using namespace std;

	class Sum {
	  int res;
	public:
	  Sum():res(0) {}
	  // adds i to the running total and prints the total so far
	  void operator() (int i) {res += i; cout << res << '\n'; }
	  int result() const {return res;}
	};

	int main () {
	  int temp[6] = {1,2,3,4,5,6};
	  vector<int> iv(temp,temp+6); 

	  Sum sum;

	  for_each(iv.begin(), iv.end(), sum);

	  cout << sum.result() << '\n';
	}  
When I run this, I get 0. If I change the for_each line to read
        sum = for_each(iv.begin(), iv.end(), sum);
Then I have the correct result. So is this a simple slip in Stroustrup, or am I doing something wrong? More importantly, if I want to write a template function that applies a function to a data structure and can return some total value, will it have to return the functor, or is there some better way of doing that?

PL: Looking at the STL doco, I'd say you're onto something. I'm thinking that the function object "sum" used in the for_each algorithm is a temporary copy, given that for_each returns the function object by value (instead of by reference). The call to result() on the last line is actually invoked on the one *you* constructed earlier, which has done no work, and hence gives zero.

PB: Bringing my story forward a little, a type-conversion operator in the functor can impersonate returning the end result:

	class Sum {
	  int res;
	public:
	  Sum():res(0) {};
	  void operator() (int i) {res += i;};
	  operator int() const {return res;};
	};

	int main () {
	  int temp[6] = {1,2,3,4,5,6};
	  vector<int> iv(temp,temp+6); 

	  Sum sum;
	  cout << (for_each(iv.begin(), iv.end(), sum)) << '\n';
	}  
And then you can take this the next step and not declare the local 'sum':
	int main () {
	  int temp[6] = {1,2,3,4,5,6};
	  vector<int> iv(temp,temp+6); 
	  cout << for_each( iv.begin(), iv.end(), Sum() ) << '\n';
	} 

B to PL: You're right. Sum is passed by value into the for_each, so the one that does all the work is a copy of the one local to main() that you finally call result() on.

One approach to this is to create functors which don't hold data, but which write their results 'externally', like this:

	class Sum {
	   int * res;
	public:
	   Sum(int * r):res(r) {};
	   void operator() (int i) {*res += i; cout << *res << '\n'; };
	};
That way, there's no need for a result() method to get the result at the end - the functor operates directly on data external to itself:
	int sum = 0;
	for_each(iv.begin(),iv.end(),Sum(&sum));
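
On the question of a better way to get a total out of a traversal, here is a minimal sketch of two options, assuming only the standard library: use the functor that for_each returns, or let std::accumulate carry the running value. The function names total_for_each, total_accumulate and sample are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Stateful functor in the style of the thread, without the printing.
struct Sum {
    int res;
    Sum() : res(0) {}
    void operator()(int i) { res += i; }
};

// for_each works on a copy of the functor and returns that copy,
// so the total has to be read from the return value.
int total_for_each(const std::vector<int>& v) {
    return std::for_each(v.begin(), v.end(), Sum()).res;
}

// std::accumulate threads the running total through the traversal itself,
// so no functor state is needed at all.
int total_accumulate(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// hypothetical helper: the sample data 1..6 used in the thread
std::vector<int> sample() {
    const int temp[6] = {1, 2, 3, 4, 5, 6};
    return std::vector<int>(temp, temp + 6);
}
```

Both functions give 21 for the sample data; accumulate is usually the simpler choice when the result is a plain fold over the elements.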

O to PB: That is strange, because in 3rd edition ('97, first printing) it was

        Sum<double> s;
and
        s = for_each(iv.begin(), iv.end(), s);
if I remember correctly. What edition do you use? And what page exactly?

PB: I'm looking at page 515 in the third edition, third printing, September '97. He gave the example of a template function class Sum and used a double instance of it, but he definitely doesn't assign the return value of for_each back to s. Anyone got the hardcover handy?

O: Sorry. Right. It was fixed in 5th printing:

Chapter 18: 
pg 515 s/for_each(ld.begin(),ld.end(),s);/s = for_each(ld.begin(),ld.end(),s);/ 
http://www.research.att.com/~bs/3rd_printing5.html






Declaration joke

	int i;			// definition
	extern int i;		// declaration
	extern "C" {int i;}	// definition
	extern "C++" int i;	// declaration
	extern "C++" {int i;}	// definition
	extern "C" int i;	// declaration





Local static variable of a template

Question: if you have a template class and an inline member function has a static variable, which instantiations will share the same variable and which will have different ones? For example,

	// FILE 1
	#include <iostream>

	template <class T>
	struct A{
		int& x(){ static int a; return a; }
	};

	void f();

	int main(){
		std::cout<<&A<int>().x()<<'\n';
		std::cout<<&A<int>().x()<<'\n';
		std::cout<<&A<long>().x()<<'\n';

		f();
	}
and
	// FILE 2
	#include <iostream>

	template <class T>
	struct A{
		int& x(){ static int a; return a; }
	};

	void f(){
		std::cout<<"inside f\n";
		std::cout<<&A<signed int>().x()<<'\n';
		std::cout<<&A<const int>().x()<<'\n';
		std::cout<<&A<long>().x()<<'\n';
	}
are compiled separately and linked together. Which outputs will show the same addresses and which will not?

Answer: Obviously, identical instantiations must share the same variable (signed int is the same type as int, while const int is a distinct type). So

	A
	A
	B
	inside f
	A
	C
	B





Watch memory leaks

Question: Comment on the following code lines.
	#include <stdio.h>
	#include <stdlib.h>

	int ns=0, nv=0, ds=0, dv=0;

	struct A{
		void report(){ 
		 printf("new: %d\tnew[]: %d\tdel: %d\tdel[]: %d\n", ns,nv,ds,dv);
		}
		A(){ 
		 printf("Hi> ");
		 report();
		}
		~A(){
		 printf("By> ");
		 report();
		}
	};
	
	inline void begin(){
	 static A a;
	}
	
	void * operator new(size_t s){
	 
	 begin();
	
	 ns++;
	 return malloc(s);
	}

	void * operator new[](size_t s){
	
	 begin();
	
	 nv++;
	 return malloc(s);
	}
	
	void operator delete(void * p){
	 if(p){
		ds++;
		free(p);
	 }
	}
	
	void operator delete[](void * p){
	 if(p){
		dv++;
		free(p);
	 }
	}
	
	int main(){
		void *p = new int;
		delete p;
	}

Answer:
#include <stdio.h>
This include introduces the printf declaration. We use printf because output with cout may itself use the new and delete operators.

#include <stdlib.h>
This includes malloc.

int ns=0, nv=0, ds=0, dv=0;
These are counters for new scalar, new vector, delete scalar, and delete vector operators.

struct A{...};
This is the class of a dummy object whose constructor is called before the very first new and whose destructor is called after the very last delete.

inline void begin(){ static A a; }
This is the function holding this object. It must be called first in our allocators; that guarantees that object a is created before any allocation.

void * operator new(size_t s){...}
void * operator new[](size_t s){...}
void operator delete(void * p){...}
void operator delete[](void * p){...}
Redefined global new and delete operators.

if(p){...}
Some STL implementations may call extra delete with zero argument.






Dynamic cast references

SJ: If I dynamic_cast a pointer I can check the pointer to see if the cast worked:

        class X { };
        class Y : public X { };

        X* x = new X();
        if (Y* y = dynamic_cast<Y*>(x))
        {
                // pointer not null, cast worked,
                // do something involving the Y interface
        }
that's lovely. now what happens if i'm working with references?
        X a;
        X& x = a;
        Y& y = dynamic_cast<Y&>(x);
how do i test if that cast worked?

AP: The last code line should throw bad_cast if the cast did not work. Use a try/catch block to deal with that.
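
A minimal sketch of the try/catch approach, assuming X is given a virtual destructor so that it is polymorphic and dynamic_cast is allowed. The helper name is_y is hypothetical.

```cpp
#include <typeinfo>   // std::bad_cast

struct X { virtual ~X() {} };   // polymorphic base
struct Y : X {};

// Returns true if x really refers to a Y object.
bool is_y(X& x) {
    try {
        Y& y = dynamic_cast<Y&>(x);   // throws std::bad_cast on failure
        (void)y;                      // the Y interface would be used here
        return true;
    } catch (const std::bad_cast&) {
        return false;                 // x is not a Y
    }
}
```

When failure is an expected outcome rather than an error, casting the address instead, dynamic_cast<Y*>(&x), and testing for null avoids the exception altogether.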

DG: The same is explained on page 411 (Dynamic_cast of References, section 15.4.1.1) of The C++ Programming Language, 3rd Ed., by Stroustrup.

O: Strictly speaking the example:

        class X { };
        class Y : public X { };

        X* x = new X();
        if (Y* y = dynamic_cast<Y*>(x))
        {
                // pointer not null, cast worked,
                // do something involving the Y interface
        }
is an error, because classes X and Y are not polymorphic.






Set iterator error

O: I am trying to compile with g++ the following code

        #include <set>

        class A {};

        template <class C>
        struct B{
                std::set<A*,C>::iterator x;
        };
getting the error
        > g++ -pedantic -c a.cpp
        a.cpp:7: syntax error before `;'
which looks very weird, because if I remove "-pedantic", or C(comparator), or "::iterator" then it passes fine.

What does pedantic gcc find wrong in set::iterator? Any ideas?

D: If you preface "std::set<A*,C>::iterator x;" with the keyword "typename", it passes. g++ usually supplies "typename" implicitly when it is ambiguous whether the name is a type; with -pedantic, however, it does not.

Thankyou, internet.

O: Thanks,
gcc 3.2 gives the warning:
a.cpp:7: warning: `typename std::set<A*, C, std::allocator<A*> >::iterator' is implicitly a typename
a.cpp:7: warning: implicit typename is deprecated, please see the documentation for details
I was completely confused by the fact that
        std::set<A*>::iterator x;
worked with no problem.

Still curious why then -pedantic passes this case?

D: If I walk through this, with "g++ -pedantic -c a.cpp"...

I'm thinking in the case where you omit the ",C" then "std::set<A*>" may be searched for "::iterator", because "std::set<A*>" is dependent upon a class (A) the compiler already knows about.

Additionally, "std::set<A*,C> x" requires no search for any name ("::iterator", say), so that passes too, even tho' x is dependent on C which is unknown at this stage.

But "std::set<A*,C>::iterator x" requires "std::set<A*,C>" to be searched for "::iterator", and the compiler won't look for it, because that statement is dependent on the unknown C.

Yeah?

Yeah??

There's got to be a good explanation for this case based on g++'s implementation, but I don't have the luxury of time to spend finding the answer :-)

O: Yeah, right! Just changing the C class to the class known to the compiler

        #include <set>

        class A {};
        struct D { bool operator<(const A*)const; };

        template <class C>
        struct B{
                std::set<A*,D>::iterator x;
        };
makes it pass.

What a dumb error message "syntax error"!
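
The fix D describes can be sketched as follows. The set type depends on the template parameter C, so the nested name iterator has to be marked as a type with typename. The members s and remember() are hypothetical additions, there only to show the declared iterator being used.

```cpp
#include <functional>
#include <set>

class A {};

template <class C>
struct B {
    // std::set<A*, C> depends on C, so until instantiation the compiler
    // cannot know that ::iterator names a type; 'typename' says so.
    typename std::set<A*, C>::iterator x;

    std::set<A*, C> s;   // hypothetical member, for demonstration only

    // Inserts p, stores its position in x, and checks that x points at p.
    bool remember(A* p) {
        x = s.insert(p).first;
        return *x == p;
    }
};
```

A use such as B< std::less<A*> > b; b.remember(&a); instantiates the template with a known comparator. The non-dependent std::set<A*>::iterator needs no typename, which is why that variant compiled without complaint.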






Hiding

RF: In the following code:

	#include <iostream>

	using namespace std;

	void fA (int a) { cout << "fA: int\n"; }
	void fA (char *a) { cout << "fA: string\n"; }

	class A {
	   public:
	   void check (int) { cout << "check int\n"; }
	};

	class B : public A {
	   public:
	   void check (char *s) { cout << "check string\n"; }
	};

	int main() {
	 A a;
	 B b;

	 fA (1);
	 fA ("1");

	 a.check(1);
	 b.check("1");
	 b.check(1);   /* Error: invalid conversion from `int' to `char*' */
	}
Is there any practical/technical reason why the last call to "b.check(1)" cannot be compiled, since the compiler can happily figure out the name overloading on the two calls to fA using an int and a string? (this is not on any real code, I am just wondering why this does not compile)

O: A::check is not overloaded, but is hidden by B::check. So you have to use the using declaration inside the B class

        using A::check;
to make it accessible through a B object.
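
A minimal sketch of that using declaration in place. To keep it easy to check, the functions here return a description instead of printing it, and the derived overload takes const char*; both choices are assumptions of this sketch rather than part of the original code.

```cpp
#include <string>

struct A {
    std::string check(int) { return "check int"; }
};

struct B : A {
    using A::check;   // makes A::check(int) visible alongside B's overload
    std::string check(const char*) { return "check string"; }
};
```

With the using declaration, b.check(1) resolves to A::check(int) and b.check("1") to B::check(const char*); without it, only B::check is considered and b.check(1) fails to compile.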

RF: Thanks, so I guess my question is:
Why does B::check hide A::check if they do not have the same signature? (This does not happen for free functions.)
Why isn't the compiler able to differentiate them? (or is this just because "the C++ standards say so" or because it is bad practice or because <insert philosophical reason>?)

O: There are 3 possible ways to handle identical names.
1. Add the base-class name as if it were declared in the derived class (like overloading).
2. Add the name, but make it a "second class" member (with a different priority when choosing the function).
3. Forbid using the name implicitly (as it is now).
The first option has 2 difficulties. Consider, for example,

	struct A{ 
	        void f(int); 
	};
	struct B:A{ 
	        void f(long);
	        void f(char*);
	};
	...
	B b;
	b.f(1); // would call A::f
	...
One problem is that, looking at the class B, you would not know that the function B::f(long) is not called until you find A::f(int) (which may be difficult if the inheritance is complex). The second problem is that changing A::f(int) to A::f(long) or to another signature would silently change the semantics of the call b.f(1), which is obviously undesirable. [Of course, this example is a simple one with int-long conversion, but in reality the arguments might be classes with an inheritance relation, so the call would depend on what kind of relation the classes have (the inheritance may be private, for example, and then the conversion will be forbidden).]

Option 2 would drastically increase the complexity of the language and its implementation, because the "best match" rule would have to take into account what priority to give to the "second class" member in comparison with standard or user-defined conversions. And there would be no good reason to prefer one way over another.

So the 3rd option seems simple and logical, I think.






i bet what you think is wrong

O: Answer the following questions without using a compiler.
1. What is the output of the following program on a standards-conforming C++ compiler?

	#include <iostream>

	int main()
	{
	  int x = 1;
	  for( int i = 0; i < 100; ++i );
	    // What will the next line do? Increment???????????/
	    ++x;
	  std::cout << x;
	}
This is a really funny thing from http://www.gotw.ca/gotw/086.htm

PK: 1

O: That's impressive if you did not actually try the compiler!

D: I was sure it was 2, are you saying it's 1 and I'm missing something gnarly?

PK: You are right and I am wrong. It should be 2.

AW to D: So did I because of the null statement. Poking around it appears there's something very fishy with that single-line comment - remove the trailing forward slash and you'll see.

D: Well, g++ returns 2...

AW: yep, while sun CC and HP aCC both yield 1.

DS: Output is '2'. Because there's a 'bonus' semicolon at the end of the for statement, hence the program just runs the empty statement (;) 100 times, before then incrementing x (only once).

PK: You are right, but ... Any half decent compiler will probably optimise away the entire loop statement as the for loop in this case has no side effect.

AW: and the for loop controlled null stmt is neither here nor there... The output seems to depend on whether or not that single-line comment ends in '???/' - very puzzling!

PB: ??/ is a trigraph for \ Hence the ++x will be commented out if the compiler recognises trigraphs. Try g++ with and without the -trigraphs option.

AW: Ah... :) I was just beginning to wonder if ??/ was some esoteric way of saying \ but couldnt for the life of me think why!






Difficult situation

My code

	#include <string>

	void f(void*);
	int main(){

		std::string * p = new std::string("hello");

		f(p);

	}
is linked to a third party code
	// another file
	void f(void *v){

		// do something

		delete v;
	}
This link obviously has a problem. Question: how can I fix it, given that I cannot modify the third party code?

Answer: The problem is that the delete operator does not call the string destructor; hence any memory allocated by the string is leaked.

There is no good solution to this problem (at least I do not know one), but there are 2 bad solutions:

  1. If the function f() does not access the string pointed to by v, then it is possible to destroy the string object just before calling f():
    	p->~string();
    	f(p);
    
    In this case the string that p points to is destroyed before entering f(), and deallocated inside f().
  2. Assuming that after "delete v" the memory occupied by the string
    is not given back to the system (e.g. it stays under a memory manager),
    is not used by another thread, and
    is not modified by the deallocator (e.g. the memory manager),
    then it is possible to destroy the string object right after f():
    	f(p);
    	p->~string();
    
    In this case, after f(), p is still pointing to (already deallocated) memory holding the string representation, and calling the destructor would just destroy the object properly.

P.S. The same error is hidden in the following example:

	#include <string>

	struct A{};

	struct B: A{
		std::string s;
		B(const std::string & s): s(s){}
	};

	int main(){
		A * a = new B("hello");
		delete a;
	}
The fix is either an explicit cast on deallocation or a virtual destructor.
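
A minimal sketch of the virtual destructor variant. The counter b_destroyed and the function destroy_through_base are hypothetical, added only to make the destructor call observable.

```cpp
#include <string>

// hypothetical counter, used only to observe destructor calls
int b_destroyed = 0;

struct A {
    virtual ~A() {}   // virtual: delete through A* runs the derived destructor
};

struct B : A {
    std::string s;
    B(const std::string& s) : s(s) {}
    ~B() { ++b_destroyed; }   // the string member s is destroyed after this body
};

// Creates a B, deletes it through a base pointer, and returns
// how many B destructors have run so far.
int destroy_through_base() {
    A* a = new B("hello");
    delete a;
    return b_destroyed;
}
```

The explicit cast alternative would be delete static_cast<B*>(a); which works only where the dynamic type is known at the point of deletion.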






What is lower_bound

Question: In a set of elements 1 2 3 5 8 9 what is the lower bound for 5 and what is for 6?

Answer: For 5 it is the element 5, and for 6 it is the element 8 (the first element that is not less than the key).
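
The answer can be checked with a short sketch using std::set::lower_bound. The helper names sample_set and lower_bound_value are hypothetical, and returning -1 for end() is a convention of this sketch only.

```cpp
#include <set>

// Builds the set 1 2 3 5 8 9 from the question.
std::set<int> sample_set() {
    const int values[] = {1, 2, 3, 5, 8, 9};
    return std::set<int>(values, values + 6);
}

// Returns the element lower_bound(key) points at,
// or -1 if there is no element that is not less than key.
int lower_bound_value(const std::set<int>& s, int key) {
    std::set<int>::const_iterator it = s.lower_bound(key);
    return it == s.end() ? -1 : *it;
}
```

For keys larger than every element, such as 10, lower_bound returns end(), which must not be dereferenced.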






How operator==() is used in set

Question: In a set of elements 1 2 3 5 8 9 find(5) finds 5 and find(6) does not find anything. What will find(5) find if the operator==() is undefined for the elements?

Answer: 5. Set uses only operator<(), so an element x matches when both x<5 and 5<x are false.
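
A minimal sketch that demonstrates this with an element type that has operator< but no operator== at all. The type Item and the function contains are hypothetical names.

```cpp
#include <set>

// hypothetical element type: ordered by operator<, deliberately no operator==
struct Item {
    int v;
    explicit Item(int v) : v(v) {}
};

bool operator<(const Item& l, const Item& r) { return l.v < r.v; }

// Returns true if the set 1 2 3 5 8 9 contains an element equivalent to key,
// that is, an element x for which neither x<key nor key<x holds.
bool contains(int key) {
    const int values[] = {1, 2, 3, 5, 8, 9};
    std::set<Item> s;
    for (int i = 0; i < 6; ++i)
        s.insert(Item(values[i]));
    return s.find(Item(key)) != s.end();   // compares iterators, not Items
}
```

The code compiles and finds 5 although Item cannot be compared with ==, since the set relies on the ordering alone.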






Invisible difference

int //\ 
i,  // i is defined if there is a white space character after backslash
j; 





Simple gymnastics

The following exercise is simple, but some parts may confuse even an experienced programmer.
	int main(){

	 register int x= { 1 };

	 switch(x) default: x=3;       // x==3
	 do x=4; while(0);             // x==4
	 for ( x=0; x<10, x++; ) x--;  // x==1

	 x = sizeof (x+&x);            // C++ okay, C error

	 extern int deep(int);
	 inline int deep(int);

	 int deep(int);
	 int a(int());                 // a is int

	 int (*f)(int)( (int (*)(int)) (deep) );
	 typedef int (*hazy[])(int);
	 hazy h = { f };
	 (*h[0])(10);                  // deep();
	}

	int deep(int n){

	 try{

	   if( !n ) throw n;
	   deep(n-1);                  // recursive call is straight
	
	 } catch(int r) { return r; }

	 throw n;                      // but return is twisted
	}





Objects life in throw

Question: What does the following program print?
	#include <iostream>

	struct A{
	  int i;
	  A(int i):      i(i)      { std::cout<<" X"<<i; }
	  A(const A& a): i(a.i+1)  { std::cout<<" Y"<<i; }
	 ~A()                      { std::cout<<" Z"<<i; }
	};

	void f() throw(A) { 
	 throw A(1); 
	}

	int main(){

	 try{ f(); }
	 catch(A a){}

	}
Answer: For example, MS*(13.00.9466): " X1 Y2 Z2 Z1", GCC*(3.2) " X1 Y2 Z1 Y3 Z3 Z2".




Template fucktorial

A well-known example of compile-time calculation is a factorial function implemented via templates. The technique is similar to

	template<int n>
	struct A{
	 static const int i;
	};

	template <>
	const int A<1>::i = 1;

	template<int n>
	const int A<n>::i = 10*A<n-1>::i + n ;
I slightly modified the function so that the result would be:
A<1>==1
A<2>==12
A<3>==123
A<4>==1234
...
and so on until integer overflow, rather than the mathematical factorial function 1, 2, 6, 24, 120, ... . When I ran this definition with the program
	#include <iostream>
	int a = A<4>::i;
	int main(){ 
	 int b = A<5>::i;
	 std::cout<<a<<' '<<b<<' '<<A<6>::i;
	}
surprisingly it printed

  Compiler           Output
  MS* 12.00.8168     0 45 456
  MS* 13.00.9466     0 45 456
  GCC* 2.95.3-10     1234 12345 123456
  GCC* 3.2-3         0 45 456
  BCC* 5.5.1         0 5 56

What is interesting is the consistency between MS* and GCC* 3.2, and the difference between GCC* 2.95 and 3.2. The unexpected output is produced because of the compiler behaviour. When it encounters
	int a = A<4>::i;
it creates the instantiation and the template initialization
	template
	struct A<4>{
	 static const int i;
	};

	template
	const int A<4>::i = 10*A<3>::i + 4;
but the template initialization takes place after it is used in "int a =". At that moment A<4>::i is just instantiated and initialized to 0. Later, when it comes to the initialization of A<4>::i, the same happens with A<3>::i, so A<4>::i is initialized to 4. And so on: A<3>::i is initialized to 3 and A<2>::i is initialized to 2. Finally, when A<5>::i is accessed in "int b = A<5>::i;" inside main(), it has been instantiated and initialized beforehand, since initialization happens at global scope; A<4>::i was already initialized to 4 by then, so b becomes 45.

The elegant solution, which works on all the above compilers with the expected output, is

	#include <iostream>

	template<int n>
	struct A{
	  enum{ i = 10*A<n-1>::i + n };
	};

	template <>
	struct A<1>{
	  enum{ i=1 };
	};

	int a = A<4>::i;

	int main(){ 
	 int b = A<5>::i;
	 std::cout<<a<<' '<<b<<' '<<A<6>::i; 
	}






Simple conversions

Question: Is the following code legal?

	class A{
	  A(A&);
	  void operator=(A&);
	public:
	  A(int){}
	  operator int(){ return 1; } // explicit?
	};

	A a(2);
	A b = 2;
	float i = a+b;
Namely: do the private declarations of the copy constructor and assignment operator invalidate the initialization of the a and b objects? Is the initialization of i illegal because operator int would be declared explicit, or because operator+ is not defined in A, or both?

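Answer. For both objects a and b the constructor A(int) is called, so the assignment operator does not have any effect. Note, however, that before C++17 the copy-initialization "A b = 2;" formally still requires an accessible copy constructor that can take a temporary, so a strict compiler rejects it because of the private A(A&); only "A a(2);" is unaffected. Declaring operator int explicit is illegal in C++98/03, where explicit can be used only on constructors (C++11 allows explicit conversion operators, and with one "a+b" would no longer compile). Expression "a+b" is legal: both operands are converted into int.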






Lvalue, rvalue

The standard says that every expression is either an lvalue or an rvalue. However, every expression can be classified as one of the following 4 types:

  1. can be used as lvalue and rvalue;
  2. can be used as rvalue, but not as lvalue;
  3. can be used as lvalue, but not as rvalue;
  4. can be used neither as lvalue, nor as rvalue.
Question: Give examples for each of these 4 types.

Answer. The first two are easy: any defined object and a simple expression:

	A * p;
	p = p+1;
p is type 1, and p+1 is 2.
Case 3 is harder.
	class A;
	A * p;
	A & r = *p;

	void g(A&);
	void h(A);
	void f(){ 
	  g(r); // use r as lvalue
	  h(r); // error: not an rvalue
	}
The last, 4th case:
	void v();
	void f(){ 
	  return v(); // v() is void, neither rvalue nor lvalue
	}






Deletion of incomplete type

Question: Is the following code legal?

	class A * p; 
	// class A is not defined, just declared
	void f(){
	  delete p; 
	}

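Answer. Yes. But it has the problem that the A destructor is not called; the standard makes the behaviour undefined if the complete class A turns out to have a non-trivial destructor or its own deallocation function. GCC* does not allow deletion of incomplete types.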






Accessing set elements

The code

	#include <set>

	int main(){
	  std::set<int> a;
	  int & z = *a.begin();
	}
does not compile with GCC* (but compiles with BCC* and MS*). Question: Why?

Set is a container storing its elements in strict order. Dereferencing the iterator to a non-const type makes it possible to change the element inside the container and thus destroy the order. So it is a matter of what the return type of operator* is, T& or const T&. I do not know which is correct according to the standard.






Volatile storage

In the following piece of code a vector v is filled with integers and another storage keeps the addresses of the stored integers. Later, if we access the integers through their addresses, we get an unexpected result!

	#include <iostream>
	#include <vector>

	int main(){

	  std::vector<int*> p;
	  std::vector<int> v;

	  for( int i=0; i<3; i++ ){
	    v.push_back(i);
	    p.push_back(&v.back());
	  }

	  for( int i=0; i<p.size(); i++ ) 
	    std::cout<<*p[i]<<' ';   
	  // surprise !!
	}

The problem here is that the contents of the vector are moved as the vector grows. Is there a container which is guaranteed not to move its elements?
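
One possible answer, as a minimal sketch: std::list is node-based, so inserting elements never relocates the existing ones and saved addresses stay valid. The function name sum_through_pointers is hypothetical.

```cpp
#include <list>
#include <vector>

// Fills a std::list with 0..n-1 while recording each element's address,
// then sums the values read back through the saved pointers.
int sum_through_pointers(int n) {
    std::list<int> v;
    std::vector<int*> p;
    for (int i = 0; i < n; ++i) {
        v.push_back(i);
        p.push_back(&v.back());   // stays valid: list nodes are never relocated
    }
    int sum = 0;
    for (std::vector<int*>::size_type i = 0; i < p.size(); ++i)
        sum += *p[i];
    return sum;
}
```

std::deque gives a similar guarantee for insertions at either end (iterators are invalidated, but references and pointers to elements are not), and with std::vector the addresses stay valid as long as enough capacity was reserved beforehand with reserve().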






New expression

Question: Which of the following new expressions is incorrect?
	new (const int (*)[10]);
	new const int*[10];
	new (const int*)[10]; 
Note: GCC* and BCC* accept all of them. MS* does not.




Simple as pointer

Question: In

	int a[3];
	int * p = a;
	const int * cp = a;
	int * const pc = a;
	int (&r)[3] = a;
if we compare them as void* values, is
a == &a ?
&p == a ?
p == a ?
cp == p ?
&cp != &p ?
pc == a ?
&pc != a ?
r == a ?
&r == &a ?

Answer (whether the two values are equal): yes, no, yes, yes, no, yes, no(*), yes, yes.
(*) pc is a synonym to a, but &pc does not exist unless explicitly used in the program.






Const violation

Question: What is the output in this program?

	#include <iostream>

	const int i = 1;

	struct Int{ int x; };

	const Int j = {1};

	int main(){

	 int * p = const_cast<int*>(&i);
	 *p = 2;
	 int * q = const_cast<int*>(&j.x);
	 *q = 2;
	 std::cout<<i<<j.x;

	}

Answer: Undefined. GCC* gives a "Segmentation fault" error, which means that const entities are placed in a segment forbidden for writing. BCC* and MS* print 12. The interesting thing is why i is printed as 1: the compiler substitutes i with 1 in the last statement because it assumes that i cannot be modified. Then why is j.x 2?






Sizeof things

Question: What does this program print?

#include <iostream>

char a = 'e';
int ar[10];
long double d;

int main(){

 std::cout << sizeof (8.l+a) <<' '<<  sizeof d 
         <<' '<< sizeof (ar) <<' '<< sizeof (ar+1);

}

Answer: The output is equivalent to sizeof(long double), sizeof(long double), 10*sizeof(int), and sizeof(int*). Under Windows MS* prints "8 8 40 4", BCC* "10 10 40 4", and GCC* "12 12 40 4".






Virtual destructor

Question: Is ~A() called in the last statement?

	struct A {
	 virtual ~A(){}
	};

	struct B: A{     
	 B(){}
	 ~B(){}
	};

	int main(){
	 A * p = new B;
	 delete p;
	}

Answer: Yes. In the last statement the destruction of the A object is dynamically resolved to a call to ~B(), and ~B() then calls A::~A() to destroy the A subobject of *p.






Small trick

This program is illegal.

	void f(int){}

	int main(){
	  f(2,3));
	}
Question: Make one modification to make the program legal. You can add, change, or remove only one character.

Answer: Add an opening parenthesis: f((2,3));. The argument is then the single comma expression (2,3), whose value is 3.






Register var

Question: Is this program legal?

	int main(){
	  register double long a;
	  register double long *b;
	  b = &a;
	}

Answer: It depends on whether it is compiled as C++ or C. In C it is not legal to take the address of a register variable; in C++ it is allowed.






Virtual call

Question: Write a small program which calls a pure virtual function (and aborts) without using cast operators.

Answer: The trick is to call the virtual function indirectly from the base class constructor, while the base subobject is still being constructed.
#include <iostream>


struct B {
	virtual void v()=0;
	void f(){ v(); }
	B(){ f(); }	// v() is still pure while B is being constructed
};

struct D : B {
	virtual void v(){ std::cout<<"v"; }
};

int main(){
  D d;
}





Memory profiler

B: There are numerous purify-like tools around, but these target the detection of leaks, array bounds errors, etc. Those with a profiling capability seem to do what's easy to do, namely:
- overload new, malloc, etc
- for each allocation, save the call stack
- for each allocation, record which size 'bucket' it falls in based on how many bytes were allocated

This is all very well, easy to do, and non-invasive (just re-link). However, the stats that are produced (memory allocated per-function, distribution of size of allocated blocks, etc) don't answer the questions I'm interested in.

What I want is to be able to take a snapshot of the heap at a point in time and record:
- for each class, how many objects of that class exist
- how allocated memory is distributed among all objects
- I'd also like to track the number of constructions/deletions for all classes over time

These seem like simple and obvious facts to gather, but I suspect that gathering them is quite hard.

More info - why I suspect it's hard:

Counting active instances doesn't seem difficult, but the simple question of 'how much space is taken up by object A' is a bit tricky. I don't think that just overloading new, delete, etc and recording stats will get close - it's easy enough to trace a call to new to a particular constructor and know that an object of class A, which has sizeof() 23, has just been created. The problem is that A's static size may be an insignificant component of its actual size at runtime. Its members may be pointers to other objects on the heap, some of which will be allocated by someone else after A's construction and then attached to A during A's lifetime. In general, A's members will spawn a chain of pointers to heap objects, some of which conceptually belong to A, and some of which A just holds a reference to. There will be circular dependencies (e.g. A holds a pointer to the container that holds it), and many-to-one relationships (the same object is referenced by many others).

It would be relatively easy to gather these kinds of stats if the source code were modified so that all objects supported a 'how big are you' interface, delegated recursively through their heap-allocated members, although the problem of double-counting shared objects would need to be addressed. However, I want to be able to profile existing code like this in a non-invasive way.

I suspect that a good approach to this problem may be in the kinds of heap analysis done by garbage collectors, e.g.:
http://130.15.168.200/faqs/SUNWspro/htmldocs/locale/C/gc/manual-44.htm
http://www.microway.com.au/catalog/geodesic/products.stm
...but I also wonder whether some kind of help from the compiler might be necessary to gain enough information about the structure and relationship between classes (maybe just debug information).

For examples of available profiling tools see:
mpatrol - one of the more comprehensive 'classic' memory analysis packages
http://www.cbmamiga.demon.co.uk/mpatrol/
Valgrind/MASSIF - profiler for Linux based on powerful underlying x86 emulation engine
http://www.cl.cam.ac.uk/~njn25/valgrind/ms_main.html

For some articles, see:
http://www.memorymanagement.org/bib/lang.html

O: To add to why it is hard, I would say it is impossible. Once you lose the definition of object size as sizeof, there is no good definition of what the size of an object really is. There is no general way to trace all of an object's references. What belongs to an object is only in the mind of the programmer. [For example, two different objects may have references to the same memory block, which is created by one and deleted by the other. So it depends on me, whom I call the owner of the block.] Whatever mechanism for counting object sizes one can invent, I think it is always possible to make an example which will break it.

Impossible in general, but possible in some special cases, of course. However, one has to specify a number of tricky assumptions which must be satisfied for such a tool to work. So I think it will be difficult to find a proper tool, because people usually do not like to write tools that depend on tricky assumptions.

B: I agree that it's impossible for a tool to second-guess the programmer's opinion of ownership. However, it still seems to me that it should be possible to make a practical tool to do this based on heuristics and some small amount of user guidance.

Well, we could define it as "the size of the object plus the size of all (recursively) referenced blocks on the heap that haven't been assigned already to some other object".

It's possible to imagine heuristics based on a minimal amount of guidance - e.g. a priority order over classes supplied by the user - that might make for a useful tool. Here's a wild stab at a garbage-collector-like algorithm:

  1. Overload new, malloc etc so that every heap allocation is logged - that is, for a given address, we know if it's a pointer to an active allocated block
  2. For each allocation, also determine (using the call stack and probably debug info generated by the compiler) whether it's being done in order to construct an instance of a class. Tag the pointer with the class.
    Now, to take a snapshot of the heap for profiling:
  3. Scan all 'root' data in the program - that is local variables, global variables, and compiler temporaries - looking for pointers to the known active objects. Add these to a 'pending' set.
  4. Sort the pending set according to some cunning heuristic (*)
  5. For each object O in the pending set
    - add O's size to a bucket for its respective class
    - if O is not a root object, also add its size to the buckets for each unique class represented in the chain of pointers from the root level to O
    - scan O's members for pointers-to-objects (debugging info about class structure will help here)
    - for each child object that is not already marked as having been done, insert it to the pending set, preserving ordering according to the ordering heuristic.
    - mark O as done and remove it from the pending set

(*) This approach relies on my imaginary ordering heuristic to achieve a meaningful assignment of object sizes to appropriate buckets. I imagine the order could be determined by some combination of a user-provided priority order over classes and 'depth from root' of the object in question (i.e. we aim to descend layer-by-layer, tending to allocate objects that are referenced many times to parents that are instances of classes identified by the user as high priority and/or those that are closest to the root data)

Note that I'm not saying that a scheme like this could be 'correct' in any exact sense - it just seems to me that it should be possible to come up with a scheme that gives a reasonably meaningful and accurate picture of memory use on a per-class basis, so long as the user has a way to influence the 'tricky assumptions' required. It would be easy to expose the detail to the user - i.e. 'I calculated that 13MB is taken up by 147 instances of Foo, which includes 793 referenced instances of Bar, 28 referenced instances of Snafu, ...' etc - so the user could tweak the priority order (or add rules like 'never count Snafus as part of Foos' or similar) if the assignment didn't appear to reflect his perception of ownership. So long as everything only gets counted once, and most things are grouped together in a reasonable way, the picture would be useful in spite of some inevitable noise.

O: If we sacrifice rigor (neglect the 'inevitable noise'), then I agree - it is possible to do. But, again, there are some extremely difficult points. These are off the top of my head. Solvable, obviously, but difficult.

The user interface may be quite complicated; even a simple language may be required to help the user describe how to calculate object sizes.

Any in-house memory manager will probably break any algorithm for collecting this information.

Most STL implementations include their own layer of memory management, which may make the algorithm inconsistent. For example, in

        class A{
                std::set<B*> bs;
                B *left, *right;
        };
left and right will be counted, but the elements of bs may or may not be counted, depending on the STL implementation.

A priority order over classes may not be sufficient if the information required is not only 'what class has what' but also 'what part of the program has what', because objects of the same kind may have shared parts.

I agree, this kind of tool would be very useful. But I did not even try to search the internet, because I do not believe it exists. I may be wrong though.






Compilers

MS* here refers to Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8168 for 80x86 or Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.00.9466 for 80x86.
GCC* here refers to GNU GCC C++ compiler version 2.95.3-10 or 3.2-3 20020923 for 80x86.
BCC* here refers to Borland C++ 5.5.1 for Win32 Copyright (c) 1993, 2000 Borland.