thasso.xyz

Can you use a class in C?

Recently, I’ve been working on a C debugger. This requires reading and processing the DWARF debugging information that’s part of the binary. Since this is a rather complex task, I figured I might use a library that exports a nice interface to the debugging information.

One such library that I found early on was libelfin. It wasn’t perfect from the start because it is a bit dated now, only supporting DWARF 4 and missing features from the newer DWARF 5 standard, but I thought that I could work around this. The bigger problem was that libelfin is written in C++ while most of the debugger is written in C.

It is pretty easy to call code written in C from C++ since most of C is still part of the subset that C++ supports. The problem with calling C++ code from C is that there are many features in C++ that C is missing. This means that the C++ interface must be simplified for C to be able to understand it.

Handling objects

The most important concept in C++ that C is missing is true object orientation. That is, in C you don’t get a this pointer for free; you need to handle it manually.
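To make the point about this concrete, here is a minimal sketch that puts the two styles side by side. The names Counter, counter_add and CounterClass are made up for illustration and are not part of the Rational example.

```cpp
// C style: the object is passed explicitly as the first argument.
// (In plain C the parameter type would be spelled `struct Counter *`.)
struct Counter {
  int value;
};

void counter_add(Counter *self, int amount) {
  self->value += amount;
}

// C++ style: the compiler passes `this` implicitly to member functions.
class CounterClass {
public:
  int value = 0;

  void add(int amount) { this->value += amount; }
};
```

Calling counter_add(&c, 2) and k.add(2) does the same thing; in the first case we hand over the pointer ourselves.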

Let’s start with a simple example. Say we have a class that represents a rational number $r = p / q$ where $q \neq 0$. The declaration, without any of the operations we would need, might look something like this. The C++ usage example after it prints 5 / 3 when we run it.

// rational.h

class Rational {
public:
  int _numer;
  int _denom;

  Rational(int numer, int denom)
    : _numer{numer}, _denom{denom} {}
};

This is how we might use it in C++:

// main.cc
#include <iostream>
#include "rational.h"

auto main() -> int {
  auto r = Rational(5, 3);
  std::cout << r._numer << " / " << r._denom << std::endl;
  return 0;
}

How do you write this as a C program using the Rational class? After all, there is no such thing as a class in C. To solve this issue we can rely on one of the primitives that most systems languages have in common by virtue of running on the same type of computer: the pointer. We will allocate an instance of our class on the heap and then give the C program a pointer to that instance. This way we can keep track of the object to manipulate it. It’s also possible to use handles for this, but they are just pointers with extra steps and a bit overkill for us at this point.
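For comparison, this is roughly what the handle variant could look like. It is only a sketch: the names rational_handle, rational_open, rational_numer and rational_close are hypothetical, the table is not thread-safe, it needs C++14 for std::make_unique, and it ignores the possibility that growing the table throws.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

class Rational {
public:
  int _numer;
  int _denom;

  Rational(int numer, int denom)
    : _numer{numer}, _denom{denom} {}
};

// Hypothetical handle type: an index into a table owned by the C++ side.
typedef int rational_handle;

static std::vector<std::unique_ptr<Rational>> table;

rational_handle rational_open(int numer, int denom) {
  table.push_back(std::make_unique<Rational>(numer, denom));
  return static_cast<rational_handle>(table.size() - 1);
}

// Returns 0 on success and -1 for an invalid or already closed handle.
int rational_numer(rational_handle h, int *out) {
  if (h < 0 || static_cast<std::size_t>(h) >= table.size() || !table[h]) {
    return -1;
  }
  *out = table[h]->_numer;
  return 0;
}

void rational_close(rational_handle h) {
  if (h >= 0 && static_cast<std::size_t>(h) < table.size()) {
    table[h].reset();  // The slot stays in the table but is now empty.
  }
}
```

The extra steps buy the ability to detect stale handles, which a raw pointer can’t offer. For the rest of this post we’ll stick with plain pointers.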

The following is what we might want.

// main.c
#include <stdio.h>
#include "rational.h"

int main(void) {
  void *r = make_rational(5, 3);
  printf("%d / %d\n", get_numer(r), get_denom(r));
  del_rational(&r);
  return 0;
}

We need to extend our interface with all the new functions to construct, access and manually delete instances of Rational.

// rational.h
class Rational { /* ... */ };

void *make_rational(int numer, int denom);
int get_numer(const void *r);
int get_denom(const void *r);
void del_rational(void **rp);

// rational.cc
#include "rational.h"
#include <cstdlib>

void *make_rational(int numer, int denom) {
  // Allocate an instance on the heap.
  Rational *r = static_cast<Rational*>(malloc(sizeof(Rational)));
  r->_numer = numer;
  r->_denom = denom;
  return r;
}

int get_numer(const void *r) {
  // Cast to access members.
  const Rational *_r = static_cast<const Rational*>(r);
  return _r->_numer;
}

int get_denom(const void *r) {
  const Rational *_r = static_cast<const Rational*>(r);
  return _r->_denom;
}

void del_rational(void **rp) {
  Rational *_r = static_cast<Rational*>(*rp);
  // Delete the instance on the heap.
  free(_r);

  // Delete the dangling pointer too.
  *rp = nullptr;
}

The trick is to allocate instances on the heap and then pass them around as void pointers. We use C’s malloc instead of the new operator because new depends on the C++ runtime library, which isn’t linked when a C compiler builds the final executable, so using it here raises a linker error (more on that below). A good way to improve type safety is to typedef an opaque type to represent the class on the C side, as suggested in this reply. This is the approach that we’ll be using later on, so keep on reading. Alternatively, if you have control over all of the C++ code (i.e. you don’t just wrap a library) you could follow this Stack Overflow answer too.
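One drawback of the malloc version is that it never runs the constructor. A possible middle ground, sketched here under the assumption that the constructor itself doesn’t throw, is placement new: it constructs the object in storage we already got from malloc. As far as I know the placement form is defined inline in the new header and doesn’t call the allocating operator new, so it should not need anything from the C++ runtime library.

```cpp
#include <cstdlib>
#include <new>

class Rational {
public:
  int _numer;
  int _denom;

  Rational(int numer, int denom)
    : _numer{numer}, _denom{denom} {}
};

void *make_rational(int numer, int denom) {
  void *mem = std::malloc(sizeof(Rational));
  if (mem == nullptr) {
    return nullptr;  // Out of memory, nothing was constructed.
  }
  // Placement new: run the constructor in the storage we just allocated.
  return new (mem) Rational(numer, denom);
}

int get_numer(const void *r) {
  return static_cast<const Rational *>(r)->_numer;
}

void del_rational(void **rp) {
  if (rp == nullptr || *rp == nullptr) {
    return;
  }
  Rational *r = static_cast<Rational *>(*rp);
  r->~Rational();  // Run the destructor by hand, then release the storage.
  std::free(r);
  *rp = nullptr;
}
```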

Now, ignoring how incredibly unsafe all of this is, there is a bigger problem we must face: this is not even close to compiling! The reason for this is that when we #include "rational.h" into main.c, we essentially copy all the contents of rational.h into the C source file. This means that we suddenly present the C compiler with a class declaration and other things that it doesn’t understand because they are part of a totally different language.

We can use the C preprocessor to help us here. Using the __cplusplus macro, we can check whether to include the C++ parts in the interface. This way it’s hidden from the C compiler but available to the C++ compiler.

// rational.h
#ifdef __cplusplus
class Rational {
public:
  int _numer;
  int _denom;

  Rational(int numer, int denom)
    : _numer{numer}, _denom{denom} {}
};
#endif  // __cplusplus

// ...

Building the program with the two different compilers could look like this: g++ -c rational.cc && gcc main.c rational.o.

Great, it compiles! But uhh … now the linker signals an error. There are two problems left to fix. Firstly, C++ functions have C++ language linkage by default, which in principle allows a calling convention different from that of C. Additionally, C++ compilers mangle the names of functions in the source code (for example to encode parameter types for overloading), while C compilers emit them essentially unchanged, so the linker can’t find them under the names that main.c asks for. Fortunately, C is the lingua franca of computer programming so C++ compilers can adapt their behavior in both of these aspects to that of C compilers. To do so, we just prefix all C++ declarations that should be used by C code with extern "C".
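To see what mangling does, consider this small sketch. The functions twice and twice_c are made up for illustration, and the symbol names in the comments are what I’d expect from GCC or Clang on Linux (Itanium C++ ABI); other platforms use different schemes. You can inspect the real names in an object file with nm.

```cpp
// C++ linkage: the parameter types become part of the symbol name,
// which is what makes overloading possible.
int twice(int x) { return 2 * x; }           // expected symbol: _Z5twicei
double twice(double x) { return 2.0 * x; }   // expected symbol: _Z5twiced

// C linkage: the symbol is just the plain name, so a C object file that
// refers to twice_c can be linked against it. Overloading is not possible
// for functions declared this way.
extern "C" int twice_c(int x) { return 2 * x; }  // expected symbol: twice_c
```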

This is very simple to do in the rational.cc source file, but requires some extra smartness in rational.h. Again, extern "C" is only a C++ feature, so it cannot be part of the header when the C compiler is looking at it. The solution to this is to use the __cplusplus macro once more.

// rational.h
#ifdef __cplusplus
class Rational { /* ... */ };
#endif  // __cplusplus


#ifdef __cplusplus
extern "C" {
#endif  // __cplusplus

void *make_rational(int numer, int denom);
int get_numer(const void *r);
int get_denom(const void *r);
void del_rational(void **rp);

#ifdef __cplusplus
}  // extern "C"
#endif  // __cplusplus

This wraps all of the function declarations in an extern "C" block when the C++ compiler is looking at it. After making those changes to rational.h and rational.cc we get the following output.

g++ -c rational.cc
gcc main.c rational.o
./a.out
5 / 3

We successfully created a class in C++ that we can now use in C!

Now that we have covered how to use the preprocessor to change the content of a file based on the compiler that’s looking at it, we can make the API a bit safer, too. To do that we create an opaque type that acts as a proxy for the Rational class on the C side. By only declaring this type, the C compiler will ensure that the pointers passed around in the interface are all of the same type (i.e. Rational *). However, it won’t let you dereference the pointers because the type is never really defined.

#ifdef __cplusplus

class Rational {
	// ...
};

#else

// Opaque type as a C proxy for the class.
typedef struct Rational Rational;

#endif // __cplusplus

In addition to that, we now replace all void * with Rational *. This will allow you to remove some of the static_casts from the beginning.
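Put together, the typed version might look like the following sketch. It shows the header and the source file in one listing for brevity and still uses malloc, as in the earlier steps; the null check after malloc is my addition.

```cpp
// rational.h (sketch)
#ifdef __cplusplus
class Rational {
public:
  int _numer;
  int _denom;

  Rational(int numer, int denom)
    : _numer{numer}, _denom{denom} {}
};
#else
// Opaque type as a C proxy for the class.
typedef struct Rational Rational;
#endif  // __cplusplus

#ifdef __cplusplus
extern "C" {
#endif  // __cplusplus

Rational *make_rational(int numer, int denom);
int get_numer(const Rational *r);
int get_denom(const Rational *r);
void del_rational(Rational **rp);

#ifdef __cplusplus
}  // extern "C"
#endif  // __cplusplus

// rational.cc (sketch)
#include <cstdlib>

extern "C" Rational *make_rational(int numer, int denom) {
  // malloc still returns void *, so this one cast remains.
  Rational *r = static_cast<Rational *>(std::malloc(sizeof(Rational)));
  if (r == nullptr) {
    return nullptr;
  }
  r->_numer = numer;
  r->_denom = denom;
  return r;
}

// The accessors no longer need any casts.
extern "C" int get_numer(const Rational *r) { return r->_numer; }
extern "C" int get_denom(const Rational *r) { return r->_denom; }

extern "C" void del_rational(Rational **rp) {
  std::free(*rp);
  *rp = nullptr;
}
```

On the C side, main.c then declares Rational *r instead of void *r, and passing any other pointer type to get_numer is flagged by the compiler.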

Linking the C++ standard library

Above, we used malloc and a cast to allocate the instance of Rational to prevent a linker error later on. If we had used new and delete instead (which is the proper C++ way), we would have gotten linker errors like this one:

rational.cc:(.text+0x15): undefined reference to `operator new(unsigned long)'

Usually in a C++ program, this issue doesn’t arise because new and delete are provided in the C++ standard library. The problem is that we used a C compiler to build the executable, which doesn’t link the C++ standard library by default. The solution is to pass the linker flag -lstdc++ to the compiler explicitly, for example gcc main.c rational.o -lstdc++.

With new we can also use normal C++ constructors, making everything more concise and safe:

// rational.cc
#include "rational.h"

extern "C" Rational *make_rational(int numer, int denom) {
  // Now we're using the constructor.
  Rational *r = new Rational(numer, denom);
  return r;
}

// ...

extern "C" void del_rational(Rational **rp) {
  delete *rp;
  *rp = nullptr;
}

Handling exceptions

Exceptions are another feature of C++ that C doesn’t have. If the C++ code we wrapped throws an exception, the whole program will crash without doing any cleanup. This can be addressed in multiple ways, one of which is to pass -fno-exceptions to the C++ compiler to abort if a library throws an exception and to reject code that uses exceptions. The more realistic and safe approach is to carefully catch all exceptions at the language boundary.

If you take another look at the definition of rational numbers above, you’ll notice that we don’t actually ensure that $q \neq 0$. This will become problematic if we try to implement rational number arithmetic for our class. We’ll address this by throwing an exception in the constructor if the denominator is 0.

// rational.h
#ifdef __cplusplus

#include <stdexcept>

class Rational {
public:
  int _numer;
  int _denom;

  Rational(int numer, int denom) {
    this->_numer = numer;
    if (denom == 0) {
      throw std::domain_error("denominator is 0");
    } else {
      this->_denom = denom;
    }
  }
};
#endif  // __cplusplus

// ...

Since we know now that the constructor might throw, we catch all exceptions in the wrapper and return a nullptr in case of an exception. In general, it’s often a good idea to catch anything and return a generic error value such as null. In addition to that, you could add infinitely more complex error-handling schemes at the language boundary.

// rational.cc
#include "rational.h"

extern "C" Rational *make_rational(int numer, int denom) {
  try {
    // Allocate an instance on the heap.
    Rational *r = new Rational(numer, denom);
    return r;
  } catch (...) {
    return nullptr;
  }
}

In such a simple case it’s also feasible to check if the denominator is 0 in make_rational but that doesn’t apply to more realistic examples.
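As one example of a slightly richer scheme, the wrapper could return a status code and keep the last error message around for the C side to query. This is only a sketch: the names make_rational_checked, rational_last_error and the RATIONAL_* codes are hypothetical, and the constructor is condensed compared to the one above.

```cpp
#include <cstdio>
#include <new>
#include <stdexcept>

class Rational {
public:
  int _numer;
  int _denom;

  Rational(int numer, int denom) : _numer{numer}, _denom{denom} {
    if (denom == 0) {
      throw std::domain_error("denominator is 0");
    }
  }
};

// Hypothetical status codes, plain ints so that C can use them too.
enum { RATIONAL_OK = 0, RATIONAL_DOMAIN = 1, RATIONAL_NO_MEMORY = 2, RATIONAL_UNKNOWN = 3 };

// Fixed-size buffer, so recording an error can't throw itself.
static thread_local char last_error[128] = "";

static void set_error(const char *msg) {
  std::snprintf(last_error, sizeof last_error, "%s", msg);
}

extern "C" int make_rational_checked(int numer, int denom, Rational **out) {
  *out = nullptr;
  try {
    *out = new Rational(numer, denom);
    set_error("");
    return RATIONAL_OK;
  } catch (const std::domain_error &e) {
    set_error(e.what());
    return RATIONAL_DOMAIN;
  } catch (const std::bad_alloc &e) {
    set_error(e.what());
    return RATIONAL_NO_MEMORY;
  } catch (...) {
    // Nothing may escape across the language boundary.
    set_error("unknown error");
    return RATIONAL_UNKNOWN;
  }
}

// The returned pointer is valid until the next call on the same thread.
extern "C" const char *rational_last_error(void) { return last_error; }

extern "C" void del_rational(Rational **rp) {
  delete *rp;
  *rp = nullptr;
}
```

The C caller checks the return value and, if it isn’t RATIONAL_OK, can print rational_last_error() to find out what went wrong.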

You can find all the code for this post on my GitHub.

Conclusion

I ended up not using libelfin for my debugger, but I am glad that I had this opportunity to learn so much about calling C++ code from C. This is the first time that I have documented any of the insights I discovered about a particular problem, and I am excited to find out what you think about it. Feel free to contact me through my about page. Your insights and perspectives would be greatly appreciated. I am committed to writing more posts like this one in the future and I hope you found it helpful ^^.