NAME
ctf —
Compact C Type Format
SYNOPSIS
#include <sys/ctf.h>
DESCRIPTION
ctf is designed to be a compact representation of the C
programming language's type information focused on serving the needs of
dynamic tracing, debuggers, and other in-situ and post-mortem introspection
tools.
ctf data is generally included in
ELF objects and is tagged as
SHT_PROGBITS
to ensure that the data is accessible in a running process and in subsequent
core dumps, if generated.
The
ctf data contained in each file has information about the
layout and sizes of C types, including intrinsic types, enumerations,
structures, typedefs, and unions, that are used by the corresponding
ELF object. The
ctf data may also include
information about the types of global objects and the return type and
arguments of functions in the symbol table.
Because a
ctf file is often embedded inside a file, rather
than being a standalone file itself, it may also be referred to as a
ctf container.
On illumos systems,
ctf data is consumed by multiple programs.
It can be used by
dtrace(1).
Programmatic access to
ctf data can be obtained through
libctf(3).
The
ctf file format is broken down into seven different
sections. The first section is the
preamble and
header, which describes the version of the
ctf file, links it has to other
ctf files,
and the sizes of the other sections. The next section is the
label section, which provides a way of identifying similar
groups of
ctf data across multiple files. This is followed
by the
object information section, which describes the type
of global symbols. The subsequent section is the
function
information section, which describes the return types and arguments of
functions. The next section is the
type information section,
which describes the format and layout of the C types themselves, and finally
the last section is the
string section, which contains the
names of types, enumerations, members, and labels.
While strictly speaking, only the
preamble and
header are required, to be actually useful, both the type
and string sections are necessary.
A
ctf file may contain all of the type information that it
requires, or it may optionally refer to another
ctf file
which holds the remaining types. When a
ctf file refers to
another file, it is called the
child and the file it refers
to is called the
parent. A given file may only refer to one
parent. This process is called
uniquification because it
ensures each child only has type information that is unique to it. A common
example of this is that most kernel modules in illumos are uniquified against
the kernel module
genunix and the type information that
comes from the
IP module. This means that a module only has
types that are unique to itself and the most common types in the kernel are
not duplicated.
This documents version
two of the
ctf file
format. All applications and tools currently produce and operate on this
version.
The file format can be summarized with the following image, the following
sections will cover this in more detail.
+-------------+ 0t0
+--------| Preamble |
| +-------------+ 0t4
|+-------| Header |
|| +-------------+ 0t36 + cth_lbloff
||+------| Labels |
||| +-------------+ 0t36 + cth_objtoff
|||+-----| Objects |
|||| +-------------+ 0t36 + cth_funcoff
||||+----| Functions |
||||| +-------------+ 0t36 + cth_typeoff
|||||+---| Types |
|||||| +-------------+ 0t36 + cth_stroff
||||||+--| Strings |
||||||| +-------------+ 0t36 + cth_stroff + cth_strlen
|||||||
|||||||
|||||||
||||||| +-- magic - vers flags
||||||| | | | |
||||||| +------+------+------+------+
+---------| 0xcf | 0xf1 | 0x02 | 0x00 |
|||||| +------+------+------+------+
|||||| 0 1 2 3 4
||||||
|||||| + parent label + objects
|||||| | + parent name | + functions + strings
|||||| | | + label | | + types | + strlen
|||||| | | | | | | | |
|||||| +------+------+------+------+------+-------+-------+-------+
+--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 |
||||| +------+------+------+------+------+-------+-------+-------+
||||| 0x04 0x08 0x0c 0x10 0x14 0x18 0x1c 0x20 0x24
|||||
||||| + Label name
||||| | + Label type
||||| | | + Next label
||||| | | |
||||| +-------+------+-----+
+-----------| 0x01 | 0x42 | ... |
|||| +-------+------+-----+
|||| cth_lbloff +0x4 +0x8 cth_objtoff
||||
||||
|||| Symidx 0t15 0t43 0t44
|||| +------+------+------+-----+
+----------| 0x00 | 0x42 | 0x36 | ... |
||| +------+------+------+-----+
||| cth_objtoff +0x2 +0x4 +0x6 cth_funcoff
|||
||| + CTF_TYPE_INFO + CTF_TYPE_INFO
||| | + Return type |
||| | | + arg0 |
||| +--------+------+------+-----+
+---------| 0x2c10 | 0x08 | 0x0c | ... |
|| +--------+------+------+-----+
|| cth_funcff +0x2 +0x4 +0x6 cth_typeoff
||
|| + ctf_stype_t for type 1
|| | integer + integer encoding
|| | | + ctf_stype_t for type 2
|| | | |
|| +--------------------+-----------+-----+
+--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... |
| +--------------------+-----------+-----+
| cth_typeoff +0x08 +0x0c cth_stroff
|
| +--- str 0
| | +--- str 1 + str 2
| | | |
| v v v
| +----+---+---+---+----+---+---+---+---+---+----+
+---| \0 | i | n | t | \0 | f | o | o | _ | t | \0 |
+----+---+---+---+----+---+---+---+---+---+----+
0 1 2 3 4 5 6 7 8 9 10 11
Every
ctf file begins with a
preamble,
followed by a
header. The
preamble is
defined as follows:
typedef struct ctf_preamble {
ushort_t ctp_magic; /* magic number (CTF_MAGIC) */
uchar_t ctp_version; /* data format version number (CTF_VERSION) */
uchar_t ctp_flags; /* flags (see below) */
} ctf_preamble_t;
The
preamble is four bytes long and must be four byte aligned.
This
preamble defines the version of the
ctf file which defines the format of the rest of the header.
While the header may change in subsequent versions, the preamble will not
change across versions, though the interpretation of its flags may change from
version to version. The
ctp_magic member defines the magic
number for the
ctf file format. This must always be
0xcff1
. If another value is encountered, then the file
should not be treated as a
ctf file. The
ctp_version member defines the version of the
ctf file. The current version is
2
.
It is possible to encounter an unsupported version. In that case, software
should not try to parse the format, as it may have changed. Finally, the
ctp_flags member describes aspects of the file which modify
its interpretation. The following flags are currently defined:
#define CTF_F_COMPRESS 0x01
The flag
CTF_F_COMPRESS indicates that the body of the
ctf file, all the data following the
header, has been compressed through the
zlib library and its
deflate algorithm. If
this flag is not present, then the body has not been compressed and no special
action is needed to interpret it. All offsets into the data as described by
header, always refer to the
uncompressed
data.
In version two of the
ctf file format, the
header denotes whether whether or not this
ctf file is the child of another
ctf file
and also indicates the size of the remaining sections. The structure for the
header, logically contains a copy of the
preamble and the two have a combined size of 36 bytes.
typedef struct ctf_header {
ctf_preamble_t cth_preamble;
uint_t cth_parlabel; /* ref to name of parent lbl uniq'd against */
uint_t cth_parname; /* ref to basename of parent */
uint_t cth_lbloff; /* offset of label section */
uint_t cth_objtoff; /* offset of object section */
uint_t cth_funcoff; /* offset of function section */
uint_t cth_typeoff; /* offset of type section */
uint_t cth_stroff; /* offset of string section */
uint_t cth_strlen; /* length of string section in bytes */
} ctf_header_t;
After the
preamble, the next two members
cth_parlablel and
cth_parname, are used to
identify the parent. The value of both members are offsets into the
string section which point to the start of a null-terminated
string. For more information on the encoding of strings, see the subsection on
String Identifiers. If the value
of either is zero, then there is no entry for that member. If the member
cth_parlabel is set, then the
ctf_parname
member must be set, otherwise it will not be possible to find the parent. If
ctf_parname is set, it is not necessary to define
cth_parlabel, as the parent may not have a label. For more
information on labels and their interpretation, see
The Label Section.
The remaining members (excepting
cth_strlen) describe the
beginning of the corresponding sections. These offsets are relative to the end
of the
header. Therefore, something with an offset of 0 is
at an offset of thirty-six bytes relative to the start of the
ctf file. The difference between members indicates the size
of the section itself. Different offsets have different alignment
requirements. The start of the
cth_objotoff and
cth_funcoff must be two byte aligned, while the sections
cth_lbloff and
cth_typeoff must be
four-byte aligned. The section
cth_stroff has no alignment
requirements. To calculate the size of a given section, excepting the
string section, one should subtract the offset of the
section from the following one. For example, the size of the
types section can be calculated by subtracting
cth_stroff from
cth_typeoff.
Finally, the member
cth_strlen describes the length of the
string section itself. From it, you can also calculate the size of the entire
ctf file by adding together the size of the
ctf_header_t, the offset of the string section in
cth_stroff, and the size of the string section in
cth_srlen.
Type Identifiers
Through the
ctf data, types are referred to by identifiers. A
given
ctf file supports up to 32767 (0x7fff) types. The
first valid type identifier is 0x1. When a given
ctf file is
a child, indicated by a non-zero entry for the
header's
cth_parname, then the first valid type identifier is 0x8000
and the last is 0xffff. In this case, type identifiers 0x1 through 0x7fff are
references to the parent.
The type identifier zero is a sentinel value used to indicate that there is no
type information available or it is an unknown type.
Throughout the file format, the identifier is stored in different sized values;
however, the minimum size to represent a given identifier is a
uint16_t. Other consumers of
ctf
information may use larger or opaque identifiers.
String Identifiers
String identifiers are always encoded as four byte unsigned integers which are
an offset into a string table. The
ctf format supports two
different string tables which have an identifier of zero or one. This
identifier is stored in the high-order bit of the unsigned four byte offset.
Therefore, the maximum supported offset into one of these tables is
0x7ffffffff.
Table identifier zero, always refers to the
string section in
the CTF file itself. String table identifier one refers to an external string
table which is the ELF string table for the ELF symbol table associated with
the
ctf container.
Type Encoding
Every
ctf type begins with metadata encoded into a
uint16_t. This encoded information tells us three different
pieces of information:
- The kind of the type
- Whether this type is a root
type or not
- The length of the variable
data
The 16 bits that make up the encoding are broken down such that you have five
bits for the kind, one bit for indicating whether or not it is a root type,
and 10 bits for the variable length. This is laid out as follows:
+--------------------+
| kind | root | vlen |
+--------------------+
15 11 10 9 0
The current version of the file format defines 14 different kinds. The
interpretation of these different kinds will be discussed in the section
The Type Section. If a kind is
encountered that is not listed below, then it is not a valid
ctf file. The kinds are defined as follows:
#define CTF_K_UNKNOWN 0
#define CTF_K_INTEGER 1
#define CTF_K_FLOAT 2
#define CTF_K_POINTER 3
#define CTF_K_ARRAY 4
#define CTF_K_FUNCTION 5
#define CTF_K_STRUCT 6
#define CTF_K_UNION 7
#define CTF_K_ENUM 8
#define CTF_K_FORWARD 9
#define CTF_K_TYPEDEF 10
#define CTF_K_VOLATILE 11
#define CTF_K_CONST 12
#define CTF_K_RESTRICT 13
Programs directly reference many types; however, other types are referenced
indirectly because they are part of some other structure. These types that are
referenced directly and used are called
root types. Other
types may be used indirectly, for example, a program may reference a structure
directly, but not one of its members which has a type. That type is not
considered a
root type. If a type is a
root type, then it will have bit 10 set.
The variable length section is specific to each kind and is discussed in the
section
The Type Section.
The following macros are useful for constructing and deconstructing the encoded
type information:
#define CTF_MAX_VLEN 0x3ff
#define CTF_INFO_KIND(info) (((info) & 0xf800) >> 11)
#define CTF_INFO_ISROOT(info) (((info) & 0x0400) >> 10)
#define CTF_INFO_VLEN(info) (((info) & CTF_MAX_VLEN))
#define CTF_TYPE_INFO(kind, isroot, vlen) \
(((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN))
The Label Section
When consuming
ctf data, it is often useful to know whether
two different
ctf containers come from the same source base
and version. For example, when building illumos, there are many kernel modules
that are built against a single collection of source code. A label is encoded
into the
ctf files that corresponds with the particular
build. This ensures that if files on the system were to become mixed up from
multiple releases, that they are not used together by tools, particularly when
a child needs to refer to a type in the parent. Because they are linked used
the type identifiers, if the wrong parent is used then the wrong type will be
encountered.
Each label is encoded in the file format using the following eight byte
structure:
typedef struct ctf_lblent {
uint_t ctl_label; /* ref to name of label */
uint_t ctl_typeidx; /* last type associated with this label */
} ctf_lblent_t;
Each label has two different components, a name and a type identifier. The name
is encoded in the
ctl_label member which is in the format
defined in the section
String
Identifiers. Generally, the names of all labels are found in the internal
string section.
The type identifier encoded in the member
ctl_typeidx refers
to the last type identifier that a label refers to in the current file. Labels
only refer to types in the current file, if the
ctf file is
a child, then it will have the same label as its parent; however, its label
will only refer to its types, not its parents.
It is also possible, though rather uncommon, for a
ctf file to
have multiple labels. Labels are placed one after another, every eight bytes.
When multiple labels are present, types may only belong to a single label.
The Object Section
The object section provides a mapping from ELF symbols of type
STT_OBJECT in the symbol table to a type identifier. Every
entry in this section is a
uint16_t which contains a type
identifier as described in the section
Type Identifiers. If there is no
information for an object, then the type identifier 0x0 is stored for that
entry.
To walk the object section, you need to have a corresponding
symbol table in the ELF object that contains the
ctf data. Not every object is included in this section.
Specifically, when walking the symbol table. An entry is skipped if it matches
any of the following conditions:
- The symbol type is not
STT_OBJECT
- The symbol's section index
is SHN_UNDEF
- The symbol's name offset is
zero
- The symbol's section index
is SHN_ABS and the value of the symbol is zero.
- The symbol's name is
_START_
or _END_
. These
are skipped because they are used for scoping local symbols in ELF.
The following sample code shows an example of iterating the object section and
skipping the correct symbols:
#include <gelf.h>
#include <stdio.h>
/*
* Given the start of the object section in the CTF file, the number of symbols,
* and the ELF Data sections for the symbol table and the string table, this
* prints the type identifiers that correspond to objects. Note, a more robust
* implementation should ensure that they don't walk beyond the end of the CTF
* object section.
*/
static int
walk_symbols(uint16_t *objtoff, Elf_Data *symdata, Elf_Data *strdata,
long nsyms)
{
long i;
uintptr_t strbase = strdata->d_buf;
for (i = 1; i < nsyms; i++, objftoff++) {
const char *name;
GElf_Sym sym;
if (gelf_getsym(symdata, i, &sym) == NULL)
return (1);
if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
continue;
if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0)
continue;
if (sym.st_shndx == SHN_ABS && sym.st_value == 0)
continue;
name = (const char *)(strbase + sym.st_name);
if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0)
continue;
(void) printf("Symbol %d has type %d0, i, *objtoff);
}
return (0);
}
The Function Section
The function section of the
ctf file encodes the types of both
the function's arguments and the function's return type. Similar to
The Object Section, the function
section encodes information for all symbols of type
STT_FUNCTION, excepting those that fit specific criteria.
Unlike with objects, because functions have a variable number of arguments,
they start with a type encoding as defined in
Type Encoding, which is the size of a
uint16_t. For functions which have no type information
available, they are encoded as
CTF_TYPE_INFO(CTF_K_UNKNOWN,
0, 0)
. Functions with arguments are encoded differently. Here, the
variable length is turned into the number of arguments in the function. If a
function is a
varargs type function, then the number of
arguments is increased by one. Functions with type information are encoded as:
CTF_TYPE_INFO(CTF_K_FUNCTION, 0, nargs)
.
For functions that have no type information, nothing else is encoded, and the
next function is encoded. For functions with type information, the next
uint16_t is encoded with the type identifier of the return
type of the function. It is followed by each of the type identifiers of the
arguments, if any exist, in the order that they appear in the function.
Therefore, argument 0 is the first type identifier and so on. When a function
has a final varargs argument, that is encoded with the type identifier of
zero.
Like
The Object Section, the
function section is encoded in the order of the symbol table. It has similar,
but slightly different considerations from objects. While iterating the symbol
table, if any of the following conditions are true, then the entry is skipped
and no corresponding entry is written:
- The symbol type is not
STT_FUNCTION
- The symbol's section index
is SHN_UNDEF
- The symbol's name offset is
zero
- The symbol's name is
_START_
or _END_
. These
are skipped because they are used for scoping local symbols in ELF.
The Type Section
The type section is the heart of the
ctf data. It encodes all
of the information about the types themselves. The base of the type
information comes in two forms, a short form and a long form, each of which
may be followed by a variable number of arguments. The following definitions
describe the short and long forms:
#define CTF_MAX_SIZE 0xfffe /* max size of a type in bytes */
#define CTF_LSIZE_SENT 0xffff /* sentinel for ctt_size */
#define CTF_MAX_LSIZE UINT64_MAX
typedef struct ctf_stype {
uint_t ctt_name; /* reference to name in string table */
ushort_t ctt_info; /* encoded kind, variant length */
union {
ushort_t _size; /* size of entire type in bytes */
ushort_t _type; /* reference to another type */
} _u;
} ctf_stype_t;
typedef struct ctf_type {
uint_t ctt_name; /* reference to name in string table */
ushort_t ctt_info; /* encoded kind, variant length */
union {
ushort_t _size; /* always CTF_LSIZE_SENT */
ushort_t _type; /* do not use */
} _u;
uint_t ctt_lsizehi; /* high 32 bits of type size in bytes */
uint_t ctt_lsizelo; /* low 32 bits of type size in bytes */
} ctf_type_t;
#define ctt_size _u._size /* for fundamental types that have a size */
#define ctt_type _u._type /* for types that reference another type */
Type sizes are stored in
bytes. The basic small form uses a
ushort_t to store the number of bytes. If the number of
bytes in a structure would exceed 0xfffe, then the alternate form, the
ctf_type_t, is used instead. To indicate that the larger
form is being used, the member
ctt_size is set to value of
CTF_LSIZE_SENT (0xffff). In general, when going through the
type section, consumers use the
ctf_type_t structure, but
pay attention to the value of the member
ctt_size to
determine whether they should increment their scan by the size of the
ctf_stype_t or
ctf_type_t. Not all kinds
of types use
ctt_size. Those which do not, will always use
the
ctf_stype_t structure. The individual sections for each
kind have more information.
Types are written out in order. Therefore the first entry encountered has a type
id of 0x1, or 0x8000 if a child. The member
ctt_name is
encoded as described in the section
String Identifiers. The string
that it points to is the name of the type. If the identifier points to an
empty string (one that consists solely of a null terminator) then the type
does not have a name, this is common with anonymous structures and unions that
only have a typedef to name them, as well as, pointers and qualifiers.
The next member, the
ctt_info, is encoded as described in the
section
Type Encoding. The types kind
tells us how to interpret the remaining data in the
ctf_type_t and any variable length data that may exist. The
rest of this section will be broken down into the interpretation of the
various kinds.
Encoding of Integers
Integers, which are of type
CTF_K_INTEGER, have no variable
length arguments. Instead, they are followed by a four byte
uint_t which describes their encoding. All integers must be
encoded with a variable length of zero. The
ctt_size member
describes the length of the integer in bytes. In general, integer sizes will
be rounded up to the closest power of two.
The integer encoding contains three different pieces of information:
- The encoding of the
integer
- The offset in
bits of the type
- The size in
bits of the type
This encoding can be expressed through the following macros:
#define CTF_INT_ENCODING(data) (((data) & 0xff000000) >> 24)
#define CTF_INT_OFFSET(data) (((data) & 0x00ff0000) >> 16)
#define CTF_INT_BITS(data) (((data) & 0x0000ffff))
#define CTF_INT_DATA(encoding, offset, bits) \
(((encoding) << 24) | ((offset) << 16) | (bits))
The following flags are defined for the encoding at this time:
#define CTF_INT_SIGNED 0x01
#define CTF_INT_CHAR 0x02
#define CTF_INT_BOOL 0x04
#define CTF_INT_VARARGS 0x08
By default, an integer is considered to be unsigned, unless it has the
CTF_INT_SIGNED flag set. If the flag
CTF_INT_CHAR is set, that indicates that the integer is of a
type that stores character data, for example the intrinsic C type
char would have the
CTF_INT_CHAR flag set.
If the flag
CTF_INT_BOOL is set, that indicates that the
integer represents a boolean type. For example, the intrinsic C type
_Bool would have the
CTF_INT_BOOL flag
set. Finally, the flag
CTF_INT_VARARGS indicates that the
integer is used as part of a variable number of arguments. This encoding is
rather uncommon.
Encoding of Floats
Floats, which are of type
CTF_K_FLOAT, are similar to their
integer counterparts. They have no variable length arguments and are followed
by a four byte encoding which describes the kind of float that exists. The
ctt_size member is the size, in bytes, of the float. The
float encoding has three different pieces of information inside of it:
- The specific kind of float
that exists
- The offset in
bits of the float
- The size in
bits of the float
This encoding can be expressed through the following macros:
#define CTF_FP_ENCODING(data) (((data) & 0xff000000) >> 24)
#define CTF_FP_OFFSET(data) (((data) & 0x00ff0000) >> 16)
#define CTF_FP_BITS(data) (((data) & 0x0000ffff))
#define CTF_FP_DATA(encoding, offset, bits) \
(((encoding) << 24) | ((offset) << 16) | (bits))
Where as the encoding for integers was a series of flags, the encoding for
floats maps to a specific kind of float. It is not a flag-based value. The
kinds of floats correspond to both their size, and the encoding. This covers
all of the basic C intrinsic floating point types. The following are the
different kinds of floats represented in the encoding:
#define CTF_FP_SINGLE 1 /* IEEE 32-bit float encoding */
#define CTF_FP_DOUBLE 2 /* IEEE 64-bit float encoding */
#define CTF_FP_CPLX 3 /* Complex encoding */
#define CTF_FP_DCPLX 4 /* Double complex encoding */
#define CTF_FP_LDCPLX 5 /* Long double complex encoding */
#define CTF_FP_LDOUBLE 6 /* Long double encoding */
#define CTF_FP_INTRVL 7 /* Interval (2x32-bit) encoding */
#define CTF_FP_DINTRVL 8 /* Double interval (2x64-bit) encoding */
#define CTF_FP_LDINTRVL 9 /* Long double interval (2x128-bit) encoding */
#define CTF_FP_IMAGRY 10 /* Imaginary (32-bit) encoding */
#define CTF_FP_DIMAGRY 11 /* Long imaginary (64-bit) encoding */
#define CTF_FP_LDIMAGRY 12 /* Long double imaginary (128-bit) encoding */
Encoding of Arrays
Arrays, which are of type
CTF_K_ARRAY, have no variable length
arguments. They are followed by a structure which describes the number of
elements in the array, the type identifier of the elements in the array, and
the type identifier of the index of the array. With arrays, the
ctt_size member is set to zero. The structure that follows
an array is defined as:
typedef struct ctf_array {
ushort_t cta_contents; /* reference to type of array contents */
ushort_t cta_index; /* reference to type of array index */
uint_t cta_nelems; /* number of elements */
} ctf_array_t;
The
cta_contents and
cta_index members of
the
ctf_array_t are type identifiers which are encoded as
per the section
Type Identifiers.
The member
cta_nelems is a simple four byte unsigned count
of the number of elements. This count may be zero when encountering C99's
flexible array members.
Encoding of Functions
Function types, which are of type
CTF_K_FUNCTION, use the
variable length list to be the number of arguments in the function. When the
function has a final member which is a varargs, then the argument count is
incremented by one to account for the variable argument. Here, the
ctt_type member is encoded with the type identifier of the
return type of the function. Note that the
ctt_size member
is not used here.
The variable argument list contains the type identifiers for the arguments of
the function, if any. Each one is represented by a
uint16_t
and encoded according to the
Type
Identifiers section. If the function's last argument is of type varargs,
then it is also written out, but the type identifier is zero. This is included
in the count of the function's arguments.
Encoding of Structures and
Unions
Structures and Unions, which are encoded with
CTF_K_STRUCT and
CTF_K_UNION respectively, are very similar constructs in C.
The main difference between them is the fact that every member of a structure
follows one another, where as in a union, all members share the same memory.
They are also very similar in terms of their encoding in
ctf. The variable length argument for structures and unions
represents the number of members that they have. The value of the member
ctt_size is the size of the structure and union. There are
two different structures which are used to encode members in the variable
list. When the size of a structure or union is greater than or equal to the
large member threshold, 8192, then a different structure is used to encode the
member, all members are encoded using the same structure. The structure for
members is as follows:
typedef struct ctf_member {
uint_t ctm_name; /* reference to name in string table */
ushort_t ctm_type; /* reference to type of member */
ushort_t ctm_offset; /* offset of this member in bits */
} ctf_member_t;
typedef struct ctf_lmember {
uint_t ctlm_name; /* reference to name in string table */
ushort_t ctlm_type; /* reference to type of member */
ushort_t ctlm_pad; /* padding */
uint_t ctlm_offsethi; /* high 32 bits of member offset in bits */
uint_t ctlm_offsetlo; /* low 32 bits of member offset in bits */
} ctf_lmember_t;
Both the
ctm_name and
ctlm_name refer to the
name of the member. The name is encoded as an offset into the string table as
described by the section
String
Identifiers. The members
ctm_type and
ctlm_type both refer to the type of the member. They are
encoded as per the section
Type
Identifiers.
The last piece of information that is present is the offset which describes the
offset in memory that the member begins at. For unions, this value will always
be zero because the start of unions in memory is always zero. For structures,
this is the offset in
bits that the member begins at. Note
that a compiler may lay out a type with padding. This means that the
difference in offset between two consecutive members may be larger than the
size of the member. When the size of the overall structure is strictly less
than 8192 bytes, the normal structure,
ctf_member_t, is used
and the offset in bits is stored in the member
ctm_offset.
However, when the size of the structure is greater than or equal to 8192
bytes, then the number of bits is split into two 32-bit quantities. One
member,
ctlm_offsethi, represents the upper 32 bits of the
offset, while the other member,
ctlm_offsetlo, represents
the lower 32 bits of the offset. These can be joined together to get a 64-bit
sized offset in bits by shifting the member
ctlm_offsethi to
the left by thirty two and then doing a binary or of
ctlm_offsetlo.
Encoding of Enumerations
Enumerations, noted by the type
CTF_K_ENUM, are similar to
structures. Enumerations use the variable list to note the number of values
that the enumeration contains, which we'll term enumerators. In C, an
enumeration is always equivalent to the intrinsic type
int,
thus the value of the member
ctt_size is always the size of
an integer which is determined based on the current model. For illumos
systems, this will always be 4, as an integer is always defined to be 4 bytes
large in both
ILP32 and
LP64, regardless
of the architecture.
The enumerators encoded in an enumeration have the following structure in the
variable list:
typedef struct ctf_enum {
uint_t cte_name; /* reference to name in string table */
int cte_value; /* value associated with this name */
} ctf_enum_t;
The member
cte_name refers to the name of the enumerator's
value, it is encoded according to the rules in the section
String Identifiers. The member
cte_value contains the integer value of this enumerator.
Encoding of Forward
References
Forward references, types of kind
CTF_K_FORWARD, in a
ctf file refer to types which may not have a definition at
all, only a name. If the
ctf file is a child, then it may be
that the forward is resolved to an actual type in the parent, otherwise the
definition may be in another
ctf container or may not be
known at all. The only member of the
ctf_type_t that matters
for a forward declaration is the
ctt_name which points to
the name of the forward reference in the string table as described earlier.
There is no other information recorded for forward references.
Encoding
of Pointers, Typedefs, Volatile, Const, and Restrict
Pointers, typedefs, volatile, const, and restrict are all similar in
ctf. They all refer to another type. In the case of
typedefs, they provide an alternate name, while volatile, const, and restrict
change how the type is interpreted in the C programming language. This covers
the
ctf kinds
CTF_K_POINTER,
CTF_K_TYPEDEF,
CTF_K_VOLATILE,
CTF_K_RESTRICT, and
CTF_K_CONST.
These types have no variable list entries and use the member
ctt_type to refer to the base type that they modify.
Encoding of Unknown Types
Types with the kind
CTF_K_UNKNOWN are used to indicate gaps in
the type identifier space. These entries consume an identifier, but do not
define anything. Nothing should refer to these gap identifiers.
Dependencies Between Types
C types can be imagined as a directed, cyclic, graph. Structures and unions may
refer to each other in a way that creates a cyclic dependency. In cases such
as these, the entire type section must be read in and processed. Consumers
must not assume that every type can be laid out in dependency order; they
cannot.
The String Section
The last section of the
ctf file is the
string section. This section encodes all of the strings that
appear throughout the other sections. It is laid out as a series of characters
followed by a null terminator. Generally, all names are written out in ASCII,
as most C compilers do not allow and characters to appear in identifiers
outside of a subset of ASCII. However, any extended characters sets should be
written out as a series of UTF-8 bytes.
The first entry in the section, at offset zero, is a single null terminator to
reference the empty string. Following that, each C string should be written
out, including the null terminator. Offsets that refer to something in this
section should refer to the first byte which begins a string. Beyond the first
byte in the section being the null terminator, the order of strings is
unimportant.
Data Encoding and ELF
Considerations
ctf data is generally included in ELF objects which specify
information to identify the architecture and endianness of the file. A
ctf container inside such an object must match the
endianness of the ELF object. Aside from the question of the endian encoding
of data, there should be no other differences between architectures. While
many of the types in this document refer to non-fixed size C integral types,
they are equivalent in the models
ILP32 and
LP64. If any other model is being used with
ctf data that has different sizes, then it must not use the
model's sizes for those integral types and instead use the fixed size
equivalents based on an
ILP32 environment.
When placing a
ctf container inside of an ELF object, there
are certain conventions that are expected for the purposes of tooling being
able to find the
ctf data. In particular, a given ELF object
should only contain a single
ctf section. Multiple
containers should be merged together into a single one.
The
ctf file should be included in its own ELF section. The
section's name must be ‘
.SUNW_ctf
’. The
type of the section must be
SHT_PROGBITS. The section should
have a link set to the symbol table and its address alignment must be 4.
SEE ALSO
dtrace(1),
elf(3),
gelf(3),
a.out(5),
elf(5)