Traditionally, LLVM IR pointer types have contained a pointee type. For example, i32 * is a pointer that points to an i32 somewhere in memory. However, due to a lack of pointee type semantics and various issues with having pointee types, there is a desire to remove pointee types from pointers.
The opaque pointer type project aims to replace all pointer types containing pointee types in LLVM with an opaque pointer type. The new pointer type is tentatively represented textually as ptr.
Address spaces are still used to distinguish between different kinds of pointers where the distinction is relevant for lowering (e.g. data vs function pointers have different sizes on some architectures). Opaque pointers are not changing anything related to address spaces and lowering. For more information, see DataLayout.
LLVM IR pointers can be cast back and forth between pointers with different pointee types. The pointee type does not necessarily actually represent the actual underlying type in memory. In other words, the pointee type contains no real semantics.
Lots of operations do not actually care about the underlying type. These operations, typically intrinsics, usually end up taking an i8 *. This causes lots of redundant no-op bitcasts in the IR to and from a pointer with a different pointee type. The extra bitcasts take up space and require extra work to look through in optimizations. And more bitcasts increases the chances of incorrect bitcasts, especially in regards to address spaces.
Some instructions still need to know what type to treat the memory pointed to by the pointer as. For example, a load needs to know how many bytes to load from memory. In these cases, instructions themselves contain a type argument. For example the load instruction from older versions of LLVM
load i64* %p
becomes
load i64, ptr %p
A nice analogous transition that happened earlier in LLVM is integer signedness. There is no distinction between signed and unsigned integer types, rather the integer operations themselves contain what to treat the integer as. Initially, LLVM IR distinguished between unsigned and signed integer types. The transition from manifesting signedness in types to instructions happened early on in LLVM’s life to the betterment of LLVM IR.
The frontend should already know what type each operation operates on based on the input source code. However, some frontends like Clang may end up relying on LLVM pointer pointee types to keep track of pointee types. The frontend needs to keep track of frontend pointee types on its own.
For optimizations around frontend types, pointee types are not useful due their lack of semantics. Rather, since LLVM IR works on untyped memory, for a frontend to tell LLVM about frontend types for the purposes of alias analysis, extra metadata is added to the IR. For more information, see TBAA.
Some specific operations still need to know what type a pointer types to. For the most part, this is codegen and ABI specific. For example, byval arguments are pointers, but backends need to know the underlying type of the argument to properly lower it. In cases like these, the attributes contain a type argument. For example,
call void @f(ptr byval(i32) %p)
signifies that %p as an argument should be lowered as an i32 passed indirectly.
If you have use cases that this sort of fix doesn’t cover, please email llvm-dev.
LLVM currently has many places that depend on pointee types. Each dependency on pointee types needs to be resolved in some way or another. This essentially translates to figuring out how to remove all calls to PointerType::getElementType and Type::getPointerElementType().
Making everything use opaque pointers in one huge commit is infeasible. This needs to be done incrementally. The following steps need to be done, in no particular order:
If you have your own frontend, there are a couple of things to do after opaque pointer types fully work.