One of the design goals of the PDB format is to provide accelerated access to debug information. For this reason, there are several places where hash tables are serialized and embedded directly in the file, rather than requiring a consumer to read a list of values and reconstruct the hash table on the fly.
The serialization format supports hash tables of arbitrarily large size and capacity, with arbitrary value types and hash functions. The only supported key type is a uint32. The only requirement is that the producer and consumer agree on the hash function, since the serialized table does not record which one was used. The hash function is therefore not discussed further in this document; it is assumed that for a particular instance of a PDB file hash table, the appropriate hash function is being used.
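.--------------------.-- +0
|        Size        |
.--------------------.-- +4
|      Capacity      |
.--------------------.-- +8
| Present Bit Vector |
.--------------------.-- +N
| Deleted Bit Vector |
.--------------------.-- +M                 ─╮
|        Key         |                       │
.--------------------.-- +M+4                │
|       Value        |                       │
.--------------------.-- +M+4+sizeof(Value)  │
         ...                                 ├─ |Size| bucket entries
.--------------------.                       │
|        Key         |                       │
.--------------------.                       │
|       Value        |                       │
.--------------------.                      ─╯
Size and Capacity are uint32 values. Each bit vector occupies 4 bytes for its word count plus 4 bytes per word, so the offsets N and M depend on the lengths of the two vectors. Size is the number of occupied buckets: a Key/Value entry is stored only for each bucket whose bit is set in the Present bit vector, in increasing bucket order, so the stream holds exactly |Size| entries.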
The bit vectors indicating the status of each bucket are serialized as follows:
.--------------------.-- +0
|     Word Count     |
.--------------------.-- +4
|       Word_0       |          ─╮
.--------------------.-- +8      │
|       Word_1       |           │
.--------------------.-- +12     ├─ |Word Count| values
         ...                     │
.--------------------.           │
|       Word_N       |           │
.--------------------.          ─╯
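A round-trip sketch of this encoding in Python follows, assuming little-endian uint32 words and the bit-to-bucket mapping given below (bucket k is bit k % 32 of word k // 32). `encode_bit_vector` and `decode_bit_vector` are hypothetical names. The encoder emits the smallest Word Count that covers the highest set bit; that is a choice made for this sketch, and a reader should accept whatever Word Count the stream declares.

```python
import struct


def encode_bit_vector(bits) -> bytes:
    """Serialize bucket indices as a uint32 Word Count followed by that many uint32 words."""
    bits = set(bits)
    # Sketch's choice: the smallest Word Count covering the highest set bit (0 if empty).
    word_count = max(bits) // 32 + 1 if bits else 0
    words = [0] * word_count
    for k in bits:
        words[k // 32] |= 1 << (k % 32)
    return struct.pack(f"<I{word_count}I", word_count, *words)


def decode_bit_vector(buf: bytes, off: int = 0):
    """Parse a serialized bit vector at buf[off]; return (set_bit_indices, new_offset)."""
    (word_count,) = struct.unpack_from("<I", buf, off)
    words = struct.unpack_from(f"<{word_count}I", buf, off + 4)
    bits = {i * 32 + s for i, w in enumerate(words) for s in range(32) if (w >> s) & 1}
    return bits, off + 4 + 4 * word_count
```

Encoding {1, 5, 33} produces a Word Count of 2 followed by the words 0x22 and 0x2, 12 bytes in all; an empty set encodes as a lone Word Count of 0.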
The words, when viewed as a contiguous block of bytes, represent a bit vector with the following layout:
.------------.       .------------.------------.
|   Word_N   |  ...  |   Word_1   |   Word_0   |
.------------.       .------------.------------.
|            |       |            |            |
+(N+1)*32    +N*32   +64          +32          +0
where the k’th bit of this bit vector represents the status of the k’th bucket in the hash table. Concretely, bucket k corresponds to bit (k mod 32) of Word_(k / 32), using integer division.
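Testing a single bucket therefore reduces to a divide and a shift. The hypothetical `bucket_status` helper below takes the already-decoded Present and Deleted word lists and assumes the usual open-addressing meaning of the two vectors: a Present bit marks a live entry, and a Deleted bit marks a tombstone left behind by a removal. It also assumes that Word Count may stop short of covering the whole Capacity, so a bit past the last serialized word is treated as clear.

```python
def bit_is_set(words, k: int) -> bool:
    """Bit k is bit k % 32 of words[k // 32]; bits past the last word read as clear."""
    index, shift = divmod(k, 32)
    return index < len(words) and ((words[index] >> shift) & 1) == 1


def bucket_status(present_words, deleted_words, k: int) -> str:
    """Classify bucket k as 'present', 'deleted' or 'empty' (assumed semantics)."""
    # In a well-formed table the two vectors do not overlap; Present is checked first.
    if bit_is_set(present_words, k):
        return "present"
    if bit_is_set(deleted_words, k):
        return "deleted"
    return "empty"
```

With a Present word of 0x22 and a Deleted word of 0x8, buckets 1 and 5 are present, bucket 3 is deleted, and every other bucket (including bucket 40, beyond the single stored word) is empty.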