I’m trying to write a binary template that parses Microsoft’s PDB debug files. However, the format is paged, meaning a data stream of structures can be split across non-contiguous, fixed-size pages. I can coalesce the pages into a contiguous local uchar array, but I can’t then parse structures out of that array.
I know you can FSeek inside a structure definition, but in this case I don’t know in advance where exactly a structure will be split. Is there a way to get around this limitation?
Unfortunately, there is no easy way to handle this type of data in Binary Templates right now. We are planning to add some extensions in the future to handle these types of files, but we are still working out how they will operate. There is a template for Microsoft DOC files in the repository that works by assuming the file has first been defragmented (all the blocks put in order), though I’m not sure whether PDB could use a similar technique. Another option would be to use a short script to write all the blocks in the PDB file to a temporary file in order, and then run a template on the temporary file. This is of course not ideal, and we hope to have something to handle these types of files before too long.
It’s good to know that I wasn’t missing anything obvious and I look forward to this new functionality. Any time I write an encoder or decoder for a file format I like to create a corresponding template because 010’s interface is much nicer to look at than text dumps, and this is the first time I couldn’t figure out how to accomplish that. As for the defragmented temporary file, I think that’s my best option, but I’ll do that outside the editor as a pre-processing step.
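For anyone who ends up doing the same thing, here is a rough sketch of what that pre-processing step could look like in Python. It is not PDB-specific: the names `coalesce_pages` and `write_defragmented` are made up for illustration, and it assumes you have already worked out the page size, the ordered list of page indices for a stream, and the stream's length from the file's own page directory (the sketch does not parse that part).

```python
import tempfile
from typing import Sequence


def coalesce_pages(data: bytes, page_size: int,
                   page_indices: Sequence[int], stream_length: int) -> bytes:
    """Join the listed fixed-size pages, in order, into one contiguous stream.

    Hypothetical helper: page_size, page_indices and stream_length are
    assumed to have been read from the file's page directory beforehand.
    """
    out = bytearray()
    for index in page_indices:
        start = index * page_size
        end = start + page_size
        if index < 0 or end > len(data):
            raise ValueError(f"page {index} lies outside the file")
        out += data[start:end]
    if stream_length > len(out):
        raise ValueError("stream is longer than the pages provided")
    # The last page is usually only partly used, so trim to the real length.
    return bytes(out[:stream_length])


def write_defragmented(stream: bytes) -> str:
    """Write the contiguous stream to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
        tmp.write(stream)
        return tmp.name
```

The file returned by `write_defragmented` can then be opened in the editor and a template run on it, since every structure in it is now contiguous.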