


The basic premise of the digital signatures used by DROID is to find unique and constant patterns of bytes within a file. Ideally, these bytes also provide extra information that helps us obtain more granular identifications. The signatures are downloaded from PRONOM and stored in a 'signature file', which the tool references.

The constant patterns we look for are linked together by regular-expression syntax. This lets us model the unpredictability of the bytes that lie between them. Suppose we want to connect two patterns: one is a sequence at the beginning of the file that tells us its primary type, and the other is a snippet of XML further into the file stream that tells us something about the version. We would join the two sequences with a wildcard (*) character.

Theoretically, this syntax allows us to create unique and robust signatures. It is important that we can distinguish a discrete format from all others without our signatures colliding with, or incorrectly identifying, another format. In practice this is difficult when we first create a new signature. However, the more signatures we create, and the more testing we can do on larger sets of files, the better we can refine this work.

Signatures that work particularly well are those that relate directly to the programming structures output by the creating software. A struct, for example, is a composite type that aggregates members of different data types, of fixed and variable sizes (in bytes).

Version two of the GRIdded Binary (GRIB) format, from the World Meteorological Organisation, is an example of such a structure appearing in a file format. Its structure is described as follows:

Bytes 1-4: "GRIB" (ASCII)
Bytes 5-6: Reserved
Byte 7: Discipline
Byte 8: GRIB Edition (version) number (currently 2)
Bytes 9-16: Total length of GRIB message in octets

Structures like this translate trivially to the syntax used by DROID. The constant bytes (here "GRIB" and the edition number) become fixed sequences, and the variable bytes between them become gaps.
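To make the wildcard idea concrete, here is a minimal Python sketch that turns a simplified hex signature into a bytes regular expression. It is not DROID's matching engine. It assumes a hypothetical subset of the syntax in which pairs of hex digits are literal bytes and * stands for any number of intervening bytes. The names `compile_signature` and `matches` are illustrative only. As in a beginning-of-file signature, the first sequence is anchored at the start of the file.

```python
import re


def compile_signature(sig: str) -> re.Pattern:
    """Translate a simplified hex signature into a compiled bytes regex.

    Hypothetical subset only: hex digit pairs are literal bytes and '*'
    matches any number of bytes (including none). Spaces are ignored.
    """
    parts = []
    for chunk in sig.replace(" ", "").split("*"):
        literal = bytes.fromhex(chunk)      # constant byte sequence
        parts.append(re.escape(literal))    # match it literally
    # A lazy '.*?' models the wildcard; DOTALL lets it span any byte value.
    return re.compile(b".*?".join(parts), re.DOTALL)


def matches(sig: str, data: bytes) -> bool:
    # re.match anchors the first sequence at the beginning of the file.
    return compile_signature(sig).match(data) is not None


if __name__ == "__main__":
    # Hypothetical format: a ZIP-style "PK\x03\x04" header, then an XML version element.
    sig = "504B0304*" + b"<version>2</version>".hex()
    sample = b"PK\x03\x04" + bytes(20) + b"<doc><version>2</version></doc>"
    print(matches(sig, sample))  # True
```

PRONOM's actual signature syntax is richer than this sketch. For instance, it can also express gaps of bounded length, which lets a signature be far more specific than an open-ended wildcard.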
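The GRIB Edition 2 layout above maps directly onto a fixed-size record. As a sketch, Python's standard `struct` module can read those first 16 octets in one call. The format string `>4s2sBBQ` (big-endian, no padding) is our own translation of the table. `parse_grib2_indicator` and `Grib2Indicator` are hypothetical helpers, not part of any GRIB library. Only octets 1-4 and 8 are constant, and those are exactly the fields a signature would test. The discipline and message length vary from file to file.

```python
import struct
from typing import NamedTuple, Optional

# GRIB Edition 2 header, following the table above (16 octets):
#   4s -> octets 1-4   "GRIB" in ASCII (constant)
#   2s -> octets 5-6   reserved
#   B  -> octet  7     discipline
#   B  -> octet  8     edition number (constant: 2)
#   Q  -> octets 9-16  total length of the GRIB message in octets
# '>' selects big-endian byte order with no alignment padding.
GRIB2_SECTION0 = struct.Struct(">4s2sBBQ")


class Grib2Indicator(NamedTuple):
    discipline: int
    edition: int
    total_length: int


def parse_grib2_indicator(data: bytes) -> Optional[Grib2Indicator]:
    """Return the decoded header fields, or None if data is not GRIB2."""
    if len(data) < GRIB2_SECTION0.size:  # the header is 16 bytes long
        return None
    magic, _reserved, discipline, edition, total_length = GRIB2_SECTION0.unpack_from(data)
    # Only the constant fields decide identification; the rest is extra detail.
    if magic != b"GRIB" or edition != 2:
        return None
    return Grib2Indicator(discipline, edition, total_length)


if __name__ == "__main__":
    header = b"GRIB" + b"\x00\x00" + bytes([0, 2]) + (1234).to_bytes(8, "big")
    print(parse_grib2_indicator(header))
    # Grib2Indicator(discipline=0, edition=2, total_length=1234)
```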
When we talk about digital signatures in the context of our work in Digital Preservation, we are referring to magic numbers. A magic number is a unique pattern of bytes that we read from a digital file to help us identify its file format and version. The 32-bit hexadecimal numbers 0xCAFEBABE and 0xCAFED00D, stored in PRONOM and taken verbatim, would allow us to identify both file types above, respectively.

Figure: 0xCAFED00D and 0xCAFEBABE hexadecimal bytes
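Reading a magic number is the simplest possible signature check. The Python sketch below reads the first four bytes of a file as a big-endian unsigned 32-bit integer, so the number is compared byte for byte as it appears in the file. It then looks that number up in a small table. The table and the `identify_magic` name are illustrative assumptions rather than PRONOM data. 0xCAFEBABE is the Java class-file magic number, and 0xCAFED00D is, to our knowledge, the one used by Pack200-compressed Java archives.

```python
import struct
from typing import Optional

# Illustrative table of 32-bit magic numbers (not taken from PRONOM).
MAGIC_NUMBERS = {
    0xCAFEBABE: "Java class file",
    0xCAFED00D: "Pack200-compressed Java archive",
}


def identify_magic(data: bytes) -> Optional[str]:
    """Return a label if the first four bytes match a known magic number."""
    if len(data) < 4:
        return None
    # '>I' interprets the bytes in file order as a big-endian unsigned
    # 32-bit integer, so 0xCAFEBABE means the bytes CA FE BA BE.
    (magic,) = struct.unpack(">I", data[:4])
    return MAGIC_NUMBERS.get(magic)


if __name__ == "__main__":
    print(identify_magic(bytes.fromhex("CAFEBABE00000034")))  # Java class file
    print(identify_magic(b"GRIB"))  # None
```

A bare four-byte match is also where collisions begin. 0xCAFEBABE also opens Mach-O universal (fat) binaries, so a robust signature would need further constant bytes to tell the two formats apart.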
