LINFO

Magic Number Definition



A magic number is a number embedded at or near the beginning of a file that indicates its file format (i.e., the type of file it is). It is also sometimes referred to as a file signature.

Magic numbers are generally not visible to users. However, they can easily be seen with the use of a hex editor, which is a specialized program that shows and allows modification of every byte in a file.
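As a rough illustration of what a hex editor displays, the following Python sketch formats raw bytes as an offset, the hexadecimal value of each byte and the printable ASCII characters. The function name hex_dump and the sample bytes are assumptions made for this example; they do not come from any particular hex editor.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex values, printable ASCII."""
    pad = width * 3 - 1  # room for `width` two-digit values plus separators
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII characters as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(rows)


# Hypothetical first eight bytes of a GIF file: the signature plus two more bytes.
print(hex_dump(b"GIF89a\x10\x00"))
```

To inspect a real file, the bytes could be obtained with open(path, "rb").read(32) and passed to the same function.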

For common file formats, the numbers conveniently represent the names of the file types. Thus, for example, the magic number for image files conforming to the widely used GIF87a format in hexadecimal (i.e., base 16) terms is 0x474946383761, which when converted into ASCII is GIF87a. ASCII is the de facto standard used by computers and communications equipment for character encoding (i.e., associating alphabetic and other characters with numbers).

Likewise, the magic number for image files having the subsequently introduced GIF89a format is 0x474946383961. For both types of GIF (Graphics Interchange Format) files, the magic number occupies the first six bytes of the file. It is then followed by additional general information (i.e., metadata) about the file.
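The correspondence between the hexadecimal values and the ASCII names can be checked with a few lines of Python. This is only a minimal sketch; the helper names magic_to_ascii, ascii_to_magic and is_gif are made up for illustration.

```python
def magic_to_ascii(hex_digits: str) -> str:
    """Convert a hexadecimal magic number (without the 0x prefix) to ASCII text."""
    return bytes.fromhex(hex_digits).decode("ascii")


def ascii_to_magic(text: str) -> str:
    """Convert ASCII text to its hexadecimal representation, with a 0x prefix."""
    return "0x" + text.encode("ascii").hex().upper()


def is_gif(header: bytes) -> bool:
    """Check whether the first six bytes are one of the two GIF magic numbers."""
    return header[:6] in (b"GIF87a", b"GIF89a")


print(magic_to_ascii("474946383761"))
print(ascii_to_magic("GIF89a"))
```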

Similarly, a commonly used magic number for JPEG (Joint Photographic Experts Group) image files is 0x4A464946, which is the ASCII equivalent of JFIF (JPEG File Interchange Format). However, this magic number does not occupy the first bytes of the file; rather, it begins with the seventh byte. Additional examples include 0x4D546864 (MThd in ASCII) for MIDI (Musical Instrument Digital Interface) files and 0x425A68 (BZh in ASCII) for bzip2 compressed files.
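Because a magic number does not always start at the first byte, a program that checks for one needs to know both the value and the position at which to look. The following Python sketch shows one possible way to do this for the GIF, JFIF and MIDI values mentioned above. The table layout, the function name identify and the descriptive labels are assumptions made for this example, and the JFIF position is expressed as a zero-based offset of 6 (i.e., the seventh byte).

```python
# Each entry: (zero-based offset, magic number bytes, description).
SIGNATURES = [
    (0, bytes.fromhex("474946383761"), "GIF87a image"),
    (0, bytes.fromhex("474946383961"), "GIF89a image"),
    (6, bytes.fromhex("4A464946"), "JPEG image (JFIF)"),
    (0, bytes.fromhex("4D546864"), "MIDI file"),
]


def identify(header: bytes):
    """Return a description for the first matching signature, or None."""
    for offset, magic, description in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return description
    return None


# Hypothetical opening bytes of a JFIF file: six bytes, then the ASCII text JFIF.
print(identify(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"))
```

For a file on disk, the header could be read with open(path, "rb").read(16), which is long enough for every entry in this small table.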

Magic numbers are not always the ASCII equivalent of the name of the file format, or even something similar. For example, in some types of files they represent the name or initials of the developer of that file format. Also, in at least one type of file the magic number represents the birthday of that format's developer.

Various programs make use of magic numbers to determine the file type. Among them is the command line (i.e., all-text mode) program named file, whose sole purpose is determining the file type.

Although they can be useful, magic numbers are not always sufficient to determine the file type. The main reason is that some file types do not have magic numbers, most notably plain text files, which include HTML (HyperText Markup Language), XHTML (Extensible HTML) and XML (Extensible Markup Language) files as well as source code.

Fortunately, there are also other means that can be used by programs to determine file types. One is by looking at a file's character set (e.g., ASCII) to see if it is a plain text file. If it is determined that a file is a plain text file, then it is often possible to further categorize it on the basis of the start of the text, such as <html> for HTML files and #! (the so-called shebang) for script (i.e., short program) files.
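The two-step approach described above can be sketched in Python as follows. This is a simplified heuristic rather than a description of how any particular program works: the function name classify_text is hypothetical, treating a NUL byte as a sign of binary data is an added assumption, and the HTML test is loosened to match any text starting with <html so that tags with attributes are also recognized.

```python
def classify_text(data: bytes):
    """Return 'script', 'HTML' or 'plain text', or None if data is not ASCII text."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return None  # contains bytes outside the ASCII range
    if "\x00" in text:
        return None  # assumption: NUL bytes indicate binary data
    if text.startswith("#!"):
        return "script"  # the shebang must be the very first two characters
    if text.lstrip().lower().startswith("<html"):
        return "HTML"
    return "plain text"


print(classify_text(b"#!/bin/sh\necho hello\n"))
print(classify_text(b"<html><body>Hello</body></html>"))
```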

Another way to determine file types is through the use of filename extensions (e.g., .exe, .html and .jpg), which are required on the various Microsoft operating systems but only to a small extent on Linux and other Unix-like operating systems. However, this approach has the disadvantage that it is relatively easy for a user to accidentally change or remove the extensions, in which case it becomes difficult to determine the file type and use the file.
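A lookup based on filename extensions is simple to write, which also makes its weakness easy to see: once the extension is missing, nothing is left to go on. The sketch below uses Python's standard pathlib module; the mapping EXTENSION_TYPES and the function name type_from_extension are assumptions made for this example.

```python
from pathlib import PurePath

# A small, hypothetical mapping from extensions to file types.
EXTENSION_TYPES = {
    ".exe": "executable program",
    ".html": "HTML document",
    ".jpg": "JPEG image",
}


def type_from_extension(filename: str):
    """Guess the file type from the extension alone, or return None."""
    suffix = PurePath(filename).suffix.lower()
    return EXTENSION_TYPES.get(suffix)


print(type_from_extension("holiday.JPG"))
print(type_from_extension("holiday"))  # extension removed: the type is unknown
```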

Still another way, available on some commonly used filesystems, is to use file type information that is embedded in each file's metadata. In Unix-like operating systems, such metadata is contained in inodes, which are data structures (i.e., efficient ways of storing information) that store all the information about files except their names and their actual data.
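On Unix-like systems, the metadata held in an inode can be read from Python with os.lstat. Note that the type recorded there distinguishes kinds of filesystem objects (e.g., regular files, directories and symbolic links) rather than file formats such as GIF or MIDI. The following is a minimal sketch, and the function name inode_file_type is hypothetical.

```python
import os
import stat


def inode_file_type(path: str) -> str:
    """Describe the kind of filesystem object, based on its metadata."""
    mode = os.lstat(path).st_mode  # lstat does not follow symbolic links
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    return "other"


print(inode_file_type("."))
```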

Magic numbers are referred to as magic because the purpose and significance of their values are not apparent without some additional knowledge. The term magic number is also used in programming to refer to a constant that is employed for some specific purpose but whose presence or value is inexplicable without additional information.
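The programming sense of the term can be illustrated with a short Python example. In the first function the constant 86400 appears without explanation; in the second, the same value is given a name that documents its purpose. Both function names are made up for this illustration.

```python
def seconds_to_days_unclear(seconds: float) -> float:
    # 86400 is a magic number here: its meaning is not apparent from the code.
    return seconds / 86400


# Naming the constant removes the need for additional knowledge.
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_to_days(seconds: float) -> float:
    return seconds / SECONDS_PER_DAY


print(seconds_to_days(172800))
```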






Created August 21, 2006.
Copyright © 2006 The Linux Information Project. All Rights Reserved.