Why is hashing important in forensics?

“A ‘hash value’ is an electronic fingerprint. The data within a file is represented through the cryptographic algorithm as that hash value.” Digital forensics professionals use hashing algorithms to generate hash values of the original files they use in an investigation. Recomputing and comparing these values later shows whether the information was altered during the investigation, which matters because the various tools and techniques involved in data analysis and evidence collection can affect the data’s integrity. Hash values are also important because electronic documents are shared with legal professionals and other parties during an investigation, and everyone must be working from identical copies of the files. Finally, hash values are used to identify and filter duplicate files (such as emails, attachments, and loose files) from an ESI (electronically stored information) collection, and to verify that a forensic image or clone was captured successfully.
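As a minimal sketch of the two uses described above (fingerprinting a file and filtering duplicates), the Python below hashes files in chunks and groups those that share a digest. The function names `sha256_of_file` and `group_duplicates` are hypothetical, and SHA-256 is an assumed choice of algorithm; a real investigation would rely on validated forensic tooling rather than an ad hoc script.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large evidence files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Map each digest to the files sharing it, keeping only digests seen more than once."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[sha256_of_file(path)].append(Path(path))
    return {d: files for d, files in by_digest.items() if len(files) > 1}
```

Files with identical contents land under the same digest regardless of their names or locations, which is why hash-based filtering works on loose files and attachments alike.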

The following rules regarding hash values are extracted from the Digital Evidence Investigation Manual of the Central Board of Direct Taxes, Department of Revenue, Ministry of Finance, Government of India:

  • Accessing a system or hard disk in any way without the use of “write-protect” devices causes change in the hash value or digital fingerprint of the disk. This can render the evidence on such disks inadmissible.
  • Mathematical hashing is equivalent to one-way encryption. Every piece of digital evidence, at the lowest level, translates into a very large number. When the digital device or data is processed with a hashing algorithm, the result is a new number of fixed length called the message digest. The hashing algorithm has some unique characteristics, which are as follows:
  1. Message digest always of a fixed length: The digital evidence may be of any size, but on application of the hash algorithm the resultant message digest would always be of a fixed length.
  2. Message digest is a seemingly random number: The message digest looks like a randomly generated number. However, if the contents of the digital evidence remain the same, the hash algorithm generates the same message digest every time it is applied to the digital evidence. This property is useful in authenticating seized digital evidence before a court of law. If applying the hash algorithm to digital evidence in a court of law results in the same message digest as was obtained at the time of seizure, it indicates that the presented evidence is the same as what was seized.
  3. One-way hash function: It is computationally infeasible to determine the contents of the digital evidence from the message digest alone. A hash algorithm is a one-way function. This property is of great importance from the legal point of view, since it prevents manipulation of digital evidence: no one can predict the message digest that would be generated if the evidence were manipulated.
  4. Collision-free hash: For a 128-bit message digest, the odds that two pieces of digital evidence with different contents have the same message digest are roughly 1 in 2 to the power 128 (i.e. about 1 in 34 followed by 37 zeros).
  • On completion of the imaging process, the device displays the hash value of the cloned hard disk. The image/clone has to have the same hash value as that of the target hard disk. The hash value should be recorded in the panchnama, and the assessee can be given the option of seeking a copy of the imaged/cloned hard disk by paying the copying charges.
  • Before any digital evidence is seized, its hash value must be calculated using forensic tools such as cyber check, a duplicator, or any other suitable tool. These tools generate a report, which can be attached to the panchnama.
  • The Digital Forensic Report (given by the forensic examiner), containing details of the hash value and of every mahazar drawn when the digital evidence was opened at various times to gather further evidence, should be included as an annexure to the assessment order. If a chain of custody form is present, it can also be annexed to the assessment order. This will establish the integrity of the data before any court of law.
  • 11.4.3 Procedure for imaging seized hard disks

In cases where hard disks cannot be cloned at the site and are therefore seized, two sets of images/clones should be created in the lab in the presence of the assessee or his representative and the authorized officer, following the same procedure as described above. A panchnama should be prepared for this activity, recording the hash value of each of the hard disks imaged and the other particulars mentioned above. The assessee may be given the option to obtain a copy of the image at his own cost.
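The requirement that an image or clone carry the same hash value as the source disk can be sketched as a simple comparison. This is only an illustration under stated assumptions: the function names `file_digest` and `image_matches_source` are hypothetical, the source and the image are assumed to be readable as ordinary files, and in practice the source would be read through a write-protect device by a dedicated imaging tool.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file (or raw image) in chunks with the named hashlib algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches_source(source_path, image_path, algorithm="sha256"):
    """Return (match, source_hash, image_hash) so both values can be recorded."""
    source_hash = file_digest(source_path, algorithm)
    image_hash = file_digest(image_path, algorithm)
    return source_hash == image_hash, source_hash, image_hash
```

Returning both digests, and not only the boolean, reflects the procedure above: the hash values themselves are what get written into the panchnama.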

  • Sec. 3 of the IT Act: Authentication of electronic records.

(1) Subject to the provisions of this section any subscriber may authenticate an electronic record by affixing his digital signature.

(2) The authentication of the electronic record shall be effected by the use of an asymmetric cryptosystem and hash function which envelopes and transforms the initial electronic record into another electronic record.

Explanation.- For this sub-section, “hash function” means an algorithm mapping or translation of one sequence of bits into another, generally smaller, set known as “hash result” such that an electronic record yields the same hash result every time the algorithm is executed with the same electronic record as its input making it computationally infeasible-

(a) to derive or reconstruct the original electronic record from the hash result produced by the algorithm;

(b) that two electronic records can produce the same hash result using the algorithm.
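The properties named in this definition and in the manual excerpt (a fixed-length result, the same result for the same input, and a very large space of possible results) can be checked directly. The sketch below assumes SHA-256 via Python's standard `hashlib`; the variable names are illustrative only.

```python
import hashlib

small_record = b"a"
large_record = b"a" * 1_000_000

digest_small = hashlib.sha256(small_record).hexdigest()
digest_large = hashlib.sha256(large_record).hexdigest()

# Fixed length: a SHA-256 digest is 64 hex characters whatever the input size.
same_length = len(digest_small) == len(digest_large) == 64

# Repeatable: hashing the same record again yields the same hash result.
repeatable = hashlib.sha256(large_record).hexdigest() == digest_large

# Size of the output space for a 128-bit digest, as in the "2 to the power 128" figure.
digest_space_128 = 2 ** 128

print(same_length, repeatable, len(str(digest_space_128)))
```

The last value printed is the number of decimal digits in 2 to the power 128, which is one way to sanity-check the "34 followed by 37 zeros" approximation.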

CASE LAWS

  • United States vs. Cartier: The district court found that files with the same hash value have a 99.99% probability of being identical.
  • Shirley Williams v. Sprint/United Management Company: The court observed that when an electronic file is sent with a hash mark, others can read it, but the file cannot be altered without a change also occurring in the hash mark. The producing party can be certain that the file was not altered by running the creator’s hash mark algorithm to verify that the original hash mark is generated. This method allows a large amount of data to be self-authenticating with a rather small hash mark, efficiently assuring that the original image has not been manipulated.
  • Dramatico Entertainment Ltd. vs. British Sky Broadcasting Ltd: While commenting on the role of hash values in identifying the impugned data by the investigation agency in a peer-to-peer network, and on its admissibility in court, it was observed that “the hash value is a reference code comprising a string of letters and numbers, which is used to identify each piece of the content to be shared. This enables the tracker to recognize pieces of the content file as they are shared and is intended to ensure that the content files are correctly downloaded and unmodified.”
  • Lorraine v. Markel: Dealing with the issue of search by hash value, the court noted that such a search would narrowly restrict the search of digital devices, and observed that “A file can be mislabeled; its extension (a sort of suffix indicating the type of file) can be changed; it can actually be converted to a different file type (just as a chat transcript can be captured as an image file, so can an image be inserted into a word-processing file and saved as such). Any of these manipulations could change a document’s hash value. And in any event a limited hash-value search would not have turned up any chat transcripts (which, again, can be saved as image files).”

CONCLUSION

Thus, the hash value is an internationally accepted, scientifically attested means of establishing the reliability and authenticity of electronic evidence, and it has been recognized as admissible by courts in the USA as well as in various other digitally advanced countries. The provisions of the Information Technology Act, 2000 also recognize the hash value as unique, with MD5 and SHA-2 as the standard hash functions attuned to international standards. Therefore, before any digital evidence is seized, its hash value must be calculated using forensic tools such as cyber check, a duplicator, or any other suitable tool. These tools generate a report, which can be attached to the panchnama. It can be concluded from the above-mentioned guidelines that digital evidence has no evidentiary value if its hash value is not taken into account when it is considered.

What is hashing and why is it important?

Hashing is the process of transforming any given key or string of characters into another value. The result is usually a shorter, fixed-length value or key that represents the original string and makes it easier to find or use. The most popular use of hashing is the implementation of hash tables.
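To illustrate how a hash makes a string easier to find, here is a toy hash table with separate chaining. It is a teaching sketch, not how any particular library implements its tables: the class name `ToyHashTable` is hypothetical, and deriving the bucket index from SHA-256 is an assumption made for determinism (real hash tables normally use much faster non-cryptographic hashes).

```python
import hashlib


class ToyHashTable:
    """A minimal hash table: keys are hashed to pick one of a fixed number of buckets."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key):
        # Turn the key into a fixed-length digest, then reduce it to a bucket number.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only the one bucket the key hashes to needs to be searched.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ToyHashTable()
table.put("report.pdf", "exhibit A")
table.put("mail.eml", "exhibit B")
print(table.get("report.pdf"))
```

A lookup inspects a single bucket instead of every stored entry, which is the practical benefit the paragraph above describes.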

What is the purpose of hashing an evidence file?

Just like law enforcement uses DNA to authenticate physical evidence at a crime scene, eDiscovery and forensic professionals use hash values to authenticate electronic evidence, which can be vitally important if there are disputes regarding the authenticity of the evidence in your case!

What is the importance of the hash function?

Hash functions are used for data integrity and often in combination with digital signatures. With a good hash function, even a 1-bit change in a message will produce a different hash (on average, half of the bits change). With digital signatures, a message is hashed and then the hash itself is signed.
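The 1-bit claim can be tried out with a short experiment: flip a single bit of a message and count how many bits of the digest differ. This sketch assumes SHA-256 from Python's `hashlib`; the helper names `flip_bit` and `differing_bits` and the sample message are illustrative.

```python
import hashlib


def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def differing_bits(a, b):
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


message = b"seized hard disk image"
tampered = flip_bit(message, 0)

original_digest = hashlib.sha256(message).digest()
tampered_digest = hashlib.sha256(tampered).digest()

changed = differing_bits(original_digest, tampered_digest)
print(f"{changed} of {len(original_digest) * 8} digest bits changed")
```

For a 256-bit digest the count is expected to be somewhere near 128, in line with the "half of the bits" rule of thumb, though the exact figure depends on the message.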