EMD/CRT: Difference between revisions

From Ft Wm, Inv & Tor CC
< EMD
Content added Content deleted
No edit summary
 
(3 intermediate revisions by the same user not shown)
Line 1: Line 1:
==The [https://en.wikipedia.org/wiki/Chinese_remainder_theorem Chinese Remainder Theorem] is a useful result in the theory of prime numbers==
==The [https://en.wikipedia.org/wiki/Chinese_remainder_theorem Chinese Remainder Theorem (CRT)] is a useful result in the theory of prime numbers==


In June 2022 [[EMD]] used the CRT to devise a maximally collision-resistant checksum with an entropy in excess of 323 bits without the hassle of careful CRC design. He then wrote simple C programs to select near-optimal prime moduli based a candidate table cribbed from <[https://math.utm.edu/lists/2small/0bit.html https://math.utm.edu/lists/2small/]>, and to determine good 'geometries' (ie formats) for temporary files recording bitvectors totalling 547 30-bit words, suitable for recording whether or not a file having length congruent to each number in the range 1..2^14 exists in a given directory (aka 'folder'); 547x30 = 2^24+26 = 16410. Satisfied, he wrote a rather longer C program to:-
In June 2022 [[EMD]] used the CRT to devise a maximally collision-resistant checksum with an entropy in excess of 323 bits without the hassle of careful [https://en.wikipedia.org/wiki/Cyclic_redundancy_check CRC] design. He then wrote simple C programs to select near-optimal prime moduli based a candidate table cribbed from <[https://primes.utm.edu/lists/2small/0bit.html https://primes.utm.edu/lists/2small/0bit.html]>, and to determine good 'geometries' (ie formats) for temporary files recording bitvectors totalling 547 30-bit words, suitable for recording whether or not a file having length congruent to each number in the range 1..2^14 exists in a given directory (aka 'folder'); 547x30 = 2^14+26 = 16410. Satisfied, he wrote a rather longer C program to:-


* Traverse a filestore taking a snapshot of the directory structure, filenames, and file lengths.
* Traverse a filestore taking a snapshot of the directory structure, filenames, and file lengths.
Line 8: Line 8:


For obvious reasons, the resulting output file has a '.CRT' filename extension. EMD then wrote another, somewhat shorter, C program to read up to five CRT files simultaneously and generate a report of consistencies and inconsistencies; these two programs were soon integrated into a single executable, 'SinoSure.EXE', intended to verify that filesystems have been accurately duplicated onto backup media.
For obvious reasons, the resulting output file has a '.CRT' filename extension. EMD then wrote another, somewhat shorter, C program to read up to five CRT files simultaneously and generate a report of consistencies and inconsistencies; these two programs were soon integrated into a single executable, 'SinoSure.EXE', intended to verify that filesystems have been accurately duplicated onto backup media.

By 11th July 2022, EMD had added per-file reporting to SinoSure, meaning that individual files not fully backed-up were named-and-shamed explicitly; this feature was arguably crucial to its subsequent adoption as a standard.

Latest revision as of 07:53, 11 August 2022

The Chinese Remainder Theorem (CRT) is a useful result in the theory of prime numbers

In June 2022 EMD used the CRT to devise a maximally collision-resistant checksum with an entropy in excess of 323 bits without the hassle of careful CRC design. He then wrote simple C programs to select near-optimal prime moduli based a candidate table cribbed from <https://primes.utm.edu/lists/2small/0bit.html>, and to determine good 'geometries' (ie formats) for temporary files recording bitvectors totalling 547 30-bit words, suitable for recording whether or not a file having length congruent to each number in the range 1..2^14 exists in a given directory (aka 'folder'); 547x30 = 2^14+26 = 16410. Satisfied, he wrote a rather longer C program to:-

  • Traverse a filestore taking a snapshot of the directory structure, filenames, and file lengths.
  • Split the snapshot into many subfiles using a 'geometry' appropriate to the total number of directories encountered.
  • Load, in turn, each subfile into main memory, and calculate and output 324-bit checksums of each file in the filestore.

For obvious reasons, the resulting output file has a '.CRT' filename extension. EMD then wrote another, somewhat shorter, C program to read up to five CRT files simultaneously and generate a report of consistencies and inconsistencies; these two programs were soon integrated into a single executable, 'SinoSure.EXE', intended to verify that filesystems have been accurately duplicated onto backup media.

By 11th July 2022, EMD had added per-file reporting to SinoSure, meaning that individual files not fully backed-up were named-and-shamed explicitly; this feature was arguably crucial to its subsequent adoption as a standard.