I found my paranoid-file-copier program! The UI seriously needs work, tho...
@woozle At first glance it looks like you just started a package manager.
What about it classifies as "paranoid" though?
@ThorstenAnzomi By default, it does a byte-by-byte comparison after each copy, and logs any mismatches. If it's in "offload" mode, it won't delete the source if the target doesn't match.
There's some other minor stuff it does, but I think that's the main reason.
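Roughly like this, if I sketch the idea in Python (`paranoid_copy` and the `offload` flag are just illustrative names I'm making up here, not the actual program):

```python
import filecmp
import os
import shutil

def paranoid_copy(src: str, dst: str, offload: bool = False) -> bool:
    """Copy src to dst, then verify the copy byte-by-byte before trusting it."""
    shutil.copy2(src, dst)
    # shallow=False forces a full byte-by-byte content comparison,
    # not just a stat() check of size/mtime
    ok = filecmp.cmp(src, dst, shallow=False)
    if not ok:
        print(f"MISMATCH: {src} -> {dst}")  # log the mismatch
    elif offload:
        os.remove(src)  # only delete the source once the copy has verified
    return ok
```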
@woozle That's....... interesting.
you don't happen to have the source code available anywhere, do you?
Also VERY dumb question, why not use a (secure) hash?
@ThorstenAnzomi I'm planning to post it on GitLab as soon as I can get it tidied up a bit.
Correct me if I'm wrong, but a hash would require just as much I/O and would therefore not be any faster.
@woozle A hash would require the same amount of I/O, yes, as both files need to be read in their entirety and hashed.
However, the time saved is in the comparisons: you're doing a single comparison (hash == hash) instead of one comparison per byte. Yes, a hash comparison is itself made of multiple smaller comparisons, but those number in the dozens, not the thousands (or millions) you'd rack up going byte-by-byte.
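A rough Python sketch of what I mean (`hashlib` standing in for whichever hash you'd actually pick):

```python
import hashlib

def files_match(path_a: str, path_b: str) -> bool:
    """Compare two files by SHA-256 digest instead of byte-by-byte."""
    def digest(path: str) -> bytes:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # same total I/O as a byte-by-byte compare: every byte gets read...
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        return h.digest()
    # ...but only one 32-byte comparison happens at the end
    return digest(path_a) == digest(path_b)
```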
@ThorstenAnzomi But the comparison takes place in RAM, so is almost instantaneous either way, no?
@woozle Comparisons are usually done in the CPU's ALU, effectively a special subtract: if the subtraction result is 0, the two values are equal. It takes a few cycles to complete; for the sake of argument, let's say 5.
To check an entire file, that's 5 cycles per comparison, times the file length.
To check a hash, that's 5 times either the length of the hash divided by the CPU's word size (if stored as a number, usually <10), or the length of the string (44 chars for a base64 SHA-256). The I/O is the same, but the CPU time isn't.
@woozle To expand a bit more: using SHA-256 as an example, which generates 256-bit hashes, and since most computers today are 64-bit, the hash is 4 times the CPU's word size, meaning it'd take 4 sub-comparisons to compare a hash stored as a (giant, admittedly) number.
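For what it's worth, that arithmetic checks out; a quick Python sanity check (nothing here is from the actual program):

```python
import hashlib

d = hashlib.sha256(b"example").digest()
# a 256-bit digest is 32 bytes, i.e. exactly four 64-bit machine words
words = [int.from_bytes(d[i:i + 8], "little") for i in range(0, 32, 8)]
assert len(words) == 4  # four word-sized sub-comparisons to compare two digests
```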
@woozle Then again, I could also be COMPLETELY wrong, but from my understanding of computers, you'd shave off a little bit of cycle time by using hash comparisons, especially since this is the sort of thing hashes are used for.
@ThorstenAnzomi Possibly -- but is the time saving significant by comparison to the I/O time? ...not to mention the time spent parsing interpreted code and displaying results on the screen, which can measurably slow down progress if there's too much updating (ask me how I know).
@woozle That would depend on the speed at which the files can be read, the CPU speed, and a few other things...
Probably not a significant speed increase, because it's the same I/O; it might reduce memory consumption a little, though.
Then again, I come from the world of trying to shave literal milliseconds off programs so they run just that much quicker. It's probably nothing noticeable in production, but I'd need a copy to test for myself what the difference is, if any.
@ThorstenAnzomi Been there -- I ended up translating backprop neural network code from Pascal to ASM86 just to get a few percentage points of improvement in speed.
(Geez, isn't there anything faster than a 486DX-25?? :D)
@woozle Ah, Pascal... How I originally taught myself programming (and THEN moved to VB.NET, then... C++...)
@ThorstenAnzomi I did FORTRAN IV (and then 77), then Pascal, then C++, then VB6, then a bit of Perl and finally PHP.
@woozle I've done.... in chronological order... Pascal, VB.NET, C++, Java, Lua, FORTRAN, COBOL, Perl, Python, FORTH, C#, Go, F#, Haskell...
Yeah, I think I might have a minor obsession...
@ThorstenAnzomi Any favorites?
@woozle I'm definitely a Go / C# programmer now, despite C# being god-awful on Linux. C++ has its place, and so does Python.
And yes... as someone who does modded Minecraft, both OpenComputers and ComputerCraft use Lua by default, though I did write an OC CPU architecture for Python 3. So that's my reasoning for Lua.
@ThorstenAnzomi Belated thought: if sftp had an API function for returning a hash of a file, that would *definitely* make it faster to compare hashes than block-by-block.
Failing that... I suppose it would be easy enough to just write a utility to calculate a hash for a local file, and then have the remote request that... or does such a thing exist already, perhaps? But this would still require executing remote code...
@woozle Here's a thought. Most Linux distributions already have a command (sha256sum) that does the calculations and returns a base-16 string of the hash. Open a persistent SSH connection and sha256sum each of your files? You can also pipe the output into "base64" to convert it to a base64 string.
Naturally, this only works for Linux.
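Something like this, sketched in Python (the host and paths are placeholders, and it assumes key-based SSH auth is already set up):

```python
import hashlib
import subprocess

def remote_sha256(host: str, remote_path: str) -> str:
    """Run sha256sum over SSH and return the hex digest (Linux remotes only)."""
    out = subprocess.run(
        ["ssh", host, "sha256sum", "--", remote_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()[0]  # sha256sum prints "<hex digest>  <path>"

def local_sha256(path: str) -> str:
    """Hash a local file the same way, for comparison."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# usage (placeholder host/paths):
# if remote_sha256("backup-box", "/data/file.bin") == local_sha256("file.bin"):
#     print("hashes match")
```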