• 0 Posts
  • 36 Comments
Joined 11 months ago
Cake day: August 4th, 2023





  • Bash scripts are rarely the best choice for large, complicated programs or for software that requires complex data structures. (Git falls squarely into both categories.)

    In bash there are so many ways to accidentally shoot yourself in the foot that it’s absurd. That can lead to bizarre-seeming behavior, which may break your script or even open a security vulnerability.

    There are things that make it a bit more manageable, like “[[ ]]” instead of “[ ]”, but a frustrating number of them are bash-specific, and this is written for the POSIX shell subset, meaning you don’t even get those token niceties.

    Where you generally “want” to use POSIX sh is for relatively simple scripts where the same file needs to run on Linux with bash, Linux with only BusyBox sh, and macOS with zsh (plus an outdated version of bash which will never be updated, because Apple refuses to ship any GPLv3 software in macOS).

    This is not that, and so one would expect that:

    1. The developer of this git implementation has poor / foolish judgement.

    2. Shit will be buggy

    3. Shit will be insecure

    4. Shit will be a PITA to try to troubleshoot or fix

    5. And shit will be slow, because bash is slow and this isn’t a job where you can just hand off all of the heavy lifting to grep / sed / awk*, because the data structures don’t lend themselves to that.

    * You could write the entire program in awk, and maybe even end up with something almost as fast as a Python implementation done in ⅒ the time, but that would be terrible in other ways.





  • Either way, this is a rule that you as a human are required to follow, and if you fail, the compiler is allowed to do anything, including killing your cat.

    It’s not a rule that the compiler enforces by failing to build code with undefined behavior.

    That is a fundamental, and extremely important, difference between C and Rust.

    Also, C compilers do make optimization decisions by assuming that you as a human programmer have followed these strict aliasing rules.

    https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8 has a few examples where code runs “properly” without optimizations but “improperly” with optimizations.

    I put “improperly” in quotes because the C spec says that a compiler can do whatever it wants if you as a human invoke undefined behavior. Safe Rust does not have undefined behavior: if you write code which would invoke UB, rustc will refuse to build it.
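
    To make the aliasing point concrete, here is a minimal sketch in C. It is not taken from the linked gist; the names write_both and float_bits are made up for illustration, and it assumes a platform where float is a 32-bit IEEE 754 value. The first function is the kind of code where the compiler is entitled to assume the two pointers never refer to the same memory. The second shows one well-defined way to look at a float's bytes without breaking the rule.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Assumption for this sketch: float and uint32_t have the same size. */
    static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

    /* Hypothetical example function.
       Under strict aliasing the compiler may assume that an int* and a
       float* never point at the same object, so it is allowed to return
       the constant 1 without re-reading *i after the store through f.
       Passing it two pointers to the same memory (via a cast) would be
       undefined behavior, and nothing stops that from compiling. */
    static int write_both(int *i, float *f) {
        *i = 1;
        *f = 0.0f;
        return *i;
    }

    /* Hypothetical helper: a well-defined way to reinterpret the bytes of
       a float. memcpy copies the object representation, so no object is
       ever accessed through a pointer of an incompatible type. */
    static uint32_t float_bits(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        return bits;
    }
    ```

    Called with two distinct objects, write_both is perfectly fine and returns 1. The trouble only starts when a caller casts one pointer to make both arguments refer to the same memory, and the language gives you no compile error when you do that.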




  • To put it another way:

    Strict aliasing is an invariant that C compilers assume you as a developer will not violate. They use that assumption to make optimization choices that, if you have failed to follow the strict aliasing rules, could lead to undefined behavior. So it’s an invariant that the compiler expects, but doesn’t enforce at compile time.

    I guess it is possible to just disable all such optimizations to get a C compiler that doesn’t create UB just because strict aliasing rules were broken, but there are still many ways that you can trigger UB in C, while safe Rust that compiles successfully theoretically has no UB at all.
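
    As one illustration of the "many other ways", signed integer overflow is undefined behavior in C no matter how the aliasing optimizations are configured (GCC and Clang do offer -fno-strict-aliasing for those). The sketch below uses a made-up helper name, checked_add, and shows the kind of check that you as the programmer have to remember to write, because the compiler will not insist on it.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical helper: adds a and b only when the result fits in an int.
       Returns false and leaves *out untouched when a + b would overflow,
       which would otherwise be undefined behavior for signed integers. */
    static bool checked_add(int a, int b, int *out) {
        assert(out != NULL);
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
            return false;
        }
        *out = a + b;
        return true;
    }
    ```

    Nothing forces a caller to use a helper like this instead of a plain a + b, which is the same "rule for the human, not enforced by the compiler" situation as strict aliasing.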








  • For years I wrote embedded C for 8-bit microcontrollers used in industrial controls.

    Never again.

    Rust is by far a better language for embedded. The only times I would consider it reasonable to write embedded code in C are if you’re doing it for fun, or if you depend on an existing and well tested / audited codebase or library and your application logic is less complicated than the Rust-to-C FFI would be.

    Even then, you won’t find me contributing to that effort.