• 0 Posts
  • 20 Comments
Joined 1 year ago
Cake day: June 10th, 2023


  • You could use Polars; AFAIK it supports streaming from CSVs too, and frankly the syntax is so much nicer than pandas if you’re coming from Spark land.

    Do you need to persist them? What are you doing with them? A really common pattern for analytics is landing them in something like Parquet or Delta (less frequently Avro or ORC) and then working right off that. If they don’t change, it’s an option. Writing 100 GB of CSVs to a database will take some time, depending on resources, tools, and DB flavour. Tbf, writing into a compressed format takes time too, but it saves you managing databases (unless you want to; I’m just presenting some alternatives).

    You could also look at a document DB. Again, it will take time to ingest and index, but it’s definitely another tool. I’ve touched Elastic and stood up Mongo before, but Solr is around too. Solr is built on top of Lucene, which I knew Elastic was, but apparently so is Mongo’s Atlas Search.

    Edit: searchable? I’d look into a document DB; it’s quite literally what they’re meant for, and all of the ones I mentioned are used for enterprise search.
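    If you do go the database route, here is a minimal sketch of streaming a large CSV into a table in fixed-size batches, so the whole file never has to sit in memory. It assumes Python’s standard-library csv and sqlite3 modules (swapped in for Polars or a real server DB so it runs without extra installs); the function name, table name, and batch size are hypothetical. Every column is stored as text, so you would cast or declare types afterwards.

```python
import csv
import io
import itertools
import sqlite3


def load_csv_in_batches(csv_file, conn, table, batch_size=10_000):
    """Stream rows from an open CSV file into a SQLite table, batch_size rows at a time.

    Hypothetical helper: assumes the first row is a header and that column
    names contain no double quotes. Values are inserted as text.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    insert_sql = f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})'

    total = 0
    while True:
        # Pull at most batch_size rows; memory use stays bounded by the batch.
        batch = list(itertools.islice(reader, batch_size))
        if not batch:
            break
        with conn:  # one transaction per batch, committed on success
            conn.executemany(insert_sql, batch)
        total += len(batch)
    return total


# Usage with an in-memory sample; for a real file use open(path, newline="").
sample = io.StringIO("id,name\n1,alpha\n2,beta\n3,gamma\n")
conn = sqlite3.connect(":memory:")
n = load_csv_in_batches(sample, conn, "items", batch_size=2)
print(n)  # 3
print(conn.execute("SELECT name FROM items WHERE id = ?", ("2",)).fetchone())  # ('beta',)
```

    Batching with one transaction per chunk is the main design choice here: committing per row is dramatically slower, while one giant transaction over 100 GB risks a huge rollback if something fails halfway.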



  • I was originally going to go the Docker route, but honestly I just ended up going the binary route and leaving it on SQLite, as it’s good enough for now. It’s pretty well documented, and I already had a chunk of the prereqs, like the git user creation.

    I did have SSH auth issues though, probably because I didn’t fully clean up after uninstalling GitLab (oops). I had them running in parallel for a bit to migrate the repos, and GitLab had left SSH trying to use gitlab-shell, which didn’t exist anymore. There’s probably a better/proper solution, but what worked was changing the git user’s home directory back to /home/git, since GitLab had it pointing at a GitLab config directory. I welcome anyone giving me a better/cleaner solution for this; more cleanup is on my to-do list.
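    One read-only check that might help with that cleanup, assuming the leftover problem comes from forced commands in the git user’s authorized_keys (that’s how gitlab-shell is typically wired into SSH, but it’s an assumption about this particular setup): a small Python sketch that lists entries whose command="…" points at an executable that no longer exists. The function names, the sample key lines, and the /usr/local/bin/git-forge path are all hypothetical; the gitlab-shell path is a typical Omnibus location, also assumed.

```python
import os
import re
import shlex

# Matches the forced command in an authorized_keys options field.
# Simplification: does not handle escaped quotes inside the command.
FORCED_CMD = re.compile(r'command="([^"]*)"')


def stale_forced_commands(authorized_keys_text, exists=os.path.exists):
    """Return (line_number, command) pairs whose forced command's executable is missing."""
    stale = []
    for lineno, line in enumerate(authorized_keys_text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = FORCED_CMD.search(line)
        if not match:
            continue  # plain key with no forced command
        parts = shlex.split(match.group(1))
        if parts and not exists(parts[0]):
            stale.append((lineno, match.group(1)))
    return stale


def authorized_keys_path(user="git"):
    """Locate a user's authorized_keys via their home directory (Unix only)."""
    import pwd  # imported here so the module still loads on non-Unix systems

    return os.path.join(pwd.getpwnam(user).pw_dir, ".ssh", "authorized_keys")


# Hypothetical sample: one GitLab-era entry and one for the new server.
sample = (
    "# managed keys\n"
    'command="/opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell key-1",no-pty ssh-ed25519 AAAAexample1\n'
    'command="/usr/local/bin/git-forge serv key-2",no-pty ssh-ed25519 AAAAexample2\n'
)
present = {"/usr/local/bin/git-forge"}  # pretend only the new binary is installed
print(stale_forced_commands(sample, exists=present.__contains__))
```

    On a real box you would read the file at authorized_keys_path() and leave `exists` at its default; the injectable `exists` is only there so the logic can be tried without touching the filesystem.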




  • Heck, there are already ISO language standards, and there are ISO software lifecycle standards; it’s absolutely not a leap to move into standards-adhering processes. It’s not like there’s no desire to do it either. Take code standards alone: how many times have you had discussions about company-wide style guides and coding standards? They make things more consistent and easier for different developers to maintain.

    Semi-related: I see a lot of non-ISO-standard SQL that’s a pain if you do migrations or refactors, and often it just sucks to read through (old-school Oracle joins look really strange and aren’t as clear as ISO-standard joins). I really wish people would adhere to the standards as much as possible.
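    For anyone who hasn’t run into it, the classic non-standard example is Oracle’s (+) outer-join operator. Here is a small sketch using Python’s built-in sqlite3; SQLite doesn’t accept (+), so that form only appears as a comment, and the tables and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0);
""")

# Old-school Oracle outer join (not valid in SQLite, shown for comparison).
# The (+) marks the optional side, buried in the WHERE clause:
#   SELECT c.name, o.total FROM customers c, orders o
#   WHERE c.id = o.customer_id(+)

# ISO-standard equivalent: the join type and condition sit together.
ansi_left_join = """
    SELECT c.name, o.total
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
"""
rows = conn.execute(ansi_left_join).fetchall()
print(rows)  # [('Ada', 25.0), ('Grace', None)]
```

    The standard form also ports cleanly: the same LEFT OUTER JOIN runs on Oracle, PostgreSQL, SQL Server and SQLite, which is exactly what makes migrations less painful.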


  • I realised what you meant over lunch. I’m a mech eng who changed disciplines into software (mainly data and systems) over my career, and I 100% feel you. I have seen enough colleagues do things that wouldn’t fly in other disciplines; it’s definitely put me off a number of times. I’m personally for rubber-stamping by a PEng and the responsibility that comes with that. There are enough regulatory and ethical considerations just in data usage to warrant an engineering review, and systems designed for compliance should be stamped too.

    It really bothers me sometimes how Wild West things are.


  • Edit: see my response. I realised the comment was about engineering accountability, which I 100% agree with. I’m leaving my original post untouched aside from a typo that’s annoying me.

    I respectfully disagree coming from a reliability POV: you won’t address the culture or processes that enable a person to make a mistake. With the exception of malice or negligence, no one does something like this in a vacuum; the causes are things like insufficient or incorrect training, unreasonable pressure, poorly designed processes, or a culture that enables actions that lead to failure.

    An example I recall from when I worked in manufacturing: an operator ran a piece of equipment that joins pieces together in manual rather than automatic, failed to return it to a ready flag, and caused a line stop. Yeah, the operator did something outside of process and caused an issue. Clear cut, right? Send them home? That was a symptom, not a cause. The operator ran in manual because the auto cycle time was borderline causing line stops, especially on the material being run. The operator was also using manual because some location sensors had issues with that material and there were incoming quality issues, so running manually, while not standard procedure, was a workaround for processing issues. We also found that culturally, a lot of the operators did not trust the auto cycles and would often override them. The operator was unlucky. If we had just put all the “accountability” on them, we’d never have started projects to improve reliability at that location and to change the automation so it flicked over the flag the operator forgot about whenever the conditions were met, regardless of mode.

    Accountability is important, but it needs to be applied where appropriate. If someone is being negligent or malicious, yeah, there are consequences, but it’s limiting to focus only on that. You can implement what you suggest and make the devs accountable for any failure so they’re “empowered”, but if your culture doesn’t enable them to say no or make them comfortable doing so, you’re not doing anything that will actually prevent an issue in the future.

    Besides, I’d almost consider it a PPE control, and those sit at the bottom of the hierarchy of controls, with administrative controls just above them. Yes, I’m applying OH&S to software, because conceptually risk is risk. Automated tests, multi-phase approvals, etc. are all better controls than relying on a single developer saying no.


  • It’s supposed to be an easy, if not drop-in, replacement AFAIK. It’s under a permissive licence (Apache 2.0) and authored by Red Hat; beyond that I can’t tell you much. It’s something I’ve been considering moving to personally (and at work, mostly for licensing and because a few of us want to use more open tech stacks); I just haven’t had a chance to work with it.

    It’s supposedly able to pull Docker images and work with docker-compose, just not Swarm.







  • Interestingly, Bing of all things turns up better results than Google with the same search terms. The first 3 blocks are “popular results”: the first is tutorial sites, the second is W3Schools, and the third takes you to the current docs for functions and operators.

    If you ignore those, the fourth result takes you to the current docs for comparison functions and operators. I’d prefer it to take you right to the official docs in the first result, but comparatively it’s acceptable. It was memed to death, but I’ve seriously found it more useful than Google these days, comparable to DDG’s results.





  • 33, Canada, yes I can drive standard.

    I learned on a 1990 Corolla. My partner can’t drive standard, so when we downsized a few years ago we kept her car. I miss it for around-town trips, but I’m not upset to drive automatic when I (rarely) have to commute. I also really liked it for winter driving; I definitely felt more in control, and that little Corolla could plow through snow and ice like it was nothing.

    Pretty much everyone on my side of the family learned on standard and drove it for a while, but now more or less everyone has a vehicle with an automatic.