Hello all, I would like a reality check regarding our tech stack and whether I’m the one with the wrong expectations. I’ll try to describe the situation as neutrally as possible, but forgive me my bias. Here’s the state:
I work at a Series A fintech company in Europe, mainly on the backend. I did Frontend in the past and would consider myself a Fullstack dev, but have focused on the Backend for the last years.
I joined one year ago; the backend codebase was started roughly 9 months before that by a team of 3 devs. The company focused on hiring senior developers with a lot of industry experience in the beginning (we only hired 1 junior and a few mids after that), and all of the starter devs have 10+ years of industry experience. The CTO had experience leading 100+ people. Some of the starting devs run media content about programming, best practices, etc. The CTO has Go experience, the rest not very much.
The codebase is built in Go following Clean Architecture using a variety of tools, here’s what they are and what they do:
- Goa: Design first specification of APIs, generates types and interfaces. Just used for HTTP at the moment.
- Gorm: On the database end of the stack we use Gorm as the ORM.
- Wire: DI management framework. We specify what provides dependencies and then wire generates files to link them all up to those that need them. No one in the company really understands it, and everyone hates it.
In general we rely a lot on code generation. Clients for the frontend are generated as well. The layers are also separated. Meaning for each entity we have a “model”, as well as a database implementation and one or more things on the API side. This results in mappers that map: database ↔ “model” ↔ API payload/response.
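To make the database ↔ “model” ↔ API mapping concrete, here is a minimal sketch of what one entity might look like across the three layers. All names (AccountRecord, Account, AccountResponse, the mapper functions, the "acc_" ID prefix) are hypothetical and not taken from our codebase, and the real types are partly generated by Goa and Gorm rather than written by hand.

```go
package main

import "fmt"

// AccountRecord is a hypothetical database row (persistence layer).
type AccountRecord struct {
	ID    int64
	Email string
}

// Account is the hypothetical domain "model" used by the use cases.
type Account struct {
	ID    int64
	Email string
}

// AccountResponse is the hypothetical API payload returned over HTTP.
type AccountResponse struct {
	ID    string `json:"id"`
	Email string `json:"email"`
}

// RecordToModel maps the database row to the domain model.
func RecordToModel(r AccountRecord) Account {
	return Account{ID: r.ID, Email: r.Email}
}

// ModelToResponse maps the domain model to the API payload.
func ModelToResponse(a Account) AccountResponse {
	return AccountResponse{ID: fmt.Sprintf("acc_%d", a.ID), Email: a.Email}
}

func main() {
	rec := AccountRecord{ID: 42, Email: "a@example.com"}
	fmt.Printf("%+v\n", ModelToResponse(RecordToModel(rec)))
}
```

Adding one field means touching all three structs and both mappers, before any generated files are regenerated.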
One of my first PRs was to add a field to the database and the model and return it in the API. It took me multiple days to do this, and PRs like that are always ~40 files big. Usually you need to actually touch 5-10 files (mappers, etc.) and the rest are generated. PR reviews are not pleasant because of that. After 1 year in that codebase this task would take me 2-3 hours today. It took me 30 minutes on backends I worked on before. If this additional field requires adding a new (existing) dependency to the code, it will take ~1 day, because wire complicates this a lot and the codebase is already heavily coupled.
Our business logic is captured in usecases that are called by the service that fulfils the HTTP requests. Refactorings / changes in there, that don’t change any inputs are usually quite easy, because you’re just touching the inside logic. But again, if you need to include another package (could be a domain or a service) that needs to be DIed it will take additional hours or days.
Features that are not adding small stuff to existing things, are usually in the realm of weeks because of all the setting up required in Goa, Gorm, wire and the architecture in general.
For the first half year I was under the assumption that I just didn’t get the elaborate structure and features that our codebase had. The further I dug, the more I found that even the developers who created the codebase are unhappy with it, and it’s generally considered a mess by all the other developers as well.
Since I joined, the number of engineers has been pretty constant at 11-13, plus 2 EMs and 1 Head of Dev. The CTO left half a year ago. One of the engineers has worked with Go previously.
In the year I’ve been working on it, nothing from an architecture perspective changed, besides other developers introducing event sourcing for one model, that makes everything that relies on this model more complicated in my opinion. Also the codebase became even more coupled, just by adding features.
The business is largely unhappy with the speed we can deliver features. I am as well. It is very unsatisfying to take hours or days to deliver very little. We increased the delivery speed in the last weeks and months at the behest of business leadership but using hacks on top of hacks to deliver at an increased (still slow) speed.
I understand startup pressure is always high and that we will go out of business if we don’t deliver, which is used as a justification for why we ended up with this messy codebase. But I have a hard time piecing together why this stack was chosen in the first place, because everything is basically custom and you can’t just google solutions. Also, we had very experienced senior engineers building this codebase. It would be another thing if the codebase were in a terrible state after almost 2 years into a startup and it had been built by a squad of university grads. Connecting the dots between “super experienced starting team” and the state we are in is where I have a hard time. We’re not building tech platforms or AI models, we’re implementing business logic and flows.
A few months ago we started a new product. I proposed (written, laid out over a few pages) to use another stack to build this in, based on TypeScript, because basically everyone has TypeScript experience, and we use it in the frontend. Not only did the engineers that commented not like it - they were afraid. The head of dev was also heavily against it. The main reasons being that we could not use our already developed tools (which don’t work well for us) and the risk that, if this also ends up being a horrible codebase, we would have two horrible things to maintain. I backed down and proposed to start a new Go service then. This idea was more welcome, but only if we used the same tools we already have, like Goa - basically adding all the weight again. This was not done in the end because I could not justify the extra ramp-up time to get this service up.
So we’re basically in the situation that we can’t do anything new, but the existing codebase is not improving meaningfully. From my perspective the main problems with the codebase currently are that domains are so coupled that everything needs everything, that we introduced custom tooling that makes our job harder, and that we have premature abstractions. Fixing the domain things will take a while and is a big endeavour. Improving all the other stuff, and some refactoring to make things simpler, which I’m constantly pushing for, will make it a little better but won’t fix the main problem.
We had several open discussions about the state of our codebase. Everyone working on it hates it. Head of dev says it’s in an okay state. But nothing significant is happening/moving. I try to move as many things as I possibly can, and try to encourage others to do it as well, as this is my job as a senior engineer. Recently small improvements are being made more often, because delivery pressure is a bit lower than usual. I have raised these problems a few times already with our dev leadership. In my opinion we need a strong leader with Go experience to lead us forward, but we don’t have that and I’m not there yet. Head of dev promised a few times that things will be addressed, but I’m seeing very little of it.
I’m not very fulfilled by this job currently, but I’m also not completely hating it, so I’d still like to give this half a year to move this into the right direction.
Here are the reality check questions:
- Is this a normal state to be expected in most companies / teams?
- Am I expecting too much?
- How did we actually end up in this situation, even before I joined, given the experienced team?
- It seems like it’s not a Go specific problem, but a matter of how we set it up and the architecture - is that correct?
- I explored what productive frameworks are out there currently that would be able to replace our custom Go stack. With Next.js, Remix and Laravel out there, it seems a lot of the problems and custom solutions we have would be covered by those, and it seems like productivity and shipping speed would be a lot higher using one of these frameworks. Maybe this is just wishful thinking and reality is also complex when using those?
And a more productive question: Has anyone been in a similar situation - how did you resolve it?
Has anyone been in a similar situation - how did you resolve it?
I have been there and after half a year of lobbying for changes, the only real solution was to change jobs.
This might be the solution in the end if I’m not able to improve it as much as I hope.
I didn’t get whether you do any integrations besides your DB.
To me the architecture sounds good anyway. I’m not a Go dev, but it looks like the stack is your main problem, not the architecture itself. Multiple models is a great long-term approach which makes sure you are not leaking implementation details of your persistence/3rd party services to your client. You have a layer of mappers where you can enrich the model in an optimal way and at the same time accommodate whatever client request/payload quirks you may need. Yes, it’s sometimes annoying to add one field and pass it across all layers, but that’s the price, and software development is all about balance and compromises. I worked on projects that didn’t follow this architecture, and any kind of change in the DB/3rd party cascaded through the entire application, so you ended up changing hundreds of files instead of a couple of models and a few mappers.
You may have issues with this architecture if you invoke services directly one from another for complex use cases. This can couple things together. If that’s what’s happening, think about whether you can apply a Facade. Try using/reusing more trivial and focused services in a Facade to make sure that, for example, the Order service does not depend on the Client service. Leave it to a ClientOrderFacade.
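A rough sketch of that Facade idea in Go, since that’s your language. Everything here is made up for illustration (the in-memory services, PlaceOrder, ErrInactiveClient); the point is only that OrderService never references ClientService, and the cross-domain rule lives in ClientOrderFacade.

```go
package main

import (
	"errors"
	"fmt"
)

// ClientService knows only about clients (hypothetical, in-memory).
type ClientService struct{ active map[string]bool }

func (s *ClientService) IsActive(clientID string) bool { return s.active[clientID] }

// OrderService knows only about orders; it never references ClientService.
type OrderService struct{ orders map[string][]string }

func (s *OrderService) Create(clientID, item string) {
	s.orders[clientID] = append(s.orders[clientID], item)
}

// ErrInactiveClient is returned when an order is placed for an inactive client.
var ErrInactiveClient = errors.New("client is not active")

// ClientOrderFacade owns the use case that spans both domains.
type ClientOrderFacade struct {
	clients *ClientService
	orders  *OrderService
}

// PlaceOrder checks the client first, then delegates to the order service.
func (f *ClientOrderFacade) PlaceOrder(clientID, item string) error {
	if !f.clients.IsActive(clientID) {
		return ErrInactiveClient
	}
	f.orders.Create(clientID, item)
	return nil
}

func main() {
	facade := &ClientOrderFacade{
		clients: &ClientService{active: map[string]bool{"c1": true}},
		orders:  &OrderService{orders: map[string][]string{}},
	}
	fmt.Println(facade.PlaceOrder("c1", "book"))
	fmt.Println(facade.PlaceOrder("c2", "book"))
}
```

With this shape, only the facade needs both dependencies injected, so each focused service keeps a short dependency list.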
Thanks, I will look into if we can make use of Facades.
Everybody thinks that this time will be different, and we will definitely build an elegant solution from the ground up.
It’s a helpful delusion, because we feel good knowing we won’t have to waste time refactoring later, once we actually understand the problem space.
Actual elegant code earns its existence by retaining and enabling thoughtful developers to work from one hack to another, with enough breathing room to try out the fixes they think of.
If I inherited your team tomorrow, our top priority would be fixing anything that causes slowdowns - especially lack of CI/CD and lack of test coverage.
My top priority would be enforcing collaboration patterns over grandstanding (and probably building the case to fire my worst grandstander), while telling every stakeholder that their pet project is on hold while I put the delivery pipeline in order.
Fun times, but also extremely usual in development shops.
Define elegant. I think in this case the first devs built elaborate structures. Maybe seen as elegant in the beginning. But it doesn’t really make sense to introduce abstractions and structures when you don’t know yet where the domain boundaries are, etc.
You just defined it perfectly.
“Elegance” early on ends up being a mistake. The chosen abstractions look really nice, but turn out to be the wrong ones for the unique problem domain.
The first six times I encountered it, I thought the team got unlucky. Then I recognized that it’s a pattern/mistake that pretty much every newly formed development team falls into.
Even when I’m on the team, I end up being “old man yells at cloud”, and they add early abstractions anyway. (Which I don’t mind… As they point out, there’s no harm trying to get it right on day one. The harm comes from believing we got it right.)
The benefit I can bring is introducing patterns that support refactoring: regression test automation, strong ruthlessly fast test pipelines, and chat-ops.
deleted by creator
I see the benefit of frameworks in having a doc / “definitive answer” on how to do certain things. This should help align developers, especially if everyone is new to the language used.
Didn’t read it all and I don’t know about Wire but generated code should be in separate commits, making the PR way more readable.
You can always squash them if needed after the code review.
To me, generated code should not be committed at all. Again, I know nothing about this stack, but code generators can behave differently on different machines due to versions, flags and even OS. To deliver consistent results they should run in a consistent env. It’s a build-time concern which CI/CD should take care of.
In general it should not be checked in, but as with everything there are exceptions. If you need it to be deterministic and want to evaluate all changes to the generated code, it can be useful; precisely for the reason you cite in opposition. A small change in your build environment can change what was generated. If that isn’t diffed against preceding versions, I think we could contrive cases where that would be an issue. Seems sufficient to me to caution that there are always exceptions.
This. Unless your generator guarantees reproducible code generation and your build environment does the same across developer machines and CI, you are opening yourself up to “works on my machine but fails on CI” issues that are hard to debug. Since most developers don’t pay attention to these things (they really should), I would always advise checking generated code into version control.
Totally agree, generated code shouldn’t be checked in 99% of the time. I’d check it in if it’s something like openApi spec file that’s generated and then everything else can use that spec file for generating clients and those don’t get checked in.
Problem is we depend on all of them in the backend directly, except the clients. So we need to generate them locally. We have CI checking that there is no drift in generation, though.
A build tool like bazel would be able to “hide” the generated code. Essentially you never see it in the repo because it’s generated on the fly and cached
Do you know of any tools that refactor my branch to put all files following a certain pattern into their own commit?
find . -regex 'pattern' -exec git add {} +
Might work for you?
To be honest, it doesn’t seem that bad. With clean architecture, you are going to end up with extra types and mappers. I would argue that what you have isn’t coupled, because a change in one place doesn’t have unexpected side effects elsewhere.
I haven’t used Goa or Gorm. Writing SQL by hand gets old quick so I get why you’d use Gorm - just less code to write in the end. I’ve used sqlc as it’s more a library than a framework, and it’s fine, but it can’t fulfill every use case. Goa looks too opinionated for me, on the face of it.
I’ve used wire. It takes some understanding but it’s definitely a lot to understand just to add a dependency. At work we’ve got our own template for doing dependency injection and although I was skeptical at first it strikes a really good balance between being understandable and abstracting away DI. If this is your pain point, I’d consider going back to basics and get rid of the framework.
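To show what I mean by going back to basics, here’s a minimal sketch of dependency injection by hand: plain constructors, wired together in one place in main. This is not our template at work, and every name in it (UserStore, Notifier, WelcomeUseCase, the in-memory implementations) is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// UserStore and Notifier are the dependencies the use case needs.
type UserStore interface {
	Email(id int) (string, bool)
}

type Notifier interface {
	Notify(to, msg string) error
}

// memoryStore is an in-memory UserStore for the sketch.
type memoryStore struct{ emails map[int]string }

func (m memoryStore) Email(id int) (string, bool) {
	e, ok := m.emails[id]
	return e, ok
}

// recordingNotifier remembers what it was asked to send.
type recordingNotifier struct{ sent []string }

func (n *recordingNotifier) Notify(to, msg string) error {
	n.sent = append(n.sent, to+": "+msg)
	return nil
}

// WelcomeUseCase receives its dependencies through the constructor.
type WelcomeUseCase struct {
	store    UserStore
	notifier Notifier
}

func NewWelcomeUseCase(s UserStore, n Notifier) *WelcomeUseCase {
	return &WelcomeUseCase{store: s, notifier: n}
}

func (u *WelcomeUseCase) Run(userID int) error {
	email, ok := u.store.Email(userID)
	if !ok {
		return errors.New("user not found")
	}
	return u.notifier.Notify(email, "welcome")
}

func main() {
	// Composition root: all wiring happens here, by hand.
	store := memoryStore{emails: map[int]string{1: "a@example.com"}}
	notifier := &recordingNotifier{}
	uc := NewWelcomeUseCase(store, notifier)
	if err := uc.Run(1); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(notifier.sent)
}
```

Adding a dependency is then one new constructor parameter and one line in main, and the compiler tells you what’s missing. The trade-off is that main grows as the service grows.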
If you decide to go with a framework like Laravel, Rails or Next.js and build everything around the framework, you will deliver quickly at first, but you won’t have type safety and at a certain point it will stop scaling because these frameworks have no consideration for clean architecture. You won’t necessarily be better off.
I only have half as much experience as you, and none with Go specifically, so I can’t give you any good answers but I can say I empathize - the company I work at is also stuck with a legacy monolith that’s still on .net framework and everything is so coupled that it’s impossible to even unit test, let alone deploy the projects separately. Some people aren’t bothered even with the basic principles of code writing and the senior people are just overworked and can’t keep tabs on it even if they wanted to.
The worst part is that the company is mostly either juniors just doing what they are told or older seniors that are stuck in their ways and are afraid of anything new - although as I got older I started to see why that might be the correct approach, not everyone wants to learn and adapt to new tech and it’s a big ask of the upper management to risk it on that. Basically we’re just repeating the same mistakes and wasting time fixing known errors that keep happening and any actual improvement or proper removal of tech debt never happens.
So yeah… I’m starting to believe that “clean good code” only happens either in hobby projects or new startups. Any larger, “stable” codebase of a larger company is going to be an inefficient mess however 🤷‍♂️
I ran across this today: https://graphite.dev/blog/how-large-prs-slow-down-development
They describe just the problem you are experiencing: change amplification.
Contrary to some comments this is not a sign of good architecture. It may be needed at your company, but if I was betting I’d bet it’s not.
To enable smaller PRs we need to get rid of all the generation artefacts in the PR.
You won the bet. We don’t need that architecture, at least in my opinion. We’re a startup. We need something to iterate as fast as possible, without crapping up our code base. Requirements change a lot. The initial folks did a lot of big brain thinking and introduced a lot of things that might need to be abstracted, and most of them haven’t been used to date and just add complexity.
As a bit of low-hanging fruit, you may be able to reduce the length of the diffs in an MR by marking generated files with -diff in a .gitattributes file. This is at least supported by GitLab (not sure about others): https://git-scm.com/docs/gitattributes#_marking_files_as_binary
Looked into that yesterday. I can mark generated files as generated. It collapses them but doesn’t remove them from the review, which is a bit annoying. Need something like a .diffignore