Reinventing The Wheel

July 6, 2023 - 5 min read

One of the very first principles you learn as a new developer is DRY, or “don’t repeat yourself”. With the vast landscape of libraries that many programming languages have to offer, it is easy not only to avoid repeating yourself but also to avoid repeating anyone else. In this article, I’d like to discuss when it is acceptable to deviate from this rule and “reinvent the wheel”.

I recently started work on Collective, a platform that merges GitHub’s proven and very scalable collaboration workflow with a user experience designed for non-technical people. Given that, as of now, Collective’s user flow is very similar to GitHub’s, the initial thought was, of course, to also use Git under the hood to take care of all the heavy lifting. In the end, however, various factors led me to decide against using Git and to rebuild its core features instead. Let me explain why and how this might be relevant to you.

With every problem you don’t solve yourself, it’s always worth considering what tasks you’re actually outsourcing and what’s still left for you to do. Only then can you make an informed decision on the age-old question of DIY or buy. Since almost nothing is ever perfect, another question to consider is what new responsibilities or downsides you are getting alongside the solution to the original problem. While financial cost is the most common metric, engineering effort and the risk of error-prone and hard-to-debug code should also be kept in mind.

Know Your Problem

Initially, using Git as a foundation for Collective seemed like a great option since Git already has all the required algorithms implemented. A library like simple-git also promises, rightfully, to make using Git through Node.js a seamless experience. The reality, however, is that Git is a monster. It is a fully-fledged distributed version control system, capable of handling many files inside a single repo and designed to make working offline possible. With Collective, there is no need for multiple files or an offline workflow. Choosing a tool of the right size can make maintaining a project significantly easier. Very often, tools meant for large-scale operations also bring more constraints with them. Choosing a technology too big for your problem means having to deal with additional overhead down the line.

A classic example of this is databases. Key-value stores like basic Redis are super easy to get started with, but you will run into limitations with advanced filtering of entries. Document databases such as MongoDB solve this, but unlike relational databases, they don’t offer strong guarantees for the data format. If you choose a database that is too small for your needs (in terms of its style, not its capacity), you will run into limitations. But if you go too big, you will have to deal with operational overhead, such as database schema migrations, while only getting a glimpse of the advantages the large-scale solution has to offer.
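
To make the filtering limitation concrete, here is a minimal sketch. It uses a plain in-memory Map as a stand-in for a key-value store and is not Redis’s actual API; the Article type, the sample entries, and the findLongArticlesBy helper are all hypothetical and exist only for this illustration.

```typescript
// Hypothetical entry type, used only for this illustration.
type Article = { id: string; author: string; words: number };

// A plain Map stands in for a key-value store: values are reachable by key only.
const store = new Map<string, Article>();
store.set("a1", { id: "a1", author: "kim", words: 900 });
store.set("a2", { id: "a2", author: "lee", words: 1500 });
store.set("a3", { id: "a3", author: "kim", words: 2100 });

// Lookup by key is direct.
const firstArticle = store.get("a1");

// Filtering by any other field means visiting every entry.
function findLongArticlesBy(author: string, minWords: number): Article[] {
  const matches: Article[] = [];
  store.forEach((article) => {
    if (article.author === author && article.words >= minWords) {
      matches.push(article);
    }
  });
  return matches;
}

console.log(firstArticle, findLongArticlesBy("kim", 1000));
```

The lookup by key stays cheap no matter how many entries exist, while the filter has to touch all of them. This is roughly the point at which a database with real query support starts to pay for its extra overhead.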

Coming back to the example of Collective and Git: Git commands require the repository data to be stored in a specific file structure on disk. Using Git would therefore mean keeping a fully-fledged Git repository on disk for every document and interacting with that repository through a wrapped command line interface. Depending on the hosting setup, this could prove to be very challenging, especially at scale.

Cost of DIY

Once you’ve evaluated the actual requirements of your project, the next step is to figure out what the cost of DIYing exactly what you need would be. Engineering time should be the best measure in most cases. It’s important not to fall into the trap of thinking you have to recreate the existing solution completely. The key is to take into account only the subset of features that you actually require. For many open-source projects, a lot of the complexity comes from trying to cater to as many use cases as possible, which means making the code as flexible as possible. SaaS solutions usually aim for the same flexibility, but the resulting complexity is more hidden. And again, know your problem: There is no point in building a configuration UI just because the existing solution has one when you can get away perfectly well with some hard-coded values.

For Collective, this process meant cutting the set of required features down to single-file versioning with support for diffing and merging. And given that these are generally solved problems, for which you can easily find ready-made implementations, the cost of DIY turned out to be rather low.
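
To give a sense of how small such a feature set can be, here is a minimal sketch of single-file versioning with line-based diffing. This is not Collective’s actual code: the DocumentHistory class, the DiffOp type, and the diffLines function are hypothetical names, and the diff uses a textbook longest-common-subsequence approach. Merging is left out to keep the example short.

```typescript
// One step of a line-based diff.
type DiffOp = { kind: "keep" | "add" | "remove"; line: string };

// Line-based diff built on a longest-common-subsequence (LCS) table.
function diffLines(before: string, after: string): DiffOp[] {
  const a = before.split("\n");
  const b = after.split("\n");

  // lcs[i][j] holds the LCS length of a[i..] and b[j..].
  const lcs: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    lcs.push(new Array<number>(b.length + 1).fill(0));
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both files from the top, following the table.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: "keep", line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "remove", line: a[i] });
      i++;
    } else {
      ops.push({ kind: "add", line: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ kind: "remove", line: a[i++] });
  while (j < b.length) ops.push({ kind: "add", line: b[j++] });
  return ops;
}

// Append-only history of a single document, kept in memory.
class DocumentHistory {
  private versions: string[] = [];

  // Stores a new version and returns its version number.
  commit(content: string): number {
    this.versions.push(content);
    return this.versions.length - 1;
  }

  get(version: number): string {
    const content = this.versions[version];
    if (content === undefined) {
      throw new Error(`Unknown version: ${version}`);
    }
    return content;
  }

  diff(from: number, to: number): DiffOp[] {
    return diffLines(this.get(from), this.get(to));
  }
}

const history = new DocumentHistory();
const v0 = history.commit("Title\nFirst draft");
const v1 = history.commit("Title\nSecond draft\nNew line");
console.log(history.diff(v0, v1));
```

A real system would still need persistence and a merge step, but neither requires a repository on disk or a command line wrapper.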

Future Flexibility

As your project and list of requirements grow, you will inevitably outgrow some technologies, as in the database example from earlier. The issue to keep in mind here is that the more specific a solution is and the less control you have over it, the bigger the risk that you will outgrow it. A document database is already very generic and universally usable, and in many cases, it remains a perfectly sufficient solution for a very long time. But with something as specific as Git, with its focus on software development, there is a non-negligible risk that Collective will outgrow it. On the other hand, if you built the tools yourself, it doesn’t matter how specific they are. Because your DIY solution is ideally very focused, it should be comparatively trivial to adapt to the updated requirements, especially compared to forking and tweaking a general-purpose open-source solution.

Conclusion

Know what your actual requirements are. Evaluate the cost of building the solution yourself. Should the cost of DIYing a perfect fit turn out to be lower than or only slightly above the buy option, once you also consider the cost of integrating the latter into your setup, then I believe there is a strong case in favor of DIY over buy, especially if you expect your requirements to change in the foreseeable future.