- There is too little code reuse in software, particularly at the community level
- Examples of useless duplication (non-DRY) include many programming languages, package managers, data-stores, and tools
- This community-level duplication means engineers have to learn different interfaces to nearly identical systems, and it reduces the interoperability of tools
Section 1: Some examples
1. Why is there more than one unix/linux package manager? Do we really need a package manager with the same commands but renamed for every programming language? Do we really need a distinct package manager for each distro? I can’t think of any case where different programs are necessary.
2. Every new language reimplements the same standard library. This leads to a scenario where “learning a language” is more about learning the library than anything else (e.g. “How do timezones work again in PHP?”)
3. MongoDB never should have existed as a standalone database; it should have been a storage engine. The concept of a datastore that adapts its schema on-the-fly and drops relations for speed is fine, but there’s no reason the entire database had to be reinvented to allow it. There’s no reason the entire query syntax had to be reinvented. There’s no reason the security policy and all the DB drivers had to be reinvented. There’s no reason all the tools for visibility (e.g. Sequel Pro) and backup had to be reinvented. Plus, if it were just a storage engine, migrating tables to InnoDB would be easier.
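To make the “storage engine, not new database” idea concrete, here is a minimal sketch of a schema-on-the-fly document store living inside an ordinary SQL database. It uses SQLite’s built-in JSON functions (bundled by default in modern builds); the table and field names are made up for illustration.

```python
import sqlite3

# Hypothetical sketch: documents of arbitrary shape stored in a plain SQL
# table, queried with the standard query language -- no reinvented database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")

# Insert documents with whatever shape we like; no migration needed.
conn.execute("INSERT INTO docs (body) VALUES (?)",
             ('{"name": "alice", "age": 30}',))
conn.execute("INSERT INTO docs (body) VALUES (?)",
             ('{"name": "bob", "tags": ["admin"]}',))

# Query the flexible documents with ordinary SQL via json_extract.
rows = conn.execute(
    "SELECT json_extract(body, '$.name') FROM docs "
    "WHERE json_extract(body, '$.age') >= 18"
).fetchall()
print(rows)  # [('alice',)]
```

All the usual drivers, backup tools, and permission systems keep working, because the “document store” is just a table.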
The same point holds for Cassandra (basically MySQL with sharding built in), Elasticsearch, and even Kafka (basically just the log part of MySQL, without columns). For example, a Kafka topic could be seen as a table with two columns: offset and value. Remember, storage engines can process different variations on SQL to handle any special functionality or performance characteristics as needed.
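The topic-as-table view above can be sketched in a few lines. This is an illustration of the analogy, not how Kafka is actually implemented; the table name and offsets are assumptions.

```python
import sqlite3

# Illustrative sketch: a "topic" as an append-only table (offset, value).
conn = sqlite3.connect(":memory:")
conn.execute('''CREATE TABLE topic_events (
    "offset" INTEGER PRIMARY KEY AUTOINCREMENT,
    value    BLOB
)''')

# Producer: append messages; offsets are assigned monotonically.
for msg in [b"signup", b"login", b"logout"]:
    conn.execute("INSERT INTO topic_events (value) VALUES (?)", (msg,))

# Consumer: read everything past its last committed offset, in order.
last_committed = 1
batch = conn.execute(
    'SELECT "offset", value FROM topic_events '
    'WHERE "offset" > ? ORDER BY "offset"',
    (last_committed,),
).fetchall()
print(batch)  # [(2, b'login'), (3, b'logout')]
```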
4. Overly-specialized technologies should not exist. Ever seen a fancy dinner set where, for “convenience”, guests are offered 5 forks and spoons, each meant to be used slightly differently for a slightly different task? That’s how I feel about specialized technologies. For example, people seem to love job queues. All job queues should be built on top of a SQL backend so that engineers get the normal benefits:
- engineers know how to diagnose the system if it fails because it’s a common one (e.g. performance issues, permissions)
- engineers can always query the system to see what’s happening because it’s using a standardized query language
- engineers can modify the system if necessary because it provides visibility into its workings
- engineers can use existing backup, replication, and other technologies to store/distribute the queue (giving interoperability)
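The job-queue-on-SQL idea above can be sketched in a few lines. This is a single-process sketch with made-up table and column names, not a production queue; a real version would claim jobs inside a transaction (or with `UPDATE ... RETURNING`) to stay safe across workers.

```python
import sqlite3

# Minimal sketch: a job queue as a plain SQL table. Any engineer can
# inspect it with ordinary SELECTs, back it up, or replicate it.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'pending'
)""")

def enqueue(payload):
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))

def claim():
    # Grab the oldest pending job and mark it running.
    # (Not atomic across processes -- a real version would use a
    # transaction or UPDATE ... RETURNING.)
    row = conn.execute(
        "SELECT id, payload FROM jobs "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row:
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?",
                     (row[0],))
    return row

enqueue("send_welcome_email")
enqueue("rebuild_search_index")
print(claim())  # (1, 'send_welcome_email')
```

If the queue misbehaves, diagnosing it is just diagnosing a database table, which is the point of the bullets above.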
Section 2: What’s the result of all this?
- Senior Engineers are all set back years relative to junior ones (which is bad for senior engineers, good for junior engineers)
- The ecosystem is set back as a whole (all tools and libraries that interact with the old technology must be rebuilt for the new one)
- The company is placed in a precarious position because it now has only junior engineers in the given technology. Did I tell you about the time the place I worked accidentally lost most of its customers’ phone numbers? Their PHP driver for Mongo would convert numeric strings to numbers, and phone numbers would overflow the default integer, resulting in no fatal errors but simply negative phone numbers.
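To see how that anecdote’s failure mode works, here is an illustrative reconstruction (not the actual driver code): a 10-digit phone number stored as a string gets silently coerced into a 32-bit signed integer and wraps negative.

```python
def to_int32(n):
    """Wrap n the way a 32-bit signed integer would (max 2,147,483,647)."""
    return ((n + 2**31) % 2**32) - 2**31

phone = "4155551234"           # hypothetical US number, stored as a string
stored = to_int32(int(phone))  # a driver silently converts string -> int
print(stored)                  # -139416062: no error, just a negative "phone number"
```

Because nothing throws, the corruption is invisible until someone tries to dial the data.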
- The company runs the risk of being saddled with a technology that will be dropped (e.g. CouchDB, Backbone) and will require either a rewrite back to a standard technology or being perceived as behind the times.
- Slow-learning or part-time engineers must keep pace with the changing landscape or face irrelevance. Those who can’t learn 10 technologies a year (a storage technology, a build tool, a package manager, a scripting language, a data-monitoring tool, 2 infrastructure tools, 5 libraries, etc.) will stumble. Those who can learn, say, 20 technologies a year spend their time learning the particulars of those technologies: how their config files work, their gotchas and bugs, how their performance scales, how to read their logs. But then 75% of this progress gets undone as these particular technologies get phased out. It’s a treadmill effect: engineers have to keep running (keep learning new technologies) just to stay in place, and if you can’t keep pace with the treadmill you get thrown off the machine.
Section 3: The exceptions
There are a few exceptions I can think of where a complete rebuild from scratch was an improvement. One would be Git. In a few months, one of the most prominent software geniuses of our era invented a source-control system so superior to everything else that it was adopted almost universally within a few years, despite its intimidating interface.
The times a rebuild is justified seem to be when many of these criteria apply:
- You’re a known and well-respected name that people trust so much the community might standardize on what you make (e.g. Linus Torvalds, Google)
- The existing systems are all awful in fundamental ways, not merely in easily-patchable ways (e.g. the UI)
- You can make your system backward compatible (C++ allows assembler, Scala allows Java) and thus can reuse existing knowledge and tools
- You’re so smart and far from average that your system isn’t going to have the myriad of unanticipated flaws that most would-be replacement systems end up with (for example, Angular, Backbone, and NoSQL are all community failures)
- Your system is already built-in or easily-integrated with existing systems (e.g. JSON being interpretable in all browsers automatically)
Section 4: What can one do?
- Wait years before adopting a technology for a major professional use-case: let other companies beta-test it for you and file all the bug reports.
- Roll your eyes the next time somebody tells you about a new sexy technology
- Next time you have a brilliant idea, instead of thinking “How great it would be if the entire dev ecosystem adapted itself to use my invention” think “Is there any open-source project out there that can be minimally adapted to reap this same benefit?”