Sopping Wet — Today’s Software Ecosystem Isn’t DRY [and nobody seems to understand or care]

TL;DR:

  • Everyone seems to understand DRY is good at the program level, but they don’t seem to understand it at the community level.
  • Examples of useless duplication include many programming languages, libraries, package managers, data stores, and tools.
  • This community duplication reduces interoperability and slows productivity across the board.

Section 1: Some examples

1. Why is there more than one Unix/Linux package manager? Do we really need a different package manager with the same commands but renamed for each programming language? Do we really need a distinct package manager for each distro? These are rhetorical questions: no, we don’t.

2. Nobody seems to admit it, but PHP, Ruby, Python, and JavaScript are the same language, with a little sugar added here or there and different libraries. I’m okay if not everybody wants to use curly braces and some would rather indent instead, but I’m not okay with every library for every functionality (date parsing, database connectivity, HTML parsing, regex, etc.) being rewritten as a distinct library for every language when those languages have almost no significant differences.

This leads to a scenario where “learning a language” is more about learning the libraries than anything else (e.g. “How do timezones work again in PHP?”).

3. MongoDB never should have existed. MongoDB should be a storage engine. The concept of a datastore that adapts its schema on-the-fly and drops relations for speed is okay, but there’s no reason the entire data-storage technology has to be reinvented to allow this. There’s no reason the entire query syntax has to be reinvented. There’s no reason the security policy and all the DB drivers have to be reinvented. There’s no reason all the tools to get visibility (sql pro) and back up the database need to be reinvented. Plus, if it were just a storage engine, migrating tables to InnoDB would be easier.

The same point holds for Cassandra (which is basically MySQL with sharding built in), Elasticsearch, and even Kafka (basically just the write-ahead log of MySQL without columns). For example, a Kafka topic could be seen as a table with the columns: offset, value. Remember that storage engines can process different variations on SQL to handle any special functionality or performance characteristics as needed.
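To make the "topic as a table" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not how Kafka works internally; it only shows that an append-only log with offset and value columns can be expressed in plain SQL. The function names (make_topic, produce, consume) are made up for this illustration.

```python
import sqlite3


def make_topic(conn, name):
    # One table per topic. "offset" is quoted because OFFSET is an SQL keyword.
    # The table name is interpolated directly, so only pass trusted names.
    conn.execute(
        f'CREATE TABLE "{name}" ('
        '"offset" INTEGER PRIMARY KEY AUTOINCREMENT, '
        "value TEXT NOT NULL)"
    )


def produce(conn, name, value):
    # Appending a message is just an INSERT; the database assigns the offset.
    cur = conn.execute(f'INSERT INTO "{name}" (value) VALUES (?)', (value,))
    return cur.lastrowid


def consume(conn, name, after_offset, limit=100):
    # A consumer remembers the last offset it saw and asks for what follows.
    return conn.execute(
        f'SELECT "offset", value FROM "{name}" '
        'WHERE "offset" > ? ORDER BY "offset" LIMIT ?',
        (after_offset, limit),
    ).fetchall()
```

A consumer that has processed everything up to offset 1 would call consume(conn, "events", 1) and get back only the later rows, which is the same bookkeeping a log consumer does.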

4. Overly-specialized technologies should not exist (unless built directly around a general technology). You ever see a fancy dinner-set, where for “convenience” people are offered 5 forks and spoons, each one meant to be used slightly differently for a slightly different task? That’s how I feel about overly-specialized technologies. For example, people seem to love job queues. All job queues should be built on top of a SQL backend so that engineers get the normal benefits:

  1. engineers know how to diagnose the system if it fails because it’s a common one (e.g. performance issues, permissions)
  2. engineers can always query the system to see what’s happening because it’s using a standardized query language
  3. engineers can modify the system if necessary because it provides visibility into its workings
  4. engineers can use existing backup, replication, and other technologies to store/distribute the queue (giving interoperability)
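As a rough illustration of the list above, a job queue can be a single table. This is a sketch using Python's built-in sqlite3 module, not a production design; the table layout and the function names (open_queue, enqueue, claim_next, finish, queue_depth) are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'queued'  -- queued | running | done
)
"""


def open_queue(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, payload):
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
        return cur.lastrowid


def claim_next(conn):
    """Mark the oldest queued job as running and return (id, payload)."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard means a job claimed elsewhere is not claimed twice.
        updated = conn.execute(
            "UPDATE jobs SET status = 'running' "
            "WHERE id = ? AND status = 'queued'",
            (row[0],),
        ).rowcount
        return row if updated == 1 else None


def finish(conn, job_id):
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))


def queue_depth(conn):
    # Visibility for free: anyone who knows SQL can ask what the queue is doing.
    rows = conn.execute("SELECT status, COUNT(*) FROM jobs GROUP BY status")
    return dict(rows.fetchall())
```

With many concurrent workers on a server database, the claim step would more likely use something like SELECT ... FOR UPDATE SKIP LOCKED, which PostgreSQL and MySQL 8 support. The point of the sketch is only that inspection, backup, and replication come from the database rather than from queue-specific tooling.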

Section 2: What’s the result of all this?

  • Senior Engineers are all set back years relative to junior ones (which is bad for senior engineers, good for junior engineers)
  • The ecosystem is set back as a whole (all tools, libraries that interact with the old technology are rebuilt for the new one)
  • The company is placed in a precarious position because it now only has junior engineers in the given technology. Did I tell you about the time the place I worked accidentally lost most of their customers’ phone numbers, because their PHP driver for Mongo would convert numeric strings to numbers, and phone numbers would overflow the default integer, resulting in no fatal errors but simply negative phone numbers?
  • The company runs the risk of being saddled with a technology that will be dropped (e.g. CouchDB, Backbone) and will require a rewrite back to a standard technology or be perceived as behind-the-times.
  • Slow-learning / part-time engineers must keep pace with the changing landscape or face irrelevance. Those that can’t learn 10 technologies a year (a storage technology, a build tool, a package manager, a scripting language, a data-monitoring tool, 2 infrastructure tools, 5 libraries, etc.) will stumble.
  • Fast-paced engineers will lose half of their learning capacity on trivialities and gotchas of each technology’s idiosyncrasies (e.g. why can’t Apache configs and nginx configs bear any resemblance to each other?). Once these technologies are phased out, all of that memorization is for naught. It’s a treadmill effect: engineers have to sprint (keep learning new technologies) to move forward at all, walk just to stay in place, and if you can’t keep pace with the treadmill you get thrown off the machine.
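The phone-number story in the list above is the kind of gotcha that is easy to reproduce. The sketch below is a Python simulation, not the actual PHP driver code: it assumes the driver coerced all-digit strings to a signed 32-bit integer with two's-complement wraparound, which is a guess at the mechanism. Both wrap_int32 and lossy_store are hypothetical names.

```python
def wrap_int32(n):
    """Reduce n to the value a signed 32-bit two's-complement integer would hold."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def lossy_store(value):
    # Hypothetical stand-in for a driver that "helpfully" coerces numeric strings.
    if isinstance(value, str) and value.isdigit():
        return wrap_int32(int(value))
    return value


print(lossy_store("2125550123"))  # fits in 32 bits, survives as 2125550123
print(lossy_store("4155550123"))  # exceeds 2**31 - 1, comes back as -139417173
```

Nothing raises an error in either case, which is why a bug like this can go unnoticed until the data is already gone.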

 

Section 3: The exceptions

There are a few exceptions I can think of when a complete rebuild from scratch was an improvement. One would be Git. In a few months, one of the most prominent software geniuses of our era invented a source-control system so superior to everything else that it has been adopted universally in a few years, despite the intimidating interface.

The times a rebuild is justified seem to be when many of these criteria apply:

  • You’re a known and well-respected name that people trust so much the community might standardize on what you make (e.g. Linus Torvalds, Google)
  • The existing systems are all awful in fundamental ways, not just in simple, easily patchable ways. You’ve got the ability, the time [and we’re talking at least a decade of support], and the money to dedicate yourself to this project (Git, AWS, Gmail, jQuery in 2006)
  • You can make your system backward compatible (e.g. C++ allows C, C allows assembler, Scala allows Java, many game systems and storage devices can read previous-generation media) and thus can reuse existing knowledge, libraries, and tools
  • You’re so smart and not-average that your system isn’t going to have the myriad unanticipated flaws that most would-be replacement systems have. For example, Angular, Backbone, and NoSQL are all community fails. I theorize Go, Clojure, Haskell, Ruby, and several other high-buzz languages will evaporate.
  • Your system is already built-in-to or easily-integrated-with existing systems (e.g. JSON being interpretable in all browsers automatically, moving your service to the web where it will work cross-platform and be accessible without installation)

Section 4: What can one do?

  1. Learn the technologies that have stood the test of time: Linux CLI, C++/Java, JavaScript, SQL
  2. Wait years before adopting a technology in professional use for a major use case; let other companies be the guinea pigs
  3. Roll your eyes the next time somebody tells you about a new sexy technology. For whatever reason, it’s culturally “cool” to know about the “next big thing,” but professionals need to rise above such fads.
  4. Next time you have a brilliant idea, instead of thinking “How great it would be if the entire dev ecosystem adapted itself to use my invention” think “Is there any open-source project out there that can be minimally adapted to accomplish my goal?”

One thought on “Sopping Wet — Today’s Software Ecosystem Isn’t DRY [and nobody seems to understand or care]

  1. I have been programming for 25 years and I can’t agree more. I feel that my skills are being hobbled by just the issues you raise and am exhausted by ‘trying to keep up’. Gradually, I am drawn to tools that resist change better, e.g. C and SQL.
