Mar 9, 2021

Digital Maneuver 20200309: Real-time Analytics and Distributed Systems

Thank you for subscribing to the Digital Maneuver newsletter.

If you find this content useful, feel free to forward it to others you think will benefit.

If this was forwarded to you by someone and you're missing messages as they're sent, you can subscribe now.

If you have feedback you'd like to share, feel free to email me at adam@digitalmaneuver.com

Trunk-Based Development

When deciding how software development teams will work together on complex projects, you need to settle how they will use version control very early on. If you are setting up a verified and approved software development process in private industry, or in the public sector where the development and scanning process is part of your Authority to Operate (ATO), the way your developers manage their version control system is critical.

Some approaches that were very much en vogue in the early 2010s, most notably GitFlow, are still viable in some cases, but they were designed around the idea of discrete software versions from the days when you shipped media in a shrink-wrapped box. If you're doing modern CI/CD and deploying regularly from your primary branch (master in git parlance), then you should not be using the GitFlow model. GitFlow relies on several long-lived branches (a develop branch alongside master, plus release and hotfix branches), and teams commonly map long-lived branches like these onto separate environments such as development, testing, and production. Problems abound in this approach and I won't enumerate all of them here. For more details you can check out articles like GitFlow Considered Harmful.

So what to do then?

For modern software development, Trunk-Based Development is absolutely the way to go.

In Trunk-Based Development, developers branch off the main branch (master) to build a small feature or bug fix, and that small branch is then merged back into master, usually after a code review. By working in these small increments and keeping feature branches short-lived, a software development team can dramatically reduce the risk and complexity of its operations.
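To make "short-lived" concrete, here is a minimal Python sketch of a check a team might run to flag feature branches that have stayed open too long. The two-day limit, the function name long_lived_branches, and the branch names are hypothetical; the input is each branch's creation time, however your tooling records it.

```python
from datetime import datetime, timedelta, timezone

TRUNK = "master"
# Hypothetical team policy: feature branches should merge back within two days.
MAX_BRANCH_AGE = timedelta(days=2)


def long_lived_branches(created_at: dict[str, datetime], now: datetime,
                        max_age: timedelta = MAX_BRANCH_AGE) -> list[str]:
    """Return the non-trunk branches that have been open longer than max_age."""
    return sorted(
        name for name, created in created_at.items()
        if name != TRUNK and now - created > max_age
    )


# Hypothetical repository state.
now = datetime(2021, 3, 9, tzinfo=timezone.utc)
branches = {
    "master": datetime(2019, 1, 1, tzinfo=timezone.utc),      # the trunk is long-lived by design
    "fix-login-typo": now - timedelta(hours=5),               # small, fresh branch
    "feature/new-billing": now - timedelta(days=12),          # drifting away from trunk
}
print(long_lived_branches(branches, now))  # ['feature/new-billing']
```

The trunk is exempt because it is the one branch meant to live forever; everything else is expected to merge back quickly, so anything on this list is a candidate for splitting into smaller pieces.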

If you're helping to design or oversee development teams and aren't sure how to structure the process of committing and reviewing code, Trunk-Based Development is generally the answer.

Whom the Gods Would Destroy They First Give Real-time Analytics

It's often requested that data or analytics be real-time. Sometimes real-time data or analytics is even listed as a constraint a product or service must satisfy, but it is almost never a genuine requirement. Wanting real-time anything is usually a poorly considered desire, because people don't realize that data updated too often leads to suboptimal decisions.

One side effect of humans having access to more frequently updated data is that they make decisions too quickly because of the false confidence that the updated data provides. This can be extremely harmful and can lead to very bad product and business decisions.

Dan McKinley discusses this topic in his post on the subject, but I've found the easiest way to help clients gauge how current their data needs to be is with the following heuristic:

Your data on a topic should never be updated faster than the timescale on which you're willing to change your mind.

Put yourself in the role of a CEO managing a large multinational organization with offices all over the world. What time period of poor financial performance would be required in order for you to decide to close the offices and cease operations in a given country?

Surely you would not cease operations in that country if it didn't make money for 5 minutes. How about 5 months? What about 5 years? In reality, decisions like that are usually made after a trend spanning multiple quarters or years. For that reason, the data behind such a decision only needs to be updated on the scale of quarters or years. Data more current than that would add no value to your decision-making process and would make you more likely to decide prematurely in reaction to short-term variation. People recognize this in many areas of their lives, but don't quite realize that it applies generally.
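A small calculation shows why short windows invite premature decisions. This Python sketch assumes a hypothetical office whose daily results are independent and normally distributed, profitable on average (+1 per day) but noisy (standard deviation of 10 per day); the numbers and the function name are made up for illustration. It computes how often a window of a given length would show a loss even though the office is profitable.

```python
import math


def chance_window_shows_loss(mean_daily: float, sd_daily: float, days: int) -> float:
    """P(average result over `days` < 0) for i.i.d. normal daily results."""
    # The mean of n i.i.d. N(mu, sigma^2) draws is N(mu, sigma^2 / n),
    # so the signal-to-noise ratio grows with sqrt(n).
    z = mean_daily * math.sqrt(days) / sd_daily
    # Standard normal CDF evaluated at -z.
    return 0.5 * (1.0 + math.erf(-z / math.sqrt(2.0)))


# Hypothetical office: +1 per day on average, standard deviation 10 per day.
for label, days in [("1 day", 1), ("1 quarter", 91), ("1 year", 365), ("5 years", 5 * 365)]:
    p = chance_window_shows_loss(1.0, 10.0, days)
    print(f"{label:>10}: {p:.1%} chance the window shows a loss")
```

Under these assumptions a single day shows a loss about 46% of the time, while a full year does so less than 3% of the time. Looking at the daily number more often doesn't tell you more about whether the office is viable; it just hands you more chances to react to noise.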


Distributed Systems in One Lesson

Distributed systems are made up of multiple independent systems, usually (hopefully) with separate storage layers and different concepts of time, that all operate independently of one another. Microservices are one (still trendy) example of a distributed system that many people come across.
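"Different concepts of time" is one of the trickiest parts. Without a shared clock, nodes can't simply compare wall-clock timestamps to know which event happened first. One classic technique (swapped in here for illustration, not something taken from the talk below) is the Lamport logical clock. The Python sketch below is minimal; the Node class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One process in a distributed system, keeping a Lamport logical clock."""
    name: str
    clock: int = 0
    log: list[tuple[int, str]] = field(default_factory=list)

    def _record(self, event: str) -> int:
        self.log.append((self.clock, event))
        return self.clock

    def local_event(self, what: str) -> int:
        # Rule 1: tick the clock before every local event.
        self.clock += 1
        return self._record(what)

    def send(self, what: str) -> tuple[int, str]:
        # Sending is also an event: tick, then stamp the message with the new time.
        self.clock += 1
        self._record(f"send {what}")
        return (self.clock, what)

    def receive(self, message: tuple[int, str]) -> int:
        # Rule 2: on receipt, move past the sender's timestamp if it is ahead of ours.
        sent_at, what = message
        self.clock = max(self.clock, sent_at) + 1
        return self._record(f"recv {what}")


a, b = Node("a"), Node("b")
for _ in range(3):
    a.local_event("work")   # a's clock reaches 3
msg = a.send("hello")       # stamped with 4
b.receive(msg)              # b was at 0, jumps to 5
print(b.log)                # [(5, 'recv hello')]
```

The guarantee is narrow but useful: a message is always received at a later logical time than it was sent, so cause precedes effect. What it does not give you is real wall-clock time or a way to tell whether two unrelated events were concurrent, which is part of why distributed systems are hard.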

Building distributed systems is hard, and the increased complexity is very rarely worth it unless the scale of processing and computation is exceptionally large.

Tim Berglund has a great ~45-minute overview of the landscape, benefits, and difficulties of distributed systems in his video Distributed Systems in One Lesson (YouTube). At the time of the video, Tim was working at Confluent, a company that provides products and services around the Apache Kafka messaging system. Amazon offers a comparable service called Kinesis, for example.

The lecture is very accessible even to non-technologists, and I recommend it highly, especially if you are involved in making decisions or overseeing projects where a distributed systems architecture may or should be involved. For the government folks, any system that must function in a degraded or denied communication environment is automatically a distributed system.

Capability IMmaturity Model

You may have heard of Capability Maturity Model Integration (CMMI), the successor to the original Capability Maturity Model (CMM), which is an appraisal and process improvement training program developed at Carnegie Mellon University. It's often required for a variety of acquisition efforts, particularly U.S. Government contracts. The CMM(I) has five maturity levels, from Level 1 (Initial) through Level 5 (Optimizing).

Level 1, the least mature, indicates that the process or organization lacks documentation, is in a state of dynamic change, and tends to be driven in an ad hoc, uncontrolled, and reactive manner by users or events.

Level 5, the most mature, indicates that the process or organization is being continually improved through incremental and innovative technological changes or improvements.

A lot of you probably feel like you're in a Level 1 situation.

What you might not know is, it gets worse! There's a Capability Immaturity Model.

The Capability Immaturity Model (PDF) was originally published in 1992, and the basic idea is that Level 1 of the CMM doesn't adequately describe the depth of dysfunction in many software organizations. The Capability Immaturity Model therefore adds levels 0 (Negligent), -1 (Obstructive), -2 (Contemptuous), and -3 (Undermining).

I'm particularly fond of -2 (Contemptuous):

The organization’s ineffectiveness has become apparent to the marketplace or the larger organization, which ignores or attempts to neutralize these unfavorable perceptions. Measurements are fudged to make the organization look good. Measures of activity (bugs fixed, lines of code written, hours worked) replace measures of productivity (% functions completed, test success rates). Volatility in specifications and schedules is recast as evidence of organizational “agility.” Certifications on “best processes” are presented as evidence that the organization is performing optimally; poor results are blamed on factors outside the organization's control. The processes chosen typically omit or shortcut essential components of recognized methods (e.g. “6-week Six-Sigma” or “Lean CMM”), which are flexible and can cover both good and bad practices. The organization becomes committed to ineffective processes, leading to a feedback cycle of increasing disorganization.

For many organizations, it's better to start with the levels in the Capability Immaturity Model and see how you can work your way up rather than start with the CMM and get too far ahead of yourself. Starting at a level that doesn't really represent the organization will simply lead to overreach and frustration as continued attempts to change the culture are stymied by ossification and bureaucracy.

For feedback or to provide contributions, you can email me at adam@digitalmaneuver.com