Conway’s Law Is Real

Kellen Evan

2018/10/13

Categories: Technology Writing

Conway’s Law was first articulated in 1967 by computer programmer Melvin Conway. Examining it reveals a technique that can be applied for a radical leap forward in the development of complex programmatic systems:

“organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”

… But it might not be what you think.

Prove it!

In 2011, Harvard Business School published a paper: Exploring the Duality between Product and Organizational Architectures: A Test of the “Mirroring” Hypothesis. The collaborative work, conducted by MacCormack, Rusnak and Baldwin, studied what they called the “Mirroring Hypothesis”, aka “Conway’s Law”. The team analyzed software patterns to better understand the relationship between people and the systems they create.

During their analysis they separated organizations into two groups: tightly coupled and loosely coupled.

Table from “A Test of the ‘Mirroring’ Hypothesis”

              Tightly Coupled          Loosely Coupled
Goals         Shared, Explicit         Diverse, Implicit
Membership    Closed, Contracted       Open, Voluntary
Authority     Formal, Hierarchical     Informal, Meritocratic
Location      Centralized, Colocated   Decentralized, Distributed
Behaviour     Planned, Coordinated     Emergent, Independent

What did they learn? A key concept used to frame their findings is Propagation Cost.

They defined it as:

“the percentage of system elements that can be affected, on average, when a change is made to a randomly chosen element.”

A function or file has a propagation cost, which indicates the overall percentage of the system that is impacted when it is altered. In other words: if you change a file, how many other files are affected by your change? Did your change break a bunch of things? That is a high propagation cost.
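To make the definition concrete, here is a minimal sketch in Python. It treats the codebase as a dependency graph and reports the average fraction of files reachable, directly or transitively, from a change to any single file. The function name and toy dependency maps are illustrative, not taken from the paper, and this version counts the changed file itself as affected.

```python
def propagation_cost(deps):
    """Average fraction of files reachable from a change to any one file.

    deps maps each file to the list of files that depend on it.
    """
    files = set(deps) | {f for targets in deps.values() for f in targets}
    n = len(files)
    total_reachable = 0
    for start in files:
        # Breadth-first walk over transitive dependents of `start`.
        seen, frontier = {start}, [start]
        while frontier:
            nxt = []
            for f in frontier:
                for dep in deps.get(f, ()):
                    if dep not in seen:
                        seen.add(dep)
                        nxt.append(dep)
            frontier = nxt
        total_reachable += len(seen)  # includes the changed file itself
    return total_reachable / (n * n)

# A tightly coupled system: every file depends on every other file.
tight = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
# A loosely coupled system: a single chain of dependents.
loose = {"a": ["b"], "b": ["c"], "c": []}

print(propagation_cost(tight))  # 1.0 -- any change touches the whole system
print(propagation_cost(loose))  # ~0.67 -- changes mostly stay local
```

In the tightly coupled example, a change anywhere reaches everything; in the chain, most changes touch only part of the system, which is exactly the difference the study measured between the two kinds of organizations.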

By comparing open source projects to projects that were first private, and then open sourced, the paper established a clear link: tightly coupled organizations build systems with significantly higher propagation cost while loosely coupled organizations build systems with a much lower propagation cost.

The reason seems to be that contributors to loosely coupled systems do not share an existing social context, and therefore write more approachable code so that anyone can make alterations to the system with minimal background. In other words: open source authors are aware that their code will be seen by the broad public, and just as a writer cleans up their language for a broad or international audience, so the programmer does with their code. It is a function of applied empathy for the reader, or the next contributor.

But why does a tight-knit, tightly coupled physical structure produce more tightly coupled programmatic systems? We can imagine you and your peers gathered around one computer to solve a problem and “co-author” a solution. Deep context is presumed inside of the solution as you all think through it together, in the office, over lunch, or wherever you go.

In a close office environment, there is little motivation to factor accessibility into the code, because the physical interaction makes the context apparent to the contributors. The primary author can also be relied upon to “be there” if clarity is ever needed. Repeat this for the development of most of the system, and you can see how a “shared social mind” results in a contraction of the programmatic surface area: the undocumented creative context shared by those present gets bound into code that is more tightly coupled, with a higher propagation cost.

The study and its implications provide partial proof of Conway’s Law: systems reflect the communication structures within which they are developed.

A second study, The Influence of Organizational Structure on Software Quality: An Empirical Case Study by Nagappan, Murphy and Basili of Microsoft, pushed the idea further. It revealed that Conway’s Law is bidirectional, and that code-level failure metrics can also indicate organizational problems:

“…organizational metrics when applied to data … were statistically significant predictors of failure-proneness.”

But what were these “organizational metrics” that they collected? And how did these metrics speak towards software failure proneness?

And does this mean that code can be analyzed to see where organizations need to change?

The following table has been adapted from the article, with the descriptions expanded for clarity and context. A low failure rate is good; a high failure rate is bad.

Number of engineers: The greater the number of engineers that touched the code, the higher the failure rate.

Number of ex-engineers: Teams that lost members had a decline in knowledge retention, and therefore an increase in failure rate.

Edit frequency: The more edits to a component, the higher the instability, and thus the failure rate.

Depth of Master Ownership: Components that were “owned” by a “master” – a single person with 75% or more of the edits – had lower failure rates.

Percentage of Org contributing to development: The lower the overall percentage of the organization that contributed to development, the lower the failure rate.

Level of organizational code ownership: The more code contributors outside of the core working group, the higher the failure rate.

Overall Organization Ownership: The ratio of “masters” making edits to code compared to the total number of engineers. The higher the proportion of “master” contributions, the lower the failure rate.

Organization Intersection Factor: The more organizations that contributed 10% or more of the edits to a component, the higher the failure rate.
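Several of these measures can be approximated directly from version-control history. Here is a minimal Python sketch, assuming commit records as (author, component) pairs; the function name, the 75% threshold handling, and the sample data are illustrative, not the study’s actual tooling.

```python
from collections import Counter, defaultdict

def ownership_metrics(commits, master_threshold=0.75):
    """Summarize per-component ownership from (author, component) records."""
    per_component = defaultdict(Counter)
    for author, component in commits:
        per_component[component][author] += 1

    report = {}
    for component, counts in per_component.items():
        total = sum(counts.values())
        top_author, top_edits = counts.most_common(1)[0]
        report[component] = {
            "engineers": len(counts),   # "Number of engineers"
            "edits": total,             # "Edit frequency"
            # "Depth of Master Ownership": one person with >= 75% of the edits
            "master": top_author if top_edits / total >= master_threshold else None,
        }
    return report

# Hypothetical commit log: authors and the components they touched.
commits = [
    ("alice", "auth"), ("alice", "auth"), ("alice", "auth"), ("bob", "auth"),
    ("bob", "billing"), ("carol", "billing"), ("dave", "billing"),
]
print(ownership_metrics(commits))
# "auth" has a master (alice, 75% of edits); "billing" is shared three ways.
```

Under the study’s indications, the “auth” component here would be the lower-risk one: fewer engineers, and a clear master owner.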

From the quality indications, we can describe an “ideal development team” as one that is small and stable, retains its members, has a clear “master” owner for each component, avoids excessive churn of edits, and keeps contributions within a single working group.

After developing a means to determine what success and failure look like within a system, the team’s conclusion was this:

“… organizational measures predict failure-proneness … with significant precision, recall and sensitivity.”

This is fascinating! And it pushes the proof of Conway’s Law even further: organizations can be analyzed to predict the quality of programmed work, and vice versa. But it is easy to “miss the forest for the trees” when drawing conclusions from other software systems, given the near-infinite nature of their potential complexity. And social dynamics are so rooted in chaos that one wonders about the repeatability and transferability of these studies to most environments. But this is science, and so we emerge with two key takeaways and consider them to be true:

1) Systems reflect the communication structures within which they are developed.

2) Organizations can be analyzed to predict the quality of programmed work and vice versa.

Or, in short, we agree that Conway’s Law is true.

The Reverse Conway

Awareness of Conway’s Law can help organizations establish healthy social structures as projects grow. Microservice architecture, for example, offers social scaling as a key benefit of its method: architect your code into modular microservices first, and the optimal organizational structure will follow. It is marketed as Conway’s Law in reverse. Simpler, tidier systems with healthy communication create simpler, tidier teams with healthier communication, or so it is suggested.

It is a fair proposal: when we think back to the “ideal team”, a microservice team will be small, have a “master”, and have only one working group. But it can still be impacted by churn, make many updates, and work across other organizational teams, perhaps even more often.

Yet the approach is still wise from a purely technical perspective, even with the social caveats. Compared to tightly coupled monolithic designs, a loosely coupled microservice team will produce systems which are more approachable and less failure-prone. But it cuts the other way: there will then be a mesh of independent pieces, each vulnerable to the drawbacks of poor communication between interdependent teams.

Ten microservice teams staffed by poor communicators will render a poor system in aggregate, no matter how competent the programmers and no matter how “API-like” they behave. It may be better than the chaotic, monolithic alternative, but it only scratches the surface of where you can generate the most improvement.

It is not so simple to reverse-architect a technical system around a team to make it healthier. Nor is it simple to look at a team and re-architect it in hopes of creating an ideal representation of a technical system. There is too much afoot, too much human chaos, to attempt either feat of social engineering. Only when communication is the highest-order concern will the linkage between human and system reach an optimal state. We cannot escape this reality.

Conway Yourself

Communication is often called a “soft problem”. It is rooted in emotion, spirit, and chaos, and that is why we see weak teams try to over-engineer around it. On the surface, this avoidance of the human element feels appropriate, as there is evidence that open source projects operate well without a long-term plan, a formal organizational structure, or much direct social interaction. It also appears as though organizations can “reverse Conway” their systems, using architectural changes to “fix” and “analyze” their people. But both approaches fail, because neither of them will scale towards thriving teams or healthy environments.

Communication is not an engineering problem, and it is not a “soft problem”. It is a human problem, the hardest problem, and the one that most teams lack the courage to remedy. A team optimizes for Conway’s Law not when it rearranges people around code like furniture, but when it creates an environment where each person is inspired and expected to improve their interpersonal communication instead of trying to tool over it or engineer around it. The technical brilliance of tomorrow will be born from those who learn to minimize ego and enhance their verbal, non-verbal, emotional and spiritual communication.

So how do you do it? Ask your teammates.
