Understanding V-Structures and the Role They Play in Causal Validation and Causal Inference

Introduction
Causal inference is an emerging field within machine learning that can move beyond predicting what could happen to explaining why it happens, and in doing so it offers the promise of permanently resolving the underlying problem rather than dealing with the potential fallout.
Solving causal inference problems requires a visualisation of the cause-and-effect factors in a "Directed Acyclic Graph", or DAG, which is typically developed by domain experts who have built up an informed view of the causality in a system or process.
A challenge with this approach is that the views of the domain experts can be flawed or biased, and without an accurate DAG the results and outputs of causal models will be inaccurate and hence ineffective. The process of ensuring the DAG accurately represents the causality is called causal validation.
One specific problem within causal validation is detecting the direction of causality between two variables. For example, it could be that studying for a management qualification "causes" promotion, but it could equally be that newly promoted managers start a qualification to help develop their skills.
In the real world, establishing the timing or sequencing of events can help. For example, if 90% of staff studied first and got promoted second, the causality would become clear, but if all we had were historical data indicating a correlation, the direction of causality may be unclear.
The Problem
Verifying the direction of causal links is difficult and, on the face of it, looks impossible.
The Opportunity
If an algorithm existed that could verify the direction of causal links it would add significant value by improving the accuracy of the DAG and hence providing confidence in the predictions of causal models.
The Way Forward
A specific type of junction within a DAG, called a v-structure, can be used to compare the connections the DAG asserts with those detectable in the data, and hence to suggest where an arrow is pointing in the wrong direction and should be reversed to correct a mistake.
Getting Started
Choosing a Directed Acyclic Graph
Let's start by choosing an entirely fictitious DAG that will be used throughout the examples in the article. It is one of my favourites to work with because it is simple enough to use in testing but complex enough to contain all the variations that will be found in more complex examples –

X is the "treatment" or cause and Y is the "outcome" or effect, and the objective of causal inference is to ascertain the true, isolated effect of the treatment on the outcome, independent of the effects of all the other variables.
In the real world X might represent taking a new drug, W might be the effect of the drug on blood pressure and Y might be the improvement in patient outcomes but for the purposes of the example I have just chosen letters.
Generating Test Data
In a real-world example we would start with a set of data and the domain experts would use their expertise to produce a candidate DAG, but in tests and examples the opposite is true.
Firstly a DAG is selected based on its suitability to illustrate the examples, and then a set of data is generated that fits the DAG by randomly selecting weightings between each pair of connected nodes and then creating data based on those weightings.
Here are the weightings selected to generate the data –

That gives rise to the following set of structural equations, one equation for each node –

To complete the explanation, all that is required is to understand the distinction between exogenous and endogenous nodes (or variables). Exogenous nodes have no incoming causal arrows, so in the example DAG the exogenous variables are Z1 and Z2 and the endogenous variables are X, W, Y and Z3. Exogenous variables must be assigned values randomly, usually by sampling from a distribution.
The 6 structural equations (one for each node on the DAG) can therefore be fully explained and understood as follows …
- Z1 is an exogenous variable (i.e. it has no inputs) that is normally distributed with a mean of 4.75 and a standard deviation of 1.72
- Z2 is an exogenous variable that is normally distributed with a mean of 3.29 and a standard deviation of 1.88
- Z3 = 3 x Z1 + 1.5 x Z2 + an error term
- X = 2 x Z1 + 2.5 x Z3 + an error term
- W = 3 x X + an error term
- Y = 2 x W + 2 x Z2 + 3 x Z3 + an error term
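The structural equations above can be sketched in Python as follows. This is a minimal sketch, not the article's code: the weightings and exogenous distributions are those quoted above, while the error terms are assumed to be standard normal and the function name is my own.

```python
import numpy as np
import pandas as pd

def generate_data(n=10_000, seed=42):
    """Generate synthetic data that matches the example DAG.

    The weightings and exogenous distributions are those quoted in the
    structural equations above; standard-normal error terms are assumed.
    """
    rng = np.random.default_rng(seed)
    Z1 = rng.normal(4.75, 1.72, n)                 # exogenous
    Z2 = rng.normal(3.29, 1.88, n)                 # exogenous
    Z3 = 3 * Z1 + 1.5 * Z2 + rng.normal(0, 1, n)   # endogenous from here on
    X = 2 * Z1 + 2.5 * Z3 + rng.normal(0, 1, n)
    W = 3 * X + rng.normal(0, 1, n)
    Y = 2 * W + 2 * Z2 + 3 * Z3 + rng.normal(0, 1, n)
    return pd.DataFrame({"Z1": Z1, "Z2": Z2, "Z3": Z3, "X": X, "W": W, "Y": Y})

df = generate_data()
print(df.shape)  # (10000, 6)
```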
Here is a preview of the synthetically generated data …

The Apparent Impossibility of Detecting the Direction of Causality
In a recent article I explored and explained the concept of paths through a DAG that are made up of junctions; if you are unfamiliar with these concepts, it is essential reading for a full understanding of v-structures …
Understanding Junctions (Chains, Forks and Colliders) and the Role they Play in Causal Inference
That discussion included an exploration of "fork" junctions and how messages get through from the start to the end node. Consider the junction highlighted in the DAG below from X to Y through Z3 …

Clearly "messages" can flow from Z3 to Y (i.e. a change in Z3 will produce a change in Y) because Z3 appears in the structural equation for Y with a weighting of 3, but the article on junctions explained that messages can also flow backwards from X to Z3.
This is because if X = 2.5 x Z3 (simplifying the structural equation to a single term), it follows that Z3 = X / 2.5, because dividing each side of the equation by 2.5 resolves it for Z3.
So if the multiplier between variables V1 and V2 is simply the inverse of the multiplier between V2 and V1, surely it is impossible to work out the direction if all we have is the data, which always comes before the DAG in any real-world causal problem?
This was my conclusion for a very long time. I had managed to work out some validation rules for identifying missing links and spurious links but proving directionality of a link to identify edges in the DAG which are in the wrong direction seemed impossible.
There are glimmers of a solution in the literature using v-structures but they are complex and always incomplete.
The remainder of this article aims to explore and combine those partial solutions to the point where an algorithm can be developed in Python to detect directional links in a DAG that are pointing in the wrong direction when compared to the dataset the DAG is representing.
Identifying Colliders and V-Structures in the Data (Without Referring to the DAG)
The article on junctions explained that colliders are junctions where the start and end nodes both point to the intermediary node, so it is easy to identify all of the colliders in the example DAG …

Colliders have a special property: in theory they can be detected in the data by carrying out an independence test, and if a collider is found in the data where the DAG says one should be, then we have proved that part of the DAG is correct.
This idea can be expressed in a statistical expression as follows …

This expression is looking at the final collider above and stating the following
- Expression 1 states that if Z1 is independent of Z2 in the graph / DAG (indicated by the G subscript) this implies that Z1 is also independent of Z2 in the data (indicated by the D subscript).
- Expression 2 states that if Z2 is independent of Z1 in the graph / DAG this implies that Z2 is also independent of Z1 in the data.
It is important that both expressions 1 and 2 are true because a collider must be symmetrical, i.e. messages cannot pass from the start node to the end node AND messages cannot pass from the end node to the start node. If this condition can be detected in the data then, in theory, a collider has been identified.
The following Python code carries out the dependence check; the full implementation details of the .dependence() DataFrame extension method can be found in this article …
Demystifying Dependence and Why it is Important in Causal Inference and Causal Validation
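Pending the full implementation in that article, here is a minimal stand-in for such a check. The independent() function below is my own illustrative sketch, not the article's .dependence() extension method: it simply treats a near-zero Pearson correlation as evidence of independence, and the data is regenerated from the structural equations above with assumed standard-normal error terms.

```python
import numpy as np
import pandas as pd

def independent(df, a, b, threshold=0.05):
    """Crude stand-in for a dependence test: treat a |Pearson correlation|
    below a threshold as evidence that a and b are independent."""
    return abs(df[a].corr(df[b])) < threshold

# Synthetic data for the collider Z1 -> Z3 <- Z2 (coefficients from the
# article; standard-normal error term assumed).
rng = np.random.default_rng(0)
n = 5_000
Z1 = rng.normal(4.75, 1.72, n)
Z2 = rng.normal(3.29, 1.88, n)
Z3 = 3 * Z1 + 1.5 * Z2 + rng.normal(0, 1, n)
df = pd.DataFrame({"Z1": Z1, "Z2": Z2, "Z3": Z3})

# Z1 and Z2 are the non-adjacent ends of the collider, so they should
# test as independent; Z1 and Z3 are directly connected, so they should not.
print("v-structure found in data:", independent(df, "Z1", "Z2"))
print("Z1 independent of Z3:", independent(df, "Z1", "Z3"))
```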
{'Z1': 'treatment', 'Z3': 'collider', 'Z2': 'outcome'} v-structure found in data: True
This is great news! Our DAG indicates that there should be a collider junction at Z1 -> Z3 <- Z2 and a simple Python code snippet has used a DataFrame extension method to prove this is the case!
However, this is not the end of the quest for detecting colliders in data, it is just the beginning.
In theory we should be able to identify all the colliders that appear in the DAG by applying a dependency test to the data but this is not always the case.
Consider the following simple DAG that contains a collider but where the start and end nodes have an additional connection …

It should be intuitively obvious that Y cannot be independent of X even though there is a collider between them because "messages" can flow directly from X to Y and this intuitive conclusion can be proven by re-running the code above …
{'X': 'treatment', 'Z': 'collider', 'Y': 'outcome'}
collider found in data: False
Where the start and end nodes of a collider are connected in either direction, the existence of the collider can neither be proven nor disproven by dependency testing the data, and this gives rise to the concepts of adjacency and v-structures.
A collider exhibits adjacency if a direct connection exists between the start and end nodes, and a junction is classified as a v-structure if it is a collider that is non-adjacent, i.e. its start and end nodes are not connected.
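These definitions can be checked mechanically. Here is a sketch using networkx (an assumption on my part; the article does not name its graph library) that enumerates the colliders in the example DAG and classifies which of them are v-structures; the edges are reconstructed from the structural equations above.

```python
import networkx as nx
from itertools import combinations

def colliders(dag):
    """Yield (a, c, b) triples where a -> c <- b is a collider."""
    for c in dag.nodes:
        for a, b in combinations(sorted(dag.predecessors(c)), 2):
            yield (a, c, b)

def is_v_structure(dag, a, c, b):
    """A collider is a v-structure when its ends are non-adjacent."""
    return not (dag.has_edge(a, b) or dag.has_edge(b, a))

# The example DAG from the article, reconstructed from the structural equations.
dag = nx.DiGraph([("Z1", "Z3"), ("Z2", "Z3"), ("Z1", "X"), ("Z3", "X"),
                  ("X", "W"), ("W", "Y"), ("Z2", "Y"), ("Z3", "Y")])

all_colliders = list(colliders(dag))
v_structs = [t for t in all_colliders if is_v_structure(dag, *t)]
print(len(all_colliders), "colliders,", len(v_structs), "v-structures")
# prints: 5 colliders, 3 v-structures
```

This agrees with the counts in the article: 5 colliders, of which 2 are adjacent and 3 are v-structures.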
Reviewing our example DAG 2 of the 5 colliders are "adjacent" and hence cannot be reliably detected in the data …

This leaves 3 colliders that do not exhibit adjacency (which we now know are termed v-structures) that in theory can be detected in the data using dependency tests …

It has already been shown in the Python code above that the 1st v-structure Z1 -> Z3 <- Z2 can be identified in the data so what about the other two?
The table below is the output of a test harness that applies the symmetrical dependency test to all junctions in the example DAG …

This test shows that the only v-structure the data dependency test can reliably detect is Z1 -> Z3 <- Z2. Whilst we would have expected the test to fail for the 2 colliders that are adjacent (and for the chains and forks), the expected result would have been to identify all 3 v-structures.
Understanding this result requires a closer investigation of the 2 v-structures that were not correctly identified in the data, and the realisation that although their start and end nodes are not directly connected (i.e. they are not adjacent), they are indirectly connected through open backdoor paths.
Let's consider the collider at W -> Y <- Z3 and review the backdoor paths between the start node W and the end node Z3 …

W -> Y <- Z3 is a collider and is also a v-structure because it is not adjacent (i.e. not directly connected), but 2 backdoor paths exist between the start and end nodes. Hence "messages" can leak between those nodes, making it impossible (or at least inconsistent and unreliable) to detect this v-structure in the data.
Note that W -> Y <- Z2 suffers from exactly the same problem, with a backdoor path W <- X <- Z3 <- Z2.
At this point in my collider, v-structure and causal validation journey I almost gave up because if v-structures cannot be detected in the data then no reliable validation test can be constructed.
But there is a solution to this conundrum that produces an algorithm whose accuracy and reliability are sufficiently high to make it usable and useful …
An Algorithm Definition for Detecting the Direction of Causal Links
Given the following definition of a v-structure …
A v-structure is a collider that is not adjacent (i.e. the start and end nodes are not connected in either direction)
… an algorithm for identifying links that exist in the data, but that have been reversed (i.e. are shown in the wrong direction) in the DAG, can be defined as follows …
- Iterate around all the edges in the DAG and for each edge …
- Create a new DAG by reversing the current edge.
- If a v-structure has been destroyed and that v-structure does not exist in the data then the current DAG is wrong and the edge needs reversing.
- If a v-structure has been created and that v-structure exists in the data then the current DAG is wrong and the edge needs reversing.
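Before walking through each step, the whole procedure can be sketched as a single routine. This is an illustration, not the article's implementation: found_in_data stands in for the statistical dependence test (here replaced by a perfect oracle, so the backdoor-path leakage discussed earlier does not occur), hints lists edges assumed correct that are skipped, and networkx is an assumed choice of library.

```python
import networkx as nx
from itertools import combinations

def v_structures(dag):
    """All colliders a -> c <- b whose ends are non-adjacent, with the
    ends sorted so triples compare consistently."""
    found = set()
    for c in dag.nodes:
        for a, b in combinations(dag.predecessors(c), 2):
            if not (dag.has_edge(a, b) or dag.has_edge(b, a)):
                found.add((min(a, b), c, max(a, b)))
    return found

def suspect_reversals(dag, found_in_data, hints=()):
    """Flag edges whose reversal is supported by the data's v-structures."""
    suspects = []
    before = v_structures(dag)
    for u, v in list(dag.edges):
        if (u, v) in hints:
            continue
        trial = dag.copy()
        trial.remove_edge(u, v)
        trial.add_edge(v, u)
        if not nx.is_directed_acyclic_graph(trial):
            continue  # reversal would introduce a cycle, so it is untestable
        after = v_structures(trial)
        # A destroyed v-structure missing from the data, or a created
        # v-structure present in it, both imply the edge is reversed.
        if any(not found_in_data(*t) for t in before - after) or \
           any(found_in_data(*t) for t in after - before):
            suspects.append((u, v))
    return suspects

# Demonstration on the example DAG with an idealised oracle in place of a
# statistical dependence test.
dag = nx.DiGraph([("Z1", "Z3"), ("Z2", "Z3"), ("Z1", "X"), ("Z3", "X"),
                  ("X", "W"), ("W", "Y"), ("Z2", "Y"), ("Z3", "Y")])
truth = v_structures(dag)
oracle = lambda a, c, b: (min(a, b), c, max(a, b)) in truth
print(suspect_reversals(dag, oracle))  # a correct DAG yields no suspects: []

# With Z1 -> Z3 deliberately reversed, the routine pinpoints the bad edge.
wrong = nx.DiGraph([("Z3", "Z1"), ("Z2", "Z3"), ("Z1", "X"), ("Z3", "X"),
                    ("X", "W"), ("W", "Y"), ("Z2", "Y"), ("Z3", "Y")])
print(suspect_reversals(wrong, oracle))  # → [('Z3', 'Z1')]
```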
Step 1: Iterating over all the edges
The first stage of the algorithm is going to iterate over all of the edges in the DAG which can be visualised as follows …

Step 2: Create a New DAG by Reversing the Current Edge
Assuming we are on the first iteration (edge Z2 -> Z3) reversing that edge results in 2 DAGs – the original and the variation created by the reversal …

Step 3: If a V-Structure has been Destroyed …
The act of reversing Z2 -> Z3 has destroyed (or removed) a v-structure that exists in the original DAG because the v-structure Z1 -> Z3 <- Z2 has been replaced by a chain Z1 -> Z3 -> Z2.
The destroyed v-structure Z1 -> Z3 <- Z2 can now be tested in the data using a dependency test –
- If that v-structure is found in the data then the directionality of Z2 -> Z3 is correct in the DAG.
- If that v-structure is not found in the data then the directionality of Z2 -> Z3 is incorrect in the DAG and needs reversing.
Here is the test implemented in Python …
v-structure found in data: True
The collider Z1 -> Z3 <- Z2 has been identified in the data. Therefore the proposed reversal of Z2 -> Z3 is wrong. The algorithm will now proceed to the next step …
Step 4: If a V-Structure has been Created …
The first iteration i.e. reversing the edge Z2 -> Z3 does not create any new v-structures so we will fast-forward to the second iteration where the edge X -> W is reversed …

The act of reversing X -> W has created two v-structures in the new DAG …
- The chain Z1 -> X -> W has been replaced by the v-structure Z1 -> X <- W
- The chain Z3 -> X -> W has been replaced by the v-structure Z3 -> X <- W
These new v-structures can now be tested and if they are identified in the data then the existing edge X -> W is in the wrong direction in the DAG and needs to be corrected by changing it to X <- W.
v-structure found in data: False
v-structure found in data: False
The v-structures were not found in the data, providing proof that the existing DAG is correct in respect of the directionality of X -> W, and this is the expected result.
Completing the Iterations
We know that the data matches the DAG as it was synthetically created to match it and that means that every edge reversal tested should confirm the DAG is correct.
The two edges that have been fully worked through have proven this so far so does this mean that the algorithm is working and we have reached the goal of proving directionality?
The Major Problem with the Created / Destroyed V-Structure Algorithm
To ascertain if that goal has been reached requires the algorithm to be executed against every edge testing the reversal of each using the created / destroyed v-structures rules.
Unfortunately when the algorithm runs over all the edges it does not work as expected. When it gets to edge W -> Y it mistakenly decides that the directionality is wrong and that it should be reversed …
The first step i.e. considering created v-structures works as expected …

This new v-structure is not detected in the data so this step of the algorithm correctly predicts that the data matches the current DAG.
The next step of the algorithm ascertains that reversing W -> Y also destroys two v-structures – W -> Y <- Z2 and W -> Y <- Z3 as follows …

When the dependency tests are executed on the data for these two destroyed v-structures the actual results do not match the expected results …
v-structure found in data: False
v-structure found in data: False
A Proposed Solution to the Problem
In both cases the v-structures are known to exist in the data but the dependency test fails to find them which causes the algorithm to conclude that the proposed reversal of W -> Y should stand and that the DAG is currently incorrect in this respect.
A closer inspection of the results shows that in both cases independence was discovered in one direction but not the other, and it has been shown that v-structures must be symmetrical; hence the test fails.
The underlying reason for the test failure has already been explored and explained in the section above i.e. that backdoor paths exist between the start and end nodes and hence messages can leak through causing the independence test to fail.
When I first arrived at this conclusion I almost quit my attempt to design and develop a usable algorithm for detecting reversed links in Directed Acyclic Graphs, because at this point it does seem impossible.
The algorithm as currently defined will fail to prove that a DAG is valid (i.e. matches the data) where a destroyed v-structure has backdoor paths and if a DAG cannot be validated surely the algorithm is useless.
I toyed with the idea of changing my definition of "destroyed v-structures" to exclude those where backdoor paths exist, and this looked promising for a while. Although it works for proving that a valid DAG is valid, it fails completely when attempting to prove that an invalid DAG is invalid, detecting an unacceptably large number of false positives.
The answer, or at least the compromise, I settled on was based around reading about causal discovery in the available literature.
If causal validation is difficult, causal discovery is even harder. Causal validation is attempting to validate a DAG against a dataset and if we assume that the domain experts had a good level of expertise then the DAG is likely to be close to being correct and the task is just to identify any errors.
However, the idea of causal discovery is to start with no DAG and attempt to reverse engineer it from nothing but the dataset. This is a complex task with large numbers of permutations, and one of the strategies described in the literature is not to start with a completely empty DAG but to confirm some edges that the domain experts are confident about, so the discovery algorithm has less to do.
In the example DAG the nodes are entirely fictitious, but if we assume that X is a new drug, W is the positive change in blood pressure and Y is patient recovery, it is unlikely that the direction of these edges is wrong.
For example, could patient recovery be causing the change in blood pressure (unlikely but not impossible), and could the blood pressure be causing the patient to take the drug (very unlikely)?
And there are other conclusions we can reasonably draw. Wherever there is a temporal (or time-based) aspect in which variable B occurs later in time than variable A, B cannot be the cause of A.
For example if one datapoint is exercise frequency and another is general health there could be a debate around whether healthier people did more exercise or whether those that exercised more became healthier. However if the timing were recorded with clear evidence that the increase in exercise came before the improvement in health the directionality is proven.
Furthermore, there are inferences that can be made around the treatment and outcome. Usually the treatment causes events and the outcome is caused by events (though this is not always the case).
Therefore the conclusion is that the inability to identify v-structures in the data where backdoor paths exist can be minimised by instructing the algorithm about certainty in some of the edges, which are assumed to be correct and therefore will not be tested.
By making this modification and by providing a hint to the algorithm that W -> Y is correct the results suddenly look very promising.
A test harness that randomly created 100 datasets matching the example DAG correctly identified that the DAG matched the data in 100% of cases provided that the edge W -> Y was given as a hint before the algorithm started its search.
Additional Testing of the Validation Algorithm
So far testing has been carried out against a single example DAG, which could contain unusual features causing the algorithm to over- or under-perform. Here is the definition of a procedure to widen the test results …
- Select a DAG to test.
- Generate a synthetic dataset that matches the causal relationships in the DAG using edge weightings and structural equations.
- Execute the validation algorithm with no edge confirmations (i.e. in its raw form).
- Go back to step 2 and repeat the random generation of 100 different datasets that all match the DAG.
- Add up the total of tests passed i.e. where the algorithm correctly verified that the DAG matches the data.
- Show the results as a percentage.
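The six steps above can be wrapped in a small test harness. The sketch below uses illustrative stand-ins (harness, make_dataset and the inline validator are my own names, not the article's code); in practice validate would be the full validation algorithm and make_dataset the structural-equation generator.

```python
import numpy as np
import pandas as pd

def harness(validate, make_dataset, runs=100, seed=0):
    """Steps 2-6 above: regenerate the dataset `runs` times, run the
    validation algorithm on each, and report the pass percentage."""
    rng = np.random.default_rng(seed)
    passes = sum(bool(validate(make_dataset(rng))) for _ in range(runs))
    return 100.0 * passes / runs

# Toy demonstration: a two-node chain A -> B, "validated" by checking the
# A/B correlation is strongly positive (a stand-in for the real algorithm).
def make_dataset(rng, n=1_000):
    a = rng.normal(size=n)
    return pd.DataFrame({"A": a, "B": 2 * a + rng.normal(size=n)})

print(harness(lambda df: df["A"].corr(df["B"]) > 0.5, make_dataset))  # 100.0
```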
Basic DAGs

Additional testing DAGs that I refer to as "the Triangle DAG", "the Trapezium DAG" and "the E-shaped DAG" all pass the additional testing in 100% of test cases with no hinting required.
Complex DAGs

This more complicated DAG passes 90% of test cases, but in around 10% the Z -> W and Uw -> W edges are incorrectly identified as being in the wrong direction.
However, if these two edges are hinted the test pass rate returns to 100%.
Increasing the DAG Complexity

Increasing the complexity by adding additional nodes / variables (Ut and T) and additional paths causes the test results to degrade.
The algorithm correctly ascertains validity in 30% of test cases and across various tests the edges Z -> W, Uw -> W, T -> Y and Uy -> Y are incorrectly identified as requiring reversal.
Again if these 4 edges are hinted 100% of tests are passed.
Maximum DAG Complexity

The final test is the most complex example DAG I could find with 9 nodes / variables, 16 edges, every type of junction and an unobserved confounder (node C).
100% of tests fail for this DAG as all of them identify one of the edges in the diagram above (B -> X, G -> X or E -> X) as requiring reversal.
However if these 3 edges are hinted even a DAG of this complexity will pass 100% of test cases.
Lessons Learned from the Testing
Clearly there is a pattern in the tests. Simple DAGs tend to perform with 100% reliability, but as DAGs become more complex the algorithm starts to identify edges for reversal that are in fact correct and needs more help through hints.
The results show that the algorithm is not perfect, but the performance is sufficiently high to make it useful and usable, especially where hints are provided, which is perfectly legitimate in the context of DAGs that domain experts have helped create.
So is this the end and can we declare victory in the quest for an algorithm that can detect whether a DAG correctly represents an associated dataset in terms of reversed edges?
Unfortunately not, in fact we are only half way there. What has been achieved so far is an algorithm that detects (with acceptable accuracy) when a DAG matches the data but the bigger challenge is to test the algorithm for the case where the DAG does not match the data.
Detecting Errors in a DAG that does not Accurately Reflect the Data
To illustrate this use case let us review the original example DAG assuming that the domain experts have provided enough confidence about edge W -> Y for it to be hinted to the algorithm but they have mistakenly identified the Z1 -> Z3 edge in the wrong direction.
Here are the test results for the algorithm …

This is a very promising start. The algorithm has correctly identified that the DAG does not match the data and has also correctly identified exactly which edge is wrong!
It is unsurprising though that the algorithm correctly identified this reversal because Z1 -> Z3 <- Z2 is the single v-structure in the DAG with no backdoor paths hence the algorithm can easily identify it in the data so the "created v-structure" test inside the algorithm is bound to work correctly.
The next stage is to iterate through all the edges in the DAG reversing them in turn and observing the algorithm performance.
Please note that edges Z1 -> X, Z2 -> Y and Z3 -> Y are not tested because each of these reversals would produce a graph containing a cycle, which by definition is not allowed in a DAG.
These are the remaining tests, noting that Z1 -> Z3 has already been tested, 3 reversals would produce a cyclic graph, and W -> Y is excluded because it is hinted to the algorithm as correct –
- Reverse X -> W
- Reverse Z2 -> Z3
- Reverse Z3 -> X

The results are mixed. The algorithm correctly identifies that the DAG is not a valid representation of the data and also picks out the edge that has been deliberately reversed in the data in 100% of test cases.
Unfortunately though, the algorithm always over-identifies, i.e. in addition to picking out the edge that should be reversed it also identifies edges that are in fact correct.
It is a similar story with the other test DAGs.
The edge that has been deliberately reversed is found, but so are other edges that are valid. The reason for over-identifying is that as soon as a single edge is invalid (i.e. in the wrong direction) the whole DAG no longer matches the data, and the created and destroyed v-structure tests will misfire because they are looking for the wrong things.
So does this mean that after all this effort we still cannot identify which edges are invalid in a DAG?
A Proposal for a "Hybrid Edge Reversal Detection Algorithm"
Before admitting defeat here are some irrefutable facts …
- The algorithm is good at identifying when a DAG is valid (noting that in complex DAGs providing hints for high-confidence links may be required).
- The algorithm is good at detecting an edge that has been reversed (noting that it may also over-identify and propose that other valid links should be reversed as well).
The solution therefore is to extend the basic algorithm into something I call the hybrid edge reversal detection algorithm, which can be defined as follows –
- If the base algorithm passes (i.e. concludes that the DAG matches the data) then stop here.
- If the base algorithm fails (i.e. declares that the DAG is invalid) then iterate over every possible permutation of reversed edges and test each permutation to see if the base algorithm passes.
- Present every DAG that the algorithm concludes is valid as a possible correction to the DAG that is known to have an error in it.
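The three steps above can be sketched as follows. This is an illustration of the search, not the article's implementation: is_valid stands in for the base validation algorithm (treated as a black box), networkx is an assumed library choice, and the toy validator at the bottom simply checks that the proposed DAG reproduces the collider the "data" supports.

```python
import networkx as nx
from itertools import combinations

def hybrid_corrections(dag, is_valid, hints=()):
    """Hybrid edge reversal detection: if the base validator rejects the
    DAG, try every permutation of edge reversals and return the subsets
    whose reversal produces a DAG the validator accepts."""
    if is_valid(dag):
        return []  # step 1: the base algorithm passes, so stop here
    candidates = [e for e in dag.edges if e not in hints]
    corrections = []
    # Step 2: try every combination of reversed candidate edges,
    # smallest sets first so minimal corrections are reported first.
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            trial = dag.copy()
            for u, v in subset:
                trial.remove_edge(u, v)
                trial.add_edge(v, u)
            if nx.is_directed_acyclic_graph(trial) and is_valid(trial):
                corrections.append(subset)  # step 3: a possible correction
    return corrections

# Toy demonstration: the "data" supports the v-structure A -> C <- B, but
# the proposed DAG has the C/B edge pointing the wrong way.
def is_valid(g):
    return set(g.predecessors("C")) == {"A", "B"}

proposed = nx.DiGraph([("A", "C"), ("C", "B")])
print(hybrid_corrections(proposed, is_valid))  # → [(('C', 'B'),)]
```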
I have not seen this approach anywhere in the causal literature. It is my own conclusion drawn from many hours of analyzing test results to come up with a solution that, whilst not 100% accurate, produces a result that is clearly usable and useful.
If a data team has a proposed DAG that the experts have produced using their domain knowledge and causal validation shows that the DAG does not match the data and can also propose one or more alternatives that do match then this can inform a discussion that selects one of the valid alternatives and then carries out further testing.
This is an extremely useful result as it avoids the potential for an incorrect DAG being used as the input to a causal model, which would have potentially invalidated any results and conclusions drawn from the modelling and machine learning.
Evaluating the Hybrid Edge Reversal Detection Algorithm
The long journey from understanding v-structures to creating a usable algorithm for detecting reversed edges in a DAG when compared to the associated dataset is now complete and all that remains is to test the hybrid algorithm to see how it performs.
The Standard Example DAG

The first DAG to test is our old friend that has been used as the mainstay of the article.
The edge W -> Y has been hinted and several edges cannot be tested because reversing them would produce a cyclic graph (Z1 -> X, Z2 -> Y and Z3 -> Y). That leaves 4 edges that can be reversed in the data to test against (X -> W, Z1 -> Z3, Z2 -> Z3 and Z3 -> X).
The results are as follows –
- DAG correctly identified as invalid in 4/4 tests (100.0%)
- All reversals found in 4/4 tests (100.0%)
- False positives found in 1/4 tests (25.0%)
These results are very promising. The reversed edge is correctly identified in 100% of cases, however the algorithm over-identified a false positive in one of the tests.
The next stage is to randomly choose combinations of 2 edges to reverse in the data to see if the algorithm can detect multiple errors in a DAG. Here are the results –
- DAG correctly identified as invalid in 8/10 tests (80.0%)
- All reversals found in 6/10 tests (70.0%)
- False positives found in 7/10 tests (70.0%)
Performance decreases when 2 edges are reversed but the results would still indicate to a data team that their DAG was not correct and would also give them some invaluable pointers as to where the errors were.
Even reversing 3 edges produces a good set of results that are useful and usable …
- DAG correctly identified as invalid in 14/15 tests (93.3%)
- All reversals found in 7/15 tests (46.7%)
- False positives found in 14/15 tests (93.3%)
The Trapezium DAG

The trapezium DAG has 3 possible single edge reversals (the others would produce a cyclic graph) and the test results are as follows –
- DAG correctly identified as invalid in 2/3 tests (66.7%)
- All reversals found in 2/3 tests (66.7%)
- False positives found in 1/3 tests (33.3%)
These results are not as good as for the standard example DAG but the algorithm still correctly identifies an edge in the wrong direction in 66.7% of test cases.
Here are the test results for reversing 2 edges in the trapezium DAG –
- DAG correctly identified as invalid in 4/5 tests (80.0%)
- All reversals found in 0/5 tests (0.0%)
- False positives found in 5/5 tests (100.0%)
The E-Shaped DAG

There are 7 possible single edge reversal tests and the test results are as follows …
- DAG correctly identified as invalid in 6/7 tests (85.7%)
- All reversals found in 6/7 tests (85.7%)
- False positives found in 1/7 tests (14.3%)
This DAG has fewer adjacent and back-door colliders than the previous examples. Hence it would be reasonable to expect better performance and the test results are very good indeed!
The results for detecting 2 edge reversals are also very good …
- DAG correctly identified as invalid in 10/10 tests (100.0%)
- All reversals found in 8/10 tests (80.0%)
- False positives found in 2/10 tests (20.0%)
The Complex Exogenous DAG

With the exogenous "inputs" hinted (highlighted in light blue) the test results for a single edge reversal are as follows …
- DAG correctly identified as invalid in 6/6 tests (100.0%)
- All reversals found in 6/6 tests (100.0%)
- False positives found in 0/6 tests (0.0%)
Results for 2 reversals:
- DAG correctly identified as invalid in 10/10 tests (100.0%)
- All reversals found in 10/10 tests (100.0%)
- False positives found in 0/10 tests (0.0%)
And for 3 reversed edges …
- DAG correctly identified as invalid in 10/10 tests (100.0%)
- All reversals found in 10/10 tests (100.0%)
- False positives found in 0/10 tests (0.0%)
The test results for this DAG are remarkable; the algorithm correctly identifies the errors with 100% accuracy if the DAG contains 1, 2 or 3 errors in directionality!
However the speed of execution of the algorithm starts to be an issue with the 3 reversal tests. There will be more on this in the final test …
The Super-Complex Real-World DAG

The super-complex DAG was hinted with B -> X, E -> X and G -> X before testing.
Single edge reversal test results were as follows …
- DAG correctly identified as invalid in 2/3 tests (66.7%)
- All reversals found in 2/3 tests (66.7%)
- False positives found in 1/3 tests (33.3%)
Results for 2 edge reversals:
- DAG correctly identified as invalid in 5/5 tests (100.0%)
- All reversals found in 1/5 tests (20.0%)
- False positives found in 4/5 tests (80.0%)
And for 3 reversed edges …
- DAG correctly identified as invalid in 5/6 tests (83.3%)
- All reversals found in 1/6 tests (16.7%)
- False positives found in 5/6 tests (83.3%)
The results are less promising with this DAG and the speed of the tests was also concerning.
Conclusion – What Have We Learned?
It is difficult to automatically detect the correct direction of the arrows in a proposed DAG that represents a set of data, because if y = 3x then x = (1/3)y, i.e. there is a coefficient or slope (3 or 1/3) in both directions, so on the face of it detecting the correct direction is impossible.
The DAG is made up of paths between the treatment and outcome and each path is made up of junctions that conform to one of 3 types – chain, fork and collider.
A v-structure is a special case of a collider that is not adjacent, and it has been shown that this structure (uniquely) can be identified in the data using dependency tests, which provides the key to unlocking the problem of detecting directionality; however, back-door paths can produce inconsistent results.
Hence it has been shown that the traditional Pearlean reversed edge detection algorithm is not consistent and reliable. It can correctly identify when a DAG does not match the data and has some success indicating which edge is reversed when just 1 edge is incorrect. However, it will not accurately identify the invalid edges when 2 or more edges are in the wrong direction or when the DAG is complex which at first glance renders it unusable.
The traditional Pearlean algorithm can be improved by using it just to detect an invalid DAG and then extending it with a hybrid algorithm that tries out all edge reversals to find the minimum number of reversals producing a valid DAG, and the results provide a good indication of exactly where the errors are.
Even this algorithm is not perfect but if the algorithm is given hints (i.e. indications of certainty about some of the edges) the accuracy (i.e. % of the time it accurately identifies all reversed edges) is high enough to validate, challenge and correct a DAG that has been proposed by domain experts.
No algorithm for reversed edge detection can ever be perfect, because there are multiple DAGs that can genuinely represent the same set of data and because the dependency test which underpins the algorithm is not 100% accurate.
Beyond the accuracy, the processing time of the hybrid algorithm grows exponentially with the number of edges.
A DAG with 5 edges has 2 to the power 5, or 32, combinations of reversals to explore, but 16 edges gives 2 to the power 16, or 65,536, possible combinations. Python is an interpreted programming language and hence relatively slow, so exploring this many combinations can take a long time.
Even with these drawbacks and accepting that detecting reversed edges in a proposed DAG compared to the dataset it is representing can never be absolutely accurate the hybrid algorithm has a usable and useful level of accuracy.
In closing and concluding the hybrid edge reversal detection algorithm is an invaluable tool as part of causal validation which is critical in ensuring that the results of a causal inference model are accurate and can be confidently used to drive organisational impact and outcomes.
Connect and Get in Touch …
If you enjoyed this article please follow me to keep up to date with future articles.
If you have any thoughts or views, especially on causal inference and where this exciting new branch of Data Science is going I would love to hear from you – please leave a message and I will get in touch.
My previous articles can be reviewed by taking a quick look at my profile, and this website brings together everything I am doing with my research and causal inference – The Data Blog.