No, not those kind of trees. I’m talking about trees a-la computer science trees. Yeah, those. The ones that consist of nodes and paths, and help to store, sort, and process data. The same ones that enable Facebook’s Graph Search.
Have you ever tried to solve this problem? It’s not a trivial one to solve. But the results are pretty important, at least in what I do.
Here’s the problem: given two trees of arbitrary complexity, identify a similarness quotient of the two trees, where 0 indicates that they’re completely different and 1 indicates that they’re exactly the same.
My trees are basically hierarchical data sets that represent the process classification framework, a hierarchical functional decomposition of business processes. I need to compare these trees because each release of the PCF is slightly different than the previous one, and I need an easy way to identify these differences.
Right now, comparing these trees is a manual process involving a lot of copying and pasting, and some Excel wizardry.
The algorithm I’m envisioning is one that will be used recursively. It will operate on nodes of arbitrary complexity (i.e. the node may carry arbitrary information, and the algorithm will consider it all in making the comparisons). It will operate on trees of arbitrary topology and depth. The similarity quotient will depend on how many of the node attributes are the same across the various nodes and how much the topology is the same across the two trees.
I can’t seem to find any relatively simple examples of this work, but I’m open minded. Have you seen such an algorithm or can you point me in the right direction?