TransferConsensus() constructs a consensus tree that
aims to minimize the sum of transfer distances to a set of input trees, using a
greedy add-and-prune heuristic. Unlike majority-rule consensus, which
can be highly unresolved when phylogenetic signal is diffuse, the
transfer consensus uses the finer-grained transfer distance to produce
more resolved trees.
TransferDist() computes the transfer dissimilarity
between phylogenetic trees, with scaled and unscaled variants. Supports
all-pairs, cross-pairs, and single-pair computations.
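To illustrate the quantity involved: the transfer distance between two splits is the minimum number of leaves that must be moved to turn one bipartition into the other. The C++ sketch below assumes splits of at most 64 leaves stored as bit masks. SplitTransfer(), TransferIndex() and TransferTotal() are hypothetical names. The simple one-directional total is not necessarily how TransferDist() aggregates or scales split-level distances.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// A split of n <= 64 leaves as a bit mask: bit i set = leaf i on the "1" side;
// bits at positions >= n must be zero. Moving one leaf across flips one bit,
// and a mask and its complement describe the same bipartition, so the
// transfer distance is the smaller of the Hamming distance and n minus it.
int SplitTransfer(std::uint64_t a, std::uint64_t b, int n) {
  assert(n > 0 && n <= 64);
  const int hamming = static_cast<int>(std::bitset<64>(a ^ b).count());
  return std::min(hamming, n - hamming);
}

// Transfer index of split `a`: its distance to the closest split of `other`.
int TransferIndex(std::uint64_t a, const std::vector<std::uint64_t>& other,
                  int n) {
  assert(!other.empty());
  int best = std::numeric_limits<int>::max();
  for (std::uint64_t b : other) best = std::min(best, SplitTransfer(a, b, n));
  return best;
}

// Illustrative unscaled, one-directional total: the transfer indices of
// tree 1's splits, each measured against tree 2.
int TransferTotal(const std::vector<std::uint64_t>& tree1,
                  const std::vector<std::uint64_t>& tree2, int n) {
  int total = 0;
  for (std::uint64_t a : tree1) total += TransferIndex(a, tree2, n);
  return total;
}
```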
LAP (Jonker–Volgenant linear assignment) and MCI (Mutual
Clustering Information) C++ implementations are now exposed via
inst/include/TreeDist/ headers, allowing downstream
packages to use LinkingTo: TreeDist.
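For readers unfamiliar with the problem these headers solve: a linear assignment pairs each row of a square cost matrix with a distinct column so that the total cost is minimal. Here that means pairing the splits of one tree with those of another. TreeDist uses the Jonker–Volgenant algorithm. The sketch below instead uses the simpler O(n³) Hungarian method with row and column potentials, which solves the same problem. SolveAssignment() is a hypothetical name and is not part of the exported headers.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hungarian (Kuhn-Munkres) algorithm for a square cost matrix, O(n^3).
// Returns result[row] = column assigned to that row, minimizing total cost.
std::vector<int> SolveAssignment(const std::vector<std::vector<double>>& cost) {
  const int n = static_cast<int>(cost.size());
  const double kInf = std::numeric_limits<double>::infinity();
  // 1-based working arrays; column 0 is a virtual column that starts each
  // augmenting search. u, v are row and column potentials; p[j] is the row
  // currently assigned to column j; way[j] records the augmenting path.
  std::vector<double> u(n + 1, 0.0), v(n + 1, 0.0);
  std::vector<int> p(n + 1, 0), way(n + 1, 0);
  for (int i = 1; i <= n; ++i) {
    assert(static_cast<int>(cost[i - 1].size()) == n);
    p[0] = i;
    int j0 = 0;
    std::vector<double> minv(n + 1, kInf);
    std::vector<bool> used(n + 1, false);
    do {
      used[j0] = true;
      const int i0 = p[j0];
      int j1 = 0;
      double delta = kInf;
      for (int j = 1; j <= n; ++j) {
        if (used[j]) continue;
        const double cur = cost[i0 - 1][j - 1] - u[i0] - v[j];
        if (cur < minv[j]) { minv[j] = cur; way[j] = j0; }
        if (minv[j] < delta) { delta = minv[j]; j1 = j; }
      }
      // Shift potentials so that at least one new column becomes reachable.
      for (int j = 0; j <= n; ++j) {
        if (used[j]) { u[p[j]] += delta; v[j] -= delta; }
        else { minv[j] -= delta; }
      }
      j0 = j1;
    } while (p[j0] != 0);
    // Flip assignments along the augmenting path back to the virtual column.
    do {
      const int j1 = way[j0];
      p[j0] = p[j1];
      j0 = j1;
    } while (j0 != 0);
  }
  std::vector<int> result(n);
  for (int j = 1; j <= n; ++j) result[p[j] - 1] = j - 1;
  return result;
}
```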
Stack-allocated split buffers replaced with dynamically sized
vectors, removing a hard dependency on the compile-time
SL_MAX_SPLITS constant. TreeDist now supports trees of any
size permitted by TreeTools.
Large-tree support (requires TreeTools ≥ 2.3.0):
all distance functions now accept trees with up to 32 767 tips
(previously limited to SL_MAX_TIPS, 2048 with TreeTools ≤
2.2.0). The R-level tip-count guard (.CheckMaxTips())
detects the TreeTools version at load time and unlocks the higher
ceiling automatically; no code changes are needed. All integer counters
in the C++ hot paths have been widened from int16 to
split_int (int32) to handle split counts above
32 767 without overflow. Direct lg2[] table accesses have
been replaced with lg2_lookup() fallback helpers so that
trees with more tips than SL_MAX_TIPS are computed
correctly via std::log2 /
std::lgamma.
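The following is a minimal sketch of the table-with-fallback pattern described above. It assumes a table of log₂ values for small integers and calls std::log2 beyond it, with log₂(n!) obtained from std::lgamma. Lg2Lookup(), Log2Factorial() and the 2048-entry cutoff are illustrative assumptions, not the signatures of the package's lg2_lookup() helpers. Arguments are 32-bit integers, echoing the widening from int16.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical cutoff: log2 of 1..kTableMax is precomputed; larger values
// fall back to std::log2 at run time.
constexpr std::int32_t kTableMax = 2048;

const std::vector<double>& Log2Table() {
  static const std::vector<double> table = [] {
    std::vector<double> t(kTableMax + 1, 0.0);  // t[0] unused: log2(0) = -inf
    for (std::int32_t i = 1; i <= kTableMax; ++i) t[i] = std::log2(i);
    return t;
  }();
  return table;
}

// log2(x) for a positive integer, read from the table when it is in range.
double Lg2Lookup(std::int32_t x) {
  assert(x > 0);
  return x <= kTableMax ? Log2Table()[x] : std::log2(static_cast<double>(x));
}

// log2(n!) from std::lgamma, which returns ln|Gamma(n + 1)| = ln(n!).
double Log2Factorial(std::int32_t n) {
  assert(n >= 0);
  return std::lgamma(n + 1.0) / std::log(2.0);
}
```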
RobinsonFoulds() now uses a fast C++ batch path for
cross-distance computations (tree list vs tree list), matching the
existing all-pairs batch performance. Previously, cross-distance calls
fell through to per-pair R dispatch (~27× slower per pair); the new path
achieves ~21× speedup on typical inputs (e.g. 50 × 250 trees, 50
tips).
MCITree() selects the tree from a posterior sample with
the highest total split information content (a Maximum Clade
Information analogue of the Maximum Clade Credibility tree).
Pairwise distance computation has been substantially optimized. Typical speedups over v2.12.0 for tree sets where most splits are shared (MCMC posteriors, bootstrap replicates):
| Metric | 100 trees × 50 tips | 40 trees × 200 tips |
|---|---|---|
| ClusteringInfoDistance | ~5× | ~12× |
| MatchingSplitDistance | ~7× | ~11× |
| InfoRobinsonFoulds | ~4× | ~5× |
| DifferentPhylogeneticInfo | ~1.3× | ~1.1× |
| MatchingSplitInfoDistance | ~1.4× | ~1× |
All pairwise distance functions now use an OpenMP multi-threaded batch path when the package is compiled with OpenMP support, for both all-pairs and cross-pairs (tree1 vs tree2) computations.
The number of OpenMP threads is controlled by
options(mc.cores = N); the default is 1
(single-threaded). Set mc.cores to
parallel::detectCores() or a fixed integer to enable
multi-threading. StartParallel() /
StopParallel() are no longer needed when OpenMP is
available.
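The sketch below shows how a thread count, such as the value set with options(mc.cores = N), might drive an OpenMP all-pairs loop. AllPairsDistances() is hypothetical, and it does not show how TreeDist passes mc.cores to its C++ code. The result uses the column-wise lower-triangle layout of an R dist object. When compiled without OpenMP, the pragma is skipped and the loop runs on one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Computes every pairwise distance among `nTrees` items and stores the lower
// triangle column by column, the layout R uses for "dist" objects. With
// OpenMP, the outer loop is shared among `nThreads` threads; without it the
// pragma is skipped. `distance(i, j)` must be safe to call concurrently.
std::vector<double> AllPairsDistances(
    std::size_t nTrees,
    const std::function<double(std::size_t, std::size_t)>& distance,
    int nThreads) {
  assert(nThreads >= 1);
  (void)nThreads;  // unused when OpenMP is unavailable
  if (nTrees < 2) return {};
  std::vector<double> out(nTrees * (nTrees - 1) / 2);
  const long n = static_cast<long>(nTrees);
#ifdef _OPENMP
#pragma omp parallel for num_threads(nThreads) schedule(dynamic)
#endif
  for (long i = 0; i < n; ++i) {
    const std::size_t ui = static_cast<std::size_t>(i);
    // Offset of entry (i, i + 1), the first entry of column i.
    const std::size_t base = nTrees * ui - ui * (ui + 1) / 2;
    for (std::size_t j = ui + 1; j < nTrees; ++j) {
      // Each (i, j) owns a distinct slot, so threads never share writes.
      out[base + (j - ui - 1)] = distance(ui, j);
    }
  }
  return out;
}
```

Compile with -fopenmp (GCC or Clang) to enable threading. Each pair writes only to its own slot, so the output does not depend on the number of threads.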
Exact split matches between trees are now detected via an O(n log n) sort-and-merge pre-scan, reducing the linear assignment problem to only the unmatched splits. For tree sets with high split overlap, this yields the largest portion of the speedups above.
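Here is a sketch of such a pre-scan. It assumes each split is a 64-bit mask in a canonical orientation (for example, with leaf 0 always on the unset side), so that identical bipartitions have identical masks. Sorting both lists costs O(n log n). A single merge pass then separates shared splits from those that still need the assignment step. ExactMatchPrescan() and UnmatchedSplits are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct UnmatchedSplits {
  std::vector<std::uint64_t> left, right;  // splits without an exact partner
  std::size_t nShared = 0;                 // splits present in both trees
};

// Splits must be canonically oriented so equal bipartitions have equal masks.
UnmatchedSplits ExactMatchPrescan(std::vector<std::uint64_t> a,
                                  std::vector<std::uint64_t> b) {
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  // A tree contains each split at most once.
  assert(std::adjacent_find(a.begin(), a.end()) == a.end());
  assert(std::adjacent_find(b.begin(), b.end()) == b.end());
  UnmatchedSplits out;
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if (a[i] == b[j]) {
      ++out.nShared; ++i; ++j;
    } else if (a[i] < b[j]) {
      out.left.push_back(a[i++]);
    } else {
      out.right.push_back(b[j++]);
    }
  }
  // Whatever remains in either list has no partner.
  out.left.insert(out.left.end(),
                  a.begin() + static_cast<std::ptrdiff_t>(i), a.end());
  out.right.insert(out.right.end(),
                   b.begin() + static_cast<std::ptrdiff_t>(j), b.end());
  return out;
}
```

Every shared split is removed, so left.size() + right.size() is also the unnormalized Robinson–Foulds distance between the two trees.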
Internal lookup table for log₂ values shrunk from 32 MB to 16 KB, improving L1 cache locality for information-based distance metrics.
Information content accumulation in
MutualClusteringInfo() rewritten as a branchless
expression, reducing per-split-pair table lookups from 16 to 4 and
eliminating 8 branches.
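One way to make a per-split-pair information calculation branch-free is sketched below; it is not the package's expression. The idea is to tabulate f(x) = x log₂ x with f(0) = 0, so empty cells of the 2 × 2 contingency table contribute nothing without an if-test. The mutual information of two bipartitions of n leaves with cell counts nᵢⱼ is then (Σ f(nᵢⱼ) - Σ f(row sums) - Σ f(column sums) + f(n)) / n bits. XLog2XTable() and SplitMutualInfo() are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Table of x * log2(x) for x = 0..n. Entry 0 holds 0, the usual convention
// for 0 * log2(0), so callers never branch on empty cells.
std::vector<double> XLog2XTable(int n) {
  std::vector<double> t(n + 1, 0.0);
  for (int x = 1; x <= n; ++x) t[x] = x * std::log2(static_cast<double>(x));
  return t;
}

// Mutual information (bits) between two bipartitions of the same leaves,
// given the four cells of their 2x2 contingency table.
double SplitMutualInfo(int n11, int n10, int n01, int n00,
                       const std::vector<double>& f) {
  const int n = n11 + n10 + n01 + n00;
  assert(n > 0 && static_cast<int>(f.size()) > n);
  const double cells = f[n11] + f[n10] + f[n01] + f[n00];
  const double rows = f[n11 + n10] + f[n01 + n00];  // sizes of split 1's sides
  const double cols = f[n11 + n01] + f[n10 + n00];  // sizes of split 2's sides
  return (cells - rows - cols + f[n]) / n;
}
```

The row and column terms depend on one split only. They can therefore be computed once per split and reused across all pairs, leaving four cell lookups per pair.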
spi_overlap() (used by
SharedPhylogeneticInfo(),
MatchingSplitInfoDistance(), and
JaccardRobinsonFoulds()) rewritten to use a single-pass
hardware POPCNT approach, replacing the previous four-pass boolean
scan.
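The single-pass idea can be written portably as follows. For splits stored as arrays of 64-bit words, three population counts per word give |A|, |B| and |A ∩ B|. The remaining cells of the 2 × 2 table follow by inclusion–exclusion. The sketch uses std::bitset::count() rather than inline assembly; compilers can lower it to a POPCNT instruction when targeting hardware that supports it. SplitOverlap() is a hypothetical name, not the package's spi_overlap(). It assumes the unused padding bits in the last word are zero.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Overlap { int n11, n10, n01, n00; };

// One pass over the words of two splits (bit set = leaf on the "1" side).
// Padding bits beyond nLeaves in the final word must be zero.
Overlap SplitOverlap(const std::vector<std::uint64_t>& a,
                     const std::vector<std::uint64_t>& b, int nLeaves) {
  assert(a.size() == b.size());
  int inA = 0, inB = 0, both = 0;
  for (std::size_t w = 0; w < a.size(); ++w) {
    inA += static_cast<int>(std::bitset<64>(a[w]).count());
    inB += static_cast<int>(std::bitset<64>(b[w]).count());
    both += static_cast<int>(std::bitset<64>(a[w] & b[w]).count());
  }
  // Inclusion-exclusion gives the other three cells.
  return {both, inA - both, inB - both, nLeaves - inA - inB + both};
}
```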
Hardware POPCNT instruction now used on x86-64 via inline assembly (requires TreeTools ≥ 2.2).
Internal cost-matrix storage is now pooled across tree pairs within each thread, eliminating per-pair heap allocation overhead.
ClusteringInfoDistance(),
DifferentPhylogeneticInfo(),
MatchingSplitInfoDistance(), and
InfoRobinsonFoulds() now avoid duplicate
as.Splits() conversions and use C++ batch functions for
per-tree entropy/information computation. This reduces R-level overhead
by ~8–17% for typical analyses.
Cross-pairs computations (tree1 vs
tree2 where both are lists) now use the same optimized
batch path as all-pairs computations.
KCVector() reimplemented in C++, giving ~220×
speedup per tree.
All-pairs and cross-pairs KendallColijn() Euclidean
distances now computed in C++ (pair_diff_euclidean(),
vec_diff_euclidean()).
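For context, the Kendall–Colijn vector (with λ = 0) records, for every pair of tips, the number of edges from the root to their most recent common ancestor. It also has one entry per pendant edge. The pendant entries are identical for any two trees on the same tips, so they do not affect the Euclidean distance between topologies. The sketch assumes a rooted tree given as a parent array, with tips numbered first and the root's parent set to -1. MrcaDepthVector() and EuclideanDistance() are hypothetical and do not reproduce KCVector(), pair_diff_euclidean() or vec_diff_euclidean().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Edges from node v up to the root (the root's parent is -1).
int Depth(const std::vector<int>& parent, int v) {
  int d = 0;
  while (parent[v] != -1) { v = parent[v]; ++d; }
  return d;
}

// For each tip pair i < j (tips are nodes 0..nTip-1), the depth of their
// most recent common ancestor, found by climbing both tips to equal depth
// and then together until they meet.
std::vector<double> MrcaDepthVector(const std::vector<int>& parent, int nTip) {
  std::vector<double> out;
  for (int i = 0; i < nTip; ++i) {
    for (int j = i + 1; j < nTip; ++j) {
      int a = i, b = j;
      int da = Depth(parent, a), db = Depth(parent, b);
      while (da > db) { a = parent[a]; --da; }
      while (db > da) { b = parent[b]; --db; }
      while (a != b) { a = parent[a]; b = parent[b]; --da; }
      out.push_back(da);
    }
  }
  return out;
}

// Euclidean distance between two vectors of equal length.
double EuclideanDistance(const std::vector<double>& x,
                         const std::vector<double>& y) {
  assert(x.size() == y.size());
  double sum = 0;
  for (std::size_t k = 0; k < x.size(); ++k) {
    const double diff = x[k] - y[k];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}
```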
Support larger trees in some functions by switching to 32-bit integers, following TreeTools v2.1.0.
AHMI() can now return negative values (these were previously
set to zero in error).
Experimental support for a new method of SPR distance calculation: subject to change or removal.
SpectralEigens() tests.
HierarchicalMutualInformation() calculates the
information shared between pairs of hierarchical partition structures.
Fix bug in calculation of MutualClusteringInfo():
the matching chosen was not always the global optimum, causing distances
to be overestimated in some circumstances (#163).
Fix crash in robinson_foulds_all_pairs() and
RobinsonFoulds(list).
Support larger trees in NNI distance calculations.
Note: this release introduced a bug in the computation of the mutual clustering information and clustering information distance. The globally optimal matching between splits was not always found. This was fixed in v2.11.0.
Ntropy() computes entropy from integer
counts.
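Entropy from counts cᵢ with total N can be computed without forming proportions, since H = log₂ N - (1/N) Σ cᵢ log₂ cᵢ. The sketch assumes base-2 logarithms and treats 0 log 0 as 0. EntropyFromCounts() is a hypothetical stand-in, not Ntropy() itself.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Shannon entropy (bits) of the distribution implied by non-negative integer
// counts: H = log2(N) - (1/N) * sum(c * log2(c)), with 0 * log2(0) = 0.
double EntropyFromCounts(const std::vector<long long>& counts) {
  long long total = 0;
  double sumCLogC = 0;
  for (long long c : counts) {
    assert(c >= 0);
    if (c > 0) sumCLogC += c * std::log2(static_cast<double>(c));
    total += c;
  }
  if (total == 0) return 0.0;  // no observations: define entropy as zero
  return std::log2(static_cast<double>(total)) - sumCLogC / total;
}
```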
C++ optimizations and reformatting:
Require R4.0; discontinue tests against R4.0.
VisualizeMatching() allows more control over output
format, and returns the matching (#124).
DistanceFromMedian(Average = median) allows
calculation of MAD.
SpectralEigens() returns correct eigenvalues
(smallest was overlooked).
SpectralEigens() handles values of nEig
larger than the input.
Anticipate new behaviour of unlist(use.names = TRUE)
in R 4.5.
Islands() allows the identification of islands of
trees.
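One common notion of an island is a set of trees linked by a chain of neighbours, each within a threshold distance of the next. Under that assumption, islands are connected components, and they can be labelled with a disjoint-set (union-find) structure, as in the sketch below. IslandLabels() and the ≤ comparison are illustrative choices. Islands() may define neighbours or number islands differently.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Disjoint-set forest with path halving.
struct DisjointSets {
  std::vector<std::size_t> parent;
  explicit DisjointSets(std::size_t n) : parent(n) {
    std::iota(parent.begin(), parent.end(), std::size_t{0});
  }
  std::size_t Find(std::size_t x) {
    while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
    return x;
  }
  void Unite(std::size_t a, std::size_t b) { parent[Find(a)] = Find(b); }
};

// Labels each tree with an island number (1, 2, ... in order of first
// appearance). Trees within `threshold` of each other are joined, so islands
// are connected components. `dist` is a full symmetric n x n matrix.
std::vector<int> IslandLabels(const std::vector<std::vector<double>>& dist,
                              double threshold) {
  const std::size_t n = dist.size();
  DisjointSets sets(n);
  for (std::size_t i = 0; i < n; ++i) {
    assert(dist[i].size() == n);
    for (std::size_t j = i + 1; j < n; ++j) {
      if (dist[i][j] <= threshold) sets.Unite(i, j);
    }
  }
  std::vector<int> label(n, 0), rootLabel(n, 0);
  int next = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t r = sets.Find(i);
    if (rootLabel[r] == 0) rootLabel[r] = ++next;
    label[i] = rootLabel[r];
  }
  return label;
}
```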
Internal implementation of path and SPR distances, removing dependency on phangorn (and thus R 4.4).
Add progress bar within .MaxValue().
Documentation improvements.
Fix KCDiameter.multiPhylo() for multiple
trees.
Fix calculation error in StrainCol().
App: Display strain in 3D tree space viewer.
Support for distances between larger trees.
Support unrooted trees in VisualizeMatching() (#103).
Fix bug when comparing a “multiPhylo” object containing a single tree.
Documentation clarification: finding non-matching leaves.
LAPJV().
StopParallel() gains a quietly argument
to suppress unnecessary messages.
Use “PlotTools” package for spectrum legends.
Minor documentation tweaks.
Support comparison of trees with different tips.
Fix caching errors in MapDist() (#98).
Update tests for compatibility with ape 5.7.
New functions to measure cluster sizes (see ?"cluster-statistics").
KMeansPP() conducts clustering using K-means++,
replacing K-means in app.
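K-means++ differs from plain K-means only in how it chooses the starting centres. The sketch below covers that seeding step; ordinary K-means iterations then proceed as usual. KMeansPlusPlusSeeds() is a hypothetical name operating on points in Euclidean space (for example, a mapping of tree distances), not the package's KMeansPP().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Point = std::vector<double>;

double SquaredDistance(const Point& a, const Point& b) {
  assert(a.size() == b.size());
  double sum = 0;
  for (std::size_t k = 0; k < a.size(); ++k) {
    sum += (a[k] - b[k]) * (a[k] - b[k]);
  }
  return sum;
}

// K-means++ seeding. The first centre is drawn uniformly at random. Each
// later centre is drawn with probability proportional to its squared
// distance from the nearest centre chosen so far. Returns indices into
// `points`; fewer than k come back if no distinct positions remain.
std::vector<std::size_t> KMeansPlusPlusSeeds(const std::vector<Point>& points,
                                             std::size_t k, std::mt19937& rng) {
  assert(k >= 1 && k <= points.size());
  std::uniform_int_distribution<std::size_t> uniform(0, points.size() - 1);
  std::vector<std::size_t> centres;
  centres.push_back(uniform(rng));
  // d2[p]: squared distance from point p to its nearest chosen centre.
  std::vector<double> d2(points.size());
  for (std::size_t p = 0; p < points.size(); ++p) {
    d2[p] = SquaredDistance(points[p], points[centres[0]]);
  }
  while (centres.size() < k) {
    // discrete_distribution needs a positive total weight.
    if (std::accumulate(d2.begin(), d2.end(), 0.0) == 0.0) break;
    std::discrete_distribution<std::size_t> weighted(d2.begin(), d2.end());
    const std::size_t c = weighted(rng);
    centres.push_back(c);
    for (std::size_t p = 0; p < points.size(); ++p) {
      d2[p] = std::min(d2[p], SquaredDistance(points[p], points[c]));
    }
  }
  return centres;
}
```

A chosen point's weight drops to zero immediately, so it is never picked twice.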
New vignette on tree landscape analysis.
New vignette on how to compare tree sets.
PathVector() now treats trees with a root node as
rooted.
Fix plot layout in treespace vignette.
Informative failure when not enough memory for
consensus_info().
Replace throw with stop in
C++.
Correct calculation of trustworthiness and continuity metrics.
Depict strain in minimum spanning trees with
StrainCol() and helper function
MSTSegments().
Update tests for consistency with “TreeTools” v1.7.
Use lighter Rcpp headers.
Support ConsensusInfo(p > 0.5).
Address hypervolume comparison in vignettes.
Support uniform manifold approximation and projection in app.
Speed improvements, using optimizations suggested by Alexis Stamatakis’ Bioinformatics group.
Support for parallel computation via
StartParallel().
Progress bars.
Solaris compatibility.
Modest vignette improvements.
spic/scic abbreviation recognition.
ConsensusInfo() quickly calculates the splitwise
information content of the consensus of a set of trees, after Smith
(forthcoming).
SplitwiseInfo() and ClusteringInfo()
gain a p parameter to reflect the reduced information
content of splits with lower support values, and a sum
parameter to allow return of individual split information
content.
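For reference, consider a split dividing n leaves into groups of a and b. Its phylogenetic information content is usually taken to be -log₂ of the proportion of unrooted binary trees that contain it. There are (2n - 5)!! such trees in total, of which (2a - 3)!!(2b - 3)!! contain the split. The sketch below sums that quantity over a tree's splits and ignores the p parameter. SplitPhyloInfo() and TreeSplitwiseInfo() are hypothetical names whose conventions may differ from SplitwiseInfo().

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// log2 of the double factorial x!! = x * (x - 2) * ...; the empty product
// (x <= 1, including -1) is 1, giving 0.
double Log2DoubleFactorial(int x) {
  double total = 0;
  for (int k = x; k > 1; k -= 2) total += std::log2(static_cast<double>(k));
  return total;
}

// Information (bits) of a split of n = a + b leaves: -log2 of the fraction of
// the (2n - 5)!! unrooted binary trees that contain it; (2a - 3)!! * (2b - 3)!!
// trees do.
double SplitPhyloInfo(int a, int b) {
  assert(a >= 1 && b >= 1 && a + b >= 4);
  const int n = a + b;
  return Log2DoubleFactorial(2 * n - 5) - Log2DoubleFactorial(2 * a - 3) -
         Log2DoubleFactorial(2 * b - 3);
}

// Sum over a tree's splits, each given by the size of one of its sides.
double TreeSplitwiseInfo(const std::vector<int>& splitSizes, int nTip) {
  double total = 0;
  for (int a : splitSizes) total += SplitPhyloInfo(a, nTip - a);
  return total;
}
```

Trivial splits (one leaf against the rest) score zero, as every tree contains them.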
KCDiameter() approximates the diameter of the
Kendall-Colijn metric.
Plot3() (experimental) provides pseudo-3D
plotting.
Project()/ProjectionQuality() renamed
to MapTrees()/MappingQuality().
SpectralClustering() renamed to
SpectralEigens().
Add self-organizing map example to treespace vignette.
Allow the specification of custom vectors in the Kendall–Colijn metric.
Faster all-to-all tree distance calculation.
Diagnose and fix memory leaks, including over-long reported matchings.
Explicitly import shiny/shinyjs functions.
Project() launches a ‘shiny’ app for projection and
analysis of tree space.
ProjectionQuality() calculates trustworthiness and
continuity of tree space mappings.
Faster calculation of Robinson–Foulds distance (using algorithm of Day (1985)) and clustering information distance.
New class ClusterTable to allow faster distance
computation with Day (1985) algorithm.
Improve error messages in
CalculateTreeDist().
Improvements to vignettes.
Use package ‘vdiffr’ conditionally.
TreeDistance() and related functions now return a
dist object when computing all distances between all pairs
of trees in a list.
Improve floating-point arithmetic in TreeDistance()
functions.
TreeDistance() now returns a distance (as
documented), rather than a similarity.
Fix rounding error in NNI ‘Li’ upper estimate, and improve NNI performance.
Reduce precision of LAPJV so rounding errors do not result in interminable run times.
Improvements to NNIDist() in light of Fack et
al. (2002).
Add NNIDiameter(): approximate diameter of NNI
distance.
Remove vignette ‘Interpreting tree distances’: duplicates https://ms609.github.io/TreeDistData/articles/09-expected-similarity.html.
Remove redundant data object oneOverlap.
Fix an issue when installing on R 3.x (require C++11 to ensure
declaration of UINT_FAST16_MAX).
Fix memory-handling bug in lapjv().