A library for fast interaction search in high-dimensional data.
The xyz package implements the xyz algorithm for fast interaction search in high-dimensional data.
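Suppose we are given a data matrix \( X \in \mathbb{R}^{n\times p} \) and a response vector \( Y \in \mathbb{R}^n \). As a simple example, we want to fit a model with a single interaction term, of the form \( Y_i = \beta X_{ij}X_{ik} + \varepsilon_i \), where \( \beta \) is the interaction strength and \( \varepsilon_i \) is noise.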
The problem is that we do not know the interaction pair \( (j,k) \). Looping through all possible pairs would take quadratic time, \(\mathcal{O}(np^2)\). The xyz algorithm provably returns the correct interaction pair in subquadratic time, that is \( \mathcal{O}(np^{\alpha}) \) with \( \alpha < 2 \). More elaborate models can be considered; see the package vignette for details.
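To make the quadratic cost concrete, here is a minimal sketch of the naive baseline in base R. It is not the xyz algorithm and not part of the package; the function name brute_force_pair and the toy data are assumptions made for illustration only. It scores every pair by the absolute inner product between the response and the interaction column, which costs \( \mathcal{O}(n) \) per pair and \( \mathcal{O}(np^2) \) overall.

```r
# Brute-force baseline (NOT the xyz algorithm; hypothetical helper name).
# Scores every pair (j, k) by |sum_i Y_i * X_ij * X_ik| and keeps the best.
brute_force_pair <- function(X, Y) {
  p <- ncol(X)
  best <- c(NA_integer_, NA_integer_)
  best_score <- -Inf
  for (j in seq_len(p - 1)) {
    for (k in (j + 1):p) {
      score <- abs(sum(Y * X[, j] * X[, k]))  # O(n) work per pair
      if (score > best_score) {
        best_score <- score
        best <- c(j, k)
      }
    }
  }
  best
}

# Toy data with a planted interaction between columns 1 and 2.
set.seed(1)
n <- 200
p <- 30
X <- matrix(sample(c(-1, 1), n * p, replace = TRUE), nrow = n, ncol = p)
Y <- 2 * X[, 1] * X[, 2] + rnorm(n)

res <- brute_force_pair(X, Y)
print(res)
```

The double loop visits \( p(p-1)/2 \) pairs, so doubling \( p \) roughly quadruples the runtime. This is the cost that a subquadratic search avoids.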
You can either install the package directly from CRAN (using install.packages("xyz")) or get the most up-to-date version by installing from GitHub.
We generate the model \( Y_i=2 X_{i1}X_{i2}+\varepsilon_i \), where \(X \in \{-1,1\}^{n \times p} \) and \( \varepsilon_i \) is iid Gaussian noise.
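The following base R sketch simulates data from this model. The sample size, dimension, seed, and unit noise variance are assumptions chosen for illustration; they are not taken from the package documentation. The final lines are only a sanity check that the planted interaction carries signal, not a call into the package.

```r
# Simulate Y_i = 2 * X_i1 * X_i2 + eps_i with X in {-1, 1}^(n x p).
# n, p, the seed and the noise variance are illustrative assumptions.
set.seed(42)
n <- 1000
p <- 50
X <- matrix(sample(c(-1, 1), n * p, replace = TRUE), nrow = n, ncol = p)
eps <- rnorm(n)                      # iid standard Gaussian noise
Y <- 2 * X[, 1] * X[, 2] + eps

# Sanity check: the planted pair correlates strongly with Y,
# while an unrelated pair does not.
cor_planted <- cor(Y, X[, 1] * X[, 2])
cor_other <- cor(Y, X[, 3] * X[, 4])
print(c(planted = cor_planted, other = cor_other))
```

With unit noise variance, the population correlation between \( Y \) and \( X_{i1}X_{i2} \) is \( 2/\sqrt{5} \approx 0.89 \), so the planted pair should stand out clearly from unrelated pairs, whose correlation is close to zero.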
For a practical verification of the subquadratic runtime see the plot below.
It depicts three runs of xyz on a data set with different interaction strengths. Green is the weakest interaction and purple the strongest. The recorded quantity is the time taken to discover the correct interaction.
The package also implements interaction search for high-dimensional regression (see the function xyz_regression).