Help for package sqlm

Type:

Package

Title:

SQL-Backed Linear Regression

Version:

0.1.0

Description:

Fits linear regression models on datasets residing in SQL databases without pulling data into R memory. Computes sufficient statistics inside the database engine via a single aggregation query and solves the normal equations in R.

License:

MIT + file LICENSE

Encoding:

UTF-8

Imports:

dplyr, dbplyr, DBI, glue, purrr, S7, MASS, broom, tibble, stats, utils

Suggests:

testthat (≥ 3.0.0), duckdb, orbital, withr, knitr, rmarkdown, quarto

Config/testthat/edition:

RoxygenNote:

7.3.3

NeedsCompilation:

Packaged:

2026-02-01 02:19:59 UTC; hagan

Author:

Alejandro Hagan [aut, cre]

Maintainer:

Alejandro Hagan <alejandro.hagan@outlook.com>

Depends:

R (≥ 4.1.0)

Repository:

CRAN

Date/Publication:

2026-02-04 17:20:02 UTC

Glance at an lm_sql_result

Description

Extract a single-row tibble of model-level summary statistics from a fitted SQL linear model.

Usage

## S3 method for class 'lm_sql_result'
glance(x, ...)

Arguments

x

An 'lm_sql_result' object.

...

Not used.

Details

Returns R-squared, adjusted R-squared, residual standard error, F-statistic and its p-value, model degrees of freedom, log-likelihood, AIC, BIC, number of observations, and residual degrees of freedom.

Value

A single-row tibble with columns 'r.squared', 'adj.r.squared', 'sigma', 'statistic', 'p.value', 'df', 'logLik', 'AIC', 'BIC', 'nobs', and 'df.residual'.
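
As a rough illustration of why these model-level summaries need no row-level data, the sketch below derives them from aggregate quantities only. It uses base R and the built-in mtcars data as a stand-in for a database table; the variable names are illustrative, and this is not necessarily how the package computes its values.

```r
# Aggregates that a database could return in one pass (computed in R here).
X <- cbind(1, mtcars$wt, mtcars$hp)  # design matrix with an intercept column
y <- mtcars$mpg
n <- length(y)
p <- ncol(X)                         # number of coefficients, intercept included
XtX   <- crossprod(X)
Xty   <- crossprod(X, y)
yty   <- sum(y^2)
sum_y <- sum(y)

beta <- solve(XtX, Xty)
rss  <- yty - sum(beta * Xty)        # residual sum of squares
tss  <- yty - sum_y^2 / n            # total sum of squares about the mean

stats <- list(
  r.squared     = 1 - rss / tss,
  adj.r.squared = 1 - (rss / (n - p)) / (tss / (n - 1)),
  sigma         = sqrt(rss / (n - p)),
  statistic     = ((tss - rss) / (p - 1)) / (rss / (n - p)),
  df.residual   = n - p,
  nobs          = n
)
stats$p.value <- pf(stats$statistic, p - 1, n - p, lower.tail = FALSE)
stats$logLik  <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)
stats$AIC     <- -2 * stats$logLik + 2 * (p + 1)  # +1 for the error variance
stats$BIC     <- -2 * stats$logLik + log(n) * (p + 1)
```

The formulas follow the conventions of stats::lm(), so the results can be checked against summary(), logLik(), AIC(), and BIC() on an in-memory fit.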

SQL-Backed Linear Regression

Description

Fits a linear regression model using SQL aggregation on a remote database table. The data never leaves the database; only sufficient statistics (sums and cross-products) are returned to R.

Usage

lm_sql(formula, data, tol = 1e-07)

Arguments

formula

A formula object (e.g., price ~ x + cut).

data

A tbl_sql object (from dbplyr).

tol

Numeric tolerance for detecting linear dependency among the columns of the design matrix.

Details

The function computes the X^T X and X^T y matrices entirely inside the database engine via a single SQL aggregation query, then solves the normal equations in R using Cholesky decomposition (falling back to the Moore-Penrose pseudoinverse for rank-deficient designs).
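
The solve step can be sketched in base R as follows. Here the cross-products are built in memory from mtcars rather than by a SQL query, and solve_normal_equations() is a hypothetical helper written for illustration, not a function exported by the package. The fallback is triggered simply when chol() fails, which is an assumption and may differ from how 'tol' is applied internally.

```r
# Stand-ins for the aggregates the database would return.
X <- cbind(`(Intercept)` = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

XtX <- crossprod(X)     # X^T X: sums and cross-products of the predictors
Xty <- crossprod(X, y)  # X^T y

# Hypothetical helper: solve (X^T X) beta = X^T y via Cholesky and fall
# back to a pseudoinverse if the factorization fails.
solve_normal_equations <- function(XtX, Xty) {
  R <- tryCatch(chol(XtX), error = function(e) NULL)
  if (is.null(R)) {
    return(drop(MASS::ginv(XtX) %*% Xty))
  }
  # XtX = R^T R, so solve R^T z = Xty first, then R beta = z.
  z <- backsolve(R, Xty, transpose = TRUE)
  drop(backsolve(R, z))
}

beta <- solve_normal_equations(XtX, Xty)
names(beta) <- colnames(X)
```

For a full-rank design the result should agree with coef(lm(mpg ~ wt + hp, data = mtcars)) up to floating-point error.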

Supported formula features:

Numeric and categorical (character/factor) predictors with automatic dummy encoding via 'CASE WHEN'.
Interaction terms ('*' and ':') including numeric × categorical and categorical × categorical cross-products.
Dot expansion ('y ~ .') to all non-response columns.
Transforms: 'I()', 'log()', and 'sqrt()' translated to SQL equivalents ('POWER', 'LN', 'SQRT').
Date and datetime predictors automatically cast to numeric in SQL.
No-intercept models ('y ~ 0 + x').
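
To make the dummy-encoding item concrete, the sketch below generates one 'CASE WHEN' indicator per non-reference level of a categorical column. dummy_sql() is a hypothetical helper, and treating the first sorted level as the reference category is an assumption modelled on R's default treatment contrasts; the SQL the package actually emits may differ.

```r
# Hypothetical helper: build 0/1 indicator expressions for a categorical column.
dummy_sql <- function(column, values) {
  lv <- sort(unique(values))
  non_ref <- lv[-1]                                # first level is the reference
  quoted <- gsub("'", "''", non_ref, fixed = TRUE) # escape single quotes for SQL
  exprs <- sprintf("CASE WHEN %s = '%s' THEN 1.0 ELSE 0.0 END", column, quoted)
  names(exprs) <- paste0(column, non_ref)          # lm()-style term names
  exprs
}

cut_terms <- dummy_sql("cut", c("Good", "Fair", "Ideal"))
```

Each generated expression can then enter the aggregation query like any numeric column, so sums and cross-products involving the indicators are computed in the same pass.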

For grouped data (via [dplyr::group_by()]), a single 'GROUP BY' query is executed and one model per group is returned in a tibble with a 'model' list-column.

NA handling uses listwise deletion: rows with 'NULL' in any model variable are excluded via a 'WHERE ... IS NOT NULL' clause.
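
A minimal sketch of the filter this implies, assuming a model on the columns price, x, and cut. The column names and the toy data frame are illustrative, and the in-memory complete.cases() call is only the R analogue of the SQL clause.

```r
model_vars <- c("price", "x", "cut")

# The kind of clause that excludes incomplete rows in SQL.
where_clause <- paste(
  "WHERE",
  paste(sprintf("%s IS NOT NULL", model_vars), collapse = " AND ")
)

# The in-memory analogue: keep rows with no NA in any model variable.
toy <- data.frame(
  price = c(10, NA, 30, 40),
  x     = c(1, 2, NA, 4),
  cut   = c("a", "b", "c", NA)
)
kept <- toy[complete.cases(toy[, model_vars]), ]
```

Only the first row of toy is complete, so a model fitted under this rule would see a single observation.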

Value

An S7 object of class lm_sql_result, or a tibble with a model list-column if the data is grouped.
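
A possible end-to-end call, assuming an in-memory DuckDB database holding a copy of mtcars. The connection setup, table name, and formulas are illustrative choices rather than requirements of the package, and the block is skipped when sqlm or duckdb is not installed.

```r
if (requireNamespace("sqlm", quietly = TRUE) &&
    requireNamespace("duckdb", quietly = TRUE)) {
  con <- DBI::dbConnect(duckdb::duckdb())
  DBI::dbWriteTable(con, "mtcars", mtcars)
  cars <- dplyr::tbl(con, "mtcars")  # a tbl_sql backed by the database

  # Ungrouped: a single lm_sql_result object.
  fit <- sqlm::lm_sql(mpg ~ wt + hp, data = cars)
  print(fit)

  # Grouped: a tibble with one model per group in a 'model' list-column.
  by_cyl <- sqlm::lm_sql(mpg ~ wt, data = dplyr::group_by(cars, cyl))

  DBI::dbDisconnect(con, shutdown = TRUE)
}
```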

Result object for SQL-backed Linear Model

Description

An S7 class that stores the complete results of a SQL-backed linear regression fitted by [lm_sql()].

Usage

lm_sql_result(
  coefficients = integer(0),
  std_error = integer(0),
  sigma = numeric(0),
  r_squared = numeric(0),
  adj_r_squared = numeric(0),
  f_statistic = integer(0),
  f_p_value = integer(0),
  logLik = numeric(0),
  AIC = numeric(0),
  BIC = numeric(0),
  nobs = numeric(0),
  df_residual = numeric(0),
  df_model = numeric(0),
  statistic = integer(0),
  p_value = integer(0),
  call = NULL,
  term_expressions = NULL
)

Details

Users do not normally call this constructor directly. The object is created internally by [lm_sql()] and returned as the model object. It stores fitted coefficients, standard errors, t-statistics, p-values, and model-level summaries (R-squared, F-statistic, AIC, BIC, etc.). The 'term_expressions' property holds named R expressions for each predictor, which are used by the [orbital.lm_sql_result()] method to generate in-database prediction expressions.

Convert an lm_sql_result to an orbital object

Description

Creates an orbital object from a fitted SQL linear model, enabling in-database predictions without pulling data into R.

Usage

orbital.lm_sql_result(x, ..., prefix = ".pred")

Arguments

x

An 'lm_sql_result' object.

...

Not used.

prefix

Column name for predictions. Defaults to '".pred"'.

Details

Builds a single prediction expression by combining the fitted coefficients with the R expressions stored in 'term_expressions'. For categorical predictors, the expression includes 'ifelse()' calls that dbplyr translates to SQL 'CASE WHEN'. The resulting 'orbital_class' object can be used with [orbital::predict()] to get predictions or [orbital::augment()] to append a '.pred' column to a database table.
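
The idea of folding coefficients and term expressions into one expression can be sketched in base R. The coefficient values, the term names, and the build_prediction() helper below are all hypothetical; the sketch shows the general technique, not the method's actual implementation.

```r
# Hypothetical fitted values and per-term expressions.
coefs <- c(`(Intercept)` = 37.2, wt = -3.9, cylsix = -1.5)
term_expressions <- list(
  wt     = quote(wt),
  cylsix = quote(ifelse(cyl == "six", 1, 0))  # dummy for a categorical level
)

# Build intercept + b1 * term1 + b2 * term2 + ... as one unevaluated call.
build_prediction <- function(coefs, term_expressions) {
  slopes <- unname(coefs[names(term_expressions)])
  products <- Map(function(b, e) call("*", b, e), slopes, term_expressions)
  Reduce(function(lhs, rhs) call("+", lhs, rhs), products,
         coefs[["(Intercept)"]])
}

pred_expr <- build_prediction(coefs, term_expressions)

# Evaluated locally here; against a database table the same expression
# would be translated to SQL instead.
new_data <- data.frame(wt = c(2, 3), cyl = c("six", "four"))
preds <- eval(pred_expr, new_data)
```

Because the result is a plain R call made of arithmetic and 'ifelse()', it contains only operations that have SQL translations.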

Value

An 'orbital_class' object.

Print an lm_sql_result

Description

Display a concise summary of a fitted SQL linear model.

Usage

## S3 method for class 'lm_sql_result'
print(x, ...)

Arguments

x

An 'lm_sql_result' object.

...

Not used.

Details

Prints the original function call and the named coefficient vector.

Value

Invisibly returns 'x'.

Tidy an lm_sql_result

Description

Extract a tidy tibble of per-term coefficient statistics from a fitted SQL linear model.

Usage

## S3 method for class 'lm_sql_result'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)

Arguments

x

An 'lm_sql_result' object.

conf.int

Logical. If 'TRUE', include confidence interval columns 'conf.low' and 'conf.high'. Defaults to 'FALSE'.

conf.level

Confidence level for the interval. Defaults to '0.95'.

...

Not used.

Details

Returns one row per model term with the estimate, standard error, t-statistic, and p-value. When 'conf.int = TRUE', confidence intervals are computed using the t-distribution with 'df_residual' degrees of freedom.

Value

A tibble with columns 'term', 'estimate', 'std.error', 'statistic', and 'p.value'. If 'conf.int = TRUE', also 'conf.low' and 'conf.high'.
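
The interval described in Details is the standard t-based one. The sketch below reproduces it in base R from estimates, standard errors, and residual degrees of freedom; an in-memory lm() fit on mtcars stands in for an 'lm_sql_result', so the object and variable names are illustrative only.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # in-memory stand-in

estimate    <- coef(fit)
std_error   <- sqrt(diag(vcov(fit)))
df_residual <- df.residual(fit)
conf_level  <- 0.95

# Per-term t-statistics and two-sided p-values.
statistic <- estimate / std_error
p_value   <- 2 * pt(abs(statistic), df = df_residual, lower.tail = FALSE)

# Critical value of the t-distribution for a two-sided interval.
crit <- qt(1 - (1 - conf_level) / 2, df = df_residual)
ci <- cbind(
  conf.low  = estimate - crit * std_error,
  conf.high = estimate + crit * std_error
)
```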