The connector.databricks package provides a convenient interface for accessing and interacting with Databricks volumes and tables directly from R. This vignette will guide you through the process of connecting to Databricks, retrieving data, and performing various operations using this package.
This package is meant to be used with the connector package, which provides a common interface for interacting with various data sources. The connector.databricks package extends the connector package to support Databricks volumes and tables.
You can install connector.databricks from CRAN using the following command:
# Install from CRAN
install.packages("connector.databricks")
To get a bug fix or to use a feature from the development version, you can install the development version of connector.databricks from GitHub.
pak::pak("novonordisk-opensource/connector.databricks")
Here is an example of how to connect to Databricks tables and volumes:
library(connector.databricks)
# Connect to Databricks tables using DBI
con <- connector_databricks_table(
  http_path = "path-to-cluster",
  catalog = "my_catalog",
  schema = "my_schema"
)
# Connect to a Databricks volume
con <- connector_databricks_volume(
  catalog = "my_catalog",
  schema = "my_schema",
  path = "path-to-file-storage"
)
When connecting to Databricks tables, authentication is handled by the odbc::databricks() driver, which supports personal access tokens as well as credentials provided through Posit Workbench. See odbc::databricks() for more information on how the connection to Databricks is established. Currently, most package functions rely on the brickster package.
When connecting to Databricks volumes, authentication is handled by the brickster package. See this vignette for more information on how authentication is handled.
Hopefully, in the future the whole backend will rely solely on the brickster package.
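Before opening either kind of connection, it can help to confirm that credentials are visible to the R session. The sketch below assumes personal access token authentication through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, which is the convention described in the odbc::databricks() and brickster documentation; other authentication methods, such as Posit Workbench managed credentials, do not need both variables. The helper missing_databricks_env() is hypothetical and is not part of connector.databricks.

```r
# Hypothetical helper (not part of connector.databricks): return the names of
# the assumed authentication variables that are unset or empty in this session.
missing_databricks_env <- function(vars = c("DATABRICKS_HOST", "DATABRICKS_TOKEN")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

# Report anything that is missing before attempting to connect.
missing <- missing_databricks_env()
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "))
}
```

These variables are typically defined in .Renviron rather than in scripts, so that tokens are not committed to version control.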
Both types of connections share similar interfaces for reading and writing data. Tables should be used with tabular types of data, while volumes should be used with unstructured data.
Example of how to use the connector object:
# List content
con$list_content_cnt()
# Write a file
con$write_cnt(iris, "iris.rds")
# Read a file
con$read_cnt("iris.rds") |>
  head()
# Remove a file
con$remove_cnt("file_name.csv")
Here is an example of how it can be used with the connector package and a YAML configuration file (for more information, see the connector package documentation):
# Connect using configuration file
connector <- connector::connect(
  config = system.file(
    "config",
    "example_yaml.yaml",
    package = "connector.databricks"
  )
)
# List contents in Volume
connector$volumes$list_content_cnt()
# Get databricks connection object from Tables
connector$tables$get_conn()
# Write a file
connector$volumes$write_cnt(iris, "Test/iris.csv")
# Read a file
connector$tables$read_cnt("example_data")
We welcome contributions to the connector.databricks package. If you have any suggestions or find any issues, please open an issue or submit a pull request on GitHub.
This package is licensed under the Apache License.