Workbench-managed Databricks Credentials

Enhanced Advanced

Posit Workbench has a native integration for Databricks that includes support for managed Databricks OAuth credentials and a dedicated pane in the RStudio Pro IDE.

Workbench-managed Databricks credentials allow you to sign into a Databricks workspace from the home page and be immediately granted access to data and compute resources using your existing Databricks identity. Managed credentials avoid the burden (and risk) of managing Databricks personal access tokens (PATs) yourself, work with the Databricks CLI, most official packages and SDKs, and database drivers for Python and R. Anything that implements the Databricks client unified authentication standard picks up the ambient credentials supplied by Workbench.

Workbench-managed Databricks credentials are refreshed automatically while your session is active.

Starting a Session with Workbench-managed Databricks credentials

See Starting a Session with Workbench-managed credentials for more information on how to enable Databricks credentials for use in new sessions. If the Databricks selection is not available, then your administrator has not configured the integration.

Avoid using both Workbench-managed credentials and PATs

If your Workbench administrator has enabled managed Databricks credentials, providing your PATs or host environment variables in a .env, .databrickscfg, .Renviron or other files may interfere with Workbench-managed credentials and lead to inconsistent behavior. If you want to opt out of managed credentials and instead use your own configuration, do not enable the Databricks Session Credentials button when starting a new session.

Databricks CLI

The Databricks CLI wraps the Databricks REST API and provides access to Databricks account and workspace resources and data. For more information, reference the Databricks REST API reference or the Databricks CLI documentation. Providing Databricks credentials is unnecessary when using the CLI with managed Databricks credentials inside Posit Workbench.

# List cluster metadata
databricks clusters get 1234-567890-a12bcde3

Databricks with R

Workbench-managed Databricks credentials have been validated to work with sparklyr, odbc using the Posit Pro Driver for Databricks, and the Databricks R SDK. For more information about using the Databricks pane to manage clusters or sparklyr connections easily, or about the Connections pane in RStudio Pro, reference RStudio Pro: Databricks.

sparklyr version 1.8.4 or higher works with Workbench-managed credentials and modern Databricks Runtimes that make use of Databricks Connect (ML 13+). This is accomplished via reticulate and an additional pysparklyr package that powers sparklyr. For more information and installation instructions see the sparklyr package documentation.

library(sparklyr)

sc <- spark_connect(
  cluster_id = "[Cluster ID]",
  method = "databricks_connect",
  dbr_version = "14.0"
)

odbc version 1.4.0 or higher provides odbc::databricks() which automatically handles many common authentication scenarios. Pair odbc with DBI if you want to write your own SQL:

library(DBI)

con <- dbConnect(
  odbc::databricks(),
  httpPath = "value found under ⁠Advanced Options > JDBC/ODBC⁠ in the Databricks UI"
)

dbGetQuery(con, "
  SELECT passenger_count, AVG(fare_amount) AS avg_fare 
  FROM nyctaxi 
  WHERE trip_distance > 0.5
  GROUP BY passenger_count
")

Or use dbplyr to generate SQL from your dplyr code:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

nyctaxi <- tbl(con, in_catalog("samples", "nyctaxi", "trips"))
nyctaxi |> 
   filter(trip_distance > 0.5) |> 
   summarise(
     avg_fare = mean(fare_amount), 
     .by = passenger_count
   )

The R Databricks SDK can be used for general Databricks operations:

library(databricks)

client <- DatabricksClient()
# List clusters
clustersList(client)[, "cluster_name"]

Databricks with Python

Workbench-managed credentials are expected to work with Python tools that adhere to the Databricks client unified authentication standard. The following have been tested in Posit Workbench: pyodbc, Databricks VS Code extension, Databricks CLI, Databricks SDK for Python, and pyspark via Databricks Connect.

pyodbc has been validated with Posit’s Databricks Pro Driver in VS Code.

import configparser
import os
import pyodbc
import re

# open the DATABRICKS_CONFIG_FILE to get oauth token information
config = configparser.ConfigParser()
config.read(os.environ["DATABRICKS_CONFIG_FILE"])

# Replace <table-name> with the name of the database table to query.
table_name = "samples.nyctaxi.trips"

databricks_host = os.environ["DATABRICKS_HOST"]
databricks_workspace_id = "138962681435081"
databricks_cluster_id = "1108-152427-dq9mgl"
databricks_http_path = (
    f"sql/protocolv1/o/{databricks_workspace_id}/{databricks_cluster_id}"
)
databricks_oauth_token = config["workbench"]["token"]

# Setup the connection string for the Databricks instance
connection_string = f"""
   Driver           = /opt/simba/spark/lib/64/libsparkodbc_sb64.so;
   Host             = {databricks_host};
   HTTPPath         = {databricks_http_path};
   Port             = 443;
   Protocol         = https;
   UID              = token;
   SparkServerType  = 3;
   Schema           = default;
   ThriftTransport  = 2;
   SSL              = 1;
   AuthMech         = 11;
   Auth_Flow        = 0;
   Auth_AccessToken = {databricks_oauth_token};
"""
# remove spaces from connection string
connection_string = re.sub(r"\s+", "", connection_string)

# connect to the Databricks instance
conn = pyodbc.connect(connection_string, autocommit=True)

PySpark is a Python API for working with Apache Spark and can be used with Databricks managed Spark environments. It can be used for SQL or dataframe operations as well as machine learning algorithms via MLlib.

from pyspark.sql import SparkSession

spark: SparkSession = spark 

print("Hello from Databricks")
spark.sql("select * from samples.nyctaxi.trips").show(3)

The .databrickscfg file that Posit Workbench manages for you has a profile called “workbench” containing the short-lived tokens and other metadata for connecting.

from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

# Optional to specify the profile
config = Config(profile="workbench", cluster_id="1234-abcdef-5678hijk")

spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

# PySpark code executed on Databricks Cluster
df = spark.read.table("samples.nyctaxi.trips")
df = df.filter(df.trip_distance > 0.5)
df_pd = df.limit(3).toPandas()

# Python code executed on local machine
print(df_pd)

Python SDK

This brief example prints the available clusters based on your user profile, using the Databricks profile (profile="workbench") that Workbench-managed credentials provides.

from databricks.sdk import WorkspaceClient

# Connect to the Databricks instance
# Optional to specify the profile
w = WorkspaceClient(profile="workbench")

# Retrieve the list of clusters
clusters = w.clusters.list()

# Print the names of the clusters
for c in clusters:
    print(c.cluster_name)
Back to top