DuckDB (native vs SQLAlchemy)#
Beginning in 0.9, JupySQL supports DuckDB via a native connection and SQLAlchemy, both with comparable performance. JupySQL adds a small overhead; however, this overhead is constant.
At the moment, the only difference is that some features are only available when using SQLAlchemy.
Performance comparison (pandas)#
import pandas as pd
import numpy as np
num_rows = 1_000_000
num_cols = 100
df = pd.DataFrame(np.random.randn(num_rows, num_cols))
Raw DuckDB#
import duckdb
conn = duckdb.connect()
%%timeit
conn.execute("SELECT * FROM df").df()
807 ms ± 70.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
DuckDB + SQLALchemy#
%load_ext sql
%config SqlMagic.autopandas = True
%config SqlMagic.displaycon = False
%sql duckdb:// --alias duckdb-sqlalchemy
[JupySQL] Initializing providers...
[JupySQL] JUPYSQL_CNPG_ENABLED=NOT SET
[JupySQL] JUPYSQL_CNPG_NAMESPACE=NOT SET
[CNPG] Provider __init__ called, enabled=False
[JupySQL] CNPG provider registered, enabled=True
[JupySQL] CNPG discovered 0 database(s)
Connecting to 'duckdb-sqlalchemy'
%%timeit
_ = %sql SELECT * FROM df
770 ms ± 35.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
DuckDB + native#
%sql conn --alias duckdb-native
%%timeit
_ = %sql SELECT * FROM df
736 ms ± 50 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Performance comparison (polars)#
%config SqlMagic.autopolars = True
%sql duckdb-sqlalchemy
Disabled 'autopandas' since 'autopolars' was enabled.
Switching to connection 'duckdb-sqlalchemy'
Raw DuckDB#
%%timeit
conn.execute("SELECT * FROM df").pl()
483 ms ± 929 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
DuckDB + SQLAlchemy#
%%timeit
_ = %sql SELECT * FROM df
486 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
DuckDB + native#
%sql duckdb-native
Switching to connection 'duckdb-native'
%%timeit
_ = %sql SELECT * FROM df
475 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Limitations of using native connections#
As of version 0.9.0, the only caveat is that %sqlcmd won’t work with a native connection.
%sqlcmd
UsageError: Missing argument for %sqlcmd. Valid commands are: tables, columns, test, profile, explore, snippets, connect