DuckDB has no dependencies to compile or to run it.
Misc
Compile from sources
Get the sources from Github, compile them and …
$ git clone https://github.com/duckdb/duckdb
$ cd duckdb
$ make
… test the produced executable with an in-memory database:
$ ./build/release/duckdb :memory: "select 'test'"
┌─────────┐
│ 'test' │
│ varchar │
├─────────┤
│ test │
└─────────┘
SELECT * FROM T
select * from t
can be abbreviated with:
from t;
PIVOT
create table p (
txt varchar not null,
val decimal(5,2) not null
);
insert into p values
('A', 4.31),
('A', 2.18),
('A', 3.72),
('B', 2.81),
('B', 5.36),
('C', 6.12),
('C', 8.27);
pivot p
on txt
using
sum (val) as total,
count(val) as cnt;
--
-- ┌───────────────┬───────┬───────────────┬───────┬───────────────┬───────┐
-- │ A_total │ A_cnt │ B_total │ B_cnt │ C_total │ C_cnt │
-- │ decimal(38,2) │ int64 │ decimal(38,2) │ int64 │ decimal(38,2) │ int64 │
-- ├───────────────┼───────┼───────────────┼───────┼───────────────┼───────┤
-- │ 10.21 │ 3 │ 8.17 │ 2 │ 14.39 │ 2 │
-- └───────────────┴───────┴───────────────┴───────┴───────────────┴───────┘
Specify selected columns
Select all columns from t
, except id
and val_2
:
select * exclude (id, val_2) from t;
Replace a given column with an expression
select * replace(upper(val_1) as val_1) from t;
select columns('val_\d') from t;
Select min(…)
for all columns:
select min(columns(*)) from t;
Lists
«List» is a native data type:
select [1, 2, 3, 4, 5];
apply()
can be used to evaluate an expression on each element of a list and return a list.
select ([1, 2, 3, 4, 5]).apply( x -> x**2 );
filter()
selects the elements of a list that meet a given condition:
select ([1, 2, 3, 4, 5]).apply( x -> x**2 ).filter( x -> x>10 );
Create a table with list column:
create table t (id int primary key, elems int[]);
insert into t values (1, [10, 20, 30 ]);
insert into t values (2, [22, 55, 88, 99]);
insert into t values (3, [ ]);
select elems.apply( elem -> elem * id) from t;
TODO: What is the difference between .apply()
and .list_transform()
?
select id, elems.aggregate('sum') from t;
A list of words can be created from a string and the split()
function.
The following example splits a sentences into words and then uses regexp_matches()
to remove the words that contain commas or periods.
create table s(txt varchar);
insert into s values ('Hello world, the number is 42.');
select txt.split(' ').filter( word -> not word.regexp_matches('\.|,')) from s;
A list of increasing integers can be created with generate_series
:
select generate_series( 1, 100 );
A table and column name for such a series can the be specified like so:
select
num
from
generate_series(1, 10) as t(num);
select
t.a,
u.b,
t.a*u.b prod
from
generate_series(1, 5) as t(a) join
generate_series(5,10) as u(b) on t.a + u.b <= 8;
A list can be «rotated» 90 degress:
select unnest(generate_series(1, 100));
List comprehension
select
[ num ** 2 for num in [ 7, -3, 5, 2] if num > 0 ] as squared;
As far as I understand, the above statement is at least semantically equivalent to the following one:
select
list_transform(
list_filter( [ 7, -3, 5, 2 ], num -> num > 0 ),
n -> n ** 2
);
Structs
The special
struct
type allows to create nested (
JSON-like) objects:
create table T(
id integer primary key,
json struct (
num integer,
txt varchar,
dat struct (
val_1 varchar,
val_2 numeric
)
)
);
insert into T values
( 1, { num: 42, txt: 'hello world', dat: {val_1: 'abc', val_2: 17.23 }} ),
( 2, { num: 99, txt: 'ninety-nine', dat: {val_1: 'def', val_2: 91.18 }} );
select
json.dat.val_1,
json.dat.val_2
from
T
where
id = 1;
select
json.*
from
T
where
id = 2;
Selecting a table (i. e. the table name) from a table returns the table data as structs:
create table T (
id integer primary key,
num numeric,
txt varchar
);
insert into T values
( 1, 42, 'forty-two' ),
( 2, 99, 'ninety-nine'),
( 3, 17, 'seventeen' );
select T from T;
--
-- ┌────────────────────────────────────────────────────┐
-- │ T │
-- │ struct(id integer, num decimal(18,3), txt varchar) │
-- ├────────────────────────────────────────────────────┤
-- │ {'id': 1, 'num': 42.000, 'txt': forty-two} │
-- │ {'id': 2, 'num': 99.000, 'txt': ninety-nine} │
-- │ {'id': 3, 'num': 17.000, 'txt': seventeen} │
-- └────────────────────────────────────────────────────┘
Exporting and importing data
export database '/home/rene/dbexp' (format csv, delimiter ',');
export database '/home/rene/dbexp' (format parquet);
There is also an import
command:
import database '/home/rene/dbexp':
A single
CSV file can be loaded with the
copy
command:
copy destTbl from '/home/rene/data.csv';
FILTER clause for aggregate functions
create table T (
grp varchar,
val integer,
txt varchar
);
insert into T values
('A', 7, 'seven'),
('A', 3, 'three'),
('A', 9, 'nine' ),
('B', 2, 'two' ),
('B', 4, 'four' ),
('B', 5, 'five' ),
('B', 1, 'one' ),
('B', 8, 'eight');
select
grp,
--
count( * ) cnt,
count( * ) filter (val < 6) cnt_lt_6,
max (txt) filter (val > 4) max_txt_val_gt_4
from
T
group by
grp;
--
-- ┌─────────┬───────┬──────────┬──────────────────┐
-- │ grp │ cnt │ cnt_lt_6 │ max_txt_val_gt_4 │
-- │ varchar │ int64 │ int64 │ varchar │
-- ├─────────┼───────┼──────────┼──────────────────┤
-- │ A │ 3 │ 1 │ seven │
-- │ B │ 5 │ 4 │ five │
-- └─────────┴───────┴──────────┴──────────────────┘
QUALIFY clause to filter window functions
qualify
is to window functions what having
is to aggregate functions.
With qualify
, it's especially convenient to formulate top-N queries:
create table T (
grp varchar,
val integer,
txt varchar
);
insert into T values
('A', 7, 'seven'),
('A', 3, 'three'),
('A', 9, 'nine' ),
('B', 2, 'two' ),
('B', 4, 'four' ),
('B', 5, 'five' ),
('B', 1, 'one' ),
('C', 8, 'eight'),
('C', 0, 'zero' ),
('C', 6, 'six' );
select
row_number() over (partition by grp order by val desc) rn,
grp,
val,
txt
from
T
qualify
rn = 1;
--
-- ┌───────┬─────────┬───────┬─────────┐
-- │ rn │ grp │ val │ txt │
-- │ int64 │ varchar │ int32 │ varchar │
-- ├───────┼─────────┼───────┼─────────┤
-- │ 1 │ C │ 8 │ eight │
-- │ 1 │ A │ 9 │ nine │
-- │ 1 │ B │ 5 │ five │
-- └───────┴─────────┴───────┴─────────┘
select
grp,
val,
txt
from
T
qualify
row_number() over (partition by grp order by val desc) = 1;
--
-- ┌─────────┬───────┬─────────┐
-- │ grp │ val │ txt │
-- │ varchar │ int32 │ varchar │
-- ├─────────┼───────┼─────────┤
-- │ C │ 8 │ eight │
-- │ A │ 9 │ nine │
-- │ B │ 5 │ five │
-- └─────────┴───────┴─────────┘
create table T(S struct(num int, txt text));
insert into T values (row(42, 'hello world'));
select * from T;
ILIKE
ilike
matches case insensitively.
bar()
bar(…)
allows to plot simple bar charts:
select
x,
cos(x) y,
bar(1+cos(x), 0, 2, 20) y_
from (
select
radians(t.d) x
from
generate_series(0, 360, 10) as t(d)
);
├─────────────────────┼─────────────────────────┼──────────────────────┤
│ 0.0 │ 1.0 │ ████████████████████ │
│ 0.17453292519943295 │ 0.984807753012208 │ ███████████████████▊ │
│ 0.3490658503988659 │ 0.9396926207859084 │ ███████████████████▍ │
│ 0.5235987755982988 │ 0.8660254037844387 │ ██████████████████▋ │
│ 0.6981317007977318 │ 0.766044443118978 │ █████████████████▋ │
│ 0.8726646259971648 │ 0.6427876096865394 │ ████████████████▍ │
│ 1.0471975511965976 │ 0.5000000000000001 │ ███████████████ │
│ 1.2217304763960306 │ 0.3420201433256688 │ █████████████▍ │
│ 1.3962634015954636 │ 0.17364817766693041 │ ███████████▋ │
│ 1.5707963267948966 │ 6.123233995736766e-17 │ ██████████ │
│ 1.7453292519943295 │ -0.1736481776669303 │ ████████▎ │
│ 1.9198621771937625 │ -0.3420201433256687 │ ██████▌ │
│ 2.0943951023931953 │ -0.4999999999999998 │ █████ │
│ 2.2689280275926285 │ -0.6427876096865394 │ ███▌ │
│ 2.443460952792061 │ -0.7660444431189779 │ ██▎ │
│ 2.6179938779914944 │ -0.8660254037844387 │ █▎ │
│ 2.792526803190927 │ -0.9396926207859083 │ ▌ │
│ 2.9670597283903604 │ -0.984807753012208 │ ▏ │
│ 3.141592653589793 │ -1.0 │ │
│ 3.3161255787892263 │ -0.984807753012208 │ ▏ │
│ 3.490658503988659 │ -0.9396926207859084 │ ▌ │
│ 3.6651914291880923 │ -0.8660254037844386 │ █▎ │
│ 3.839724354387525 │ -0.766044443118978 │ ██▎ │
│ 4.014257279586958 │ -0.6427876096865395 │ ███▌ │
│ 4.1887902047863905 │ -0.5000000000000004 │ ████▉ │
│ 4.363323129985824 │ -0.34202014332566855 │ ██████▌ │
│ 4.537856055185257 │ -0.17364817766693033 │ ████████▎ │
│ 4.71238898038469 │ -1.8369701987210297e-16 │ █████████▉ │
│ 4.886921905584122 │ 0.17364817766692997 │ ███████████▋ │
│ 5.061454830783556 │ 0.342020143325669 │ █████████████▍ │
│ 5.235987755982989 │ 0.5000000000000001 │ ███████████████ │
│ 5.410520681182422 │ 0.6427876096865393 │ ████████████████▍ │
│ 5.585053606381854 │ 0.7660444431189778 │ █████████████████▋ │
│ 5.759586531581287 │ 0.8660254037844384 │ ██████████████████▋ │
│ 5.934119456780721 │ 0.9396926207859084 │ ███████████████████▍ │
│ 6.1086523819801535 │ 0.984807753012208 │ ███████████████████▊ │
│ 6.283185307179586 │ 1.0 │ ████████████████████ │
├─────────────────────┴─────────────────────────┴──────────────────────┤
ARG_MAX and ARG_MIN aggregate function
The aggregate functions
arg_max(v, w)
returns the value of
w
of the column where
v
has tha maximum value in an aggregated group. The functionality is equal (or at least similar) to the
max(w) keep (dense_rank last order by v)
, but more legible (imho).
create table T (
grp varchar,
val integer,
txt varchar
);
insert into T values
('A', 7, 'seven'),
('A', 3, 'three'),
('A', 9, 'nine' ),
('B', 2, 'two' ),
('B', 4, 'four' ),
('C', 5, 'five' ),
('C', 1, 'one' ),
('C', 8, 'eight');
select
grp,
--
max(val) maxval,
arg_max(txt, val) txtofmaxval,
--
min(val) minval,
arg_min(txt, val) txtofminval
from
T
group by
grp;
--
-- ┌─────────┬────────┬─────────────┬────────┬─────────────┐
-- │ grp │ maxval │ txtofmaxval │ minval │ txtofminval │
-- │ varchar │ int32 │ varchar │ int32 │ varchar │
-- ├─────────┼────────┼─────────────┼────────┼─────────────┤
-- │ A │ 9 │ nine │ 3 │ three │
-- │ B │ 4 │ four │ 2 │ two │
-- │ C │ 8 │ eight │ 1 │ one │
-- └─────────┴────────┴─────────────┴────────┴─────────────┘
Select from JSON data in a file (or stdin)
$ echo '[
{ "fruit": "Apple" , "color": "Red" , "origin": "USA" },
{ "fruit": "Banana" , "color": "Yellow", "origin": "Ecuador" },
{ "fruit": "Cherry" , "color": "Red" , "origin": "Turkey" },
{ "fruit": "Dragonfruit", "color": "Pink" , "origin": "Vietnam" }
]' | duckdb -c '
install json;
load json;
select
fruit,
color
from
read_json_auto("/dev/stdin")'
Positional joins
A postional join aligns the first record of the left table with the first record of the right table, the second row of the first table with the second row of the right table, and so on:
create table tab_a( num integer, val numeric );
create table tab_b( txt varchar, val numeric );
insert into tab_a values ( 4 , 22.1);
insert into tab_b values ('four', 18.5);
insert into tab_a values ( 9 , 17.8);
insert into tab_b values ('nine', 16.6);
insert into tab_a values ( 2 , 39.5);
insert into tab_b values ('two' , 21.7);
select
a.num,
b.txt,
a.val val_a,
b.val val_b
from
tab_a a positional join
tab_b b;
--
-- ┌───────┬─────────┬───────────────┬───────────────┐
-- │ num │ txt │ val_a │ val_b │
-- │ int32 │ varchar │ decimal(18,3) │ decimal(18,3) │
-- ├───────┼─────────┼───────────────┼───────────────┤
-- │ 4 │ four │ 22.100 │ 18.500 │
-- │ 9 │ nine │ 17.800 │ 16.600 │
-- │ 2 │ two │ 39.500 │ 21.700 │
-- └───────┴─────────┴───────────────┴───────────────┘
The idea and the result of a positional join is similar to that of the shell command
paste
and
pr -mt …
.
Execution plans / run-time profiling
The execution plan of a statement is printed with
explain select …
The above statement does not actually execute the statement. In order to execute the statement, use
explain analyze select …
See also the pragma enable_profiling='json'
and pragma profile_output='/home/rene/duckdb/runtime-profile.json'
statements.
Use python -m duckdb.query_graph /home/rene/duckdb/runtime-profile.json
to generate a HTML file.