DuckDB

DuckDB has no external dependencies, neither to compile nor to run it.
All that is needed to compile DuckDB is a C++11 compiler.
Like SQLite, there is no server; DuckDB runs within the same process as the program that uses it.

DuckDB shell: Usage and options

Usage:
duckdb [OPTIONS] [database-file [sql statements]]
duckdb enters a shell to manipulate a DuckDB database stored in database-file. The DB is created if it does not exist.
The special value :memory: for database-file creates a transient in-memory database.
In the shell, a different database can be opened with .open DBFILE.
The optional argument sql statements allows passing multiple statements, separated by semicolons:
$ duckdb :memory: "select 42; select 'hello world'"
An SQL script can be run from the command line like so:
$ duckdb xyz.db  '.read  /opt/proj/sql-scripts/exec-01.sql'

Options

-append append the database to the end of the file
-bail stop after hitting an error
-batch force batch I/O
-cmd COMMAND run COMMAND before reading stdin
-c COMMAND run COMMAND and exit
-echo print commands before execution
-init FILENAME read/process the named file
-[no]header turn headers on or off
-help show this message
-interactive force interactive I/O
-newline SEP set output row separator (default: \n)
-nofollow refuse to open symbolic links to database files
-no-stdin exit after processing options instead of reading stdin
-nullvalue TEXT set text string for NULL values (default: empty string)
-readonly open the database read-only
-s COMMAND run COMMAND and exit
-separator SEP set output column separator (default: '|')
-stats print memory stats before each finalize
-unsigned allow loading of unsigned extensions
-version show DuckDB version
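Several of these options can be combined. A quick sketch using an in-memory database (the exact output depends on the version, but should be the single line 42,hello):
$ duckdb -csv -noheader :memory: "select 42, 'hello'"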

Options to set output mode

Several (mutually exclusive?) options set the output mode:
  • -ascii
  • -box
  • -column
  • -csv
  • -html
  • -json
  • -line
  • -list
  • -markdown
  • -quote
  • -table
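For example, -json prints each result set as a JSON array; a minimal sketch:
$ duckdb -json :memory: "select 42 as answer"
[{"answer":42}]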

Misc

Compile from sources

Get the sources from GitHub, compile them and …
$ git clone https://github.com/duckdb/duckdb
$ cd duckdb
$ make
… test the produced executable with an in-memory database:
$ ./build/release/duckdb :memory: "select 'test'"
┌─────────┐
│ 'test'  │
│ varchar │
├─────────┤
│ test    │
└─────────┘

SELECT * FROM T

select * from t can be abbreviated with:
from t;
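The from-first syntax also accepts the usual clauses; a sketch, assuming a table t with columns txt and val:
from t select txt where val > 3;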

PIVOT

create table p (
   txt      varchar      not null,
   val      decimal(5,2) not null
);

insert into p values
  ('A',   4.31),
  ('A',   2.18),
  ('A',   3.72),
  ('B',   2.81),
  ('B',   5.36),
  ('C',   6.12),
  ('C',   8.27);

pivot p 
   on txt
using
   sum  (val) as total,
   count(val) as cnt;
--
-- ┌───────────────┬───────┬───────────────┬───────┬───────────────┬───────┐
-- │    A_total    │ A_cnt │    B_total    │ B_cnt │    C_total    │ C_cnt │
-- │ decimal(38,2) │ int64 │ decimal(38,2) │ int64 │ decimal(38,2) │ int64 │
-- ├───────────────┼───────┼───────────────┼───────┼───────────────┼───────┤
-- │         10.21 │     3 │          8.17 │     2 │         14.39 │     2 │
-- └───────────────┴───────┴───────────────┴───────┴───────────────┴───────┘
Note: Unlike Oracle's (and possibly other vendors') implementation of PIVOT, DuckDB determines the number of columns at execution time.

Specify selected columns

Select all columns from t, except id and val_2:
select * exclude (id, val_2) from t;
Replace a given column with an expression:
select * replace(upper(val_1) as val_1) from t;
Select columns whose names match a regular expression:
select columns('val_\d') from t;
Select min(…) for all columns:
select min(columns(*)) from t;
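columns(…) can also be combined with a regular expression inside an aggregate; a sketch, again assuming columns val_1 and val_2 in t:
select max(columns('val_\d')) from t;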

Lists

«List» is a native data type:
select  [1, 2, 3, 4, 5];
apply() can be used to evaluate an expression on each element of a list and return a list.
select ([1, 2, 3, 4, 5]).apply( x -> x**2 );
filter() selects the elements of a list that meet a given condition:
select ([1, 2, 3, 4, 5]).apply( x -> x**2 ).filter( x -> x>10 );
Create a table with a list column:
create table t (id int primary key, elems int[]);

insert into t values (1,  [10, 20, 30    ]);
insert into t values (2,  [22, 55, 88, 99]);
insert into t values (3,  [              ]);

select elems.apply( elem -> elem * id) from t;
Note: apply() is just an alias for list_transform(), so there is no difference between the two.
select id, elems.aggregate('sum') from t;
A list of words can be created from a string with the split() function.
The following example splits a sentence into words and then uses regexp_matches() to remove the words that contain commas or periods.
create table s(txt varchar);

insert into s values ('Hello world, the number is 42.');
select txt.split(' ').filter( word -> not word.regexp_matches('\.|,')) from s;
A list of increasing integers can be created with generate_series:
select generate_series( 1, 100 );
A table and column name for such a series can then be specified like so:
select
   num
from
   generate_series(1, 10) as t(num);

select
   t.a,
   u.b,
   t.a*u.b  prod
from
   generate_series(1, 5) as t(a)                           join
   generate_series(5,10) as u(b) on t.a + u.b <= 8;
A list can be «rotated» 90 degrees, i.e. turned into one row per element, with unnest():
select unnest(generate_series(1, 100));
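The opposite direction, turning rows into a list, is handled by the list() aggregate function; a minimal sketch:
select list(num) from generate_series(1, 5) as t(num);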

List comprehension

DuckDB borrows from Python's list comprehension.
select 
   [ num ** 2 for num in [ 7, -3,  5,  2] if num > 0 ] as squared;
As far as I understand, the above statement is at least semantically equivalent to the following one:
select
   list_transform(
       list_filter( [ 7, -3, 5, 2 ],  num -> num > 0 ),
       n -> n ** 2
   );

Structs

The special struct type allows creating nested (JSON-like) objects:
create table T(
    id    integer primary key,
    json  struct (
            num integer,
            txt varchar,
            dat struct (
                  val_1   varchar,
                  val_2   numeric
            )
          )
);

insert into T values
  (  1, { num: 42, txt: 'hello world', dat: {val_1: 'abc', val_2: 17.23 }} ),
  (  2, { num: 99, txt: 'ninety-nine', dat: {val_1: 'def', val_2: 91.18 }} );


select
   json.dat.val_1,
   json.dat.val_2
from
   T
where
   id = 1;

select
   json.*
from
   T
where
   id = 2;
Selecting a table (i.e. using the table name in the select list) returns the table's rows as structs:
create table T (
   id   integer primary key,
   num  numeric,
   txt  varchar
);

insert into T values
  ( 1, 42, 'forty-two'  ),
  ( 2, 99, 'ninety-nine'),
  ( 3, 17, 'seventeen'  );


select T from T;
--
-- ┌────────────────────────────────────────────────────┐
-- │                         T                          │
-- │ struct(id integer, num decimal(18,3), txt varchar) │
-- ├────────────────────────────────────────────────────┤
-- │ {'id': 1, 'num': 42.000, 'txt': forty-two}         │
-- │ {'id': 2, 'num': 99.000, 'txt': ninety-nine}       │
-- │ {'id': 3, 'num': 17.000, 'txt': seventeen}         │
-- └────────────────────────────────────────────────────┘

Exporting and importing data

A database can be exported to different file formats, such as CSV or Apache Parquet:
export database '/home/rene/dbexp' (format csv, delimiter ',');
export database '/home/rene/dbexp' (format parquet);
There is also an import command:
import database '/home/rene/dbexp';
A single CSV file can be loaded with the copy command:
copy destTbl from '/home/rene/data.csv';
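copy accepts options such as header and delimiter, and with to instead of from it also writes data out; a sketch with hypothetical file names:
copy destTbl from '/home/rene/data.csv' (header, delimiter ',');
copy destTbl to   '/home/rene/data.parquet' (format parquet);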

FILTER clause for aggregate functions

create table T (
   grp   varchar,
   val   integer,
   txt   varchar
);

insert into T values
   ('A', 7, 'seven'),
   ('A', 3, 'three'),
   ('A', 9, 'nine' ),
   ('B', 2, 'two'  ),
   ('B', 4, 'four' ),
   ('B', 5, 'five' ),
   ('B', 1, 'one'  ),
   ('B', 8, 'eight');

select
   grp,
   --
   count( * )                   cnt,
   count( * ) filter (val <  6) cnt_lt_6,
   max  (txt) filter (val >  4) max_txt_val_gt_4
from
   T
group by
   grp;
--
-- ┌─────────┬───────┬──────────┬──────────────────┐
-- │   grp   │  cnt  │ cnt_lt_6 │ max_txt_val_gt_4 │
-- │ varchar │ int64 │  int64   │     varchar      │
-- ├─────────┼───────┼──────────┼──────────────────┤
-- │ A       │     3 │        1 │ seven            │
-- │ B       │     5 │        4 │ five             │
-- └─────────┴───────┴──────────┴──────────────────┘

QUALIFY clause to filter window functions

qualify is to window functions what having is to aggregate functions.
With qualify, it's especially convenient to formulate top-N queries:
create table T (
   grp   varchar,
   val   integer,
   txt   varchar
);

insert into T values
   ('A', 7, 'seven'),
   ('A', 3, 'three'),
   ('A', 9, 'nine' ),
   ('B', 2, 'two'  ),
   ('B', 4, 'four' ),
   ('B', 5, 'five' ),
   ('B', 1, 'one'  ),
   ('C', 8, 'eight'),
   ('C', 0, 'zero' ),
   ('C', 6, 'six'  );

select
   row_number() over (partition by grp order by val desc)  rn,
   grp,
   val,
   txt
from
   T
qualify
   rn = 1;
--
-- ┌───────┬─────────┬───────┬─────────┐
-- │  rn   │   grp   │  val  │   txt   │
-- │ int64 │ varchar │ int32 │ varchar │
-- ├───────┼─────────┼───────┼─────────┤
-- │     1 │ C       │     8 │ eight   │
-- │     1 │ A       │     9 │ nine    │
-- │     1 │ B       │     5 │ five    │
-- └───────┴─────────┴───────┴─────────┘



select
   grp,
   val,
   txt
from
   T
qualify
   row_number() over (partition by grp order by val desc) = 1;
--
-- ┌─────────┬───────┬─────────┐
-- │   grp   │  val  │   txt   │
-- │ varchar │ int32 │ varchar │
-- ├─────────┼───────┼─────────┤
-- │ C       │     8 │ eight   │
-- │ A       │     9 │ nine    │
-- │ B       │     5 │ five    │
-- └─────────┴───────┴─────────┘
A struct value can also be constructed with the row() function:
create table T(S struct(num int, txt text));

insert into T values (row(42, 'hello world'));

select * from T;

ILIKE

ilike matches case insensitively.
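A minimal illustration:
select 'DuckDB'  like 'duck%';  -- false
select 'DuckDB' ilike 'duck%';  -- true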

bar()

bar(…) allows plotting simple bar charts:
select
   x,
   cos(x) y,
   bar(1+cos(x), 0, 2, 20) y_
from (
   select
      radians(t.d) x
   from
      generate_series(0, 360, 10) as t(d)
);

┌─────────────────────┬─────────────────────────┬──────────────────────┐
│          x          │            y            │          y_          │
│       double        │         double          │       varchar        │
├─────────────────────┼─────────────────────────┼──────────────────────┤
│                 0.0 │                     1.0 │ ████████████████████ │
│ 0.17453292519943295 │       0.984807753012208 │ ███████████████████▊ │
│  0.3490658503988659 │      0.9396926207859084 │ ███████████████████▍ │
│  0.5235987755982988 │      0.8660254037844387 │ ██████████████████▋  │
│  0.6981317007977318 │       0.766044443118978 │ █████████████████▋   │
│  0.8726646259971648 │      0.6427876096865394 │ ████████████████▍    │
│  1.0471975511965976 │      0.5000000000000001 │ ███████████████      │
│  1.2217304763960306 │      0.3420201433256688 │ █████████████▍       │
│  1.3962634015954636 │     0.17364817766693041 │ ███████████▋         │
│  1.5707963267948966 │   6.123233995736766e-17 │ ██████████           │
│  1.7453292519943295 │     -0.1736481776669303 │ ████████▎            │
│  1.9198621771937625 │     -0.3420201433256687 │ ██████▌              │
│  2.0943951023931953 │     -0.4999999999999998 │ █████                │
│  2.2689280275926285 │     -0.6427876096865394 │ ███▌                 │
│   2.443460952792061 │     -0.7660444431189779 │ ██▎                  │
│  2.6179938779914944 │     -0.8660254037844387 │ █▎                   │
│   2.792526803190927 │     -0.9396926207859083 │ ▌                    │
│  2.9670597283903604 │      -0.984807753012208 │ ▏                    │
│   3.141592653589793 │                    -1.0 │                      │
│  3.3161255787892263 │      -0.984807753012208 │ ▏                    │
│   3.490658503988659 │     -0.9396926207859084 │ ▌                    │
│  3.6651914291880923 │     -0.8660254037844386 │ █▎                   │
│   3.839724354387525 │      -0.766044443118978 │ ██▎                  │
│   4.014257279586958 │     -0.6427876096865395 │ ███▌                 │
│  4.1887902047863905 │     -0.5000000000000004 │ ████▉                │
│   4.363323129985824 │    -0.34202014332566855 │ ██████▌              │
│   4.537856055185257 │    -0.17364817766693033 │ ████████▎            │
│    4.71238898038469 │ -1.8369701987210297e-16 │ █████████▉           │
│   4.886921905584122 │     0.17364817766692997 │ ███████████▋         │
│   5.061454830783556 │       0.342020143325669 │ █████████████▍       │
│   5.235987755982989 │      0.5000000000000001 │ ███████████████      │
│   5.410520681182422 │      0.6427876096865393 │ ████████████████▍    │
│   5.585053606381854 │      0.7660444431189778 │ █████████████████▋   │
│   5.759586531581287 │      0.8660254037844384 │ ██████████████████▋  │
│   5.934119456780721 │      0.9396926207859084 │ ███████████████████▍ │
│  6.1086523819801535 │       0.984807753012208 │ ███████████████████▊ │
│   6.283185307179586 │                     1.0 │ ████████████████████ │
└─────────────────────┴─────────────────────────┴──────────────────────┘
See also the Oracle SQL example that creates a bar chart.

ARG_MAX and ARG_MIN aggregate function

The aggregate function arg_max(v, w) returns the value of v in the row where w reaches its maximum value within the aggregated group. The functionality is equal (or at least similar) to Oracle's max(v) keep (dense_rank last order by w), but more legible (imho).
create table T (
   grp   varchar,
   val   integer,
   txt   varchar
);

insert into T values
   ('A', 7, 'seven'),
   ('A', 3, 'three'),
   ('A', 9, 'nine' ),
   ('B', 2, 'two'  ),
   ('B', 4, 'four' ),
   ('C', 5, 'five' ),
   ('C', 1, 'one'  ),
   ('C', 8, 'eight');

select
   grp,
   --
   max(val)           maxval,
   arg_max(txt, val)  txtofmaxval,
   --
   min(val)           minval,
   arg_min(txt, val)  txtofminval
from
   T
group by
   grp;
--
-- ┌─────────┬────────┬─────────────┬────────┬─────────────┐
-- │   grp   │ maxval │ txtofmaxval │ minval │ txtofminval │
-- │ varchar │ int32  │   varchar   │ int32  │   varchar   │
-- ├─────────┼────────┼─────────────┼────────┼─────────────┤
-- │ A       │      9 │ nine        │      3 │ three       │
-- │ B       │      4 │ four        │      2 │ two         │
-- │ C       │      8 │ eight       │      1 │ one         │
-- └─────────┴────────┴─────────────┴────────┴─────────────┘

Select from JSON data in a file (or stdin)

$ echo '[
  { "fruit": "Apple"      , "color": "Red"   , "origin": "USA"     },
  { "fruit": "Banana"     , "color": "Yellow", "origin": "Ecuador" },
  { "fruit": "Cherry"     , "color": "Red"   , "origin": "Turkey"  },
  { "fruit": "Dragonfruit", "color": "Pink"  , "origin": "Vietnam" }
]'  | duckdb -c '
install json;
load    json;
select
   fruit,
   color
from
   read_json_auto("/dev/stdin")'
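read_json_auto() reads from an ordinary file just as well; a sketch with a hypothetical fruits.json containing the array above:
$ duckdb -c "select fruit, color from read_json_auto('fruits.json')"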

Positional joins

A positional join aligns the first record of the left table with the first record of the right table, the second record of the left table with the second record of the right table, and so on:
create table tab_a( num integer, val numeric );
create table tab_b( txt varchar, val numeric );

insert into tab_a values ( 4    ,  22.1);
insert into tab_b values ('four',  18.5);

insert into tab_a values ( 9    ,  17.8);
insert into tab_b values ('nine',  16.6);

insert into tab_a values ( 2    ,  39.5);
insert into tab_b values ('two' ,  21.7);

select
   a.num,
   b.txt,
   a.val  val_a,
   b.val  val_b
from
   tab_a  a  positional join
   tab_b  b;
--
-- ┌───────┬─────────┬───────────────┬───────────────┐
-- │  num  │   txt   │     val_a     │     val_b     │
-- │ int32 │ varchar │ decimal(18,3) │ decimal(18,3) │
-- ├───────┼─────────┼───────────────┼───────────────┤
-- │     4 │ four    │        22.100 │        18.500 │
-- │     9 │ nine    │        17.800 │        16.600 │
-- │     2 │ two     │        39.500 │        21.700 │
-- └───────┴─────────┴───────────────┴───────────────┘
The idea and the result of a positional join are similar to those of the shell commands paste and pr -mt ….
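For comparison, the corresponding paste invocation on two hypothetical text files:
$ paste -d '|' tab_a.txt tab_b.txt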

Execution plans / run-time profiling

The execution plan of a statement is printed with
explain select …
explain only shows the plan; it does not actually execute the statement. In order to also execute the statement and measure run times, use
explain analyze select …
See also the pragma enable_profiling='json' and pragma profile_output='/home/rene/duckdb/runtime-profile.json' statements.
Use python -m duckdb.query_graph /home/rene/duckdb/runtime-profile.json to generate an HTML file.
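A possible profiling session that combines these pragmas (the final select stands for any statement whose profile is of interest):
pragma enable_profiling='json';
pragma profile_output='/home/rene/duckdb/runtime-profile.json';
select count(*) from T;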
