Problems with CSV files
Although I like the flexibility, ubiquity and ease of use of CSV files, I find they have the following problems:
- They are not self-describing, especially with regard to data types. If CSV files are exchanged between multiple parties, the data type of each field needs to be communicated separately.
- Even if the data types are known, the data format can still vary (dot vs. comma in decimal numbers, differing date formats such as dd/mm/yyyy, mm/dd/yyyy, yyyy-mm-dd, etc.). This is almost always a problem if CSV files are generated from Excel with different locales.
- CSV files cannot be annotated with comments (although I believe there are some standards that address this issue)
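The first two points can be illustrated with a minimal Python sketch that uses only the standard library. The sample data and the helper name `parse_decimal` are made up for illustration; the point is that the reader returns nothing but strings, and that the same text can be interpreted in more than one way.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample as it might be exported with a German locale:
# semicolon-delimited, decimal comma, dd/mm/yyyy dates.
sample = "id;amount;booked\n1;1234,56;03/04/2024\n"

rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
row = rows[0]

# The csv module knows nothing about data types: every field is a str.
all_strings = all(isinstance(value, str) for value in row.values())

def parse_decimal(text, decimal_sep=","):
    """Convert a number written with the given decimal separator to float.

    Illustrative helper; it does not handle thousands separators.
    """
    return float(text.replace(decimal_sep, "."))

amount = parse_decimal(row["amount"])

# The same date text is valid under two conventions, with different results.
as_dmy = datetime.strptime(row["booked"], "%d/%m/%Y").date()  # 3 April 2024
as_mdy = datetime.strptime(row["booked"], "%m/%d/%Y").date()  # 4 March 2024
```

Neither date interpretation raises an error, which is why the convention has to be agreed on outside of the file itself.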
Reasons to use semicolons rather than commas
Although the C in CSV stands for comma, there are good reasons to use semicolons to separate fields from one another.
One reason is that there are many countries where a comma is already used as a decimal separator (see problems with CSV files).
Also, semicolons make better separators because they are less likely than commas to occur in ordinary text.
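As a rough sketch of the difference, the following Python snippet writes the same record once with a comma and once with a semicolon as delimiter, using the standard `csv` module with its default minimal quoting. The record and the helper name `to_line` are invented for this example.

```python
import csv
import io

# Hypothetical record: free text containing a comma and a decimal-comma number.
record = ["Widget, large", "12,50"]

def to_line(fields, delimiter):
    """Serialize one record with csv.writer and return it without the line ending."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter).writerow(fields)
    return buffer.getvalue().rstrip("\r\n")

comma_line = to_line(record, ",")      # "Widget, large","12,50" (both fields quoted)
semicolon_line = to_line(record, ";")  # Widget, large;12,50 (no quoting needed)

# Reading back with the matching delimiter restores the original fields.
parsed = next(csv.reader(io.StringIO(semicolon_line), delimiter=";"))
```

With the comma as delimiter, every field that contains a comma has to be quoted, and a consumer that splits naively on commas will break. With the semicolon, the same data passes through unquoted.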
See also
pspg is a command-line utility to view CSV data (or tabular data in general). (TODO: Compare with usql).
SQLcl can produce a result set in CSV format.
Data exchange formats for tabular data. An alternative to CSV files is Parquet files. Like CSV files, they store tabular data, but in a columnar format that can be accessed with better performance, and they additionally support typed and nested schemas.
xsv is a command-line program for indexing, slicing, analyzing, splitting and joining CSV files. Commands should be simple, fast and composable: