Extracting relevant columns from CSV and tab-delimited files.

Example workflow for a quick look at papers from the PubMed “similar articles” or “cited by” lists

Download the list of citing or similar articles from PubMed in CSV format

Count the lines in the file:

wc -l file.csv
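Note that `wc -l` counts the header line too, so the number of articles is one less than the reported line count (and a quoted field containing a line break would inflate it further). A minimal sketch of the arithmetic, assuming a small made-up file; `sample.csv` and its contents are hypothetical, not real PubMed data:

```shell
# Create a tiny, made-up sample file: one header line plus three records.
printf '%s\n' \
  'PMID,Title,Publication Year' \
  '111,First paper,2019' \
  '222,Second paper,2021' \
  '333,Third paper,2020' > sample.csv

# Total number of lines, header included.
total_lines=$(wc -l < sample.csv)

# Number of records = total lines minus the single header line.
article_count=$((total_lines - 1))

echo "lines: $total_lines, articles: $article_count"
```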

List the column names with their index numbers:

csvcut -n file.csv
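For tab-delimited input, the csvkit tools accept the `-t` flag. A minimal sketch, assuming csvkit is installed; `sample.tsv` and its contents are made up for illustration:

```shell
# Made-up tab-delimited sample: a header line and one record.
printf 'PMID\tTitle\tDOI\n111\tFirst paper\t10.1000/a\n' > sample.tsv

# -t tells csvkit that the input is tab-delimited.
# The selected columns are written out as regular comma-separated CSV.
csvcut -t -c 2,3 sample.tsv > sample_cut.csv

cat sample_cut.csv
```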

Option 1: output in the terminal, comma-separated (sorted by column 7 in reverse order, keeping columns 2 and 11, with line numbers added)

cat file.csv | csvsort -r -c 7 | csvcut -c 2,11 -l
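The numeric indices can also be replaced by column names, which tends to be easier to read back later. A minimal sketch, assuming csvkit is installed and that the file has columns named `Title`, `Publication Year` and `DOI`; these names and the sample data are assumptions made for illustration, so check the real names with `csvcut -n` first:

```shell
# Made-up sample with a few PubMed-like columns (hypothetical data).
printf '%s\n' \
  'PMID,Title,Publication Year,DOI' \
  '111,First paper,2019,10.1000/a' \
  '222,Second paper,2021,10.1000/b' \
  '333,Third paper,2020,10.1000/c' > sample_named.csv

# Sort by year in reverse order (-r), keep two columns by name,
# and prepend a line-number column (-l).
csvsort -r -c "Publication Year" sample_named.csv \
  | csvcut -c "Title,DOI" -l > sorted_titles.csv

cat sorted_titles.csv
```

Names survive a change in column order, whereas indices silently start pointing at different columns.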

Option 2: output in the terminal, formatted as a more readable table with csvlook

cat file.csv | csvsort -r -c 7 | csvcut -c 2,11 -l | csvlook

Option 3: create a table using pandoc in landscape format

cat file.csv | csvsort -r -c 7 | csvcut -c 2,11 -l | csvlook | pandoc --variable geometry:"landscape, margin=1in" -o table_landscape.pdf

Option 4: create a table using pandoc in default (portrait) format

cat file.csv | csvsort -r -c 7 | csvcut -c 2,11 -l | csvlook | pandoc -o table_portrait.pdf

[Figures: 1. article count; 2. column names; 3. table in terminal (Options 1 and 2) and in PDF, landscape and portrait (Options 3 and 4)]

The tutorial is very good

When processing CSV files from PubMed, some characters cannot be handled by the default encoding, so the proper encoding needs to be set. A typical error looks like this:

UnicodeEncodeError: 'charmap' codec can't encode character '\u0144' in position 110: character maps to <undefined>

Solution:
Set the PYTHONUTF8 environment variable in the current terminal session. It switches Python to UTF-8 mode, so it applies to csvkit and to every other Python command started from that shell.

export PYTHONUTF8=1
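To limit the setting to a single command rather than the whole session, the variable can be placed directly in front of that command. The sketch below assumes `python3` (3.7 or later) is on the PATH and uses it only to confirm that UTF-8 mode is active; the variable name `utf8_mode` is just for illustration:

```shell
# Set PYTHONUTF8 for this one command only; the shell environment is unchanged.
utf8_mode=$(PYTHONUTF8=1 python3 -c 'import sys; print(sys.flags.utf8_mode)')

# sys.flags.utf8_mode is 1 when Python's UTF-8 mode is enabled, 0 otherwise.
echo "UTF-8 mode: $utf8_mode"
```

A csvkit command would be prefixed the same way, for example `PYTHONUTF8=1 csvlook file.csv`.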