Short script to filter csv file using the first couple of columns using a different csv file

Example usecase

I have an output csv file from cell profiler called data.csv.

First column is an image number and second column is an object number, the rest are the different measurments.

Based on the visual control I want to:

remove measurments single object (object 1) from an image 1
remove measurements of all objects from image 2.

I create a file in excel called filter.csv
Put 1 in first column (image) and second column (object)
Put 2 in first column (image), all in second column
Save as csv.
Run the script:
- using alias csv-filter filter.csv data.csv (the alias must be set in you .bashrc)
- using the script itself bash filter_csv.sh filter.csv data.csv
Check the filt_date-time_data.csv

`data.csv`

`filter.csv`

`filt_date-time_data.csv`

Files

script
data.csv
filter.csv
filt_date-time_data.csv

Script

#! /bin/bash
time_stamp=$(date +'%d-%b_%Hh%Mm%Ss')
[ $# -lt 2 ] && echo """
Usage:
  $0 <filter> <data-to-filter>

Description:
  This script filters out lines from <data-to-filter> that match entries in <filter>
  Each line in <filter> should contain one or more comma-separated columns.
  Use 'ALL' (case-insensitive) in the second, third etc. column(s) to match any value in given column.
"""  \
	&& exit 1

sed -E 's/,([Aa][Ll][Ll])$/,[^,]*/' "$1" \ # convert the all, ALL to *
  | sed -E 's/^/^/; s/$/,/' \              # make sure the patterns are anchored to beginning and end with a comma
  | grep -E -v -f - "$2" > "filt_${time_stamp}_$2" 2>/dev/null # filter the data file

[ $? -ne 0 ] && echo "Something went wrong"

echo "Filtering finished"

Example usecase#

data.csv#

filter.csv#

filt_date-time_data.csv#

Files#

Script#