Filtering a tibble using dplyr

I don't find dplyr very intuitive, so I made this quick example of how to use it.

Use list objects to filter a tibble

Lists are a good format for keeping data that is more complicated than a simple table.
They can therefore be used to store different parameters, which are then applied to filter tabular data.

Operations:

  1. Matching an exact value
  2. Matching values bigger or smaller than one number
  3. Matching values between two numbers
  4. Grouping and summarizing the data
# R
# Install once; dplyr is already part of the tidyverse, so the second call is optional
install.packages("tidyverse")
install.packages("dplyr")

library(tidyverse)

test_data<-tibble(grouping_col_01=c("one","one", "one", "two","two"),
                  grouping_col_02=c("A","B","A","B","A"),
                  numbers=c(1,2,3,4,5),
                  letters=c("a","b","c","d","e")
)

parameter<-list(matching_numbers = c(1,2,3,4),
                number_range = c(2,4),
                one_number = 3,
                letter = "b"
)

filtered_output<-test_data %>% 
  dplyr::filter(numbers %in% parameter[["matching_numbers"]]) %>% # keep matching numbers
  dplyr::filter(between(numbers, # keep only numbers inside a range
                        min(parameter[["number_range"]]),
                        max(parameter[["number_range"]]))
  ) %>% 
  dplyr::filter(numbers<parameter[["one_number"]]) %>%  # keep smaller than upper limit number
  dplyr::filter(letters==parameter[["letter"]]) # keep matching letter

grouping_cols<-grep("grouping",colnames(test_data))

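# count the rows for each combination of the grouping columns
test_data %>%
  group_by(across(all_of(grouping_cols))) %>% 
  summarise(n=n(), .groups="drop") # return an ungrouped tibble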
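
The chained filters above can also be written as a single filter() call, because conditions separated by commas are combined with AND. This is only a sketch of an alternative spelling, not a replacement for the code above; the names `filtered_once` and `group_counts` are made up for illustration, and `starts_with("grouping")` is used in place of the grep() lookup.

```r
library(dplyr)
library(tibble)

test_data <- tibble(
  grouping_col_01 = c("one", "one", "one", "two", "two"),
  grouping_col_02 = c("A", "B", "A", "B", "A"),
  numbers = c(1, 2, 3, 4, 5),
  letters = c("a", "b", "c", "d", "e")
)

parameter <- list(
  matching_numbers = c(1, 2, 3, 4),
  number_range = c(2, 4),
  one_number = 3,
  letter = "b"
)

# All conditions in one call; a row is kept only if every condition is TRUE
filtered_once <- test_data %>%
  filter(
    numbers %in% parameter[["matching_numbers"]],
    between(numbers,
            min(parameter[["number_range"]]),
            max(parameter[["number_range"]])),
    numbers < parameter[["one_number"]],
    letters == parameter[["letter"]]
  )

# Select the grouping columns by name prefix instead of by position
group_counts <- test_data %>%
  group_by(across(starts_with("grouping"))) %>%
  summarise(n = n(), .groups = "drop")

print(filtered_once)
print(group_counts)
```

With the values above, only the row with numbers equal to 2 and letters equal to "b" passes all four conditions, and the summary has one row per combination of the two grouping columns.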

Parsing TOML files

Motivation

Find a useful syntax for config files that could be used to supply parameters to R scripts.
I'm not sure which is best: YAML uses fewer complicated characters, but it seems to depend on spaces for its structure, which I do not like, so TOML looks better.

TOML resources

Link to the TOML docs
Link to the R package for parsing TOML into a list in R
On my pc: ~/R-Projects/TOML

Install package

install.packages("RcppTOML")
library(RcppTOML)

TOML example file

https://rdrr.io/cran/RcppTOML/api/
# Application configuration
title = "My Application"
version = "1.0.0"

[database]
host = "localhost"
port = 5432
username = "app_user"
password = "secure_password"
databases = ["myapp_db", "myapp_cache"]
pool_size = 10
ssl_enabled = true

[server]
host = "0.0.0.0"
port = 8000
debug = false
allowed_hosts = ["localhost", "127.0.0.1", "example.com"]

[logging]
level = "INFO"
format = "%(asctime)s - %(levelname)s - %(message)s"
handlers = ["console", "file"]

[cache]
enabled = true
ttl = 3600
max_size = 1000

[features]
enable_api = true
enable_webhooks = false
rate_limit = 100   

TOML functions:

parseTOML

toml_file_parsed<-parseTOML("toml_file.txt")
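
parseTOML() can also read TOML from a character string, which is handy for trying things out without creating a file first. A minimal sketch, assuming the `fromFile` argument of RcppTOML's parseTOML(); the names `toml_text` and `parsed_from_text` are made up for illustration, and the TOML content is a shortened copy of the example file.

```r
library(RcppTOML)

# A shortened version of the example file, kept as a string
toml_text <- '
title = "My Application"

[server]
port = 8000
allowed_hosts = ["localhost", "127.0.0.1", "example.com"]
'

# fromFile = FALSE treats the input as TOML text instead of a file path
parsed_from_text <- parseTOML(toml_text, fromFile = FALSE)

print(parsed_from_text$title)
print(parsed_from_text$server$port)
print(parsed_from_text$server$allowed_hosts)
```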

summary

Get summary of the parsed data file

summary(toml_file_parsed)

Output:

toml object with top-level slots: cache, database, features, logging, server, title, version read from ‘/home/rstudio/TOML-parsing/data/toml_01’

print

Print the structure

print(toml_file_parsed)

List of 7
$ cache :List of 3
..$ enabled : logi TRUE
..$ max_size: int 1000
..$ ttl : int 3600
$ database:List of 7
..$ databases : chr [1:2] "myapp_db" "myapp_cache"
..$ host : chr "localhost"
..$ password : chr "secure_password"
..$ pool_size : int 10
..$ port : int 5432
..$ ssl_enabled: logi TRUE
..$ username : chr "app_user"
$ features:List of 3
..$ enable_api : logi TRUE
..$ enable_webhooks: logi FALSE
..$ rate_limit : int 100
$ logging :List of 3
..$ format : chr "%(asctime)s - %(levelname)s - %(message)s"
..$ handlers: chr [1:2] "console" "file"
..$ level : chr "INFO"
$ server :List of 4
..$ allowed_hosts: chr [1:3] "localhost" "127.0.0.1" "example.com"
..$ debug : logi FALSE
..$ host : chr "0.0.0.0"
..$ port : int 8000
$ title : chr "My Application"
$ version : chr "1.0.0"

Get items from the parsed file

Get the vector of “allowed_hosts”

toml_file_parsed$server[["allowed_hosts"]]

Output:

[1] "localhost" "127.0.0.1" "example.com"
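
Asking for a key that is not in the file returns NULL rather than an error, so a script may want a fallback value. A small sketch of one way to do that; `get_or_default()` is a hypothetical helper written for this example, not part of RcppTOML, and the `timeout` key is deliberately absent from the TOML text.

```r
library(RcppTOML)

config <- parseTOML('
[server]
port = 8000
', fromFile = FALSE)

# Hypothetical helper: look up config[[section]][[key]], fall back to a default
get_or_default <- function(config, section, key, default = NULL) {
  value <- config[[section]][[key]]
  if (is.null(value)) default else value
}

port <- get_or_default(config, "server", "port")                     # key exists
timeout <- get_or_default(config, "server", "timeout", default = 30) # key missing

print(port)
print(timeout)
```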

Iterate over the “allowed_hosts”

# base R is enough here; loop over the hosts directly instead of over their positions
for (host in toml_file_parsed$server[["allowed_hosts"]]) {
  print(host)
}

Output:

[1] "localhost"
[1] "127.0.0.1"
[1] "example.com"
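
To tie this back to the motivation, the parsed TOML list can take the place of the `parameter` list from the dplyr example. This is a rough sketch under assumptions: the `[filter]` table and its keys are invented here to mirror that list, and the TOML is parsed from a string (`fromFile = FALSE`) only to keep the example self-contained.

```r
library(dplyr)
library(tibble)
library(RcppTOML)

# Hypothetical config; the keys mirror the `parameter` list from the dplyr example
config_text <- '
[filter]
matching_numbers = [1, 2, 3, 4]
number_range = [2, 4]
one_number = 3
letter = "b"
'

config <- parseTOML(config_text, fromFile = FALSE)
parameter <- config[["filter"]]

test_data <- tibble(
  grouping_col_01 = c("one", "one", "one", "two", "two"),
  grouping_col_02 = c("A", "B", "A", "B", "A"),
  numbers = c(1, 2, 3, 4, 5),
  letters = c("a", "b", "c", "d", "e")
)

# Same filters as in the dplyr example, now driven by the TOML values
filtered_from_config <- test_data %>%
  filter(
    numbers %in% parameter[["matching_numbers"]],
    between(numbers,
            min(parameter[["number_range"]]),
            max(parameter[["number_range"]])),
    numbers < parameter[["one_number"]],
    letters == parameter[["letter"]]
  )

print(filtered_from_config)
```

In a real script the string would be replaced by a path to the config file, so the filter settings can change without touching the R code.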