Filtering a tibble using dplyr
dplyr is not very intuitive for me, so I made this quick example of how to use it.
Use list objects to filter a tibble
Lists are a good format for keeping data that is more complicated than a simple table.
They can therefore be used to store different parameters, which are then applied to filter tabular data.
Operations:
- Matching an exact value
- Matching values bigger or smaller than a given number
- Matching values between two numbers
- Grouping and summarizing the data
# R
install.packages("tidyverse") # also installs dplyr, so no separate install is needed
library(tidyverse)
test_data <- tibble(
  grouping_col_01 = c("one", "one", "one", "two", "two"),
  grouping_col_02 = c("A", "B", "A", "B", "A"),
  numbers = c(1, 2, 3, 4, 5),
  letters = c("a", "b", "c", "d", "e")
)
parameter <- list(
  matching_numbers = c(1, 2, 3, 4),
  number_range = c(2, 4),
  one_number = 3,
  letter = "b"
)
filtered_output <- test_data %>%
  dplyr::filter(numbers %in% parameter[["matching_numbers"]]) %>% # keep matching numbers
  dplyr::filter(between(numbers, # keep only numbers inside a range
                        min(parameter[["number_range"]]),
                        max(parameter[["number_range"]]))
  ) %>%
  dplyr::filter(numbers < parameter[["one_number"]]) %>% # keep numbers below the upper limit
  dplyr::filter(letters == parameter[["letter"]]) # keep the matching letter
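grouping_cols <- grep("grouping", colnames(test_data)) # positions of the grouping columns
test_data %>%
  group_by(across(all_of(grouping_cols))) %>% # group by every matched column
  summarise(n = n()) # count the rows in each group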
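As a variation on the grouping step, here is a minimal sketch that picks the grouping columns by name prefix with `starts_with()` instead of by `grep()` position. It assumes dplyr 1.0.0 or later (for `across()`), and the name `group_counts` is only for illustration. `.groups = "drop"` returns an ungrouped tibble and silences the regrouping message.

```r
library(dplyr)

test_data <- tibble(
  grouping_col_01 = c("one", "one", "one", "two", "two"),
  grouping_col_02 = c("A", "B", "A", "B", "A"),
  numbers = c(1, 2, 3, 4, 5),
  letters = c("a", "b", "c", "d", "e")
)

# Select the grouping columns by name prefix instead of by grep() position
group_counts <- test_data %>%
  group_by(across(starts_with("grouping"))) %>%
  summarise(n = n(), .groups = "drop") # one row per combination, ungrouped result

print(group_counts)
```

With this test data there are four combinations of the two grouping columns, and only "one"/"A" occurs twice.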
Parsing TOML files
Motivation
Find a useful syntax for config files that can be used to supply parameters to R scripts.
I am not sure which format is best. YAML uses fewer complicated characters, but it seems to depend on whitespace for its structure, which I do not like, so TOML looks better.
TOML resources
Link to the TOML docs
Link to the R package for parsing TOML into a list in R
On my pc: ~/R-Projects/TOML
Install package
install.packages("RcppTOML")
library(RcppTOML)
TOML example file
https://rdrr.io/cran/RcppTOML/api/
# Application configuration
title = "My Application"
version = "1.0.0"
[database]
host = "localhost"
port = 5432
username = "app_user"
password = "secure_password"
databases = ["myapp_db", "myapp_cache"]
pool_size = 10
ssl_enabled = true
[server]
host = "0.0.0.0"
port = 8000
debug = false
allowed_hosts = ["localhost", "127.0.0.1", "example.com"]
[logging]
level = "INFO"
format = "%(asctime)s - %(levelname)s - %(message)s"
handlers = ["console", "file"]
[cache]
enabled = true
ttl = 3600
max_size = 1000
[features]
enable_api = true
enable_webhooks = false
rate_limit = 100
TOML functions:
parseTOML
toml_file_parsed<-parseTOML("toml_file.txt")
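To try `parseTOML` without keeping a file around, a small sketch like the one below writes a few TOML lines to a temporary file and parses that. The shortened config and the names `toml_lines`, `toml_path` and `config` are my own assumptions for illustration; only the `parseTOML(path)` call itself comes from the notes above.

```r
library(RcppTOML)

# A shortened, made-up version of the example config
toml_lines <- c(
  'title = "My Application"',
  "",
  "[server]",
  "port = 8000",
  'allowed_hosts = ["localhost", "127.0.0.1"]'
)

# Write the lines to a temporary file and parse it like any other TOML file
toml_path <- tempfile(fileext = ".toml")
writeLines(toml_lines, toml_path)
config <- parseTOML(toml_path)

print(config$server$port)
```

The parsed object behaves like a nested list, so tables such as `[server]` become sub-lists.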
summary
Get summary of the parsed data file
summary(toml_file_parsed)
Output:
toml object with top-level slots: cache, database, features, logging, server, title, version read from ‘/home/rstudio/TOML-parsing/data/toml_01’
print
Print the structure
print(toml_file_parsed)
Output:
List of 7
 $ cache   :List of 3
  ..$ enabled : logi TRUE
  ..$ max_size: int 1000
  ..$ ttl     : int 3600
 $ database:List of 7
  ..$ databases  : chr [1:2] "myapp_db" "myapp_cache"
  ..$ host       : chr "localhost"
  ..$ password   : chr "secure_password"
  ..$ pool_size  : int 10
  ..$ port       : int 5432
  ..$ ssl_enabled: logi TRUE
  ..$ username   : chr "app_user"
 $ features:List of 3
  ..$ enable_api     : logi TRUE
  ..$ enable_webhooks: logi FALSE
  ..$ rate_limit     : int 100
 $ logging :List of 3
  ..$ format  : chr "%(asctime)s - %(levelname)s - %(message)s"
  ..$ handlers: chr [1:2] "console" "file"
  ..$ level   : chr "INFO"
 $ server  :List of 4
  ..$ allowed_hosts: chr [1:3] "localhost" "127.0.0.1" "example.com"
  ..$ debug        : logi FALSE
  ..$ host         : chr "0.0.0.0"
  ..$ port         : int 8000
 $ title   : chr "My Application"
 $ version : chr "1.0.0"
Get items from the parsed file
Get the vector of “allowed_hosts”
toml_file_parsed$server[["allowed_hosts"]]
Output:
[1] "localhost" "127.0.0.1" "example.com"
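Since the parsed file is a nested list, the usual list indexing applies. The sketch below uses a plain list as a stand-in for the parsed object, so it runs without RcppTOML. It shows exact-name access with `[[ ]]` (unlike `$`, which allows partial name matching on lists) and a fallback for a key that is absent. The `timeout` key and its default of 30 are hypothetical and not part of the example file.

```r
# Stand-in for the parsed TOML object: a plain nested list with the same shape
config <- list(
  server = list(
    host = "0.0.0.0",
    port = 8000L,
    allowed_hosts = c("localhost", "127.0.0.1", "example.com")
  )
)

# Exact-name access to a nested vector, then to one element of it
hosts <- config[["server"]][["allowed_hosts"]]
first_host <- hosts[1]

# A key that is not in the config comes back as NULL
timeout <- config[["server"]][["timeout"]] # "timeout" is a hypothetical key
if (is.null(timeout)) {
  timeout <- 30L # assumed fallback value, not from the example file
}

print(first_host)
print(timeout)
```

Checking for `NULL` like this is one simple way to give optional config entries a default.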
Iterate over the “allowed_hosts”
# Base R is enough here, no extra packages are needed
for (host in toml_file_parsed$server[["allowed_hosts"]]) {
  print(host)
}
Output:
[1] "localhost"
[1] "127.0.0.1"
[1] "example.com"
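To tie the two parts of these notes together, here is a rough sketch of the original motivation: keeping the filter parameters in a TOML file and applying them with dplyr. The `[filter]` table and its keys are made up for this example (they mirror the `parameter` list from the dplyr section), and I am assuming that RcppTOML returns integer arrays as integer vectors, in line with the `int` values in the printed structure above.

```r
library(dplyr)
library(RcppTOML)

# Hypothetical config file holding the filter parameters
toml_path <- tempfile(fileext = ".toml")
writeLines(c(
  "[filter]",
  "matching_numbers = [1, 2, 3, 4]",
  "number_range = [2, 4]",
  "one_number = 3",
  'letter = "b"'
), toml_path)

# The [filter] table becomes a named list, like the hand-written `parameter` list
parameter <- parseTOML(toml_path)[["filter"]]

test_data <- tibble(
  numbers = c(1, 2, 3, 4, 5),
  letters = c("a", "b", "c", "d", "e")
)

# Conditions separated by commas inside one filter() call are combined with AND
filtered_output <- test_data %>%
  filter(
    numbers %in% parameter[["matching_numbers"]],
    between(numbers, min(parameter[["number_range"]]), max(parameter[["number_range"]])),
    numbers < parameter[["one_number"]],
    letters == parameter[["letter"]]
  )

print(filtered_output)
```

Only the row with `numbers` equal to 2 and `letters` equal to "b" passes all four conditions.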