R vs SAS Clinical Programming Cheat Sheet

Complete Reference for SAS-to-R Transition


This cheat sheet compares common clinical programming tasks in SAS and R, organized by the 7-module training curriculum. It is designed for SAS programmers transitioning to R for regulatory submissions.

Note: The target audience is clinical programmers with SAS experience who are learning R for CDISC SDTM, ADaM, and TLF (tables, listings, and figures) programming.


📁 Module 1: RStudio Environment & Setup

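| Task | SAS | R + RStudio |
|---|---|---|
| Assign variable | `x = 5;` (DATA step) or `%let x = 5;` (macro variable) | `x <- 5` |
| Print output | `put x;` (writes to the log) | `print(x)` or `cat(x)` |
| Script file | `.sas` | `.R` |
| Report file | `.sas` with ODS | `.Rmd`, `.qmd` |
| IDE | SAS Enterprise Guide | RStudio Desktop/Server |
| Extensions | SAS macros | R packages |
| Libraries | `libname mylib "path";` (points to a data location) | `library(package)` loads a package; data locations are plain path strings |
| Working directory | `%let path = ...;` (macro variable holding a path) | `setwd()` or RStudio Projects |
| Help system | SAS online documentation | `?function` or `help(function)` |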
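
R has no direct `libname`; a common pattern is to keep a data location in an ordinary variable and build paths with `file.path()`. The sketch below assumes a hypothetical `sdtm` folder (created under `tempdir()` so it runs anywhere) and a hypothetical `dm.csv` file, and also contrasts `print()` with `cat()`.

```r
# A libname-style data location is just a path string in R.
# "sdtm" and "dm.csv" are hypothetical names; tempdir() stands in for a project folder.
sdtm_dir <- file.path(tempdir(), "sdtm")
dir.create(sdtm_dir, showWarnings = FALSE)

# Write and read back a tiny CSV, roughly what referencing mylib.dm does in SAS
dm_raw <- data.frame(USUBJID = c("ABC-001", "ABC-002"), AGE = c(34, 51))
write.csv(dm_raw, file.path(sdtm_dir, "dm.csv"), row.names = FALSE)
dm <- read.csv(file.path(sdtm_dir, "dm.csv"))

# print() displays an object; cat() writes plain text, closer to a PUT statement
x <- 5
print(x)
cat("x =", x, "\n")
```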

Module 2: Data Wrangling & Basic Operations

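| Task | SAS | R (tidyverse: dplyr, readr, haven) |
|---|---|---|
| Read CSV | `proc import datafile="file.csv" out=mydata dbms=csv replace; run;` | `readr::read_csv("file.csv")` |
| Read SAS files | Native | `haven::read_sas("file.sas7bdat")` |
| Read XPT files | `libname xpt xport "file.xpt";` | `haven::read_xpt("file.xpt")` |
| Filter rows | `where visit=1;` or `if visit=1;` | `filter(visit == 1)` |
| Select columns | `keep var1 var2;` | `select(var1, var2)` |
| Create variable | `new_var = old_var * 2;` | `mutate(new_var = old_var * 2)` |
| Rename variables | `rename old_var=new_var;` | `rename(new_var = old_var)` (new name on the left) |
| Sort data | `proc sort; by var; run;` | `arrange(var)` |
| Remove duplicates | `proc sort nodupkey; by id; run;` | `distinct(id, .keep_all = TRUE)` after `arrange()`; plain `distinct()` matches `noduprecs` |
| First/last observation per group | `if first.id;` / `if last.id;` | `group_by(id) %>% slice_head(n = 1)` / `slice_tail(n = 1)` |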
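
The first/last and de-duplication rows depend on sort order, just as in SAS. A minimal sketch on a hypothetical `lb` data frame (made-up subjects and values): sort first, then take the first or last row per subject, which is roughly what `if first.usubjid;`, `if last.usubjid;`, and `proc sort nodupkey` do.

```r
library(dplyr)

# Hypothetical lab data: several records per subject, not yet sorted
lb <- data.frame(
  USUBJID  = c("S1", "S1", "S1", "S2", "S2"),
  VISITNUM = c(2, 1, 3, 1, 2),
  LBSTRESN = c(5.1, 4.8, 5.5, 6.0, 6.2)
)

# proc sort; by usubjid visitnum; then if first.usubjid;
first_rec <- lb %>%
  arrange(USUBJID, VISITNUM) %>%
  group_by(USUBJID) %>%
  slice_head(n = 1) %>%
  ungroup()

# ... if last.usubjid;
last_rec <- lb %>%
  arrange(USUBJID, VISITNUM) %>%
  group_by(USUBJID) %>%
  slice_tail(n = 1) %>%
  ungroup()

# proc sort nodupkey; by usubjid; keeps the first row per key in sort order
one_per_subj <- lb %>%
  arrange(USUBJID, VISITNUM) %>%
  distinct(USUBJID, .keep_all = TRUE)
```

`distinct(.keep_all = TRUE)` keeps the first row it meets for each key, so the preceding `arrange()` decides which record survives, much as the BY order does for NODUPKEY.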

🔗 Module 3: Joins & Summarizations

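| Task | SAS | R (dplyr) |
|---|---|---|
| Inner join | `data c; merge a(in=ina) b(in=inb); by id; if ina and inb; run;` | `inner_join(a, b, by = "id")` |
| Left join | `merge a(in=ina) b; by id; if ina;` | `left_join(a, b, by = "id")` |
| Full join | `merge a b; by id;` | `full_join(a, b, by = "id")` |
| Multiple join keys | `by var1 var2;` | `by = c("var1", "var2")` |
| Group operations | `proc means; by group; var value; run;` | `group_by(group) %>% summarise(mean_val = mean(value, na.rm = TRUE))` |
| Count records | `proc freq; tables group; run;` | `count(group)` |
| Aggregate functions | `proc summary; by group; ...` | `group_by(group) %>% summarise(...)` |
| Frequency tables | `proc freq; tables var1*var2; run;` | `count(var1, var2)` or `table(df$var1, df$var2)` |

Note: MERGE with BY requires both datasets to be sorted by the BY variables; dplyr joins do not need sorted input. PROC MEANS ignores missing values, while R's `mean()` returns `NA` unless `na.rm = TRUE`.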
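
A small sketch of the join rows, using hypothetical data frames `a` and `b`. The comments give the SAS MERGE subsetting that each join roughly corresponds to; `anti_join()` covers the `if ina and not inb;` case. The last lines show `mean()` skipping a missing value only when `na.rm = TRUE` is set.

```r
library(dplyr)

# Hypothetical inputs: subjects (a) and exposure records (b), keyed by id
a <- data.frame(id = c(1, 2, 3), arm = c("A", "B", "A"))
b <- data.frame(id = c(2, 3, 4), dose = c(10, 20, 30))

inner  <- inner_join(a, b, by = "id")  # if ina and inb;
left   <- left_join(a, b, by = "id")   # if ina;
full   <- full_join(a, b, by = "id")   # no subsetting IF
a_only <- anti_join(a, b, by = "id")   # if ina and not inb;

# Like proc freq; tables arm;
arm_counts <- count(a, arm)

# Like proc means; by group; var value; (PROC MEANS skips missing values)
vals <- data.frame(group = c("A", "A", "B"), value = c(1, NA, 3))
means <- vals %>%
  group_by(group) %>%
  summarise(mean_val = mean(value, na.rm = TRUE))
```

One difference to check when porting: for many-to-many keys, MERGE pairs records one-to-one within a BY group, while dplyr joins return every combination (recent dplyr versions warn when this happens).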

📅 Module 4: Date/Time & Text Processing

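| Task | SAS | R (lubridate + stringr) |
|---|---|---|
| Parse dates | `input(date_str, yymmdd10.)` | `ymd("2024-01-15")` or `dmy("15/01/2024")` |
| Format dates | `put(date, yymmdd10.)` | `format(date, "%Y-%m-%d")` |
| Study day calculation | `if date >= rfstdt then study_day = date - rfstdt + 1; else study_day = date - rfstdt;` | `study_day = if_else(date >= rfstdt, as.numeric(date - rfstdt) + 1, as.numeric(date - rfstdt))` |
| Date arithmetic | `intnx('day', date, 30)` or `date + 30` | `date + days(30)` |
| Extract date parts | `year(date)`, `month(date)` | `year(date)`, `month(date)` |
| String length | `length(var)` | `str_length(var)` |
| Substring | `substr(var, 1, 5)` (start, length) | `str_sub(var, 1, 5)` (start, end) |
| Find/replace text | `tranwrd(var, "old", "new")` | `str_replace_all(var, fixed("old"), "new")` (`str_replace()` changes only the first match) |
| Upper/lower case | `upcase(var)`, `lowcase(var)` | `str_to_upper(var)`, `str_to_lower(var)` |
| Pattern matching | `prxmatch('/pattern/', var) > 0` | `str_detect(var, "pattern")` |
| String concatenation | `cats(var1, var2)` or `var1 \|\| var2` (`\|\|` keeps trailing blanks) | `paste0(var1, var2)` or `str_c(var1, var2)` |

Note: RFSTDTC is ISO 8601 text, so convert it to a date (`rfstdt` above) before subtracting. SDTM study days have no day 0, which is why 1 is added only on or after the reference date.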
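
Two string rows hide behavioral differences, sketched below with made-up values: `str_replace()` changes only the first match (use `str_replace_all()` to match `tranwrd()`), and `str_sub()` takes an end position where SAS `substr()` takes a length. The date lines show lubridate parsing and period arithmetic.

```r
library(stringr)
library(lubridate)

term <- "MILD HEADACHE, MILD NAUSEA"

# tranwrd() replaces every occurrence; str_replace() only the first
first_only <- str_replace(term, fixed("MILD"), "GRADE 1")
all_hits   <- str_replace_all(term, fixed("MILD"), "GRADE 1")

# SAS substr(code, 3, 5) takes 5 characters starting at position 3;
# str_sub() takes start and END positions, so the end is 3 + 5 - 1 = 7
code <- "ABCDEFGHIJ"
sas_like <- str_sub(code, 3, 3 + 5 - 1)

# Parse an ISO 8601 date, add 30 days, and format it back to text
d <- ymd("2024-01-15")
iso <- format(d + days(30), "%Y-%m-%d")
```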

⚙️ Module 5: Functions & Control Structures

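| Task | SAS | R |
|---|---|---|
| Define function | `%macro myfunc(param); ... %mend;` | `myfunc <- function(param) { ... }` |
| Call function | `%myfunc(value)` | `myfunc(value)` |
| If-then-else | `if condition then x=1; else x=0;` | Single value: `if (condition) x <- 1 else x <- 0`; column: `if_else(condition, 1, 0)` |
| Case/when logic | `select; when (condition1) x = value1; when (condition2) x = value2; otherwise x = default; end;` | `case_when(condition1 ~ value1, condition2 ~ value2, TRUE ~ default)` |
| Loops | `do i=1 to 10; ...; end;` | `for (i in 1:10) { ... }` |
| Apply function to groups | BY-group processing or macro loops | `purrr::map()`, `purrr::walk()`, `dplyr::group_modify()` |
| Conditional assignment | `if var > 5 then flag = "Y"; else flag = "N";` | `mutate(flag = if_else(var > 5, "Y", "N"))` |
| Missing value check | `if missing(var)` | `is.na(var)` (blank SAS character values may arrive in R as `""` rather than `NA`) |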
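
R's `if () ... else` is not vectorized (R 4.2 and later raise an error when the condition has more than one element), so column logic goes through `if_else()` and `case_when()`. This sketch uses a hypothetical `subj` data frame with made-up values; `missing = "N"` is an assumption chosen to mimic SAS, where a missing numeric compares as smaller than any number.

```r
library(dplyr)

# Hypothetical subject-level data with a missing age and a blank character value
subj <- data.frame(
  USUBJID = c("S1", "S2", "S3", "S4"),
  AGE     = c(45, 70, NA, 66),
  AESER   = c("Y", "", "N", NA)
)

out <- subj %>%
  mutate(
    # Vectorized IF/ELSE; missing AGE is treated as "N", as SAS would
    OLDFL = if_else(AGE >= 65, "Y", "N", missing = "N"),
    # SELECT/WHEN equivalent; TRUE ~ plays the role of OTHERWISE
    AGEGR1 = case_when(
      is.na(AGE) ~ NA_character_,
      AGE < 65 ~ "<65",
      TRUE ~ ">=65"
    ),
    # Treat both NA and "" as missing, like SAS missing() on character data
    AESER_MISS = is.na(AESER) | AESER == ""
  )
```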

📋 Module 6: CDISC SDTM Programming

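| Task | SAS | R (dplyr + haven + sdtm.oak) |
|---|---|---|
| Create SDTM domains | `data dm; set raw_dm; ... run;` | `dm_sdtm <- raw_dm %>% mutate(...)` |
| USUBJID derivation | `USUBJID = catx("-", study, subjid);` | `mutate(USUBJID = paste0(study, "-", subjid))` |
| Date formatting | `AESTDTC = put(ae_date, yymmdd10.);` | `mutate(AESTDTC = format(ae_date, "%Y-%m-%d"))` |
| Study day calculation | `if ae_date >= rfstdt then AESTDY = ae_date - rfstdt + 1; else AESTDY = ae_date - rfstdt;` | `mutate(AESTDY = if_else(ae_date >= rfstdt, as.numeric(ae_date - rfstdt) + 1, as.numeric(ae_date - rfstdt)))` |
| Sequence numbers | `by usubjid; if first.usubjid then aeseq = 0; aeseq + 1;` | `arrange(USUBJID, AESTDTC) %>% group_by(USUBJID) %>% mutate(AESEQ = row_number())` |
| Domain assignment | `DOMAIN = "AE";` | `mutate(DOMAIN = "AE")` |
| Controlled terminology | Manual mapping (formats or IF/SELECT logic) | `sdtm.oak::assign_ct()` / `hardcode_ct()` with a CT specification |
| Export to XPT | `libname xpt xport "dm.xpt"; data xpt.dm; set dm; run;` | `haven::write_xpt(dm, "dm.xpt", version = 5)` |
| Metadata validation | Custom programming | `metatools::check_variables()` against a metacore specification |

Note: `rfstdt` is RFSTDTC converted to a date; SDTM study days have no day 0. FDA submissions expect SAS transport version 5 files, and `haven::write_xpt()` defaults to version 8, so set `version = 5` explicitly.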
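
A compact sketch of an AE derivation in plain dplyr, with hypothetical `raw_ae` and `dm` inputs. It converts the ISO 8601 `RFSTDTC` text to a date before subtracting, applies the no-day-0 study day rule, and assigns `AESEQ` after sorting within subject. The sort keys are an assumption (real studies define their own), and the sketch deliberately avoids sdtm.oak helpers.

```r
library(dplyr)

# Hypothetical raw AE records and DM reference start dates
raw_ae <- data.frame(
  study   = "ABC",
  subjid  = c("001", "001", "002"),
  aeterm  = c("HEADACHE", "NAUSEA", "RASH"),
  ae_date = as.Date(c("2024-01-20", "2024-01-10", "2024-02-01"))
)
dm <- data.frame(
  USUBJID = c("ABC-001", "ABC-002"),
  RFSTDTC = c("2024-01-15", "2024-02-01")
)

ae <- raw_ae %>%
  mutate(
    DOMAIN  = "AE",
    USUBJID = paste0(study, "-", subjid),
    AESTDTC = format(ae_date, "%Y-%m-%d")
  ) %>%
  left_join(dm, by = "USUBJID") %>%
  mutate(
    # RFSTDTC is ISO 8601 text; convert to Date before subtracting
    rfstdt   = as.Date(RFSTDTC),
    day_diff = as.numeric(ae_date - rfstdt),
    # No day 0: add 1 only on or after the reference start date
    AESTDY   = if_else(day_diff >= 0, day_diff + 1, day_diff)
  ) %>%
  # Assumed sort keys for sequence numbering within subject
  arrange(USUBJID, AESTDTC, aeterm) %>%
  group_by(USUBJID) %>%
  mutate(AESEQ = row_number()) %>%
  ungroup() %>%
  select(USUBJID, DOMAIN, AESEQ, AETERM = aeterm, AESTDTC, AESTDY)
```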

🎯 Module 7: Post-Processing, QC & Clinical Reporting

Quality Control Procedures

| Task | SAS | R |
|---|---|---|
| Compare datasets | `proc compare base=a compare=b;` | `diffdf::diffdf(a, b)`, `all.equal()`, or `dplyr::anti_join()` |
| Missing value checks | `proc means nmiss;` (numeric variables) | `summarise(across(everything(), ~ sum(is.na(.x))))` |
| Duplicate detection | `proc sort nodupkey dupout=dups; by keys;` | `janitor::get_dupes()` or `count(keys) %>% filter(n > 1)` |
| Range validation | `proc univariate;` | `summary()` or custom range checks |
| Frequency tables | `proc freq;` | `count()` or `table()` |
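
A hand-rolled sketch of the QC checks on two hypothetical datasets, `prod` and `qc`. It reports unmatched keys in each direction, value differences on matched keys, missing counts, duplicate keys, and an assumed plausible range for `AGE` (18 to 100, purely illustrative). Packages such as diffdf automate the comparison part.

```r
library(dplyr)

# Hypothetical production output and independent QC output
prod <- data.frame(USUBJID = c("S1", "S2", "S3"), AGE = c(34, 51, 47))
qc   <- data.frame(USUBJID = c("S1", "S2", "S4"), AGE = c(34, 50, 60))

# Keys present on one side only (like PROC COMPARE's unmatched observations)
only_prod <- anti_join(prod, qc, by = "USUBJID")
only_qc   <- anti_join(qc, prod, by = "USUBJID")

# Value differences on matched keys; note that != yields NA when either
# side is missing, and filter() drops those rows, so check NAs separately
value_diffs <- inner_join(prod, qc, by = "USUBJID", suffix = c(".prod", ".qc")) %>%
  filter(AGE.prod != AGE.qc)

# Missing values per column
n_missing <- prod %>% summarise(across(everything(), ~ sum(is.na(.x))))

# Duplicate keys
dup_keys <- prod %>% count(USUBJID) %>% filter(n > 1)

# Range check with an assumed, illustrative AGE range
out_of_range <- prod %>% filter(AGE < 18 | AGE > 100)
```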

Professional Reporting

| Task | SAS | R |
|---|---|---|
| Basic table output | `proc tabulate;` | `gt::gt()` |
| Demographics table | `proc means;` + `proc freq;` + formatting | `group_by() %>% summarise()` + `gt()` |
| AE summary tables | `proc freq; tables aebodsys*aedecod;` | `count()` + `pivot_wider()` + `gt()` |
| Listing generation | `proc print;` with formatting | `flextable()` or `gt()` |
| Figure creation | `proc sgplot;` | ggplot2 |
| RTF/PDF export | `ods rtf;` / `ods pdf;` | `gt::gtsave()` or R Markdown to Word/PDF |
| Table styling | ODS styles | gt styling functions |
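
The data step behind an AE summary table can be sketched as below, using a hypothetical ADAE-like data frame. Each subject is counted once per preferred term and arm, and arms are then spread into columns. Percentages and denominators are omitted; the result would normally be passed to `gt()` for display.

```r
library(dplyr)
library(tidyr)

# Hypothetical ADAE-like records (S1 reports HEADACHE twice)
adae <- data.frame(
  USUBJID  = c("S1", "S1", "S2", "S3", "S3", "S4"),
  TRT01A   = c("Drug", "Drug", "Placebo", "Drug", "Drug", "Placebo"),
  AEBODSYS = c("NERVOUS SYSTEM DISORDERS", "NERVOUS SYSTEM DISORDERS",
               "GASTROINTESTINAL DISORDERS", "NERVOUS SYSTEM DISORDERS",
               "GASTROINTESTINAL DISORDERS", "GASTROINTESTINAL DISORDERS"),
  AEDECOD  = c("HEADACHE", "HEADACHE", "NAUSEA", "HEADACHE", "NAUSEA", "NAUSEA")
)

ae_summary <- adae %>%
  # Count each subject once per SOC/PT within arm
  distinct(TRT01A, AEBODSYS, AEDECOD, USUBJID) %>%
  count(AEBODSYS, AEDECOD, TRT01A, name = "n_subj") %>%
  # One column per arm; arms with no events get 0
  pivot_wider(names_from = TRT01A, values_from = n_subj, values_fill = 0L)
```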

Advanced Reporting Features

| Task | SAS | R |
|---|---|---|
| Conditional formatting | ODS style overrides | `tab_style()` with conditions |
| Footnotes | `footnote` statements | `tab_footnote()` |
| Headers/titles | `title` statements | `tab_header()` |
| Page breaks | ODS controls | R Markdown page breaks |
| Table of contents | ODS contents | R Markdown TOC |
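
A minimal gt sketch of the title, footnote, and conditional-formatting rows, on a hypothetical lab summary. The threshold of 5 for bolding is arbitrary and only illustrates a conditional `rows =` expression; dplyr is loaded just for the `%>%` pipe.

```r
library(gt)
library(dplyr)  # for %>%

# Hypothetical summary data to display
lab_tbl <- data.frame(
  PARAM  = c("ALT (U/L)", "AST (U/L)", "Creatinine (umol/L)"),
  n      = c(120, 118, 121),
  n_high = c(9, 4, 2)
)

tbl <- gt(lab_tbl) %>%
  # Plays the role of SAS TITLE statements
  tab_header(title = "Laboratory Summary", subtitle = "Safety Population") %>%
  # Like a FOOTNOTE statement, but anchored to a specific column label
  tab_footnote(
    footnote  = "High = above the upper limit of normal.",
    locations = cells_column_labels(columns = n_high)
  ) %>%
  # Conditional formatting: bold n_high where it exceeds an arbitrary threshold
  tab_style(
    style     = cell_text(weight = "bold"),
    locations = cells_body(columns = n_high, rows = n_high > 5)
  )
```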

🤖 GitHub Copilot Integration Tips

Effective Prompts for Clinical Programming

SDTM Programming:

```r
# Create CDISC SDTM AE domain from raw adverse event data with proper date formatting
# Derive study day variables following CDISC conventions
# Generate sequence numbers for SDTM domains by subject
```

ADaM Programming:

```r
# Create ADSL dataset from SDTM DM with treatment flags and population indicators
# Derive baseline values and change from baseline for laboratory parameters
# Create analysis flags for efficacy and safety populations
```

QC and Validation:

```r
# Generate comprehensive data quality report for clinical datasets
# Compare R dataset output against SAS reference for validation
# Create automated checks for CDISC compliance and data integrity
```

Clinical Reporting:

```r
# Create regulatory-compliant demographics table with proper formatting
# Generate adverse event summary table by system organ class and preferred term
# Build laboratory shift table showing baseline to endpoint changes
```

📚 Additional Resources

- R Packages for Clinical Programming: dplyr, haven, gt, flextable, admiral, sdtm.oak
- CDISC Implementation: Official CDISC Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM) implementation guides
- Regulatory Guidance: FDA study data standards and submission requirements (for example, the Study Data Technical Conformance Guide)
- Community: R/Pharma conference, PhUSE working groups, Posit (formerly RStudio) clinical resources

This cheat sheet supports the clinical programming training curriculum and serves as a quick reference for daily SAS-to-R programming tasks.