Module 7 Theory — Post-Processing, QC & Reporting

📊 Module 7 — Post-Processing, QC & Reporting

🎯 Learning Objectives

By the end of this module, you will:

  • Master data formatting and presentation for clinical reports
  • Implement quality control (QC) practices and validation procedures
  • Create professional clinical reports using gt and flextable (SAS validation methods are planned for a future version of this module)
  • Learn best practices for reproducible clinical programming
  • Export publication-ready tables and figures
  • Use GitHub Copilot in RStudio for efficient report generation and QC automation

📋 1. Post-Processing Clinical Data

Data Formatting Principles

Post-processing involves preparing analysis datasets for final reporting and includes:

  • Variable Formatting: Proper display formats for dates, numbers, and categorical variables
  • Label Assignment: Descriptive labels for variables and datasets (see the sketch after this list)
  • Value Formatting: Consistent presentation of results (e.g., “12.3 (4.5)” for mean (SD))
  • Missing Data Handling: Appropriate display of missing values
  • Ordering and Sorting: Logical arrangement of observations and variables
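
Label assignment and ordering are not covered by the helper functions below, so here is a minimal sketch using base R only. The adsl data frame and its columns are placeholders for an ADSL-style analysis dataset:

# Assign descriptive labels (stored as variable attributes; haven/labelled-aware
# tools and many table packages pick these up)
attr(adsl$AGE, "label") <- "Age (years)"
attr(adsl$ARM, "label") <- "Treatment Arm"

# Order observations for reporting (treatment arm, then subject)
adsl <- adsl[order(adsl$ARM, adsl$USUBJID), ]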

Common Formatting Functions

library(dplyr)
library(stringr)
library(lubridate)

# Format numeric variables
format_number <- function(x, digits = 1) {
  case_when(
    is.na(x) ~ "Missing",
    TRUE ~ format(round(x, digits), nsmall = digits)
  )
}

# Format percentages  
format_percent <- function(x, digits = 1) {
  case_when(
    is.na(x) ~ "Missing",
    TRUE ~ paste0(format(round(x, digits), nsmall = digits), "%")
  )
}

# Format mean (SD)
format_mean_sd <- function(mean_val, sd_val, digits = 1) {
  case_when(
    is.na(mean_val) | is.na(sd_val) ~ "Missing",
    TRUE ~ paste0(
      format(round(mean_val, digits), nsmall = digits), 
      " (", 
      format(round(sd_val, digits), nsmall = digits), 
      ")"
    )
  )
}

# Format dates consistently
format_clinical_date <- function(date_var) {
  case_when(
    is.na(date_var) ~ "",
    TRUE ~ toupper(format(date_var, "%d%b%Y"))  # e.g., "15JAN2024"
  )
}
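
A quick check of these helpers with illustrative values (expected output shown as comments; an English locale is assumed for the month abbreviation):

format_number(12.34)                          # "12.3"
format_percent(33.333)                        # "33.3%"
format_mean_sd(12.34, 4.52)                   # "12.3 (4.5)"
format_clinical_date(as.Date("2024-01-15"))   # "15JAN2024"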

🔍 2. Quality Control (QC) Practices

QC Programming Principles

Quality control in clinical programming ensures data integrity and regulatory compliance:

  1. Independent QC: Second programmer reviews and validates all programming
  2. Reproducible Code: Clear, well-documented code that produces consistent results
  3. Data Validation: Multiple checks to ensure data quality and consistency
  4. Output Verification: Systematic comparison of results across programmers
  5. Documentation: Comprehensive documentation of methods and assumptions

Essential QC Functions

# Check for missing required variables
check_required_vars <- function(data, required_vars, dataset_name = "Dataset") {
  missing_vars <- setdiff(required_vars, names(data))
  
  if (length(missing_vars) > 0) {
    warning(paste0(dataset_name, " missing required variables: ", 
                   paste(missing_vars, collapse = ", ")))
    return(FALSE)
  }
  
  cat(paste0(dataset_name, " has all required variables\n"))
  return(TRUE)
}

# Validate data ranges
validate_data_ranges <- function(data, variable, min_val = NULL, max_val = NULL) {
  var_data <- data[[variable]]
  
  issues <- data.frame(
    issue_type = character(),
    count = numeric(),
    stringsAsFactors = FALSE
  )
  
  # Check for values outside expected range
  if (!is.null(min_val)) {
    below_min <- sum(var_data < min_val, na.rm = TRUE)
    if (below_min > 0) {
      issues <- rbind(issues, data.frame(
        issue_type = paste0("Values below ", min_val),
        count = below_min
      ))
    }
  }
  
  if (!is.null(max_val)) {
    above_max <- sum(var_data > max_val, na.rm = TRUE)
    if (above_max > 0) {
      issues <- rbind(issues, data.frame(
        issue_type = paste0("Values above ", max_val),
        count = above_max
      ))
    }
  }
  
  return(issues)
}

# Compare datasets for QC
# (compares row counts and record-level matching on key variables;
#  the tolerance argument is currently unused in this check)
compare_datasets <- function(original_data, qc_data, key_vars, tolerance = 1e-6) {
  
  # Check dimensions
  if (nrow(original_data) != nrow(qc_data)) {
    warning("Row counts differ: Original=", nrow(original_data), 
            ", QC=", nrow(qc_data))
  }
  
  # Check key variables
  merged_data <- full_join(
    original_data %>% mutate(.source = "original"),
    qc_data %>% mutate(.source = "qc"),
    by = key_vars,
    suffix = c("_orig", "_qc")
  )
  
  # Identify discrepancies
  discrepancies <- merged_data %>%
    filter(is.na(.source_orig) | is.na(.source_qc))
  
  if (nrow(discrepancies) > 0) {
    warning("Found ", nrow(discrepancies), " records that don't match")
    return(discrepancies)
  }
  
  cat("Datasets match successfully\n")
  return(NULL)
}
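
Typical usage against an ADSL-style dataset (adsl and adsl_qc are placeholders, and the 18-90 age range is illustrative):

# Confirm required variables are present, then range-check AGE
check_required_vars(adsl, c("USUBJID", "ARM", "AGE", "SEX"), dataset_name = "ADSL")
validate_data_ranges(adsl, "AGE", min_val = 18, max_val = 90)

# Compare the production dataset against the independent QC version
compare_datasets(adsl, adsl_qc, key_vars = "USUBJID")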

📈 3. Clinical Reporting with gt (and Future SAS Validation)

The gt Package for Grammar of Tables

The gt package provides a structured approach to table creation with excellent formatting capabilities:

library(gt)
library(dplyr)

# Example: Demographics summary table
create_demographics_table <- function(data) {
  data %>%
    group_by(ARM) %>%
    summarise(
      n = n(),
      age_mean = mean(AGE, na.rm = TRUE),
      age_sd = sd(AGE, na.rm = TRUE),
      male_n = sum(SEX == "M", na.rm = TRUE),
      male_pct = male_n / n * 100,
      .groups = "drop"
    ) %>%
    mutate(
      age_formatted = format_mean_sd(age_mean, age_sd, 1),
      male_formatted = paste0(male_n, " (", format(round(male_pct, 1), nsmall = 1), "%)")
    ) %>%
    select(ARM, n, age_formatted, male_formatted) %>%
    gt() %>%
    cols_label(
      ARM = "Treatment Arm",
      n = "N",
      age_formatted = "Age, years",
      male_formatted = "Male, n (%)"
    ) %>%
    tab_header(
      title = "Subject Demographics",
      subtitle = "Safety Population"
    ) %>%
    tab_footnote(
      footnote = "Age presented as mean (SD)",
      locations = cells_column_labels(columns = age_formatted)
    ) %>%
    tab_style(
      style = cell_text(weight = "bold"),
      locations = cells_column_labels()
    ) %>%
    tab_options(
      table.font.size = 12,
      heading.title.font.size = 14,
      heading.subtitle.font.size = 12
    )
}

The flextable Package for Flexible Tables (Future: SAS Validation)

Note

Future Enhancement: This section will be replaced with SAS validation procedures showing how to validate R-created datasets using SAS, demonstrating R-SAS interoperability and quality control workflows.
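
As a small preview of that R-SAS workflow, the sketch below (assuming the haven package is available) writes an R analysis dataset to a SAS transport (XPT) file that an independent SAS QC program could read back in; adsl is a placeholder dataset:

library(haven)

# Write the R dataset to SAS transport format
# (XPT version 5 is the transport version commonly required for submissions)
write_xpt(adsl, path = "output/adsl.xpt", version = 5)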

flextable offers extensive customization options and excellent Word/PowerPoint integration; the content below is kept for reference:

library(flextable)

# Example: Adverse events listing
create_ae_listing <- function(data) {
  data %>%
    select(USUBJID, AETERM, AESEV, AESTDTC, AEENDTC, AEOUT) %>%
    arrange(USUBJID, AESTDTC) %>%
    flextable() %>%
    set_header_labels(
      USUBJID = "Subject ID",
      AETERM = "Adverse Event Term", 
      AESEV = "Severity",
      AESTDTC = "Start Date",
      AEENDTC = "End Date",
      AEOUT = "Outcome"
    ) %>%
    add_header_row(
      values = c("", "Adverse Event Details", "", "", ""),
      colwidths = c(1, 4, 1, 1, 1)
    ) %>%
    theme_booktabs() %>%
    fontsize(size = 10, part = "all") %>%
    bold(part = "header") %>%
    align(align = "center", part = "header") %>%
    align(j = c("AESTDTC", "AEENDTC"), align = "center", part = "body") %>%
    width(j = "AETERM", width = 2.5) %>%
    width(j = c("AESTDTC", "AEENDTC"), width = 1.2)
}

📊 4. Advanced Table Formatting

Conditional Formatting

# Highlight severe adverse events
format_ae_severity <- function(ft_table) {
  ft_table %>%
    bg(
      i = ~ AESEV == "SEVERE",
      bg = "#ffcccc"  # Light red background
    ) %>%
    color(
      i = ~ AESEV == "SEVERE", 
      j = "AESEV",
      color = "#cc0000"  # Dark red text
    ) %>%
    bold(
      i = ~ AESEV == "SEVERE",
      j = "AESEV"
    )
}

# Format numeric values with appropriate precision
format_lab_values <- function(data) {
  data %>%
    mutate(
      result_formatted = case_when(
        is.na(LBSTRESN) ~ "Missing",
        LBTESTCD == "HGB" ~ format(round(LBSTRESN, 1), nsmall = 1),
        LBTESTCD == "WBC" ~ format(round(LBSTRESN, 2), nsmall = 2),
        TRUE ~ as.character(LBSTRESN)
      ),
      flag_formatted = case_when(
        LBNRIND == "HIGH" ~ paste0(result_formatted, " ↑"),
        LBNRIND == "LOW" ~ paste0(result_formatted, " ↓"),
        TRUE ~ result_formatted  
      )
    )
}
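
Putting the pieces together, the severity highlighting can be chained onto the listing builder from Section 3 (adae is a placeholder AE dataset):

# Build the AE listing, then apply conditional severity formatting
create_ae_listing(adae) %>%
  format_ae_severity()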

Cross-Reference Tables

library(tidyr)  # for pivot_wider()

# Create cross-tabulation with percentages (column percentages within each level of col_var)
create_crosstab <- function(data, row_var, col_var) {
  data %>%
    count({{row_var}}, {{col_var}}) %>%
    group_by({{col_var}}) %>%
    mutate(
      pct = round(n / sum(n) * 100, 1),
      formatted = paste0(n, " (", format(pct, nsmall = 1), "%)")
    ) %>%
    ungroup() %>%
    select(-n, -pct) %>%
    pivot_wider(names_from = {{col_var}}, values_from = formatted, values_fill = "0 (0.0%)") %>%
    gt() %>%
    tab_header(title = "Cross-tabulation with Percentages") %>%
    tab_footnote(footnote = "Data presented as n (%)")
}
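
For example, a sex-by-treatment cross-tabulation (adsl is a placeholder dataset with the SEX and ARM variables used elsewhere in this module):

create_crosstab(adsl, SEX, ARM)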

📤 5. Export and Output Management

Export Functions for Different Formats

# Export gt table to multiple formats
export_gt_table <- function(gt_table, filename_base, formats = c("html", "rtf", "png")) {
  
  if ("html" %in% formats) {
    gt_table %>%
      gtsave(filename = paste0(filename_base, ".html"))
  }
  
  if ("rtf" %in% formats) {
    gt_table %>%
      gtsave(filename = paste0(filename_base, ".rtf"))
  }
  
  if ("png" %in% formats) {
    gt_table %>%
      gtsave(filename = paste0(filename_base, ".png"))
  }
  
  cat("Table exported in", length(formats), "format(s)\n")
}

# Export flextable to Word and PowerPoint
export_flextable <- function(ft_table, filename_base) {
  
  # Export to Word
  ft_table %>%
    save_as_docx(path = paste0(filename_base, ".docx"))
  
  # Export to PowerPoint
  ft_table %>%
    save_as_pptx(path = paste0(filename_base, ".pptx"))
  
  cat("Flextable exported to Word and PowerPoint\n")
}

Batch Processing and Automation

# Process multiple datasets with consistent formatting
batch_process_tables <- function(dataset_list, table_specs) {
  
  results <- list()
  
  for (dataset_name in names(dataset_list)) {
    
    cat("Processing", dataset_name, "...\n")
    
    data <- dataset_list[[dataset_name]]
    spec <- table_specs[[dataset_name]]
    
    # Build the table with the appropriate generator
    if (spec$type == "summary") {
      tbl <- create_demographics_table(data)
    } else if (spec$type == "listing") {
      tbl <- create_ae_listing(data)
    } else {
      stop("Unknown table type: ", spec$type)
    }
    
    # Export with consistent naming; gt and flextable objects use different exporters
    export_path <- file.path("output", paste0(dataset_name, "_", spec$type))
    if (inherits(tbl, "gt_tbl")) {
      export_gt_table(tbl, export_path, spec$formats)
    } else {
      export_flextable(tbl, export_path)
    }
    
    results[[dataset_name]] <- tbl
  }
  
  return(results)
}
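
The table_specs argument is a named list keyed by dataset name; a minimal example (adsl and adae are placeholder datasets) might look like this:

table_specs <- list(
  adsl = list(type = "summary", formats = c("html", "rtf")),
  adae = list(type = "listing", formats = "html")   # formats applies to gt exports only
)

results <- batch_process_tables(
  dataset_list = list(adsl = adsl, adae = adae),
  table_specs  = table_specs
)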

🤖 6. GitHub Copilot in RStudio Best Practices

Effective Prompts for Clinical Reporting

Example comment prompts and the Copilot suggestion focus each one targets:

  • # Create demographics table by treatment arm with gt → Table structure with grouping
  • # Format adverse events listing with severity highlighting → Conditional formatting
  • # Export table to Word with custom styling → Export functions
  • # Add footnotes explaining statistical methods → Documentation elements
  • # Create QC function to validate table outputs → Validation logic

QC and Validation with Copilot

# Good: Specific QC requirements
# Create QC function to compare demographic counts between original and QC programmer results

# Good: Validation logic description  
# Validate that all adverse events have valid start dates and severity categories

# Good: Cross-check specifications
# Compare table output against statistical analysis plan specifications for demographics table
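
For instance, the AE prompt above might lead to a validation function along these lines (check_ae_integrity and the adae dataset are illustrative, not part of any standard package):

# Flag AEs with a missing start date or an unrecognised severity category
check_ae_integrity <- function(adae, valid_sev = c("MILD", "MODERATE", "SEVERE")) {
  issues <- adae %>%
    mutate(
      missing_start = is.na(AESTDTC) | AESTDTC == "",
      bad_severity  = !(AESEV %in% valid_sev)
    ) %>%
    filter(missing_start | bad_severity)
  
  if (nrow(issues) > 0) {
    warning(nrow(issues), " adverse event record(s) failed validation")
  }
  issues
}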

Report Generation Automation

# Good: Comprehensive workflow description
# Create automated report generation function that produces demographics, AE summary, and safety listings

# Good: Error handling requirements
# Add error handling for missing data and invalid table specifications in batch processing

# Good: Template-based approach
# Generate standardized clinical report template with consistent formatting across all tables
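
The error-handling prompt could produce something like the wrapper below, which keeps one failing table from stopping the whole batch (safe_build_table is an illustrative name):

# Build a single table, converting errors into a message and a NULL result
safe_build_table <- function(dataset_name, data, spec) {
  tryCatch(
    {
      if (is.null(spec$type)) stop("missing table specification")
      if (is.null(data) || nrow(data) == 0) stop("dataset is empty")
      if (spec$type == "summary") create_demographics_table(data) else create_ae_listing(data)
    },
    error = function(e) {
      message("Skipping ", dataset_name, ": ", conditionMessage(e))
      NULL
    }
  )
}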

✅ 7. Best Practices Summary

Code Organization

  • Modular Functions: Create reusable functions for common formatting tasks
  • Consistent Naming: Use clear, consistent naming conventions
  • Documentation: Comment code thoroughly, especially complex formatting logic
  • Version Control: Track changes in table specifications and formatting

Quality Assurance

  • Independent QC: Have another programmer validate all outputs
  • Automated Checks: Use validation functions to catch common errors
  • Output Comparison: Systematically compare results across programmers
  • Documentation: Maintain clear records of QC findings and resolutions

Regulatory Compliance

  • Standard Formats: Follow company and regulatory standards for table presentation
  • Traceability: Maintain clear links between data, code, and outputs
  • Validation: Document validation of all statistical methods and results
  • Archive Management: Properly archive code and outputs for regulatory submissions

🎯 Next Steps

In the demo and exercise, you’ll practice:

  • Creating production-quality clinical tables and listings
  • Implementing comprehensive QC procedures
  • Using advanced formatting techniques with gt and flextable
  • Automating report generation workflows
  • Leveraging GitHub Copilot in RStudio for efficient clinical reporting
  • Following regulatory best practices for clinical programming