R for Business Guide

Quick start

Many R modeling functions describe the relationship between variables with a formula of this form:

dependent variable(s) ~ independent variable(s)

Example:

  • You're fitting data to this linear model: Yi = β0 + β1Xi + β2Zi + εi.
  • You have a single dependent variable with measurements stored in the object y.
  • You have two independent variables with measurements stored in the objects x and z, respectively.
  • In R, you can write this model as y ~ x + z.
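
The example above can be sketched in R as follows. The data are simulated purely for illustration, and the true coefficients (1, 2, and -0.5) are arbitrary assumptions, not values from this guide.

```r
# Simulated data; the coefficients used to generate y are arbitrary choices.
set.seed(42)
n <- 100
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + 2 * x - 0.5 * z + rnorm(n, sd = 0.1)

# Fit the linear model with one dependent and two independent variables.
fit <- lm(y ~ x + z)

# Named estimates for the intercept and the coefficients on x and z.
coef(fit)
```

The estimates should land close to the values used to simulate y, since the noise is small.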

Reference data objects

Functions that accept formula notation often let you reference data objects in either of two ways:

  1. Vectors in your environment
    e.g.: y and x are each vectors in your environment, so lm(formula = y ~ x)
  2. Names within a data frame
    e.g.: data frame dat has columns dat$y and dat$x, so lm(formula = y ~ x, data = dat)
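
As a minimal sketch of the two approaches, the code below fits the same model both ways on simulated data and checks that the estimates agree. The object names fit_env and fit_dat are illustrative only.

```r
# Simulated vectors living in the current environment.
set.seed(1)
x <- rnorm(50)
y <- 3 + 1.5 * x + rnorm(50)

# Option 1: the formula refers to vectors in the environment.
fit_env <- lm(formula = y ~ x)

# Option 2: the formula refers to column names of a data frame.
dat <- data.frame(y = y, x = x)
fit_dat <- lm(formula = y ~ x, data = dat)

# Both fits use the same numbers, so the coefficients match.
all.equal(coef(fit_env), coef(fit_dat))
```

Passing data = dat is often the safer habit, because the model then does not depend on whatever objects happen to exist in the environment.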

Include and omit variables

  • One independent variable, or Yi = β0 + β1Xi + εi: y ~ x
  • Two independent variables, or Yi = β0 + β1Xi + β2Zi + εi: y ~ x + z
  • Two independent variables without intercept, or Yi = β1Xi + β2Zi + εi: y ~ x + z - 1
  • All other columns in the data frame (supplied through the data argument) as independent variables: y ~ .
  • All other columns in the data frame except z as independent variables: y ~ . - z
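
One way to see what each formula in the list above includes is to inspect the columns of the design matrix that R builds from it. The small data frame here is made up for illustration.

```r
# A toy data frame; the values are arbitrary.
dat <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
                  x = c(1, 2, 3, 4, 5, 6),
                  z = c(0.5, 0.1, 0.9, 0.3, 0.7, 0.2),
                  w = c(2, 1, 2, 1, 2, 1))

# Column names of the design matrix show which terms enter the model.
cols_one     <- colnames(model.matrix(y ~ x, data = dat))          # intercept, x
cols_two     <- colnames(model.matrix(y ~ x + z, data = dat))      # intercept, x, z
cols_no_int  <- colnames(model.matrix(y ~ x + z - 1, data = dat))  # x, z (no intercept)
cols_all     <- colnames(model.matrix(y ~ ., data = dat))          # intercept, x, z, w
cols_all_not <- colnames(model.matrix(y ~ . - z, data = dat))      # intercept, x, w

cols_no_int
cols_all_not
```

Note that the dot only has a meaning when a data frame is supplied, since it stands for the columns of that data frame other than the dependent variable.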

Interactions, conditions, and transformations

Interactions

  • The interaction between x and z: x:z
  • x, z, w, and all interactions among them: x * z * w
  • x, z, w, and all interactions up to two-way interactions: (x + z + w)^2
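
To check how R expands these interaction operators, you can ask for the term labels of a formula. This sketch needs no data, because terms() only parses the formula; the variable names are the ones used in the list above.

```r
# A single interaction term.
labels_single <- attr(terms(y ~ x:z), "term.labels")

# Main effects plus every two-way and the three-way interaction.
labels_full <- attr(terms(y ~ x * z * w), "term.labels")

# Main effects plus two-way interactions only.
labels_two_way <- attr(terms(y ~ (x + z + w)^2), "term.labels")

labels_single    # "x:z"
labels_full      # "x" "z" "w" "x:z" "x:w" "z:w" "x:z:w"
labels_two_way   # "x" "z" "w" "x:z" "x:w" "z:w"
```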

Conditional effects

When fitting a mixed effects model (e.g.: lme(...) from the nlme package), let the effect of x vary by the grouping variable z: x|z
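
As a hedged illustration, the sketch below uses the Orthodont data set that ships with the nlme package, with age playing the role of x and Subject the role of z. It assumes nlme is installed, which is usually the case because it is one of R's recommended packages.

```r
library(nlme)  # provides lme() and the Orthodont example data

# distance: a dental measurement; age: age in years; Subject: child identifier.
# The random part "~ age | Subject" lets the intercept and the age slope
# vary from one Subject to the next.
fit <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)

fixef(fit)        # population-level intercept and age slope
head(ranef(fit))  # per-subject deviations from those population values
```

In lme() the grouping expression goes in the random argument rather than in the main model formula. Other mixed model packages may place it differently.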

Mathematical transformations

Treat the enclosed expression as ordinary arithmetic rather than as formula operators, and model the resulting variable: I(...)

  • e.g.: The product of x and z: I(x*z)
  • e.g.: The sum of x, z, and w: I(x+z+w)
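
The difference that I() makes can be seen by comparing design matrix columns with and without it. This is a small sketch on made-up data, not an analysis from this guide.

```r
# A toy data frame; the values are arbitrary.
dat <- data.frame(y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2),
                  x = c(1, 2, 3, 4, 5, 6),
                  z = c(2, 1, 4, 3, 6, 5),
                  w = c(1, 0, 1, 0, 1, 1))

# Without I(), * is a formula operator: main effects plus their interaction.
cols_star <- colnames(model.matrix(y ~ x * z, data = dat))

# With I(), * is arithmetic: a single regressor equal to the product of x and z.
cols_prod <- colnames(model.matrix(y ~ I(x * z), data = dat))

# Likewise, I(x + z + w) is one regressor holding the row-wise sum.
mm_sum <- model.matrix(y ~ I(x + z + w), data = dat)
sum_matches <- all(mm_sum[, "I(x + z + w)"] == dat$x + dat$z + dat$w)

cols_star
cols_prod
sum_matches
```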

Business & Data Analysis Librarian

Kevin Thomas
He/Him/His
Subjects: Statistics