45 Plotting Predictors

##       year           age                     maritl           race     
##  Min.   :2003   Min.   :18.00   1. Never Married: 648   1. White:2480  
##  1st Qu.:2004   1st Qu.:33.75   2. Married      :2074   2. Black: 293  
##  Median :2006   Median :42.00   3. Widowed      :  19   3. Asian: 190  
##  Mean   :2006   Mean   :42.41   4. Divorced     : 204   4. Other:  37  
##  3rd Qu.:2008   3rd Qu.:51.00   5. Separated    :  55                  
##  Max.   :2009   Max.   :80.00                                          
##                                                                        
##               education                     region               jobclass   
##  1. < HS Grad      :268   2. Middle Atlantic   :3000   1. Industrial :1544  
##  2. HS Grad        :971   1. New England       :   0   2. Information:1456  
##  3. Some College   :650   3. East North Central:   0                        
##  4. College Grad   :685   4. West North Central:   0                        
##  5. Advanced Degree:426   5. South Atlantic    :   0                        
##                           6. East South Central:   0                        
##                           (Other)              :   0                        
##             health      health_ins      logwage           wage       
##  1. <=Good     : 858   1. Yes:2083   Min.   :3.000   Min.   : 20.09  
##  2. >=Very Good:2142   2. No : 917   1st Qu.:4.447   1st Qu.: 85.38  
##                                      Median :4.653   Median :104.92  
##                                      Mean   :4.654   Mean   :111.70  
##                                      3rd Qu.:4.857   3rd Qu.:128.68  
##                                      Max.   :5.763   Max.   :318.34  
## 
## [1] 2102   11
## [1] 898  11

## `geom_smooth()` using formula 'y ~ x'

## `geom_smooth()` using formula 'y ~ x'

## `geom_smooth()` using formula 'y ~ x'

45.1 Making Factors with the Hmisc package

An easy way to automate the creation of factor variables with levels is to use the cut2() function from the Hmisc package. The ‘g’ argument tells the function how many levels you would like to have in your new factor.

## cutWage
## [ 20.1, 91.7) [ 91.7,118.9) [118.9,318.3] 
##           704           726           672

45.1.1 Making Tables

The factor variables created using the cut2() function can then be used to make tables and proportion tables that are useful for comparing other factors.

##                
## cutWage         1. Industrial 2. Information
##   [ 20.1, 91.7)           459            245
##   [ 91.7,118.9)           375            351
##   [118.9,318.3]           265            407
##                
## cutWage         1. Industrial 2. Information
##   [ 20.1, 91.7)     0.6519886      0.3480114
##   [ 91.7,118.9)     0.5165289      0.4834711
##   [118.9,318.3]     0.3943452      0.6056548

45.2 Notes

  • Only make plots on the training data.
  • Need to look out for imbalance in the outcomes / predictors
  • Need to look at outliers
  • Need to look for groups of points not explained by a predictor
  • Need to look for skewed variables.