This blog post answers some questions about your baseline model generated in RMV2.0.
However, you will need to use my fork of RMV2.0
to generate the model and save the project to a project.rds file. E.g. use this command in RStudio to
install this version of RMV2.0.
devtools::install_github("simularis/RMV2.0")What information was missing in project files before my updates to RMV2.0?
What ideas are not yet put in place here?
This is a technical article that does not discuss using RMV2.0. It just shows details of the regression model that are not available to find using the tool. There are other solutions for these problems: write your own script, or try another NMEC tool like nmecr or open-eemeter.
The blog is generated from a Jupyter notebook with the IRkernel, which is fairly easy to install with conda. This is convenient if you already use Python. You can also use the R script understand_model.R from the repository to try out the code. Wherever you see a box like below, that is the command sent to R and its output.
cat("hello, world!")
hello, world!
If you need to look up basic syntax or built-in data types, try these:
library(ggplot2)
library(dplyr)
library(glue)
#library(patchwork) # To display 2 charts together
#library(hrbrthemes)
Warning message:
"package 'ggplot2' was built under R version 3.6.3"Warning message:
"package 'dplyr' was built under R version 3.6.3"
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
    filter, lag
The following objects are masked from 'package:base':
    intersect, setdiff, setequal, union
Warning message:
"package 'glue' was built under R version 3.6.3"
Attaching package: 'glue'
The following object is masked from 'package:dplyr':
    collapse
#rds_file <- "C:/RMV2.0 Workshop/uncertain_savings3/Project_02.13.rds"
#rds_file <- "C:/RMV2.0 Workshop/uncertain_savings4/Project_02.16.rds"
#rds_file <- "C:/RMV2.0 Workshop/uncertain_screening1/Project_02.17.rds"
#rds_file <- "C:/RMV2.0 Workshop/uncertain_screening2/Project_02.18.rds"
#rds_file <- "C:/RMV2.0 Workshop/uncertain_savings5/Project_02.18.rds"
rds_file <- "C:/RMV2.0 Workshop/something/Project_02.19.rds"
Project <- readRDS(rds_file)
cat("Your project has the following models.",fill=T)
print(names(Project$model_obj_list$models_list))
Warning message: "namespace 'RMV2.0' is not available and has been replaced by .GlobalEnv when processing object ''"
Your project has the following models. [1] "Data_pre_2_2.csv"
# This only works in RStudio ...
# Uncomment to have a graphical View appear and show you everything from the project file.
View(Project)
Error in View(Project): 'View()' not yet supported in the Jupyter R kernel
Traceback:
1. View(Project)
2. stop(sQuote("View()"), " not yet supported in the Jupyter R kernel")
There Project object is a list. Elements of the list can be any data type. To understand what was stored, we want to look at a few important elements:
Project$fahrenheit (logical): Whether the temperature data was read in with units = degrees Fahrenheit.Project$model_obj_list$models_list[["Data_pre_2_2.csv"]] (list): This data structure includes the regression model coefficients, as well as some calculated quantities like goodness_of_fit.Note that Data_pre_2_2.csv was the name of the data file when we created the model. That name is used to organize the data, in case we had multiple models in the project. We can also refer to that element as models_list[[1]].
As we dive in to explore, note that there is some redundant information and obvious names, so I won't explain it all.
models[["Data_pre_2_2.csv"]]$train stores the data used to train regression models, including load and temperature.
In the towt_model data structure:
timescaleDays is the hyperparameter used to break up your 1 year of data into blocks (think seasons or months) at your discretion. I used 90 days, and as a result I had 6 blocks: 365 / 90 = 6, give or take. To be precise, blocks start on the hours indexed by pointlist. More on this later.regOutList we have stored the outputs of running linear regression fit. This is a list of 6 elements, one for each seasonal block of time.WeightMatrix contains weights for each (block, timestamp) combination. These are used for weighted regression.PredMatrix contains load "predictions" using the baseline regression models with the baseline weather data, one time series for each block. For a single time series "prediction", the separate blocks have been re-combined by contraction (matrix multiplication) with the weights, into arrays like models[["Data_pre_2_2.csv"]]$fitting and prediction. If the model fits, the prediction should be correlated well with baseline data.Here is the regression model data structure for the first block, regOutList[[1]]. Note:
amod represents a regression model for occupied periods. More on this next.bmod represents a regression model for unoccupied periods.tempKnots are the changepoint temperatures This and all internal calculations are kept in degrees Celsius.The amod element is the output of a call to the lm function which performs regression (linear model).
lm is a commonly used function R, not specific to RMV2.0. If you want to write your own regression script, you will probably use lm at the heart of it.call element remembers how the lm function was invoked. The formula loadVec ~ . + 0 is a lazy way to say that the output of the regression should try to match loadVec, and it should be linear function of all the columns (".") in the input data.frame, which is stored as model. Those column names are recorded in the terms formula:ftow is the time of week factor. As a factor, it represents a set of indicator variables like ftow1 ... ftow168, each represent one hour of time-of-week.X1 through X5 are the amount by which temperature exceeds the corresponding changepoint from tempKnots (and one implicit '0' changepoint).The occupied periods include only some of the time-of-week hours (e.g. 33, 34, ...), which are called the levels of the ftow factor. Other levels (e.g. 1, 2, ...) appear in the bmod linear model for unoccupied periods.
Here we can also see the coefficients. To count them up,
All together, there are 80 coefficients in the amod linear model.
In the bmod linear model, we can see a different set of coefficients for the ftow factor.
Now that we know a little about the stored data structure, let's start scripting something useful.
cat("Project file had the following structure:",fill=T)
print(names(Project))
# Typical for screening analysis
# [1] "files_path_sc"      "fahrenheit"         "Data_pre_summary_0" "Data_pre_summary"   "files_names"        "model_obj_list"
# [7] "p_name_sc"          "Data_pre"           "pre_dir_sc"         "files_names_mod"    "Model"
# Typical for savings analysis
# [1] "Data_pre_summary_0"  "load"                "files_names"         "post_names_sav"      "pre_names_sav"
# [6] "Data_pre"            "files_names_mod"     "p_name_sav"          "post_path_sav"       "Data_pre_summary"
# [11] "fahrenheit"          "post_dir_sav"        "model_obj_list"      "pre_path_sav"        "Data_post"
# [16] "Data_post_summary"   "nre_done"            "Data_post_summary_0" "pre_dir_sav"         "Model"
Project file had the following structure: [1] "files_path_sc" "fahrenheit" "Data_pre_summary_0" [4] "Data_pre_summary" "files_names" "model_obj_list" [7] "p_name_sc" "Data_pre" "pre_dir_sc" [10] "files_names_mod" "Model"
cat("Project file has Data_post? (is savings analysis?)",fill=T)
print("Data_post" %in% names(Project))
isSavings = "Data_post" %in% names(Project)
projectType <- if (isSavings) {"savings!"} else {"screenings!"}
cat("Type of models used?",fill=T)
print(Project$Model)
if (Project$Model != "TOWT") {
  cat("Warning: this script only understands TOWT models.",fill=T)
}
Project file has Data_post? (is savings analysis?) [1] FALSE Type of models used? [1] "TOWT"
Here is how RMV2.0 actually used the hyperparameter, timescaleDays, to determine the season blocks.
cat("Attempting to understand the first TOWT model. Grokking ...",fill=T)
# Pick a model from the models_list by number, then load its variables into the workspace
i <- 1
attach(Project$model_obj_list$models_list[[i]]$towt_model,name="towt_model")
dataTime <- if (isSavings) {timeVec} else {trainTime}
npoints = length(dataTime)
t0 = min(dataTime,na.rm=T)
t1 = max(dataTime,na.rm=T)
deltaT = as.numeric(difftime(t1,t0,units="days"))
nsegments = max(1,ceiling(deltaT/timescaleDays))
segmentwidth = (npoints-1)/nsegments
segmentDeltaT = deltaT / nsegments
Attempting to understand the first TOWT model. Grokking ...
#detach("towt_model")
##
# Build dataframe from timeVec (shape [n]) and WeightMatrix (shape [26,n]), transposed
df <- data.frame(
  day = timeVec,
  weights=t(WeightMatrix)
)
# "Melt" from wide table to long table (day, variable, value)
df2 <- reshape2::melt(data=df,id.vars="day")
library(IRdisplay)
display(head(df2))
display(tail(df2))
| day | variable | value | 
|---|---|---|
| 2006-01-01 01:00:00 | weights.1 | 1.0000000 | 
| 2006-01-01 02:00:00 | weights.1 | 0.9999998 | 
| 2006-01-01 03:00:00 | weights.1 | 0.9999991 | 
| 2006-01-01 04:00:00 | weights.1 | 0.9999981 | 
| 2006-01-01 05:00:00 | weights.1 | 0.9999966 | 
| 2006-01-01 06:00:00 | weights.1 | 0.9999946 | 
| day | variable | value | |
|---|---|---|---|
| 52543 | 2006-12-31 18:00:00 | weights.6 | 0.9999946 | 
| 52544 | 2006-12-31 19:00:00 | weights.6 | 0.9999966 | 
| 52545 | 2006-12-31 20:00:00 | weights.6 | 0.9999981 | 
| 52546 | 2006-12-31 21:00:00 | weights.6 | 0.9999991 | 
| 52547 | 2006-12-31 22:00:00 | weights.6 | 0.9999998 | 
| 52548 | 2006-12-31 23:00:00 | weights.6 | 1.0000000 | 
## Use modern library to plot weighting functions together
## https://stackoverflow.com/questions/2564258/plot-two-graphs-in-same-plot-in-r
figtitle0 <- if (isSavings) {
  "Raw weights for weighted average of baseline models in post-period\n(Denominator is sum of weights)"
  } else {
    "Weights used to generate baseline models\n(Weighted least squares linear regression)"
  }
figtitle <- glue("{figtitle0}
                  deltaT = {format(deltaT,digits=4)} days, \\
                  timescaleDays = {timescaleDays}, \\
                  segmentDays = {format(segmentDeltaT, digits=4)}")
cat(figtitle)
fig <- ggplot(data = df2, aes(x = day, y = value, colour = variable)) + geom_line() +
  theme(aspect.ratio=0.3) +
  ggtitle(figtitle)
print(fig)
#ggsave("my_weights_plot.png")
Weights used to generate baseline models (Weighted least squares linear regression) deltaT = 364.9 days, timescaleDays = 90, segmentDays = 72.98
# Remove items from the workspace to avoid mistakes
rm(figtitle,fig)
## Proceed with this step only if you have modified the code to save extra variables
length(Project$model_obj_list$models_list[[1]]$towt_model$regOutList)
print(names(Project$model_obj_list$models_list[[1]]$towt_model$regOutList[[1]]))
[1] "training" "predictions" "amod" "bmod" "tempKnots" [6] "occVec" "okocc"
cat("The hyperparameter timescaleDays",fill=T)
print(timescaleDays)
cat("the index of timestamps used for weights",fill=T)
print(pointlist)
cat("the re-computed timestamps used for weights",fill=T)
#for (point in pointlist) {
#  print(trainTime[point])
#}
tCenters = trainTime[pointlist]
print(tCenters)
The hyperparameter timescaleDays [1] 90 the index of timestamps used for weights [1] 1 1752 3503 5255 7006 8758 the re-computed timestamps used for weights [1] "2006-01-01 01:00:00 MST" "2006-03-15 01:00:00 MST" [3] "2006-05-27 00:00:00 MST" "2006-08-08 00:00:00 MST" [5] "2006-10-19 23:00:00 MST" "2006-12-31 23:00:00 MST"
nModelRuns <- length(regOutList)
cat(paste("There are ",nModelRuns,"model blocks"),fill=T)
for (irun in 1:nModelRuns) {
  cat(paste("Model block ", irun),fill=T)
  regOut <- regOutList[[irun]]
# Explain how coefficients are linked to input variables in model
#print(regOut$amod$terms)
cat("The formula for the regression:",fill=T)
print(deparse(regOut$amod$terms))
cat("The type of each variable:",fill=T)
print(attr(regOut$amod$terms,"dataClasses"))
cat("The temperature knots used to define temperature variables X1, X2, ...:",fill=T)
print(regOut$tempKnots)
cat(paste("Number of parameters in occupied period model (excluding tempKnots)  :",regOut$amod$rank),fill=T)
cat(paste("Number of parameters in unoccupied period model (excluding tempKnots):",regOut$bmod$rank),fill=T)
cat(fill=T)
}
There are 6 model blocks Model block 1 The formula for the regression: [1] "loadVec ~ ftow + X1 + X2 + X3 + X4 + X5 + 0" The type of each variable: loadVec ftow X1 X2 X3 X4 X5 (weights) "numeric" "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" The temperature knots used to define temperature variables X1, X2, ...: [1] 4.444444 12.777778 18.333333 26.666667 Number of parameters in occupied period model (excluding tempKnots) : 80 Number of parameters in unoccupied period model (excluding tempKnots): 98 Model block 2 The formula for the regression: [1] "loadVec ~ ftow + X1 + X2 + X3 + X4 + X5 + 0" The type of each variable: loadVec ftow X1 X2 X3 X4 X5 (weights) "numeric" "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" The temperature knots used to define temperature variables X1, X2, ...: [1] 4.444444 12.777778 18.333333 26.666667 Number of parameters in occupied period model (excluding tempKnots) : 80 Number of parameters in unoccupied period model (excluding tempKnots): 98 Model block 3 The formula for the regression: [1] "loadVec ~ ftow + X1 + X2 + X3 + X4 + X5 + 0" The type of each variable: loadVec ftow X1 X2 X3 X4 X5 (weights) "numeric" "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" The temperature knots used to define temperature variables X1, X2, ...: [1] 4.444444 12.777778 18.333333 26.666667 Number of parameters in occupied period model (excluding tempKnots) : 80 Number of parameters in unoccupied period model (excluding tempKnots): 98 Model block 4 The formula for the regression: [1] "loadVec ~ ftow + X1 + X2 + X3 + X4 + X5 + 0" The type of each variable: loadVec ftow X1 X2 X3 X4 X5 (weights) "numeric" "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" The temperature knots used to define temperature variables X1, X2, ...: [1] 4.444444 12.777778 18.333333 26.666667 Number of parameters in occupied period model (excluding tempKnots) : 80 Number of parameters in unoccupied period model (excluding tempKnots): 98 Model block 5 The formula for the regression: [1] "loadVec ~ ftow + X1 + X2 + X3 + X4 + X5 + 0" The type of each variable: loadVec ftow X1 X2 X3 X4 X5 (weights) "numeric" "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" The temperature knots used to define temperature variables X1, X2, ...: [1] 4.444444 12.777778 18.333333 26.666667 Number of parameters in occupied period model (excluding tempKnots) : 80 Number of parameters in unoccupied period model (excluding tempKnots): 98 Model block 6 The formula for the regression: [1] "loadVec ~ ftow + X1 + X2 + X3 + X4 + X5 + 0" The type of each variable: loadVec ftow X1 X2 X3 X4 X5 (weights) "numeric" "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" The temperature knots used to define temperature variables X1, X2, ...: [1] 4.444444 12.777778 18.333333 26.666667 Number of parameters in occupied period model (excluding tempKnots) : 80 Number of parameters in unoccupied period model (excluding tempKnots): 98
# Show some or all of the coefficients in a table
cat("Regression fit coefficients for occupied periods:")
#print(head(regOut$amod$coefficients))
#print(tail(regOut$amod$coefficients))
print(regOut$amod)
# Result: e.g. ftow33 ... ftow158, X1 ... X5
# Not all ftow factors appear
cat("Regression fit coefficients for unoccupied periods:")
#print(head(regOut$bmod$coefficients))
#print(tail(regOut$bmod$coefficients))
print(regOut$bmod)
# Result: e.g. ftow1 ... ftow168, X1 ... X5
Regression fit coefficients for occupied periods:
Call:
lm(formula = loadVec ~ . + 0, data = dframe, subset = okocc, 
    weights = weightvec, na.action = na.exclude)
Coefficients:
  ftow33    ftow34    ftow35    ftow36    ftow37    ftow38    ftow39    ftow40  
 889.513  1129.579  1083.434  1055.898  1030.532  1032.293  1053.327  1055.209  
  ftow41    ftow42    ftow43    ftow44    ftow45    ftow57    ftow58    ftow59  
1086.492  1052.536   890.060   781.562   687.989   956.480  1239.358  1193.083  
  ftow60    ftow61    ftow62    ftow63    ftow64    ftow65    ftow66    ftow67  
1166.313  1141.000  1148.110  1175.762  1181.462  1211.249  1164.571   973.030  
  ftow68    ftow69    ftow70    ftow81    ftow82    ftow83    ftow84    ftow85  
 850.676   746.600   652.988   965.799  1251.020  1207.634  1182.726  1152.299  
  ftow86    ftow87    ftow88    ftow89    ftow90    ftow91    ftow92    ftow93  
1162.257  1190.022  1188.593  1217.781  1172.343   980.118   861.779   752.962  
  ftow94   ftow105   ftow106   ftow107   ftow108   ftow109   ftow110   ftow111  
 651.490   954.417  1203.931  1160.355  1132.017  1107.869  1115.496  1140.647  
 ftow112   ftow113   ftow114   ftow115   ftow116   ftow117   ftow118   ftow129  
1143.676  1169.399  1124.270   939.357   831.399   719.134   631.937   970.610  
 ftow130   ftow131   ftow132   ftow133   ftow134   ftow135   ftow136   ftow137  
1249.137  1209.257  1187.430  1162.131  1171.442  1198.762  1200.905  1228.215  
 ftow138   ftow139   ftow140   ftow141   ftow142   ftow153   ftow154   ftow155  
1183.935   985.538   866.343   751.631   653.014   624.374   733.561   685.567  
 ftow156   ftow157   ftow158        X1        X2        X3        X4        X5  
 663.234   661.950   662.021     1.851     4.336    27.944     7.310    45.556  
Regression fit coefficients for unoccupied periods:
Call:
lm(formula = loadVec ~ . + 0, data = dframe, subset = !okocc, 
    weights = weightvec, na.action = na.exclude)
Coefficients:
   ftow1     ftow2     ftow3     ftow4     ftow5     ftow6     ftow7     ftow8  
273.1739  270.1190  271.7262  272.8517  279.2212  293.7085  321.3417  352.5974  
   ftow9    ftow10    ftow11    ftow12    ftow13    ftow14    ftow15    ftow16  
350.7285  316.0362  268.5449  239.1591  234.6808  246.7670  244.2578  236.3076  
  ftow17    ftow18    ftow19    ftow20    ftow21    ftow22    ftow23    ftow24  
271.9386  338.8820  365.6205  358.6873  350.0431  338.1211  318.8483  295.9443  
  ftow25    ftow26    ftow27    ftow28    ftow29    ftow30    ftow31    ftow32  
279.7471  319.8969  322.2569  322.8331  330.4514  362.2822  445.7943  547.8207  
  ftow46    ftow47    ftow48    ftow49    ftow50    ftow51    ftow52    ftow53  
659.3880  481.4336  347.8056  317.5771  319.1767  319.8283  320.9749  327.9593  
  ftow54    ftow55    ftow56    ftow71    ftow72    ftow73    ftow74    ftow75  
361.2094  447.8091  568.7063  511.9503  355.3213  321.6388  318.0751  319.8035  
  ftow76    ftow77    ftow78    ftow79    ftow80    ftow95    ftow96    ftow97  
320.1643  327.0424  361.1173  449.3805  569.4319  506.8513  355.8040  322.2515  
  ftow98    ftow99   ftow100   ftow101   ftow102   ftow103   ftow104   ftow119  
314.3528  315.8488  315.6444  322.8590  358.7039  441.9594  563.9345  491.8609  
 ftow120   ftow121   ftow122   ftow123   ftow124   ftow125   ftow126   ftow127  
352.5433  320.4745  318.0097  317.8134  318.6468  324.3257  358.6189  444.9453  
 ftow128   ftow143   ftow144   ftow145   ftow146   ftow147   ftow148   ftow149  
569.6498  504.8645  353.6033  321.0042  266.5064  267.2594  268.3067  275.8720  
 ftow150   ftow151   ftow152   ftow159   ftow160   ftow161   ftow162   ftow163  
291.1315  412.7272  488.8743  566.0680  426.6568  453.0773  457.9925  372.8548  
 ftow164   ftow165   ftow166   ftow167   ftow168        X1        X2        X3  
350.2605  345.8815  333.8005  311.7801  290.0373   -1.0110   -0.8148    3.1848  
      X4        X5  
 22.2031    7.6776  
## Clean up: remove global variables from workspace and detach scope
## Comment out these lines if you want to inspect further
rm(rds_file, isSavings, projectType, i, df, df2, tCenters, nModelRuns, irun, regOut)
detach("towt_model")
rm(Project)