Title: | Logistic Knowledge Tracing |
---|---|
Description: | Computes Logistic Knowledge Tracing ('LKT') which is a general method for tracking human learning in an educational software system. Please see Pavlik, Eglington, and Harrell-Williams (2021) <https://ieeexplore.ieee.org/document/9616435>. 'LKT' is a method to compute features of student data that are used as predictors of subsequent performance. 'LKT' allows great flexibility in the choice of predictive components and features computed for these predictive components. The system is built on top of 'LiblineaR', which enables extremely fast solutions compared to base glm() in R. |
Authors: | Philip I. Pavlik Jr. [aut, ctb, cre] |
Maintainer: | Philip I. Pavlik Jr. <[email protected]> |
License: | GPL-3 |
Version: | 1.7.0 |
Built: | 2025-02-10 05:00:54 UTC |
Source: | https://github.com/optimal-learning-lab/lkt |
Forward and backwards stepwise search for a set of features and components
with tracking of nonlinear parameters.
buildLKTModel( data, usefolds = NA, allcomponents, allfeatures, currentcomponents = c(), specialcomponents = c(), specialfeatures = c(), forv, bacv, preset = NA, presetint = T, currentfeatures = c(), verbose = FALSE, currentfixedpars = c(), maxitv = 10, interc = FALSE, forward = TRUE, backward = TRUE, metric = "BIC", removefeat = c(), removecomp = c() )
data |
is a dataset with Anon.Student.Id and CF..ansbin. |
usefolds |
Numeric Vector | Specifies the folds for model fitting in LKT; the features are still calculated across all folds to compute test fold fit externally |
allcomponents |
is search space for LKT components |
allfeatures |
is search space for LKT features |
currentcomponents |
components to start search from |
specialcomponents |
add special components (not crossed with features, only paired with special features 1 for 1) |
specialfeatures |
features for each special component (not crossed during search) |
forv |
the minimum amount of improvement needed for the addition of a new term |
bacv |
the maximum amount of loss for a term to be removed |
preset |
One of "static","AFM","PFA","advanced","AFMLLTM","PFALLTM","advancedLLTM" |
presetint |
should the intercepts be included for preset components |
currentfeatures |
features to start search from |
verbose |
passed to LKT |
currentfixedpars |
optional fixed parameters for the current features, used as the starting point of the search |
maxitv |
passed to LKT |
interc |
passed to LKT |
forward |
TRUE or FALSE |
backward |
TRUE or FALSE |
metric |
One of "BIC","AUC","AIC", and "RMSE" |
removefeat |
Character Vector | Excludes specified features from the test list. |
removecomp |
Character Vector | Excludes specified components from the test list. |
list of values "tracetable" and "currentfit"
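The role of a threshold such as forv in a forward search can be sketched in base R. This is only an illustration of the general idea, not the package's algorithm: it uses glm() and BIC on simulated data, and the helper forward_step and the variables x1, x2, and y are hypothetical names introduced here.

```r
# Simulated data (hypothetical): x1 predicts the outcome, x2 is pure noise.
set.seed(42)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1.5 * d$x1))

# One forward step: return the candidate term whose addition lowers BIC
# the most, but only if the improvement exceeds the threshold `forv`.
forward_step <- function(data, current, candidates, forv) {
  base_formula <- reformulate(c("1", current), response = "y")
  base_bic <- BIC(glm(base_formula, family = binomial, data = data))
  best <- NULL
  best_gain <- forv
  for (term in setdiff(candidates, current)) {
    f <- reformulate(c("1", current, term), response = "y")
    gain <- base_bic - BIC(glm(f, family = binomial, data = data))
    if (gain > best_gain) {
      best <- term
      best_gain <- gain
    }
  }
  best
}

# Repeat forward steps until no candidate clears the threshold.
selected <- character(0)
repeat {
  nxt <- forward_step(d, selected, c("x1", "x2"), forv = 10)
  if (is.null(nxt)) break
  selected <- c(selected, nxt)
}
selected
```

In this sketch the noise term x2 should not clear the threshold, so the search stops after adding x1. A backward step would mirror this logic, removing a term when the loss from dropping it stays below a threshold like bacv.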
Compute feature describing prior practice effect.
computefeatures(data, feat, par1, par2, index, index2, par3, par4, par5, fcomp)
data |
copy of main data frame. |
feat |
is the feature to be computed. |
par1 |
nonlinear parameters used for nonlinear features. |
par2 |
nonlinear parameters used for nonlinear features. |
index |
a student by component levels index |
index2 |
a component levels index |
par3 |
nonlinear parameters used for nonlinear features. |
par4 |
nonlinear parameters used for nonlinear features. |
par5 |
nonlinear parameters used for nonlinear features. |
fcomp |
the component name. |
a vector suitable for regression input.
Compute repetition-spacing, time-based features from the input data columns CF..Time. and/or CF..reltime.,
which are computed automatically from Duration..sec. if they are not already present.
computeSpacingPredictors(data, KCs)
data |
is a dataset with Anon.Student.Id and CF..ansbin. |
KCs |
are the components for which spaced features will be specified in LKT |
data which is the same frame with the added spacing relevant columns.
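As a rough illustration of what a spacing feature is, the time since the previous repetition of the same component by the same student can be computed in base R. This is a sketch under stated assumptions, not the package's implementation: the column names student, kc, time, and spacing are hypothetical, and using 0 for a first encounter is a convention chosen here.

```r
# Hypothetical trial log, already ordered by time within each student.
trials <- data.frame(
  student = c("s1", "s1", "s1", "s2", "s2"),
  kc      = c("a",  "b",  "a",  "a",  "a"),
  time    = c(0, 10, 25, 5, 65)
)

# Group by student x component, then take the gap to the previous repetition.
# The first encounter in each group gets 0 (an assumption of this sketch).
key <- paste(trials$student, trials$kc)
trials$spacing <- ave(trials$time, key, FUN = function(t) c(0, diff(t)))
trials
```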
Compute the prior sum of the response appearing in the outcome column for the index
countOutcomeold(data, index, response)
data |
the dataset to compute an outcome vector for |
index |
the subsets to count over |
response |
the actual response value being counted |
the vector of the lagged cumulative sum.
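A lagged cumulative sum of this kind can be sketched in base R as follows. The helper prior_count and its argument layout are hypothetical and differ from countOutcomeold; the sketch only shows the counting logic, in which each row receives the number of earlier rows in the same subset that had the given response.

```r
# Count, for each row, how many PRIOR rows in the same index group
# had the given response (the current row itself is excluded).
prior_count <- function(outcome, index, response) {
  hit <- as.numeric(outcome == response)
  ave(hit, index, FUN = function(h) cumsum(h) - h)
}

outcome <- c("CORRECT", "INCORRECT", "CORRECT", "CORRECT", "CORRECT")
index   <- c("a", "a", "a", "b", "b")
counts  <- prior_count(outcome, index, "CORRECT")
counts
```

Subtracting the current row's own indicator from the running total is what makes the sum lagged, so a feature built this way never uses the outcome it is meant to predict.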
A dataset containing a raw sample from the Memphis Datashop.
largerawsample
A data frame with many columns; please see the DataShop for more information.
https://pslcdatashop.web.cmu.edu/Export?datasetId=5513
Forward and backwards stepwise search for a set of features and components
with tracking of nonlinear parameters.
LASSOLKTData( data, gridpars, allcomponents, allfeatures, preset = NA, presetint = T, specialcomponents = c(), specialfeatures = c(), specialpars = c(), removefeat = c(), removecomp = c() )
data |
is a dataset with Anon.Student.Id and CF..ansbin. |
gridpars |
a vector of parameters to create each feature at |
allcomponents |
is search space for LKT components |
allfeatures |
is search space for LKT features |
preset |
One of "static","AFM","PFA","advanced","AFMLLTM","PFALLTM","advancedLLTM" |
presetint |
should the intercepts be included for preset components |
specialcomponents |
add special components (not crossed with features, only paired with special features 1 for 1) |
specialfeatures |
features for each special component (not crossed during search) |
specialpars |
parameters for the special features (if needed) |
removefeat |
Character Vector | Excludes specified features from the test list. |
removecomp |
Character Vector | Excludes specified components from the test list. |
data which is the same frame with the added spacing relevant columns.
list of values "tracetable" and "currentfit"
runs LASSO search on the data
LASSOLKTModel( data, gridpars, allcomponents, preset = NA, presetint = T, allfeatures, specialcomponents = c(), specialfeatures = c(), specialpars = c(), target_n, removefeat = c(), removecomp = c(), test_fold = 1 )
data |
is a dataset with Anon.Student.Id and CF..ansbin. |
gridpars |
a vector of parameters to create each feature at |
allcomponents |
is search space for LKT components |
preset |
One of "static","AFM","PFA","advanced","AFMLLTM","PFALLTM","advancedLLTM" |
presetint |
should the intercepts be included for preset components |
allfeatures |
is search space for LKT features |
specialcomponents |
add special components (not crossed with features, only paired with special features 1 for 1) |
specialfeatures |
features for each special component (not crossed during search) |
specialpars |
parameters for the special features (if needed) |
target_n |
chosen number of features in model |
removefeat |
Character Vector | Excludes specified features from the test list. |
removecomp |
Character Vector | Excludes specified components from the test list. |
test_fold |
the fold that the chosen LASSO model will be tested on |
list of matrices and values "train_x","train_y","test_x","test_y","fit","target_auc","target_rmse","n_features","auc_lambda","rmse_lambda","BIC_lambda","target_idx", "preds"
Compute a logistic regression model of learning for input data.
LKT( data, usefolds = NA, components, features, fixedpars = NA, seedpars = NA, interacts = NA, curvefeats = NA, dualfit = FALSE, interc = FALSE, verbose = TRUE, epsilon = 1e-04, cost = 512, lowb = 1e-05, highb = 0.99999, type = 0, maketimes = FALSE, bias = 0, maxitv = 100, factrv = 1e+12, nosolve = FALSE, autoKC = rep(0, length(components)), autoKCcont = rep("NA", length(components)), connectors = rep("+", max(1, length(components) - 1)) )
data |
A dataset with Anon.Student.Id and CF..ansbin. |
usefolds |
Numeric Vector | Specifies the folds for model fitting in LKT; the features are still calculated across all folds to compute test fold fit externally |
components |
A vector of factors that can be used to compute each feature for each subject. |
features |
a vector of methods used to compute a feature for each component. |
fixedpars |
a vector of parameters for all features+components. |
seedpars |
a vector of parameters for all features+components to seed non-linear parameter search. |
interacts |
A list of components that interact with the component-by-feature terms in the main specification. |
curvefeats |
vector of columns to use with "diff" functions |
dualfit |
TRUE or FALSE, fit a simple latency using logit. Requires Duration..sec. column in data. |
interc |
TRUE or FALSE, include a global intercept. |
verbose |
provides more output in some cases. |
epsilon |
passed to LiblineaR |
cost |
passed to LiblineaR |
lowb |
lower bound for non-linear optimizations |
highb |
upper bound for non-linear optimizations |
type |
passed to LiblineaR |
maketimes |
Boolean indicating whether to create time-based features (they may instead be precomputed) |
bias |
passed to LiblineaR |
maxitv |
passed to the nonlinear optimization as the maxit control |
factrv |
controls the optim() function |
nosolve |
causes the function to return a sparse data matrix of the features, rather than a solution |
autoKC |
a vector indicating, for each component, whether to use autoKC: 0 for no autoKC, otherwise k, the number of clusters |
autoKCcont |
a vector of text strings; set an entry to "rand" to randomize the autoKC cluster assignment for that component (for comparison) |
connectors |
a vector of R operators for the linear equation, including +, * and : |
list of values "model", "coefs", "r2", "prediction", "nullmodel", "latencymodel", "optimizedpars","subjectrmse", "newdata", and "automat"
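The basic idea of pairing a component with a feature and fitting a logistic regression can be sketched in base R. This is a conceptual illustration only: it uses glm() as a stand-in for the 'LiblineaR' solver, simulated data, and hypothetical column names (student, prior, correct). It does not call LKT().

```r
# Simulated practice log (hypothetical): 30 students, 20 trials each.
set.seed(1)
n_students <- 30
n_trials <- 20
d <- data.frame(student = rep(paste0("s", seq_len(n_students)), each = n_trials))

# Feature: count of prior practice opportunities per student (0 on first trial).
d$prior <- ave(rep(1, nrow(d)), d$student, FUN = function(x) cumsum(x) - 1)

# Simulate outcomes whose log-odds of success rise with prior practice.
d$correct <- rbinom(nrow(d), 1, plogis(-1 + 0.15 * d$prior))

# Fit the logistic regression; the slope estimates the learning rate per trial.
fit <- glm(correct ~ prior, family = binomial, data = d)
coef(fit)
```

A positive coefficient on prior corresponds to learning across practice opportunities; richer models add further component and feature pairs in the same way.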
Bootstrap credibility intervals to aid in interpreting coefficients.
LKT_HDI( dat, n_boot, n_students, comps, feats, conns = rep("+", max(1, length(comps) - 1)), ints = NA, fixeds, get_hdi = TRUE, cred_mass = 0.95 )
dat |
Dataframe |
n_boot |
Number of subsamples to fit |
n_students |
Number of students per subsample |
comps |
Components in model |
feats |
Features in model |
conns |
R notation for linear equation connectors in model |
ints |
Interacts in model |
fixeds |
Fixed parameters in model |
get_hdi |
Boolean to decide if generating HDI per coefficient |
cred_mass |
Credibility mass parameter to decide width of HDI |
List of values "par_reps", "mod_full", "coef_hdi"
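One common way to turn a set of bootstrap coefficient estimates into a highest-density interval is to take the narrowest window that contains the requested share of the sorted samples. The sketch below shows that calculation in base R; the helper hdi_interval is hypothetical, and whether LKT_HDI computes its intervals this way is an assumption.

```r
# Narrowest interval containing at least `cred_mass` of the samples.
hdi_interval <- function(samples, cred_mass = 0.95) {
  s <- sort(samples)
  n <- length(s)
  k <- ceiling(cred_mass * n)           # points the interval must cover
  starts <- seq_len(n - k + 1)          # every possible left endpoint
  widths <- s[starts + k - 1] - s[starts]
  i <- which.min(widths)                # pick the narrowest window
  c(lower = s[i], upper = s[i + k - 1])
}

# Stand-in for one coefficient estimated on 10 bootstrap subsamples.
boot_coefs <- c(1:9, 100)
h <- hdi_interval(boot_coefs, cred_mass = 0.9)
h
```

With these values the outlier at 100 falls outside the interval, which is the behavior that distinguishes a highest-density interval from a symmetric quantile interval.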
Generates predictions and evaluates logistic regression models tailored for learning data, specifically designed for Logistic Knowledge Tracing (LKT) models. This function provides flexibility in returning either just the predicted probabilities or both the predictions and key evaluation statistics.
predict_lkt( modelob, data, fold = NULL, return_stats = FALSE, min_pred_limit = 1e-05, max_pred_limit = 0.99999 )
modelob |
An LKT model object containing necessary model coefficients and predictors for generating predictions. |
data |
A dataset including the predictor variables and the outcome variable. |
fold |
Optional. Numeric vector specifying which folds to include for prediction. If NULL or empty, uses all data. |
return_stats |
Logical. If TRUE, returns both predictions and evaluation statistics (Log-Likelihood, AUC, RMSE, R^2). If FALSE, returns only the predictions. |
min_pred_limit |
Minimum prediction limit. Default is 0.00001. |
max_pred_limit |
Maximum prediction limit. Default is 0.99999. |
If return_stats is FALSE, returns a list containing:
predictions: The predicted probabilities for each observation in the specified fold(s).
If return_stats is TRUE, returns a list containing:
predictions: The predicted probabilities for each observation in the specified fold(s).
LL: Log-Likelihood of the model given the actual outcomes.
AUC: Area Under the ROC Curve.
RMSE: Root Mean Squared Error.
R2: R-squared value, indicating the proportion of variance explained by the model.
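The statistics listed above can be reproduced for any vector of predicted probabilities with a few lines of base R. This is a sketch, not the code inside predict_lkt: the helper eval_predictions is hypothetical, AUC uses the rank-based (Mann-Whitney) formula, and R2 is computed as McFadden's pseudo-R-squared, which is an assumption about the definition intended.

```r
eval_predictions <- function(y, p, min_pred_limit = 1e-05, max_pred_limit = 0.99999) {
  # Clamp predictions so log(0) cannot occur in the log-likelihood.
  p <- pmin(pmax(p, min_pred_limit), max_pred_limit)
  ll <- sum(y * log(p) + (1 - y) * log(1 - p))

  # Null model: predict the base rate for every observation.
  p0 <- mean(y)
  ll_null <- sum(y * log(p0) + (1 - y) * log(1 - p0))

  # Rank-based AUC (ties receive average ranks).
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

  list(predictions = p, LL = ll, AUC = auc,
       RMSE = sqrt(mean((y - p)^2)),
       R2 = 1 - ll / ll_null)   # McFadden pseudo-R2 (assumed definition)
}

y <- c(1, 0, 1, 1, 0)
p <- c(0.9, 0.2, 0.7, 1, 0)
stats <- eval_predictions(y, p)
stats$AUC
```

Without the clamping step, the predictions of exactly 0 and 1 in this example could produce an undefined log-likelihood, which is the purpose of limits such as min_pred_limit and max_pred_limit.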
A dataset containing a small sample of participants in a memory experiment.
samplelkt
A data frame with 2074 rows and many variables:
unique identifier for each student
...
https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=5508
smallSet
smallSet(data, nSub)
data |
Dataframe of student data |
nSub |
Number of students |
ViewExcel
ViewExcel(df = .Last.value, file = tempfile(fileext = ".csv"))
df |
Dataframe |
file |
name of the Excel file |