Title: | Iota Inter Coder Reliability for Content Analysis |
---|---|
Description: | Routines and tools for assessing the quality of content analysis on the basis of the Iota Reliability Concept. The concept is inspired by item response theory and can be applied to any kind of content analysis which uses a standardized coding scheme and discrete categories. It is also applicable for content analysis conducted by artificial intelligence. The package provides reliability measures for a complete scale as well as for every single category. Analysis of subgroup-invariance and error corrections are implemented. This information can support the development process of a coding scheme and allows a detailed inspection of the quality of the generated data. Equations and formulas working in this package are part of Berding et al. (2022)<doi:10.3389/feduc.2022.818365> and Berding and Pargmann (2022) <doi:10.30819/5581>. |
Authors: | Berding Florian [aut, cre] , Pargmann Julia [ctb] |
Maintainer: | Berding Florian <[email protected]> |
License: | GPL-3 |
Version: | 0.1.5 |
Built: | 2024-11-25 06:28:47 UTC |
Source: | https://github.com/fberding/iotarelr |
This function tests if the probabilities within the Assignment Error Matrix are in line with the assumption of weak superiority.
check_conformity_c(aem)
aem |
matrix of probabilities |
Returns the number of violations of the assumption of weak superiority. 0 if the assumptions are fulfilled.
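The check described above can be sketched in base R. This is only an illustration, not the package's C++ routine: it assumes that weak superiority means every diagonal cell of the Assignment Error Matrix is at least as large as every other cell in the same row, and that each offending cell counts as one violation. The helper name `count_weak_superiority_violations` is hypothetical.

```r
# Hypothetical helper (not part of iotarelr): counts off-diagonal cells that
# exceed the diagonal cell of their row. Assumption: this is what a
# "violation of weak superiority" means.
count_weak_superiority_violations <- function(aem) {
  stopifnot(is.matrix(aem), nrow(aem) == ncol(aem))
  violations <- 0L
  for (i in seq_len(nrow(aem))) {
    off_diagonal <- aem[i, -i]
    violations <- violations + sum(off_diagonal > aem[i, i])
  }
  violations
}

# Rows are true categories, columns are assigned categories.
aem_ok <- matrix(c(0.8, 0.1, 0.1,
                   0.2, 0.6, 0.2,
                   0.1, 0.2, 0.7), nrow = 3, byrow = TRUE)
aem_bad <- aem_ok
aem_bad[2, ] <- c(0.5, 0.3, 0.2)  # category 2 is mostly coded as category 1

count_weak_superiority_violations(aem_ok)   # 0
count_weak_superiority_violations(aem_bad)  # 1
```

Whether the package counts violations per cell or per row is not stated in this documentation, so the count from this sketch may differ from the value returned by check_conformity_c().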
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function for checking if the coding scheme is the same for different sub-groups.
check_dgf( data, splitcr, random_starts = 300, max_iterations = 5000, cr_rel_change = 1e-12, con_step_size = 1e-04, con_random_starts = 10, con_max_iterations = 5000, con_rel_convergence = 1e-12, b_min = 0.01, trace = FALSE, con_trace = FALSE, fast = TRUE )
data |
Data for which the elements should be estimated. Data must be
an object of type |
splitcr |
|
random_starts |
An integer for the number of random starts for the EM algorithm. |
max_iterations |
An integer for the maximum number of iterations within the EM algorithm. |
cr_rel_change |
Positive numeric value for defining the convergence of the EM algorithm. |
con_step_size |
|
con_random_starts |
|
con_max_iterations |
|
con_rel_convergence |
|
b_min |
Value ranging between 0 and 1 determining the minimal size of the categories for checking if boundary values occurred. The algorithm tries to select solutions that are not considered to be boundary values. |
trace |
|
con_trace |
|
fast |
|
Returns an object of class iotarelr_iota2_dif. For each group, the results of the estimation are saved separately. The structure within each group is similar to the results from compute_iota2(). Please check that documentation.
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function for estimating the reliability of codings for a new rater based on Iota 2
check_new_rater( true_values, assigned_values, con_step_size = 1e-04, con_random_starts = 5, con_max_iterations = 5000, con_rel_convergence = 1e-12, con_trace = FALSE, fast = TRUE, free_aem = FALSE )
true_values |
|
assigned_values |
|
con_step_size |
|
con_random_starts |
|
con_max_iterations |
|
con_rel_convergence |
|
con_trace |
|
fast |
|
free_aem |
|
Returns a list with the following three components:
The first component estimates_categorical_level comprises all elements that describe the ratings on a categorical level. The elements are sub-divided into raw estimates and chance-corrected estimates.
raw_estimates
alpha_reliability:
A vector containing the Alpha Reliabilities for each category. These values represent probabilities.
beta_reliability:
A vector containing the Beta Reliabilities for each category. These values represent probabilities.
assignment_error_matrix:
An Assignment Error Matrix containing the conditional probabilities for assigning a unit of category i to categories 1 to n.
iota:
A vector containing the Iota values for each category.
elements_chance_corrected
alpha_reliability:
A vector containing the chance-corrected Alpha Reliabilities for each category.
beta_reliability:
A vector containing the chance-corrected Beta Reliabilities for each category.
The second component estimates_scale_level contains elements that describe the quality of the ratings on a scale level. It contains the following elements:
iota_index:
The Iota Index representing the reliability on a scale level.
iota_index_d4:
The Static Iota Index, which is a transformation of the original Iota Index, in order to consider the uncertainty of estimation.
iota_index_dyn2:
The Dynamic Iota Index, which is a transformation of the original Iota Index, in order to consider the uncertainty of estimation.
The third component information contains important information regarding the parameter estimation. It comprises the following elements:
log_likelihood:
Log-likelihood of the best solution.
convergence:
0 if the estimation converged, otherwise 1.
est_true_cat_sizes:
Estimated sizes of the true categories, that is, the estimated share of each category.
conformity:
0 if the solution is in line with the assumption of weak superiority. A number greater than 0 indicates the number of violations of the assumption of weak superiority.
random_starts:
Number of random starts for the EM algorithm.
boundaries:
False if the best solution does not contain boundary values. True if the best solution does contain boundary values.
p_boundaries:
Percentage of solutions with boundary values during estimation.
call:
Name of the function that created the object.
n_rater:
Number of raters.
n_cunits:
Number of coding units.
The returned object contains further slots because it is of class iotarelr_iota2. These slots are empty because they are not part of the estimation within this function.
Please do not use the measures on the scale level if the Assignment Error Matrix was freely estimated since this kind of matrix is not conceptualized for comparing the coding process with random guessing.
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Computes all elements of the Iota Reliability Concept
compute_iota1(data)
data |
Data for which the elements should be estimated. Data must be
an object of type |
A list with the following components
alpha |
A vector containing the chance-corrected Alpha Reliabilities for every category. |
beta |
A vector containing the chance-corrected Beta Reliabilities for every category. |
iota |
A vector containing the Iota values for every category. |
assignment_error_matrix |
A matrix with the conditional probabilities for every category. The rows refer to the true categories and the columns refer to the assigned categories. The elements on the diagonal represent the alpha errors of that category. The other elements in each row represent the conditioned probabilities that a coding unit is wrongly assigned to another category. |
average_iota |
A numeric value ranging between 0 and 1, representing the Average Iota values on a categorical level. It describes the reliability of the whole scale. |
Berding, Florian, Elisabeth Riebenbauer, Simone Stuetz, Heike Jahncke, Andreas Slopinski, and Karin Rebmann (2022). Performance and Configuration of Artificial Intelligence in Educational Settings. Introducing a New Reliability Concept Based on Content Analysis. Frontiers in Education. https://doi.org/10.3389/feduc.2022.818365
Fits a model of Iota2 to the data
compute_iota2( data, random_starts = 10, max_iterations = 5000, cr_rel_change = 1e-12, con_step_size = 1e-04, con_rel_convergence = 1e-12, con_max_iterations = 5000, con_random_starts = 5, b_min = 0.01, fast = TRUE, trace = TRUE, con_trace = FALSE )
data |
Data for which the elements should be estimated. Data must be
an object of type |
random_starts |
An integer for the number of random starts for the EM algorithm. |
max_iterations |
An integer for the maximum number of iterations within the EM algorithm. |
cr_rel_change |
Positive numeric value for defining the convergence of the EM algorithm. |
con_step_size |
|
con_rel_convergence |
|
con_max_iterations |
|
con_random_starts |
|
b_min |
Value ranging between 0 and 1, determining the minimal size of the categories for checking if boundary values occurred. The algorithm tries to select solutions that are not considered to be boundary values. |
fast |
|
trace |
|
con_trace |
|
Returns a list with the following three components:
The first component estimates_categorical_level comprises all elements that describe the ratings on a categorical level. The elements are sub-divided into raw estimates and chance-corrected estimates.
raw_estimates
alpha_reliability:
A vector containing the Alpha Reliabilities for each category. These values represent probabilities.
beta_reliability:
A vector containing the Beta Reliabilities for each category. These values represent probabilities.
assignment_error_matrix:
Assignment Error Matrix containing the conditional probabilities for assigning a unit of category i to categories 1 to n.
iota:
A vector containing the Iota values for each category.
iota_error_1:
A vector containing the Iota Error Type I values for each category.
iota_error_2:
A vector containing the Iota Error Type II values for each category.
elements_chance_corrected
alpha_reliability:
A vector containing the chance-corrected Alpha Reliabilities for each category.
beta_reliability:
A vector containing the chance-corrected Beta Reliabilities for each category.
The second component estimates_scale_level contains elements that describe the quality of the ratings on a scale level. It comprises the following elements:
iota_index:
The Iota Index, representing the reliability on a scale level.
iota_index_d4:
The Static Iota Index, which is a transformation of the original Iota Index, in order to consider the uncertainty of estimation.
iota_index_dyn2:
The Dynamic Iota Index, which is a transformation of the original Iota Index, in order to consider the uncertainty of estimation.
The third component information contains important information regarding the parameter estimation. It comprises the following elements:
log_likelihood:
Log-likelihood of the best solution.
convergence:
0 if the estimation converged, otherwise 1.
est_true_cat_sizes:
Estimated sizes of the true categories, that is, the estimated share of each category.
conformity:
0 if the solution is in line with the assumption of weak superiority. A number greater than 0 indicates the number of violations of the assumption of weak superiority.
random_starts:
Number of random starts for the EM algorithm.
boundaries:
False if the best solution does not contain boundary values. True if the best solution does contain boundary values.
p_boundaries:
Percentage of solutions with boundary values during the estimation.
call:
Name of the function that created the object.
n_rater:
Number of raters.
n_cunits:
Number of coding units.
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function written in C++ for estimating the parameters of the model via Expectation Maximization (EM algorithm).
EM_algo_c( obs_pattern_shape, obs_pattern_frq, obs_internal_count, categorical_levels, random_starts, max_iterations, rel_convergence, con_step_size, con_random_starts, con_max_iterations, con_rel_convergence, fast, trace, con_trace )
obs_pattern_shape |
|
obs_pattern_frq |
|
obs_internal_count |
|
categorical_levels |
|
random_starts |
|
max_iterations |
|
rel_convergence |
|
con_step_size |
|
con_random_starts |
|
con_max_iterations |
|
con_rel_convergence |
|
fast |
|
trace |
|
con_trace |
|
Function returns a list with the estimated parameter sets for every random start. Every parameter set contains the following components:
log_likelihood |
Log likelihood of the estimated solution. |
aem |
Estimated Assignment Error Matrix (aem). The rows represent the true categories while the columns stand for the assigned categories. The cells describe the probability that a coding unit of category i is assigned to category j. |
categorial_sizes |
|
convergence |
If the algorithm converged within the iteration limit
|
iteration |
Number of iterations when the algorithm was terminated. |
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function written in C++ for estimating the log likelihood of a given parameter set during the condition stage.
est_con_multinominal_c( observations, anchor, max_iter = 500000L, step_size = 1e-04, cr_rel_change = 1e-12, n_random_starts = 10L, fast = TRUE, trace = FALSE )
observations |
|
anchor |
|
max_iter |
|
step_size |
|
cr_rel_change |
|
n_random_starts |
|
fast |
|
trace |
|
Returns the log likelihood as a single numeric value.
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function for estimating the expected category of coding units.
est_expected_categories(data, aem)
data |
|
aem |
Assignment Error Matrix based on the second generation of the Iota Concept (Iota2). |
Returns a matrix with the original data, the conditional probability of each true category, and the expected category for every coding unit.
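One way to picture the conditional probabilities of the true categories is Bayes' rule applied to the ratings of a single coding unit. The sketch below is not the package's implementation. It assumes that raters are conditionally independent given the true category, that all raters share one Assignment Error Matrix, and that all categories are equally likely a priori unless a prior is supplied. The helper name `posterior_true_category` and its `prior` argument are hypothetical.

```r
# Hypothetical helper (not the package's est_expected_categories()).
# ratings: integer category codes assigned by the raters to one coding unit.
# aem: rows are true categories, columns are assigned categories.
posterior_true_category <- function(ratings, aem, prior = NULL) {
  n_cat <- nrow(aem)
  if (is.null(prior)) prior <- rep(1 / n_cat, n_cat)  # assumption: uniform prior
  # Likelihood of the observed ratings under every possible true category.
  likelihood <- vapply(
    seq_len(n_cat),
    function(k) prod(aem[k, ratings]),
    numeric(1)
  )
  posterior <- prior * likelihood
  posterior / sum(posterior)
}

aem <- matrix(c(0.8, 0.2,
                0.3, 0.7), nrow = 2, byrow = TRUE)
post <- posterior_true_category(ratings = c(1, 1, 2), aem = aem)
round(post, 3)   # conditional probability of each true category
which.max(post)  # expected category under this sketch
```

Taking the category with the highest conditional probability as the expected category is a design choice of this sketch; the documentation does not state the exact rule the package applies.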
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function written in C++ for estimating the log likelihood of a given parameter set.
fct_log_likelihood_c( categorial_sizes, aem, obs_pattern_shape, obs_pattern_frq, categorical_levels )
categorial_sizes |
|
aem |
|
obs_pattern_shape |
|
obs_pattern_frq |
|
categorical_levels |
|
Returns the log likelihood as a single numeric value.
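To make the role of the arguments more concrete, the following base R sketch computes a log likelihood for a simple latent class model. It is an assumption about the model's form, not a copy of the C++ code: each observed rating pattern is treated as a mixture over the true categories, with all raters sharing the same Assignment Error Matrix and rating independently given the true category. The function name `log_likelihood_sketch` and its argument names are hypothetical.

```r
# Hypothetical base R illustration (not fct_log_likelihood_c()).
# category_sizes: probabilities of the true categories.
# aem: rows are true categories, columns are assigned categories.
# pattern_shape: one row per distinct rating pattern, one column per rater.
# pattern_frq: how often each pattern occurs in the data.
log_likelihood_sketch <- function(category_sizes, aem, pattern_shape, pattern_frq) {
  pattern_prob <- apply(pattern_shape, 1, function(pattern) {
    per_category <- vapply(
      seq_along(category_sizes),
      function(k) category_sizes[k] * prod(aem[k, pattern]),
      numeric(1)
    )
    sum(per_category)  # marginal probability of this pattern
  })
  sum(pattern_frq * log(pattern_prob))
}

aem <- matrix(c(0.8, 0.2,
                0.3, 0.7), nrow = 2, byrow = TRUE)
sizes <- c(0.5, 0.5)
# All four rating patterns of two raters with two categories.
shape <- matrix(c(1, 1,
                  1, 2,
                  2, 1,
                  2, 2), ncol = 2, byrow = TRUE)
frq <- c(40, 10, 10, 40)
ll <- log_likelihood_sketch(sizes, aem, shape, frq)
ll
```

An EM routine would compare such values across random starts and keep the parameter set with the highest log likelihood.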
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function estimating the consequences of reliability for subsequent analysis.
get_consequences( measure_typ = "dynamic_iota_index", measure_1_val, measure_2_val = NULL, level = 0.95, strength = NULL, data_type, sample_size )
measure_typ |
Type of measure used for estimation. Set "iota_index" for the original Iota Index, "static_iota_index" for the static transformation of the Iota Index with d=4 or "dynamic_iota_index" for the dynamic transformation of the Iota Index with d=2. |
measure_1_val |
Reliability value for the independent variable. |
measure_2_val |
Reliability value for the dependent variable. If not set, the function uses the same value as for the independent variable. |
level |
Level of certainty for calculating the prediction intervals. |
strength |
True strength of the relationship between the independent and dependent variable. Possible values are "no", "weak", "medium" and "strong". If no value is supplied, a strong relationship is assumed for deviation and a weak relationship for all others. They represent the most demanding situations for the reliability. |
data_type |
Type of data. Possible values are "nominal" or "ordinal". |
sample_size |
Size of the sample in the study. |
Returns a data.frame which contains the prediction intervals for the deviation between the true and the estimated sample association/correlation, the risk of Type I errors, and the chance of correctly classifying the effect size. Additionally, it estimates the probability that the statistics of the sample deviate from those of an error-free sample by no effect or only a weak effect.
The classification of effect sizes uses the work of Cohen (1988), who differentiates effect sizes by their relevance for practice.
For nominal data, all statistics refer to Cramer's V. For ordinal data, all statistics refer to Kendall's Tau.
The models for calculating the consequences are taken from Berding and Pargmann (2022).
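For readers who want to see the two statistics mentioned above on concrete data, here is a minimal base R sketch. It only shows how Cramer's V and Kendall's Tau can be computed for a sample; it does not reproduce the prediction models of get_consequences(). The helper name `cramers_v` and the example vectors are made up for illustration.

```r
# Hypothetical helper: Cramer's V from the chi-squared statistic of a
# contingency table (no continuity correction).
cramers_v <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(stats::chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  unname(sqrt(chi2 / (n * (min(dim(tab)) - 1))))
}

# Made-up codings of ten coding units on two three-category variables.
x <- c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1)
y <- c(1, 1, 2, 2, 3, 3, 2, 3, 1, 1)

cramers_v(x, y)                       # association for nominal data
stats::cor(x, y, method = "kendall")  # rank correlation for ordinal data
```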
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd Ed.). Taylor & Francis.
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function for calculating the elements of the Iota Concept 2
get_iota2_measures(aem, categorical_sizes, categorical_levels)
aem |
Assignment Error Matrix. |
categorical_sizes |
Probabilities for the different categories to occur. |
categorical_levels |
|
Returns a list of all measures belonging to the Iota Concept of the second generation.
The first component estimates_categorical_level comprises all elements that describe the ratings on a categorical level. The elements are sub-divided into raw estimates and chance-corrected estimates.
raw_estimates
iota:
A vector containing the Iota values for each category.
iota_error_1:
A vector containing the Iota Error Type I values for each category.
iota_error_2:
A vector containing the Iota Error Type II values for each category.
alpha_reliability:
A vector containing the Alpha Reliabilities for each category. These values represent probabilities.
beta_reliability:
A vector containing the Beta Reliabilities for each category. These values represent probabilities.
assignment_error_matrix:
Assignment Error Matrix containing the conditional probabilities for assigning a unit of category i to categories 1 to n.
elements_chance_corrected
alpha_reliability:
A vector containing the chance-corrected Alpha Reliabilities for each category.
beta_reliability:
A vector containing the chance-corrected Beta Reliabilities for each category.
The second component estimates_scale_level contains elements that describe the quality of the ratings on a scale level. It comprises the following elements:
iota_index:
The Iota Index, representing the reliability on a scale level.
iota_index_d4:
The Static Iota Index, which is a transformation of the original Iota Index, in order to consider the uncertainty of estimation.
iota_index_dyn2:
The Dynamic Iota Index, which is a transformation of the original Iota Index, in order to consider the uncertainty of estimation.
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Auxiliary function written in R for providing the necessary information about the patterns generated by raters. This function produces the input for the EM algorithm.
get_patterns(data, categorical_levels)
data |
|
categorical_levels |
|
Function returns a list with the following components:
n |
Integer representing the number of different patterns in the data. |
shape |
|
frq |
|
count |
|
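The idea of reducing the raw ratings to distinct patterns and their frequencies can be sketched in base R as follows. This is an illustration under assumptions, not the package's get_patterns(): the helper name `count_rating_patterns` is hypothetical, the component names n, shape and frq merely mirror the list above, and the count component is left out because its content is not described here.

```r
# Hypothetical helper (not the package's get_patterns()).
# data: one row per coding unit, one column per rater.
count_rating_patterns <- function(data) {
  data <- as.data.frame(data)
  # One text key per coding unit, e.g. "1|1|2" for three raters.
  keys <- do.call(paste, c(data, sep = "|"))
  frq <- table(keys)
  first_rows <- !duplicated(keys)
  list(
    n = length(frq),                           # number of distinct patterns
    shape = data[first_rows, , drop = FALSE],  # one row per distinct pattern
    frq = as.integer(frq[keys[first_rows]])    # frequency of each pattern
  )
}

ratings <- data.frame(
  coder_a = c(1, 1, 2, 2, 1),
  coder_b = c(1, 1, 2, 1, 1),
  coder_c = c(1, 1, 2, 2, 1)
)
patterns <- count_rating_patterns(ratings)
patterns$n    # 3 distinct patterns
patterns$frq  # 3 1 1
```

Working on patterns instead of single coding units keeps the likelihood evaluation cheap, since identical rows only need to be evaluated once and weighted by their frequency.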
Function written in C++ for generating a set of randomly chosen probabilities describing the size of the different classes. The probabilities describe the relative frequencies of the categories in the data.
get_random_start_values_class_sizes(n_categories)
n_categories |
Integer for the number of categories in the data. Must be at least 2. |
Returns a vector of randomly chosen categorical sizes.
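A base R analogue of such a start value generator could look like the sketch below. It assumes that drawing uniform random numbers and rescaling them to sum to one is an acceptable way to obtain category sizes; the sampling scheme of the C++ function is not documented here. The helper name `random_class_sizes` is hypothetical.

```r
# Hypothetical base R analogue (not the package's C++ function).
random_class_sizes <- function(n_categories) {
  stopifnot(n_categories >= 2)
  raw <- stats::runif(n_categories)
  raw / sum(raw)  # rescale so the sizes form a probability vector
}

set.seed(1)  # only for a reproducible illustration
sizes <- random_class_sizes(4)
sizes
sum(sizes)  # 1 up to floating point error
```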
Function written in C++ for generating a set of randomly chosen probabilities for the Assignment Error Matrix.
get_random_start_values_p(n_categories)
n_categories |
Integer for the number of categories in the data. Must be at least 2. |
Returns a matrix for Assignment Error Matrix (AEM) with randomly generated probabilities. The generated probabilities are in line with the assumption of weak superiority.
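One possible way to draw a random matrix that respects weak superiority is sketched below in base R. It assumes that weak superiority requires the diagonal cell to be the largest cell of its row, and it enforces this by swapping the row maximum onto the diagonal after normalising each row. This is an illustrative construction, not necessarily the one used by the C++ function; the helper name `random_aem` is hypothetical.

```r
# Hypothetical base R analogue (not the package's C++ function).
random_aem <- function(n_categories) {
  stopifnot(n_categories >= 2)
  aem <- matrix(stats::runif(n_categories^2), nrow = n_categories)
  aem <- aem / rowSums(aem)  # every row becomes a probability distribution
  for (i in seq_len(n_categories)) {
    j <- which.max(aem[i, ])
    # Swap the row maximum onto the diagonal (assumed meaning of weak superiority).
    largest <- aem[i, j]
    aem[i, j] <- aem[i, i]
    aem[i, i] <- largest
  }
  aem
}

set.seed(42)  # only for a reproducible illustration
aem <- random_aem(3)
round(aem, 3)
```

Swapping cells keeps each row sum at one, so the result is still a valid matrix of conditional probabilities.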
Function for creating a short summary of the estimated Iota components.
get_summary(object)
object |
An object of class |
Prints central statistics of the estimated model.
Function written in C++ for estimating the gradient of the log likelihood function for a given parameter set and given observations.
grad_ll(param_values, observations)
param_values |
|
observations |
|
Returns the gradient as a NumericVector.
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
A vector containing the ratings of a new rater. The data is not real and is only created for illustration purposes.
iotarelr_new_rater
A vector with the length of 318.
A data set containing the ratings of three coders for written exams. It also contains the gender of the people who took the exam. The data is not real and is only created for illustration purposes.
iotarelr_written_exams
A data frame with 318 rows and 4 variables:
Ratings of coder A.
Ratings of coder B.
Ratings of coder C.
Referring to the biological aspects of an individual.
Function written in C++ for estimating the log likelihood of a given parameter set during the condition stage.
log_likelihood_multi_c(probabilities, observations)
probabilities |
|
observations |
|
Returns the log likelihood as a single numeric value.
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function for creating a plot object that can be plotted via ggplot2.
plot_iota( object, xlab = "Amount on all cases", ylab = "Categories", liota = "Assignment of the true category (Iota)", lcase2 = "Assignment to the false category", lcase3 = "Assignment from the false true category", lscale_quality = "Scale Quality", lscale_cat = c("insufficent", "minimum", "satisfactory", "good", "excellent"), number_size = 6, key_size = 0.5, text_size = 10, legend_position = "bottom", legend_direction = "vertical", scale = "none" )
object |
Estimates of Iota 2 created with |
xlab |
|
ylab |
|
liota |
|
lcase2 |
|
lcase3 |
|
lscale_quality |
|
lscale_cat |
Vector of strings with length 5. This vector contains the labels for each category of quality for the scale. |
number_size |
|
key_size |
|
text_size |
|
legend_position |
|
legend_direction |
|
scale |
|
Function returns an object of class gg, ggplot illustrating how the data of the different categories influence each other.
An example for interpreting the plot can be found in the vignette Get started or via vignette("iotarelr", package = "iotarelr").
Berding, Florian, and Pargmann, Julia (2022). Iota Reliability Concept of the Second Generation. Measures for Content Analysis Done by Humans or Artificial Intelligences. Berlin: Logos. https://doi.org/10.30819/5581
Function for creating an alluvial plot that can be plotted via ggplot2.
plot_iota2_alluvial( object, label_titel = "Coding Stream from True to Assigned Categories", label_prefix_true = "true", label_prefix_assigned = "labeled as", label_legend_title = "True Categories", label_true_category = "True Category", label_assigned_category = "Assigned Category", label_y_axis = "Relative Frequencies", label_categories_size = 3, key_size = 0.5, text_size = 10, legend_position = "right", legend_direction = "vertical" )
object |
Estimates of Iota 2 created with |
label_titel |
|
label_prefix_true |
|
label_prefix_assigned |
|
label_legend_title |
|
label_true_category |
|
label_assigned_category |
|
label_y_axis |
|
label_categories_size |
|
key_size |
|
text_size |
|
legend_position |
|
legend_direction |
|
Returns an object of class gg and ggplot which can be shown with plot().
An example for interpreting the plot can be found in the vignette Get started or via vignette("iotarelr", package = "iotarelr").