Utils

JudiLing.is_truly_sparse (Function)

Check whether a matrix is truly sparse regardless of its storage format. This method applies when M is stored in a sparse matrix format.


Check whether a matrix is truly sparse regardless of its storage format. This method applies when M is stored in a dense matrix format.
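
The idea behind both methods can be illustrated with a minimal sketch. It is not JudiLing's implementation: the helper names `density` and `looks_sparse` and the 0.25 cutoff are assumptions made for illustration only. The point it shows is that the proportion of nonzero entries, not the container type, decides whether a matrix counts as sparse.

```julia
using SparseArrays

# Hypothetical helpers (assumptions, not part of JudiLing).
# Fraction of entries that are stored (sparse format) or nonzero (dense format).
density(M::SparseMatrixCSC) = nnz(M) / length(M)
density(M::AbstractMatrix) = count(!iszero, M) / length(M)

# The 0.25 cutoff is an illustrative assumption.
looks_sparse(M; threshold = 0.25) = density(M) < threshold

# Dense storage, but only 2 of 9 entries are nonzero.
dense_mostly_zero = [1.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 2.0]

# Sparse storage, but all 9 entries are stored.
sparse_but_full = sparse(ones(3, 3))

println(looks_sparse(dense_mostly_zero))  # true
println(looks_sparse(sparse_but_full))    # false
```

A dense matrix that is mostly zeros therefore passes the check, while a fully populated matrix held in a sparse container does not.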

JudiLing.cal_max_timestep (Function)
function cal_max_timestep(
    data_train::DataFrame,
    data_val::DataFrame,
    target_col::Union{String, Symbol};
    tokenized::Bool = false,
    sep_token::Union{Nothing, String, Char} = "",
)

Calculate the maximum timestep across the training and validation datasets.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • target_col::Union{String, Symbol}: the column with the target word forms

Optional Arguments

  • tokenized::Bool = false: Whether the word forms in the target_col are already tokenized
  • sep_token::Union{Nothing, String, Char} = "": The token with which the word forms are tokenized

Examples

JudiLing.cal_max_timestep(latin_train, latin_val, :Word)
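
To make the role of `tokenized` and `sep_token` concrete, here is a minimal sketch under stated assumptions. It is not JudiLing's implementation: the helper `longest_form` is hypothetical, it works on plain string vectors rather than DataFrames, and the word lists are made up. It assumes that the length of a word form is its number of characters when untokenized and its number of separator-delimited units when tokenized. Whether the library adds an offset on top of this count is not stated here, so the sketch returns the raw maximum.

```julia
# Hypothetical helper (an assumption, not part of JudiLing).
function longest_form(words; tokenized = false, sep_token = "")
    # Count separator-delimited units if tokenized, characters otherwise.
    n_units(w) = tokenized ? length(split(w, sep_token)) : length(w)
    return maximum(n_units, words)
end

# Made-up word forms standing in for the target column of two datasets.
train_words = ["vocoo", "vocaas", "vocat"]
val_words = ["clamaamus"]

# Training and validation forms are pooled before taking the maximum.
println(longest_form(union(train_words, val_words)))  # 9

# Tokenized forms: units are separated by ".".
tokenized_words = ["v.o.c.oo", "v.o.c.aa.s"]
println(longest_form(tokenized_words; tokenized = true, sep_token = "."))  # 5
```

Pooling both datasets matters because the longest form may occur only in the validation data.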
function cal_max_timestep(
    data::DataFrame,
    target_col::Union{String, Symbol};
    tokenized::Bool = false,
    sep_token::Union{Nothing, String, Char} = "",
)

Calculate the maximum timestep for a single dataset.

Obligatory Arguments

  • data::DataFrame: the dataset
  • target_col::Union{String, Symbol}: the column with the target word forms

Optional Arguments

  • tokenized::Bool = false: Whether the word forms in the target_col are already tokenized
  • sep_token::Union{Nothing, String, Char} = "": The token with which the word forms are tokenized

Examples

JudiLing.cal_max_timestep(latin, :Word)