Make Adjacency Matrix

JudiLing.make_full_adjacency_matrix - Function
make_full_adjacency_matrix(i2f)

Make a full adjacency matrix based only on the form of the n-grams, regardless of whether they are seen in the training data. Because every possible combination of n-grams is considered, this usually takes hours for large datasets.

Obligatory Arguments

  • i2f::Dict: the dictionary returning features given indices

Optional Arguments

  • tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator token
  • verbose::Bool=false: if true, more information will be printed

Examples

# without tokenization
i2f = Dict([(1, "#ab"), (2, "abc"), (3, "bc#"), (4, "#bc"), (5, "ab#")])
JudiLing.make_full_adjacency_matrix(i2f)

# with tokenization
i2f = Dict([(1, "#-a-b"), (2, "a-b-c"), (3, "b-c-#"), (4, "#-b-c"), (5, "a-b-#")])
JudiLing.make_full_adjacency_matrix(
    i2f,
    tokenized=true,
    sep_token="-")
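
To make "based only on the form of n-grams" concrete, here is a minimal sketch that does not use JudiLing itself. It assumes the adjacency rule is that n-gram j may follow n-gram i when the last n-1 symbols of i equal the first n-1 symbols of j. The names `split_ngram` and `full_adjacency_sketch` are hypothetical, and the return type of the real function may differ.

```julia
using SparseArrays

# Hypothetical helper: split an n-gram into its symbols.
# With a separator, split on it; otherwise every character is one symbol.
split_ngram(ngram; sep_token=nothing) =
    isnothing(sep_token) ? string.(collect(ngram)) : string.(split(ngram, sep_token))

# Hypothetical sketch (assumed rule, not the package's code):
# A[i, j] = 1 when the last n-1 symbols of n-gram i
# equal the first n-1 symbols of n-gram j.
function full_adjacency_sketch(i2f::Dict; sep_token=nothing)
    n = length(i2f)
    symbols = [split_ngram(i2f[i]; sep_token=sep_token) for i in 1:n]
    A = spzeros(Int, n, n)
    for i in 1:n, j in 1:n   # all n^2 pairs are checked, hence the long run time
        if symbols[i][2:end] == symbols[j][1:end-1]
            A[i, j] = 1
        end
    end
    return A
end

i2f = Dict([(1, "#ab"), (2, "abc"), (3, "bc#"), (4, "#bc"), (5, "ab#")])
A = full_adjacency_sketch(i2f)

i2f_tok = Dict([(1, "#-a-b"), (2, "a-b-c"), (3, "b-c-#"), (4, "#-b-c"), (5, "a-b-#")])
A_tok = full_adjacency_sketch(i2f_tok; sep_token="-")
```

Under this rule, `A[1, 2]` is 1 because "#ab" ends in "ab" and "abc" starts with "ab", and the tokenized dictionary yields the same matrix as the untokenized one. The double loop over all pairs is what makes the cost grow quadratically with the number of n-grams.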
JudiLing.make_combined_adjacency_matrix - Method
make_combined_adjacency_matrix(data_train, data_val)

Make a combined adjacency matrix from the training and validation datasets.

Obligatory Arguments

  • data_train::DataFrame: training dataset
  • data_val::DataFrame: validation dataset

Optional Arguments

  • grams=3: the number of grams for cues
  • target_col=:Words: the column name for target strings
  • tokenized=false: if true, the dataset target is assumed to be tokenized
  • sep_token=nothing: separator
  • keep_sep=false: if true, keep separators in cues
  • start_end_token="#": start and end token in boundary cues
  • verbose=false: if true, more information is printed

Examples

JudiLing.make_combined_adjacency_matrix(
    latin_train,
    latin_val,
    grams=3,
    target_col=:Word,
    tokenized=false,
    keep_sep=false
    )
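
As a rough illustration of what a combined matrix could contain, the sketch below builds an adjacency matrix from the n-gram transitions observed in two word lists. It is an assumption about the general idea, not the package's implementation: `ngrams_of` and `combined_adjacency_sketch` are hypothetical names, plain vectors of strings stand in for the target column of the two DataFrames, and tokenization and separators are not handled.

```julia
# Hypothetical helper: pad a word with boundary tokens and cut it into n-grams.
function ngrams_of(word::AbstractString; grams=3, start_end_token="#")
    chars = collect(start_end_token * word * start_end_token)
    return [join(chars[k:k+grams-1]) for k in 1:(length(chars) - grams + 1)]
end

# Hypothetical sketch: index every n-gram found in either dataset, then set
# A[i, j] = 1 when n-gram j directly follows n-gram i inside some word.
function combined_adjacency_sketch(train_words, val_words; grams=3)
    words = vcat(train_words, val_words)
    f2i = Dict{String,Int}()
    for w in words, g in ngrams_of(w; grams=grams)
        get!(f2i, g, length(f2i) + 1)   # assign the next free index to a new n-gram
    end
    A = zeros(Int, length(f2i), length(f2i))
    for w in words
        gs = ngrams_of(w; grams=grams)
        for k in 1:length(gs)-1
            A[f2i[gs[k]], f2i[gs[k+1]]] = 1
        end
    end
    return A, f2i
end

A, f2i = combined_adjacency_sketch(["abc"], ["bcd"])
```

In this sketch only transitions that occur in the data are marked: "abc" followed by "bcd" stays 0 because no word in either list contains that sequence, whereas a purely form-based matrix would allow it.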