detect_sdg_systems identifies SDGs in text using multiple SDG query systems.
character vector or object of class tCorpus containing the text in which SDGs are to be detected.
character vector specifying the query systems to be used. Can be one or more of "Aurora", "Elsevier", "Auckland", "SIRIS", "SDSN", and "SDGO". By default, all systems except "SDGO" and "SDSN" are used.
numeric vector with integers between 1 and 17 specifying the SDGs to identify in text. Defaults to 1:17.
character specifying the level of detail in the output. The default "features" returns a tibble with one row per matched query, including a variable containing the features of the query that were matched in the text. By contrast, "documents" returns an aggregated tibble with one row per matched SDG, without information on the features.
logical specifying whether messages on the function's progress should be printed.
The function returns a tibble containing the SDG hits found in the vector of documents. The columns of the tibble depend on the value of output. Possible columns are:
Index of the element in text where the match was found. Formatted as a factor whose number of levels equals the original number of documents.
Label of the SDG found in document.
The name of the query system that produced the match.
Index of the query within the query system that produced the match.
Concatenated list of words that caused the query to match.
Index of hit for a given system.
Number of queries that produced a hit for a given system, SDG, and document.
detect_sdg_systems implements six SDG query systems. Four systems, developed by the Aurora Universities Network (see aurora_queries), Elsevier (see elsevier_queries), Auckland University (see auckland_queries), and SIRIS Academic (see siris_queries), rely on Lucene-style Boolean queries, whereas two systems, namely SDGO (see sdgo_queries) and SDSN (see sdsn_queries), rely on basic keyword matching. detect_sdg_systems calls a dedicated detect_* function for each of the six systems. The query search is implemented using the search_features function from the corpustools package.
By default, detect_sdg_systems runs only the Aurora, Elsevier, Auckland, and SIRIS query systems, as they are considerably less liberal than the SDSN and SDGO systems and are therefore likely to produce more valid SDG classifications. Users should be aware that systematic validations and comparisons between the systems are largely lacking and that results should be interpreted with caution.
# \donttest{
# run sdg detection
hits <- detect_sdg_systems(projects)
#> Running Aurora
#> Running Elsevier
#> Running Auckland
#> Running SIRIS
# run sdg detection with Aurora only
hits <- detect_sdg_systems(projects, systems = "Aurora")
#> Running Aurora
# run sdg detection for sdg 3 only
hits <- detect_sdg_systems(projects, sdgs = 3)
#> Running Aurora
#> Running Elsevier
#> Running Auckland
#> Running SIRIS
# }
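The examples above use only the default "features" output and the default systems. As a minimal sketch, the call below combines a Boolean-query system with a keyword-based one, restricts detection to two SDGs, and requests the aggregated "documents" output. It assumes that the package providing detect_sdg_systems and the projects data is text2sdg, and that the output and progress-message arguments are named output and verbose; neither argument name is spelled out in the argument list above, so check them against the function signature before relying on this.

```r
# Assumption: detect_sdg_systems() and the `projects` example data
# are provided by the text2sdg package.
library(text2sdg)

# Run one Boolean-query system (Aurora) and one keyword-based system (SDSN),
# restricted to SDGs 3 and 13, and aggregate the result per document.
# Assumption: the arguments are named `output` and `verbose`.
hits_docs <- detect_sdg_systems(
  projects,
  systems = c("Aurora", "SDSN"),
  sdgs    = c(3, 13),
  output  = "documents",
  verbose = FALSE
)

# Inspect which columns the aggregated output contains
# rather than assuming their names.
names(hits_docs)
head(hits_docs)
```

Because SDSN relies on basic keyword matching, it will typically flag more documents than Aurora; comparing the two sets of rows is a quick way to see how liberal each system is on a given corpus.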