Package 'newsmap' reference manual

Title:	Semi-Supervised Model for Geographical Document Classification
Description:	Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).
Authors:	Kohei Watanabe [aut, cre, cph], Stefan Müller [aut], Dani Madrid-Morales [aut], Katerina Tertytchnaya [aut], Ke Cheng [aut], Chung-hong Chan [aut], Claude Grasland [aut], Giuseppe Carteny [aut], Elad Segev [aut], Dai Yamao [aut], Barbara Ellynes Zucchi Nobre Silva [aut], Lanabi la Lova [aut], Lungta Seki [aut]
Maintainer:	Kohei Watanabe
License:	MIT + file LICENSE
Version:	0.9.1
Built:	2025-03-08 04:16:04 UTC
Source:	https://github.com/koheiw/newsmap

Evaluate classification accuracy in precision and recall

Description

Evaluate classification accuracy in precision and recall

Usage

accuracy(x, y)

Arguments

`x`	vector of predicted classes
`y`	vector of true classes

Examples

class_pred <- c('US', 'GB', 'US', 'CN', 'JP', 'FR', 'CN') # prediction
class_true <- c('US', 'FR', 'US', 'CN', 'KP', 'EG', 'US') # true class
acc <- accuracy(class_pred, class_true)
print(acc)
summary(acc)
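
To make the reported quantities concrete, the following base R sketch computes per-class precision and recall by hand for the same two vectors. It illustrates the standard definitions only and is not the internal code of `accuracy()`; the names `lev`, `tab`, `tp`, `fp` and `fn` are introduced here for illustration.

```r
class_pred <- c("US", "GB", "US", "CN", "JP", "FR", "CN") # prediction
class_true <- c("US", "FR", "US", "CN", "KP", "EG", "US") # true class

# Use a common set of levels so the confusion matrix is square
lev <- sort(union(class_pred, class_true))
tab <- table(pred = factor(class_pred, lev), true = factor(class_true, lev))

tp <- diag(tab)            # true positives per class
fp <- rowSums(tab) - tp    # predicted as the class, but wrong
fn <- colSums(tab) - tp    # belong to the class, but missed

precision <- tp / (tp + fp)  # NaN for classes that were never predicted
recall <- tp / (tp + fn)     # NaN for classes that never occur in the truth
print(cbind(tp, fp, fn, precision, recall))
```

Classes that appear on only one side (for example 'KP', which is never predicted) have an undefined precision or recall in this sketch.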

Compute average feature entropy (AFE)

Description

AFE measures the randomness of feature occurrences in labelled documents.

Usage

afe(x, y, smooth = 1)

Arguments

`x`	a dfm for features
`y`	a dfm for labels
`smooth`	a numeric value for smoothing to include all the features
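
As a rough intuition for what "randomness of occurrences" means, the sketch below computes the Shannon entropy of a single feature's frequency distribution over labels. It is not the implementation of `afe()`: the helper `feature_entropy()` and the toy counts are hypothetical, and the way `smooth` is applied here is an assumption made for illustration.

```r
# Hypothetical helper: entropy (in bits) of one feature's counts over labels
feature_entropy <- function(counts, smooth = 1) {
  p <- (counts + smooth) / sum(counts + smooth)
  p <- p[p > 0]        # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# A feature that occurs almost only with one label has low entropy
concentrated <- feature_entropy(c(IE = 12, KR = 0, JP = 0))

# A feature spread evenly over labels has the maximum entropy, log2(3)
dispersed <- feature_entropy(c(IE = 4, KR = 4, JP = 4))

print(c(concentrated = concentrated, dispersed = dispersed))
```

In this toy setting, smoothing keeps labels with zero counts in the distribution, which is one way to read "to include all the features" in the description of `smooth`.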

Extract coefficients for features

Description

Extract coefficients for features

Usage

## S3 method for class 'textmodel_newsmap'
coef(object, n = 10, select = NULL, ...)

## S3 method for class 'textmodel_newsmap'
coefficients(object, n = 10, select = NULL, ...)

Arguments

`object`	a Newsmap model fitted by `textmodel_newsmap()`.
`n`	the number of coefficients to extract.
`select`	returns the coefficients for the selected class; specify by the names of rows in `object$model`.
`...`	not used.
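
The roles of `n` and `select` can be sketched in base R on a toy coefficient matrix whose rows are classes and whose columns are features. The matrix values, the class and feature names, and the helper `top_features()` are all hypothetical; the real method operates on `object$model`, and its exact return format is not shown here.

```r
# Toy coefficients: rows are classes, columns are features (made-up values)
m <- matrix(c(1.5, -1.4, -0.4,  2.0,
              -1.2,  1.8,  0.3, -0.9),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("ie", "kr"),
                            c("dublin", "seoul", "minister", "irish")))

# Hypothetical helper: the n largest coefficients for each selected class
top_features <- function(m, n = 10, select = NULL) {
  rows <- if (is.null(select)) rownames(m) else select
  out <- lapply(rows, function(k) head(sort(m[k, ], decreasing = TRUE), n))
  names(out) <- rows
  out
}

print(top_features(m, n = 2))                 # all classes
print(top_features(m, n = 2, select = "kr"))  # one class only
```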

Seed geographical dictionary in Arabic

Description

Seed geographical dictionary in Arabic

Author(s)

Dai Yamao

Seed geographical dictionary in German

Description

Seed geographical dictionary in German

Author(s)

Stefan Müller

Seed geographical dictionary in English

Description

Seed geographical dictionary in English

Author(s)

Kohei Watanabe

Seed geographical dictionary in Spanish

Description

Seed geographical dictionary in Spanish

Author(s)

Dani Madrid-Morales

Seed geographical dictionary in French

Description

Seed geographical dictionary in French

Author(s)

Claude Grasland

Seed geographical dictionary in Hebrew

Description

Seed geographical dictionary in Hebrew

Author(s)

Elad Segev

Seed geographical dictionary in Italian

Description

Seed geographical dictionary in Italian

Author(s)

Giuseppe Carteny

Seed geographical dictionary in Japanese

Description

Seed geographical dictionary in Japanese

Author(s)

Kohei Watanabe

Seed geographical dictionary in Portuguese

Description

Seed geographical dictionary in Portuguese

Author(s)

Barbara Ellynes Zucchi Nobre Silva

Seed geographical dictionary in Russian

Description

Seed geographical dictionary in Russian

Author(s)

Katerina Tertytchnaya

Lanabi la Lova

Seed geographical dictionary in Turkish

Description

Seed geographical dictionary in Turkish

Author(s)

Lungta Seki

Seed geographical dictionary in Chinese (simplified)

Description

Seed geographical dictionary in Chinese (simplified)

Author(s)

Ke Cheng

Seed geographical dictionary in Chinese (traditional)

Description

Seed geographical dictionary in Chinese (traditional)

Author(s)

Chung-hong Chan

Prediction method for textmodel_newsmap

Description

Predict document classes using a trained Newsmap model.

Usage

## S3 method for class 'textmodel_newsmap'
predict(
  object,
  newdata = NULL,
  confidence = FALSE,
  rank = 1L,
  type = c("top", "all"),
  rescale = FALSE,
  min_conf = -Inf,
  min_n = 0L,
  ...
)

Arguments

`object`	a fitted Newsmap textmodel.
`newdata`	dfm on which prediction should be made.
`confidence`	if `TRUE`, returns the likelihood ratio score.
`rank`	rank of the class to be predicted. Only used when `type = "top"`.
`type`	if `top`, returns the most likely class specified by `rank`; otherwise returns a matrix of likelihood ratio scores for all possible classes.
`rescale`	if `TRUE`, likelihood ratio scores are normalized using `scale()`. This affects both types of results.
`min_conf`	return `NA` when confidence is lower than this value.
`min_n`	set the minimum number of polarity words in documents.
`...`	not used.
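
The interplay of `rank` and `min_conf` can be sketched in base R on a toy matrix of scores, with documents in rows and classes in columns. This reflects only the behaviour described in the argument list above, not the package's implementation; the scores, the class names and the helper `top_class()` are hypothetical.

```r
# Toy likelihood ratio scores: rows are documents, columns are classes
scores <- matrix(c(2.1, 0.3, -1.0,
                   0.2, 0.4,  0.1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("text1", "text2"), c("ie", "kr", "jp")))

# Hypothetical helper: pick the class at a given rank for each document and
# return NA when its score is below min_conf
top_class <- function(scores, rank = 1L, min_conf = -Inf) {
  idx <- apply(scores, 1, function(s) order(s, decreasing = TRUE)[rank])
  conf <- scores[cbind(seq_len(nrow(scores)), idx)]
  cls <- colnames(scores)[idx]
  cls[conf < min_conf] <- NA
  names(cls) <- rownames(scores)
  cls
}

print(top_class(scores))                # most likely class per document
print(top_class(scores, rank = 2L))     # second most likely class
print(top_class(scores, min_conf = 1))  # weakly supported predictions become NA
```

In this sketch the full `scores` matrix plays the role of the `type = "all"` result, while `top_class()` plays the role of `type = "top"`.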

Calculate micro and macro average measures of accuracy

Description

This function calculates micro-average precision (p) and recall (r) and macro-average precision (P) and recall (R) based on a confusion matrix from accuracy().

Usage

## S3 method for class 'textmodel_newsmap_accuracy'
summary(object, ...)

Arguments

`object`	output of accuracy()
`...`	not used.
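
The difference between the two kinds of average can be sketched in base R. Micro-averaging pools the counts over classes before dividing, whereas macro-averaging computes the measure for each class and then takes the mean. This is an illustration of the standard definitions rather than the code behind `summary()`; in particular, skipping classes with a zero denominator in the macro average is an assumption of this sketch.

```r
class_pred <- c("US", "GB", "US", "CN", "JP", "FR", "CN") # prediction
class_true <- c("US", "FR", "US", "CN", "KP", "EG", "US") # true class

lev <- sort(union(class_pred, class_true))
tab <- table(pred = factor(class_pred, lev), true = factor(class_true, lev))

tp <- diag(tab)          # true positives per class
n_pred <- rowSums(tab)   # tp + fp per class
n_true <- colSums(tab)   # tp + fn per class

# Micro-average: pool the counts over classes first, then divide
micro_precision <- sum(tp) / sum(n_pred)
micro_recall <- sum(tp) / sum(n_true)

# Macro-average: per-class values first, then their mean
# (classes with a zero denominator are skipped; an assumption of this sketch)
macro_precision <- mean((tp / n_pred)[n_pred > 0])
macro_recall <- mean((tp / n_true)[n_true > 0])

print(c(p = micro_precision, r = micro_recall,
        P = macro_precision, R = macro_recall))
```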

Semi-supervised Bayesian multinomial model for geographical document classification

Description

Train a Newsmap model to predict geographical focus of documents with labels given by a dictionary.

Usage

textmodel_newsmap(
  x,
  y,
  label = c("all", "max"),
  smooth = 1,
  boolean = FALSE,
  drop_label = TRUE,
  verbose = quanteda_options("verbose"),
  entropy = c("none", "global", "local", "average"),
  ...
)

Arguments

`x`	a dfm or fcm created by `quanteda::dfm()`
`y`	a dfm or a sparse matrix that records class membership of the documents. It can be created by applying `quanteda::dfm_lookup()` to `x`.
`label`	if "max", uses only labels for the maximum value in each row of `y`.
`smooth`	a value added to the frequency of words to smooth likelihood ratios.
`boolean`	if `TRUE`, only consider presence or absence of features in each document to limit the impact of words repeated in few documents.
`drop_label`	if `TRUE`, drops empty columns of `y` and ignores their labels.
`verbose`	if `TRUE`, shows progress of training.
`entropy`	[experimental] the scheme to compute the entropy to regularize likelihood ratios. The entropy of features is computed over labels if `global` or over documents with the same labels if `local`. Local entropy is averaged if `average`. See the Details section.
`...`	additional arguments passed to internal functions.

Details

Newsmap learns associations between words and classes as likelihood ratios based on the features in `x` and the labels in `y`. Large likelihood ratios tend to concentrate in a small number of features, but the entropy of their frequencies over labels or documents helps to disperse the distribution.
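
The idea of a likelihood ratio between a word and a class can be sketched in base R on a toy count matrix. The sketch compares a word's smoothed relative frequency in documents labelled with a class against its relative frequency in all other documents. It is a simplified illustration under that assumption, not the package's fitting code, and it leaves out the entropy regularization; the counts and the helper `log_lr()` are hypothetical.

```r
# Toy word counts: rows are classes (labels), columns are words (made-up)
counts <- matrix(c(8, 1, 1,
                   1, 7, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("IE", "KR"),
                                 c("dublin", "seoul", "minister")))

# Hypothetical helper: log likelihood ratio of each word for each class
log_lr <- function(counts, smooth = 1) {
  out <- counts
  for (k in rownames(counts)) {
    inside <- counts[k, ] + smooth
    outside <- colSums(counts[rownames(counts) != k, , drop = FALSE]) + smooth
    out[k, ] <- log(inside / sum(inside)) - log(outside / sum(outside))
  }
  out
}

scores <- log_lr(counts)
print(round(scores, 2))
```

Words tied to one class ('dublin', 'seoul') receive large positive scores for that class and negative scores elsewhere, while a word shared across classes ('minister') stays closer to zero.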

References

Kohei Watanabe. 2018. "Newsmap: semi-supervised approach to geographical news classification." Digital Journalism 6(3): 294-309.

Examples

require(quanteda)
text_en <- c(text1 = "This is an article about Ireland.",
             text2 = "The South Korean prime minister was re-elected.")

toks_en <- tokens(text_en)
label_toks_en <- tokens_lookup(toks_en, data_dictionary_newsmap_en, levels = 3)
label_dfm_en <- dfm(label_toks_en)

feat_dfm_en <- dfm(toks_en, tolower = FALSE)

model_en <- textmodel_newsmap(feat_dfm_en, label_dfm_en)
predict(model_en)

