R packages by koheiw

proxyC - Computes Proximity in Large Sparse Matrices

Computes proximity between rows or columns of large matrices efficiently in C++. Functions are optimised for large sparse matrices using the Armadillo and Intel TBB libraries. Among various built-in similarity/distance measures, computation of correlation, cosine similarity and Euclidean distance is particularly fast.

Last updated 3 months ago

data-science, distance-measures, similarity-measures, openblas, onetbb, cpp

8.70 score 29 stars 31 dependents 23 scripts 3.2k downloads
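Here is a minimal pure-Python sketch of the three measures proxyC singles out (cosine similarity, Euclidean distance and correlation) computed between rows of a sparse matrix. It assumes each row is stored as a dict of {column index: value}, with absent columns treated as zero. This is not proxyC's R interface or its C++/Armadillo implementation, and all function names here are hypothetical. It only shows why sparsity helps: dot products, norms and the sums needed for correlation can be accumulated over nonzero entries alone.

```python
import math

# Hypothetical representation: a sparse row is a dict {column_index: value};
# columns not present in the dict are zero.

def dot(a, b):
    # Loop over the shorter row so the cost depends on nonzeros, not matrix width.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[k] for k, v in a.items() if k in b)

def sq_norm(a):
    return sum(v * v for v in a.values())

def cosine(a, b):
    na, nb = math.sqrt(sq_norm(a)), math.sqrt(sq_norm(b))
    if na == 0 or nb == 0:
        return float("nan")  # undefined for an all-zero row
    return dot(a, b) / (na * nb)

def euclidean(a, b):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, using nonzero entries only
    sq = sq_norm(a) + sq_norm(b) - 2 * dot(a, b)
    return math.sqrt(max(sq, 0.0))  # clamp small negative rounding error

def correlation(a, b, ncol):
    # Pearson correlation; ncol is needed because zeros count toward the means.
    ma, mb = sum(a.values()) / ncol, sum(b.values()) / ncol
    cov = dot(a, b) / ncol - ma * mb
    var_a = sq_norm(a) / ncol - ma * ma
    var_b = sq_norm(b) / ncol - mb * mb
    if var_a <= 0 or var_b <= 0:
        return float("nan")  # undefined for a constant row
    return cov / math.sqrt(var_a * var_b)

def row_proximity(rows, measure):
    # Full n-by-n matrix of pairwise proximities between rows.
    n = len(rows)
    return [[measure(rows[i], rows[j]) for j in range(n)] for i in range(n)]

rows = [{0: 1.0, 2: 2.0}, {0: 2.0, 2: 4.0}, {1: 3.0}]
cos_matrix = row_proximity(rows, cosine)
```

In this toy example the first two rows point in the same direction, so their cosine similarity is 1, while the third row shares no nonzero column with the first, so their dot product, and hence cosine similarity, is 0.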

seededlda - Seeded Sequential LDA for Topic Modeling

Seeded Sequential LDA can classify sentences of texts into predefined topics with a small number of seed words (Watanabe & Baturo, 2023) <doi:10.1177/08944393231178605>. Implements Seeded LDA (Lu et al., 2010) <doi:10.1109/ICDMW.2011.125> and Sequential LDA (Du et al., 2012) <doi:10.1007/s10115-011-0425-1> with the distributed LDA algorithm (Newman et al., 2009) for parallel computing.

Last updated 11 days ago

semi-supervised-learning, text-classification, onetbb, cpp

7.36 score 74 stars 1 dependents 173 scripts 904 downloads

newsmap - Semi-Supervised Model for Geographical Document Classification

Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).

Last updated 8 months ago

machine-learning, news-stories, quanteda, text-analysis

6.26 score 61 stars 8 scripts 453 downloads

LSX - Semi-Supervised Algorithm for Document Scaling

A word-embedding-based semi-supervised model for document scaling (Watanabe, 2020) <doi:10.1080/19312458.2020.1832976>. Latent Semantic Scaling (LSS) allows users to analyze large and complex corpora on arbitrary dimensions with seed words, exploiting the efficiency of word embeddings (SVD, GloVe). It can generate word vectors from a user-provided corpus or incorporate pre-trained word vectors.

Last updated 24 days ago

lsa, quanteda, sentiment-analysis, text-analysis

6.23 score 55 stars 14 scripts 455 downloads

wordmap - Feature Extraction and Document Classification with Noisy Labels

Extract features and classify documents with noisy labels given by document metadata or keyword matching (Watanabe & Zhou, 2020) <doi:10.1177/0894439320907027>.

Last updated 3 days ago

4.98 score 2 stars 1 scripts 342 downloads

wordvector - Word and Document Vector Models

Create dense vector representations of words and documents using 'quanteda'. Currently implements Word2vec (Mikolov et al., 2013) <doi:10.48550/arXiv.1310.4546> and Latent Semantic Analysis (Deerwester et al., 1990) <doi:10.1002/(SICI)1097-4571(199009)41:6%3C391::AID-ASI1%3E3.0.CO;2-9>.

Last updated 24 days ago

cpp

4.26 score 2 stars 13 scripts 284 downloads