proxyC - Computes Proximity in Large Sparse Matrices
Computes proximity between rows or columns of large matrices efficiently in C++. Functions are optimised for large sparse matrices using the Armadillo and Intel TBB libraries. Among various built-in similarity/distance measures, computation of correlation, cosine similarity and Euclidean distance is particularly fast.
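As a rough illustration of why sparse storage makes cosine similarity cheap, here is a minimal pure-Python sketch. It is not proxyC's implementation or API: the function name `cosine_similarity_rows` and the dict-per-row layout are assumptions made for this example only. The point is that only stored nonzero entries are touched when computing norms and dot products.

```python
import math


def cosine_similarity_rows(rows):
    """Pairwise cosine similarity between sparse rows.

    `rows` is a list of dicts mapping column index -> nonzero value
    (a hypothetical layout chosen for this sketch). Returns a dict
    keyed by (i, j) with i <= j, holding only nonzero similarities.
    """
    # Euclidean norm of each row, computed from stored entries only.
    norms = [math.sqrt(sum(v * v for v in row.values())) for row in rows]
    sim = {}
    for i in range(len(rows)):
        for j in range(i, len(rows)):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # similarity is undefined for all-zero rows
            a, b = rows[i], rows[j]
            if len(a) > len(b):
                a, b = b, a  # iterate over the shorter row
            # Only columns present in both rows contribute to the dot product.
            dot = sum(v * b[k] for k, v in a.items() if k in b)
            if dot != 0.0:
                sim[(i, j)] = dot / (norms[i] * norms[j])
    return sim


rows = [
    {0: 1.0, 2: 2.0},  # row 0
    {0: 2.0, 2: 4.0},  # row 1: a multiple of row 0
    {1: 3.0},          # row 2: shares no columns with rows 0 and 1
]
result = cosine_similarity_rows(rows)
```

Rows 0 and 1 point in the same direction, so their similarity is 1 up to floating-point rounding. Rows with no shared columns are simply absent from the result, which keeps the output sparse as well.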
Last updated 3 months ago
data-science distance-measures similarity-measures openblas onetbb cpp
8.70 score 29 stars 31 dependents 23 scripts 3.2k downloads
seededlda - Seeded Sequential LDA for Topic Modeling
Seeded Sequential LDA can classify sentences in texts into pre-defined topics with a small number of seed words (Watanabe & Baturo, 2023) <doi:10.1177/08944393231178605>. Implements Seeded LDA (Lu et al., 2010) <doi:10.1109/ICDMW.2011.125> and Sequential LDA (Du et al., 2012) <doi:10.1007/s10115-011-0425-1> with the distributed LDA algorithm (Newman et al., 2009) for parallel computing.
Last updated 11 days ago
semi-supervised-learning text-classification onetbb cpp
7.36 score 74 stars 1 dependent 173 scripts 904 downloads
newsmap - Semi-Supervised Model for Geographical Document Classification
Semi-supervised model for geographical document classification (Watanabe, 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).
Last updated 8 months ago
machine-learning news-stories quanteda text-analysis
6.26 score 61 stars 8 scripts 453 downloads
LSX - Semi-Supervised Algorithm for Document Scaling
A word embeddings-based semi-supervised model for document scaling (Watanabe, 2020) <doi:10.1080/19312458.2020.1832976>. LSS allows users to analyze large and complex corpora on arbitrary dimensions with seed words, exploiting the efficiency of word embeddings (SVD, GloVe). It can generate word vectors from a user-provided corpus or incorporate pre-trained word vectors.
Last updated 24 days ago
lsa quanteda sentiment-analysis text-analysis
6.23 score 55 stars 14 scripts 455 downloads
wordmap - Feature Extraction and Document Classification with Noisy Labels
Extracts features and classifies documents with noisy labels given by document metadata or keyword matching (Watanabe & Zhou, 2020) <doi:10.1177/0894439320907027>.
Last updated 3 days ago
4.98 score 2 stars 1 script 342 downloads
wordvector - Word and Document Vector Models
Creates dense vector representations of words and documents using 'quanteda'. Currently implements Word2vec (Mikolov et al., 2013) <doi:10.48550/arXiv.1310.4546> and Latent Semantic Analysis (Deerwester et al., 1990) <doi:10.1002/(SICI)1097-4571(199009)41:6%3C391::AID-ASI1%3E3.0.CO;2-9>.
Last updated 24 days ago
cpp
4.26 score 2 stars 13 scripts 284 downloads