proxyC - Computes Proximity in Large Sparse Matrices
Computes proximity between rows or columns of large matrices efficiently in C++. Functions are optimised for large sparse matrices using the Armadillo and Intel TBB libraries. Among various built-in similarity/distance measures, computation of correlation, cosine similarity and Euclidean distance is particularly fast.
Last updated 10 days ago
data-science distance-measures similarity-measures
8.96 score 29 stars 29 packages 23 scripts 5.3k downloads
seededlda - Seeded Sequential LDA for Topic Modeling
Seeded Sequential LDA can classify sentences of texts into pre-defined topics with a small number of seed words (Watanabe & Baturo, 2023) <doi:10.1177/08944393231178605>. Implements Seeded LDA (Lu et al., 2010) <doi:10.1109/ICDMW.2011.125> and Sequential LDA (Du et al., 2012) <doi:10.1007/s10115-011-0425-1> with the distributed LDA algorithm (Newman et al., 2009) for parallel computing.
Last updated 2 months ago
semi-supervised-learning text-classification
7.26 score 73 stars 1 package 152 scripts 1.1k downloads
LSX - Semi-Supervised Algorithm for Document Scaling
A word embedding-based semi-supervised model for document scaling (Watanabe, 2020) <doi:10.1080/19312458.2020.1832976>. LSS allows users to analyze large and complex corpora on arbitrary dimensions with seed words, exploiting the efficiency of word embeddings (SVD, GloVe). It can generate word vectors on a user-provided corpus or incorporate pre-trained word vectors.
Last updated 3 months ago
lsa quanteda sentiment-analysis text-analysis
6.35 score 55 stars 17 scripts 442 downloads
newsmap - Semi-Supervised Model for Geographical Document Classification
Semi-supervised model for geographical document classification (Watanabe, 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).
Last updated 5 months ago
machine-learning news-stories quanteda text-analysis
6.25 score 59 stars 8 scripts 526 downloads
wordmap - Feature Extraction and Document Classification with Noisy Labels
Extracts features and classifies documents with noisy labels given by document metadata or keyword matching (Watanabe & Zhou, 2020) <doi:10.1177/0894439320907027>.
Last updated 5 days ago
5.03 score 2 stars 1 script 225 downloads