proxyC - Computes Proximity in Large Sparse Matrices
Computes proximity between rows or columns of large matrices efficiently in C++. Functions are optimised for large sparse matrices using the Armadillo and Intel TBB libraries. Among various built-in similarity/distance measures, computation of correlation, cosine similarity, Dice coefficient and Euclidean distance is particularly fast.
data-science distance-measures similarity-measures openblas onetbb cpp
9.08 score 29 stars 35 dependents 41 scripts 9.5k downloads

seededlda - Seeded Sequential LDA for Topic Modeling
Seeded Sequential LDA can classify sentences of texts into pre-defined topics with a small number of seed words (Watanabe & Baturo, 2023) <doi:10.1177/08944393231178605>. Implements Seeded LDA (Lu et al., 2010) <doi:10.1109/ICDMW.2011.125> and Sequential LDA (Du et al., 2012) <doi:10.1007/s10115-011-0425-1> with the distributed LDA algorithm (Newman et al., 2009) for parallel computing.
semi-supervised-learning text-classification onetbb cpp
6.88 score 79 stars 1 dependents 212 scripts 725 downloads

LSX - Semi-Supervised Algorithm for Document Scaling
A word embedding-based semi-supervised model for document scaling (Watanabe, 2020) <doi:10.1080/19312458.2020.1832976>. LSS allows users to analyze large and complex corpora on arbitrary dimensions with seed words, exploiting the efficiency of word embeddings (SVD, GloVe). It can generate word vectors from a user-provided corpus or incorporate pre-trained word vectors.
lsa quanteda sentiment-analysis text-analysis
6.41 score 57 stars 25 scripts 808 downloads

newsmap - Semi-Supervised Model for Geographical Document Classification
Semi-supervised model for geographical document classification (Watanabe, 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).
machine-learning news-stories quanteda text-analysis
6.01 score 66 stars 13 scripts 409 downloads

wordvector - Word and Document Vector Models
Creates dense vector representations of words and documents using 'quanteda'. Implements Word2vec (Mikolov et al., 2013) <doi:10.48550/arXiv.1310.4546>, Doc2vec (Le & Mikolov, 2014) <doi:10.48550/arXiv.1405.4053> and Latent Semantic Analysis (Deerwester et al., 1990) <doi:10.1002/(SICI)1097-4571(199009)41:6%3C391::AID-ASI1%3E3.0.CO;2-9>.
cpp
5.52 score 15 stars 37 scripts 557 downloads