koheiw

proxyC - Computes Proximity in Large Sparse Matrices

Computes proximity between rows or columns of large matrices efficiently in C++. Functions are optimised for large sparse matrices using the Armadillo and Intel TBB libraries. Among various built-in similarity/distance measures, computation of correlation, cosine similarity, Dice coefficient and Euclidean distance is particularly fast.

Last updated

data-science, distance-measures, similarity-measures, openblas, onetbb, cpp

9.08 score, 29 stars, 35 dependents, 41 scripts, 9.5k downloads
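The speed claim above rests on the matrices being sparse. The pure-Python sketch below computes row-wise cosine similarity over a dictionary-of-keys sparse matrix to show why: only non-zero entries are ever visited. It is not proxyC's API or its C++/Armadillo implementation; `cosine_rows` is a hypothetical helper written for illustration.

```python
import math


def cosine_rows(rows):
    """Cosine similarity between every pair of sparse rows.

    Each row is a dict mapping column index -> non-zero value, so
    zero entries are never stored or multiplied.
    """
    norms = [math.sqrt(sum(v * v for v in r.values())) for r in rows]
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a, b = rows[i], rows[j]
            # Iterate over the shorter row and look up matches in the longer one.
            if len(a) > len(b):
                a, b = b, a
            dot = sum(v * b[k] for k, v in a.items() if k in b)
            denom = norms[i] * norms[j]
            s = dot / denom if denom else 0.0
            sim[i][j] = sim[j][i] = s  # the result is symmetric
    return sim


rows = [{0: 1.0, 2: 1.0}, {0: 2.0, 2: 2.0}, {1: 3.0}]
sim = cosine_rows(rows)
```

Rows 0 and 1 point in the same direction, so their similarity is (up to rounding) 1, while row 2 shares no non-zero column with them and gets 0 without any multiplication.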

seededlda - Seeded Sequential LDA for Topic Modeling

Seeded Sequential LDA can classify sentences of texts into predefined topics with a small number of seed words (Watanabe & Baturo, 2023) <doi:10.1177/08944393231178605>. Implements Seeded LDA (Lu et al., 2010) <doi:10.1109/ICDMW.2011.125> and Sequential LDA (Du et al., 2012) <doi:10.1007/s10115-011-0425-1> with the distributed LDA algorithm (Newman et al., 2009) for parallel computing.

Last updated

semi-supervised-learning, text-classification, onetbb, cpp

6.88 score, 79 stars, 1 dependent, 212 scripts, 725 downloads
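One common way to seed LDA with a few words per topic is to give those words extra pseudo-counts in their topic's prior, so sampling is pulled toward the intended topics. The sketch below shows that idea in a plain collapsed Gibbs sampler. It is an assumption-laden illustration, not seededlda's implementation: it omits the sequential and distributed parts, and `seeded_lda`, `seed_weight` and the other parameter names are hypothetical.

```python
import random


def seeded_lda(docs, n_topics, seeds, iters=200, alpha=0.5, beta=0.1,
               seed_weight=5.0, rng_seed=1):
    """Collapsed Gibbs sampling for LDA where seed words receive extra
    prior mass (pseudo-counts) in the topic they are assigned to.
    Returns per-document topic proportions (theta)."""
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    # Topic-word prior: beta everywhere, plus seed_weight for seed words.
    prior = [{w: beta for w in vocab} for _ in range(n_topics)]
    for k, words in seeds.items():
        for w in words:
            prior[k][w] += seed_weight
    prior_sum = [sum(p.values()) for p in prior]

    nkw = [{w: 0 for w in vocab} for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                     # tokens per topic
    ndk = [[0] * n_topics for _ in docs]                    # doc-topic counts
    z = []                                                  # topic of each token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            nkw[k][w] += 1
            nk[k] += 1
            ndk[d][k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token, then resample its topic from the full conditional.
                nkw[k][w] -= 1
                nk[k] -= 1
                ndk[d][k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + prior[t][w])
                           / (nk[t] + prior_sum[t]) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                nkw[k][w] += 1
                nk[k] += 1
                ndk[d][k] += 1

    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


docs = [s.split() for s in ["goal match team goal", "vote party election vote",
                            "team match goal", "election party vote"]]
seeds = {0: ["goal", "team"], 1: ["vote", "election"]}
theta = seeded_lda(docs, 2, seeds)
```

Because "goal" and "team" carry pseudo-counts only in topic 0, documents dominated by them end up mostly in topic 0, and the unseeded word "match" tends to follow them through co-occurrence.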

LSX - Semi-Supervised Algorithm for Document Scaling

A word embeddings-based semi-supervised model for document scaling (Watanabe, 2020) <doi:10.1080/19312458.2020.1832976>. LSS (Latent Semantic Scaling) allows users to analyze large and complex corpora on arbitrary dimensions with seed words, exploiting the efficiency of word embeddings (SVD, GloVe). It can generate word vectors from a user-provided corpus or incorporate pre-trained word vectors.

Last updated

lsa, quanteda, sentiment-analysis, text-analysis

6.41 score, 57 stars, 25 scripts, 808 downloads
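The seed-word scaling idea can be sketched as two steps: score every word by its similarity to positive and negative seed words in the embedding space, then score a document by averaging the scores of its words. The code below assumes hand-made 2-D toy vectors instead of vectors learned with SVD or GloVe, and `word_polarity` and `document_score` are hypothetical helpers; LSX's own weighting and rescaling may differ.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def word_polarity(vectors, seeds):
    """Polarity of each word: mean cosine similarity to the seed words,
    signed by each seed's polarity (+1 or -1)."""
    return {
        w: sum(p * cosine(v, vectors[s]) for s, p in seeds.items()) / len(seeds)
        for w, v in vectors.items()
    }


def document_score(tokens, polarity):
    """Average polarity of the tokens that have a word vector."""
    scored = [polarity[t] for t in tokens if t in polarity]
    return sum(scored) / len(scored) if scored else 0.0


# Toy embeddings (assumption): the first axis loosely encodes sentiment.
vectors = {
    "good": [1.0, 0.1], "excellent": [0.9, 0.2],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.2],
    "the": [0.0, 1.0],
}
seeds = {"good": 1, "bad": -1}
pol = word_polarity(vectors, seeds)
```

Only two seed words are given, yet "excellent" and "awful" receive scores of the right sign because they sit near the seeds, while "the" is equally close to both seeds and scores zero.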

newsmap - Semi-Supervised Model for Geographical Document Classification

Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).

Last updated

machine-learning, news-stories, quanteda, text-analysis

6.01 score, 66 stars, 13 scripts, 409 downloads
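A seed dictionary can bootstrap a classifier as follows: documents that mention a country's seed words get weak labels, a multinomial Naive Bayes model is trained on those labels, and the model can then classify documents that contain no seed words at all. This is a minimal sketch of that general pattern under assumed toy data and a made-up two-country dictionary; `label_by_seeds`, `train_nb` and `predict` are hypothetical and not newsmap functions.

```python
import math
from collections import Counter, defaultdict


def label_by_seeds(docs, seeds):
    """Weak labels: every country whose seed words occur in a document."""
    return [{c for c, words in seeds.items() if any(w in doc for w in words)}
            for doc in docs]


def train_nb(docs, labels, smooth=1.0):
    """Multinomial Naive Bayes with add-`smooth` (Laplace) smoothing."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for doc, labs in zip(docs, labels):
        vocab.update(doc)
        for c in labs:  # a document can count towards several labels
            word_counts[c].update(doc)
            doc_counts[c] += 1
    total = sum(doc_counts.values())
    model = {}
    for c in doc_counts:
        n = sum(word_counts[c].values())
        loglik = {w: math.log((word_counts[c][w] + smooth) / (n + smooth * len(vocab)))
                  for w in vocab}
        model[c] = (math.log(doc_counts[c] / total), loglik)
    return model


def predict(model, doc):
    """Label with the highest log prior plus summed log likelihood."""
    def score(c):
        prior, loglik = model[c]
        return prior + sum(loglik[w] for w in doc if w in loglik)
    return max(model, key=score)


docs = [["tokyo", "sushi", "election"], ["paris", "wine", "strike"],
        ["osaka", "sushi", "baseball"], ["lyon", "wine", "football"]]
seeds = {"jp": ["tokyo", "osaka", "japan"], "fr": ["paris", "lyon", "france"]}
labels = label_by_seeds(docs, seeds)
model = train_nb(docs, labels)
```

`predict(model, ["sushi", "baseball"])` returns "jp" even though neither word is in the dictionary, because the model learned them from documents that the seeds had labelled.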

wordvector - Word and Document Vector Models

Creates dense vector representations of words and documents using 'quanteda'. Implements Word2vec (Mikolov et al., 2013) <doi:10.48550/arXiv.1310.4546>, Doc2vec (Le & Mikolov, 2014) <doi:10.48550/arXiv.1405.4053> and Latent Semantic Analysis (Deerwester et al., 1990) <doi:10.1002/(SICI)1097-4571(199009)41:6%3C391::AID-ASI1%3E3.0.CO;2-9>.

Last updated

cpp

5.52 score, 15 stars, 37 scripts, 557 downloads
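Word2vec's skip-gram variant learns vectors by predicting the words around each target word. The sketch below shows only the first step of that process, turning a token sequence into (target, context) training pairs within a fixed window. Embedding updates and negative sampling are omitted, and `skipgram_pairs` is a hypothetical helper, not part of wordvector.

```python
def skipgram_pairs(tokens, window=2):
    """(target, context) pairs for skip-gram Word2vec: every word within
    `window` positions of the target counts as its context."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


pairs = skipgram_pairs(["the", "cat", "sat", "down"], window=1)
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'),
#  ('sat', 'cat'), ('sat', 'down'), ('down', 'sat')]
```

A larger `window` produces more pairs per token and tends to capture broader topical similarity rather than close syntactic neighbours.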

wordmap - Feature Extraction and Document Classification with Noisy Labels

Extracts features and classifies documents with noisy labels given by document metadata or keyword matching (Watanabe & Zhou, 2020) <doi:10.1177/0894439320907027>.

Last updated

4.68 score, 4 stars, 1 script, 187 downloads
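Feature extraction from noisy labels can be illustrated by labelling documents through keyword matching and then ranking every word by how strongly it is associated with a label. The sketch below uses a signed 2x2 chi-squared statistic as the association measure; that choice is an assumption for illustration and not necessarily what wordmap computes, and `chi2_keyness` is a hypothetical helper.

```python
from collections import Counter


def chi2_keyness(docs, labels, target):
    """Signed 2x2 chi-squared association between each word and `target`.

    Positive scores mean the word is over-represented in documents
    with the target label, negative scores mean under-represented.
    """
    in_t, out_t = Counter(), Counter()
    for doc, lab in zip(docs, labels):
        (in_t if lab == target else out_t).update(doc)
    n_in, n_out = sum(in_t.values()), sum(out_t.values())
    total = n_in + n_out
    scores = {}
    for w in set(in_t) | set(out_t):
        a, b = in_t[w], out_t[w]          # word counts inside / outside target
        c, d = n_in - a, n_out - b        # all other words inside / outside
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = total * (a * d - b * c) ** 2 / denom if denom else 0.0
        expected_a = (a + b) * n_in / total
        scores[w] = chi2 if a >= expected_a else -chi2
    return scores


docs = [["tax", "budget", "inflation"], ["budget", "deficit", "tax"],
        ["match", "goal", "team"], ["team", "coach", "goal"]]
keywords = {"tax"}  # noisy labels from keyword matching
labels = ["economy" if any(k in doc for k in keywords) else "other" for doc in docs]
scores = chi2_keyness(docs, labels, "economy")
```

Starting from the single keyword "tax", the ranking surfaces "budget" as an economy feature (score 2.4) and pushes "goal" to the negative end (score -2.4).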