
Bahasa Indonesia Open Source NLP Resource

Moved from here.

Few people know about the open-source resources for Bahasa Indonesia NLP, because they are scattered all over GitHub. Here are the ones I know of. I hope they help others get started on their NLP projects:

Negative and Positive Unigrams

  1. https://github.com/masdevid/ID-OpinionWords/
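A common use of positive and negative unigram lists is simple lexicon-based sentiment scoring. You count the positive words in a text and subtract the negative ones. The sketch below assumes each list file holds one word per line and skips lines starting with `#` as a precaution. Check the actual file layout in the repository before using it. The helper names (`load_word_list`, `sentiment_score`) and the tiny inline lexicons are hypothetical stand-ins for the real files.

```python
import re
from typing import Iterable, List, Set


def load_word_list(lines: Iterable[str]) -> Set[str]:
    """Build a set of lowercase words from an opinion-word file.

    Assumption: one word per line; blank lines and '#' lines are skipped.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def tokenize(text: str) -> List[str]:
    # Naive tokenizer: lowercase, then keep runs of letters, digits and hyphens.
    return re.findall(r"[a-z0-9-]+", text.lower())


def sentiment_score(text: str, positive: Set[str], negative: Set[str]) -> int:
    """Return (# positive unigrams) - (# negative unigrams) in the text."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos - neg


if __name__ == "__main__":
    # Hypothetical tiny lexicons; in practice, pass an open file handle
    # for each list in the repository to load_word_list().
    positive = load_word_list(["bagus", "senang", "# comment"])
    negative = load_word_list(["buruk", "kecewa"])
    print(sentiment_score("Filmnya bagus, saya senang!", positive, negative))  # → 2
```

This baseline ignores negation (e.g. "tidak bagus") and word inflection. Treat it as a starting point, not a finished classifier.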

Stopword List

  1. https://github.com/pebbie/pebahasa/blob/master/indonesian
  2. https://github.com/aliakbars/bilp/blob/master/stoplist
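Either stopword list can be loaded into a set and used to filter tokens. The sketch below assumes one stopword per line, which is an assumption about both files. It keeps the original casing of the tokens it retains. The helper names and the three-word inline list are hypothetical. Because the lists come from different projects, you may also want to take the union of both sets.

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    # Assumption: one stopword per line; blank lines are ignored.
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]:
    # Case-insensitive membership test; kept tokens retain their original casing.
    return [t for t in tokens if t.lower() not in stopwords]


if __name__ == "__main__":
    # In practice, something like:
    #   with open("indonesian", encoding="utf-8") as f:
    #       stopwords = load_stopwords(f)
    stopwords = load_stopwords(["yang", "dan", "di", ""])  # hypothetical subset
    tokens = "Saya tinggal di Jakarta dan Bandung".split()
    print(remove_stopwords(tokens, stopwords))
    # → ['Saya', 'tinggal', 'Jakarta', 'Bandung']
```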

POS Taggers

  1. https://github.com/pebbie/pebahasa (python)
  2. https://github.com/andryluthfi/indonesian-postag (java)

MWE (Multi Word Expression) Lists

  1. https://github.com/andryluthfi/indonesian-postag (see the resources folder)
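An MWE list is typically used during tokenization. Consecutive tokens that form a known expression are joined into one unit, so a tagger or counter treats them as a single word. The sketch below uses greedy longest-match merging. It assumes the list has one expression per line with words separated by spaces, which may not match the actual format in the resources folder. The function names, the `_` separator, and the sample expressions are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def load_mwes(lines: Iterable[str]) -> Set[Tuple[str, ...]]:
    # Assumption: one expression per line, words separated by whitespace.
    mwes = set()
    for line in lines:
        words = tuple(line.strip().lower().split())
        if len(words) > 1:  # single words are not multi-word expressions
            mwes.add(words)
    return mwes


def merge_mwes(tokens: List[str], mwes: Set[Tuple[str, ...]], sep: str = "_") -> List[str]:
    """Greedy longest-match: join runs of tokens that form a known MWE."""
    max_len = max((len(m) for m in mwes), default=1)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so longer expressions take priority.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in mwes:
                out.append(sep.join(tokens[i:i + n]))
                i += n
                break
        else:
            # No expression starts at position i; keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    mwes = load_mwes(["rumah sakit", "kereta api"])  # hypothetical entries
    print(merge_mwes("Saya ke rumah sakit naik kereta api".split(), mwes))
    # → ['Saya', 'ke', 'rumah_sakit', 'naik', 'kereta_api']
```

Greedy matching is simple and fast, but it can pick a shorter or overlapping expression when several entries compete. A lattice or dynamic-programming segmenter would handle those cases more carefully.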

Twitter Sample Corpus

  1. https://github.com/aliakbars/bilp/tree/master/sample (in the CSV files)
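The sample tweets can be pulled out of a CSV file with Python's standard `csv` module. The column that holds the tweet text is not documented here, so the sketch takes the column name as a parameter. The `"tweet"` column and the inline sample rows are hypothetical, so inspect the header of the real files first. When reading from disk, open the file with `newline=""` and `encoding="utf-8"`, as the `csv` module documentation recommends.

```python
import csv
import io
from typing import Iterable, List


def read_text_column(lines: Iterable[str], column: str) -> List[str]:
    """Return the non-empty values of one column from CSV text."""
    reader = csv.DictReader(lines)
    # Accessing fieldnames reads the header row (None if the input is empty).
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    # Skip empty cells and short rows (missing cells come back as None).
    return [row[column] for row in reader if row[column]]


if __name__ == "__main__":
    # Hypothetical sample; in practice:
    #   with open("sample.csv", newline="", encoding="utf-8") as f:
    #       tweets = read_text_column(f, "tweet")
    sample = 'id,tweet\n1,"Macet lagi, parah"\n2,\n3,Selamat pagi\n'
    print(read_text_column(io.StringIO(sample), "tweet"))
    # → ['Macet lagi, parah', 'Selamat pagi']
```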
Written on October 8, 2016