Manually tagged data was developed as part of the consortium project 'Development of Sanskrit computational tools and Sanskrit-Hindi Machine Translation system (2008-2012)', funded by DeitY, Government of India, under the TDIL programme.
The data was tagged following these guidelines.
The following data is available for research:
  1. POS-tagged corpus
  2. Dependency analysis of the corpus
  3. samAsa-annotated data in WX notation
  4. Frequency of compound components in WX notation
  5. Frequency of sandhi rules within compound components in WX notation
  6. Frequency of words in WX notation
  7. External sandhi rules with their frequencies in WX notation
  8. Sandhi-split parallel data extracted from annotated texts