Free/open-source machine translation software
Here's a non-exhaustive list of links to existing free/open-source machine
translation systems, which I will try to complete as I find out about them. To the
best of my knowledge, software listed here has:
Rule-based machine translation systems
- Apertium, a free/open-source
rule-based machine translation platform.
- Matxin, a free/open-source rule-based
machine translation system for Basque.
- OpenLogos, a free/open-source
version of the historical Logos machine translation system.
- Anusaaraka, English-Hindi machine translation system.
Statistical machine translation systems
- Moses, a statistical machine translation system.
- Marie, an
n-gram-based statistical machine translation decoder.
- Joshua, an open-source
decoder for statistical translation models based on synchronous
context-free grammars.
- Phramer, an
open-source statistical phrase-based machine translation decoder.
- GREAT, a decoder based on stochastic finite-state transducers, which includes a training toolkit.
- The Thot toolkit includes a decoder as of 2014.
- Travatar is a tree-to-string statistical machine translation system.
- CDEC is a decoder,
aligner, and model optimizer for statistical machine translation and other
structured prediction models based on (mostly) context-free formalisms,
written by Chris Dyer at the Language Technologies Institute at Carnegie
Mellon University.
Training translation models
- Giza++ is a tool to train
translation models for statistical machine translation (see also the
related mkcls tool to train word classes).
- Thot includes a toolkit to
train phrase-based models for statistical machine translation.
- A free/open-source language modelling tool to be used with Moses instead of
SRILM, which is not free.
- Space-efficient n-gram-based language models built using randomized
representations (Bloom filters, etc.).
- Kenneth Heafield's software for the fast
filtering of ARPA format language models to multiple vocabularies.
- Holger Schwenk's Continuous Space Language Model toolkit (CSLM) works by projecting the word indices onto a continuous space and using a probability estimator operating on this space.
- Kenneth Heafield's scripts that make it
easy to score machine translation output using NIST's BLEU and NIST, TER,
- RIA is a
tool for automatic induction of transfer rules for Transfer-Based
Statistical Machine Translation using dependency structures.
- Chaski: Distributed
phrase-based machine translation training tool based on Hadoop.
- Grammatical Framework, a free/open-source programming language used to create grammars for multilingual applications.
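The entry above on randomized language models mentions Bloom filters. As a rough illustration of that idea only (not the implementation used by any tool listed here), the following Python sketch stores n-grams in a fixed-size bit array. The class name, the array size, the number of hash functions and the choice of SHA-256 are all assumptions made for the example. A real randomized language model would also need to encode counts or probabilities, which this sketch does not attempt.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions.

    Membership tests may return false positives but never false negatives.
    """

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive one bit position per hash by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Store the bigrams of a toy sentence and query one of them.
seen = BloomFilter()
for gram in ngrams("the cat sat on the mat".split(), 2):
    seen.add(gram)

print("the cat" in seen)  # an inserted n-gram is always reported as present
```

The space saving comes from never storing the n-gram strings themselves: only the bit array is kept, at the price of a small, tunable false-positive rate.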
Example-based machine translation systems
Multi-engine machine translation / system combination
Aligners and translation models
- Giza++: training of
statistical translation models.
- Anymalign, a
multilingual sub-sentential aligner.
- Ventsislav Zhechev's Sub-tree
aligner, which can be used for the automatic generation of parallel
treebanks.
Web services around machine translation
- Tradubi is an
open-source Ajax-based web application for social translation built upon
Apertium (may be tested online).
Distributed machine translation
- ScaleMT (no release yet, browse
at the Apertium Subversion repository) is a free/open-source framework for
building scalable machine translation web services.
- Quest++, an
open-source tool for translation quality estimation developed by the
group of Lucia Specia at the University of Sheffield (note that the current
version still has one important non-free dependency: SRILM).
Other useful tools
... that may be used to build machine translation systems
- Freeling, a
free/open-source suite of language analyzers.
- Bitextor, an automatic bitext generator.
- Foma, a finite-state machine
toolkit and library.
- Helsinki Finite State Technology (HFST) for natural-language morphologies.
- VISL CG-3, the constraint grammar parser at the Visual Interactive Syntax
Learning project of Syddansk Universitet: browse
Subversion repository, source snapshots.
Additions/corrections/updates to: Mikel L. Forcada, mlf...@dlsi.ua.es