<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jsmith.info" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title></title>
 <link>http://www.jsmith.info/taxonomy/name/work</link>
 <description>Taxonomy view based on term name.</description>
 <language>en</language>
<item>
 <title>Lost Phrases</title>
 <link>http://www.jsmith.info/node/46</link>
 <description>&lt;p&gt;I started this morning hoping to sort out the issue whereby spans were not being split up correctly. First, I realised that this website isn&#039;t backed up at all, so I set up a script to sort that out. Then I got back to work. The issue is that matches are currently split into contiguous spans based only on whether the input words they match are contiguous. But what if the input words are contiguous, but the matched words in the example are not? We need to split this into multiple spans. Further, what if the words are contiguous in both the input and the example, but the ordering is changed?&lt;/p&gt;
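&lt;p&gt;A minimal Python sketch of one possible answer to both questions, not the code used in the system. It assumes each match is an (input index, example index) pair with at most one pair per input word; the name split_spans is hypothetical. A pair stays in the current span only if it moves forward by exactly one word on both sides, so a gap in the example or a change of ordering starts a new span.&lt;/p&gt;

```python
def split_spans(matches):
    """Group (input_index, example_index) pairs into spans.

    A pair joins the current span only if it advances both the input
    position and the example position by exactly one; anything else
    (a gap on either side, or a change of order) starts a new span.
    """
    spans = []
    for pair in sorted(matches):
        if spans:
            prev_in, prev_ex = spans[-1][-1]
            if pair[0] - prev_in == 1 and pair[1] - prev_ex == 1:
                spans[-1].append(pair)
                continue
        spans.append([pair])
    return spans


# Input words 0 to 3 are contiguous, but the example jumps from 6 to 9
# and then steps backwards from 9 to 8.
matches = [(0, 5), (1, 6), (2, 9), (3, 8)]
print(split_spans(matches))
# prints [[(0, 5), (1, 6)], [(2, 9)], [(3, 8)]]
```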
&lt;p&gt;&lt;a href=&quot;http://www.jsmith.info/node/46&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.jsmith.info/node/46#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Tue, 09 Sep 2008 20:04:21 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">46 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Dependency Matching</title>
 <link>http://www.jsmith.info/node/45</link>
 <description>&lt;p&gt;A problem with the dependency matching algorithm caused some bad matches to rank higher than they should. The algorithm allowed a single dependency in the input to match several dependencies in an example, provided they all had the same lemmas and relationships.&lt;/p&gt;
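&lt;p&gt;One way to rule this out is to treat both sides as multisets, so that each example dependency can be used by at most one input dependency. The Python sketch below is an illustration under that assumption, not the actual fix; the function name count_matches and the (head lemma, relation, dependent lemma) tuple layout are hypothetical.&lt;/p&gt;

```python
from collections import Counter


def count_matches(input_deps, example_deps):
    """Count matched dependencies, pairing them one-to-one.

    Each dependency is a (head_lemma, relation, dependent_lemma) tuple.
    An input dependency that occurs n times can match at most n
    identical dependencies in the example, and the other way round.
    """
    input_counts = Counter(input_deps)
    example_counts = Counter(example_deps)
    return sum(min(n, example_counts[dep]) for dep, n in input_counts.items())


dep = ("eat", "obj", "apple")
# One input dependency against three identical example dependencies
# counts as a single match, not three.
print(count_matches([dep], [dep, dep, dep]))
# prints 1
```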
&lt;p&gt;&lt;a href=&quot;http://www.jsmith.info/node/45&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.jsmith.info/node/45#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Mon, 08 Sep 2008 16:01:38 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">45 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Switch back to Moses alignments</title>
 <link>http://www.jsmith.info/node/44</link>
 <description>&lt;p&gt;Having abandoned the idea of using the GIZA++ alignment scores, I can now switch back to the more reliable Moses alignments (a smart combination of both GIZA++ alignment directions). Having done so, the BLEU score jumped about 3 points:&lt;br /&gt;
&lt;code&gt;&lt;br /&gt;
r157-20080731053726: BLEU = 20.05, 52.9/27.1/16.8/11.0 (BP=0.885, ratio=0.891, hyp_len=4809, ref_len=5398)&lt;br /&gt;
r169-20080805193448: BLEU = 23.21, 52.3/28.2/18.3/12.4 (BP=0.966, ratio=0.967, hyp_len=5219, ref_len=5398)&lt;br /&gt;
&lt;/code&gt;&lt;br /&gt;
&lt;br/&gt;&lt;/p&gt;
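&lt;p&gt;The BP values in those two lines can be reproduced from the reported lengths with the standard BLEU brevity penalty, which suggests that a good part of the jump comes from longer output rather than from the n-gram precisions alone. A small Python check (the function name is hypothetical):&lt;/p&gt;

```python
import math


def brevity_penalty(hyp_len, ref_len):
    """BLEU brevity penalty: 1 if the hypothesis is at least as long as
    the reference, otherwise exp(1 - ref_len / hyp_len)."""
    return min(1.0, math.exp(1 - ref_len / hyp_len))


# Hypothesis lengths from the two runs above; both share ref_len=5398.
for run, hyp_len in [("r157", 4809), ("r169", 5219)]:
    print(run, round(brevity_penalty(hyp_len, 5398), 3))
# prints:
# r157 0.885
# r169 0.966
```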
&lt;p&gt;&lt;a href=&quot;http://www.jsmith.info/node/44&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.jsmith.info/node/44#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Wed, 06 Aug 2008 13:43:57 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">44 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Repeated words in output</title>
 <link>http://www.jsmith.info/node/43</link>
 <description>&lt;p&gt;I thought I&#039;d fixed the problem of target words appearing more than once in the output, but I hadn&#039;t done enough. While I had prevented a word from appearing twice in one phrase, it could still get into the output twice by being aligned to words in two separate but &lt;code&gt;&amp;lt;linked&amp;gt;&lt;/code&gt; spans. Fixing this required a bit of restructuring, as generation of target phrases needed to be more aware of what was going on elsewhere.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.jsmith.info/node/43&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.jsmith.info/node/43#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Wed, 06 Aug 2008 13:34:43 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">43 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Script updates</title>
 <link>http://www.jsmith.info/node/42</link>
 <description>&lt;p&gt;Updated &lt;code&gt;do_everything.sh&lt;/code&gt;:&lt;br /&gt;
  - includes support for running EBMT only or SMT only&lt;br /&gt;
  - can use an existing results dir as a base when running SMT only&lt;br /&gt;
  - cleaned up params file&lt;/p&gt;
&lt;p&gt;Updated &lt;code&gt;make_alignment_images.pl&lt;/code&gt;:&lt;br /&gt;
  - includes support for Moses alignment files&lt;br /&gt;
  - includes basic range support (&lt;code&gt;-from&lt;/code&gt; and &lt;code&gt;-to&lt;/code&gt;)&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/42#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Wed, 06 Aug 2008 13:26:26 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">42 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Debug Threads</title>
 <link>http://www.jsmith.info/node/37</link>
 <description>&lt;p&gt;I modified the EBMT system so that, in debug mode, it no longer waits for diagrams to be created. Before the change, CPU utilisation for the Java process was only around 40%, because it would wait for deps2dot.pl and dot to run before resuming. Now each diagram is created in its own thread, which is spawned and then forgotten about. The Java process runs consistently at 100%, with about six or seven other threads creating images simultaneously and using roughly another 100% CPU.&lt;/p&gt;
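&lt;p&gt;The real change is in Java; the pattern itself can be sketched in Python as below. All names here (render_diagram, process, finished) are hypothetical, and the short sleep stands in for the external deps2dot.pl and dot calls.&lt;/p&gt;

```python
import threading
import time

finished = []  # names of diagrams whose rendering has completed


def render_diagram(name):
    """Stand-in for the slow external step that draws one diagram."""
    time.sleep(0.05)
    finished.append(name)


def process(sentences):
    """Handle every sentence without waiting for its debug diagram."""
    for sentence in sentences:
        # Spawn and forget: the main loop never joins this thread.
        threading.Thread(target=render_diagram, args=(sentence,)).start()
    return len(sentences)


process(["s1", "s2", "s3"])
print("main loop done; diagrams finished so far:", len(finished))
```

&lt;p&gt;Because the threads are not daemon threads, the interpreter still waits for outstanding diagrams before exiting, so nothing is lost when the main loop finishes first.&lt;/p&gt;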
</description>
 <comments>http://www.jsmith.info/node/37#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Thu, 31 Jul 2008 05:50:08 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">37 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Per sentence BLEU scores</title>
 <link>http://www.jsmith.info/node/36</link>
 <description>&lt;p&gt;New script, &lt;code&gt;sentence-bleu-scores.pl&lt;/code&gt;, generates an HTML page which displays per-sentence BLEU scores for several systems, and their difference from the base system score (i.e. Moses). The page can be sorted on any column using JavaScript. Should help identify problem sentences.&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/36#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Thu, 31 Jul 2008 00:03:55 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">36 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Removed GIZA scores</title>
 <link>http://www.jsmith.info/node/35</link>
 <description>&lt;p&gt;Tried running the system with the GIZA++ alignment &#039;probabilities&#039; removed. BLEU score barely changed. Changing XML mode from exclusive to inclusive gave a big performance jump though. Looks like I need to play around with the probability parameter some more.&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/35#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Fri, 18 Jul 2008 20:42:34 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">35 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Do Everything</title>
 <link>http://www.jsmith.info/node/34</link>
 <description>&lt;p&gt;Finally created a script that pulls the whole compilation-EBMT-SMT-BLEU process together into one easy package. &lt;code&gt;do_everything.sh&lt;/code&gt; grabs the latest revision from Subversion, compiles it, runs the EBMT stage, feeds the result through Moses, and finally scores the output using BLEU. All of the output is stored tidily in a new folder with details of the software revision and all script parameters.&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/34#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Fri, 18 Jul 2008 20:39:19 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">34 at http://www.jsmith.info</guid>
</item>
<item>
 <title>GIZA++ Structure</title>
 <link>http://www.jsmith.info/node/33</link>
 <description>&lt;p&gt;Just a quick one to keep track of what I know about how GIZA++ works.&lt;/p&gt;
&lt;p&gt;Files:&lt;br /&gt;
&lt;code&gt;hmm.{h,cc}&lt;/code&gt;: defines and creates HMM network&lt;br /&gt;
&lt;code&gt;model1.{h,cc}&lt;/code&gt;: defines and initialises tTable (translation table?)&lt;br /&gt;
&lt;code&gt;TTables.{h,cc}&lt;/code&gt;: template used for tTable&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/33#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Mon, 07 Jul 2008 19:02:53 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">33 at http://www.jsmith.info</guid>
</item>
</channel>
</rss>
