<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.jsmith.info" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title></title>
 <link>http://www.jsmith.info/taxonomy/term/1</link>
 <description>Taxonomy view based on term id.</description>
 <language>en</language>
<item>
 <title>Meeting about Preposition Error Correction and Syntax-matching Analysis</title>
 <link>http://www.jsmith.info/node/48</link>
 <description>&lt;p&gt;Thoughts before meeting:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt; Do I need to determine semantic categories (Boston vs MIT)?&lt;/li&gt;
&lt;li&gt; If so, I need to expand matching to give some sort of score based on the closeness of the match (perhaps using previous WordNet code)&lt;/li&gt;
&lt;li&gt; How does syntax-based matching help? Isn&#039;t immediate context most relevant?&lt;/li&gt;
&lt;li&gt; Google n-grams approach is different to any currently published work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Discussed in meeting:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt; Don&#039;t bother with semantic categories yet; just assume exact matches.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;http://www.jsmith.info/node/48&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.jsmith.info/node/48#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Thu, 13 Nov 2008 12:00:00 +0000</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">48 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Potential Progress</title>
 <link>http://www.jsmith.info/node/47</link>
 <description>&lt;p&gt;Just met with Steve. Still need to sort out the bug in Moses that is preventing insertions, and will do some analysis on the current system with large chunks to see if it&#039;s working for those, and if not, why not. Also discussed a potential third chapter: using what I&#039;ve got to do some paraphrasing, extending C C-B&#039;s work to find non-contiguous paraphrases. Will do some reading this week to see if there&#039;s a good opportunity here.&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/47#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Fri, 17 Oct 2008 14:46:09 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">47 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Lost Phrases</title>
 <link>http://www.jsmith.info/node/46</link>
 <description>&lt;p&gt;I started this morning hoping to sort out the issue whereby spans were not being split up correctly. First, I realised that this website isn&#039;t backed up at all, so I set up a script to sort that out. Then I got back to work. The issue is that matches are currently split into contiguous spans based only on whether the input words they match are contiguous. But what if the input words are contiguous, but the matched words in the example are not? We need to split this into multiple spans. Further, what if the words are contiguous in both the input and the example, but the ordering is changed?&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.jsmith.info/node/46&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.jsmith.info/node/46#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Tue, 09 Sep 2008 20:04:21 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">46 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Dependency Matching</title>
 <link>http://www.jsmith.info/node/45</link>
 <description>&lt;p&gt;There was a problem with the dependency matching algorithm which resulted in some bad matches ranking more highly than they should. The algorithm allowed a single dependency in the input to match several dependencies in an example, provided they all had the same lemmas and relationships. &lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.jsmith.info/node/45&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.jsmith.info/node/45#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Mon, 08 Sep 2008 16:01:38 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">45 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Switch back to Moses alignments</title>
 <link>http://www.jsmith.info/node/44</link>
 <description>&lt;p&gt;Having abandoned the idea of using the GIZA++ alignment scores, I can now switch back to the more reliable Moses alignments (a smart combination of both GIZA++ alignment directions). Having done so, the BLEU score jumped about 3 points:&lt;br /&gt;
&lt;code&gt;r157-20080731053726: BLEU = 20.05, 52.9/27.1/16.8/11.0 (BP=0.885, ratio=0.891, hyp_len=4809, ref_len=5398)&lt;br /&gt;
r169-20080805193448: BLEU = 23.21, 52.3/28.2/18.3/12.4 (BP=0.966, ratio=0.967, hyp_len=5219, ref_len=5398)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.jsmith.info/node/44&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.jsmith.info/node/44#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Wed, 06 Aug 2008 13:43:57 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">44 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Repeated words in output</title>
 <link>http://www.jsmith.info/node/43</link>
 <description>&lt;p&gt;I thought I&#039;d fixed the problem of target words appearing more than once in the output, but I hadn&#039;t done enough. While I had prevented a word from appearing twice in one phrase, it could still get into the output twice by being aligned to words in two separate but &lt;code&gt;&amp;lt;linked&amp;gt;&lt;/code&gt; spans. Fixing this required a bit of restructuring, as generation of target phrases needed to be more aware of what was going on elsewhere.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.jsmith.info/node/43&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.jsmith.info/node/43#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Wed, 06 Aug 2008 13:34:43 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">43 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Script updates</title>
 <link>http://www.jsmith.info/node/42</link>
 <description>&lt;p&gt;Updated &lt;code&gt;do_everything.sh&lt;/code&gt;:&lt;br /&gt;
  - includes support for running EBMT only or SMT only&lt;br /&gt;
  - can use an existing results dir as a base when running SMT only&lt;br /&gt;
  - cleaned up params file&lt;/p&gt;
&lt;p&gt;Updated &lt;code&gt;make_alignment_images.pl&lt;/code&gt;:&lt;br /&gt;
  - includes support for Moses alignment files&lt;br /&gt;
  - includes basic range support (&lt;code&gt;-from&lt;/code&gt; and &lt;code&gt;-to&lt;/code&gt;)&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/42#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Wed, 06 Aug 2008 13:26:26 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">42 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Debug Threads</title>
 <link>http://www.jsmith.info/node/37</link>
 <description>&lt;p&gt;I modified the EBMT system so that, when in debug mode, it doesn&#039;t hang around waiting for diagrams to be created. Before the change, CPU utilisation was down around 40% for the Java process, because it would wait for deps2dot.pl and dot to run before resuming. I&#039;ve changed it so that diagrams are created in their own threads, which are spawned and then forgotten about. Now the Java process is consistently running at 100%, with about six or seven other threads creating images simultaneously, using up roughly another 100% CPU.&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/37#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Thu, 31 Jul 2008 05:50:08 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">37 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Per sentence BLEU scores</title>
 <link>http://www.jsmith.info/node/36</link>
 <description>&lt;p&gt;New script, &lt;code&gt;sentence-bleu-scores.pl&lt;/code&gt;, generates an HTML page which displays per-sentence BLEU scores for several systems, and their difference from the base system score (i.e. Moses). Can be sorted on any column using JavaScript. Should help identify problem sentences.&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/36#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Thu, 31 Jul 2008 00:03:55 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">36 at http://www.jsmith.info</guid>
</item>
<item>
 <title>Removed GIZA scores</title>
 <link>http://www.jsmith.info/node/35</link>
 <description>&lt;p&gt;Tried running the system with the GIZA++ alignment &#039;probabilities&#039; removed. BLEU score barely changed. Changing XML mode from exclusive to inclusive gave a big performance jump though. Looks like I need to play around with the probability parameter some more.&lt;/p&gt;
</description>
 <comments>http://www.jsmith.info/node/35#comments</comments>
 <category domain="http://www.jsmith.info/taxonomy/term/1">work</category>
 <pubDate>Fri, 18 Jul 2008 20:42:34 +0100</pubDate>
 <dc:creator>james</dc:creator>
 <guid isPermaLink="false">35 at http://www.jsmith.info</guid>
</item>
</channel>
</rss>
